linux-btrfs.vger.kernel.org archive mirror
* corrupt leaf; invalid root item size
@ 2020-06-03 13:37 Thorsten Rehm
  2020-06-04  1:30 ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-03 13:37 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I updated my system (Debian testing) [1] several months ago (around
December) and noticed a lot of corrupt leaf messages flooding my
kern.log [2]. Furthermore, my system had some trouble, e.g.
applications were terminated after some uptime due to the btrfs
filesystem errors. This was with kernel 5.3.
The last kernel I tried was 5.6.0-1-amd64, and the problem persists.

I downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
release, and with that kernel there aren't any corrupt leaf messages;
the problem is gone. IMHO, it must be something introduced in kernel
5.3 (or 5.x).

My hard disk is an SSD, which holds the root partition. The
filesystem is encrypted with LUKS, and right after I enter my
password at boot, the first corrupt leaf errors appear.

An error message looks like this:
May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
size, have 239 expect 439

"root=1", "slot=32", "have 239 expect 439" is always the same at every
error line. Only the block number changes.
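
This can be double-checked with something like the following
(assuming the messages end up in /var/log/kern.log):

$ grep 'corrupt leaf' /var/log/kern.log | grep -oE 'have [0-9]+ expect [0-9]+' | sort -u
have 239 expect 439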

Interestingly, it's the very same error as reported to the ML here [3].
I contacted the reporter, but he didn't have a solution for me,
because he had switched to a different filesystem.

I've already tried "btrfs scrub" and "btrfs check --readonly /" in
rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
values of the SSD, which are fine. Furthermore I've tested my RAM, but
again, w/o any errors.

So, I have no more ideas about what I can do. Could you please help me
investigate this further? Could it be a bug?

Thank you very much.

Best regards,
Thorsten



1:
$ cat /etc/debian_version
bullseye/sid

$ uname -a
[no problem with this kernel]
Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v5.6

$ sudo btrfs fi show
Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
        Total devices 1 FS bytes used 7.33GiB
        devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt

$ btrfs fi df /
Data, single: total=22.01GiB, used=7.16GiB
System, DUP: total=32.00MiB, used=4.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=2.00GiB, used=168.19MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=25.42MiB, used=0.00B


2:
[several messages per second]
May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
size, have 239 expect 439
May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
size, have 239 expect 439

3:
https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/


* Re: corrupt leaf; invalid root item size
  2020-06-03 13:37 corrupt leaf; invalid root item size Thorsten Rehm
@ 2020-06-04  1:30 ` Qu Wenruo
  2020-06-04  9:45   ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2020-06-04  1:30 UTC (permalink / raw)
  To: Thorsten Rehm, linux-btrfs





On 2020/6/3 at 9:37 PM, Thorsten Rehm wrote:
> Hi,
> 
> I updated my system (Debian testing) [1] several months ago (around
> December) and noticed a lot of corrupt leaf messages flooding my
> kern.log [2]. Furthermore, my system had some trouble, e.g.
> applications were terminated after some uptime due to the btrfs
> filesystem errors. This was with kernel 5.3.
> The last kernel I tried was 5.6.0-1-amd64, and the problem persists.
>
> I downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> release, and with that kernel there aren't any corrupt leaf messages;
> the problem is gone. IMHO, it must be something introduced in kernel
> 5.3 (or 5.x).

V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
such *obviously* wrong metadata.
> 
> My hard disk is an SSD, which holds the root partition. The
> filesystem is encrypted with LUKS, and right after I enter my
> password at boot, the first corrupt leaf errors appear.
> 
> An error message looks like this:
> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> size, have 239 expect 439

Btrfs root items have a fixed size. This is already something very bad.

Furthermore, the item size is smaller than expected, which means we can
easily read garbage past the end of the item. I'm a little surprised
that older kernels can even work without crashing.

Some extra info could help us find out how badly the fs is corrupted.
# btrfs ins dump-tree -b 35799040 /dev/dm-0

> 
> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> error line. Only the block number changes.

And dumps for the other block numbers too.
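
To save some manual work, something like this should dump every
distinct block mentioned in the log (illustrative; it assumes the
messages are in /var/log/kern.log and the device is /dev/dm-0 as
above):

# grep -h 'corrupt leaf' /var/log/kern.log |
    grep -oE 'block=[0-9]+' | cut -d= -f2 | sort -un |
    while read -r blk; do
      btrfs ins dump-tree -b "$blk" /dev/dm-0 > dump-"$blk".txt
    done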

> 
> Interestingly, it's the very same error as reported to the ML here [3].
> I contacted the reporter, but he didn't have a solution for me,
> because he had switched to a different filesystem.
> 
> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> again, w/o any errors.

This doesn't look like a bit flip, so it's not a RAM problem.

I don't have any better advice until we get the dumps, but I'd
recommend backing up your data while it's still possible.
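
For example (illustrative target path, adjust to wherever your backup
disk is mounted), copying the running root filesystem without crossing
into other mounts:

# rsync -aHAXx / /mnt/backup/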

Thanks,
Qu

> 
> So, I have no more ideas about what I can do. Could you please help me
> investigate this further? Could it be a bug?
> 
> Thank you very much.
> 
> Best regards,
> Thorsten
> 
> 
> 
> 1:
> $ cat /etc/debian_version
> bullseye/sid
> 
> $ uname -a
> [no problem with this kernel]
> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> 
> $ btrfs --version
> btrfs-progs v5.6
> 
> $ sudo btrfs fi show
> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>         Total devices 1 FS bytes used 7.33GiB
>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> 
> $ btrfs fi df /
> Data, single: total=22.01GiB, used=7.16GiB
> System, DUP: total=32.00MiB, used=4.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=2.00GiB, used=168.19MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=25.42MiB, used=0.00B
> 
> 
> 2:
> [several messages per second]
> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> size, have 239 expect 439
> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> size, have 239 expect 439
> 
> 3:
> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> 




* Re: corrupt leaf; invalid root item size
  2020-06-04  1:30 ` Qu Wenruo
@ 2020-06-04  9:45   ` Thorsten Rehm
  2020-06-04 10:00     ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-04  9:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Thank you for your answer.
I've just updated my system, rebooted, and it's now running
5.6.0-2-amd64.
This is what my kern.log looks like right after startup:

--- snip ---
$ grep 'corrupt leaf' /var/log/kern.log
Jun  4 11:17:31 foo kernel: [   17.318906] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=55352561664 slot=32, invalid root
item size, have 239 expect 439
Jun  4 11:17:31 foo kernel: [   18.481280] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=29552640 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:17:31 foo kernel: [   19.384536] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=29978624 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:17:52 foo kernel: [   53.325803] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=33017856 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:17:59 foo kernel: [   60.297490] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=33316864 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:00 foo kernel: [   61.036287] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=34476032 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:02 foo kernel: [   63.935084] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=34799616 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:03 foo kernel: [   64.655925] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36147200 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:08 foo kernel: [   69.039268] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:09 foo kernel: [   70.117411] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=38862848 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:15 foo kernel: [   76.437708] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=39235584 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:17 foo kernel: [   78.742254] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=40624128 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:22 foo kernel: [   83.297564] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=40849408 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:23 foo kernel: [   84.091532] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=41259008 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:24 foo kernel: [   85.020701] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=41410560 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:25 foo kernel: [   86.131558] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=41639936 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:26 foo kernel: [   87.072399] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=41832448 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:27 foo kernel: [   88.541477] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=41975808 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:28 foo kernel: [   89.115634] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=42217472 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:29 foo kernel: [   90.103851] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=42438656 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:29 foo kernel: [   90.809648] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=42627072 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:30 foo kernel: [   91.440182] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=42909696 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:31 foo kernel: [   92.340470] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=43171840 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:31 foo kernel: [   92.870607] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=43511808 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:33 foo kernel: [   94.219649] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=43868160 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:33 foo kernel: [   94.969616] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=44179456 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:35 foo kernel: [   96.562527] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=44670976 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:36 foo kernel: [   97.129857] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=44900352 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:36 foo kernel: [   97.748836] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=44998656 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:37 foo kernel: [   98.391906] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=45289472 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:38 foo kernel: [   99.089307] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=45383680 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:38 foo kernel: [   99.461716] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=45555712 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:39 foo kernel: [  100.158759] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=45752320 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:39 foo kernel: [  100.740379] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=46080000 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:40 foo kernel: [  101.369630] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=46178304 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:40 foo kernel: [  101.800933] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=46428160 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:41 foo kernel: [  102.498185] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=49192960 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:44 foo kernel: [  105.790049] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=49565696 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:46 foo kernel: [  107.411126] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=49868800 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:46 foo kernel: [  107.801978] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=49987584 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:47 foo kernel: [  108.270144] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=51200000 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:54 foo kernel: [  115.373156] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=51433472 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:56 foo kernel: [  117.062892] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=51310592 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:56 foo kernel: [  117.535135] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=51961856 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:57 foo kernel: [  118.001052] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=51216384 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:57 foo kernel: [  118.683809] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=52215808 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:59 foo kernel: [  120.062056] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=52436992 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:18:59 foo kernel: [  120.561448] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=52490240 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:00 foo kernel: [  121.304476] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=52662272 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:00 foo kernel: [  121.950378] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=54153216 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:03 foo kernel: [  124.896498] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=54390784 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:04 foo kernel: [  125.583191] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=54599680 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:07 foo kernel: [  128.121654] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=55197696 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:07 foo kernel: [  128.598669] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=56119296 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:12 foo kernel: [  133.197514] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=56369152 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:13 foo kernel: [  134.963297] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=56881152 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:14 foo kernel: [  135.356174] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57028608 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:14 foo kernel: [  135.820369] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57270272 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:16 foo kernel: [  137.022879] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57438208 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:16 foo kernel: [  137.516500] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57614336 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:17 foo kernel: [  138.105293] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57716736 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:17 foo kernel: [  138.523561] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=57962496 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:18 foo kernel: [  139.495373] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=58118144 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:18 foo kernel: [  139.997187] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=58273792 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:19 foo kernel: [  140.273888] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=58449920 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:19 foo kernel: [  140.714191] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=58843136 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:20 foo kernel: [  141.905748] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59174912 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:21 foo kernel: [  142.654312] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59387904 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:22 foo kernel: [  143.054925] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59469824 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:22 foo kernel: [  143.475570] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59674624 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:23 foo kernel: [  144.235453] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59731968 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:23 foo kernel: [  144.754629] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59830272 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:24 foo kernel: [  145.159837] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=59924480 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:24 foo kernel: [  145.726221] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=60141568 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:25 foo kernel: [  146.585324] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=60342272 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:26 foo kernel: [  147.087844] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=60502016 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:26 foo kernel: [  147.484708] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=60678144 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:26 foo kernel: [  147.797383] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=60952576 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:27 foo kernel: [  148.766842] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=61206528 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:28 foo kernel: [  149.214399] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=61345792 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:28 foo kernel: [  149.524317] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=61493248 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:28 foo kernel: [  149.900128] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=61706240 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:30 foo kernel: [  151.036028] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=62074880 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:30 foo kernel: [  151.962081] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=62574592 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:32 foo kernel: [  153.292089] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=62902272 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:33 foo kernel: [  154.005536] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63176704 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:33 foo kernel: [  154.385265] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63303680 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:33 foo kernel: [  154.663472] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63455232 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:33 foo kernel: [  154.964766] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63557632 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:34 foo kernel: [  155.263943] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63688704 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:34 foo kernel: [  155.523667] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63827968 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:34 foo kernel: [  155.863992] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=63963136 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:35 foo kernel: [  156.121666] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=64106496 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:35 foo kernel: [  156.434339] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=64598016 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:37 foo kernel: [  158.309759] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=64815104 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:37 foo kernel: [  158.956640] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=65032192 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:38 foo kernel: [  159.275606] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=65220608 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:38 foo kernel: [  159.655287] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=65429504 slot=32, invalid root item
size, have 239 expect 439
Jun  4 11:19:39 foo kernel: [  160.017161] BTRFS critical (device
dm-0): corrupt leaf: root=1 block=65576960 slot=32, invalid root item
size, have 239 expect 439
[...]
--- snap ---

There are too many blocks. I just picked three randomly:

=== Block 33017856 ===
$ btrfs ins dump-tree -b 33017856 /dev/dm-0
btrfs-progs v5.6
leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
leaf 33017856 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
        item 0 key (4000670 EXTENT_DATA 1572864) itemoff 3942 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1568768 nr 4096 ram 4194304
                extent compression 0 (none)
        item 1 key (4000670 EXTENT_DATA 1576960) itemoff 3889 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1121533952 nr 4096
                extent data offset 0 nr 20480 ram 20480
                extent compression 2 (lzo)
        item 2 key (4000670 EXTENT_DATA 1597440) itemoff 3836 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1593344 nr 4096 ram 4194304
                extent compression 0 (none)
        item 3 key (4000670 EXTENT_DATA 1601536) itemoff 3783 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1122422784 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
        item 4 key (4000670 EXTENT_DATA 1609728) itemoff 3730 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1605632 nr 4096 ram 4194304
                extent compression 0 (none)
        item 5 key (4000670 EXTENT_DATA 1613824) itemoff 3677 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1122488320 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 6 key (4000670 EXTENT_DATA 1617920) itemoff 3624 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1613824 nr 4096 ram 4194304
                extent compression 0 (none)
        item 7 key (4000670 EXTENT_DATA 1622016) itemoff 3571 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1122660352 nr 4096
                extent data offset 0 nr 49152 ram 49152
                extent compression 2 (lzo)
        item 8 key (4000670 EXTENT_DATA 1671168) itemoff 3518 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1667072 nr 4096 ram 4194304
                extent compression 0 (none)
        item 9 key (4000670 EXTENT_DATA 1675264) itemoff 3465 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1122840576 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 10 key (4000670 EXTENT_DATA 1679360) itemoff 3412 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1675264 nr 4096 ram 4194304
                extent compression 0 (none)
        item 11 key (4000670 EXTENT_DATA 1683456) itemoff 3359 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1122869248 nr 4096
                extent data offset 0 nr 28672 ram 28672
                extent compression 2 (lzo)
        item 12 key (4000670 EXTENT_DATA 1712128) itemoff 3306 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1708032 nr 8192 ram 4194304
                extent compression 0 (none)
        item 13 key (4000670 EXTENT_DATA 1720320) itemoff 3253 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1123074048 nr 4096
                extent data offset 0 nr 12288 ram 12288
                extent compression 2 (lzo)
        item 14 key (4000670 EXTENT_DATA 1732608) itemoff 3200 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1728512 nr 8192 ram 4194304
                extent compression 0 (none)
        item 15 key (4000670 EXTENT_DATA 1740800) itemoff 3147 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1123078144 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 16 key (4000670 EXTENT_DATA 1744896) itemoff 3094 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1740800 nr 4096 ram 4194304
                extent compression 0 (none)
        item 17 key (4000670 EXTENT_DATA 1748992) itemoff 3041 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1123119104 nr 4096
                extent data offset 0 nr 20480 ram 20480
                extent compression 2 (lzo)
        item 18 key (4000670 EXTENT_DATA 1769472) itemoff 2988 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1765376 nr 4096 ram 4194304
                extent compression 0 (none)
        item 19 key (4000670 EXTENT_DATA 1773568) itemoff 2935 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1123926016 nr 4096
                extent data offset 0 nr 16384 ram 16384
                extent compression 2 (lzo)
        item 20 key (4000670 EXTENT_DATA 1789952) itemoff 2882 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1785856 nr 4096 ram 4194304
                extent compression 0 (none)
        item 21 key (4000670 EXTENT_DATA 1794048) itemoff 2829 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1124212736 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 22 key (4000670 EXTENT_DATA 1798144) itemoff 2776 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1794048 nr 8192 ram 4194304
                extent compression 0 (none)
        item 23 key (4000670 EXTENT_DATA 1806336) itemoff 2723 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1124581376 nr 4096
                extent data offset 0 nr 53248 ram 53248
                extent compression 2 (lzo)
        item 24 key (4000670 EXTENT_DATA 1859584) itemoff 2670 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1855488 nr 4096 ram 4194304
                extent compression 0 (none)
        item 25 key (4000670 EXTENT_DATA 1863680) itemoff 2617 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1125728256 nr 4096
                extent data offset 0 nr 12288 ram 12288
                extent compression 2 (lzo)
        item 26 key (4000670 EXTENT_DATA 1875968) itemoff 2564 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1871872 nr 4096 ram 4194304
                extent compression 0 (none)
        item 27 key (4000670 EXTENT_DATA 1880064) itemoff 2511 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1125928960 nr 4096
                extent data offset 0 nr 20480 ram 20480
                extent compression 2 (lzo)
        item 28 key (4000670 EXTENT_DATA 1900544) itemoff 2458 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1896448 nr 4096 ram 4194304
                extent compression 0 (none)
        item 29 key (4000670 EXTENT_DATA 1904640) itemoff 2405 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1126227968 nr 4096
                extent data offset 0 nr 20480 ram 20480
                extent compression 2 (lzo)
        item 30 key (4000670 EXTENT_DATA 1925120) itemoff 2352 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1921024 nr 8192 ram 4194304
                extent compression 0 (none)
        item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1126502400 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
        item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1937408 nr 4096 ram 4194304
                extent compression 0 (none)
        item 33 key (4000670 EXTENT_DATA 1945600) itemoff 2193 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1129639936 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 34 key (4000670 EXTENT_DATA 1949696) itemoff 2140 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1945600 nr 4096 ram 4194304
                extent compression 0 (none)
        item 35 key (4000670 EXTENT_DATA 1953792) itemoff 2087 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1130332160 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
        item 36 key (4000670 EXTENT_DATA 1961984) itemoff 2034 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1957888 nr 4096 ram 4194304
                extent compression 0 (none)
        item 37 key (4000670 EXTENT_DATA 1966080) itemoff 1981 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1140027392 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 38 key (4000670 EXTENT_DATA 1970176) itemoff 1928 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1966080 nr 4096 ram 4194304
                extent compression 0 (none)
        item 39 key (4000670 EXTENT_DATA 1974272) itemoff 1875 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1143840768 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 40 key (4000670 EXTENT_DATA 1978368) itemoff 1822 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1974272 nr 4096 ram 4194304
                extent compression 0 (none)
        item 41 key (4000670 EXTENT_DATA 1982464) itemoff 1769 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1143992320 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
        item 42 key (4000670 EXTENT_DATA 1990656) itemoff 1716 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 1986560 nr 4096 ram 4194304
                extent compression 0 (none)
        item 43 key (4000670 EXTENT_DATA 1994752) itemoff 1663 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1144045568 nr 4096
                extent data offset 0 nr 20480 ram 20480
                extent compression 2 (lzo)
        item 44 key (4000670 EXTENT_DATA 2015232) itemoff 1610 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 2011136 nr 4096 ram 4194304
                extent compression 0 (none)
        item 45 key (4000670 EXTENT_DATA 2019328) itemoff 1557 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1144172544 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 46 key (4000670 EXTENT_DATA 2023424) itemoff 1504 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 2019328 nr 4096 ram 4194304
                extent compression 0 (none)
        item 47 key (4000670 EXTENT_DATA 2027520) itemoff 1451 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1144614912 nr 4096
                extent data offset 0 nr 32768 ram 32768
                extent compression 2 (lzo)
        item 48 key (4000670 EXTENT_DATA 2060288) itemoff 1398 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 2056192 nr 12288 ram 4194304
                extent compression 0 (none)
        item 49 key (4000670 EXTENT_DATA 2072576) itemoff 1345 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 1144692736 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
        item 50 key (4000670 EXTENT_DATA 2080768) itemoff 1292 itemsize 53
                generation 24749502 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 2076672 nr 4096 ram 4194304
                extent compression 0 (none)


=== Block 44900352 ===
$ btrfs ins dump-tree -b 44900352 /dev/dm-0
btrfs-progs v5.6
leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
leaf 44900352 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
        item 0 key (4000688 INODE_ITEM 0) itemoff 3835 itemsize 160
                generation 24749518 transid 24749521 size 100 nbytes 100
                block group 0 mode 100666 links 1 uid 1000 gid 1000 rdev 0
                sequence 2 flags 0x0(none)
                atime 1591262309.809076167 (2020-06-04 11:18:29)
                ctime 1591262311.864927044 (2020-06-04 11:18:31)
                mtime 1591262309.809076167 (2020-06-04 11:18:29)
                otime 1591262309.809076167 (2020-06-04 11:18:29)
        item 1 key (4000688 INODE_REF 134426) itemoff 3813 itemsize 22
                index 6104 namelen 12 name: profile_stor
        item 2 key (4000688 EXTENT_DATA 0) itemoff 3749 itemsize 64
                generation 24749518 type 0 (inline)
                inline extent data size 43 ram_bytes 100 compression 2 (lzo)
        item 3 key (4000691 INODE_ITEM 0) itemoff 3589 itemsize 160
                generation 24749521 transid 24749525 size 2428 nbytes 2428
                block group 0 mode 100666 links 1 uid 1000 gid 1000 rdev 0
                sequence 2 flags 0x0(none)
                atime 1591262311.868926754 (2020-06-04 11:18:31)
                ctime 1591262316.124617986 (2020-06-04 11:18:36)
                mtime 1591262311.872926463 (2020-06-04 11:18:31)
                otime 1591262311.868926754 (2020-06-04 11:18:31)
        item 4 key (4000691 INODE_REF 134426) itemoff 3562 itemsize 27
                index 6107 namelen 17 name: notification_buck
        item 5 key (4000691 EXTENT_DATA 0) itemoff 2294 itemsize 1268
                generation 24749521 type 0 (inline)
                inline extent data size 1247 ram_bytes 2428 compression 2 (lzo)
        item 6 key (4000692 INODE_ITEM 0) itemoff 2134 itemsize 160
                generation 24749521 transid 24749525 size 100 nbytes 100
                block group 0 mode 100666 links 1 uid 1000 gid 1000 rdev 0
                sequence 2 flags 0x0(none)
                atime 1591262311.868926754 (2020-06-04 11:18:31)
                ctime 1591262316.124617986 (2020-06-04 11:18:36)
                mtime 1591262311.868926754 (2020-06-04 11:18:31)
                otime 1591262311.868926754 (2020-06-04 11:18:31)
        item 7 key (4000692 INODE_REF 134426) itemoff 2107 itemsize 27
                index 6108 namelen 17 name: notification_stor
        item 8 key (4000692 EXTENT_DATA 0) itemoff 2043 itemsize 64
                generation 24749522 type 0 (inline)
                inline extent data size 43 ram_bytes 100 compression 2 (lzo)
        item 9 key (4000695 INODE_ITEM 0) itemoff 1883 itemsize 160
                generation 24749525 transid 24749525 size 1241 nbytes 1241
                block group 0 mode 100666 links 1 uid 1000 gid 1000 rdev 0
                sequence 0 flags 0x0(none)
                atime 1591262316.124617986 (2020-06-04 11:18:36)
                ctime 1591262316.128617696 (2020-06-04 11:18:36)
                mtime 1591262316.128617696 (2020-06-04 11:18:36)
                otime 1591262316.124617986 (2020-06-04 11:18:36)
        item 10 key (4000695 INODE_REF 134426) itemoff 1846 itemsize 37
                index 6109 namelen 27 name: media_children_compact_buck
        item 11 key (4000695 EXTENT_DATA 0) itemoff 1571 itemsize 275
                generation 24749525 type 0 (inline)
                inline extent data size 254 ram_bytes 1241 compression 2 (lzo)
        item 12 key (4000696 INODE_ITEM 0) itemoff 1411 itemsize 160
                generation 24749525 transid 24749527 size 100 nbytes 100
                block group 0 mode 100666 links 1 uid 1000 gid 1000 rdev 0
                sequence 0 flags 0x0(none)
                atime 1591262316.128617696 (2020-06-04 11:18:36)
                ctime 1591262316.128617696 (2020-06-04 11:18:36)
                mtime 1591262316.128617696 (2020-06-04 11:18:36)
                otime 1591262316.128617696 (2020-06-04 11:18:36)
        item 13 key (4000696 INODE_REF 134426) itemoff 1374 itemsize 37
                index 6110 namelen 27 name: media_children_compact_stor
        item 14 key (4000696 EXTENT_DATA 0) itemoff 1310 itemsize 64
                generation 24749527 type 0 (inline)
                inline extent data size 43 ram_bytes 100 compression 2 (lzo)
        item 15 key (4000697 INODE_ITEM 0) itemoff 1150 itemsize 160
                generation 24749526 transid 24749526 size 8720 nbytes 12288
                block group 0 mode 100644 links 1 uid 1002 gid 1002 rdev 0
                sequence 0 flags 0x0(none)
                atime 1591262316.772570966 (2020-06-04 11:18:36)
                ctime 1591262316.772570966 (2020-06-04 11:18:36)
                mtime 1591262316.772570966 (2020-06-04 11:18:36)
                otime 1591262316.772570966 (2020-06-04 11:18:36)
        item 16 key (4000697 INODE_REF 3137452) itemoff 1119 itemsize 31
                index 44 namelen 21 name: Textures13.db-journal
        item 17 key (4000697 EXTENT_DATA 0) itemoff 1066 itemsize 53
                generation 24749526 type 1 (regular)
                extent data disk byte 14946304 nr 4096
                extent data offset 0 nr 12288 ram 12288
                extent compression 2 (lzo)
        item 18 key (ORPHAN ORPHAN_ITEM 4000554) itemoff 1066 itemsize 0
                orphan item


=== Block 55352561664 ===
$ btrfs ins dump-tree -b 55352561664 /dev/dm-0
btrfs-progs v5.6
leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
        item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
                generation 24703953 transid 24703953 size 262144
nbytes 8595701760
                block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                sequence 32790 flags
0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 01:00:00)
                ctime 1589235096.486856306 (2020-05-12 00:11:36)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 1 key (289 EXTENT_DATA 0) itemoff 3782 itemsize 53
                generation 24703953 type 1 (regular)
                extent data disk byte 3544403968 nr 262144
                extent data offset 0 nr 262144 ram 262144
                extent compression 0 (none)
        item 2 key (290 INODE_ITEM 0) itemoff 3622 itemsize 160
                generation 20083477 transid 20083477 size 262144
nbytes 6823346176
                block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                sequence 26029 flags
0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 01:00:00)
                ctime 1587576718.255911112 (2020-04-22 19:31:58)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 3 key (290 EXTENT_DATA 0) itemoff 3569 itemsize 53
                generation 20083477 type 1 (regular)
                extent data disk byte 3373088768 nr 262144
                extent data offset 0 nr 262144 ram 262144
                extent compression 0 (none)
        item 4 key (291 INODE_ITEM 0) itemoff 3409 itemsize 160
                generation 24712508 transid 24712508 size 262144
nbytes 5454692352
                block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                sequence 20808 flags
0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 01:00:00)
                ctime 1589569287.32299836 (2020-05-15 21:01:27)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 5 key (291 EXTENT_DATA 0) itemoff 3356 itemsize 53
                generation 24712508 type 1 (regular)
                extent data disk byte 5286600704 nr 262144
                extent data offset 0 nr 262144 ram 262144
                extent compression 0 (none)
        item 6 key (292 INODE_ITEM 0) itemoff 3196 itemsize 160
                generation 24749497 transid 24749497 size 262144
nbytes 3022001274880
                block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                sequence 11528020 flags
0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
                atime 0.0 (1970-01-01 01:00:00)
                ctime 1591262206.30950961 (2020-06-04 11:16:46)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 7 key (292 EXTENT_DATA 0) itemoff 3143 itemsize 53
                generation 24749497 type 1 (regular)
                extent data disk byte 2998218752 nr 262144
                extent data offset 0 nr 262144 ram 262144
                extent compression 0 (none)
        item 8 key (FREE_SPACE UNTYPED 29360128) itemoff 3102 itemsize 41
                location key (256 INODE_ITEM 0)
                cache generation 24749495 entries 18 bitmaps 8
        item 9 key (FREE_SPACE UNTYPED 1103101952) itemoff 3061 itemsize 41
                location key (257 INODE_ITEM 0)
                cache generation 24749496 entries 34 bitmaps 8
        item 10 key (FREE_SPACE UNTYPED 2176843776) itemoff 3020 itemsize 41
                location key (258 INODE_ITEM 0)
                cache generation 24749497 entries 11 bitmaps 8
        item 11 key (FREE_SPACE UNTYPED 3250585600) itemoff 2979 itemsize 41
                location key (259 INODE_ITEM 0)
                cache generation 24749497 entries 8 bitmaps 8
        item 12 key (FREE_SPACE UNTYPED 4324327424) itemoff 2938 itemsize 41
                location key (261 INODE_ITEM 0)
                cache generation 24749495 entries 141 bitmaps 8
        item 13 key (FREE_SPACE UNTYPED 5398069248) itemoff 2897 itemsize 41
                location key (260 INODE_ITEM 0)
                cache generation 24749493 entries 23 bitmaps 8
        item 14 key (FREE_SPACE UNTYPED 6471811072) itemoff 2856 itemsize 41
                location key (262 INODE_ITEM 0)
                cache generation 24749493 entries 70 bitmaps 8
        item 15 key (FREE_SPACE UNTYPED 7545552896) itemoff 2815 itemsize 41
                location key (263 INODE_ITEM 0)
                cache generation 24749493 entries 22 bitmaps 8
        item 16 key (FREE_SPACE UNTYPED 8619294720) itemoff 2774 itemsize 41
                location key (264 INODE_ITEM 0)
                cache generation 24729885 entries 35 bitmaps 8
        item 17 key (FREE_SPACE UNTYPED 9693036544) itemoff 2733 itemsize 41
                location key (265 INODE_ITEM 0)
                cache generation 22144003 entries 30 bitmaps 8
        item 18 key (FREE_SPACE UNTYPED 10766778368) itemoff 2692 itemsize 41
                location key (266 INODE_ITEM 0)
                cache generation 24749177 entries 148 bitmaps 4
        item 19 key (FREE_SPACE UNTYPED 11840520192) itemoff 2651 itemsize 41
                location key (267 INODE_ITEM 0)
                cache generation 24749152 entries 33 bitmaps 8
        item 20 key (FREE_SPACE UNTYPED 12914262016) itemoff 2610 itemsize 41
                location key (268 INODE_ITEM 0)
                cache generation 24706177 entries 11 bitmaps 8
        item 21 key (FREE_SPACE UNTYPED 13988003840) itemoff 2569 itemsize 41
                location key (269 INODE_ITEM 0)
                cache generation 21296150 entries 46 bitmaps 8
        item 22 key (FREE_SPACE UNTYPED 15061745664) itemoff 2528 itemsize 41
                location key (270 INODE_ITEM 0)
                cache generation 24729843 entries 58 bitmaps 8
        item 23 key (FREE_SPACE UNTYPED 16135487488) itemoff 2487 itemsize 41
                location key (271 INODE_ITEM 0)
                cache generation 20064465 entries 36 bitmaps 8
        item 24 key (FREE_SPACE UNTYPED 17209229312) itemoff 2446 itemsize 41
                location key (272 INODE_ITEM 0)
                cache generation 20079294 entries 86 bitmaps 0
        item 25 key (FREE_SPACE UNTYPED 18282971136) itemoff 2405 itemsize 41
                location key (273 INODE_ITEM 0)
                cache generation 20081218 entries 38 bitmaps 8
        item 26 key (FREE_SPACE UNTYPED 19356712960) itemoff 2364 itemsize 41
                location key (274 INODE_ITEM 0)
                cache generation 20088898 entries 22 bitmaps 4
        item 27 key (FREE_SPACE UNTYPED 20430454784) itemoff 2323 itemsize 41
                location key (275 INODE_ITEM 0)
                cache generation 20055389 entries 91 bitmaps 7
        item 28 key (FREE_SPACE UNTYPED 35462840320) itemoff 2282 itemsize 41
                location key (289 INODE_ITEM 0)
                cache generation 24703953 entries 10 bitmaps 8
        item 29 key (FREE_SPACE UNTYPED 44052774912) itemoff 2241 itemsize 41
                location key (290 INODE_ITEM 0)
                cache generation 20083477 entries 36 bitmaps 8
        item 30 key (FREE_SPACE UNTYPED 52642709504) itemoff 2200 itemsize 41
                location key (291 INODE_ITEM 0)
                cache generation 24712508 entries 9 bitmaps 8
        item 31 key (FREE_SPACE UNTYPED 54857302016) itemoff 2159 itemsize 41
                location key (292 INODE_ITEM 0)
                cache generation 24749497 entries 24 bitmaps 8
        item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
                generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
                lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
                drop key (0 UNKNOWN.0 0) level 0
--- snap ---



On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/6/3 at 9:37 PM, Thorsten Rehm wrote:
> > Hi,
> >
> > I updated my system (Debian testing) [1] several months ago (around
> > December) and noticed a lot of corrupt leaf messages flooding my
> > kern.log [2]. Furthermore, my system had some trouble, e.g.
> > applications were terminated after some uptime due to the btrfs
> > filesystem errors. This was with kernel 5.3.
> > The last kernel I tried was 5.6.0-1-amd64, and the problem persists.
> >
> > I downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> > release, and with that kernel there aren't any corrupt leaf messages;
> > the problem is gone. IMHO, it must be something introduced in kernel
> > 5.3 (or 5.x).
>
> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> such *obviously* wrong metadata.
> >
> > My hard disk is an SSD, which holds the root partition. The
> > filesystem is encrypted with LUKS, and right after I enter my
> > password at boot, the first corrupt leaf errors appear.
> >
> > An error message looks like this:
> > May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > size, have 239 expect 439
>
> Btrfs root items have a fixed size. This is already something very bad.
>
> Furthermore, the item size is smaller than expected, which means we can
> easily read garbage past the end of the item. I'm a little surprised
> that older kernels can even work without crashing.
>
> Some extra info could help us find out how badly the fs is corrupted.
> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>
> >
> > "root=1", "slot=32", "have 239 expect 439" is always the same at every
> > error line. Only the block number changes.
>
> And dumps for the other block numbers too.
>
> >
> > Interestingly it's the very same as reported to the ML here [3]. I've
> > contacted the reporter, but he didn't have a solution for me, because
> > he changed to a different filesystem.
> >
> > I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> > rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> > values of the SSD, which are fine. Furthermore I've tested my RAM, but
> > again, w/o any errors.
>
> This doesn't look like a bit flip, so not RAM problems.
>
> Don't have any better advice until we got the dumps, but I'd recommend
> to backup your data since it's still possible.
>
> Thanks,
> Qu
>
> >
> > So, I have no more ideas what I can do. Could you please help me to
> > investigate this further? Could it be a bug?
> >
> > Thank you very much.
> >
> > Best regards,
> > Thorsten
> >
> >
> >
> > 1:
> > $ cat /etc/debian_version
> > bullseye/sid
> >
> > $ uname -a
> > [no problem with this kernel]
> > Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> >
> > $ btrfs --version
> > btrfs-progs v5.6
> >
> > $ sudo btrfs fi show
> > Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >         Total devices 1 FS bytes used 7.33GiB
> >         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> >
> > $ btrfs fi df /
> > Data, single: total=22.01GiB, used=7.16GiB
> > System, DUP: total=32.00MiB, used=4.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, DUP: total=2.00GiB, used=168.19MiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=25.42MiB, used=0.00B
> >
> >
> > 2:
> > [several messages per second]
> > May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> > size, have 239 expect 439
> > May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> > dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> > size, have 239 expect 439
> >
> > 3:
> > https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-04  9:45   ` Thorsten Rehm
@ 2020-06-04 10:00     ` Qu Wenruo
  2020-06-04 10:52       ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2020-06-04 10:00 UTC (permalink / raw)
  To: Thorsten Rehm; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 8516 bytes --]



On 2020/6/4 5:45 PM, Thorsten Rehm wrote:
> Thank you for you answer.
> I've just updated my system, did a reboot and it's running with a
> 5.6.0-2-amd64 now.
> So, this is how my kern.log looks like, just right after the start:
> 

> 
> There are too many blocks. I just picked three randomly:

Looks like we need more results, especially since some of the results don't match at all.

> 
> === Block 33017856 ===
> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> btrfs-progs v5.6
> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 1126502400 nr 4096
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 2 (lzo)
>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 0 nr 0
>                 extent data offset 1937408 nr 4096 ram 4194304
>                 extent compression 0 (none)
Not a root item at all.
At least for this copy, it looks like the kernel got one completely bad
copy, then discarded it and found a good one.

That's very strange, especially since the other involved blocks seem
random, yet all of them being at slot 32 is no coincidence.


> === Block 44900352  ===
> btrfs ins dump-tree -b 44900352 /dev/dm-0
> btrfs-progs v5.6
> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> leaf 44900352 flags 0x1(WRITTEN) backref revision 1

This block doesn't even have a slot 32... It only has 19 items, thus
slots 0 ~ 18.
And its owner, FS_TREE, shouldn't contain a ROOT_ITEM at all.

> 
> 
> === Block 55352561664 ===
> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> btrfs-progs v5.6
> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>                 drop key (0 UNKNOWN.0 0) level 0

This looks like the offending tree block.
Slot 32, item size 239, which is a ROOT_ITEM, but with an invalid size.

Since you're here, I guess a btrfs check without --repair on the
unmounted fs would help to identify the real damage.

And again, the fs looks badly damaged; it's highly recommended to back
up your data ASAP.
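(Just to illustrate what I mean -- the exact device names depend on your
LUKS setup, so treat this as a sketch:

# cryptsetup open /dev/sda5 foo
# btrfs check --readonly /dev/mapper/foo

Without --repair, check only reads the filesystem, so it won't change
anything on disk.)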

Thanks,
Qu

> --- snap ---
> 
> 
> 
> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
>>> Hi,
>>>
>>> I've updated my system (Debian testing) [1] several months ago (~
>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>> applications were terminated after some uptime, due to the btrfs
>>> filesystem errors. This was with kernel 5.3.
>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>
>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>> release and with this kernel there aren't any corrupt leaf messages
>>> and the problem is gone. IMHO, it must be something coming with kernel
>>> 5.3 (or 5.x).
>>
>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>> such *obviously* wrong metadata.
>>>
>>> My harddisk is a SSD which is responsible for the root partition. I've
>>> encrypted my filesystem with LUKS and just right after I entered my
>>> password at the boot, the first corrupt leaf errors appear.
>>>
>>> An error message looks like this:
>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>> size, have 239 expect 439
>>
>> Btrfs root items have fixed size. This is already something very bad.
>>
>> Furthermore, the item size is smaller than expected, which means we can
>> easily get garbage. I'm a little surprised that older kernel can even
>> work without crashing the whole kernel.
>>
>> Some extra info could help us to find out how badly the fs is corrupted.
>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>
>>>
>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>> error line. Only the block number changes.
>>
>> And dumps for the other block numbers too.
>>
>>>
>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>> contacted the reporter, but he didn't have a solution for me, because
>>> he changed to a different filesystem.
>>>
>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>> again, w/o any errors.
>>
>> This doesn't look like a bit flip, so not RAM problems.
>>
>> Don't have any better advice until we got the dumps, but I'd recommend
>> to backup your data since it's still possible.
>>
>> Thanks,
>> Qu
>>
>>>
>>> So, I have no more ideas what I can do. Could you please help me to
>>> investigate this further? Could it be a bug?
>>>
>>> Thank you very much.
>>>
>>> Best regards,
>>> Thorsten
>>>
>>>
>>>
>>> 1:
>>> $ cat /etc/debian_version
>>> bullseye/sid
>>>
>>> $ uname -a
>>> [no problem with this kernel]
>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>
>>> $ btrfs --version
>>> btrfs-progs v5.6
>>>
>>> $ sudo btrfs fi show
>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>         Total devices 1 FS bytes used 7.33GiB
>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>
>>> $ btrfs fi df /
>>> Data, single: total=22.01GiB, used=7.16GiB
>>> System, DUP: total=32.00MiB, used=4.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>
>>>
>>> 2:
>>> [several messages per second]
>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>> size, have 239 expect 439
>>>
>>> 3:
>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>
>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-04 10:00     ` Qu Wenruo
@ 2020-06-04 10:52       ` Thorsten Rehm
  2020-06-04 12:06         ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-04 10:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

The disk in question holds my root (/) partition. If the filesystem is
really that badly damaged, I'll have to reinstall my system. We will see
if it comes to that. Maybe we'll find something interesting along the way...
I've downloaded the latest grml daily image and booted my system from
a USB stick. Here we go:

root@grml ~ # uname -r
5.6.0-2-amd64

root@grml ~ # cryptsetup open /dev/sda5 foo

                                                                  :(
Enter passphrase for /dev/sda5:

root@grml ~ # file -L -s /dev/mapper/foo
/dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
nodesize 4096, leafsize 4096,
UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
bytes used, 1 devices

root@grml ~ # btrfs check /dev/mapper/foo
Opening filesystem to check...
Checking filesystem on /dev/mapper/foo
UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 7815716864 bytes used, no error found
total csum bytes: 6428260
total tree bytes: 175968256
total fs tree bytes: 149475328
total extent tree bytes: 16052224
btree space waste bytes: 43268911
file data blocks allocated: 10453221376
 referenced 8746053632

root@grml ~ # lsblk /dev/sda5 --fs
NAME  FSTYPE      FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
└─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685

root@grml ~ # mount /dev/mapper/foo /mnt
root@grml ~ # btrfs scrub start /mnt

root@grml ~ # journalctl -k --no-pager | grep BTRFS
Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
24750795 /dev/dm-0 scanned by systemd-udevd (3233)
Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
caching is enabled
Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
expect 439
Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
optimizations
Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
expect 439
Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
expect 439
Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
on devid 1 with status: 0
Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
expect 439

root@grml ~ # btrfs scrub status /mnt
UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
Scrub started:    Thu Jun  4 10:45:38 2020
Status:           finished
Duration:         0:00:53
Total to scrub:   7.44GiB
Rate:             143.80MiB/s
Error summary:    no errors found


root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
btrfs ins dump-tree -b $block /dev/dm-0; done
btrfs-progs v5.6
leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
leaf 54222848 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
    item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
        generation 24703953 transid 24703953 size 262144 nbytes 8595701760
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 32790 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589235096.486856306 (2020-05-11 22:11:36)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 1 key (289 EXTENT_DATA 0) itemoff 3782 itemsize 53
        generation 24703953 type 1 (regular)
        extent data disk byte 3544403968 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 2 key (290 INODE_ITEM 0) itemoff 3622 itemsize 160
        generation 20083477 transid 20083477 size 262144 nbytes 6823346176
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 26029 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1587576718.255911112 (2020-04-22 17:31:58)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 3 key (290 EXTENT_DATA 0) itemoff 3569 itemsize 53
        generation 20083477 type 1 (regular)
        extent data disk byte 3373088768 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 4 key (291 INODE_ITEM 0) itemoff 3409 itemsize 160
        generation 24712508 transid 24712508 size 262144 nbytes 5454692352
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 20808 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589569287.32299836 (2020-05-15 19:01:27)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 5 key (291 EXTENT_DATA 0) itemoff 3356 itemsize 53
        generation 24712508 type 1 (regular)
        extent data disk byte 5286600704 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 6 key (292 INODE_ITEM 0) itemoff 3196 itemsize 160
        generation 24750791 transid 24750791 size 262144 nbytes 3022026440704
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 11528116 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1591266423.923005453 (2020-06-04 10:27:03)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 7 key (292 EXTENT_DATA 0) itemoff 3143 itemsize 53
        generation 24750791 type 1 (regular)
        extent data disk byte 3249909760 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 8 key (FREE_SPACE UNTYPED 29360128) itemoff 3102 itemsize 41
        location key (256 INODE_ITEM 0)
        cache generation 24750795 entries 25 bitmaps 8
    item 9 key (FREE_SPACE UNTYPED 1103101952) itemoff 3061 itemsize 41
        location key (257 INODE_ITEM 0)
        cache generation 24750794 entries 140 bitmaps 8
    item 10 key (FREE_SPACE UNTYPED 2176843776) itemoff 3020 itemsize 41
        location key (258 INODE_ITEM 0)
        cache generation 24750795 entries 31 bitmaps 8
    item 11 key (FREE_SPACE UNTYPED 3250585600) itemoff 2979 itemsize 41
        location key (259 INODE_ITEM 0)
        cache generation 24750795 entries 39 bitmaps 8
    item 12 key (FREE_SPACE UNTYPED 4324327424) itemoff 2938 itemsize 41
        location key (261 INODE_ITEM 0)
        cache generation 24750702 entries 155 bitmaps 8
    item 13 key (FREE_SPACE UNTYPED 5398069248) itemoff 2897 itemsize 41
        location key (260 INODE_ITEM 0)
        cache generation 24749493 entries 23 bitmaps 8
    item 14 key (FREE_SPACE UNTYPED 6471811072) itemoff 2856 itemsize 41
        location key (262 INODE_ITEM 0)
        cache generation 24749507 entries 72 bitmaps 8
    item 15 key (FREE_SPACE UNTYPED 7545552896) itemoff 2815 itemsize 41
        location key (263 INODE_ITEM 0)
        cache generation 24749493 entries 22 bitmaps 8
    item 16 key (FREE_SPACE UNTYPED 8619294720) itemoff 2774 itemsize 41
        location key (264 INODE_ITEM 0)
        cache generation 24729885 entries 35 bitmaps 8
    item 17 key (FREE_SPACE UNTYPED 9693036544) itemoff 2733 itemsize 41
        location key (265 INODE_ITEM 0)
        cache generation 22144003 entries 30 bitmaps 8
    item 18 key (FREE_SPACE UNTYPED 10766778368) itemoff 2692 itemsize 41
        location key (266 INODE_ITEM 0)
        cache generation 24749177 entries 148 bitmaps 4
    item 19 key (FREE_SPACE UNTYPED 11840520192) itemoff 2651 itemsize 41
        location key (267 INODE_ITEM 0)
        cache generation 24749152 entries 33 bitmaps 8
    item 20 key (FREE_SPACE UNTYPED 12914262016) itemoff 2610 itemsize 41
        location key (268 INODE_ITEM 0)
        cache generation 24706177 entries 11 bitmaps 8
    item 21 key (FREE_SPACE UNTYPED 13988003840) itemoff 2569 itemsize 41
        location key (269 INODE_ITEM 0)
        cache generation 21296150 entries 46 bitmaps 8
    item 22 key (FREE_SPACE UNTYPED 15061745664) itemoff 2528 itemsize 41
        location key (270 INODE_ITEM 0)
        cache generation 24729843 entries 58 bitmaps 8
    item 23 key (FREE_SPACE UNTYPED 16135487488) itemoff 2487 itemsize 41
        location key (271 INODE_ITEM 0)
        cache generation 20064465 entries 36 bitmaps 8
    item 24 key (FREE_SPACE UNTYPED 17209229312) itemoff 2446 itemsize 41
        location key (272 INODE_ITEM 0)
        cache generation 20079294 entries 86 bitmaps 0
    item 25 key (FREE_SPACE UNTYPED 18282971136) itemoff 2405 itemsize 41
        location key (273 INODE_ITEM 0)
        cache generation 20081218 entries 38 bitmaps 8
    item 26 key (FREE_SPACE UNTYPED 19356712960) itemoff 2364 itemsize 41
        location key (274 INODE_ITEM 0)
        cache generation 20088898 entries 22 bitmaps 4
    item 27 key (FREE_SPACE UNTYPED 20430454784) itemoff 2323 itemsize 41
        location key (275 INODE_ITEM 0)
        cache generation 20055389 entries 91 bitmaps 7
    item 28 key (FREE_SPACE UNTYPED 35462840320) itemoff 2282 itemsize 41
        location key (289 INODE_ITEM 0)
        cache generation 24703953 entries 10 bitmaps 8
    item 29 key (FREE_SPACE UNTYPED 44052774912) itemoff 2241 itemsize 41
        location key (290 INODE_ITEM 0)
        cache generation 20083477 entries 36 bitmaps 8
    item 30 key (FREE_SPACE UNTYPED 52642709504) itemoff 2200 itemsize 41
        location key (291 INODE_ITEM 0)
        cache generation 24712508 entries 9 bitmaps 8
    item 31 key (FREE_SPACE UNTYPED 54857302016) itemoff 2159 itemsize 41
        location key (292 INODE_ITEM 0)
        cache generation 24750791 entries 139 bitmaps 8
    item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
        generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
        lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
        drop key (0 UNKNOWN.0 0) level 0
btrfs-progs v5.6
leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
leaf 29552640 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
    item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
        generation 24703953 transid 24703953 size 262144 nbytes 8595701760
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 32790 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589235096.486856306 (2020-05-11 22:11:36)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 1 key (289 EXTENT_DATA 0) itemoff 3782 itemsize 53
        generation 24703953 type 1 (regular)
        extent data disk byte 3544403968 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 2 key (290 INODE_ITEM 0) itemoff 3622 itemsize 160
        generation 20083477 transid 20083477 size 262144 nbytes 6823346176
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 26029 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1587576718.255911112 (2020-04-22 17:31:58)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 3 key (290 EXTENT_DATA 0) itemoff 3569 itemsize 53
        generation 20083477 type 1 (regular)
        extent data disk byte 3373088768 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 4 key (291 INODE_ITEM 0) itemoff 3409 itemsize 160
        generation 24712508 transid 24712508 size 262144 nbytes 5454692352
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 20808 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589569287.32299836 (2020-05-15 19:01:27)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 5 key (291 EXTENT_DATA 0) itemoff 3356 itemsize 53
        generation 24712508 type 1 (regular)
        extent data disk byte 5286600704 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 6 key (292 INODE_ITEM 0) itemoff 3196 itemsize 160
        generation 24750791 transid 24750791 size 262144 nbytes 3022026440704
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 11528116 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1591266423.923005453 (2020-06-04 10:27:03)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 7 key (292 EXTENT_DATA 0) itemoff 3143 itemsize 53
        generation 24750791 type 1 (regular)
        extent data disk byte 3249909760 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 8 key (FREE_SPACE UNTYPED 29360128) itemoff 3102 itemsize 41
        location key (256 INODE_ITEM 0)
        cache generation 24750796 entries 45 bitmaps 8
    item 9 key (FREE_SPACE UNTYPED 1103101952) itemoff 3061 itemsize 41
        location key (257 INODE_ITEM 0)
        cache generation 24750794 entries 140 bitmaps 8
    item 10 key (FREE_SPACE UNTYPED 2176843776) itemoff 3020 itemsize 41
        location key (258 INODE_ITEM 0)
        cache generation 24750796 entries 34 bitmaps 8
    item 11 key (FREE_SPACE UNTYPED 3250585600) itemoff 2979 itemsize 41
        location key (259 INODE_ITEM 0)
        cache generation 24750796 entries 37 bitmaps 8
    item 12 key (FREE_SPACE UNTYPED 4324327424) itemoff 2938 itemsize 41
        location key (261 INODE_ITEM 0)
        cache generation 24750702 entries 155 bitmaps 8
    item 13 key (FREE_SPACE UNTYPED 5398069248) itemoff 2897 itemsize 41
        location key (260 INODE_ITEM 0)
        cache generation 24749493 entries 23 bitmaps 8
    item 14 key (FREE_SPACE UNTYPED 6471811072) itemoff 2856 itemsize 41
        location key (262 INODE_ITEM 0)
        cache generation 24749507 entries 72 bitmaps 8
    item 15 key (FREE_SPACE UNTYPED 7545552896) itemoff 2815 itemsize 41
        location key (263 INODE_ITEM 0)
        cache generation 24749493 entries 22 bitmaps 8
    item 16 key (FREE_SPACE UNTYPED 8619294720) itemoff 2774 itemsize 41
        location key (264 INODE_ITEM 0)
        cache generation 24729885 entries 35 bitmaps 8
    item 17 key (FREE_SPACE UNTYPED 9693036544) itemoff 2733 itemsize 41
        location key (265 INODE_ITEM 0)
        cache generation 22144003 entries 30 bitmaps 8
    item 18 key (FREE_SPACE UNTYPED 10766778368) itemoff 2692 itemsize 41
        location key (266 INODE_ITEM 0)
        cache generation 24749177 entries 148 bitmaps 4
    item 19 key (FREE_SPACE UNTYPED 11840520192) itemoff 2651 itemsize 41
        location key (267 INODE_ITEM 0)
        cache generation 24749152 entries 33 bitmaps 8
    item 20 key (FREE_SPACE UNTYPED 12914262016) itemoff 2610 itemsize 41
        location key (268 INODE_ITEM 0)
        cache generation 24706177 entries 11 bitmaps 8
    item 21 key (FREE_SPACE UNTYPED 13988003840) itemoff 2569 itemsize 41
        location key (269 INODE_ITEM 0)
        cache generation 21296150 entries 46 bitmaps 8
    item 22 key (FREE_SPACE UNTYPED 15061745664) itemoff 2528 itemsize 41
        location key (270 INODE_ITEM 0)
        cache generation 24729843 entries 58 bitmaps 8
    item 23 key (FREE_SPACE UNTYPED 16135487488) itemoff 2487 itemsize 41
        location key (271 INODE_ITEM 0)
        cache generation 20064465 entries 36 bitmaps 8
    item 24 key (FREE_SPACE UNTYPED 17209229312) itemoff 2446 itemsize 41
        location key (272 INODE_ITEM 0)
        cache generation 20079294 entries 86 bitmaps 0
    item 25 key (FREE_SPACE UNTYPED 18282971136) itemoff 2405 itemsize 41
        location key (273 INODE_ITEM 0)
        cache generation 20081218 entries 38 bitmaps 8
    item 26 key (FREE_SPACE UNTYPED 19356712960) itemoff 2364 itemsize 41
        location key (274 INODE_ITEM 0)
        cache generation 20088898 entries 22 bitmaps 4
    item 27 key (FREE_SPACE UNTYPED 20430454784) itemoff 2323 itemsize 41
        location key (275 INODE_ITEM 0)
        cache generation 20055389 entries 91 bitmaps 7
    item 28 key (FREE_SPACE UNTYPED 35462840320) itemoff 2282 itemsize 41
        location key (289 INODE_ITEM 0)
        cache generation 24703953 entries 10 bitmaps 8
    item 29 key (FREE_SPACE UNTYPED 44052774912) itemoff 2241 itemsize 41
        location key (290 INODE_ITEM 0)
        cache generation 20083477 entries 36 bitmaps 8
    item 30 key (FREE_SPACE UNTYPED 52642709504) itemoff 2200 itemsize 41
        location key (291 INODE_ITEM 0)
        cache generation 24712508 entries 9 bitmaps 8
    item 31 key (FREE_SPACE UNTYPED 54857302016) itemoff 2159 itemsize 41
        location key (292 INODE_ITEM 0)
        cache generation 24750791 entries 139 bitmaps 8
    item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
        generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
        lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
        drop key (0 UNKNOWN.0 0) level 0
btrfs-progs v5.6
leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
leaf 29741056 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
    item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
        generation 24703953 transid 24703953 size 262144 nbytes 8595701760
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 32790 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589235096.486856306 (2020-05-11 22:11:36)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 1 key (289 EXTENT_DATA 0) itemoff 3782 itemsize 53
        generation 24703953 type 1 (regular)
        extent data disk byte 3544403968 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 2 key (290 INODE_ITEM 0) itemoff 3622 itemsize 160
        generation 20083477 transid 20083477 size 262144 nbytes 6823346176
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 26029 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1587576718.255911112 (2020-04-22 17:31:58)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 3 key (290 EXTENT_DATA 0) itemoff 3569 itemsize 53
        generation 20083477 type 1 (regular)
        extent data disk byte 3373088768 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 4 key (291 INODE_ITEM 0) itemoff 3409 itemsize 160
        generation 24712508 transid 24712508 size 262144 nbytes 5454692352
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 20808 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589569287.32299836 (2020-05-15 19:01:27)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 5 key (291 EXTENT_DATA 0) itemoff 3356 itemsize 53
        generation 24712508 type 1 (regular)
        extent data disk byte 5286600704 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 6 key (292 INODE_ITEM 0) itemoff 3196 itemsize 160
        generation 24750791 transid 24750791 size 262144 nbytes 3022026440704
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 11528116 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1591266423.923005453 (2020-06-04 10:27:03)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 7 key (292 EXTENT_DATA 0) itemoff 3143 itemsize 53
        generation 24750791 type 1 (regular)
        extent data disk byte 3249909760 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 8 key (FREE_SPACE UNTYPED 29360128) itemoff 3102 itemsize 41
        location key (256 INODE_ITEM 0)
        cache generation 24750797 entries 60 bitmaps 8
    item 9 key (FREE_SPACE UNTYPED 1103101952) itemoff 3061 itemsize 41
        location key (257 INODE_ITEM 0)
        cache generation 24750794 entries 140 bitmaps 8
    item 10 key (FREE_SPACE UNTYPED 2176843776) itemoff 3020 itemsize 41
        location key (258 INODE_ITEM 0)
        cache generation 24750797 entries 31 bitmaps 8
    item 11 key (FREE_SPACE UNTYPED 3250585600) itemoff 2979 itemsize 41
        location key (259 INODE_ITEM 0)
        cache generation 24750797 entries 40 bitmaps 8
    item 12 key (FREE_SPACE UNTYPED 4324327424) itemoff 2938 itemsize 41
        location key (261 INODE_ITEM 0)
        cache generation 24750702 entries 155 bitmaps 8
    item 13 key (FREE_SPACE UNTYPED 5398069248) itemoff 2897 itemsize 41
        location key (260 INODE_ITEM 0)
        cache generation 24749493 entries 23 bitmaps 8
    item 14 key (FREE_SPACE UNTYPED 6471811072) itemoff 2856 itemsize 41
        location key (262 INODE_ITEM 0)
        cache generation 24749507 entries 72 bitmaps 8
    item 15 key (FREE_SPACE UNTYPED 7545552896) itemoff 2815 itemsize 41
        location key (263 INODE_ITEM 0)
        cache generation 24749493 entries 22 bitmaps 8
    item 16 key (FREE_SPACE UNTYPED 8619294720) itemoff 2774 itemsize 41
        location key (264 INODE_ITEM 0)
        cache generation 24729885 entries 35 bitmaps 8
    item 17 key (FREE_SPACE UNTYPED 9693036544) itemoff 2733 itemsize 41
        location key (265 INODE_ITEM 0)
        cache generation 22144003 entries 30 bitmaps 8
    item 18 key (FREE_SPACE UNTYPED 10766778368) itemoff 2692 itemsize 41
        location key (266 INODE_ITEM 0)
        cache generation 24749177 entries 148 bitmaps 4
    item 19 key (FREE_SPACE UNTYPED 11840520192) itemoff 2651 itemsize 41
        location key (267 INODE_ITEM 0)
        cache generation 24749152 entries 33 bitmaps 8
    item 20 key (FREE_SPACE UNTYPED 12914262016) itemoff 2610 itemsize 41
        location key (268 INODE_ITEM 0)
        cache generation 24706177 entries 11 bitmaps 8
    item 21 key (FREE_SPACE UNTYPED 13988003840) itemoff 2569 itemsize 41
        location key (269 INODE_ITEM 0)
        cache generation 21296150 entries 46 bitmaps 8
    item 22 key (FREE_SPACE UNTYPED 15061745664) itemoff 2528 itemsize 41
        location key (270 INODE_ITEM 0)
        cache generation 24729843 entries 58 bitmaps 8
    item 23 key (FREE_SPACE UNTYPED 16135487488) itemoff 2487 itemsize 41
        location key (271 INODE_ITEM 0)
        cache generation 20064465 entries 36 bitmaps 8
    item 24 key (FREE_SPACE UNTYPED 17209229312) itemoff 2446 itemsize 41
        location key (272 INODE_ITEM 0)
        cache generation 20079294 entries 86 bitmaps 0
    item 25 key (FREE_SPACE UNTYPED 18282971136) itemoff 2405 itemsize 41
        location key (273 INODE_ITEM 0)
        cache generation 20081218 entries 38 bitmaps 8
    item 26 key (FREE_SPACE UNTYPED 19356712960) itemoff 2364 itemsize 41
        location key (274 INODE_ITEM 0)
        cache generation 20088898 entries 22 bitmaps 4
    item 27 key (FREE_SPACE UNTYPED 20430454784) itemoff 2323 itemsize 41
        location key (275 INODE_ITEM 0)
        cache generation 20055389 entries 91 bitmaps 7
    item 28 key (FREE_SPACE UNTYPED 35462840320) itemoff 2282 itemsize 41
        location key (289 INODE_ITEM 0)
        cache generation 24703953 entries 10 bitmaps 8
    item 29 key (FREE_SPACE UNTYPED 44052774912) itemoff 2241 itemsize 41
        location key (290 INODE_ITEM 0)
        cache generation 20083477 entries 36 bitmaps 8
    item 30 key (FREE_SPACE UNTYPED 52642709504) itemoff 2200 itemsize 41
        location key (291 INODE_ITEM 0)
        cache generation 24712508 entries 9 bitmaps 8
    item 31 key (FREE_SPACE UNTYPED 54857302016) itemoff 2159 itemsize 41
        location key (292 INODE_ITEM 0)
        cache generation 24750791 entries 139 bitmaps 8
    item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
        generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
        lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
        drop key (0 UNKNOWN.0 0) level 0
btrfs-progs v5.6
leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
leaf 29974528 flags 0x1(WRITTEN) backref revision 1
fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
    item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
        generation 24703953 transid 24703953 size 262144 nbytes 8595701760
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 32790 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589235096.486856306 (2020-05-11 22:11:36)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 1 key (289 EXTENT_DATA 0) itemoff 3782 itemsize 53
        generation 24703953 type 1 (regular)
        extent data disk byte 3544403968 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 2 key (290 INODE_ITEM 0) itemoff 3622 itemsize 160
        generation 20083477 transid 20083477 size 262144 nbytes 6823346176
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 26029 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1587576718.255911112 (2020-04-22 17:31:58)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 3 key (290 EXTENT_DATA 0) itemoff 3569 itemsize 53
        generation 20083477 type 1 (regular)
        extent data disk byte 3373088768 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 4 key (291 INODE_ITEM 0) itemoff 3409 itemsize 160
        generation 24712508 transid 24712508 size 262144 nbytes 5454692352
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 20808 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1589569287.32299836 (2020-05-15 19:01:27)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 5 key (291 EXTENT_DATA 0) itemoff 3356 itemsize 53
        generation 24712508 type 1 (regular)
        extent data disk byte 5286600704 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 6 key (292 INODE_ITEM 0) itemoff 3196 itemsize 160
        generation 24750791 transid 24750791 size 262144 nbytes 3022026440704
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 11528116 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 0.0 (1970-01-01 00:00:00)
        ctime 1591266423.923005453 (2020-06-04 10:27:03)
        mtime 0.0 (1970-01-01 00:00:00)
        otime 0.0 (1970-01-01 00:00:00)
    item 7 key (292 EXTENT_DATA 0) itemoff 3143 itemsize 53
        generation 24750791 type 1 (regular)
        extent data disk byte 3249909760 nr 262144
        extent data offset 0 nr 262144 ram 262144
        extent compression 0 (none)
    item 8 key (FREE_SPACE UNTYPED 29360128) itemoff 3102 itemsize 41
        location key (256 INODE_ITEM 0)
        cache generation 24750798 entries 79 bitmaps 8
    item 9 key (FREE_SPACE UNTYPED 1103101952) itemoff 3061 itemsize 41
        location key (257 INODE_ITEM 0)
        cache generation 24750794 entries 140 bitmaps 8
    item 10 key (FREE_SPACE UNTYPED 2176843776) itemoff 3020 itemsize 41
        location key (258 INODE_ITEM 0)
        cache generation 24750798 entries 33 bitmaps 8
    item 11 key (FREE_SPACE UNTYPED 3250585600) itemoff 2979 itemsize 41
        location key (259 INODE_ITEM 0)
        cache generation 24750798 entries 37 bitmaps 8
    item 12 key (FREE_SPACE UNTYPED 4324327424) itemoff 2938 itemsize 41
        location key (261 INODE_ITEM 0)
        cache generation 24750702 entries 155 bitmaps 8
    item 13 key (FREE_SPACE UNTYPED 5398069248) itemoff 2897 itemsize 41
        location key (260 INODE_ITEM 0)
        cache generation 24749493 entries 23 bitmaps 8
    item 14 key (FREE_SPACE UNTYPED 6471811072) itemoff 2856 itemsize 41
        location key (262 INODE_ITEM 0)
        cache generation 24749507 entries 72 bitmaps 8
    item 15 key (FREE_SPACE UNTYPED 7545552896) itemoff 2815 itemsize 41
        location key (263 INODE_ITEM 0)
        cache generation 24749493 entries 22 bitmaps 8
    item 16 key (FREE_SPACE UNTYPED 8619294720) itemoff 2774 itemsize 41
        location key (264 INODE_ITEM 0)
        cache generation 24729885 entries 35 bitmaps 8
    item 17 key (FREE_SPACE UNTYPED 9693036544) itemoff 2733 itemsize 41
        location key (265 INODE_ITEM 0)
        cache generation 22144003 entries 30 bitmaps 8
    item 18 key (FREE_SPACE UNTYPED 10766778368) itemoff 2692 itemsize 41
        location key (266 INODE_ITEM 0)
        cache generation 24749177 entries 148 bitmaps 4
    item 19 key (FREE_SPACE UNTYPED 11840520192) itemoff 2651 itemsize 41
        location key (267 INODE_ITEM 0)
        cache generation 24749152 entries 33 bitmaps 8
    item 20 key (FREE_SPACE UNTYPED 12914262016) itemoff 2610 itemsize 41
        location key (268 INODE_ITEM 0)
        cache generation 24706177 entries 11 bitmaps 8
    item 21 key (FREE_SPACE UNTYPED 13988003840) itemoff 2569 itemsize 41
        location key (269 INODE_ITEM 0)
        cache generation 21296150 entries 46 bitmaps 8
    item 22 key (FREE_SPACE UNTYPED 15061745664) itemoff 2528 itemsize 41
        location key (270 INODE_ITEM 0)
        cache generation 24729843 entries 58 bitmaps 8
    item 23 key (FREE_SPACE UNTYPED 16135487488) itemoff 2487 itemsize 41
        location key (271 INODE_ITEM 0)
        cache generation 20064465 entries 36 bitmaps 8
    item 24 key (FREE_SPACE UNTYPED 17209229312) itemoff 2446 itemsize 41
        location key (272 INODE_ITEM 0)
        cache generation 20079294 entries 86 bitmaps 0
    item 25 key (FREE_SPACE UNTYPED 18282971136) itemoff 2405 itemsize 41
        location key (273 INODE_ITEM 0)
        cache generation 20081218 entries 38 bitmaps 8
    item 26 key (FREE_SPACE UNTYPED 19356712960) itemoff 2364 itemsize 41
        location key (274 INODE_ITEM 0)
        cache generation 20088898 entries 22 bitmaps 4
    item 27 key (FREE_SPACE UNTYPED 20430454784) itemoff 2323 itemsize 41
        location key (275 INODE_ITEM 0)
        cache generation 20055389 entries 91 bitmaps 7
    item 28 key (FREE_SPACE UNTYPED 35462840320) itemoff 2282 itemsize 41
        location key (289 INODE_ITEM 0)
        cache generation 24703953 entries 10 bitmaps 8
    item 29 key (FREE_SPACE UNTYPED 44052774912) itemoff 2241 itemsize 41
        location key (290 INODE_ITEM 0)
        cache generation 20083477 entries 36 bitmaps 8
    item 30 key (FREE_SPACE UNTYPED 52642709504) itemoff 2200 itemsize 41
        location key (291 INODE_ITEM 0)
        cache generation 24712508 entries 9 bitmaps 8
    item 31 key (FREE_SPACE UNTYPED 54857302016) itemoff 2159 itemsize 41
        location key (292 INODE_ITEM 0)
        cache generation 24750791 entries 139 bitmaps 8
    item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
        generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
        lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
        drop key (0 UNKNOWN.0 0) level 0

On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> > Thank you for you answer.
> > I've just updated my system, did a reboot and it's running with a
> > 5.6.0-2-amd64 now.
> > So, this is how my kern.log looks like, just right after the start:
> >
>
> >
> > There are too many blocks. I just picked three randomly:
>
> Looks like we need more result, especially some result doesn't match at all.
>
> >
> > === Block 33017856 ===
> > $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> > btrfs-progs v5.6
> > leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> > leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> ...
> >         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> >                 generation 24749502 type 1 (regular)
> >                 extent data disk byte 1126502400 nr 4096
> >                 extent data offset 0 nr 8192 ram 8192
> >                 extent compression 2 (lzo)
> >         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> >                 generation 24749502 type 1 (regular)
> >                 extent data disk byte 0 nr 0
> >                 extent data offset 1937408 nr 4096 ram 4194304
> >                 extent compression 0 (none)
> Not root item at all.
> At least for this copy, it looks like kernel got one completely bad
> copy, then discarded it and found a good copy.
>
> That's very strange, especially when all the other involved ones seems
> random and all at slot 32 is not a coincident.
>
>
> > === Block 44900352  ===
> > btrfs ins dump-tree -b 44900352 /dev/dm-0
> > btrfs-progs v5.6
> > leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> > leaf 44900352 flags 0x1(WRITTEN) backref revision 1
>
> This block doesn't even have slot 32... It only have 19 items, thus slot
> 0 ~ slot 18.
> And its owner, FS_TREE shouldn't have ROOT_ITEM.
>
> >
> >
> > === Block 55352561664 ===
> > $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> > btrfs-progs v5.6
> > leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> > leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> ...
> >         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >                 drop key (0 UNKNOWN.0 0) level 0
>
> This looks like the offending tree block.
> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
>
> Since you're here, I guess a btrfs check without --repair on the
> unmounted fs would help to identify the real damage.
>
> And again, the fs looks very damaged, it's highly recommended to backup
> your data asap.
>
> Thanks,
> Qu
>
> > --- snap ---
> >
> >
> >
> > On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> >>> Hi,
> >>>
> >>> I've updated my system (Debian testing) [1] several months ago (~
> >>> December) and I noticed a lot of corrupt leaf messages flooding my
> >>> kern.log [2]. Furthermore my system had some trouble, e.g.
> >>> applications were terminated after some uptime, due to the btrfs
> >>> filesystem errors. This was with kernel 5.3.
> >>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> >>>
> >>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> >>> release and with this kernel there aren't any corrupt leaf messages
> >>> and the problem is gone. IMHO, it must be something coming with kernel
> >>> 5.3 (or 5.x).
> >>
> >> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> >> such *obviously* wrong metadata.
> >>>
> >>> My harddisk is a SSD which is responsible for the root partition. I've
> >>> encrypted my filesystem with LUKS and just right after I entered my
> >>> password at the boot, the first corrupt leaf errors appear.
> >>>
> >>> An error message looks like this:
> >>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>
> >> Btrfs root items have fixed size. This is already something very bad.
> >>
> >> Furthermore, the item size is smaller than expected, which means we can
> >> easily get garbage. I'm a little surprised that older kernel can even
> >> work without crashing the whole kernel.
> >>
> >> Some extra info could help us to find out how badly the fs is corrupted.
> >> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> >>
> >>>
> >>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> >>> error line. Only the block number changes.
> >>
> >> And dumps for the other block numbers too.
> >>
> >>>
> >>> Interestingly it's the very same as reported to the ML here [3]. I've
> >>> contacted the reporter, but he didn't have a solution for me, because
> >>> he changed to a different filesystem.
> >>>
> >>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> >>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> >>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> >>> again, w/o any errors.
> >>
> >> This doesn't look like a bit flip, so not RAM problems.
> >>
> >> Don't have any better advice until we got the dumps, but I'd recommend
> >> to backup your data since it's still possible.
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> So, I have no more ideas what I can do. Could you please help me to
> >>> investigate this further? Could it be a bug?
> >>>
> >>> Thank you very much.
> >>>
> >>> Best regards,
> >>> Thorsten
> >>>
> >>>
> >>>
> >>> 1:
> >>> $ cat /etc/debian_version
> >>> bullseye/sid
> >>>
> >>> $ uname -a
> >>> [no problem with this kernel]
> >>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> >>>
> >>> $ btrfs --version
> >>> btrfs-progs v5.6
> >>>
> >>> $ sudo btrfs fi show
> >>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>         Total devices 1 FS bytes used 7.33GiB
> >>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> >>>
> >>> $ btrfs fi df /
> >>> Data, single: total=22.01GiB, used=7.16GiB
> >>> System, DUP: total=32.00MiB, used=4.00KiB
> >>> System, single: total=4.00MiB, used=0.00B
> >>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> >>> Metadata, single: total=8.00MiB, used=0.00B
> >>> GlobalReserve, single: total=25.42MiB, used=0.00B
> >>>
> >>>
> >>> 2:
> >>> [several messages per second]
> >>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> >>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> >>> size, have 239 expect 439
> >>>
> >>> 3:
> >>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-04 10:52       ` Thorsten Rehm
@ 2020-06-04 12:06         ` Qu Wenruo
  2020-06-04 17:57           ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2020-06-04 12:06 UTC (permalink / raw)
  To: Thorsten Rehm; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 15542 bytes --]



On 2020/6/4 6:52 PM, Thorsten Rehm wrote:
> The disk in question is my root (/) partition. If the filesystem is
> that highly damaged, I have to reinstall my system. We will see, if
> it's come to that. Maybe we find something interesting on the way...
> I've downloaded the latest grml daily image and started my system from
> a usb stick. Here we go:
> 
> root@grml ~ # uname -r
> 5.6.0-2-amd64
> 
> root@grml ~ # cryptsetup open /dev/sda5 foo
> 
>                                                                   :(
> Enter passphrase for /dev/sda5:
> 
> root@grml ~ # file -L -s /dev/mapper/foo
> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> nodesize 4096, leafsize 4096,
> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> bytes used, 1 devices
> 
> root@grml ~ # btrfs check /dev/mapper/foo
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/foo
> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 7815716864 bytes used, no error found
> total csum bytes: 6428260
> total tree bytes: 175968256
> total fs tree bytes: 149475328
> total extent tree bytes: 16052224
> btree space waste bytes: 43268911
> file data blocks allocated: 10453221376
>  referenced 8746053632

Errr, this is super good news: all your fs metadata is completely fine
(at least for the first copy).
Which is completely different from what the kernel dmesg shows.

> 
> root@grml ~ # lsblk /dev/sda5 --fs
> NAME  FSTYPE      FSVER LABEL UUID
> FSAVAIL FSUSE% MOUNTPOINT
> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> 
> root@grml ~ # mount /dev/mapper/foo /mnt
> root@grml ~ # btrfs scrub start /mnt
> 
> root@grml ~ # journalctl -k --no-pager | grep BTRFS
> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> caching is enabled
> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> expect 439

One error line without a "read time corruption" line means the btrfs
kernel indeed skipped to the next copy.
In this case, one copy (aka the first copy) is corrupted.
Strangely, if it's the first copy for the kernel, it should also be the
first copy for btrfs check.

And no problem is reported by btrfs check, which is already super strange.
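
If you want to compare the two DUP copies of one of the reported blocks
directly, a rough sketch (assuming btrfs-map-logical from btrfs-progs
works for metadata blocks here, and the 4K nodesize from above) would be:

# dump copy 1 and copy 2 of block 54222848 to files and compare them
btrfs-map-logical -l 54222848 -b 4096 -c 1 -o /tmp/copy1 /dev/mapper/foo
btrfs-map-logical -l 54222848 -b 4096 -c 2 -o /tmp/copy2 /dev/mapper/foo
cmp /tmp/copy1 /tmp/copy2 || echo "copies differ"

That would at least tell us which mirror still carries the stale item.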

> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> optimizations
> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> on devid 1 with status: 0
> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> expect 439

This means the corrupted copies are also there for several (and I guess
unrelated) tree blocks.
For scrub, I guess it just reads the good copy without bothering with
the bad one it found, so no error is reported by scrub.

But still, if you were using metadata without a second copy (aka SINGLE
or RAID0), it would be a completely different story.
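
(You can double-check which profile the metadata is actually using with
something like the following; I'm assuming the fs is mounted at /mnt:

btrfs filesystem df /mnt

A "Metadata, DUP" line there means there is a second copy to fall back on.)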


> 
> root@grml ~ # btrfs scrub status /mnt
> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> Scrub started:    Thu Jun  4 10:45:38 2020
> Status:           finished
> Duration:         0:00:53
> Total to scrub:   7.44GiB
> Rate:             143.80MiB/s
> Error summary:    no errors found
> 
> 
> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> btrfs ins dump-tree -b $block /dev/dm-0; done
> btrfs-progs v5.6
> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
...
>         cache generation 24750791 entries 139 bitmaps 8
>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239

So it's still there. The first copy is corrupted; it's just that
btrfs-progs can't detect it.

>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>         drop key (0 UNKNOWN.0 0) level 0
> btrfs-progs v5.6
> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>         drop key (0 UNKNOWN.0 0) level 0

This is different from the previous copy, which means it should be a
CoWed tree block.

> btrfs-progs v5.6
> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE

Even newer one.

...
> btrfs-progs v5.6
> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE

Newer.

So it looks like the bad copy has existed for a while, but at the same
time we still have one good copy to keep everything afloat.

To kill all the old corrupted copies, if the device supports
TRIM/DISCARD, I recommend running a scrub first and then fstrim on the fs.

If it's an HDD, I recommend running a btrfs balance with the -m filter
to relocate all metadata blocks and get rid of the bad copies.

Of course, do all of this with a v5.3+ kernel.
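
In commands, that would be roughly (a sketch; I'm assuming the fs is
mounted at /mnt and the dm-crypt mapping allows discards):

btrfs scrub start -B /mnt      # -B: run the scrub in the foreground
fstrim -v /mnt                 # discard all unused space on the fs
# or, for the HDD case:
btrfs balance start -m /mnt    # relocate all metadata block groups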

Thanks,
Qu
> 
> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
>>> Thank you for you answer.
>>> I've just updated my system, did a reboot and it's running with a
>>> 5.6.0-2-amd64 now.
>>> So, this is how my kern.log looks like, just right after the start:
>>>
>>
>>>
>>> There are too many blocks. I just picked three randomly:
>>
>> Looks like we need more result, especially some result doesn't match at all.
>>
>>>
>>> === Block 33017856 ===
>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
>>> btrfs-progs v5.6
>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>> ...
>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>>>                 generation 24749502 type 1 (regular)
>>>                 extent data disk byte 1126502400 nr 4096
>>>                 extent data offset 0 nr 8192 ram 8192
>>>                 extent compression 2 (lzo)
>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>>>                 generation 24749502 type 1 (regular)
>>>                 extent data disk byte 0 nr 0
>>>                 extent data offset 1937408 nr 4096 ram 4194304
>>>                 extent compression 0 (none)
>> Not root item at all.
>> At least for this copy, it looks like kernel got one completely bad
>> copy, then discarded it and found a good copy.
>>
>> That's very strange, especially when all the other involved ones seems
>> random and all at slot 32 is not a coincident.
>>
>>
>>> === Block 44900352  ===
>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
>>> btrfs-progs v5.6
>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
>>
>> This block doesn't even have slot 32... It only have 19 items, thus slot
>> 0 ~ slot 18.
>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
>>
>>>
>>>
>>> === Block 55352561664 ===
>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
>>> btrfs-progs v5.6
>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>> ...
>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>                 drop key (0 UNKNOWN.0 0) level 0
>>
>> This looks like the offending tree block.
>> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
>>
>> Since you're here, I guess a btrfs check without --repair on the
>> unmounted fs would help to identify the real damage.
>>
>> And again, the fs looks very damaged, it's highly recommended to backup
>> your data asap.
>>
>> Thanks,
>> Qu
>>
>>> --- snap ---
>>>
>>>
>>>
>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
>>>>> Hi,
>>>>>
>>>>> I've updated my system (Debian testing) [1] several months ago (~
>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>>>> applications were terminated after some uptime, due to the btrfs
>>>>> filesystem errors. This was with kernel 5.3.
>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>>>
>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>>>> release and with this kernel there aren't any corrupt leaf messages
>>>>> and the problem is gone. IMHO, it must be something coming with kernel
>>>>> 5.3 (or 5.x).
>>>>
>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>>>> such *obviously* wrong metadata.
>>>>>
>>>>> My harddisk is a SSD which is responsible for the root partition. I've
>>>>> encrypted my filesystem with LUKS and just right after I entered my
>>>>> password at the boot, the first corrupt leaf errors appear.
>>>>>
>>>>> An error message looks like this:
>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>
>>>> Btrfs root items have fixed size. This is already something very bad.
>>>>
>>>> Furthermore, the item size is smaller than expected, which means we can
>>>> easily get garbage. I'm a little surprised that older kernel can even
>>>> work without crashing the whole kernel.
>>>>
>>>> Some extra info could help us to find out how badly the fs is corrupted.
>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>>>
>>>>>
>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>>>> error line. Only the block number changes.
>>>>
>>>> And dumps for the other block numbers too.
>>>>
>>>>>
>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>>>> contacted the reporter, but he didn't have a solution for me, because
>>>>> he changed to a different filesystem.
>>>>>
>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>>>> again, w/o any errors.
>>>>
>>>> This doesn't look like a bit flip, so not RAM problems.
>>>>
>>>> Don't have any better advice until we got the dumps, but I'd recommend
>>>> to backup your data since it's still possible.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> So, I have no more ideas what I can do. Could you please help me to
>>>>> investigate this further? Could it be a bug?
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Best regards,
>>>>> Thorsten
>>>>>
>>>>>
>>>>>
>>>>> 1:
>>>>> $ cat /etc/debian_version
>>>>> bullseye/sid
>>>>>
>>>>> $ uname -a
>>>>> [no problem with this kernel]
>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>>>
>>>>> $ btrfs --version
>>>>> btrfs-progs v5.6
>>>>>
>>>>> $ sudo btrfs fi show
>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>         Total devices 1 FS bytes used 7.33GiB
>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>>>
>>>>> $ btrfs fi df /
>>>>> Data, single: total=22.01GiB, used=7.16GiB
>>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>>> System, single: total=4.00MiB, used=0.00B
>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>>>
>>>>>
>>>>> 2:
>>>>> [several messages per second]
>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>>>> size, have 239 expect 439
>>>>>
>>>>> 3:
>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>>
>>>>
>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-04 12:06         ` Qu Wenruo
@ 2020-06-04 17:57           ` Thorsten Rehm
  2020-06-08 13:25             ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-04 17:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hmm, ok wait a minute:

"But still, if you're using metadata without copy (aka, SINGLE, RAID0)
then it would be a completely different story."

It's a single disk (SSD):

root@grml ~ # btrfs filesystem usage /mnt
Overall:
    Device size:         115.23GiB
    Device allocated:          26.08GiB
    Device unallocated:          89.15GiB
    Device missing:             0.00B
    Used:               7.44GiB
    Free (estimated):         104.04GiB    (min: 59.47GiB)
    Data ratio:                  1.00
    Metadata ratio:              2.00
    Global reserve:          25.25MiB    (used: 0.00B)

Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
   /dev/mapper/foo      22.01GiB

Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
   /dev/mapper/foo       8.00MiB

Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
   /dev/mapper/foo       4.00GiB

System,single: Size:4.00MiB, Used:0.00B (0.00%)
   /dev/mapper/foo       4.00MiB

System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
   /dev/mapper/foo      64.00MiB

Unallocated:
   /dev/mapper/foo      89.15GiB


root@grml ~ # btrfs filesystem df /mnt
Data, single: total=22.01GiB, used=7.11GiB
System, DUP: total=32.00MiB, used=4.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=2.00GiB, used=167.81MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=25.25MiB, used=0.00B

I also did an fstrim:

root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
Enter passphrase for /dev/sda5:
root@grml ~ # mount -o discard /dev/mapper/foo /mnt
root@grml ~ # fstrim -v /mnt/
/mnt/: 105.8 GiB (113600049152 bytes) trimmed
fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total

The kern.log during the fstrim run:
--- snip ---
Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
expect 439
Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
expect 439
Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
caching is enabled
Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
expect 439
Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
optimizations
Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
expect 439
Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
caching is enabled
Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
expect 439
Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
optimizations
--- snap ---

Furthermore, the system has been running for years now. I can't
remember exactly, but I think for 4-5 years. I started with Debian
Testing and have just upgraded my system on a regular basis. And I
started with btrfs, of course, but I can't remember with which version...

The problem is still there after the fstrim. Any further suggestions?

And isn't it a little bit strange that someone had a very similar problem?
https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/

root=1, slot=32, and "invalid root item size, have 239 expect 439" are
identical to my errors.

Thx so far!



On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
> > The disk in question is my root (/) partition. If the filesystem is
> > that highly damaged, I have to reinstall my system. We will see, if
> > it's come to that. Maybe we find something interesting on the way...
> > I've downloaded the latest grml daily image and started my system from
> > a usb stick. Here we go:
> >
> > root@grml ~ # uname -r
> > 5.6.0-2-amd64
> >
> > root@grml ~ # cryptsetup open /dev/sda5 foo
> >
> >                                                                   :(
> > Enter passphrase for /dev/sda5:
> >
> > root@grml ~ # file -L -s /dev/mapper/foo
> > /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> > nodesize 4096, leafsize 4096,
> > UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> > bytes used, 1 devices
> >
> > root@grml ~ # btrfs check /dev/mapper/foo
> > Opening filesystem to check...
> > Checking filesystem on /dev/mapper/foo
> > UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs
> > [7/7] checking quota groups skipped (not enabled on this FS)
> > found 7815716864 bytes used, no error found
> > total csum bytes: 6428260
> > total tree bytes: 175968256
> > total fs tree bytes: 149475328
> > total extent tree bytes: 16052224
> > btree space waste bytes: 43268911
> > file data blocks allocated: 10453221376
> >  referenced 8746053632
>
> Errr, this is a super good news, all your fs metadata is completely fine
> (at least for the first copy).
> Which is completely different from the kernel dmesg.
>
> >
> > root@grml ~ # lsblk /dev/sda5 --fs
> > NAME  FSTYPE      FSVER LABEL UUID
> > FSAVAIL FSUSE% MOUNTPOINT
> > sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> > └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> >
> > root@grml ~ # mount /dev/mapper/foo /mnt
> > root@grml ~ # btrfs scrub start /mnt
> >
> > root@grml ~ # journalctl -k --no-pager | grep BTRFS
> > Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> > 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> > caching is enabled
> > Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> > leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> > expect 439
>
> One error line without "read time corruption" line means btrfs kernel
> indeed skipped to next copy.
> In this case, there is one copy (aka the first copy) corrupted.
> Strangely, if it's the first copy in kernel, it should also be the first
> copy in btrfs check.
>
> And no problem reported from btrfs check, that's already super strange.
>
> > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> > optimizations
> > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> > Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> > Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> > leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> > expect 439
> > Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> > leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> > expect 439
> > Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> > on devid 1 with status: 0
> > Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> > leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> > expect 439
>
> This means the corrupted copy are also there for several (and I guess
> unrelated) tree blocks.
> For scrub I guess it just try to read the good copy without bothering
> the bad one it found, so no error reported in scrub.
>
> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> then it would be a completely different story.
>
>
> >
> > root@grml ~ # btrfs scrub status /mnt
> > UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> > Scrub started:    Thu Jun  4 10:45:38 2020
> > Status:           finished
> > Duration:         0:00:53
> > Total to scrub:   7.44GiB
> > Rate:             143.80MiB/s
> > Error summary:    no errors found
> >
> >
> > root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> > btrfs ins dump-tree -b $block /dev/dm-0; done
> > btrfs-progs v5.6
> > leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> > leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
> >         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
> ...
> >         cache generation 24750791 entries 139 bitmaps 8
> >     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>
> So it's still there. The first copy is corrupted. Just btrfs-progs can't
> detect it.
>
> >         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >         drop key (0 UNKNOWN.0 0) level 0
> > btrfs-progs v5.6
> > leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> > leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> ...
> >     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >         drop key (0 UNKNOWN.0 0) level 0
>
> This is different from previous copy, which means it should be an CoWed
> tree blocks.
>
> > btrfs-progs v5.6
> > leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
>
> Even newer one.
>
> ...
> > btrfs-progs v5.6
> > leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
>
> Newer.
>
> So It looks the bad copy exists for a while, but at the same time we
> still have one good copy to let everything float.
>
> To kill all the old corrupted copies, if it supports TRIM/DISCARD, I
> recommend to run scrub first, then fstrim on the fs.
>
> If it's HDD, I recommend to run a btrfs balance -m to relocate all
> metadata blocks, to get rid the bad copies.
>
> Of course, all using v5.3+ kernels.
>
> Thanks,
> Qu
> >
> > On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> >>> Thank you for you answer.
> >>> I've just updated my system, did a reboot and it's running with a
> >>> 5.6.0-2-amd64 now.
> >>> So, this is how my kern.log looks like, just right after the start:
> >>>
> >>
> >>>
> >>> There are too many blocks. I just picked three randomly:
> >>
> >> Looks like we need more result, especially some result doesn't match at all.
> >>
> >>>
> >>> === Block 33017856 ===
> >>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> >>> btrfs-progs v5.6
> >>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> >>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> >>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >> ...
> >>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> >>>                 generation 24749502 type 1 (regular)
> >>>                 extent data disk byte 1126502400 nr 4096
> >>>                 extent data offset 0 nr 8192 ram 8192
> >>>                 extent compression 2 (lzo)
> >>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> >>>                 generation 24749502 type 1 (regular)
> >>>                 extent data disk byte 0 nr 0
> >>>                 extent data offset 1937408 nr 4096 ram 4194304
> >>>                 extent compression 0 (none)
> >> Not root item at all.
> >> At least for this copy, it looks like kernel got one completely bad
> >> copy, then discarded it and found a good copy.
> >>
> >> That's very strange, especially when all the other involved ones seems
> >> random and all at slot 32 is not a coincident.
> >>
> >>
> >>> === Block 44900352  ===
> >>> btrfs ins dump-tree -b 44900352 /dev/dm-0
> >>> btrfs-progs v5.6
> >>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> >>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
> >>
> >> This block doesn't even have slot 32... It only have 19 items, thus slot
> >> 0 ~ slot 18.
> >> And its owner, FS_TREE shouldn't have ROOT_ITEM.
> >>
> >>>
> >>>
> >>> === Block 55352561664 ===
> >>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> >>> btrfs-progs v5.6
> >>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> >>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> >>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >> ...
> >>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>                 drop key (0 UNKNOWN.0 0) level 0
> >>
> >> This looks like the offending tree block.
> >> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
> >>
> >> Since you're here, I guess a btrfs check without --repair on the
> >> unmounted fs would help to identify the real damage.
> >>
> >> And again, the fs looks very damaged, it's highly recommended to backup
> >> your data asap.
> >>
> >> Thanks,
> >> Qu
> >>
> >>> --- snap ---
> >>>
> >>>
> >>>
> >>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I've updated my system (Debian testing) [1] several months ago (~
> >>>>> December) and I noticed a lot of corrupt leaf messages flooding my
> >>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
> >>>>> applications were terminated after some uptime, due to the btrfs
> >>>>> filesystem errors. This was with kernel 5.3.
> >>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> >>>>>
> >>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> >>>>> release and with this kernel there aren't any corrupt leaf messages
> >>>>> and the problem is gone. IMHO, it must be something coming with kernel
> >>>>> 5.3 (or 5.x).
> >>>>
> >>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> >>>> such *obviously* wrong metadata.
> >>>>>
> >>>>> My harddisk is a SSD which is responsible for the root partition. I've
> >>>>> encrypted my filesystem with LUKS and just right after I entered my
> >>>>> password at the boot, the first corrupt leaf errors appear.
> >>>>>
> >>>>> An error message looks like this:
> >>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>
> >>>> Btrfs root items have fixed size. This is already something very bad.
> >>>>
> >>>> Furthermore, the item size is smaller than expected, which means we can
> >>>> easily get garbage. I'm a little surprised that older kernel can even
> >>>> work without crashing the whole kernel.
> >>>>
> >>>> Some extra info could help us to find out how badly the fs is corrupted.
> >>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> >>>>
> >>>>>
> >>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> >>>>> error line. Only the block number changes.
> >>>>
> >>>> And dumps for the other block numbers too.
> >>>>
> >>>>>
> >>>>> Interestingly it's the very same as reported to the ML here [3]. I've
> >>>>> contacted the reporter, but he didn't have a solution for me, because
> >>>>> he changed to a different filesystem.
> >>>>>
> >>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> >>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> >>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> >>>>> again, w/o any errors.
> >>>>
> >>>> This doesn't look like a bit flip, so not RAM problems.
> >>>>
> >>>> Don't have any better advice until we got the dumps, but I'd recommend
> >>>> to backup your data since it's still possible.
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>
> >>>>>
> >>>>> So, I have no more ideas what I can do. Could you please help me to
> >>>>> investigate this further? Could it be a bug?
> >>>>>
> >>>>> Thank you very much.
> >>>>>
> >>>>> Best regards,
> >>>>> Thorsten
> >>>>>
> >>>>>
> >>>>>
> >>>>> 1:
> >>>>> $ cat /etc/debian_version
> >>>>> bullseye/sid
> >>>>>
> >>>>> $ uname -a
> >>>>> [no problem with this kernel]
> >>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> >>>>>
> >>>>> $ btrfs --version
> >>>>> btrfs-progs v5.6
> >>>>>
> >>>>> $ sudo btrfs fi show
> >>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>         Total devices 1 FS bytes used 7.33GiB
> >>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> >>>>>
> >>>>> $ btrfs fi df /
> >>>>> Data, single: total=22.01GiB, used=7.16GiB
> >>>>> System, DUP: total=32.00MiB, used=4.00KiB
> >>>>> System, single: total=4.00MiB, used=0.00B
> >>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> >>>>> Metadata, single: total=8.00MiB, used=0.00B
> >>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
> >>>>>
> >>>>>
> >>>>> 2:
> >>>>> [several messages per second]
> >>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> >>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> >>>>> size, have 239 expect 439
> >>>>>
> >>>>> 3:
> >>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>>>>
> >>>>
> >>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-04 17:57           ` Thorsten Rehm
@ 2020-06-08 13:25             ` Thorsten Rehm
  2020-06-08 13:29               ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-08 13:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi,

any more ideas to investigate this?

On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>
> Hmm, ok wait a minute:
>
> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> then it would be a completely different story."
>
> It's a single disk (SSD):
>
> root@grml ~ # btrfs filesystem usage /mnt
> Overall:
>     Device size:         115.23GiB
>     Device allocated:          26.08GiB
>     Device unallocated:          89.15GiB
>     Device missing:             0.00B
>     Used:               7.44GiB
>     Free (estimated):         104.04GiB    (min: 59.47GiB)
>     Data ratio:                  1.00
>     Metadata ratio:              2.00
>     Global reserve:          25.25MiB    (used: 0.00B)
>
> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
>    /dev/mapper/foo      22.01GiB
>
> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
>    /dev/mapper/foo       8.00MiB
>
> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
>    /dev/mapper/foo       4.00GiB
>
> System,single: Size:4.00MiB, Used:0.00B (0.00%)
>    /dev/mapper/foo       4.00MiB
>
> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
>    /dev/mapper/foo      64.00MiB
>
> Unallocated:
>    /dev/mapper/foo      89.15GiB
>
>
> root@grml ~ # btrfs filesystem df /mnt
> Data, single: total=22.01GiB, used=7.11GiB
> System, DUP: total=32.00MiB, used=4.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=2.00GiB, used=167.81MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=25.25MiB, used=0.00B
>
> I did also a fstrim:
>
> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
> Enter passphrase for /dev/sda5:
> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
> root@grml ~ # fstrim -v /mnt/
> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
>
> The kern.log in the runtime of fstrim:
> --- snip ---
> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
> caching is enabled
> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
> optimizations
> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
> caching is enabled
> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> expect 439
> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
> optimizations
> --- snap ---
>
> Furthermore the system runs for years now. I can't remember exactly,
> but think for 4-5 years. I've started with Debian Testing and just
> upgraded my system on a regular basis. And and I started with btrfs of
> course, but I can't remember with which version...
>
> The problem is still there after the fstrim. Any further suggestions?
>
> And isn't it a little bit strange, that someone had a very similiar problem?
> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>
> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
> identical to my errors.
>
> Thx so far!
>
>
>
> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> >
> >
> > On 2020/6/4 下午6:52, Thorsten Rehm wrote:
> > > The disk in question is my root (/) partition. If the filesystem is
> > > that highly damaged, I have to reinstall my system. We will see, if
> > > it's come to that. Maybe we find something interesting on the way...
> > > I've downloaded the latest grml daily image and started my system from
> > > a usb stick. Here we go:
> > >
> > > root@grml ~ # uname -r
> > > 5.6.0-2-amd64
> > >
> > > root@grml ~ # cryptsetup open /dev/sda5 foo
> > >
> > >                                                                   :(
> > > Enter passphrase for /dev/sda5:
> > >
> > > root@grml ~ # file -L -s /dev/mapper/foo
> > > /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> > > nodesize 4096, leafsize 4096,
> > > UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> > > bytes used, 1 devices
> > >
> > > root@grml ~ # btrfs check /dev/mapper/foo
> > > Opening filesystem to check...
> > > Checking filesystem on /dev/mapper/foo
> > > UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> > > [1/7] checking root items
> > > [2/7] checking extents
> > > [3/7] checking free space cache
> > > [4/7] checking fs roots
> > > [5/7] checking only csums items (without verifying data)
> > > [6/7] checking root refs
> > > [7/7] checking quota groups skipped (not enabled on this FS)
> > > found 7815716864 bytes used, no error found
> > > total csum bytes: 6428260
> > > total tree bytes: 175968256
> > > total fs tree bytes: 149475328
> > > total extent tree bytes: 16052224
> > > btree space waste bytes: 43268911
> > > file data blocks allocated: 10453221376
> > >  referenced 8746053632
> >
> > Errr, this is a super good news, all your fs metadata is completely fine
> > (at least for the first copy).
> > Which is completely different from the kernel dmesg.
> >
> > >
> > > root@grml ~ # lsblk /dev/sda5 --fs
> > > NAME  FSTYPE      FSVER LABEL UUID
> > > FSAVAIL FSUSE% MOUNTPOINT
> > > sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> > > └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >
> > > root@grml ~ # mount /dev/mapper/foo /mnt
> > > root@grml ~ # btrfs scrub start /mnt
> > >
> > > root@grml ~ # journalctl -k --no-pager | grep BTRFS
> > > Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> > > 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> > > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> > > caching is enabled
> > > Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> > > leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> > > expect 439
> >
> > One error line without "read time corruption" line means btrfs kernel
> > indeed skipped to next copy.
> > In this case, there is one copy (aka the first copy) corrupted.
> > Strangely, if it's the first copy in kernel, it should also be the first
> > copy in btrfs check.
> >
> > And no problem reported from btrfs check, that's already super strange.
> >
> > > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> > > optimizations
> > > Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> > > Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> > > Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> > > leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> > > expect 439
> > > Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> > > leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> > > expect 439
> > > Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> > > on devid 1 with status: 0
> > > Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> > > leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> > > expect 439
> >
> > This means the corrupted copy are also there for several (and I guess
> > unrelated) tree blocks.
> > For scrub I guess it just try to read the good copy without bothering
> > the bad one it found, so no error reported in scrub.
> >
> > But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> > then it would be a completely different story.
> >
> >
> > >
> > > root@grml ~ # btrfs scrub status /mnt
> > > UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> > > Scrub started:    Thu Jun  4 10:45:38 2020
> > > Status:           finished
> > > Duration:         0:00:53
> > > Total to scrub:   7.44GiB
> > > Rate:             143.80MiB/s
> > > Error summary:    no errors found
> > >
> > >
> > > root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> > > btrfs ins dump-tree -b $block /dev/dm-0; done
> > > btrfs-progs v5.6
> > > leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> > > leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> > > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
> > >         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
> > ...
> > >         cache generation 24750791 entries 139 bitmaps 8
> > >     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >
> > So it's still there. The first copy is corrupted. Just btrfs-progs can't
> > detect it.
> >
> > >         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >         drop key (0 UNKNOWN.0 0) level 0
> > > btrfs-progs v5.6
> > > leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> > > leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> > > fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > > chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > ...
> > >     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> > >         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >         drop key (0 UNKNOWN.0 0) level 0
> >
> > This is different from previous copy, which means it should be an CoWed
> > tree blocks.
> >
> > > btrfs-progs v5.6
> > > leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
> >
> > Even newer one.
> >
> > ...
> > > btrfs-progs v5.6
> > > leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
> >
> > Newer.
> >
> > So It looks the bad copy exists for a while, but at the same time we
> > still have one good copy to let everything float.
> >
> > To kill all the old corrupted copies, if it supports TRIM/DISCARD, I
> > recommend to run scrub first, then fstrim on the fs.
> >
> > If it's HDD, I recommend to run a btrfs balance -m to relocate all
> > metadata blocks, to get rid the bad copies.
> >
> > Of course, all using v5.3+ kernels.
> >
> > Thanks,
> > Qu
> > >
> > > On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> > >>> Thank you for you answer.
> > >>> I've just updated my system, did a reboot and it's running with a
> > >>> 5.6.0-2-amd64 now.
> > >>> So, this is how my kern.log looks like, just right after the start:
> > >>>
> > >>
> > >>>
> > >>> There are too many blocks. I just picked three randomly:
> > >>
> > >> Looks like we need more result, especially some result doesn't match at all.
> > >>
> > >>>
> > >>> === Block 33017856 ===
> > >>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> > >>> btrfs-progs v5.6
> > >>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> > >>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> > >>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >> ...
> > >>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> > >>>                 generation 24749502 type 1 (regular)
> > >>>                 extent data disk byte 1126502400 nr 4096
> > >>>                 extent data offset 0 nr 8192 ram 8192
> > >>>                 extent compression 2 (lzo)
> > >>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> > >>>                 generation 24749502 type 1 (regular)
> > >>>                 extent data disk byte 0 nr 0
> > >>>                 extent data offset 1937408 nr 4096 ram 4194304
> > >>>                 extent compression 0 (none)
> > >> Not root item at all.
> > >> At least for this copy, it looks like kernel got one completely bad
> > >> copy, then discarded it and found a good copy.
> > >>
> > >> That's very strange, especially when all the other involved ones seems
> > >> random and all at slot 32 is not a coincident.
> > >>
> > >>
> > >>> === Block 44900352  ===
> > >>> btrfs ins dump-tree -b 44900352 /dev/dm-0
> > >>> btrfs-progs v5.6
> > >>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> > >>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
> > >>
> > >> This block doesn't even have slot 32... It only have 19 items, thus slot
> > >> 0 ~ slot 18.
> > >> And its owner, FS_TREE shouldn't have ROOT_ITEM.
> > >>
> > >>>
> > >>>
> > >>> === Block 55352561664 ===
> > >>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> > >>> btrfs-progs v5.6
> > >>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> > >>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> > >>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >> ...
> > >>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> > >>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >>>                 drop key (0 UNKNOWN.0 0) level 0
> > >>
> > >> This looks like the offending tree block.
> > >> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
> > >>
> > >> Since you're here, I guess a btrfs check without --repair on the
> > >> unmounted fs would help to identify the real damage.
> > >>
> > >> And again, the fs looks very damaged, it's highly recommended to backup
> > >> your data asap.
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >>> --- snap ---
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> I've updated my system (Debian testing) [1] several months ago (~
> > >>>>> December) and I noticed a lot of corrupt leaf messages flooding my
> > >>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
> > >>>>> applications were terminated after some uptime, due to the btrfs
> > >>>>> filesystem errors. This was with kernel 5.3.
> > >>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> > >>>>>
> > >>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> > >>>>> release and with this kernel there aren't any corrupt leaf messages
> > >>>>> and the problem is gone. IMHO, it must be something coming with kernel
> > >>>>> 5.3 (or 5.x).
> > >>>>
> > >>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> > >>>> such *obviously* wrong metadata.
> > >>>>>
> > >>>>> My harddisk is a SSD which is responsible for the root partition. I've
> > >>>>> encrypted my filesystem with LUKS and just right after I entered my
> > >>>>> password at the boot, the first corrupt leaf errors appear.
> > >>>>>
> > >>>>> An error message looks like this:
> > >>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>
> > >>>> Btrfs root items have fixed size. This is already something very bad.
> > >>>>
> > >>>> Furthermore, the item size is smaller than expected, which means we can
> > >>>> easily get garbage. I'm a little surprised that older kernel can even
> > >>>> work without crashing the whole kernel.
> > >>>>
> > >>>> Some extra info could help us to find out how badly the fs is corrupted.
> > >>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> > >>>>
> > >>>>>
> > >>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> > >>>>> error line. Only the block number changes.
> > >>>>
> > >>>> And dumps for the other block numbers too.
> > >>>>
> > >>>>>
> > >>>>> Interestingly it's the very same as reported to the ML here [3]. I've
> > >>>>> contacted the reporter, but he didn't have a solution for me, because
> > >>>>> he changed to a different filesystem.
> > >>>>>
> > >>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> > >>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> > >>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> > >>>>> again, w/o any errors.
> > >>>>
> > >>>> This doesn't look like a bit flip, so not RAM problems.
> > >>>>
> > >>>> Don't have any better advice until we got the dumps, but I'd recommend
> > >>>> to backup your data since it's still possible.
> > >>>>
> > >>>> Thanks,
> > >>>> Qu
> > >>>>
> > >>>>>
> > >>>>> So, I have no more ideas what I can do. Could you please help me to
> > >>>>> investigate this further? Could it be a bug?
> > >>>>>
> > >>>>> Thank you very much.
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Thorsten
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> 1:
> > >>>>> $ cat /etc/debian_version
> > >>>>> bullseye/sid
> > >>>>>
> > >>>>> $ uname -a
> > >>>>> [no problem with this kernel]
> > >>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> > >>>>>
> > >>>>> $ btrfs --version
> > >>>>> btrfs-progs v5.6
> > >>>>>
> > >>>>> $ sudo btrfs fi show
> > >>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>         Total devices 1 FS bytes used 7.33GiB
> > >>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> > >>>>>
> > >>>>> $ btrfs fi df /
> > >>>>> Data, single: total=22.01GiB, used=7.16GiB
> > >>>>> System, DUP: total=32.00MiB, used=4.00KiB
> > >>>>> System, single: total=4.00MiB, used=0.00B
> > >>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> > >>>>> Metadata, single: total=8.00MiB, used=0.00B
> > >>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
> > >>>>>
> > >>>>>
> > >>>>> 2:
> > >>>>> [several messages per second]
> > >>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> > >>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> > >>>>> size, have 239 expect 439
> > >>>>>
> > >>>>> 3:
> > >>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> > >>>>>
> > >>>>
> > >>
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-08 13:25             ` Thorsten Rehm
@ 2020-06-08 13:29               ` Qu Wenruo
  2020-06-08 14:41                 ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2020-06-08 13:29 UTC (permalink / raw)
  To: Thorsten Rehm; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 21431 bytes --]



On 2020/6/8 9:25 PM, Thorsten Rehm wrote:
> Hi,
> 
> any more ideas to investigate this?

If you can still hit the same bug, and the fs is still completely fine,
I could craft some test patches for you tomorrow.

The idea behind them is to zero out all the memory of any bad extent
buffer (eb), so that a cached bad eb won't affect other reads.
If that hugely reduces the frequency, I guess that would be the case.

But I'm still very interested: have you hit any "read time tree block
corruption detected" lines, or just the slot=32 error lines?

Thanks,
Qu

> 
> On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>>
>> Hmm, ok wait a minute:
>>
>> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>> then it would be a completely different story."
>>
>> It's a single disk (SSD):
>>
>> root@grml ~ # btrfs filesystem usage /mnt
>> Overall:
>>     Device size:         115.23GiB
>>     Device allocated:          26.08GiB
>>     Device unallocated:          89.15GiB
>>     Device missing:             0.00B
>>     Used:               7.44GiB
>>     Free (estimated):         104.04GiB    (min: 59.47GiB)
>>     Data ratio:                  1.00
>>     Metadata ratio:              2.00
>>     Global reserve:          25.25MiB    (used: 0.00B)
>>
>> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
>>    /dev/mapper/foo      22.01GiB
>>
>> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
>>    /dev/mapper/foo       8.00MiB
>>
>> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
>>    /dev/mapper/foo       4.00GiB
>>
>> System,single: Size:4.00MiB, Used:0.00B (0.00%)
>>    /dev/mapper/foo       4.00MiB
>>
>> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
>>    /dev/mapper/foo      64.00MiB
>>
>> Unallocated:
>>    /dev/mapper/foo      89.15GiB
>>
>>
>> root@grml ~ # btrfs filesystem df /mnt
>> Data, single: total=22.01GiB, used=7.11GiB
>> System, DUP: total=32.00MiB, used=4.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, DUP: total=2.00GiB, used=167.81MiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=25.25MiB, used=0.00B
>>
>> I did also a fstrim:
>>
>> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
>> Enter passphrase for /dev/sda5:
>> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
>> root@grml ~ # fstrim -v /mnt/
>> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
>> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
>>
>> The kern.log in the runtime of fstrim:
>> --- snip ---
>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
>> expect 439
>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>> expect 439
>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
>> caching is enabled
>> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>> expect 439
>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
>> optimizations
>> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>> expect 439
>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
>> caching is enabled
>> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>> expect 439
>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
>> optimizations
>> --- snap ---
>>
>> Furthermore the system runs for years now. I can't remember exactly,
>> but think for 4-5 years. I've started with Debian Testing and just
>> upgraded my system on a regular basis. And and I started with btrfs of
>> course, but I can't remember with which version...
>>
>> The problem is still there after the fstrim. Any further suggestions?
>>
>> And isn't it a little bit strange, that someone had a very similiar problem?
>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>
>> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
>> identical to my errors.
>>
>> Thx so far!
>>
>>
>>
>> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>
>>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
>>>> The disk in question is my root (/) partition. If the filesystem is
>>>> that highly damaged, I have to reinstall my system. We will see, if
>>>> it's come to that. Maybe we find something interesting on the way...
>>>> I've downloaded the latest grml daily image and started my system from
>>>> a usb stick. Here we go:
>>>>
>>>> root@grml ~ # uname -r
>>>> 5.6.0-2-amd64
>>>>
>>>> root@grml ~ # cryptsetup open /dev/sda5 foo
>>>>
>>>>                                                                   :(
>>>> Enter passphrase for /dev/sda5:
>>>>
>>>> root@grml ~ # file -L -s /dev/mapper/foo
>>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
>>>> nodesize 4096, leafsize 4096,
>>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
>>>> bytes used, 1 devices
>>>>
>>>> root@grml ~ # btrfs check /dev/mapper/foo
>>>> Opening filesystem to check...
>>>> Checking filesystem on /dev/mapper/foo
>>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>> [1/7] checking root items
>>>> [2/7] checking extents
>>>> [3/7] checking free space cache
>>>> [4/7] checking fs roots
>>>> [5/7] checking only csums items (without verifying data)
>>>> [6/7] checking root refs
>>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>>> found 7815716864 bytes used, no error found
>>>> total csum bytes: 6428260
>>>> total tree bytes: 175968256
>>>> total fs tree bytes: 149475328
>>>> total extent tree bytes: 16052224
>>>> btree space waste bytes: 43268911
>>>> file data blocks allocated: 10453221376
>>>>  referenced 8746053632
>>>
>>> Errr, this is a super good news, all your fs metadata is completely fine
>>> (at least for the first copy).
>>> Which is completely different from the kernel dmesg.
>>>
>>>>
>>>> root@grml ~ # lsblk /dev/sda5 --fs
>>>> NAME  FSTYPE      FSVER LABEL UUID
>>>> FSAVAIL FSUSE% MOUNTPOINT
>>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
>>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>
>>>> root@grml ~ # mount /dev/mapper/foo /mnt
>>>> root@grml ~ # btrfs scrub start /mnt
>>>>
>>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
>>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
>>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
>>>> caching is enabled
>>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
>>>> expect 439
>>>
>>> One error line without "read time corruption" line means btrfs kernel
>>> indeed skipped to next copy.
>>> In this case, there is one copy (aka the first copy) corrupted.
>>> Strangely, if it's the first copy in kernel, it should also be the first
>>> copy in btrfs check.
>>>
>>> And no problem reported from btrfs check, that's already super strange.
>>>
>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>> optimizations
>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
>>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
>>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
>>>> on devid 1 with status: 0
>>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
>>>> expect 439
>>>
>>> This means the corrupted copy are also there for several (and I guess
>>> unrelated) tree blocks.
>>> For scrub I guess it just try to read the good copy without bothering
>>> the bad one it found, so no error reported in scrub.
>>>
>>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>>> then it would be a completely different story.
>>>
>>>
>>>>
>>>> root@grml ~ # btrfs scrub status /mnt
>>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
>>>> Scrub started:    Thu Jun  4 10:45:38 2020
>>>> Status:           finished
>>>> Duration:         0:00:53
>>>> Total to scrub:   7.44GiB
>>>> Rate:             143.80MiB/s
>>>> Error summary:    no errors found
>>>>
>>>>
>>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
>>>> btrfs ins dump-tree -b $block /dev/dm-0; done
>>>> btrfs-progs v5.6
>>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
>>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
>>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
>>> ...
>>>>         cache generation 24750791 entries 139 bitmaps 8
>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>
>>> So it's still there. The first copy is corrupted. Just btrfs-progs can't
>>> detect it.
>>>
>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>> btrfs-progs v5.6
>>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
>>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>> ...
>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>
>>> This is different from previous copy, which means it should be an CoWed
>>> tree blocks.
>>>
>>>> btrfs-progs v5.6
>>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
>>>
>>> Even newer one.
>>>
>>> ...
>>>> btrfs-progs v5.6
>>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
>>>
>>> Newer.
>>>
>>> So It looks the bad copy exists for a while, but at the same time we
>>> still have one good copy to let everything float.
>>>
>>> To kill all the old corrupted copies, if it supports TRIM/DISCARD, I
>>> recommend to run scrub first, then fstrim on the fs.
>>>
>>> If it's HDD, I recommend to run a btrfs balance -m to relocate all
>>> metadata blocks, to get rid the bad copies.
>>>
>>> Of course, all using v5.3+ kernels.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
>>>>>> Thank you for you answer.
>>>>>> I've just updated my system, did a reboot and it's running with a
>>>>>> 5.6.0-2-amd64 now.
>>>>>> So, this is how my kern.log looks like, just right after the start:
>>>>>>
>>>>>
>>>>>>
>>>>>> There are too many blocks. I just picked three randomly:
>>>>>
>>>>> Looks like we need more result, especially some result doesn't match at all.
>>>>>
>>>>>>
>>>>>> === Block 33017856 ===
>>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
>>>>>> btrfs-progs v5.6
>>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
>>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>> ...
>>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>                 extent data disk byte 1126502400 nr 4096
>>>>>>                 extent data offset 0 nr 8192 ram 8192
>>>>>>                 extent compression 2 (lzo)
>>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>                 extent data disk byte 0 nr 0
>>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
>>>>>>                 extent compression 0 (none)
>>>>> Not root item at all.
>>>>> At least for this copy, it looks like kernel got one completely bad
>>>>> copy, then discarded it and found a good copy.
>>>>>
>>>>> That's very strange, especially when all the other involved ones seems
>>>>> random and all at slot 32 is not a coincident.
>>>>>
>>>>>
>>>>>> === Block 44900352  ===
>>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
>>>>>> btrfs-progs v5.6
>>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
>>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
>>>>>
>>>>> This block doesn't even have slot 32... It only have 19 items, thus slot
>>>>> 0 ~ slot 18.
>>>>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
>>>>>
>>>>>>
>>>>>>
>>>>>> === Block 55352561664 ===
>>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
>>>>>> btrfs-progs v5.6
>>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
>>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>> ...
>>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>                 drop key (0 UNKNOWN.0 0) level 0
>>>>>
>>>>> This looks like the offending tree block.
>>>>> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
>>>>>
>>>>> Since you're here, I guess a btrfs check without --repair on the
>>>>> unmounted fs would help to identify the real damage.
>>>>>
>>>>> And again, the fs looks very damaged, it's highly recommended to backup
>>>>> your data asap.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>> --- snap ---
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
>>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>>>>>>> applications were terminated after some uptime, due to the btrfs
>>>>>>>> filesystem errors. This was with kernel 5.3.
>>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>>>>>>
>>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>>>>>>> release and with this kernel there aren't any corrupt leaf messages
>>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
>>>>>>>> 5.3 (or 5.x).
>>>>>>>
>>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>>>>>>> such *obviously* wrong metadata.
>>>>>>>>
>>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
>>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
>>>>>>>> password at the boot, the first corrupt leaf errors appear.
>>>>>>>>
>>>>>>>> An error message looks like this:
>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>
>>>>>>> Btrfs root items have fixed size. This is already something very bad.
>>>>>>>
>>>>>>> Furthermore, the item size is smaller than expected, which means we can
>>>>>>> easily get garbage. I'm a little surprised that older kernel can even
>>>>>>> work without crashing the whole kernel.
>>>>>>>
>>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
>>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>>>>>>
>>>>>>>>
>>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>>>>>>> error line. Only the block number changes.
>>>>>>>
>>>>>>> And dumps for the other block numbers too.
>>>>>>>
>>>>>>>>
>>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>>>>>>> contacted the reporter, but he didn't have a solution for me, because
>>>>>>>> he changed to a different filesystem.
>>>>>>>>
>>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>>>>>>> again, w/o any errors.
>>>>>>>
>>>>>>> This doesn't look like a bit flip, so not RAM problems.
>>>>>>>
>>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
>>>>>>> to backup your data since it's still possible.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Qu
>>>>>>>
>>>>>>>>
>>>>>>>> So, I have no more ideas what I can do. Could you please help me to
>>>>>>>> investigate this further? Could it be a bug?
>>>>>>>>
>>>>>>>> Thank you very much.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Thorsten
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 1:
>>>>>>>> $ cat /etc/debian_version
>>>>>>>> bullseye/sid
>>>>>>>>
>>>>>>>> $ uname -a
>>>>>>>> [no problem with this kernel]
>>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>>>>>>
>>>>>>>> $ btrfs --version
>>>>>>>> btrfs-progs v5.6
>>>>>>>>
>>>>>>>> $ sudo btrfs fi show
>>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>         Total devices 1 FS bytes used 7.33GiB
>>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>>>>>>
>>>>>>>> $ btrfs fi df /
>>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
>>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>>>>>> System, single: total=4.00MiB, used=0.00B
>>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>>>>>>
>>>>>>>>
>>>>>>>> 2:
>>>>>>>> [several messages per second]
>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>>>>>>> size, have 239 expect 439
>>>>>>>>
>>>>>>>> 3:
>>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>>>>>
>>>>>>>
>>>>>
>>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-08 13:29               ` Qu Wenruo
@ 2020-06-08 14:41                 ` Thorsten Rehm
  2020-06-12  6:50                   ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-08 14:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

I just have to boot my system with kernel 5.6; after that, the
slot=32 error lines start appearing, and only those lines:

$ grep 'BTRFS critical' kern.log.1 | wc -l
1191

$ grep 'slot=32' kern.log.1 | wc -l
1191

$ grep 'corruption' kern.log.1 | wc -l
0

Time period: 10 minutes (~1200 lines in 10 minutes, i.e. roughly two
lines per second).
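
In case it helps, a quick way to check whether the same block numbers
keep coming back (just plain grep/sort/uniq, nothing btrfs-specific)
would be something like:

$ grep -o 'block=[0-9]*' kern.log.1 | sort | uniq -c | sort -rn | head

That lists each reported block number together with how often it shows
up in the log.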

On Mon, Jun 8, 2020 at 3:29 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/6/8 下午9:25, Thorsten Rehm wrote:
> > Hi,
> >
> > any more ideas to investigate this?
>
> If you can still hit the same bug, and the fs is still completely fine,
> I could craft some test patches for you tomorrow.
>
> The idea behind it is to zero out all the memory for any bad eb.
> Thus bad eb cache won't affect other read.
> If that hugely reduced the frequency, I guess that would be the case.
>
>
> But I'm still very interested in, have you hit "read time tree block
> corruption detected" lines? Or just such slot=32 error lines?
>
> Thanks,
> Qu
>
> >
> > On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
> >>
> >> Hmm, ok wait a minute:
> >>
> >> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> >> then it would be a completely different story."
> >>
> >> It's a single disk (SSD):
> >>
> >> root@grml ~ # btrfs filesystem usage /mnt
> >> Overall:
> >>     Device size:         115.23GiB
> >>     Device allocated:          26.08GiB
> >>     Device unallocated:          89.15GiB
> >>     Device missing:             0.00B
> >>     Used:               7.44GiB
> >>     Free (estimated):         104.04GiB    (min: 59.47GiB)
> >>     Data ratio:                  1.00
> >>     Metadata ratio:              2.00
> >>     Global reserve:          25.25MiB    (used: 0.00B)
> >>
> >> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
> >>    /dev/mapper/foo      22.01GiB
> >>
> >> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
> >>    /dev/mapper/foo       8.00MiB
> >>
> >> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
> >>    /dev/mapper/foo       4.00GiB
> >>
> >> System,single: Size:4.00MiB, Used:0.00B (0.00%)
> >>    /dev/mapper/foo       4.00MiB
> >>
> >> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
> >>    /dev/mapper/foo      64.00MiB
> >>
> >> Unallocated:
> >>    /dev/mapper/foo      89.15GiB
> >>
> >>
> >> root@grml ~ # btrfs filesystem df /mnt
> >> Data, single: total=22.01GiB, used=7.11GiB
> >> System, DUP: total=32.00MiB, used=4.00KiB
> >> System, single: total=4.00MiB, used=0.00B
> >> Metadata, DUP: total=2.00GiB, used=167.81MiB
> >> Metadata, single: total=8.00MiB, used=0.00B
> >> GlobalReserve, single: total=25.25MiB, used=0.00B
> >>
> >> I did also a fstrim:
> >>
> >> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
> >> Enter passphrase for /dev/sda5:
> >> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
> >> root@grml ~ # fstrim -v /mnt/
> >> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
> >> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
> >>
> >> The kern.log in the runtime of fstrim:
> >> --- snip ---
> >> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> >> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
> >> expect 439
> >> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> >> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> >> expect 439
> >> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
> >> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
> >> caching is enabled
> >> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
> >> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> >> expect 439
> >> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
> >> optimizations
> >> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
> >> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> >> expect 439
> >> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
> >> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
> >> caching is enabled
> >> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
> >> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> >> expect 439
> >> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
> >> optimizations
> >> --- snap ---
> >>
> >> Furthermore the system runs for years now. I can't remember exactly,
> >> but think for 4-5 years. I've started with Debian Testing and just
> >> upgraded my system on a regular basis. And and I started with btrfs of
> >> course, but I can't remember with which version...
> >>
> >> The problem is still there after the fstrim. Any further suggestions?
> >>
> >> And isn't it a little bit strange, that someone had a very similiar problem?
> >> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>
> >> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
> >> identical to my errors.
> >>
> >> Thx so far!
> >>
> >>
> >>
> >> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
> >>>> The disk in question is my root (/) partition. If the filesystem is
> >>>> that highly damaged, I have to reinstall my system. We will see, if
> >>>> it's come to that. Maybe we find something interesting on the way...
> >>>> I've downloaded the latest grml daily image and started my system from
> >>>> a usb stick. Here we go:
> >>>>
> >>>> root@grml ~ # uname -r
> >>>> 5.6.0-2-amd64
> >>>>
> >>>> root@grml ~ # cryptsetup open /dev/sda5 foo
> >>>>
> >>>>                                                                   :(
> >>>> Enter passphrase for /dev/sda5:
> >>>>
> >>>> root@grml ~ # file -L -s /dev/mapper/foo
> >>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> >>>> nodesize 4096, leafsize 4096,
> >>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> >>>> bytes used, 1 devices
> >>>>
> >>>> root@grml ~ # btrfs check /dev/mapper/foo
> >>>> Opening filesystem to check...
> >>>> Checking filesystem on /dev/mapper/foo
> >>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>> [1/7] checking root items
> >>>> [2/7] checking extents
> >>>> [3/7] checking free space cache
> >>>> [4/7] checking fs roots
> >>>> [5/7] checking only csums items (without verifying data)
> >>>> [6/7] checking root refs
> >>>> [7/7] checking quota groups skipped (not enabled on this FS)
> >>>> found 7815716864 bytes used, no error found
> >>>> total csum bytes: 6428260
> >>>> total tree bytes: 175968256
> >>>> total fs tree bytes: 149475328
> >>>> total extent tree bytes: 16052224
> >>>> btree space waste bytes: 43268911
> >>>> file data blocks allocated: 10453221376
> >>>>  referenced 8746053632
> >>>
> >>> Errr, this is a super good news, all your fs metadata is completely fine
> >>> (at least for the first copy).
> >>> Which is completely different from the kernel dmesg.
> >>>
> >>>>
> >>>> root@grml ~ # lsblk /dev/sda5 --fs
> >>>> NAME  FSTYPE      FSVER LABEL UUID
> >>>> FSAVAIL FSUSE% MOUNTPOINT
> >>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> >>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>
> >>>> root@grml ~ # mount /dev/mapper/foo /mnt
> >>>> root@grml ~ # btrfs scrub start /mnt
> >>>>
> >>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
> >>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> >>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> >>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> >>>> caching is enabled
> >>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>
> >>> One error line without "read time corruption" line means btrfs kernel
> >>> indeed skipped to next copy.
> >>> In this case, there is one copy (aka the first copy) corrupted.
> >>> Strangely, if it's the first copy in kernel, it should also be the first
> >>> copy in btrfs check.
> >>>
> >>> And no problem reported from btrfs check, that's already super strange.
> >>>
> >>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> >>>> optimizations
> >>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> >>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> >>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> >>>> on devid 1 with status: 0
> >>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>
> >>> This means the corrupted copy are also there for several (and I guess
> >>> unrelated) tree blocks.
> >>> For scrub I guess it just try to read the good copy without bothering
> >>> the bad one it found, so no error reported in scrub.
> >>>
> >>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> >>> then it would be a completely different story.
> >>>
> >>>
> >>>>
> >>>> root@grml ~ # btrfs scrub status /mnt
> >>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>> Scrub started:    Thu Jun  4 10:45:38 2020
> >>>> Status:           finished
> >>>> Duration:         0:00:53
> >>>> Total to scrub:   7.44GiB
> >>>> Rate:             143.80MiB/s
> >>>> Error summary:    no errors found
> >>>>
> >>>>
> >>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> >>>> btrfs ins dump-tree -b $block /dev/dm-0; done
> >>>> btrfs-progs v5.6
> >>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> >>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> >>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
> >>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
> >>> ...
> >>>>         cache generation 24750791 entries 139 bitmaps 8
> >>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>
> >>> So it's still there. The first copy is corrupted. Just btrfs-progs can't
> >>> detect it.
> >>>
> >>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>         drop key (0 UNKNOWN.0 0) level 0
> >>>> btrfs-progs v5.6
> >>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> >>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> >>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>> ...
> >>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>         drop key (0 UNKNOWN.0 0) level 0
> >>>
> >>> This is different from previous copy, which means it should be an CoWed
> >>> tree blocks.
> >>>
> >>>> btrfs-progs v5.6
> >>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
> >>>
> >>> Even newer one.
> >>>
> >>> ...
> >>>> btrfs-progs v5.6
> >>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
> >>>
> >>> Newer.
> >>>
> >>> So It looks the bad copy exists for a while, but at the same time we
> >>> still have one good copy to let everything float.
> >>>
> >>> To kill all the old corrupted copies, if it supports TRIM/DISCARD, I
> >>> recommend to run scrub first, then fstrim on the fs.
> >>>
> >>> If it's HDD, I recommend to run a btrfs balance -m to relocate all
> >>> metadata blocks, to get rid the bad copies.
> >>>
> >>> Of course, all using v5.3+ kernels.
> >>>
> >>> Thanks,
> >>> Qu
> >>>>
> >>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> >>>>>> Thank you for you answer.
> >>>>>> I've just updated my system, did a reboot and it's running with a
> >>>>>> 5.6.0-2-amd64 now.
> >>>>>> So, this is how my kern.log looks like, just right after the start:
> >>>>>>
> >>>>>
> >>>>>>
> >>>>>> There are too many blocks. I just picked three randomly:
> >>>>>
> >>>>> Looks like we need more result, especially some result doesn't match at all.
> >>>>>
> >>>>>>
> >>>>>> === Block 33017856 ===
> >>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> >>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>> ...
> >>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> >>>>>>                 generation 24749502 type 1 (regular)
> >>>>>>                 extent data disk byte 1126502400 nr 4096
> >>>>>>                 extent data offset 0 nr 8192 ram 8192
> >>>>>>                 extent compression 2 (lzo)
> >>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> >>>>>>                 generation 24749502 type 1 (regular)
> >>>>>>                 extent data disk byte 0 nr 0
> >>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
> >>>>>>                 extent compression 0 (none)
> >>>>> Not root item at all.
> >>>>> At least for this copy, it looks like kernel got one completely bad
> >>>>> copy, then discarded it and found a good copy.
> >>>>>
> >>>>> That's very strange, especially when all the other involved ones seems
> >>>>> random and all at slot 32 is not a coincident.
> >>>>>
> >>>>>
> >>>>>> === Block 44900352  ===
> >>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> >>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
> >>>>>
> >>>>> This block doesn't even have slot 32... It only have 19 items, thus slot
> >>>>> 0 ~ slot 18.
> >>>>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> === Block 55352561664 ===
> >>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> >>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>> ...
> >>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>>>                 drop key (0 UNKNOWN.0 0) level 0
> >>>>>
> >>>>> This looks like the offending tree block.
> >>>>> Slot 32, item size 239, which is ROOT_ITEM, but in valid size.
> >>>>>
> >>>>> Since you're here, I guess a btrfs check without --repair on the
> >>>>> unmounted fs would help to identify the real damage.
> >>>>>
> >>>>> And again, the fs looks very damaged, it's highly recommended to backup
> >>>>> your data asap.
> >>>>>
> >>>>> Thanks,
> >>>>> Qu
> >>>>>
> >>>>>> --- snap ---
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
> >>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
> >>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
> >>>>>>>> applications were terminated after some uptime, due to the btrfs
> >>>>>>>> filesystem errors. This was with kernel 5.3.
> >>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> >>>>>>>>
> >>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> >>>>>>>> release and with this kernel there aren't any corrupt leaf messages
> >>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
> >>>>>>>> 5.3 (or 5.x).
> >>>>>>>
> >>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> >>>>>>> such *obviously* wrong metadata.
> >>>>>>>>
> >>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
> >>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
> >>>>>>>> password at the boot, the first corrupt leaf errors appear.
> >>>>>>>>
> >>>>>>>> An error message looks like this:
> >>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>
> >>>>>>> Btrfs root items have fixed size. This is already something very bad.
> >>>>>>>
> >>>>>>> Furthermore, the item size is smaller than expected, which means we can
> >>>>>>> easily get garbage. I'm a little surprised that older kernel can even
> >>>>>>> work without crashing the whole kernel.
> >>>>>>>
> >>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
> >>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> >>>>>>>
> >>>>>>>>
> >>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> >>>>>>>> error line. Only the block number changes.
> >>>>>>>
> >>>>>>> And dumps for the other block numbers too.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
> >>>>>>>> contacted the reporter, but he didn't have a solution for me, because
> >>>>>>>> he changed to a different filesystem.
> >>>>>>>>
> >>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> >>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> >>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> >>>>>>>> again, w/o any errors.
> >>>>>>>
> >>>>>>> This doesn't look like a bit flip, so not RAM problems.
> >>>>>>>
> >>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
> >>>>>>> to backup your data since it's still possible.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Qu
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So, I have no more ideas what I can do. Could you please help me to
> >>>>>>>> investigate this further? Could it be a bug?
> >>>>>>>>
> >>>>>>>> Thank you very much.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Thorsten
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 1:
> >>>>>>>> $ cat /etc/debian_version
> >>>>>>>> bullseye/sid
> >>>>>>>>
> >>>>>>>> $ uname -a
> >>>>>>>> [no problem with this kernel]
> >>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> >>>>>>>>
> >>>>>>>> $ btrfs --version
> >>>>>>>> btrfs-progs v5.6
> >>>>>>>>
> >>>>>>>> $ sudo btrfs fi show
> >>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>>>>         Total devices 1 FS bytes used 7.33GiB
> >>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> >>>>>>>>
> >>>>>>>> $ btrfs fi df /
> >>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
> >>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
> >>>>>>>> System, single: total=4.00MiB, used=0.00B
> >>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> >>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
> >>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2:
> >>>>>>>> [several messages per second]
> >>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> >>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> >>>>>>>> size, have 239 expect 439
> >>>>>>>>
> >>>>>>>> 3:
> >>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-08 14:41                 ` Thorsten Rehm
@ 2020-06-12  6:50                   ` Qu Wenruo
  2020-06-16  5:41                     ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2020-06-12  6:50 UTC (permalink / raw)
  To: Thorsten Rehm; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 23422 bytes --]

Would you mind creating a btrfs-image dump?

It would greatly help us to pin down the cause.

# btrfs-image -c9 <device> <file>

Although the dump may leak sensitive data like file and dir names, you
can try the -s option to fuzz them, since the names are not important
in this particular case; it will take more time, though, and may cause
some extra problems.
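
For example, assuming the device is still /dev/mapper/foo (the output
path below is just an example):

# btrfs-image -c9 -s /dev/mapper/foo /tmp/btrfs-slash.img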

After looking into the related code and at your SINGLE metadata
profile, I can't find any clues yet.

Thanks,
Qu


On 2020/6/8 下午10:41, Thorsten Rehm wrote:
> I just have to start my system with kernel 5.6. After that, the
> slot=32 error lines will be written. And only these lines:
> 
> $ grep 'BTRFS critical' kern.log.1 | wc -l
> 1191
> 
> $ grep 'slot=32' kern.log.1 | wc -l
> 1191
> 
> $ grep 'corruption' kern.log.1 | wc -l
> 0
> 
> Period: 10 Minutes (~1200 lines in 10 minutes).
> 
> On Mon, Jun 8, 2020 at 3:29 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/6/8 下午9:25, Thorsten Rehm wrote:
>>> Hi,
>>>
>>> any more ideas to investigate this?
>>
>> If you can still hit the same bug, and the fs is still completely fine,
>> I could craft some test patches for you tomorrow.
>>
>> The idea behind it is to zero out all the memory for any bad eb.
>> Thus bad eb cache won't affect other read.
>> If that hugely reduced the frequency, I guess that would be the case.
>>
>>
>> But I'm still very interested in, have you hit "read time tree block
>> corruption detected" lines? Or just such slot=32 error lines?
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>>>>
>>>> Hmm, ok wait a minute:
>>>>
>>>> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>>>> then it would be a completely different story."
>>>>
>>>> It's a single disk (SSD):
>>>>
>>>> root@grml ~ # btrfs filesystem usage /mnt
>>>> Overall:
>>>>     Device size:         115.23GiB
>>>>     Device allocated:          26.08GiB
>>>>     Device unallocated:          89.15GiB
>>>>     Device missing:             0.00B
>>>>     Used:               7.44GiB
>>>>     Free (estimated):         104.04GiB    (min: 59.47GiB)
>>>>     Data ratio:                  1.00
>>>>     Metadata ratio:              2.00
>>>>     Global reserve:          25.25MiB    (used: 0.00B)
>>>>
>>>> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
>>>>    /dev/mapper/foo      22.01GiB
>>>>
>>>> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
>>>>    /dev/mapper/foo       8.00MiB
>>>>
>>>> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
>>>>    /dev/mapper/foo       4.00GiB
>>>>
>>>> System,single: Size:4.00MiB, Used:0.00B (0.00%)
>>>>    /dev/mapper/foo       4.00MiB
>>>>
>>>> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
>>>>    /dev/mapper/foo      64.00MiB
>>>>
>>>> Unallocated:
>>>>    /dev/mapper/foo      89.15GiB
>>>>
>>>>
>>>> root@grml ~ # btrfs filesystem df /mnt
>>>> Data, single: total=22.01GiB, used=7.11GiB
>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>> System, single: total=4.00MiB, used=0.00B
>>>> Metadata, DUP: total=2.00GiB, used=167.81MiB
>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>> GlobalReserve, single: total=25.25MiB, used=0.00B
>>>>
>>>> I did also a fstrim:
>>>>
>>>> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
>>>> Enter passphrase for /dev/sda5:
>>>> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
>>>> root@grml ~ # fstrim -v /mnt/
>>>> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
>>>> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
>>>>
>>>> The kern.log in the runtime of fstrim:
>>>> --- snip ---
>>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
>>>> caching is enabled
>>>> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>> optimizations
>>>> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
>>>> caching is enabled
>>>> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
>>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>>>> expect 439
>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>> optimizations
>>>> --- snap ---
>>>>
>>>> Furthermore the system runs for years now. I can't remember exactly,
>>>> but think for 4-5 years. I've started with Debian Testing and just
>>>> upgraded my system on a regular basis. And and I started with btrfs of
>>>> course, but I can't remember with which version...
>>>>
>>>> The problem is still there after the fstrim. Any further suggestions?
>>>>
>>>> And isn't it a little bit strange, that someone had a very similiar problem?
>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>
>>>> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
>>>> identical to my errors.
>>>>
>>>> Thx so far!
>>>>
>>>>
>>>>
>>>> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
>>>>>> The disk in question is my root (/) partition. If the filesystem is
>>>>>> that highly damaged, I have to reinstall my system. We will see, if
>>>>>> it's come to that. Maybe we find something interesting on the way...
>>>>>> I've downloaded the latest grml daily image and started my system from
>>>>>> a usb stick. Here we go:
>>>>>>
>>>>>> root@grml ~ # uname -r
>>>>>> 5.6.0-2-amd64
>>>>>>
>>>>>> root@grml ~ # cryptsetup open /dev/sda5 foo
>>>>>>
>>>>>>                                                                   :(
>>>>>> Enter passphrase for /dev/sda5:
>>>>>>
>>>>>> root@grml ~ # file -L -s /dev/mapper/foo
>>>>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
>>>>>> nodesize 4096, leafsize 4096,
>>>>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
>>>>>> bytes used, 1 devices
>>>>>>
>>>>>> root@grml ~ # btrfs check /dev/mapper/foo
>>>>>> Opening filesystem to check...
>>>>>> Checking filesystem on /dev/mapper/foo
>>>>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> [1/7] checking root items
>>>>>> [2/7] checking extents
>>>>>> [3/7] checking free space cache
>>>>>> [4/7] checking fs roots
>>>>>> [5/7] checking only csums items (without verifying data)
>>>>>> [6/7] checking root refs
>>>>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>>>>> found 7815716864 bytes used, no error found
>>>>>> total csum bytes: 6428260
>>>>>> total tree bytes: 175968256
>>>>>> total fs tree bytes: 149475328
>>>>>> total extent tree bytes: 16052224
>>>>>> btree space waste bytes: 43268911
>>>>>> file data blocks allocated: 10453221376
>>>>>>  referenced 8746053632
>>>>>
>>>>> Errr, this is a super good news, all your fs metadata is completely fine
>>>>> (at least for the first copy).
>>>>> Which is completely different from the kernel dmesg.
>>>>>
>>>>>>
>>>>>> root@grml ~ # lsblk /dev/sda5 --fs
>>>>>> NAME  FSTYPE      FSVER LABEL UUID
>>>>>> FSAVAIL FSUSE% MOUNTPOINT
>>>>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
>>>>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>
>>>>>> root@grml ~ # mount /dev/mapper/foo /mnt
>>>>>> root@grml ~ # btrfs scrub start /mnt
>>>>>>
>>>>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
>>>>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
>>>>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
>>>>>> caching is enabled
>>>>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
>>>>>> expect 439
>>>>>
>>>>> One error line without "read time corruption" line means btrfs kernel
>>>>> indeed skipped to next copy.
>>>>> In this case, there is one copy (aka the first copy) corrupted.
>>>>> Strangely, if it's the first copy in kernel, it should also be the first
>>>>> copy in btrfs check.
>>>>>
>>>>> And no problem reported from btrfs check, that's already super strange.
>>>>>
>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>>>> optimizations
>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
>>>>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
>>>>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
>>>>>> expect 439
>>>>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
>>>>>> expect 439
>>>>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
>>>>>> on devid 1 with status: 0
>>>>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
>>>>>> expect 439
>>>>>
>>>>> This means the corrupted copy are also there for several (and I guess
>>>>> unrelated) tree blocks.
>>>>> For scrub I guess it just try to read the good copy without bothering
>>>>> the bad one it found, so no error reported in scrub.
>>>>>
>>>>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>>>>> then it would be a completely different story.
>>>>>
>>>>>
>>>>>>
>>>>>> root@grml ~ # btrfs scrub status /mnt
>>>>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> Scrub started:    Thu Jun  4 10:45:38 2020
>>>>>> Status:           finished
>>>>>> Duration:         0:00:53
>>>>>> Total to scrub:   7.44GiB
>>>>>> Rate:             143.80MiB/s
>>>>>> Error summary:    no errors found
>>>>>>
>>>>>>
>>>>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
>>>>>> btrfs ins dump-tree -b $block /dev/dm-0; done
>>>>>> btrfs-progs v5.6
>>>>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
>>>>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
>>>>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
>>>>> ...
>>>>>>         cache generation 24750791 entries 139 bitmaps 8
>>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>
>>>>> So it's still there. The first copy is corrupted. Just btrfs-progs can't
>>>>> detect it.
>>>>>
>>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>>>> btrfs-progs v5.6
>>>>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
>>>>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>> ...
>>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>>>
>>>>> This is different from previous copy, which means it should be an CoWed
>>>>> tree blocks.
>>>>>
>>>>>> btrfs-progs v5.6
>>>>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
>>>>>
>>>>> Even newer one.
>>>>>
>>>>> ...
>>>>>> btrfs-progs v5.6
>>>>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
>>>>>
>>>>> Newer.
>>>>>
>>>>> So It looks the bad copy exists for a while, but at the same time we
>>>>> still have one good copy to let everything float.
>>>>>
>>>>> To kill all the old corrupted copies, if it supports TRIM/DISCARD, I
>>>>> recommend to run scrub first, then fstrim on the fs.
>>>>>
>>>>> If it's HDD, I recommend to run a btrfs balance -m to relocate all
>>>>> metadata blocks, to get rid the bad copies.
>>>>>
>>>>> Of course, all using v5.3+ kernels.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>>
>>>>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
>>>>>>>> Thank you for you answer.
>>>>>>>> I've just updated my system, did a reboot and it's running with a
>>>>>>>> 5.6.0-2-amd64 now.
>>>>>>>> So, this is how my kern.log looks like, just right after the start:
>>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> There are too many blocks. I just picked three randomly:
>>>>>>>
>>>>>>> Looks like we need more result, especially some result doesn't match at all.
>>>>>>>
>>>>>>>>
>>>>>>>> === Block 33017856 ===
>>>>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
>>>>>>>> btrfs-progs v5.6
>>>>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
>>>>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>> ...
>>>>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>>>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>>>                 extent data disk byte 1126502400 nr 4096
>>>>>>>>                 extent data offset 0 nr 8192 ram 8192
>>>>>>>>                 extent compression 2 (lzo)
>>>>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>>>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>>>                 extent data disk byte 0 nr 0
>>>>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
>>>>>>>>                 extent compression 0 (none)
>>>>>>> Not root item at all.
>>>>>>> At least for this copy, it looks like kernel got one completely bad
>>>>>>> copy, then discarded it and found a good copy.
>>>>>>>
>>>>>>> That's very strange, especially when all the other involved ones seems
>>>>>>> random and all at slot 32 is not a coincident.
>>>>>>>
>>>>>>>
>>>>>>>> === Block 44900352  ===
>>>>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
>>>>>>>> btrfs-progs v5.6
>>>>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
>>>>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
>>>>>>>
>>>>>>> This block doesn't even have slot 32... It only have 19 items, thus slot
>>>>>>> 0 ~ slot 18.
>>>>>>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> === Block 55352561664 ===
>>>>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
>>>>>>>> btrfs-progs v5.6
>>>>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
>>>>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>> ...
>>>>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>>>                 drop key (0 UNKNOWN.0 0) level 0
>>>>>>>
>>>>>>> This looks like the offending tree block.
>>>>>>> Slot 32, item size 239, which is ROOT_ITEM, but with an invalid size.
>>>>>>>
>>>>>>> Since you're here, I guess a btrfs check without --repair on the
>>>>>>> unmounted fs would help to identify the real damage.
>>>>>>>
>>>>>>> And again, the fs looks very damaged, it's highly recommended to backup
>>>>>>> your data asap.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Qu
>>>>>>>
>>>>>>>> --- snap ---
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
>>>>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>>>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>>>>>>>>> applications were terminated after some uptime, due to the btrfs
>>>>>>>>>> filesystem errors. This was with kernel 5.3.
>>>>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>>>>>>>>
>>>>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>>>>>>>>> release and with this kernel there aren't any corrupt leaf messages
>>>>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
>>>>>>>>>> 5.3 (or 5.x).
>>>>>>>>>
>>>>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>>>>>>>>> such *obviously* wrong metadata.
>>>>>>>>>>
>>>>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
>>>>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
>>>>>>>>>> password at the boot, the first corrupt leaf errors appear.
>>>>>>>>>>
>>>>>>>>>> An error message looks like this:
>>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>
>>>>>>>>> Btrfs root items have fixed size. This is already something very bad.
>>>>>>>>>
>>>>>>>>> Furthermore, the item size is smaller than expected, which means we can
>>>>>>>>> easily get garbage. I'm a little surprised that older kernel can even
>>>>>>>>> work without crashing the whole kernel.
>>>>>>>>>
>>>>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
>>>>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>>>>>>>>> error line. Only the block number changes.
>>>>>>>>>
>>>>>>>>> And dumps for the other block numbers too.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>>>>>>>>> contacted the reporter, but he didn't have a solution for me, because
>>>>>>>>>> he changed to a different filesystem.
>>>>>>>>>>
>>>>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>>>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>>>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>>>>>>>>> again, w/o any errors.
>>>>>>>>>
>>>>>>>>> This doesn't look like a bit flip, so not RAM problems.
>>>>>>>>>
>>>>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
>>>>>>>>> to backup your data since it's still possible.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Qu
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So, I have no more ideas what I can do. Could you please help me to
>>>>>>>>>> investigate this further? Could it be a bug?
>>>>>>>>>>
>>>>>>>>>> Thank you very much.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Thorsten
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1:
>>>>>>>>>> $ cat /etc/debian_version
>>>>>>>>>> bullseye/sid
>>>>>>>>>>
>>>>>>>>>> $ uname -a
>>>>>>>>>> [no problem with this kernel]
>>>>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>>>>>>>>
>>>>>>>>>> $ btrfs --version
>>>>>>>>>> btrfs-progs v5.6
>>>>>>>>>>
>>>>>>>>>> $ sudo btrfs fi show
>>>>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>>>         Total devices 1 FS bytes used 7.33GiB
>>>>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>>>>>>>>
>>>>>>>>>> $ btrfs fi df /
>>>>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
>>>>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>>>>>>>> System, single: total=4.00MiB, used=0.00B
>>>>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>>>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2:
>>>>>>>>>> [several messages per second]
>>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>
>>>>>>>>>> 3:
>>>>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-12  6:50                   ` Qu Wenruo
@ 2020-06-16  5:41                     ` Thorsten Rehm
  2020-11-20 13:17                       ` Thorsten Rehm
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-06-16  5:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Yepp, sure.
I will do that in the next few days.


On Fri, Jun 12, 2020 at 8:50 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> Would you mind creating a btrfs-image dump?
>
> It would greatly help us to pin down the cause.
>
> # btrfs-image -c9 <device> <file>
>
> Although it may leak sensitive data like file and dir names, you can try
> the -s option to fuzz them since they're not important in this particular
> case, but it would take more time and may cause some extra problems.
>
> After looking into related code, and your SINGLE metadata profile, I
> can't find any clues yet.
>
> Thanks,
> Qu
>
>
> On 2020/6/8 下午10:41, Thorsten Rehm wrote:
> > I just have to start my system with kernel 5.6. After that, the
> > slot=32 error lines will be written. And only these lines:
> >
> > $ grep 'BTRFS critical' kern.log.1 | wc -l
> > 1191
> >
> > $ grep 'slot=32' kern.log.1 | wc -l
> > 1191
> >
> > $ grep 'corruption' kern.log.1 | wc -l
> > 0
> >
> > Period: 10 Minutes (~1200 lines in 10 minutes).
> >
> > On Mon, Jun 8, 2020 at 3:29 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2020/6/8 下午9:25, Thorsten Rehm wrote:
> >>> Hi,
> >>>
> >>> any more ideas to investigate this?
> >>
> >> If you can still hit the same bug, and the fs is still completely fine,
> >> I could craft some test patches for you tomorrow.
> >>
> >> The idea behind it is to zero out all the memory for any bad eb.
> >> Thus bad eb cache won't affect other read.
> >> If that hugely reduced the frequency, I guess that would be the case.
> >>
> >>
> >> But I'm still very interested in, have you hit "read time tree block
> >> corruption detected" lines? Or just such slot=32 error lines?
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
> >>>>
> >>>> Hmm, ok wait a minute:
> >>>>
> >>>> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> >>>> then it would be a completely different story."
> >>>>
> >>>> It's a single disk (SSD):
> >>>>
> >>>> root@grml ~ # btrfs filesystem usage /mnt
> >>>> Overall:
> >>>>     Device size:         115.23GiB
> >>>>     Device allocated:          26.08GiB
> >>>>     Device unallocated:          89.15GiB
> >>>>     Device missing:             0.00B
> >>>>     Used:               7.44GiB
> >>>>     Free (estimated):         104.04GiB    (min: 59.47GiB)
> >>>>     Data ratio:                  1.00
> >>>>     Metadata ratio:              2.00
> >>>>     Global reserve:          25.25MiB    (used: 0.00B)
> >>>>
> >>>> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
> >>>>    /dev/mapper/foo      22.01GiB
> >>>>
> >>>> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
> >>>>    /dev/mapper/foo       8.00MiB
> >>>>
> >>>> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
> >>>>    /dev/mapper/foo       4.00GiB
> >>>>
> >>>> System,single: Size:4.00MiB, Used:0.00B (0.00%)
> >>>>    /dev/mapper/foo       4.00MiB
> >>>>
> >>>> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
> >>>>    /dev/mapper/foo      64.00MiB
> >>>>
> >>>> Unallocated:
> >>>>    /dev/mapper/foo      89.15GiB
> >>>>
> >>>>
> >>>> root@grml ~ # btrfs filesystem df /mnt
> >>>> Data, single: total=22.01GiB, used=7.11GiB
> >>>> System, DUP: total=32.00MiB, used=4.00KiB
> >>>> System, single: total=4.00MiB, used=0.00B
> >>>> Metadata, DUP: total=2.00GiB, used=167.81MiB
> >>>> Metadata, single: total=8.00MiB, used=0.00B
> >>>> GlobalReserve, single: total=25.25MiB, used=0.00B
> >>>>
> >>>> I did also a fstrim:
> >>>>
> >>>> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
> >>>> Enter passphrase for /dev/sda5:
> >>>> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
> >>>> root@grml ~ # fstrim -v /mnt/
> >>>> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
> >>>> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
> >>>>
> >>>> The kern.log in the runtime of fstrim:
> >>>> --- snip ---
> >>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
> >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
> >>>> caching is enabled
> >>>> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
> >>>> optimizations
> >>>> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
> >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
> >>>> caching is enabled
> >>>> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> >>>> expect 439
> >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
> >>>> optimizations
> >>>> --- snap ---
> >>>>
> >>>> Furthermore the system runs for years now. I can't remember exactly,
> >>>> but think for 4-5 years. I've started with Debian Testing and just
> >>>> upgraded my system on a regular basis. And and I started with btrfs of
> >>>> course, but I can't remember with which version...
> >>>>
> >>>> The problem is still there after the fstrim. Any further suggestions?
> >>>>
> >>>> And isn't it a little bit strange that someone had a very similar problem?
> >>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>>>
> >>>> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
> >>>> identical to my errors.
> >>>>
> >>>> Thx so far!
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
> >>>>>> The disk in question is my root (/) partition. If the filesystem is
> >>>>>> that highly damaged, I have to reinstall my system. We will see, if
> >>>>>> it's come to that. Maybe we find something interesting on the way...
> >>>>>> I've downloaded the latest grml daily image and started my system from
> >>>>>> a usb stick. Here we go:
> >>>>>>
> >>>>>> root@grml ~ # uname -r
> >>>>>> 5.6.0-2-amd64
> >>>>>>
> >>>>>> root@grml ~ # cryptsetup open /dev/sda5 foo
> >>>>>>
> >>>>>>                                                                   :(
> >>>>>> Enter passphrase for /dev/sda5:
> >>>>>>
> >>>>>> root@grml ~ # file -L -s /dev/mapper/foo
> >>>>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> >>>>>> nodesize 4096, leafsize 4096,
> >>>>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> >>>>>> bytes used, 1 devices
> >>>>>>
> >>>>>> root@grml ~ # btrfs check /dev/mapper/foo
> >>>>>> Opening filesystem to check...
> >>>>>> Checking filesystem on /dev/mapper/foo
> >>>>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> [1/7] checking root items
> >>>>>> [2/7] checking extents
> >>>>>> [3/7] checking free space cache
> >>>>>> [4/7] checking fs roots
> >>>>>> [5/7] checking only csums items (without verifying data)
> >>>>>> [6/7] checking root refs
> >>>>>> [7/7] checking quota groups skipped (not enabled on this FS)
> >>>>>> found 7815716864 bytes used, no error found
> >>>>>> total csum bytes: 6428260
> >>>>>> total tree bytes: 175968256
> >>>>>> total fs tree bytes: 149475328
> >>>>>> total extent tree bytes: 16052224
> >>>>>> btree space waste bytes: 43268911
> >>>>>> file data blocks allocated: 10453221376
> >>>>>>  referenced 8746053632
> >>>>>
> >>>>> Errr, this is a super good news, all your fs metadata is completely fine
> >>>>> (at least for the first copy).
> >>>>> Which is completely different from the kernel dmesg.
> >>>>>
> >>>>>>
> >>>>>> root@grml ~ # lsblk /dev/sda5 --fs
> >>>>>> NAME  FSTYPE      FSVER LABEL UUID
> >>>>>> FSAVAIL FSUSE% MOUNTPOINT
> >>>>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> >>>>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>>
> >>>>>> root@grml ~ # mount /dev/mapper/foo /mnt
> >>>>>> root@grml ~ # btrfs scrub start /mnt
> >>>>>>
> >>>>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
> >>>>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> >>>>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> >>>>>> caching is enabled
> >>>>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> >>>>>> expect 439
> >>>>>
> >>>>> One error line without "read time corruption" line means btrfs kernel
> >>>>> indeed skipped to next copy.
> >>>>> In this case, there is one copy (aka the first copy) corrupted.
> >>>>> Strangely, if it's the first copy in kernel, it should also be the first
> >>>>> copy in btrfs check.
> >>>>>
> >>>>> And no problem reported from btrfs check, that's already super strange.
> >>>>>
> >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> >>>>>> optimizations
> >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> >>>>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> >>>>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> >>>>>> expect 439
> >>>>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> >>>>>> expect 439
> >>>>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> >>>>>> on devid 1 with status: 0
> >>>>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> >>>>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> >>>>>> expect 439
> >>>>>
> >>>>> This means the corrupted copies are also there for several (and I guess
> >>>>> unrelated) tree blocks.
> >>>>> For scrub I guess it just tries to read the good copy without bothering
> >>>>> with the bad one it found, so no error is reported by scrub.
> >>>>>
> >>>>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> >>>>> then it would be a completely different story.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> root@grml ~ # btrfs scrub status /mnt
> >>>>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> Scrub started:    Thu Jun  4 10:45:38 2020
> >>>>>> Status:           finished
> >>>>>> Duration:         0:00:53
> >>>>>> Total to scrub:   7.44GiB
> >>>>>> Rate:             143.80MiB/s
> >>>>>> Error summary:    no errors found
> >>>>>>
> >>>>>>
> >>>>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> >>>>>> btrfs ins dump-tree -b $block /dev/dm-0; done
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> >>>>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
> >>>>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
> >>>>> ...
> >>>>>>         cache generation 24750791 entries 139 bitmaps 8
> >>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>>>
> >>>>> So it's still there. The first copy is corrupted. Just btrfs-progs can't
> >>>>> detect it.
> >>>>>
> >>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>>>         drop key (0 UNKNOWN.0 0) level 0
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> >>>>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>> ...
> >>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>>>         drop key (0 UNKNOWN.0 0) level 0
> >>>>>
> >>>>> This is different from the previous copy, which means it should be a
> >>>>> CoWed tree block.
> >>>>>
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
> >>>>>
> >>>>> Even newer one.
> >>>>>
> >>>>> ...
> >>>>>> btrfs-progs v5.6
> >>>>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
> >>>>>
> >>>>> Newer.
> >>>>>
> >>>>> So it looks like the bad copy has existed for a while, but at the same
> >>>>> time we still have one good copy to keep everything afloat.
> >>>>>
> >>>>> To kill all the old corrupted copies, if the device supports
> >>>>> TRIM/DISCARD, I recommend running scrub first, then fstrim on the fs.
> >>>>>
> >>>>> If it's an HDD, I recommend running a btrfs balance -m to relocate all
> >>>>> metadata blocks, to get rid of the bad copies.
> >>>>>
> >>>>> Of course, all using v5.3+ kernels.
> >>>>>
> >>>>> Thanks,
> >>>>> Qu
> >>>>>>
> >>>>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> >>>>>>>> Thank you for your answer.
> >>>>>>>> I've just updated my system, did a reboot and it's running with a
> >>>>>>>> 5.6.0-2-amd64 now.
> >>>>>>>> So, this is how my kern.log looks like, just right after the start:
> >>>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> There are too many blocks. I just picked three randomly:
> >>>>>>>
> >>>>>>> Looks like we need more results, especially since some results don't match at all.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> === Block 33017856 ===
> >>>>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> >>>>>>>> btrfs-progs v5.6
> >>>>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> >>>>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> >>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>>>> ...
> >>>>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> >>>>>>>>                 generation 24749502 type 1 (regular)
> >>>>>>>>                 extent data disk byte 1126502400 nr 4096
> >>>>>>>>                 extent data offset 0 nr 8192 ram 8192
> >>>>>>>>                 extent compression 2 (lzo)
> >>>>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> >>>>>>>>                 generation 24749502 type 1 (regular)
> >>>>>>>>                 extent data disk byte 0 nr 0
> >>>>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
> >>>>>>>>                 extent compression 0 (none)
> >>>>>>> Not root item at all.
> >>>>>>> At least for this copy, it looks like kernel got one completely bad
> >>>>>>> copy, then discarded it and found a good copy.
> >>>>>>>
> >>>>>>> That's very strange, especially when all the other involved ones seem
> >>>>>>> random and all being at slot 32 is not a coincidence.
> >>>>>>>
> >>>>>>>
> >>>>>>>> === Block 44900352  ===
> >>>>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
> >>>>>>>> btrfs-progs v5.6
> >>>>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> >>>>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
> >>>>>>>
> >>>>>>> This block doesn't even have a slot 32... It only has 19 items, thus slot
> >>>>>>> 0 ~ slot 18.
> >>>>>>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> === Block 55352561664 ===
> >>>>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> >>>>>>>> btrfs-progs v5.6
> >>>>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> >>>>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> >>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> >>>>>>> ...
> >>>>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> >>>>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> >>>>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> >>>>>>>>                 drop key (0 UNKNOWN.0 0) level 0
> >>>>>>>
> >>>>>>> This looks like the offending tree block.
> >>>>>>> Slot 32, item size 239, which is ROOT_ITEM, but with an invalid size.
> >>>>>>>
> >>>>>>> Since you're here, I guess a btrfs check without --repair on the
> >>>>>>> unmounted fs would help to identify the real damage.
> >>>>>>>
> >>>>>>> And again, the fs looks very damaged, it's highly recommended to backup
> >>>>>>> your data asap.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Qu
> >>>>>>>
> >>>>>>>> --- snap ---
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
> >>>>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
> >>>>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
> >>>>>>>>>> applications were terminated after some uptime, due to the btrfs
> >>>>>>>>>> filesystem errors. This was with kernel 5.3.
> >>>>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> >>>>>>>>>>
> >>>>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> >>>>>>>>>> release and with this kernel there aren't any corrupt leaf messages
> >>>>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
> >>>>>>>>>> 5.3 (or 5.x).
> >>>>>>>>>
> >>>>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> >>>>>>>>> such *obviously* wrong metadata.
> >>>>>>>>>>
> >>>>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
> >>>>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
> >>>>>>>>>> password at the boot, the first corrupt leaf errors appear.
> >>>>>>>>>>
> >>>>>>>>>> An error message looks like this:
> >>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>
> >>>>>>>>> Btrfs root items have fixed size. This is already something very bad.
> >>>>>>>>>
> >>>>>>>>> Furthermore, the item size is smaller than expected, which means we can
> >>>>>>>>> easily get garbage. I'm a little surprised that older kernel can even
> >>>>>>>>> work without crashing the whole kernel.
> >>>>>>>>>
> >>>>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
> >>>>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> >>>>>>>>>> error line. Only the block number changes.
> >>>>>>>>>
> >>>>>>>>> And dumps for the other block numbers too.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
> >>>>>>>>>> contacted the reporter, but he didn't have a solution for me, because
> >>>>>>>>>> he changed to a different filesystem.
> >>>>>>>>>>
> >>>>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> >>>>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> >>>>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> >>>>>>>>>> again, w/o any errors.
> >>>>>>>>>
> >>>>>>>>> This doesn't look like a bit flip, so not RAM problems.
> >>>>>>>>>
> >>>>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
> >>>>>>>>> to backup your data since it's still possible.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Qu
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> So, I have no more ideas what I can do. Could you please help me to
> >>>>>>>>>> investigate this further? Could it be a bug?
> >>>>>>>>>>
> >>>>>>>>>> Thank you very much.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Thorsten
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 1:
> >>>>>>>>>> $ cat /etc/debian_version
> >>>>>>>>>> bullseye/sid
> >>>>>>>>>>
> >>>>>>>>>> $ uname -a
> >>>>>>>>>> [no problem with this kernel]
> >>>>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> >>>>>>>>>>
> >>>>>>>>>> $ btrfs --version
> >>>>>>>>>> btrfs-progs v5.6
> >>>>>>>>>>
> >>>>>>>>>> $ sudo btrfs fi show
> >>>>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> >>>>>>>>>>         Total devices 1 FS bytes used 7.33GiB
> >>>>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> >>>>>>>>>>
> >>>>>>>>>> $ btrfs fi df /
> >>>>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
> >>>>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
> >>>>>>>>>> System, single: total=4.00MiB, used=0.00B
> >>>>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> >>>>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
> >>>>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2:
> >>>>>>>>>> [several messages per second]
> >>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> >>>>>>>>>> size, have 239 expect 439
> >>>>>>>>>>
> >>>>>>>>>> 3:
> >>>>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-06-16  5:41                     ` Thorsten Rehm
@ 2020-11-20 13:17                       ` Thorsten Rehm
  2020-11-20 13:47                         ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Thorsten Rehm @ 2020-11-20 13:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi,

I'm very sorry, but I didn't have the time to do the btrfs-image dump.
I was just about to go back to work on the problem, but first I've
updated my system and now the problem is gone.
My system (Debian testing) is running with the latest available kernel
5.9.0-2 and btrfs-progs 5.9.
The last time I updated my system was 60 days ago and at this point
the problem still existed.
So, for now, no more "corrupt leaf; invalid root item size" errors.

I just wanted you and others to know.
Thanks again!


On Tue, 16 Jun 2020 at 07:41, Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>
> Yepp, sure.
> I will do that in the next few days.
>
>
> On Fri, Jun 12, 2020 at 8:50 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> > Would you mind creating a btrfs-image dump?
> >
> > It would greatly help us to pin down the cause.
> >
> > # btrfs-image -c9 <device> <file>
> >
> > Although it may leak sensitive data like file and dir names, you can try
> > the -s option to fuzz them since they're not important in this particular
> > case, but it would take more time and may cause some extra problems.
> >
> > After looking into related code, and your SINGLE metadata profile, I
> > can't find any clues yet.
> >
> > Thanks,
> > Qu
> >
> >
> > On 2020/6/8 下午10:41, Thorsten Rehm wrote:
> > > I just have to start my system with kernel 5.6. After that, the
> > > slot=32 error lines will be written. And only these lines:
> > >
> > > $ grep 'BTRFS critical' kern.log.1 | wc -l
> > > 1191
> > >
> > > $ grep 'slot=32' kern.log.1 | wc -l
> > > 1191
> > >
> > > $ grep 'corruption' kern.log.1 | wc -l
> > > 0
> > >
> > > Period: 10 Minutes (~1200 lines in 10 minutes).
> > >
> > > On Mon, Jun 8, 2020 at 3:29 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2020/6/8 下午9:25, Thorsten Rehm wrote:
> > >>> Hi,
> > >>>
> > >>> any more ideas to investigate this?
> > >>
> > >> If you can still hit the same bug, and the fs is still completely fine,
> > >> I could craft some test patches for you tomorrow.
> > >>
> > >> The idea behind it is to zero out all the memory for any bad eb.
> > >> Thus bad eb cache won't affect other read.
> > >> If that hugely reduced the frequency, I guess that would be the case.
> > >>
> > >>
> > >> But I'm still very interested in, have you hit "read time tree block
> > >> corruption detected" lines? Or just such slot=32 error lines?
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >>>
> > >>> On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
> > >>>>
> > >>>> Hmm, ok wait a minute:
> > >>>>
> > >>>> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> > >>>> then it would be a completely different story."
> > >>>>
> > >>>> It's a single disk (SSD):
> > >>>>
> > >>>> root@grml ~ # btrfs filesystem usage /mnt
> > >>>> Overall:
> > >>>>     Device size:         115.23GiB
> > >>>>     Device allocated:          26.08GiB
> > >>>>     Device unallocated:          89.15GiB
> > >>>>     Device missing:             0.00B
> > >>>>     Used:               7.44GiB
> > >>>>     Free (estimated):         104.04GiB    (min: 59.47GiB)
> > >>>>     Data ratio:                  1.00
> > >>>>     Metadata ratio:              2.00
> > >>>>     Global reserve:          25.25MiB    (used: 0.00B)
> > >>>>
> > >>>> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
> > >>>>    /dev/mapper/foo      22.01GiB
> > >>>>
> > >>>> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
> > >>>>    /dev/mapper/foo       8.00MiB
> > >>>>
> > >>>> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
> > >>>>    /dev/mapper/foo       4.00GiB
> > >>>>
> > >>>> System,single: Size:4.00MiB, Used:0.00B (0.00%)
> > >>>>    /dev/mapper/foo       4.00MiB
> > >>>>
> > >>>> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
> > >>>>    /dev/mapper/foo      64.00MiB
> > >>>>
> > >>>> Unallocated:
> > >>>>    /dev/mapper/foo      89.15GiB
> > >>>>
> > >>>>
> > >>>> root@grml ~ # btrfs filesystem df /mnt
> > >>>> Data, single: total=22.01GiB, used=7.11GiB
> > >>>> System, DUP: total=32.00MiB, used=4.00KiB
> > >>>> System, single: total=4.00MiB, used=0.00B
> > >>>> Metadata, DUP: total=2.00GiB, used=167.81MiB
> > >>>> Metadata, single: total=8.00MiB, used=0.00B
> > >>>> GlobalReserve, single: total=25.25MiB, used=0.00B
> > >>>>
> > >>>> I did also a fstrim:
> > >>>>
> > >>>> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
> > >>>> Enter passphrase for /dev/sda5:
> > >>>> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
> > >>>> root@grml ~ # fstrim -v /mnt/
> > >>>> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
> > >>>> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
> > >>>>
> > >>>> The kern.log in the runtime of fstrim:
> > >>>> --- snip ---
> > >>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
> > >>>> expect 439
> > >>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> > >>>> expect 439
> > >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
> > >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
> > >>>> caching is enabled
> > >>>> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
> > >>>> expect 439
> > >>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
> > >>>> optimizations
> > >>>> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> > >>>> expect 439
> > >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
> > >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
> > >>>> caching is enabled
> > >>>> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
> > >>>> expect 439
> > >>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
> > >>>> optimizations
> > >>>> --- snap ---
> > >>>>
> > >>>> Furthermore the system runs for years now. I can't remember exactly,
> > >>>> but think for 4-5 years. I've started with Debian Testing and just
> > >>>> upgraded my system on a regular basis. And and I started with btrfs of
> > >>>> course, but I can't remember with which version...
> > >>>>
> > >>>> The problem is still there after the fstrim. Any further suggestions?
> > >>>>
> > >>>> And isn't it a little bit strange that someone had a very similar problem?
> > >>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> > >>>>
> > >>>> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
> > >>>> identical to my errors.
> > >>>>
> > >>>> Thx so far!
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
> > >>>>>> The disk in question is my root (/) partition. If the filesystem is
> > >>>>>> that highly damaged, I have to reinstall my system. We will see, if
> > >>>>>> it's come to that. Maybe we find something interesting on the way...
> > >>>>>> I've downloaded the latest grml daily image and started my system from
> > >>>>>> a usb stick. Here we go:
> > >>>>>>
> > >>>>>> root@grml ~ # uname -r
> > >>>>>> 5.6.0-2-amd64
> > >>>>>>
> > >>>>>> root@grml ~ # cryptsetup open /dev/sda5 foo
> > >>>>>>
> > >>>>>>                                                                   :(
> > >>>>>> Enter passphrase for /dev/sda5:
> > >>>>>>
> > >>>>>> root@grml ~ # file -L -s /dev/mapper/foo
> > >>>>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
> > >>>>>> nodesize 4096, leafsize 4096,
> > >>>>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
> > >>>>>> bytes used, 1 devices
> > >>>>>>
> > >>>>>> root@grml ~ # btrfs check /dev/mapper/foo
> > >>>>>> Opening filesystem to check...
> > >>>>>> Checking filesystem on /dev/mapper/foo
> > >>>>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>> [1/7] checking root items
> > >>>>>> [2/7] checking extents
> > >>>>>> [3/7] checking free space cache
> > >>>>>> [4/7] checking fs roots
> > >>>>>> [5/7] checking only csums items (without verifying data)
> > >>>>>> [6/7] checking root refs
> > >>>>>> [7/7] checking quota groups skipped (not enabled on this FS)
> > >>>>>> found 7815716864 bytes used, no error found
> > >>>>>> total csum bytes: 6428260
> > >>>>>> total tree bytes: 175968256
> > >>>>>> total fs tree bytes: 149475328
> > >>>>>> total extent tree bytes: 16052224
> > >>>>>> btree space waste bytes: 43268911
> > >>>>>> file data blocks allocated: 10453221376
> > >>>>>>  referenced 8746053632
> > >>>>>
> > >>>>> Errr, this is a super good news, all your fs metadata is completely fine
> > >>>>> (at least for the first copy).
> > >>>>> Which is completely different from the kernel dmesg.
> > >>>>>
> > >>>>>>
> > >>>>>> root@grml ~ # lsblk /dev/sda5 --fs
> > >>>>>> NAME  FSTYPE      FSVER LABEL UUID
> > >>>>>> FSAVAIL FSUSE% MOUNTPOINT
> > >>>>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
> > >>>>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>>
> > >>>>>> root@grml ~ # mount /dev/mapper/foo /mnt
> > >>>>>> root@grml ~ # btrfs scrub start /mnt
> > >>>>>>
> > >>>>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
> > >>>>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
> > >>>>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
> > >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
> > >>>>>> caching is enabled
> > >>>>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
> > >>>>>> expect 439
> > >>>>>
> > >>>>> One error line without "read time corruption" line means btrfs kernel
> > >>>>> indeed skipped to next copy.
> > >>>>> In this case, there is one copy (aka the first copy) corrupted.
> > >>>>> Strangely, if it's the first copy in kernel, it should also be the first
> > >>>>> copy in btrfs check.
> > >>>>>
> > >>>>> And no problem reported from btrfs check, that's already super strange.
> > >>>>>
> > >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
> > >>>>>> optimizations
> > >>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
> > >>>>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
> > >>>>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
> > >>>>>> expect 439
> > >>>>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
> > >>>>>> expect 439
> > >>>>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
> > >>>>>> on devid 1 with status: 0
> > >>>>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
> > >>>>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
> > >>>>>> expect 439
> > >>>>>
> > >>>>> This means the corrupted copies are also there for several (and I guess
> > >>>>> unrelated) tree blocks.
> > >>>>> For scrub I guess it just tries to read the good copy without bothering
> > >>>>> with the bad one it found, so no error is reported by scrub.
> > >>>>>
> > >>>>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
> > >>>>> then it would be a completely different story.
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> root@grml ~ # btrfs scrub status /mnt
> > >>>>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>> Scrub started:    Thu Jun  4 10:45:38 2020
> > >>>>>> Status:           finished
> > >>>>>> Duration:         0:00:53
> > >>>>>> Total to scrub:   7.44GiB
> > >>>>>> Rate:             143.80MiB/s
> > >>>>>> Error summary:    no errors found
> > >>>>>>
> > >>>>>>
> > >>>>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
> > >>>>>> btrfs ins dump-tree -b $block /dev/dm-0; done
> > >>>>>> btrfs-progs v5.6
> > >>>>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
> > >>>>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
> > >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >>>>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
> > >>>>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
> > >>>>> ...
> > >>>>>>         cache generation 24750791 entries 139 bitmaps 8
> > >>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> > >>>>>
> > >>>>> So it's still there. The first copy is corrupted. Just btrfs-progs can't
> > >>>>> detect it.
> > >>>>>
> > >>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >>>>>>         drop key (0 UNKNOWN.0 0) level 0
> > >>>>>> btrfs-progs v5.6
> > >>>>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
> > >>>>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
> > >>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >>>>> ...
> > >>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> > >>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >>>>>>         drop key (0 UNKNOWN.0 0) level 0
> > >>>>>
> > >>>>> This is different from the previous copy, which means it should be a
> > >>>>> CoWed tree block.
> > >>>>>
> > >>>>>> btrfs-progs v5.6
> > >>>>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
> > >>>>>
> > >>>>> Even newer one.
> > >>>>>
> > >>>>> ...
> > >>>>>> btrfs-progs v5.6
> > >>>>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
> > >>>>>
> > >>>>> Newer.
> > >>>>>
> > >>>>> So it looks like the bad copy has existed for a while, but at the same
> > >>>>> time we still have one good copy to keep everything afloat.
> > >>>>>
> > >>>>> To kill all the old corrupted copies, if the device supports
> > >>>>> TRIM/DISCARD, I recommend running scrub first, then fstrim on the fs.
> > >>>>>
> > >>>>> If it's an HDD, I recommend running a btrfs balance -m to relocate all
> > >>>>> metadata blocks, to get rid of the bad copies.
> > >>>>>
> > >>>>> Of course, all using v5.3+ kernels.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Qu
> > >>>>>>
> > >>>>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
> > >>>>>>>> Thank you for your answer.
> > >>>>>>>> I've just updated my system, did a reboot and it's running with a
> > >>>>>>>> 5.6.0-2-amd64 now.
> > >>>>>>>> So, this is how my kern.log looks like, just right after the start:
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> There are too many blocks. I just picked three randomly:
> > >>>>>>>
> > >>>>>>> Looks like we need more results, especially since some results don't match at all.
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> === Block 33017856 ===
> > >>>>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> > >>>>>>>> btrfs-progs v5.6
> > >>>>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> > >>>>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> > >>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >>>>>>> ...
> > >>>>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
> > >>>>>>>>                 generation 24749502 type 1 (regular)
> > >>>>>>>>                 extent data disk byte 1126502400 nr 4096
> > >>>>>>>>                 extent data offset 0 nr 8192 ram 8192
> > >>>>>>>>                 extent compression 2 (lzo)
> > >>>>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
> > >>>>>>>>                 generation 24749502 type 1 (regular)
> > >>>>>>>>                 extent data disk byte 0 nr 0
> > >>>>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
> > >>>>>>>>                 extent compression 0 (none)
> > >>>>>>> Not root item at all.
> > >>>>>>> At least for this copy, it looks like kernel got one completely bad
> > >>>>>>> copy, then discarded it and found a good copy.
> > >>>>>>>
> > >>>>>>> That's very strange, especially when all the other involved ones seem
> > >>>>>>> random and all being at slot 32 is not a coincidence.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> === Block 44900352  ===
> > >>>>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
> > >>>>>>>> btrfs-progs v5.6
> > >>>>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> > >>>>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
> > >>>>>>>
> > >>>>>>> This block doesn't even have a slot 32... It only has 19 items, thus slot
> > >>>>>>> 0 ~ slot 18.
> > >>>>>>> And its owner, FS_TREE shouldn't have ROOT_ITEM.
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> === Block 55352561664 ===
> > >>>>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> > >>>>>>>> btrfs-progs v5.6
> > >>>>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> > >>>>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> > >>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
> > >>>>>>> ...
> > >>>>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
> > >>>>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
> > >>>>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
> > >>>>>>>>                 drop key (0 UNKNOWN.0 0) level 0
> > >>>>>>>
> > >>>>>>> This looks like the offending tree block.
> > >>>>>>> Slot 32, item size 239, which is ROOT_ITEM, but with an invalid size.
> > >>>>>>>
> > >>>>>>> Since you're here, I guess a btrfs check without --repair on the
> > >>>>>>> unmounted fs would help to identify the real damage.
> > >>>>>>>
> > >>>>>>> And again, the fs looks very damaged, it's highly recommended to backup
> > >>>>>>> your data asap.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Qu
> > >>>>>>>
> > >>>>>>>> --- snap ---
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
> > >>>>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
> > >>>>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
> > >>>>>>>>>> applications were terminated after some uptime, due to the btrfs
> > >>>>>>>>>> filesystem errors. This was with kernel 5.3.
> > >>>>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
> > >>>>>>>>>>
> > >>>>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
> > >>>>>>>>>> release and with this kernel there aren't any corrupt leaf messages
> > >>>>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
> > >>>>>>>>>> 5.3 (or 5.x).
> > >>>>>>>>>
> > >>>>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
> > >>>>>>>>> such *obviously* wrong metadata.
> > >>>>>>>>>>
> > >>>>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
> > >>>>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
> > >>>>>>>>>> password at the boot, the first corrupt leaf errors appear.
> > >>>>>>>>>>
> > >>>>>>>>>> An error message looks like this:
> > >>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>
> > >>>>>>>>> Btrfs root items have fixed size. This is already something very bad.
> > >>>>>>>>>
> > >>>>>>>>> Furthermore, the item size is smaller than expected, which means we can
> > >>>>>>>>> easily get garbage. I'm a little surprised that older kernel can even
> > >>>>>>>>> work without crashing the whole kernel.
> > >>>>>>>>>
> > >>>>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
> > >>>>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
> > >>>>>>>>>> error line. Only the block number changes.
> > >>>>>>>>>
> > >>>>>>>>> And dumps for the other block numbers too.
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
> > >>>>>>>>>> contacted the reporter, but he didn't have a solution for me, because
> > >>>>>>>>>> he changed to a different filesystem.
> > >>>>>>>>>>
> > >>>>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
> > >>>>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
> > >>>>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
> > >>>>>>>>>> again, w/o any errors.
> > >>>>>>>>>
> > >>>>>>>>> This doesn't look like a bit flip, so not RAM problems.
> > >>>>>>>>>
> > >>>>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
> > >>>>>>>>> to backup your data since it's still possible.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Qu
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> So, I have no more ideas what I can do. Could you please help me to
> > >>>>>>>>>> investigate this further? Could it be a bug?
> > >>>>>>>>>>
> > >>>>>>>>>> Thank you very much.
> > >>>>>>>>>>
> > >>>>>>>>>> Best regards,
> > >>>>>>>>>> Thorsten
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> 1:
> > >>>>>>>>>> $ cat /etc/debian_version
> > >>>>>>>>>> bullseye/sid
> > >>>>>>>>>>
> > >>>>>>>>>> $ uname -a
> > >>>>>>>>>> [no problem with this kernel]
> > >>>>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
> > >>>>>>>>>>
> > >>>>>>>>>> $ btrfs --version
> > >>>>>>>>>> btrfs-progs v5.6
> > >>>>>>>>>>
> > >>>>>>>>>> $ sudo btrfs fi show
> > >>>>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
> > >>>>>>>>>>         Total devices 1 FS bytes used 7.33GiB
> > >>>>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
> > >>>>>>>>>>
> > >>>>>>>>>> $ btrfs fi df /
> > >>>>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
> > >>>>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
> > >>>>>>>>>> System, single: total=4.00MiB, used=0.00B
> > >>>>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
> > >>>>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
> > >>>>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> 2:
> > >>>>>>>>>> [several messages per second]
> > >>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
> > >>>>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
> > >>>>>>>>>> size, have 239 expect 439
> > >>>>>>>>>>
> > >>>>>>>>>> 3:
> > >>>>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: corrupt leaf; invalid root item size
  2020-11-20 13:17                       ` Thorsten Rehm
@ 2020-11-20 13:47                         ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2020-11-20 13:47 UTC (permalink / raw)
  To: Thorsten Rehm; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 26307 bytes --]



On 2020/11/20 下午9:17, Thorsten Rehm wrote:
> Hi,
> 
> I'm very sorry, but I didn't have the time to do the btrfs-image dump.
> I was just about to go back to work on the problem, but first I've
> updated my system and now the problem is gone.
> My system (Debian testing) is running with the latest available kernel
> 5.9.0-2 and btrfs-progs 5.9.
> The last time I updated my system was 60 days ago and at this point
> the problem still existed.
> So, for now, no more corrupt leaf; invalid root item size errors.

Oh, that's because we have located the cause and fixed the false alert.

The fix is this one:
1465af12e254 ("btrfs: tree-checker: fix false alert caused by legacy
btrfs root item")

Some legacy root items can have a smaller size than the current one.
Thanks to another reporter's dump, we fixed it, and existing stable kernels
should have received the backport already.
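
For illustration only, here is a stand-alone sketch of the relaxed size
check. The struct below is just a stand-in that reproduces the two sizes
from the log (239 and 439 bytes); the field and function names are made up
and are not the real struct btrfs_root_item or the literal patch:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in layout: 239 bytes of "legacy" fields, then the fields added
 * with generation_v2, 439 bytes in total.  Only the sizes are meaningful. */
struct root_item_sketch {
        uint8_t legacy_fields[239];  /* everything before generation_v2 */
        uint8_t v2_fields[200];      /* generation_v2 and later fields */
};

static int root_item_size_ok(size_t item_size)
{
        size_t legacy_size = offsetof(struct root_item_sketch, v2_fields); /* 239 */
        size_t full_size   = sizeof(struct root_item_sketch);              /* 439 */

        /* The old checker accepted only full_size, so a legacy item produced
         * the "have 239 expect 439" critical message.  The fix accepts both. */
        return item_size == legacy_size || item_size == full_size;
}

int main(void)
{
        printf("239 -> %d\n", root_item_size_ok(239)); /* 1: legacy item, now accepted */
        printf("439 -> %d\n", root_item_size_ok(439)); /* 1: current item */
        printf("240 -> %d\n", root_item_size_ok(240)); /* 0: still rejected */
        return 0;
}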

Thanks,
Qu
> 
> I just wanted you and others to know.
> Thanks again!
> 
> 
> On Tue, 16 Jun 2020 at 07:41, Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>>
>> Yepp, sure.
>> I will do that in the next few days.
>>
>>
>> On Fri, Jun 12, 2020 at 8:50 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>> Would you mind creating a btrfs-image dump?
>>>
>>> It would greatly help us to pin down the cause.
>>>
>>> # btrfs-image -c9 <device> <file>
>>>
>>> Although it may leak sensitive data like file and dir names, you can try
>>> the -s option to fuzz them, since the names are not important in this
>>> particular case; it takes more time, though, and may cause some extra problems.
>>>
>>> After looking into related code, and your SINGLE metadata profile, I
>>> can't find any clues yet.
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>> On 2020/6/8 下午10:41, Thorsten Rehm wrote:
>>>> I just have to start my system with kernel 5.6. After that, the
>>>> slot=32 error lines will be written. And only these lines:
>>>>
>>>> $ grep 'BTRFS critical' kern.log.1 | wc -l
>>>> 1191
>>>>
>>>> $ grep 'slot=32' kern.log.1 | wc -l
>>>> 1191
>>>>
>>>> $ grep 'corruption' kern.log.1 | wc -l
>>>> 0
>>>>
>>>> Period: 10 minutes (~1200 lines in 10 minutes).
>>>>
>>>> On Mon, Jun 8, 2020 at 3:29 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/6/8 下午9:25, Thorsten Rehm wrote:
>>>>>> Hi,
>>>>>>
>>>>>> any more ideas to investigate this?
>>>>>
>>>>> If you can still hit the same bug, and the fs is still completely fine,
>>>>> I could craft some test patches for you tomorrow.
>>>>>
>>>>> The idea behind it is to zero out all the memory for any bad eb.
>>>>> Thus a cached bad eb won't affect other reads.
>>>>> If that hugely reduces the frequency, I guess that would confirm it.
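
To make that idea concrete, here is a tiny stand-alone model (user-space C,
not kernel code; the two-mirror layout, block_is_valid() and read_mirror()
are all made up) of wiping a cached block whenever it fails validation
before trying the next copy:

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define NUM_MIRRORS 2   /* DUP metadata keeps two copies of every tree block */

struct cached_block {
        unsigned char data[BLOCK_SIZE];
        int uptodate;
};

/* Stand-in for the tree-checker: here "valid" simply means the first byte
 * is not 0xff.  The real checks are of course far more involved. */
static int block_is_valid(const struct cached_block *b)
{
        return b->data[0] != 0xff;
}

/* Fill the buffer from a given mirror; mirror 1 is deliberately bad here. */
static void read_mirror(unsigned char *dst, int mirror)
{
        memset(dst, mirror == 1 ? 0xff : 0xab, BLOCK_SIZE);
}

static int read_block(struct cached_block *b)
{
        for (int mirror = 1; mirror <= NUM_MIRRORS; mirror++) {
                read_mirror(b->data, mirror);
                if (block_is_valid(b)) {
                        b->uptodate = 1;
                        return 0;
                }
                /* Bad copy: zero the cached buffer before trying the next
                 * mirror, so stale garbage can never satisfy a later read. */
                memset(b->data, 0, sizeof(b->data));
                b->uptodate = 0;
        }
        return -1;
}

int main(void)
{
        struct cached_block b = { .uptodate = 0 };
        int ret = read_block(&b);

        printf("read_block() = %d, uptodate = %d, first byte = 0x%02x\n",
               ret, b.uptodate, b.data[0]);
        return 0;
}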
>>>>>
>>>>>
>>>>> But I'm still very interested: have you hit "read time tree block
>>>>> corruption detected" lines? Or just such slot=32 error lines?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>>
>>>>>> On Thu, Jun 4, 2020 at 7:57 PM Thorsten Rehm <thorsten.rehm@gmail.com> wrote:
>>>>>>>
>>>>>>> Hmm, ok wait a minute:
>>>>>>>
>>>>>>> "But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>>>>>>> then it would be a completely different story."
>>>>>>>
>>>>>>> It's a single disk (SSD):
>>>>>>>
>>>>>>> root@grml ~ # btrfs filesystem usage /mnt
>>>>>>> Overall:
>>>>>>>     Device size:         115.23GiB
>>>>>>>     Device allocated:          26.08GiB
>>>>>>>     Device unallocated:          89.15GiB
>>>>>>>     Device missing:             0.00B
>>>>>>>     Used:               7.44GiB
>>>>>>>     Free (estimated):         104.04GiB    (min: 59.47GiB)
>>>>>>>     Data ratio:                  1.00
>>>>>>>     Metadata ratio:              2.00
>>>>>>>     Global reserve:          25.25MiB    (used: 0.00B)
>>>>>>>
>>>>>>> Data,single: Size:22.01GiB, Used:7.11GiB (32.33%)
>>>>>>>    /dev/mapper/foo      22.01GiB
>>>>>>>
>>>>>>> Metadata,single: Size:8.00MiB, Used:0.00B (0.00%)
>>>>>>>    /dev/mapper/foo       8.00MiB
>>>>>>>
>>>>>>> Metadata,DUP: Size:2.00GiB, Used:167.81MiB (8.19%)
>>>>>>>    /dev/mapper/foo       4.00GiB
>>>>>>>
>>>>>>> System,single: Size:4.00MiB, Used:0.00B (0.00%)
>>>>>>>    /dev/mapper/foo       4.00MiB
>>>>>>>
>>>>>>> System,DUP: Size:32.00MiB, Used:4.00KiB (0.01%)
>>>>>>>    /dev/mapper/foo      64.00MiB
>>>>>>>
>>>>>>> Unallocated:
>>>>>>>    /dev/mapper/foo      89.15GiB
>>>>>>>
>>>>>>>
>>>>>>> root@grml ~ # btrfs filesystem df /mnt
>>>>>>> Data, single: total=22.01GiB, used=7.11GiB
>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>>>>> System, single: total=4.00MiB, used=0.00B
>>>>>>> Metadata, DUP: total=2.00GiB, used=167.81MiB
>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>>>>> GlobalReserve, single: total=25.25MiB, used=0.00B
>>>>>>>
>>>>>>> I did also a fstrim:
>>>>>>>
>>>>>>> root@grml ~ # cryptsetup --allow-discards open /dev/sda5 foo
>>>>>>> Enter passphrase for /dev/sda5:
>>>>>>> root@grml ~ # mount -o discard /dev/mapper/foo /mnt
>>>>>>> root@grml ~ # fstrim -v /mnt/
>>>>>>> /mnt/: 105.8 GiB (113600049152 bytes) trimmed
>>>>>>> fstrim -v /mnt/  0.00s user 5.34s system 0% cpu 10:28.70 total
>>>>>>>
>>>>>>> The kern.log in the runtime of fstrim:
>>>>>>> --- snip ---
>>>>>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>> leaf: root=1 block=32505856 slot=32, invalid root item size, have 239
>>>>>>> expect 439
>>>>>>> Jun 04 12:32:02 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>>>>>>> expect 439
>>>>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): turning on sync discard
>>>>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): disk space
>>>>>>> caching is enabled
>>>>>>> Jun 04 12:32:37 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>> leaf: root=1 block=32813056 slot=32, invalid root item size, have 239
>>>>>>> expect 439
>>>>>>> Jun 04 12:32:37 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>>>>> optimizations
>>>>>>> Jun 04 12:34:35 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>>>>>>> expect 439
>>>>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): turning on sync discard
>>>>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): disk space
>>>>>>> caching is enabled
>>>>>>> Jun 04 12:36:50 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>> leaf: root=1 block=32382976 slot=32, invalid root item size, have 239
>>>>>>> expect 439
>>>>>>> Jun 04 12:36:50 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>>>>> optimizations
>>>>>>> --- snap ---
>>>>>>>
>>>>>>> Furthermore, the system has been running for years now. I can't remember
>>>>>>> exactly, but I think for 4-5 years. I started with Debian Testing and just
>>>>>>> upgraded my system on a regular basis. And I started with btrfs of
>>>>>>> course, but I can't remember with which version...
>>>>>>>
>>>>>>> The problem is still there after the fstrim. Any further suggestions?
>>>>>>>
>>>>>>> And isn't it a little bit strange that someone had a very similar problem?
>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>>>>
>>>>>>> root=1, slot=32, and "invalid root item size, have 239 expect 439" are
>>>>>>> identical to my errors.
>>>>>>>
>>>>>>> Thx so far!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 4, 2020 at 2:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/6/4 下午6:52, Thorsten Rehm wrote:
>>>>>>>>> The disk in question is my root (/) partition. If the filesystem is
>>>>>>>>> that highly damaged, I have to reinstall my system. We will see if
>>>>>>>>> it comes to that. Maybe we'll find something interesting along the way...
>>>>>>>>> I've downloaded the latest grml daily image and started my system from
>>>>>>>>> a usb stick. Here we go:
>>>>>>>>>
>>>>>>>>> root@grml ~ # uname -r
>>>>>>>>> 5.6.0-2-amd64
>>>>>>>>>
>>>>>>>>> root@grml ~ # cryptsetup open /dev/sda5 foo
>>>>>>>>>
>>>>>>>>>                                                                   :(
>>>>>>>>> Enter passphrase for /dev/sda5:
>>>>>>>>>
>>>>>>>>> root@grml ~ # file -L -s /dev/mapper/foo
>>>>>>>>> /dev/mapper/foo: BTRFS Filesystem label "slash", sectorsize 4096,
>>>>>>>>> nodesize 4096, leafsize 4096,
>>>>>>>>> UUID=65005d0f-f8ea-4f77-8372-eb8b53198685, 7815716864/123731968000
>>>>>>>>> bytes used, 1 devices
>>>>>>>>>
>>>>>>>>> root@grml ~ # btrfs check /dev/mapper/foo
>>>>>>>>> Opening filesystem to check...
>>>>>>>>> Checking filesystem on /dev/mapper/foo
>>>>>>>>> UUID: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>> [1/7] checking root items
>>>>>>>>> [2/7] checking extents
>>>>>>>>> [3/7] checking free space cache
>>>>>>>>> [4/7] checking fs roots
>>>>>>>>> [5/7] checking only csums items (without verifying data)
>>>>>>>>> [6/7] checking root refs
>>>>>>>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>>>>>>>> found 7815716864 bytes used, no error found
>>>>>>>>> total csum bytes: 6428260
>>>>>>>>> total tree bytes: 175968256
>>>>>>>>> total fs tree bytes: 149475328
>>>>>>>>> total extent tree bytes: 16052224
>>>>>>>>> btree space waste bytes: 43268911
>>>>>>>>> file data blocks allocated: 10453221376
>>>>>>>>>  referenced 8746053632
>>>>>>>>
>>>>>>>> Errr, this is super good news: all your fs metadata is completely fine
>>>>>>>> (at least for the first copy).
>>>>>>>> Which is completely different from the kernel dmesg.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> root@grml ~ # lsblk /dev/sda5 --fs
>>>>>>>>> NAME  FSTYPE      FSVER LABEL UUID
>>>>>>>>> FSAVAIL FSUSE% MOUNTPOINT
>>>>>>>>> sda5  crypto_LUKS 1           d2b4fa40-8afd-4e16-b207-4d106096fd22
>>>>>>>>> └─foo btrfs             slash 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>>
>>>>>>>>> root@grml ~ # mount /dev/mapper/foo /mnt
>>>>>>>>> root@grml ~ # btrfs scrub start /mnt
>>>>>>>>>
>>>>>>>>> root@grml ~ # journalctl -k --no-pager | grep BTRFS
>>>>>>>>> Jun 04 10:33:04 grml kernel: BTRFS: device label slash devid 1 transid
>>>>>>>>> 24750795 /dev/dm-0 scanned by systemd-udevd (3233)
>>>>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): disk space
>>>>>>>>> caching is enabled
>>>>>>>>> Jun 04 10:45:17 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>>>> leaf: root=1 block=54222848 slot=32, invalid root item size, have 239
>>>>>>>>> expect 439
>>>>>>>>
>>>>>>>> One error line without a "read time corruption" line means the btrfs kernel
>>>>>>>> indeed skipped to the next copy.
>>>>>>>> In this case, one copy (aka the first copy) is corrupted.
>>>>>>>> Strangely, if it's the first copy in kernel, it should also be the first
>>>>>>>> copy in btrfs check.
>>>>>>>>
>>>>>>>> And no problem reported from btrfs check, that's already super strange.
>>>>>>>>
>>>>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): enabling ssd
>>>>>>>>> optimizations
>>>>>>>>> Jun 04 10:45:17 grml kernel: BTRFS info (device dm-0): checking UUID tree
>>>>>>>>> Jun 04 10:45:38 grml kernel: BTRFS info (device dm-0): scrub: started on devid 1
>>>>>>>>> Jun 04 10:45:49 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>>>> leaf: root=1 block=29552640 slot=32, invalid root item size, have 239
>>>>>>>>> expect 439
>>>>>>>>> Jun 04 10:46:25 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>>>> leaf: root=1 block=29741056 slot=32, invalid root item size, have 239
>>>>>>>>> expect 439
>>>>>>>>> Jun 04 10:46:31 grml kernel: BTRFS info (device dm-0): scrub: finished
>>>>>>>>> on devid 1 with status: 0
>>>>>>>>> Jun 04 10:46:56 grml kernel: BTRFS critical (device dm-0): corrupt
>>>>>>>>> leaf: root=1 block=29974528 slot=32, invalid root item size, have 239
>>>>>>>>> expect 439
>>>>>>>>
>>>>>>>> This means the corrupted copy is also there for several (and I guess
>>>>>>>> unrelated) tree blocks.
>>>>>>>> For scrub, I guess it just tries to read the good copy without bothering
>>>>>>>> with the bad one it found, so no error is reported by scrub.
>>>>>>>>
>>>>>>>> But still, if you're using metadata without copy (aka, SINGLE, RAID0)
>>>>>>>> then it would be a completely different story.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> root@grml ~ # btrfs scrub status /mnt
>>>>>>>>> UUID:             65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>> Scrub started:    Thu Jun  4 10:45:38 2020
>>>>>>>>> Status:           finished
>>>>>>>>> Duration:         0:00:53
>>>>>>>>> Total to scrub:   7.44GiB
>>>>>>>>> Rate:             143.80MiB/s
>>>>>>>>> Error summary:    no errors found
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> root@grml ~ # for block in 54222848 29552640 29741056 29974528; do
>>>>>>>>> btrfs ins dump-tree -b $block /dev/dm-0; done
>>>>>>>>> btrfs-progs v5.6
>>>>>>>>> leaf 54222848 items 33 free space 1095 generation 24750795 owner ROOT_TREE
>>>>>>>>> leaf 54222848 flags 0x1(WRITTEN) backref revision 1
>>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>>>>     item 0 key (289 INODE_ITEM 0) itemoff 3835 itemsize 160
>>>>>>>>>         generation 24703953 transid 24703953 size 262144 nbytes 8595701760
>>>>>>>> ...
>>>>>>>>>         cache generation 24750791 entries 139 bitmaps 8
>>>>>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>>>
>>>>>>>> So it's still there. The first copy is corrupted. It's just that btrfs-progs
>>>>>>>> can't detect it.
>>>>>>>>
>>>>>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>>>>>>> btrfs-progs v5.6
>>>>>>>>> leaf 29552640 items 33 free space 1095 generation 24750796 owner ROOT_TREE
>>>>>>>>> leaf 29552640 flags 0x1(WRITTEN) backref revision 1
>>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>>> ...
>>>>>>>>>     item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>>>>         generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>>>>         lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>>>>         drop key (0 UNKNOWN.0 0) level 0
>>>>>>>>
>>>>>>>> This is different from the previous copy, which means it should be a CoWed
>>>>>>>> tree block.
>>>>>>>>
>>>>>>>>> btrfs-progs v5.6
>>>>>>>>> leaf 29741056 items 33 free space 1095 generation 24750797 owner ROOT_TREE
>>>>>>>>
>>>>>>>> Even newer one.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>> btrfs-progs v5.6
>>>>>>>>> leaf 29974528 items 33 free space 1095 generation 24750798 owner ROOT_TREE
>>>>>>>>
>>>>>>>> Newer.
>>>>>>>>
>>>>>>>> So it looks like the bad copy has existed for a while, but at the same time we
>>>>>>>> still have one good copy keeping everything afloat.
>>>>>>>>
>>>>>>>> To kill all the old corrupted copies, if the device supports TRIM/DISCARD, I
>>>>>>>> recommend running scrub first, then fstrim on the fs.
>>>>>>>>
>>>>>>>> If it's an HDD, I recommend running a btrfs balance -m to relocate all
>>>>>>>> metadata blocks, to get rid of the bad copies.
>>>>>>>>
>>>>>>>> Of course, all using v5.3+ kernels.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Qu
>>>>>>>>>
>>>>>>>>> On Thu, Jun 4, 2020 at 12:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020/6/4 下午5:45, Thorsten Rehm wrote:
>>>>>>>>>>> Thank you for you answer.
>>>>>>>>>>> I've just updated my system, did a reboot and it's running with a
>>>>>>>>>>> 5.6.0-2-amd64 now.
>>>>>>>>>>> So, this is how my kern.log looks like, just right after the start:
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There are too many blocks. I just picked three randomly:
>>>>>>>>>>
>>>>>>>>>> Looks like we need more results, especially since some results don't match at all.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> === Block 33017856 ===
>>>>>>>>>>> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
>>>>>>>>>>> btrfs-progs v5.6
>>>>>>>>>>> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
>>>>>>>>>>> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
>>>>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>>>>> ...
>>>>>>>>>>>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>>>>>>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>>>>>>                 extent data disk byte 1126502400 nr 4096
>>>>>>>>>>>                 extent data offset 0 nr 8192 ram 8192
>>>>>>>>>>>                 extent compression 2 (lzo)
>>>>>>>>>>>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>>>>>>>>>>>                 generation 24749502 type 1 (regular)
>>>>>>>>>>>                 extent data disk byte 0 nr 0
>>>>>>>>>>>                 extent data offset 1937408 nr 4096 ram 4194304
>>>>>>>>>>>                 extent compression 0 (none)
>>>>>>>>>> Not a root item at all.
>>>>>>>>>> At least for this copy, it looks like the kernel got one completely bad
>>>>>>>>>> copy, then discarded it and found a good copy.
>>>>>>>>>>
>>>>>>>>>> That's very strange, especially when all the other involved ones seem
>>>>>>>>>> random, and all being at slot 32 is not a coincidence.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> === Block 44900352  ===
>>>>>>>>>>> btrfs ins dump-tree -b 44900352 /dev/dm-0
>>>>>>>>>>> btrfs-progs v5.6
>>>>>>>>>>> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
>>>>>>>>>>> leaf 44900352 flags 0x1(WRITTEN) backref revision 1
>>>>>>>>>>
>>>>>>>>>> This block doesn't even have a slot 32... It only has 19 items, thus slots
>>>>>>>>>> 0 ~ 18.
>>>>>>>>>> And its owner, FS_TREE, shouldn't have a ROOT_ITEM.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> === Block 55352561664 ===
>>>>>>>>>>> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
>>>>>>>>>>> btrfs-progs v5.6
>>>>>>>>>>> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
>>>>>>>>>>> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
>>>>>>>>>>> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>>>> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
>>>>>>>>>> ...
>>>>>>>>>>>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>>>>>>>>>>>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>>>>>>>>>>>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>>>>>>>>>>>                 drop key (0 UNKNOWN.0 0) level 0
>>>>>>>>>>
>>>>>>>>>> This looks like the offending tree block.
>>>>>>>>>> Slot 32, item size 239, which is a ROOT_ITEM, but with an invalid size.
>>>>>>>>>>
>>>>>>>>>> Since you're here, I guess a btrfs check without --repair on the
>>>>>>>>>> unmounted fs would help to identify the real damage.
>>>>>>>>>>
>>>>>>>>>> And again, the fs looks very damaged, it's highly recommended to backup
>>>>>>>>>> your data asap.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Qu
>>>>>>>>>>
>>>>>>>>>>> --- snap ---
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/6/3 下午9:37, Thorsten Rehm wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've updated my system (Debian testing) [1] several months ago (~
>>>>>>>>>>>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>>>>>>>>>>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>>>>>>>>>>>> applications were terminated after some uptime, due to the btrfs
>>>>>>>>>>>>> filesystem errors. This was with kernel 5.3.
>>>>>>>>>>>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>>>>>>>>>>>> release and with this kernel there aren't any corrupt leaf messages
>>>>>>>>>>>>> and the problem is gone. IMHO, it must be something coming with kernel
>>>>>>>>>>>>> 5.3 (or 5.x).
>>>>>>>>>>>>
>>>>>>>>>>>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>>>>>>>>>>>> such *obviously* wrong metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My harddisk is a SSD which is responsible for the root partition. I've
>>>>>>>>>>>>> encrypted my filesystem with LUKS and just right after I entered my
>>>>>>>>>>>>> password at the boot, the first corrupt leaf errors appear.
>>>>>>>>>>>>>
>>>>>>>>>>>>> An error message looks like this:
>>>>>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>
>>>>>>>>>>>> Btrfs root items have fixed size. This is already something very bad.
>>>>>>>>>>>>
>>>>>>>>>>>> Furthermore, the item size is smaller than expected, which means we can
>>>>>>>>>>>> easily get garbage. I'm a little surprised that older kernel can even
>>>>>>>>>>>> work without crashing the whole kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> Some extra info could help us to find out how badly the fs is corrupted.
>>>>>>>>>>>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>>>>>>>>>>>> error line. Only the block number changes.
>>>>>>>>>>>>
>>>>>>>>>>>> And dumps for the other block numbers too.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>>>>>>>>>>>> contacted the reporter, but he didn't have a solution for me, because
>>>>>>>>>>>>> he changed to a different filesystem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>>>>>>>>>>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>>>>>>>>>>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>>>>>>>>>>>> again, w/o any errors.
>>>>>>>>>>>>
>>>>>>>>>>>> This doesn't look like a bit flip, so not RAM problems.
>>>>>>>>>>>>
>>>>>>>>>>>> Don't have any better advice until we got the dumps, but I'd recommend
>>>>>>>>>>>> to backup your data since it's still possible.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Qu
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I have no more ideas what I can do. Could you please help me to
>>>>>>>>>>>>> investigate this further? Could it be a bug?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Thorsten
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1:
>>>>>>>>>>>>> $ cat /etc/debian_version
>>>>>>>>>>>>> bullseye/sid
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ uname -a
>>>>>>>>>>>>> [no problem with this kernel]
>>>>>>>>>>>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ btrfs --version
>>>>>>>>>>>>> btrfs-progs v5.6
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ sudo btrfs fi show
>>>>>>>>>>>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>>>>>>>>>>>         Total devices 1 FS bytes used 7.33GiB
>>>>>>>>>>>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ btrfs fi df /
>>>>>>>>>>>>> Data, single: total=22.01GiB, used=7.16GiB
>>>>>>>>>>>>> System, DUP: total=32.00MiB, used=4.00KiB
>>>>>>>>>>>>> System, single: total=4.00MiB, used=0.00B
>>>>>>>>>>>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>>>>>>>>>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>>>>>>>>>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2:
>>>>>>>>>>>>> [several messages per second]
>>>>>>>>>>>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>>>>>>>>>>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>>>>>>>>>>>> size, have 239 expect 439
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3:
>>>>>>>>>>>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@web.de/
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>
>>>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-20 13:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-03 13:37 corrupt leaf; invalid root item size Thorsten Rehm
2020-06-04  1:30 ` Qu Wenruo
2020-06-04  9:45   ` Thorsten Rehm
2020-06-04 10:00     ` Qu Wenruo
2020-06-04 10:52       ` Thorsten Rehm
2020-06-04 12:06         ` Qu Wenruo
2020-06-04 17:57           ` Thorsten Rehm
2020-06-08 13:25             ` Thorsten Rehm
2020-06-08 13:29               ` Qu Wenruo
2020-06-08 14:41                 ` Thorsten Rehm
2020-06-12  6:50                   ` Qu Wenruo
2020-06-16  5:41                     ` Thorsten Rehm
2020-11-20 13:17                       ` Thorsten Rehm
2020-11-20 13:47                         ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).