linux-btrfs.vger.kernel.org archive mirror
* corrupt leaf: root=1 block=57567265079296 slot=83, bad key order
@ 2019-02-14 11:58 Jesper Utoft
  2019-02-14 12:25 ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Jesper Utoft @ 2019-02-14 11:58 UTC (permalink / raw)
  To: linux-btrfs

Hello Fellow BTRFS users.

I have run into the bad key order issue.
corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev
(18446744073709551605 0 57707594776576) current (18446726481523507189
0 57709742260224)
The line repeats over and over.
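
For readers unfamiliar with the message: btrfs keeps the keys in a leaf sorted by (objectid, type, offset), compared in that order, and "bad key order" means a key is not strictly greater than the one before it. A minimal Python sketch of that comparison, using the two keys from the log line above; the BtrfsKey class and keys_in_order helper are illustrative names, not part of any btrfs tool:

```python
from typing import NamedTuple


class BtrfsKey(NamedTuple):
    """Hypothetical stand-in for a btrfs key: compared field by field."""
    objectid: int
    type: int
    offset: int


def keys_in_order(a: BtrfsKey, b: BtrfsKey) -> bool:
    # Tuples compare lexicographically: objectid first, then type, then offset.
    return a < b


# Values copied from the "corrupt leaf" message in this thread.
prev = BtrfsKey(18446744073709551605, 0, 57707594776576)
current = BtrfsKey(18446726481523507189, 0, 57709742260224)

print(keys_in_order(prev, current))  # False: current has the smaller objectid
```

The offset of the current key is larger than that of prev, but the objectid is compared first, so the pair is out of order.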

I read a thread between Hugo Mills and Eric Wolf about a similar issue
and I have gathered the same info.

I understand that it is probably hardware related; I have been running
memtest for 60h+ to see if I could reproduce it.
I also tried to run btrfs check --recover but it did not help.

My question is whether it can be fixed. It looks like all the surrounding
blocks are empty space nodes, so could I reset the empty space using
the mount option clear_cache?
If I have to copy the files out, can I use btrfs send/receive, or
should I copy the files using the ordinary cp command?
I hope you can help me; thanks in advance. Below are log snippets etc.
that I hope can help with the debugging.

Failing OS: Ubuntu 16.04.5 LTS with the latest updates.

The rest is done from a liveOS Ubuntu 18.10.
Details:
ubuntu@ubuntu:/isodevice$ sudo uname -a
Linux ubuntu 4.18.0-10-generic #11-Ubuntu SMP Thu Oct 11 15:13:55 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:/isodevice$ sudo btrfs --version
btrfs-progs v4.16.1
ubuntu@ubuntu:/isodevice$ sudo btrfs fi show
Label: none  uuid: 90900a21-8a71-4301-b5c2-21dea31f1132
    Total devices 4 FS bytes used 6.61TiB
    devid    1 size 2.72TiB used 2.30TiB path /dev/sdc4
    devid    2 size 2.72TiB used 2.30TiB path /dev/sde4
    devid    3 size 2.72TiB used 2.30TiB path /dev/sda4
    devid    4 size 5.45TiB used 4.96TiB path /dev/sdb1

ubuntu@ubuntu:/isodevice$ btrfs fi df /mnt
Data, RAID1: total=4.36TiB, used=4.35TiB
Data, RAID5: total=2.25TiB, used=2.23TiB
System, RAID1: total=64.00MiB, used=832.00KiB
Metadata, RAID1: total=33.00GiB, used=31.43GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

DMESG:
[  451.381275] BTRFS info (device sdc4): disk space caching is enabled
[  451.531224] BTRFS info (device sdc4): bdev /dev/sdc4 errs: wr 0, rd 0, flush 0, corrupt 0, gen 19
[  451.531239] BTRFS info (device sdc4): bdev /dev/sde4 errs: wr 0, rd 0, flush 0, corrupt 0, gen 3
[  451.531249] BTRFS info (device sdc4): bdev /dev/sda4 errs: wr 0, rd 15, flush 0, corrupt 0, gen 16
[  493.187544] BTRFS info (device sdc4): checking UUID tree
[  550.070483] BTRFS critical (device sdc4): corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev (18446744073709551605 0 57707594776576) current (18446726481523507189 0 57709742260224)
[  550.070569] BTRFS warning (device sdc4): btrfs_uuid_scan_kthread failed -5

Debug info:
sudo btrfs inspect-internal dump-tree -b 57567265079296 --follow /dev/sda4
btrfs-progs v4.16.1
leaf 57567265079296 items 246 free space 47 generation 7325984 owner ROOT_TREE
leaf 57567265079296 flags 0x1(WRITTEN) backref revision 1
fs uuid 90900a21-8a71-4301-b5c2-21dea31f1132
chunk uuid b3eed334-68af-43e6-ae1e-c05e504f30cf
    item 0 key (FREE_SPACE UNTYPED 57557270921216) itemoff 16242 itemsize 41
        location key (74274 INODE_ITEM 0)
        cache generation 7325984 entries 308 bitmaps 8
    item 1 key (FREE_SPACE UNTYPED 57558344663040) itemoff 16201 itemsize 41
        location key (74275 INODE_ITEM 0)
        cache generation 7325984 entries 338 bitmaps 8
... [snip] ...
    item 81 key (FREE_SPACE UNTYPED 57705447292928) itemoff 12921 itemsize 41
        location key (74392 INODE_ITEM 0)
        cache generation 2530576 entries 0 bitmaps 0
    item 82 key (FREE_SPACE UNTYPED 57707594776576) itemoff 12880 itemsize 41
        location key (74393 INODE_ITEM 0)
        cache generation 2530576 entries 0 bitmaps 0
    item 83 key (18446726481523507189 UNKNOWN.0 57709742260224) itemoff 12839 itemsize 41
    item 84 key (FREE_SPACE UNTYPED 57711889743872) itemoff 12798 itemsize 41
        location key (74395 INODE_ITEM 0)
        cache generation 2530576 entries 0 bitmaps 0
    item 85 key (FREE_SPACE UNTYPED 57714037227520) itemoff 12757 itemsize 41
        location key (74396 INODE_ITEM 0)
        cache generation 5542000 entries 0 bitmaps 0
... [snip] ...

If I read the blocks for slots 83, 84 and 82, I get:
ubuntu@ubuntu:/isodevice$ sudo btrfs inspect-internal dump-tree -b 57709742260224 --follow /dev/sda4
btrfs-progs v4.16.1
checksum verify failed on 57709742260224 found 957DA941 wanted 46464952
checksum verify failed on 57709742260224 found 957DA941 wanted 46464952
checksum verify failed on 57709742260224 found 1A43BFC8 wanted F8AC1E60
checksum verify failed on 57709742260224 found 1A43BFC8 wanted F8AC1E60
bytenr mismatch, want=57709742260224, have=41175219756947292
ERROR: failed to read 57709742260224
ubuntu@ubuntu:/isodevice$ sudo btrfs inspect-internal dump-tree -b 57711889743872 --follow /dev/sda4
btrfs-progs v4.16.1
checksum verify failed on 57711889743872 found 7C7F3E7E wanted D95F6073
checksum verify failed on 57711889743872 found 7C7F3E7E wanted D95F6073
checksum verify failed on 57711889743872 found 15984D09 wanted 72B7E0E4
checksum verify failed on 57711889743872 found 7C7F3E7E wanted D95F6073
bytenr mismatch, want=57711889743872, have=3894064465997382214
ERROR: failed to read 57711889743872
ubuntu@ubuntu:/isodevice$ sudo btrfs inspect-internal dump-tree -b 57707594776576 --follow /dev/sda4
btrfs-progs v4.16.1
checksum verify failed on 57707594776576 found FD8DDF70 wanted B48A9029
checksum verify failed on 57707594776576 found FD8DDF70 wanted B48A9029
checksum verify failed on 57707594776576 found F7A75766 wanted 563355DA
checksum verify failed on 57707594776576 found F7A75766 wanted 563355DA
bytenr mismatch, want=57707594776576, have=14084120062297221284
ERROR: failed to read 57707594776576


* Re: corrupt leaf: root=1 block=57567265079296 slot=83, bad key order
  2019-02-14 11:58 corrupt leaf: root=1 block=57567265079296 slot=83, bad key order Jesper Utoft
@ 2019-02-14 12:25 ` Qu Wenruo
  2019-02-14 12:35   ` Hugo Mills
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2019-02-14 12:25 UTC (permalink / raw)
  To: Jesper Utoft, linux-btrfs


On 2019/2/14 7:58 PM, Jesper Utoft wrote:
> Hello Fellow BTRFS users.
> 
> I have run into the bad key order issue.
> corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev
> (18446744073709551605 0 57707594776576) current (18446726481523507189
> 0 57709742260224)
> The lines repeats over and over..
> 
> I read a thread between Hugo Mills and Eric Wolf about a similar issue
> and i have gathered the same info. 
Now we have all the needed info.

> 
> I understand that it probably is hardware related, i have been running
> memtest for 60h+ to see if i could reproduce it.
> I also tried to run btrfs check --recover but it did not help.
> 
> My questions is if it can be fixed?

Yes, but for now only manual patching is possible.

[snip]
> [  550.070483] BTRFS critical (device sdc4): corrupt leaf: root=1
> block=57567265079296 slot=83, bad key order, prev
> (18446744073709551605 0 57707594776576) current (18446726481523507189
> 0 57709742260224)
> [  550.070569] BTRFS warning (device sdc4): btrfs_uuid_scan_kthread failed -5
> 
> Debug info:
> sudo btrfs inspect-internal dump-tree -b 57567265079296 --follow /dev/sda4
> btrfs-progs v4.16.1
> leaf 57567265079296 items 246 free space 47 generation 7325984 owner ROOT_TREE
> leaf 57567265079296 flags 0x1(WRITTEN) backref revision 1
> fs uuid 90900a21-8a71-4301-b5c2-21dea31f1132
> chunk uuid b3eed334-68af-43e6-ae1e-c05e504f30cf
>     item 0 key (FREE_SPACE UNTYPED 57557270921216) itemoff 16242 itemsize 41
>         location key (74274 INODE_ITEM 0)
>         cache generation 7325984 entries 308 bitmaps 8
>     item 1 key (FREE_SPACE UNTYPED 57558344663040) itemoff 16201 itemsize 41
>         location key (74275 INODE_ITEM 0)
>         cache generation 7325984 entries 338 bitmaps 8
> ... [snip] ...
>     item 81 key (FREE_SPACE UNTYPED 57705447292928) itemoff 12921 itemsize 41
>         location key (74392 INODE_ITEM 0)
>         cache generation 2530576 entries 0 bitmaps 0
>     item 82 key (FREE_SPACE UNTYPED 57707594776576) itemoff 12880 itemsize 41
>         location key (74393 INODE_ITEM 0)
>         cache generation 2530576 entries 0 bitmaps 0
>     item 83 key (18446726481523507189 UNKNOWN.0 57709742260224) itemoff 12839 itemsize 41
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Thankfully, all keys around give us a pretty good idea what the original
value should be: (FREE_SPACE UNTYPED 57709742260224).
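
The "keys around" argument can be checked against the numbers in the dump. A small Python sketch, using only the slots 81 to 85 shown above, that prints the step between neighbouring key offsets and between neighbouring itemoff values; the deltas helper is an illustrative name, and reading the regular spacing as evidence that only the objectid of slot 83 is damaged is an interpretation, not something the tools report:

```python
# (slot, key offset, itemoff) copied from the dump-tree output in this thread.
items = [
    (81, 57705447292928, 12921),
    (82, 57707594776576, 12880),
    (83, 57709742260224, 12839),
    (84, 57711889743872, 12798),
    (85, 57714037227520, 12757),
]
ITEM_SIZE = 41  # itemsize reported for every item shown in the dump


def deltas(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


offset_steps = deltas([offset for _, offset, _ in items])
itemoff_steps = deltas([itemoff for _, _, itemoff in items])

print(offset_steps)   # [2147483648, 2147483648, 2147483648, 2147483648]
print(itemoff_steps)  # [-41, -41, -41, -41]
```

Every key offset is exactly 2 GiB after the previous one and every itemoff moves down by one itemsize, slot 83 included, so its offset and item layout fit the pattern of its FREE_SPACE neighbours.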

And for the raw value:
bad:  0xffffeffffffffff5
good: 0xfffffffffffffff5
            ^
e->f, one bit got flipped.
(UNTYPED is the same value as UNKNOWN.0, so don't worry about that.)
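
The single-bit difference can be confirmed by XOR-ing the two raw values. A short Python sketch, assuming the good value is the FREE_SPACE objectid as printed in the log (18446744073709551605, that is 2^64 - 11); the constant and variable names are illustrative:

```python
BAD = 18446726481523507189   # objectid found in slot 83
GOOD = 0xFFFFFFFFFFFFFFF5    # (1 << 64) - 11, as seen in the neighbouring keys

# XOR leaves a 1 only in the bit positions where the two values differ.
diff = BAD ^ GOOD
flipped_bits = [bit for bit in range(64) if (diff >> bit) & 1]

print(hex(BAD))       # 0xffffeffffffffff5
print(hex(GOOD))      # 0xfffffffffffffff5
print(flipped_bits)   # [44]
```

Exactly one bit differs (bit 44, counting from 0 at the least significant end), which is the e->f nibble marked above.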

I have created a special branch for you:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix

Just compile that btrfs-progs, no need to install, then execute the
following command inside the btrfs-progs directory:

# ./btrfs-corrupt-block -X <device>

And your report just reminded me to update the write-time tree block
checker....

Thanks,
Qu

>     item 84 key (FREE_SPACE UNTYPED 57711889743872) itemoff 12798 itemsize 41
>         location key (74395 INODE_ITEM 0)
>         cache generation 2530576 entries 0 bitmaps 0
>     item 85 key (FREE_SPACE UNTYPED 57714037227520) itemoff 12757 itemsize 41
>         location key (74396 INODE_ITEM 0)
>         cache generation 5542000 entries 0 bitmaps 0
> ... [snip] ...
> 




* Re: corrupt leaf: root=1 block=57567265079296 slot=83, bad key order
  2019-02-14 12:25 ` Qu Wenruo
@ 2019-02-14 12:35   ` Hugo Mills
  2019-02-14 12:39     ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Hugo Mills @ 2019-02-14 12:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Jesper Utoft, linux-btrfs, David Sterba


On Thu, Feb 14, 2019 at 08:25:26PM +0800, Qu Wenruo wrote:
> On 2019/2/14 7:58 PM, Jesper Utoft wrote:
> > Hello Fellow BTRFS users.
> > 
> > I have run into the bad key order issue.
> > corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev
> > (18446744073709551605 0 57707594776576) current (18446726481523507189
> > 0 57709742260224)
> > The lines repeats over and over..
> > 
> > I read a thread between Hugo Mills and Eric Wolf about a similar issue
> > and i have gathered the same info. 
> Now we have all the needed info.
> 
> > 
> > I understand that it probably is hardware related, i have been running
> > memtest for 60h+ to see if i could reproduce it.
> > I also tried to run btrfs check --recover but it did not help.
> > 
> > My questions is if it can be fixed?
> 
> Yes, but only manual patching is possible yet.

   David: What needs to be done to get the bitflip-in-key patches
added to btrfs check? They've been lurking in some patch stack for
literally years, and would have dealt with this one easily.

[snip]
> Thankfully, all keys around give us a pretty good idea what the original
> value should be: (FREE_SPACE UNTYPED 57709742260224).
> 
> And for the raw value:
> bad:  0xffffeffffffffff5
> good: 0xfffffffffffffff5
>             ^
> e->f, one bit get flipped.
> (UNTYPED is the same value for UNKNOWN.0, so don't worry about that).
> 
> I have created a special branch for you:
> https://github.com/adam900710/btrfs-progs/tree/dirty_fix
> 
> Just compile that btrfs-progs, no need to install, then excute the
> following command inside btrfs-progs directory:
> 
> # ./btrfs-corrupt-block -X <device>

   BUT, don't do it until you've found and replaced the bad RAM that
broke it in the first place.

> And your report just remind me to update the write time tree block
> checker....

   Looking forward to dealing with a whole new type of "btrfs is
broken!" complaints on IRC (followed by "can't I just let it carry on
regardless?"). ;)

   Hugo.

-- 
Hugo Mills             | Hickory Dickory Dock,
hugo@... carfax.org.uk | Three mice ran up the clock.
http://carfax.org.uk/  | The clock struck one,
PGP: E2AB1DE4          | The other two escaped with minor injuries



* Re: corrupt leaf: root=1 block=57567265079296 slot=83, bad key order
  2019-02-14 12:35   ` Hugo Mills
@ 2019-02-14 12:39     ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-02-14 12:39 UTC (permalink / raw)
  To: Hugo Mills, Jesper Utoft, linux-btrfs, David Sterba


On 2019/2/14 8:35 PM, Hugo Mills wrote:
> On Thu, Feb 14, 2019 at 08:25:26PM +0800, Qu Wenruo wrote:
>> On 2019/2/14 7:58 PM, Jesper Utoft wrote:
>>> Hello Fellow BTRFS users.
>>>
>>> I have run into the bad key order issue.
>>> corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev
>>> (18446744073709551605 0 57707594776576) current (18446726481523507189
>>> 0 57709742260224)
>>> The lines repeats over and over..
>>>
>>> I read a thread between Hugo Mills and Eric Wolf about a similar issue
>>> and i have gathered the same info. 
>> Now we have all the needed info.
>>
>>>
>>> I understand that it probably is hardware related, i have been running
>>> memtest for 60h+ to see if i could reproduce it.
>>> I also tried to run btrfs check --recover but it did not help.
>>>
>>> My questions is if it can be fixed?
>>
>> Yes, but only manual patching is possible yet.
> 
>    David: What needs to be done to get the bitflip-in-key patches
> added to btrfs check? They've been lurking in some patch stack for
> literally years, and would have dealt with this one easily.

It's not David's fault, it's all my fault.

I just forgot that I still need to update that patchset (a big vacation just
ended on my side).
There are still valid comments on that patchset.

I'll update that patchset in the coming days.

> 
> [snip]
>> Thankfully, all keys around give us a pretty good idea what the original
>> value should be: (FREE_SPACE UNTYPED 57709742260224).
>>
>> And for the raw value:
>> bad:  0xffffeffffffffff5
>> good: 0xfffffffffffffff5
>>             ^
>> e->f, one bit get flipped.
>> (UNTYPED is the same value for UNKNOWN.0, so don't worry about that).
>>
>> I have created a special branch for you:
>> https://github.com/adam900710/btrfs-progs/tree/dirty_fix
>>
>> Just compile that btrfs-progs, no need to install, then excute the
>> following command inside btrfs-progs directory:
>>
>> # ./btrfs-corrupt-block -X <device>
> 
>    BUT, don't do it until you've found and replaced the bad RAM that
> broke it in the first place.

Sure.

> 
>> And your report just remind me to update the write time tree block
>> checker....
> 
>    Looking forward to dealing with a whole new type of "btrfs is
> broken!" complaints on IRC (followed by "can't I just let it carry on
> regardless?"). ;)

Definitely :-P.

Thanks,
Qu

> 
>    Hugo.
> 




* Re: corrupt leaf: root=1 block=57567265079296 slot=83, bad key order
@ 2019-02-19 19:45 Jesper Utoft
  0 siblings, 0 replies; 5+ messages in thread
From: Jesper Utoft @ 2019-02-19 19:45 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, David Sterba, Hugo Mills

> On Thu, Feb 14, 2019 at 08:25:26PM +0800, Qu Wenruo wrote:
>> On 2019/2/14 7:58 PM, Jesper Utoft wrote:
>>> Hello Fellow BTRFS users.
>>>
>>> I have run into the bad key order issue.
>>> corrupt leaf: root=1 block=57567265079296 slot=83, bad key order, prev
>>> (18446744073709551605 0 57707594776576) current (18446726481523507189
>>> 0 57709742260224)
>>> The lines repeats over and over..
>>>
>>> I read a thread between Hugo Mills and Eric Wolf about a similar issue
>>> and i have gathered the same info.
>> Now we have all the needed info.
>>
>>>
>>> I understand that it probably is hardware related, i have been running
>>> memtest for 60h+ to see if i could reproduce it.
>>> I also tried to run btrfs check --recover but it did not help.
>>>
>>> My questions is if it can be fixed?
>>
>> Yes, but only manual patching is possible yet.
>
>    David: What needs to be done to get the bitflip-in-key patches
> added to btrfs check? They've been lurking in some patch stack for
> literally years, and would have dealt with this one easily.

[snip]

>
> [snip]
>> Thankfully, all keys around give us a pretty good idea what the original
>> value should be: (FREE_SPACE UNTYPED 57709742260224).
>> value should be: (FREE_SPACE UNTYPED 57709742260224).
>>
>> And for the raw value:
>> bad:  0xffffeffffffffff5
>> good: 0xfffffffffffffff5
>>             ^
>> e->f, one bit get flipped.
>> (UNTYPED is the same value for UNKNOWN.0, so don't worry about that).
>>
>> I have created a special branch for you:
>> https://github.com/adam900710/btrfs-progs/tree/dirty_fix
>>
>> Just compile that btrfs-progs, no need to install, then excute the
>> following command inside btrfs-progs directory:
>>
>> # ./btrfs-corrupt-block -X <device>
>
>    BUT, don't do it until you've found and replaced the bad RAM that
> broke it in the first place.

I got the code to build and ran it as described above. I do not know
whether it worked and there were many other errors, or whether it failed
and just moved the issue elsewhere.
In any case I had a few subvolumes with missing files, so I have been
using send/receive on the subvolumes I can, and cp'ing the ones that I
could not.
Now it's running in a new btrfs volume on a new disk, and I will
probably use snapraid or a transfer of subvolumes between btrfs
filesystems for "backup" instead of RAID 1, which I expect would be
a saner approach anyway.

>
>> And your report just remind me to update the write time tree block
>> checker....
>
>    Looking forward to dealing with a whole new type of "btrfs is
> broken!" complaints on IRC (followed by "can't I just let it carry on
> regardless?"). ;)

Thank you all for the very quick assistance, especially for the "dirty
fix", even though the volume was too damaged to fix.

I will save money for a new hardware setup with ECC RAM; for now I
will have to hope for the best and keep taking regular backups.

Thanks
Jesper Utoft
PS: I'm not a member of the mailing list, so if you reply, please
reply to me directly as well.

