* BTRFS critical corrupt leaf bad key order
@ 2019-01-15 11:28 Leonard Lausen
  2019-01-15 11:48 ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Leonard Lausen @ 2019-01-15 11:28 UTC (permalink / raw)
  To: linux-btrfs

Hi everyone,

I just found my btrfs filesystem to be remounted read-only with the
following in my journalctl [1]:

  Jan 15 08:56:40 leonard-xps13 kernel: BTRFS critical (device dm-2): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:6831: errno=-5 IO failure
  Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): forced readonly
  Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2978: errno=-5 IO failure
  Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): delayed_refs has NO entry

Following Qu Wenruo's comment from 4th Sep 2018, I have generated the
following tree-dumps:

  sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
  sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424

The root dump is at https://termbin.com/lz0l and the block dump at
https://termbin.com/oev5 . The number 1350630375424 does not occur in
the root dump. The root dump has 16715 lines, the block dump only 645.

Would this imply that the corrupt tree block was not yet committed? What
actions do you recommend taking next?

My kernel version is 4.20.2. I am writing this email via ssh from the
affected system on some working server. Besides the error message above
and the fact that the filesystem is readonly, I have not yet found any
issues on the affected system. Note that the error was occurring under
high system load while compiling a bunch of software on a tmpfs (the
compilation was successful, but installation failed in the end due to
trying to copy to the by-then read-only btrfs root filesystem).

Does this suggest a hardware issue?

Thank you for your help and taking the time to read this.

Best regards
Leonard

[1]: For an unknown reason, the dmesg output does not reach back to the
time of the error, but only contains log messages from after the
filesystem was mounted ro.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 11:28 BTRFS critical corrupt leaf bad key order Leonard Lausen
@ 2019-01-15 11:48 ` Qu Wenruo
  2019-01-15 11:51   ` David Sterba
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Qu Wenruo @ 2019-01-15 11:48 UTC (permalink / raw)
  To: Leonard Lausen, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3323 bytes --]



On 2019/1/15 7:28 PM, Leonard Lausen wrote:
> Hi everyone,
> 
> I just found my btrfs filesystem to be remounted read-only with the
> following in my journalctl [1]:
> 
>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS critical (device dm-2): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)

Tree-checker catches the corrupted tree block, again and again.

>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:6831: errno=-5 IO failure
>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): forced readonly
>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2978: errno=-5 IO failure
>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): delayed_refs has NO entry
> 
> Following Qu Wenruo's comment from 4th Sep 2018, I have generated the
> following tree-dumps:
> 
>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
> 
> The root dump is at https://termbin.com/lz0l and the block dump at
> https://termbin.com/oev5 . The number 1350630375424 does not occur in
> the root dump. The root dump has 16715 lines, the block dump only 645.

Super nice move, it shows the corruption and the cause.

	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33

See how the key objectid of item 67 is way larger than that of items 66/68.

And furthermore, it indeed looks like bit rot:
0x18f19810000 (1714119835648)
0x98f19814000 (10510212874240)
0x18f19818000 (1714119868416)

See one bit got flipped.
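
To double-check the bit-flip theory, you can XOR the suspect objectid against the value it would need to be in order to sort between items 66 and 68; a single set bit in the result confirms a one-bit flip. A quick sketch in Python (the "expected" value here is inferred from the neighbouring keys, not read from the dump):

```python
item_66 = 0x18F19810000   # 1714119835648, from the dump
item_67 = 0x98F19814000   # 10510212874240, the out-of-order key
item_68 = 0x18F19818000   # 1714119868416, from the dump

expected = item_67 & ~(1 << 43)   # clear the suspect bit (bit 43)

diff = item_67 ^ expected
assert diff == 1 << 43                 # exactly one bit differs
assert bin(diff).count("1") == 1
assert item_66 < expected < item_68    # the repaired key sorts correctly
print(hex(expected))                   # prints 0x18f19814000
```

Clearing bit 43 of item 67's objectid yields 0x18f19814000, which sorts exactly between items 66 and 68, so a single flipped bit explains the whole corruption.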

I don't know whether it's corrupted in memory or on the SSD, although I tend to
believe it's caused by a memory bit flip.
But anyway, it can be fixed by patching the corrupted leaf manually.

I'm working on the fix.
Please make sure there are no writes to the fs (just in case, since the
fs should be RO).

And prepare a LiveUSB on which you could compile btrfs-progs (needs some
dependencies).

It shouldn't take me too long to craft the fix.

Thanks,
Qu


> 
> Would this imply that the corrupt tree block was not yet committed? What
> actions do you recommend taking next?
> 
> My kernel version is 4.20.2. I am writing this email via ssh from the
> affected system on some working server. Besides the error message above
> and the fact that the filesystem is readonly, I have not yet found any
> issues on the affected system. Note that the error was occurring under
> high system load while compiling a bunch of software on a tmpfs (the
> compilation was successful, but installation failed in the end due to
> trying to copy to the by-then read-only btrfs root filesystem).
> 
> Does this suggest a hardware issue?
> 
> Thank you for your help and taking the time to read this.
> 
> Best regards
> Leonard
> 
> [1]: For an unknown reason, the dmesg output does not reach back to the
> time of the error, but only contains log messages from after the
> filesystem was mounted ro.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 11:48 ` Qu Wenruo
@ 2019-01-15 11:51   ` David Sterba
  2019-01-15 12:17     ` Qu Wenruo
  2019-01-15 12:03   ` David Sterba
  2019-01-15 12:27   ` Qu Wenruo
  2 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2019-01-15 11:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Leonard Lausen, linux-btrfs

On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
> > following tree-dumps:
> > 
> >   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
> >   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
> > 
> > The root dump is at https://termbin.com/lz0l and the block dump at
> > https://termbin.com/oev5 . The number 1350630375424 does not occur in
> > the root dump. The root dump has 16715 lines, the block dump only 645.
> 
> Super nice move, it shows the corruption and the cause.
> 
> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
> 
> See how the key objectid of item 67 is way larger than that of items 66/68.
> 
> And furthermore, it indeed looks like bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
> 
> See one bit got flipped.
> 
> I don't know whether it's corrupted in memory or on the SSD, although I tend to
> believe it's caused by a memory bit flip.

Single bit flips are almost always caused by RAM, not storage (that
fails in larger blocks or does not even return any data)

> But anyway, it can be fixed by patching the corrupted leaf manually.

That will fix one instance of the corrupted key; without an analysis of how
far the wrong key got spread, it's still risky.


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 11:48 ` Qu Wenruo
  2019-01-15 11:51   ` David Sterba
@ 2019-01-15 12:03   ` David Sterba
  2019-01-15 12:22     ` Qu Wenruo
  2019-01-16  1:38     ` Chris Murphy
  2019-01-15 12:27   ` Qu Wenruo
  2 siblings, 2 replies; 12+ messages in thread
From: David Sterba @ 2019-01-15 12:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Leonard Lausen, linux-btrfs

On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
> Super nice move, it shows the corruption and the cause.
> 
> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33

The key order is the most frequent and also very reliable report of
memory bit flips. I think we should add an unconditional check before a
leaf or node is written so we catch such errors before the bad data hits
the disk.

This seems to happen way too often; I believe the check overhead would
be acceptable and at least give early warning.
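
For reference, the invariant such a check enforces is simply that the (objectid, type, offset) keys in a leaf are strictly increasing. A minimal sketch in Python, using the three items quoted above (the tuple form is illustrative, not the on-disk layout):

```python
def first_bad_slot(keys):
    """Return the first slot whose key is not strictly greater than its
    predecessor, or None if the leaf is ordered.  Keys are
    (objectid, type, offset) tuples compared lexicographically,
    mirroring btrfs key ordering."""
    for slot in range(1, len(keys)):
        if keys[slot] <= keys[slot - 1]:
            return slot
    return None

# Items 66-68 from the dump (key type 169 == METADATA_ITEM):
keys = [
    (1714119835648, 169, 0),
    (10510212874240, 169, 0),   # the bit-flipped key
    (1714119868416, 169, 0),
]
print(first_bad_slot(keys))  # prints 2
```

Run against the dumped items, this flags the relative slot 2 (item 68), matching the kernel's "bad key order" report at slot 68 with item 67 as prev.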


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 11:51   ` David Sterba
@ 2019-01-15 12:17     ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2019-01-15 12:17 UTC (permalink / raw)
  To: dsterba, Leonard Lausen, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1923 bytes --]



On 2019/1/15 7:51 PM, David Sterba wrote:
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>>> following tree-dumps:
>>>
>>>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>>>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>>>
>>> The root dump is at https://termbin.com/lz0l and the block dump at
>>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>>> the root dump. The root dump has 16715 lines, the block dump only 645.
>>
>> Super nice move, it shows the corruption and the cause.
>>
>> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
>> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
>> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>>
>> See how the key objectid of item 67 is way larger than that of items 66/68.
>>
>> And furthermore, it indeed looks like bit rot:
>> 0x18f19810000 (1714119835648)
>> 0x98f19814000 (10510212874240)
>> 0x18f19818000 (1714119868416)
>>
>> See one bit got flipped.
>>
>> I don't know whether it's corrupted in memory or on the SSD, although I tend to
>> believe it's caused by a memory bit flip.
> 
> Single bit flips are almost always caused by RAM, not storage (that
> fails in larger blocks or does not even return any data)

Yep. I don't really think a bit flip could sneak in without triggering
both the disk's internal checksum and the tree block csum.

> 
>> But anyway, it can be fixed by patching the corrupted leaf manually.
> 
> That will fix one instance of the corrupted key; without an analysis of how
> far the wrong key got spread, it's still risky.

Looking at the content of the culprit leaf, it doesn't look too
problematic.

I would recommend fixing it first, then doing a full btrfs check
--readonly, as with any repair routine.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 12:03   ` David Sterba
@ 2019-01-15 12:22     ` Qu Wenruo
  2019-01-15 12:28       ` Leonard Lausen
  2019-01-16  1:38     ` Chris Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2019-01-15 12:22 UTC (permalink / raw)
  To: dsterba, Leonard Lausen, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1391 bytes --]



On 2019/1/15 8:03 PM, David Sterba wrote:
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>> Super nice move, it shows the corruption and the cause.
>>
>> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
>> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
>> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
> 
> The key order is the most frequent and also very reliable report of
> memory bit flips. I think we should add an unconditional check before a
> leaf or node is written so we catch such errors before the bad data hits
> the disk.

I'm super happy about that, although I need to do some extra checks
before just removing that #ifdef/#endif pair.

> 
> This seems to happen way too often,

Right, but I don't know if it's some bad kernel driver poking the
memory, or really just a hardware memory bit flip.
(Especially on an ultrabook like the one the reporter is using,
soldered memory is really a pain.)

> I believe the check overhead would
> be acceptable and at least give early warning.

The problem is, the current check_leaf_relaxed() call happens too
frequently: it runs not at leaf write time, but on every
btrfs_mark_buffer_dirty().

It may cause some performance regression.

I need to look into a better location for such a check.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 11:48 ` Qu Wenruo
  2019-01-15 11:51   ` David Sterba
  2019-01-15 12:03   ` David Sterba
@ 2019-01-15 12:27   ` Qu Wenruo
  2019-01-15 15:17     ` Leonard Lausen
  2 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2019-01-15 12:27 UTC (permalink / raw)
  To: Leonard Lausen, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3935 bytes --]



On 2019/1/15 7:48 PM, Qu Wenruo wrote:
> 
> 
> On 2019/1/15 7:28 PM, Leonard Lausen wrote:
>> Hi everyone,
>>
>> I just found my btrfs filesystem to be remounted read-only with the
>> following in my journalctl [1]:
>>
>>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS critical (device dm-2): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
> 
> Tree-checker catches the corrupted tree block, again and again.
> 
>>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:6831: errno=-5 IO failure
>>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): forced readonly
>>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2978: errno=-5 IO failure
>>   Jan 15 08:56:40 leonard-xps13 kernel: BTRFS info (device dm-2): delayed_refs has NO entry
>>
>> Following Qu Wenruo's comment from 4th Sep 2018, I have generated the
>> following tree-dumps:
>>
>>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>>
>> The root dump is at https://termbin.com/lz0l and the block dump at
>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>> the root dump. The root dump has 16715 lines, the block dump only 645.
> 
> Super nice move, it shows the corruption and the cause.
> 
> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
> 
> See how the key objectid of item 67 is way larger than that of items 66/68.
> 
> And furthermore, it indeed looks like bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
> 
> See one bit got flipped.
> 
> I don't know whether it's corrupted in memory or on the SSD, although I tend to
> believe it's caused by a memory bit flip.
> But anyway, it can be fixed by patching the corrupted leaf manually.
> 
> I'm working on the fix.
> Please make sure there are no writes to the fs (just in case, since the
> fs should be RO).

Here it is:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_leonard_lausen

You need to git checkout the branch, and then compile.
(No need to install)

Then inside the directory, execute:
# ./btrfs-corrupt-block -X <device>

It will try to locate the corrupted leaf using the dump-tree result.
If it doesn't find the corrupted leaf, or the content isn't as expected, it
will just exit without writing anything.

Thanks,
Qu

> 
> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
> dependencies).
> 
> It shouldn't take me too long to craft the fix.
> 
> Thanks,
> Qu
> 
> 
>>
>> Would this imply that the corrupt tree block was not yet committed? What
>> actions do you recommend taking next?
>>
>> My kernel version is 4.20.2. I am writing this email via ssh from the
>> affected system on some working server. Besides the error message above
>> and the fact that the filesystem is readonly, I have not yet found any
>> issues on the affected system. Note that the error was occurring under
>> high system load while compiling a bunch of software on a tmpfs (the
>> compilation was successful, but installation failed in the end due to
>> trying to copy to the by-then read-only btrfs root filesystem).
>>
>> Does this suggest a hardware issue?
>>
>> Thank you for your help and taking the time to read this.
>>
>> Best regards
>> Leonard
>>
>> [1]: For an unknown reason, the dmesg output does not reach back to the
>> time of the error, but only contains log messages from after the
>> filesystem was mounted ro.
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 12:22     ` Qu Wenruo
@ 2019-01-15 12:28       ` Leonard Lausen
  2019-01-15 12:31         ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Leonard Lausen @ 2019-01-15 12:28 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, linux-btrfs


Thanks Qu and David for your prompt attention!

Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
>> following tree-dumps:
>> 
>>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>> 
>> The root dump is at https://termbin.com/lz0l and the block dump at
>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>> the root dump. The root dump has 16715 lines, the block dump only 645.
>
> Super nice move, it shows the corruption and the cause.
>
> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> See how the key objectid of item 67 is way larger than that of items 66/68.
>
> And furthermore, it indeed looks like bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
>
> See one bit got flipped.

Thanks for the explanation!

> I don't know whether it's corrupted in memory or on the SSD, although I tend to
> believe it's caused by a memory bit flip.
> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> I'm working on the fix.
> Please make sure there are no writes to the fs (just in case, since the
> fs should be RO).
>
> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
> dependencies).
>
> It shouldn't take me too long to craft the fix.

Thanks Qu! I see that the Arch Linux LiveUSB is based on Linux 4.20.0, but
4.20.1 contains some btrfs fixes. Should I make sure to be at least on
4.20.1 for this?

David Sterba <dsterba@suse.cz> writes:
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>> See how the key objectid of item 67 is way larger than that of items 66/68.
>> 
>> And furthermore, it indeed looks like bit rot:
>> 0x18f19810000 (1714119835648)
>> 0x98f19814000 (10510212874240)
>> 0x18f19818000 (1714119868416)
>> 
>> See one bit got flipped.

>> I don't know whether it's corrupted in memory or on the SSD, although I tend to
>> believe it's caused by a memory bit flip.
>
> Single bit flips are almost always caused by RAM, not storage (that
> fails in larger blocks or does not even return any data)
>> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> That will fix one instance of the corrupted key; without an analysis of how
> far the wrong key got spread, it's still risky.

How could I analyse this?


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 12:28       ` Leonard Lausen
@ 2019-01-15 12:31         ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2019-01-15 12:31 UTC (permalink / raw)
  To: Leonard Lausen, dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2973 bytes --]



On 2019/1/15 8:28 PM, Leonard Lausen wrote:
> 
> Thanks Qu and David for your prompt attention!
> 
> Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
>>> following tree-dumps:
>>>
>>>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>>>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>>>
>>> The root dump is at https://termbin.com/lz0l and the block dump at
>>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>>> the root dump. The root dump has 16715 lines, the block dump only 645.
>>
>> Super nice move, it shows the corruption and the cause.
>>
>> 	item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
>> 	item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
>> 	item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>>
>> See how the key objectid of item 67 is way larger than that of items 66/68.
>>
>> And furthermore, it indeed looks like bit rot:
>> 0x18f19810000 (1714119835648)
>> 0x98f19814000 (10510212874240)
>> 0x18f19818000 (1714119868416)
>>
>> See one bit got flipped.
> 
> Thanks for the explanation!
> 
>> I don't know whether it's corrupted in memory or on the SSD, although I tend to
>> believe it's caused by a memory bit flip.
>> But anyway, it can be fixed by patching the corrupted leaf manually.
>>
>> I'm working on the fix.
>> Please make sure there are no writes to the fs (just in case, since the
>> fs should be RO).
>>
>> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
>> dependencies).
>>
>> It shouldn't take me too long to craft the fix.
> 
> Thanks Qu! I see that the Arch Linux LiveUSB is based on Linux 4.20.0, but
> 4.20.1 contains some btrfs fixes. Should I make sure to be at least on
> 4.20.1 for this?

You won't even need to try to mount the fs, so the kernel version doesn't
matter here.

BTW, the Arch Linux ISO is really nice as a LiveUSB; the dependencies you
need can be found by checking the PKGBUILD of btrfs-progs.

Thanks,
Qu
> 
> David Sterba <dsterba@suse.cz> writes:
>> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>>> See how the key objectid of item 67 is way larger than that of items 66/68.
>>>
>>> And furthermore, it indeed looks like bit rot:
>>> 0x18f19810000 (1714119835648)
>>> 0x98f19814000 (10510212874240)
>>> 0x18f19818000 (1714119868416)
>>>
>>> See one bit got flipped.
> 
>>> I don't know whether it's corrupted in memory or on the SSD, although I tend to
>>> believe it's caused by a memory bit flip.
>>
>> Single bit flips are almost always caused by RAM, not storage (that
>> fails in larger blocks or does not even return any data)
>>> But anyway, it can be fixed by patching the corrupted leaf manually.
>>
>> That will fix one instance of the corrupted key; without an analysis of how
>> far the wrong key got spread, it's still risky.
> 
> How could I analyse this?
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 12:27   ` Qu Wenruo
@ 2019-01-15 15:17     ` Leonard Lausen
  0 siblings, 0 replies; 12+ messages in thread
From: Leonard Lausen @ 2019-01-15 15:17 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
>> I'm working on the fix.
>> Please make sure there are no writes to the fs (just in case, since the
>> fs should be RO).
>
> Here it is:
> https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_leonard_lausen
>
> You need to git checkout the branch, and then compile.
> (No need to install)
>
> Then inside the directory, execute:
> # ./btrfs-corrupt-block -X <device>
>
> It will try to locate the corrupted leaf using the dump-tree result.
> If it doesn't find the corrupted leaf, or the content isn't as expected, it
> will just exit without writing anything.

Thanks Qu for the quick fix! After running a BIOS system check to
verify that there was no persistent hardware problem, I applied your fix,
which resolved the corruption issue. According to btrfs check there are no
remaining issues, and the system booted up fine.

  root@archiso ~/btrfs-progs (git)-[dirty_fix_for_leonard_lausen] # ./btrfs check -p --readonly /dev/mapper/vg1-root
  Opening filesystem to check...
  Checking filesystem on /dev/mapper/vg1-root
  UUID: ea519c2e-3571-46f8-905a-99c824327caa
  [1/7] checking root items                      (0:00:24 elapsed, 7398633 items checked)
  [2/7] checking extents                         (0:03:07 elapsed, 578278 items checked)
  [3/7] checking free space cache                (0:00:13 elapsed, 814 items checked)
  [4/7] checking fs roots                        (0:07:07 elapsed, 464005 items checked)
  [5/7] checking csums (without verifying data)  (0:00:19 elapsed, 1827711 items checked)
  [6/7] checking root refs                       (0:00:00 elapsed, 277 items checked)
  [7/7] checking quota groups skipped (not enabled on this FS)
  found 848209637510 bytes used, no error found
  total csum bytes: 785717792
  total tree bytes: 9469886464
  total fs tree bytes: 7635271680
  total extent tree bytes: 848330752
  btree space waste bytes: 1936310259
  file data blocks allocated: 5230984867840
   referenced 1031230742528
  ./btrfs check -p --readonly /dev/mapper/vg1-root  374.85s user 45.16s system 62% cpu 11:11.17 total


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-15 12:03   ` David Sterba
  2019-01-15 12:22     ` Qu Wenruo
@ 2019-01-16  1:38     ` Chris Murphy
  2019-01-16  1:52       ` Qu Wenruo
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2019-01-16  1:38 UTC (permalink / raw)
  To: David Sterba, Qu Wenruo, Btrfs BTRFS

On Tue, Jan 15, 2019 at 5:04 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
> > Super nice move, it shows the corruption and the cause.
> >
> >       item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> >       item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> >       item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> The key order is the most frequent and also very reliable report of
> memory bit flips. I think we should add an unconditional check before a
> leaf or node is written so we catch such errors before the bad data hits
> the disk.
>
> This seems to happen way too often; I believe the check overhead would
> be acceptable and at least give early warning.

What about out of tree or proprietary modules tainting the kernel? Or
other corruptions we see that aren't key order related, like the
several recent "unable to find ref byte" reports? Are these memory
corruption related, or are they non-Btrfs bugs causing such
corruption? Does it make any sense for users who are running
proprietary or out of tree kernels to run with slub_debug=F or even
FZP and possibly get a better idea what category the corruption is in?

I guess what I'm getting at is, users get a corrupt file system, they
can't repair it (honestly the tools are not good enough, and aren't
user friendly), so we tell them OK just start over with a new file
system. It would be better if there's some additional advice to give
them to try and find out what caused the corruption to begin with,
rather than just start over and maybe run into the same problem again.


-- 
Chris Murphy


* Re: BTRFS critical corrupt leaf bad key order
  2019-01-16  1:38     ` Chris Murphy
@ 2019-01-16  1:52       ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2019-01-16  1:52 UTC (permalink / raw)
  To: Chris Murphy, David Sterba, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2425 bytes --]



On 2019/1/16 9:38 AM, Chris Murphy wrote:
> On Tue, Jan 15, 2019 at 5:04 AM David Sterba <dsterba@suse.cz> wrote:
>>
>> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>>> Super nice move, it shows the corruption and the cause.
>>>
>>>       item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
>>>       item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
>>>       item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>>
>> The key order is the most frequent and also very reliable report of
>> memory bit flips. I think we should add an unconditional check before a
>> leaf or node is written so we catch such errors before the bad data hits
>> the disk.
>>
>> This seems to happen way too often; I believe the check overhead would
>> be acceptable and at least give early warning.
> 
> What about out of tree or proprietary modules tainting the kernel?

The XPS 13 has no dedicated GPU on board, so no NVidia bullsh*t.
And I don't really think it's proprietary modules.


> Or
> other corruptions we see that aren't key order related, like the
> several recent "unable to find ref byte" reports?

I'm not super clear on extent tree corruption, but I really don't think
they are the same bug.

> Are these memory
> corruption related, or are they non-Btrfs bugs causing such
> corruption? Does it make any sense for users who are running
> proprietary or out of tree kernels to run with slub_debug=F or even
> FZP and possibly get a better idea what category the corruption is in?

Anyway, I'm working on the idea David mentioned.
Hopefully we will soon get earlier detection and some clue.

> 
> I guess what I'm getting at is, users get a corrupt file system, they
> can't repair it (honestly the tools are not good enough, and aren't
> user friendly),

Definitely.

> so we tell them OK just start over with a new file
> system. It would be better if there's some additional advice to give
> them to try and find out what caused the corruption to begin with,
> rather than just start over and maybe run into the same problem again.

Obviously, the current tree checker is already too late for such a case.
But if we catch these just before writing to disk, it'll be much better.
The user won't get a corrupted fs, we will get a clue, and everyone is
happy.

Thanks,
Qu

> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


end of thread, other threads:[~2019-01-16  1:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-15 11:28 BTRFS critical corrupt leaf bad key order Leonard Lausen
2019-01-15 11:48 ` Qu Wenruo
2019-01-15 11:51   ` David Sterba
2019-01-15 12:17     ` Qu Wenruo
2019-01-15 12:03   ` David Sterba
2019-01-15 12:22     ` Qu Wenruo
2019-01-15 12:28       ` Leonard Lausen
2019-01-15 12:31         ` Qu Wenruo
2019-01-16  1:38     ` Chris Murphy
2019-01-16  1:52       ` Qu Wenruo
2019-01-15 12:27   ` Qu Wenruo
2019-01-15 15:17     ` Leonard Lausen
