* off-by-one uncompressed invalid ram_bytes corruptions
@ 2018-05-18  5:23 Steve Leung
  2018-05-18  5:49 ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-18  5:23 UTC
  To: linux-btrfs

Hi list,

I've got a 3-device raid1 btrfs filesystem that's throwing up some
"corrupt leaf" errors in dmesg.  This is a deduplicated list of the ones
I've observed lately:

   BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970196795392 slot=307 ino=206231 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 3468 expect 3469
   BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970552426496 slot=91 ino=209736 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 3496 expect 3497
   BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970712399872 slot=221 ino=205230 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 1790 expect 1791
   BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970803920896 slot=368 ino=205732 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 2475 expect 2476
   BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970987945984 slot=236 ino=208896 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 490 expect 491

All of them seem to be 1 short of the expected value.
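
The pattern can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it parses the five messages quoted above (with the mail line wraps joined) and computes `expect - have` for each. The names `LOG`, `PATTERN` and `parse_corrupt_leaf` are made up for this illustration, and the pattern is only meant for this one message format.

```python
import re

# The five messages from the report above, with the mail line wraps joined.
LOG = """\
BTRFS critical (device sda1): corrupt leaf: root=1 block=4970196795392 slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 3468 expect 3469
BTRFS critical (device sda1): corrupt leaf: root=1 block=4970552426496 slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 3496 expect 3497
BTRFS critical (device sda1): corrupt leaf: root=1 block=4970712399872 slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 1790 expect 1791
BTRFS critical (device sda1): corrupt leaf: root=1 block=4970803920896 slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 2475 expect 2476
BTRFS critical (device sda1): corrupt leaf: root=1 block=4970987945984 slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 490 expect 491
"""

# Hypothetical pattern; it matches only the message format shown above.
PATTERN = re.compile(
    r"block=(\d+) slot=(\d+) ino=(\d+) .*?have (\d+) expect (\d+)"
)


def parse_corrupt_leaf(log):
    """Return one dict per 'corrupt leaf' message found in the log text."""
    rows = []
    for match in PATTERN.finditer(log):
        block, slot, ino, have, expect = map(int, match.groups())
        rows.append({
            "block": block,
            "slot": slot,
            "ino": ino,
            "have": have,
            "expect": expect,
            "delta": expect - have,  # how far 'have' falls short of 'expect'
        })
    return rows


rows = parse_corrupt_leaf(LOG)
for row in rows:
    print(row["ino"], row["have"], row["expect"], row["delta"])
```

For these five messages every `delta` is 1, which matches the observation above.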

Some files do seem to be inaccessible on the filesystem, and btrfs
inspect-internal on any of those inode numbers fails with:

  ERROR: ino paths ioctl: Input/output error

and another message for that inode appears.

'btrfs check' (output attached) seems to notice these corruptions (among 
a few others, some of which seem to be related to a problematic attempt 
to build Android I posted about some months ago).

Other information:

Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has 
about 25 snapshots at the moment, only a handful of compressed files, 
and nothing fancy like qgroups enabled.

btrfs fi show:

  Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
          Total devices 4 FS bytes used 2.48TiB
          devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
          devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
          devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
          devid    4 size 3.49TiB used 2.49TiB path /dev/sda1

btrfs fi df:

  Data, RAID1: total=2.49TiB, used=2.48TiB
  System, RAID1: total=32.00MiB, used=416.00KiB
  Metadata, RAID1: total=7.00GiB, used=5.29GiB
  GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg output attached as well.

Thanks in advance for any assistance!  I have backups of all the 
important stuff here but it would be nice to fix the corruptions in place.

Steve

[-- Attachment #2: btrfs-check.txt.gz --]
[-- Type: application/gzip, Size: 3360 bytes --]

[-- Attachment #3: dmesg.txt.gz --]
[-- Type: application/gzip, Size: 15624 bytes --]

* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-18  5:23 off-by-one uncompressed invalid ram_bytes corruptions Steve Leung
@ 2018-05-18  5:49 ` Qu Wenruo
  2018-05-18  9:42   ` james harvey
  2018-05-19 23:40   ` Steve Leung
  0 siblings, 2 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-05-18  5:49 UTC
  To: Steve Leung, linux-btrfs


On 2018-05-18 13:23, Steve Leung wrote:
> Hi list,
> 
> I've got 3-device raid1 btrfs filesystem that's throwing up some
> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
> observed lately:
> 
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970196795392
> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3468 expect 3469

Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
dump the leaf?

It's caught by the tree-checker code, which ensures all tree blocks are
correct before btrfs makes use of them.

That inline extent size check is tested, so I'm wondering if this
indicates any real corruption.
That btrfs-debug-tree output will definitely help.

BTW, if I didn't miss anything, there should not be any inline extents
in the root tree.

>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970552426496
> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3496 expect 3497

Same dump will definitely help.

>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970712399872
> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 1790 expect 1791
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970803920896
> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 2475 expect 2476
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970987945984
> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 490 expect 491
> 
> All of them seem to be 1 short of the expected value.
> 
> Some files do seem to be inaccessible on the filesystem, and btrfs
> inspect-internal on any of those inode numbers fails with:
> 
>  ERROR: ino paths ioctl: Input/output error
> 
> and another message for that inode appears.
> 
> 'btrfs check' (output attached) seems to notice these corruptions (among
> a few others, some of which seem to be related to a problematic attempt
> to build Android I posted about some months ago).
> 
> Other information:
> 
> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has
> about 25 snapshots at the moment, only a handful of compressed files,
> and nothing fancy like qgroups enabled.
> 
> btrfs fi show:
> 
>  Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>          Total devices 4 FS bytes used 2.48TiB
>          devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>          devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>          devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>          devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
> 
> btrfs fi df:
> 
>  Data, RAID1: total=2.49TiB, used=2.48TiB
>  System, RAID1: total=32.00MiB, used=416.00KiB
>  Metadata, RAID1: total=7.00GiB, used=5.29GiB
>  GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> dmesg output attached as well.
> 
> Thanks in advance for any assistance!  I have backups of all the
> important stuff here but it would be nice to fix the corruptions in place.

And btrfs check doesn't report the same problem, as the default original
mode doesn't have such a check.

Please also post the result of "btrfs check --mode=lowmem /dev/sda1"

Thanks,
Qu

> 
> Steve


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-18  5:49 ` Qu Wenruo
@ 2018-05-18  9:42   ` james harvey
  2018-05-18  9:56     ` Qu Wenruo
  2018-05-19 23:40   ` Steve Leung
  1 sibling, 1 reply; 17+ messages in thread
From: james harvey @ 2018-05-18  9:42 UTC
  To: Qu Wenruo; +Cc: Steve Leung, Btrfs BTRFS

On Fri, May 18, 2018 at 1:49 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> And btrfs check doesn't report the same problem as the default original
> mode doesn't have such check.
>
> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"

Are you saying "--mode=lowmem" does more checks than without it?  "man
btrfs check" says it's experimental and the difference is just
original is unoptimized regarding memory consumption and can run out
of memory, and low memory addresses this with increased IO cost from
re-reading blocks increasing run time.  It doesn't indicate lowmem is
a better check.

* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-18  9:42   ` james harvey
@ 2018-05-18  9:56     ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-05-18  9:56 UTC
  To: james harvey; +Cc: Steve Leung, Btrfs BTRFS


On 2018-05-18 17:42, james harvey wrote:
> On Fri, May 18, 2018 at 1:49 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> And btrfs check doesn't report the same problem as the default original
>> mode doesn't have such check.
>>
>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
> 
> Are you saying "--mode=lowmem" does more checks than without it?

Sometimes it does more checks.

>  "man
> btrfs check" says it's experimental and the difference is just
> original is unoptimized regarding memory consumption and can run out
> of memory, and low memory addresses this with increased IO cost from
> re-reading blocks increasing run time.  It doesn't indicate lowmem is
> a better check.

Well, because original mode and lowmem mode check in completely
different ways, you'd better consider lowmem mode a complete rework.
Thus it will sometimes give a different result. (Although most of the
time the difference is a false alert from lowmem.)

In this particular case, lowmem does indeed do an extra check.

And overall, lowmem mode provides more human-readable error output.

Thanks,
Qu


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-18  5:49 ` Qu Wenruo
  2018-05-18  9:42   ` james harvey
@ 2018-05-19 23:40   ` Steve Leung
  2018-05-20  1:02     ` Qu Wenruo
  1 sibling, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-19 23:40 UTC
  To: Qu Wenruo, linux-btrfs

On 05/17/2018 11:49 PM, Qu Wenruo wrote:
> On 2018-05-18 13:23, Steve Leung wrote:
>> Hi list,
>>
>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>> observed lately:

Evidently I forgot that I added a fourth device to this system, from the 
info below, but I don't think it matters.  :)

>>    BTRFS critical (device sda1): corrupt leaf: root=1 block=4970196795392
>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 3468 expect 3469
> 
> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
> dump the leaf?

Attached btrfs-debug-tree dumps for all of the blocks that I saw 
messages for.

> It's caught by tree-checker code which is ensuring all tree blocks are
> correct before btrfs can take use of them.
> 
> That inline extent size check is tested, so I'm wondering if this
> indicates any real corruption.
> That btrfs-debug-tree output will definitely help.
> 
> BTW, if I didn't miss anything, there should not be any inlined extent
> in root tree.
> 
>>    BTRFS critical (device sda1): corrupt leaf: root=1 block=4970552426496
>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 3496 expect 3497
> 
> Same dump will definitely help.
> 
>>    BTRFS critical (device sda1): corrupt leaf: root=1 block=4970712399872
>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 1790 expect 1791
>>    BTRFS critical (device sda1): corrupt leaf: root=1 block=4970803920896
>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 2475 expect 2476
>>    BTRFS critical (device sda1): corrupt leaf: root=1 block=4970987945984
>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 490 expect 491
>>
>> All of them seem to be 1 short of the expected value.
>>
>> Some files do seem to be inaccessible on the filesystem, and btrfs
>> inspect-internal on any of those inode numbers fails with:
>>
>>   ERROR: ino paths ioctl: Input/output error
>>
>> and another message for that inode appears.
>>
>> 'btrfs check' (output attached) seems to notice these corruptions (among
>> a few others, some of which seem to be related to a problematic attempt
>> to build Android I posted about some months ago).
>>
>> Other information:
>>
>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has
>> about 25 snapshots at the moment, only a handful of compressed files,
>> and nothing fancy like qgroups enabled.
>>
>> btrfs fi show:
>>
>>   Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>           Total devices 4 FS bytes used 2.48TiB
>>           devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>           devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>           devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>           devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>
>> btrfs fi df:
>>
>>   Data, RAID1: total=2.49TiB, used=2.48TiB
>>   System, RAID1: total=32.00MiB, used=416.00KiB
>>   Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>   GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> dmesg output attached as well.
>>
>> Thanks in advance for any assistance!  I have backups of all the
>> important stuff here but it would be nice to fix the corruptions in place.
> 
> And btrfs check doesn't report the same problem as the default original
> mode doesn't have such check.
> 
> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"

Also, attached.  It seems to notice the same off-by-one problems, though 
there also seem to be a couple of examples of being off by more than one.

Thanks for looking at this!  I'll get my backups ready, just in case.

Steve

[-- Attachment #2: btrfs-check-lowmem.txt.gz --]
[-- Type: application/gzip, Size: 5468 bytes --]

[-- Attachment #3: btrfs-debug-tree-4970196795392.txt.gz --]
[-- Type: application/gzip, Size: 9068 bytes --]

[-- Attachment #4: btrfs-debug-tree-4970552426496.txt.gz --]
[-- Type: application/gzip, Size: 8321 bytes --]

[-- Attachment #5: btrfs-debug-tree-4970712399872.txt.gz --]
[-- Type: application/gzip, Size: 7581 bytes --]

[-- Attachment #6: btrfs-debug-tree-4970803920896.txt.gz --]
[-- Type: application/gzip, Size: 8601 bytes --]

[-- Attachment #7: btrfs-debug-tree-4970987945984.txt.gz --]
[-- Type: application/gzip, Size: 7559 bytes --]

* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-19 23:40   ` Steve Leung
@ 2018-05-20  1:02     ` Qu Wenruo
  2018-05-20 20:43       ` Steve Leung
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-05-20  1:02 UTC
  To: Steve Leung, linux-btrfs


On 2018-05-20 07:40, Steve Leung wrote:
> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>> On 2018-05-18 13:23, Steve Leung wrote:
>>> Hi list,
>>>
>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>> observed lately:
> 
> Evidently I forgot that I added a fourth device to this system, from the
> info below, but I don't think it matters.  :)
> 
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970196795392
>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 3468 expect 3469
>>
>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>> dump the leaf?
> 
> Attached btrfs-debug-tree dumps for all of the blocks that I saw
> messages for.
> 
>> It's caught by tree-checker code which is ensuring all tree blocks are
>> correct before btrfs can take use of them.
>>
>> That inline extent size check is tested, so I'm wondering if this
>> indicates any real corruption.
>> That btrfs-debug-tree output will definitely help.
>>
>> BTW, if I didn't miss anything, there should not be any inlined extent
>> in root tree.
>>
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970552426496
>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 3496 expect 3497
>>
>> Same dump will definitely help.
>>
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970712399872
>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 1790 expect 1791
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970803920896
>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 2475 expect 2476
>>>    BTRFS critical (device sda1): corrupt leaf: root=1
>>> block=4970987945984
>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 490 expect 491
>>>
>>> All of them seem to be 1 short of the expected value.
>>>
>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>> inspect-internal on any of those inode numbers fails with:
>>>
>>>   ERROR: ino paths ioctl: Input/output error
>>>
>>> and another message for that inode appears.
>>>
>>> 'btrfs check' (output attached) seems to notice these corruptions (among
>>> a few others, some of which seem to be related to a problematic attempt
>>> to build Android I posted about some months ago).
>>>
>>> Other information:
>>>
>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has
>>> about 25 snapshots at the moment, only a handful of compressed files,
>>> and nothing fancy like qgroups enabled.
>>>
>>> btrfs fi show:
>>>
>>>   Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>           Total devices 4 FS bytes used 2.48TiB
>>>           devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>           devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>           devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>           devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>
>>> btrfs fi df:
>>>
>>>   Data, RAID1: total=2.49TiB, used=2.48TiB
>>>   System, RAID1: total=32.00MiB, used=416.00KiB
>>>   Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>   GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> dmesg output attached as well.
>>>
>>> Thanks in advance for any assistance!  I have backups of all the
>>> important stuff here but it would be nice to fix the corruptions in
>>> place.
>>
>> And btrfs check doesn't report the same problem as the default original
>> mode doesn't have such check.
>>
>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
> 
> Also, attached.  It seems to notice the same off-by-one problems, though
> there also seem to be a couple of examples of being off by more than one.

Unfortunately, it doesn't detect that, as there is no off-by-one error at all.

The problem is that the kernel is reporting an error on a completely fine leaf.

Furthermore, even in the same leaf, there are more inline extents, and
they are all valid.

So the kernel reports the error out of nowhere.

More problems show up for extent_size, where a lot of them are off by one.

Moreover, the root owner is not printed correctly, so I'm wondering if
the memory is corrupted.

Please run memtest86+ to verify all your memory is good, and if so,
please try the attached patch to see if it provides extra info.


> 
> Thanks for looking at this!  I'll get my backups ready, just in case.
> 
> Steve

[-- Attachment #1.1.2: 0001-btrfs-tree-checker-Add-extra-inline-extent-ram_bytes.patch --]
[-- Type: text/x-patch; name="0001-btrfs-tree-checker-Add-extra-inline-extent-ram_bytes.patch", Size: 1164 bytes --]

From 3540534d0ff8b6e9dc200f9dff92b8a5afa7d384 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu@suse.com>
Date: Sun, 20 May 2018 09:01:43 +0800
Subject: [PATCH] btrfs: tree-checker: Add extra inline extent ram_bytes debug
 info

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 8d40e7dd8c30..3a4534e7068e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -163,8 +163,10 @@ static int check_extent_data_item(struct btrfs_fs_info *fs_info,
 		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
 		    btrfs_file_extent_ram_bytes(leaf, fi)) {
 			file_extent_err(fs_info, leaf, slot,
-	"invalid ram_bytes for uncompressed inline extent, have %u expect %llu",
+	"invalid ram_bytes for uncompressed inline extent, have %u expect %llu (%lu + %llu)",
 				item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START +
+				btrfs_file_extent_ram_bytes(leaf, fi),
+				BTRFS_FILE_EXTENT_INLINE_DATA_START,
 				btrfs_file_extent_ram_bytes(leaf, fi));
 			return -EUCLEAN;
 		}
-- 
2.17.0
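
The comparison this patch instruments can be modelled outside the kernel. The following is a rough Python sketch of the same arithmetic, not kernel code: `check_inline_extent` and `INLINE_DATA_START` are hypothetical names, and the value 21 is an assumption taken from the "(21 + ...)" part of the patched kernel's messages later in this thread.

```python
# Assumed value of BTRFS_FILE_EXTENT_INLINE_DATA_START, as printed by the
# patched kernel later in this thread ("21 + ...").
INLINE_DATA_START = 21


def check_inline_extent(item_size, ram_bytes):
    """Model of the uncompressed inline extent size check.

    Returns None when item_size equals the header plus ram_bytes,
    otherwise a message in the same shape as the patched format string.
    """
    expect = INLINE_DATA_START + ram_bytes
    if item_size != expect:
        return (
            "invalid ram_bytes for uncompressed inline extent, "
            f"have {item_size} expect {expect} "
            f"({INLINE_DATA_START} + {ram_bytes})"
        )
    return None


# A consistent item passes; a ram_bytes value one too large does not.
print(check_inline_extent(3468, 3447))
print(check_inline_extent(3468, 3448))
```

The extra "(%lu + %llu)" in the patch separates the two terms of `expect`, so a bad constant and a bad `ram_bytes` read can be told apart in the log.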


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-20  1:02     ` Qu Wenruo
@ 2018-05-20 20:43       ` Steve Leung
  2018-05-21  1:07         ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-20 20:43 UTC
  To: Qu Wenruo, linux-btrfs

On 05/19/2018 07:02 PM, Qu Wenruo wrote:
> 
> 
> On 2018-05-20 07:40, Steve Leung wrote:
>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>> Hi list,
>>>>
>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>> observed lately:

>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>> block=4970196795392
>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 3468 expect 3469
>>>
>>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>>> dump the leaf?
>>
>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>> messages for.
>>
>>> It's caught by tree-checker code which is ensuring all tree blocks are
>>> correct before btrfs can take use of them.
>>>
>>> That inline extent size check is tested, so I'm wondering if this
>>> indicates any real corruption.
>>> That btrfs-debug-tree output will definitely help.
>>>
>>> BTW, if I didn't miss anything, there should not be any inlined extent
>>> in root tree.
>>>
>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>> block=4970552426496
>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 3496 expect 3497
>>>
>>> Same dump will definitely help.
>>>
>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>> block=4970712399872
>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 1790 expect 1791
>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>> block=4970803920896
>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 2475 expect 2476
>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>> block=4970987945984
>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 490 expect 491
>>>>
>>>> All of them seem to be 1 short of the expected value.
>>>>
>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>> inspect-internal on any of those inode numbers fails with:
>>>>
>>>>    ERROR: ino paths ioctl: Input/output error
>>>>
>>>> and another message for that inode appears.
>>>>
>>>> 'btrfs check' (output attached) seems to notice these corruptions (among
>>>> a few others, some of which seem to be related to a problematic attempt
>>>> to build Android I posted about some months ago).
>>>>
>>>> Other information:
>>>>
>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has
>>>> about 25 snapshots at the moment, only a handful of compressed files,
>>>> and nothing fancy like qgroups enabled.
>>>>
>>>> btrfs fi show:
>>>>
>>>>    Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>            Total devices 4 FS bytes used 2.48TiB
>>>>            devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>            devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>>            devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>            devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>
>>>> btrfs fi df:
>>>>
>>>>    Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>    System, RAID1: total=32.00MiB, used=416.00KiB
>>>>    Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>    GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>
>>>> dmesg output attached as well.
>>>>
>>>> Thanks in advance for any assistance!  I have backups of all the
>>>> important stuff here but it would be nice to fix the corruptions in
>>>> place.
>>>
>>> And btrfs check doesn't report the same problem as the default original
>>> mode doesn't have such check.
>>>
>>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>>
>> Also, attached.  It seems to notice the same off-by-one problems, though
>> there also seem to be a couple of examples of being off by more than one.
> 
> Unfortunately, it doesn't detect, as there is no off-by-one error at all.
> 
> The problem is, kernel is reporting error on completely fine leaf.
> 
> Further more, even in the same leaf, there are more inlined extents, and
> they are all valid.
> 
> So the kernel reports the error out of nowhere.
> 
> More problems happens for extent_size where a lot of them is offset by one.
> 
> Moreover, the root owner is not printed correctly, thus I'm wondering if
> the memory is corrupted.
> 
> Please try memtest+ to verify all your memory is correct, and if so,
> please try the attached patch and to see if it provides extra info.

Memtest ran for about 12 hours last night, and didn't find any errors.

New messages from patched kernel:

  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392 
slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed 
inline extent, have 3468 expect 3469 (21 + 3448)
  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970552426496 
slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed 
inline extent, have 3496 expect 3497 (21 + 3476)
  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970712399872 
slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed 
inline extent, have 1790 expect 1791 (21 + 1770)
  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970803920896 
slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed 
inline extent, have 2475 expect 2476 (21 + 2455)
  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970987945984 
slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed 
inline extent, have 490 expect 491 (21 + 470)
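
Subtracting the 21-byte header from each `have` value gives the ram_bytes that would have made the item pass. The following is a minimal Python sketch using only the numbers in the messages above (shortened to the fields that matter); `PATCHED_LOG`, `PATCHED_RE` and `analyse` are hypothetical names. It only does arithmetic on the log text and says nothing about what is actually on disk.

```python
import re

# Tails of the five patched-kernel messages above, line wraps joined.
PATCHED_LOG = """\
slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 3468 expect 3469 (21 + 3448)
slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 3496 expect 3497 (21 + 3476)
slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 1790 expect 1791 (21 + 1770)
slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 2475 expect 2476 (21 + 2455)
slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed inline extent, have 490 expect 491 (21 + 470)
"""

PATCHED_RE = re.compile(
    r"ino=(\d+) .*?have (\d+) expect (\d+) \((\d+) \+ (\d+)\)"
)


def analyse(log):
    """For each message, compare the reported ram_bytes with have - header."""
    results = []
    for match in PATCHED_RE.finditer(log):
        ino, have, expect, header, ram_bytes = map(int, match.groups())
        results.append({
            "ino": ino,
            "sum_ok": expect == header + ram_bytes,  # the two terms add up
            "reported_ram_bytes": ram_bytes,
            "implied_ram_bytes": have - header,      # value that would pass
        })
    return results


results = analyse(PATCHED_LOG)
for r in results:
    print(r["ino"], r["reported_ram_bytes"], r["implied_ram_bytes"])
```

In all five messages the header term is 21 and the reported ram_bytes is exactly one more than `have - 21`.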

Steve

* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-20 20:43       ` Steve Leung
@ 2018-05-21  1:07         ` Qu Wenruo
  2018-05-26 14:06           ` Steve Leung
  2018-05-29 18:49           ` Hans van Kranenburg
  0 siblings, 2 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-05-21  1:07 UTC
  To: Steve Leung, linux-btrfs


On 2018-05-21 04:43, Steve Leung wrote:
> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>
>>
>> On 2018-05-20 07:40, Steve Leung wrote:
>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>> Hi list,
>>>>>
>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>> observed lately:
> 
>>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>>> block=4970196795392
>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 3468 expect 3469
>>>>
>>>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>>>> dump the leaf?
>>>
>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>> messages for.
>>>
>>>> It's caught by tree-checker code which is ensuring all tree blocks are
>>>> correct before btrfs can take use of them.
>>>>
>>>> That inline extent size check is tested, so I'm wondering if this
>>>> indicates any real corruption.
>>>> That btrfs-debug-tree output will definitely help.
>>>>
>>>> BTW, if I didn't miss anything, there should not be any inlined extent
>>>> in root tree.
>>>>
>>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>>> block=4970552426496
>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 3496 expect 3497
>>>>
>>>> Same dump will definitely help.
>>>>
>>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>>> block=4970712399872
>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 1790 expect 1791
>>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>>> block=4970803920896
>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 2475 expect 2476
>>>>>     BTRFS critical (device sda1): corrupt leaf: root=1
>>>>> block=4970987945984
>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 490 expect 491
>>>>>
>>>>> All of them seem to be 1 short of the expected value.
>>>>>
>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>
>>>>>    ERROR: ino paths ioctl: Input/output error
>>>>>
>>>>> and another message for that inode appears.
>>>>>
>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>> (among
>>>>> a few others, some of which seem to be related to a problematic
>>>>> attempt
>>>>> to build Android I posted about some months ago).
>>>>>
>>>>> Other information:
>>>>>
>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem
>>>>> has
>>>>> about 25 snapshots at the moment, only a handful of compressed files,
>>>>> and nothing fancy like qgroups enabled.
>>>>>
>>>>> btrfs fi show:
>>>>>
>>>>>    Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>            Total devices 4 FS bytes used 2.48TiB
>>>>>            devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>            devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>>>            devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>            devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>
>>>>> btrfs fi df:
>>>>>
>>>>>    Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>    System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>    Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>    GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>
>>>>> dmesg output attached as well.
>>>>>
>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>> important stuff here but it would be nice to fix the corruptions in
>>>>> place.
>>>>
>>>> And btrfs check doesn't report the same problem as the default original
>>>> mode doesn't have such check.
>>>>
>>>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>>>
>>> Also, attached.  It seems to notice the same off-by-one problems, though
>>> there also seem to be a couple of examples of being off by more than
>>> one.
>>
>> Unfortunately, it doesn't detect, as there is no off-by-one error at all.
>>
>> The problem is, kernel is reporting error on completely fine leaf.
>>
>> Further more, even in the same leaf, there are more inlined extents, and
>> they are all valid.
>>
>> So the kernel reports the error out of nowhere.
>>
>> More problems happens for extent_size where a lot of them is offset by
>> one.
>>
>> Moreover, the root owner is not printed correctly, thus I'm wondering if
>> the memory is corrupted.
>>
>> Please try memtest+ to verify all your memory is correct, and if so,
>> please try the attached patch and to see if it provides extra info.
> 
> Memtest ran for about 12 hours last night, and didn't find any errors.
> 
> New messages from patched kernel:
> 
>  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392
> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3468 expect 3469 (21 + 3448)

This output doesn't match the debug-tree dump.

item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
	generation 692987 type 0 (inline)
	inline extent data size 3447 ram_bytes 3447 compression 0 (none)

Where its ram_bytes is 3447, not 3448.
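
The mismatch can be checked by hand. Below is a minimal sketch in Python, used here only for illustration. It assumes that the 21 printed in "(21 + 3448)" is the inline extent header size and that the check is "item size == header size + ram_bytes"; the function name is hypothetical.

```python
# Header bytes preceding the inline data, as printed by the patched kernel
# message "(21 + 3448)". Assumed here to be the inline extent header size.
INLINE_HEADER_SIZE = 21


def expected_item_size(ram_bytes: int) -> int:
    """Item size an uncompressed inline extent should have for ram_bytes."""
    return INLINE_HEADER_SIZE + ram_bytes


item_size = 3468         # "itemsize 3468" from the debug-tree dump
dump_ram_bytes = 3447    # "ram_bytes 3447" from the debug-tree dump
kernel_ram_bytes = 3448  # value the patched kernel claims to have read

# The on-disk values from the dump are self-consistent.
print(expected_item_size(dump_ram_bytes) == item_size)
# The kernel's value is one too large, hence "have 3468 expect 3469".
print(expected_item_size(kernel_ram_bytes) - item_size)
```

With the values from the dump the check passes; only the value reported by the kernel makes it fail.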

Furthermore, there are 2 more inline extents in the same leaf; if something
really went wrong reading ram_bytes, they should trigger the same warning too.

item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
	generation 367 type 0 (inline)
	inline extent data size 154 ram_bytes 154 compression 0 (none)

and

item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
	generation 367 type 0 (inline)
	inline extent data size 154 ram_bytes 154 compression 0 (none)
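
The same comparison can be run over every inline extent in a dump. The sketch below is only an illustration: it assumes the text layout shown in the dumps quoted in this thread and a header size of 21 (from the patched kernel message), and the names DUMP, HEADER and check_inline_extents are hypothetical.

```python
import re

# Text copied from the dumps quoted in this thread.
DUMP = """\
item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
    generation 692987 type 0 (inline)
    inline extent data size 3447 ram_bytes 3447 compression 0 (none)
item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
    generation 367 type 0 (inline)
    inline extent data size 154 ram_bytes 154 compression 0 (none)
"""

HEADER = 21  # assumed inline extent header size, from "(21 + 3448)"

ITEM_RE = re.compile(
    r"^item (\d+) key \((\d+) EXTENT_DATA \d+\) .* itemsize (\d+)$"
)
INLINE_RE = re.compile(
    r"inline extent data size \d+ ram_bytes (\d+) compression 0"
)


def check_inline_extents(dump: str):
    """Return (slot, ino, itemsize, ram_bytes, ok) per uncompressed inline extent."""
    results = []
    current = None
    for line in dump.splitlines():
        if line.startswith("item "):
            # Remember the item header only if it is an EXTENT_DATA item.
            m = ITEM_RE.match(line)
            current = tuple(int(g) for g in m.groups()) if m else None
            continue
        m = INLINE_RE.search(line)
        if m and current is not None:
            slot, ino, itemsize = current
            ram_bytes = int(m.group(1))
            ok = itemsize == HEADER + ram_bytes
            results.append((slot, ino, itemsize, ram_bytes, ok))
            current = None
    return results


for row in check_inline_extents(DUMP):
    print(row)
```

Both extents listed here satisfy itemsize == 21 + ram_bytes, which is why the warning on slot 307 alone is surprising.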

The only way to get the number 3448 is from its inode item.

item 305 key (206231 INODE_ITEM 0) itemoff 18607 itemsize 160
	generation 1136104 transid 1136104 size 3447 nbytes  >>3448<<
	block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
	sequence 4 flags 0x0(none)
	atime 1390923260.43167583 (2014-01-28 15:34:20)
	ctime 1416461176.910968309 (2014-11-20 05:26:16)
	mtime 1392531030.754511511 (2014-02-16 06:10:30)
	otime 0.0 (1970-01-01 00:00:00)

But the slot is correct, and there is nothing wrong with these item
offsets/lengths.

The wrong "root=" output also makes me pretty curious.

Is it possible to make a btrfs-image dump if all the filenames in this
fs are not sensitive?

Thanks,
Qu

>  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970552426496
> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3496 expect 3497 (21 + 3476)
>  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970712399872
> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 1790 expect 1791 (21 + 1770)
>  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970803920896
> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 2475 expect 2476 (21 + 2455)
>  BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970987945984
> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 490 expect 491 (21 + 470)
> 
> Steve
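
The pattern across the quoted messages can be tabulated with a short script. This is a sketch only: it assumes the "(21 + N)" suffix printed by the patched kernel in this thread, and the names MESSAGES, MSG_RE and parse_message are hypothetical.

```python
import re

# Patched-kernel messages quoted in this thread, joined onto single lines.
_TAIL = "file_offset=0, invalid ram_bytes for uncompressed inline extent, "
MESSAGES = [
    "corrupt leaf: root=1 block=4970196795392 slot=307 ino=206231 "
    + _TAIL + "have 3468 expect 3469 (21 + 3448)",
    "corrupt leaf: root=1 block=4970552426496 slot=91 ino=209736 "
    + _TAIL + "have 3496 expect 3497 (21 + 3476)",
    "corrupt leaf: root=1 block=4970712399872 slot=221 ino=205230 "
    + _TAIL + "have 1790 expect 1791 (21 + 1770)",
    "corrupt leaf: root=1 block=4970803920896 slot=368 ino=205732 "
    + _TAIL + "have 2475 expect 2476 (21 + 2455)",
    "corrupt leaf: root=1 block=4970987945984 slot=236 ino=208896 "
    + _TAIL + "have 490 expect 491 (21 + 470)",
]

MSG_RE = re.compile(
    r"block=(\d+) slot=(\d+) ino=(\d+) .*?"
    r"have (\d+) expect (\d+) \((\d+) \+ (\d+)\)"
)


def parse_message(msg: str) -> dict:
    """Extract the numbers from one 'corrupt leaf' message."""
    m = MSG_RE.search(msg)
    if m is None:
        raise ValueError("unrecognised message: " + msg)
    block, slot, ino, have, expect, header, ram_bytes = map(int, m.groups())
    return {
        "block": block,
        "slot": slot,
        "ino": ino,
        "have": have,
        "expect": expect,
        "reported_ram_bytes": ram_bytes,
        # ram_bytes that the actual item size implies
        "implied_ram_bytes": have - header,
    }


rows = [parse_message(m) for m in MESSAGES]
for r in rows:
    print(r["ino"], r["expect"] - r["have"],
          r["reported_ram_bytes"] - r["implied_ram_bytes"])
```

For each of the five messages the reported ram_bytes is exactly one larger than the value implied by the item size.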



* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-21  1:07         ` Qu Wenruo
@ 2018-05-26 14:06           ` Steve Leung
  2018-05-27  0:57             ` Qu Wenruo
  2018-05-29 18:49           ` Hans van Kranenburg
  1 sibling, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-26 14:06 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 05/20/2018 07:07 PM, Qu Wenruo wrote:
> 
> 
> On 2018-05-21 04:43, Steve Leung wrote:
>> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2018-05-20 07:40, Steve Leung wrote:
>>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>>> Hi list,
>>>>>>
>>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>>> observed lately:
>>
>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>> block=4970196795392
>>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>>>>> inline extent, have 3468 expect 3469
>>>>>
>>>>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>>>>> dump the leaf?
>>>>
>>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>>> messages for.
>>>>
>>>>> It's caught by tree-checker code which is ensuring all tree blocks are
>>>>> correct before btrfs can take use of them.
>>>>>
>>>>> That inline extent size check is tested, so I'm wondering if this
>>>>> indicates any real corruption.
>>>>> That btrfs-debug-tree output will definitely help.
>>>>>
>>>>> BTW, if I didn't miss anything, there should not be any inlined extent
>>>>> in root tree.
>>>>>
>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>> block=4970552426496
>>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>>>>> inline extent, have 3496 expect 3497
>>>>>
>>>>> Same dump will definitely help.
>>>>>
>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>> block=4970712399872
>>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
>>>>>> inline extent, have 1790 expect 1791
>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>> block=4970803920896
>>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
>>>>>> inline extent, have 2475 expect 2476
>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>> block=4970987945984
>>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
>>>>>> inline extent, have 490 expect 491
>>>>>>
>>>>>> All of them seem to be 1 short of the expected value.
>>>>>>
>>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>>
>>>>>>     ERROR: ino paths ioctl: Input/output error
>>>>>>
>>>>>> and another message for that inode appears.
>>>>>>
>>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>>> (among
>>>>>> a few others, some of which seem to be related to a problematic
>>>>>> attempt
>>>>>> to build Android I posted about some months ago).
>>>>>>
>>>>>> Other information:
>>>>>>
>>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem
>>>>>> has
>>>>>> about 25 snapshots at the moment, only a handful of compressed files,
>>>>>> and nothing fancy like qgroups enabled.
>>>>>>
>>>>>> btrfs fi show:
>>>>>>
>>>>>>     Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>>             Total devices 4 FS bytes used 2.48TiB
>>>>>>             devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>>             devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>>>>             devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>>             devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>>
>>>>>> btrfs fi df:
>>>>>>
>>>>>>     Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>>     System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>>     Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>>     GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>>
>>>>>> dmesg output attached as well.
>>>>>>
>>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>>> important stuff here but it would be nice to fix the corruptions in
>>>>>> place.
>>>>>
>>>>> And btrfs check doesn't report the same problem as the default original
>>>>> mode doesn't have such check.
>>>>>
>>>>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>>>>
>>>> Also, attached.  It seems to notice the same off-by-one problems, though
>>>> there also seem to be a couple of examples of being off by more than
>>>> one.
>>>
>>> Unfortunately, it doesn't detect, as there is no off-by-one error at all.
>>>
>>> The problem is, kernel is reporting error on completely fine leaf.
>>>
>>> Further more, even in the same leaf, there are more inlined extents, and
>>> they are all valid.
>>>
>>> So the kernel reports the error out of nowhere.
>>>
>>> More problems happens for extent_size where a lot of them is offset by
>>> one.
>>>
>>> Moreover, the root owner is not printed correctly, thus I'm wondering if
>>> the memory is corrupted.
>>>
>>> Please try memtest+ to verify all your memory is correct, and if so,
>>> please try the attached patch and to see if it provides extra info.
>>
>> Memtest ran for about 12 hours last night, and didn't find any errors.
>>
>> New messages from patched kernel:
>>
>>   BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392
>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>> inline extent, have 3468 expect 3469 (21 + 3448)
> 
> This output doesn't match with debug-tree dump.
> 
> item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
> 	generation 692987 type 0 (inline)
> 	inline extent data size 3447 ram_bytes 3447 compression 0 (none)
> 
> Where its ram_bytes is 3447, not 3448.
> 
> Further more, there are 2 more inlined extent, if something really went
> wrong reading ram_bytes, it should also trigger the same warning.
> 
> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
> 	generation 367 type 0 (inline)
> 	inline extent data size 154 ram_bytes 154 compression 0 (none)
> 
> and
> 
> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
> 	generation 367 type 0 (inline)
> 	inline extent data size 154 ram_bytes 154 compression 0 (none)
> 
> The only way to get the number 3448 is from its inode item.
> 
> item 305 key (206231 INODE_ITEM 0) itemoff 18607 itemsize 160
> 	generation 1136104 transid 1136104 size 3447 nbytes  >>3448<<
> 	block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
> 	sequence 4 flags 0x0(none)
> 	atime 1390923260.43167583 (2014-01-28 15:34:20)
> 	ctime 1416461176.910968309 (2014-11-20 05:26:16)
> 	mtime 1392531030.754511511 (2014-02-16 06:10:30)
> 	otime 0.0 (1970-01-01 00:00:00)
> 
> But the slot is correct, and nothing wrong with these item offset/length.
> 
> And the problem of wrong "root=" output also makes me pretty curious.
> 
> Is it possible to make a btrfs-image dump if all the filenames in this
> fs are not sensitive?

Hi Qu Wenruo,

I sent details of the btrfs-image to you in a private message. 
Hopefully you've received it and will find it useful.

But FYI I've been able to recover my damaged data from backups so at 
least now there's no issue of data loss for me personally.

One question I had though - what's the easiest way to get rid of these 
problematic inodes?  I can't directly copy new files on top of the old 
names now.  A couple of options spring to mind:

- 'btrfs check'; how dangerous is it, really?  :)

- Make a new subvolume and reflink all of the surviving files over. 
Then copy the restored files in, and delete the old subvolume.  Would 
that actually work?

- Move all the enclosing directories to a self-made lost+found 
directory, and ignore them.

- Any other ideas?

Steve


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-26 14:06           ` Steve Leung
@ 2018-05-27  0:57             ` Qu Wenruo
  2018-05-28  3:47               ` Steve Leung
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-05-27  0:57 UTC (permalink / raw)
  To: Steve Leung, linux-btrfs





On 2018-05-26 22:06, Steve Leung wrote:
> On 05/20/2018 07:07 PM, Qu Wenruo wrote:
>>
>>
>> On 2018-05-21 04:43, Steve Leung wrote:
>>> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018-05-20 07:40, Steve Leung wrote:
>>>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>>>> Hi list,
>>>>>>>
>>>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>>>> observed lately:
>>>
>>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>> block=4970196795392
>>>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for
>>>>>>> uncompressed
>>>>>>> inline extent, have 3468 expect 3469
>>>>>>
>>>>>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>>>>>> dump the leaf?
>>>>>
>>>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>>>> messages for.
>>>>>
>>>>>> It's caught by tree-checker code which is ensuring all tree blocks
>>>>>> are
>>>>>> correct before btrfs can take use of them.
>>>>>>
>>>>>> That inline extent size check is tested, so I'm wondering if this
>>>>>> indicates any real corruption.
>>>>>> That btrfs-debug-tree output will definitely help.
>>>>>>
>>>>>> BTW, if I didn't miss anything, there should not be any inlined
>>>>>> extent
>>>>>> in root tree.
>>>>>>
>>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>> block=4970552426496
>>>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>>>>>> inline extent, have 3496 expect 3497
>>>>>>
>>>>>> Same dump will definitely help.
>>>>>>
>>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>> block=4970712399872
>>>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for
>>>>>>> uncompressed
>>>>>>> inline extent, have 1790 expect 1791
>>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>> block=4970803920896
>>>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for
>>>>>>> uncompressed
>>>>>>> inline extent, have 2475 expect 2476
>>>>>>>      BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>> block=4970987945984
>>>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for
>>>>>>> uncompressed
>>>>>>> inline extent, have 490 expect 491
>>>>>>>
>>>>>>> All of them seem to be 1 short of the expected value.
>>>>>>>
>>>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>>>
>>>>>>>     ERROR: ino paths ioctl: Input/output error
>>>>>>>
>>>>>>> and another message for that inode appears.
>>>>>>>
>>>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>>>> (among
>>>>>>> a few others, some of which seem to be related to a problematic
>>>>>>> attempt
>>>>>>> to build Android I posted about some months ago).
>>>>>>>
>>>>>>> Other information:
>>>>>>>
>>>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem
>>>>>>> has
>>>>>>> about 25 snapshots at the moment, only a handful of compressed
>>>>>>> files,
>>>>>>> and nothing fancy like qgroups enabled.
>>>>>>>
>>>>>>> btrfs fi show:
>>>>>>>
>>>>>>>     Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>>>             Total devices 4 FS bytes used 2.48TiB
>>>>>>>             devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>>>             devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>>>>>             devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>>>             devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>>>
>>>>>>> btrfs fi df:
>>>>>>>
>>>>>>>     Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>>>     System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>>>     Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>>>     GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>>>
>>>>>>> dmesg output attached as well.
>>>>>>>
>>>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>>>> important stuff here but it would be nice to fix the corruptions in
>>>>>>> place.
>>>>>>
>>>>>> And btrfs check doesn't report the same problem as the default
>>>>>> original
>>>>>> mode doesn't have such check.
>>>>>>
>>>>>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>>>>>
>>>>> Also, attached.  It seems to notice the same off-by-one problems,
>>>>> though
>>>>> there also seem to be a couple of examples of being off by more than
>>>>> one.
>>>>
>>>> Unfortunately, it doesn't detect, as there is no off-by-one error at
>>>> all.
>>>>
>>>> The problem is, kernel is reporting error on completely fine leaf.
>>>>
>>>> Further more, even in the same leaf, there are more inlined extents,
>>>> and
>>>> they are all valid.
>>>>
>>>> So the kernel reports the error out of nowhere.
>>>>
>>>> More problems happens for extent_size where a lot of them is offset by
>>>> one.
>>>>
>>>> Moreover, the root owner is not printed correctly, thus I'm
>>>> wondering if
>>>> the memory is corrupted.
>>>>
>>>> Please try memtest+ to verify all your memory is correct, and if so,
>>>> please try the attached patch and to see if it provides extra info.
>>>
>>> Memtest ran for about 12 hours last night, and didn't find any errors.
>>>
>>> New messages from patched kernel:
>>>
>>>   BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392
>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>> inline extent, have 3468 expect 3469 (21 + 3448)
>>
>> This output doesn't match with debug-tree dump.
>>
>> item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
>>     generation 692987 type 0 (inline)
>>     inline extent data size 3447 ram_bytes 3447 compression 0 (none)
>>
>> Where its ram_bytes is 3447, not 3448.
>>
>> Further more, there are 2 more inlined extent, if something really went
>> wrong reading ram_bytes, it should also trigger the same warning.
>>
>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>     generation 367 type 0 (inline)
>>     inline extent data size 154 ram_bytes 154 compression 0 (none)
>>
>> and
>>
>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>     generation 367 type 0 (inline)
>>     inline extent data size 154 ram_bytes 154 compression 0 (none)
>>
>> The only way to get the number 3448 is from its inode item.
>>
>> item 305 key (206231 INODE_ITEM 0) itemoff 18607 itemsize 160
>>     generation 1136104 transid 1136104 size 3447 nbytes  >>3448<<
>>     block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
>>     sequence 4 flags 0x0(none)
>>     atime 1390923260.43167583 (2014-01-28 15:34:20)
>>     ctime 1416461176.910968309 (2014-11-20 05:26:16)
>>     mtime 1392531030.754511511 (2014-02-16 06:10:30)
>>     otime 0.0 (1970-01-01 00:00:00)
>>
>> But the slot is correct, and nothing wrong with these item offset/length.
>>
>> And the problem of wrong "root=" output also makes me pretty curious.
>>
>> Is it possible to make a btrfs-image dump if all the filenames in this
>> fs are not sensitive?
> 
> Hi Qu Wenruo,
> 
> I sent details of the btrfs-image to you in a private message. Hopefully
> you've received it and will find it useful.

Sorry, I didn't find the private message.

> 
> But FYI I've been able to recover my damaged data from backups so at
> least now there's no issue of data loss for me personally.
> 
> One question I had though - what's the easiest way to get rid of these
> problematic inodes?  I can't directly copy new files on top of the old
> names now.  A couple of options spring to mind:

Sorry, no way right now.

The problem is, the whole kernel is not behaving as expected.
And I have no idea how this happened at all.

> 
> - 'btrfs check'; how dangerous is it, really?  :)

As long as you don't use --repair, it's pretty safe.

And normally, the btrfs community can provide enough help.
(Sometimes with black magic to manually modify tree blocks to fix things,
but that's not the case here.)

For this particular case, --repair should not be that dangerous, as most
of the problems are just the inode's nbytes, which --repair should fix easily.

> 
> - Make a new subvolume and reflink all of the surviving files over. Then
> copy the restored files in, and delete the old subvolume.  Would that
> actually work?

No. It's the kernel refusing to read some tree blocks.
There is no way to fix that using the current kernel.

> 
> - Move all the enclosing directories to a self-made lost+found
> directory, and ignore them.

Nope, for the same reason.

> 
> - Any other ideas?

Since it's the latest kernel causing the problem, reverting to an older
kernel may help.
At least older kernels have no such strict check.

Thanks,
Qu

> 
> Steve

* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-27  0:57             ` Qu Wenruo
@ 2018-05-28  3:47               ` Steve Leung
  2018-05-28  5:11                 ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-28  3:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 05/26/2018 06:57 PM, Qu Wenruo wrote:
> 
> 
> On 2018-05-26 22:06, Steve Leung wrote:
>> On 05/20/2018 07:07 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2018-05-21 04:43, Steve Leung wrote:
>>>> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2018-05-20 07:40, Steve Leung wrote:
>>>>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>>>>> Hi list,
>>>>>>>>
>>>>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>>>>> observed lately:
>>>>
>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>> block=4970196795392
>>>>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for
>>>>>>>> uncompressed
>>>>>>>> inline extent, have 3468 expect 3469
>>>>>>>
>>>>>>> Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
>>>>>>> dump the leaf?
>>>>>>
>>>>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>>>>> messages for.
>>>>>>
>>>>>>> It's caught by tree-checker code which is ensuring all tree blocks
>>>>>>> are
>>>>>>> correct before btrfs can take use of them.
>>>>>>>
>>>>>>> That inline extent size check is tested, so I'm wondering if this
>>>>>>> indicates any real corruption.
>>>>>>> That btrfs-debug-tree output will definitely help.
>>>>>>>
>>>>>>> BTW, if I didn't miss anything, there should not be any inlined
>>>>>>> extent
>>>>>>> in root tree.
>>>>>>>
>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>> block=4970552426496
>>>>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
>>>>>>>> inline extent, have 3496 expect 3497
>>>>>>>
>>>>>>> Same dump will definitely help.
>>>>>>>
>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>> block=4970712399872
>>>>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for
>>>>>>>> uncompressed
>>>>>>>> inline extent, have 1790 expect 1791
>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>> block=4970803920896
>>>>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for
>>>>>>>> uncompressed
>>>>>>>> inline extent, have 2475 expect 2476
>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>> block=4970987945984
>>>>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for
>>>>>>>> uncompressed
>>>>>>>> inline extent, have 490 expect 491
>>>>>>>>
>>>>>>>> All of them seem to be 1 short of the expected value.
>>>>>>>>
>>>>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>>>>
>>>>>>>>      ERROR: ino paths ioctl: Input/output error
>>>>>>>>
>>>>>>>> and another message for that inode appears.
>>>>>>>>
>>>>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>>>>> (among
>>>>>>>> a few others, some of which seem to be related to a problematic
>>>>>>>> attempt
>>>>>>>> to build Android I posted about some months ago).
>>>>>>>>
>>>>>>>> Other information:
>>>>>>>>
>>>>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem
>>>>>>>> has
>>>>>>>> about 25 snapshots at the moment, only a handful of compressed
>>>>>>>> files,
>>>>>>>> and nothing fancy like qgroups enabled.
>>>>>>>>
>>>>>>>> btrfs fi show:
>>>>>>>>
>>>>>>>>      Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>>>>              Total devices 4 FS bytes used 2.48TiB
>>>>>>>>              devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>>>>              devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>>>>>>>>              devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>>>>              devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>>>>
>>>>>>>> btrfs fi df:
>>>>>>>>
>>>>>>>>      Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>>>>      System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>>>>      Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>>>>      GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>>>>
>>>>>>>> dmesg output attached as well.
>>>>>>>>
>>>>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>>>>> important stuff here but it would be nice to fix the corruptions in
>>>>>>>> place.
>>>>>>>
>>>>>>> And btrfs check doesn't report the same problem as the default
>>>>>>> original
>>>>>>> mode doesn't have such check.
>>>>>>>
>>>>>>> Please also post the result of "btrfs check --mode=lowmem /dev/sda1"
>>>>>>
>>>>>> Also, attached.  It seems to notice the same off-by-one problems,
>>>>>> though
>>>>>> there also seem to be a couple of examples of being off by more than
>>>>>> one.
>>>>>
>>>>> Unfortunately, it doesn't detect, as there is no off-by-one error at
>>>>> all.
>>>>>
>>>>> The problem is, kernel is reporting error on completely fine leaf.
>>>>>
>>>>> Further more, even in the same leaf, there are more inlined extents,
>>>>> and
>>>>> they are all valid.
>>>>>
>>>>> So the kernel reports the error out of nowhere.
>>>>>
>>>>> More problems happens for extent_size where a lot of them is offset by
>>>>> one.
>>>>>
>>>>> Moreover, the root owner is not printed correctly, thus I'm
>>>>> wondering if
>>>>> the memory is corrupted.
>>>>>
>>>>> Please try memtest+ to verify all your memory is correct, and if so,
>>>>> please try the attached patch and to see if it provides extra info.
>>>>
>>>> Memtest ran for about 12 hours last night, and didn't find any errors.
>>>>
>>>> New messages from patched kernel:
>>>>
>>>>    BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392
>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>>> inline extent, have 3468 expect 3469 (21 + 3448)
>>>
>>> This output doesn't match with debug-tree dump.
>>>
>>> item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
>>>      generation 692987 type 0 (inline)
>>>      inline extent data size 3447 ram_bytes 3447 compression 0 (none)
>>>
>>> Where its ram_bytes is 3447, not 3448.
>>>
>>> Further more, there are 2 more inlined extent, if something really went
>>> wrong reading ram_bytes, it should also trigger the same warning.
>>>
>>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>>      generation 367 type 0 (inline)
>>>      inline extent data size 154 ram_bytes 154 compression 0 (none)
>>>
>>> and
>>>
>>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>>      generation 367 type 0 (inline)
>>>      inline extent data size 154 ram_bytes 154 compression 0 (none)
>>>
>>> The only way to get the number 3448 is from its inode item.
>>>
>>> item 305 key (206231 INODE_ITEM 0) itemoff 18607 itemsize 160
>>>      generation 1136104 transid 1136104 size 3447 nbytes  >>3448<<
>>>      block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
>>>      sequence 4 flags 0x0(none)
>>>      atime 1390923260.43167583 (2014-01-28 15:34:20)
>>>      ctime 1416461176.910968309 (2014-11-20 05:26:16)
>>>      mtime 1392531030.754511511 (2014-02-16 06:10:30)
>>>      otime 0.0 (1970-01-01 00:00:00)
>>>
>>> But the slot is correct, and nothing wrong with these item offset/length.
>>>
>>> And the problem of wrong "root=" output also makes me pretty curious.
>>>
>>> Is it possible to make a btrfs-image dump if all the filenames in this
>>> fs are not sensitive?
>>
>> Hi Qu Wenruo,
>>
>> I sent details of the btrfs-image to you in a private message. Hopefully
>> you've received it and will find it useful.
> 
> Sorry, I didn't find the private message.

Ok, resent with a subject of "resend: btrfs image dump".  Hopefully it 
didn't get caught by your spam filter.

Steve


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-28  3:47               ` Steve Leung
@ 2018-05-28  5:11                 ` Qu Wenruo
  2018-05-29 14:58                   ` Steve Leung
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-05-28  5:11 UTC (permalink / raw)
  To: Steve Leung, linux-btrfs





On 2018-05-28 11:47, Steve Leung wrote:
> On 05/26/2018 06:57 PM, Qu Wenruo wrote:
>>
>>
>> On 2018-05-26 22:06, Steve Leung wrote:
>>> On 05/20/2018 07:07 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018-05-21 04:43, Steve Leung wrote:
>>>>> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-05-20 07:40, Steve Leung wrote:
>>>>>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>>>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>>>>>> observed lately:
>>>>>
>>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970196795392
>>>>>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 3468 expect 3469
>>>>>>>>
>>>>>>>> Would you please use "btrfs-debug-tree -b 4970196795392
>>>>>>>> /dev/sda1" to
>>>>>>>> dump the leaf?
>>>>>>>
>>>>>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>>>>>> messages for.
>>>>>>>
>>>>>>>> It's caught by tree-checker code which is ensuring all tree blocks
>>>>>>>> are
>>>>>>>> correct before btrfs can take use of them.
>>>>>>>>
>>>>>>>> That inline extent size check is tested, so I'm wondering if this
>>>>>>>> indicates any real corruption.
>>>>>>>> That btrfs-debug-tree output will definitely help.
>>>>>>>>
>>>>>>>> BTW, if I didn't miss anything, there should not be any inlined
>>>>>>>> extent
>>>>>>>> in root tree.
>>>>>>>>
>>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970552426496
>>>>>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 3496 expect 3497
>>>>>>>>
>>>>>>>> Same dump will definitely help.
>>>>>>>>
>>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970712399872
>>>>>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 1790 expect 1791
>>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970803920896
>>>>>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 2475 expect 2476
>>>>>>>>>       BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970987945984
>>>>>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 490 expect 491
>>>>>>>>>
>>>>>>>>> All of them seem to be 1 short of the expected value.
>>>>>>>>>
>>>>>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>>>>>
>>>>>>>>>      ERROR: ino paths ioctl: Input/output error
>>>>>>>>>
>>>>>>>>> and another message for that inode appears.
>>>>>>>>>
>>>>>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>>>>>> (among
>>>>>>>>> a few others, some of which seem to be related to a problematic
>>>>>>>>> attempt
>>>>>>>>> to build Android I posted about some months ago).
>>>>>>>>>
>>>>>>>>> Other information:
>>>>>>>>>
>>>>>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The
>>>>>>>>> filesystem
>>>>>>>>> has
>>>>>>>>> about 25 snapshots at the moment, only a handful of compressed
>>>>>>>>> files,
>>>>>>>>> and nothing fancy like qgroups enabled.
>>>>>>>>>
>>>>>>>>> btrfs fi show:
>>>>>>>>>
>>>>>>>>>      Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>>>>>              Total devices 4 FS bytes used 2.48TiB
>>>>>>>>>              devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>>>>>              devid    2 size 464.73GiB used 230.00GiB path
>>>>>>>>> /dev/sdc1
>>>>>>>>>              devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>>>>>              devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>>>>>
>>>>>>>>> btrfs fi df:
>>>>>>>>>
>>>>>>>>>      Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>>>>>      System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>>>>>      Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>>>>>      GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>>>>>
>>>>>>>>> dmesg output attached as well.
>>>>>>>>>
>>>>>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>>>>>> important stuff here but it would be nice to fix the
>>>>>>>>> corruptions in
>>>>>>>>> place.
>>>>>>>>
>>>>>>>> And btrfs check doesn't report the same problem as the default
>>>>>>>> original
>>>>>>>> mode doesn't have such check.
>>>>>>>>
>>>>>>>> Please also post the result of "btrfs check --mode=lowmem
>>>>>>>> /dev/sda1"
>>>>>>>
>>>>>>> Also, attached.  It seems to notice the same off-by-one problems,
>>>>>>> though
>>>>>>> there also seem to be a couple of examples of being off by more than
>>>>>>> one.
>>>>>>
>>>>>> Unfortunately, it doesn't detect, as there is no off-by-one error at
>>>>>> all.
>>>>>>
>>>>>> The problem is, kernel is reporting error on completely fine leaf.
>>>>>>
>>>>>> Further more, even in the same leaf, there are more inlined extents,
>>>>>> and
>>>>>> they are all valid.
>>>>>>
>>>>>> So the kernel reports the error out of nowhere.
>>>>>>
>>>>>> More problems happens for extent_size where a lot of them is
>>>>>> offset by
>>>>>> one.
>>>>>>
>>>>>> Moreover, the root owner is not printed correctly, thus I'm
>>>>>> wondering if
>>>>>> the memory is corrupted.
>>>>>>
>>>>>> Please try memtest+ to verify all your memory is correct, and if so,
>>>>>> please try the attached patch and to see if it provides extra info.
>>>>>
>>>>> Memtest ran for about 12 hours last night, and didn't find any errors.
>>>>>
>>>>> New messages from patched kernel:
>>>>>
>>>>>    BTRFS critical (device sdd1): corrupt leaf: root=1
>>>>> block=4970196795392
>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
>>>>> inline extent, have 3468 expect 3469 (21 + 3448)
>>>>
>>>> This output doesn't match with debug-tree dump.
>>>>
>>>> item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
>>>>      generation 692987 type 0 (inline)
>>>>      inline extent data size 3447 ram_bytes 3447 compression 0 (none)
>>>>
>>>> Where its ram_bytes is 3447, not 3448.
>>>>
>>>> Further more, there are 2 more inlined extent, if something really went
>>>> wrong reading ram_bytes, it should also trigger the same warning.
>>>>
>>>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>>>      generation 367 type 0 (inline)
>>>>      inline extent data size 154 ram_bytes 154 compression 0 (none)
>>>>
>>>> and
>>>>
>>>> item 26 key (206227 EXTENT_DATA 0) itemoff 30917 itemsize 175
>>>>      generation 367 type 0 (inline)
>>>>      inline extent data size 154 ram_bytes 154 compression 0 (none)
>>>>
>>>> The only way to get the number 3448 is from its inode item.
>>>>
>>>> item 305 key (206231 INODE_ITEM 0) itemoff 18607 itemsize 160
>>>>      generation 1136104 transid 1136104 size 3447 nbytes  >>3448<<
>>>>      block group 0 mode 100644 links 1 uid 1000 gid 1000 rdev 0
>>>>      sequence 4 flags 0x0(none)
>>>>      atime 1390923260.43167583 (2014-01-28 15:34:20)
>>>>      ctime 1416461176.910968309 (2014-11-20 05:26:16)
>>>>      mtime 1392531030.754511511 (2014-02-16 06:10:30)
>>>>      otime 0.0 (1970-01-01 00:00:00)
>>>>
>>>> But the slot is correct, and nothing wrong with these item
>>>> offset/length.
>>>>
>>>> And the problem of wrong "root=" output also makes me pretty curious.
>>>>
>>>> Is it possible to make a btrfs-image dump if all the filenames in this
>>>> fs are not sensitive?
>>>
>>> Hi Qu Wenruo,
>>>
>>> I sent details of the btrfs-image to you in a private message. Hopefully
>>> you've received it and will find it useful.
>>
>> Sorry, I didn't find the private message.
> 
> Ok, resent with a subject of "resend: btrfs image dump".  Hopefully it
> didn't get caught by your spam filter.

Still nope.
What about encrypting it and uploading it to a public storage provider
such as Google Drive or Dropbox?

Thanks,
Qu

> 
> Steve


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-28  5:11                 ` Qu Wenruo
@ 2018-05-29 14:58                   ` Steve Leung
  2018-06-05  5:30                     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Steve Leung @ 2018-05-29 14:58 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Qu Wenruo <quwenruo.btrfs@gmx.com> writes:

> On 2018-05-28 11:47, Steve Leung wrote:
>> On 05/26/2018 06:57 PM, Qu Wenruo wrote:
[snip]
>>>> Hi Qu Wenruo,
>>>>
>>>> I sent details of the btrfs-image to you in a private message. Hopefully
>>>> you've received it and will find it useful.
>>>
>>> Sorry, I didn't find the private message.
>> 
>> Ok, resent with a subject of "resend: btrfs image dump".  Hopefully it
>> didn't get caught by your spam filter.
>
> Still nope.
> What about encrypting it and uploading it to a public storage provider
> such as Google Drive or Dropbox?

Ok, uploaded to Google Drive.  You'll need to request access to it.

  https://drive.google.com/file/d/16NM1NVoMVgkJ_JiePi8VfAzit5_Onz2H/view?usp=sharing

sha256sum for the file should be:

  ea0abc21fcbc3a71c68b7307d57b26763ac711bd3437a60e32db3144facfeb3f
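
For anyone who downloads the image, here is a minimal sketch of checking
it against the checksum above, assuming Python 3 is at hand.  The function
names are made up for illustration and are not from any btrfs tool.

```python
import hashlib

# Checksum quoted in this message.
EXPECTED_SHA256 = "ea0abc21fcbc3a71c68b7307d57b26763ac711bd3437a60e32db3144facfeb3f"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so a large image never sits in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    with open(path, "rb") as f:
        return sha256_of_stream(f)


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_expected() with the path of the downloaded file (whatever
name it was saved under) should return True if the download is intact.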

Thanks!

Steve


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-21  1:07         ` Qu Wenruo
  2018-05-26 14:06           ` Steve Leung
@ 2018-05-29 18:49           ` Hans van Kranenburg
  2018-06-05  5:24             ` Qu Wenruo
  1 sibling, 1 reply; 17+ messages in thread
From: Hans van Kranenburg @ 2018-05-29 18:49 UTC (permalink / raw)
  To: Qu Wenruo, Steve Leung, linux-btrfs

On 05/21/2018 03:07 AM, Qu Wenruo wrote:
> 
> [...]
> 
> And the problem of wrong "root=" output also makes me pretty curious.
Yeah, there are a bunch of error messages in the kernel which always say
root=1, regardless of the actual root the tree block is in.

I think I once tried to find out why, but apparently it wasn't a really
obvious error somewhere.

-- 
Hans van Kranenburg


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-29 18:49           ` Hans van Kranenburg
@ 2018-06-05  5:24             ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2018-06-05  5:24 UTC (permalink / raw)
  To: Hans van Kranenburg, Steve Leung, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 731 bytes --]



On 2018-05-30 02:49, Hans van Kranenburg wrote:
> On 05/21/2018 03:07 AM, Qu Wenruo wrote:
>>
>> [...]
>>
>> And the problem of wrong "root=" output also makes me pretty curious.
> Yeah, there are a bunch of error messages in the kernel which always say
> root=1, regardless of the actual root the tree block is in.

I think the problem is already fixed in the latest kernel.

The original version used inode->root, which always points to the
root tree.

In my test VM running 4.17-rc5, it shows the correct value.

If you still see this problem, feel free to report it.

Thanks,
Qu

> 
> I think I once tried to find out why, but apparently it wasn't a really
> obvious error somewhere.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-05-29 14:58                   ` Steve Leung
@ 2018-06-05  5:30                     ` Qu Wenruo
  2018-06-06  4:06                       ` Steve Leung
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2018-06-05  5:30 UTC (permalink / raw)
  To: Steve Leung, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1461 bytes --]



On 2018-05-29 22:58, Steve Leung wrote:
> Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
> 
>> On 2018-05-28 11:47, Steve Leung wrote:
>>> On 05/26/2018 06:57 PM, Qu Wenruo wrote:
>>>>
>>>>
[snip]
>> Still nope.
>> What about encrypting it and uploading it to a public storage provider
>> such as Google Drive or Dropbox?
> 
> Ok, uploaded to Google Drive.  You'll need to request access to it.
> 
>   https://drive.google.com/file/d/16NM1NVoMVgkJ_JiePi8VfAzit5_Onz2H/view?usp=sharing
> 
> sha256sum for the file should be:
> 
>   ea0abc21fcbc3a71c68b7307d57b26763ac711bd3437a60e32db3144facfeb3f
Sorry for the slow reply.

After all the testing, the result is a little surprising.

It's indeed *CORRUPTED*! And the tree-checker code exposed it.

It's just that the btrfs-progs and kernel print-tree code don't use the
correct ram_bytes for their output, which made it pretty tricky to spot.

The problem is the ram_bytes of that inlined extent: it's indeed larger
than it should be, by just one byte.
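
To make the arithmetic concrete, here is a rough Python sketch of the
size check behind these messages.  It is a simplified model rather than
the kernel code: the function names are made up for illustration, and the
only figures used are the ones already quoted in this thread (the 21-byte
inline header, item size 3468, ram_bytes 3448).

```python
# Bytes of the file extent item that precede the inline data
# (the "21" in the patched kernel's "(21 + 3448)" output).
INLINE_DATA_START = 21


def check_uncompressed_inline(item_size, ram_bytes):
    """Return an error string if item_size != header + ram_bytes, else None."""
    expected = INLINE_DATA_START + ram_bytes
    if item_size != expected:
        return ("invalid ram_bytes for uncompressed inline extent, "
                f"have {item_size} expect {expected}")
    return None


def ram_bytes_from_item_size(item_size):
    """The ram_bytes value that would be consistent with the item size."""
    return item_size - INLINE_DATA_START


# Slot 307 of block 4970196795392: itemsize 3468, ram_bytes read as 3448.
print(check_uncompressed_inline(3468, 3448))
# The value consistent with the item size, matching the
# "inline extent data size" shown in the debug-tree dump.
print(ram_bytes_from_item_size(3468))
```

With ram_bytes at 3447 the same check passes, which is why the other
inline extents in that leaf did not trigger it.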

I'm not completely sure how it happened, but according to the
timestamps it was 4 years ago, and I think some kernel off-by-one error
occurred back then and has since been fixed.

The current kernel can handle it pretty well without reading out the
last byte.
However, it's still a corruption.

That said, it's not a big problem and can be fixed easily.
I'll submit a btrfs-progs patch this week to allow btrfs check to fix it.

Thanks,
Qu

> Thanks!
> 
> Steve
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: off-by-one uncompressed invalid ram_bytes corruptions
  2018-06-05  5:30                     ` Qu Wenruo
@ 2018-06-06  4:06                       ` Steve Leung
  0 siblings, 0 replies; 17+ messages in thread
From: Steve Leung @ 2018-06-06  4:06 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 06/04/2018 11:30 PM, Qu Wenruo wrote:
> 
> 
> On 2018-05-29 22:58, Steve Leung wrote:
>> Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
>>
>>> On 2018-05-28 11:47, Steve Leung wrote:
>>>> On 05/26/2018 06:57 PM, Qu Wenruo wrote:
>>>>>
>>>>>
> [snip]
>>> Still nope.
>>> What about encrypting it and uploading it to a public storage provider
>>> such as Google Drive or Dropbox?
>>
>> Ok, uploaded to Google Drive.  You'll need to request access to it.
>>
>>    https://drive.google.com/file/d/16NM1NVoMVgkJ_JiePi8VfAzit5_Onz2H/view?usp=sharing
>>
>> sha256sum for the file should be:
>>
>>    ea0abc21fcbc3a71c68b7307d57b26763ac711bd3437a60e32db3144facfeb3f
> Sorry for the slow reply.
> 
> After all the testing, the result is a little surprising.
> 
> It's indeed *CORRUPTED*! And the tree-checker code exposed it.
> 
> It's just that the btrfs-progs and kernel print-tree code don't use the
> correct ram_bytes for their output, which made it pretty tricky to spot.
> 
> The problem is the ram_bytes of that inlined extent: it's indeed larger
> than it should be, by just one byte.
> 
> I'm not completely sure how it happened, but according to the
> timestamps it was 4 years ago, and I think some kernel off-by-one error
> occurred back then and has since been fixed.
> 
> The current kernel can handle it pretty well without reading out the
> last byte.
> However, it's still a corruption.
> 
> That said, it's not a big problem and can be fixed easily.
> I'll submit a btrfs-progs patch this week to allow btrfs check to fix it.

OK, great to hear!  I'll give it a test whenever you have it.

Steve


Thread overview: 17+ messages
2018-05-18  5:23 off-by-one uncompressed invalid ram_bytes corruptions Steve Leung
2018-05-18  5:49 ` Qu Wenruo
2018-05-18  9:42   ` james harvey
2018-05-18  9:56     ` Qu Wenruo
2018-05-19 23:40   ` Steve Leung
2018-05-20  1:02     ` Qu Wenruo
2018-05-20 20:43       ` Steve Leung
2018-05-21  1:07         ` Qu Wenruo
2018-05-26 14:06           ` Steve Leung
2018-05-27  0:57             ` Qu Wenruo
2018-05-28  3:47               ` Steve Leung
2018-05-28  5:11                 ` Qu Wenruo
2018-05-29 14:58                   ` Steve Leung
2018-06-05  5:30                     ` Qu Wenruo
2018-06-06  4:06                       ` Steve Leung
2018-05-29 18:49           ` Hans van Kranenburg
2018-06-05  5:24             ` Qu Wenruo
