linux-btrfs.vger.kernel.org archive mirror
* Unable to remove directory entry
@ 2019-12-08 19:19 Mike Gilbert
  2019-12-09  0:11 ` Qu Wenruo
  2019-12-09  0:17 ` Zygo Blaxell
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Gilbert @ 2019-12-08 19:19 UTC
  To: linux-btrfs

Hello,

I have a directory entry that cannot be stat-ed or unlinked. This
issue persists across reboots, so it seems there is something wrong on
disk.

% ls -l /var/cache/ccache.bad/2/c
ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
total 0
-????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest
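For context, ls -l prints "-?????????" when readdir() returns a name but the stat call that follows fails. A minimal sketch of the same check in POSIX C, assuming nothing btrfs-specific: report_unstatable_entries() is a hypothetical helper that lists entries whose names readdir() returns but fstatat() cannot find (ENOENT), which is the symptom shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>      /* AT_SYMLINK_NOFOLLOW */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: print and count directory entries that readdir()
 * lists but fstatat() reports as nonexistent (ENOENT), like the
 * "-?????????" line from ls -l. Returns the count, or -1 if the
 * directory cannot be opened.
 */
int report_unstatable_entries(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    int bad = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        struct stat st;
        /* Stat relative to the open directory; do not follow symlinks. */
        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0 &&
            errno == ENOENT) {
            printf("dangling entry: %s\n", de->d_name);
            bad++;
        }
    }
    closedir(d);
    return bad;
}
```

Run against /var/cache/ccache.bad/2/c, it would be expected to report the .manifest name. Entries removed concurrently by another process can also show up, so a hit is a hint, not proof of on-disk damage.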

% uname -a
Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux

% btrfs --version
btrfs-progs v5.4

I have tried running btrfs check, and I get differing results based on
the --mode switch:

# btrfs check --readonly /dev/sda3
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
Opening filesystem to check...
Checking filesystem on /dev/sda3
UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
found 284337733632 bytes used, no error found
total csum bytes: 267182280
total tree bytes: 4498915328
total fs tree bytes: 3972464640
total extent tree bytes: 199819264
btree space waste bytes: 776711635
file data blocks allocated: 313928671232
 referenced 279141621760

# btrfs check --readonly --mode=lowmem /dev/sda3
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
ERROR: root 5 DIR ITEM[486836 13905] name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1
ERROR: errors found in fs roots
Opening filesystem to check...
Checking filesystem on /dev/sda3
UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
found 284337733632 bytes used, error(s) found
total csum bytes: 267182280
total tree bytes: 4498915328
total fs tree bytes: 3972464640
total extent tree bytes: 199819264
btree space waste bytes: 776711635
file data blocks allocated: 313928671232
 referenced 279141621760

Please advise on possible next steps to diagnose and fix this.

* Re: Unable to remove directory entry
  2019-12-08 19:19 Unable to remove directory entry Mike Gilbert
@ 2019-12-09  0:11 ` Qu Wenruo
  2019-12-09  0:30   ` Mike Gilbert
  2019-12-09  0:17 ` Zygo Blaxell
  1 sibling, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  0:11 UTC
  To: Mike Gilbert, linux-btrfs


On 2019/12/9 3:19 AM, Mike Gilbert wrote:
> Hello,
> 
> I have a directory entry that cannot be stat-ed or unlinked. This
> issue persists across reboots, so it seems there is something wrong on
> disk.
> 
> % ls -l /var/cache/ccache.bad/2/c
> ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
> total 0
> -????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest

Dmesg if any, please.

> 
> % uname -a
> Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux

The kernel is not new enough by btrfs standards.

For this possible name hash mismatch bug, a newer kernel will report the
problem in detail.

> 
> % btrfs --version
> btrfs-progs v5.4
> 
> I have tried running btrfs check, and I get differing results based on
> the --mode switch:
> 
> # btrfs check --readonly /dev/sda3
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups
> Opening filesystem to check...
> Checking filesystem on /dev/sda3
> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> found 284337733632 bytes used, no error found
> total csum bytes: 267182280
> total tree bytes: 4498915328
> total fs tree bytes: 3972464640
> total extent tree bytes: 199819264
> btree space waste bytes: 776711635
> file data blocks allocated: 313928671232
>  referenced 279141621760
> 
> # btrfs check --readonly --mode=lowmem /dev/sda3
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
> ERROR: root 5 DIR ITEM[486836 13905] name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
> ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1

This means the name hash for the filename
"0390cb341d248c589c419007da68b2-7351.manifest" is incorrect.

Thus the kernel can't locate that inode correctly.
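For reference, the offset of a DIR_ITEM key is a hash of the entry name, while a DIR_INDEX key carries the directory index instead. A minimal sketch of that hash, assuming btrfs_name_hash() is CRC-32C (Castagnoli) seeded with (u32)~1 and returned without the final bit inversion that standard CRC-32C applies; crc32c_raw() and name_hash() are hypothetical names. Comparing name_hash("0390cb341d248c589c419007da68b2-7351.manifest") with the 2543451757 in the DIR_ITEM key would show whether the stored hash is really wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Like the kernel's crc32c(seed, buf, len), it only updates the CRC
 * register: no final inversion is applied.
 */
uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Assumed equivalent of btrfs_name_hash(): crc32c seeded with (u32)~1. */
uint32_t name_hash(const char *name)
{
    return crc32c_raw(~(uint32_t)1, name, strlen(name));
}
```

As a sanity check on the CRC routine itself, (crc32c_raw(0xFFFFFFFF, "123456789", 9) ^ 0xFFFFFFFF) gives 0xE3069283, the standard CRC-32C check value.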

Furthermore, the index for inode 4065004 doesn't make much sense. The
number looks absolutely insane.
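The reported index is not arbitrary garbage: 18446744073709551615 is exactly 2^64 - 1, i.e. (u64)-1 with all 64 bits set. C code commonly uses that value as an "unset" sentinel; whether btrfs check prints it for that reason here (for example because a DIR_ITEM, unlike a DIR_INDEX, carries no index) is an assumption. A small sketch confirming the arithmetic; is_all_ones_u64() and show_index() are hypothetical helpers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The index value printed in the lowmem error for inode 4065004. */
#define REPORTED_INDEX 18446744073709551615ULL

/* Compile-time check: the reported index is 2^64 - 1, i.e. (u64)-1. */
static_assert(REPORTED_INDEX == UINT64_MAX, "index is (u64)-1");

/* True if idx has all 64 bits set. */
int is_all_ones_u64(uint64_t idx)
{
    return idx == UINT64_MAX;
}

/* Print an index in decimal and hex, flagging the all-ones value. */
void show_index(uint64_t idx)
{
    printf("index %" PRIu64 " = 0x%016" PRIx64 "%s\n", idx, idx,
           is_all_ones_u64(idx) ? " (u64)-1" : "");
}
```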

If your fs is small enough, you can try doing a binary dump first, then try
btrfs check --mode=lowmem --repair, as btrfs-progs v5.4 already has the
ability to repair this.

If your fs is too large, I guess you can only pray that nothing bad
happens...

Thanks,
Qu

* Re: Unable to remove directory entry
  2019-12-08 19:19 Unable to remove directory entry Mike Gilbert
  2019-12-09  0:11 ` Qu Wenruo
@ 2019-12-09  0:17 ` Zygo Blaxell
  2019-12-09  1:33   ` Zygo Blaxell
  1 sibling, 1 reply; 15+ messages in thread
From: Zygo Blaxell @ 2019-12-09  0:17 UTC
  To: Mike Gilbert; +Cc: linux-btrfs

On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote:
> Hello,
> 
> I have a directory entry that cannot be stat-ed or unlinked. This
> issue persists across reboots, so it seems there is something wrong on
> disk.
> 
> % ls -l /var/cache/ccache.bad/2/c
> ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
> total 0
> -????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest

I have seen a bug similar to this some years ago.  It was present as
far back as 4.5, and seems to still be present in 5.0.21.  I don't have
detailed tracking information on it due to the low severity: not a crash
or data corruption bug, and workarounds exist both to prevent the bug
and to clean up its aftermath.

The reproducer is something like:

	while (true) { // pseudocode
		int fd = create(tmp_name);
		write(fd, ...);
		fsync(fd);	// required, bug does not appear without this fsync
		close(fd);
		rename(tmp_name, regular_name);
	}

and a crash, maybe with some heavy write load.  This is typical of
applications like git and ccache, and in the wild, broken directory
entries are often found in these applications' directories.
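One iteration of the loop above, written out as runnable POSIX C for illustration. write_fsync_rename() is a hypothetical helper, not code taken from git or ccache. Per the description, the fsync() step is what makes the bug appear, and leaving a broken entry behind additionally needs a crash at the wrong moment, so calling this in a loop is not by itself a reproducer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <sys/types.h>
#include <unistd.h>     /* write(), fsync(), close() */

/*
 * Hypothetical helper: one iteration of the create / write / fsync /
 * close / rename pattern. Returns 0 on success, -1 on failure.
 */
int write_fsync_rename(const char *tmp_name, const char *final_name,
                       const void *buf, size_t len, int do_fsync)
{
    assert(tmp_name != NULL && final_name != NULL);
    int fd = open(tmp_name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {           /* treat errors (and no progress) as failure */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Reported as required to trigger the bug; eatmydata turns it into a no-op. */
    if (do_fsync && fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Atomically replaces final_name if it already exists. */
    return rename(tmp_name, final_name);
}
```

Passing do_fsync = 0 corresponds to the variant without the fsync, in which the bug is said not to appear.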

Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
in behavior:  before, the broken directory entry could not be removed,
renamed, or used for a new file; the only way to get rid of the broken
directory entry was to delete the entire subvol.  After the behavior
change, the broken directory entry could be removed by creating a new
file and renaming it to the broken directory entry name.
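A sketch of that cleanup workaround in POSIX C. clear_by_rename() is a hypothetical name; the temporary path must be in the same directory as the broken entry (and differ from it) so that rename() replaces the broken name in place. Whether this helps depends on the kernel version, given the behavior change described above, so treat it as something to try rather than a guaranteed fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <unistd.h>     /* close(), unlink() */

/*
 * Hypothetical helper for the workaround described above: create an
 * empty file under a temporary name in the same directory, rename() it
 * over the broken entry's name, then unlink that name.
 * Returns 0 on success, -1 on failure.
 */
int clear_by_rename(const char *tmp_path, const char *broken_path)
{
    assert(tmp_path != NULL && broken_path != NULL);
    /* O_EXCL: refuse to reuse a temporary name that already exists. */
    int fd = open(tmp_path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }
    if (rename(tmp_path, broken_path) != 0) {
        unlink(tmp_path);       /* do not leave the temporary file behind */
        return -1;
    }
    return unlink(broken_path);
}
```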

Another workaround is to remove the fsync by running the application
under eatmydata.  btrfs performs a flush in the rename() operation when
an existing file is replaced, so the fsync that triggers the bug was
not necessary in the first place.  Note this only works when replacing
an existing file, so the flushoncommit mount option is required to make
this work in other cases.

* Re: Unable to remove directory entry
  2019-12-09  0:11 ` Qu Wenruo
@ 2019-12-09  0:30   ` Mike Gilbert
  2019-12-09  0:41     ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Gilbert @ 2019-12-09  0:30 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Dec 8, 2019 at 7:11 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Dmesg if any, please.

There's nothing btrfs-related in the dmesg output.

> >
> > % uname -a
> > Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux
>
> The kernel is not new enough by btrfs standards.
>
> For this possible name hash mismatch bug, a newer kernel will report the
> problem in detail.

Would 4.19.88 suffice, or do I need to switch to a newer release branch?

* Re: Unable to remove directory entry
  2019-12-09  0:30   ` Mike Gilbert
@ 2019-12-09  0:41     ` Qu Wenruo
  2019-12-09  1:31       ` Mike Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  0:41 UTC
  To: Mike Gilbert; +Cc: linux-btrfs


On 2019/12/9 8:30 AM, Mike Gilbert wrote:
> Would 4.19.88 suffice, or do I need to switch to a newer release branch?
> 
I'd recommend going to at least the latest LTS (v5.3.x).

.88 is just backports, nothing really different, and sometimes big fixes
don't get backported.

Thanks,
Qu


* Re: Unable to remove directory entry
  2019-12-09  0:41     ` Qu Wenruo
@ 2019-12-09  1:31       ` Mike Gilbert
  2019-12-09  1:45         ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Gilbert @ 2019-12-09  1:31 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Dec 8, 2019 at 7:41 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> I'd recommend going to at least the latest LTS (v5.3.x).
>
> .88 is just backports, nothing really different, and sometimes big fixes
> don't get backported.

I upgraded to linux-5.4.2, and attempted to remove the file, with the
same results.

ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
total 0
-????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest

rm: cannot remove '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory

I don't see any output in dmesg. Is there some option I need to enable?

* Re: Unable to remove directory entry
  2019-12-09  0:17 ` Zygo Blaxell
@ 2019-12-09  1:33   ` Zygo Blaxell
  2019-12-09  1:52     ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Zygo Blaxell @ 2019-12-09  1:33 UTC
  To: Mike Gilbert; +Cc: linux-btrfs

On Sun, Dec 08, 2019 at 07:17:21PM -0500, Zygo Blaxell wrote:
> I have seen a bug similar to this some years ago.  It was present as
> far back as 4.5, and seems to still be present in 5.0.21.  I don't have
> detailed tracking information on it due to the low severity: not a crash
> or data corruption bug, and workarounds exist both to prevent the bug
> and to clean up its aftermath.
> 
> The reproducer is something like:
> 
> 	while (true) { // pseudocode
> 		int fd = create(tmp_name);
> 		write(fd, ...);
> 		fsync(fd);	// required, bug does not appear without this fsync
> 		close(fd);
> 		rename(tmp_name, regular_name);
> 	}
> 
> and a crash, maybe with some heavy write load.  This is typical of
> applications like git and ccache, and in the wild, broken directory
> entries are often found in these applications' directories.
> 
> Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
> in behavior:  before, the broken directory entry could not be removed,
> renamed, or used for a new file; the only way to get rid of the broken
> directory entry was to delete the entire subvol.  After the behavior
> change, the broken directory entry could be removed by creating a new
> file and renaming it to the broken directory entry name.

I found a filesystem that currently has one of these broken dirents:

	root@tester24:/media/testfs/beeshome# ls -l
	ls: cannot access 'beesstats.txt.tmp': No such file or directory
	total 3446032
	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
	-rw-r--r-- 1 root root    3221101 Dec  8 20:18 df-2019-12-07.txt
	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
	-rw-r--r-- 1 root root   42378425 Dec  8 20:19 log-2019-12-07.txt
	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt

It seems I can create a file with the same name, and then I get two:

	root@tester24:/media/testfs/beeshome# date > beesstats.txt.tmp
	root@tester24:/media/testfs/beeshome# ls -l
	total 3446044
	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
	-rw-r--r-- 1 root root    3221363 Dec  8 20:19 df-2019-12-07.txt
	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
	-rw-r--r-- 1 root root   42384027 Dec  8 20:19 log-2019-12-07.txt
	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
	root@tester24:/media/testfs/beeshome# cat beesstats.txt.tmp 
	Sun Dec  8 20:19:38 EST 2019

dump-tree shows both DIR_INDEX items but only one DIR_ITEM:

        item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47
                location key (133693 INODE_ITEM 0) type FILE
                transid 5002644 data_len 0 name_len 17
                name: beesstats.txt.tmp
        item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
                location key (11481 INODE_ITEM 0) type FILE
                transid 1876891 data_len 0 name_len 17
                name: beesstats.txt.tmp
        item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
                location key (133693 INODE_ITEM 0) type FILE
                transid 5002644 data_len 0 name_len 17
                name: beesstats.txt.tmp

but I can only delete DIR_ITEMs:

	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
	rm: cannot remove 'beesstats.txt.tmp': No such file or directory
	root@tester24:/media/testfs/beeshome# ls -l
	ls: cannot access 'beesstats.txt.tmp': No such file or directory
	total 3446048
	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
	-rwx------ 1 root root 1073741824 Dec  8 20:20 beeshash.dat
	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
	-rw-r--r-- 1 root root    3221494 Dec  8 20:19 df-2019-12-07.txt
	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
	-rw-r--r-- 1 root root   42396102 Dec  8 20:20 log-2019-12-07.txt
	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt

leaving the first DIR_INDEX behind:

        item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47
                location key (11481 INODE_ITEM 0) type FILE
                transid 1876891 data_len 0 name_len 17
                name: beesstats.txt.tmp

So the btrfs read side is fine, it's the writing side that is putting bad
metadata on the disk.
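The mismatch above can be expressed as a simple consistency check: every DIR_INDEX entry should have a DIR_ITEM with the same name pointing at the same inode. A sketch under that assumption, fed with the three items from the dump-tree output; struct dir_entry and count_unpaired_dir_index() are hypothetical, and the key type numbers 84 (DIR_ITEM) and 96 (DIR_INDEX) are taken from the btrfs on-disk format definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* btrfs key type numbers for directory entries (on-disk format). */
enum { DIR_ITEM_KEY = 84, DIR_INDEX_KEY = 96 };

/* One directory entry as shown by btrfs inspect-internal dump-tree. */
struct dir_entry {
    uint8_t type;       /* DIR_ITEM_KEY or DIR_INDEX_KEY */
    uint64_t offset;    /* name hash (DIR_ITEM) or index (DIR_INDEX) */
    uint64_t ino;       /* objectid of the location key */
    const char *name;
};

/* Count DIR_INDEX entries that have no DIR_ITEM with the same name and inode. */
size_t count_unpaired_dir_index(const struct dir_entry *e, size_t n)
{
    assert(e != NULL || n == 0);
    size_t unpaired = 0;
    for (size_t i = 0; i < n; i++) {
        if (e[i].type != DIR_INDEX_KEY)
            continue;
        int paired = 0;
        for (size_t j = 0; j < n && !paired; j++)
            paired = e[j].type == DIR_ITEM_KEY && e[j].ino == e[i].ino &&
                     strcmp(e[j].name, e[i].name) == 0;
        if (!paired)
            unpaired++;
    }
    return unpaired;
}
```

With the three items listed above the count is 1: DIR_INDEX 22037 points at inode 11481, and no DIR_ITEM refers to that inode.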

* Re: Unable to remove directory entry
  2019-12-09  1:31       ` Mike Gilbert
@ 2019-12-09  1:45         ` Qu Wenruo
  2019-12-09  1:51           ` Mike Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  1:45 UTC
  To: Mike Gilbert; +Cc: linux-btrfs


On 2019/12/9 9:31 AM, Mike Gilbert wrote:
> I upgraded to linux-5.4.2, and attempted to remove the file, with the
> same results.
> 
> ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
> total 0
> -????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest
> 
> rm: cannot remove '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
> 
> I don't see any output in dmesg. Is there some option I need to enable?
> 
Then it's not a name hash mismatch, just an index mismatch.

In that case, the kernel's tree-checker won't detect the problem. I'll
update tree-checker to handle this case.

I guess the only way to fix it is to rely on btrfs check --mode=lowmem
--repair.
But before that, would you please provide the following dumps, so that I
can be sure before crafting the enhanced tree-checker patch?

# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(4065004 INO" -A7
# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*13905)" -A7
# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*2543451757)" -A7

Thanks,
Qu


* Re: Unable to remove directory entry
  2019-12-09  1:45         ` Qu Wenruo
@ 2019-12-09  1:51           ` Mike Gilbert
  2019-12-09  2:05             ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Gilbert @ 2019-12-09  1:51 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Dec 8, 2019 at 8:45 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Then it's not a name hash mismatch, just an index mismatch.
>
> In that case, the kernel's tree-checker won't detect the problem. I'll
> update tree-checker to handle this case.
>
> I guess the only way to fix it is to rely on btrfs check --mode=lowmem
> --repair.
> But before that, would you please provide the following dumps, so that I
> can be sure before crafting the enhanced tree-checker patch?
>
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(4065004 INO" -A7
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*13905)" -A7
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*2543451757)" -A7

Here you go.

I ran this while the filesystem was mounted; if you need it to be run
while offline, I'll have to fire up a livecd.

                location key (4065004 INODE_ITEM 1073741824) type FILE
               transid 21397 data_len 0 name_len 44
               name: 0390cb341d248c589c419007da68b2-7351.manifest
       item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
               location key (4065004 INODE_ITEM 0) type FILE
               transid 21397 data_len 0 name_len 44
               name: 0390cb341d248c589c419007da68b2-7351.manifest
leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
       item 0 key (1059762 INODE_ITEM 0) itemoff 16123 itemsize 160
--
       item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
               generation 21397 transid 21397 size 12261 nbytes 12288
               block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
               sequence 23 flags 0x0(none)
               atime 1565546668.383680243 (2019-08-11 14:04:28)
               ctime 1565546668.383680243 (2019-08-11 14:04:28)
               mtime 1565546668.383680243 (2019-08-11 14:04:28)
               otime 1565546668.336681213 (2019-08-11 14:04:28)
       item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
               index 13905 namelen 44 name:
0390cb341d248c589c419007da68b2-7351.manifest
       item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
               generation 21397 type 1 (regular)
               extent data disk byte 6288928768 nr 12288
               extent data offset 0 nr 12288 ram 12288
               extent compression 0 (none)
       item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
       item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
               location key (4065004 INODE_ITEM 0) type FILE
               transid 21397 data_len 0 name_len 44
               name: 0390cb341d248c589c419007da68b2-7351.manifest
leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
       item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
               location key (4065004 INODE_ITEM 1073741824) type FILE
               transid 21397 data_len 0 name_len 44
               name: 0390cb341d248c589c419007da68b2-7351.manifest
       item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
               location key (4065004 INODE_ITEM 0) type FILE
               transid 21397 data_len 0 name_len 44
               name: 0390cb341d248c589c419007da68b2-7351.manifest
parent transid verify failed on 629293056 wanted 177041 found 177044
parent transid verify failed on 629293056 wanted 177041 found 177044
Ignoring transid failure

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unable to remove directory entry
  2019-12-09  1:33   ` Zygo Blaxell
@ 2019-12-09  1:52     ` Qu Wenruo
  2019-12-09  2:23       ` Zygo Blaxell
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  1:52 UTC (permalink / raw)
  To: Zygo Blaxell, Mike Gilbert; +Cc: linux-btrfs


On 2019/12/9 9:33 AM, Zygo Blaxell wrote:
> On Sun, Dec 08, 2019 at 07:17:21PM -0500, Zygo Blaxell wrote:
>> On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote:
>>> Hello,
>>>
>>> I have a directory entry that cannot be stat-ed or unlinked. This
>>> issue persists across reboots, so it seems there is something wrong on
>>> disk.
>>>
>>> % ls -l /var/cache/ccache.bad/2/c
>>> ls: cannot access
>>> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
>>> No such
>>> file or directory
>>> total 0
>>> -????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest
>>
>> I have seen a bug similar to this some years ago.  It was present as
>> far back as 4.5, and seems to still be present in 5.0.21.  I don't have
>> detailed tracking information on it due to the low severity: not a crash
>> or data corruption bug, and workarounds exist both to prevent the bug
>> and to clean up its aftermath.
>>
>> The reproducer is something like:
>>
>> 	while (true) { // pseudocode
>> 		int fd = create(tmp_name);
>> 		write(fd, ...);
>> 		fsync(fd);	// required, bug does not appear without this fsync
>> 		close(fd);
>> 		rename(tmp_name, regular_name);
>> 	}
>>
>> and a crash, maybe with some heavy write load.  This is typical of
>> applications like git and ccache, and in the wild, broken directory
>> entries are often found in these applications' directories.
>>
>> Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
>> in behavior:  before, the broken directory entry could not be removed,
>> renamed, or used for a new file; the only way to get rid of the broken
>> directory entry was to delete the entire subvol.  After the behavior
>> change, the broken directory entry could be removed by creating a new
>> file and renaming it to the broken directory entry name.
> 
> I found a filesystem that currently has one of these broken dirents:
> 
> 	root@tester24:/media/testfs/beeshome# ls -l
> 	ls: cannot access 'beesstats.txt.tmp': No such file or directory
> 	total 3446032
> 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> 	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
> 	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
> 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> 	-rw-r--r-- 1 root root    3221101 Dec  8 20:18 df-2019-12-07.txt
> 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> 	-rw-r--r-- 1 root root   42378425 Dec  8 20:19 log-2019-12-07.txt
> 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> 
> It seems I can create a file with the same name, and then I get two:
> 
> 	root@tester24:/media/testfs/beeshome# date > beesstats.txt.tmp
> 	root@tester24:/media/testfs/beeshome# ls -l
> 	total 3446044
> 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> 	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
> 	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
> 	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
> 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> 	-rw-r--r-- 1 root root    3221363 Dec  8 20:19 df-2019-12-07.txt
> 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> 	-rw-r--r-- 1 root root   42384027 Dec  8 20:19 log-2019-12-07.txt
> 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> 	root@tester24:/media/testfs/beeshome# cat beesstats.txt.tmp 
> 	Sun Dec  8 20:19:38 EST 2019
> 
> dump-tree shows both DIR_INDEX items but only one DIR_ITEM:
> 
>         item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47
>                 location key (133693 INODE_ITEM 0) type FILE
>                 transid 5002644 data_len 0 name_len 17
>                 name: beesstats.txt.tmp
>         item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
>                 location key (11481 INODE_ITEM 0) type FILE
>                 transid 1876891 data_len 0 name_len 17
>                 name: beesstats.txt.tmp
>         item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
>                 location key (133693 INODE_ITEM 0) type FILE
>                 transid 5002644 data_len 0 name_len 17
>                 name: beesstats.txt.tmp
> 
> but I can only delete DIR_ITEMs:
> 
> 	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
> 	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
> 	rm: cannot remove 'beesstats.txt.tmp': No such file or directory
> 	root@tester24:/media/testfs/beeshome# ls -l
> 	ls: cannot access 'beesstats.txt.tmp': No such file or directory
> 	total 3446048
> 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> 	-rwx------ 1 root root 1073741824 Dec  8 20:20 beeshash.dat
> 	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
> 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> 	-rw-r--r-- 1 root root    3221494 Dec  8 20:19 df-2019-12-07.txt
> 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> 	-rw-r--r-- 1 root root   42396102 Dec  8 20:20 log-2019-12-07.txt
> 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> 
> leaving the first DIR_INDEX behind:
> 
>         item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47
>                 location key (11481 INODE_ITEM 0) type FILE
>                 transid 1876891 data_len 0 name_len 17
>                 name: beesstats.txt.tmp

This looks like an older kernel bug (hopefully).

So an orphan DIR_INDEX was left behind and never cleaned up properly.

In that case, btrfs-progs should be able to repair it.
But strangely, why didn't the original-mode check report it?

BTW, does that 11481 inode still exist?

Thanks,
Qu
> 
> So the btrfs read side is fine, it's the writing side that is putting bad
> metadata on the disk.
> 
>> Another workaround is to remove the fsync by running the application
>> under eatmydata.  btrfs performs a flush in the rename() operation when
>> an existing file is replaced, so the fsync that triggers the bug was
>> not necessary in the first place.  Note this only works when replacing
>> an existing file, so the flushoncommit mount option is required to make
>> this work in other cases.
>>
>>> % uname -a
>>> Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD
>>> Phenom(tm) II X6 1055T Processor
>>> AuthenticAMD GNU/Linux
>>>
>>> % btrfs --version
>>> btrfs-progs v5.4
>>>
>>> I have tried running btrfs check, and I get differing results based on
>>> the --mode switch:
>>>
>>> # btrfs check --readonly /dev/sda3
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> [5/7] checking only csums items (without verifying data)
>>> [6/7] checking root refs
>>> [7/7] checking quota groups
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/sda3
>>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
>>> found 284337733632 bytes used, no error found
>>> total csum bytes: 267182280
>>> total tree bytes: 4498915328
>>> total fs tree bytes: 3972464640
>>> total extent tree bytes: 199819264
>>> btree space waste bytes: 776711635
>>> file data blocks allocated: 313928671232
>>>  referenced 279141621760
>>>
>>> # btrfs check --readonly --mode=lowmem /dev/sda3
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name
>>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
>>> ERROR: root 5 DIR ITEM[486836 13905] name
>>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
>>> ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name
>>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1
>>> ERROR: errors found in fs roots
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/sda3
>>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
>>> found 284337733632 bytes used, error(s) found
>>> total csum bytes: 267182280
>>> total tree bytes: 4498915328
>>> total fs tree bytes: 3972464640
>>> total extent tree bytes: 199819264
>>> btree space waste bytes: 776711635
>>> file data blocks allocated: 313928671232
>>>  referenced 279141621760
>>>
>>> Please advise on possible next steps to diagnose and fix this.
> 
> 


* Re: Unable to remove directory entry
  2019-12-09  1:51           ` Mike Gilbert
@ 2019-12-09  2:05             ` Qu Wenruo
  2019-12-09  2:20               ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  2:05 UTC (permalink / raw)
  To: Mike Gilbert; +Cc: linux-btrfs


On 2019/12/9 9:51 AM, Mike Gilbert wrote:
[...]
> 
> Here you go.
> 
> I ran this while the filesystem was mounted; if you need it to be run
> while offline, I'll have to fire up a livecd.
The info is good enough; no need for the livecd.

> --
>        item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
>                generation 21397 transid 21397 size 12261 nbytes 12288
>                block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
>                sequence 23 flags 0x0(none)
>                atime 1565546668.383680243 (2019-08-11 14:04:28)
>                ctime 1565546668.383680243 (2019-08-11 14:04:28)
>                mtime 1565546668.383680243 (2019-08-11 14:04:28)
>                otime 1565546668.336681213 (2019-08-11 14:04:28)
>        item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
>                index 13905 namelen 44 name:
> 0390cb341d248c589c419007da68b2-7351.manifest

That inode exists and is good.

>        item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
>                generation 21397 type 1 (regular)
>                extent data disk byte 6288928768 nr 12288
>                extent data offset 0 nr 12288 ram 12288
>                extent compression 0 (none)
>        item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
>        item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>                location key (4065004 INODE_ITEM 0) type FILE
>                transid 21397 data_len 0 name_len 44
>                name: 0390cb341d248c589c419007da68b2-7351.manifest

Good parent dir index.

> leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
> leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
> fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
>        item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
>                location key (4065004 INODE_ITEM 1073741824) type FILE
>                transid 21397 data_len 0 name_len 44
>                name: 0390cb341d248c589c419007da68b2-7351.manifest

This is the problem: a bad DIR_ITEM (the hash-keyed entry) in the parent dir.

Its location key should be (4065004 INODE_ITEM 0). The 1073741824 (0x40000000)
offset is complete garbage.

That garbage looks like a bit flip at runtime.
It's recommended to check your memory.

I'll add extra tree-checker checks, so that such runtime problems can be
detected before corrupted data reaches disk.


For the repair, I'll craft a special btrfs-progs build for you to handle it,
as that should be the safest way.
Please wait another 15 minutes for that tool.

Thanks,
Qu


>        item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>                location key (4065004 INODE_ITEM 0) type FILE
>                transid 21397 data_len 0 name_len 44
>                name: 0390cb341d248c589c419007da68b2-7351.manifest
> parent transid verify failed on 629293056 wanted 177041 found 177044
> parent transid verify failed on 629293056 wanted 177041 found 177044
> Ignoring transid failure
> 


* Re: Unable to remove directory entry
  2019-12-09  2:05             ` Qu Wenruo
@ 2019-12-09  2:20               ` Qu Wenruo
  2019-12-09  2:37                 ` Mike Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  2:20 UTC (permalink / raw)
  To: Mike Gilbert; +Cc: linux-btrfs


On 2019/12/9 10:05 AM, Qu Wenruo wrote:
> 
> 
> On 2019/12/9 9:51 AM, Mike Gilbert wrote:
> [...]
>>
>> Here you go.
>>
>> I ran this while the filesystem was mounted; if you need it to be run
>> while offline, I'll have to fire up a livecd.
> The info is good enough; no need for the livecd.
> 
>> --
>>        item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
>>                generation 21397 transid 21397 size 12261 nbytes 12288
>>                block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
>>                sequence 23 flags 0x0(none)
>>                atime 1565546668.383680243 (2019-08-11 14:04:28)
>>                ctime 1565546668.383680243 (2019-08-11 14:04:28)
>>                mtime 1565546668.383680243 (2019-08-11 14:04:28)
>>                otime 1565546668.336681213 (2019-08-11 14:04:28)
>>        item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
>>                index 13905 namelen 44 name:
>> 0390cb341d248c589c419007da68b2-7351.manifest
> 
> That inode exists and is good.
> 
>>        item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
>>                generation 21397 type 1 (regular)
>>                extent data disk byte 6288928768 nr 12288
>>                extent data offset 0 nr 12288 ram 12288
>>                extent compression 0 (none)
>>        item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
>>        item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>>                location key (4065004 INODE_ITEM 0) type FILE
>>                transid 21397 data_len 0 name_len 44
>>                name: 0390cb341d248c589c419007da68b2-7351.manifest
> 
> Good parent dir index.
> 
>> leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
>> leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
>> fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
>> chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
>>        item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
>>                location key (4065004 INODE_ITEM 1073741824) type FILE
>>                transid 21397 data_len 0 name_len 44
>>                name: 0390cb341d248c589c419007da68b2-7351.manifest
> 
> This is the problem: a bad DIR_ITEM (the hash-keyed entry) in the parent dir.
> 
> Its location key should be (4065004 INODE_ITEM 0). The 1073741824 (0x40000000)
> offset is complete garbage.
> 
> That garbage looks like a bit flip at runtime.
> It's recommended to check your memory.
> 
> I'll add extra tree-checker checks, so that such runtime problems can be
> detected before corrupted data reaches disk.
> 
> 
> For the repair, I'll craft a special btrfs-progs build for you to handle it,
> as that should be the safest way.
> Please wait another 15 minutes for that tool.

Here is the special branch for you:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_mike

After compiling, you can use btrfs-corrupt-block (I know it's a bad name)
to repair your fs (it must be unmounted):

# ./btrfs-corrupt-block -X /dev/sda3

If anything goes wrong, your fs should be left untouched.
If the repair succeeds, there should be no output.

Thanks,
Qu

> 
> Thanks,
> Qu
> 
> 
>>        item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>>                location key (4065004 INODE_ITEM 0) type FILE
>>                transid 21397 data_len 0 name_len 44
>>                name: 0390cb341d248c589c419007da68b2-7351.manifest
>> parent transid verify failed on 629293056 wanted 177041 found 177044
>> parent transid verify failed on 629293056 wanted 177041 found 177044
>> Ignoring transid failure
>>
> 


* Re: Unable to remove directory entry
  2019-12-09  1:52     ` Qu Wenruo
@ 2019-12-09  2:23       ` Zygo Blaxell
  0 siblings, 0 replies; 15+ messages in thread
From: Zygo Blaxell @ 2019-12-09  2:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Mike Gilbert, linux-btrfs

On Mon, Dec 09, 2019 at 09:52:54AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/12/9 9:33 AM, Zygo Blaxell wrote:
> > On Sun, Dec 08, 2019 at 07:17:21PM -0500, Zygo Blaxell wrote:
> >> On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote:
> >>> Hello,
> >>>
> >>> I have a directory entry that cannot be stat-ed or unlinked. This
> >>> issue persists across reboots, so it seems there is something wrong on
> >>> disk.
> >>>
> >>> % ls -l /var/cache/ccache.bad/2/c
> >>> ls: cannot access
> >>> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
> >>> No such
> >>> file or directory
> >>> total 0
> >>> -????????? ? ? ? ?            ? 0390cb341d248c589c419007da68b2-7351.manifest
> >>
> >> I have seen a bug similar to this some years ago.  It was present as
> >> far back as 4.5, and seems to still be present in 5.0.21.  I don't have
> >> detailed tracking information on it due to the low severity: not a crash
> >> or data corruption bug, and workarounds exist both to prevent the bug
> >> and to clean up its aftermath.
> >>
> >> The reproducer is something like:
> >>
> >> 	while (true) { // pseudocode
> >> 		int fd = create(tmp_name);
> >> 		write(fd, ...);
> >> 		fsync(fd);	// required, bug does not appear without this fsync
> >> 		close(fd);
> >> 		rename(tmp_name, regular_name);
> >> 	}
> >>
> >> and a crash, maybe with some heavy write load.  This is typical of
> >> applications like git and ccache, and in the wild, broken directory
> >> entries are often found in these applications' directories.
> >>
> >> Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
> >> in behavior:  before, the broken directory entry could not be removed,
> >> renamed, or used for a new file; the only way to get rid of the broken
> >> directory entry was to delete the entire subvol.  After the behavior
> >> change, the broken directory entry could be removed by creating a new
> >> file and renaming it to the broken directory entry name.
> > 
> > I found a filesystem that currently has one of these broken dirents:
> > 
> > 	root@tester24:/media/testfs/beeshome# ls -l
> > 	ls: cannot access 'beesstats.txt.tmp': No such file or directory
> > 	total 3446032
> > 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> > 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> > 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> > 	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
> > 	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
> > 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> > 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> > 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> > 	-rw-r--r-- 1 root root    3221101 Dec  8 20:18 df-2019-12-07.txt
> > 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> > 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> > 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> > 	-rw-r--r-- 1 root root   42378425 Dec  8 20:19 log-2019-12-07.txt
> > 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> > 
> > It seems I can create a file with the same name, and then I get two:
> > 
> > 	root@tester24:/media/testfs/beeshome# date > beesstats.txt.tmp
> > 	root@tester24:/media/testfs/beeshome# ls -l
> > 	total 3446044
> > 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> > 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> > 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> > 	-rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
> > 	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
> > 	-rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
> > 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> > 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> > 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> > 	-rw-r--r-- 1 root root    3221363 Dec  8 20:19 df-2019-12-07.txt
> > 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> > 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> > 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> > 	-rw-r--r-- 1 root root   42384027 Dec  8 20:19 log-2019-12-07.txt
> > 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> > 	root@tester24:/media/testfs/beeshome# cat beesstats.txt.tmp 
> > 	Sun Dec  8 20:19:38 EST 2019
> > 
> > dump-tree shows both DIR_INDEX items but only one DIR_ITEM:
> > 
> >         item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47
> >                 location key (133693 INODE_ITEM 0) type FILE
> >                 transid 5002644 data_len 0 name_len 17
> >                 name: beesstats.txt.tmp
> >         item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
> >                 location key (11481 INODE_ITEM 0) type FILE
> >                 transid 1876891 data_len 0 name_len 17
> >                 name: beesstats.txt.tmp
> >         item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
> >                 location key (133693 INODE_ITEM 0) type FILE
> >                 transid 5002644 data_len 0 name_len 17
> >                 name: beesstats.txt.tmp
> > 
> > but I can only delete DIR_ITEMs:
> > 
> > 	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
> > 	root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp 
> > 	rm: cannot remove 'beesstats.txt.tmp': No such file or directory
> > 	root@tester24:/media/testfs/beeshome# ls -l
> > 	ls: cannot access 'beesstats.txt.tmp': No such file or directory
> > 	total 3446048
> > 	-rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
> > 	-rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
> > 	-rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
> > 	-rwx------ 1 root root 1073741824 Dec  8 20:20 beeshash.dat
> > 	-????????? ? ?    ?             ?            ? beesstats.txt.tmp
> > 	-rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
> > 	-rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
> > 	-rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
> > 	-rw-r--r-- 1 root root    3221494 Dec  8 20:19 df-2019-12-07.txt
> > 	-rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
> > 	-rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
> > 	-rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
> > 	-rw-r--r-- 1 root root   42396102 Dec  8 20:20 log-2019-12-07.txt
> > 	lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
> > 
> > leaving the first DIR_INDEX behind:
> > 
> >         item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47
> >                 location key (11481 INODE_ITEM 0) type FILE
> >                 transid 1876891 data_len 0 name_len 17
> >                 name: beesstats.txt.tmp
> 
> This looks like an older kernel bug (hopefully).
> 
> So an orphan DIR_INDEX was left behind and never cleaned up properly.
> 
> In that case, btrfs-progs should be able to repair it.
> But strangely, why didn't the original-mode check report it?

I'm not sure what you mean by "original mode check".

I can't run btrfs check on this filesystem (97GB of metadata, too
big for either regular or lowmem to handle in reasonable time).

There used to be a stat check on the missing inode, which would fail
in older kernels, and make the filename permanently unusable (until the
subvol was deleted).  That broke a lot of applications.  The stat check
was removed at some point, which is much better.

> BTW, does that 11481 inode still exist?

Nope, the only '11481' in the entire subvol's dump-tree output is that
DIR_INDEX item.

> Thanks,
> Qu
> > 
> > So the btrfs read side is fine, it's the writing side that is putting bad
> > metadata on the disk.
> > 
> >> Another workaround is to remove the fsync by running the application
> >> under eatmydata.  btrfs performs a flush in the rename() operation when
> >> an existing file is replaced, so the fsync that triggers the bug was
> >> not necessary in the first place.  Note this only works when replacing
> >> an existing file, so the flushoncommit mount option is required to make
> >> this work in other cases.
> >>
> >>> % uname -a
> >>> Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD
> >>> Phenom(tm) II X6 1055T Processor
> >>> AuthenticAMD GNU/Linux
> >>>
> >>> % btrfs --version
> >>> btrfs-progs v5.4
> >>>
> >>> I have tried running btrfs check, and I get differing results based on
> >>> the --mode switch:
> >>>
> >>> # btrfs check --readonly /dev/sda3
> >>> [1/7] checking root items
> >>> [2/7] checking extents
> >>> [3/7] checking free space cache
> >>> [4/7] checking fs roots
> >>> [5/7] checking only csums items (without verifying data)
> >>> [6/7] checking root refs
> >>> [7/7] checking quota groups
> >>> Opening filesystem to check...
> >>> Checking filesystem on /dev/sda3
> >>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> >>> found 284337733632 bytes used, no error found
> >>> total csum bytes: 267182280
> >>> total tree bytes: 4498915328
> >>> total fs tree bytes: 3972464640
> >>> total extent tree bytes: 199819264
> >>> btree space waste bytes: 776711635
> >>> file data blocks allocated: 313928671232
> >>>  referenced 279141621760
> >>>
> >>> # btrfs check --readonly --mode=lowmem /dev/sda3
> >>> [1/7] checking root items
> >>> [2/7] checking extents
> >>> [3/7] checking free space cache
> >>> [4/7] checking fs roots
> >>> ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
> >>> ERROR: root 5 DIR ITEM[486836 13905] name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
> >>> ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1
> >>> ERROR: errors found in fs roots
> >>> Opening filesystem to check...
> >>> Checking filesystem on /dev/sda3
> >>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> >>> found 284337733632 bytes used, error(s) found
> >>> total csum bytes: 267182280
> >>> total tree bytes: 4498915328
> >>> total fs tree bytes: 3972464640
> >>> total extent tree bytes: 199819264
> >>> btree space waste bytes: 776711635
> >>> file data blocks allocated: 313928671232
> >>>  referenced 279141621760
> >>>
> >>> Please advise on possible next steps to diagnose and fix this.
> > 
> > 
> 




* Re: Unable to remove directory entry
  2019-12-09  2:20               ` Qu Wenruo
@ 2019-12-09  2:37                 ` Mike Gilbert
  2019-12-09  2:43                   ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Gilbert @ 2019-12-09  2:37 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Dec 8, 2019 at 9:20 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/12/9 10:05 AM, Qu Wenruo wrote:
> >
> >
> > On 2019/12/9 9:51 AM, Mike Gilbert wrote:
> > [...]
> >>
> >> Here you go.
> >>
> >> I ran this while the filesystem was mounted; if you need it to be run
> >> while offline, I'll have to fire up a livecd.
> > The info is good enough; no need for a livecd.
> >
> >> --
> >>        item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
> >>                generation 21397 transid 21397 size 12261 nbytes 12288
> >>                block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
> >>                sequence 23 flags 0x0(none)
> >>                atime 1565546668.383680243 (2019-08-11 14:04:28)
> >>                ctime 1565546668.383680243 (2019-08-11 14:04:28)
> >>                mtime 1565546668.383680243 (2019-08-11 14:04:28)
> >>                otime 1565546668.336681213 (2019-08-11 14:04:28)
> >>        item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
> >>                index 13905 namelen 44 name:
> >> 0390cb341d248c589c419007da68b2-7351.manifest
> >
> > That inode exists and is good.
> >
> >>        item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
> >>                generation 21397 type 1 (regular)
> >>                extent data disk byte 6288928768 nr 12288
> >>                extent data offset 0 nr 12288 ram 12288
> >>                extent compression 0 (none)
> >>        item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
> >>        item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
> >>                location key (4065004 INODE_ITEM 0) type FILE
> >>                transid 21397 data_len 0 name_len 44
> >>                name: 0390cb341d248c589c419007da68b2-7351.manifest
> >
> > Good parent dir index.
> >
> >> leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
> >> leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
> >> fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> >> chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
> >>        item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
> >>                location key (4065004 INODE_ITEM 1073741824) type FILE
> >>                transid 21397 data_len 0 name_len 44
> >>                name: 0390cb341d248c589c419007da68b2-7351.manifest
> >
> > This is the problem: the parent directory's DIR_ITEM (the name-hash entry)
> > is bad.
> >
> > Its location key should be (4065004 INODE_ITEM 0). The offset 1073741824
> > (0x40000000) is completely garbage.
> >
> > That garbage looks like a bit flip at runtime: 0x40000000 has exactly one
> > bit set, where the correct value is 0.
> > I'd recommend checking your memory.
> >
> > I'll add extra tree-checker checks so that such runtime problems can be
> > detected before corrupted data reaches disk.
> >
> >
> > For the repair, I'll craft a special btrfs-progs for you to handle it, as
> > that should be the safest way.
> > Please wait another 15 minutes for that tool.
>
> Here is the special branch for you:
> https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_mike
>
> After compiling it, you can use btrfs-corrupt-block (I know it's a bad name)
> to repair your fs (it must be unmounted):
>
> # ./btrfs-corrupt-block -X /dev/sda3
>
> If anything goes wrong, your fs should be left untouched.
> If repaired successfully, there should be no output.
>
> Thanks,
> Qu

That worked. Thank you very much for your help with this!

Now, I guess I'll fire up Memtest86 overnight.
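You can check Qu's bit-flip diagnosis with a little arithmetic. The DIR_ITEM's location key offset should be 0, but it is stored as 1073741824. XOR-ing the two leaves a single set bit. The sketch below reports which bit positions differ (the helper `flipped_bits` is made up for illustration). A single differing bit is consistent with a memory error but does not prove one, so a Memtest86 run is still worthwhile.

```python
# Values taken from the dump-tree output quoted above.
EXPECTED_OFFSET = 0            # offset of a normal (ino INODE_ITEM 0) location key
OBSERVED_OFFSET = 1073741824   # offset found in the bad DIR_ITEM location key


def flipped_bits(expected: int, observed: int) -> list:
    """Return the positions of the bits that differ between two integers."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if diff >> bit & 1]


bits = flipped_bits(EXPECTED_OFFSET, OBSERVED_OFFSET)
print(f"{OBSERVED_OFFSET:#x} differs from {EXPECTED_OFFSET:#x} in bit(s) {bits}")
# prints: 0x40000000 differs from 0x0 in bit(s) [30]
```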


* Re: Unable to remove directory entry
  2019-12-09  2:37                 ` Mike Gilbert
@ 2019-12-09  2:43                   ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09  2:43 UTC
  To: Mike Gilbert; +Cc: linux-btrfs





On 2019/12/9 10:37 AM, Mike Gilbert wrote:
> [...]
> That worked. Thank you very much for your help with this!
> 
> Now, I guess I'll fire up Memtest86 overnight.
> 
Just a reminder: if the tree-checker is properly enhanced, then in 5.6, even
with bad memory, we should be able to detect this and prevent it in advance.

Thanks,
Qu
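Below is a minimal sketch of the kind of sanity check Qu describes. It is written in Python rather than kernel C and is not taken from the actual tree-checker code. It checks two things:

- A dir entry whose location key has type INODE_ITEM should have offset 0.
- The DIR_ITEM and DIR_INDEX entries for one name should point at the same location key.

`Key`, `check_location_key` and `check_dir_pair` are hypothetical names. The only key type number the sketch relies on is INODE_ITEM = 1. The real tree-checker validates one leaf at a time, so the cross-entry comparison is roughly what `btrfs check --mode=lowmem` reported, not a write-time leaf check.

```python
from typing import List, NamedTuple

# Key type number for INODE_ITEM in the btrfs on-disk format.
BTRFS_INODE_ITEM_KEY = 1


class Key(NamedTuple):
    objectid: int
    type: int
    offset: int


def check_location_key(location: Key) -> List[str]:
    """Return problems found in a dir entry's location key.

    Hypothetical sketch: only INODE_ITEM locations are validated here.
    """
    problems = []
    if location.type == BTRFS_INODE_ITEM_KEY and location.offset != 0:
        problems.append(
            f"location key ({location.objectid} INODE_ITEM {location.offset}) "
            "has non-zero offset"
        )
    return problems


def check_dir_pair(dir_item_loc: Key, dir_index_loc: Key) -> List[str]:
    """Check both entries for one name and require their location keys to agree."""
    problems = check_location_key(dir_item_loc) + check_location_key(dir_index_loc)
    if dir_item_loc != dir_index_loc:
        problems.append("DIR_ITEM and DIR_INDEX point at different location keys")
    return problems


# The two location keys from the dump-tree output in this thread.
dir_index_loc = Key(4065004, BTRFS_INODE_ITEM_KEY, 0)
dir_item_loc = Key(4065004, BTRFS_INODE_ITEM_KEY, 1073741824)

for problem in check_dir_pair(dir_item_loc, dir_index_loc):
    print(problem)
```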





Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-08 19:19 Unable to remove directory entry Mike Gilbert
2019-12-09  0:11 ` Qu Wenruo
2019-12-09  0:30   ` Mike Gilbert
2019-12-09  0:41     ` Qu Wenruo
2019-12-09  1:31       ` Mike Gilbert
2019-12-09  1:45         ` Qu Wenruo
2019-12-09  1:51           ` Mike Gilbert
2019-12-09  2:05             ` Qu Wenruo
2019-12-09  2:20               ` Qu Wenruo
2019-12-09  2:37                 ` Mike Gilbert
2019-12-09  2:43                   ` Qu Wenruo
2019-12-09  0:17 ` Zygo Blaxell
2019-12-09  1:33   ` Zygo Blaxell
2019-12-09  1:52     ` Qu Wenruo
2019-12-09  2:23       ` Zygo Blaxell
