* Unable to remove directory entry
@ 2019-12-08 19:19 Mike Gilbert
  2019-12-09  0:11 ` Qu Wenruo
  2019-12-09  0:17 ` Zygo Blaxell
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Gilbert @ 2019-12-08 19:19 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have a directory entry that cannot be stat-ed or unlinked. This
issue persists across reboots, so it seems there is something wrong on
disk.

% ls -l /var/cache/ccache.bad/2/c
ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
total 0
-????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest

% uname -a
Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux

% btrfs --version
btrfs-progs v5.4

I have tried running btrfs check, and I get differing results based on
the --mode switch:

# btrfs check --readonly /dev/sda3
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
Opening filesystem to check...
Checking filesystem on /dev/sda3
UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
found 284337733632 bytes used, no error found
total csum bytes: 267182280
total tree bytes: 4498915328
total fs tree bytes: 3972464640
total extent tree bytes: 199819264
btree space waste bytes: 776711635
file data blocks allocated: 313928671232
 referenced 279141621760

# btrfs check --readonly --mode=lowmem /dev/sda3
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
ERROR: root 5 DIR ITEM[486836 13905] name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name 0390cb341d248c589c419007da68b2-7351.manifest filetype 1
ERROR: errors found in fs roots
Opening filesystem to check...
Checking filesystem on /dev/sda3
UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
found 284337733632 bytes used, error(s) found
total csum bytes: 267182280
total tree bytes: 4498915328
total fs tree bytes: 3972464640
total extent tree bytes: 199819264
btree space waste bytes: 776711635
file data blocks allocated: 313928671232
 referenced 279141621760

Please advise on possible next steps to diagnose and fix this.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Unable to remove directory entry
From: Qu Wenruo @ 2019-12-09 0:11 UTC
To: Mike Gilbert, linux-btrfs

On 2019/12/9 3:19 AM, Mike Gilbert wrote:
> Hello,
>
> I have a directory entry that cannot be stat-ed or unlinked. This
> issue persists across reboots, so it seems there is something wrong on
> disk.
>
> % ls -l /var/cache/ccache.bad/2/c
> ls: cannot access
> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
> No such file or directory
> total 0
> -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest

Dmesg output, if any, please.

> % uname -a
> Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD
> Phenom(tm) II X6 1055T Processor
> AuthenticAMD GNU/Linux

The kernel is not new enough by btrfs standards.

For this possible name hash mismatch bug, a newer kernel will report
more detailed problems.

> % btrfs --version
> btrfs-progs v5.4
>
> I have tried running btrfs check, and I get differing results based on
> the --mode switch:
>
> # btrfs check --readonly /dev/sda3
> [...]
> found 284337733632 bytes used, no error found
> [...]
>
> # btrfs check --readonly --mode=lowmem /dev/sda3
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name
> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
> ERROR: root 5 DIR ITEM[486836 13905] name
> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
> ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name
> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1

This means the name hash for the filename
"0390cb341d248c589c419007da68b2-7351.manifest" is incorrect.
Thus the kernel can't locate that inode correctly.

Furthermore, the index for inode 4065004 doesn't make much sense.
The number looks absolutely insane.

If your fs is small enough, you can try taking a binary dump first, then
try btrfs check --mode=lowmem --repair, as we have that repair ability
in v5.4.

If your fs is too large for that, I guess you can only pray that nothing
bad happens...

Thanks,
Qu

> ERROR: errors found in fs roots
> [...]
>
> Please advise on possible next steps to diagnose and fix this.
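A note on the "insane" index value above: 18446744073709551615 is the all-ones 64-bit pattern, which is what -1 looks like when printed as an unsigned 64-bit integer. The short Python sketch below only demonstrates that arithmetic. Whether btrfs check in lowmem mode uses this value as a "not found" placeholder is an assumption here, not something stated in the thread.

```python
# Index value printed by `btrfs check --mode=lowmem` for inode 4065004.
reported_index = 18446744073709551615

# Largest value that fits in an unsigned 64-bit integer.
U64_MAX = 2**64 - 1


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


print(reported_index == U64_MAX)     # True
print(hex(reported_index))           # 0xffffffffffffffff
print(as_signed_64(reported_index))  # -1
```

A real directory index this large is implausible for a directory whose other entries sit around index 13905, which fits Qu's reading that the value does not describe an actual on-disk index.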
* Re: Unable to remove directory entry
From: Mike Gilbert @ 2019-12-09 0:30 UTC
To: Qu Wenruo; Cc: linux-btrfs

On Sun, Dec 8, 2019 at 7:11 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2019/12/9 3:19 AM, Mike Gilbert wrote:
> > [...]
> > % ls -l /var/cache/ccache.bad/2/c
> > ls: cannot access
> > '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
> > No such file or directory
> > total 0
> > -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest
>
> Dmesg output, if any, please.

There's nothing btrfs-related in the dmesg output.

> > % uname -a
> > Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD
> > Phenom(tm) II X6 1055T Processor
> > AuthenticAMD GNU/Linux
>
> The kernel is not new enough by btrfs standards.
>
> For this possible name hash mismatch bug, a newer kernel will report
> more detailed problems.

Would 4.19.88 suffice, or do I need to switch to a newer release branch?
* Re: Unable to remove directory entry
From: Qu Wenruo @ 2019-12-09 0:41 UTC
To: Mike Gilbert; Cc: linux-btrfs

On 2019/12/9 8:30 AM, Mike Gilbert wrote:
> [...]
>> The kernel is not new enough by btrfs standards.
>>
>> For this possible name hash mismatch bug, a newer kernel will report
>> more detailed problems.
>
> Would 4.19.88 suffice, or do I need to switch to a newer release branch?

I'd recommend going to at least a recent stable series (v5.3.x or newer).

4.19.88 is just backports on top of 4.19, nothing really different. And
sometimes big fixes won't get backported.

Thanks,
Qu
* Re: Unable to remove directory entry
From: Mike Gilbert @ 2019-12-09 1:31 UTC
To: Qu Wenruo; Cc: linux-btrfs

On Sun, Dec 8, 2019 at 7:41 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> > Would 4.19.88 suffice, or do I need to switch to a newer release branch?
>
> I'd recommend going to at least a recent stable series (v5.3.x or newer).
>
> 4.19.88 is just backports on top of 4.19, nothing really different. And
> sometimes big fixes won't get backported.

I upgraded to linux-5.4.2, and attempted to remove the file, with the
same results.

ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
total 0
-????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest

rm: cannot remove '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory

I don't see any output in dmesg. Is there some option I need to enable?
* Re: Unable to remove directory entry
From: Qu Wenruo @ 2019-12-09 1:45 UTC
To: Mike Gilbert; Cc: linux-btrfs

On 2019/12/9 9:31 AM, Mike Gilbert wrote:
> [...]
> I upgraded to linux-5.4.2, and attempted to remove the file, with the
> same results.
>
> ls: cannot access
> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
> No such file or directory
> total 0
> -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest
>
> rm: cannot remove
> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest':
> No such file or directory
>
> I don't see any output in dmesg. Is there some option I need to enable?

Then it's not a name hash mismatch, but just an index mismatch.

In that case, the kernel's tree-checker won't detect the problem. I'll
update tree-checker to handle this case.

I guess the only way to fix it is to rely on btrfs check --mode=lowmem
--repair.
But before that, would you please provide the following dumps? That way
I can be sure before crafting the enhanced tree-checker patch.

# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(4065004 INO" -A7
# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*13905)" -A7
# btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*2543451757)" -A7

Thanks,
Qu
* Re: Unable to remove directory entry
From: Mike Gilbert @ 2019-12-09 1:51 UTC
To: Qu Wenruo; Cc: linux-btrfs

On Sun, Dec 8, 2019 at 8:45 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Then it's not a name hash mismatch, but just an index mismatch.
>
> In that case, the kernel's tree-checker won't detect the problem. I'll
> update tree-checker to handle this case.
>
> I guess the only way to fix it is to rely on btrfs check --mode=lowmem
> --repair.
> But before that, would you please provide the following dumps? That way
> I can be sure before crafting the enhanced tree-checker patch.
>
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(4065004 INO" -A7
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*13905)" -A7
> # btrfs ins dump-tree -t 5 /dev/sda3 | grep "(486836.*2543451757)" -A7

Here you go.

I ran this while the filesystem was mounted; if you need it to be run
while offline, I'll have to fire up a livecd.

        location key (4065004 INODE_ITEM 1073741824) type FILE
        transid 21397 data_len 0 name_len 44
        name: 0390cb341d248c589c419007da68b2-7351.manifest
    item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
        location key (4065004 INODE_ITEM 0) type FILE
        transid 21397 data_len 0 name_len 44
        name: 0390cb341d248c589c419007da68b2-7351.manifest
leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
    item 0 key (1059762 INODE_ITEM 0) itemoff 16123 itemsize 160
--
    item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
        generation 21397 transid 21397 size 12261 nbytes 12288
        block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
        sequence 23 flags 0x0(none)
        atime 1565546668.383680243 (2019-08-11 14:04:28)
        ctime 1565546668.383680243 (2019-08-11 14:04:28)
        mtime 1565546668.383680243 (2019-08-11 14:04:28)
        otime 1565546668.336681213 (2019-08-11 14:04:28)
    item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
        index 13905 namelen 44 name: 0390cb341d248c589c419007da68b2-7351.manifest
    item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
        generation 21397 type 1 (regular)
        extent data disk byte 6288928768 nr 12288
        extent data offset 0 nr 12288 ram 12288
        extent compression 0 (none)
    item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
    item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
        location key (4065004 INODE_ITEM 0) type FILE
        transid 21397 data_len 0 name_len 44
        name: 0390cb341d248c589c419007da68b2-7351.manifest
leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
    item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
        location key (4065004 INODE_ITEM 1073741824) type FILE
        transid 21397 data_len 0 name_len 44
        name: 0390cb341d248c589c419007da68b2-7351.manifest
    item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
        location key (4065004 INODE_ITEM 0) type FILE
        transid 21397 data_len 0 name_len 44
        name: 0390cb341d248c589c419007da68b2-7351.manifest
parent transid verify failed on 629293056 wanted 177041 found 177044
parent transid verify failed on 629293056 wanted 177041 found 177044
Ignoring transid failure
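The DIR_ITEM and DIR_INDEX entries in the dump above both name the same file, so one way to read the output is to pair them up by name and compare their location keys. The Python sketch below does that over a trimmed copy of the dump text. It assumes that both entries for one name are meant to carry the same location key, and its regular expressions are written against the text layout shown in this thread, not against any documented dump-tree format. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Trimmed copy of the two directory entries from the dump in this thread.
DUMP = """\
item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
    location key (4065004 INODE_ITEM 1073741824) type FILE
    transid 21397 data_len 0 name_len 44
    name: 0390cb341d248c589c419007da68b2-7351.manifest
item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
    location key (4065004 INODE_ITEM 0) type FILE
    transid 21397 data_len 0 name_len 44
    name: 0390cb341d248c589c419007da68b2-7351.manifest
"""

ITEM_RE = re.compile(r"item \d+ key \((\d+) (DIR_ITEM|DIR_INDEX) (\d+)\)")
LOC_RE = re.compile(r"location key \((\d+) (\w+) (\d+)\)")
NAME_RE = re.compile(r"name: (\S+)")


def parse_dir_entries(text):
    """Collect one dict per DIR_ITEM/DIR_INDEX entry found in the dump text."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ITEM_RE.match(line)
        if match:
            current = {"dir": int(match.group(1)), "type": match.group(2)}
            entries.append(current)
            continue
        if line.startswith("item "):
            current = None  # some other item type; ignore its detail lines
            continue
        if current is None:
            continue
        match = LOC_RE.match(line)
        if match:
            current["location"] = (
                int(match.group(1)), match.group(2), int(match.group(3)))
            continue
        match = NAME_RE.match(line)
        if match:
            current["name"] = match.group(1)
    return entries


def mismatched_names(entries):
    """Map (dir inode, name) to its location keys when they disagree."""
    by_name = defaultdict(set)
    for entry in entries:
        if "name" in entry and "location" in entry:
            by_name[(entry["dir"], entry["name"])].add(entry["location"])
    return {key: locs for key, locs in by_name.items() if len(locs) > 1}


for (dir_ino, name), locations in mismatched_names(parse_dir_entries(DUMP)).items():
    print(dir_ino, name, sorted(locations))
```

On this input the two entries for the manifest file disagree only in the third field of the location key: 0 in the DIR_INDEX entry and 1073741824 in the DIR_ITEM entry.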
* Re: Unable to remove directory entry
From: Qu Wenruo @ 2019-12-09 2:05 UTC
To: Mike Gilbert; Cc: linux-btrfs

On 2019/12/9 9:51 AM, Mike Gilbert wrote:
[...]
>
> Here you go.
>
> I ran this while the filesystem was mounted; if you need it to be run
> while offline, I'll have to fire up a livecd.

The info is good enough, no need for a livecd.

> --
>     item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
>         generation 21397 transid 21397 size 12261 nbytes 12288
>         block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
>         sequence 23 flags 0x0(none)
>         atime 1565546668.383680243 (2019-08-11 14:04:28)
>         ctime 1565546668.383680243 (2019-08-11 14:04:28)
>         mtime 1565546668.383680243 (2019-08-11 14:04:28)
>         otime 1565546668.336681213 (2019-08-11 14:04:28)
>     item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
>         index 13905 namelen 44 name:
> 0390cb341d248c589c419007da68b2-7351.manifest

That inode exists and is good.

>     item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
>         generation 21397 type 1 (regular)
>         extent data disk byte 6288928768 nr 12288
>         extent data offset 0 nr 12288 ram 12288
>         extent compression 0 (none)
>     item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
>     item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>         location key (4065004 INODE_ITEM 0) type FILE
>         transid 21397 data_len 0 name_len 44
>         name: 0390cb341d248c589c419007da68b2-7351.manifest

Good parent dir index.

> leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
> leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
> fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
>     item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
>         location key (4065004 INODE_ITEM 1073741824) type FILE
>         transid 21397 data_len 0 name_len 44
>         name: 0390cb341d248c589c419007da68b2-7351.manifest

This is the problem: a bad parent dir hash entry (DIR_ITEM).

The location key should be (4065004 INODE_ITEM 0). The 1073741824
(0x40000000) is complete garbage.

That garbage looks like a bit flip at runtime.
I recommend checking your memory.

I'll add extra tree-checker checks, so that this kind of runtime problem
can be detected before corrupted data reaches disk.


For the repair, I'll craft a special btrfs-progs build for you to handle
it, as that should be the safest way.
Please wait another 15 minutes for that tool.

Thanks,
Qu

>     item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>         location key (4065004 INODE_ITEM 0) type FILE
>         transid 21397 data_len 0 name_len 44
>         name: 0390cb341d248c589c419007da68b2-7351.manifest
> parent transid verify failed on 629293056 wanted 177041 found 177044
> parent transid verify failed on 629293056 wanted 177041 found 177044
> Ignoring transid failure
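The bit-flip reading of the bad offset can be checked with a little arithmetic: XOR the expected value with the value found and count the differing bits. The Python sketch below is only an illustration of that check. The helper name `flipped_bits` is hypothetical and is not part of btrfs-progs.

```python
def flipped_bits(expected, found):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = expected ^ found
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


expected_offset = 0        # offset in the expected key (4065004 INODE_ITEM 0)
found_offset = 1073741824  # offset found in the DIR_ITEM location key

bits = flipped_bits(expected_offset, found_offset)
print(hex(found_offset))  # 0x40000000
print(bits)               # [30]
```

Exactly one bit differs (bit 30), which is consistent with a single-bit memory error and is why a memory test is the suggested follow-up.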
* Re: Unable to remove directory entry
From: Qu Wenruo @ 2019-12-09 2:20 UTC
To: Mike Gilbert; Cc: linux-btrfs

On 2019/12/9 10:05 AM, Qu Wenruo wrote:
> [...]
> This is the problem: a bad parent dir hash entry (DIR_ITEM).
>
> The location key should be (4065004 INODE_ITEM 0). The 1073741824
> (0x40000000) is complete garbage.
>
> That garbage looks like a bit flip at runtime.
> I recommend checking your memory.
>
> I'll add extra tree-checker checks, so that this kind of runtime problem
> can be detected before corrupted data reaches disk.
>
>
> For the repair, I'll craft a special btrfs-progs build for you to handle
> it, as that should be the safest way.
> Please wait another 15 minutes for that tool.

Here is the special branch for you:
https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_mike

After compiling it, you can use btrfs-corrupt-block (I know it's a bad
name) to repair your fs (it must be unmounted):

# ./btrfs-corrupt-block -X /dev/sda3

If anything goes wrong, your fs should be left untouched.
If the repair succeeds, there should be no output.

Thanks,
Qu
* Re: Unable to remove directory entry
From: Mike Gilbert @ 2019-12-09 2:37 UTC
To: Qu Wenruo; Cc: linux-btrfs

On Sun, Dec 8, 2019 at 9:20 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Here is the special branch for you:
> https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_mike
>
> After compiling it, you can use btrfs-corrupt-block (I know it's a bad
> name) to repair your fs (it must be unmounted):
>
> # ./btrfs-corrupt-block -X /dev/sda3
>
> If anything goes wrong, your fs should be left untouched.
> If the repair succeeds, there should be no output.
>
> Thanks,
> Qu

That worked. Thank you very much for your help with this!

Now, I guess I'll fire up Memtest86 overnight.
* Re: Unable to remove directory entry
  2019-12-09  2:37 ` Mike Gilbert
@ 2019-12-09  2:43 ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09 2:43 UTC
To: Mike Gilbert; +Cc: linux-btrfs

On 2019/12/9 10:37 AM, Mike Gilbert wrote:
> On Sun, Dec 8, 2019 at 9:20 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/12/9 10:05 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2019/12/9 9:51 AM, Mike Gilbert wrote:
>>> [...]
>>>>
>>>> Here you go.
>>>>
>>>> I ran this while the filesystem was mounted; if you need it to be run
>>>> while offline, I'll have to fire up a livecd.
>>> The info is good enough, no need to go livecd.
>>>
>>>> --
>>>>         item 6 key (4065004 INODE_ITEM 0) itemoff 15158 itemsize 160
>>>>                 generation 21397 transid 21397 size 12261 nbytes 12288
>>>>                 block group 0 mode 100644 links 1 uid 250 gid 250 rdev 0
>>>>                 sequence 23 flags 0x0(none)
>>>>                 atime 1565546668.383680243 (2019-08-11 14:04:28)
>>>>                 ctime 1565546668.383680243 (2019-08-11 14:04:28)
>>>>                 mtime 1565546668.383680243 (2019-08-11 14:04:28)
>>>>                 otime 1565546668.336681213 (2019-08-11 14:04:28)
>>>>         item 7 key (4065004 INODE_REF 486836) itemoff 15104 itemsize 54
>>>>                 index 13905 namelen 44 name: 0390cb341d248c589c419007da68b2-7351.manifest
>>>
>>> That inode exists and is good.
>>>
>>>>         item 8 key (4065004 EXTENT_DATA 0) itemoff 15051 itemsize 53
>>>>                 generation 21397 type 1 (regular)
>>>>                 extent data disk byte 6288928768 nr 12288
>>>>                 extent data offset 0 nr 12288 ram 12288
>>>>                 extent compression 0 (none)
>>>>         item 9 key (4210974 INODE_ITEM 0) itemoff 14891 itemsize 160
>>>>         item 63 key (486836 DIR_INDEX 13905) itemoff 6199 itemsize 74
>>>>                 location key (4065004 INODE_ITEM 0) type FILE
>>>>                 transid 21397 data_len 0 name_len 44
>>>>                 name: 0390cb341d248c589c419007da68b2-7351.manifest
>>>
>>> Good parent dir index.
>>>
>>>> leaf 533498265600 items 128 free space 6682 generation 176439 owner FS_TREE
>>>> leaf 533498265600 flags 0x1(WRITTEN) backref revision 1
>>>> fs uuid 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
>>>> chunk uuid 0be705de-5d3b-4c23-979e-d7aaad224cfb
>>>>         item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74
>>>>                 location key (4065004 INODE_ITEM 1073741824) type FILE
>>>>                 transid 21397 data_len 0 name_len 44
>>>>                 name: 0390cb341d248c589c419007da68b2-7351.manifest
>>>
>>> This is the problem, bad parent dir hash.
>>>
>>> The key should be (4065004 INODE_ITEM 0). The 1073741824 (0x40000000) is
>>> completely garbage.
>>>
>>> That garbage looks like a bit flip at runtime.
>>> It's recommended to check your memory.
>>>
>>> I'll add extra tree-check checks, so that such runtime problem can be
>>> detected before corrupted data reach disk.
>>>
>>>
>>> For repair, I'll craft a special btrfs-progs for you to handle it, as
>>> that should be the safest way.
>>> Please wait for another 15min for that tool.
>>
>> Here is the special branch for you:
>> https://github.com/adam900710/btrfs-progs/tree/dirty_fix_for_mike
>>
>> After compile, you can use btrfs-corrupt-block (I know it's a bad name)
>> to repair your fs (must be unmounted):
>>
>> # ./btrfs-corrupt-block -X /dev/sda3
>>
>> If anything wrong happened, your fs should be kept untouched.
>> If repaired successfully, there should be no output.
>>
>> Thanks,
>> Qu
>
> That worked. Thank you very much for your help with this!
>
> Now, I guess I'll fire up Memtest86 overnight.
>

Just a reminder: if the tree-checker is properly enhanced for 5.6, we
should be able to detect this and prevent it in advance, even with bad
memory.

Thanks,
Qu
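To illustrate the kind of sanity check described in this message, here is a small user-space sketch. It is not the kernel's tree-checker code; it only encodes the rule stated in the diagnosis, namely that a directory entry's location key of type INODE_ITEM should have offset 0. The `Key` tuple and the function name `check_dir_item_location` are assumptions made for this example.

```python
from typing import NamedTuple


class Key(NamedTuple):
    objectid: int
    type: str
    offset: int


def check_dir_item_location(location):
    """Return a list of problems found in a directory entry's location key."""
    problems = []
    # Rule taken from the diagnosis in this thread: INODE_ITEM keys use offset 0.
    if location.type == "INODE_ITEM" and location.offset != 0:
        problems.append(
            "location key (%d %s %d) has non-zero offset"
            % (location.objectid, location.type, location.offset)
        )
    return problems


good = Key(4065004, "INODE_ITEM", 0)           # location key of the DIR_INDEX item
bad = Key(4065004, "INODE_ITEM", 1073741824)   # location key of the DIR_ITEM item

print(check_dir_item_location(good))
print(check_dir_item_location(bad))
```

A check like this is cheap because it needs only the item itself, which is why it can run before a block is written rather than during a full filesystem check.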
* Re: Unable to remove directory entry
  2019-12-08 19:19 Unable to remove directory entry Mike Gilbert
  2019-12-09  0:11 ` Qu Wenruo
@ 2019-12-09  0:17 ` Zygo Blaxell
  2019-12-09  1:33   ` Zygo Blaxell
  1 sibling, 1 reply; 15+ messages in thread
From: Zygo Blaxell @ 2019-12-09 0:17 UTC
To: Mike Gilbert; +Cc: linux-btrfs

On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote:
> Hello,
>
> I have a directory entry that cannot be stat-ed or unlinked. This
> issue persists across reboots, so it seems there is something wrong on
> disk.
>
> % ls -l /var/cache/ccache.bad/2/c
> ls: cannot access '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': No such file or directory
> total 0
> -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest

I have seen a bug similar to this some years ago.  It was present as
far back as 4.5, and seems to still be present in 5.0.21.  I don't have
detailed tracking information on it due to the low severity: not a crash
or data corruption bug, and workarounds exist both to prevent the bug
and to clean up its aftermath.

The reproducer is something like:

        while (true) {  // pseudocode
                int fd = create(tmp_name);
                write(fd, ...);
                fsync(fd);      // required, bug does not appear without this fsync
                close(fd);
                rename(tmp_name, regular_name);
        }

and a crash, maybe with some heavy write load.  This is typical of
applications like git and ccache, and in the wild, broken directory
entries are often found in these applications' directories.

Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
in behavior: before, the broken directory entry could not be removed,
renamed, or used for a new file, the only way to get rid of the broken
directory entry was to delete the entire subvol.  After the behavior
change, the broken directory entry could be removed by creating a new
file and renaming it to the broken directory entry name.

Another workaround is to remove the fsync by running the application
under eatmydata.  btrfs performs a flush in the rename() operation when
an existing file is replaced, so the fsync that triggers the bug was
not necessary in the first place.  Note this only works when replacing
an existing file, so the flushoncommit mount option is required to make
this work in other cases.

> [...]
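The pseudocode reproducer in the message above can be written out as a runnable sketch. It illustrates the write, fsync, rename pattern only: by itself it does not trigger the bug, which according to the message also needs a crash (and possibly heavy write load). The function name, file names, and iteration count are placeholders chosen for this example.

```python
import os
import tempfile


def write_fsync_rename(directory, regular_name, payload, iterations=3):
    """Repeatedly replace `regular_name` via a temp file, fsync-ing before the rename."""
    final_path = os.path.join(directory, regular_name)
    tmp_path = final_path + ".tmp"
    for _ in range(iterations):
        fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # the step the message describes as required for the bug
        finally:
            os.close(fd)
        os.rename(tmp_path, final_path)  # atomically replaces the previous file
    return final_path


# Run the pattern in a throwaway directory and read the result back.
with tempfile.TemporaryDirectory() as workdir:
    path = write_fsync_rename(workdir, "example.manifest", b"data\n")
    with open(path, "rb") as fh:
        print(fh.read())
```

Dropping the `os.fsync` call corresponds to the eatmydata workaround mentioned above, with the caveat from the message that this relies on the rename replacing an existing file or on the flushoncommit mount option.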
* Re: Unable to remove directory entry
  2019-12-09  0:17 ` Zygo Blaxell
@ 2019-12-09  1:33   ` Zygo Blaxell
  2019-12-09  1:52     ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Zygo Blaxell @ 2019-12-09 1:33 UTC
To: Mike Gilbert; +Cc: linux-btrfs

On Sun, Dec 08, 2019 at 07:17:21PM -0500, Zygo Blaxell wrote:
> On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote:
> [...]
> Somewhere between 4.5 and 4.12 (a big range, I know), there was a change
> in behavior: before, the broken directory entry could not be removed,
> renamed, or used for a new file, the only way to get rid of the broken
> directory entry was to delete the entire subvol.  After the behavior
> change, the broken directory entry could be removed by creating a new
> file and renaming it to the broken directory entry name.

I found a filesystem that currently has one of these broken dirents:

    root@tester24:/media/testfs/beeshome# ls -l
    ls: cannot access 'beesstats.txt.tmp': No such file or directory
    total 3446032
    -rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
    -rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
    -rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
    -rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
    -????????? ? ?    ?             ?            ? beesstats.txt.tmp
    -rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
    -rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
    -rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
    -rw-r--r-- 1 root root    3221101 Dec  8 20:18 df-2019-12-07.txt
    -rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
    -rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
    -rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
    -rw-r--r-- 1 root root   42378425 Dec  8 20:19 log-2019-12-07.txt
    lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt

It seems I can create a file with the same name, and then I get two:

    root@tester24:/media/testfs/beeshome# date > beesstats.txt.tmp
    root@tester24:/media/testfs/beeshome# ls -l
    total 3446044
    -rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
    -rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
    -rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
    -rwx------ 1 root root 1073741824 Dec  8 20:19 beeshash.dat
    -rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
    -rw-r--r-- 1 root root         29 Dec  8 20:19 beesstats.txt.tmp
    -rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
    -rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
    -rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
    -rw-r--r-- 1 root root    3221363 Dec  8 20:19 df-2019-12-07.txt
    -rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
    -rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
    -rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
    -rw-r--r-- 1 root root   42384027 Dec  8 20:19 log-2019-12-07.txt
    lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt
    root@tester24:/media/testfs/beeshome# cat beesstats.txt.tmp
    Sun Dec  8 20:19:38 EST 2019

dump-tree sees both DIR_INDEX but only one DIR_ITEM:

        item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47
                location key (133693 INODE_ITEM 0) type FILE
                transid 5002644 data_len 0 name_len 17
                name: beesstats.txt.tmp
        item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
                location key (11481 INODE_ITEM 0) type FILE
                transid 1876891 data_len 0 name_len 17
                name: beesstats.txt.tmp
        item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
                location key (133693 INODE_ITEM 0) type FILE
                transid 5002644 data_len 0 name_len 17
                name: beesstats.txt.tmp

but I can only delete DIR_ITEMs:

    root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp
    root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp
    rm: cannot remove 'beesstats.txt.tmp': No such file or directory
    root@tester24:/media/testfs/beeshome# ls -l
    ls: cannot access 'beesstats.txt.tmp': No such file or directory
    total 3446048
    -rw-r--r-- 1 root root      10313 Nov 22  2018 all-df-today.png
    -rw-r--r-- 1 root root    3297813 Nov 22  2018 all-df-today.txt
    -rw------- 1 root root    1048488 Dec  7 17:35 beescrawl.dat
    -rwx------ 1 root root 1073741824 Dec  8 20:20 beeshash.dat
    -????????? ? ?    ?             ?            ? beesstats.txt.tmp
    -rw-r--r-- 1 root root   16064406 Dec  3 00:13 df-2019-11-28.txt
    -rw-r--r-- 1 root root    4269887 Dec  5 00:52 df-2019-12-03.txt
    -rw-r--r-- 1 root root    6358158 Dec  7 16:44 df-2019-12-05.txt
    -rw-r--r-- 1 root root    3221494 Dec  8 20:19 df-2019-12-07.txt
    -rw-r--r-- 1 root root 2208475574 Dec  3 00:13 log-2019-11-28.txt
    -rw-r--r-- 1 root root   72372394 Dec  5 00:52 log-2019-12-03.txt
    -rw-r--r-- 1 root root   97472346 Dec  7 16:44 log-2019-12-05.txt
    -rw-r--r-- 1 root root   42396102 Dec  8 20:20 log-2019-12-07.txt
    lrwxrwxrwx 1 root root         18 Dec  7 17:35 log-today.txt -> log-2019-12-07.txt

leaving the first DIR_INDEX behind:

        item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47
                location key (11481 INODE_ITEM 0) type FILE
                transid 1876891 data_len 0 name_len 17
                name: beesstats.txt.tmp

So the btrfs read side is fine, it's the writing side that is putting bad
metadata on the disk.

> [...]
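The mismatch shown in the dump-tree excerpts of this message (two DIR_INDEX items but only one DIR_ITEM for the same name) can be spotted mechanically. The sketch below parses text in the layout printed above and reports DIR_INDEX entries whose target inode has no DIR_ITEM with the same name in the same directory. The regular expression covers only the fields that appear in this message; real `btrfs inspect-internal dump-tree` output contains many other item types, so this is an illustration under that assumption, not a checker.

```python
import re

# Excerpt copied from the dump-tree output in this message.
SAMPLE = """\
item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47
    location key (133693 INODE_ITEM 0) type FILE
    transid 5002644 data_len 0 name_len 17
    name: beesstats.txt.tmp
item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
    location key (11481 INODE_ITEM 0) type FILE
    transid 1876891 data_len 0 name_len 17
    name: beesstats.txt.tmp
item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
    location key (133693 INODE_ITEM 0) type FILE
    transid 5002644 data_len 0 name_len 17
    name: beesstats.txt.tmp
"""

# Captures: parent dir, item kind, key offset, target inode, entry name.
ENTRY = re.compile(
    r"item \d+ key \((\d+) (DIR_ITEM|DIR_INDEX) (\d+)\).*?"
    r"location key \((\d+) INODE_ITEM \d+\).*?"
    r"name: (\S+)",
    re.DOTALL,
)


def orphan_dir_indexes(dump_text):
    """Return (dir, index, inode, name) for DIR_INDEX entries lacking a matching DIR_ITEM."""
    dir_items = set()
    dir_indexes = []
    for parent, kind, offset, inode, name in ENTRY.findall(dump_text):
        if kind == "DIR_ITEM":
            dir_items.add((parent, inode, name))
        else:
            dir_indexes.append((parent, offset, inode, name))
    return [
        (int(parent), int(offset), int(inode), name)
        for parent, offset, inode, name in dir_indexes
        if (parent, inode, name) not in dir_items
    ]


print(orphan_dir_indexes(SAMPLE))
# prints: [(256, 22037, 11481, 'beesstats.txt.tmp')]
```

The reported entry is the same DIR_INDEX 22037 that remains after the `rm` in the transcript above.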
* Re: Unable to remove directory entry
  2019-12-09  1:33   ` Zygo Blaxell
@ 2019-12-09  1:52     ` Qu Wenruo
  2019-12-09  2:23       ` Zygo Blaxell
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2019-12-09 1:52 UTC
To: Zygo Blaxell, Mike Gilbert; +Cc: linux-btrfs

On 2019/12/9 9:33 AM, Zygo Blaxell wrote:
> [...]
> leaving the first DIR_INDEX behind:
>
>         item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47
>                 location key (11481 INODE_ITEM 0) type FILE
>                 transid 1876891 data_len 0 name_len 17
>                 name: beesstats.txt.tmp

This looks like an older kernel bug (I hope so).

So there is an orphan DIR_INDEX left behind that was never cleaned up
properly.

In that case, btrfs-progs should be able to repair it.
But strangely, why didn't the original mode check report it?

BTW, does that 11481 inode still exist?

Thanks,
Qu

> [...]
* Re: Unable to remove directory entry 2019-12-09 1:52 ` Qu Wenruo @ 2019-12-09 2:23 ` Zygo Blaxell 0 siblings, 0 replies; 15+ messages in thread From: Zygo Blaxell @ 2019-12-09 2:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: Mike Gilbert, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 10853 bytes --] On Mon, Dec 09, 2019 at 09:52:54AM +0800, Qu Wenruo wrote: > > > On 2019/12/9 上午9:33, Zygo Blaxell wrote: > > On Sun, Dec 08, 2019 at 07:17:21PM -0500, Zygo Blaxell wrote: > >> On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote: > >>> Hello, > >>> > >>> I have a directory entry that cannot be stat-ed or unlinked. This > >>> issue persists across reboots, so it seems there is something wrong on > >>> disk. > >>> > >>> % ls -l /var/cache/ccache.bad/2/c > >>> ls: cannot access > >>> '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': > >>> No such > >>> file or directory > >>> total 0 > >>> -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest > >> > >> I have seen a bug similar to this some years ago. It was present as > >> far back as 4.5, and seems to still be present in 5.0.21. I don't have > >> detailed tracking information on it due to the low severity: not a crash > >> or data corruption bug, and workarounds exist both to prevent the bug > >> and to clean up its aftermath. > >> > >> The reproducer is something like: > >> > >> while (true) { // pseudocode > >> int fd = create(tmp_name); > >> write(fd, ...); > >> fsync(fd); // required, bug does not appear without this fsync > >> close(fd); > >> rename(tmp_name, regular_name); > >> } > >> > >> and a crash, maybe with some heavy write load. This is typical of > >> applications like git and ccache, and in the wild, broken directory > >> entries are often found in these applications' directories. 
> >> > >> Somewhere between 4.5 and 4.12 (a big range, I know), there was a change > >> in behavior: before, the broken directory entry could not be removed, > >> renamed, or used for a new file, the only way to get rid of the broken > >> directory entry was to delete the entire subvol. After the behavior > >> change, the broken directory entry could be removed by creating a new > >> file and renaming it to the broken directory entry name. > > > > I found a filesystem that currently has one of these broken dirents: > > > > root@tester24:/media/testfs/beeshome# ls -l > > ls: cannot access 'beesstats.txt.tmp': No such file or directory > > total 3446032 > > -rw-r--r-- 1 root root 10313 Nov 22 2018 all-df-today.png > > -rw-r--r-- 1 root root 3297813 Nov 22 2018 all-df-today.txt > > -rw------- 1 root root 1048488 Dec 7 17:35 beescrawl.dat > > -rwx------ 1 root root 1073741824 Dec 8 20:19 beeshash.dat > > -????????? ? ? ? ? ? beesstats.txt.tmp > > -rw-r--r-- 1 root root 16064406 Dec 3 00:13 df-2019-11-28.txt > > -rw-r--r-- 1 root root 4269887 Dec 5 00:52 df-2019-12-03.txt > > -rw-r--r-- 1 root root 6358158 Dec 7 16:44 df-2019-12-05.txt > > -rw-r--r-- 1 root root 3221101 Dec 8 20:18 df-2019-12-07.txt > > -rw-r--r-- 1 root root 2208475574 Dec 3 00:13 log-2019-11-28.txt > > -rw-r--r-- 1 root root 72372394 Dec 5 00:52 log-2019-12-03.txt > > -rw-r--r-- 1 root root 97472346 Dec 7 16:44 log-2019-12-05.txt > > -rw-r--r-- 1 root root 42378425 Dec 8 20:19 log-2019-12-07.txt > > lrwxrwxrwx 1 root root 18 Dec 7 17:35 log-today.txt -> log-2019-12-07.txt > > > > It seems I can create a file with the same name, and then I get two: > > > > root@tester24:/media/testfs/beeshome# date > beesstats.txt.tmp > > root@tester24:/media/testfs/beeshome# ls -l > > total 3446044 > > -rw-r--r-- 1 root root 10313 Nov 22 2018 all-df-today.png > > -rw-r--r-- 1 root root 3297813 Nov 22 2018 all-df-today.txt > > -rw------- 1 root root 1048488 Dec 7 17:35 beescrawl.dat > > -rwx------ 1 root root 
1073741824 Dec 8 20:19 beeshash.dat > > -rw-r--r-- 1 root root 29 Dec 8 20:19 beesstats.txt.tmp > > -rw-r--r-- 1 root root 29 Dec 8 20:19 beesstats.txt.tmp > > -rw-r--r-- 1 root root 16064406 Dec 3 00:13 df-2019-11-28.txt > > -rw-r--r-- 1 root root 4269887 Dec 5 00:52 df-2019-12-03.txt > > -rw-r--r-- 1 root root 6358158 Dec 7 16:44 df-2019-12-05.txt > > -rw-r--r-- 1 root root 3221363 Dec 8 20:19 df-2019-12-07.txt > > -rw-r--r-- 1 root root 2208475574 Dec 3 00:13 log-2019-11-28.txt > > -rw-r--r-- 1 root root 72372394 Dec 5 00:52 log-2019-12-03.txt > > -rw-r--r-- 1 root root 97472346 Dec 7 16:44 log-2019-12-05.txt > > -rw-r--r-- 1 root root 42384027 Dec 8 20:19 log-2019-12-07.txt > > lrwxrwxrwx 1 root root 18 Dec 7 17:35 log-today.txt -> log-2019-12-07.txt > > root@tester24:/media/testfs/beeshome# cat beesstats.txt.tmp > > Sun Dec 8 20:19:38 EST 2019 > > > > dump-tree sees both DIR_INDEX but only one DIR_ITEM: > > > > item 9 key (256 DIR_ITEM 2721875446) itemoff 15740 itemsize 47 > > location key (133693 INODE_ITEM 0) type FILE > > transid 5002644 data_len 0 name_len 17 > > name: beesstats.txt.tmp > > item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47 > > location key (11481 INODE_ITEM 0) type FILE > > transid 1876891 data_len 0 name_len 17 > > name: beesstats.txt.tmp > > item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47 > > location key (133693 INODE_ITEM 0) type FILE > > transid 5002644 data_len 0 name_len 17 > > name: beesstats.txt.tmp > > > > but I can only delete DIR_ITEMs: > > > > root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp > > root@tester24:/media/testfs/beeshome# rm beesstats.txt.tmp > > rm: cannot remove 'beesstats.txt.tmp': No such file or directory > > root@tester24:/media/testfs/beeshome# ls -l > > ls: cannot access 'beesstats.txt.tmp': No such file or directory > > total 3446048 > > -rw-r--r-- 1 root root 10313 Nov 22 2018 all-df-today.png > > -rw-r--r-- 1 root root 3297813 Nov 22 2018 all-df-today.txt > > -rw------- 1 
root root 1048488 Dec 7 17:35 beescrawl.dat > > -rwx------ 1 root root 1073741824 Dec 8 20:20 beeshash.dat > > -????????? ? ? ? ? ? beesstats.txt.tmp > > -rw-r--r-- 1 root root 16064406 Dec 3 00:13 df-2019-11-28.txt > > -rw-r--r-- 1 root root 4269887 Dec 5 00:52 df-2019-12-03.txt > > -rw-r--r-- 1 root root 6358158 Dec 7 16:44 df-2019-12-05.txt > > -rw-r--r-- 1 root root 3221494 Dec 8 20:19 df-2019-12-07.txt > > -rw-r--r-- 1 root root 2208475574 Dec 3 00:13 log-2019-11-28.txt > > -rw-r--r-- 1 root root 72372394 Dec 5 00:52 log-2019-12-03.txt > > -rw-r--r-- 1 root root 97472346 Dec 7 16:44 log-2019-12-05.txt > > -rw-r--r-- 1 root root 42396102 Dec 8 20:20 log-2019-12-07.txt > > lrwxrwxrwx 1 root root 18 Dec 7 17:35 log-today.txt -> log-2019-12-07.txt > > > > leaving the first DIR_INDEX behind: > > > > item 17 key (256 DIR_INDEX 22037) itemoff 15379 itemsize 47 > > location key (11481 INODE_ITEM 0) type FILE > > transid 1876891 data_len 0 name_len 17 > > name: beesstats.txt.tmp > > This looks like a older kernel bug (hopes so). > > So there is an orphan DIR_INDEX left, but never cleaned up properly. > > In that case, btrfs-progs should be able to repair it. > But strangely, why original mode check didn't report it? I'm not sure what you mean by "original mode check". I can't run btrfs check on this filesystem (97GB of metadata, too big for either regular or lowmem to handle in reasonable time). There used to be a stat check on the missing inode, which would fail in older kernels, and make the filename permanently unusable (until the subvol was deleted). That broke a lot of applications. The stat check was removed at some point, which is much better. > BTW, does that 11481 inode still exist? Nope, the only '11481' in the entire subvol's dump-tree output is that DIR_INDEX item. > Thanks, > Qu > > > > So the btrfs read side is fine, it's the writing side that is putting bad > > metadata on the disk. 
> >
> >> Another workaround is to remove the fsync by running the application
> >> under eatmydata. btrfs performs a flush in the rename() operation when
> >> an existing file is replaced, so the fsync that triggers the bug was
> >> not necessary in the first place. Note this only works when replacing
> >> an existing file, so the flushoncommit mount option is required to make
> >> this work in other cases.
> >>
> >>> % uname -a
> >>> Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD
> >>> Phenom(tm) II X6 1055T Processor
> >>> AuthenticAMD GNU/Linux
> >>>
> >>> % btrfs --version
> >>> btrfs-progs v5.4
> >>>
> >>> I have tried running btrfs check, and I get differing results based on
> >>> the --mode switch:
> >>>
> >>> # btrfs check --readonly /dev/sda3
> >>> [1/7] checking root items
> >>> [2/7] checking extents
> >>> [3/7] checking free space cache
> >>> [4/7] checking fs roots
> >>> [5/7] checking only csums items (without verifying data)
> >>> [6/7] checking root refs
> >>> [7/7] checking quota groups
> >>> Opening filesystem to check...
> >>> Checking filesystem on /dev/sda3
> >>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> >>> found 284337733632 bytes used, no error found
> >>> total csum bytes: 267182280
> >>> total tree bytes: 4498915328
> >>> total fs tree bytes: 3972464640
> >>> total extent tree bytes: 199819264
> >>> btree space waste bytes: 776711635
> >>> file data blocks allocated: 313928671232
> >>> referenced 279141621760
> >>>
> >>> # btrfs check --readonly --mode=lowmem /dev/sda3
> >>> [1/7] checking root items
> >>> [2/7] checking extents
> >>> [3/7] checking free space cache
> >>> [4/7] checking fs roots
> >>> ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing
> >>> ERROR: root 5 DIR ITEM[486836 13905] name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath
> >>> ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name
> >>> 0390cb341d248c589c419007da68b2-7351.manifest filetype 1
> >>> ERROR: errors found in fs roots
> >>> Opening filesystem to check...
> >>> Checking filesystem on /dev/sda3
> >>> UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6
> >>> found 284337733632 bytes used, error(s) found
> >>> total csum bytes: 267182280
> >>> total tree bytes: 4498915328
> >>> total fs tree bytes: 3972464640
> >>> total extent tree bytes: 199819264
> >>> btree space waste bytes: 776711635
> >>> file data blocks allocated: 313928671232
> >>> referenced 279141621760
> >>>
> >>> Please advise on possible next steps to diagnose and fix this.
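For anyone who wants to look for the same pattern without running btrfs check: the "DIR_INDEX pointing at an inode that no longer exists" condition discussed in this message can be searched for in saved dump-tree text. Below is a rough, read-only sketch in Python. It assumes the item layout shown in the dumps quoted in this thread (other btrfs-progs versions may print items differently), and it assumes the text covers the whole subvolume tree, since a partial dump would produce false positives. The function name and the INODE_ITEM line in the sample are made up for illustration; the two DIR_INDEX items are copied from the thread.

```python
import re

# Patterns follow the dump-tree text quoted in this thread (assumption:
# the installed btrfs-progs prints items in the same layout).
ITEM_RE = re.compile(r"item \d+ key \((\d+) (\w+) (\d+)\)")
LOCATION_RE = re.compile(r"location key \((\d+) INODE_ITEM 0\)")
NAME_RE = re.compile(r"name: (.+)$")


def find_orphan_dir_indexes(dump_text):
    """Return (dir, index, inode, name) for every DIR_INDEX whose target
    inode has no INODE_ITEM anywhere in dump_text. Hypothetical helper."""
    inodes = set()    # objectids that have an INODE_ITEM
    entries = []      # every DIR_INDEX item seen
    current = None    # the DIR_INDEX item whose body is being read
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = ITEM_RE.match(line)
        if m:
            objectid, key_type, offset = int(m.group(1)), m.group(2), int(m.group(3))
            current = None
            if key_type == "INODE_ITEM":
                inodes.add(objectid)
            elif key_type == "DIR_INDEX":
                current = {"dir": objectid, "index": offset,
                           "inode": None, "name": None}
                entries.append(current)
            continue
        if current is None:
            continue  # body line of an item type we do not track
        m = LOCATION_RE.match(line)
        if m:
            current["inode"] = int(m.group(1))
            continue
        m = NAME_RE.match(line)
        if m:
            current["name"] = m.group(1)
    # Filter at the end: INODE_ITEMs sort after the directory's own items.
    return [(e["dir"], e["index"], e["inode"], e["name"])
            for e in entries
            if e["inode"] is not None and e["inode"] not in inodes]


# The INODE_ITEM line is invented for the example (offsets are not real);
# the DIR_INDEX items are the ones quoted in this thread.
SAMPLE = """\
item 18 key (256 DIR_INDEX 22037) itemoff 15332 itemsize 47
        location key (11481 INODE_ITEM 0) type FILE
        transid 1876891 data_len 0 name_len 17
        name: beesstats.txt.tmp
item 32 key (256 DIR_INDEX 264858) itemoff 14684 itemsize 47
        location key (133693 INODE_ITEM 0) type FILE
        transid 5002644 data_len 0 name_len 17
        name: beesstats.txt.tmp
item 40 key (133693 INODE_ITEM 0) itemoff 14000 itemsize 160
"""

print(find_orphan_dir_indexes(SAMPLE))
```

On the sample this reports only the index 22037 entry, which matches the leftover DIR_INDEX above. It only reads text and repairs nothing.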
end of thread, other threads:[~2019-12-09 2:44 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-08 19:19 Unable to remove directory entry Mike Gilbert 2019-12-09 0:11 ` Qu Wenruo 2019-12-09 0:30 ` Mike Gilbert 2019-12-09 0:41 ` Qu Wenruo 2019-12-09 1:31 ` Mike Gilbert 2019-12-09 1:45 ` Qu Wenruo 2019-12-09 1:51 ` Mike Gilbert 2019-12-09 2:05 ` Qu Wenruo 2019-12-09 2:20 ` Qu Wenruo 2019-12-09 2:37 ` Mike Gilbert 2019-12-09 2:43 ` Qu Wenruo 2019-12-09 0:17 ` Zygo Blaxell 2019-12-09 1:33 ` Zygo Blaxell 2019-12-09 1:52 ` Qu Wenruo 2019-12-09 2:23 ` Zygo Blaxell