On Sun, Dec 08, 2019 at 02:19:10PM -0500, Mike Gilbert wrote: > Hello, > > I have a directory entry that cannot be stat-ed or unlinked. This > issue persists across reboots, so it seems there is something wrong on > disk. > > % ls -l /var/cache/ccache.bad/2/c > ls: cannot access > '/var/cache/ccache.bad/2/c/0390cb341d248c589c419007da68b2-7351.manifest': > No such > file or directory > total 0 > -????????? ? ? ? ? ? 0390cb341d248c589c419007da68b2-7351.manifest I have seen a bug similar to this some years ago. It was present as far back as 4.5, and seems to still be present in 5.0.21. I don't have detailed tracking information on it due to the low severity: not a crash or data corruption bug, and workarounds exist both to prevent the bug and to clean up its aftermath. The reproducer is something like: while (true) { // pseudocode int fd = create(tmp_name); write(fd, ...); fsync(fd); // required, bug does not appear without this fsync close(fd); rename(tmp_name, regular_name); } and a crash, maybe with some heavy write load. This is typical of applications like git and ccache, and in the wild, broken directory entries are often found in these applications' directories. Somewhere between 4.5 and 4.12 (a big range, I know), there was a change in behavior: before, the broken directory entry could not be removed, renamed, or used for a new file, the only way to get rid of the broken directory entry was to delete the entire subvol. After the behavior change, the broken directory entry could be removed by creating a new file and renaming it to the broken directory entry name. Another workaround is to remove the fsync by running the application under eatmydata. btrfs performs a flush in the rename() operation when an existing file is replaced, so the fsync that triggers the bug was not necessary in the first place. Note this only works when replacing an existing file, so the flushoncommit mount option is required to make this work in other cases. > % uname -a > Linux naomi 4.19.67 #4 SMP Sun Aug 18 14:35:39 EDT 2019 x86_64 AMD > Phenom(tm) II X6 1055T Processor > AuthenticAMD GNU/Linux > > % btrfs --version > btrfs-progs v5.4 > > I have tried running btrfs check, and I get differing results based on > the --mode switch: > > # btrfs check --readonly /dev/sda3 > [1/7] checking root items > [2/7] checking extents > [3/7] checking free space cache > [4/7] checking fs roots > [5/7] checking only csums items (without verifying data) > [6/7] checking root refs > [7/7] checking quota groups > Opening filesystem to check... > Checking filesystem on /dev/sda3 > UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6 > found 284337733632 bytes used, no error found > total csum bytes: 267182280 > total tree bytes: 4498915328 > total fs tree bytes: 3972464640 > total extent tree bytes: 199819264 > btree space waste bytes: 776711635 > file data blocks allocated: 313928671232 > referenced 279141621760 > > # btrfs check --readonly --mode=lowmem /dev/sda3 > [1/7] checking root items > [2/7] checking extents > [3/7] checking free space cache > [4/7] checking fs roots > ERROR: root 5 INODE_ITEM[4065004] index 18446744073709551615 name > 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 missing > ERROR: root 5 DIR ITEM[486836 13905] name > 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 mismath > ERROR: root 5 DIR ITEM[486836 2543451757] mismatch name > 0390cb341d248c589c419007da68b2-7351.manifest filetype 1 > ERROR: errors found in fs roots > Opening filesystem to check... > Checking filesystem on /dev/sda3 > UUID: 5e9dcab6-036d-40f1-8b40-24ab4c062bf6 > found 284337733632 bytes used, error(s) found > total csum bytes: 267182280 > total tree bytes: 4498915328 > total fs tree bytes: 3972464640 > total extent tree bytes: 199819264 > btree space waste bytes: 776711635 > file data blocks allocated: 313928671232 > referenced 279141621760 > > Please advise on possible next steps to diagnose and fix this.