Hi,

It is curious, but this happens only on machines that have a BTRFS volume
built from two high-speed NVMe (PCIe 4) SSDs in RAID 0. On machines with a
BTRFS volume on a single HDD the bug does not appear.

Bisecting down to the problematic commit took a lot of effort. At each step
I downloaded and deleted the 150 GB game "Assassin's Creed Valhalla" four
times. To make sure that the commit preceding
947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by the
bug, I downloaded the game 10 times, which amounts to about 1.5 TB written
to the btrfs volume.

Here is the result of my bisection:

947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
commit 947a629988f191807d2d22ba63ae18259bb645c5
Author: Qu Wenruo
Date:   Wed Sep 14 13:32:51 2022 +0800

    btrfs: move tree block parentness check into validate_extent_buffer()

    [BACKGROUND]
    Although both btrfs metadata and data has their read time verification
    done at endio time (btrfs_validate_metadata_buffer() and
    btrfs_verify_data_csum()), metadata has extra verification, mostly
    parentness check including first key/transid/owner_root/level, done at
    read_tree_block() and btrfs_read_extent_buffer().

    On the other hand, all the data verification is done at endio context.

    [ENHANCEMENT]
    This patch will make a new union in btrfs_bio, taking the space of the
    old data checksums, thus it will not increase the memory usage.

    With that extra btrfs_tree_parent_check inside btrfs_bio, we can just
    pass the check parameter into read_extent_buffer_pages(), and before
    submitting the bio, we can copy the check structure into btrfs_bio.

    And finally at endio time, we can grab btrfs_bio::parent_check and pass
    it to validate_extent_buffer(), to move the remaining checks into it.

    This brings the following benefits:

    - Much simpler btrfs_read_extent_buffer()
      Now it only needs to iterate through all mirrors.

    - Simpler read-time transid check
      Previously we go verify_parent_transid() after reading out the extent
      buffer.

      Now the transid check is done inside the endio function, no other
      code can modify the content.

      Thus no need to use the extent lock anymore.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

 fs/btrfs/disk-io.c   | 73 ++++++++++++++++++++++++++++++++++++++--------------
 fs/btrfs/extent_io.c | 18 ++++++++++---
 fs/btrfs/extent_io.h |  5 ++--
 fs/btrfs/volumes.h   | 25 +++++++++++++++---
 4 files changed, 93 insertions(+), 28 deletions(-)

Just before the filesystem goes read-only, the kernel log shows this
message:

[ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in btrfs_run_delayed_refs:2147: errno=-5 IO failure

I have also attached the full kernel log.

-- 
Best Regards,
Mike Gavrilov.