On 2020/2/12 7:32 PM, ethanwu wrote:
> David Sterba wrote on 2020-02-12 02:21:
>> On Tue, Feb 11, 2020 at 12:33:48PM +0800, Qu Wenruo wrote:
>>> >> 39862272 have 30949376
>>> >> [ 5949.328136] repair_io_failure: 22 callbacks suppressed
>>> >> [ 5949.328139] BTRFS info (device vdb): read error corrected: ino 0
>>> >> off 39862272 (dev /dev/vdd sector 19488)
>>> >> [ 5949.333447] BTRFS info (device vdb): read error corrected: ino 0
>>> >> off 39866368 (dev /dev/vdd sector 19496)
>>> >> [ 5949.336875] BTRFS info (device vdb): read error corrected: ino 0
>>> >> off 39870464 (dev /dev/vdd sector 19504)
>>> >> [ 5949.340325] BTRFS info (device vdb): read error corrected: ino 0
>>> >> off 39874560 (dev /dev/vdd sector 19512)
>>> >> [ 5949.409934] BTRFS warning (device vdb): csum failed root -9 ino 257
>>> >> off 2228224 csum
>>>
>>> This looks like an existing bug, IIRC Zygo reported it before.
>>>
>>> Btrfs balance just randomly failed at data reloc tree.
>>>
>>> Thus I don't believe it's related to Ethan's patches.
>>
>> Ok, than the patches make it more likely to happen, which could mean
>> that faster backref processing hits some race window. As there could be
>> more we should first fix the bug you say Zygo reported.
>
> I added a log to check if find_parent_nodes is ever called under
> test btrfs/125. It turns out that btrfs/125 doesn't pass through the
> function. What my patches do is all under find_parent_nodes.

Balance goes through its own backref cache, so it doesn't use the path
you're modifying.

So don't worry, your patches look pretty good.

Furthermore, this csum mismatch is not related to the backref walk, but
to the data csum and the data in the data reloc tree, which are all
created by balance.

So there is really no reason to block such a good optimization.

Thanks,
Qu

> Therefore, I don't think my patch would make btrfs/125 more likely
> to happen, at least it doesn't change the behavior of functions
> btrfs/125 run through.
>
> Is it easy to reproduce in your test environment?
>
> Thanks,
> ethanwu