On 2019/2/13 2:13 AM, Zygo Blaxell wrote:
> On Tue, Feb 12, 2019 at 05:56:24PM +0000, Filipe Manana wrote:
>> On Tue, Feb 12, 2019 at 5:01 PM Zygo Blaxell wrote:
>>>
>>> On Tue, Feb 12, 2019 at 03:35:37PM +0000, Filipe Manana wrote:
>>>> On Tue, Feb 12, 2019 at 3:11 AM Zygo Blaxell wrote:
>>>>>
>>>>> Still reproducible on 4.20.7.
>>>>
>>>> I tried your reproducer when you first reported it, on different
>>>> machines with different kernel versions.
>>>
>>> That would have been useful to know last August... :-/
>>>
>>>> Never managed to reproduce it, nor see anything obviously wrong in
>>>> relevant code paths.
>>>
>>> I built a fresh VM running Debian stretch and
>>> reproduced the issue immediately. Mount options are
>>> "rw,noatime,compress=zlib,space_cache,subvolid=5,subvol=/". Kernel is
>>> Debian's "4.9.0-8-amd64" but the bug is old enough that kernel version
>>> probably doesn't matter.
>>>
>>> I don't have any configuration that can't reproduce this issue, so I don't
>>> know how to help you. I've tested AMD and Intel CPUs, VM, baremetal,
>>> hardware ranging in age from 0 to 9 years. Locally built kernels from
>>> 4.1 to 4.20 and the stock Debian kernel (4.9). SSDs and spinning rust.
>>> All of these reproduce the issue immediately--wrong sha1sum appears in
>>> the first 10 loops.
>>>
>>> What is your test environment? I can try that here.
>>
>> Debian unstable, all qemu vms, 4 cpus 4G to 8G ram iirc.
>
> I have several environments like that...
>
>> Always built from source kernels.
>
> ...that could be a relevant difference. Have you tried a stock
> Debian kernel?

I'm afraid you may need to use an upstream vanilla kernel rather than a
distro kernel, especially for distros that may carry heavy backports.

I also did my own test runs, using the Arch stock kernel (pretty vanilla)
and an upstream kernel, on both my host and a VM.
No reproduction either.

The upstream community is mostly focused on the upstream vanilla kernel.
Bugs from distro kernels can sometimes be a good clue to existing upstream
bugs, but when digging deeper, a vanilla kernel is always necessary.

Would you mind reproducing it in as vanilla an environment as possible?
E.g. vanilla kernel and vanilla user space progs?

Thanks,
Qu

>
>> I have tested this when you reported it for 1 to 2 weeks in 2 or 3 vms
>> that kept running the test in an infinite loop during those weeks.
>> Don't recall what were the kernel versions (whatever was the latest at
>> the time), but that shouldn't matter according to what you say.
>
> That's an extremely long time compared to the rate of occurrence
> of this bug. It should appear in only a few seconds of testing.
> Some data-hole-data patterns reproduce much slower (change the position
> of "block 0" lines in the setup script), but "slower" is minutes,
> not machine-months.
>
> Is your filesystem compressed? Does compsize show the test
> file 'am' is compressed during the test? Is the sha1sum you get
> 6926a34e0ab3e0a023e8ea85a650f5b4217acab4? Does the sha1sum change
> when a second process reads the file while the sha1sum/drop_caches loop
> is running?
>
[snip]