Find attached a patch which introduces a min_bw and max_bw limit for a
backing_dev_info. As outlined in the commit description, this can be
used to work around the issue until we have a better understanding of
what a real solution would look like.

Could we include this change in Linux? What would be the next step?

Thanks,

On Mon, Mar 9, 2020 at 4:11 PM Michael Stapelberg wrote:
>
> Thanks for clarifying. I have modified the mmap test program (see
> attached) to optionally read in the entire file when the WORKAROUND=
> environment variable is set, thereby preventing the FUSE reads in the
> write phase. I can now see a batch of reads, followed by a batch of
> writes.
>
> What’s interesting: when polling using “while :; do grep ^Bdi
> /sys/kernel/debug/bdi/0:93/stats; sleep 0.1; done” and running the
> mmap test program, I see:
>
> BdiDirtied:         3566304 kB
> BdiWritten:         3563616 kB
> BdiWriteBandwidth:    13596 kBps
>
> BdiDirtied:         3566304 kB
> BdiWritten:         3563616 kB
> BdiWriteBandwidth:    13596 kBps
>
> BdiDirtied:         3566528 kB (+224 kB) <-- starting to dirty pages
> BdiWritten:         3564064 kB (+448 kB) <-- starting to write
> BdiWriteBandwidth:    10700 kBps <-- only bandwidth update!
>
> BdiDirtied:         3668224 kB (+101696 kB) <-- all pages dirtied
> BdiWritten:         3565632 kB (+1568 kB)
> BdiWriteBandwidth:    10700 kBps
>
> BdiDirtied:         3668224 kB
> BdiWritten:         3665536 kB (+99904 kB) <-- all pages written
> BdiWriteBandwidth:    10700 kBps
>
> BdiDirtied:         3668224 kB
> BdiWritten:         3665536 kB
> BdiWriteBandwidth:    10700 kBps
>
> This seems to suggest that the bandwidth measurements only capture the
> rising slope of the transfer, but not the bulk of the transfer itself,
> resulting in inaccurate measurements. This effect is worsened when the
> test program doesn’t pre-read the output file and hence the kernel
> gets fewer FUSE write requests out.
>
> On Mon, Mar 9, 2020 at 3:36 PM Miklos Szeredi wrote:
> >
> > On Mon, Mar 9, 2020 at 3:32 PM Michael Stapelberg wrote:
> > >
> > > Here’s one more thing I noticed: when polling
> > > /sys/kernel/debug/bdi/0:93/stats, I see that BdiDirtied and BdiWritten
> > > remain at their original values while the kernel sends FUSE read
> > > requests, and only goes up when the kernel transitions into sending
> > > FUSE write requests. Notably, the page dirtying throttling happens in
> > > the read phase, which is most likely why the write bandwidth is
> > > (correctly) measured as 0.
> > >
> > > Do we have any ideas on why the kernel sends FUSE reads at all?
> >
> > Memory writes (stores) need the memory page to be up-to-date wrt. the
> > backing file before proceeding. This means that if the page hasn't
> > yet been cached by the kernel, it needs to be read first.
> >
> > Thanks,
> > Miklos
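
For anyone who wants to reproduce the (+N kB) annotations in the polling
output quoted above without computing them by hand, here is a minimal
Python sketch. It is not part of the attached patch or test program. The
names parse_bdi_stats, bdi_deltas and watch are made up for this
illustration, and the parser assumes the "Name: value unit" line format
shown in the quoted output.

```python
import re
import time

# Matches lines of the form "BdiWritten: 3563616 kB" or
# "BdiWriteBandwidth: 13596 kBps", as quoted in this thread.
# Assumption: every field of interest starts with "Bdi".
STAT_LINE = re.compile(r"^(Bdi\w+):[ \t]+(\d+)[ \t]+kB(?:ps)?[ \t]*$", re.MULTILINE)


def parse_bdi_stats(text):
    """Return {field: integer value} for every Bdi* line in a stats snapshot."""
    return {name: int(value) for name, value in STAT_LINE.findall(text)}


def bdi_deltas(previous, current):
    """Return the per-field change between two parsed snapshots."""
    return {name: current[name] - previous[name]
            for name in current if name in previous}


def watch(path, interval=0.1):
    """Poll `path` forever and print every field that changed since the last poll."""
    with open(path) as f:
        previous = parse_bdi_stats(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_bdi_stats(f.read())
        for name, delta in bdi_deltas(previous, current).items():
            if delta:
                print(f"{name}: {current[name]} ({delta:+d})")
        previous = current
```

Calling watch("/sys/kernel/debug/bdi/0:93/stats") while the mmap test
program runs should print only the fields that moved between polls. The
0:93 device number is specific to the FUSE mount in this thread, and
reading debugfs usually requires root.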