Script for a trivial demo is in the attachment.

$ bash test_writebehind.sh
SIZE
3,2G    dummy
vm.dirty_write_behind = 0
COPY

real    0m3.629s
user    0m0.016s
sys     0m3.613s
Dirty:           3254552 kB
SYNC

real    0m31.953s
user    0m0.002s
sys     0m0.000s
vm.dirty_write_behind = 1
COPY

real    0m32.738s
user    0m0.008s
sys     0m4.047s
Dirty:              2900 kB
SYNC

real    0m0.427s
user    0m0.000s
sys     0m0.004s
vm.dirty_write_behind = 2
COPY

real    0m32.168s
user    0m0.000s
sys     0m4.066s
Dirty:              3088 kB
SYNC

real    0m0.421s
user    0m0.004s
sys     0m0.001s

With vm.dirty_write_behind set to 1 or 2, files are written even faster
overall (copy plus sync takes about 33s instead of about 36s), and during
copying the amount of dirty memory always stays at around 16MiB.

On 20/09/2019 10.35, Konstantin Khlebnikov wrote:
> Traditional writeback tries to accumulate as much dirty data as possible.
> This is a worthwhile strategy for extremely short-lived files and for
> batching writes to save battery power. But for workloads where disk
> latency is important, this policy generates periodic disk load spikes
> which increase latency for concurrent operations.
>
> Also, dirty pages in the file cache cannot be reclaimed and reused
> immediately. This way massive I/O like file copying affects memory
> allocation latency.
>
> The present writeback engine allows tuning only the dirty data size or
> the expiration time. Such tuning cannot eliminate spikes - it just lowers
> and multiplies them. The other option is switching to sync mode, which
> flushes written data right after each write; obviously this has a
> significant performance impact. Such tuning is system-wide and affects
> memory-mapped and randomly written files, which flusher threads handle
> much better.
>
> This patch implements a write-behind policy which tracks sequential writes
> and starts background writeback when a file has enough dirty pages.
>
> Global switch in sysctl vm.dirty_write_behind:
> =0: disabled, default
> =1: enabled for strictly sequential writes (append, copying)
> =2: enabled for all sequential writes
>
> The only parameter is the window size: the maximum amount of dirty pages
> behind the current position and the maximum amount of pages in background
> writeback.
>
> Setup is per-disk in sysfs in file /sys/block/$DISK/bdi/write_behind_kb.
> Default: 16MiB, '0' disables write-behind for this disk.
>
> When the amount of unwritten pages exceeds the window size, write-behind
> starts background writeback for max(excess, max_sectors_kb) and then waits
> for the same amount of background writeback that was initiated previously.
>
> |<-wait-this->|           |<-send-this->|<---pending-write-behind--->|
> |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
>              current head-^    new head-^              file position-^
>
> Remaining tail pages are flushed when the file is closed if async
> write-behind was started, or if this is a new file and it is at least
> max_sectors_kb long.
>
> Overall behavior depending on total data size:
> < max_sectors_kb - no writes
> > max_sectors_kb - write new files in background after close
> > write_behind_kb - streaming write, write tail at close
>
> Special cases:
>
> * files with POSIX_FADV_RANDOM, O_DIRECT, O_[D]SYNC are ignored
>
> * the writing cursor for O_APPEND is aligned to cover previous small
>   appends. Appends might happen via multiple files or via a new file
>   each time.
>
> * mode vm.dirty_write_behind=1 ignores non-append writes
>   This reacts only to completely sequential writes like copying files,
>   writing logs with O_APPEND or rewriting files after O_TRUNC.
>
> Note: the ext4 feature "auto_da_alloc" also writes out the cache when
> closing a file after truncating it to 0 and after renaming one file over
> another.
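
To make the window arithmetic quoted above concrete, here is a toy model in
Python. It is only a sketch of the description, not the patch code: the
class, its field names and the `request` value standing in for
max_sectors_kb are made up for illustration, and the two clamps marked
below are assumptions.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass
class WriteBehind:
    """Toy model of one file's write-behind window (offsets in bytes)."""
    window: int = 16 * 1024 * KIB  # write_behind_kb; 16MiB default per the description
    request: int = 1280 * KIB      # stand-in for max_sectors_kb; value is an assumption
    tail: int = 0                  # start of the async range (sent, not yet waited for)
    head: int = 0                  # start of dirty data that has not been sent yet
    pos: int = 0                   # current file position

    def write(self, nbytes):
        """Account one sequential write; return (sent, waited) in bytes."""
        self.pos += nbytes
        unwritten = self.pos - self.head
        if self.window == 0 or unwritten <= self.window:
            return 0, 0            # '0' disables; otherwise still inside the window
        excess = unwritten - self.window
        # Send max(excess, request), clamped to what is actually dirty (assumption).
        sent = min(max(excess, self.request), unwritten)
        # Wait for the same amount of older async data, if that much exists (assumption).
        waited = min(sent, self.head - self.tail)
        self.head += sent
        self.tail += waited
        return sent, waited


if __name__ == "__main__":
    wb = WriteBehind()
    for _ in range(64):            # 64 sequential 1MiB writes
        wb.write(1024 * KIB)
    print("pending:", wb.pos - wb.head, "in flight:", wb.head - wb.tail)
```

In this model the pending range (file position minus head) never exceeds
the window after a write, which is the same effect as the dirty memory
staying near the window size during the copy in the demo.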
>
> Changes since v1 (2017-10-02):
> * rework window management:
> * change default window 1MiB -> 16MiB
> * change default request 256KiB -> max_sectors_kb
> * drop always-async behavior for O_NONBLOCK
> * drop handling POSIX_FADV_NOREUSE (should be in separate patch)
> * ignore writes with O_DIRECT, O_SYNC, O_DSYNC
> * align head position for O_APPEND
> * add strictly sequential mode
> * write tail pages for new files
> * make void, keep errors at mapping
>
> Signed-off-by: Konstantin Khlebnikov
> Link: https://lore.kernel.org/patchwork/patch/836149/ (v1)
> ---
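
For reference, the mode switch, the ignored cases and the size table from
the quoted description can be condensed into two small Python helpers. This
is a hedged summary for illustration, not code from the patch: the function
names, the `sequential` / `at_eof` / `fadv_random` parameters and the
treatment of the exact boundaries are assumptions.

```python
import os

# Open flags that the description says are ignored by write-behind.
# getattr() keeps the sketch importable where a flag is not defined.
IGNORED_FLAGS = (
    getattr(os, "O_DIRECT", 0)
    | getattr(os, "O_SYNC", 0)
    | getattr(os, "O_DSYNC", 0)
)


def write_behind_applies(mode, open_flags, sequential, at_eof, fadv_random=False):
    """Return True if a write would be tracked under vm.dirty_write_behind=mode.

    sequential:  the write starts where the previous one ended (assumed meaning)
    at_eof:      the write appends at the end of the file (assumed meaning)
    fadv_random: POSIX_FADV_RANDOM was set on the file
    """
    if mode == 0:
        return False                      # disabled, the default
    if fadv_random or (open_flags & IGNORED_FLAGS):
        return False                      # special cases that are ignored
    if not sequential:
        return False                      # only sequential writes are tracked
    if mode == 1:
        return at_eof                     # strictly sequential: appends only
    return True                           # mode 2: all sequential writes


def size_behavior(total_kb, max_sectors_kb, write_behind_kb, new_file=True):
    """Overall behavior by total data size, assuming write_behind_kb >= max_sectors_kb."""
    if total_kb > write_behind_kb:
        return "streaming write, write tail at close"
    if new_file and total_kb >= max_sectors_kb:
        return "write in background after close"
    return "no writes"
```

For example, with mode 1 a plain sequential overwrite in the middle of an
existing file is skipped by this predicate, while the same write under
mode 2 is tracked.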