[Question] About XFS random buffer write performance

* [Question] About XFS random buffer write performance
@ 2020-07-28 11:34 Zhengyuan Liu
  2020-07-28 15:34 ` Darrick J. Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Zhengyuan Liu @ 2020-07-28 11:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Zhengyuan Liu

Hi all,

When doing random buffer write testing I found the bandwidth on EXT4 is much
better than XFS under the same environment.
The test case ,test result and test environment is as follows:
Test case:
fio --ioengine=sync --rw=randwrite --iodepth=64 --size=4G --name=test
--filename=/mnt/testfile --bs=4k
Before doing fio, use dd (if=/dev/zero of=/mnt/testfile bs=1M
count=4096) to warm-up the file in the page cache.

Test result (bandwidth):
         ext4                   xfs
       ~300MB/s       ~120MB/s

Test environment:
    Platform:  arm64
    Kernel:  v5.7
    PAGESIZE:  64K
    Memtotal:  16G
    Storage: sata ssd(Max bandwidth about 350MB/s)
    FS block size: 4K

The  fio "Test result" shows that EXT4 has more than 2x bandwidth compared to
XFS, but iostat shows the transfer speed of XFS to SSD is about 300MB/s too.
So I debt XFS writing back many non-dirty blocks to SSD while  writing back
dirty pages. I tried to read the core writeback code of both
filesystem and found
XFS will write back blocks which is uptodate (seeing iomap_writepage_map()),
while EXT4 writes back blocks which must be dirty (seeing
ext4_bio_write_page() ) . XFS had turned from buffer head to iomap since
V4.8, there is only a bitmap in iomap to track block's uptodate
status, no 'dirty'
concept was found, my question is if this is the reason why XFS writes many
extra blocks to SSD when doing random buffer write? If it is, then why don't we
track the dirty status of blocks in XFS?

With the questions in brain, I start digging into XFS's history, and found a
annotations in V2.6.12:
        /*
         * Calling this without startio set means we are being asked
to make a dirty
         * page ready for freeing it's buffers.  When called with
startio set then
         * we are coming from writepage.
         * When called with startio set it is important that we write the WHOLE
         * page if possible.
         * The bh->b_state's cannot know if any of the blocks or which block for
         * that matter are dirty due to mmap writes, and therefore bh
uptodate is
         * only vaild if the page itself isn't completely uptodate.  Some layers
         * may clear the page dirty flag prior to calling write page, under the
         * assumption the entire page will be written out; by not
writing out the
         * whole page the page can be reused before all valid dirty data is
         * written out.  Note: in the case of a page that has been dirty'd by
         * mapwrite and but partially setup by block_prepare_write the
         * bh->b_states's will not agree and only ones setup by BPW/BCW will
         * have valid state, thus the whole page must be written out thing.
         */
        STATIC int　xfs_page_state_convert()

From above annotations, It seems this has something to do with mmap, but I
can't get the point , so I turn to you guys to get the help. Anyway, I don't
think there is such a difference about random write between XFS and EXT4.

Any reply would be appreciative, Thanks in advance.

^ permalink raw reply	[flat|nested] 18+ messages in thread