linux-xfs.vger.kernel.org archive mirror
From: Zhengyuan Liu <liuzhengyuang521@gmail.com>
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Subject: [Question] About XFS random buffer write performance
Date: Tue, 28 Jul 2020 19:34:39 +0800
Message-ID: <CAOOPZo45E+hVAo9S_2psMJQzrzwmVKo_WjWOM7Zwhm_CS0J3iA@mail.gmail.com>

Hi all,

While testing random buffered writes, I found that EXT4 achieves much higher
bandwidth than XFS in the same environment. The test case, test result and
test environment are as follows:
Test case:
    fio --ioengine=sync --rw=randwrite --iodepth=64 --size=4G --name=test \
        --filename=/mnt/testfile --bs=4k

Before running fio, I used dd to warm up the file in the page cache:
    dd if=/dev/zero of=/mnt/testfile bs=1M count=4096
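For readers without fio at hand, here is a minimal C sketch of roughly what this job does: buffered, synchronous 4 KiB pwrite() calls at random block-aligned offsets, with every block written once (assuming fio's default random map, which covers each block once per pass). Because --ioengine=sync issues one write at a time, --iodepth=64 has no effect in this job, so the sketch also issues writes one by one. The function random_block_writes() is a hypothetical helper for illustration, not part of fio.

```c
#define _FILE_OFFSET_BITS 64
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill idx with 0..n-1 in random order (Fisher-Yates) so every block is
 * hit exactly once.  rand() % i is slightly biased; fine for a sketch. */
static void shuffle_blocks(size_t *idx, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Hypothetical helper: write every bs-sized block of path once, in random
 * order, one synchronous buffered pwrite() per block (no O_DIRECT).
 * Returns the number of bytes written, or -1 on error. */
long long random_block_writes(const char *path, size_t file_size, size_t bs,
                              unsigned seed)
{
    assert(bs > 0 && file_size % bs == 0);
    size_t nblocks = file_size / bs;
    size_t *idx = malloc(nblocks * sizeof(*idx));
    char *buf = malloc(bs);
    long long total = -1;
    int fd = -1;

    if (!idx || !buf)
        goto out;
    memset(buf, 0xab, bs);
    shuffle_blocks(idx, nblocks, seed);

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        goto out;

    total = 0;
    for (size_t i = 0; i < nblocks; i++) {
        ssize_t n = pwrite(fd, buf, bs, (off_t)(idx[i] * bs));
        if (n != (ssize_t)bs) {
            total = -1;
            break;
        }
        total += n;
    }
    close(fd);
out:
    free(idx);
    free(buf);
    return total;
}
```

For example, random_block_writes("/mnt/testfile", (size_t)4 << 30, 4096, 1) would roughly mirror the job above on a 64-bit system, though fio's own timing and accounting are of course more careful.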

Test result (fio bandwidth):
    ext4:  ~300MB/s
    xfs:   ~120MB/s

Test environment:
    Platform:  arm64
    Kernel:  v5.7
    PAGESIZE:  64K
    Memtotal:  16G
    Storage:  SATA SSD (max bandwidth about 350MB/s)
    FS block size: 4K

The fio results above show that EXT4 gets more than 2x the bandwidth of XFS,
but iostat shows that XFS is also transferring data to the SSD at about
300MB/s. So I suspect that XFS writes back many non-dirty blocks to the SSD
while writing back dirty pages. With a 64K page size and a 4K block size,
each page holds 16 blocks, so a single random 4K write dirties a page that
may contain up to 64K of data to write back.

I read the core writeback code of both filesystems and found that XFS writes
back every block that is uptodate (see iomap_writepage_map()), while EXT4
writes back only blocks that are dirty (see ext4_bio_write_page()). XFS
switched from buffer heads to iomap in v4.8, and iomap keeps only a bitmap
that tracks each block's uptodate status; I found no per-block 'dirty'
concept. So my questions are: is this the reason XFS writes so many extra
blocks to the SSD during random buffered writes? If it is, why don't we
track the dirty status of blocks in XFS?
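The suspected difference can be illustrated with a toy user-space model. This is not kernel code: struct page_model and the helper names are hypothetical, and the two policies only mirror the behaviour described above for iomap and ext4. After the dd warm-up every block of a page is uptodate, so one 4K random write dirties the page; writing back all uptodate blocks sends 16 blocks to disk, while writing back only dirty blocks sends 1.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  (64u * 1024)  /* 64 KiB pages, as on the arm64 test box */
#define MODEL_BLOCK_SIZE (4u * 1024)   /* 4 KiB filesystem blocks */
#define BLOCKS_PER_PAGE  (MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)  /* = 16 */

/* Hypothetical per-page state, one bit per block (toy model only). */
struct page_model {
    uint32_t uptodate;  /* block contents are valid in memory */
    uint32_t dirty;     /* block modified since the last writeback */
    int page_dirty;     /* page-level dirty flag */
};

static unsigned count_bits(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* A buffered write(2) covering exactly one block. */
static void buffered_write_block(struct page_model *p, unsigned blk)
{
    assert(blk < BLOCKS_PER_PAGE);
    p->uptodate |= 1u << blk;
    p->dirty |= 1u << blk;
    p->page_dirty = 1;
}

/* Policy A: a dirty page writes back every uptodate block
 * (the behaviour attributed above to iomap_writepage_map()). */
static unsigned writeback_uptodate_blocks(struct page_model *p)
{
    unsigned n = p->page_dirty ? count_bits(p->uptodate) : 0;
    p->dirty = 0;
    p->page_dirty = 0;
    return n;
}

/* Policy B: write back only blocks marked dirty
 * (the behaviour attributed above to ext4_bio_write_page()). */
static unsigned writeback_dirty_blocks(struct page_model *p)
{
    unsigned n = p->page_dirty ? count_bits(p->dirty) : 0;
    p->dirty = 0;
    p->page_dirty = 0;
    return n;
}
```

In this model one 4K write costs 64K of writeback under policy A and 4K under policy B, a 16x worst case. The ratio measured above (about 300MB/s at the device vs ~120MB/s from fio, roughly 2.5x) is well below that, which would be consistent with pages collecting several random writes before they are written back; the model does not try to predict the exact figure.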

With these questions in mind, I started digging into XFS's history and found
the following comment in v2.6.12:
        /*
         * Calling this without startio set means we are being asked to make a dirty
         * page ready for freeing it's buffers.  When called with startio set then
         * we are coming from writepage.
         * When called with startio set it is important that we write the WHOLE
         * page if possible.
         * The bh->b_state's cannot know if any of the blocks or which block for
         * that matter are dirty due to mmap writes, and therefore bh uptodate is
         * only vaild if the page itself isn't completely uptodate.  Some layers
         * may clear the page dirty flag prior to calling write page, under the
         * assumption the entire page will be written out; by not writing out the
         * whole page the page can be reused before all valid dirty data is
         * written out.  Note: in the case of a page that has been dirty'd by
         * mapwrite and but partially setup by block_prepare_write the
         * bh->b_states's will not agree and only ones setup by BPW/BCW will
         * have valid state, thus the whole page must be written out thing.
         */
        STATIC int xfs_page_state_convert()
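The mmap concern in that comment can be sketched with the same kind of toy model (again hypothetical names, not kernel code). A write(2) call tells the filesystem exactly which byte range changed, but a store through a shared mmap mapping only dirties the page as a whole. Assuming the filesystem learns about such a store only at page granularity (for example through a write fault), it has no block-level information, so a per-block dirty scheme would have to conservatively mark every block of the page dirty at that point.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_PAGE 16u                            /* 64 KiB page / 4 KiB blocks */
#define ALL_BLOCKS      ((1u << BLOCKS_PER_PAGE) - 1)  /* 0xFFFF */

/* Hypothetical per-page state with per-block dirty tracking (toy model only). */
struct page_model {
    uint32_t uptodate;
    uint32_t dirty;
    int page_dirty;
};

static unsigned count_bits(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* write(2): the exact byte range is known, so only that block is marked dirty. */
static void syscall_write_block(struct page_model *p, unsigned blk)
{
    assert(blk < BLOCKS_PER_PAGE);
    p->uptodate |= 1u << blk;
    p->dirty |= 1u << blk;
    p->page_dirty = 1;
}

/* Store through a shared mmap: only "this page was dirtied" is known, not
 * which bytes changed, so every block is treated as dirty.  A mapped page
 * has been read in, so it is also fully uptodate. */
static void mmap_store(struct page_model *p)
{
    p->uptodate = ALL_BLOCKS;
    p->dirty = ALL_BLOCKS;
    p->page_dirty = 1;
}

/* Write back only the blocks marked dirty; returns how many were written. */
static unsigned writeback_dirty_blocks(struct page_model *p)
{
    unsigned n = p->page_dirty ? count_bits(p->dirty) : 0;
    p->dirty = 0;
    p->page_dirty = 0;
    return n;
}
```

If mmap_store() set only page_dirty and left the dirty bitmap untouched, a page that had one block dirtied by write(2) and was then modified through mmap would write back just that one block and silently skip the mmap changes. That appears to be the kind of disagreement between per-block and per-page state ("the bh->b_states's will not agree") that the old comment avoids by writing out the whole page.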

From the above comment, it seems this has something to do with mmap, but I
can't quite see the point, so I am turning to you for help. In any case, I
would not expect such a large difference in random write performance between
XFS and EXT4.

Any reply would be appreciated. Thanks in advance.

Thread overview: 18+ messages
2020-07-28 11:34 Zhengyuan Liu [this message]
2020-07-28 15:34 ` [Question] About XFS random buffer write performance Darrick J. Wong
2020-07-28 15:47   ` Matthew Wilcox
2020-07-29  1:54     ` Dave Chinner
2020-07-29  2:12       ` Matthew Wilcox
2020-07-29  5:19         ` Dave Chinner
2020-07-29 18:50           ` Matthew Wilcox
2020-07-29 23:05             ` Dave Chinner
2020-07-30 13:50               ` Matthew Wilcox
2020-07-30 22:08                 ` Dave Chinner
2020-07-30 23:45                   ` Matthew Wilcox
2020-07-31  2:05                     ` Dave Chinner
2020-07-31  2:37                       ` Matthew Wilcox
2020-07-31 20:47                     ` Matthew Wilcox
2020-07-31 22:13                       ` Dave Chinner
2020-08-21  2:39                         ` Zhengyuan Liu
2020-07-31  6:55                   ` Christoph Hellwig
2020-07-29 13:02       ` Zhengyuan Liu
