linux-xfs.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Zhengyuan Liu <liuzhengyuang521@gmail.com>,
	linux-xfs@vger.kernel.org,
	Zhengyuan Liu <liuzhengyuan@kylinos.cn>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [Question] About XFS random buffer write performance
Date: Tue, 28 Jul 2020 16:47:53 +0100	[thread overview]
Message-ID: <20200728154753.GS23808@casper.infradead.org> (raw)
In-Reply-To: <20200728153453.GC3151642@magnolia>

On Tue, Jul 28, 2020 at 08:34:53AM -0700, Darrick J. Wong wrote:
> On Tue, Jul 28, 2020 at 07:34:39PM +0800, Zhengyuan Liu wrote:
> > Hi all,
> > 
> > When doing random buffer write testing I found the bandwidth on EXT4 is much
> > better than XFS under the same environment.
> > The test case, test result, and test environment are as follows:
> > Test case:
> > fio --ioengine=sync --rw=randwrite --iodepth=64 --size=4G --name=test
> > --filename=/mnt/testfile --bs=4k
> > Before doing fio, use dd (if=/dev/zero of=/mnt/testfile bs=1M
> > count=4096) to warm-up the file in the page cache.
> > 
> > Test result (bandwidth):
> >         ext4        xfs
> >       ~300MB/s    ~120MB/s
> > 
> > Test environment:
> >     Platform:  arm64
> >     Kernel:  v5.7
> >     PAGESIZE:  64K
> >     Memtotal:  16G
> >     Storage: sata ssd(Max bandwidth about 350MB/s)
> >     FS block size: 4K
> > 
> > The fio "Test result" shows that EXT4 has more than 2x the bandwidth of
> > XFS, but iostat shows XFS is also transferring about 300MB/s to the SSD.
> > So I suspect XFS is writing back many non-dirty blocks along with the
> > dirty pages. I read the core writeback code of both filesystems and
> > found that XFS writes back blocks which are uptodate (see
> > iomap_writepage_map()),
> 
> Ahhh, right, because iomap tracks uptodate separately for each block in
> the page, but only tracks dirty status for the whole page.  Hence if you
> dirty one byte in the 64k page, xfs will write all 64k even though we
> could get away writing 4k like ext4 does.
> 
> Hey Christoph & Matthew: If you're already thinking about changing
> struct iomap_page, should we add the ability to track per-block dirty
> state to reduce the write amplification that Zhengyuan is asking about?
> 
> I'm guessing that between willy's THP series, Dave's iomap chunks
> series, and whatever Christoph may or may not be writing, at least one
> of you might have already implemented this? :)

Well, this is good timing!  I was wondering whether something along
these lines was an important use-case.

I propose we do away with the 'uptodate' bit-array and replace it with a
'writeback' bit-array.  We set the page uptodate bit once the reads to
fill the page have completed, rather than checking the 'writeback' array.
In page_mkwrite, we fill the writeback bit-array on the grounds that we
have no way to track a block's non-dirtiness and we don't want to scan
each block at writeback time to see whether it has been written to.

I'll do this now before the THP series gets reposted.


Thread overview: 18+ messages
2020-07-28 11:34 [Question] About XFS random buffer write performance Zhengyuan Liu
2020-07-28 15:34 ` Darrick J. Wong
2020-07-28 15:47   ` Matthew Wilcox [this message]
2020-07-29  1:54     ` Dave Chinner
2020-07-29  2:12       ` Matthew Wilcox
2020-07-29  5:19         ` Dave Chinner
2020-07-29 18:50           ` Matthew Wilcox
2020-07-29 23:05             ` Dave Chinner
2020-07-30 13:50               ` Matthew Wilcox
2020-07-30 22:08                 ` Dave Chinner
2020-07-30 23:45                   ` Matthew Wilcox
2020-07-31  2:05                     ` Dave Chinner
2020-07-31  2:37                       ` Matthew Wilcox
2020-07-31 20:47                     ` Matthew Wilcox
2020-07-31 22:13                       ` Dave Chinner
2020-08-21  2:39                         ` Zhengyuan Liu
2020-07-31  6:55                   ` Christoph Hellwig
2020-07-29 13:02       ` Zhengyuan Liu
