From: Matthew Wilcox <willy@infradead.org>
To: "yukuai (C)" <yukuai3@huawei.com>
Cc: hch@infradead.org, darrick.wong@oracle.com,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, houtao1@huawei.com,
	zhengbin13@huawei.com, yi.zhang@huawei.com
Subject: Re: [RFC] iomap: fix race between readahead and direct write
Date: Sat, 18 Jan 2020 23:58:28 -0800	[thread overview]
Message-ID: <20200119075828.GA4147@bombadil.infradead.org> (raw)
In-Reply-To: <fafa0550-184c-e59c-9b79-bd5d716a20cc@huawei.com>

On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
> On 2020/1/19 14:14, Matthew Wilcox wrote:
> > I don't understand your reasoning here.  If another process wants to
> > access a page of the file which isn't currently in cache, it would have
> > to first read the page in from storage.  If it's under readahead, it
> > has to wait for the read to finish.  Why is the second case worse than
> > the first?  It seems better to me.
> 
> Thanks for your response! My worry is that, for example:
> 
> We read page 0, and trigger readahead to read n pages (0 to n-1). While
> in another thread, we read page n-1.
> 
> In the current implementation, if readahead is in the process of reading
> pages 0 to n-2, the later operation doesn't need to wait for the former
> one to finish. However, the later operation will have to wait if we add
> all pages to the page cache first. That is why I said it might cause a
> performance problem.

OK, but let's put some numbers on that.  Imagine that we're using
high-performance spinning rust, so we have an access latency of 5ms (200
IOPS), and we're accessing 20 consecutive pages which happen to have
their data contiguous on disk.  Our CPU is running at 5GHz and takes
about
100,000 cycles to submit an I/O, plus 1,000 cycles to add an extra page
to the I/O.
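
To make the arithmetic concrete, here's a throwaway userspace C program
that just evaluates that model; the constants are the assumptions stated
above, not measurements:

    #include <stdio.h>

    int main(void)
    {
        const double hz = 5e9;                  /* assumed 5GHz clock */
        const double submit_cycles = 100000.0;  /* cycles to submit an I/O */
        const double page_cycles = 1000.0;      /* cycles per extra page */
        const double access_us = 5000.0;        /* 5ms access time, 200 IOPS */
        const int pages = 20;

        double submit_us = submit_cycles / hz * 1e6;        /* 20us */
        double insert_us = pages * page_cycles / hz * 1e6;  /* 4us */

        printf("submit one I/O:    %6.0f us\n", submit_us);
        printf("insert %d pages:   %6.0f us\n", pages, insert_us);
        printf("wait for the disk: %6.0f us\n", access_us);
        /* how much longer the wait-for-I/O window is than the
         * page-insertion window: 5000us / 4us = 1250 */
        printf("window ratio:      %6.0f\n", access_us / insert_us);
        return 0;
    }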

Current implementation: Allocate 20 pages, place 19 of them in the cache,
fail to place the last one in the cache.  The later thread actually gets
to jump the queue and submit its bio first.  Its latency will be 100,000
cycles (20us) plus the 5ms access time.  But it only has 20,000 cycles
(4us) to hit this race, or it will end up behaving the same way as below.
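
In code, the current ordering looks roughly like this.  It's a
deliberately simplified sketch, not the actual fs/iomap code;
next_ra_page() and bio_add_ra_page() are hypothetical stand-ins for
pulling the next page off the readahead batch and adding it to the bio:

    while ((page = next_ra_page(pages))) {
        if (add_to_page_cache_lru(page, mapping, page->index,
                                  GFP_KERNEL)) {
            /* Lost the race: another thread already owns this
             * index, so the page is simply left out of our bio
             * and that thread submits its own read first. */
            put_page(page);
            continue;
        }
        bio_add_ra_page(bio, page);
    }
    submit_bio(bio);    /* covers only the pages we won */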

New implementation: Allocate 20 pages, place them all in the cache,
then spend 120,000 cycles building & submitting the I/O, and wait 5ms
for the I/O to complete.
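
The proposed ordering, in the same simplified style (page_array and
start are again stand-ins); the point is that a racing reader now finds
a locked page in the cache and sleeps in lock_page() until the bio
completes:

    for (i = 0; i < nr_pages; i++)
        add_to_page_cache_lru(page_array[i], mapping, start + i,
                              GFP_KERNEL);    /* every page in cache, locked */
    for (i = 0; i < nr_pages; i++)
        bio_add_ra_page(bio, page_array[i]);  /* ~120,000 cycles total */
    submit_bio(bio);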

But look how much more likely it is that the second thread will arrive
during the window where we're waiting for the I/O to complete: 5ms is
1250 times longer than 4us.

If it _does_ get the latency benefit of jumping the queue, the readahead
will create one or two I/Os.  If it hit page 18 instead of page 19, we'd
end up doing three I/Os: the first for page 18, then one for pages 0-17,
and one for page 19.  And that means the disk is going to be busy for
15ms (three 5ms accesses instead of one), delaying the next I/O for up
to 10ms.  It's actually beneficial in
the long term for the second thread to wait for the readahead to finish.

Oh, and the current ->readpages code has a race where if the page tagged
with PageReadahead ends up not being inserted, we'll lose that bit,
which means the readahead will just stop and have to restart (because
it will look to the readahead code like it's not being effective).
That's a far worse performance problem.
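
For reference, the trigger that losing the bit defeats lives in the
buffered read path; this is paraphrased from generic_file_buffered_read():

    /* PG_readahead on a page is the signal to kick off the next
     * batch of asynchronous readahead.  If the flagged page never
     * makes it into the cache, this never fires, and readahead
     * stops until a cache miss restarts it. */
    if (PageReadahead(page))
        page_cache_async_readahead(mapping, ra, filp, page,
                                   index, last_index - index);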


Thread overview: 19+ messages
2020-01-16  6:36 [RFC] iomap: fix race between readahead and direct write yu kuai
2020-01-16 15:32 ` Jan Kara
2020-01-17  9:39   ` yukuai (C)
2020-01-17 11:05     ` Jan Kara
2020-01-17 16:24       ` Darrick J. Wong
2020-01-19  1:25         ` yukuai (C)
2020-01-19  1:17       ` yukuai (C)
2020-01-20 11:42         ` Jan Kara
2020-01-18 23:08 ` Matthew Wilcox
2020-01-19  1:34   ` yukuai (C)
2020-01-19  1:42     ` Matthew Wilcox
2020-01-19  1:57       ` yukuai (C)
2020-01-19  2:51       ` yukuai (C)
2020-01-19  3:01         ` Gao Xiang
2020-01-19  3:15           ` yukuai (C)
2020-01-19  6:14         ` Matthew Wilcox
2020-01-19  6:55           ` yukuai (C)
2020-01-19  7:58             ` Matthew Wilcox [this message]
2020-01-19 11:21               ` yukuai (C)
