From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
linux-xfs@vger.kernel.org, cluster-devel@redhat.com,
ocfs2-devel@oss.oracle.com, Mark Fasheh <mark@fasheh.com>,
Joel Becker <jlbec@evilplan.org>,
Joseph Qi <joseph.qi@linux.alibaba.com>,
Bob Peterson <rpeterso@redhat.com>,
Andreas Gruenbacher <agruenba@redhat.com>
Subject: Re: [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800 [thread overview]
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>
On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
>
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> >
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation. The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache. It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
>
> Unclear which patch fixes this and how it did it?
I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent. Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.
If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():
+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
/**
* mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
* @get_block: The filesystem's block mapper function.
*
* This function walks the pages and the blocks within each page, building and
+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
return done;
}
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount. This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary). It will also decrease the refcount once the pages
+ * have been submitted for I/O. After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
{
struct inode *inode = rac->mapping->host;
I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.
prev parent reply other threads:[~2020-02-13 13:43 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-25 1:35 [PATCH 00/12] Change readahead API Matthew Wilcox
2020-01-25 1:35 ` [PATCH 03/12] readahead: Put pages in cache earlier Matthew Wilcox
2020-01-25 19:44 ` Matthew Wilcox
2020-01-25 1:35 ` [PATCH 04/12] mm: Add readahead address space operation Matthew Wilcox
2020-01-25 3:57 ` Randy Dunlap
2020-02-01 0:25 ` Matthew Wilcox
2020-01-29 0:24 ` Dave Chinner
2020-01-30 8:00 ` Matthew Wilcox
2020-01-25 1:35 ` [PATCH 12/12] iomap: Convert from readpages to readahead Matthew Wilcox
2020-01-29 1:38 ` Dave Chinner
2020-01-31 9:44 ` Matthew Wilcox
2020-02-13 4:38 ` [PATCH 00/12] Change readahead API Andrew Morton
2020-02-13 13:43 ` Matthew Wilcox [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200213134316.GK7778@bombadil.infradead.org \
--to=willy@infradead.org \
--cc=agruenba@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=cluster-devel@redhat.com \
--cc=jlbec@evilplan.org \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-erofs@lists.ozlabs.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mark@fasheh.com \
--cc=ocfs2-devel@oss.oracle.com \
--cc=rpeterso@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).