Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, cluster-devel@redhat.com,
	ocfs2-devel@oss.oracle.com, Mark Fasheh <mark@fasheh.com>,
	Joel Becker <jlbec@evilplan.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Andreas Gruenbacher <agruenba@redhat.com>
Subject: Re: [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>

On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> Unclear which patch fixes this and how it did it?

I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent.  Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.

If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():

+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
 
 /**
  * mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and

+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
 	return done;
 }
 
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
 	struct inode *inode = rac->mapping->host;

I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.

      reply index

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-25  1:35 Matthew Wilcox
2020-01-25  1:35 ` [PATCH 01/12] mm: Fix the return type of __do_page_cache_readahead Matthew Wilcox
2020-01-25  1:35 ` [PATCH 02/12] readahead: Ignore return value of ->readpages Matthew Wilcox
2020-01-25  1:35 ` [PATCH 03/12] readahead: Put pages in cache earlier Matthew Wilcox
2020-01-25 19:44   ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 04/12] mm: Add readahead address space operation Matthew Wilcox
2020-01-25  3:57   ` Randy Dunlap
2020-02-01  0:25     ` Matthew Wilcox
2020-01-29  0:24   ` Dave Chinner
2020-01-30  8:00     ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 05/12] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
2020-01-25  1:35 ` [PATCH 06/12] btrfs: Convert from readpages to readahead Matthew Wilcox
2020-01-29  0:46   ` Dave Chinner
2020-01-30  8:09     ` Matthew Wilcox
2020-01-31  2:17       ` Dave Chinner
2020-01-25  1:35 ` [PATCH 07/12] erofs: Convert uncompressed files " Matthew Wilcox
2020-01-25  1:53   ` Gao Xiang
2020-01-25 19:09     ` Matthew Wilcox
2020-01-29  0:57   ` Dave Chinner
2020-01-30  8:10     ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 08/12] erofs: Convert compressed " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 09/12] ext4: Convert " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 10/12] f2fs: " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 11/12] fuse: " Matthew Wilcox
2020-01-29  1:08   ` Dave Chinner
2020-01-30 21:35     ` Matthew Wilcox
2020-01-31  2:19       ` Dave Chinner
2020-01-29 10:50   ` Miklos Szeredi
2020-01-30  7:26     ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 12/12] iomap: " Matthew Wilcox
2020-01-29  1:38   ` Dave Chinner
2020-01-31  9:44     ` Matthew Wilcox
2020-02-13  4:38 ` [PATCH 00/12] Change readahead API Andrew Morton
2020-02-13 13:43   ` Matthew Wilcox [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213134316.GK7778@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=agruenba@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cluster-devel@redhat.com \
    --cc=jlbec@evilplan.org \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mark@fasheh.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git