All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, cluster-devel@redhat.com,
	ocfs2-devel@oss.oracle.com, Mark Fasheh <mark@fasheh.com>,
	Joel Becker <jlbec@evilplan.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Andreas Gruenbacher <agruenba@redhat.com>
Subject: Re: [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800	[thread overview]
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>

On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> Unclear which patch fixes this and how it did it?

I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent.  Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.

If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():

+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
 
 /**
  * mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and

+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
 	return done;
 }
 
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
 	struct inode *inode = rac->mapping->host;

I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: cluster-devel@redhat.com,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Mark Fasheh <mark@fasheh.com>,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	ocfs2-devel@oss.oracle.com, linux-fsdevel@vger.kernel.org,
	Bob Peterson <rpeterso@redhat.com>,
	linux-ext4@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-btrfs@vger.kernel.org, Joel Becker <jlbec@evilplan.org>
Subject: Re: [f2fs-dev] [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800	[thread overview]
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>

On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> Unclear which patch fixes this and how it did it?

I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent.  Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.

If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():

+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
 
 /**
  * mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and

+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
 	return done;
 }
 
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
 	struct inode *inode = rac->mapping->host;

I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: cluster-devel@redhat.com,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Mark Fasheh <mark@fasheh.com>,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	ocfs2-devel@oss.oracle.com, linux-fsdevel@vger.kernel.org,
	Bob Peterson <rpeterso@redhat.com>,
	linux-ext4@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-btrfs@vger.kernel.org, Joel Becker <jlbec@evilplan.org>
Subject: Re: [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800	[thread overview]
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>

On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> Unclear which patch fixes this and how it did it?

I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent.  Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.

If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():

+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
 
 /**
  * mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and

+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
 	return done;
 }
 
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
 	struct inode *inode = rac->mapping->host;

I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@infradead.org>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 00/12] Change readahead API
Date: Thu, 13 Feb 2020 05:43:16 -0800	[thread overview]
Message-ID: <20200213134316.GK7778@bombadil.infradead.org> (raw)
In-Reply-To: <20200212203852.8b7e0b28974e41227bd97329@linux-foundation.org>

On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote:
> On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote:
> 
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3 at huawei.com/
> 
> Unclear which patch fixes this and how it did it?

I suppose the problem isn't fixed until patch 13/13 is applied.
What yu kuai is seeing is a race where readahead allocates a page,
then passes it to iomap_readpages, which calls xfs_read_iomap_begin()
which looks up the extent.  Then thread 2 does DIO which modifies the
extent, because there's nothing to say that thread 1 is still using it.
With this patch series, the readpages code puts the locked pages in the
cache before calling iomap_readpages, so any racing write will block on
the locked page until readahead is completed.

If you're tempted to put this into -mm, I have a couple of new changes;
one to fix a kernel-doc warning for mpage_readahead() and one to add
kernel-doc for iomap_readahead():

+++ b/fs/mpage.c
@@ -339,9 +339,7 @@
 
 /**
  * mpage_readahead - start reads against pages
- * @mapping: the address_space
- * @start: The number of the first page to read.
- * @nr_pages: The number of consecutive pages to read.
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and

+++ b/fs/iomap/buffered-io.c
@@ -395,6 +395,21 @@
 	return done;
 }
 
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
 	struct inode *inode = rac->mapping->host;

I'll do a v6 with those changes soon, but I would really like a bit more
review from filesystem people, particularly ocfs2 and gfs2.




  reply	other threads:[~2020-02-13 13:43 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-25  1:35 [PATCH 00/12] Change readahead API Matthew Wilcox
2020-01-25  1:35 ` [Cluster-devel] " Matthew Wilcox
2020-01-25  1:35 ` Matthew Wilcox
2020-01-25  1:35 ` [f2fs-dev] " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 01/12] mm: Fix the return type of __do_page_cache_readahead Matthew Wilcox
2020-01-25  1:35 ` [PATCH 02/12] readahead: Ignore return value of ->readpages Matthew Wilcox
2020-01-25  1:35 ` [PATCH 03/12] readahead: Put pages in cache earlier Matthew Wilcox
2020-01-25  1:35   ` [Cluster-devel] " Matthew Wilcox
2020-01-25  1:35   ` Matthew Wilcox
2020-01-25  1:35   ` [f2fs-dev] " Matthew Wilcox
2020-01-25 19:44   ` Matthew Wilcox
2020-01-25 19:44     ` [Cluster-devel] " Matthew Wilcox
2020-01-25 19:44     ` Matthew Wilcox
2020-01-25 19:44     ` [f2fs-dev] " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 04/12] mm: Add readahead address space operation Matthew Wilcox
2020-01-25  1:35   ` [Cluster-devel] " Matthew Wilcox
2020-01-25  1:35   ` Matthew Wilcox
2020-01-25  1:35   ` [f2fs-dev] " Matthew Wilcox
2020-01-25  3:57   ` Randy Dunlap
2020-01-25  3:57     ` [Cluster-devel] " Randy Dunlap
2020-01-25  3:57     ` Randy Dunlap
2020-01-25  3:57     ` [f2fs-dev] " Randy Dunlap
2020-02-01  0:25     ` Matthew Wilcox
2020-02-01  0:25       ` [Cluster-devel] " Matthew Wilcox
2020-02-01  0:25       ` Matthew Wilcox
2020-02-01  0:25       ` [f2fs-dev] " Matthew Wilcox
2020-01-29  0:24   ` Dave Chinner
2020-01-29  0:24     ` [Cluster-devel] " Dave Chinner
2020-01-29  0:24     ` Dave Chinner
2020-01-29  0:24     ` [f2fs-dev] " Dave Chinner
2020-01-29  0:24     ` [Ocfs2-devel] " Dave Chinner
2020-01-30  8:00     ` Matthew Wilcox
2020-01-30  8:00       ` [Cluster-devel] " Matthew Wilcox
2020-01-30  8:00       ` Matthew Wilcox
2020-01-30  8:00       ` [f2fs-dev] " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 05/12] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
2020-01-25  1:35   ` [Cluster-devel] " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 06/12] btrfs: Convert from readpages to readahead Matthew Wilcox
2020-01-29  0:46   ` Dave Chinner
2020-01-30  8:09     ` Matthew Wilcox
2020-01-31  2:17       ` Dave Chinner
2020-01-25  1:35 ` [PATCH 07/12] erofs: Convert uncompressed files " Matthew Wilcox
2020-01-25  1:35   ` Matthew Wilcox
2020-01-25  1:53   ` Gao Xiang
2020-01-25  1:53     ` Gao Xiang via Linux-erofs
2020-01-25 19:09     ` Matthew Wilcox
2020-01-29  0:57   ` Dave Chinner
2020-01-29  0:57     ` Dave Chinner
2020-01-30  8:10     ` Matthew Wilcox
2020-01-30  8:10       ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 08/12] erofs: Convert compressed " Matthew Wilcox
2020-01-25  1:35   ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 09/12] ext4: Convert " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 10/12] f2fs: " Matthew Wilcox
2020-01-25  1:35   ` [f2fs-dev] " Matthew Wilcox
2020-01-25  1:35 ` [PATCH 11/12] fuse: " Matthew Wilcox
2020-01-29  1:08   ` Dave Chinner
2020-01-30 21:35     ` Matthew Wilcox
2020-01-31  2:19       ` Dave Chinner
2020-01-29 10:50   ` Miklos Szeredi
2020-01-29 10:50     ` Miklos Szeredi
2020-01-30  7:26     ` Matthew Wilcox
2020-01-25  1:35 ` [PATCH 12/12] iomap: " Matthew Wilcox
2020-01-29  1:38   ` Dave Chinner
2020-01-31  9:44     ` Matthew Wilcox
2020-02-13  4:38 ` [PATCH 00/12] Change readahead API Andrew Morton
2020-02-13  4:38   ` [Cluster-devel] " Andrew Morton
2020-02-13  4:38   ` Andrew Morton
2020-02-13  4:38   ` [f2fs-dev] " Andrew Morton
2020-02-13  4:38   ` [Ocfs2-devel] " Andrew Morton
2020-02-13 13:43   ` Matthew Wilcox [this message]
2020-02-13 13:43     ` [Cluster-devel] " Matthew Wilcox
2020-02-13 13:43     ` Matthew Wilcox
2020-02-13 13:43     ` [f2fs-dev] " Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213134316.GK7778@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=agruenba@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cluster-devel@redhat.com \
    --cc=jlbec@evilplan.org \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mark@fasheh.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.