* [PATCH v7 00/24] Change readahead API
@ 2020-02-19 21:00 Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
                   ` (24 more replies)
  0 siblings, 25 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This series adds a readahead address_space operation to eventually
replace the readpages operation.  The key difference is that
pages are added to the page cache as they are allocated (and
then looked up by the filesystem) instead of passing them on a
list to the readpages operation and having the filesystem add
them to the page cache.  It's a net reduction in code for each
implementation, more efficient than walking a list, and solves
the direct-write vs buffered-read problem reported by Yu Kuai at
https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
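
To illustrate the shape of the new API (a sketch only: 'myfs' and
myfs_start_read() are hypothetical names, and readahead_page() is the
iterator added later in this series):

	static void myfs_readahead(struct readahead_control *rac)
	{
		struct page *page;

		/* Pages are already locked and in the page cache. */
		while ((page = readahead_page(rac))) {
			/* Start I/O; the page is unlocked on completion. */
			myfs_start_read(page);
			put_page(page);
		}
	}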

The only unconverted filesystems are those which use fscache.
Their conversion is pending David Howells' fscache rewrite, which will
make it substantially easier.

I want to thank the reviewers; Dave Chinner, John Hubbard and Christoph
Hellwig have done a marvellous job of providing constructive criticism.
Eric Biggers pointed out how I'd broken ext4 (which led to a substantial
change).  I've tried to take it all on board, but I may have missed
something simply because you've done such a thorough job.

This series can also be found at
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/tags/readahead_v7
(I also pushed the readahead_v6 tag there in case anyone wants to
compare; both tags are based on 5.6-rc2, so they're easy to diff.)

v7:
 - Now passes an xfstests run on ext4!
 - Documentation improvements
 - Move the readahead prototypes out of mm.h (new patch)
 - readahead_for_each* iterators are gone; replaced with readahead_page()
   and readahead_page_batch()
 - page_cache_readahead_limit() renamed to page_cache_readahead_unbounded()
   and arguments changed
 - iomap_readahead_actor() restructured differently
 - To reduce ambiguity, the readahead code no longer uses the word 'offset'
 - read_pages() now maintains the rac so we can just call it and continue
   instead of mucking around with branches
 - More assertions
 - More readahead functions return void

v6:
 - Name the private members of readahead_control with a leading underscore
   (suggested by Christoph Hellwig)
 - Fix whitespace in rst file
 - Remove misleading comment in btrfs patch
 - Add readahead_next() API and use it in iomap
 - Add iomap_readahead kerneldoc
 - Fix the mpage_readahead kerneldoc
 - Make various readahead functions return void
 - Keep readahead_index() and readahead_offset() pointing to the start of
   this batch through the body.  No current user requires this, but it's
   less surprising.
 - Add kerneldoc for page_cache_readahead_limit
 - Make page_idx an unsigned long, and rename it to just 'i'
 - Get rid of page_offset local variable
 - Add patch to call memalloc_nofs_save() before allocating pages (suggested
   by Michal Hocko)
 - Resplit a lot of patches for more logical progression and easier review
   (suggested by John Hubbard)
 - Added sign-offs where received and still deemed relevant

v5 switched to passing a readahead_control struct (mirroring the
writeback_control struct passed to writepages).  This has a number of
advantages (the struct itself is sketched after this list):
 - It fixes a number of bugs in various implementations, e.g. forgetting to
   increment 'start', an off-by-one error in 'nr_pages', or treating 'start'
   as a byte offset instead of a page offset.
 - It allows us to change the arguments without changing all the
   implementations of ->readahead which just call mpage_readahead() or
   iomap_readahead().
 - Figuring out which pages haven't been attempted by the implementation
   is more natural this way.
 - There's less code in each implementation.
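
For reference, the readahead_control structure introduced in patch 05 is
deliberately small:

	struct readahead_control {
		struct file *file;
		struct address_space *mapping;
	/* private: use the readahead_* accessors instead */
		pgoff_t _index;
		unsigned int _nr_pages;
	};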

Matthew Wilcox (Oracle) (24):
  mm: Move readahead prototypes from mm.h
  mm: Return void from various readahead functions
  mm: Ignore return value of ->readpages
  mm: Move readahead nr_pages check into read_pages
  mm: Use readahead_control to pass arguments
  mm: Rename various 'offset' parameters to 'index'
  mm: rename readahead loop variable to 'i'
  mm: Remove 'page_offset' from readahead loop
  mm: Put readahead pages in cache earlier
  mm: Add readahead address space operation
  mm: Move end_index check out of readahead loop
  mm: Add page_cache_readahead_unbounded
  fs: Convert mpage_readpages to mpage_readahead
  btrfs: Convert from readpages to readahead
  erofs: Convert uncompressed files from readpages to readahead
  erofs: Convert compressed files from readpages to readahead
  ext4: Convert from readpages to readahead
  ext4: Pass the inode to ext4_mpage_readpages
  f2fs: Convert from readpages to readahead
  fuse: Convert from readpages to readahead
  iomap: Restructure iomap_readpages_actor
  iomap: Convert from readpages to readahead
  mm: Document why we don't set PageReadahead
  mm: Use memalloc_nofs_save in readahead path

 Documentation/filesystems/locking.rst |   6 +-
 Documentation/filesystems/vfs.rst     |  15 ++
 block/blk-core.c                      |   1 +
 drivers/staging/exfat/exfat_super.c   |   7 +-
 fs/block_dev.c                        |   7 +-
 fs/btrfs/extent_io.c                  |  46 ++---
 fs/btrfs/extent_io.h                  |   3 +-
 fs/btrfs/inode.c                      |  16 +-
 fs/erofs/data.c                       |  39 ++--
 fs/erofs/zdata.c                      |  29 +--
 fs/ext2/inode.c                       |  10 +-
 fs/ext4/ext4.h                        |   5 +-
 fs/ext4/inode.c                       |  21 +-
 fs/ext4/readpage.c                    |  25 +--
 fs/ext4/verity.c                      |  35 +---
 fs/f2fs/data.c                        |  50 ++---
 fs/f2fs/f2fs.h                        |   5 +-
 fs/f2fs/verity.c                      |  35 +---
 fs/fat/inode.c                        |   7 +-
 fs/fuse/file.c                        |  46 ++---
 fs/gfs2/aops.c                        |  23 +--
 fs/hpfs/file.c                        |   7 +-
 fs/iomap/buffered-io.c                | 124 +++++-------
 fs/iomap/trace.h                      |   2 +-
 fs/isofs/inode.c                      |   7 +-
 fs/jfs/inode.c                        |   7 +-
 fs/mpage.c                            |  38 +---
 fs/nilfs2/inode.c                     |  15 +-
 fs/ocfs2/aops.c                       |  34 ++--
 fs/omfs/file.c                        |   7 +-
 fs/qnx6/inode.c                       |   7 +-
 fs/reiserfs/inode.c                   |   8 +-
 fs/udf/inode.c                        |   7 +-
 fs/xfs/xfs_aops.c                     |  13 +-
 fs/zonefs/super.c                     |   7 +-
 include/linux/fs.h                    |   2 +
 include/linux/iomap.h                 |   3 +-
 include/linux/mm.h                    |  19 --
 include/linux/mpage.h                 |   4 +-
 include/linux/pagemap.h               | 103 ++++++++++
 include/trace/events/erofs.h          |   6 +-
 include/trace/events/f2fs.h           |   6 +-
 mm/fadvise.c                          |   6 +-
 mm/internal.h                         |  12 +-
 mm/migrate.c                          |   2 +-
 mm/readahead.c                        | 277 ++++++++++++++++----------
 46 files changed, 551 insertions(+), 603 deletions(-)


base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
-- 
2.25.0


* [PATCH v7 01/24] mm: Move readahead prototypes from mm.h
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-21  2:43   ` John Hubbard
  2020-02-24 21:32   ` Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 02/24] mm: Return void from various readahead functions Matthew Wilcox
                   ` (23 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The readahead code is part of the page cache so should be found in the
pagemap.h file.  force_page_cache_readahead is only used within mm,
so move it to mm/internal.h instead.  Remove the parameter names where
they add no value, and rename the ones which were actively misleading.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/blk-core.c        |  1 +
 include/linux/mm.h      | 19 -------------------
 include/linux/pagemap.h |  8 ++++++++
 mm/fadvise.c            |  2 ++
 mm/internal.h           |  2 ++
 5 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 089e890ab208..41417bb93634 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -20,6 +20,7 @@
 #include <linux/blk-mq.h>
 #include <linux/highmem.h>
 #include <linux/mm.h>
+#include <linux/pagemap.h>
 #include <linux/kernel_stat.h>
 #include <linux/string.h>
 #include <linux/init.h>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..68dcda9a2112 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2401,25 +2401,6 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
 int __must_check write_one_page(struct page *page);
 void task_dirty_inc(struct task_struct *tsk);
 
-/* readahead.c */
-#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
-
-int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			pgoff_t offset, unsigned long nr_to_read);
-
-void page_cache_sync_readahead(struct address_space *mapping,
-			       struct file_ra_state *ra,
-			       struct file *filp,
-			       pgoff_t offset,
-			       unsigned long size);
-
-void page_cache_async_readahead(struct address_space *mapping,
-				struct file_ra_state *ra,
-				struct file *filp,
-				struct page *pg,
-				pgoff_t offset,
-				unsigned long size);
-
 extern unsigned long stack_guard_gap;
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ccb14b6a16b5..24894b9b90c9 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -614,6 +614,14 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec);
 
+#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
+
+void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
+		struct file *, pgoff_t index, unsigned long req_count);
+void page_cache_async_readahead(struct address_space *, struct file_ra_state *,
+		struct file *, struct page *, pgoff_t index,
+		unsigned long req_count);
+
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
  * the page is new, so we can just run __SetPageLocked() against it.
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 4f17c83db575..3efebfb9952c 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -22,6 +22,8 @@
 
 #include <asm/unistd.h>
 
+#include "internal.h"
+
 /*
  * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
  * deactivate the pages and clear PG_Referenced.
diff --git a/mm/internal.h b/mm/internal.h
index 3cf20ab3ca01..83f353e74654 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,8 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
+int force_page_cache_readahead(struct address_space *, struct file *,
+		pgoff_t index, unsigned long nr_to_read);
 extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
 		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
 		unsigned long lookahead_size);
-- 
2.25.0



* [PATCH v7 02/24] mm: Return void from various readahead functions
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-24 21:33   ` Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 03/24] mm: Ignore return value of ->readpages Matthew Wilcox
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, John Hubbard, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

ondemand_readahead has two callers, neither of which uses the return value.
That means that both ra_submit and __do_page_cache_readahead() can return
void, and we don't need to worry that a present page in the readahead
window causes us to return a smaller nr_pages than we ought to have.

Similarly, no caller uses the return value from force_page_cache_readahead().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/fadvise.c   |  4 ----
 mm/internal.h  | 12 ++++++------
 mm/readahead.c | 31 +++++++++++++------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 3efebfb9952c..0e66f2aaeea3 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -104,10 +104,6 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		if (!nrpages)
 			nrpages = ~0UL;
 
-		/*
-		 * Ignore return value because fadvise() shall return
-		 * success even if filesystem can't retrieve a hint,
-		 */
 		force_page_cache_readahead(mapping, file, start_index, nrpages);
 		break;
 	case POSIX_FADV_NOREUSE:
diff --git a/mm/internal.h b/mm/internal.h
index 83f353e74654..15aaebebd768 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,20 +49,20 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
-int force_page_cache_readahead(struct address_space *, struct file *,
+void force_page_cache_readahead(struct address_space *, struct file *,
 		pgoff_t index, unsigned long nr_to_read);
-extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
+void __do_page_cache_readahead(struct address_space *, struct file *,
+		pgoff_t index, unsigned long nr_to_read,
 		unsigned long lookahead_size);
 
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
-static inline unsigned long ra_submit(struct file_ra_state *ra,
+static inline void ra_submit(struct file_ra_state *ra,
 		struct address_space *mapping, struct file *filp)
 {
-	return __do_page_cache_readahead(mapping, filp,
-					ra->start, ra->size, ra->async_size);
+	__do_page_cache_readahead(mapping, filp,
+			ra->start, ra->size, ra->async_size);
 }
 
 /*
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..41a592886da7 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -149,10 +149,8 @@ static int read_pages(struct address_space *mapping, struct file *filp,
  * the pages first, then submits them for I/O. This avoids the very bad
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
- *
- * Returns the number of pages requested, or the maximum amount of I/O allowed.
  */
-unsigned int __do_page_cache_readahead(struct address_space *mapping,
+void __do_page_cache_readahead(struct address_space *mapping,
 		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
@@ -166,7 +164,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 
 	if (isize == 0)
-		goto out;
+		return;
 
 	end_index = ((isize - 1) >> PAGE_SHIFT);
 
@@ -211,23 +209,21 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
 	if (nr_pages)
 		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
 	BUG_ON(!list_empty(&page_pool));
-out:
-	return nr_pages;
 }
 
 /*
  * Chunk the readahead into 2 megabyte units, so that we don't pin too much
  * memory at once.
  */
-int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			       pgoff_t offset, unsigned long nr_to_read)
+void force_page_cache_readahead(struct address_space *mapping,
+		struct file *filp, pgoff_t offset, unsigned long nr_to_read)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	struct file_ra_state *ra = &filp->f_ra;
 	unsigned long max_pages;
 
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
-		return -EINVAL;
+		return;
 
 	/*
 	 * If the request exceeds the readahead window, allow the read to
@@ -245,7 +241,6 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		offset += this_chunk;
 		nr_to_read -= this_chunk;
 	}
-	return 0;
 }
 
 /*
@@ -378,11 +373,10 @@ static int try_context_readahead(struct address_space *mapping,
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
-static unsigned long
-ondemand_readahead(struct address_space *mapping,
-		   struct file_ra_state *ra, struct file *filp,
-		   bool hit_readahead_marker, pgoff_t offset,
-		   unsigned long req_size)
+static void ondemand_readahead(struct address_space *mapping,
+		struct file_ra_state *ra, struct file *filp,
+		bool hit_readahead_marker, pgoff_t offset,
+		unsigned long req_size)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	unsigned long max_pages = ra->ra_pages;
@@ -428,7 +422,7 @@ ondemand_readahead(struct address_space *mapping,
 		rcu_read_unlock();
 
 		if (!start || start - offset > max_pages)
-			return 0;
+			return;
 
 		ra->start = start;
 		ra->size = start - offset;	/* old async_size */
@@ -464,7 +458,8 @@ ondemand_readahead(struct address_space *mapping,
 	 * standalone, small random read
 	 * Read as is, and do not pollute the readahead state.
 	 */
-	return __do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+	__do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+	return;
 
 initial_readahead:
 	ra->start = offset;
@@ -489,7 +484,7 @@ ondemand_readahead(struct address_space *mapping,
 		}
 	}
 
-	return ra_submit(ra, mapping, filp);
+	ra_submit(ra, mapping, filp);
 }
 
 /**
-- 
2.25.0



* [PATCH v7 03/24] mm: Ignore return value of ->readpages
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 02/24] mm: Return void from various readahead functions Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, John Hubbard, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, Christoph Hellwig,
	linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We used to assign the return value to a variable, which we then ignored.
Remove the pretence of caring.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/readahead.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 41a592886da7..61b15b6b9e72 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -113,17 +113,16 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 
 EXPORT_SYMBOL(read_cache_pages);
 
-static int read_pages(struct address_space *mapping, struct file *filp,
+static void read_pages(struct address_space *mapping, struct file *filp,
 		struct list_head *pages, unsigned int nr_pages, gfp_t gfp)
 {
 	struct blk_plug plug;
 	unsigned page_idx;
-	int ret;
 
 	blk_start_plug(&plug);
 
 	if (mapping->a_ops->readpages) {
-		ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
+		mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
 		/* Clean up the remaining pages */
 		put_pages_list(pages);
 		goto out;
@@ -136,12 +135,9 @@ static int read_pages(struct address_space *mapping, struct file *filp,
 			mapping->a_ops->readpage(filp, page);
 		put_page(page);
 	}
-	ret = 0;
 
 out:
 	blk_finish_plug(&plug);
-
-	return ret;
 }
 
 /*
-- 
2.25.0



* [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (2 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 03/24] mm: Ignore return value of ->readpages Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-20 14:36   ` Zi Yan
                     ` (2 more replies)
  2020-02-19 21:00 ` [PATCH v7 05/24] mm: Use readahead_control to pass arguments Matthew Wilcox
                   ` (20 subsequent siblings)
  24 siblings, 3 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Simplify the callers by moving the check for nr_pages and the BUG_ON
into read_pages().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 61b15b6b9e72..9fcd4e32b62d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -119,6 +119,9 @@ static void read_pages(struct address_space *mapping, struct file *filp,
 	struct blk_plug plug;
 	unsigned page_idx;
 
+	if (!nr_pages)
+		return;
+
 	blk_start_plug(&plug);
 
 	if (mapping->a_ops->readpages) {
@@ -138,6 +141,8 @@ static void read_pages(struct address_space *mapping, struct file *filp,
 
 out:
 	blk_finish_plug(&plug);
+
+	BUG_ON(!list_empty(pages));
 }
 
 /*
@@ -180,8 +185,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 			 * contiguous pages before continuing with the next
 			 * batch.
 			 */
-			if (nr_pages)
-				read_pages(mapping, filp, &page_pool, nr_pages,
+			read_pages(mapping, filp, &page_pool, nr_pages,
 						gfp_mask);
 			nr_pages = 0;
 			continue;
@@ -202,9 +206,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	if (nr_pages)
-		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
-	BUG_ON(!list_empty(&page_pool));
+	read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
 }
 
 /*
-- 
2.25.0



* [PATCH v7 05/24] mm: Use readahead_control to pass arguments
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (3 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-24 21:36   ` Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index' Matthew Wilcox
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, John Hubbard, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

In this patch, the new readahead_control structure is used only to pass
arguments between __do_page_cache_readahead() and read_pages(), but its
use will be extended in upcoming patches.  Also add the readahead_count()
accessor.  The read_pages() function becomes aops-centric, as this makes
the most sense by the end of the patchset.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/pagemap.h | 17 +++++++++++++++++
 mm/readahead.c          | 34 ++++++++++++++++++++--------------
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 24894b9b90c9..55fcea0249e6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -638,6 +638,23 @@ static inline int add_to_page_cache(struct page *page,
 	return error;
 }
 
+/*
+ * Readahead is of a block of consecutive pages.
+ */
+struct readahead_control {
+	struct file *file;
+	struct address_space *mapping;
+/* private: use the readahead_* accessors instead */
+	pgoff_t _index;
+	unsigned int _nr_pages;
+};
+
+/* The number of pages in this readahead block */
+static inline unsigned int readahead_count(struct readahead_control *rac)
+{
+	return rac->_nr_pages;
+}
+
 static inline unsigned long dir_pages(struct inode *inode)
 {
 	return (unsigned long)(inode->i_size + PAGE_SIZE - 1) >>
diff --git a/mm/readahead.c b/mm/readahead.c
index 9fcd4e32b62d..6a9d99229bd6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -113,29 +113,32 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 
 EXPORT_SYMBOL(read_cache_pages);
 
-static void read_pages(struct address_space *mapping, struct file *filp,
-		struct list_head *pages, unsigned int nr_pages, gfp_t gfp)
+static void read_pages(struct readahead_control *rac, struct list_head *pages,
+		gfp_t gfp)
 {
+	const struct address_space_operations *aops = rac->mapping->a_ops;
 	struct blk_plug plug;
 	unsigned page_idx;
 
-	if (!nr_pages)
+	if (!readahead_count(rac))
 		return;
 
 	blk_start_plug(&plug);
 
-	if (mapping->a_ops->readpages) {
-		mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
+	if (aops->readpages) {
+		aops->readpages(rac->file, rac->mapping, pages,
+				readahead_count(rac));
 		/* Clean up the remaining pages */
 		put_pages_list(pages);
 		goto out;
 	}
 
-	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
 		struct page *page = lru_to_page(pages);
 		list_del(&page->lru);
-		if (!add_to_page_cache_lru(page, mapping, page->index, gfp))
-			mapping->a_ops->readpage(filp, page);
+		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
+				gfp))
+			aops->readpage(rac->file, page);
 		put_page(page);
 	}
 
@@ -143,6 +146,7 @@ static void read_pages(struct address_space *mapping, struct file *filp,
 	blk_finish_plug(&plug);
 
 	BUG_ON(!list_empty(pages));
+	rac->_nr_pages = 0;
 }
 
 /*
@@ -160,9 +164,13 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	int page_idx;
-	unsigned int nr_pages = 0;
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	struct readahead_control rac = {
+		.mapping = mapping,
+		.file = filp,
+		._nr_pages = 0,
+	};
 
 	if (isize == 0)
 		return;
@@ -185,9 +193,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 			 * contiguous pages before continuing with the next
 			 * batch.
 			 */
-			read_pages(mapping, filp, &page_pool, nr_pages,
-						gfp_mask);
-			nr_pages = 0;
+			read_pages(&rac, &page_pool, gfp_mask);
 			continue;
 		}
 
@@ -198,7 +204,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		list_add(&page->lru, &page_pool);
 		if (page_idx == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-		nr_pages++;
+		rac._nr_pages++;
 	}
 
 	/*
@@ -206,7 +212,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
+	read_pages(&rac, &page_pool, gfp_mask);
 }
 
 /*
-- 
2.25.0



* [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index'
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (4 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 05/24] mm: Use readahead_control to pass arguments Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-21  2:21   ` John Hubbard
  2020-02-21  3:27   ` John Hubbard
  2020-02-19 21:00 ` [PATCH v7 07/24] mm: rename readahead loop variable to 'i' Matthew Wilcox
                   ` (18 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The word 'offset' is used ambiguously to mean 'byte offset within
a page', 'byte offset from the start of the file' and 'page offset
from the start of the file'.  Use 'index' to mean 'page offset
from the start of the file' throughout the readahead code.
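
To make the three meanings concrete (a sketch using standard kernel
conversions; 'pos' is a byte position in the file):

	loff_t pos;					/* byte offset in the file */
	pgoff_t index = pos >> PAGE_SHIFT;		/* page offset in the file */
	unsigned long offset = offset_in_page(pos);	/* byte offset within the page */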

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 86 ++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 44 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 6a9d99229bd6..096cf9020648 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -156,7 +156,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
  * We really don't want to intermingle reads and writes like that.
  */
 void __do_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
+		struct file *filp, pgoff_t index, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
 	struct inode *inode = mapping->host;
@@ -181,7 +181,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 * Preallocate as many pages as we will need.
 	 */
 	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
-		pgoff_t page_offset = offset + page_idx;
+		pgoff_t page_offset = index + page_idx;
 
 		if (page_offset > end_index)
 			break;
@@ -220,7 +220,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
  * memory at once.
  */
 void force_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t offset, unsigned long nr_to_read)
+		struct file *filp, pgoff_t index, unsigned long nr_to_read)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	struct file_ra_state *ra = &filp->f_ra;
@@ -240,9 +240,9 @@ void force_page_cache_readahead(struct address_space *mapping,
 
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
-		__do_page_cache_readahead(mapping, filp, offset, this_chunk, 0);
+		__do_page_cache_readahead(mapping, filp, index, this_chunk, 0);
 
-		offset += this_chunk;
+		index += this_chunk;
 		nr_to_read -= this_chunk;
 	}
 }
@@ -323,21 +323,21 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
  */
 
 /*
- * Count contiguously cached pages from @offset-1 to @offset-@max,
+ * Count contiguously cached pages from @index-1 to @index-@max,
  * this count is a conservative estimation of
  * 	- length of the sequential read sequence, or
  * 	- thrashing threshold in memory tight systems
  */
 static pgoff_t count_history_pages(struct address_space *mapping,
-				   pgoff_t offset, unsigned long max)
+				   pgoff_t index, unsigned long max)
 {
 	pgoff_t head;
 
 	rcu_read_lock();
-	head = page_cache_prev_miss(mapping, offset - 1, max);
+	head = page_cache_prev_miss(mapping, index - 1, max);
 	rcu_read_unlock();
 
-	return offset - 1 - head;
+	return index - 1 - head;
 }
 
 /*
@@ -345,13 +345,13 @@ static pgoff_t count_history_pages(struct address_space *mapping,
  */
 static int try_context_readahead(struct address_space *mapping,
 				 struct file_ra_state *ra,
-				 pgoff_t offset,
+				 pgoff_t index,
 				 unsigned long req_size,
 				 unsigned long max)
 {
 	pgoff_t size;
 
-	size = count_history_pages(mapping, offset, max);
+	size = count_history_pages(mapping, index, max);
 
 	/*
 	 * not enough history pages:
@@ -364,10 +364,10 @@ static int try_context_readahead(struct address_space *mapping,
 	 * starts from beginning of file:
 	 * it is a strong indication of long-run stream (or whole-file-read)
 	 */
-	if (size >= offset)
+	if (size >= index)
 		size *= 2;
 
-	ra->start = offset;
+	ra->start = index;
 	ra->size = min(size + req_size, max);
 	ra->async_size = 1;
 
@@ -379,13 +379,13 @@ static int try_context_readahead(struct address_space *mapping,
  */
 static void ondemand_readahead(struct address_space *mapping,
 		struct file_ra_state *ra, struct file *filp,
-		bool hit_readahead_marker, pgoff_t offset,
+		bool hit_readahead_marker, pgoff_t index,
 		unsigned long req_size)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	unsigned long max_pages = ra->ra_pages;
 	unsigned long add_pages;
-	pgoff_t prev_offset;
+	pgoff_t prev_index;
 
 	/*
 	 * If the request exceeds the readahead window, allow the read to
@@ -397,15 +397,15 @@ static void ondemand_readahead(struct address_space *mapping,
 	/*
 	 * start of file
 	 */
-	if (!offset)
+	if (!index)
 		goto initial_readahead;
 
 	/*
-	 * It's the expected callback offset, assume sequential access.
+	 * It's the expected callback index, assume sequential access.
 	 * Ramp up sizes, and push forward the readahead window.
 	 */
-	if ((offset == (ra->start + ra->size - ra->async_size) ||
-	     offset == (ra->start + ra->size))) {
+	if ((index == (ra->start + ra->size - ra->async_size) ||
+	     index == (ra->start + ra->size))) {
 		ra->start += ra->size;
 		ra->size = get_next_ra_size(ra, max_pages);
 		ra->async_size = ra->size;
@@ -422,14 +422,14 @@ static void ondemand_readahead(struct address_space *mapping,
 		pgoff_t start;
 
 		rcu_read_lock();
-		start = page_cache_next_miss(mapping, offset + 1, max_pages);
+		start = page_cache_next_miss(mapping, index + 1, max_pages);
 		rcu_read_unlock();
 
-		if (!start || start - offset > max_pages)
+		if (!start || start - index > max_pages)
 			return;
 
 		ra->start = start;
-		ra->size = start - offset;	/* old async_size */
+		ra->size = start - index;	/* old async_size */
 		ra->size += req_size;
 		ra->size = get_next_ra_size(ra, max_pages);
 		ra->async_size = ra->size;
@@ -444,29 +444,29 @@ static void ondemand_readahead(struct address_space *mapping,
 
 	/*
 	 * sequential cache miss
-	 * trivial case: (offset - prev_offset) == 1
-	 * unaligned reads: (offset - prev_offset) == 0
+	 * trivial case: (index - prev_index) == 1
+	 * unaligned reads: (index - prev_index) == 0
 	 */
-	prev_offset = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
-	if (offset - prev_offset <= 1UL)
+	prev_index = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
+	if (index - prev_index <= 1UL)
 		goto initial_readahead;
 
 	/*
 	 * Query the page cache and look for the traces(cached history pages)
 	 * that a sequential stream would leave behind.
 	 */
-	if (try_context_readahead(mapping, ra, offset, req_size, max_pages))
+	if (try_context_readahead(mapping, ra, index, req_size, max_pages))
 		goto readit;
 
 	/*
 	 * standalone, small random read
 	 * Read as is, and do not pollute the readahead state.
 	 */
-	__do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+	__do_page_cache_readahead(mapping, filp, index, req_size, 0);
 	return;
 
 initial_readahead:
-	ra->start = offset;
+	ra->start = index;
 	ra->size = get_init_ra_size(req_size, max_pages);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
@@ -477,7 +477,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * the resulted next readahead window into the current one.
 	 * Take care of maximum IO pages as above.
 	 */
-	if (offset == ra->start && ra->size == ra->async_size) {
+	if (index == ra->start && ra->size == ra->async_size) {
 		add_pages = get_next_ra_size(ra, max_pages);
 		if (ra->size + add_pages <= max_pages) {
 			ra->async_size = add_pages;
@@ -496,9 +496,8 @@ static void ondemand_readahead(struct address_space *mapping,
  * @mapping: address_space which holds the pagecache and I/O vectors
  * @ra: file_ra_state which holds the readahead state
  * @filp: passed on to ->readpage() and ->readpages()
- * @offset: start offset into @mapping, in pagecache page-sized units
- * @req_size: hint: total size of the read which the caller is performing in
- *            pagecache pages
+ * @index: Index of first page to be read.
+ * @req_count: Total number of pages being read by the caller.
  *
  * page_cache_sync_readahead() should be called when a cache miss happened:
  * it will submit the read.  The readahead logic may decide to piggyback more
@@ -507,7 +506,7 @@ static void ondemand_readahead(struct address_space *mapping,
  */
 void page_cache_sync_readahead(struct address_space *mapping,
 			       struct file_ra_state *ra, struct file *filp,
-			       pgoff_t offset, unsigned long req_size)
+			       pgoff_t index, unsigned long req_count)
 {
 	/* no read-ahead */
 	if (!ra->ra_pages)
@@ -518,12 +517,12 @@ void page_cache_sync_readahead(struct address_space *mapping,
 
 	/* be dumb */
 	if (filp && (filp->f_mode & FMODE_RANDOM)) {
-		force_page_cache_readahead(mapping, filp, offset, req_size);
+		force_page_cache_readahead(mapping, filp, index, req_count);
 		return;
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
+	ondemand_readahead(mapping, ra, filp, false, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
@@ -532,21 +531,20 @@ EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
  * @mapping: address_space which holds the pagecache and I/O vectors
  * @ra: file_ra_state which holds the readahead state
  * @filp: passed on to ->readpage() and ->readpages()
- * @page: the page at @offset which has the PG_readahead flag set
- * @offset: start offset into @mapping, in pagecache page-sized units
- * @req_size: hint: total size of the read which the caller is performing in
- *            pagecache pages
+ * @page: The page at @index which triggered the readahead call.
+ * @index: Index of first page to be read.
+ * @req_count: Total number of pages being read by the caller.
  *
  * page_cache_async_readahead() should be called when a page is used which
- * has the PG_readahead flag; this is a marker to suggest that the application
+ * is marked as PageReadahead; this is a marker to suggest that the application
  * has used up enough of the readahead window that we should start pulling in
  * more pages.
  */
 void
 page_cache_async_readahead(struct address_space *mapping,
 			   struct file_ra_state *ra, struct file *filp,
-			   struct page *page, pgoff_t offset,
-			   unsigned long req_size)
+			   struct page *page, pgoff_t index,
+			   unsigned long req_count)
 {
 	/* no read-ahead */
 	if (!ra->ra_pages)
@@ -570,7 +568,7 @@ page_cache_async_readahead(struct address_space *mapping,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+	ondemand_readahead(mapping, ra, filp, true, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
 
-- 
2.25.0



* [PATCH v7 07/24] mm: rename readahead loop variable to 'i'
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (5 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index' Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, John Hubbard, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Change the type of page_idx to unsigned long, and rename it -- it's
just a loop counter, not a page index.

Suggested-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 mm/readahead.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 096cf9020648..8a25fc7e2bf2 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -163,7 +163,6 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	struct page *page;
 	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
-	int page_idx;
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 	struct readahead_control rac = {
@@ -171,6 +170,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		.file = filp,
 		._nr_pages = 0,
 	};
+	unsigned long i;
 
 	if (isize == 0)
 		return;
@@ -180,8 +180,8 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
-		pgoff_t page_offset = index + page_idx;
+	for (i = 0; i < nr_to_read; i++) {
+		pgoff_t page_offset = index + i;
 
 		if (page_offset > end_index)
 			break;
@@ -202,7 +202,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 			break;
 		page->index = page_offset;
 		list_add(&page->lru, &page_pool);
-		if (page_idx == nr_to_read - lookahead_size)
+		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
 		rac._nr_pages++;
 	}
-- 
2.25.0



* [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (6 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 07/24] mm: rename readahead loop variable to 'i' Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-21  2:48   ` John Hubbard
  2020-02-24 21:37   ` Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 09/24] mm: Put readahead pages in cache earlier Matthew Wilcox
                   ` (16 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Replace the page_offset variable with 'index + i'.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 8a25fc7e2bf2..83df5c061d33 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -181,12 +181,10 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 * Preallocate as many pages as we will need.
 	 */
 	for (i = 0; i < nr_to_read; i++) {
-		pgoff_t page_offset = index + i;
-
-		if (page_offset > end_index)
+		if (index + i > end_index)
 			break;
 
-		page = xa_load(&mapping->i_pages, page_offset);
+		page = xa_load(&mapping->i_pages, index + i);
 		if (page && !xa_is_value(page)) {
 			/*
 			 * Page already present?  Kick off the current batch of
@@ -200,7 +198,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
 			break;
-		page->index = page_offset;
+		page->index = index + i;
 		list_add(&page->lru, &page_pool);
 		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-- 
2.25.0



* [PATCH v7 09/24] mm: Put readahead pages in cache earlier
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (7 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-21  3:19   ` John Hubbard
  2020-02-24 21:40   ` Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
                   ` (15 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

When populating the page cache for readahead, mappings that use
->readpages must populate the page cache themselves as the pages are
passed on a linked list which would normally be used for the page cache's
LRU.  For mappings that use ->readpage or the upcoming ->readahead method,
we can put the pages into the page cache as soon as they're allocated,
which solves a race between readahead and direct IO.  It also lets us
remove the gfp argument from read_pages().

Use the new readahead_page() API to implement the repeated calls to
->readpage(), just like most filesystems will.  This iterator also
supports huge pages, even though none of the filesystems have been
converted to use them yet.
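
The iterator also hides the tail pages of a compound page from the
caller.  A sketch of the loop read_pages() now uses (the THP comment
describes a hypothetical case, since no filesystem exercises it yet):

	while ((page = readahead_page(rac))) {
		/*
		 * For a 512-page (2MB) compound page, readahead_page()
		 * returns the head page once; the next call advances
		 * _index by _batch_count = hpage_nr_pages(page).
		 */
		aops->readpage(rac->file, page);
		put_page(page);
	}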

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 20 +++++++++++++++++
 mm/readahead.c          | 48 +++++++++++++++++++++++++----------------
 2 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 55fcea0249e6..4989d330fada 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -647,8 +647,28 @@ struct readahead_control {
 /* private: use the readahead_* accessors instead */
 	pgoff_t _index;
 	unsigned int _nr_pages;
+	unsigned int _batch_count;
 };
 
+static inline struct page *readahead_page(struct readahead_control *rac)
+{
+	struct page *page;
+
+	BUG_ON(rac->_batch_count > rac->_nr_pages);
+	rac->_nr_pages -= rac->_batch_count;
+	rac->_index += rac->_batch_count;
+	rac->_batch_count = 0;
+
+	if (!rac->_nr_pages)
+		return NULL;
+
+	page = xa_load(&rac->mapping->i_pages, rac->_index);
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	rac->_batch_count = hpage_nr_pages(page);
+
+	return page;
+}
+
 /* The number of pages in this readahead block */
 static inline unsigned int readahead_count(struct readahead_control *rac)
 {
diff --git a/mm/readahead.c b/mm/readahead.c
index 83df5c061d33..aaa209559ba2 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -113,15 +113,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 
 EXPORT_SYMBOL(read_cache_pages);
 
-static void read_pages(struct readahead_control *rac, struct list_head *pages,
-		gfp_t gfp)
+static void read_pages(struct readahead_control *rac, struct list_head *pages)
 {
 	const struct address_space_operations *aops = rac->mapping->a_ops;
+	struct page *page;
 	struct blk_plug plug;
-	unsigned page_idx;
 
 	if (!readahead_count(rac))
-		return;
+		goto out;
 
 	blk_start_plug(&plug);
 
@@ -130,23 +129,23 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 				readahead_count(rac));
 		/* Clean up the remaining pages */
 		put_pages_list(pages);
-		goto out;
-	}
-
-	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
-		struct page *page = lru_to_page(pages);
-		list_del(&page->lru);
-		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
-				gfp))
+		rac->_index += rac->_nr_pages;
+		rac->_nr_pages = 0;
+	} else {
+		while ((page = readahead_page(rac))) {
 			aops->readpage(rac->file, page);
-		put_page(page);
+			put_page(page);
+		}
 	}
 
-out:
 	blk_finish_plug(&plug);
 
 	BUG_ON(!list_empty(pages));
-	rac->_nr_pages = 0;
+	BUG_ON(readahead_count(rac));
+
+out:
+	/* If we were called due to a conflicting page, skip over it */
+	rac->_index++;
 }
 
 /*
@@ -165,9 +164,11 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	LIST_HEAD(page_pool);
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	bool use_list = mapping->a_ops->readpages;
 	struct readahead_control rac = {
 		.mapping = mapping,
 		.file = filp,
+		._index = index,
 		._nr_pages = 0,
 	};
 	unsigned long i;
@@ -184,6 +185,8 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		if (index + i > end_index)
 			break;
 
+		BUG_ON(index + i != rac._index + rac._nr_pages);
+
 		page = xa_load(&mapping->i_pages, index + i);
 		if (page && !xa_is_value(page)) {
 			/*
@@ -191,15 +194,22 @@ void __do_page_cache_readahead(struct address_space *mapping,
 			 * contiguous pages before continuing with the next
 			 * batch.
 			 */
-			read_pages(&rac, &page_pool, gfp_mask);
+			read_pages(&rac, &page_pool);
 			continue;
 		}
 
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
 			break;
-		page->index = index + i;
-		list_add(&page->lru, &page_pool);
+		if (use_list) {
+			page->index = index + i;
+			list_add(&page->lru, &page_pool);
+		} else if (add_to_page_cache_lru(page, mapping, index + i,
+					gfp_mask) < 0) {
+			put_page(page);
+			read_pages(&rac, &page_pool);
+			continue;
+		}
 		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
 		rac._nr_pages++;
@@ -210,7 +220,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	read_pages(&rac, &page_pool, gfp_mask);
+	read_pages(&rac, &page_pool);
 }
 
 /*
-- 
2.25.0



* [PATCH v7 10/24] mm: Add readahead address space operation
  2020-02-19 21:00 [PATCH v7 00/24] Change readahead API Matthew Wilcox
                   ` (8 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 09/24] mm: Put readahead pages in cache earlier Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-20 15:00   ` Zi Yan
                     ` (2 more replies)
  2020-02-19 21:00 ` [PATCH v7 11/24] mm: Move end_index check out of readahead loop Matthew Wilcox
                   ` (14 subsequent siblings)
  24 siblings, 3 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This replaces ->readpages with a saner interface (a conversion sketch
follows this list):
 - Return void instead of an ignored error code.
 - Page cache is already populated with locked pages when ->readahead
   is called.
 - New arguments can be passed to the implementation without changing
   all the filesystems that use a common helper function like
   mpage_readahead().
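
For a typical conversion, the address_space_operations change is
mechanical (a sketch in diff form; the 'myfs' names are hypothetical):

 static const struct address_space_operations myfs_aops = {
 	.readpage	= myfs_readpage,
-	.readpages	= myfs_readpages,
+	.readahead	= myfs_readahead,
 	.writepages	= myfs_writepages,
 };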

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  6 +++++-
 Documentation/filesystems/vfs.rst     | 15 +++++++++++++++
 include/linux/fs.h                    |  2 ++
 include/linux/pagemap.h               | 18 ++++++++++++++++++
 mm/readahead.c                        | 12 ++++++++++--
 5 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 5057e4d9dcd1..0af2e0e11461 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -239,6 +239,7 @@ prototypes::
 	int (*readpage)(struct file *, struct page *);
 	int (*writepages)(struct address_space *, struct writeback_control *);
 	int (*set_page_dirty)(struct page *page);
+	void (*readahead)(struct readahead_control *);
 	int (*readpages)(struct file *filp, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages);
 	int (*write_begin)(struct file *, struct address_space *mapping,
@@ -271,7 +272,8 @@ writepage:		yes, unlocks (see below)
 readpage:		yes, unlocks
 writepages:
 set_page_dirty		no
-readpages:
+readahead:		yes, unlocks
+readpages:		no
 write_begin:		locks the page		 exclusive
 write_end:		yes, unlocks		 exclusive
 bmap:
@@ -295,6 +297,8 @@ the request handler (/dev/loop).
 ->readpage() unlocks the page, either synchronously or via I/O
 completion.
 
+->readahead() unlocks the pages that I/O is attempted on like ->readpage().
+
 ->readpages() populates the pagecache with the passed pages and starts
 I/O against them.  They come unlocked upon I/O completion.
 
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7d4d09dd5e6d..ed17771c212b 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -706,6 +706,7 @@ cache in your filesystem.  The following members are defined:
 		int (*readpage)(struct file *, struct page *);
 		int (*writepages)(struct address_space *, struct writeback_control *);
 		int (*set_page_dirty)(struct page *page);
+		void (*readahead)(struct readahead_control *);
 		int (*readpages)(struct file *filp, struct address_space *mapping,
 				 struct list_head *pages, unsigned nr_pages);
 		int (*write_begin)(struct file *, struct address_space *mapping,
@@ -781,12 +782,26 @@ cache in your filesystem.  The following members are defined:
 	If defined, it should set the PageDirty flag, and the
 	PAGECACHE_TAG_DIRTY tag in the radix tree.
 
+``readahead``
+	Called by the VM to read pages associated with the address_space
+	object.  The pages are consecutive in the page cache and are
+	locked.  The implementation should decrement the page refcount
+	after starting I/O on each page.  Usually the page will be
+	unlocked by the I/O completion handler.  If the filesystem decides
+	to stop attempting I/O before reaching the end of the readahead
+	window, it can simply return.  The caller will decrement the page
+	refcount and unlock the remaining pages for you.  Set PageUptodate
+	if the I/O completes successfully.  Setting PageError on any page
+	will be ignored; simply unlock the page if an I/O error occurs.
+
 ``readpages``
 	called by the VM to read pages associated with the address_space
 	object.  This is essentially just a vector version of readpage.
 	Instead of just one page, several pages are requested.
 	readpages is only used for read-ahead, so read errors are
 	ignored.  If anything goes wrong, feel free to give up.
+	This interface is deprecated and will be removed by the end of
+	2020; implement readahead instead.
 
 ``write_begin``
 	Called by the generic buffered write code to ask the filesystem
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..d4e2d2964346 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -292,6 +292,7 @@ enum positive_aop_returns {
 struct page;
 struct address_space;
 struct writeback_control;
+struct readahead_control;
 
 /*
  * Write life time hint values.
@@ -375,6 +376,7 @@ struct address_space_operations {
 	 */
 	int (*readpages)(struct file *filp, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages);
+	void (*readahead)(struct readahead_control *);
 
 	int (*write_begin)(struct file *, struct address_space *mapping,
 				loff_t pos, unsigned len, unsigned flags,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4989d330fada..b3008605fd1b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -669,6 +669,24 @@ static inline struct page *readahead_page(struct readahead_control *rac)
 	return page;
 }
 
+/* The byte offset into the file of this readahead block */
+static inline loff_t readahead_pos(struct readahead_control *rac)
+{
+	return (loff_t)rac->_index * PAGE_SIZE;
+}
+
+/* The number of bytes in this readahead block */
+static inline loff_t readahead_length(struct readahead_control *rac)
+{
+	return (loff_t)rac->_nr_pages * PAGE_SIZE;
+}
+
+/* The index of the first page in this readahead block */
+static inline unsigned int readahead_index(struct readahead_control *rac)
+{
+	return rac->_index;
+}
+
 /* The number of pages in this readahead block */
 static inline unsigned int readahead_count(struct readahead_control *rac)
 {
diff --git a/mm/readahead.c b/mm/readahead.c
index aaa209559ba2..07cdfbf00f4b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -124,7 +124,14 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
 
 	blk_start_plug(&plug);
 
-	if (aops->readpages) {
+	if (aops->readahead) {
+		aops->readahead(rac);
+		/* Clean up the remaining pages */
+		while ((page = readahead_page(rac))) {
+			unlock_page(page);
+			put_page(page);
+		}
+	} else if (aops->readpages) {
 		aops->readpages(rac->file, rac->mapping, pages,
 				readahead_count(rac));
 		/* Clean up the remaining pages */
@@ -234,7 +241,8 @@ void force_page_cache_readahead(struct address_space *mapping,
 	struct file_ra_state *ra = &filp->f_ra;
 	unsigned long max_pages;
 
-	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
+	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
+			!mapping->a_ops->readahead))
 		return;
 
 	/*
-- 
2.25.0


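[Editorial illustration, not part of the patch: a minimal sketch of the
->readahead contract documented above, for a hypothetical filesystem.
myfs_start_read() is assumed: it starts asynchronous read I/O and its
completion handler sets PageUptodate and unlocks the page.  The
readahead_*() helpers are the ones this patch adds.]

/* Hypothetical: starts async read I/O; completion unlocks the page. */
static int myfs_start_read(struct inode *inode, struct page *page);

static void myfs_readahead(struct readahead_control *rac)
{
	struct page *page;

	while ((page = readahead_page(rac))) {
		/* Each page arrives locked, with an elevated refcount. */
		if (myfs_start_read(rac->mapping->host, page))
			/* On error: don't set PageError, just unlock. */
			unlock_page(page);
		/* Drop our reference once I/O has been started (or not). */
		put_page(page);
	}
	/*
	 * Returning early would also be fine: read_pages() unlocks and
	 * releases any pages left in the request.
	 */
}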

* [PATCH v7 11/24] mm: Move end_index check out of readahead loop
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (9 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-21  3:50   ` John Hubbard
  2020-02-19 21:00 ` [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded Matthew Wilcox
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By reducing nr_to_read so that it cannot extend past end_index, we can
eliminate the end_index check from inside the page allocation loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 07cdfbf00f4b..ace611f4bf05 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -166,8 +166,6 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		unsigned long lookahead_size)
 {
 	struct inode *inode = mapping->host;
-	struct page *page;
-	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
@@ -179,22 +177,27 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		._nr_pages = 0,
 	};
 	unsigned long i;
+	pgoff_t end_index;	/* The last page we want to read */
 
 	if (isize == 0)
 		return;
 
-	end_index = ((isize - 1) >> PAGE_SHIFT);
+	end_index = (isize - 1) >> PAGE_SHIFT;
+	if (index > end_index)
+		return;
+	if (index + nr_to_read < index)
+		nr_to_read = ULONG_MAX - index + 1;
+	if (index + nr_to_read >= end_index)
+		nr_to_read = end_index - index + 1;
 
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
 	for (i = 0; i < nr_to_read; i++) {
-		if (index + i > end_index)
-			break;
+		struct page *page = xa_load(&mapping->i_pages, index + i);
 
 		BUG_ON(index + i != rac._index + rac._nr_pages);
 
-		page = xa_load(&mapping->i_pages, index + i);
 		if (page && !xa_is_value(page)) {
 			/*
 			 * Page already present?  Kick off the current batch of
-- 
2.25.0


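[Editorial illustration, not part of the patch: the new clamping with
concrete numbers.  For a 100-page file, end_index = 99.]

	pgoff_t end_index = 99;		/* (isize - 1) >> PAGE_SHIFT */
	pgoff_t index = 90;
	unsigned long nr_to_read = 32;	/* request extends past EOF */

	if (index > end_index)			/* 90 <= 99: keep going */
		return;
	if (index + nr_to_read >= end_index)	/* 90 + 32 = 122 >= 99 */
		nr_to_read = end_index - index + 1;	/* 99 - 90 + 1 = 10 */

	/* The preallocation loop now runs i = 0..9 with no bounds test. */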

* [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (10 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 11/24] mm: Move end_index check out of readahead loop Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-24 21:53   ` [Cluster-devel] " Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

ext4 and f2fs have duplicated the guts of the readahead code so
that they can read past i_size.  Instead, separate that core out
as page_cache_readahead_unbounded() and let them call it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/verity.c        | 35 ++-------------------
 fs/f2fs/verity.c        | 35 ++-------------------
 include/linux/pagemap.h |  3 ++
 mm/readahead.c          | 70 ++++++++++++++++++++++++++++-------------
 4 files changed, 55 insertions(+), 88 deletions(-)

diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index dc5ec724d889..dec1244dd062 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -342,37 +342,6 @@ static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
 	return desc_size;
 }
 
-/*
- * Prefetch some pages from the file's Merkle tree.
- *
- * This is basically a stripped-down version of __do_page_cache_readahead()
- * which works on pages past i_size.
- */
-static void ext4_merkle_tree_readahead(struct address_space *mapping,
-				       pgoff_t start_index, unsigned long count)
-{
-	LIST_HEAD(pages);
-	unsigned int nr_pages = 0;
-	struct page *page;
-	pgoff_t index;
-	struct blk_plug plug;
-
-	for (index = start_index; index < start_index + count; index++) {
-		page = xa_load(&mapping->i_pages, index);
-		if (!page || xa_is_value(page)) {
-			page = __page_cache_alloc(readahead_gfp_mask(mapping));
-			if (!page)
-				break;
-			page->index = index;
-			list_add(&page->lru, &pages);
-			nr_pages++;
-		}
-	}
-	blk_start_plug(&plug);
-	ext4_mpage_readpages(mapping, &pages, NULL, nr_pages, true);
-	blk_finish_plug(&plug);
-}
-
 static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
@@ -386,8 +355,8 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			ext4_merkle_tree_readahead(inode->i_mapping, index,
-						   num_ra_pages);
+			page_cache_readahead_unbounded(inode->i_mapping, NULL,
+					index, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index d7d430a6f130..865c9fb774fb 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -222,37 +222,6 @@ static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
 	return size;
 }
 
-/*
- * Prefetch some pages from the file's Merkle tree.
- *
- * This is basically a stripped-down version of __do_page_cache_readahead()
- * which works on pages past i_size.
- */
-static void f2fs_merkle_tree_readahead(struct address_space *mapping,
-				       pgoff_t start_index, unsigned long count)
-{
-	LIST_HEAD(pages);
-	unsigned int nr_pages = 0;
-	struct page *page;
-	pgoff_t index;
-	struct blk_plug plug;
-
-	for (index = start_index; index < start_index + count; index++) {
-		page = xa_load(&mapping->i_pages, index);
-		if (!page || xa_is_value(page)) {
-			page = __page_cache_alloc(readahead_gfp_mask(mapping));
-			if (!page)
-				break;
-			page->index = index;
-			list_add(&page->lru, &pages);
-			nr_pages++;
-		}
-	}
-	blk_start_plug(&plug);
-	f2fs_mpage_readpages(mapping, &pages, NULL, nr_pages, true);
-	blk_finish_plug(&plug);
-}
-
 static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
@@ -266,8 +235,8 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			f2fs_merkle_tree_readahead(inode->i_mapping, index,
-						   num_ra_pages);
+			page_cache_readahead_unbounded(inode->i_mapping, NULL,
+					index, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b3008605fd1b..60f9b8d4da6c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -621,6 +621,9 @@ void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
 void page_cache_async_readahead(struct address_space *, struct file_ra_state *,
 		struct file *, struct page *, pgoff_t index,
 		unsigned long req_count);
+void page_cache_readahead_unbounded(struct address_space *, struct file *,
+		pgoff_t index, unsigned long nr_to_read,
+		unsigned long lookahead_count);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/readahead.c b/mm/readahead.c
index ace611f4bf05..453ef146de83 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -155,40 +155,36 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
 	rac->_index++;
 }
 
-/*
- * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates
- * the pages first, then submits them for I/O. This avoids the very bad
- * behaviour which would occur if page allocations are causing VM writeback.
- * We really don't want to intermingle reads and writes like that.
+/**
+ * page_cache_readahead_unbounded - Start unchecked readahead.
+ * @mapping: File address space.
+ * @file: This instance of the open file; used for authentication.
+ * @index: First page index to read.
+ * @nr_to_read: The number of pages to read.
+ * @lookahead_size: Where to start the next readahead.
+ *
+ * This function is for filesystems to call when they want to start
+ * readahead beyond a file's stated i_size.  This is almost certainly
+ * not the function you want to call.  Use page_cache_async_readahead()
+ * or page_cache_sync_readahead() instead.
+ *
+ * Context: File is referenced by caller.  Mutexes may be held by caller.
+ * May sleep, but will not reenter filesystem to reclaim memory.
  */
-void __do_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t index, unsigned long nr_to_read,
+void page_cache_readahead_unbounded(struct address_space *mapping,
+		struct file *file, pgoff_t index, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
-	struct inode *inode = mapping->host;
 	LIST_HEAD(page_pool);
-	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 	bool use_list = mapping->a_ops->readpages;
 	struct readahead_control rac = {
 		.mapping = mapping,
-		.file = filp,
+		.file = file,
 		._index = index,
 		._nr_pages = 0,
 	};
 	unsigned long i;
-	pgoff_t end_index;	/* The last page we want to read */
-
-	if (isize == 0)
-		return;
-
-	end_index = (isize - 1) >> PAGE_SHIFT;
-	if (index > end_index)
-		return;
-	if (index + nr_to_read < index)
-		nr_to_read = ULONG_MAX - index + 1;
-	if (index + nr_to_read >= end_index)
-		nr_to_read = end_index - index + 1;
 
 	/*
 	 * Preallocate as many pages as we will need.
@@ -232,6 +228,36 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	 */
 	read_pages(&rac, &page_pool);
 }
+EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
+
+/*
+ * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates
+ * the pages first, then submits them for I/O. This avoids the very bad
+ * behaviour which would occur if page allocations are causing VM writeback.
+ * We really don't want to intermingle reads and writes like that.
+ */
+void __do_page_cache_readahead(struct address_space *mapping,
+		struct file *file, pgoff_t index, unsigned long nr_to_read,
+		unsigned long lookahead_size)
+{
+	struct inode *inode = mapping->host;
+	loff_t isize = i_size_read(inode);
+	pgoff_t end_index;	/* The last page we want to read */
+
+	if (isize == 0)
+		return;
+
+	end_index = (isize - 1) >> PAGE_SHIFT;
+	if (index > end_index)
+		return;
+	if (index + nr_to_read < index)
+		nr_to_read = ULONG_MAX - index + 1;
+	if (index + nr_to_read >= end_index)
+		nr_to_read = end_index - index + 1;
+
+	page_cache_readahead_unbounded(mapping, file, index, nr_to_read,
+			lookahead_size);
+}
 
 /*
  * Chunk the readahead into 2 megabyte units, so that we don't pin too much
-- 
2.25.0


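[Editorial illustration, not part of the patch: the calling pattern the
verity code above now uses, generalised.  myfs_read_metadata_page() is
hypothetical; a NULL file is passed because this internal read needs no
credentials.]

static struct page *myfs_read_metadata_page(struct inode *inode,
		pgoff_t index, unsigned long num_ra_pages)
{
	/* Start readahead beyond i_size, then read the one page we need. */
	if (num_ra_pages > 1)
		page_cache_readahead_unbounded(inode->i_mapping, NULL,
				index, num_ra_pages, 0);
	return read_mapping_page(inode->i_mapping, index, NULL);
}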

* [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (11 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-24 21:54   ` [Cluster-devel] " Christoph Hellwig
  2020-02-19 21:00 ` [PATCH v7 14/24] btrfs: Convert from readpages to readahead Matthew Wilcox
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, Junxiao Bi, Joseph Qi, John Hubbard, linux-kernel,
	Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Implement the new readahead aop and convert all callers (block_dev,
exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
reiserfs & udf).  The callers are all trivial except for GFS2 & OCFS2.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/staging/exfat/exfat_super.c |  7 +++---
 fs/block_dev.c                      |  7 +++---
 fs/ext2/inode.c                     | 10 +++-----
 fs/fat/inode.c                      |  7 +++---
 fs/gfs2/aops.c                      | 23 ++++++-----------
 fs/hpfs/file.c                      |  7 +++---
 fs/iomap/buffered-io.c              |  2 +-
 fs/isofs/inode.c                    |  7 +++---
 fs/jfs/inode.c                      |  7 +++---
 fs/mpage.c                          | 38 +++++++++--------------------
 fs/nilfs2/inode.c                   | 15 +++---------
 fs/ocfs2/aops.c                     | 34 ++++++++++----------------
 fs/omfs/file.c                      |  7 +++---
 fs/qnx6/inode.c                     |  7 +++---
 fs/reiserfs/inode.c                 |  8 +++---
 fs/udf/inode.c                      |  7 +++---
 include/linux/mpage.h               |  4 +--
 mm/migrate.c                        |  2 +-
 18 files changed, 73 insertions(+), 126 deletions(-)

diff --git a/drivers/staging/exfat/exfat_super.c b/drivers/staging/exfat/exfat_super.c
index b81d2a87b82e..96aad9b16d31 100644
--- a/drivers/staging/exfat/exfat_super.c
+++ b/drivers/staging/exfat/exfat_super.c
@@ -3002,10 +3002,9 @@ static int exfat_readpage(struct file *file, struct page *page)
 	return  mpage_readpage(page, exfat_get_block);
 }
 
-static int exfat_readpages(struct file *file, struct address_space *mapping,
-			   struct list_head *pages, unsigned int nr_pages)
+static void exfat_readahead(struct readahead_control *rac)
 {
-	return  mpage_readpages(mapping, pages, nr_pages, exfat_get_block);
+	mpage_readahead(rac, exfat_get_block);
 }
 
 static int exfat_writepage(struct page *page, struct writeback_control *wbc)
@@ -3104,7 +3103,7 @@ static sector_t _exfat_bmap(struct address_space *mapping, sector_t block)
 
 static const struct address_space_operations exfat_aops = {
 	.readpage    = exfat_readpage,
-	.readpages   = exfat_readpages,
+	.readahead   = exfat_readahead,
 	.writepage   = exfat_writepage,
 	.writepages  = exfat_writepages,
 	.write_begin = exfat_write_begin,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 69bf2fb6f7cd..2fd9c7bd61f6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -614,10 +614,9 @@ static int blkdev_readpage(struct file * file, struct page * page)
 	return block_read_full_page(page, blkdev_get_block);
 }
 
-static int blkdev_readpages(struct file *file, struct address_space *mapping,
-			struct list_head *pages, unsigned nr_pages)
+static void blkdev_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+	mpage_readahead(rac, blkdev_get_block);
 }
 
 static int blkdev_write_begin(struct file *file, struct address_space *mapping,
@@ -2062,7 +2061,7 @@ static int blkdev_writepages(struct address_space *mapping,
 
 static const struct address_space_operations def_blk_aops = {
 	.readpage	= blkdev_readpage,
-	.readpages	= blkdev_readpages,
+	.readahead	= blkdev_readahead,
 	.writepage	= blkdev_writepage,
 	.write_begin	= blkdev_write_begin,
 	.write_end	= blkdev_write_end,
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index c885cf7d724b..2875c0a705b5 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -877,11 +877,9 @@ static int ext2_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, ext2_get_block);
 }
 
-static int
-ext2_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void ext2_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, ext2_get_block);
+	mpage_readahead(rac, ext2_get_block);
 }
 
 static int
@@ -967,7 +965,7 @@ ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc
 
 const struct address_space_operations ext2_aops = {
 	.readpage		= ext2_readpage,
-	.readpages		= ext2_readpages,
+	.readahead		= ext2_readahead,
 	.writepage		= ext2_writepage,
 	.write_begin		= ext2_write_begin,
 	.write_end		= ext2_write_end,
@@ -981,7 +979,7 @@ const struct address_space_operations ext2_aops = {
 
 const struct address_space_operations ext2_nobh_aops = {
 	.readpage		= ext2_readpage,
-	.readpages		= ext2_readpages,
+	.readahead		= ext2_readahead,
 	.writepage		= ext2_nobh_writepage,
 	.write_begin		= ext2_nobh_write_begin,
 	.write_end		= nobh_write_end,
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 594b05ae16c9..3496f5fc3e6d 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -210,10 +210,9 @@ static int fat_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, fat_get_block);
 }
 
-static int fat_readpages(struct file *file, struct address_space *mapping,
-			 struct list_head *pages, unsigned nr_pages)
+static void fat_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
+	mpage_readahead(rac, fat_get_block);
 }
 
 static void fat_write_failed(struct address_space *mapping, loff_t to)
@@ -344,7 +343,7 @@ int fat_block_truncate_page(struct inode *inode, loff_t from)
 
 static const struct address_space_operations fat_aops = {
 	.readpage	= fat_readpage,
-	.readpages	= fat_readpages,
+	.readahead	= fat_readahead,
 	.writepage	= fat_writepage,
 	.writepages	= fat_writepages,
 	.write_begin	= fat_write_begin,
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index ba83b49ce18c..5e63c13c12c1 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -577,7 +577,7 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
 }
 
 /**
- * gfs2_readpages - Read a bunch of pages at once
+ * gfs2_readahead - Read a bunch of pages at once
  * @file: The file to read from
  * @mapping: Address space info
  * @pages: List of pages to read
@@ -590,31 +590,24 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
  *    obviously not something we'd want to do on too regular a basis.
  *    Any I/O we ignore at this time will be done via readpage later.
 * 2. We don't handle stuffed files here, we let readpage do the honours.
- * 3. mpage_readpages() does most of the heavy lifting in the common case.
+ * 3. mpage_readahead() does most of the heavy lifting in the common case.
  * 4. gfs2_block_map() is relied upon to set BH_Boundary in the right places.
  */
 
-static int gfs2_readpages(struct file *file, struct address_space *mapping,
-			  struct list_head *pages, unsigned nr_pages)
+static void gfs2_readahead(struct readahead_control *rac)
 {
-	struct inode *inode = mapping->host;
+	struct inode *inode = rac->mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_holder gh;
-	int ret;
 
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
-	ret = gfs2_glock_nq(&gh);
-	if (unlikely(ret))
+	if (gfs2_glock_nq(&gh))
 		goto out_uninit;
 	if (!gfs2_is_stuffed(ip))
-		ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
+		mpage_readahead(rac, gfs2_block_map);
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
-	if (unlikely(gfs2_withdrawn(sdp)))
-		ret = -EIO;
-	return ret;
 }
 
 /**
@@ -828,7 +821,7 @@ static const struct address_space_operations gfs2_aops = {
 	.writepage = gfs2_writepage,
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
-	.readpages = gfs2_readpages,
+	.readahead = gfs2_readahead,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
@@ -842,7 +835,7 @@ static const struct address_space_operations gfs2_jdata_aops = {
 	.writepage = gfs2_jdata_writepage,
 	.writepages = gfs2_jdata_writepages,
 	.readpage = gfs2_readpage,
-	.readpages = gfs2_readpages,
+	.readahead = gfs2_readahead,
 	.set_page_dirty = jdata_set_page_dirty,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index b36abf9cb345..2de0d3492d15 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -125,10 +125,9 @@ static int hpfs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, hpfs_get_block, wbc);
 }
 
-static int hpfs_readpages(struct file *file, struct address_space *mapping,
-			  struct list_head *pages, unsigned nr_pages)
+static void hpfs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, hpfs_get_block);
+	mpage_readahead(rac, hpfs_get_block);
 }
 
 static int hpfs_writepages(struct address_space *mapping,
@@ -198,7 +197,7 @@ static int hpfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 const struct address_space_operations hpfs_aops = {
 	.readpage = hpfs_readpage,
 	.writepage = hpfs_writepage,
-	.readpages = hpfs_readpages,
+	.readahead = hpfs_readahead,
 	.writepages = hpfs_writepages,
 	.write_begin = hpfs_write_begin,
 	.write_end = hpfs_write_end,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7c84c4c027c4..cb3511eb152a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -359,7 +359,7 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 	}
 
 	/*
-	 * Just like mpage_readpages and block_read_full_page we always
+	 * Just like mpage_readahead and block_read_full_page we always
 	 * return 0 and just mark the page as PageError on errors.  This
 	 * should be cleaned up all through the stack eventually.
 	 */
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 62c0462dc89f..95b1f377ad09 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1185,10 +1185,9 @@ static int isofs_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, isofs_get_block);
 }
 
-static int isofs_readpages(struct file *file, struct address_space *mapping,
-			struct list_head *pages, unsigned nr_pages)
+static void isofs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, isofs_get_block);
+	mpage_readahead(rac, isofs_get_block);
 }
 
 static sector_t _isofs_bmap(struct address_space *mapping, sector_t block)
@@ -1198,7 +1197,7 @@ static sector_t _isofs_bmap(struct address_space *mapping, sector_t block)
 
 static const struct address_space_operations isofs_aops = {
 	.readpage = isofs_readpage,
-	.readpages = isofs_readpages,
+	.readahead = isofs_readahead,
 	.bmap = _isofs_bmap
 };
 
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 9486afcdac76..6f65bfa9f18d 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -296,10 +296,9 @@ static int jfs_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, jfs_get_block);
 }
 
-static int jfs_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void jfs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, jfs_get_block);
+	mpage_readahead(rac, jfs_get_block);
 }
 
 static void jfs_write_failed(struct address_space *mapping, loff_t to)
@@ -358,7 +357,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 const struct address_space_operations jfs_aops = {
 	.readpage	= jfs_readpage,
-	.readpages	= jfs_readpages,
+	.readahead	= jfs_readahead,
 	.writepage	= jfs_writepage,
 	.writepages	= jfs_writepages,
 	.write_begin	= jfs_write_begin,
diff --git a/fs/mpage.c b/fs/mpage.c
index ccba3c4c4479..830e6cc2a9e7 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -91,7 +91,7 @@ mpage_alloc(struct block_device *bdev,
 }
 
 /*
- * support function for mpage_readpages.  The fs supplied get_block might
+ * support function for mpage_readahead.  The fs supplied get_block might
  * return an up to date buffer.  This is used to map that buffer into
  * the page, which allows readpage to avoid triggering a duplicate call
  * to get_block.
@@ -338,13 +338,8 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 }
 
 /**
- * mpage_readpages - populate an address space with some pages & start reads against them
- * @mapping: the address_space
- * @pages: The address of a list_head which contains the target pages.  These
- *   pages have their ->index populated and are otherwise uninitialised.
- *   The page at @pages->prev has the lowest file offset, and reads should be
- *   issued in @pages->prev to @pages->next order.
- * @nr_pages: The number of pages at *@pages
+ * mpage_readahead - start reads against pages
+ * @rac: Describes which pages to read.
  * @get_block: The filesystem's block mapper function.
  *
  * This function walks the pages and the blocks within each page, building and
@@ -381,36 +376,25 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
  *
  * This all causes the disk requests to be issued in the correct order.
  */
-int
-mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block)
+void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 {
+	struct page *page;
 	struct mpage_readpage_args args = {
 		.get_block = get_block,
 		.is_readahead = true,
 	};
-	unsigned page_idx;
-
-	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
-		struct page *page = lru_to_page(pages);
 
+	while ((page = readahead_page(rac))) {
 		prefetchw(&page->flags);
-		list_del(&page->lru);
-		if (!add_to_page_cache_lru(page, mapping,
-					page->index,
-					readahead_gfp_mask(mapping))) {
-			args.page = page;
-			args.nr_pages = nr_pages - page_idx;
-			args.bio = do_mpage_readpage(&args);
-		}
+		args.page = page;
+		args.nr_pages = readahead_count(rac);
+		args.bio = do_mpage_readpage(&args);
 		put_page(page);
 	}
-	BUG_ON(!list_empty(pages));
 	if (args.bio)
 		mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, args.bio);
-	return 0;
 }
-EXPORT_SYMBOL(mpage_readpages);
+EXPORT_SYMBOL(mpage_readahead);
 
 /*
  * This isn't called much at all
@@ -563,7 +547,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 		 * Page has buffers, but they are all unmapped. The page was
 		 * created by pagein or read over a hole which was handled by
 		 * block_read_full_page().  If this address_space is also
-		 * using mpage_readpages then this can rarely happen.
+		 * using mpage_readahead then this can rarely happen.
 		 */
 		goto confused;
 	}
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 671085512e0f..ceeb3b441844 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -145,18 +145,9 @@ static int nilfs_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, nilfs_get_block);
 }
 
-/**
- * nilfs_readpages() - implement readpages() method of nilfs_aops {}
- * address_space_operations.
- * @file - file struct of the file to be read
- * @mapping - address_space struct used for reading multiple pages
- * @pages - the pages to be read
- * @nr_pages - number of pages to be read
- */
-static int nilfs_readpages(struct file *file, struct address_space *mapping,
-			   struct list_head *pages, unsigned int nr_pages)
+static void nilfs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, nilfs_get_block);
+	mpage_readahead(rac, nilfs_get_block);
 }
 
 static int nilfs_writepages(struct address_space *mapping,
@@ -308,7 +299,7 @@ const struct address_space_operations nilfs_aops = {
 	.readpage		= nilfs_readpage,
 	.writepages		= nilfs_writepages,
 	.set_page_dirty		= nilfs_set_page_dirty,
-	.readpages		= nilfs_readpages,
+	.readahead		= nilfs_readahead,
 	.write_begin		= nilfs_write_begin,
 	.write_end		= nilfs_write_end,
 	/* .releasepage		= nilfs_releasepage, */
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 3a67a6518ddf..3bfb4147895a 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -350,14 +350,11 @@ static int ocfs2_readpage(struct file *file, struct page *page)
  * grow out to a tree. If need be, detecting boundary extents could
  * trivially be added in a future version of ocfs2_get_block().
  */
-static int ocfs2_readpages(struct file *filp, struct address_space *mapping,
-			   struct list_head *pages, unsigned nr_pages)
+static void ocfs2_readahead(struct readahead_control *rac)
 {
-	int ret, err = -EIO;
-	struct inode *inode = mapping->host;
+	int ret;
+	struct inode *inode = rac->mapping->host;
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
-	loff_t start;
-	struct page *last;
 
 	/*
 	 * Use the nonblocking flag for the dlm code to avoid page
@@ -365,36 +362,31 @@ static int ocfs2_readpages(struct file *filp, struct address_space *mapping,
 	 */
 	ret = ocfs2_inode_lock_full(inode, NULL, 0, OCFS2_LOCK_NONBLOCK);
 	if (ret)
-		return err;
+		return;
 
-	if (down_read_trylock(&oi->ip_alloc_sem) == 0) {
-		ocfs2_inode_unlock(inode, 0);
-		return err;
-	}
+	if (down_read_trylock(&oi->ip_alloc_sem) == 0)
+		goto out_unlock;
 
 	/*
 	 * Don't bother with inline-data. There isn't anything
 	 * to read-ahead in that case anyway...
 	 */
 	if (oi->ip_dyn_features & OCFS2_INLINE_DATA_FL)
-		goto out_unlock;
+		goto out_up;
 
 	/*
 	 * Check whether a remote node truncated this file - we just
 	 * drop out in that case as it's not worth handling here.
 	 */
-	last = lru_to_page(pages);
-	start = (loff_t)last->index << PAGE_SHIFT;
-	if (start >= i_size_read(inode))
-		goto out_unlock;
+	if (readahead_pos(rac) >= i_size_read(inode))
+		goto out_up;
 
-	err = mpage_readpages(mapping, pages, nr_pages, ocfs2_get_block);
+	mpage_readahead(rac, ocfs2_get_block);
 
-out_unlock:
+out_up:
 	up_read(&oi->ip_alloc_sem);
+out_unlock:
 	ocfs2_inode_unlock(inode, 0);
-
-	return err;
 }
 
 /* Note: Because we don't support holes, our allocation has
@@ -2474,7 +2466,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 const struct address_space_operations ocfs2_aops = {
 	.readpage		= ocfs2_readpage,
-	.readpages		= ocfs2_readpages,
+	.readahead		= ocfs2_readahead,
 	.writepage		= ocfs2_writepage,
 	.write_begin		= ocfs2_write_begin,
 	.write_end		= ocfs2_write_end,
diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index d640b9388238..d7b5f09d298c 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -289,10 +289,9 @@ static int omfs_readpage(struct file *file, struct page *page)
 	return block_read_full_page(page, omfs_get_block);
 }
 
-static int omfs_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void omfs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, omfs_get_block);
+	mpage_readahead(rac, omfs_get_block);
 }
 
 static int omfs_writepage(struct page *page, struct writeback_control *wbc)
@@ -373,7 +372,7 @@ const struct inode_operations omfs_file_inops = {
 
 const struct address_space_operations omfs_aops = {
 	.readpage = omfs_readpage,
-	.readpages = omfs_readpages,
+	.readahead = omfs_readahead,
 	.writepage = omfs_writepage,
 	.writepages = omfs_writepages,
 	.write_begin = omfs_write_begin,
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index 345db56c98fd..755293c8c71a 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -99,10 +99,9 @@ static int qnx6_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, qnx6_get_block);
 }
 
-static int qnx6_readpages(struct file *file, struct address_space *mapping,
-		   struct list_head *pages, unsigned nr_pages)
+static void qnx6_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, qnx6_get_block);
+	mpage_readahead(rac, qnx6_get_block);
 }
 
 /*
@@ -499,7 +498,7 @@ static sector_t qnx6_bmap(struct address_space *mapping, sector_t block)
 }
 static const struct address_space_operations qnx6_aops = {
 	.readpage	= qnx6_readpage,
-	.readpages	= qnx6_readpages,
+	.readahead	= qnx6_readahead,
 	.bmap		= qnx6_bmap
 };
 
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 6419e6dacc39..0031070b3692 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1160,11 +1160,9 @@ int reiserfs_get_block(struct inode *inode, sector_t block,
 	return retval;
 }
 
-static int
-reiserfs_readpages(struct file *file, struct address_space *mapping,
-		   struct list_head *pages, unsigned nr_pages)
+static void reiserfs_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, reiserfs_get_block);
+	mpage_readahead(rac, reiserfs_get_block);
 }
 
 /*
@@ -3434,7 +3432,7 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
 const struct address_space_operations reiserfs_address_space_operations = {
 	.writepage = reiserfs_writepage,
 	.readpage = reiserfs_readpage,
-	.readpages = reiserfs_readpages,
+	.readahead = reiserfs_readahead,
 	.releasepage = reiserfs_releasepage,
 	.invalidatepage = reiserfs_invalidatepage,
 	.write_begin = reiserfs_write_begin,
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index e875bc5668ee..adaba8e8b326 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -195,10 +195,9 @@ static int udf_readpage(struct file *file, struct page *page)
 	return mpage_readpage(page, udf_get_block);
 }
 
-static int udf_readpages(struct file *file, struct address_space *mapping,
-			struct list_head *pages, unsigned nr_pages)
+static void udf_readahead(struct readahead_control *rac)
 {
-	return mpage_readpages(mapping, pages, nr_pages, udf_get_block);
+	mpage_readahead(rac, udf_get_block);
 }
 
 static int udf_write_begin(struct file *file, struct address_space *mapping,
@@ -234,7 +233,7 @@ static sector_t udf_bmap(struct address_space *mapping, sector_t block)
 
 const struct address_space_operations udf_aops = {
 	.readpage	= udf_readpage,
-	.readpages	= udf_readpages,
+	.readahead	= udf_readahead,
 	.writepage	= udf_writepage,
 	.writepages	= udf_writepages,
 	.write_begin	= udf_write_begin,
diff --git a/include/linux/mpage.h b/include/linux/mpage.h
index 001f1fcf9836..f4f5e90a6844 100644
--- a/include/linux/mpage.h
+++ b/include/linux/mpage.h
@@ -13,9 +13,9 @@
 #ifdef CONFIG_BLOCK
 
 struct writeback_control;
+struct readahead_control;
 
-int mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block);
+void mpage_readahead(struct readahead_control *, get_block_t get_block);
 int mpage_readpage(struct page *page, get_block_t get_block);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
diff --git a/mm/migrate.c b/mm/migrate.c
index b1092876e537..a32122095702 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1020,7 +1020,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * to the LRU. Later, when the IO completes the pages are
 		 * marked uptodate and unlocked. However, the queueing
 		 * could be merging multiple pages for one bio (e.g.
-		 * mpage_readpages). If an allocation happens for the
+		 * mpage_readahead). If an allocation happens for the
 		 * second or third page, the process can end up locking
 		 * the same page twice and deadlocking. Rather than
 		 * trying to be clever about what pages can be locked,
-- 
2.25.0



* [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (12 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-20  9:42   ` Johannes Thumshirn
  2020-02-19 21:00 ` [PATCH v7 15/24] erofs: Convert uncompressed files " Matthew Wilcox
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in btrfs.  Add a readahead_page_batch()
helper to optimise the loop over the XArray.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/btrfs/extent_io.c    | 46 +++++++++++++----------------------------
 fs/btrfs/extent_io.h    |  3 +--
 fs/btrfs/inode.c        | 16 +++++++-------
 include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0f202741e09..e70f14c1de60 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4278,52 +4278,34 @@ int extent_writepages(struct address_space *mapping,
 	return ret;
 }
 
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages)
+void extent_readahead(struct readahead_control *rac)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
 	struct page *pagepool[16];
 	struct extent_map *em_cached = NULL;
-	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
-	int nr = 0;
+	struct extent_io_tree *tree = &BTRFS_I(rac->mapping->host)->io_tree;
 	u64 prev_em_start = (u64)-1;
+	int nr;
 
-	while (!list_empty(pages)) {
-		u64 contig_end = 0;
-
-		for (nr = 0; nr < ARRAY_SIZE(pagepool) && !list_empty(pages);) {
-			struct page *page = lru_to_page(pages);
-
-			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-						readahead_gfp_mask(mapping))) {
-				put_page(page);
-				break;
-			}
-
-			pagepool[nr++] = page;
-			contig_end = page_offset(page) + PAGE_SIZE - 1;
-		}
+	while ((nr = readahead_page_batch(rac, pagepool))) {
+		u64 contig_start = page_offset(pagepool[0]);
+		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
 
-		if (nr) {
-			u64 contig_start = page_offset(pagepool[0]);
+		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
 
-			ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
-
-			contiguous_readpages(tree, pagepool, nr, contig_start,
-				     contig_end, &em_cached, &bio, &bio_flags,
-				     &prev_em_start);
-		}
+		contiguous_readpages(tree, pagepool, nr, contig_start,
+				contig_end, &em_cached, &bio, &bio_flags,
+				&prev_em_start);
 	}
 
 	if (em_cached)
 		free_extent_map(em_cached);
 
-	if (bio)
-		return submit_one_bio(bio, 0, bio_flags);
-	return 0;
+	if (bio) {
+		if (submit_one_bio(bio, 0, bio_flags))
+			return;
+	}
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 5d205bbaafdc..bddac32948c7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -198,8 +198,7 @@ int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
 			    struct writeback_control *wbc);
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages);
+void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		__u64 start, __u64 len);
 void set_page_extent_mapped(struct page *page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..61d5137ce4e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4802,8 +4802,8 @@ static void evict_inode_truncate_pages(struct inode *inode)
 
 	/*
 	 * Keep looping until we have no more ranges in the io tree.
-	 * We can have ongoing bios started by readpages (called from readahead)
-	 * that have their endio callback (extent_io.c:end_bio_extent_readpage)
+	 * We can have ongoing bios started by readahead that have
+	 * their endio callback (extent_io.c:end_bio_extent_readpage)
 	 * still in progress (unlocked the pages in the bio but did not yet
 	 * unlocked the ranges in the io tree). Therefore this means some
 	 * ranges can still be locked and eviction started because before
@@ -7004,11 +7004,11 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 			 * for it to complete) and then invalidate the pages for
 			 * this range (through invalidate_inode_pages2_range()),
 			 * but that can lead us to a deadlock with a concurrent
-			 * call to readpages() (a buffered read or a defrag call
+			 * call to readahead (a buffered read or a defrag call
 			 * triggered a readahead) on a page lock due to an
 			 * ordered dio extent we created before but did not have
 			 * yet a corresponding bio submitted (whence it can not
-			 * complete), which makes readpages() wait for that
+			 * complete), which makes readahead wait for that
 			 * ordered extent to complete while holding a lock on
 			 * that page.
 			 */
@@ -8247,11 +8247,9 @@ static int btrfs_writepages(struct address_space *mapping,
 	return extent_writepages(mapping, wbc);
 }
 
-static int
-btrfs_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void btrfs_readahead(struct readahead_control *rac)
 {
-	return extent_readpages(mapping, pages, nr_pages);
+	extent_readahead(rac);
 }
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
@@ -10456,7 +10454,7 @@ static const struct address_space_operations btrfs_aops = {
 	.readpage	= btrfs_readpage,
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
-	.readpages	= btrfs_readpages,
+	.readahead	= btrfs_readahead,
 	.direct_IO	= btrfs_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 60f9b8d4da6c..dadf6ed11e71 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -672,6 +672,43 @@ static inline struct page *readahead_page(struct readahead_control *rac)
 	return page;
 }
 
+static inline unsigned int __readahead_batch(struct readahead_control *rac,
+		struct page **array, unsigned int array_sz)
+{
+	unsigned int i = 0;
+	XA_STATE(xas, &rac->mapping->i_pages, rac->_index);
+	struct page *page;
+
+	BUG_ON(rac->_batch_count > rac->_nr_pages);
+	rac->_nr_pages -= rac->_batch_count;
+	rac->_index += rac->_batch_count;
+	rac->_batch_count = 0;
+
+	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
+		VM_BUG_ON_PAGE(!PageLocked(page), page);
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		array[i++] = page;
+		rac->_batch_count += hpage_nr_pages(page);
+
+		/*
+		 * The page cache isn't using multi-index entries yet,
+		 * so the xas cursor needs to be manually moved to the
+		 * next index.  This can be removed once the page cache
+		 * is converted.
+		 */
+		if (PageHead(page))
+			xas_set(&xas, rac->_index + rac->_batch_count);
+
+		if (i == array_sz)
+			break;
+	}
+
+	return i;
+}
+
+#define readahead_page_batch(rac, array)				\
+	__readahead_batch(rac, array, ARRAY_SIZE(array))
+
 /* The byte offset into the file of this readahead block */
 static inline loff_t readahead_pos(struct readahead_control *rac)
 {
-- 
2.25.0


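[Editorial illustration, not part of the patch: the shape of a
readahead_page_batch() consumer.  myfs_start_contig_read() is hypothetical
and is assumed to unlock each page from its I/O completion handler; the
pages in each batch are locked and contiguous in the file.]

/* Hypothetical: submits one read covering nr contiguous locked pages. */
static void myfs_start_contig_read(struct inode *inode, struct page **pages,
		unsigned int nr);

static void myfs_readahead(struct readahead_control *rac)
{
	struct page *pages[16];
	unsigned int i, nr;

	while ((nr = readahead_page_batch(rac, pages))) {
		/* One I/O can cover pages[0] .. pages[nr - 1]. */
		myfs_start_contig_read(rac->mapping->host, pages, nr);
		for (i = 0; i < nr; i++)
			put_page(pages[i]);
	}
}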

* [PATCH v7 15/24] erofs: Convert uncompressed files from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (13 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 14/24] btrfs: Convert from readpages to readahead Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 16/24] erofs: Convert compressed " Matthew Wilcox
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in erofs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Gao Xiang <gaoxiang25@huawei.com>
---
 fs/erofs/data.c              | 39 +++++++++++++-----------------------
 fs/erofs/zdata.c             |  2 +-
 include/trace/events/erofs.h |  6 +++---
 3 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index fc3a8d8064f8..d0542151e8c4 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -280,47 +280,36 @@ static int erofs_raw_access_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
-static int erofs_raw_access_readpages(struct file *filp,
-				      struct address_space *mapping,
-				      struct list_head *pages,
-				      unsigned int nr_pages)
+static void erofs_raw_access_readahead(struct readahead_control *rac)
 {
 	erofs_off_t last_block;
 	struct bio *bio = NULL;
-	gfp_t gfp = readahead_gfp_mask(mapping);
-	struct page *page = list_last_entry(pages, struct page, lru);
-
-	trace_erofs_readpages(mapping->host, page, nr_pages, true);
+	struct page *page;
 
-	for (; nr_pages; --nr_pages) {
-		page = list_entry(pages->prev, struct page, lru);
+	trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
+			readahead_count(rac), true);
 
+	while ((page = readahead_page(rac))) {
 		prefetchw(&page->flags);
-		list_del(&page->lru);
 
-		if (!add_to_page_cache_lru(page, mapping, page->index, gfp)) {
-			bio = erofs_read_raw_page(bio, mapping, page,
-						  &last_block, nr_pages, true);
+		bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
+				readahead_count(rac), true);
 
-			/* all the page errors are ignored when readahead */
-			if (IS_ERR(bio)) {
-				pr_err("%s, readahead error at page %lu of nid %llu\n",
-				       __func__, page->index,
-				       EROFS_I(mapping->host)->nid);
+		/* all the page errors are ignored when readahead */
+		if (IS_ERR(bio)) {
+			pr_err("%s, readahead error at page %lu of nid %llu\n",
+			       __func__, page->index,
+			       EROFS_I(rac->mapping->host)->nid);
 
-				bio = NULL;
-			}
+			bio = NULL;
 		}
 
-		/* pages could still be locked */
 		put_page(page);
 	}
-	DBG_BUGON(!list_empty(pages));
 
 	/* the rare case (end in gaps) */
 	if (bio)
 		submit_bio(bio);
-	return 0;
 }
 
 static int erofs_get_block(struct inode *inode, sector_t iblock,
@@ -358,7 +347,7 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
 /* for uncompressed (aligned) files and raw access for other files */
 const struct address_space_operations erofs_raw_access_aops = {
 	.readpage = erofs_raw_access_readpage,
-	.readpages = erofs_raw_access_readpages,
+	.readahead = erofs_raw_access_readahead,
 	.bmap = erofs_bmap,
 };
 
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 80e47f07d946..17f45fcb8c5c 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1315,7 +1315,7 @@ static int z_erofs_readpages(struct file *filp, struct address_space *mapping,
 	struct page *head = NULL;
 	LIST_HEAD(pagepool);
 
-	trace_erofs_readpages(mapping->host, lru_to_page(pages),
+	trace_erofs_readpages(mapping->host, lru_to_page(pages)->index,
 			      nr_pages, false);
 
 	f.headoffset = (erofs_off_t)lru_to_page(pages)->index << PAGE_SHIFT;
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index 27f5caa6299a..bf9806fd1306 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -113,10 +113,10 @@ TRACE_EVENT(erofs_readpage,
 
 TRACE_EVENT(erofs_readpages,
 
-	TP_PROTO(struct inode *inode, struct page *page, unsigned int nrpage,
+	TP_PROTO(struct inode *inode, pgoff_t start, unsigned int nrpage,
 		bool raw),
 
-	TP_ARGS(inode, page, nrpage, raw),
+	TP_ARGS(inode, start, nrpage, raw),
 
 	TP_STRUCT__entry(
 		__field(dev_t,		dev	)
@@ -129,7 +129,7 @@ TRACE_EVENT(erofs_readpages,
 	TP_fast_assign(
 		__entry->dev	= inode->i_sb->s_dev;
 		__entry->nid	= EROFS_I(inode)->nid;
-		__entry->start	= page->index;
+		__entry->start	= start;
 		__entry->nrpage	= nrpage;
 		__entry->raw	= raw;
 	),
-- 
2.25.0



* [PATCH v7 16/24] erofs: Convert compressed files from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (14 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 15/24] erofs: Convert uncompressed files " Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 17/24] ext4: Convert " Matthew Wilcox
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in erofs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/erofs/zdata.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 17f45fcb8c5c..e64d8ab0900d 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1303,28 +1303,23 @@ static bool should_decompress_synchronously(struct erofs_sb_info *sbi,
 	return nr <= sbi->max_sync_decompress_pages;
 }
 
-static int z_erofs_readpages(struct file *filp, struct address_space *mapping,
-			     struct list_head *pages, unsigned int nr_pages)
+static void z_erofs_readahead(struct readahead_control *rac)
 {
-	struct inode *const inode = mapping->host;
+	struct inode *const inode = rac->mapping->host;
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 
-	bool sync = should_decompress_synchronously(sbi, nr_pages);
+	bool sync = should_decompress_synchronously(sbi, readahead_count(rac));
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
-	gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL);
-	struct page *head = NULL;
+	struct page *page, *head = NULL;
 	LIST_HEAD(pagepool);
 
-	trace_erofs_readpages(mapping->host, lru_to_page(pages)->index,
-			      nr_pages, false);
+	trace_erofs_readpages(inode, readahead_index(rac),
+			readahead_count(rac), false);
 
-	f.headoffset = (erofs_off_t)lru_to_page(pages)->index << PAGE_SHIFT;
-
-	for (; nr_pages; --nr_pages) {
-		struct page *page = lru_to_page(pages);
+	f.headoffset = readahead_pos(rac);
 
+	while ((page = readahead_page(rac))) {
 		prefetchw(&page->flags);
-		list_del(&page->lru);
 
 		/*
 		 * A pure asynchronous readahead is indicated if
@@ -1333,11 +1328,6 @@ static int z_erofs_readpages(struct file *filp, struct address_space *mapping,
 		 */
 		sync &= !(PageReadahead(page) && !head);
 
-		if (add_to_page_cache_lru(page, mapping, page->index, gfp)) {
-			list_add(&page->lru, &pagepool);
-			continue;
-		}
-
 		set_page_private(page, (unsigned long)head);
 		head = page;
 	}
@@ -1366,11 +1356,10 @@ static int z_erofs_readpages(struct file *filp, struct address_space *mapping,
 
 	/* clean up the remaining free pages */
 	put_pages_list(&pagepool);
-	return 0;
 }
 
 const struct address_space_operations z_erofs_aops = {
 	.readpage = z_erofs_readpage,
-	.readpages = z_erofs_readpages,
+	.readahead = z_erofs_readahead,
 };
 
-- 
2.25.0



* [PATCH v7 17/24] ext4: Convert from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (15 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 16/24] erofs: Convert compressed " Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 18/24] ext4: Pass the inode to ext4_mpage_readpages Matthew Wilcox
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in ext4.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/ext4.h     |  3 +--
 fs/ext4/inode.c    | 21 +++++++++------------
 fs/ext4/readpage.c | 22 ++++++++--------------
 3 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4441331d06cc..1570a0b51b73 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3279,8 +3279,7 @@ static inline void ext4_set_de_type(struct super_block *sb,
 
 /* readpages.c */
 extern int ext4_mpage_readpages(struct address_space *mapping,
-				struct list_head *pages, struct page *page,
-				unsigned nr_pages, bool is_readahead);
+		struct readahead_control *rac, struct page *page);
 extern int __init ext4_init_post_read_processing(void);
 extern void ext4_exit_post_read_processing(void);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e60aca791d3f..d674c5f9066c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3226,23 +3226,20 @@ static int ext4_readpage(struct file *file, struct page *page)
 		ret = ext4_readpage_inline(inode, page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(page->mapping, NULL, page, 1,
-						false);
+		return ext4_mpage_readpages(page->mapping, NULL, page);
 
 	return ret;
 }
 
-static int
-ext4_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void ext4_readahead(struct readahead_control *rac)
 {
-	struct inode *inode = mapping->host;
+	struct inode *inode = rac->mapping->host;
 
-	/* If the file has inline data, no need to do readpages. */
+	/* If the file has inline data, no need to do readahead. */
 	if (ext4_has_inline_data(inode))
-		return 0;
+		return;
 
-	return ext4_mpage_readpages(mapping, pages, NULL, nr_pages, true);
+	ext4_mpage_readpages(rac->mapping, rac, NULL);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
@@ -3587,7 +3584,7 @@ static int ext4_set_page_dirty(struct page *page)
 
 static const struct address_space_operations ext4_aops = {
 	.readpage		= ext4_readpage,
-	.readpages		= ext4_readpages,
+	.readahead		= ext4_readahead,
 	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
@@ -3604,7 +3601,7 @@ static const struct address_space_operations ext4_aops = {
 
 static const struct address_space_operations ext4_journalled_aops = {
 	.readpage		= ext4_readpage,
-	.readpages		= ext4_readpages,
+	.readahead		= ext4_readahead,
 	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
@@ -3620,7 +3617,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 
 static const struct address_space_operations ext4_da_aops = {
 	.readpage		= ext4_readpage,
-	.readpages		= ext4_readpages,
+	.readahead		= ext4_readahead,
 	.writepage		= ext4_writepage,
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_da_write_begin,
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index c1769afbf799..66275f25235d 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -7,8 +7,8 @@
  *
  * This was originally taken from fs/mpage.c
  *
- * The intent is the ext4_mpage_readpages() function here is intended
- * to replace mpage_readpages() in the general case, not just for
+ * The ext4_mpage_readpages() function here is intended to
+ * replace mpage_readahead() in the general case, not just for
  * encrypted files.  It has some limitations (see below), where it
  * will fall back to read_block_full_page(), but these limitations
  * should only be hit when page_size != block_size.
@@ -222,8 +222,7 @@ static inline loff_t ext4_readpage_limit(struct inode *inode)
 }
 
 int ext4_mpage_readpages(struct address_space *mapping,
-			 struct list_head *pages, struct page *page,
-			 unsigned nr_pages, bool is_readahead)
+		struct readahead_control *rac, struct page *page)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -241,6 +240,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
 	int length;
 	unsigned relative_block = 0;
 	struct ext4_map_blocks map;
+	unsigned int nr_pages = rac ? readahead_count(rac) : 1;
 
 	map.m_pblk = 0;
 	map.m_lblk = 0;
@@ -251,14 +251,9 @@ int ext4_mpage_readpages(struct address_space *mapping,
 		int fully_mapped = 1;
 		unsigned first_hole = blocks_per_page;
 
-		if (pages) {
-			page = lru_to_page(pages);
-
+		if (rac) {
+			page = readahead_page(rac);
 			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-				  readahead_gfp_mask(mapping)))
-				goto next_page;
 		}
 
 		if (page_has_buffers(page))
@@ -381,7 +376,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
 			bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
 			bio->bi_end_io = mpage_end_io;
 			bio_set_op_attrs(bio, REQ_OP_READ,
-						is_readahead ? REQ_RAHEAD : 0);
+						rac ? REQ_RAHEAD : 0);
 		}
 
 		length = first_hole << blkbits;
@@ -406,10 +401,9 @@ int ext4_mpage_readpages(struct address_space *mapping,
 		else
 			unlock_page(page);
 	next_page:
-		if (pages)
+		if (rac)
 			put_page(page);
 	}
-	BUG_ON(pages && !list_empty(pages));
 	if (bio)
 		submit_bio(bio);
 	return 0;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread
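
Both this ext4 conversion and the f2fs one later in the series share the
same dual-mode shape: a single helper serves ->readpage (rac == NULL, one
page passed in) and ->readahead (rac != NULL, pages pulled from the page
cache).  A minimal sketch of that pattern, where the helper name and the
elided block-mapping step are illustrative and readahead_page(),
readahead_count() and the rac-or-NULL convention come from the patches:

static int example_mpage_readpages(struct inode *inode,
		struct readahead_control *rac, struct page *page)
{
	/* rac == NULL means a single ->readpage call */
	unsigned int nr_pages = rac ? readahead_count(rac) : 1;

	for (; nr_pages; nr_pages--) {
		if (rac) {
			/* already locked and in the page cache */
			page = readahead_page(rac);
			prefetchw(&page->flags);
		}

		/* ... map blocks and queue the page for read I/O ... */

		if (rac)
			put_page(page);	/* drop readahead_page()'s reference */
	}
	return 0;
}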

* [PATCH v7 18/24] ext4: Pass the inode to ext4_mpage_readpages
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (16 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 17/24] ext4: Convert " Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 19/24] f2fs: Convert from readpages to readahead Matthew Wilcox
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This function now only uses the mapping argument to look up the inode,
and both callers already have the inode, so just pass the inode instead
of the mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/ext4.h     | 2 +-
 fs/ext4/inode.c    | 4 ++--
 fs/ext4/readpage.c | 3 +--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1570a0b51b73..bc1b34ba6eab 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3278,7 +3278,7 @@ static inline void ext4_set_de_type(struct super_block *sb,
 }
 
 /* readpages.c */
-extern int ext4_mpage_readpages(struct address_space *mapping,
+extern int ext4_mpage_readpages(struct inode *inode,
 		struct readahead_control *rac, struct page *page);
 extern int __init ext4_init_post_read_processing(void);
 extern void ext4_exit_post_read_processing(void);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d674c5f9066c..4f3703c1408d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3226,7 +3226,7 @@ static int ext4_readpage(struct file *file, struct page *page)
 		ret = ext4_readpage_inline(inode, page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(page->mapping, NULL, page);
+		return ext4_mpage_readpages(inode, NULL, page);
 
 	return ret;
 }
@@ -3239,7 +3239,7 @@ static void ext4_readahead(struct readahead_control *rac)
 	if (ext4_has_inline_data(inode))
 		return;
 
-	ext4_mpage_readpages(rac->mapping, rac, NULL);
+	ext4_mpage_readpages(inode, rac, NULL);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 66275f25235d..5761e9961682 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -221,13 +221,12 @@ static inline loff_t ext4_readpage_limit(struct inode *inode)
 	return i_size_read(inode);
 }
 
-int ext4_mpage_readpages(struct address_space *mapping,
+int ext4_mpage_readpages(struct inode *inode,
 		struct readahead_control *rac, struct page *page)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
 
-	struct inode *inode = mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
 	const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
 	const unsigned blocksize = 1 << blkbits;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 19/24] f2fs: Convert from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (17 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 18/24] ext4: Pass the inode to ext4_mpage_readpages Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:00 ` [PATCH v7 20/24] fuse: " Matthew Wilcox
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in f2fs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/f2fs/data.c              | 50 +++++++++++++++----------------------
 fs/f2fs/f2fs.h              |  5 ++--
 include/trace/events/f2fs.h |  6 ++---
 3 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b27b72107911..87964e4cb6b8 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2159,13 +2159,11 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
  * use ->readpage() or do the necessary surgery to decouple ->readpages()
  * from read-ahead.
  */
-int f2fs_mpage_readpages(struct address_space *mapping,
-			struct list_head *pages, struct page *page,
-			unsigned nr_pages, bool is_readahead)
+int f2fs_mpage_readpages(struct inode *inode, struct readahead_control *rac,
+		struct page *page)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
-	struct inode *inode = mapping->host;
 	struct f2fs_map_blocks map;
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 	struct compress_ctx cc = {
@@ -2179,6 +2177,7 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 		.nr_cpages = 0,
 	};
 #endif
+	unsigned nr_pages = rac ? readahead_count(rac) : 1;
 	unsigned max_nr_pages = nr_pages;
 	int ret = 0;
 
@@ -2192,15 +2191,9 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 	map.m_may_create = false;
 
 	for (; nr_pages; nr_pages--) {
-		if (pages) {
-			page = list_last_entry(pages, struct page, lru);
-
+		if (rac) {
+			page = readahead_page(rac);
 			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping,
-						  page_index(page),
-						  readahead_gfp_mask(mapping)))
-				goto next_page;
 		}
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
@@ -2210,7 +2203,7 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 				ret = f2fs_read_multi_pages(&cc, &bio,
 							max_nr_pages,
 							&last_block_in_bio,
-							is_readahead);
+							rac);
 				f2fs_destroy_compress_ctx(&cc);
 				if (ret)
 					goto set_error_page;
@@ -2233,7 +2226,7 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 #endif
 
 		ret = f2fs_read_single_page(inode, page, max_nr_pages, &map,
-					&bio, &last_block_in_bio, is_readahead);
+					&bio, &last_block_in_bio, rac);
 		if (ret) {
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 set_error_page:
@@ -2242,8 +2235,10 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 			zero_user_segment(page, 0, PAGE_SIZE);
 			unlock_page(page);
 		}
+#ifdef CONFIG_F2FS_FS_COMPRESSION
 next_page:
-		if (pages)
+#endif
+		if (rac)
 			put_page(page);
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
@@ -2253,16 +2248,15 @@ int f2fs_mpage_readpages(struct address_space *mapping,
 				ret = f2fs_read_multi_pages(&cc, &bio,
 							max_nr_pages,
 							&last_block_in_bio,
-							is_readahead);
+							rac);
 				f2fs_destroy_compress_ctx(&cc);
 			}
 		}
 #endif
 	}
-	BUG_ON(pages && !list_empty(pages));
 	if (bio)
 		__submit_bio(F2FS_I_SB(inode), bio, DATA);
-	return pages ? 0 : ret;
+	return ret;
 }
 
 static int f2fs_read_data_page(struct file *file, struct page *page)
@@ -2281,28 +2275,24 @@ static int f2fs_read_data_page(struct file *file, struct page *page)
 	if (f2fs_has_inline_data(inode))
 		ret = f2fs_read_inline_data(inode, page);
 	if (ret == -EAGAIN)
-		ret = f2fs_mpage_readpages(page_file_mapping(page),
-						NULL, page, 1, false);
+		ret = f2fs_mpage_readpages(inode, NULL, page);
 	return ret;
 }
 
-static int f2fs_read_data_pages(struct file *file,
-			struct address_space *mapping,
-			struct list_head *pages, unsigned nr_pages)
+static void f2fs_readahead(struct readahead_control *rac)
 {
-	struct inode *inode = mapping->host;
-	struct page *page = list_last_entry(pages, struct page, lru);
+	struct inode *inode = rac->mapping->host;
 
-	trace_f2fs_readpages(inode, page, nr_pages);
+	trace_f2fs_readpages(inode, readahead_index(rac), readahead_count(rac));
 
 	if (!f2fs_is_compress_backend_ready(inode))
-		return 0;
+		return;
 
 	/* If the file has inline data, skip readpages */
 	if (f2fs_has_inline_data(inode))
-		return 0;
+		return;
 
-	return f2fs_mpage_readpages(mapping, pages, NULL, nr_pages, true);
+	f2fs_mpage_readpages(inode, rac, NULL);
 }
 
 int f2fs_encrypt_one_page(struct f2fs_io_info *fio)
@@ -3784,7 +3774,7 @@ static void f2fs_swap_deactivate(struct file *file)
 
 const struct address_space_operations f2fs_dblock_aops = {
 	.readpage	= f2fs_read_data_page,
-	.readpages	= f2fs_read_data_pages,
+	.readahead	= f2fs_readahead,
 	.writepage	= f2fs_write_data_page,
 	.writepages	= f2fs_write_data_pages,
 	.write_begin	= f2fs_write_begin,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 5355be6b6755..b5e72dee8826 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3344,9 +3344,8 @@ int f2fs_reserve_new_block(struct dnode_of_data *dn);
 int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
 int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
 int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
-int f2fs_mpage_readpages(struct address_space *mapping,
-			struct list_head *pages, struct page *page,
-			unsigned nr_pages, bool is_readahead);
+int f2fs_mpage_readpages(struct inode *inode, struct readahead_control *rac,
+		struct page *page);
 struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 			int op_flags, bool for_write);
 struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index);
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 67a97838c2a0..d72da4a33883 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -1375,9 +1375,9 @@ TRACE_EVENT(f2fs_writepages,
 
 TRACE_EVENT(f2fs_readpages,
 
-	TP_PROTO(struct inode *inode, struct page *page, unsigned int nrpage),
+	TP_PROTO(struct inode *inode, pgoff_t start, unsigned int nrpage),
 
-	TP_ARGS(inode, page, nrpage),
+	TP_ARGS(inode, start, nrpage),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
@@ -1389,7 +1389,7 @@ TRACE_EVENT(f2fs_readpages,
 	TP_fast_assign(
 		__entry->dev	= inode->i_sb->s_dev;
 		__entry->ino	= inode->i_ino;
-		__entry->start	= page->index;
+		__entry->start	= start;
 		__entry->nrpage	= nrpage;
 	),
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 20/24] fuse: Convert from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (18 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 19/24] f2fs: Convert from readpages to readahead Matthew Wilcox
@ 2020-02-19 21:00 ` Matthew Wilcox
  2020-02-19 21:01 ` [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor Matthew Wilcox
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:00 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Dave Chinner, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in fuse.  Switching away from the
read_cache_pages() helper gets rid of an implicit call to put_page(),
so we can get rid of the get_page() call in fuse_readpages_fill().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fuse/file.c | 46 +++++++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9d67b830fb7a..5749505bcff6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -923,9 +923,8 @@ struct fuse_fill_data {
 	unsigned int max_pages;
 };
 
-static int fuse_readpages_fill(void *_data, struct page *page)
+static int fuse_readpages_fill(struct fuse_fill_data *data, struct page *page)
 {
-	struct fuse_fill_data *data = _data;
 	struct fuse_io_args *ia = data->ia;
 	struct fuse_args_pages *ap = &ia->ap;
 	struct inode *inode = data->inode;
@@ -941,10 +940,8 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 					fc->max_pages);
 		fuse_send_readpages(ia, data->file);
 		data->ia = ia = fuse_io_alloc(NULL, data->max_pages);
-		if (!ia) {
-			unlock_page(page);
+		if (!ia)
 			return -ENOMEM;
-		}
 		ap = &ia->ap;
 	}
 
@@ -954,7 +951,6 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 		return -EIO;
 	}
 
-	get_page(page);
 	ap->pages[ap->num_pages] = page;
 	ap->descs[ap->num_pages].length = PAGE_SIZE;
 	ap->num_pages++;
@@ -962,37 +958,33 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 	return 0;
 }
 
-static int fuse_readpages(struct file *file, struct address_space *mapping,
-			  struct list_head *pages, unsigned nr_pages)
+static void fuse_readahead(struct readahead_control *rac)
 {
-	struct inode *inode = mapping->host;
+	struct inode *inode = rac->mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_fill_data data;
-	int err;
+	struct page *page;
 
-	err = -EIO;
 	if (is_bad_inode(inode))
-		goto out;
+		return;
 
-	data.file = file;
+	data.file = rac->file;
 	data.inode = inode;
-	data.nr_pages = nr_pages;
-	data.max_pages = min_t(unsigned int, nr_pages, fc->max_pages);
-;
+	data.nr_pages = readahead_count(rac);
+	data.max_pages = min_t(unsigned int, data.nr_pages, fc->max_pages);
 	data.ia = fuse_io_alloc(NULL, data.max_pages);
-	err = -ENOMEM;
 	if (!data.ia)
-		goto out;
+		return;
 
-	err = read_cache_pages(mapping, pages, fuse_readpages_fill, &data);
-	if (!err) {
-		if (data.ia->ap.num_pages)
-			fuse_send_readpages(data.ia, file);
-		else
-			fuse_io_free(data.ia);
+	while ((page = readahead_page(rac))) {
+		if (fuse_readpages_fill(&data, page) != 0)
+			return;
 	}
-out:
-	return err;
+
+	if (data.ia->ap.num_pages)
+		fuse_send_readpages(data.ia, rac->file);
+	else
+		fuse_io_free(data.ia);
 }
 
 static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -3373,10 +3365,10 @@ static const struct file_operations fuse_file_operations = {
 
 static const struct address_space_operations fuse_file_aops  = {
 	.readpage	= fuse_readpage,
+	.readahead	= fuse_readahead,
 	.writepage	= fuse_writepage,
 	.writepages	= fuse_writepages,
 	.launder_page	= fuse_launder_page,
-	.readpages	= fuse_readpages,
 	.set_page_dirty	= __set_page_dirty_nobuffers,
 	.bmap		= fuse_bmap,
 	.direct_IO	= fuse_direct_IO,
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (19 preceding siblings ...)
  2020-02-19 21:00 ` [PATCH v7 20/24] fuse: " Matthew Wilcox
@ 2020-02-19 21:01 ` Matthew Wilcox
  2020-02-20 15:47   ` Christoph Hellwig
  2020-02-22  0:44   ` Darrick J. Wong
  2020-02-19 21:01 ` [PATCH v7 22/24] iomap: Convert from readpages to readahead Matthew Wilcox
                   ` (3 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:01 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By putting the 'have we reached the end of the page' condition at the end
of the loop instead of the beginning, we can remove the 'submit the last
page' code from iomap_readpages().  Also check that iomap_readpage_actor()
didn't return 0, which would lead to an endless loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cb3511eb152a..31899e6cb0f8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -400,15 +400,9 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
-	loff_t done, ret;
-
-	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
-			if (!ctx->cur_page_in_bio)
-				unlock_page(ctx->cur_page);
-			put_page(ctx->cur_page);
-			ctx->cur_page = NULL;
-		}
+	loff_t ret, done = 0;
+
+	while (done < length) {
 		if (!ctx->cur_page) {
 			ctx->cur_page = iomap_next_page(inode, ctx->pages,
 					pos, length, &done);
@@ -418,6 +412,20 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
 				ctx, iomap, srcmap);
+		done += ret;
+
+		/* Keep working on a partial page */
+		if (ret && offset_in_page(pos + done))
+			continue;
+
+		if (!ctx->cur_page_in_bio)
+			unlock_page(ctx->cur_page);
+		put_page(ctx->cur_page);
+		ctx->cur_page = NULL;
+
+		/* Don't loop forever if we made no progress */
+		if (WARN_ON(!ret))
+			break;
 	}
 
 	return done;
@@ -451,11 +459,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 done:
 	if (ctx.bio)
 		submit_bio(ctx.bio);
-	if (ctx.cur_page) {
-		if (!ctx.cur_page_in_bio)
-			unlock_page(ctx.cur_page);
-		put_page(ctx.cur_page);
-	}
+	BUG_ON(ctx.cur_page);
 
 	/*
 	 * Check that we didn't lose a page due to the arcance calling
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (20 preceding siblings ...)
  2020-02-19 21:01 ` [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor Matthew Wilcox
@ 2020-02-19 21:01 ` Matthew Wilcox
  2020-02-20 15:49   ` Christoph Hellwig
  2020-02-22  1:03   ` Darrick J. Wong
  2020-02-19 21:01 ` [PATCH v7 23/24] mm: Document why we don't set PageReadahead Matthew Wilcox
                   ` (2 subsequent siblings)
  24 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:01 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in iomap.  Convert XFS and ZoneFS to
use it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 90 +++++++++++++++---------------------------
 fs/iomap/trace.h       |  2 +-
 fs/xfs/xfs_aops.c      | 13 +++---
 fs/zonefs/super.c      |  7 ++--
 include/linux/iomap.h  |  3 +-
 5 files changed, 41 insertions(+), 74 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 31899e6cb0f8..66cf453f4bb7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -214,9 +214,8 @@ iomap_read_end_io(struct bio *bio)
 struct iomap_readpage_ctx {
 	struct page		*cur_page;
 	bool			cur_page_in_bio;
-	bool			is_readahead;
 	struct bio		*bio;
-	struct list_head	*pages;
+	struct readahead_control *rac;
 };
 
 static void
@@ -307,11 +306,11 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		if (ctx->bio)
 			submit_bio(ctx->bio);
 
-		if (ctx->is_readahead) /* same as readahead_gfp_mask */
+		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
 		ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
 		ctx->bio->bi_opf = REQ_OP_READ;
-		if (ctx->is_readahead)
+		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
 		ctx->bio->bi_iter.bi_sector = sector;
 		bio_set_dev(ctx->bio, iomap->bdev);
@@ -367,36 +366,8 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 }
 EXPORT_SYMBOL_GPL(iomap_readpage);
 
-static struct page *
-iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
-		loff_t length, loff_t *done)
-{
-	while (!list_empty(pages)) {
-		struct page *page = lru_to_page(pages);
-
-		if (page_offset(page) >= (u64)pos + length)
-			break;
-
-		list_del(&page->lru);
-		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
-				GFP_NOFS))
-			return page;
-
-		/*
-		 * If we already have a page in the page cache at index we are
-		 * done.  Upper layers don't care if it is uptodate after the
-		 * readpages call itself as every page gets checked again once
-		 * actually needed.
-		 */
-		*done += PAGE_SIZE;
-		put_page(page);
-	}
-
-	return NULL;
-}
-
 static loff_t
-iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
+iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
@@ -404,10 +375,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 
 	while (done < length) {
 		if (!ctx->cur_page) {
-			ctx->cur_page = iomap_next_page(inode, ctx->pages,
-					pos, length, &done);
-			if (!ctx->cur_page)
-				break;
+			ctx->cur_page = readahead_page(ctx->rac);
 			ctx->cur_page_in_bio = false;
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
@@ -431,44 +399,48 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 	return done;
 }
 
-int
-iomap_readpages(struct address_space *mapping, struct list_head *pages,
-		unsigned nr_pages, const struct iomap_ops *ops)
+/**
+ * iomap_readahead - Attempt to read pages from a file.
+ * @rac: Describes the pages to be read.
+ * @ops: The operations vector for the filesystem.
+ *
+ * This function is for filesystems to call to implement their readahead
+ * address_space operation.
+ *
+ * Context: The file is pinned by the caller, and the pages to be read are
+ * all locked and have an elevated refcount.  This function will unlock
+ * the pages (once I/O has completed on them, or I/O has been determined to
+ * not be necessary).  It will also decrease the refcount once the pages
+ * have been submitted for I/O.  After this point, the page may be removed
+ * from the page cache, and should not be referenced.
+ */
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
+	struct inode *inode = rac->mapping->host;
+	loff_t pos = readahead_pos(rac);
+	loff_t length = readahead_length(rac);
 	struct iomap_readpage_ctx ctx = {
-		.pages		= pages,
-		.is_readahead	= true,
+		.rac	= rac,
 	};
-	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
-	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
-	loff_t length = last - pos + PAGE_SIZE, ret = 0;
 
-	trace_iomap_readpages(mapping->host, nr_pages);
+	trace_iomap_readahead(inode, readahead_count(rac));
 
 	while (length > 0) {
-		ret = iomap_apply(mapping->host, pos, length, 0, ops,
-				&ctx, iomap_readpages_actor);
+		loff_t ret = iomap_apply(inode, pos, length, 0, ops,
+				&ctx, iomap_readahead_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
-			goto done;
+			break;
 		}
 		pos += ret;
 		length -= ret;
 	}
-	ret = 0;
-done:
+
 	if (ctx.bio)
 		submit_bio(ctx.bio);
 	BUG_ON(ctx.cur_page);
-
-	/*
-	 * Check that we didn't lose a page due to the arcance calling
-	 * conventions..
-	 */
-	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
-	return ret;
 }
-EXPORT_SYMBOL_GPL(iomap_readpages);
+EXPORT_SYMBOL_GPL(iomap_readahead);
 
 /*
  * iomap_is_partially_uptodate checks whether blocks within a page are
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 6dc227b8c47e..d6ba705f938a 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -39,7 +39,7 @@ DEFINE_EVENT(iomap_readpage_class, name,	\
 	TP_PROTO(struct inode *inode, int nr_pages), \
 	TP_ARGS(inode, nr_pages))
 DEFINE_READPAGE_EVENT(iomap_readpage);
-DEFINE_READPAGE_EVENT(iomap_readpages);
+DEFINE_READPAGE_EVENT(iomap_readahead);
 
 DECLARE_EVENT_CLASS(iomap_page_class,
 	TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 58e937be24ce..6e68eeb50b07 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -621,14 +621,11 @@ xfs_vm_readpage(
 	return iomap_readpage(page, &xfs_read_iomap_ops);
 }
 
-STATIC int
-xfs_vm_readpages(
-	struct file		*unused,
-	struct address_space	*mapping,
-	struct list_head	*pages,
-	unsigned		nr_pages)
+STATIC void
+xfs_vm_readahead(
+	struct readahead_control	*rac)
 {
-	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
+	iomap_readahead(rac, &xfs_read_iomap_ops);
 }
 
 static int
@@ -644,7 +641,7 @@ xfs_iomap_swapfile_activate(
 
 const struct address_space_operations xfs_address_space_operations = {
 	.readpage		= xfs_vm_readpage,
-	.readpages		= xfs_vm_readpages,
+	.readahead		= xfs_vm_readahead,
 	.writepage		= xfs_vm_writepage,
 	.writepages		= xfs_vm_writepages,
 	.set_page_dirty		= iomap_set_page_dirty,
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..8327a01d3bac 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -78,10 +78,9 @@ static int zonefs_readpage(struct file *unused, struct page *page)
 	return iomap_readpage(page, &zonefs_iomap_ops);
 }
 
-static int zonefs_readpages(struct file *unused, struct address_space *mapping,
-			    struct list_head *pages, unsigned int nr_pages)
+static void zonefs_readahead(struct readahead_control *rac)
 {
-	return iomap_readpages(mapping, pages, nr_pages, &zonefs_iomap_ops);
+	iomap_readahead(rac, &zonefs_iomap_ops);
 }
 
 /*
@@ -128,7 +127,7 @@ static int zonefs_writepages(struct address_space *mapping,
 
 static const struct address_space_operations zonefs_file_aops = {
 	.readpage		= zonefs_readpage,
-	.readpages		= zonefs_readpages,
+	.readahead		= zonefs_readahead,
 	.writepage		= zonefs_writepage,
 	.writepages		= zonefs_writepages,
 	.set_page_dirty		= iomap_set_page_dirty,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b09463dae0d..bc20bd04c2a2 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -155,8 +155,7 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-int iomap_readpages(struct address_space *mapping, struct list_head *pages,
-		unsigned nr_pages, const struct iomap_ops *ops);
+void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 int iomap_set_page_dirty(struct page *page);
 int iomap_is_partially_uptodate(struct page *page, unsigned long from,
 		unsigned long count);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 23/24] mm: Document why we don't set PageReadahead
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (21 preceding siblings ...)
  2020-02-19 21:01 ` [PATCH v7 22/24] iomap: Convert from readpages to readahead Matthew Wilcox
@ 2020-02-19 21:01 ` Matthew Wilcox
  2020-02-19 21:01 ` [PATCH v7 24/24] mm: Use memalloc_nofs_save in readahead path Matthew Wilcox
  2020-02-20 17:54 ` [PATCH v7 00/23] Change readahead API David Sterba
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:01 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the page is already in cache, we don't set PageReadahead on it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 453ef146de83..bbe7208fcc2d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -196,9 +196,12 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 
 		if (page && !xa_is_value(page)) {
 			/*
-			 * Page already present?  Kick off the current batch of
-			 * contiguous pages before continuing with the next
-			 * batch.
+			 * Page already present?  Kick off the current batch
+			 * of contiguous pages before continuing with the
+			 * next batch.  This page may be the one we would
+			 * have intended to mark as Readahead, but we don't
+			 * have a stable reference to this page, and it's
+			 * not worth getting one just for that.
 			 */
 			read_pages(&rac, &page_pool);
 			continue;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v7 24/24] mm: Use memalloc_nofs_save in readahead path
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (22 preceding siblings ...)
  2020-02-19 21:01 ` [PATCH v7 23/24] mm: Document why we don't set PageReadahead Matthew Wilcox
@ 2020-02-19 21:01 ` Matthew Wilcox
  2020-02-20 17:54 ` [PATCH v7 00/23] Change readahead API David Sterba
  24 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-19 21:01 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-xfs, Michal Hocko, linux-kernel, Matthew Wilcox (Oracle),
	linux-f2fs-devel, cluster-devel, linux-mm, ocfs2-devel,
	Cong Wang, linux-ext4, linux-erofs, linux-btrfs

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Ensure that memory allocations in the readahead path do not attempt to
reclaim file-backed pages, which could lead to a deadlock.  It is
possible, though unlikely, that this is the root cause of a problem
observed by Cong Wang.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
---
 mm/readahead.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/readahead.c b/mm/readahead.c
index bbe7208fcc2d..9fb5f77dcf69 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -22,6 +22,7 @@
 #include <linux/mm_inline.h>
 #include <linux/blk-cgroup.h>
 #include <linux/fadvise.h>
+#include <linux/sched/mm.h>
 
 #include "internal.h"
 
@@ -186,6 +187,18 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	};
 	unsigned long i;
 
+	/*
+	 * Partway through the readahead operation, we will have added
+	 * locked pages to the page cache, but will not yet have submitted
+	 * them for I/O.  Adding another page may need to allocate memory,
+	 * which can trigger memory reclaim.  Telling the VM we're in
+	 * the middle of a filesystem operation will cause it to not
+	 * touch file-backed pages, preventing a deadlock.  Most (all?)
+	 * filesystems already specify __GFP_NOFS in their mapping's
+	 * gfp_mask, but let's be explicit here.
+	 */
+	unsigned int nofs = memalloc_nofs_save();
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
@@ -230,6 +243,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	 * will then handle the error.
 	 */
 	read_pages(&rac, &page_pool);
+	memalloc_nofs_restore(nofs);
 }
 EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 76+ messages in thread
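
The memalloc_nofs_save()/memalloc_nofs_restore() pair added above is the
generic scoped-allocation API from <linux/sched/mm.h>.  A minimal sketch
of the pattern, with the allocation in the middle being illustrative:

	unsigned int nofs = memalloc_nofs_save();

	/*
	 * Every allocation in this window implicitly behaves as if
	 * GFP_NOFS were set, so reclaim cannot recurse into the
	 * filesystem while we hold locked, not-yet-submitted pages.
	 */
	page = __page_cache_alloc(readahead_gfp_mask(mapping));

	memalloc_nofs_restore(nofs);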

* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-19 21:00 ` [PATCH v7 14/24] btrfs: Convert from readpages to readahead Matthew Wilcox
@ 2020-02-20  9:42   ` Johannes Thumshirn
  2020-02-20 13:48     ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Johannes Thumshirn @ 2020-02-20  9:42 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 19/02/2020 22:03, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Use the new readahead operation in btrfs.  Add a
> readahead_for_each_batch() iterator to optimise the loop in the XArray.


OK I must admit I haven't followed this series closely, but what 
happened to said readahead_for_each_batch()?

As far as I can see it's now:

[...]
> +	while ((nr = readahead_page_batch(rac, pagepool))) {



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-20  9:42   ` Johannes Thumshirn
@ 2020-02-20 13:48     ` Matthew Wilcox
  2020-02-20 15:46       ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 13:48 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 09:42:19AM +0000, Johannes Thumshirn wrote:
> On 19/02/2020 22:03, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Use the new readahead operation in btrfs.  Add a
> > readahead_for_each_batch() iterator to optimise the loop in the XArray.
> 
> OK I must admit I haven't followed this series closely, but what 
> happened to said readahead_for_each_batch()?
> 
> As far as I can see it's now:
> 
> [...]
> > +	while ((nr = readahead_page_batch(rac, pagepool))) {

Oops, forgot to update the changelog there.  Yes, that's exactly what it
changed to.  That discussion was here:

https://lore.kernel.org/linux-fsdevel/20200219144117.GP24185@bombadil.infradead.org/

... and then Christoph pointed out the iterators weren't really adding
much value at that point, so they got deleted.  New changelog for
this patch:

btrfs: Convert from readpages to readahead
  
Implement the new readahead method in btrfs.  Add a readahead_page_batch()
to optimise fetching a batch of pages at once.


^ permalink raw reply	[flat|nested] 76+ messages in thread
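
For reference, the batch interface quoted above expands to a loop of
roughly this shape.  The pool size and the per-page work are
illustrative; the while condition is verbatim from the btrfs patch, and
the put_page() follows the documented readahead contract of dropping the
refcount once I/O has been started:

	struct page *pagepool[16];	/* on-stack batch buffer */
	unsigned int nr;

	while ((nr = readahead_page_batch(rac, pagepool))) {
		unsigned int i;

		for (i = 0; i < nr; i++) {
			struct page *page = pagepool[i];

			/* ... start read I/O on page ... */
			put_page(page);
		}
	}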

* Re: [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
  2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
@ 2020-02-20 14:36   ` Zi Yan
  2020-02-21  4:24   ` John Hubbard
  2020-02-24 21:34   ` Christoph Hellwig
  2 siblings, 0 replies; 76+ messages in thread
From: Zi Yan @ 2020-02-20 14:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs


On 19 Feb 2020, at 16:00, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Simplify the callers by moving the check for nr_pages and the BUG_ON
> into read_pages().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 61b15b6b9e72..9fcd4e32b62d 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -119,6 +119,9 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>  	struct blk_plug plug;
>  	unsigned page_idx;
>
> +	if (!nr_pages)
> +		return;
> +
>  	blk_start_plug(&plug);
>
>  	if (mapping->a_ops->readpages) {
> @@ -138,6 +141,8 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>
>  out:
>  	blk_finish_plug(&plug);
> +
> +	BUG_ON(!list_empty(pages));
>  }
>
>  /*
> @@ -180,8 +185,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  			 * contiguous pages before continuing with the next
>  			 * batch.
>  			 */
> -			if (nr_pages)
> -				read_pages(mapping, filp, &page_pool, nr_pages,
> +			read_pages(mapping, filp, &page_pool, nr_pages,
>  						gfp_mask);
>  			nr_pages = 0;
>  			continue;
> @@ -202,9 +206,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	 * uptodate then the caller will launch readpage again, and
>  	 * will then handle the error.
>  	 */
> -	if (nr_pages)
> -		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> -	BUG_ON(!list_empty(&page_pool));
> +	read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
>  }
>
>  /*
> -- 
> 2.25.0

Looks good to me. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>


--
Best Regards,
Yan Zi


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 10/24] mm: Add readahead address space operation
  2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
@ 2020-02-20 15:00   ` Zi Yan
  2020-02-20 15:10     ` Matthew Wilcox
  2020-02-21  4:30   ` John Hubbard
  2020-02-24 21:41   ` Christoph Hellwig
  2 siblings, 1 reply; 76+ messages in thread
From: Zi Yan @ 2020-02-20 15:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs


On 19 Feb 2020, at 16:00, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> This replaces ->readpages with a saner interface:
>  - Return void instead of an ignored error code.
>  - Page cache is already populated with locked pages when ->readahead
>    is called.
>  - New arguments can be passed to the implementation without changing
>    all the filesystems that use a common helper function like
>    mpage_readahead().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  6 +++++-
>  Documentation/filesystems/vfs.rst     | 15 +++++++++++++++
>  include/linux/fs.h                    |  2 ++
>  include/linux/pagemap.h               | 18 ++++++++++++++++++
>  mm/readahead.c                        | 12 ++++++++++--
>  5 files changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 5057e4d9dcd1..0af2e0e11461 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -239,6 +239,7 @@ prototypes::
>  	int (*readpage)(struct file *, struct page *);
>  	int (*writepages)(struct address_space *, struct writeback_control *);
>  	int (*set_page_dirty)(struct page *page);
> +	void (*readahead)(struct readahead_control *);
>  	int (*readpages)(struct file *filp, struct address_space *mapping,
>  			struct list_head *pages, unsigned nr_pages);
>  	int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -271,7 +272,8 @@ writepage:		yes, unlocks (see below)
>  readpage:		yes, unlocks
>  writepages:
>  set_page_dirty		no
> -readpages:
> +readahead:		yes, unlocks
> +readpages:		no
>  write_begin:		locks the page		 exclusive
>  write_end:		yes, unlocks		 exclusive
>  bmap:
> @@ -295,6 +297,8 @@ the request handler (/dev/loop).
>  ->readpage() unlocks the page, either synchronously or via I/O
>  completion.
>
> +->readahead() unlocks the pages that I/O is attempted on like ->readpage().
> +
>  ->readpages() populates the pagecache with the passed pages and starts
>  I/O against them.  They come unlocked upon I/O completion.
>
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 7d4d09dd5e6d..ed17771c212b 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -706,6 +706,7 @@ cache in your filesystem.  The following members are defined:
>  		int (*readpage)(struct file *, struct page *);
>  		int (*writepages)(struct address_space *, struct writeback_control *);
>  		int (*set_page_dirty)(struct page *page);
> +		void (*readahead)(struct readahead_control *);
>  		int (*readpages)(struct file *filp, struct address_space *mapping,
>  				 struct list_head *pages, unsigned nr_pages);
>  		int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -781,12 +782,26 @@ cache in your filesystem.  The following members are defined:
>  	If defined, it should set the PageDirty flag, and the
>  	PAGECACHE_TAG_DIRTY tag in the radix tree.
>
> +``readahead``
> +	Called by the VM to read pages associated with the address_space
> +	object.  The pages are consecutive in the page cache and are
> +	locked.  The implementation should decrement the page refcount
> +	after starting I/O on each page.  Usually the page will be
> +	unlocked by the I/O completion handler.  If the filesystem decides
> +	to stop attempting I/O before reaching the end of the readahead
> +	window, it can simply return.  The caller will decrement the page
> +	refcount and unlock the remaining pages for you.  Set PageUptodate
> +	if the I/O completes successfully.  Setting PageError on any page
> +	will be ignored; simply unlock the page if an I/O error occurs.
> +
>  ``readpages``
>  	called by the VM to read pages associated with the address_space
>  	object.  This is essentially just a vector version of readpage.
>  	Instead of just one page, several pages are requested.
>  	readpages is only used for read-ahead, so read errors are
>  	ignored.  If anything goes wrong, feel free to give up.
> +	This interface is deprecated and will be removed by the end of
> +	2020; implement readahead instead.
>
>  ``write_begin``
>  	Called by the generic buffered write code to ask the filesystem
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..d4e2d2964346 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -292,6 +292,7 @@ enum positive_aop_returns {
>  struct page;
>  struct address_space;
>  struct writeback_control;
> +struct readahead_control;
>
>  /*
>   * Write life time hint values.
> @@ -375,6 +376,7 @@ struct address_space_operations {
>  	 */
>  	int (*readpages)(struct file *filp, struct address_space *mapping,
>  			struct list_head *pages, unsigned nr_pages);
> +	void (*readahead)(struct readahead_control *);
>
>  	int (*write_begin)(struct file *, struct address_space *mapping,
>  				loff_t pos, unsigned len, unsigned flags,
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4989d330fada..b3008605fd1b 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -669,6 +669,24 @@ static inline struct page *readahead_page(struct readahead_control *rac)
>  	return page;
>  }
>
> +/* The byte offset into the file of this readahead block */
> +static inline loff_t readahead_pos(struct readahead_control *rac)
> +{
> +	return (loff_t)rac->_index * PAGE_SIZE;
> +}
> +
> +/* The number of bytes in this readahead block */
> +static inline loff_t readahead_length(struct readahead_control *rac)
> +{
> +	return (loff_t)rac->_nr_pages * PAGE_SIZE;
> +}
> +
> +/* The index of the first page in this readahead block */
> +static inline unsigned int readahead_index(struct readahead_control *rac)
> +{
> +	return rac->_index;
> +}

rac->_index is pgoff_t, so readahead_index() should return the same type, right?
BTW, pgoff_t is unsigned long.

> +
>  /* The number of pages in this readahead block */
>  static inline unsigned int readahead_count(struct readahead_control *rac)
>  {
> diff --git a/mm/readahead.c b/mm/readahead.c
> index aaa209559ba2..07cdfbf00f4b 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -124,7 +124,14 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
>
>  	blk_start_plug(&plug);
>
> -	if (aops->readpages) {
> +	if (aops->readahead) {
> +		aops->readahead(rac);
> +		/* Clean up the remaining pages */
> +		while ((page = readahead_page(rac))) {
> +			unlock_page(page);
> +			put_page(page);
> +		}
> +	} else if (aops->readpages) {
>  		aops->readpages(rac->file, rac->mapping, pages,
>  				readahead_count(rac));
>  		/* Clean up the remaining pages */
> @@ -234,7 +241,8 @@ void force_page_cache_readahead(struct address_space *mapping,
>  	struct file_ra_state *ra = &filp->f_ra;
>  	unsigned long max_pages;
>
> -	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
> +	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
> +			!mapping->a_ops->readahead))
>  		return;
>
>  	/*
> -- 
> 2.25.0


--
Best Regards,
Yan Zi


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 10/24] mm: Add readahead address space operation
  2020-02-20 15:00   ` Zi Yan
@ 2020-02-20 15:10     ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 15:10 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 10:00:30AM -0500, Zi Yan wrote:
> > +/* The index of the first page in this readahead block */
> > +static inline unsigned int readahead_index(struct readahead_control *rac)
> > +{
> > +	return rac->_index;
> > +}
> 
> rac->_index is pgoff_t, so readahead_index() should return the same type, right?
> BTW, pgoff_t is unsigned long.

Oh my goodness!  Thank you for spotting that.  Fortunately, it's only
currently used by tracepoints, so it wasn't causing any trouble, but
that's a nasty landmine to leave lying around.  Fixed:

static inline pgoff_t readahead_index(struct readahead_control *rac)


^ permalink raw reply	[flat|nested] 76+ messages in thread
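
Assembling the two quoted fragments, the corrected helper reads:

/* The index of the first page in this readahead block */
static inline pgoff_t readahead_index(struct readahead_control *rac)
{
	return rac->_index;
}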

* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-20 13:48     ` Matthew Wilcox
@ 2020-02-20 15:46       ` Christoph Hellwig
  2020-02-20 15:54         ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-20 15:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, Johannes Thumshirn, linux-kernel, linux-f2fs-devel,
	cluster-devel, linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4,
	linux-erofs, linux-btrfs

On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> btrfs: Convert from readpages to readahead
>   
> Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> to optimise fetching a batch of pages at once.

Shouldn't this readahead_page_batch helper go into a separate patch so
that it clearly stands out?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-19 21:01 ` [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor Matthew Wilcox
@ 2020-02-20 15:47   ` Christoph Hellwig
  2020-02-20 16:24     ` Matthew Wilcox
  2020-02-22  0:44   ` Darrick J. Wong
  1 sibling, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-20 15:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> By putting the 'have we reached the end of the page' condition at the end
> of the loop instead of the beginning, we can remove the 'submit the last
> page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> didn't return 0, which would lead to an endless loop.

I'm obviously biased, as I wrote the original code, but I find the new
version very much harder to understand (not that the previous one was
easy; this is tricky code...).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-19 21:01 ` [PATCH v7 22/24] iomap: Convert from readpages to readahead Matthew Wilcox
@ 2020-02-20 15:49   ` Christoph Hellwig
  2020-02-20 16:57     ` Matthew Wilcox
  2020-02-22  1:03   ` Darrick J. Wong
  1 sibling, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-20 15:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:01:01PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Use the new readahead operation in iomap.  Convert XFS and ZoneFS to
> use it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 90 +++++++++++++++---------------------------
>  fs/iomap/trace.h       |  2 +-
>  fs/xfs/xfs_aops.c      | 13 +++---
>  fs/zonefs/super.c      |  7 ++--
>  include/linux/iomap.h  |  3 +-
>  5 files changed, 41 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 31899e6cb0f8..66cf453f4bb7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -214,9 +214,8 @@ iomap_read_end_io(struct bio *bio)
>  struct iomap_readpage_ctx {
>  	struct page		*cur_page;
>  	bool			cur_page_in_bio;
> -	bool			is_readahead;
>  	struct bio		*bio;
> -	struct list_head	*pages;
> +	struct readahead_control *rac;
>  };
>  
>  static void
> @@ -307,11 +306,11 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		if (ctx->bio)
>  			submit_bio(ctx->bio);
>  
> -		if (ctx->is_readahead) /* same as readahead_gfp_mask */
> +		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
>  		ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
>  		ctx->bio->bi_opf = REQ_OP_READ;
> -		if (ctx->is_readahead)
> +		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
>  		bio_set_dev(ctx->bio, iomap->bdev);
> @@ -367,36 +366,8 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpage);
>  
> -static struct page *
> -iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> -		loff_t length, loff_t *done)
> -{
> -	while (!list_empty(pages)) {
> -		struct page *page = lru_to_page(pages);
> -
> -		if (page_offset(page) >= (u64)pos + length)
> -			break;
> -
> -		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
> -				GFP_NOFS))
> -			return page;
> -
> -		/*
> -		 * If we already have a page in the page cache at index we are
> -		 * done.  Upper layers don't care if it is uptodate after the
> -		 * readpages call itself as every page gets checked again once
> -		 * actually needed.
> -		 */
> -		*done += PAGE_SIZE;
> -		put_page(page);
> -	}
> -
> -	return NULL;
> -}
> -
>  static loff_t
> -iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> +iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
>  		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
> @@ -404,10 +375,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  
>  	while (done < length) {
>  		if (!ctx->cur_page) {
> -			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> -					pos, length, &done);
> -			if (!ctx->cur_page)
> -				break;
> +			ctx->cur_page = readahead_page(ctx->rac);
>  			ctx->cur_page_in_bio = false;
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> @@ -431,44 +399,48 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  	return done;
>  }
>  
> -int
> -iomap_readpages(struct address_space *mapping, struct list_head *pages,
> -		unsigned nr_pages, const struct iomap_ops *ops)
> +/**
> + * iomap_readahead - Attempt to read pages from a file.
> + * @rac: Describes the pages to be read.
> + * @ops: The operations vector for the filesystem.
> + *
> + * This function is for filesystems to call to implement their readahead
> + * address_space operation.
> + *
> + * Context: The file is pinned by the caller, and the pages to be read are
> + * all locked and have an elevated refcount.  This function will unlock
> + * the pages (once I/O has completed on them, or I/O has been determined to
> + * not be necessary).  It will also decrease the refcount once the pages
> + * have been submitted for I/O.  After this point, the page may be removed
> + * from the page cache, and should not be referenced.
> + */

Isn't the context documentation something that belongs in the aop
documentation?  I've never really seen the value of duplicating this
information in method instances, as it is just bound to go out of date
sooner rather than later.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-20 15:46       ` Christoph Hellwig
@ 2020-02-20 15:54         ` Matthew Wilcox
  2020-02-20 15:57           ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Johannes Thumshirn, linux-kernel, linux-f2fs-devel,
	cluster-devel, linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4,
	linux-erofs, linux-btrfs

On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > btrfs: Convert from readpages to readahead
> >   
> > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > to optimise fetching a batch of pages at once.
> 
> Shouldn't this readahead_page_batch helper go into a separate patch so
> that it clearly stands out?

I'll move it into 'Put readahead pages in cache earlier' for v8 (the
same patch where we add readahead_page())

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-20 15:54         ` Matthew Wilcox
@ 2020-02-20 15:57           ` Christoph Hellwig
  2020-02-24 21:43             ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-20 15:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, Johannes Thumshirn, linux-kernel,
	linux-f2fs-devel, Christoph Hellwig, linux-mm, ocfs2-devel,
	linux-fsdevel, linux-ext4, linux-erofs, linux-xfs, linux-btrfs

On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > btrfs: Convert from readpages to readahead
> > >   
> > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > to optimise fetching a batch of pages at once.
> > 
> > Shouldn't this readahead_page_batch helper go into a separate patch so
> > that it clearly stands out?
> 
> I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> same patch where we add readahead_page())

One argument for keeping it in a patch of its own is that btrfs appears
to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
so it might go away pretty soon and we could just revert the commit.

But this starts to get into really minor details, so I'll shut up now :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-20 15:47   ` Christoph Hellwig
@ 2020-02-20 16:24     ` Matthew Wilcox
  2020-02-24 22:17       ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 16:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 07:47:41AM -0800, Christoph Hellwig wrote:
> On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > By putting the 'have we reached the end of the page' condition at the end
> > of the loop instead of the beginning, we can remove the 'submit the last
> > page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> > didn't return 0, which would lead to an endless loop.
> 
> I'm obviously biased, as I wrote the original code, but I find the new
> one very much harder to understand (not that the previous one was easy;
> this is tricky code...).

Agreed, I found the original code hard to understand.  I think this is
easier because now cur_page doesn't leak outside this loop, so it has
an obvious lifecycle.

I'm kind of optimistic for Dave Howells' iov_iter addition of an
ITER_MAPPING.  That might simplify all of this code.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-20 15:49   ` Christoph Hellwig
@ 2020-02-20 16:57     ` Matthew Wilcox
  2020-02-22  1:00       ` Darrick J. Wong
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 16:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 07:49:12AM -0800, Christoph Hellwig wrote:
> > +/**
> > + * iomap_readahead - Attempt to read pages from a file.
> > + * @rac: Describes the pages to be read.
> > + * @ops: The operations vector for the filesystem.
> > + *
> > + * This function is for filesystems to call to implement their readahead
> > + * address_space operation.
> > + *
> > + * Context: The file is pinned by the caller, and the pages to be read are
> > + * all locked and have an elevated refcount.  This function will unlock
> > + * the pages (once I/O has completed on them, or I/O has been determined to
> > + * not be necessary).  It will also decrease the refcount once the pages
> > + * have been submitted for I/O.  After this point, the page may be removed
> > + * from the page cache, and should not be referenced.
> > + */
> 
> Isn't the context documentation something that belongs in the aop
> documentation?  I've never really seen the value of duplicating this
> information in method instances, as it is just bound to go out of date
> sooner rather than later.

I'm in two minds about it as well.  There's definitely no value in
providing kernel-doc for implementations of a common interface ... so
rather than fixing the nilfs2 kernel-doc, I just deleted it.  But this
isn't just the implementation, like nilfs2_readahead() is, it's a library
function for filesystems to call, so it deserves documentation.  On the
other hand, there's no real thought to this on the part of the filesystem;
the implementation just calls this with the appropriate ops pointer.
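
(Concretely, the instance for a hypothetical filesystem is a one-liner --
made-up names, but this is the whole of it:

	static void myfs_readahead(struct readahead_control *rac)
	{
		iomap_readahead(rac, &myfs_iomap_ops);
	}

which leaves nothing instance-specific to document.)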

Then again, I kind of feel like we need more documentation of iomap to
help filesystems convert to using it.  But maybe kernel-doc isn't the
mechanism to provide that.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 00/23] Change readahead API
  2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
                   ` (23 preceding siblings ...)
  2020-02-19 21:01 ` [PATCH v7 24/24] mm: Use memalloc_nofs_save in readahead path Matthew Wilcox
@ 2020-02-20 17:54 ` David Sterba
  2020-02-20 22:39   ` Matthew Wilcox
  24 siblings, 1 reply; 76+ messages in thread
From: David Sterba @ 2020-02-20 17:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:00:39PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> This series adds a readahead address_space operation to eventually
> replace the readpages operation.  The key difference is that
> pages are added to the page cache as they are allocated (and
> then looked up by the filesystem) instead of passing them on a
> list to the readpages operation and having the filesystem add
> them to the page cache.  It's a net reduction in code for each
> implementation, more efficient than walking a list, and solves
> the direct-write vs buffered-read problem reported by yu kuai at
> https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> The only unconverted filesystems are those which use fscache.
> Their conversion is pending Dave Howells' rewrite which will make the
> conversion substantially easier.
> 
> I want to thank the reviewers; Dave Chinner, John Hubbard and Christoph
> Hellwig have done a marvellous job of providing constructive criticism.
> Eric Biggers pointed out how I'd broken ext4 (which led to a substantial
> change).  I've tried to take it all on board, but I may have missed
> something simply because you've done such a thorough job.
> 
> This series can also be found at
> http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/tags/readahead_v7
> (I also pushed the readahead_v6 tag there in case anyone wants to diff, and
> they're both based on 5.6-rc2 so they're easy to diff)
> 
> v7:
>  - Now passes an xfstests run on ext4!

On btrfs it still chokes on the first test, btrfs/001, with the following
warning; the test gets stuck there.

[   21.100922] WARNING: suspicious RCU usage
[   21.103107] 5.6.0-rc2-default+ #996 Not tainted
[   21.105133] -----------------------------
[   21.106864] include/linux/xarray.h:1164 suspicious rcu_dereference_check() usage!
[   21.109948]
[   21.109948] other info that might help us debug this:
[   21.109948]
[   21.113373]
[   21.113373] rcu_scheduler_active = 2, debug_locks = 1
[   21.115801] 4 locks held by umount/793:
[   21.117135]  #0: ffff964a736890e8 (&type->s_umount_key#26){+.+.}, at: deactivate_super+0x2f/0x40
[   21.120188]  #1: ffff964a7347ba68 (&delayed_node->mutex){+.+.}, at: __btrfs_commit_inode_delayed_items+0x44c/0x4e0 [btrfs]
[   21.123042]  #2: ffff964a612fe5c8 (&space_info->groups_sem){++++}, at: find_free_extent+0x27d/0xf00 [btrfs]
[   21.126068]  #3: ffff964a60b93280 (&caching_ctl->mutex){+.+.}, at: btrfs_cache_block_group+0x1f0/0x500 [btrfs]
[   21.129655]
[   21.129655] stack backtrace:
[   21.131943] CPU: 1 PID: 793 Comm: umount Not tainted 5.6.0-rc2-default+ #996
[   21.134164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[   21.138076] Call Trace:
[   21.139441]  dump_stack+0x71/0xa0
[   21.140954]  xas_start+0x1a4/0x240
[   21.142473]  xas_load+0xa/0x50
[   21.143874]  xas_find+0x226/0x280
[   21.145298]  extent_readahead+0xcb/0x4f0 [btrfs]
[   21.146934]  ? mem_cgroup_commit_charge+0x56/0x400
[   21.148654]  ? rcu_read_lock_sched_held+0x5d/0x90
[   21.150382]  ? __add_to_page_cache_locked+0x327/0x380
[   21.152155]  read_pages+0x80/0x1f0
[   21.153531]  page_cache_readahead_unbounded+0x1b7/0x210
[   21.155196]  __load_free_space_cache+0x1c1/0x730 [btrfs]
[   21.157014]  load_free_space_cache+0xb9/0x190 [btrfs]
[   21.158222]  btrfs_cache_block_group+0x1f8/0x500 [btrfs]
[   21.159717]  ? finish_wait+0x90/0x90
[   21.160723]  find_free_extent+0xa17/0xf00 [btrfs]
[   21.161798]  ? kvm_sched_clock_read+0x14/0x30
[   21.163022]  ? sched_clock_cpu+0x10/0x120
[   21.164361]  btrfs_reserve_extent+0x9b/0x180 [btrfs]
[   21.165952]  btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
[   21.167680]  ? __lock_acquire+0x272/0x1320
[   21.169353]  alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
[   21.171313]  __btrfs_cow_block+0x143/0x7a0 [btrfs]
[   21.173080]  btrfs_cow_block+0x15f/0x310 [btrfs]
[   21.174487]  btrfs_search_slot+0x93b/0xf70 [btrfs]
[   21.175940]  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
[   21.177419]  ? __btrfs_commit_inode_delayed_items+0x417/0x4e0 [btrfs]
[   21.179032]  ? __btrfs_commit_inode_delayed_items+0x44c/0x4e0 [btrfs]
[   21.180787]  __btrfs_update_delayed_inode+0x73/0x260 [btrfs]
[   21.182174]  __btrfs_commit_inode_delayed_items+0x46c/0x4e0 [btrfs]
[   21.183907]  ? btrfs_first_delayed_node+0x4c/0x90 [btrfs]
[   21.185204]  __btrfs_run_delayed_items+0x8e/0x140 [btrfs]
[   21.186521]  btrfs_commit_transaction+0x312/0xae0 [btrfs]
[   21.188142]  ? btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
[   21.189684]  sync_filesystem+0x6e/0x90
[   21.190878]  generic_shutdown_super+0x22/0x100
[   21.192693]  kill_anon_super+0x14/0x30
[   21.194389]  btrfs_kill_super+0x12/0x20 [btrfs]
[   21.196078]  deactivate_locked_super+0x2c/0x70
[   21.197732]  cleanup_mnt+0x100/0x160
[   21.199033]  task_work_run+0x90/0xc0
[   21.200331]  exit_to_usermode_loop+0x96/0xa0
[   21.201744]  do_syscall_64+0x1df/0x210
[   21.203187]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 00/23] Change readahead API
  2020-02-20 17:54 ` [PATCH v7 00/23] Change readahead API David Sterba
@ 2020-02-20 22:39   ` Matthew Wilcox
  2020-02-21 11:59     ` David Sterba
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-20 22:39 UTC (permalink / raw)
  To: dsterba, linux-fsdevel, linux-mm, linux-kernel, linux-btrfs,
	linux-erofs, linux-ext4, linux-f2fs-devel, cluster-devel,
	ocfs2-devel, linux-xfs

On Thu, Feb 20, 2020 at 06:54:00PM +0100, David Sterba wrote:
> On Wed, Feb 19, 2020 at 01:00:39PM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> > 
> > The only unconverted filesystems are those which use fscache.
> > Their conversion is pending Dave Howells' rewrite which will make the
> > conversion substantially easier.
> > 
> > I want to thank the reviewers; Dave Chinner, John Hubbard and Christoph
> > Hellwig have done a marvellous job of providing constructive criticism.
> > Eric Biggers pointed out how I'd broken ext4 (which led to a substantial
> > change).  I've tried to take it all on board, but I may have missed
> > something simply because you've done such a thorough job.
> > 
> > This series can also be found at
> > http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/tags/readahead_v7
> > (I also pushed the readahead_v6 tag there in case anyone wants to diff, and
> > they're both based on 5.6-rc2 so they're easy to diff)
> > 
> > v7:
> >  - Now passes an xfstests run on ext4!
> 
> On btrfs it still chokes on the first test, btrfs/001, with the following
> warning; the test gets stuck there.

Thanks.  The warning actually wasn't the problem, but it did need to
be addressed.  I got a test system up & running with btrfs, and it's
currently on generic/027 with the following patch:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9c782c15f7f7..d23a224d2ad2 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -676,7 +676,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 		struct page **array, unsigned int array_sz)
 {
 	unsigned int i = 0;
-	XA_STATE(xas, &rac->mapping->i_pages, rac->_index);
+	XA_STATE(xas, &rac->mapping->i_pages, 0);
 	struct page *page;
 
 	BUG_ON(rac->_batch_count > rac->_nr_pages);
@@ -684,6 +684,8 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 	rac->_index += rac->_batch_count;
 	rac->_batch_count = 0;
 
+	xas_set(&xas, rac->_index);
+	rcu_read_lock();
 	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageTail(page), page);
@@ -702,6 +704,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 		if (i == array_sz)
 			break;
 	}
+	rcu_read_unlock();
 
 	return i;
 }
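
For anyone following along, callers are expected to drain the request in
batches, roughly like this (a sketch with a made-up submission helper,
assuming readahead_page_batch() wraps __readahead_batch() with
ARRAY_SIZE as elsewhere in the series):

	struct page *pages[16];
	unsigned int i, nr;

	while ((nr = readahead_page_batch(rac, pages))) {
		for (i = 0; i < nr; i++)
			submit_page(pages[i]);	/* hypothetical */
	}

With the fix above, the xarray walk only runs under rcu_read_lock()
while filling each batch, and the XA_STATE is re-seeded from rac->_index
each time around instead of being initialised once.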

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index'
  2020-02-19 21:00 ` [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index' Matthew Wilcox
@ 2020-02-21  2:21   ` John Hubbard
  2020-02-21  3:27   ` John Hubbard
  1 sibling, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  2:21 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> The word 'offset' is used ambiguously to mean 'byte offset within
> a page', 'byte offset from the start of the file' and 'page offset
> from the start of the file'.  Use 'index' to mean 'page offset
> from the start of the file' throughout the readahead code.


And: use 'count' to mean 'number of pages', since the patch also changes
req_size to req_count.

I'm casting about for a nice place in the code to put your above note...
and there isn't one.  But would it be terrible to just put a short comment
block at the top of this file, just so we have it somewhere?  It's pretty
cool to settle on consistent terminology, so backing it up with "yes, we
actually mean it" documentation would be even better.

One minor note below, but no bugs spotted, and this clarifies the code a bit, so:

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 86 ++++++++++++++++++++++++--------------------------
>  1 file changed, 42 insertions(+), 44 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 6a9d99229bd6..096cf9020648 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -156,7 +156,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>   * We really don't want to intermingle reads and writes like that.
>   */
>  void __do_page_cache_readahead(struct address_space *mapping,
> -		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
> +		struct file *filp, pgoff_t index, unsigned long nr_to_read,
>  		unsigned long lookahead_size)
>  {
>  	struct inode *inode = mapping->host;
> @@ -181,7 +181,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	 * Preallocate as many pages as we will need.
>  	 */
>  	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
> -		pgoff_t page_offset = offset + page_idx;
> +		pgoff_t page_offset = index + page_idx;


This line seems to unrepentantly mix offset and index, still. :)


>  
>  		if (page_offset > end_index)
>  			break;
> @@ -220,7 +220,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>   * memory at once.
>   */
>  void force_page_cache_readahead(struct address_space *mapping,
> -		struct file *filp, pgoff_t offset, unsigned long nr_to_read)
> +		struct file *filp, pgoff_t index, unsigned long nr_to_read)
>  {
>  	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
>  	struct file_ra_state *ra = &filp->f_ra;
> @@ -240,9 +240,9 @@ void force_page_cache_readahead(struct address_space *mapping,
>  
>  		if (this_chunk > nr_to_read)
>  			this_chunk = nr_to_read;
> -		__do_page_cache_readahead(mapping, filp, offset, this_chunk, 0);
> +		__do_page_cache_readahead(mapping, filp, index, this_chunk, 0);
>  
> -		offset += this_chunk;
> +		index += this_chunk;
>  		nr_to_read -= this_chunk;
>  	}
>  }
> @@ -323,21 +323,21 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
>   */
>  
>  /*
> - * Count contiguously cached pages from @offset-1 to @offset-@max,
> + * Count contiguously cached pages from @index-1 to @index-@max,
>   * this count is a conservative estimation of
>   * 	- length of the sequential read sequence, or
>   * 	- thrashing threshold in memory tight systems
>   */
>  static pgoff_t count_history_pages(struct address_space *mapping,
> -				   pgoff_t offset, unsigned long max)
> +				   pgoff_t index, unsigned long max)
>  {
>  	pgoff_t head;
>  
>  	rcu_read_lock();
> -	head = page_cache_prev_miss(mapping, offset - 1, max);
> +	head = page_cache_prev_miss(mapping, index - 1, max);
>  	rcu_read_unlock();
>  
> -	return offset - 1 - head;
> +	return index - 1 - head;
>  }
>  
>  /*
> @@ -345,13 +345,13 @@ static pgoff_t count_history_pages(struct address_space *mapping,
>   */
>  static int try_context_readahead(struct address_space *mapping,
>  				 struct file_ra_state *ra,
> -				 pgoff_t offset,
> +				 pgoff_t index,
>  				 unsigned long req_size,
>  				 unsigned long max)
>  {
>  	pgoff_t size;
>  
> -	size = count_history_pages(mapping, offset, max);
> +	size = count_history_pages(mapping, index, max);
>  
>  	/*
>  	 * not enough history pages:
> @@ -364,10 +364,10 @@ static int try_context_readahead(struct address_space *mapping,
>  	 * starts from beginning of file:
>  	 * it is a strong indication of long-run stream (or whole-file-read)
>  	 */
> -	if (size >= offset)
> +	if (size >= index)
>  		size *= 2;
>  
> -	ra->start = offset;
> +	ra->start = index;
>  	ra->size = min(size + req_size, max);
>  	ra->async_size = 1;
>  
> @@ -379,13 +379,13 @@ static int try_context_readahead(struct address_space *mapping,
>   */
>  static void ondemand_readahead(struct address_space *mapping,
>  		struct file_ra_state *ra, struct file *filp,
> -		bool hit_readahead_marker, pgoff_t offset,
> +		bool hit_readahead_marker, pgoff_t index,
>  		unsigned long req_size)
>  {
>  	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
>  	unsigned long max_pages = ra->ra_pages;
>  	unsigned long add_pages;
> -	pgoff_t prev_offset;
> +	pgoff_t prev_index;
>  
>  	/*
>  	 * If the request exceeds the readahead window, allow the read to
> @@ -397,15 +397,15 @@ static void ondemand_readahead(struct address_space *mapping,
>  	/*
>  	 * start of file
>  	 */
> -	if (!offset)
> +	if (!index)
>  		goto initial_readahead;
>  
>  	/*
> -	 * It's the expected callback offset, assume sequential access.
> +	 * It's the expected callback index, assume sequential access.
>  	 * Ramp up sizes, and push forward the readahead window.
>  	 */
> -	if ((offset == (ra->start + ra->size - ra->async_size) ||
> -	     offset == (ra->start + ra->size))) {
> +	if ((index == (ra->start + ra->size - ra->async_size) ||
> +	     index == (ra->start + ra->size))) {
>  		ra->start += ra->size;
>  		ra->size = get_next_ra_size(ra, max_pages);
>  		ra->async_size = ra->size;
> @@ -422,14 +422,14 @@ static void ondemand_readahead(struct address_space *mapping,
>  		pgoff_t start;
>  
>  		rcu_read_lock();
> -		start = page_cache_next_miss(mapping, offset + 1, max_pages);
> +		start = page_cache_next_miss(mapping, index + 1, max_pages);
>  		rcu_read_unlock();
>  
> -		if (!start || start - offset > max_pages)
> +		if (!start || start - index > max_pages)
>  			return;
>  
>  		ra->start = start;
> -		ra->size = start - offset;	/* old async_size */
> +		ra->size = start - index;	/* old async_size */
>  		ra->size += req_size;
>  		ra->size = get_next_ra_size(ra, max_pages);
>  		ra->async_size = ra->size;
> @@ -444,29 +444,29 @@ static void ondemand_readahead(struct address_space *mapping,
>  
>  	/*
>  	 * sequential cache miss
> -	 * trivial case: (offset - prev_offset) == 1
> -	 * unaligned reads: (offset - prev_offset) == 0
> +	 * trivial case: (index - prev_index) == 1
> +	 * unaligned reads: (index - prev_index) == 0
>  	 */
> -	prev_offset = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
> -	if (offset - prev_offset <= 1UL)
> +	prev_index = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
> +	if (index - prev_index <= 1UL)
>  		goto initial_readahead;
>  
>  	/*
>  	 * Query the page cache and look for the traces(cached history pages)
>  	 * that a sequential stream would leave behind.
>  	 */
> -	if (try_context_readahead(mapping, ra, offset, req_size, max_pages))
> +	if (try_context_readahead(mapping, ra, index, req_size, max_pages))
>  		goto readit;
>  
>  	/*
>  	 * standalone, small random read
>  	 * Read as is, and do not pollute the readahead state.
>  	 */
> -	__do_page_cache_readahead(mapping, filp, offset, req_size, 0);
> +	__do_page_cache_readahead(mapping, filp, index, req_size, 0);
>  	return;
>  
>  initial_readahead:
> -	ra->start = offset;
> +	ra->start = index;
>  	ra->size = get_init_ra_size(req_size, max_pages);
>  	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
>  
> @@ -477,7 +477,7 @@ static void ondemand_readahead(struct address_space *mapping,
>  	 * the resulted next readahead window into the current one.
>  	 * Take care of maximum IO pages as above.
>  	 */
> -	if (offset == ra->start && ra->size == ra->async_size) {
> +	if (index == ra->start && ra->size == ra->async_size) {
>  		add_pages = get_next_ra_size(ra, max_pages);
>  		if (ra->size + add_pages <= max_pages) {
>  			ra->async_size = add_pages;
> @@ -496,9 +496,8 @@ static void ondemand_readahead(struct address_space *mapping,
>   * @mapping: address_space which holds the pagecache and I/O vectors
>   * @ra: file_ra_state which holds the readahead state
>   * @filp: passed on to ->readpage() and ->readpages()
> - * @offset: start offset into @mapping, in pagecache page-sized units
> - * @req_size: hint: total size of the read which the caller is performing in
> - *            pagecache pages
> + * @index: Index of first page to be read.
> + * @req_count: Total number of pages being read by the caller.
>   *
>   * page_cache_sync_readahead() should be called when a cache miss happened:
>   * it will submit the read.  The readahead logic may decide to piggyback more
> @@ -507,7 +506,7 @@ static void ondemand_readahead(struct address_space *mapping,
>   */
>  void page_cache_sync_readahead(struct address_space *mapping,
>  			       struct file_ra_state *ra, struct file *filp,
> -			       pgoff_t offset, unsigned long req_size)
> +			       pgoff_t index, unsigned long req_count)
>  {
>  	/* no read-ahead */
>  	if (!ra->ra_pages)
> @@ -518,12 +517,12 @@ void page_cache_sync_readahead(struct address_space *mapping,
>  
>  	/* be dumb */
>  	if (filp && (filp->f_mode & FMODE_RANDOM)) {
> -		force_page_cache_readahead(mapping, filp, offset, req_size);
> +		force_page_cache_readahead(mapping, filp, index, req_count);
>  		return;
>  	}
>  
>  	/* do read-ahead */
> -	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
> +	ondemand_readahead(mapping, ra, filp, false, index, req_count);
>  }
>  EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
>  
> @@ -532,21 +531,20 @@ EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
>   * @mapping: address_space which holds the pagecache and I/O vectors
>   * @ra: file_ra_state which holds the readahead state
>   * @filp: passed on to ->readpage() and ->readpages()
> - * @page: the page at @offset which has the PG_readahead flag set
> - * @offset: start offset into @mapping, in pagecache page-sized units
> - * @req_size: hint: total size of the read which the caller is performing in
> - *            pagecache pages
> + * @page: The page at @index which triggered the readahead call.
> + * @index: Index of first page to be read.
> + * @req_count: Total number of pages being read by the caller.
>   *
>   * page_cache_async_readahead() should be called when a page is used which
> - * has the PG_readahead flag; this is a marker to suggest that the application
> + * is marked as PageReadahead; this is a marker to suggest that the application
>   * has used up enough of the readahead window that we should start pulling in
>   * more pages.
>   */
>  void
>  page_cache_async_readahead(struct address_space *mapping,
>  			   struct file_ra_state *ra, struct file *filp,
> -			   struct page *page, pgoff_t offset,
> -			   unsigned long req_size)
> +			   struct page *page, pgoff_t index,
> +			   unsigned long req_count)
>  {
>  	/* no read-ahead */
>  	if (!ra->ra_pages)
> @@ -570,7 +568,7 @@ page_cache_async_readahead(struct address_space *mapping,
>  		return;
>  
>  	/* do read-ahead */
> -	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
> +	ondemand_readahead(mapping, ra, filp, true, index, req_count);
>  }
>  EXPORT_SYMBOL_GPL(page_cache_async_readahead);
>  
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 01/24] mm: Move readahead prototypes from mm.h
  2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
@ 2020-02-21  2:43   ` John Hubbard
  2020-02-21 21:48     ` Matthew Wilcox
  2020-02-24 21:32   ` Christoph Hellwig
  1 sibling, 1 reply; 76+ messages in thread
From: John Hubbard @ 2020-02-21  2:43 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> The readahead code is part of the page cache so should be found in the
> pagemap.h file.  force_page_cache_readahead is only used within mm,
> so move it to mm/internal.h instead.  Remove the parameter names where
> they add no value, and rename the ones which were actively misleading.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  block/blk-core.c        |  1 +
>  include/linux/mm.h      | 19 -------------------
>  include/linux/pagemap.h |  8 ++++++++
>  mm/fadvise.c            |  2 ++
>  mm/internal.h           |  2 ++
>  5 files changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 089e890ab208..41417bb93634 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -20,6 +20,7 @@
>  #include <linux/blk-mq.h>
>  #include <linux/highmem.h>
>  #include <linux/mm.h>
> +#include <linux/pagemap.h>

Yes. But I think these files also need a similar change:

    fs/btrfs/disk-io.c
    fs/nfs/super.c
    

...because they also use VM_READAHEAD_PAGES, and do not directly include
pagemap.h yet.
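
i.e. each of those would want the equivalent of

	#include <linux/pagemap.h>	/* for VM_READAHEAD_PAGES */

added alongside their existing includes.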


>  #include <linux/kernel_stat.h>
>  #include <linux/string.h>
>  #include <linux/init.h>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..68dcda9a2112 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2401,25 +2401,6 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
>  int __must_check write_one_page(struct page *page);
>  void task_dirty_inc(struct task_struct *tsk);
>  
> -/* readahead.c */
> -#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
> -
> -int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
> -			pgoff_t offset, unsigned long nr_to_read);
> -
> -void page_cache_sync_readahead(struct address_space *mapping,
> -			       struct file_ra_state *ra,
> -			       struct file *filp,
> -			       pgoff_t offset,
> -			       unsigned long size);
> -
> -void page_cache_async_readahead(struct address_space *mapping,
> -				struct file_ra_state *ra,
> -				struct file *filp,
> -				struct page *pg,
> -				pgoff_t offset,
> -				unsigned long size);
> -
>  extern unsigned long stack_guard_gap;
>  /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
>  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index ccb14b6a16b5..24894b9b90c9 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -614,6 +614,14 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>  void delete_from_page_cache_batch(struct address_space *mapping,
>  				  struct pagevec *pvec);
>  
> +#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
> +
> +void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
> +		struct file *, pgoff_t index, unsigned long req_count);


Yes, "struct address_space *mapping" is weird, but I don't know if it's
"misleading", given that it's actually one of the things you have to learn
right from the beginning, with linux-mm, right? Or is that about to change?

I'm not asking to restore this to "struct address_space *mapping", but I thought
it's worth mentioning out loud, especially if you or others are planning on
changing those names or something. Just curious.



thanks,
-- 
John Hubbard
NVIDIA


> +void page_cache_async_readahead(struct address_space *, struct file_ra_state *,
> +		struct file *, struct page *, pgoff_t index,
> +		unsigned long req_count);
> +
>  /*
>   * Like add_to_page_cache_locked, but used to add newly allocated pages:
>   * the page is new, so we can just run __SetPageLocked() against it.
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index 4f17c83db575..3efebfb9952c 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -22,6 +22,8 @@
>  
>  #include <asm/unistd.h>
>  
> +#include "internal.h"
> +
>  /*
>   * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
>   * deactivate the pages and clear PG_Referenced.
> diff --git a/mm/internal.h b/mm/internal.h
> index 3cf20ab3ca01..83f353e74654 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -49,6 +49,8 @@ void unmap_page_range(struct mmu_gather *tlb,
>  			     unsigned long addr, unsigned long end,
>  			     struct zap_details *details);
>  
> +int force_page_cache_readahead(struct address_space *, struct file *,
> +		pgoff_t index, unsigned long nr_to_read);
>  extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
>  		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
>  		unsigned long lookahead_size);
> 




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop
  2020-02-19 21:00 ` [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
@ 2020-02-21  2:48   ` John Hubbard
  2020-02-24 21:37   ` Christoph Hellwig
  1 sibling, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  2:48 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Replace the page_offset variable with 'index + i'.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 8a25fc7e2bf2..83df5c061d33 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -181,12 +181,10 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	 * Preallocate as many pages as we will need.
>  	 */
>  	for (i = 0; i < nr_to_read; i++) {
> -		pgoff_t page_offset = index + i;


Ha, the naming mismatch I complained about in an earlier patch gets deleted
here, so we're good after all. :)

Looks good,

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> -
> -		if (page_offset > end_index)
> +		if (index + i > end_index)
>  			break;
>  
> -		page = xa_load(&mapping->i_pages, page_offset);
> +		page = xa_load(&mapping->i_pages, index + i);
>  		if (page && !xa_is_value(page)) {
>  			/*
>  			 * Page already present?  Kick off the current batch of
> @@ -200,7 +198,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		page = __page_cache_alloc(gfp_mask);
>  		if (!page)
>  			break;
> -		page->index = page_offset;
> +		page->index = index + i;
>  		list_add(&page->lru, &page_pool);
>  		if (i == nr_to_read - lookahead_size)
>  			SetPageReadahead(page);
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 09/24] mm: Put readahead pages in cache earlier
  2020-02-19 21:00 ` [PATCH v7 09/24] mm: Put readahead pages in cache earlier Matthew Wilcox
@ 2020-02-21  3:19   ` John Hubbard
  2020-02-21  3:43     ` Matthew Wilcox
  2020-02-24 21:40   ` Christoph Hellwig
  1 sibling, 1 reply; 76+ messages in thread
From: John Hubbard @ 2020-02-21  3:19 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> When populating the page cache for readahead, mappings that use
> ->readpages must populate the page cache themselves as the pages are
> passed on a linked list which would normally be used for the page cache's
> LRU.  For mappings that use ->readpage or the upcoming ->readahead method,
> we can put the pages into the page cache as soon as they're allocated,
> which solves a race between readahead and direct IO.  It also lets us
> remove the gfp argument from read_pages().
> 
> Use the new readahead_page() API to implement the repeated calls to
> ->readpage(), just like most filesystems will.  This iterator also
> supports huge pages, even though none of the filesystems have been
> converted to use them yet.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pagemap.h | 20 +++++++++++++++++
>  mm/readahead.c          | 48 +++++++++++++++++++++++++----------------
>  2 files changed, 49 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 55fcea0249e6..4989d330fada 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -647,8 +647,28 @@ struct readahead_control {
>  /* private: use the readahead_* accessors instead */
>  	pgoff_t _index;
>  	unsigned int _nr_pages;
> +	unsigned int _batch_count;
>  };
>  
> +static inline struct page *readahead_page(struct readahead_control *rac)
> +{
> +	struct page *page;
> +
> +	BUG_ON(rac->_batch_count > rac->_nr_pages);
> +	rac->_nr_pages -= rac->_batch_count;
> +	rac->_index += rac->_batch_count;
> +	rac->_batch_count = 0;


Is it intentional, to set rac->_batch_count twice (here, and below)? The
only reason I can see is if a caller needs to use ->_batch_count in the
"return NULL" case, which doesn't seem to come up...


> +
> +	if (!rac->_nr_pages)
> +		return NULL;
> +
> +	page = xa_load(&rac->mapping->i_pages, rac->_index);
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	rac->_batch_count = hpage_nr_pages(page);
> +
> +	return page;
> +}
> +
>  /* The number of pages in this readahead block */
>  static inline unsigned int readahead_count(struct readahead_control *rac)
>  {
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 83df5c061d33..aaa209559ba2 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -113,15 +113,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
>  
>  EXPORT_SYMBOL(read_cache_pages);
>  
> -static void read_pages(struct readahead_control *rac, struct list_head *pages,
> -		gfp_t gfp)
> +static void read_pages(struct readahead_control *rac, struct list_head *pages)
>  {
>  	const struct address_space_operations *aops = rac->mapping->a_ops;
> +	struct page *page;
>  	struct blk_plug plug;
> -	unsigned page_idx;
>  
>  	if (!readahead_count(rac))
> -		return;
> +		goto out;
>  
>  	blk_start_plug(&plug);
>  
> @@ -130,23 +129,23 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>  				readahead_count(rac));
>  		/* Clean up the remaining pages */
>  		put_pages_list(pages);
> -		goto out;
> -	}
> -
> -	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
> -		struct page *page = lru_to_page(pages);
> -		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
> -				gfp))
> +		rac->_index += rac->_nr_pages;
> +		rac->_nr_pages = 0;
> +	} else {
> +		while ((page = readahead_page(rac))) {
>  			aops->readpage(rac->file, page);
> -		put_page(page);
> +			put_page(page);
> +		}
>  	}
>  
> -out:
>  	blk_finish_plug(&plug);
>  
>  	BUG_ON(!list_empty(pages));
> -	rac->_nr_pages = 0;
> +	BUG_ON(readahead_count(rac));
> +
> +out:
> +	/* If we were called due to a conflicting page, skip over it */


Tiny documentation nit: What if we were *not* called due to a conflicting page? 
(And what is a "conflicting page", in this context, btw?) The next line unconditionally 
moves the index ahead, so the "if" part of the comment really confuses me.


> +	rac->_index++;
>  }
>  
>  /*
> @@ -165,9 +164,11 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	LIST_HEAD(page_pool);
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> +	bool use_list = mapping->a_ops->readpages;
>  	struct readahead_control rac = {
>  		.mapping = mapping,
>  		.file = filp,
> +		._index = index,
>  		._nr_pages = 0,
>  	};
>  	unsigned long i;
> @@ -184,6 +185,8 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		if (index + i > end_index)
>  			break;
>  
> +		BUG_ON(index + i != rac._index + rac._nr_pages);
> +
>  		page = xa_load(&mapping->i_pages, index + i);
>  		if (page && !xa_is_value(page)) {
>  			/*
> @@ -191,15 +194,22 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  			 * contiguous pages before continuing with the next
>  			 * batch.
>  			 */
> -			read_pages(&rac, &page_pool, gfp_mask);
> +			read_pages(&rac, &page_pool);
>  			continue;
>  		}
>  
>  		page = __page_cache_alloc(gfp_mask);
>  		if (!page)
>  			break;
> -		page->index = index + i;
> -		list_add(&page->lru, &page_pool);
> +		if (use_list) {
> +			page->index = index + i;
> +			list_add(&page->lru, &page_pool);
> +		} else if (add_to_page_cache_lru(page, mapping, index + i,
> +					gfp_mask) < 0) {


I still think you'll want to compare against !=0, rather than < 0, here.


> +			put_page(page);
> +			read_pages(&rac, &page_pool);


Doing a read_pages() in the error case is because...actually, I'm not sure yet.
Why do we do this? Effectively it's a retry?


> +			continue;
> +		}
>  		if (i == nr_to_read - lookahead_size)
>  			SetPageReadahead(page);
>  		rac._nr_pages++;
> @@ -210,7 +220,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	 * uptodate then the caller will launch readpage again, and
>  	 * will then handle the error.
>  	 */
> -	read_pages(&rac, &page_pool, gfp_mask);
> +	read_pages(&rac, &page_pool);
>  }
>  
>  /*
> 

Didn't spot any actual errors, just mainly my own questions here. :)


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index'
  2020-02-19 21:00 ` [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index' Matthew Wilcox
  2020-02-21  2:21   ` John Hubbard
@ 2020-02-21  3:27   ` John Hubbard
  1 sibling, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  3:27 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> The word 'offset' is used ambiguously to mean 'byte offset within
> a page', 'byte offset from the start of the file' and 'page offset
> from the start of the file'.  Use 'index' to mean 'page offset
> from the start of the file' throughout the readahead code.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 86 ++++++++++++++++++++++++--------------------------
>  1 file changed, 42 insertions(+), 44 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 6a9d99229bd6..096cf9020648 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -156,7 +156,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>   * We really don't want to intermingle reads and writes like that.
>   */
>  void __do_page_cache_readahead(struct address_space *mapping,
> -		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
> +		struct file *filp, pgoff_t index, unsigned long nr_to_read,
>  		unsigned long lookahead_size)


One more tiny thing: seeing as how this patch is also changing "size" to "count",
maybe it should also change lookahead_size to lookahead_count.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 09/24] mm: Put readahead pages in cache earlier
  2020-02-21  3:19   ` John Hubbard
@ 2020-02-21  3:43     ` Matthew Wilcox
  2020-02-21  4:19       ` John Hubbard
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-21  3:43 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 07:19:58PM -0800, John Hubbard wrote:
> > +static inline struct page *readahead_page(struct readahead_control *rac)
> > +{
> > +	struct page *page;
> > +
> > +	BUG_ON(rac->_batch_count > rac->_nr_pages);
> > +	rac->_nr_pages -= rac->_batch_count;
> > +	rac->_index += rac->_batch_count;
> > +	rac->_batch_count = 0;
> 
> 
> Is it intentional, to set rac->_batch_count twice (here, and below)? The
> only reason I can see is if a caller needs to use ->_batch_count in the
> "return NULL" case, which doesn't seem to come up...

Ah, but it does.  Not in this patch, but the next one ...

+       if (aops->readahead) {
+               aops->readahead(rac);
+               /* Clean up the remaining pages */
+               while ((page = readahead_page(rac))) {
+                       unlock_page(page);
+                       put_page(page);
+               }

In the normal case, the ->readahead method will consume all the pages,
and we need readahead_page() to do nothing if it is called again.

> > +	if (!rac->_nr_pages)
> > +		return NULL;

... admittedly I could do:

	if (!rac->_nr_pages) {
		rac->_batch_count = 0;
		return NULL;
	}

which might be less confusing.
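
Folded in, the whole function would then read (same code as in the patch,
just with the reset moved into the empty case):

	static inline struct page *readahead_page(struct readahead_control *rac)
	{
		struct page *page;

		BUG_ON(rac->_batch_count > rac->_nr_pages);
		/* Consume the batch handed out by the previous call */
		rac->_nr_pages -= rac->_batch_count;
		rac->_index += rac->_batch_count;

		if (!rac->_nr_pages) {
			rac->_batch_count = 0;
			return NULL;
		}

		page = xa_load(&rac->mapping->i_pages, rac->_index);
		VM_BUG_ON_PAGE(!PageLocked(page), page);
		rac->_batch_count = hpage_nr_pages(page);

		return page;
	}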

> > @@ -130,23 +129,23 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
> >  				readahead_count(rac));
> >  		/* Clean up the remaining pages */
> >  		put_pages_list(pages);
> > -		goto out;
> > -	}
> > -
> > -	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
> > -		struct page *page = lru_to_page(pages);
> > -		list_del(&page->lru);
> > -		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
> > -				gfp))
> > +		rac->_index += rac->_nr_pages;
> > +		rac->_nr_pages = 0;
> > +	} else {
> > +		while ((page = readahead_page(rac))) {
> >  			aops->readpage(rac->file, page);
> > -		put_page(page);
> > +			put_page(page);
> > +		}
> >  	}
> >  
> > -out:
> >  	blk_finish_plug(&plug);
> >  
> >  	BUG_ON(!list_empty(pages));
> > -	rac->_nr_pages = 0;
> > +	BUG_ON(readahead_count(rac));
> > +
> > +out:
> > +	/* If we were called due to a conflicting page, skip over it */
> 
> Tiny documentation nit: What if we were *not* called due to a conflicting page? 
> (And what is a "conflicting page", in this context, btw?) The next line unconditionally 
> moves the index ahead, so the "if" part of the comment really confuses me.

By the end of the series, read_pages() is called in three places:

1.              if (page && !xa_is_value(page)) {
                        read_pages(&rac, &page_pool);

2.              } else if (add_to_page_cache_lru(page, mapping, index + i,
                                        gfp_mask) < 0) {
                        put_page(page);
                        read_pages(&rac, &page_pool);

3.      read_pages(&rac, &page_pool);

In the first two cases, there's an existing page in the page cache
(which conflicts with this readahead operation), and so we need to
advance index.  In the third case, we're exiting the function, so it
does no harm to advance index one further.

> > +		} else if (add_to_page_cache_lru(page, mapping, index + i,
> > +					gfp_mask) < 0) {
> 
> I still think you'll want to compare against !=0, rather than < 0, here.

I tend to prefer < 0 when checking for an error value in case the function
decides to start using positive numbers to mean something.  I don't think
it's a particularly important preference though (after all, returning 1
might mean "failed, but for this weird reason rather than an errno").

> > +			put_page(page);
> > +			read_pages(&rac, &page_pool);
> 
> Doing a read_pages() in the error case is because...actually, I'm not sure yet.
> Why do we do this? Effectively it's a retry?

Same as the reason we call read_pages() if we found a page in the page
cache earlier -- we're sending down a set of pages which are consecutive
in the file's address space, and now we have to skip one.  At least one ;-)
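
(To make that concrete: asked to read ahead indices 5-12 with page 9
already in the cache, we'd send 5-8 down as one contiguous run, skip 9,
and start accumulating a fresh run at 10 -- made-up numbers, obviously.)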


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 11/24] mm: Move end_index check out of readahead loop
  2020-02-19 21:00 ` [PATCH v7 11/24] mm: Move end_index check out of readahead loop Matthew Wilcox
@ 2020-02-21  3:50   ` John Hubbard
  2020-02-21 15:35     ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: John Hubbard @ 2020-02-21  3:50 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> By reducing nr_to_read, we can eliminate this check from inside the loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 07cdfbf00f4b..ace611f4bf05 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -166,8 +166,6 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		unsigned long lookahead_size)
>  {
>  	struct inode *inode = mapping->host;
> -	struct page *page;
> -	unsigned long end_index;	/* The last page we want to read */
>  	LIST_HEAD(page_pool);
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> @@ -179,22 +177,27 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		._nr_pages = 0,
>  	};
>  	unsigned long i;
> +	pgoff_t end_index;	/* The last page we want to read */
>  
>  	if (isize == 0)
>  		return;
>  
> -	end_index = ((isize - 1) >> PAGE_SHIFT);
> +	end_index = (isize - 1) >> PAGE_SHIFT;
> +	if (index > end_index)
> +		return;
> +	if (index + nr_to_read < index)
> +		nr_to_read = ULONG_MAX - index + 1;
> +	if (index + nr_to_read >= end_index)
> +		nr_to_read = end_index - index + 1;


This tiny patch made me pause, because I wasn't sure at first of the exact
intent of the lines above. Once I worked it out, it seemed like it might
be helpful (or overkill??) to add a few hints for the reader, especially since
there are no hints in the function's (minimal) documentation header. What
do you think of this?

	/*
	 * If we can't read *any* pages without going past the inode's isize
	 * limit, give up entirely:
	 */
	if (index > end_index)
		return;

	/* Cap nr_to_read, in order to avoid overflowing the ULONG type: */
	if (index + nr_to_read < index)
		nr_to_read = ULONG_MAX - index + 1;

	/* Cap nr_to_read, to avoid reading past the inode's isize limit: */
	if (index + nr_to_read >= end_index)
		nr_to_read = end_index - index + 1;
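
(A quick worked check of that middle cap: with index = ULONG_MAX - 2 and
nr_to_read = 10, index + nr_to_read wraps around to 7, which is less than
index, so nr_to_read gets clamped to ULONG_MAX - index + 1 = 3 -- exactly
the three indices left before pgoff_t wraps.)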


Either way, it looks correctly written to me, so:

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

>  
>  	/*
>  	 * Preallocate as many pages as we will need.
>  	 */
>  	for (i = 0; i < nr_to_read; i++) {
> -		if (index + i > end_index)
> -			break;
> +		struct page *page = xa_load(&mapping->i_pages, index + i);
>  
>  		BUG_ON(index + i != rac._index + rac._nr_pages);
>  
> -		page = xa_load(&mapping->i_pages, index + i);
>  		if (page && !xa_is_value(page)) {
>  			/*
>  			 * Page already present?  Kick off the current batch of
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 09/24] mm: Put readahead pages in cache earlier
  2020-02-21  3:43     ` Matthew Wilcox
@ 2020-02-21  4:19       ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  4:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On 2/20/20 7:43 PM, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:19:58PM -0800, John Hubbard wrote:
>>> +static inline struct page *readahead_page(struct readahead_control *rac)
>>> +{
>>> +	struct page *page;
>>> +
>>> +	BUG_ON(rac->_batch_count > rac->_nr_pages);
>>> +	rac->_nr_pages -= rac->_batch_count;
>>> +	rac->_index += rac->_batch_count;
>>> +	rac->_batch_count = 0;
>>
>>
>> Is it intentional, to set rac->_batch_count twice (here, and below)? The
>> only reason I can see is if a caller needs to use ->_batch_count in the
>> "return NULL" case, which doesn't seem to come up...
> 
> Ah, but it does.  Not in this patch, but the next one ...
> 
> +       if (aops->readahead) {
> +               aops->readahead(rac);
> +               /* Clean up the remaining pages */
> +               while ((page = readahead_page(rac))) {
> +                       unlock_page(page);
> +                       put_page(page);
> +               }
> 
> In the normal case, the ->readahead method will consume all the pages,
> and we need readahead_page() to do nothing if it is called again.
> 
>>> +	if (!rac->_nr_pages)
>>> +		return NULL;
> 
> ... admittedly I could do:
> 
> 	if (!rac->_nr_pages) {
> 		rac->_batch_count = 0;
> 		return NULL;
> 	}
> 
> which might be less confusing.


Yes, that would be a nice bit of polish if you end up doing another revision for other
reasons.


> 
>>> @@ -130,23 +129,23 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>>>  				readahead_count(rac));
>>>  		/* Clean up the remaining pages */
>>>  		put_pages_list(pages);
>>> -		goto out;
>>> -	}
>>> -
>>> -	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
>>> -		struct page *page = lru_to_page(pages);
>>> -		list_del(&page->lru);
>>> -		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
>>> -				gfp))
>>> +		rac->_index += rac->_nr_pages;
>>> +		rac->_nr_pages = 0;
>>> +	} else {
>>> +		while ((page = readahead_page(rac))) {
>>>  			aops->readpage(rac->file, page);
>>> -		put_page(page);
>>> +			put_page(page);
>>> +		}
>>>  	}
>>>  
>>> -out:
>>>  	blk_finish_plug(&plug);
>>>  
>>>  	BUG_ON(!list_empty(pages));
>>> -	rac->_nr_pages = 0;
>>> +	BUG_ON(readahead_count(rac));
>>> +
>>> +out:
>>> +	/* If we were called due to a conflicting page, skip over it */
>>
>> Tiny documentation nit: What if we were *not* called due to a conflicting page? 
>> (And what is a "conflicting page", in this context, btw?) The next line unconditionally 
>> moves the index ahead, so the "if" part of the comment really confuses me.
> 
> By the end of the series, read_pages() is called in three places:
> 
> 1.              if (page && !xa_is_value(page)) {
>                         read_pages(&rac, &page_pool);
> 
> 2.              } else if (add_to_page_cache_lru(page, mapping, index + i,
>                                         gfp_mask) < 0) {
>                         put_page(page);
>                         read_pages(&rac, &page_pool);
> 
> 3.      read_pages(&rac, &page_pool);
> 
> In the first two cases, there's an existing page in the page cache
> (which conflicts with this readahead operation), and so we need to
> advance index.  In the third case, we're exiting the function, so it
> does no harm to advance index one further.


OK, I see. As you know, I tend toward over-documenting, but what about
adding just a *few* hints to help new readers, something like this (maybe
it should be pared down):


diff --git a/mm/readahead.c b/mm/readahead.c
index 9fb5f77dcf69..0dd5b09c376e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -114,6 +114,10 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 
 EXPORT_SYMBOL(read_cache_pages);
 
+/*
+ * Read pages into the page cache, OR skip over a page if it is already in the
+ * page cache.
+ */
 static void read_pages(struct readahead_control *rac, struct list_head *pages)
 {
        const struct address_space_operations *aops = rac->mapping->a_ops;
@@ -152,7 +156,11 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
        BUG_ON(readahead_count(rac));
 
 out:
-       /* If we were called due to a conflicting page, skip over it */
+       /*
+        * This routine might have been called in order to skip over a page
+        * that is already in the page cache. And for other cases, the index is
+        * ignored by the caller. So just increment unconditionally:
+        */
        rac->_index++;
 }


?

> 
>>> +		} else if (add_to_page_cache_lru(page, mapping, index + i,
>>> +					gfp_mask) < 0) {
>>
>> I still think you'll want to compare against !=0, rather than < 0, here.
> 
> I tend to prefer < 0 when checking for an error value in case the function
> decides to start using positive numbers to mean something.  I don't think
> it's a particularly important preference though (after all, returning 1
> might mean "failed, but for this weird reason rather than an errno").
> 
>>> +			put_page(page);
>>> +			read_pages(&rac, &page_pool);
>>
>> Doing a read_pages() in the error case is because...actually, I'm not sure yet.
>> Why do we do this? Effectively it's a retry?
> 
> Same as the reason we call read_pages() if we found a page in the page
> cache earlier -- we're sending down a set of pages which are consecutive
> in the file's address space, and now we have to skip one.  At least one ;-)
> 

Got it. Finally. :)


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
  2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
  2020-02-20 14:36   ` Zi Yan
@ 2020-02-21  4:24   ` John Hubbard
  2020-02-24 21:34   ` Christoph Hellwig
  2 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  4:24 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Simplify the callers by moving the check for nr_pages and the BUG_ON
> into read_pages().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/readahead.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 

Looks nice,

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/readahead.c b/mm/readahead.c
> index 61b15b6b9e72..9fcd4e32b62d 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -119,6 +119,9 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>  	struct blk_plug plug;
>  	unsigned page_idx;
>  
> +	if (!nr_pages)
> +		return;
> +
>  	blk_start_plug(&plug);
>  
>  	if (mapping->a_ops->readpages) {
> @@ -138,6 +141,8 @@ static void read_pages(struct address_space *mapping, struct file *filp,
>  
>  out:
>  	blk_finish_plug(&plug);
> +
> +	BUG_ON(!list_empty(pages));
>  }
>  
>  /*
> @@ -180,8 +185,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  			 * contiguous pages before continuing with the next
>  			 * batch.
>  			 */
> -			if (nr_pages)
> -				read_pages(mapping, filp, &page_pool, nr_pages,
> +			read_pages(mapping, filp, &page_pool, nr_pages,
>  						gfp_mask);
>  			nr_pages = 0;
>  			continue;
> @@ -202,9 +206,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	 * uptodate then the caller will launch readpage again, and
>  	 * will then handle the error.
>  	 */
> -	if (nr_pages)
> -		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
> -	BUG_ON(!list_empty(&page_pool));
> +	read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
>  }
>  
>  /*
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 10/24] mm: Add readahead address space operation
  2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
  2020-02-20 15:00   ` Zi Yan
@ 2020-02-21  4:30   ` John Hubbard
  2020-02-24 21:41   ` Christoph Hellwig
  2 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21  4:30 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-ext4, linux-erofs, linux-btrfs

On 2/19/20 1:00 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> This replaces ->readpages with a saner interface:
>  - Return void instead of an ignored error code.
>  - Page cache is already populated with locked pages when ->readahead
>    is called.
>  - New arguments can be passed to the implementation without changing
>    all the filesystems that use a common helper function like
>    mpage_readahead().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  6 +++++-
>  Documentation/filesystems/vfs.rst     | 15 +++++++++++++++
>  include/linux/fs.h                    |  2 ++
>  include/linux/pagemap.h               | 18 ++++++++++++++++++
>  mm/readahead.c                        | 12 ++++++++++--
>  5 files changed, 50 insertions(+), 3 deletions(-)


This all looks correct to me, so: 

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 5057e4d9dcd1..0af2e0e11461 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -239,6 +239,7 @@ prototypes::
>  	int (*readpage)(struct file *, struct page *);
>  	int (*writepages)(struct address_space *, struct writeback_control *);
>  	int (*set_page_dirty)(struct page *page);
> +	void (*readahead)(struct readahead_control *);
>  	int (*readpages)(struct file *filp, struct address_space *mapping,
>  			struct list_head *pages, unsigned nr_pages);
>  	int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -271,7 +272,8 @@ writepage:		yes, unlocks (see below)
>  readpage:		yes, unlocks
>  writepages:
>  set_page_dirty		no
> -readpages:
> +readahead:		yes, unlocks
> +readpages:		no
>  write_begin:		locks the page		 exclusive
>  write_end:		yes, unlocks		 exclusive
>  bmap:
> @@ -295,6 +297,8 @@ the request handler (/dev/loop).
>  ->readpage() unlocks the page, either synchronously or via I/O
>  completion.
>  
> +->readahead() unlocks the pages that I/O is attempted on like ->readpage().
> +
>  ->readpages() populates the pagecache with the passed pages and starts
>  I/O against them.  They come unlocked upon I/O completion.
>  
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 7d4d09dd5e6d..ed17771c212b 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -706,6 +706,7 @@ cache in your filesystem.  The following members are defined:
>  		int (*readpage)(struct file *, struct page *);
>  		int (*writepages)(struct address_space *, struct writeback_control *);
>  		int (*set_page_dirty)(struct page *page);
> +		void (*readahead)(struct readahead_control *);
>  		int (*readpages)(struct file *filp, struct address_space *mapping,
>  				 struct list_head *pages, unsigned nr_pages);
>  		int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -781,12 +782,26 @@ cache in your filesystem.  The following members are defined:
>  	If defined, it should set the PageDirty flag, and the
>  	PAGECACHE_TAG_DIRTY tag in the radix tree.
>  
> +``readahead``
> +	Called by the VM to read pages associated with the address_space
> +	object.  The pages are consecutive in the page cache and are
> +	locked.  The implementation should decrement the page refcount
> +	after starting I/O on each page.  Usually the page will be
> +	unlocked by the I/O completion handler.  If the filesystem decides
> +	to stop attempting I/O before reaching the end of the readahead
> +	window, it can simply return.  The caller will decrement the page
> +	refcount and unlock the remaining pages for you.  Set PageUptodate
> +	if the I/O completes successfully.  Setting PageError on any page
> +	will be ignored; simply unlock the page if an I/O error occurs.
> +
>  ``readpages``
>  	called by the VM to read pages associated with the address_space
>  	object.  This is essentially just a vector version of readpage.
>  	Instead of just one page, several pages are requested.
>  	readpages is only used for read-ahead, so read errors are
>  	ignored.  If anything goes wrong, feel free to give up.
> +	This interface is deprecated and will be removed by the end of
> +	2020; implement readahead instead.
>  
>  ``write_begin``
>  	Called by the generic buffered write code to ask the filesystem
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..d4e2d2964346 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -292,6 +292,7 @@ enum positive_aop_returns {
>  struct page;
>  struct address_space;
>  struct writeback_control;
> +struct readahead_control;
>  
>  /*
>   * Write life time hint values.
> @@ -375,6 +376,7 @@ struct address_space_operations {
>  	 */
>  	int (*readpages)(struct file *filp, struct address_space *mapping,
>  			struct list_head *pages, unsigned nr_pages);
> +	void (*readahead)(struct readahead_control *);
>  
>  	int (*write_begin)(struct file *, struct address_space *mapping,
>  				loff_t pos, unsigned len, unsigned flags,
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4989d330fada..b3008605fd1b 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -669,6 +669,24 @@ static inline struct page *readahead_page(struct readahead_control *rac)
>  	return page;
>  }
>  
> +/* The byte offset into the file of this readahead block */
> +static inline loff_t readahead_pos(struct readahead_control *rac)
> +{
> +	return (loff_t)rac->_index * PAGE_SIZE;
> +}
> +
> +/* The number of bytes in this readahead block */
> +static inline loff_t readahead_length(struct readahead_control *rac)
> +{
> +	return (loff_t)rac->_nr_pages * PAGE_SIZE;
> +}
> +
> +/* The index of the first page in this readahead block */
> +static inline unsigned int readahead_index(struct readahead_control *rac)
> +{
> +	return rac->_index;
> +}
> +
>  /* The number of pages in this readahead block */
>  static inline unsigned int readahead_count(struct readahead_control *rac)
>  {
> diff --git a/mm/readahead.c b/mm/readahead.c
> index aaa209559ba2..07cdfbf00f4b 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -124,7 +124,14 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
>  
>  	blk_start_plug(&plug);
>  
> -	if (aops->readpages) {
> +	if (aops->readahead) {
> +		aops->readahead(rac);
> +		/* Clean up the remaining pages */
> +		while ((page = readahead_page(rac))) {
> +			unlock_page(page);
> +			put_page(page);
> +		}
> +	} else if (aops->readpages) {
>  		aops->readpages(rac->file, rac->mapping, pages,
>  				readahead_count(rac));
>  		/* Clean up the remaining pages */
> @@ -234,7 +241,8 @@ void force_page_cache_readahead(struct address_space *mapping,
>  	struct file_ra_state *ra = &filp->f_ra;
>  	unsigned long max_pages;
>  
> -	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
> +	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
> +			!mapping->a_ops->readahead))
>  		return;
>  
>  	/*
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 00/23] Change readahead API
  2020-02-20 22:39   ` Matthew Wilcox
@ 2020-02-21 11:59     ` David Sterba
  0 siblings, 0 replies; 76+ messages in thread
From: David Sterba @ 2020-02-21 11:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-f2fs-devel, dsterba, linux-kernel,
	cluster-devel, linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4,
	linux-erofs, linux-btrfs

On Thu, Feb 20, 2020 at 02:39:09PM -0800, Matthew Wilcox wrote:
> > >  - Now passes an xfstests run on ext4!
> > 
> > On btrfs it still chokes on the first test btrfs/001, with the following
> > warning, the test is stuck there.
> 
> Thanks.  The warning actually wasn't the problem, but it did need to
> be addressed.  I got a test system up & running with btrfs, and it's
> currently on generic/027 with the following patch:

Thanks, with the fix applied the first 10 tests passed. I'll let the
testsuite finish and let you know if there are more warnings and such.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 11/24] mm: Move end_index check out of readahead loop
  2020-02-21  3:50   ` John Hubbard
@ 2020-02-21 15:35     ` Matthew Wilcox
  2020-02-21 19:41       ` John Hubbard
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-21 15:35 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 07:50:39PM -0800, John Hubbard wrote:
> This tiny patch made me pause, because I wasn't sure at first of the exact
> intent of the lines above. Once I worked it out, it seemed like it might
> be helpful (or overkill??) to add a few hints for the reader, especially since
> there are no hints in the function's (minimal) documentation header. What
> do you think of this?
> 
> 	/*
> 	 * If we can't read *any* pages without going past the inode's isize
> 	 * limit, give up entirely:
> 	 */
> 	if (index > end_index)
> 		return;
> 
> 	/* Cap nr_to_read, in order to avoid overflowing the ULONG type: */
> 	if (index + nr_to_read < index)
> 		nr_to_read = ULONG_MAX - index + 1;
> 
> 	/* Cap nr_to_read, to avoid reading past the inode's isize limit: */
> 	if (index + nr_to_read >= end_index)
> 		nr_to_read = end_index - index + 1;

A little verbose for my taste ... How about this?

        end_index = (isize - 1) >> PAGE_SHIFT;
        if (index > end_index)
                return;
        /* Avoid wrapping to the beginning of the file */
        if (index + nr_to_read < index)
                nr_to_read = ULONG_MAX - index + 1;
        /* Don't read past the page containing the last byte of the file */
        if (index + nr_to_read >= end_index)
                nr_to_read = end_index - index + 1;
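
To make the wrap check concrete (numbers illustrative, assuming a 64-bit
unsigned long): with index == ULONG_MAX - 2 and nr_to_read == 8,
index + nr_to_read wraps around to 5, which is less than index, so
nr_to_read gets clamped to ULONG_MAX - index + 1 == 3, exactly the three
page indices that remain below the top of the range.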


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 11/24] mm: Move end_index check out of readahead loop
  2020-02-21 15:35     ` Matthew Wilcox
@ 2020-02-21 19:41       ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-21 19:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On 2/21/20 7:35 AM, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:50:39PM -0800, John Hubbard wrote:
>> This tiny patch made me pause, because I wasn't sure at first of the exact
>> intent of the lines above. Once I worked it out, it seemed like it might
>> be helpful (or overkill??) to add a few hints for the reader, especially since
>> there are no hints in the function's (minimal) documentation header. What
>> do you think of this?
>>
>> 	/*
>> 	 * If we can't read *any* pages without going past the inode's isize
>> 	 * limit, give up entirely:
>> 	 */
>> 	if (index > end_index)
>> 		return;
>>
>> 	/* Cap nr_to_read, in order to avoid overflowing the ULONG type: */
>> 	if (index + nr_to_read < index)
>> 		nr_to_read = ULONG_MAX - index + 1;
>>
>> 	/* Cap nr_to_read, to avoid reading past the inode's isize limit: */
>> 	if (index + nr_to_read >= end_index)
>> 		nr_to_read = end_index - index + 1;
> 
> A little verbose for my taste ... How about this?


Mine too, actually. :)  I think your version below looks good.


thanks,
-- 
John Hubbard
NVIDIA

> 
>          end_index = (isize - 1) >> PAGE_SHIFT;
>          if (index > end_index)
>                  return;
>          /* Avoid wrapping to the beginning of the file */
>          if (index + nr_to_read < index)
>                  nr_to_read = ULONG_MAX - index + 1;
>          /* Don't read past the page containing the last byte of the file */
>          if (index + nr_to_read >= end_index)
>                  nr_to_read = end_index - index + 1;
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 01/24] mm: Move readahead prototypes from mm.h
  2020-02-21  2:43   ` John Hubbard
@ 2020-02-21 21:48     ` Matthew Wilcox
  2020-02-22  0:15       ` John Hubbard
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-21 21:48 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Thu, Feb 20, 2020 at 06:43:31PM -0800, John Hubbard wrote:
> Yes. But I think these files also need a similar change:
> 
>     fs/btrfs/disk-io.c

That gets pagemap.h through ctree.h, so I think it's fine.  It's
already using mapping_set_gfp_mask(), so it already depends on pagemap.h.

>     fs/nfs/super.c

That gets it through linux/nfs_fs.h.
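
In other words (chains as described above, abridged):

    fs/btrfs/disk-io.c -> ctree.h         -> ... -> linux/pagemap.h
    fs/nfs/super.c     -> linux/nfs_fs.h  -> linux/pagemap.h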

I was reluctant not to add it to blk-core.c, because it isn't necessarily
intuitive that the block device core would include pagemap.h.

That said, blkdev.h does include pagemap.h, so maybe I don't need to
include it here.

> ...because they also use VM_READAHEAD_PAGES, and do not directly include
> pagemap.h yet.

> > +#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
> > +
> > +void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
> > +		struct file *, pgoff_t index, unsigned long req_count);
> 
> Yes, "struct address_space *mapping" is weird, but I don't know if it's
> "misleading", given that it's actually one of the things you have to learn
> right from the beginning, with linux-mm, right? Or is that about to change?
> 
> I'm not asking to restore this to "struct address_space *mapping", but I thought
> it's worth mentioning out loud, especially if you or others are planning on
> changing those names or something. Just curious.

No plans (on my part) to change the name, although I have heard people
grumbling that there's very little need for it to be a separate struct
from inode, except for the benefit of coda, which is not exactly a
filesystem with a lot of users ...

Anyway, no plans to change it.  If there were something _special_ about
it like a theoretical:

void mapping_dedup(struct address_space *canonical,
		struct address_space *victim);

then that's useful information and shouldn't be deleted.  But I don't
think the word 'mapping' there conveys anything useful (other than that the
convention is to call a 'struct address_space' a mapping, which you'll
see soon enough once you look at any of the .c files).

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 01/24] mm: Move readahead prototypes from mm.h
  2020-02-21 21:48     ` Matthew Wilcox
@ 2020-02-22  0:15       ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2020-02-22  0:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On 2/21/20 1:48 PM, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 06:43:31PM -0800, John Hubbard wrote:
>> Yes. But I think these files also need a similar change:
>>
>>     fs/btrfs/disk-io.c
> 
> That gets pagemap.h through ctree.h, so I think it's fine.  It's
> already using mapping_set_gfp_mask(), so it already depends on pagemap.h.
> 
>>     fs/nfs/super.c
> 
> That gets it through linux/nfs_fs.h.
> 
> I was reluctant not to add it to blk-core.c, because it isn't necessarily
> intuitive that the block device core would include pagemap.h.
> 
> That said, blkdev.h does include pagemap.h, so maybe I don't need to
> include it here.

OK. Looks good (either through blkdev.h or as-is), so:

    Reviewed-by: John Hubbard <jhubbard@nvidia.com>


> 
>> ...because they also use VM_READAHEAD_PAGES, and do not directly include
>> pagemap.h yet.
> 
>>> +#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
>>> +
>>> +void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
>>> +		struct file *, pgoff_t index, unsigned long req_count);
>>
>> Yes, "struct address_space *mapping" is weird, but I don't know if it's
>> "misleading", given that it's actually one of the things you have to learn
>> right from the beginning, with linux-mm, right? Or is that about to change?
>>
>> I'm not asking to restore this to "struct address_space *mapping", but I thought
>> it's worth mentioning out loud, especially if you or others are planning on
>> changing those names or something. Just curious.
> 
> No plans (on my part) to change the name, although I have heard people
> grumbling that there's very little need for it to be a separate struct
> from inode, except for the benefit of coda, which is not exactly a
> filesystem with a lot of users ...
> 
> Anyway, no plans to change it.  If there were something _special_ about
> it like a theoretical:
> 
> void mapping_dedup(struct address_space *canonical,
> 		struct address_space *victim);
> 
> then that's useful information and shouldn't be deleted.  But I don't
> think the word 'mapping' there conveys anything useful (other than the
> convention is to call a 'struct address_space' a mapping, which you'll
> see soon enough once you look at any of the .c files).
> 

OK, that's consistent and makes sense.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-19 21:01 ` [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor Matthew Wilcox
  2020-02-20 15:47   ` Christoph Hellwig
@ 2020-02-22  0:44   ` Darrick J. Wong
  2020-02-22  1:54     ` Matthew Wilcox
  1 sibling, 1 reply; 76+ messages in thread
From: Darrick J. Wong @ 2020-02-22  0:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> By putting the 'have we reached the end of the page' condition at the end
> of the loop instead of the beginning, we can remove the 'submit the last
> page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> didn't return 0, which would lead to an endless loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index cb3511eb152a..31899e6cb0f8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -400,15 +400,9 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
> -	loff_t done, ret;
> -
> -	for (done = 0; done < length; done += ret) {
> -		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
> -			if (!ctx->cur_page_in_bio)
> -				unlock_page(ctx->cur_page);
> -			put_page(ctx->cur_page);
> -			ctx->cur_page = NULL;
> -		}
> +	loff_t ret, done = 0;
> +
> +	while (done < length) {
>  		if (!ctx->cur_page) {
>  			ctx->cur_page = iomap_next_page(inode, ctx->pages,
>  					pos, length, &done);
> @@ -418,6 +412,20 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
>  				ctx, iomap, srcmap);
> +		done += ret;
> +
> +		/* Keep working on a partial page */
> +		if (ret && offset_in_page(pos + done))
> +			continue;
> +
> +		if (!ctx->cur_page_in_bio)
> +			unlock_page(ctx->cur_page);
> +		put_page(ctx->cur_page);
> +		ctx->cur_page = NULL;
> +
> +		/* Don't loop forever if we made no progress */
> +		if (WARN_ON(!ret))
> +			break;
>  	}
>  
>  	return done;
> @@ -451,11 +459,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  done:
>  	if (ctx.bio)
>  		submit_bio(ctx.bio);
> -	if (ctx.cur_page) {
> -		if (!ctx.cur_page_in_bio)
> -			unlock_page(ctx.cur_page);
> -		put_page(ctx.cur_page);
> -	}
> +	BUG_ON(ctx.cur_page);

Whoah, is the system totally unrecoverably hosed at this point?

I get that this /shouldn't/ happen, but if we somehow do end up with a
page here, can't we either release it or, at worst, just leak it?  I'd
have thought a WARN_ON would be just fine here.

--D

>  
>  	/*
>  	 * Check that we didn't lose a page due to the arcance calling
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-20 16:57     ` Matthew Wilcox
@ 2020-02-22  1:00       ` Darrick J. Wong
  2020-02-24  4:33         ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Darrick J. Wong @ 2020-02-22  1:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, linux-kernel, linux-f2fs-devel, Christoph Hellwig,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-xfs, linux-btrfs

On Thu, Feb 20, 2020 at 08:57:34AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:49:12AM -0800, Christoph Hellwig wrote:
> > > +/**
> > > + * iomap_readahead - Attempt to read pages from a file.
> > > + * @rac: Describes the pages to be read.
> > > + * @ops: The operations vector for the filesystem.
> > > + *
> > > + * This function is for filesystems to call to implement their readahead
> > > + * address_space operation.
> > > + *
> > > + * Context: The file is pinned by the caller, and the pages to be read are
> > > + * all locked and have an elevated refcount.  This function will unlock
> > > + * the pages (once I/O has completed on them, or I/O has been determined to
> > > + * not be necessary).  It will also decrease the refcount once the pages
> > > + * have been submitted for I/O.  After this point, the page may be removed
> > > + * from the page cache, and should not be referenced.
> > > + */
> > 
> > Isn't the context documentation something that belongs into the aop
> > documentation?  I've never really seen the value of duplicating this
> > information in method instances, as it is just bound to be out of date
> > rather sooner than later.
> 
> I'm in two minds about it as well.  There's definitely no value in
> providing kernel-doc for implementations of a common interface ... so
> rather than fixing the nilfs2 kernel-doc, I just deleted it.  But this
> isn't just the implementation, like nilfs2_readahead() is; it's a library
> function for filesystems to call, so it deserves documentation.  On the
> other hand, there's no real thought to this on the part of the filesystem;
> the implementation just calls this with the appropriate ops pointer.
> 
> Then again, I kind of feel like we need more documentation of iomap to
> help filesystems convert to using it.  But maybe kernel-doc isn't the
> mechanism to provide that.

I think we need more documentation of the parts of iomap where it can
call back into the filesystem (looking at you, iomap_dio_ops).

I'm not opposed to letting this comment stay, though I don't see it as
all that necessary since iomap_readahead implements a callout that's
documented in vfs.rst and is thus subject to all the constraints listed
in the (*readahead) documentation.

--D

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-19 21:01 ` [PATCH v7 22/24] iomap: Convert from readpages to readahead Matthew Wilcox
  2020-02-20 15:49   ` Christoph Hellwig
@ 2020-02-22  1:03   ` Darrick J. Wong
  2020-02-22  1:09     ` Matthew Wilcox
  1 sibling, 1 reply; 76+ messages in thread
From: Darrick J. Wong @ 2020-02-22  1:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:01:01PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Use the new readahead operation in iomap.  Convert XFS and ZoneFS to
> use it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Ok... so from what I saw in the mm patches, this series changes
readahead to shove the locked pages into the page cache before calling
the filesystem's ->readahead function.  Therefore, this (and the
previous patch) are more or less just getting rid of all the iomap
machinery to put pages in the cache and instead pulling them out of the
mapping prior to submitting a read bio?

If so,

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 90 +++++++++++++++---------------------------
>  fs/iomap/trace.h       |  2 +-
>  fs/xfs/xfs_aops.c      | 13 +++---
>  fs/zonefs/super.c      |  7 ++--
>  include/linux/iomap.h  |  3 +-
>  5 files changed, 41 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 31899e6cb0f8..66cf453f4bb7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -214,9 +214,8 @@ iomap_read_end_io(struct bio *bio)
>  struct iomap_readpage_ctx {
>  	struct page		*cur_page;
>  	bool			cur_page_in_bio;
> -	bool			is_readahead;
>  	struct bio		*bio;
> -	struct list_head	*pages;
> +	struct readahead_control *rac;
>  };
>  
>  static void
> @@ -307,11 +306,11 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		if (ctx->bio)
>  			submit_bio(ctx->bio);
>  
> -		if (ctx->is_readahead) /* same as readahead_gfp_mask */
> +		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
>  		ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
>  		ctx->bio->bi_opf = REQ_OP_READ;
> -		if (ctx->is_readahead)
> +		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
>  		bio_set_dev(ctx->bio, iomap->bdev);
> @@ -367,36 +366,8 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpage);
>  
> -static struct page *
> -iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> -		loff_t length, loff_t *done)
> -{
> -	while (!list_empty(pages)) {
> -		struct page *page = lru_to_page(pages);
> -
> -		if (page_offset(page) >= (u64)pos + length)
> -			break;
> -
> -		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
> -				GFP_NOFS))
> -			return page;
> -
> -		/*
> -		 * If we already have a page in the page cache at index we are
> -		 * done.  Upper layers don't care if it is uptodate after the
> -		 * readpages call itself as every page gets checked again once
> -		 * actually needed.
> -		 */
> -		*done += PAGE_SIZE;
> -		put_page(page);
> -	}
> -
> -	return NULL;
> -}
> -
>  static loff_t
> -iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> +iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
>  		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
> @@ -404,10 +375,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  
>  	while (done < length) {
>  		if (!ctx->cur_page) {
> -			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> -					pos, length, &done);
> -			if (!ctx->cur_page)
> -				break;
> +			ctx->cur_page = readahead_page(ctx->rac);
>  			ctx->cur_page_in_bio = false;
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> @@ -431,44 +399,48 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  	return done;
>  }
>  
> -int
> -iomap_readpages(struct address_space *mapping, struct list_head *pages,
> -		unsigned nr_pages, const struct iomap_ops *ops)
> +/**
> + * iomap_readahead - Attempt to read pages from a file.
> + * @rac: Describes the pages to be read.
> + * @ops: The operations vector for the filesystem.
> + *
> + * This function is for filesystems to call to implement their readahead
> + * address_space operation.
> + *
> + * Context: The file is pinned by the caller, and the pages to be read are
> + * all locked and have an elevated refcount.  This function will unlock
> + * the pages (once I/O has completed on them, or I/O has been determined to
> + * not be necessary).  It will also decrease the refcount once the pages
> + * have been submitted for I/O.  After this point, the page may be removed
> + * from the page cache, and should not be referenced.
> + */
> +void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
>  {
> +	struct inode *inode = rac->mapping->host;
> +	loff_t pos = readahead_pos(rac);
> +	loff_t length = readahead_length(rac);
>  	struct iomap_readpage_ctx ctx = {
> -		.pages		= pages,
> -		.is_readahead	= true,
> +		.rac	= rac,
>  	};
> -	loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
> -	loff_t last = page_offset(list_entry(pages->next, struct page, lru));
> -	loff_t length = last - pos + PAGE_SIZE, ret = 0;
>  
> -	trace_iomap_readpages(mapping->host, nr_pages);
> +	trace_iomap_readahead(inode, readahead_count(rac));
>  
>  	while (length > 0) {
> -		ret = iomap_apply(mapping->host, pos, length, 0, ops,
> -				&ctx, iomap_readpages_actor);
> +		loff_t ret = iomap_apply(inode, pos, length, 0, ops,
> +				&ctx, iomap_readahead_actor);
>  		if (ret <= 0) {
>  			WARN_ON_ONCE(ret == 0);
> -			goto done;
> +			break;
>  		}
>  		pos += ret;
>  		length -= ret;
>  	}
> -	ret = 0;
> -done:
> +
>  	if (ctx.bio)
>  		submit_bio(ctx.bio);
>  	BUG_ON(ctx.cur_page);
> -
> -	/*
> -	 * Check that we didn't lose a page due to the arcance calling
> -	 * conventions..
> -	 */
> -	WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
> -	return ret;
>  }
> -EXPORT_SYMBOL_GPL(iomap_readpages);
> +EXPORT_SYMBOL_GPL(iomap_readahead);
>  
>  /*
>   * iomap_is_partially_uptodate checks whether blocks within a page are
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 6dc227b8c47e..d6ba705f938a 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -39,7 +39,7 @@ DEFINE_EVENT(iomap_readpage_class, name,	\
>  	TP_PROTO(struct inode *inode, int nr_pages), \
>  	TP_ARGS(inode, nr_pages))
>  DEFINE_READPAGE_EVENT(iomap_readpage);
> -DEFINE_READPAGE_EVENT(iomap_readpages);
> +DEFINE_READPAGE_EVENT(iomap_readahead);
>  
>  DECLARE_EVENT_CLASS(iomap_page_class,
>  	TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 58e937be24ce..6e68eeb50b07 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -621,14 +621,11 @@ xfs_vm_readpage(
>  	return iomap_readpage(page, &xfs_read_iomap_ops);
>  }
>  
> -STATIC int
> -xfs_vm_readpages(
> -	struct file		*unused,
> -	struct address_space	*mapping,
> -	struct list_head	*pages,
> -	unsigned		nr_pages)
> +STATIC void
> +xfs_vm_readahead(
> +	struct readahead_control	*rac)
>  {
> -	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
> +	iomap_readahead(rac, &xfs_read_iomap_ops);
>  }
>  
>  static int
> @@ -644,7 +641,7 @@ xfs_iomap_swapfile_activate(
>  
>  const struct address_space_operations xfs_address_space_operations = {
>  	.readpage		= xfs_vm_readpage,
> -	.readpages		= xfs_vm_readpages,
> +	.readahead		= xfs_vm_readahead,
>  	.writepage		= xfs_vm_writepage,
>  	.writepages		= xfs_vm_writepages,
>  	.set_page_dirty		= iomap_set_page_dirty,
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8bc6ef82d693..8327a01d3bac 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -78,10 +78,9 @@ static int zonefs_readpage(struct file *unused, struct page *page)
>  	return iomap_readpage(page, &zonefs_iomap_ops);
>  }
>  
> -static int zonefs_readpages(struct file *unused, struct address_space *mapping,
> -			    struct list_head *pages, unsigned int nr_pages)
> +static void zonefs_readahead(struct readahead_control *rac)
>  {
> -	return iomap_readpages(mapping, pages, nr_pages, &zonefs_iomap_ops);
> +	iomap_readahead(rac, &zonefs_iomap_ops);
>  }
>  
>  /*
> @@ -128,7 +127,7 @@ static int zonefs_writepages(struct address_space *mapping,
>  
>  static const struct address_space_operations zonefs_file_aops = {
>  	.readpage		= zonefs_readpage,
> -	.readpages		= zonefs_readpages,
> +	.readahead		= zonefs_readahead,
>  	.writepage		= zonefs_writepage,
>  	.writepages		= zonefs_writepages,
>  	.set_page_dirty		= iomap_set_page_dirty,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8b09463dae0d..bc20bd04c2a2 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -155,8 +155,7 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> -int iomap_readpages(struct address_space *mapping, struct list_head *pages,
> -		unsigned nr_pages, const struct iomap_ops *ops);
> +void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
>  int iomap_set_page_dirty(struct page *page);
>  int iomap_is_partially_uptodate(struct page *page, unsigned long from,
>  		unsigned long count);
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-22  1:03   ` Darrick J. Wong
@ 2020-02-22  1:09     ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-22  1:09 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Fri, Feb 21, 2020 at 05:03:53PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 19, 2020 at 01:01:01PM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Use the new readahead operation in iomap.  Convert XFS and ZoneFS to
> > use it.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> Ok... so from what I saw in the mm patches, this series changes
> readahead to shove the locked pages into the page cache before calling
> the filesystem's ->readahead function.  Therefore, this (and the
> previous patch) are more or less just getting rid of all the iomap
> machinery to put pages in the cache and instead pulling them out of the
> mapping prior to submitting a read bio?

Correct.
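
For anyone following the conversion, the consumer side now looks roughly
like this (the myfs_* names are made up; iomap's real loop is in the
patch being reviewed):

        static void myfs_readahead(struct readahead_control *rac)
        {
                struct page *page;

                /* pages arrive locked, with an elevated refcount */
                while ((page = readahead_page(rac))) {
                        myfs_start_read(page);  /* async; completion unlocks */
                        put_page(page);         /* drop the ref once submitted */
                }
        }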

> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks!

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-22  0:44   ` Darrick J. Wong
@ 2020-02-22  1:54     ` Matthew Wilcox
  2020-02-23 17:55       ` Darrick J. Wong
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-22  1:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Fri, Feb 21, 2020 at 04:44:25PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > By putting the 'have we reached the end of the page' condition at the end
> > of the loop instead of the beginning, we can remove the 'submit the last
> > page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> > didn't return 0, which would lead to an endless loop.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/iomap/buffered-io.c | 32 ++++++++++++++++++--------------
> >  1 file changed, 18 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index cb3511eb152a..31899e6cb0f8 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -400,15 +400,9 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> >  		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_readpage_ctx *ctx = data;
> > -	loff_t done, ret;
> > -
> > -	for (done = 0; done < length; done += ret) {
> > -		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
> > -			if (!ctx->cur_page_in_bio)
> > -				unlock_page(ctx->cur_page);
> > -			put_page(ctx->cur_page);
> > -			ctx->cur_page = NULL;
> > -		}
> > +	loff_t ret, done = 0;
> > +
> > +	while (done < length) {
> >  		if (!ctx->cur_page) {
> >  			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> >  					pos, length, &done);
> > @@ -418,6 +412,20 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> >  		}
> >  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> >  				ctx, iomap, srcmap);
> > +		done += ret;
> > +
> > +		/* Keep working on a partial page */
> > +		if (ret && offset_in_page(pos + done))
> > +			continue;
> > +
> > +		if (!ctx->cur_page_in_bio)
> > +			unlock_page(ctx->cur_page);
> > +		put_page(ctx->cur_page);
> > +		ctx->cur_page = NULL;
> > +
> > +		/* Don't loop forever if we made no progress */
> > +		if (WARN_ON(!ret))
> > +			break;
> >  	}
> >  
> >  	return done;
> > @@ -451,11 +459,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
> >  done:
> >  	if (ctx.bio)
> >  		submit_bio(ctx.bio);
> > -	if (ctx.cur_page) {
> > -		if (!ctx.cur_page_in_bio)
> > -			unlock_page(ctx.cur_page);
> > -		put_page(ctx.cur_page);
> > -	}
> > +	BUG_ON(ctx.cur_page);
> 
> Whoah, is the system totally unrecoverably hosed at this point?
> 
> I get that this /shouldn't/ happen, but if we somehow do end up with a
> page here, can't we either release it or, at worst, just leak it?  I'd
> have thought a WARN_ON would be just fine here.

If we do find a page here, we don't actually know what to do with it.
It might be (currently) locked, it might have the wrong refcount.
Whatever is going on, it's probably better that we stop everything right
here rather than allow things to go further and possibly present bad
data to the application.  I mean, we could even be leaking the previous
contents of this page to userspace.  Or maybe the future contents of a
page which shouldn't be in the page cache any more, but userspace gets
a mapping to it.

I'm not enthusiastic about putting in some code here to try to handle
a "can't happen" case, since it's never going to be tested, and might
end up causing more problems than it tries to solve.  Let's just stop.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-22  1:54     ` Matthew Wilcox
@ 2020-02-23 17:55       ` Darrick J. Wong
  0 siblings, 0 replies; 76+ messages in thread
From: Darrick J. Wong @ 2020-02-23 17:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Fri, Feb 21, 2020 at 05:54:35PM -0800, Matthew Wilcox wrote:
> On Fri, Feb 21, 2020 at 04:44:25PM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > 
> > > By putting the 'have we reached the end of the page' condition at the end
> > > of the loop instead of the beginning, we can remove the 'submit the last
> > > page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> > > didn't return 0, which would lead to an endless loop.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  fs/iomap/buffered-io.c | 32 ++++++++++++++++++--------------
> > >  1 file changed, 18 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index cb3511eb152a..31899e6cb0f8 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -400,15 +400,9 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > >  		void *data, struct iomap *iomap, struct iomap *srcmap)
> > >  {
> > >  	struct iomap_readpage_ctx *ctx = data;
> > > -	loff_t done, ret;
> > > -
> > > -	for (done = 0; done < length; done += ret) {
> > > -		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
> > > -			if (!ctx->cur_page_in_bio)
> > > -				unlock_page(ctx->cur_page);
> > > -			put_page(ctx->cur_page);
> > > -			ctx->cur_page = NULL;
> > > -		}
> > > +	loff_t ret, done = 0;
> > > +
> > > +	while (done < length) {
> > >  		if (!ctx->cur_page) {
> > >  			ctx->cur_page = iomap_next_page(inode, ctx->pages,
> > >  					pos, length, &done);
> > > @@ -418,6 +412,20 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > >  		}
> > >  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> > >  				ctx, iomap, srcmap);
> > > +		done += ret;
> > > +
> > > +		/* Keep working on a partial page */
> > > +		if (ret && offset_in_page(pos + done))
> > > +			continue;
> > > +
> > > +		if (!ctx->cur_page_in_bio)
> > > +			unlock_page(ctx->cur_page);
> > > +		put_page(ctx->cur_page);
> > > +		ctx->cur_page = NULL;
> > > +
> > > +		/* Don't loop forever if we made no progress */
> > > +		if (WARN_ON(!ret))
> > > +			break;
> > >  	}
> > >  
> > >  	return done;
> > > @@ -451,11 +459,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
> > >  done:
> > >  	if (ctx.bio)
> > >  		submit_bio(ctx.bio);
> > > -	if (ctx.cur_page) {
> > > -		if (!ctx.cur_page_in_bio)
> > > -			unlock_page(ctx.cur_page);
> > > -		put_page(ctx.cur_page);
> > > -	}
> > > +	BUG_ON(ctx.cur_page);
> > 
> > Whoah, is the system totally unrecoverably hosed at this point?
> > 
> > I get that this /shouldn't/ happen, but if we somehow do end up with a
> > page here, can't we either release it or, at worst, just leak it?  I'd
> > have thought a WARN_ON would be just fine here.
> 
> If we do find a page here, we don't actually know what to do with it.
> It might be (currently) locked, it might have the wrong refcount.
> Whatever is going on, it's probably better that we stop everything right
> here rather than allow things to go further and possibly present bad
> data to the application.  I mean, we could even be leaking the previous
> contents of this page to userspace.  Or maybe the future contents of a
> page which shouldn't be in the page cache any more, but userspace gets
> a mapping to it.
> 
> I'm not enthusiastic about putting in some code here to try to handle
> a "can't happen" case, since it's never going to be tested, and might
> end up causing more problems than it tries to solve.  Let's just stop.

Seeing how Linus (and others like myself) are a bit allergic to BUG
these days, could you add the first paragraph of the above justification
as a comment adjacent to the BUG_ON(), please? :)
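
Something like this, say (wording lifted from your paragraph above):

        /*
         * A page left over here is in an unknown state: possibly still
         * locked, possibly with the wrong refcount.  Carrying on could
         * hand stale or unrelated page contents to userspace, so stop
         * hard rather than guess at a cleanup.
         */
        BUG_ON(ctx.cur_page);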

--D

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-22  1:00       ` Darrick J. Wong
@ 2020-02-24  4:33         ` Matthew Wilcox
  2020-02-24 16:52           ` Darrick J. Wong
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-24  4:33 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: cluster-devel, linux-kernel, linux-f2fs-devel, Christoph Hellwig,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-xfs, linux-btrfs

On Fri, Feb 21, 2020 at 05:00:13PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 20, 2020 at 08:57:34AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 20, 2020 at 07:49:12AM -0800, Christoph Hellwig wrote:
> > +/**
> > + * iomap_readahead - Attempt to read pages from a file.
> > + * @rac: Describes the pages to be read.
> > + * @ops: The operations vector for the filesystem.
> > + *
> > + * This function is for filesystems to call to implement their readahead
> > + * address_space operation.
> > + *
> > + * Context: The file is pinned by the caller, and the pages to be read are
> > + * all locked and have an elevated refcount.  This function will unlock
> > + * the pages (once I/O has completed on them, or I/O has been determined to
> > + * not be necessary).  It will also decrease the refcount once the pages
> > + * have been submitted for I/O.  After this point, the page may be removed
> > + * from the page cache, and should not be referenced.
> > + */
> > 
> > > Isn't the context documentation something that belongs into the aop
> > > documentation?  I've never really seen the value of duplicating this
> > > information in method instances, as it is just bound to be out of date
> > > rather sooner than later.
> > 
> > I'm in two minds about it as well.  There's definitely no value in
> > providing kernel-doc for implementations of a common interface ... so
> > rather than fixing the nilfs2 kernel-doc, I just deleted it.  But this
> > isn't just the implementation, like nilfs2_readahead() is; it's a library
> > function for filesystems to call, so it deserves documentation.  On the
> > other hand, there's no real thought to this on the part of the filesystem;
> > the implementation just calls this with the appropriate ops pointer.
> > 
> > Then again, I kind of feel like we need more documentation of iomap to
> > help filesystems convert to using it.  But maybe kernel-doc isn't the
> > mechanism to provide that.
> 
> I think we need more documentation of the parts of iomap where it can
> call back into the filesystem (looking at you, iomap_dio_ops).
> 
> I'm not opposed to letting this comment stay, though I don't see it as
> all that necessary since iomap_readahead implements a callout that's
> documented in vfs.rst and is thus subject to all the constraints listed
> in the (*readahead) documentation.

Right.  And that's not currently in kernel-doc format, but should be.
Something for a different patchset, IMO.

What we need documenting _here_ is the conditions under which the
iomap_ops are called so the filesystem author doesn't need to piece them
together from three different places.  Here's what I currently have:

 * Context: The @ops callbacks may submit I/O (eg to read the addresses of
 * blocks from disc), and may wait for it.  The caller may be trying to
 * access a different page, and so sleeping excessively should be avoided.
 * It may allocate memory, but should avoid large allocations.  This
 * function is called with memalloc_nofs set, so allocations will not cause
 * the filesystem to be reentered.
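
(The nofs scoping mentioned there is the usual pattern; a minimal sketch
of the caller side, not the exact hunk from this series:)

        unsigned int flags = memalloc_nofs_save();

        /* allocations in here cannot recurse into the filesystem */
        page = __page_cache_alloc(gfp_mask);
        if (page)
                add_to_page_cache_lru(page, mapping, index, gfp_mask);

        memalloc_nofs_restore(flags);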


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 22/24] iomap: Convert from readpages to readahead
  2020-02-24  4:33         ` Matthew Wilcox
@ 2020-02-24 16:52           ` Darrick J. Wong
  0 siblings, 0 replies; 76+ messages in thread
From: Darrick J. Wong @ 2020-02-24 16:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, linux-kernel, linux-f2fs-devel, Christoph Hellwig,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-xfs, linux-btrfs

On Sun, Feb 23, 2020 at 08:33:55PM -0800, Matthew Wilcox wrote:
> On Fri, Feb 21, 2020 at 05:00:13PM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 20, 2020 at 08:57:34AM -0800, Matthew Wilcox wrote:
> > > On Thu, Feb 20, 2020 at 07:49:12AM -0800, Christoph Hellwig wrote:
> > > +/**
> > > + * iomap_readahead - Attempt to read pages from a file.
> > > + * @rac: Describes the pages to be read.
> > > + * @ops: The operations vector for the filesystem.
> > > + *
> > > + * This function is for filesystems to call to implement their readahead
> > > + * address_space operation.
> > > + *
> > > + * Context: The file is pinned by the caller, and the pages to be read are
> > > + * all locked and have an elevated refcount.  This function will unlock
> > > + * the pages (once I/O has completed on them, or I/O has been determined to
> > > + * not be necessary).  It will also decrease the refcount once the pages
> > > + * have been submitted for I/O.  After this point, the page may be removed
> > > + * from the page cache, and should not be referenced.
> > > + */
> > > 
> > > > Isn't the context documentation something that belongs into the aop
> > > > documentation?  I've never really seen the value of duplicating this
> > > > information in method instances, as it is just bound to be out of date
> > > > rather sooner than later.
> > > 
> > > I'm in two minds about it as well.  There's definitely no value in
> > > providing kernel-doc for implementations of a common interface ... so
> > > rather than fixing the nilfs2 kernel-doc, I just deleted it.  But this
> > > isn't just the implementation, like nilfs2_readahead() is; it's a library
> > > function for filesystems to call, so it deserves documentation.  On the
> > > other hand, there's no real thought to this on the part of the filesystem;
> > > the implementation just calls this with the appropriate ops pointer.
> > > 
> > > Then again, I kind of feel like we need more documentation of iomap to
> > > help filesystems convert to using it.  But maybe kernel-doc isn't the
> > > mechanism to provide that.
> > 
> > I think we need more documentation of the parts of iomap where it can
> > call back into the filesystem (looking at you, iomap_dio_ops).
> > 
> > I'm not opposed to letting this comment stay, though I don't see it as
> > all that necessary since iomap_readahead implements a callout that's
> > documented in vfs.rst and is thus subject to all the constraints listed
> > in the (*readahead) documentation.
> 
> Right.  And that's not currently in kernel-doc format, but should be.
> Something for a different patchset, IMO.
> 
> What we need documenting _here_ is the conditions under which the
> iomap_ops are called so the filesystem author doesn't need to piece them
> together from three different places.  Here's what I currently have:
> 
>  * Context: The @ops callbacks may submit I/O (eg to read the addresses of
>  * blocks from disc), and may wait for it.  The caller may be trying to
>  * access a different page, and so sleeping excessively should be avoided.
>  * It may allocate memory, but should avoid large allocations.  This
>  * function is called with memalloc_nofs set, so allocations will not cause
>  * the filesystem to be reentered.

How large? :)

--D

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 01/24] mm: Move readahead prototypes from mm.h
  2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
  2020-02-21  2:43   ` John Hubbard
@ 2020-02-24 21:32   ` Christoph Hellwig
  1 sibling, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:00:40PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> The readahead code is part of the page cache so should be found in the
> pagemap.h file.  force_page_cache_readahead is only used within mm,
> so move it to mm/internal.h instead.  Remove the parameter names where
> they add no value, and rename the ones which were actively misleading.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 02/24] mm: Return void from various readahead functions
  2020-02-19 21:00 ` [PATCH v7 02/24] mm: Return void from various readahead functions Matthew Wilcox
@ 2020-02-24 21:33   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, John Hubbard, linux-kernel, linux-f2fs-devel,
	cluster-devel, linux-mm, ocfs2-devel, Dave Chinner,
	linux-fsdevel, linux-ext4, linux-erofs, linux-btrfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages
  2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
  2020-02-20 14:36   ` Zi Yan
  2020-02-21  4:24   ` John Hubbard
@ 2020-02-24 21:34   ` Christoph Hellwig
  2 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:00:43PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Simplify the callers by moving the check for nr_pages and the BUG_ON
> into read_pages().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 05/24] mm: Use readahead_control to pass arguments
  2020-02-19 21:00 ` [PATCH v7 05/24] mm: Use readahead_control to pass arguments Matthew Wilcox
@ 2020-02-24 21:36   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, John Hubbard, linux-kernel, linux-f2fs-devel,
	cluster-devel, linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4,
	linux-erofs, linux-btrfs

On Wed, Feb 19, 2020 at 01:00:44PM -0800, Matthew Wilcox wrote:
> @@ -160,9 +164,13 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	unsigned long end_index;	/* The last page we want to read */
>  	LIST_HEAD(page_pool);
>  	int page_idx;
> -	unsigned int nr_pages = 0;
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> +	struct readahead_control rac = {
> +		.mapping = mapping,
> +		.file = filp,
> +		._nr_pages = 0,

There is no real need to initialize fields to zero if we initialize
the structure at all.  And while I'd normally not care that much,
having as few references as possible to the _-prefixed internal
variables helps making clear how internal they are.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop
  2020-02-19 21:00 ` [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
  2020-02-21  2:48   ` John Hubbard
@ 2020-02-24 21:37   ` Christoph Hellwig
  1 sibling, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:00:47PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Replace the page_offset variable with 'index + i'.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v7 09/24] mm: Put readahead pages in cache earlier
  2020-02-19 21:00 ` [PATCH v7 09/24] mm: Put readahead pages in cache earlier Matthew Wilcox
  2020-02-21  3:19   ` John Hubbard
@ 2020-02-24 21:40   ` Christoph Hellwig
  1 sibling, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Wed, Feb 19, 2020 at 01:00:48PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> When populating the page cache for readahead, mappings that use
> ->readpages must populate the page cache themselves as the pages are
> passed on a linked list which would normally be used for the page cache's
> LRU.  For mappings that use ->readpage or the upcoming ->readahead method,
> we can put the pages into the page cache as soon as they're allocated,
> which solves a race between readahead and direct IO.  It also lets us
> remove the gfp argument from read_pages().
> 
> Use the new readahead_page() API to implement the repeated calls to
> ->readpage(), just like most filesystems will.  This iterator also
> supports huge pages, even though none of the filesystems have been
> converted to use them yet.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pagemap.h | 20 +++++++++++++++++
>  mm/readahead.c          | 48 +++++++++++++++++++++++++----------------
>  2 files changed, 49 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 55fcea0249e6..4989d330fada 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -647,8 +647,28 @@ struct readahead_control {
>  /* private: use the readahead_* accessors instead */
>  	pgoff_t _index;
>  	unsigned int _nr_pages;
> +	unsigned int _batch_count;
>  };
>  
> +static inline struct page *readahead_page(struct readahead_control *rac)
> +{
> +	struct page *page;
> +
> +	BUG_ON(rac->_batch_count > rac->_nr_pages);
> +	rac->_nr_pages -= rac->_batch_count;
> +	rac->_index += rac->_batch_count;
> +	rac->_batch_count = 0;
> +
> +	if (!rac->_nr_pages)
> +		return NULL;
> +
> +	page = xa_load(&rac->mapping->i_pages, rac->_index);
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	rac->_batch_count = hpage_nr_pages(page);
> +
> +	return page;
> +}
> +
>  /* The number of pages in this readahead block */
>  static inline unsigned int readahead_count(struct readahead_control *rac)
>  {
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 83df5c061d33..aaa209559ba2 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -113,15 +113,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
>  
>  EXPORT_SYMBOL(read_cache_pages);
>  
> -static void read_pages(struct readahead_control *rac, struct list_head *pages,
> -		gfp_t gfp)
> +static void read_pages(struct readahead_control *rac, struct list_head *pages)
>  {
>  	const struct address_space_operations *aops = rac->mapping->a_ops;
> +	struct page *page;
>  	struct blk_plug plug;
> -	unsigned page_idx;
>  
>  	if (!readahead_count(rac))
> -		return;
> +		goto out;
>  
>  	blk_start_plug(&plug);
>  
> @@ -130,23 +129,23 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>  				readahead_count(rac));
>  		/* Clean up the remaining pages */
>  		put_pages_list(pages);
> -		goto out;
> -	}
> -
> -	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
> -		struct page *page = lru_to_page(pages);
> -		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
> -				gfp))
> +		rac->_index += rac->_nr_pages;
> +		rac->_nr_pages = 0;
> +	} else {
> +		while ((page = readahead_page(rac))) {
>  			aops->readpage(rac->file, page);
> -		put_page(page);
> +			put_page(page);
> +		}
>  	}
>  
> -out:
>  	blk_finish_plug(&plug);
>  
>  	BUG_ON(!list_empty(pages));
> -	rac->_nr_pages = 0;
> +	BUG_ON(readahead_count(rac));
> +
> +out:
> +	/* If we were called due to a conflicting page, skip over it */
> +	rac->_index++;
>  }
>  
>  /*
> @@ -165,9 +164,11 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	LIST_HEAD(page_pool);
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> +	bool use_list = mapping->a_ops->readpages;

I find this single-use variable a little weird.  Not a dealbreaker,
but just checking the ->readpages method directly would seem a little
more obvious to me.

Except for this and the other nitpick the patch looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>
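
Based on the readahead_page() iterator quoted above, a converted
->readahead implementation loops just like read_pages() does; a sketch,
with my_fs_readpage() standing in for a hypothetical per-filesystem
page reader:

static void my_fs_readahead(struct readahead_control *rac)
{
	struct page *page;

	/*
	 * readahead_page() returns each locked page already sitting in
	 * the page cache, advancing the internal index, and NULL once
	 * the batch is exhausted.
	 */
	while ((page = readahead_page(rac))) {
		my_fs_readpage(rac->file, page);	/* hypothetical */
		put_page(page);		/* drop the iterator's reference */
	}
}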


* Re: [PATCH v7 10/24] mm: Add readahead address space operation
  2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
  2020-02-20 15:00   ` Zi Yan
  2020-02-21  4:30   ` John Hubbard
@ 2020-02-24 21:41   ` Christoph Hellwig
  2 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

Looks fine modulo the little data type issue:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-20 15:57           ` Christoph Hellwig
@ 2020-02-24 21:43             ` Christoph Hellwig
  2020-02-24 21:54               ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, Johannes Thumshirn, linux-kernel,
	linux-f2fs-devel, Christoph Hellwig, linux-mm, ocfs2-devel,
	linux-fsdevel, linux-ext4, linux-erofs, linux-xfs, linux-btrfs

On Thu, Feb 20, 2020 at 07:57:27AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > > btrfs: Convert from readpages to readahead
> > > >   
> > > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > > to optimise fetching a batch of pages at once.
> > > 
> > > Shouldn't this readahead_page_batch helper go into a separate patch so
> > > that it clearly stands out?
> > 
> > I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> > same patch where we add readahead_page())
> 
> One argument for keeping it in a patch of its own is that btrfs appears
> to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
> so it might go away pretty soon and we could just revert the commit.
> 
> But this starts to get into really minor details, so I'll shut up now :)

So looking at this again I have another comment and a question.

First, I think the implicit ARRAY_SIZE in readahead_page_batch is highly
dangerous, as it will do the wrong thing when passed a pointer or an
array-typed function argument (which decays to a pointer).

Second, I wonder if it would be worthwhile to also switch to a batched
operation in iomap, if the XArray overhead is high enough.  That should
be pretty trivial, but we don't really need to do it in this series.


* Re: [Cluster-devel] [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded
  2020-02-19 21:00 ` [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded Matthew Wilcox
@ 2020-02-24 21:53   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, linux-kernel, linux-f2fs-devel, linux-xfs,
	linux-mm, linux-btrfs, linux-fsdevel, linux-ext4, linux-erofs,
	ocfs2-devel

On Wed, Feb 19, 2020 at 01:00:51PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> ext4 and f2fs have duplicated the guts of the readahead code so
> they can read past i_size.  Instead, separate out the guts of the
> readahead code so they can call it directly.

I don't like this, but then I like the horrible open-coded versions
even less...  Can you add a "do not use for new code" comment to the
function as well?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
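
For context, a sketch of how an out-of-bounds reader would call the
factored-out helper; the argument list below is assumed from the cover
letter, so treat it as illustrative rather than authoritative:

/* Read nr_to_read pages starting at index, without clamping to i_size. */
void page_cache_readahead_unbounded(struct address_space *mapping,
		struct file *file, pgoff_t index,
		unsigned long nr_to_read, unsigned long lookahead_size);

/* e.g. a verity-style caller fetching metadata pages past EOF: */
page_cache_readahead_unbounded(inode->i_mapping, NULL, index, num_pages, 0);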


* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-24 21:43             ` Christoph Hellwig
@ 2020-02-24 21:54               ` Matthew Wilcox
  2020-02-24 21:57                 ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-24 21:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, Johannes Thumshirn, linux-kernel, linux-f2fs-devel,
	cluster-devel, linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4,
	linux-erofs, linux-btrfs

On Mon, Feb 24, 2020 at 01:43:47PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 07:57:27AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> > > On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > > > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > > > btrfs: Convert from readpages to readahead
> > > > >   
> > > > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > > > to optimise fetching a batch of pages at once.
> > > > 
> > > > Shouldn't this readahead_page_batch helper go into a separate patch so
> > > > that it clearly stands out?
> > > 
> > > I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> > > same patch where we add readahead_page())
> > 
> > One argument for keeping it in a patch of its own is that btrfs appears
> > to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
> > so it might go away pretty soon and we could just revert the commit.
> > 
> > But this starts to get into really minor details, so I'll shut up now :)
> 
> So looking at this again I have another comment and a question.
> 
> First, I think the implicit ARRAY_SIZE in readahead_page_batch is highly
> dangerous, as it will do the wrong thing when passed a pointer or an
> array-typed function argument (which decays to a pointer).

somebody already thought of that ;-)

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
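
(How that guard works, sketched from the kernel's compiler headers:
__must_be_array() evaluates to 0 for a true array and breaks the build
for anything that has already decayed to a pointer.)

#define __same_type(a, b) \
	__builtin_types_compatible_p(typeof(a), typeof(b))

/*
 * For a real array, typeof(a) is 'T[N]', which is not compatible with
 * 'T *' (the type of &(a)[0]), so the condition is 0 and the build
 * succeeds.  For a pointer argument the types match, the condition is
 * 1, and BUILD_BUG_ON_ZERO() turns that into a compile-time error.
 */
#define __must_be_array(a)	BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))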

> Second, I wonder if it would be worthwhile to also switch to a batched
> operation in iomap, if the XArray overhead is high enough.  That should
> be pretty trivial, but we don't really need to do it in this series.

I've also considered keeping a small array of pointers inside the
readahead_control so nobody needs to have a readahead_page_batch()
operation.  Even keeping 10 pointers in there will reduce the XArray
overhead by 90%.  But this fits the current btrfs model well, and it
lets us play with different approaches by abstracting everything away.
I'm sure this won't be the last patch that touches the readahead code ;-)
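
(A purely hypothetical sketch of the alternative floated above -- none
of these fields exist in the v7 series and all names are invented:)

/*
 * Invented illustration: cache a few page pointers in the control
 * structure so most readahead_page() calls avoid an XArray lookup.
 */
#define RAC_CACHE_PAGES	10	/* "even keeping 10 pointers in there" */

struct readahead_control_cached {
	struct address_space *mapping;
	struct file *file;
	pgoff_t _index;
	unsigned int _nr_pages;
	/* refilled from the XArray one batch at a time */
	struct page *_cache[RAC_CACHE_PAGES];
	unsigned int _cache_pos, _cache_len;
};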


* Re: [Cluster-devel] [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead
  2020-02-19 21:00 ` [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
@ 2020-02-24 21:54   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, linux-ext4, linux-mm, John Hubbard, linux-kernel,
	Junxiao Bi, linux-xfs, Joseph Qi, linux-btrfs, linux-fsdevel,
	linux-f2fs-devel, linux-erofs, ocfs2-devel

On Wed, Feb 19, 2020 at 01:00:52PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Implement the new readahead aop and convert all callers (block_dev,
> exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
> reiserfs & udf).  The callers are all trivial except for GFS2 & OCFS2.

Looks sensible:

Reviewed-by: Christoph Hellwig <hch@lst.de>
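
The trivial conversions look roughly like this ext2-style example,
sketched from the description rather than copied from the patch:

/*
 * The old ->readpages entry point:
 *
 *	static int ext2_readpages(struct file *file,
 *			struct address_space *mapping,
 *			struct list_head *pages, unsigned nr_pages)
 *	{
 *		return mpage_readpages(mapping, pages, nr_pages,
 *				       ext2_get_block);
 *	}
 *
 * becomes a one-liner on the new aop:
 */
static void ext2_readahead(struct readahead_control *rac)
{
	mpage_readahead(rac, ext2_get_block);
}

/* and in the address_space_operations: .readahead = ext2_readahead, */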


* Re: [PATCH v7 14/24] btrfs: Convert from readpages to readahead
  2020-02-24 21:54               ` Matthew Wilcox
@ 2020-02-24 21:57                 ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 21:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, Johannes Thumshirn, linux-kernel,
	linux-f2fs-devel, Christoph Hellwig, linux-mm, ocfs2-devel,
	linux-fsdevel, linux-ext4, linux-erofs, linux-xfs, linux-btrfs

On Mon, Feb 24, 2020 at 01:54:14PM -0800, Matthew Wilcox wrote:
> > First, I think the implicit ARRAY_SIZE in readahead_page_batch is highly
> > dangerous, as it will do the wrong thing when passed a pointer or an
> > array-typed function argument (which decays to a pointer).
> 
> somebody already thought of that ;-)
> 
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

Ok.  I still find it pretty weird to design a primary interface that
only works with an array type.


* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-20 16:24     ` Matthew Wilcox
@ 2020-02-24 22:17       ` Christoph Hellwig
  2020-02-25  1:49         ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2020-02-24 22:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: cluster-devel, linux-kernel, linux-f2fs-devel, Christoph Hellwig,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-xfs, linux-btrfs

On Thu, Feb 20, 2020 at 08:24:04AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:47:41AM -0800, Christoph Hellwig wrote:
> > On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > 
> > > By putting the 'have we reached the end of the page' condition at the end
> > > of the loop instead of the beginning, we can remove the 'submit the last
> > > page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> > > didn't return 0, which would lead to an endless loop.
> > 
> I'm obviously biased, as I wrote the original code, but I find the new
> version very much harder to understand (not that the previous one was
> easy; this is tricky code...).
> 
> Agreed, I found the original code hard to understand.  I think this is
> easier because now cur_page doesn't leak outside this loop, so it has
> an obvious lifecycle.

I really don't like this patch, and would prefer it if the series went
ahead without it, as the current structure works just fine even
with the readahead changes.


* Re: [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor
  2020-02-24 22:17       ` Christoph Hellwig
@ 2020-02-25  1:49         ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2020-02-25  1:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, linux-kernel, linux-f2fs-devel, cluster-devel,
	linux-mm, ocfs2-devel, linux-fsdevel, linux-ext4, linux-erofs,
	linux-btrfs

On Mon, Feb 24, 2020 at 02:17:49PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 08:24:04AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 20, 2020 at 07:47:41AM -0800, Christoph Hellwig wrote:
> > > On Wed, Feb 19, 2020 at 01:01:00PM -0800, Matthew Wilcox wrote:
> > > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > > 
> > > > By putting the 'have we reached the end of the page' condition at the end
> > > > of the loop instead of the beginning, we can remove the 'submit the last
> > > > page' code from iomap_readpages().  Also check that iomap_readpage_actor()
> > > > didn't return 0, which would lead to an endless loop.
> > > 
> > > I'm obviously biased, as I wrote the original code, but I find the new
> > > version very much harder to understand (not that the previous one was
> > > easy; this is tricky code...).
> > 
> > Agreed, I found the original code hard to understand.  I think this is
> > easier because now cur_page doesn't leak outside this loop, so it has
> > an obvious lifecycle.
> 
> I really don't like this patch, and would prefer it if the series went
> ahead without it, as the current structure works just fine even
> with the readahead changes.

Dave Chinner specifically asked me to do it this way, so please fight
amongst yourselves.
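
(A schematic of the restructuring in dispute -- helper names invented,
not the iomap code.  With the end-of-page test at the top, the last page
survives the loop and needs a trailing submission; moving the test to
the bottom keeps cur_page's whole lifecycle inside the loop:)

/* Before (schematic): */
while (length > 0) {
	if (cur_page && page_done(cur_page)) {	/* check at the top */
		submit_page(cur_page);
		cur_page = NULL;
	}
	if (!cur_page)
		cur_page = next_page();
	length -= fill_page(cur_page);
}
if (cur_page)
	submit_page(cur_page);			/* trailing special case */

/* After (schematic): every page, including the last, is submitted here. */
while (length > 0) {
	if (!cur_page)
		cur_page = next_page();
	length -= fill_page(cur_page);
	if (page_done(cur_page)) {		/* check at the bottom */
		submit_page(cur_page);
		cur_page = NULL;
	}
}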


end of thread

Thread overview: 76+ messages
2020-02-19 21:00 [PATCH v7 00/23] Change readahead API Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 01/24] mm: Move readahead prototypes from mm.h Matthew Wilcox
2020-02-21  2:43   ` John Hubbard
2020-02-21 21:48     ` Matthew Wilcox
2020-02-22  0:15       ` John Hubbard
2020-02-24 21:32   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 02/24] mm: Return void from various readahead functions Matthew Wilcox
2020-02-24 21:33   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 03/24] mm: Ignore return value of ->readpages Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 04/24] mm: Move readahead nr_pages check into read_pages Matthew Wilcox
2020-02-20 14:36   ` Zi Yan
2020-02-21  4:24   ` John Hubbard
2020-02-24 21:34   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 05/24] mm: Use readahead_control to pass arguments Matthew Wilcox
2020-02-24 21:36   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 06/24] mm: Rename various 'offset' parameters to 'index' Matthew Wilcox
2020-02-21  2:21   ` John Hubbard
2020-02-21  3:27   ` John Hubbard
2020-02-19 21:00 ` [PATCH v7 07/24] mm: rename readahead loop variable to 'i' Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 08/24] mm: Remove 'page_offset' from readahead loop Matthew Wilcox
2020-02-21  2:48   ` John Hubbard
2020-02-24 21:37   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 09/24] mm: Put readahead pages in cache earlier Matthew Wilcox
2020-02-21  3:19   ` John Hubbard
2020-02-21  3:43     ` Matthew Wilcox
2020-02-21  4:19       ` John Hubbard
2020-02-24 21:40   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 10/24] mm: Add readahead address space operation Matthew Wilcox
2020-02-20 15:00   ` Zi Yan
2020-02-20 15:10     ` Matthew Wilcox
2020-02-21  4:30   ` John Hubbard
2020-02-24 21:41   ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 11/24] mm: Move end_index check out of readahead loop Matthew Wilcox
2020-02-21  3:50   ` John Hubbard
2020-02-21 15:35     ` Matthew Wilcox
2020-02-21 19:41       ` John Hubbard
2020-02-19 21:00 ` [PATCH v7 12/24] mm: Add page_cache_readahead_unbounded Matthew Wilcox
2020-02-24 21:53   ` [Cluster-devel] " Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 13/24] fs: Convert mpage_readpages to mpage_readahead Matthew Wilcox
2020-02-24 21:54   ` [Cluster-devel] " Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 14/24] btrfs: Convert from readpages to readahead Matthew Wilcox
2020-02-20  9:42   ` Johannes Thumshirn
2020-02-20 13:48     ` Matthew Wilcox
2020-02-20 15:46       ` Christoph Hellwig
2020-02-20 15:54         ` Matthew Wilcox
2020-02-20 15:57           ` Christoph Hellwig
2020-02-24 21:43             ` Christoph Hellwig
2020-02-24 21:54               ` Matthew Wilcox
2020-02-24 21:57                 ` Christoph Hellwig
2020-02-19 21:00 ` [PATCH v7 15/24] erofs: Convert uncompressed files " Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 16/24] erofs: Convert compressed " Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 17/24] ext4: Convert " Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 18/24] ext4: Pass the inode to ext4_mpage_readpages Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 19/24] f2fs: Convert from readpages to readahead Matthew Wilcox
2020-02-19 21:00 ` [PATCH v7 20/24] fuse: " Matthew Wilcox
2020-02-19 21:01 ` [PATCH v7 21/24] iomap: Restructure iomap_readpages_actor Matthew Wilcox
2020-02-20 15:47   ` Christoph Hellwig
2020-02-20 16:24     ` Matthew Wilcox
2020-02-24 22:17       ` Christoph Hellwig
2020-02-25  1:49         ` Matthew Wilcox
2020-02-22  0:44   ` Darrick J. Wong
2020-02-22  1:54     ` Matthew Wilcox
2020-02-23 17:55       ` Darrick J. Wong
2020-02-19 21:01 ` [PATCH v7 22/24] iomap: Convert from readpages to readahead Matthew Wilcox
2020-02-20 15:49   ` Christoph Hellwig
2020-02-20 16:57     ` Matthew Wilcox
2020-02-22  1:00       ` Darrick J. Wong
2020-02-24  4:33         ` Matthew Wilcox
2020-02-24 16:52           ` Darrick J. Wong
2020-02-22  1:03   ` Darrick J. Wong
2020-02-22  1:09     ` Matthew Wilcox
2020-02-19 21:01 ` [PATCH v7 23/24] mm: Document why we don't set PageReadahead Matthew Wilcox
2020-02-19 21:01 ` [PATCH v7 24/24] mm: Use memalloc_nofs_save in readahead path Matthew Wilcox
2020-02-20 17:54 ` [PATCH v7 00/23] Change readahead API David Sterba
2020-02-20 22:39   ` Matthew Wilcox
2020-02-21 11:59     ` David Sterba
