* [PATCH 00/25] Page folios
@ 2020-12-16 18:23 Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
                   ` (25 more replies)
  0 siblings, 26 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

One of the great things about compound pages is that when you try to
do various operations on a tail page, it redirects to the head page and
everything Just Works.  One of the awful things is how much we pay for
that simplicity.  Here's an example, end_page_writeback():

        if (PageReclaim(page)) {
                ClearPageReclaim(page);
                rotate_reclaimable_page(page);
        }
        get_page(page);
        if (!test_clear_page_writeback(page))
                BUG();

        smp_mb__after_atomic();
        wake_up_page(page, PG_writeback);
        put_page(page);

That all looks very straightforward, but if you dive into the disassembly,
you see that there are four calls to compound_head() in this function
(PageReclaim(), ClearPageReclaim(), get_page() and put_page()).  It's
all for nothing, because if anyone does call this routine with a tail
page, wake_up_page() will VM_BUG_ON_PGFLAGS(PageTail(page), page).
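
For reference, compound_head() looks roughly like this (paraphrased from
memory, so treat it as a sketch rather than the exact source):

        static inline struct page *compound_head(struct page *page)
        {
                unsigned long head = READ_ONCE(page->compound_head);

                /* Bit 0 set: a tail page; the rest is the head pointer */
                if (unlikely(head & 1))
                        return (struct page *)(head - 1);
                return page;
        }

The mov/lea/and/cmove sequence below is exactly this function, inlined.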

I'm not really a CPU person, but I imagine there's some kind of dependency
here that sucks too:

    1fd7:       48 8b 57 08             mov    0x8(%rdi),%rdx
    1fdb:       48 8d 42 ff             lea    -0x1(%rdx),%rax
    1fdf:       83 e2 01                and    $0x1,%edx
    1fe2:       48 0f 44 c7             cmove  %rdi,%rax
    1fe6:       f0 80 60 02 fb          lock andb $0xfb,0x2(%rax)

Sure, it's going to be cache hot, but that cmove has to execute before
the lock andb.

I would like to introduce a new concept that I call a Page Folio.
Or just struct folio to its friends.  Here it is:
        struct folio {
                struct page page;
        };

A folio is a struct page which is guaranteed not to be a tail page.
So it's either a head page or a base (order-0) page.  That means
we don't have to call compound_head() on it and we save massively.
end_page_writeback() reduces from four calls to compound_head() to just
one (at the beginning of the function) and it shrinks from 213 bytes
to 126 bytes (using distro kernel config options).  I think even that one
can be eliminated, but I'm going slowly at this point and taking the
safe route of transforming a random struct page pointer into a struct
folio pointer by calling page_folio().  By the end of this exercise,
end_page_writeback() will become end_folio_writeback().
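
To give a flavour of where this is heading, here is a sketch of the
fully-converted function, based on the conversions later in this series
(not necessarily its final form):

        void end_folio_writeback(struct folio *folio)
        {
                if (FolioReclaim(folio)) {
                        ClearFolioReclaim(folio);
                        rotate_reclaimable_page(&folio->page);
                }
                get_folio(folio);
                if (!test_clear_page_writeback(&folio->page))
                        BUG();

                smp_mb__after_atomic();
                wake_up_folio(folio, PG_writeback);
                put_folio(folio);
        }

There is no compound_head() left at all; the caller did it once, in
page_folio().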

This is going to be a ton of work, and massively disruptive.  It'll touch
every filesystem, and a good few device drivers!  But I think it's worth
it.  Not every routine benefits as much as end_page_writeback(), but it
makes everything a little better.  At 29 bytes per call to lock_page(),
unlock_page(), put_page() and get_page(), that's on the order of 60kB of
text for allyesconfig.  More when you add on all the PageFoo() calls.
With the small amount of work I've done here, mm/filemap.o shrinks its
text segment by over a kilobyte from 33687 to 32318 bytes (and also 192
bytes of data).

But better than that, it's good documentation.  A function which has a
struct page argument might be expecting a head or base page and will
BUG if given a tail page.  It might work with any kind of page and
operate on PAGE_SIZE bytes.  It might work with any kind of page and
operate on page_size() bytes if given a head page but PAGE_SIZE bytes
if given a base or tail page.  It might operate on page_size() bytes
if passed a head or tail page.  We have examples of all of these today.
If a function takes a folio argument, it's operating on the entire folio.

This version of the patch series converts the deduplication code from
operating on pages to operating on folios.  Most of the patches are
somewhat generic infrastructure we'll need, then there's a big gulp as
all filesystems are converted to use folios for readahead and readpage.
Finally, we can convert the deduplication code to use page folios.

If you're interested, you can listen to a discussion of page folios
from last week here: https://www.youtube.com/watch?v=iP49_ER1FUM
Git tree version here (against next-20201216):
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio

Matthew Wilcox (Oracle) (25):
  mm: Introduce struct folio
  mm: Add put_folio
  mm: Add get_folio
  mm: Create FolioFlags
  mm: Add unlock_folio
  mm: Add lock_folio
  mm: Add lock_folio_killable
  mm: Add __alloc_folio_node and alloc_folio
  mm: Convert __page_cache_alloc to return a folio
  mm/filemap: Convert end_page_writeback to use a folio
  mm: Convert mapping_get_entry to return a folio
  mm: Add mark_folio_accessed
  mm: Add filemap_get_folio and find_get_folio
  mm/filemap: Add folio_add_to_page_cache
  mm/swap: Convert rotate_reclaimable_page to folio
  mm: Add folio_mapping
  mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS
  btrfs: Use readahead_batch_length
  fs: Change page refcount rules for readahead
  fs: Change readpage to take a folio
  mm: Convert wait_on_page_bit to wait_on_folio_bit
  mm: Add wait_on_folio_locked & wait_on_folio_locked_killable
  mm: Add flush_dcache_folio
  mm: Add read_cache_folio and read_mapping_folio
  fs: Convert vfs_dedupe_file_range_compare to folios

 Documentation/core-api/cachetlb.rst   |   6 +
 Documentation/filesystems/locking.rst |   2 +-
 Documentation/filesystems/porting.rst |   8 +
 Documentation/filesystems/vfs.rst     |  35 +-
 fs/9p/vfs_addr.c                      |   9 +-
 fs/adfs/inode.c                       |   4 +-
 fs/affs/file.c                        |   8 +-
 fs/affs/symlink.c                     |   3 +-
 fs/afs/dir.c                          |   2 +-
 fs/afs/file.c                         |   5 +-
 fs/afs/write.c                        |   2 +-
 fs/befs/linuxvfs.c                    |  23 +-
 fs/bfs/file.c                         |   4 +-
 fs/block_dev.c                        |   4 +-
 fs/btrfs/compression.c                |   4 +-
 fs/btrfs/ctree.h                      |   2 +-
 fs/btrfs/extent_io.c                  |  19 +-
 fs/btrfs/file.c                       |  13 +-
 fs/btrfs/free-space-cache.c           |   9 +-
 fs/btrfs/inode.c                      |  16 +-
 fs/btrfs/ioctl.c                      |  11 +-
 fs/btrfs/relocation.c                 |  11 +-
 fs/btrfs/send.c                       |  11 +-
 fs/buffer.c                           |  12 +-
 fs/cachefiles/rdwr.c                  |  17 +-
 fs/ceph/addr.c                        |   8 +-
 fs/ceph/file.c                        |   2 +-
 fs/cifs/file.c                        |   3 +-
 fs/coda/symlink.c                     |   3 +-
 fs/cramfs/inode.c                     |   3 +-
 fs/ecryptfs/mmap.c                    |   3 +-
 fs/efs/inode.c                        |   4 +-
 fs/efs/symlink.c                      |   3 +-
 fs/erofs/data.c                       |  12 +-
 fs/erofs/zdata.c                      |   8 +-
 fs/exfat/inode.c                      |   4 +-
 fs/ext2/inode.c                       |   4 +-
 fs/ext4/ext4.h                        |   2 +-
 fs/ext4/inode.c                       |  10 +-
 fs/ext4/readpage.c                    |  35 +-
 fs/f2fs/data.c                        |  12 +-
 fs/fat/inode.c                        |   4 +-
 fs/freevxfs/vxfs_immed.c              |   7 +-
 fs/freevxfs/vxfs_subr.c               |   7 +-
 fs/fuse/dir.c                         |   8 +-
 fs/fuse/file.c                        |   7 +-
 fs/gfs2/aops.c                        |  13 +-
 fs/hfs/inode.c                        |   4 +-
 fs/hfsplus/inode.c                    |   4 +-
 fs/hpfs/file.c                        |   4 +-
 fs/hpfs/namei.c                       |   3 +-
 fs/inode.c                            |   4 +-
 fs/iomap/buffered-io.c                |  14 +-
 fs/isofs/compress.c                   |   3 +-
 fs/isofs/inode.c                      |   4 +-
 fs/isofs/rock.c                       |   3 +-
 fs/jffs2/file.c                       |  20 +-
 fs/jffs2/os-linux.h                   |   2 +-
 fs/jfs/inode.c                        |   4 +-
 fs/jfs/jfs_metapage.c                 |   3 +-
 fs/libfs.c                            |  10 +-
 fs/minix/inode.c                      |   4 +-
 fs/mpage.c                            |   9 +-
 fs/nfs/file.c                         |   5 +-
 fs/nfs/read.c                         |   7 +-
 fs/nfs/symlink.c                      |  12 +-
 fs/nilfs2/inode.c                     |   4 +-
 fs/ntfs/aops.c                        |   3 +-
 fs/ocfs2/aops.c                       |  14 +-
 fs/ocfs2/refcounttree.c               |   5 +-
 fs/ocfs2/symlink.c                    |   3 +-
 fs/omfs/file.c                        |   4 +-
 fs/orangefs/inode.c                   |   3 +-
 fs/qnx4/inode.c                       |   4 +-
 fs/qnx6/inode.c                       |   4 +-
 fs/reiserfs/inode.c                   |   4 +-
 fs/remap_range.c                      | 109 +++---
 fs/romfs/super.c                      |   3 +-
 fs/squashfs/file.c                    |   3 +-
 fs/squashfs/symlink.c                 |   3 +-
 fs/sysv/itree.c                       |   4 +-
 fs/ubifs/file.c                       |   8 +-
 fs/udf/file.c                         |   8 +-
 fs/udf/inode.c                        |   4 +-
 fs/udf/symlink.c                      |   3 +-
 fs/ufs/inode.c                        |   4 +-
 fs/vboxsf/file.c                      |   3 +-
 fs/xfs/xfs_aops.c                     |   4 +-
 fs/zonefs/super.c                     |   4 +-
 include/asm-generic/cacheflush.h      |  13 +
 include/linux/buffer_head.h           |   2 +-
 include/linux/fs.h                    |   6 +-
 include/linux/gfp.h                   |  11 +
 include/linux/iomap.h                 |   2 +-
 include/linux/mm.h                    |  57 +++-
 include/linux/mm_types.h              |  17 +
 include/linux/mpage.h                 |   2 +-
 include/linux/nfs_fs.h                |   2 +-
 include/linux/page-flags.h            |  80 ++++-
 include/linux/pagemap.h               | 227 +++++++++----
 include/linux/swap.h                  |   9 +-
 mm/filemap.c                          | 466 +++++++++++++-------------
 mm/internal.h                         |   1 +
 mm/page-writeback.c                   |   7 +-
 mm/page_io.c                          |   6 +-
 mm/readahead.c                        |  24 +-
 mm/shmem.c                            |   2 +-
 mm/swap.c                             |  40 ++-
 mm/swapfile.c                         |   6 +-
 mm/util.c                             |  20 +-
 net/ceph/pagelist.c                   |   4 +-
 net/ceph/pagevec.c                    |   2 +-
 112 files changed, 983 insertions(+), 752 deletions(-)

-- 
2.29.2


* [PATCH 01/25] mm: Introduce struct folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 02/25] mm: Add put_folio Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

We have trouble keeping track of whether we've already called
compound_head() to ensure we're not operating on a tail page.  Further,
it's never clear whether we intend a struct page to refer to PAGE_SIZE
bytes or page_size(compound_head(page)).

Introduce a new type 'struct folio' that always refers to an entire
(possibly compound) page, and points to the head page (or base page).
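
For illustration, the conversion is a single call; page_folio() decodes
the same encoding that compound_head() uses (bit 0 of ->compound_head
set means "tail page", with the head pointer in the remaining bits):

        struct folio *folio = page_folio(page); /* page may be a tail */
        /* folio->page is now the head (or base) page */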

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h       | 10 ++++++++++
 include/linux/mm_types.h | 17 +++++++++++++++++
 include/linux/pagemap.h  | 14 ++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5299b90a6c40..ed20fd0c6169 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -916,6 +916,11 @@ static inline unsigned int compound_order(struct page *page)
 	return page[1].compound_order;
 }
 
+static inline unsigned int folio_order(struct folio *folio)
+{
+	return compound_order(&folio->page);
+}
+
 static inline bool hpage_pincount_available(struct page *page)
 {
 	/*
@@ -967,6 +972,11 @@ static inline unsigned int page_shift(struct page *page)
 
 void free_compound_page(struct page *page);
 
+static inline unsigned long folio_nr_pages(struct folio *folio)
+{
+	return compound_nr(&folio->page);
+}
+
 #ifdef CONFIG_MMU
 /*
  * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 65df8abd90bd..d7e487d9998f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -223,6 +223,23 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+/*
+ * A struct folio is either a base (order-0) page or the head page of
+ * a compound page.
+ */
+struct folio {
+	struct page page;
+};
+
+static inline struct folio *page_folio(struct page *page)
+{
+	unsigned long head = READ_ONCE(page->compound_head);
+
+	if (unlikely(head & 1))
+		return (struct folio *)(head - 1);
+	return (struct folio *)page;
+}
+
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
 {
 	return &page[1].compound_mapcount;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f18857c79478..b5af2c3719ab 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -437,6 +437,20 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
 	return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
 }
 
+static inline pgoff_t folio_index(struct folio *folio)
+{
+	if (unlikely(FolioSwapCache(folio)))
+		return __page_file_index(&folio->page);
+	return folio->page.index;
+}
+
+static inline struct page *folio_page(struct folio *folio, pgoff_t index)
+{
+	index -= folio_index(folio);
+	VM_BUG_ON_PAGE(index >= folio_nr_pages(folio), &folio->page);
+	return &folio->page + index;
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
-- 
2.29.2


* [PATCH 02/25] mm: Add put_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 03/25] mm: Add get_folio Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

If we know we have a folio, we can call put_folio() instead of put_page()
and save the overhead of calling compound_head().  It also skips the
devmap checks.
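
A minimal sketch of the intended calling pattern, once a caller has done
the page_folio() conversion up front:

        struct folio *folio = page_folio(page);

        /* ... operate on the folio ... */
        put_folio(folio);       /* no compound_head(), no devmap check */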

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed20fd0c6169..a9191dc250a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1194,9 +1194,15 @@ static inline __must_check bool try_get_page(struct page *page)
 	return true;
 }
 
+static inline void put_folio(struct folio *folio)
+{
+	if (put_page_testzero(&folio->page))
+		__put_page(&folio->page);
+}
+
 static inline void put_page(struct page *page)
 {
-	page = compound_head(page);
+	struct folio *folio = page_folio(page);
 
 	/*
 	 * For devmap managed pages we need to catch refcount transition from
@@ -1204,13 +1210,12 @@ static inline void put_page(struct page *page)
 	 * need to inform the device driver through callback. See
 	 * include/linux/memremap.h and HMM for details.
 	 */
-	if (page_is_devmap_managed(page)) {
-		put_devmap_managed_page(page);
+	if (page_is_devmap_managed(&folio->page)) {
+		put_devmap_managed_page(&folio->page);
 		return;
 	}
 
-	if (put_page_testzero(page))
-		__put_page(page);
+	put_folio(folio);
 }
 
 /*
-- 
2.29.2


* [PATCH 03/25] mm: Add get_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 02/25] mm: Add put_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 04/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

If we know we have a folio, we can call get_folio() instead of get_page()
and save the overhead of calling compound_head().
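
Sketch of usage; as with get_page(), the caller must already hold a
reference to the folio:

        get_folio(folio);       /* pin it across an async operation */
        /* ... hand the folio to I/O or another thread ... */
        put_folio(folio);       /* drop the extra reference when done */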

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a9191dc250a6..02ccb7a09190 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1172,15 +1172,17 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
 #define page_ref_zero_or_close_to_overflow(page) \
 	((unsigned int) page_ref_count(page) + 127u <= 127u)
 
+static inline void get_folio(struct folio *folio)
+{
+	/* Getting a page requires an already elevated page->_refcount. */
+	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(&folio->page),
+			&folio->page);
+	page_ref_inc(&folio->page);
+}
+
 static inline void get_page(struct page *page)
 {
-	page = compound_head(page);
-	/*
-	 * Getting a normal page or the head of a compound page
-	 * requires to already have an elevated page->_refcount.
-	 */
-	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
-	page_ref_inc(page);
+	get_folio(page_folio(page));
 }
 
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
-- 
2.29.2


* [PATCH 04/25] mm: Create FolioFlags
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 03/25] mm: Add get_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 05/25] mm: Add unlock_folio Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

These new functions are the folio analogues of the PageFlags functions.
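
For example, TESTPAGEFLAG(Referenced, referenced, PF_HEAD) now expands
to both of these (expansion shown for illustration):

        static __always_inline int FolioReferenced(struct folio *folio)
                { return test_bit(PG_referenced, folio_flags(folio)); }
        static __always_inline int PageReferenced(struct page *page)
                { return test_bit(PG_referenced, &PF_HEAD(page, 0)->flags); }

The Folio variant needs no page-flag policy: a folio is never a tail
page, so there is nothing to redirect.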

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 80 ++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 17 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ec5d0290e0ee..446217ce13e9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,6 +212,12 @@ static inline void page_init_poison(struct page *page, size_t size)
 }
 #endif
 
+static unsigned long *folio_flags(struct folio *folio)
+{
+	VM_BUG_ON_PGFLAGS(PagePoisoned(&folio->page), &folio->page);
+	return &folio->page.flags;
+}
+
 /*
  * Page flags policies wrt compound pages
  *
@@ -260,30 +266,44 @@ static inline void page_init_poison(struct page *page, size_t size)
  * Macros to create function definitions for page flags
  */
 #define TESTPAGEFLAG(uname, lname, policy)				\
+static __always_inline int Folio##uname(struct folio *folio)		\
+	{ return test_bit(PG_##lname, folio_flags(folio)); }		\
 static __always_inline int Page##uname(struct page *page)		\
 	{ return test_bit(PG_##lname, &policy(page, 0)->flags); }
 
 #define SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline void SetFolio##uname(struct folio *folio)	\
+	{ set_bit(PG_##lname, folio_flags(folio)); }			\
 static __always_inline void SetPage##uname(struct page *page)		\
 	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline void ClearFolio##uname(struct folio *folio)	\
+	{ clear_bit(PG_##lname, folio_flags(folio)); }			\
 static __always_inline void ClearPage##uname(struct page *page)		\
 	{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline void __SetFolio##uname(struct folio *folio)	\
+	{ __set_bit(PG_##lname, folio_flags(folio)); }			\
 static __always_inline void __SetPage##uname(struct page *page)		\
 	{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline void __ClearFolio##uname(struct folio *folio)	\
+	{ __clear_bit(PG_##lname, folio_flags(folio)); }		\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 	{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTSETFLAG(uname, lname, policy)				\
+static __always_inline int TestSetFolio##uname(struct folio *folio)	\
+	{ return test_and_set_bit(PG_##lname, folio_flags(folio)); }	\
 static __always_inline int TestSetPage##uname(struct page *page)	\
 	{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTCLEARFLAG(uname, lname, policy)				\
+static __always_inline int TestClearFolio##uname(struct folio *folio)	\
+	{ return test_and_clear_bit(PG_##lname, folio_flags(folio)); }	\
 static __always_inline int TestClearPage##uname(struct page *page)	\
 	{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
@@ -302,21 +322,27 @@ static __always_inline int TestClearPage##uname(struct page *page)	\
 	TESTCLEARFLAG(uname, lname, policy)
 
 #define TESTPAGEFLAG_FALSE(uname)					\
+static inline int Folio##uname(const struct folio *folio) { return 0; }	\
 static inline int Page##uname(const struct page *page) { return 0; }
 
 #define SETPAGEFLAG_NOOP(uname)						\
+static inline void SetFolio##uname(struct folio *folio) { }		\
 static inline void SetPage##uname(struct page *page) {  }
 
 #define CLEARPAGEFLAG_NOOP(uname)					\
+static inline void ClearFolio##uname(struct folio *folio) { }		\
 static inline void ClearPage##uname(struct page *page) {  }
 
 #define __CLEARPAGEFLAG_NOOP(uname)					\
+static inline void __ClearFolio##uname(struct folio *folio) { }		\
 static inline void __ClearPage##uname(struct page *page) {  }
 
 #define TESTSETFLAG_FALSE(uname)					\
+static inline int TestSetFolio##uname(struct folio *folio) { return 0; } \
 static inline int TestSetPage##uname(struct page *page) { return 0; }
 
 #define TESTCLEARFLAG_FALSE(uname)					\
+static inline int TestClearFolio##uname(struct folio *folio) { return 0; } \
 static inline int TestClearPage##uname(struct page *page) { return 0; }
 
 #define PAGEFLAG_FALSE(uname) TESTPAGEFLAG_FALSE(uname)			\
@@ -393,14 +419,18 @@ PAGEFLAG_FALSE(HighMem)
 #endif
 
 #ifdef CONFIG_SWAP
-static __always_inline int PageSwapCache(struct page *page)
+static __always_inline bool FolioSwapCache(struct folio *folio)
 {
-#ifdef CONFIG_THP_SWAP
-	page = compound_head(page);
-#endif
-	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
+	return FolioSwapBacked(folio) &&
+			test_bit(PG_swapcache, folio_flags(folio));
+
+}
 
+static __always_inline bool PageSwapCache(struct page *page)
+{
+	return FolioSwapCache(page_folio(page));
 }
+
 SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 #else
@@ -509,18 +539,16 @@ TESTPAGEFLAG_FALSE(Ksm)
 
 u64 stable_page_flags(struct page *page);
 
-static inline int PageUptodate(struct page *page)
+static inline int FolioUptodate(struct folio *folio)
 {
-	int ret;
-	page = compound_head(page);
-	ret = test_bit(PG_uptodate, &(page)->flags);
+	int ret = test_bit(PG_uptodate, folio_flags(folio));
 	/*
 	 * Must ensure that the data we read out of the page is loaded
 	 * _after_ we've loaded page->flags to check for PageUptodate.
 	 * We can skip the barrier if the page is not uptodate, because
 	 * we wouldn't be reading anything from it.
 	 *
-	 * See SetPageUptodate() for the other side of the story.
+	 * See SetFolioUptodate() for the other side of the story.
 	 */
 	if (ret)
 		smp_rmb();
@@ -528,23 +556,38 @@ static inline int PageUptodate(struct page *page)
 	return ret;
 }
 
-static __always_inline void __SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+	return FolioUptodate(page_folio(page));
+}
+
+static __always_inline void __SetFolioUptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	smp_wmb();
-	__set_bit(PG_uptodate, &page->flags);
+	__set_bit(PG_uptodate, folio_flags(folio));
 }
 
-static __always_inline void SetPageUptodate(struct page *page)
+static __always_inline void SetFolioUptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	/*
 	 * Memory barrier must be issued before setting the PG_uptodate bit,
 	 * so that all previous stores issued in order to bring the page
 	 * uptodate are actually visible before PageUptodate becomes true.
 	 */
 	smp_wmb();
-	set_bit(PG_uptodate, &page->flags);
+	set_bit(PG_uptodate, folio_flags(folio));
+}
+
+static __always_inline void __SetPageUptodate(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	__SetFolioUptodate((struct folio *)page);
+}
+
+static __always_inline void SetPageUptodate(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	SetFolioUptodate((struct folio *)page);
 }
 
 CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)
@@ -593,6 +636,10 @@ static inline void ClearPageCompound(struct page *page)
 int PageHuge(struct page *page);
 int PageHeadHuge(struct page *page);
 bool page_huge_active(struct page *page);
+static inline bool FolioHuge(struct folio *folio)
+{
+	return PageHeadHuge(&folio->page);
+}
 #else
 TESTPAGEFLAG_FALSE(Huge)
 TESTPAGEFLAG_FALSE(HeadHuge)
@@ -603,7 +650,6 @@ static inline bool page_huge_active(struct page *page)
 }
 #endif
 
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
-- 
2.29.2


* [PATCH 05/25] mm: Add unlock_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 04/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 06/25] mm: Add lock_folio Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Convert unlock_page() to call unlock_folio().  By using a folio we avoid
a repeated call to compound_head().  This shortens the function from 120
bytes to 76 bytes.
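
Callers which already hold a folio can skip the wrapper entirely; a
sketch, using the FolioFlags added earlier in this series:

        if (!err)
                SetFolioUptodate(folio);
        unlock_folio(folio);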

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 16 +++++++++++++++-
 mm/filemap.c            | 27 ++++++++++-----------------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b5af2c3719ab..83786e7eeb23 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -604,7 +604,21 @@ extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void unlock_folio(struct folio *folio);
+
+/**
+ * unlock_page - Unlock a locked page.
+ * @page: The page.
+ *
+ * Unlocks the page and wakes up any thread sleeping on the page lock.
+ *
+ * Context: May be called from interrupt or process context.  May not be
+ * called from NMI context.
+ */
+static inline void unlock_page(struct page *page)
+{
+	return unlock_folio(page_folio(page));
+}
 
 /*
  * Return true if the page was successfully locked
diff --git a/mm/filemap.c b/mm/filemap.c
index 13ef23f72330..8af89ecc1452 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1443,29 +1443,22 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
 #endif
 
 /**
- * unlock_page - unlock a locked page
- * @page: the page
+ * unlock_folio - Unlock a locked folio.
+ * @folio: The folio.
  *
- * Unlocks the page and wakes up sleepers in wait_on_page_locked().
- * Also wakes sleepers in wait_on_page_writeback() because the wakeup
- * mechanism between PageLocked pages and PageWriteback pages is shared.
- * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
+ * Unlocks the folio and wakes up any thread sleeping on the page lock.
  *
- * Note that this depends on PG_waiters being the sign bit in the byte
- * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
- * clear the PG_locked bit and test PG_waiters at the same time fairly
- * portably (architectures that do LL/SC can test any bit, while x86 can
- * test the sign bit).
+ * Context: May be called from interrupt or process context.  May not be
+ * called from NMI context.
  */
-void unlock_page(struct page *page)
+void unlock_folio(struct folio *folio)
 {
 	BUILD_BUG_ON(PG_waiters != 7);
-	page = compound_head(page);
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
-		wake_up_page_bit(page, PG_locked);
+	VM_BUG_ON_PAGE(!FolioLocked(folio), &folio->page);
+	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio)))
+		wake_up_page_bit(&folio->page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(unlock_folio);
 
 /**
  * end_page_writeback - end writeback against a page
-- 
2.29.2


* [PATCH 06/25] mm: Add lock_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 05/25] mm: Add unlock_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 07/25] mm: Add lock_folio_killable Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This is like lock_page() but for use by callers who know they have a folio.
Convert __lock_page() to be __lock_folio().  This saves one call to
compound_head() per contended call to lock_page().
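
Usage mirrors lock_page(); a sketch:

        lock_folio(folio);      /* may sleep; folio is locked on return */
        /* ... folio is locked; safe to modify ... */
        unlock_folio(folio);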

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 21 +++++++++++++++------
 mm/filemap.c            | 29 +++++++++++++++--------------
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 83786e7eeb23..c5fe759872b5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -599,7 +599,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 	return true;
 }
 
-extern void __lock_page(struct page *page);
+extern void __lock_folio(struct folio *folio);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
@@ -620,13 +620,24 @@ static inline void unlock_page(struct page *page)
 	return unlock_folio(page_folio(page));
 }
 
+static inline bool trylock_folio(struct folio *folio)
+{
+	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio)));
+}
+
 /*
  * Return true if the page was successfully locked
  */
 static inline int trylock_page(struct page *page)
 {
-	page = compound_head(page);
-	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
+	return trylock_folio(page_folio(page));
+}
+
+static inline void lock_folio(struct folio *folio)
+{
+	might_sleep();
+	if (!trylock_folio(folio))
+		__lock_folio(folio);
 }
 
 /*
@@ -634,9 +645,7 @@ static inline int trylock_page(struct page *page)
  */
 static inline void lock_page(struct page *page)
 {
-	might_sleep();
-	if (!trylock_page(page))
-		__lock_page(page);
+	lock_folio(page_folio(page));
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 8af89ecc1452..50fdc03590b3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1160,7 +1160,7 @@ static void wake_up_page(struct page *page, int bit)
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
-			 * __lock_page() waiting on then setting PG_locked.
+			 * __lock_folio() waiting on then setting PG_locked.
 			 */
 	SHARED,		/* Hold ref to page and check the bit when woken, like
 			 * wait_on_page_writeback() waiting on PG_writeback.
@@ -1523,17 +1523,16 @@ void page_endio(struct page *page, bool is_write, int err)
 EXPORT_SYMBOL_GPL(page_endio);
 
 /**
- * __lock_page - get a lock on the page, assuming we need to sleep to get it
- * @__page: the page to lock
+ * __lock_folio - Get a lock on the folio, assuming we need to sleep to get it.
+ * @folio: The folio to lock
  */
-void __lock_page(struct page *__page)
+void __lock_folio(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
-EXPORT_SYMBOL(__lock_page);
+EXPORT_SYMBOL(__lock_folio);
 
 int __lock_page_killable(struct page *__page)
 {
@@ -1587,10 +1586,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			return 0;
 		}
 	} else {
-		__lock_page(page);
+		__lock_folio(page_folio(page));
 	}
-	return 1;
 
+	return 1;
 }
 
 /**
@@ -2764,7 +2763,9 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 				     struct file **fpin)
 {
-	if (trylock_page(page))
+	struct folio *folio = page_folio(page);
+
+	if (trylock_folio(folio))
 		return 1;
 
 	/*
@@ -2777,7 +2778,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(page)) {
+		if (__lock_page_killable(&folio->page)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
@@ -2789,11 +2790,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 			return 0;
 		}
 	} else
-		__lock_page(page);
+		__lock_folio(folio);
+
 	return 1;
 }
 
-
 /*
  * Synchronous readahead happens when we don't even find a page in the page
  * cache at all.  We don't want to perform IO under the mmap sem, so if we have
-- 
2.29.2


* [PATCH 07/25] mm: Add lock_folio_killable
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 06/25] mm: Add lock_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This is like lock_page_killable() but for use by callers who
know they have a folio.  Convert __lock_page_killable() to be
__lock_folio_killable().  This saves one call to compound_head() per
contended call to lock_page_killable().
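
As with lock_page_killable(), the return value must be checked; a sketch:

        int err = lock_folio_killable(folio);

        if (err)
                return err;     /* -EINTR: killed by a fatal signal */
        /* ... folio is locked ... */
        unlock_folio(folio);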

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 15 ++++++++++-----
 mm/filemap.c            | 17 +++++++++--------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c5fe759872b5..5acebbb75d41 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -600,7 +600,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 }
 
 extern void __lock_folio(struct folio *folio);
-extern int __lock_page_killable(struct page *page);
+extern int __lock_folio_killable(struct folio *folio);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
@@ -648,6 +648,14 @@ static inline void lock_page(struct page *page)
 	lock_folio(page_folio(page));
 }
 
+static inline int lock_folio_killable(struct folio *folio)
+{
+	might_sleep();
+	if (!trylock_folio(folio))
+		return __lock_folio_killable(folio);
+	return 0;
+}
+
 /*
  * lock_page_killable is like lock_page but can be interrupted by fatal
  * signals.  It returns 0 if it locked the page and -EINTR if it was
@@ -655,10 +663,7 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
-	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
-	return 0;
+	return lock_folio_killable(page_folio(page));
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 50fdc03590b3..dd26b50e3676 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1534,14 +1534,13 @@ void __lock_folio(struct folio *folio)
 }
 EXPORT_SYMBOL(__lock_folio);
 
-int __lock_page_killable(struct page *__page)
+int __lock_folio_killable(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, PG_locked, TASK_KILLABLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
-EXPORT_SYMBOL_GPL(__lock_page_killable);
+EXPORT_SYMBOL_GPL(__lock_folio_killable);
 
 int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 {
@@ -1562,6 +1561,8 @@ int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			 unsigned int flags)
 {
+	struct folio *folio = page_folio(page);
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
@@ -1580,13 +1581,13 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	if (flags & FAULT_FLAG_KILLABLE) {
 		int ret;
 
-		ret = __lock_page_killable(page);
+		ret = __lock_folio_killable(folio);
 		if (ret) {
 			mmap_read_unlock(mm);
 			return 0;
 		}
 	} else {
-		__lock_folio(page_folio(page));
+		__lock_folio(folio);
 	}
 
 	return 1;
@@ -2778,7 +2779,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(&folio->page)) {
+		if (__lock_folio_killable(folio)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
-- 
2.29.2


* [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 07/25] mm: Add lock_folio_killable Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

These wrappers are mostly for typesafety, but they also ensure that
the page allocator allocates a compound page.
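
A sketch of allocating a four-page (order-2) folio; the wrapper ORs in
__GFP_COMP so the result is always a compound page:

        struct folio *folio = alloc_folio(GFP_KERNEL, 2);

        if (!folio)
                return -ENOMEM;
        /* folio_nr_pages(folio) == 4; release with put_folio() */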

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/gfp.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 53caa9846854..9e416efb4ff8 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -524,6 +524,12 @@ __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 	return __alloc_pages(gfp_mask, order, nid);
 }
 
+static inline
+struct folio *__alloc_folio_node(int nid, gfp_t gfp, unsigned int order)
+{
+	return (struct folio *)__alloc_pages_node(nid, gfp | __GFP_COMP, order);
+}
+
 /*
  * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
  * prefer the current CPU's closest node. Otherwise node must be valid and
@@ -565,6 +571,11 @@ static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 
+static inline struct folio *alloc_folio(gfp_t gfp, unsigned int order)
+{
+	return (struct folio *)alloc_pages(gfp | __GFP_COMP, order);
+}
+
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
-- 
2.29.2


* [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 10/25] mm/filemap: Convert end_page_writeback to use " Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Most of the users turn it back into a struct page pointer, but
some can make use of it as a folio immediately.
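
Both patterns, sketched; legacy callers convert back with &folio->page,
as the conversions below do:

        struct folio *folio = __page_cache_alloc(gfp, 0);

        if (!folio)
                return -ENOMEM;
        page = &folio->page;    /* for callers that still want a page */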

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/afs/dir.c            |  2 +-
 fs/btrfs/compression.c  |  4 ++--
 fs/cachefiles/rdwr.c    |  6 ++++--
 fs/ceph/addr.c          |  2 +-
 fs/ceph/file.c          |  2 +-
 include/linux/pagemap.h |  8 ++++----
 mm/filemap.c            | 16 ++++++++--------
 mm/readahead.c          |  2 +-
 net/ceph/pagelist.c     |  4 ++--
 net/ceph/pagevec.c      |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9068d5578a26..52e9da468787 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -272,7 +272,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 				afs_stat_v(dvnode, n_inval);
 
 			ret = -ENOMEM;
-			req->pages[i] = __page_cache_alloc(gfp);
+			req->pages[i] = &__page_cache_alloc(gfp, 0)->page;
 			if (!req->pages[i])
 				goto error;
 			ret = add_to_page_cache_lru(req->pages[i],
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5ae3fa0386b7..3309a973b678 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -532,8 +532,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			goto next;
 		}
 
-		page = __page_cache_alloc(mapping_gfp_constraint(mapping,
-								 ~__GFP_FS));
+		page = &__page_cache_alloc(mapping_gfp_constraint(mapping,
+						 ~__GFP_FS), 0)->page;
 		if (!page)
 			break;
 
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 8bda092e60c5..268fbcac4afb 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -260,7 +260,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 			goto backing_page_already_present;
 
 		if (!newpage) {
-			newpage = __page_cache_alloc(cachefiles_gfp);
+			newpage = &__page_cache_alloc(cachefiles_gfp, 0)->page;
 			if (!newpage)
 				goto nomem_monitor;
 		}
@@ -497,7 +497,9 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 				goto backing_page_already_present;
 
 			if (!newpage) {
-				newpage = __page_cache_alloc(cachefiles_gfp);
+				struct folio *folio;
+				folio = __page_cache_alloc(cachefiles_gfp, 0);
+				newpage = &folio->page;
 				if (!newpage)
 					goto nomem;
 			}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 950552944436..5b2873b12904 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1760,7 +1760,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 		if (len > PAGE_SIZE)
 			len = PAGE_SIZE;
 	} else {
-		page = __page_cache_alloc(GFP_NOFS);
+		page = &__page_cache_alloc(GFP_NOFS, 0)->page;
 		if (!page) {
 			err = -ENOMEM;
 			goto out;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 209535d5b8d3..f8e1482ea7c1 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1587,7 +1587,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		struct page *page = NULL;
 		loff_t i_size;
 		if (retry_op == READ_INLINE) {
-			page = __page_cache_alloc(GFP_KERNEL);
+			page = &__page_cache_alloc(GFP_KERNEL, 0)->page;
 			if (!page)
 				return -ENOMEM;
 		}
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5acebbb75d41..317f17e98412 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -282,17 +282,17 @@ static inline void *detach_page_private(struct page *page)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct folio *__page_cache_alloc(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline struct folio *__page_cache_alloc(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	return alloc_folio(gfp, order);
 }
 #endif
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
-	return __page_cache_alloc(mapping_gfp_mask(x));
+	return &__page_cache_alloc(mapping_gfp_mask(x), 0)->page;
 }
 
 static inline gfp_t readahead_gfp_mask(struct address_space *x)
diff --git a/mm/filemap.c b/mm/filemap.c
index dd26b50e3676..6012e8a7bd6c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -960,22 +960,22 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct folio *__page_cache_alloc(gfp_t gfp, unsigned int order)
 {
 	int n;
-	struct page *page;
+	struct folio *folio;
 
 	if (cpuset_do_page_mem_spread()) {
 		unsigned int cpuset_mems_cookie;
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
-		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
+			folio = __alloc_folio_node(n, gfp, order);
+		} while (!folio && read_mems_allowed_retry(cpuset_mems_cookie));
 
-		return page;
+		return folio;
 	}
-	return alloc_pages(gfp, 0);
+	return alloc_folio(gfp, order);
 }
 EXPORT_SYMBOL(__page_cache_alloc);
 #endif
@@ -1801,7 +1801,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		if (fgp_flags & FGP_NOFS)
 			gfp_mask &= ~__GFP_FS;
 
-		page = __page_cache_alloc(gfp_mask);
+		page = &__page_cache_alloc(gfp_mask, 0)->page;
 		if (!page)
 			return NULL;
 
@@ -3192,7 +3192,7 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 repeat:
 	page = find_get_page(mapping, index);
 	if (!page) {
-		page = __page_cache_alloc(gfp);
+		page = &__page_cache_alloc(gfp, 0)->page;
 		if (!page)
 			return ERR_PTR(-ENOMEM);
 		err = add_to_page_cache_lru(page, mapping, index, gfp);
diff --git a/mm/readahead.c b/mm/readahead.c
index c5b0457415be..d7a5424e3d0d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -213,7 +213,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			continue;
 		}
 
-		page = __page_cache_alloc(gfp_mask);
+		page = &__page_cache_alloc(gfp_mask, 0)->page;
 		if (!page)
 			break;
 		if (mapping->a_ops->readpages) {
diff --git a/net/ceph/pagelist.c b/net/ceph/pagelist.c
index 65e34f78b05d..bde78f5eea33 100644
--- a/net/ceph/pagelist.c
+++ b/net/ceph/pagelist.c
@@ -56,7 +56,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	struct page *page;
 
 	if (!pl->num_pages_free) {
-		page = __page_cache_alloc(GFP_NOFS);
+		page = &__page_cache_alloc(GFP_NOFS, 0)->page;
 	} else {
 		page = list_first_entry(&pl->free_list, struct page, lru);
 		list_del(&page->lru);
@@ -107,7 +107,7 @@ int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space)
 	space = (space + PAGE_SIZE - 1) >> PAGE_SHIFT;   /* conv to num pages */
 
 	while (space > pl->num_pages_free) {
-		struct page *page = __page_cache_alloc(GFP_NOFS);
+		struct page *page = &__page_cache_alloc(GFP_NOFS, 0)->page;
 		if (!page)
 			return -ENOMEM;
 		list_add_tail(&page->lru, &pl->free_list);
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 64305e7056a1..8e5f70b8fa10 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -45,7 +45,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < num_pages; i++) {
-		pages[i] = __page_cache_alloc(flags);
+		pages[i] = &__page_cache_alloc(flags, 0)->page;
 		if (pages[i] == NULL) {
 			ceph_release_page_vector(pages, i);
 			return ERR_PTR(-ENOMEM);
-- 
2.29.2


* [PATCH 10/25] mm/filemap: Convert end_page_writeback to use a folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 11/25] mm: Convert mapping_get_entry to return " Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

With my config, this function shrinks from 480 bytes to 240 bytes
due to elimination of repeated calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6012e8a7bd6c..654bba53442a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1148,11 +1148,11 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 	spin_unlock_irqrestore(&q->lock, flags);
 }
 
-static void wake_up_page(struct page *page, int bit)
+static void wake_up_folio(struct folio *folio, int bit)
 {
-	if (!PageWaiters(page))
+	if (!FolioWaiters(folio))
 		return;
-	wake_up_page_bit(page, bit);
+	wake_up_page_bit(&folio->page, bit);
 }
 
 /*
@@ -1466,6 +1466,8 @@ EXPORT_SYMBOL(unlock_folio);
  */
 void end_page_writeback(struct page *page)
 {
+	struct folio *folio = page_folio(page);
+
 	/*
 	 * TestClearPageReclaim could be used here but it is an atomic
 	 * operation and overkill in this particular case. Failing to
@@ -1473,9 +1475,9 @@ void end_page_writeback(struct page *page)
 	 * justify taking an atomic operation penalty at the end of
 	 * ever page writeback.
 	 */
-	if (PageReclaim(page)) {
-		ClearPageReclaim(page);
-		rotate_reclaimable_page(page);
+	if (FolioReclaim(folio)) {
+		ClearFolioReclaim(folio);
+		rotate_reclaimable_page(&folio->page);
 	}
 
 	/*
@@ -1484,13 +1486,13 @@ void end_page_writeback(struct page *page)
 	 * But here we must make sure that the page is not freed and
 	 * reused before the wake_up_page().
 	 */
-	get_page(page);
-	if (!test_clear_page_writeback(page))
+	get_folio(folio);
+	if (!test_clear_page_writeback(&folio->page))
 		BUG();
 
 	smp_mb__after_atomic();
-	wake_up_page(page, PG_writeback);
-	put_page(page);
+	wake_up_folio(folio, PG_writeback);
+	put_folio(folio);
 }
 EXPORT_SYMBOL(end_page_writeback);
 
-- 
2.29.2


* [PATCH 11/25] mm: Convert mapping_get_entry to return a folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 10/25] mm/filemap: Convert end_page_writeback to use " Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 12/25] mm: Add mark_folio_accessed Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

The pagecache only contains folios, so this is the right thing to do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 654bba53442a..b9f25a2d8312 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1673,33 +1673,33 @@ EXPORT_SYMBOL(page_cache_prev_miss);
  * @index: The page cache index.
  *
  * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, the head page is returned with an increased refcount.
+ * page cache page, the folio is returned with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * Return: The head page or shadow entry, %NULL if nothing is found.
+ * Return: The folio or shadow entry, %NULL if nothing is found.
  */
-static struct page *mapping_get_entry(struct address_space *mapping,
+static struct folio *mapping_get_entry(struct address_space *mapping,
 		pgoff_t index)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
-	struct page *page;
+	struct folio *folio;
 
 	rcu_read_lock();
 repeat:
 	xas_reset(&xas);
-	page = xas_load(&xas);
-	if (xas_retry(&xas, page))
+	folio = xas_load(&xas);
+	if (xas_retry(&xas, folio))
 		goto repeat;
 	/*
 	 * A shadow entry of a recently evicted page, or a swap entry from
 	 * shmem/tmpfs.  Return it without attempting to raise page count.
 	 */
-	if (!page || xa_is_value(page))
+	if (!folio || xa_is_value(folio))
 		goto out;
 
-	if (!page_cache_get_speculative(page))
+	if (!page_cache_get_speculative(&folio->page))
 		goto repeat;
 
 	/*
@@ -1707,14 +1707,14 @@ static struct page *mapping_get_entry(struct address_space *mapping,
 	 * This is part of the lockless pagecache protocol. See
 	 * include/linux/pagemap.h for details.
 	 */
-	if (unlikely(page != xas_reload(&xas))) {
-		put_page(page);
+	if (unlikely(folio != xas_reload(&xas))) {
+		put_folio(folio);
 		goto repeat;
 	}
 out:
 	rcu_read_unlock();
 
-	return page;
+	return folio;
 }
 
 /**
@@ -1757,7 +1757,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 	struct page *page;
 
 repeat:
-	page = mapping_get_entry(mapping, index);
+	page = &mapping_get_entry(mapping, index)->page;
 	if (xa_is_value(page)) {
 		if (fgp_flags & FGP_ENTRY)
 			return page;
-- 
2.29.2


* [PATCH 12/25] mm: Add mark_folio_accessed
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 11/25] mm: Convert mapping_get_entry to return " Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This function already operated on the entire compound page, but now we
can avoid calling compound_head() quite so many times.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/swap.h |  8 ++++++--
 mm/swap.c            | 28 +++++++++++++---------------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5bba15ac5a2e..c097bc9cedd9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -338,7 +338,7 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
 			  unsigned int nr_pages);
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
-extern void mark_page_accessed(struct page *);
+void mark_folio_accessed(struct folio *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
@@ -348,10 +348,14 @@ extern void deactivate_file_page(struct page *page);
 extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
-
 extern void lru_cache_add_inactive_or_unevictable(struct page *page,
 						struct vm_area_struct *vma);
 
+static inline void mark_page_accessed(struct page *page)
+{
+	mark_folio_accessed(page_folio(page));
+}
+
 /* linux/mm/vmscan.c */
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
diff --git a/mm/swap.c b/mm/swap.c
index 490553f3f9ef..c3638a13987f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -411,36 +411,34 @@ static void __lru_cache_activate_page(struct page *page)
  * When a newly allocated page is not yet visible, so safe for non-atomic ops,
  * __SetPageReferenced(page) may be substituted for mark_page_accessed(page).
  */
-void mark_page_accessed(struct page *page)
+void mark_folio_accessed(struct folio *folio)
 {
-	page = compound_head(page);
-
-	if (!PageReferenced(page)) {
-		SetPageReferenced(page);
-	} else if (PageUnevictable(page)) {
+	if (!FolioReferenced(folio)) {
+		SetFolioReferenced(folio);
+	} else if (FolioUnevictable(folio)) {
 		/*
 		 * Unevictable pages are on the "LRU_UNEVICTABLE" list. But,
 		 * this list is never rotated or maintained, so marking an
 		 * evictable page accessed has no effect.
 		 */
-	} else if (!PageActive(page)) {
+	} else if (!FolioActive(folio)) {
 		/*
 		 * If the page is on the LRU, queue it for activation via
 		 * lru_pvecs.activate_page. Otherwise, assume the page is on a
 		 * pagevec, mark it active and it'll be moved to the active
 		 * LRU on the next drain.
 		 */
-		if (PageLRU(page))
-			activate_page(page);
+		if (FolioLRU(folio))
+			activate_page(&folio->page);
 		else
-			__lru_cache_activate_page(page);
-		ClearPageReferenced(page);
-		workingset_activation(page);
+			__lru_cache_activate_page(&folio->page);
+		ClearFolioReferenced(folio);
+		workingset_activation(&folio->page);
 	}
-	if (page_is_idle(page))
-		clear_page_idle(page);
+	if (page_is_idle(&folio->page))
+		clear_page_idle(&folio->page);
 }
-EXPORT_SYMBOL(mark_page_accessed);
+EXPORT_SYMBOL(mark_folio_accessed);
 
 /**
  * lru_cache_add - add a page to a page list
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
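
The point of the wrapper above is that the compound_head() lookup now
happens exactly once, inside mark_page_accessed(), instead of once per
flag operation.  A self-contained toy model of that hoisting (a counter
instead of real page flags; none of this is kernel code):

        #include <stdio.h>

        struct page {
                struct page *head;      /* NULL unless this is a tail page */
                unsigned long flags;
        };
        struct folio { struct page page; };

        static int compound_head_calls;

        static struct page *compound_head(struct page *page)
        {
                compound_head_calls++;
                return page->head ? page->head : page;
        }

        static struct folio *page_folio(struct page *page)
        {
                return (struct folio *)compound_head(page);
        }

        /* Folio helpers trust their argument; no head lookup needed. */
        static void SetFolioReferenced(struct folio *folio)
        {
                folio->page.flags |= 1;
        }

        static void mark_folio_accessed(struct folio *folio)
        {
                SetFolioReferenced(folio);
                /* ... more flag tests here, still zero head lookups ... */
        }

        int main(void)
        {
                struct page head = { NULL, 0 };
                struct page tail = { &head, 0 };

                mark_folio_accessed(page_folio(&tail));
                printf("compound_head() called %d time(s)\n",
                       compound_head_calls);    /* prints 1 */
                return 0;
        }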

* [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 12/25] mm: Add mark_folio_accessed Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Add filemap_get_folio() and turn pagecache_get_page() into a wrapper
around it.  Also add find_get_folio() as the folio equivalent of
find_get_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  21 +++++-
 mm/filemap.c            | 141 +++++++++++++++++++++-------------------
 2 files changed, 94 insertions(+), 68 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 317f17e98412..2c2974970467 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -317,8 +317,10 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_HEAD		0x00000080
 #define FGP_ENTRY		0x00000100
 
-struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
-		int fgp_flags, gfp_t cache_gfp_mask);
+struct folio *filemap_get_folio(struct address_space *mapping, pgoff_t index,
+		int fgp_flags, gfp_t gfp);
+struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
+		int fgp_flags, gfp_t gfp);
 
 /**
  * find_get_page - find and get a page reference
@@ -336,6 +338,12 @@ static inline struct page *find_get_page(struct address_space *mapping,
 	return pagecache_get_page(mapping, offset, 0, 0);
 }
 
+static inline struct folio *find_get_folio(struct address_space *mapping,
+					pgoff_t index)
+{
+	return filemap_get_folio(mapping, index, 0, 0);
+}
+
 static inline struct page *find_get_page_flags(struct address_space *mapping,
 					pgoff_t offset, int fgp_flags)
 {
@@ -451,6 +459,15 @@ static inline struct page *folio_page(struct folio *folio, pgoff_t index)
 	return &folio->page + index;
 }
 
+/* Does this folio contain this index? */
+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+	/* HugeTLBfs indexes the page cache in units of hpage_size */
+	if (PageHuge(&folio->page))
+		return folio->page.index == index;
+	return index - folio_index(folio) < folio_nr_pages(folio);
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index b9f25a2d8312..7ed9e3dcefc8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1717,94 +1717,58 @@ static struct folio *mapping_get_entry(struct address_space *mapping,
 	return folio;
 }
 
-/**
- * pagecache_get_page - Find and get a reference to a page.
- * @mapping: The address_space to search.
- * @index: The page index.
- * @fgp_flags: %FGP flags modify how the page is returned.
- * @gfp_mask: Memory allocation flags to use if %FGP_CREAT is specified.
- *
- * Looks up the page cache entry at @mapping & @index.
- *
- * @fgp_flags can be zero or more of these flags:
- *
- * * %FGP_ACCESSED - The page will be marked accessed.
- * * %FGP_LOCK - The page is returned locked.
- * * %FGP_HEAD - If the page is present and a THP, return the head page
- *   rather than the exact page specified by the index.
- * * %FGP_ENTRY - If there is a shadow / swap / DAX entry, return it
- *   instead of allocating a new page to replace it.
- * * %FGP_CREAT - If no page is present then a new page is allocated using
- *   @gfp_mask and added to the page cache and the VM's LRU list.
- *   The page is returned locked and with an increased refcount.
- * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
- *   page is already in cache.  If the page was allocated, unlock it before
- *   returning so the caller can do the same dance.
- * * %FGP_WRITE - The page will be written
- * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask
- * * %FGP_NOWAIT - Don't get blocked by page lock
- *
- * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
- * if the %GFP flags specified for %FGP_CREAT are atomic.
- *
- * If there is a page cache page, it is returned with an increased refcount.
- *
- * Return: The found page or %NULL otherwise.
- */
-struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
-		int fgp_flags, gfp_t gfp_mask)
+struct folio *filemap_get_folio(struct address_space *mapping, pgoff_t index,
+		int fgp_flags, gfp_t gfp)
 {
-	struct page *page;
+	struct folio *folio;
 
 repeat:
-	page = &mapping_get_entry(mapping, index)->page;
-	if (xa_is_value(page)) {
+	folio = mapping_get_entry(mapping, index);
+	if (xa_is_value(folio)) {
 		if (fgp_flags & FGP_ENTRY)
-			return page;
-		page = NULL;
+			return folio;
+		folio = NULL;
 	}
-	if (!page)
+	if (!folio)
 		goto no_page;
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {
-			if (!trylock_page(page)) {
-				put_page(page);
+			if (!trylock_folio(folio)) {
+				put_folio(folio);
 				return NULL;
 			}
 		} else {
-			lock_page(page);
+			lock_folio(folio);
 		}
 
 		/* Has the page been truncated? */
-		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
-			put_page(page);
+		if (unlikely(folio->page.mapping != mapping)) {
+			unlock_folio(folio);
+			put_folio(folio);
 			goto repeat;
 		}
-		VM_BUG_ON_PAGE(!thp_contains(page, index), page);
+		VM_BUG_ON_PAGE(!folio_contains(folio, index), &folio->page);
 	}
 
 	if (fgp_flags & FGP_ACCESSED)
-		mark_page_accessed(page);
+		mark_folio_accessed(folio);
 	else if (fgp_flags & FGP_WRITE) {
 		/* Clear idle flag for buffer write */
-		if (page_is_idle(page))
-			clear_page_idle(page);
+		if (page_is_idle(&folio->page))
+			clear_page_idle(&folio->page);
 	}
-	if (!(fgp_flags & FGP_HEAD))
-		page = find_subpage(page, index);
 
 no_page:
-	if (!page && (fgp_flags & FGP_CREAT)) {
+	if (!folio && (fgp_flags & FGP_CREAT)) {
 		int err;
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
-			gfp_mask |= __GFP_WRITE;
+			gfp |= __GFP_WRITE;
 		if (fgp_flags & FGP_NOFS)
-			gfp_mask &= ~__GFP_FS;
+			gfp &= ~__GFP_FS;
 
-		page = &__page_cache_alloc(gfp_mask, 0)->page;
-		if (!page)
+		folio = __page_cache_alloc(gfp, 0);
+		if (!folio)
 			return NULL;
 
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
@@ -1812,12 +1776,12 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 
 		/* Init accessed so avoid atomic mark_page_accessed later */
 		if (fgp_flags & FGP_ACCESSED)
-			__SetPageReferenced(page);
+			__SetFolioReferenced(folio);
 
-		err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
+		err = add_to_page_cache_lru(&folio->page, mapping, index, gfp);
 		if (unlikely(err)) {
-			put_page(page);
-			page = NULL;
+			put_folio(folio);
+			folio = NULL;
 			if (err == -EEXIST)
 				goto repeat;
 		}
@@ -1826,11 +1790,56 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		 * add_to_page_cache_lru locks the page, and for mmap we expect
 		 * an unlocked page.
 		 */
-		if (page && (fgp_flags & FGP_FOR_MMAP))
-			unlock_page(page);
+		if (folio && (fgp_flags & FGP_FOR_MMAP))
+			unlock_folio(folio);
 	}
 
-	return page;
+	return folio;
+}
+EXPORT_SYMBOL(filemap_get_folio);
+
+/**
+ * pagecache_get_page - Find and get a reference to a page.
+ * @mapping: The address_space to search.
+ * @index: The page index.
+ * @fgp_flags: %FGP flags modify how the page is returned.
+ * @gfp: Memory allocation flags to use if %FGP_CREAT is specified.
+ *
+ * Looks up the page cache entry at @mapping & @index.
+ *
+ * @fgp_flags can be zero or more of these flags:
+ *
+ * * %FGP_ACCESSED - The page will be marked accessed.
+ * * %FGP_LOCK - The page is returned locked.
+ * * %FGP_HEAD - If the page is present and a THP, return the head page
+ *   rather than the exact page specified by the index.
+ * * %FGP_ENTRY - If there is a shadow / swap / DAX entry, return it
+ *   instead of allocating a new page to replace it.
+ * * %FGP_CREAT - If no page is present then a new page is allocated using
+ *   @gfp and added to the page cache and the VM's LRU list.
+ *   The page is returned locked and with an increased refcount.
+ * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
+ *   page is already in cache.  If the page was allocated, unlock it before
+ *   returning so the caller can do the same dance.
+ * * %FGP_WRITE - The page will be written
+ * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask
+ * * %FGP_NOWAIT - Don't get blocked by page lock
+ *
+ * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
+ * if the %GFP flags specified for %FGP_CREAT are atomic.
+ *
+ * If there is a page cache page, it is returned with an increased refcount.
+ *
+ * Return: The found page or %NULL otherwise.
+ */
+struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
+		int fgp_flags, gfp_t gfp)
+{
+	struct folio *folio = filemap_get_folio(mapping, index, fgp_flags, gfp);
+
+	if ((fgp_flags & FGP_HEAD) || !folio || xa_is_value(folio))
+		return &folio->page;
+	return folio_page(folio, index);
 }
 EXPORT_SYMBOL(pagecache_get_page);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
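
As a usage sketch (untested; myfs_touch_folio() is a made-up caller):
the folio comes back locked with a reference held, and it may span
more than one page, which is what folio_contains() above is for.

        static int myfs_touch_folio(struct address_space *mapping,
                        pgoff_t index)
        {
                struct folio *folio;

                folio = filemap_get_folio(mapping, index,
                                FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
                                mapping_gfp_mask(mapping));
                if (!folio)
                        return -ENOMEM;

                /* The folio covers @index, but may start before it. */
                VM_BUG_ON(!folio_contains(folio, index));

                unlock_folio(folio);
                put_folio(folio);
                return 0;
        }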

* [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Pages being added to the page cache should already be folios, so turn
add_to_page_cache_lru() into a wrapper around the new
folio_add_to_page_cache().  Saves hundreds of bytes of text.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 13 +++++++--
 mm/filemap.c            | 62 ++++++++++++++++++++---------------------
 2 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2c2974970467..88a66b65d1ed 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -811,9 +811,9 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
 }
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
-				pgoff_t index, gfp_t gfp_mask);
-int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
-				pgoff_t index, gfp_t gfp_mask);
+				pgoff_t index, gfp_t gfp);
+int folio_add_to_page_cache(struct folio *folio, struct address_space *mapping,
+				pgoff_t index, gfp_t gfp);
 extern void delete_from_page_cache(struct page *page);
 extern void __delete_from_page_cache(struct page *page, void *shadow);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
@@ -838,6 +838,13 @@ static inline int add_to_page_cache(struct page *page,
 	return error;
 }
 
+static inline int add_to_page_cache_lru(struct page *page,
+		struct address_space *mapping, pgoff_t index, gfp_t gfp)
+{
+	return folio_add_to_page_cache((struct folio *)page, mapping,
+			index, gfp);
+}
+
 /**
  * struct readahead_control - Describes a readahead request.
  *
diff --git a/mm/filemap.c b/mm/filemap.c
index 7ed9e3dcefc8..6fc896a38ef7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -828,25 +828,25 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
 
-noinline int __add_to_page_cache_locked(struct page *page,
+static noinline int __add_to_page_cache_locked(struct folio *folio,
 					struct address_space *mapping,
-					pgoff_t offset, gfp_t gfp,
+					pgoff_t index, gfp_t gfp,
 					void **shadowp)
 {
-	XA_STATE(xas, &mapping->i_pages, offset);
-	int huge = PageHuge(page);
+	XA_STATE(xas, &mapping->i_pages, index);
+	int huge = FolioHuge(folio);
 	int error;
 
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
+	VM_BUG_ON_PAGE(!FolioLocked(folio), &folio->page);
+	VM_BUG_ON_PAGE(FolioSwapBacked(folio), &folio->page);
 	mapping_set_update(&xas, mapping);
 
-	get_page(page);
-	page->mapping = mapping;
-	page->index = offset;
+	get_folio(folio);
+	folio->page.mapping = mapping;
+	folio->page.index = index;
 
-	if (!huge && !page_is_secretmem(page)) {
-		error = mem_cgroup_charge(page, current->mm, gfp);
+	if (!huge && !page_is_secretmem(&folio->page)) {
+		error = mem_cgroup_charge(&folio->page, current->mm, gfp);
 		if (error)
 			goto error;
 	}
@@ -857,7 +857,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
 		void *entry, *old = NULL;
 
-		if (order > thp_order(page))
+		if (order > folio_order(folio))
 			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
 					order, gfp);
 		xas_lock_irq(&xas);
@@ -874,13 +874,13 @@ noinline int __add_to_page_cache_locked(struct page *page,
 				*shadowp = old;
 			/* entry may have been split before we acquired lock */
 			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > thp_order(page)) {
+			if (order > folio_order(folio)) {
 				xas_split(&xas, old, order);
 				xas_reset(&xas);
 			}
 		}
 
-		xas_store(&xas, page);
+		xas_store(&xas, folio);
 		if (xas_error(&xas))
 			goto unlock;
 
@@ -890,7 +890,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
 
 		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge)
-			__inc_lruvec_page_state(page, NR_FILE_PAGES);
+			__inc_lruvec_page_state(&folio->page, NR_FILE_PAGES);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp));
@@ -900,12 +900,12 @@ noinline int __add_to_page_cache_locked(struct page *page,
 		goto error;
 	}
 
-	trace_mm_filemap_add_to_page_cache(page);
+	trace_mm_filemap_add_to_page_cache(&folio->page);
 	return 0;
 error:
-	page->mapping = NULL;
+	folio->page.mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	put_page(page);
+	put_folio(folio);
 	return error;
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
@@ -925,22 +925,22 @@ ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
-	return __add_to_page_cache_locked(page, mapping, offset,
+	return __add_to_page_cache_locked(page_folio(page), mapping, offset,
 					  gfp_mask, NULL);
 }
 EXPORT_SYMBOL(add_to_page_cache_locked);
 
-int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
-				pgoff_t offset, gfp_t gfp_mask)
+int folio_add_to_page_cache(struct folio *folio, struct address_space *mapping,
+				pgoff_t index, gfp_t gfp_mask)
 {
 	void *shadow = NULL;
 	int ret;
 
-	__SetPageLocked(page);
-	ret = __add_to_page_cache_locked(page, mapping, offset,
+	__SetFolioLocked(folio);
+	ret = __add_to_page_cache_locked(folio, mapping, index,
 					 gfp_mask, &shadow);
 	if (unlikely(ret))
-		__ClearPageLocked(page);
+		__ClearFolioLocked(folio);
 	else {
 		/*
 		 * The page might have been evicted from cache only
@@ -950,14 +950,14 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		 * data from the working set, only to cache data that will
 		 * get overwritten with something else, is a waste of memory.
 		 */
-		WARN_ON_ONCE(PageActive(page));
+		WARN_ON_ONCE(FolioActive(folio));
 		if (!(gfp_mask & __GFP_WRITE) && shadow)
-			workingset_refault(page, shadow);
-		lru_cache_add(page);
+			workingset_refault(&folio->page, shadow);
+		lru_cache_add(&folio->page);
 	}
 	return ret;
 }
-EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
+EXPORT_SYMBOL_GPL(folio_add_to_page_cache);
 
 #ifdef CONFIG_NUMA
 struct folio *__page_cache_alloc(gfp_t gfp, unsigned int order)
@@ -1778,7 +1778,7 @@ struct folio *filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (fgp_flags & FGP_ACCESSED)
 			__SetFolioReferenced(folio);
 
-		err = add_to_page_cache_lru(&folio->page, mapping, index, gfp);
+		err = folio_add_to_page_cache(folio, mapping, index, gfp);
 		if (unlikely(err)) {
 			put_folio(folio);
 			folio = NULL;
@@ -1787,8 +1787,8 @@ struct folio *filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		}
 
 		/*
-		 * add_to_page_cache_lru locks the page, and for mmap we expect
-		 * an unlocked page.
+		 * folio_add_to_page_cache locks the page, and for mmap we
+		 * expect an unlocked page.
 		 */
 		if (folio && (fgp_flags & FGP_FOR_MMAP))
 			unlock_folio(folio);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
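
For filesystems that allocate their own memory, the calling convention
looks like this (untested sketch; myfs_add_folio() is hypothetical):

        static struct folio *myfs_add_folio(struct address_space *mapping,
                        pgoff_t index, gfp_t gfp)
        {
                struct folio *folio = __page_cache_alloc(gfp, 0);
                int err;

                if (!folio)
                        return NULL;

                /* Locks the folio and adds it to the LRU on success. */
                err = folio_add_to_page_cache(folio, mapping, index, gfp);
                if (err) {
                        put_folio(folio);
                        return NULL;
                }
                /* Locked, in the page cache, and on the LRU. */
                return folio;
        }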

* [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 16/25] mm: Add folio_mapping Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Move the declaration into mm/internal.h and rename the function to
rotate_reclaimable_folio().  This eliminates all five of the calls to
compound_head() in this function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/swap.h |  1 -
 mm/filemap.c         |  2 +-
 mm/internal.h        |  1 +
 mm/page_io.c         |  4 ++--
 mm/swap.c            | 12 ++++++------
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c097bc9cedd9..02ce65c29569 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -343,7 +343,6 @@ extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
 extern void lru_add_drain_all(void);
-extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
 extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6fc896a38ef7..f3722ca8f7d4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1477,7 +1477,7 @@ void end_page_writeback(struct page *page)
 	 */
 	if (FolioReclaim(folio)) {
 		ClearFolioReclaim(folio);
-		rotate_reclaimable_page(&folio->page);
+		rotate_reclaimable_folio(folio);
 	}
 
 	/*
diff --git a/mm/internal.h b/mm/internal.h
index 8e9c660f33ca..f089535b5d86 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -35,6 +35,7 @@
 void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
+void rotate_reclaimable_folio(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
diff --git a/mm/page_io.c b/mm/page_io.c
index 9bca17ecc4df..1fc0a579da58 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -57,7 +57,7 @@ void end_swap_bio_write(struct bio *bio)
 		 * Also print a dire warning that things will go BAD (tm)
 		 * very quickly.
 		 *
-		 * Also clear PG_reclaim to avoid rotate_reclaimable_page()
+		 * Also clear PG_reclaim to avoid rotate_reclaimable_folio()
 		 */
 		set_page_dirty(page);
 		pr_alert("Write-error on swap-device (%u:%u:%llu)\n",
@@ -341,7 +341,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 			 * temporary failure if the system has limited
 			 * memory for allocating transmit buffers.
 			 * Mark the page dirty and avoid
-			 * rotate_reclaimable_page but rate-limit the
+			 * rotate_reclaimable_folio but rate-limit the
 			 * messages but do not flag PageError like
 			 * the normal direct-to-bio case as it could
 			 * be temporary.
diff --git a/mm/swap.c b/mm/swap.c
index c3638a13987f..43e4c507ad0f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -241,19 +241,19 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
  * reclaim.  If it still appears to be reclaimable, move it to the tail of the
  * inactive list.
  *
- * rotate_reclaimable_page() must disable IRQs, to prevent nasty races.
+ * rotate_reclaimable_folio() must disable IRQs, to prevent nasty races.
  */
-void rotate_reclaimable_page(struct page *page)
+void rotate_reclaimable_folio(struct folio *folio)
 {
-	if (!PageLocked(page) && !PageDirty(page) &&
-	    !PageUnevictable(page) && PageLRU(page)) {
+	if (!FolioLocked(folio) && !FolioDirty(folio) &&
+	    !FolioUnevictable(folio) && FolioLRU(folio)) {
 		struct pagevec *pvec;
 		unsigned long flags;
 
-		get_page(page);
+		get_folio(folio);
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate.pvec);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (!pagevec_add(pvec, &folio->page) || FolioHead(folio))
 			pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
 		local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
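
One line above deserves a note: PageCompound(page) became
FolioHead(folio).  For a folio that is an equivalence, not a behaviour
change: a folio is never a tail page, so "compound" and "head"
coincide.  A toy check of that reasoning:

        #include <assert.h>
        #include <stdbool.h>

        /* Toy page kinds; a folio can only ever be BASE or HEAD. */
        enum kind { BASE, HEAD, TAIL };

        static bool page_compound(enum kind k) { return k != BASE; }
        static bool page_head(enum kind k)     { return k == HEAD; }

        int main(void)
        {
                enum kind folio_kinds[] = { BASE, HEAD };
                int i;

                /* On the folio subset, the two predicates always agree. */
                for (i = 0; i < 2; i++)
                        assert(page_compound(folio_kinds[i]) ==
                               page_head(folio_kinds[i]));
                return 0;
        }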

* [PATCH 16/25] mm: Add folio_mapping
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This is the folio equivalent of page_mapping().  Adjust
page_file_mapping() and page_mapping_file() to use folios internally.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 16 ++++++++++------
 mm/swapfile.c      |  6 +++---
 mm/util.c          | 20 ++++++++++----------
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 02ccb7a09190..8bc28b4aa933 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1559,17 +1559,22 @@ void page_address_init(void);
 
 extern void *page_rmapping(struct page *page);
 extern struct anon_vma *page_anon_vma(struct page *page);
-extern struct address_space *page_mapping(struct page *page);
+struct address_space *folio_mapping(struct folio *);
+struct address_space *__folio_file_mapping(struct folio *);
 
-extern struct address_space *__page_file_mapping(struct page *);
+static inline struct address_space *page_mapping(struct page *page)
+{
+	return folio_mapping(page_folio(page));
+}
 
 static inline
 struct address_space *page_file_mapping(struct page *page)
 {
-	if (unlikely(PageSwapCache(page)))
-		return __page_file_mapping(page);
+	struct folio *folio = page_folio(page);
+	if (unlikely(FolioSwapCache(folio)))
+		return __folio_file_mapping(folio);
 
-	return page->mapping;
+	return folio->page.mapping;
 }
 
 extern pgoff_t __page_file_index(struct page *page);
@@ -1586,7 +1591,6 @@ static inline pgoff_t page_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
-struct address_space *page_mapping(struct page *page);
 struct address_space *page_mapping_file(struct page *page);
 
 /*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1c0a829f7311..9bf2f8daaa79 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3550,11 +3550,11 @@ struct swap_info_struct *page_swap_info(struct page *page)
 /*
  * out-of-line __page_file_ methods to avoid include hell.
  */
-struct address_space *__page_file_mapping(struct page *page)
+struct address_space *__folio_file_mapping(struct folio *folio)
 {
-	return page_swap_info(page)->swap_file->f_mapping;
+	return page_swap_info(&folio->page)->swap_file->f_mapping;
 }
-EXPORT_SYMBOL_GPL(__page_file_mapping);
+EXPORT_SYMBOL_GPL(__folio_file_mapping);
 
 pgoff_t __page_file_index(struct page *page)
 {
diff --git a/mm/util.c b/mm/util.c
index 8c9b7d1e7c49..7e9fc89c883a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -686,39 +686,39 @@ struct anon_vma *page_anon_vma(struct page *page)
 	return __page_rmapping(page);
 }
 
-struct address_space *page_mapping(struct page *page)
+struct address_space *folio_mapping(struct folio *folio)
 {
 	struct address_space *mapping;
 
-	page = compound_head(page);
-
 	/* This happens if someone calls flush_dcache_page on slab page */
-	if (unlikely(PageSlab(page)))
+	if (unlikely(FolioSlab(folio)))
 		return NULL;
 
-	if (unlikely(PageSwapCache(page))) {
+	if (unlikely(FolioSwapCache(folio))) {
 		swp_entry_t entry;
 
-		entry.val = page_private(page);
+		entry.val = page_private(&folio->page);
 		return swap_address_space(entry);
 	}
 
-	mapping = page->mapping;
+	mapping = folio->page.mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
 
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
-EXPORT_SYMBOL(page_mapping);
+EXPORT_SYMBOL(folio_mapping);
 
 /*
  * For file cache pages, return the address_space, otherwise return NULL
  */
 struct address_space *page_mapping_file(struct page *page)
 {
-	if (unlikely(PageSwapCache(page)))
+	struct folio *folio = page_folio(page);
+
+	if (unlikely(FolioSwapCache(folio)))
 		return NULL;
-	return page_mapping(page);
+	return folio_mapping(folio);
 }
 
 /* Slow path of page_mapcount() for compound pages */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
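
The tail end of folio_mapping() decodes a tagged pointer: anonymous
mappings keep flag bits in the low bits of the mapping field.  That
part is easy to model in isolation (userspace sketch; the constants
mirror the kernel's PAGE_MAPPING_* values):

        #include <assert.h>
        #include <stdio.h>

        #define PAGE_MAPPING_ANON       0x1UL
        #define PAGE_MAPPING_MOVABLE    0x2UL
        #define PAGE_MAPPING_FLAGS      (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

        struct address_space { int dummy; };

        /* Anon mappings return NULL; otherwise strip the tag bits. */
        static struct address_space *decode_mapping(unsigned long raw)
        {
                if (raw & PAGE_MAPPING_ANON)
                        return NULL;
                return (struct address_space *)(raw & ~PAGE_MAPPING_FLAGS);
        }

        int main(void)
        {
                static struct address_space file_mapping;
                unsigned long file = (unsigned long)&file_mapping;
                unsigned long anon = file | PAGE_MAPPING_ANON;

                assert(decode_mapping(file) == &file_mapping);
                assert(decode_mapping(anon) == NULL);
                puts("tag bits decode as expected");
                return 0;
        }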

* [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 16/25] mm: Add folio_mapping Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

The THP name was confusing everyone.  Switch to the new folio
terminology.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/inode.c              |  4 ++--
 include/linux/fs.h      |  2 +-
 include/linux/pagemap.h | 14 +++++++-------
 mm/shmem.c              |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index cb008acf0efd..2c79282803e7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -180,8 +180,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	mapping->a_ops = &empty_aops;
 	mapping->host = inode;
 	mapping->flags = 0;
-	if (sb->s_type->fs_flags & FS_THP_SUPPORT)
-		__set_bit(AS_THP_SUPPORT, &mapping->flags);
+	if (sb->s_type->fs_flags & FS_MULTI_PAGE_FOLIOS)
+		__set_bit(AS_MULTI_PAGE_FOLIOS, &mapping->flags);
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ad4cf1bae586..08f9a8a524f2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2231,7 +2231,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
-#define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
+#define FS_MULTI_PAGE_FOLIOS	8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 88a66b65d1ed..630a0a589073 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -29,7 +29,7 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_THP_SUPPORT = 6,	/* THPs supported */
+	AS_MULTI_PAGE_FOLIOS = 6,
 };
 
 /**
@@ -121,9 +121,9 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
-static inline bool mapping_thp_support(struct address_space *mapping)
+static inline bool mapping_multi_page_folios(struct address_space *mapping)
 {
-	return test_bit(AS_THP_SUPPORT, &mapping->flags);
+	return test_bit(AS_MULTI_PAGE_FOLIOS, &mapping->flags);
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
@@ -138,20 +138,20 @@ static inline int filemap_nr_thps(struct address_space *mapping)
 static inline void filemap_nr_thps_inc(struct address_space *mapping)
 {
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_thp_support(mapping))
+	if (!mapping_multi_page_folios(mapping))
 		atomic_inc(&mapping->nr_thps);
 #else
-	WARN_ON_ONCE(1);
+	WARN_ON_ONCE(!mapping_multi_page_folios(mapping));
 #endif
 }
 
 static inline void filemap_nr_thps_dec(struct address_space *mapping)
 {
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_thp_support(mapping))
+	if (!mapping_multi_page_folios(mapping))
 		atomic_dec(&mapping->nr_thps);
 #else
-	WARN_ON_ONCE(1);
+	WARN_ON_ONCE(!mapping_multi_page_folios(mapping));
 #endif
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 53d84d2c9fe5..192b7b5a7852 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3910,7 +3910,7 @@ static struct file_system_type shmem_fs_type = {
 	.parameters	= shmem_fs_parameters,
 #endif
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT | FS_THP_SUPPORT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_MULTI_PAGE_FOLIOS,
 };
 
 int __init shmem_init(void)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
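
For filesystem authors, opting in is a one-flag change in the
file_system_type, exactly as shmem does above.  A hypothetical example
(myfs, its init_fs_context and the FS_REQUIRES_DEV pairing are made up):

        static struct file_system_type myfs_fs_type = {
                .owner          = THIS_MODULE,
                .name           = "myfs",
                .init_fs_context = myfs_init_fs_context,
                .kill_sb        = kill_block_super,
                .fs_flags       = FS_REQUIRES_DEV | FS_MULTI_PAGE_FOLIOS,
        };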

* [PATCH 18/25] btrfs: Use readahead_batch_length
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-17  9:15   ` John Hubbard
  2020-12-16 18:23 ` [PATCH 19/25] fs: Change page refcount rules for readahead Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Implement readahead_batch_length() to determine the number of bytes in
the current batch of readahead pages and use it in btrfs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/btrfs/extent_io.c    | 6 ++----
 include/linux/pagemap.h | 9 +++++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6e3b72e63e42..42936a83a91b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4436,10 +4436,8 @@ void extent_readahead(struct readahead_control *rac)
 	int nr;
 
 	while ((nr = readahead_page_batch(rac, pagepool))) {
-		u64 contig_start = page_offset(pagepool[0]);
-		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
-
-		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
+		u64 contig_start = readahead_pos(rac);
+		u64 contig_end = contig_start + readahead_batch_length(rac) - 1;
 
 		contiguous_readpages(pagepool, nr, contig_start, contig_end,
 				&em_cached, &bio, &bio_flags, &prev_em_start);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 630a0a589073..81ff21289722 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1048,6 +1048,15 @@ static inline unsigned int readahead_count(struct readahead_control *rac)
 	return rac->_nr_pages;
 }
 
+/**
+ * readahead_batch_length - The number of bytes in the current batch.
+ * @rac: The readahead request.
+ */
+static inline loff_t readahead_batch_length(struct readahead_control *rac)
+{
+	return rac->_batch_count * PAGE_SIZE;
+}
+
 static inline unsigned long dir_pages(struct inode *inode)
 {
 	return (unsigned long)(inode->i_size + PAGE_SIZE - 1) >>
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
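
Concretely: with 4KiB pages, a sixteen-page batch makes
readahead_batch_length() return 65536 bytes, and btrfs works with
inclusive end offsets, hence the "- 1" when computing contig_end.
A trivial userspace check of the arithmetic:

        #include <assert.h>

        #define PAGE_SIZE 4096UL

        int main(void)
        {
                unsigned long batch = 16;               /* rac->_batch_count */
                unsigned long start = 128 * PAGE_SIZE;  /* readahead_pos()   */
                unsigned long len = batch * PAGE_SIZE;  /* batch_length()    */
                unsigned long end = start + len - 1;    /* inclusive         */

                assert(len == 65536);
                /* The inclusive end lands on the last byte of a page. */
                assert((end + 1) % PAGE_SIZE == 0);
                return 0;
        }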

* [PATCH 19/25] fs: Change page refcount rules for readahead
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 20/25] fs: Change readpage to take a folio Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This brings ->readahead into line with ->readpage for the refcount on
struct page.  It simplifies the various filesystems which implement
readahead and will reduce the number of atomic operations on the page
refcount in the future.  This change is combined with the conversion of
readahead to struct folio so that unconverted filesystems fail to
compile.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/porting.rst |  8 ++++
 Documentation/filesystems/vfs.rst     | 17 ++++-----
 fs/btrfs/extent_io.c                  | 13 +++----
 fs/erofs/data.c                       |  9 ++---
 fs/erofs/zdata.c                      |  5 ++-
 fs/ext4/readpage.c                    | 11 ++----
 fs/f2fs/data.c                        |  9 +----
 fs/fuse/file.c                        |  4 +-
 fs/iomap/buffered-io.c                |  4 +-
 fs/mpage.c                            |  3 +-
 include/linux/pagemap.h               | 55 +++++++++++++--------------
 mm/readahead.c                        | 18 ++++-----
 12 files changed, 72 insertions(+), 84 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 867036aa90b8..0580f69a5e8f 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -865,3 +865,11 @@ no matter what.  Everything is handled by the caller.
 
 clone_private_mount() returns a longterm mount now, so the proper destructor of
 its result is kern_unmount() or kern_unmount_array().
+
+---
+
+**mandatory**
+
+->readahead() has changed the reference count on struct page so that
+the filesystem *does not* drop a reference.  This is in line with how
+->readpage works but different from how ->readpages used to work.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb5..5ac42b93225c 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -784,15 +784,14 @@ cache in your filesystem.  The following members are defined:
 
 ``readahead``
 	Called by the VM to read pages associated with the address_space
-	object.  The pages are consecutive in the page cache and are
-	locked.  The implementation should decrement the page refcount
-	after starting I/O on each page.  Usually the page will be
-	unlocked by the I/O completion handler.  If the filesystem decides
-	to stop attempting I/O before reaching the end of the readahead
-	window, it can simply return.  The caller will decrement the page
-	refcount and unlock the remaining pages for you.  Set PageUptodate
-	if the I/O completes successfully.  Setting PageError on any page
-	will be ignored; simply unlock the page if an I/O error occurs.
+	object.  The pages are consecutive in the page cache and
+	are locked.  Usually the page will be unlocked by the I/O
+	completion handler.  If the filesystem decides to stop attempting
+	I/O before reaching the end of the readahead window, it can
+	simply return.	The caller will unlock the remaining pages
+	for you.  Set PageUptodate if the I/O completes successfully.
+	Setting PageError on any page will be ignored; simply unlock
+	the page if an I/O error occurs.
 
 ``readpages``
 	called by the VM to read pages associated with the address_space
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 42936a83a91b..02665daa6172 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3399,22 +3399,21 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	return ret;
 }
 
-static inline void contiguous_readpages(struct page *pages[], int nr_pages,
+static inline void contiguous_readpages(struct folio **folios, int nr_pages,
 					     u64 start, u64 end,
 					     struct extent_map **em_cached,
 					     struct bio **bio,
 					     unsigned long *bio_flags,
 					     u64 *prev_em_start)
 {
-	struct btrfs_inode *inode = BTRFS_I(pages[0]->mapping->host);
+	struct btrfs_inode *inode = BTRFS_I(folios[0]->page.mapping->host);
 	int index;
 
 	btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
 
 	for (index = 0; index < nr_pages; index++) {
-		btrfs_do_readpage(pages[index], em_cached, bio, bio_flags,
-				  REQ_RAHEAD, prev_em_start);
-		put_page(pages[index]);
+		btrfs_do_readpage(&folios[index]->page, em_cached, bio,
+				bio_flags, REQ_RAHEAD, prev_em_start);
 	}
 }
 
@@ -4430,12 +4429,12 @@ void extent_readahead(struct readahead_control *rac)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
-	struct page *pagepool[16];
+	struct folio *pagepool[16];
 	struct extent_map *em_cached = NULL;
 	u64 prev_em_start = (u64)-1;
 	int nr;
 
-	while ((nr = readahead_page_batch(rac, pagepool))) {
+	while ((nr = readahead_folio_batch(rac, pagepool))) {
 		u64 contig_start = readahead_pos(rac);
 		u64 contig_end = contig_start + readahead_batch_length(rac);
 
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index ea4f693bee22..ba6deef9a4cc 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -201,9 +201,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
 			flush_dcache_page(page);
 
 			SetPageUptodate(page);
-			/* TODO: could we unlock the page earlier? */
 			unlock_page(ipage);
-			put_page(ipage);
 
 			/* imply err = 0, see erofs_map_blocks */
 			goto has_updated;
@@ -284,12 +282,13 @@ static void erofs_raw_access_readahead(struct readahead_control *rac)
 {
 	erofs_off_t last_block;
 	struct bio *bio = NULL;
-	struct page *page;
+	struct folio *folio;
 
 	trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
 			readahead_count(rac), true);
 
-	while ((page = readahead_page(rac))) {
+	while ((folio = readahead_folio(rac))) {
+		struct page *page = &folio->page;
 		prefetchw(&page->flags);
 
 		bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
@@ -303,8 +302,6 @@ static void erofs_raw_access_readahead(struct readahead_control *rac)
 
 			bio = NULL;
 		}
-
-		put_page(page);
 	}
 
 	/* the rare case (end in gaps) */
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 6cb356c4217b..f83ddf5fd1b1 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1336,6 +1336,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages);
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
 	struct page *page, *head = NULL;
+	struct folio *folio;
 	LIST_HEAD(pagepool);
 
 	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
@@ -1343,7 +1344,8 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	f.readahead = true;
 	f.headoffset = readahead_pos(rac);
 
-	while ((page = readahead_page(rac))) {
+	while ((folio = readahead_folio(rac))) {
+		page = &folio->page;
 		prefetchw(&page->flags);
 
 		/*
@@ -1369,7 +1371,6 @@ static void z_erofs_readahead(struct readahead_control *rac)
 			erofs_err(inode->i_sb,
 				  "readahead error at page %lu @ nid %llu",
 				  page->index, EROFS_I(inode)->nid);
-		put_page(page);
 	}
 
 	(void)z_erofs_collector_end(&f.clt);
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index f014c5e473a9..6f5724d80a01 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -252,7 +252,7 @@ int ext4_mpage_readpages(struct inode *inode,
 		unsigned first_hole = blocks_per_page;
 
 		if (rac) {
-			page = readahead_page(rac);
+			page = &readahead_folio(rac)->page;
 			prefetchw(&page->flags);
 		}
 
@@ -307,7 +307,7 @@ int ext4_mpage_readpages(struct inode *inode,
 					zero_user_segment(page, 0,
 							  PAGE_SIZE);
 					unlock_page(page);
-					goto next_page;
+					continue;
 				}
 			}
 			if ((map.m_flags & EXT4_MAP_MAPPED) == 0) {
@@ -345,7 +345,7 @@ int ext4_mpage_readpages(struct inode *inode,
 					goto set_error_page;
 				SetPageUptodate(page);
 				unlock_page(page);
-				goto next_page;
+				continue;
 			}
 		} else if (fully_mapped) {
 			SetPageMappedToDisk(page);
@@ -394,7 +394,7 @@ int ext4_mpage_readpages(struct inode *inode,
 			bio = NULL;
 		} else
 			last_block_in_bio = blocks[blocks_per_page - 1];
-		goto next_page;
+		continue;
 	confused:
 		if (bio) {
 			submit_bio(bio);
@@ -404,9 +404,6 @@ int ext4_mpage_readpages(struct inode *inode,
 			block_read_full_page(page, ext4_get_block);
 		else
 			unlock_page(page);
-	next_page:
-		if (rac)
-			put_page(page);
 	}
 	if (bio)
 		submit_bio(bio);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index aa34d620bec9..2397bfd1a88d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2389,10 +2389,10 @@ static int f2fs_mpage_readpages(struct inode *inode,
 
 	for (; nr_pages; nr_pages--) {
 		if (rac) {
-			page = readahead_page(rac);
+			page = &readahead_folio(rac)->page;
 			prefetchw(&page->flags);
 			if (drop_ra) {
-				f2fs_put_page(page, 1);
+				unlock_page(page);
 				continue;
 			}
 		}
@@ -2438,11 +2438,6 @@ static int f2fs_mpage_readpages(struct inode *inode,
 		}
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 next_page:
-#endif
-		if (rac)
-			put_page(page);
-
-#ifdef CONFIG_F2FS_FS_COMPRESSION
 		if (f2fs_compressed_file(inode)) {
 			/* last page */
 			if (nr_pages == 1 && !f2fs_cluster_is_empty(&cc)) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..c4645a54e932 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -911,7 +911,6 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 		else
 			SetPageError(page);
 		unlock_page(page);
-		put_page(page);
 	}
 	if (ia->ff)
 		fuse_file_put(ia->ff, false, false);
@@ -980,7 +979,8 @@ static void fuse_readahead(struct readahead_control *rac)
 		if (!ia)
 			return;
 		ap = &ia->ap;
-		nr_pages = __readahead_batch(rac, ap->pages, nr_pages);
+		nr_pages = __readahead_batch(rac, (struct folio **)ap->pages,
+						nr_pages);
 		for (i = 0; i < nr_pages; i++) {
 			fuse_wait_on_page_writeback(inode,
 						    readahead_index(rac) + i);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 16a1e82e3aeb..ef650573ab9e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -361,11 +361,10 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
-			put_page(ctx->cur_page);
 			ctx->cur_page = NULL;
 		}
 		if (!ctx->cur_page) {
-			ctx->cur_page = readahead_page(ctx->rac);
+			ctx->cur_page = &readahead_folio(ctx->rac)->page;
 			ctx->cur_page_in_bio = false;
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
@@ -417,7 +416,6 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 	if (ctx.cur_page) {
 		if (!ctx.cur_page_in_bio)
 			unlock_page(ctx.cur_page);
-		put_page(ctx.cur_page);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_readahead);
diff --git a/fs/mpage.c b/fs/mpage.c
index 830e6cc2a9e7..58b7e15d85c1 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -384,12 +384,11 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 		.is_readahead = true,
 	};
 
-	while ((page = readahead_page(rac))) {
+	while ((page = &readahead_folio(rac)->page)) {
 		prefetchw(&page->flags);
 		args.page = page;
 		args.nr_pages = readahead_count(rac);
 		args.bio = do_mpage_readpage(&args);
-		put_page(page);
 	}
 	if (args.bio)
 		mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, args.bio);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 81ff21289722..30123ae18ee1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -849,8 +849,8 @@ static inline int add_to_page_cache_lru(struct page *page,
  * struct readahead_control - Describes a readahead request.
  *
  * A readahead request is for consecutive pages.  Filesystems which
- * implement the ->readahead method should call readahead_page() or
- * readahead_page_batch() in a loop and attempt to start I/O against
+ * implement the ->readahead method should call readahead_folio() or
+ * readahead_folio_batch() in a loop and attempt to start I/O against
  * each page in the request.
  *
  * Most of the fields in this struct are private and should be accessed
@@ -931,17 +931,16 @@ void page_cache_async_readahead(struct address_space *mapping,
 }
 
 /**
- * readahead_page - Get the next page to read.
+ * readahead_folio - Get the next folio to read.
  * @rac: The current readahead request.
  *
- * Context: The page is locked and has an elevated refcount.  The caller
- * should decreases the refcount once the page has been submitted for I/O
- * and unlock the page once all I/O to that page has completed.
- * Return: A pointer to the next page, or %NULL if we are done.
+ * Context: The folio is locked.  The caller should unlock the folio once
+ * all I/O to that folio has completed.
+ * Return: A pointer to the next folio, or %NULL if we are done.
  */
-static inline struct page *readahead_page(struct readahead_control *rac)
+static inline struct folio *readahead_folio(struct readahead_control *rac)
 {
-	struct page *page;
+	struct folio *folio;
 
 	BUG_ON(rac->_batch_count > rac->_nr_pages);
 	rac->_nr_pages -= rac->_batch_count;
@@ -952,19 +951,19 @@ static inline struct page *readahead_page(struct readahead_control *rac)
 		return NULL;
 	}
 
-	page = xa_load(&rac->mapping->i_pages, rac->_index);
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	rac->_batch_count = thp_nr_pages(page);
+	folio = xa_load(&rac->mapping->i_pages, rac->_index);
+	VM_BUG_ON_PAGE(!FolioLocked(folio), &folio->page);
+	rac->_batch_count = folio_nr_pages(folio);
 
-	return page;
+	return folio;
 }
 
 static inline unsigned int __readahead_batch(struct readahead_control *rac,
-		struct page **array, unsigned int array_sz)
+		struct folio **array, unsigned int array_sz)
 {
 	unsigned int i = 0;
 	XA_STATE(xas, &rac->mapping->i_pages, 0);
-	struct page *page;
+	struct folio *folio;
 
 	BUG_ON(rac->_batch_count > rac->_nr_pages);
 	rac->_nr_pages -= rac->_batch_count;
@@ -973,13 +972,12 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 
 	xas_set(&xas, rac->_index);
 	rcu_read_lock();
-	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
-		if (xas_retry(&xas, page))
+	xas_for_each(&xas, folio, rac->_index + rac->_nr_pages - 1) {
+		if (xas_retry(&xas, folio))
 			continue;
-		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		VM_BUG_ON_PAGE(PageTail(page), page);
-		array[i++] = page;
-		rac->_batch_count += thp_nr_pages(page);
+		VM_BUG_ON_PAGE(!FolioLocked(folio), &folio->page);
+		array[i++] = folio;
+		rac->_batch_count += folio_nr_pages(folio);
 
 		/*
 		 * The page cache isn't using multi-index entries yet,
@@ -987,7 +985,7 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 		 * next index.  This can be removed once the page cache
 		 * is converted.
 		 */
-		if (PageHead(page))
+		if (FolioHead(folio))
 			xas_set(&xas, rac->_index + rac->_batch_count);
 
 		if (i == array_sz)
@@ -999,17 +997,16 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 }
 
 /**
- * readahead_page_batch - Get a batch of pages to read.
+ * readahead_folio_batch - Get a batch of folios to read.
  * @rac: The current readahead request.
- * @array: An array of pointers to struct page.
+ * @array: An array of pointers to struct folio.
  *
- * Context: The pages are locked and have an elevated refcount.  The caller
- * should decreases the refcount once the page has been submitted for I/O
- * and unlock the page once all I/O to that page has completed.
- * Return: The number of pages placed in the array.  0 indicates the request
+ * Context: The folios are locked.  The caller should unlock the folio
+ * once all I/O to that folio has completed.
+ * Return: The number of folios placed in the array.  0 indicates the request
  * is complete.
  */
-#define readahead_page_batch(rac, array)				\
+#define readahead_folio_batch(rac, array)				\
 	__readahead_batch(rac, array, ARRAY_SIZE(array))
 
 /**
diff --git a/mm/readahead.c b/mm/readahead.c
index d7a5424e3d0d..b2d78984e406 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -118,7 +118,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 		bool skip_page)
 {
 	const struct address_space_operations *aops = rac->mapping->a_ops;
-	struct page *page;
+	struct folio *folio;
 	struct blk_plug plug;
 
 	if (!readahead_count(rac))
@@ -128,11 +128,9 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 	if (aops->readahead) {
 		aops->readahead(rac);
-		/* Clean up the remaining pages */
-		while ((page = readahead_page(rac))) {
-			unlock_page(page);
-			put_page(page);
-		}
+		/* Clean up the remaining folios */
+		while ((folio = readahead_folio(rac)))
+			unlock_folio(folio);
 	} else if (aops->readpages) {
 		aops->readpages(rac->file, rac->mapping, pages,
 				readahead_count(rac));
@@ -141,10 +139,8 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 		rac->_index += rac->_nr_pages;
 		rac->_nr_pages = 0;
 	} else {
-		while ((page = readahead_page(rac))) {
-			aops->readpage(rac->file, page);
-			put_page(page);
-		}
+		while ((folio = readahead_folio(rac)))
+			aops->readpage(rac->file, &folio->page);
 	}
 
 	blk_finish_plug(&plug);
@@ -224,6 +220,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			put_page(page);
 			read_pages(ractl, &page_pool, true);
 			continue;
+		} else {
+			put_page(page);
 		}
 		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread
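
Under the new rule, a minimal ->readahead implementation looks like
this (untested sketch; myfs_start_read() is hypothetical, and its I/O
completion handler is what unlocks the folio):

        static void myfs_readahead(struct readahead_control *rac)
        {
                struct folio *folio;

                while ((folio = readahead_folio(rac))) {
                        /*
                         * No put_folio() here: the reference now stays
                         * with the page cache, as with ->readpage.
                         */
                        if (myfs_start_read(rac->mapping->host, folio) < 0) {
                                unlock_folio(folio);
                                /* The core unlocks the remaining folios. */
                                return;
                        }
                }
        }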

* [PATCH 20/25] fs: Change readpage to take a folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (18 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 19/25] fs: Change page refcount rules for readahead Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

We track the uptodate state on the entire folio, not on individual
pages.  That means ->readpage should be passed a folio and told to
update the entire folio.  Filesystems will not have multi-page folios
created for them until they indicate support by setting the
FS_MULTI_PAGE_FOLIOS flag.  Until they do, they can assume that each
folio passed in contains a single page.

Also convert filler_t to take a folio, as the two are tightly
intertwined.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  2 +-
 Documentation/filesystems/vfs.rst     | 18 ++++++++++--------
 fs/9p/vfs_addr.c                      |  9 +++++----
 fs/adfs/inode.c                       |  4 ++--
 fs/affs/file.c                        |  8 ++++----
 fs/affs/symlink.c                     |  3 ++-
 fs/afs/file.c                         |  5 +++--
 fs/befs/linuxvfs.c                    | 23 +++++++++++------------
 fs/bfs/file.c                         |  4 ++--
 fs/block_dev.c                        |  4 ++--
 fs/btrfs/ctree.h                      |  2 +-
 fs/btrfs/file.c                       | 13 +++++++------
 fs/btrfs/free-space-cache.c           |  9 +++++----
 fs/btrfs/inode.c                      | 16 +++++++++-------
 fs/btrfs/ioctl.c                      | 11 ++++++-----
 fs/btrfs/relocation.c                 | 11 ++++++-----
 fs/btrfs/send.c                       | 11 ++++++-----
 fs/buffer.c                           | 12 +++++++-----
 fs/cachefiles/rdwr.c                  | 11 ++++++-----
 fs/ceph/addr.c                        |  6 +++---
 fs/cifs/file.c                        |  3 ++-
 fs/coda/symlink.c                     |  3 ++-
 fs/cramfs/inode.c                     |  3 ++-
 fs/ecryptfs/mmap.c                    |  3 ++-
 fs/efs/inode.c                        |  4 ++--
 fs/efs/symlink.c                      |  3 ++-
 fs/erofs/data.c                       |  3 ++-
 fs/erofs/zdata.c                      |  3 ++-
 fs/exfat/inode.c                      |  4 ++--
 fs/ext2/inode.c                       |  4 ++--
 fs/ext4/ext4.h                        |  2 +-
 fs/ext4/inode.c                       | 10 +++++-----
 fs/ext4/readpage.c                    | 26 ++++++++++++++------------
 fs/f2fs/data.c                        |  3 ++-
 fs/fat/inode.c                        |  4 ++--
 fs/freevxfs/vxfs_immed.c              |  7 ++++---
 fs/freevxfs/vxfs_subr.c               |  7 +++----
 fs/fuse/dir.c                         |  8 ++++----
 fs/fuse/file.c                        |  3 ++-
 fs/gfs2/aops.c                        | 13 +++++++------
 fs/hfs/inode.c                        |  4 ++--
 fs/hfsplus/inode.c                    |  4 ++--
 fs/hpfs/file.c                        |  4 ++--
 fs/hpfs/namei.c                       |  3 ++-
 fs/iomap/buffered-io.c                | 10 +++++-----
 fs/isofs/compress.c                   |  3 ++-
 fs/isofs/inode.c                      |  4 ++--
 fs/isofs/rock.c                       |  3 ++-
 fs/jffs2/file.c                       | 20 +++++++++++---------
 fs/jffs2/os-linux.h                   |  2 +-
 fs/jfs/inode.c                        |  4 ++--
 fs/jfs/jfs_metapage.c                 |  3 ++-
 fs/libfs.c                            | 10 +++++-----
 fs/minix/inode.c                      |  4 ++--
 fs/mpage.c                            |  6 +++---
 fs/nfs/file.c                         |  5 +++--
 fs/nfs/read.c                         |  7 ++++---
 fs/nfs/symlink.c                      | 12 ++++++------
 fs/nilfs2/inode.c                     |  4 ++--
 fs/ntfs/aops.c                        |  3 ++-
 fs/ocfs2/aops.c                       | 14 +++++++-------
 fs/ocfs2/refcounttree.c               |  5 +++--
 fs/ocfs2/symlink.c                    |  3 ++-
 fs/omfs/file.c                        |  4 ++--
 fs/orangefs/inode.c                   |  3 ++-
 fs/qnx4/inode.c                       |  4 ++--
 fs/qnx6/inode.c                       |  4 ++--
 fs/reiserfs/inode.c                   |  4 ++--
 fs/romfs/super.c                      |  3 ++-
 fs/squashfs/file.c                    |  3 ++-
 fs/squashfs/symlink.c                 |  3 ++-
 fs/sysv/itree.c                       |  4 ++--
 fs/ubifs/file.c                       |  8 ++++----
 fs/udf/file.c                         |  8 ++++----
 fs/udf/inode.c                        |  4 ++--
 fs/udf/symlink.c                      |  3 ++-
 fs/ufs/inode.c                        |  4 ++--
 fs/vboxsf/file.c                      |  3 ++-
 fs/xfs/xfs_aops.c                     |  4 ++--
 fs/zonefs/super.c                     |  4 ++--
 include/linux/buffer_head.h           |  2 +-
 include/linux/fs.h                    |  4 ++--
 include/linux/iomap.h                 |  2 +-
 include/linux/mpage.h                 |  2 +-
 include/linux/nfs_fs.h                |  2 +-
 include/linux/pagemap.h               |  2 +-
 mm/filemap.c                          | 19 +++++++------------
 mm/page_io.c                          |  2 +-
 mm/readahead.c                        |  6 +++---
 89 files changed, 292 insertions(+), 254 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index c0f2c7586531..13a7a1278200 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -236,7 +236,7 @@ address_space_operations
 prototypes::
 
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
-	int (*readpage)(struct file *, struct page *);
+	int (*readpage)(struct file *, struct folio *);
 	int (*writepages)(struct address_space *, struct writeback_control *);
 	int (*set_page_dirty)(struct page *page);
 	void (*readahead)(struct readahead_control *);
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 5ac42b93225c..fee05c6e71f7 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -703,7 +703,7 @@ cache in your filesystem.  The following members are defined:
 
 	struct address_space_operations {
 		int (*writepage)(struct page *page, struct writeback_control *wbc);
-		int (*readpage)(struct file *, struct page *);
+		int (*readpage)(struct file *, struct folio *);
 		int (*writepages)(struct address_space *, struct writeback_control *);
 		int (*set_page_dirty)(struct page *page);
 		void (*readahead)(struct readahead_control *);
@@ -756,13 +756,15 @@ cache in your filesystem.  The following members are defined:
 	See the file "Locking" for more details.
 
 ``readpage``
-	called by the VM to read a page from backing store.  The page
-	will be Locked when readpage is called, and should be unlocked
-	and marked uptodate once the read completes.  If ->readpage
-	discovers that it needs to unlock the page for some reason, it
-	can do so, and then return AOP_TRUNCATED_PAGE.  In this case,
-	the page will be relocated, relocked and if that all succeeds,
-	->readpage will be called again.
+	Called by the VM to read a folio from the backing store.  If the
+	filesystem has not indicated that it can handle multi-page
+	folios by setting FS_MULTI_PAGE_FOLIOS, the folio will contain
+	one page.  The folio will be Locked when readpage is called, and
+	should be unlocked and marked uptodate once the read completes.
+	If ->readpage discovers that it needs to unlock the folio for
+	some reason, it can do so, and then return AOP_TRUNCATED_PAGE.
+	In this case, the caller will attempt to look up the folio and
+	call ->readpage again.
 
 ``writepages``
 	called by the VM to write out pages associated with the
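
[ For reference, the AOP_TRUNCATED_PAGE retry described above is
  driven from the caller; a simplified sketch in the spirit of the
  retry loop in filemap_fault(), not a verbatim excerpt (the lookup
  step is elided):

        repeat:
                folio = ...;    /* look up the folio, locked */
                err = mapping->a_ops->readpage(file, folio);
                if (err == AOP_TRUNCATED_PAGE) {
                        /* ->readpage unlocked the folio and it was
                         * truncated; drop the reference and retry */
                        put_folio(folio);
                        goto repeat;
                }
]
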
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index cce9ace651a2..db31b25c6282 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -35,8 +35,9 @@
  * @page: structure to page
  *
  */
-static int v9fs_fid_readpage(void *data, struct page *page)
+static int v9fs_fid_readpage(void *data, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct p9_fid *fid = data;
 	struct inode *inode = page->mapping->host;
 	struct bio_vec bvec = {.bv_page = page, .bv_len = PAGE_SIZE};
@@ -80,9 +81,9 @@ static int v9fs_fid_readpage(void *data, struct page *page)
  *
  */
 
-static int v9fs_vfs_readpage(struct file *filp, struct page *page)
+static int v9fs_vfs_readpage(struct file *filp, struct folio *folio)
 {
-	return v9fs_fid_readpage(filp->private_data, page);
+	return v9fs_fid_readpage(filp->private_data, folio);
 }
 
 /**
@@ -279,7 +280,7 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping,
 	if (len == PAGE_SIZE)
 		goto out;
 
-	retval = v9fs_fid_readpage(v9inode->writeback_fid, page);
+	retval = v9fs_fid_readpage(v9inode->writeback_fid, page_folio(page));
 	put_page(page);
 	if (!retval)
 		goto start;
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 32620f4a7623..aaf9f749a4ab 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -38,9 +38,9 @@ static int adfs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, adfs_get_block, wbc);
 }
 
-static int adfs_readpage(struct file *file, struct page *page)
+static int adfs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, adfs_get_block);
+	return block_read_full_page(folio, adfs_get_block);
 }
 
 static void adfs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/affs/file.c b/fs/affs/file.c
index d91b0133d95d..0c3b614b16a1 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -375,9 +375,9 @@ static int affs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, affs_get_block, wbc);
 }
 
-static int affs_readpage(struct file *file, struct page *page)
+static int affs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, affs_get_block);
+	return block_read_full_page(folio, affs_get_block);
 }
 
 static void affs_write_failed(struct address_space *mapping, loff_t to)
@@ -626,9 +626,9 @@ affs_extent_file_ofs(struct inode *inode, u32 newsize)
 	return PTR_ERR(bh);
 }
 
-static int
-affs_readpage_ofs(struct file *file, struct page *page)
+static int affs_readpage_ofs(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	u32 to;
 	int err;
diff --git a/fs/affs/symlink.c b/fs/affs/symlink.c
index a7531b26e8f0..01d30f8ef0dd 100644
--- a/fs/affs/symlink.c
+++ b/fs/affs/symlink.c
@@ -11,8 +11,9 @@
 
 #include "affs.h"
 
-static int affs_symlink_readpage(struct file *file, struct page *page)
+static int affs_symlink_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct buffer_head *bh;
 	struct inode *inode = page->mapping->host;
 	char *link = page_address(page);
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 85f5adf21aa0..a4cd8b9f7806 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -17,7 +17,7 @@
 #include "internal.h"
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
-static int afs_readpage(struct file *file, struct page *page);
+static int afs_readpage(struct file *file, struct folio *folio);
 static void afs_invalidatepage(struct page *page, unsigned int offset,
 			       unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
@@ -389,8 +389,9 @@ int afs_page_filler(void *data, struct page *page)
  * read page from file, directory or symlink, given a file to nominate the key
  * to be used
  */
-static int afs_readpage(struct file *file, struct page *page)
+static int afs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct key *key;
 	int ret;
 
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index c1ba13d19024..fb397f0433f4 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -40,7 +40,7 @@ MODULE_LICENSE("GPL");
 
 static int befs_readdir(struct file *, struct dir_context *);
 static int befs_get_block(struct inode *, sector_t, struct buffer_head *, int);
-static int befs_readpage(struct file *file, struct page *page);
+static int befs_readpage(struct file *file, struct folio *folio);
 static sector_t befs_bmap(struct address_space *mapping, sector_t block);
 static struct dentry *befs_lookup(struct inode *, struct dentry *,
 				  unsigned int);
@@ -48,7 +48,7 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_free_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static int befs_symlink_readpage(struct file *, struct page *);
+static int befs_symlink_readpage(struct file *, struct folio *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -108,10 +108,9 @@ static const struct export_operations befs_export_operations = {
  * passes it the address of befs_get_block, for mapping file
  * positions to disk blocks.
  */
-static int
-befs_readpage(struct file *file, struct page *page)
+static int befs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, befs_get_block);
+	return block_read_full_page(folio, befs_get_block);
 }
 
 static sector_t
@@ -468,14 +467,14 @@ befs_destroy_inodecache(void)
  * The data stream become link name. Unless the LONG_SYMLINK
  * flag is set.
  */
-static int befs_symlink_readpage(struct file *unused, struct page *page)
+static int befs_symlink_readpage(struct file *unused, struct folio *folio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = folio->page.mapping->host;
 	struct super_block *sb = inode->i_sb;
 	struct befs_inode_info *befs_ino = BEFS_I(inode);
 	befs_data_stream *data = &befs_ino->i_data.ds;
 	befs_off_t len = data->size;
-	char *link = page_address(page);
+	char *link = page_address(&folio->page);
 
 	if (len == 0 || len > PAGE_SIZE) {
 		befs_error(sb, "Long symlink with illegal length");
@@ -488,12 +487,12 @@ static int befs_symlink_readpage(struct file *unused, struct page *page)
 		goto fail;
 	}
 	link[len - 1] = '\0';
-	SetPageUptodate(page);
-	unlock_page(page);
+	SetFolioUptodate(folio);
+	unlock_folio(folio);
 	return 0;
 fail:
-	SetPageError(page);
-	unlock_page(page);
+	SetFolioError(folio);
+	unlock_folio(folio);
 	return -EIO;
 }
 
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index 0dceefc54b48..852f98af446f 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -155,9 +155,9 @@ static int bfs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, bfs_get_block, wbc);
 }
 
-static int bfs_readpage(struct file *file, struct page *page)
+static int bfs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, bfs_get_block);
+	return block_read_full_page(folio, bfs_get_block);
 }
 
 static void bfs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e56ee1f2652..932cb795e3d6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -617,9 +617,9 @@ static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, blkdev_get_block, wbc);
 }
 
-static int blkdev_readpage(struct file * file, struct page * page)
+static int blkdev_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, blkdev_get_block);
+	return block_read_full_page(folio, blkdev_get_block);
 }
 
 static void blkdev_readahead(struct readahead_control *rac)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 45f8c5797aca..bdde4582479d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3102,7 +3102,7 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
 			     unsigned long bio_flags);
 void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end);
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
-int btrfs_readpage(struct file *file, struct page *page);
+int btrfs_readpage(struct file *file, struct folio *folio);
 void btrfs_evict_inode(struct inode *inode);
 int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc);
 struct inode *btrfs_alloc_inode(struct super_block *sb);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0e41459b8de6..a88b18bb37f1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1331,16 +1331,17 @@ static int prepare_uptodate_page(struct inode *inode,
 
 	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
 	    !PageUptodate(page)) {
-		ret = btrfs_readpage(NULL, page);
+		struct folio *folio = page_folio(page);
+		ret = btrfs_readpage(NULL, folio);
 		if (ret)
 			return ret;
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
+		lock_folio(folio);
+		if (!FolioUptodate(folio)) {
+			unlock_folio(folio);
 			return -EIO;
 		}
-		if (page->mapping != inode->i_mapping) {
-			unlock_page(page);
+		if (folio->page.mapping != inode->i_mapping) {
+			unlock_folio(folio);
 			return -EAGAIN;
 		}
 	}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 71d0d14bc18b..ca0fef79ce6c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -438,15 +438,16 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 		}
 		io_ctl->pages[i] = page;
 		if (uptodate && !PageUptodate(page)) {
-			btrfs_readpage(NULL, page);
-			lock_page(page);
-			if (page->mapping != inode->i_mapping) {
+			struct folio *folio = page_folio(page);
+			btrfs_readpage(NULL, folio);
+			lock_folio(folio);
+			if (folio->page.mapping != inode->i_mapping) {
 				btrfs_err(BTRFS_I(inode)->root->fs_info,
 					  "free space cache page truncated");
 				io_ctl_drop_pages(io_ctl);
 				return -EIO;
 			}
-			if (!PageUptodate(page)) {
+			if (!FolioUptodate(folio)) {
 				btrfs_err(BTRFS_I(inode)->root->fs_info,
 					   "error reading free space cache");
 				io_ctl_drop_pages(io_ctl);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 070716650df8..ce9eea76135a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4722,14 +4722,15 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	}
 
 	if (!PageUptodate(page)) {
-		ret = btrfs_readpage(NULL, page);
-		lock_page(page);
-		if (page->mapping != mapping) {
-			unlock_page(page);
-			put_page(page);
+		struct folio *folio = page_folio(page);
+		ret = btrfs_readpage(NULL, folio);
+		lock_folio(folio);
+		if (folio->page.mapping != mapping) {
+			unlock_folio(folio);
+			put_folio(folio);
 			goto again;
 		}
-		if (!PageUptodate(page)) {
+		if (!FolioUptodate(folio)) {
 			ret = -EIO;
 			goto out_unlock;
 		}
@@ -8060,8 +8061,9 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
 }
 
-int btrfs_readpage(struct file *file, struct page *page)
+int btrfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
 	u64 start = page_offset(page);
 	u64 end = start + PAGE_SIZE - 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dde49a791f3e..cab05b00e91e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1347,11 +1347,12 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		}
 
 		if (!PageUptodate(page)) {
-			btrfs_readpage(NULL, page);
-			lock_page(page);
-			if (!PageUptodate(page)) {
-				unlock_page(page);
-				put_page(page);
+			struct folio *folio = page_folio(page);
+			btrfs_readpage(NULL, folio);
+			lock_folio(folio);
+			if (!FolioUptodate(folio)) {
+				unlock_folio(folio);
+				put_folio(folio);
 				ret = -EIO;
 				break;
 			}
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 19b7db8b2117..e66039c13e3a 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2687,11 +2687,12 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		}
 
 		if (!PageUptodate(page)) {
-			btrfs_readpage(NULL, page);
-			lock_page(page);
-			if (!PageUptodate(page)) {
-				unlock_page(page);
-				put_page(page);
+			struct folio *folio = page_folio(page);
+			btrfs_readpage(NULL, folio);
+			lock_folio(folio);
+			if (!FolioUptodate(folio)) {
+				unlock_folio(folio);
+				put_folio(folio);
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
 							PAGE_SIZE, true);
 				btrfs_delalloc_release_extents(BTRFS_I(inode),
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d719a2755a40..8f803297459f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4978,11 +4978,12 @@ static int put_file_data(struct send_ctx *sctx, u64 offset, u32 len)
 		}
 
 		if (!PageUptodate(page)) {
-			btrfs_readpage(NULL, page);
-			lock_page(page);
-			if (!PageUptodate(page)) {
-				unlock_page(page);
-				put_page(page);
+			struct folio *folio = page_folio(page);
+			btrfs_readpage(NULL, folio);
+			lock_folio(folio);
+			if (!FolioUptodate(folio)) {
+				unlock_folio(folio);
+				put_folio(folio);
 				ret = -EIO;
 				break;
 			}
diff --git a/fs/buffer.c b/fs/buffer.c
index 96c7604f69b3..f7b62b69f33d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2293,8 +2293,9 @@ EXPORT_SYMBOL(block_is_partially_uptodate);
  * set/clear_buffer_uptodate() functions propagate buffer state into the
  * page struct once IO has completed.
  */
-int block_read_full_page(struct page *page, get_block_t *get_block)
+int block_read_full_page(struct folio *folio, get_block_t *get_block)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	sector_t iblock, lblock;
 	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
@@ -2882,13 +2883,14 @@ int nobh_truncate_page(struct address_space *mapping,
 
 	/* Ok, it's mapped. Make sure it's up-to-date */
 	if (!PageUptodate(page)) {
-		err = mapping->a_ops->readpage(NULL, page);
+		struct folio *folio = page_folio(page);
+		err = mapping->a_ops->readpage(NULL, folio);
 		if (err) {
-			put_page(page);
+			put_folio(folio);
 			goto out;
 		}
-		lock_page(page);
-		if (!PageUptodate(page)) {
+		lock_folio(folio);
+		if (!FolioUptodate(folio)) {
 			err = -EIO;
 			goto unlock;
 		}
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 268fbcac4afb..f2a858d71927 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -111,15 +111,16 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 	add_page_wait_queue(backpage, &monitor->monitor);
 
 	if (trylock_page(backpage)) {
+		struct folio *folio = page_folio(backpage);
 		ret = -EIO;
-		if (PageError(backpage))
+		if (FolioError(folio))
 			goto unlock_discard;
 		ret = 0;
-		if (PageUptodate(backpage))
+		if (FolioUptodate(folio))
 			goto unlock_discard;
 
 		_debug("reissue read");
-		ret = bmapping->a_ops->readpage(NULL, backpage);
+		ret = bmapping->a_ops->readpage(NULL, folio);
 		if (ret < 0)
 			goto discard;
 	}
@@ -282,7 +283,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	newpage = NULL;
 
 read_backing_page:
-	ret = bmapping->a_ops->readpage(NULL, backpage);
+	ret = bmapping->a_ops->readpage(NULL, page_folio(backpage));
 	if (ret < 0)
 		goto read_error;
 
@@ -522,7 +523,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		newpage = NULL;
 
 	reread_backing_page:
-		ret = bmapping->a_ops->readpage(NULL, backpage);
+		ret = bmapping->a_ops->readpage(NULL, page_folio(backpage));
 		if (ret < 0)
 			goto read_error;
 
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5b2873b12904..1bcd7bf20930 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -259,11 +259,11 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 	return err < 0 ? err : 0;
 }
 
-static int ceph_readpage(struct file *filp, struct page *page)
+static int ceph_readpage(struct file *filp, struct folio *folio)
 {
-	int r = ceph_do_readpage(filp, page);
+	int r = ceph_do_readpage(filp, &folio->page);
 	if (r != -EINPROGRESS)
-		unlock_page(page);
+		unlock_folio(folio);
 	else
 		r = 0;
 	return r;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6d001905c8e5..cf806c7331aa 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4553,8 +4553,9 @@ static int cifs_readpage_worker(struct file *file, struct page *page,
 	return rc;
 }
 
-static int cifs_readpage(struct file *file, struct page *page)
+static int cifs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	loff_t offset = (loff_t)page->index << PAGE_SHIFT;
 	int rc = -EACCES;
 	unsigned int xid;
diff --git a/fs/coda/symlink.c b/fs/coda/symlink.c
index 8907d0508198..966053c1c523 100644
--- a/fs/coda/symlink.c
+++ b/fs/coda/symlink.c
@@ -20,8 +20,9 @@
 #include "coda_psdev.h"
 #include "coda_linux.h"
 
-static int coda_symlink_filler(struct file *file, struct page *page)
+static int coda_symlink_filler(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	int error;
 	struct coda_inode_info *cii;
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 4b90cfd1ec36..991650846605 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -816,8 +816,9 @@ static struct dentry *cramfs_lookup(struct inode *dir, struct dentry *dentry, un
 	return d_splice_alias(inode, dentry);
 }
 
-static int cramfs_readpage(struct file *file, struct page *page)
+static int cramfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	u32 maxblock;
 	int bytes_filled;
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 019572c6b39a..38f89b6f8095 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -177,8 +177,9 @@ ecryptfs_copy_up_encrypted_with_header(struct page *page,
  *
  * Returns zero on success; non-zero on error.
  */
-static int ecryptfs_readpage(struct file *file, struct page *page)
+static int ecryptfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct ecryptfs_crypt_stat *crypt_stat =
 		&ecryptfs_inode_to_private(page->mapping->host)->crypt_stat;
 	int rc = 0;
diff --git a/fs/efs/inode.c b/fs/efs/inode.c
index 89e73a6f0d36..28d85bc27cce 100644
--- a/fs/efs/inode.c
+++ b/fs/efs/inode.c
@@ -14,9 +14,9 @@
 #include "efs.h"
 #include <linux/efs_fs_sb.h>
 
-static int efs_readpage(struct file *file, struct page *page)
+static int efs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page,efs_get_block);
+	return block_read_full_page(folio, efs_get_block);
 }
 static sector_t _efs_bmap(struct address_space *mapping, sector_t block)
 {
diff --git a/fs/efs/symlink.c b/fs/efs/symlink.c
index 923eb91654d5..3f6d9c8786a4 100644
--- a/fs/efs/symlink.c
+++ b/fs/efs/symlink.c
@@ -12,8 +12,9 @@
 #include <linux/buffer_head.h>
 #include "efs.h"
 
-static int efs_symlink_readpage(struct file *file, struct page *page)
+static int efs_symlink_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	char *link = page_address(page);
 	struct buffer_head * bh;
 	struct inode * inode = page->mapping->host;
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index ba6deef9a4cc..be148c090046 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -261,8 +261,9 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
  * since we dont have write or truncate flows, so no inode
  * locking needs to be held at the moment.
  */
-static int erofs_raw_access_readpage(struct file *file, struct page *page)
+static int erofs_raw_access_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	erofs_off_t last_block;
 	struct bio *bio;
 
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index f83ddf5fd1b1..bcee824b82cb 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1299,8 +1299,9 @@ static void z_erofs_runqueue(struct super_block *sb,
 	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
 }
 
-static int z_erofs_readpage(struct file *file, struct page *page)
+static int z_erofs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *const inode = page->mapping->host;
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
 	int err;
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 730373e0965a..94292c163c00 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -357,9 +357,9 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
 	return err;
 }
 
-static int exfat_readpage(struct file *file, struct page *page)
+static int exfat_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, exfat_get_block);
+	return mpage_readpage(folio, exfat_get_block);
 }
 
 static void exfat_readahead(struct readahead_control *rac)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 78c417d3c898..a57a7a25db45 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -869,9 +869,9 @@ static int ext2_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, ext2_get_block, wbc);
 }
 
-static int ext2_readpage(struct file *file, struct page *page)
+static int ext2_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, ext2_get_block);
+	return mpage_readpage(folio, ext2_get_block);
 }
 
 static void ext2_readahead(struct readahead_control *rac)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 22f80efbe3a4..65454096cf74 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3525,7 +3525,7 @@ static inline void ext4_set_de_type(struct super_block *sb,
 
 /* readpages.c */
 extern int ext4_mpage_readpages(struct inode *inode,
-		struct readahead_control *rac, struct page *page);
+		struct readahead_control *rac, struct folio *folio);
 extern int __init ext4_init_post_read_processing(void);
 extern void ext4_exit_post_read_processing(void);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b147c2e20469..50b91a16ce19 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3225,18 +3225,18 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 	return iomap_bmap(mapping, block, &ext4_iomap_ops);
 }
 
-static int ext4_readpage(struct file *file, struct page *page)
+static int ext4_readpage(struct file *file, struct folio *folio)
 {
 	int ret = -EAGAIN;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = folio->page.mapping->host;
 
-	trace_ext4_readpage(page);
+	trace_ext4_readpage(&folio->page);
 
 	if (ext4_has_inline_data(inode))
-		ret = ext4_readpage_inline(inode, page);
+		ret = ext4_readpage_inline(inode, &folio->page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(inode, NULL, page);
+		return ext4_mpage_readpages(inode, NULL, folio);
 
 	return ret;
 }
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 6f5724d80a01..fd6e5f3b7ba7 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -222,8 +222,9 @@ static inline loff_t ext4_readpage_limit(struct inode *inode)
 }
 
 int ext4_mpage_readpages(struct inode *inode,
-		struct readahead_control *rac, struct page *page)
+		struct readahead_control *rac, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
 
@@ -252,7 +253,8 @@ int ext4_mpage_readpages(struct inode *inode,
 		unsigned first_hole = blocks_per_page;
 
 		if (rac) {
-			page = &readahead_folio(rac)->page;
+			folio = readahead_folio(rac);
+			page = &folio->page;
 			prefetchw(&page->flags);
 		}
 
@@ -303,10 +305,10 @@ int ext4_mpage_readpages(struct inode *inode,
 
 				if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
 				set_error_page:
-					SetPageError(page);
+					SetFolioError(folio);
 					zero_user_segment(page, 0,
 							  PAGE_SIZE);
-					unlock_page(page);
+					unlock_folio(folio);
 					continue;
 				}
 			}
@@ -343,16 +345,16 @@ int ext4_mpage_readpages(struct inode *inode,
 				if (ext4_need_verity(inode, page->index) &&
 				    !fsverity_verify_page(page))
 					goto set_error_page;
-				SetPageUptodate(page);
-				unlock_page(page);
+				SetFolioUptodate(folio);
+				unlock_folio(folio);
 				continue;
 			}
 		} else if (fully_mapped) {
-			SetPageMappedToDisk(page);
+			SetFolioMappedToDisk(folio);
 		}
 		if (fully_mapped && blocks_per_page == 1 &&
-		    !PageUptodate(page) && cleancache_get_page(page) == 0) {
-			SetPageUptodate(page);
+		    !FolioUptodate(folio) && cleancache_get_page(page) == 0) {
+			SetFolioUptodate(folio);
 			goto confused;
 		}
 
@@ -400,10 +402,10 @@ int ext4_mpage_readpages(struct inode *inode,
 			submit_bio(bio);
 			bio = NULL;
 		}
-		if (!PageUptodate(page))
-			block_read_full_page(page, ext4_get_block);
+		if (!FolioUptodate(folio))
+			block_read_full_page(folio, ext4_get_block);
 		else
-			unlock_page(page);
+			unlock_folio(folio);
 	}
 	if (bio)
 		submit_bio(bio);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 2397bfd1a88d..654a79f5e4ea 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2458,8 +2458,9 @@ static int f2fs_mpage_readpages(struct inode *inode,
 	return ret;
 }
 
-static int f2fs_read_data_page(struct file *file, struct page *page)
+static int f2fs_read_data_page(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page_file_mapping(page)->host;
 	int ret = -EAGAIN;
 
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index bab9b202b496..9d55d47a28df 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -205,9 +205,9 @@ static int fat_writepages(struct address_space *mapping,
 	return mpage_writepages(mapping, wbc, fat_get_block);
 }
 
-static int fat_readpage(struct file *file, struct page *page)
+static int fat_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, fat_get_block);
+	return mpage_readpage(folio, fat_get_block);
 }
 
 static void fat_readahead(struct readahead_control *rac)
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index bfc780c682fb..69c05606d904 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -38,7 +38,7 @@
 #include "vxfs_inode.h"
 
 
-static int	vxfs_immed_readpage(struct file *, struct page *);
+static int	vxfs_immed_readpage(struct file *, struct folio *);
 
 /*
  * Address space operations for immed files and directories.
@@ -50,7 +50,7 @@ const struct address_space_operations vxfs_immed_aops = {
 /**
  * vxfs_immed_readpage - read part of an immed inode into pagecache
  * @file:	file context (unused)
- * @page:	page frame to fill in.
+ * @folio:	folio to fill in.
  *
  * Description:
  *   vxfs_immed_readpage reads a part of the immed area of the
@@ -63,8 +63,9 @@ const struct address_space_operations vxfs_immed_aops = {
  *   @page is locked and will be unlocked.
  */
 static int
-vxfs_immed_readpage(struct file *fp, struct page *pp)
+vxfs_immed_readpage(struct file *fp, struct folio *folio)
 {
+	struct page *pp = &folio->page;
 	struct vxfs_inode_info	*vip = VXFS_INO(pp->mapping->host);
 	u_int64_t	offset = (u_int64_t)pp->index << PAGE_SHIFT;
 	caddr_t		kaddr;
diff --git a/fs/freevxfs/vxfs_subr.c b/fs/freevxfs/vxfs_subr.c
index e806694d4145..1b68210ab859 100644
--- a/fs/freevxfs/vxfs_subr.c
+++ b/fs/freevxfs/vxfs_subr.c
@@ -38,7 +38,7 @@
 #include "vxfs_extern.h"
 
 
-static int		vxfs_readpage(struct file *, struct page *);
+static int		vxfs_readpage(struct file *, struct folio *);
 static sector_t		vxfs_bmap(struct address_space *, sector_t);
 
 const struct address_space_operations vxfs_aops = {
@@ -155,10 +155,9 @@ vxfs_getblk(struct inode *ip, sector_t iblock,
  * Locking status:
  *   @page is locked and will be unlocked.
  */
-static int
-vxfs_readpage(struct file *file, struct page *page)
+static int vxfs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, vxfs_getblk);
+	return block_read_full_page(folio, vxfs_getblk);
 }
  
 /**
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 78f9f209078c..3622fc5f33e8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1908,14 +1908,14 @@ void fuse_init_dir(struct inode *inode)
 	fi->rdc.version = 0;
 }
 
-static int fuse_symlink_readpage(struct file *null, struct page *page)
+static int fuse_symlink_readpage(struct file *null, struct folio *folio)
 {
-	int err = fuse_readlink_page(page->mapping->host, page);
+	int err = fuse_readlink_page(folio->page.mapping->host, &folio->page);
 
 	if (!err)
-		SetPageUptodate(page);
+		SetFolioUptodate(folio);
 
-	unlock_page(page);
+	unlock_folio(folio);
 
 	return err;
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c4645a54e932..5d957f931caf 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -862,8 +862,9 @@ static int fuse_do_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
-static int fuse_readpage(struct file *file, struct page *page)
+static int fuse_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	int err;
 
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index cc4f987687f3..f3736ab1f6ce 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -465,8 +465,9 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
 }
 
 
-static int __gfs2_readpage(void *file, struct page *page)
+static int __gfs2_readpage(void *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -474,12 +475,12 @@ static int __gfs2_readpage(void *file, struct page *page)
 
 	if (!gfs2_is_jdata(ip) ||
 	    (i_blocksize(inode) == PAGE_SIZE && !page_has_buffers(page))) {
-		error = iomap_readpage(page, &gfs2_iomap_ops);
+		error = iomap_readpage(folio, &gfs2_iomap_ops);
 	} else if (gfs2_is_stuffed(ip)) {
 		error = stuffed_readpage(ip, page);
-		unlock_page(page);
+		unlock_folio(folio);
 	} else {
-		error = mpage_readpage(page, gfs2_block_map);
+		error = mpage_readpage(folio, gfs2_block_map);
 	}
 
 	if (unlikely(gfs2_withdrawn(sdp)))
@@ -494,9 +495,9 @@ static int __gfs2_readpage(void *file, struct page *page)
  * @page: The page of the file
  */
 
-static int gfs2_readpage(struct file *file, struct page *page)
+static int gfs2_readpage(struct file *file, struct folio *folio)
 {
-	return __gfs2_readpage(file, page);
+	return __gfs2_readpage(file, folio);
 }
 
 /**
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index f35a37c65e5f..778f65256e49 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -34,9 +34,9 @@ static int hfs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, hfs_get_block, wbc);
 }
 
-static int hfs_readpage(struct file *file, struct page *page)
+static int hfs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, hfs_get_block);
+	return block_read_full_page(folio, hfs_get_block);
 }
 
 static void hfs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index e3da9e96b835..afaa784fae9f 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -22,9 +22,9 @@
 #include "hfsplus_raw.h"
 #include "xattr.h"
 
-static int hfsplus_readpage(struct file *file, struct page *page)
+static int hfsplus_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, hfsplus_get_block);
+	return block_read_full_page(folio, hfsplus_get_block);
 }
 
 static int hfsplus_writepage(struct page *page, struct writeback_control *wbc)
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index 077c25128eb7..b1fe2040279e 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -116,9 +116,9 @@ static int hpfs_get_block(struct inode *inode, sector_t iblock, struct buffer_he
 	return r;
 }
 
-static int hpfs_readpage(struct file *file, struct page *page)
+static int hpfs_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, hpfs_get_block);
+	return mpage_readpage(folio, hpfs_get_block);
 }
 
 static int hpfs_writepage(struct page *page, struct writeback_control *wbc)
diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
index 1aee39160ac5..16bf0c2f7925 100644
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -475,8 +475,9 @@ static int hpfs_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int hpfs_symlink_readpage(struct file *file, struct page *page)
+static int hpfs_symlink_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	char *link = page_address(page);
 	struct inode *i = page->mapping->host;
 	struct fnode *fnode;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ef650573ab9e..8cfb5fc2c13d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -312,15 +312,15 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	return pos - orig_pos + plen;
 }
 
-int
-iomap_readpage(struct page *page, const struct iomap_ops *ops)
+int iomap_readpage(struct folio *folio, const struct iomap_ops *ops)
 {
+	struct page *page = &folio->page;
 	struct iomap_readpage_ctx ctx = { .cur_page = page };
 	struct inode *inode = page->mapping->host;
 	unsigned poff;
 	loff_t ret;
 
-	trace_iomap_readpage(page->mapping->host, 1);
+	trace_iomap_readpage(inode, 1);
 
 	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
@@ -328,7 +328,7 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 				iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
-			SetPageError(page);
+			SetFolioError(folio);
 			break;
 		}
 	}
@@ -338,7 +338,7 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 		WARN_ON_ONCE(!ctx.cur_page_in_bio);
 	} else {
 		WARN_ON_ONCE(ctx.cur_page_in_bio);
-		unlock_page(page);
+		unlock_folio(folio);
 	}
 
 	/*
diff --git a/fs/isofs/compress.c b/fs/isofs/compress.c
index bc12ac7e2312..f502103d5b6d 100644
--- a/fs/isofs/compress.c
+++ b/fs/isofs/compress.c
@@ -296,8 +296,9 @@ static int zisofs_fill_pages(struct inode *inode, int full_page, int pcount,
  * per reference.  We inject the additional pages into the page
  * cache as a form of readahead.
  */
-static int zisofs_readpage(struct file *file, struct page *page)
+static int zisofs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = file_inode(file);
 	struct address_space *mapping = inode->i_mapping;
 	int err;
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index ec90773527ee..ea1b99562cbc 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1174,9 +1174,9 @@ struct buffer_head *isofs_bread(struct inode *inode, sector_t block)
 	return sb_bread(inode->i_sb, blknr);
 }
 
-static int isofs_readpage(struct file *file, struct page *page)
+static int isofs_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, isofs_get_block);
+	return mpage_readpage(folio, isofs_get_block);
 }
 
 static void isofs_readahead(struct readahead_control *rac)
diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 94ef92fe806c..7c5358efa130 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -690,8 +690,9 @@ int parse_rock_ridge_inode(struct iso_directory_record *de, struct inode *inode,
  * readpage() for symlinks: reads symlink contents into the page and either
  * makes it uptodate and returns 0 or returns error (-EIO)
  */
-static int rock_ridge_symlink_readpage(struct file *file, struct page *page)
+static int rock_ridge_symlink_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct iso_inode_info *ei = ISOFS_I(inode);
 	struct isofs_sb_info *sbi = ISOFS_SB(inode->i_sb);
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index f8fb89b10227..378ddff6bbea 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -27,7 +27,7 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
 static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata);
-static int jffs2_readpage (struct file *filp, struct page *pg);
+static int jffs2_readpage(struct file *filp, struct folio *folio);
 
 int jffs2_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
@@ -76,8 +76,9 @@ const struct address_space_operations jffs2_file_address_operations =
 	.write_end =	jffs2_write_end,
 };
 
-static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
+static int jffs2_do_readpage_nolock(struct inode *inode, struct folio *folio)
 {
+	struct page *pg = &folio->page;
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
 	unsigned char *pg_buf;
@@ -109,21 +110,22 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
 	return ret;
 }
 
-int jffs2_do_readpage_unlock(void *data, struct page *pg)
+int jffs2_do_readpage_unlock(void *data, struct folio *folio)
 {
-	int ret = jffs2_do_readpage_nolock(data, pg);
-	unlock_page(pg);
+	int ret = jffs2_do_readpage_nolock(data, folio);
+	unlock_folio(folio);
 	return ret;
 }
 
 
-static int jffs2_readpage (struct file *filp, struct page *pg)
+static int jffs2_readpage(struct file *file, struct folio *folio)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);
+	struct inode *inode = folio->page.mapping->host;
+	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
 	int ret;
 
 	mutex_lock(&f->sem);
-	ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
+	ret = jffs2_do_readpage_unlock(inode, folio);
 	mutex_unlock(&f->sem);
 	return ret;
 }
@@ -218,7 +220,7 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
 	 */
 	if (!PageUptodate(pg)) {
 		mutex_lock(&f->sem);
-		ret = jffs2_do_readpage_nolock(inode, pg);
+		ret = jffs2_do_readpage_nolock(inode, page_folio(pg));
 		mutex_unlock(&f->sem);
 		if (ret)
 			goto out_page;
diff --git a/fs/jffs2/os-linux.h b/fs/jffs2/os-linux.h
index ef1cfa61549e..ff4a38d4510c 100644
--- a/fs/jffs2/os-linux.h
+++ b/fs/jffs2/os-linux.h
@@ -155,7 +155,7 @@ extern const struct file_operations jffs2_file_operations;
 extern const struct inode_operations jffs2_file_inode_operations;
 extern const struct address_space_operations jffs2_file_address_operations;
 int jffs2_fsync(struct file *, loff_t, loff_t, int);
-int jffs2_do_readpage_unlock(void *data, struct page *pg);
+int jffs2_do_readpage_unlock(void *data, struct folio *folio);
 
 /* ioctl.c */
 long jffs2_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 6f65bfa9f18d..f502131a4e69 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -291,9 +291,9 @@ static int jfs_writepages(struct address_space *mapping,
 	return mpage_writepages(mapping, wbc, jfs_get_block);
 }
 
-static int jfs_readpage(struct file *file, struct page *page)
+static int jfs_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, jfs_get_block);
+	return mpage_readpage(folio, jfs_get_block);
 }
 
 static void jfs_readahead(struct readahead_control *rac)
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 176580f54af9..058837d7172e 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -468,8 +468,9 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 	return -EIO;
 }
 
-static int metapage_readpage(struct file *fp, struct page *page)
+static int metapage_readpage(struct file *fp, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
diff --git a/fs/libfs.c b/fs/libfs.c
index d1c3bade9f30..a0f9274271c4 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -509,12 +509,12 @@ int simple_setattr(struct dentry *dentry, struct iattr *iattr)
 }
 EXPORT_SYMBOL(simple_setattr);
 
-int simple_readpage(struct file *file, struct page *page)
+int simple_readpage(struct file *file, struct folio *folio)
 {
-	clear_highpage(page);
-	flush_dcache_page(page);
-	SetPageUptodate(page);
-	unlock_page(page);
+	clear_highpage(&folio->page);
+	flush_dcache_page(&folio->page);
+	SetFolioUptodate(folio);
+	unlock_folio(folio);
 	return 0;
 }
 EXPORT_SYMBOL(simple_readpage);
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 34f546404aa1..df1ee731bf22 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -402,9 +402,9 @@ static int minix_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page, minix_get_block, wbc);
 }
 
-static int minix_readpage(struct file *file, struct page *page)
+static int minix_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page,minix_get_block);
+	return block_read_full_page(folio, minix_get_block);
 }
 
 int minix_prepare_chunk(struct page *page, loff_t pos, unsigned len)
diff --git a/fs/mpage.c b/fs/mpage.c
index 58b7e15d85c1..f1b89b05f8ce 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -331,7 +331,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	if (args->bio)
 		args->bio = mpage_bio_submit(REQ_OP_READ, op_flags, args->bio);
 	if (!PageUptodate(page))
-		block_read_full_page(page, args->get_block);
+		block_read_full_page(page_folio(page), args->get_block);
 	else
 		unlock_page(page);
 	goto out;
@@ -398,10 +398,10 @@ EXPORT_SYMBOL(mpage_readahead);
 /*
  * This isn't called much at all
  */
-int mpage_readpage(struct page *page, get_block_t get_block)
+int mpage_readpage(struct folio *folio, get_block_t get_block)
 {
 	struct mpage_readpage_args args = {
-		.page = page,
+		.page = &folio->page,
 		.nr_pages = 1,
 		.get_block = get_block,
 	};
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 63940a7a70be..ca2844ff0d28 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -340,9 +340,10 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 		put_page(page);
 	} else if (!once_thru &&
 		   nfs_want_read_modify_write(file, page, pos, len)) {
+		struct folio *folio = page_folio(page);
 		once_thru = 1;
-		ret = nfs_readpage(file, page);
-		put_page(page);
+		ret = nfs_readpage(file, folio);
+		put_folio(folio);
 		if (!ret)
 			goto start;
 	}
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index eb854f1f86e2..293394785e69 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -310,8 +310,9 @@ static void nfs_readpage_result(struct rpc_task *task,
  *  -	The error flag is set for this page. This happens only when a
  *	previous async read operation failed.
  */
-int nfs_readpage(struct file *file, struct page *page)
+int nfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct nfs_open_context *ctx;
 	struct inode *inode = page_file_mapping(page)->host;
 	int		error;
@@ -372,9 +373,9 @@ struct nfs_readdesc {
 	struct nfs_open_context *ctx;
 };
 
-static int
-readpage_async_filler(void *data, struct page *page)
+static int readpage_async_filler(void *data, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
 	struct nfs_page *new;
 	unsigned int len;
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 25ba299fdac2..e42efd820f2f 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -26,21 +26,21 @@
  * and straight-forward than readdir caching.
  */
 
-static int nfs_symlink_filler(void *data, struct page *page)
+static int nfs_symlink_filler(void *data, struct folio *folio)
 {
 	struct inode *inode = data;
 	int error;
 
-	error = NFS_PROTO(inode)->readlink(inode, page, 0, PAGE_SIZE);
+	error = NFS_PROTO(inode)->readlink(inode, &folio->page, 0, PAGE_SIZE);
 	if (error < 0)
 		goto error;
-	SetPageUptodate(page);
-	unlock_page(page);
+	SetFolioUptodate(folio);
+	unlock_folio(folio);
 	return 0;
 
 error:
-	SetPageError(page);
-	unlock_page(page);
+	SetFolioError(folio);
+	unlock_folio(folio);
 	return -EIO;
 }
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 745d371d6fea..118c23c75239 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -141,9 +141,9 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
  * @file - file struct of the file to be read
  * @page - the page to be read
  */
-static int nilfs_readpage(struct file *file, struct page *page)
+static int nilfs_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, nilfs_get_block);
+	return mpage_readpage(folio, nilfs_get_block);
 }
 
 static void nilfs_readahead(struct readahead_control *rac)
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index bb0a43860ad2..7ee896069be9 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -375,8 +375,9 @@ static int ntfs_read_block(struct page *page)
  *
  * Return 0 on success and -errno on error.
  */
-static int ntfs_readpage(struct file *file, struct page *page)
+static int ntfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	loff_t i_size;
 	struct inode *vi;
 	ntfs_inode *ni, *base_ni;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 3bfb4147895a..7264e844e577 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -277,15 +277,15 @@ static int ocfs2_readpage_inline(struct inode *inode, struct page *page)
 	return ret;
 }
 
-static int ocfs2_readpage(struct file *file, struct page *page)
+static int ocfs2_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
 	loff_t start = (loff_t)page->index << PAGE_SHIFT;
 	int ret, unlock = 1;
 
-	trace_ocfs2_readpage((unsigned long long)oi->ip_blkno,
-			     (page ? page->index : 0));
+	trace_ocfs2_readpage((unsigned long long)oi->ip_blkno, page->index);
 
 	ret = ocfs2_inode_lock_with_page(inode, NULL, 0, page);
 	if (ret != 0) {
@@ -301,7 +301,7 @@ static int ocfs2_readpage(struct file *file, struct page *page)
 		 * busyloop waiting for ip_alloc_sem to unlock
 		 */
 		ret = AOP_TRUNCATED_PAGE;
-		unlock_page(page);
+		unlock_folio(folio);
 		unlock = 0;
 		down_read(&oi->ip_alloc_sem);
 		up_read(&oi->ip_alloc_sem);
@@ -320,7 +320,7 @@ static int ocfs2_readpage(struct file *file, struct page *page)
 	 */
 	if (start >= i_size_read(inode)) {
 		zero_user(page, 0, PAGE_SIZE);
-		SetPageUptodate(page);
+		SetFolioUptodate(folio);
 		ret = 0;
 		goto out_alloc;
 	}
@@ -328,7 +328,7 @@ static int ocfs2_readpage(struct file *file, struct page *page)
 	if (oi->ip_dyn_features & OCFS2_INLINE_DATA_FL)
 		ret = ocfs2_readpage_inline(inode, page);
 	else
-		ret = block_read_full_page(page, ocfs2_get_block);
+		ret = block_read_full_page(folio, ocfs2_get_block);
 	unlock = 0;
 
 out_alloc:
@@ -337,7 +337,7 @@ static int ocfs2_readpage(struct file *file, struct page *page)
 	ocfs2_inode_unlock(inode, 0);
 out:
 	if (unlock)
-		unlock_page(page);
+		unlock_folio(folio);
 	return ret;
 }
 
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 3b397fa9c9e8..f4700e9a36da 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -2963,12 +2963,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle,
 		}
 
 		if (!PageUptodate(page)) {
-			ret = block_read_full_page(page, ocfs2_get_block);
+			struct folio *folio = page_folio(page);
+			ret = block_read_full_page(folio, ocfs2_get_block);
 			if (ret) {
 				mlog_errno(ret);
 				goto unlock;
 			}
-			lock_page(page);
+			lock_folio(folio);
 		}
 
 		if (page_has_buffers(page)) {
diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c
index 94cfacc9bad7..3091c3278ce8 100644
--- a/fs/ocfs2/symlink.c
+++ b/fs/ocfs2/symlink.c
@@ -54,8 +54,9 @@
 #include "buffer_head_io.h"
 
 
-static int ocfs2_fast_symlink_readpage(struct file *unused, struct page *page)
+static int ocfs2_fast_symlink_readpage(struct file *unused, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct buffer_head *bh = NULL;
 	int status = ocfs2_read_inode_block(inode, &bh);
diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index 2c7b70ee1388..db00be77e3f2 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -284,9 +284,9 @@ static int omfs_get_block(struct inode *inode, sector_t block,
 	return ret;
 }
 
-static int omfs_readpage(struct file *file, struct page *page)
+static int omfs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page, omfs_get_block);
+	return block_read_full_page(folio, omfs_get_block);
 }
 
 static void omfs_readahead(struct readahead_control *rac)
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 48f0547d4850..c277a0fbc417 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -244,8 +244,9 @@ static int orangefs_writepages(struct address_space *mapping,
 
 static int orangefs_launder_page(struct page *);
 
-static int orangefs_readpage(struct file *file, struct page *page)
+static int orangefs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct iov_iter iter;
 	struct bio_vec bv;
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 3fb7fc819b4f..7515e9b6e6cb 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -245,9 +245,9 @@ static void qnx4_kill_sb(struct super_block *sb)
 	}
 }
 
-static int qnx4_readpage(struct file *file, struct page *page)
+static int qnx4_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page,qnx4_get_block);
+	return block_read_full_page(folio, qnx4_get_block);
 }
 
 static sector_t qnx4_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index 61191f7bdf62..dc55db339ee6 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -94,9 +94,9 @@ static int qnx6_check_blockptr(__fs32 ptr)
 	return 1;
 }
 
-static int qnx6_readpage(struct file *file, struct page *page)
+static int qnx6_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, qnx6_get_block);
+	return mpage_readpage(folio, qnx6_get_block);
 }
 
 static void qnx6_readahead(struct readahead_control *rac)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index c76d563dec0e..5a37575b8dc6 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2735,9 +2735,9 @@ static int reiserfs_write_full_page(struct page *page,
 	goto done;
 }
 
-static int reiserfs_readpage(struct file *f, struct page *page)
+static int reiserfs_readpage(struct file *f, struct folio *folio)
 {
-	return block_read_full_page(page, reiserfs_get_block);
+	return block_read_full_page(folio, reiserfs_get_block);
 }
 
 static int reiserfs_writepage(struct page *page, struct writeback_control *wbc)
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 259f684d9236..fe88dc5e5b4d 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -99,8 +99,9 @@ static struct inode *romfs_iget(struct super_block *sb, unsigned long pos);
 /*
  * read a page worth of data from the image
  */
-static int romfs_readpage(struct file *file, struct page *page)
+static int romfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	loff_t offset, size;
 	unsigned long fillsize, pos;
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 7b1128398976..20c458e8c45f 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -444,8 +444,9 @@ static int squashfs_readpage_sparse(struct page *page, int expected)
 	return 0;
 }
 
-static int squashfs_readpage(struct file *file, struct page *page)
+static int squashfs_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
 	int index = page->index >> (msblk->block_log - PAGE_SHIFT);
diff --git a/fs/squashfs/symlink.c b/fs/squashfs/symlink.c
index 1430613183e6..277dcd4bf4b2 100644
--- a/fs/squashfs/symlink.c
+++ b/fs/squashfs/symlink.c
@@ -30,8 +30,9 @@
 #include "squashfs.h"
 #include "xattr.h"
 
-static int squashfs_symlink_readpage(struct file *file, struct page *page)
+static int squashfs_symlink_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct super_block *sb = inode->i_sb;
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
index bcb67b0cabe7..81b5e6947932 100644
--- a/fs/sysv/itree.c
+++ b/fs/sysv/itree.c
@@ -456,9 +456,9 @@ static int sysv_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page,get_block,wbc);
 }
 
-static int sysv_readpage(struct file *file, struct page *page)
+static int sysv_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page,get_block);
+	return block_read_full_page(folio, get_block);
 }
 
 int sysv_prepare_chunk(struct page *page, loff_t pos, unsigned len)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 2bc7780d2963..0b596e3665f5 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -890,12 +890,12 @@ static int ubifs_bulk_read(struct page *page)
 	return err;
 }
 
-static int ubifs_readpage(struct file *file, struct page *page)
+static int ubifs_readpage(struct file *file, struct folio *folio)
 {
-	if (ubifs_bulk_read(page))
+	if (ubifs_bulk_read(&folio->page))
 		return 0;
-	do_readpage(page);
-	unlock_page(page);
+	do_readpage(&folio->page);
+	unlock_folio(folio);
 	return 0;
 }
 
diff --git a/fs/udf/file.c b/fs/udf/file.c
index ad8eefad27d7..9fc85201dd20 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -57,11 +57,11 @@ static void __udf_adinicb_readpage(struct page *page)
 	kunmap_atomic(kaddr);
 }
 
-static int udf_adinicb_readpage(struct file *file, struct page *page)
+static int udf_adinicb_readpage(struct file *file, struct folio *folio)
 {
-	BUG_ON(!PageLocked(page));
-	__udf_adinicb_readpage(page);
-	unlock_page(page);
+	BUG_ON(!FolioLocked(folio));
+	__udf_adinicb_readpage(&folio->page);
+	unlock_folio(folio);
 
 	return 0;
 }
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index bb89c3e43212..3bce0406956e 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -193,9 +193,9 @@ static int udf_writepages(struct address_space *mapping,
 	return mpage_writepages(mapping, wbc, udf_get_block);
 }
 
-static int udf_readpage(struct file *file, struct page *page)
+static int udf_readpage(struct file *file, struct folio *folio)
 {
-	return mpage_readpage(page, udf_get_block);
+	return mpage_readpage(folio, udf_get_block);
 }
 
 static void udf_readahead(struct readahead_control *rac)
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index c973db239604..b621ae41e6e3 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -101,8 +101,9 @@ static int udf_pc_to_char(struct super_block *sb, unsigned char *from,
 	return 0;
 }
 
-static int udf_symlink_filler(struct file *file, struct page *page)
+static int udf_symlink_filler(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct inode *inode = page->mapping->host;
 	struct buffer_head *bh = NULL;
 	unsigned char *symlink;
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index c843ec858cf7..1b87d88761d5 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -472,9 +472,9 @@ static int ufs_writepage(struct page *page, struct writeback_control *wbc)
 	return block_write_full_page(page,ufs_getfrag_block,wbc);
 }
 
-static int ufs_readpage(struct file *file, struct page *page)
+static int ufs_readpage(struct file *file, struct folio *folio)
 {
-	return block_read_full_page(page,ufs_getfrag_block);
+	return block_read_full_page(folio, ufs_getfrag_block);
 }
 
 int ufs_prepare_chunk(struct page *page, loff_t pos, unsigned len)
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index c4ab5996d97a..aa918615cec9 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -208,8 +208,9 @@ const struct inode_operations vboxsf_reg_iops = {
 	.setattr = vboxsf_setattr
 };
 
-static int vboxsf_readpage(struct file *file, struct page *page)
+static int vboxsf_readpage(struct file *file, struct folio *folio)
 {
+	struct page *page = &folio->page;
 	struct vboxsf_handle *sf_handle = file->private_data;
 	loff_t off = page_offset(page);
 	u32 nread = PAGE_SIZE;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4304c6416fbb..cd1880b31652 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -620,9 +620,9 @@ xfs_vm_bmap(
 STATIC int
 xfs_vm_readpage(
 	struct file		*unused,
-	struct page		*page)
+	struct folio		*folio)
 {
-	return iomap_readpage(page, &xfs_read_iomap_ops);
+	return iomap_readpage(folio, &xfs_read_iomap_ops);
 }
 
 STATIC void
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index bec47f2d074b..5a6dbd515ca8 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -107,9 +107,9 @@ static const struct iomap_ops zonefs_iomap_ops = {
 	.iomap_begin	= zonefs_iomap_begin,
 };
 
-static int zonefs_readpage(struct file *unused, struct page *page)
+static int zonefs_readpage(struct file *unused, struct folio *folio)
 {
-	return iomap_readpage(page, &zonefs_iomap_ops);
+	return iomap_readpage(folio, &zonefs_iomap_ops);
 }
 
 static void zonefs_readahead(struct readahead_control *rac)
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6b47f94378c5..7a3c2caf5740 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -221,7 +221,7 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 int __block_write_full_page(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
-int block_read_full_page(struct page*, get_block_t*);
+int block_read_full_page(struct folio *, get_block_t *);
 int block_is_partially_uptodate(struct page *page, unsigned long from,
 				unsigned long count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 08f9a8a524f2..0a8250c8d6f7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -366,7 +366,7 @@ typedef int (*read_actor_t)(read_descriptor_t *, struct page *,
 
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
-	int (*readpage)(struct file *, struct page *);
+	int (*readpage)(struct file *, struct folio *);
 
 	/* Write back some dirty pages from this mapping. */
 	int (*writepages)(struct address_space *, struct writeback_control *);
@@ -3158,7 +3158,7 @@ extern void noop_invalidatepage(struct page *page, unsigned int offset,
 		unsigned int length);
 extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 extern int simple_empty(struct dentry *);
-extern int simple_readpage(struct file *file, struct page *page);
+extern int simple_readpage(struct file *file, struct folio *folio);
 extern int simple_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..a6da774a7532 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -155,7 +155,7 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
-int iomap_readpage(struct page *page, const struct iomap_ops *ops);
+int iomap_readpage(struct folio *folio, const struct iomap_ops *ops);
 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 int iomap_set_page_dirty(struct page *page);
 int iomap_is_partially_uptodate(struct page *page, unsigned long from,
diff --git a/include/linux/mpage.h b/include/linux/mpage.h
index f4f5e90a6844..b3361c9e5439 100644
--- a/include/linux/mpage.h
+++ b/include/linux/mpage.h
@@ -16,7 +16,7 @@ struct writeback_control;
 struct readahead_control;
 
 void mpage_readahead(struct readahead_control *, get_block_t get_block);
-int mpage_readpage(struct page *page, get_block_t get_block);
+int mpage_readpage(struct folio *folio, get_block_t get_block);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
 int mpage_writepage(struct page *page, get_block_t *get_block,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 681ed98e4ba8..3643609cfe13 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -567,7 +567,7 @@ nfs_have_writebacks(struct inode *inode)
 /*
  * linux/fs/nfs/read.c
  */
-extern int  nfs_readpage(struct file *, struct page *);
+extern int nfs_readpage(struct file *, struct folio *);
 extern int  nfs_readpages(struct file *, struct address_space *,
 		struct list_head *, unsigned);
 extern int  nfs_readpage_async(struct nfs_open_context *, struct inode *,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 30123ae18ee1..2283e58ebe32 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -300,7 +300,7 @@ static inline gfp_t readahead_gfp_mask(struct address_space *x)
 	return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
 }
 
-typedef int filler_t(void *, struct page *);
+typedef int filler_t(void *, struct folio *);
 
 pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan);
diff --git a/mm/filemap.c b/mm/filemap.c
index f3722ca8f7d4..3c5eb39452c3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2222,7 +2222,7 @@ generic_file_buffered_read_readpage(struct kiocb *iocb,
 	 */
 	ClearPageError(page);
 	/* Start the actual read. The read will unlock the page. */
-	error = mapping->a_ops->readpage(filp, page);
+	error = mapping->a_ops->readpage(filp, page_folio(page));
 
 	if (unlikely(error)) {
 		put_page(page);
@@ -3006,7 +3006,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	ClearPageError(page);
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
-	error = mapping->a_ops->readpage(file, page);
+	error = mapping->a_ops->readpage(file, page_folio(page));
 	if (!error) {
 		wait_on_page_locked(page);
 		if (!PageUptodate(page))
@@ -3193,10 +3193,7 @@ static struct page *wait_on_page_read(struct page *page)
 }
 
 static struct page *do_read_cache_page(struct address_space *mapping,
-				pgoff_t index,
-				int (*filler)(void *, struct page *),
-				void *data,
-				gfp_t gfp)
+		pgoff_t index, filler_t filler, void *data, gfp_t gfp)
 {
 	struct page *page;
 	int err;
@@ -3217,9 +3214,9 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 
 filler:
 		if (filler)
-			err = filler(data, page);
+			err = filler(data, page_folio(page));
 		else
-			err = mapping->a_ops->readpage(data, page);
+			err = mapping->a_ops->readpage(data, page_folio(page));
 
 		if (err < 0) {
 			put_page(page);
@@ -3313,10 +3310,8 @@ static struct page *do_read_cache_page(struct address_space *mapping,
  *
  * Return: up to date page on success, ERR_PTR() on failure.
  */
-struct page *read_cache_page(struct address_space *mapping,
-				pgoff_t index,
-				int (*filler)(void *, struct page *),
-				void *data)
+struct page *read_cache_page(struct address_space *mapping, pgoff_t index,
+		filler_t filler, void *data)
 {
 	return do_read_cache_page(mapping, index, filler, data,
 			mapping_gfp_mask(mapping));
diff --git a/mm/page_io.c b/mm/page_io.c
index 1fc0a579da58..09d509ea0c62 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -407,7 +407,7 @@ int swap_readpage(struct page *page, bool synchronous)
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
 
-		ret = mapping->a_ops->readpage(swap_file, page);
+		ret = mapping->a_ops->readpage(swap_file, page_folio(page));
 		if (!ret)
 			count_vm_event(PSWPIN);
 		goto out;
diff --git a/mm/readahead.c b/mm/readahead.c
index b2d78984e406..fb08d1d46ddb 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -87,7 +87,7 @@ static void read_cache_pages_invalidate_pages(struct address_space *mapping,
  * Returns: %0 on success, error return by @filler otherwise
  */
 int read_cache_pages(struct address_space *mapping, struct list_head *pages,
-			int (*filler)(void *, struct page *), void *data)
+			filler_t filler, void *data)
 {
 	struct page *page;
 	int ret = 0;
@@ -102,7 +102,7 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 		}
 		put_page(page);
 
-		ret = filler(data, page);
+		ret = filler(data, page_folio(page));
 		if (unlikely(ret)) {
 			read_cache_pages_invalidate_pages(mapping, pages);
 			break;
@@ -140,7 +140,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 		rac->_nr_pages = 0;
 	} else {
 		while ((folio = readahead_folio(rac)))
-			aops->readpage(rac->file, &folio->page);
+			aops->readpage(rac->file, folio);
 	}
 
 	blk_finish_plug(&plug);
-- 
2.29.2



* [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 20/25] fs: Change readpage to take a folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

We must deal with folios here; otherwise we'll get the wrong waitqueue
and fail to receive wakeups.
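
The waitqueue is not embedded in struct page; it is found by hashing
the page pointer into a small global table, essentially like this
(a condensed sketch of the scheme in mm/filemap.c, shown here only
for context):

	#include <linux/hash.h>
	#include <linux/wait.h>

	#define PAGE_WAIT_TABLE_BITS 8

	static wait_queue_head_t page_wait_table[1 << PAGE_WAIT_TABLE_BITS];

	static wait_queue_head_t *page_waitqueue(struct page *page)
	{
		return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)];
	}

A sleeper that hashes a tail page pointer and a waker that hashes the
head page pointer pick different queue heads, so the wakeup is lost.
Working in folios guarantees both sides hash the same head page.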

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/afs/write.c          |  2 +-
 include/linux/pagemap.h | 14 ++++++-----
 mm/filemap.c            | 54 ++++++++++++++++++-----------------------
 mm/page-writeback.c     |  7 +++---
 4 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index c9195fc67fd8..b58e7a69a464 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -852,7 +852,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 #endif
 
 	if (PageWriteback(vmf->page) &&
-	    wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
+	    wait_on_folio_bit_killable(page_folio(vmf->page), PG_writeback) < 0)
 		return VM_FAULT_RETRY;
 
 	if (lock_page_killable(vmf->page) < 0)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2283e58ebe32..ac4d3e2ac86c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -717,8 +717,8 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
  * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
  * and should not be used directly.
  */
-extern void wait_on_page_bit(struct page *page, int bit_nr);
-extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern void wait_on_folio_bit(struct folio *folio, int bit_nr);
+extern int wait_on_folio_bit_killable(struct folio *folio, int bit_nr);
 
 /* 
  * Wait for a page to be unlocked.
@@ -729,15 +729,17 @@ extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
  */
 static inline void wait_on_page_locked(struct page *page)
 {
-	if (PageLocked(page))
-		wait_on_page_bit(compound_head(page), PG_locked);
+	struct folio *folio = page_folio(page);
+	if (FolioLocked(folio))
+		wait_on_folio_bit(folio, PG_locked);
 }
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
-	if (!PageLocked(page))
+	struct folio *folio = page_folio(page);
+	if (!FolioLocked(folio))
 		return 0;
-	return wait_on_page_bit_killable(compound_head(page), PG_locked);
+	return wait_on_folio_bit_killable(folio, PG_locked);
 }
 
 extern void put_and_wait_on_page_locked(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3c5eb39452c3..a5925450ee13 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1075,7 +1075,7 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	 *
 	 * So update the flags atomically, and wake up the waiter
 	 * afterwards to avoid any races. This store-release pairs
-	 * with the load-acquire in wait_on_page_bit_common().
+	 * with the load-acquire in wait_on_folio_bit_common().
 	 */
 	smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
 	wake_up_state(wait->private, mode);
@@ -1156,7 +1156,7 @@ static void wake_up_folio(struct folio *folio, int bit)
 }
 
 /*
- * A choice of three behaviors for wait_on_page_bit_common():
+ * A choice of three behaviors for wait_on_folio_bit_common():
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
@@ -1190,9 +1190,10 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
 /* How many times do we accept lock stealing from under a waiter? */
 int sysctl_page_lock_unfairness = 5;
 
-static inline int wait_on_page_bit_common(wait_queue_head_t *q,
-	struct page *page, int bit_nr, int state, enum behavior behavior)
+static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
+		int state, enum behavior behavior)
 {
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
 	int unfairness = sysctl_page_lock_unfairness;
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1201,8 +1202,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	unsigned long pflags;
 
 	if (bit_nr == PG_locked &&
-	    !PageUptodate(page) && PageWorkingset(page)) {
-		if (!PageSwapBacked(page)) {
+	    !FolioUptodate(folio) && FolioWorkingset(folio)) {
+		if (!FolioSwapBacked(folio)) {
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
@@ -1212,7 +1213,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	init_wait(wait);
 	wait->func = wake_page_function;
-	wait_page.page = page;
+	wait_page.page = &folio->page;
 	wait_page.bit_nr = bit_nr;
 
 repeat:
@@ -1227,7 +1228,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * Do one last check whether we can get the
 	 * page bit synchronously.
 	 *
-	 * Do the SetPageWaiters() marking before that
+	 * Do the SetFolioWaiters() marking before that
 	 * to let any waker we _just_ missed know they
 	 * need to wake us up (otherwise they'll never
 	 * even go to the slow case that looks at the
@@ -1238,8 +1239,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * lock to avoid races.
 	 */
 	spin_lock_irq(&q->lock);
-	SetPageWaiters(page);
-	if (!trylock_page_bit_common(page, bit_nr, wait))
+	SetFolioWaiters(folio);
+	if (!trylock_page_bit_common(&folio->page, bit_nr, wait))
 		__add_wait_queue_entry_tail(q, wait);
 	spin_unlock_irq(&q->lock);
 
@@ -1249,10 +1250,10 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * see whether the page bit testing has already
 	 * been done by the wake function.
 	 *
-	 * We can drop our reference to the page.
+	 * We can drop our reference to the folio.
 	 */
 	if (behavior == DROP)
-		put_page(page);
+		put_folio(folio);
 
 	/*
 	 * Note that until the "finish_wait()", or until
@@ -1289,7 +1290,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 		 *
 		 * And if that fails, we'll have to retry this all.
 		 */
-		if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio))))
 			goto repeat;
 
 		wait->flags |= WQ_FLAG_DONE;
@@ -1329,19 +1330,17 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
-void wait_on_page_bit(struct page *page, int bit_nr)
+void wait_on_folio_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+	wait_on_folio_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit);
+EXPORT_SYMBOL(wait_on_folio_bit);
 
-int wait_on_page_bit_killable(struct page *page, int bit_nr)
+int wait_on_folio_bit_killable(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, bit_nr, TASK_KILLABLE, SHARED);
+	return wait_on_folio_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit_killable);
+EXPORT_SYMBOL(wait_on_folio_bit_killable);
 
 static int __wait_on_page_locked_async(struct page *page,
 				       struct wait_page_queue *wait, bool set)
@@ -1393,11 +1392,8 @@ static int wait_on_page_locked_async(struct page *page,
  */
 void put_and_wait_on_page_locked(struct page *page)
 {
-	wait_queue_head_t *q;
-
-	page = compound_head(page);
-	q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE, DROP);
+	wait_on_folio_bit_common(page_folio(page), PG_locked,
+				TASK_UNINTERRUPTIBLE, DROP);
 }
 
 /**
@@ -1530,16 +1526,14 @@ EXPORT_SYMBOL_GPL(page_endio);
  */
 void __lock_folio(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
+	wait_on_folio_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
 EXPORT_SYMBOL(__lock_folio);
 
 int __lock_folio_killable(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
+	return wait_on_folio_bit_common(folio, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
 EXPORT_SYMBOL_GPL(__lock_folio_killable);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 586042472ac9..500ed9afcec2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2826,9 +2826,10 @@ EXPORT_SYMBOL(__test_set_page_writeback);
  */
 void wait_on_page_writeback(struct page *page)
 {
-	if (PageWriteback(page)) {
-		trace_wait_on_page_writeback(page, page_mapping(page));
-		wait_on_page_bit(page, PG_writeback);
+	struct folio *folio = page_folio(page);
+	if (FolioWriteback(folio)) {
+		trace_wait_on_page_writeback(page, folio_mapping(folio));
+		wait_on_folio_bit(folio, PG_writeback);
 	}
 }
 EXPORT_SYMBOL_GPL(wait_on_page_writeback);
-- 
2.29.2



* [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Turn wait_on_page_locked() and wait_on_page_locked_killable() into
wrappers.
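
For illustration, a hypothetical pair of callers (the function names
are invented here, not part of this patch): code that still holds a
struct page pays one page_folio() translation inside the wrapper,
while converted code calls the folio version directly and skips the
lookup entirely.

	/* Hypothetical callers, for illustration only. */
	static void legacy_wait(struct page *page)
	{
		wait_on_page_locked(page);	/* wrapper: one page_folio() */
	}

	static void converted_wait(struct folio *folio)
	{
		wait_on_folio_locked(folio);	/* no translation needed */
	}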

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ac4d3e2ac86c..22f9774d8a83 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -727,21 +727,29 @@ extern int wait_on_folio_bit_killable(struct folio *folio, int bit_nr);
  * ie with increased "page->count" so that the page won't
  * go away during the wait..
  */
-static inline void wait_on_page_locked(struct page *page)
+static inline void wait_on_folio_locked(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	if (FolioLocked(folio))
 		wait_on_folio_bit(folio, PG_locked);
 }
 
-static inline int wait_on_page_locked_killable(struct page *page)
+static inline int wait_on_folio_locked_killable(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	if (!FolioLocked(folio))
 		return 0;
 	return wait_on_folio_bit_killable(folio, PG_locked);
 }
 
+static inline void wait_on_page_locked(struct page *page)
+{
+	wait_on_folio_locked(page_folio(page));
+}
+
+static inline int wait_on_page_locked_killable(struct page *page)
+{
+	return wait_on_folio_locked_killable(page_folio(page));
+}
+
 extern void put_and_wait_on_page_locked(struct page *page);
 
 void wait_on_page_writeback(struct page *page);
-- 
2.29.2



* [PATCH 23/25] mm: Add flush_dcache_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 20:59   ` kernel test robot
  2020-12-16 18:23 ` [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

This is a default implementation which calls flush_dcache_page() on
each page in the folio.  If architectures can do better, they should
implement their own version of it.
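
As a sketch of what such an override might look like, assuming the
folio's pages are in the direct map; arch_flush_dcache_range() is a
made-up stand-in for whatever ranged flush primitive the architecture
actually provides:

	/* Hypothetical arch/<arch>/include/asm/cacheflush.h override. */
	#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
	static inline void flush_dcache_folio(struct folio *folio)
	{
		/* One ranged flush instead of folio_nr_pages() page flushes. */
		arch_flush_dcache_range(page_address(&folio->page),
					folio_nr_pages(folio) * PAGE_SIZE);
	}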

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/cachetlb.rst |  6 ++++++
 include/asm-generic/cacheflush.h    | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index a1582cc79f0f..484cf31fcded 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -325,6 +325,12 @@ maps this page at its virtual address.
 			dirty.  Again, see sparc64 for examples of how
 			to deal with this.
 
+  ``void flush_dcache_folio(struct folio *folio)``
+	This function is called under the same circumstances as
+	flush_dcache_page().  It allows the architecture to
+	optimise for flushing the entire folio of pages instead
+	of flushing one page at a time.
+
   ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
   unsigned long user_vaddr, void *dst, void *src, int len)``
   ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4a674db4e1fa..5537ea24333d 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -49,9 +49,22 @@ static inline void flush_cache_page(struct vm_area_struct *vma,
 static inline void flush_dcache_page(struct page *page)
 {
 }
+
+static inline void flush_dcache_folio(struct folio *folio) { }
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
+#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
 #endif
 
+#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	unsigned int n = folio_nr_pages(folio);
+
+	do {
+		flush_dcache_page(&folio->page[--n]);
+	} while (n);
+}
+#endif
 
 #ifndef flush_dcache_mmap_lock
 static inline void flush_dcache_mmap_lock(struct address_space *mapping)
-- 
2.29.2



* [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-16 18:23 ` [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios Matthew Wilcox (Oracle)
  2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Reimplement read_cache_page() as a wrapper around read_cache_folio().
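
The next patch converts a real caller (vfs_dedupe_get_folio()); as a
standalone sketch of the usage pattern, here is a hypothetical helper
(read_one_byte() is an invented name, using only functions from this
series plus the existing kmap helpers):

	static int read_one_byte(struct address_space *mapping, loff_t pos,
			char *out)
	{
		struct folio *folio = read_mapping_folio(mapping,
						pos >> PAGE_SHIFT, NULL);
		char *kaddr;

		if (IS_ERR(folio))
			return PTR_ERR(folio);
		kaddr = kmap_atomic(folio_page(folio, pos >> PAGE_SHIFT));
		*out = kaddr[offset_in_page(pos)];
		kunmap_atomic(kaddr);
		put_folio(folio);
		return 0;
	}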

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 17 ++++++++-
 mm/filemap.c            | 81 +++++++++++++++++++----------------------
 2 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 22f9774d8a83..ae20b6fa46f0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -518,19 +518,32 @@ static inline struct page *grab_cache_page(struct address_space *mapping,
 	return find_or_create_page(mapping, index, mapping_gfp_mask(mapping));
 }
 
-extern struct page * read_cache_page(struct address_space *mapping,
-				pgoff_t index, filler_t *filler, void *data);
+struct folio *read_cache_folio(struct address_space *mapping, pgoff_t index,
+		filler_t *filler, void *data);
 extern struct page * read_cache_page_gfp(struct address_space *mapping,
 				pgoff_t index, gfp_t gfp_mask);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
+static inline struct page *read_cache_page(struct address_space *mapping,
+				pgoff_t index, filler_t *filler, void *data)
+{
+	struct folio *folio = read_cache_folio(mapping, index, filler, data);
+
+	if (IS_ERR(folio))
+		return &folio->page;
+	return folio_page(folio, index);
+}
+
 static inline struct page *read_mapping_page(struct address_space *mapping,
 				pgoff_t index, void *data)
 {
 	return read_cache_page(mapping, index, NULL, data);
 }
 
+static inline struct folio *read_mapping_folio(struct address_space *mapping,
+				pgoff_t index, void *data)
+{
+	return read_cache_folio(mapping, index, NULL, data);
+}
+
 /*
  * Get index of the page with in radix-tree
  * (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE)
diff --git a/mm/filemap.c b/mm/filemap.c
index a5925450ee13..0131208e45f7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3174,32 +3174,20 @@ EXPORT_SYMBOL(filemap_page_mkwrite);
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static struct page *wait_on_page_read(struct page *page)
-{
-	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
-		if (!PageUptodate(page)) {
-			put_page(page);
-			page = ERR_PTR(-EIO);
-		}
-	}
-	return page;
-}
-
-static struct page *do_read_cache_page(struct address_space *mapping,
+static struct folio *do_read_cache_folio(struct address_space *mapping,
 		pgoff_t index, filler_t filler, void *data, gfp_t gfp)
 {
-	struct page *page;
+	struct folio *folio;
 	int err;
 repeat:
-	page = find_get_page(mapping, index);
-	if (!page) {
-		page = &__page_cache_alloc(gfp, 0)->page;
-		if (!page)
+	folio = find_get_folio(mapping, index);
+	if (!folio) {
+		folio = __page_cache_alloc(gfp, 0);
+		if (!folio)
 			return ERR_PTR(-ENOMEM);
-		err = add_to_page_cache_lru(page, mapping, index, gfp);
+		err = folio_add_to_page_cache(folio, mapping, index, gfp);
 		if (unlikely(err)) {
-			put_page(page);
+			put_folio(folio);
 			if (err == -EEXIST)
 				goto repeat;
 			/* Presumably ENOMEM for xarray node */
@@ -3208,21 +3196,24 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 
 filler:
 		if (filler)
-			err = filler(data, page_folio(page));
+			err = filler(data, folio);
 		else
-			err = mapping->a_ops->readpage(data, page_folio(page));
+			err = mapping->a_ops->readpage(data, folio);
 
 		if (err < 0) {
-			put_page(page);
+			put_folio(folio);
 			return ERR_PTR(err);
 		}
 
-		page = wait_on_page_read(page);
-		if (IS_ERR(page))
-			return page;
+		wait_on_folio_locked(folio);
+		if (!FolioUptodate(folio)) {
+			put_folio(folio);
+			return ERR_PTR(-EIO);
+		}
+
 		goto out;
 	}
-	if (PageUptodate(page))
+	if (FolioUptodate(folio))
 		goto out;
 
 	/*
@@ -3256,23 +3247,23 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 	 * avoid spurious serialisations and wakeups when multiple processes
 	 * wait on the same page for IO to complete.
 	 */
-	wait_on_page_locked(page);
-	if (PageUptodate(page))
+	wait_on_folio_locked(folio);
+	if (FolioUptodate(folio))
 		goto out;
 
 	/* Distinguish between all the cases under the safety of the lock */
-	lock_page(page);
+	lock_folio(folio);
 
 	/* Case c or d, restart the operation */
-	if (!page->mapping) {
-		unlock_page(page);
-		put_page(page);
+	if (!folio->page.mapping) {
+		unlock_folio(folio);
+		put_folio(folio);
 		goto repeat;
 	}
 
 	/* Someone else locked and filled the page in a very small window */
-	if (PageUptodate(page)) {
-		unlock_page(page);
+	if (FolioUptodate(folio)) {
+		unlock_folio(folio);
 		goto out;
 	}
 
@@ -3282,16 +3273,16 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 	 * Clear page error before actual read, PG_error will be
 	 * set again if read page fails.
 	 */
-	ClearPageError(page);
+	ClearFolioError(folio);
 	goto filler;
 
 out:
-	mark_page_accessed(page);
-	return page;
+	mark_folio_accessed(folio);
+	return folio;
 }
 
 /**
- * read_cache_page - read into page cache, fill it if needed
+ * read_cache_folio - read into page cache, fill it if needed
  * @mapping:	the page's address_space
  * @index:	the page index
  * @filler:	function to perform the read
@@ -3304,13 +3295,13 @@ static struct page *do_read_cache_page(struct address_space *mapping,
  *
  * Return: up to date page on success, ERR_PTR() on failure.
  */
-struct page *read_cache_page(struct address_space *mapping, pgoff_t index,
+struct folio *read_cache_folio(struct address_space *mapping, pgoff_t index,
 		filler_t filler, void *data)
 {
-	return do_read_cache_page(mapping, index, filler, data,
+	return do_read_cache_folio(mapping, index, filler, data,
 			mapping_gfp_mask(mapping));
 }
-EXPORT_SYMBOL(read_cache_page);
+EXPORT_SYMBOL(read_cache_folio);
 
 /**
  * read_cache_page_gfp - read into page cache, using specified page allocation flags.
@@ -3329,7 +3320,11 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
 				pgoff_t index,
 				gfp_t gfp)
 {
-	return do_read_cache_page(mapping, index, NULL, NULL, gfp);
+	struct folio *folio = do_read_cache_folio(mapping, index, NULL, NULL,
+									gfp);
+	if (IS_ERR(folio))
+		return &folio->page;
+	return folio_page(folio, index);
 }
 EXPORT_SYMBOL(read_cache_page_gfp);
 
-- 
2.29.2



* [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (23 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio Matthew Wilcox (Oracle)
@ 2020-12-16 18:23 ` Matthew Wilcox (Oracle)
  2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
  25 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-12-16 18:23 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Matthew Wilcox (Oracle), linux-kernel

Simplify the implementation somewhat by working in pgoff_t instead
of loff_t.  We still only operate on a single page of data at a time
due to using kmap().  A more complex implementation would work on
an entire folio at a time and (if the pages are highmem) map and
unmap, but it's not clear that such a complex implementation would
be worthwhile.
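
For reference, the index/offset decomposition the new loop relies on
(plain arithmetic; variable names here are illustrative only):

	pgoff_t index = pos >> PAGE_SHIFT;	/* which page-cache page */
	size_t off = offset_in_page(pos);	/* byte within that page */
	loff_t cmp = min_t(loff_t, PAGE_SIZE - off, remaining);

	/* e.g. pos = 5000 with 4096-byte pages: index 1, off 904 */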

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/remap_range.c | 109 ++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 57 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 77dba3a49e65..0ee52c2da2cf 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -158,41 +158,41 @@ static int generic_remap_check_len(struct inode *inode_in,
 }
 
 /* Read a page's worth of file data into the page cache. */
-static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
+static struct folio *vfs_dedupe_get_folio(struct inode *inode, pgoff_t index)
 {
-	struct page *page;
+	struct folio *folio;
 
-	page = read_mapping_page(inode->i_mapping, offset >> PAGE_SHIFT, NULL);
-	if (IS_ERR(page))
-		return page;
-	if (!PageUptodate(page)) {
-		put_page(page);
+	folio = read_mapping_folio(inode->i_mapping, index, NULL);
+	if (IS_ERR(folio))
+		return folio;
+	if (!FolioUptodate(folio)) {
+		put_folio(folio);
 		return ERR_PTR(-EIO);
 	}
-	return page;
+	return folio;
 }
 
 /*
- * Lock two pages, ensuring that we lock in offset order if the pages are from
- * the same file.
+ * Lock two folios, ensuring that we lock in offset order if the folios
+ * are from the same file.
  */
-static void vfs_lock_two_pages(struct page *page1, struct page *page2)
+static void vfs_lock_two_folios(struct folio *folio1, struct folio *folio2)
 {
 	/* Always lock in order of increasing index. */
-	if (page1->index > page2->index)
-		swap(page1, page2);
+	if (folio_index(folio1) > folio_index(folio2))
+		swap(folio1, folio2);
 
-	lock_page(page1);
-	if (page1 != page2)
-		lock_page(page2);
+	lock_folio(folio1);
+	if (folio1 != folio2)
+		lock_folio(folio2);
 }
 
-/* Unlock two pages, being careful not to unlock the same page twice. */
-static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
+/* Unlock two folios, being careful not to unlock the same folio twice. */
+static void vfs_unlock_two_folios(struct folio *folio1, struct folio *folio2)
 {
-	unlock_page(page1);
-	if (page1 != page2)
-		unlock_page(page2);
+	unlock_folio(folio1);
+	if (folio1 != folio2)
+		unlock_folio(folio2);
 }
 
 /*
@@ -203,68 +203,63 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 					 struct inode *dest, loff_t destoff,
 					 loff_t len, bool *is_same)
 {
-	loff_t src_poff;
-	loff_t dest_poff;
-	void *src_addr;
-	void *dest_addr;
-	struct page *src_page;
-	struct page *dest_page;
-	loff_t cmp_len;
-	bool same;
-	int error;
-
-	error = -EINVAL;
-	same = true;
+	bool same = true;
+	int error = -EINVAL;
+
 	while (len) {
-		src_poff = srcoff & (PAGE_SIZE - 1);
-		dest_poff = destoff & (PAGE_SIZE - 1);
-		cmp_len = min(PAGE_SIZE - src_poff,
-			      PAGE_SIZE - dest_poff);
+		struct folio *src_folio, *dst_folio;
+		void *src_addr, *dest_addr;
+		pgoff_t src_index = srcoff / PAGE_SIZE;
+		pgoff_t dst_index = destoff / PAGE_SIZE;
+		loff_t cmp_len = min(PAGE_SIZE - offset_in_page(srcoff),
+				     PAGE_SIZE - offset_in_page(destoff));
+
 		cmp_len = min(cmp_len, len);
 		if (cmp_len <= 0)
 			goto out_error;
 
-		src_page = vfs_dedupe_get_page(src, srcoff);
-		if (IS_ERR(src_page)) {
-			error = PTR_ERR(src_page);
+		src_folio = vfs_dedupe_get_folio(src, src_index);
+		if (IS_ERR(src_folio)) {
+			error = PTR_ERR(src_folio);
 			goto out_error;
 		}
-		dest_page = vfs_dedupe_get_page(dest, destoff);
-		if (IS_ERR(dest_page)) {
-			error = PTR_ERR(dest_page);
-			put_page(src_page);
+		dst_folio = vfs_dedupe_get_folio(dest, dst_index);
+		if (IS_ERR(dst_folio)) {
+			error = PTR_ERR(dst_folio);
+			put_folio(src_folio);
 			goto out_error;
 		}
 
-		vfs_lock_two_pages(src_page, dest_page);
+		vfs_lock_two_folios(src_folio, dst_folio);
 
 		/*
-		 * Now that we've locked both pages, make sure they're still
+		 * Now that we've locked both folios, make sure they're still
 		 * mapped to the file data we're interested in.  If not,
 		 * someone is invalidating pages on us and we lose.
 		 */
-		if (!PageUptodate(src_page) || !PageUptodate(dest_page) ||
-		    src_page->mapping != src->i_mapping ||
-		    dest_page->mapping != dest->i_mapping) {
+		if (!FolioUptodate(src_folio) || !FolioUptodate(dst_folio) ||
+		    folio_mapping(src_folio) != src->i_mapping ||
+		    folio_mapping(dst_folio) != dest->i_mapping) {
 			same = false;
 			goto unlock;
 		}
 
-		src_addr = kmap_atomic(src_page);
-		dest_addr = kmap_atomic(dest_page);
+		src_addr = kmap_atomic(folio_page(src_folio, src_index));
+		dest_addr = kmap_atomic(folio_page(dst_folio, dst_index));
 
-		flush_dcache_page(src_page);
-		flush_dcache_page(dest_page);
+		flush_dcache_folio(src_folio);
+		flush_dcache_folio(dst_folio);
 
-		if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
+		if (memcmp(src_addr + offset_in_page(srcoff),
+			   dest_addr + offset_in_page(destoff), cmp_len))
 			same = false;
 
 		kunmap_atomic(dest_addr);
 		kunmap_atomic(src_addr);
 unlock:
-		vfs_unlock_two_pages(src_page, dest_page);
-		put_page(dest_page);
-		put_page(src_page);
+		vfs_unlock_two_folios(src_folio, dst_folio);
+		put_folio(dst_folio);
+		put_folio(src_folio);
 
 		if (!same)
 			break;
-- 
2.29.2



* Re: [PATCH 23/25] mm: Add flush_dcache_folio
  2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
@ 2020-12-16 20:59   ` kernel test robot
  2020-12-16 22:01     ` Matthew Wilcox
  0 siblings, 1 reply; 35+ messages in thread
From: kernel test robot @ 2020-12-16 20:59 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm
  Cc: kbuild-all, Matthew Wilcox (Oracle), linux-kernel


Hi "Matthew,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20201215]
[cannot apply to kdave/for-next ceph-client/for-linus linus/master hnaz-linux-mm/master v5.10 v5.10-rc7 v5.10-rc6 v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Matthew-Wilcox-Oracle/Page-folios/20201217-023021
base:    9317f948b0b188b8d2fded75957e6d42c460df1b
config: powerpc64-randconfig-p002-20201216 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/7611dc869e8fa7240e3c841bffe6a88e46a802c6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Matthew-Wilcox-Oracle/Page-folios/20201217-023021
        git checkout 7611dc869e8fa7240e3c841bffe6a88e46a802c6
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/cacheflush.h:111,
                    from include/linux/highmem.h:12,
                    from include/linux/pagemap.h:11,
                    from include/linux/blkdev.h:14,
                    from include/linux/blk-cgroup.h:23,
                    from include/linux/writeback.h:14,
                    from include/linux/memcontrol.h:22,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/powerpc/kernel/asm-offsets.c:23:
   include/asm-generic/cacheflush.h: In function 'flush_dcache_folio':
>> include/asm-generic/cacheflush.h:64:33: error: subscripted value is neither array nor pointer nor vector
      64 |   flush_dcache_page(&folio->page[--n]);
         |                                 ^
--
   In file included from arch/powerpc/include/asm/cacheflush.h:111,
                    from include/linux/highmem.h:12,
                    from include/linux/pagemap.h:11,
                    from include/linux/blkdev.h:14,
                    from include/linux/blk-cgroup.h:23,
                    from include/linux/writeback.h:14,
                    from include/linux/memcontrol.h:22,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/powerpc/kernel/asm-offsets.c:23:
   include/asm-generic/cacheflush.h: In function 'flush_dcache_folio':
>> include/asm-generic/cacheflush.h:64:33: error: subscripted value is neither array nor pointer nor vector
      64 |   flush_dcache_page(&folio->page[--n]);
         |                                 ^
   make[2]: *** [scripts/Makefile.build:117: arch/powerpc/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1206: prepare0] Error 2
   make[1]: Target 'modules_prepare' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'modules_prepare' not remade because of errors.


vim +64 include/asm-generic/cacheflush.h

    57	
    58	#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
    59	static inline void flush_dcache_folio(struct folio *folio)
    60	{
    61		unsigned int n = folio_nr_pages(folio);
    62	
    63		do {
  > 64			flush_dcache_page(&folio->page[--n]);
    65		} while (n);
    66	}
    67	#endif
    68	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37507 bytes --]


* Re: [PATCH 23/25] mm: Add flush_dcache_folio
  2020-12-16 20:59   ` kernel test robot
@ 2020-12-16 22:01     ` Matthew Wilcox
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox @ 2020-12-16 22:01 UTC (permalink / raw)
  To: kernel test robot; +Cc: linux-fsdevel, linux-mm, kbuild-all, linux-kernel

On Thu, Dec 17, 2020 at 04:59:21AM +0800, kernel test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    In file included from arch/powerpc/include/asm/cacheflush.h:111,
>                     from include/linux/highmem.h:12,
>                     from include/linux/pagemap.h:11,
>                     from include/linux/blkdev.h:14,
>                     from include/linux/blk-cgroup.h:23,
>                     from include/linux/writeback.h:14,
>                     from include/linux/memcontrol.h:22,
>                     from include/linux/swap.h:9,
>                     from include/linux/suspend.h:5,
>                     from arch/powerpc/kernel/asm-offsets.c:23:
>    include/asm-generic/cacheflush.h: In function 'flush_dcache_folio':
> >> include/asm-generic/cacheflush.h:64:33: error: subscripted value is neither array nor pointer nor vector
>       64 |   flush_dcache_page(&folio->page[--n]);

Thanks.  Apparently I need to compile on more than just x86 ;-)

This compiles on aargh64:

@@ -61,7 +61,8 @@ static inline void flush_dcache_folio(struct folio *folio)
        unsigned int n = folio_nr_pages(folio);
 
        do {
-               flush_dcache_page(&folio->page[--n]);
+               n--;
+               flush_dcache_page(&folio->page + n);
        } while (n);
 }
 #endif

I'll fold it into my git tree.


* Re: [PATCH 18/25] btrfs: Use readahead_batch_length
  2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
@ 2020-12-17  9:15   ` John Hubbard
  2020-12-17 12:12     ` Matthew Wilcox
  0 siblings, 1 reply; 35+ messages in thread
From: John Hubbard @ 2020-12-17  9:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm; +Cc: linux-kernel

On 12/16/20 10:23 AM, Matthew Wilcox (Oracle) wrote:
> Implement readahead_batch_length() to determine the number of bytes in
> the current batch of readahead pages and use it in btrfs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   fs/btrfs/extent_io.c    | 6 ++----
>   include/linux/pagemap.h | 9 +++++++++
>   2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6e3b72e63e42..42936a83a91b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4436,10 +4436,8 @@ void extent_readahead(struct readahead_control *rac)
>   	int nr;
>   
>   	while ((nr = readahead_page_batch(rac, pagepool))) {
> -		u64 contig_start = page_offset(pagepool[0]);
> -		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
> -
> -		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> +		u64 contig_start = readahead_pos(rac);
> +		u64 contig_end = contig_start + readahead_batch_length(rac);

Something in this tiny change is breaking btrfs: it hangs my Fedora 33 test
system (which changed over to btrfs) on boot. I haven't quite figured out
what's really wrong, but git bisect lands here, *and* turning the whole
extent_readahead() function into a no-op (on top of the whole series)
allows everything to work once again.

Sorry for not actually solving the root cause, but I figured you'd be able
to jump straight to the answer with the above information, so I'm sending
it out early.


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 18/25] btrfs: Use readahead_batch_length
  2020-12-17  9:15   ` John Hubbard
@ 2020-12-17 12:12     ` Matthew Wilcox
  2020-12-17 13:42       ` Matthew Wilcox
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2020-12-17 12:12 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Thu, Dec 17, 2020 at 01:15:10AM -0800, John Hubbard wrote:
> On 12/16/20 10:23 AM, Matthew Wilcox (Oracle) wrote:
> > Implement readahead_batch_length() to determine the number of bytes in
> > the current batch of readahead pages and use it in btrfs.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >   fs/btrfs/extent_io.c    | 6 ++----
> >   include/linux/pagemap.h | 9 +++++++++
> >   2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 6e3b72e63e42..42936a83a91b 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4436,10 +4436,8 @@ void extent_readahead(struct readahead_control *rac)
> >   	int nr;
> >   	while ((nr = readahead_page_batch(rac, pagepool))) {
> > -		u64 contig_start = page_offset(pagepool[0]);
> > -		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
> > -
> > -		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> > +		u64 contig_start = readahead_pos(rac);
> > +		u64 contig_end = contig_start + readahead_batch_length(rac);
> 
> Something in this tiny change is breaking btrfs: it hangs my Fedora 33 test
> system (which changed over to btrfs) on boot. I haven't quite figured out
> what's really wrong, but git bisect lands here, *and* turning the whole
> extent_readahead() function into a no-op (on top of the whole series)
> allows everything to work once again.
> 
> Sorry for not actually solving the root cause, but I figured you'd be able
> to jump straight to the answer, with the above information, so I'm sending
> it out early.

ehh ... probably an off-by-one.  Does subtracting 1 from contig_end fix it?
I'll spool up a test VM shortly and try it out.


* Re: [PATCH 00/25] Page folios
  2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2020-12-16 18:23 ` [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios Matthew Wilcox (Oracle)
@ 2020-12-17 12:47 ` David Hildenbrand
  2020-12-17 13:55   ` Matthew Wilcox
  25 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2020-12-17 12:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm; +Cc: linux-kernel

On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
> [...]
> 
> This is going to be a ton of work, and massively disruptive.  It'll touch
> every filesystem, and a good few device drivers!  But I think it's worth
> it.  Not every routine benefits as much as end_page_writeback(), but it
> makes everything a little better.  At 29 bytes per call to lock_page(),
> unlock_page(), put_page() and get_page(), that's on the order of 60kB of
> text for allyesconfig.  More when you add on all the PageFoo() calls.
> With the small amount of work I've done here, mm/filemap.o shrinks its
> text segment by over a kilobyte from 33687 to 32318 bytes (and also 192
> bytes of data).

Just wondering, as the primary motivation here is "minimizing CPU work",
did you run any benchmarks that revealed a visible performance improvement?

Otherwise, we're left with a concept that's hard to grasp at first (folio -
what?!) and "a ton of work, and massively disruptive", saving some kB of
code - which does not sound too appealing to me.

(I like the idea of abstracting which pages are actually worth looking
at directly instead of going via a tail page - tail pages act somewhat
like a proxy for the head page when accessing flags)

-- 
Thanks,

David / dhildenb



* Re: [PATCH 18/25] btrfs: Use readahead_batch_length
  2020-12-17 12:12     ` Matthew Wilcox
@ 2020-12-17 13:42       ` Matthew Wilcox
  2020-12-17 19:36         ` John Hubbard
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2020-12-17 13:42 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Thu, Dec 17, 2020 at 12:12:46PM +0000, Matthew Wilcox wrote:
> ehh ... probably an off-by-one.  Does subtracting 1 from contig_end fix it?
> I'll spool up a test VM shortly and try it out.

Yes, this fixed it:

-               u64 contig_end = contig_start + readahead_batch_length(rac);
+               u64 contig_end = contig_start + readahead_batch_length(rac) - 1;
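
The reason for the -1: readahead_batch_length() returns the length of
the batch in bytes, so contig_start plus that length points one byte
past the end of the range, while the btrfs extent code treats
contig_end as the last byte of the range, inclusive.  A minimal sketch
of the convention (assuming btrfs's usual inclusive-end byte ranges;
readahead_pos() is the byte offset the batch starts at):

        u64 contig_start = readahead_pos(rac);   /* first byte of batch */
        u64 len = readahead_batch_length(rac);   /* batch length in bytes */
        u64 contig_end = contig_start + len - 1; /* last byte, inclusive */

        /*
         * Example: an 8-page batch at offset 0 covers bytes 0-32767.
         * Without the -1, contig_end would name byte 32768, which is
         * the first byte of the next range.
         */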



* Re: [PATCH 00/25] Page folios
  2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
@ 2020-12-17 13:55   ` Matthew Wilcox
  2020-12-17 14:35     ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2020-12-17 13:55 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Thu, Dec 17, 2020 at 01:47:57PM +0100, David Hildenbrand wrote:
> On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
> > [...]
> 
> Just wondering, as the primary motivation here is "minimizing CPU work",
> did you run any benchmarks that revealed a visible performance improvement?
> 
> Otherwise, we're left with a concept that's hard to grasp at first (folio -
> what?!) and "a ton of work, and massively disruptive", saving some kB of
> code - which does not sound too appealing to me.
> 
> (I like the idea of abstracting which pages are actually worth looking
> at directly instead of going via a tail page - tail pages act somewhat
> like a proxy for the head page when accessing flags)

My primary motivation here isn't minimising CPU work at all.  It's trying
to document which interfaces are expected to operate on an entire
compound page and which are expected to operate on a PAGE_SIZE page.
Today, we have a horrible mishmash of

 - This is a head page, I shall operate on 2MB of data
 - This is a tail page, I shall operate on 2MB of data
 - This is not a head page, I shall operate on 4kB of data
 - This is a head page, I shall operate on 4kB of data
 - This is a head|tail page, I shall operate on the size of the compound page.

You might say "Well, why not lead with that?", but I don't know which
advantages people are going to find most compelling.  Even if someone
doesn't believe in the benefits of using folios in the page cache,
looking at the assembler output is, I think, persuasive on its own.
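
To put the documentation point in code: here's a minimal sketch of how
the wrapper type carries the contract (struct folio and page_folio()
are as described above; the page_folio() body shown is one plausible
implementation, not necessarily the one in this series):

struct folio {
        struct page page;
};

/*
 * Resolve compound_head() exactly once; the result is guaranteed not
 * to be a tail page, so later folio operations can skip that lookup.
 */
static inline struct folio *page_folio(struct page *page)
{
        return (struct folio *)compound_head(page);
}

/*
 * Prototypes then carry the expectation: a function taking a
 * struct folio * operates on the whole compound page, while one
 * taking a struct page * operates on PAGE_SIZE bytes.
 */
void end_folio_writeback(struct folio *folio);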


* Re: [PATCH 00/25] Page folios
  2020-12-17 13:55   ` Matthew Wilcox
@ 2020-12-17 14:35     ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2020-12-17 14:35 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 17.12.20 14:55, Matthew Wilcox wrote:
> On Thu, Dec 17, 2020 at 01:47:57PM +0100, David Hildenbrand wrote:
>> On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
>>> [...]
>>
>> Just wondering, as the primary motivation here is "minimizing CPU work",
>> did you run any benchmarks that revealed a visible performance improvement?
>> [...]
> 
> My primary motivation here isn't minimising CPU work at all.  It's trying

Ah, okay, reading about the disassembly gave me that impression.

> to document which interfaces are expected to operate on an entire
> compound page and which are expected to operate on a PAGE_SIZE page.
> Today, we have a horrible mishmash of
> 
>  - This is a head page, I shall operate on 2MB of data
>  - This is a tail page, I shall operate on 2MB of data
>  - This is not a head page, I shall operate on 4kB of data
>  - This is a head page, I shall operate on 4kB of data
>  - This is a head|tail page, I shall operate on the size of the compound page.
> 
> You might say "Well, why not lead with that?", but I don't know which
> advantages people are going to find most compelling.  Even if someone
> doesn't believe in the benefits of using folios in the page cache,
> looking at the assembler output is, I think, persuasive on its own.

Personally, I think the implicit documentation of which types of pages
functions expect is a clear advantage. Having less code is a nice cherry
on top.

-- 
Thanks,

David / dhildenb



* Re: [PATCH 18/25] btrfs: Use readahead_batch_length
  2020-12-17 13:42       ` Matthew Wilcox
@ 2020-12-17 19:36         ` John Hubbard
  0 siblings, 0 replies; 35+ messages in thread
From: John Hubbard @ 2020-12-17 19:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 12/17/20 5:42 AM, Matthew Wilcox wrote:
> On Thu, Dec 17, 2020 at 12:12:46PM +0000, Matthew Wilcox wrote:
>> ehh ... probably an off-by-one.  Does subtracting 1 from contig_end fix it?
>> I'll spool up a test VM shortly and try it out.
> 
> Yes, this fixed it:
> 
> -               u64 contig_end = contig_start + readahead_batch_length(rac);
> +               u64 contig_end = contig_start + readahead_batch_length(rac) - 1;
> 

Yes, confirmed on my end, too.

thanks,
-- 
John Hubbard
NVIDIA

Thread overview: 35+ messages

2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 02/25] mm: Add put_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 03/25] mm: Add get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 04/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 05/25] mm: Add unlock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 06/25] mm: Add lock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 07/25] mm: Add lock_folio_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 10/25] mm/filemap: Convert end_page_writeback to use " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 11/25] mm: Convert mapping_get_entry to return " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 12/25] mm: Add mark_folio_accessed Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 16/25] mm: Add folio_mapping Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
2020-12-17  9:15   ` John Hubbard
2020-12-17 12:12     ` Matthew Wilcox
2020-12-17 13:42       ` Matthew Wilcox
2020-12-17 19:36         ` John Hubbard
2020-12-16 18:23 ` [PATCH 19/25] fs: Change page refcount rules for readahead Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 20/25] fs: Change readpage to take a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
2020-12-16 20:59   ` kernel test robot
2020-12-16 22:01     ` Matthew Wilcox
2020-12-16 18:23 ` [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios Matthew Wilcox (Oracle)
2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
2020-12-17 13:55   ` Matthew Wilcox
2020-12-17 14:35     ` David Hildenbrand
