* [PATCH v6 00/27] Memory Folios
@ 2021-03-31 18:47 Matthew Wilcox (Oracle)
  2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
                   ` (29 more replies)
  0 siblings, 30 replies; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
show the benefits of a larger "page size".  As an example, an earlier
iteration of this idea which used compound pages got a 7% performance
boost when compiling the kernel with kernbench, without any particular
tuning.

Using compound pages or THPs exposes a serious weakness in our type
system.  Functions are often unprepared for compound pages to be passed
to them, and may only act on PAGE_SIZE chunks.  Even functions which are
aware of compound pages may expect a head page, and do the wrong thing
if passed a tail page.

There have been efforts to label function parameters as 'head' instead
of 'page' to indicate that the function expects a head page, but this
leaves us with runtime assertions instead of using the compiler to prove
that nobody has mistakenly passed a tail page.  Calling a struct page
'head' is also inaccurate, as such functions work perfectly well on
base pages.  The term 'nottail' has not proven popular.

We also waste a lot of instructions ensuring that we're not looking at
a tail page.  Almost every call to PageFoo() contains one or more hidden
calls to compound_head().  This also happens for get_page(), put_page()
and many more functions.  There does not appear to be a way to tell gcc
that it can cache the result of compound_head(), nor is there a way to
tell it that compound_head() is idempotent.
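
To illustrate the hidden work, here is a simplified sketch of the
current pattern in include/linux/page-flags.h (not code this series
adds):

        static inline int PageUptodate(struct page *page)
        {
                page = compound_head(page);     /* hidden lookup */
                return test_bit(PG_uptodate, &page->flags);
        }

        if (PageUptodate(page))         /* one compound_head() call */
                get_page(page);         /* another, for the same page */

Each caller pays for the lookup again because the compiler has no way
to know the two results are identical.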

This series introduces the 'struct folio' as a replacement for
head-or-base pages.  This initial set reduces the kernel size by
approximately 5kB by removing conversions from tail pages to head pages.
The real purpose of this series is to add the infrastructure that
enables further use of the folio.

The medium-term goal is to convert all filesystems and some device
drivers to work in terms of folios.  This series contains a lot of
explicit conversions, but it's important to realise that it removes a lot
of implicit conversions in some relatively hot paths.  Once this work is
completed, there will be very few conversions from folios back to pages;
filesystems, the page cache, the LRU and so on will generally only deal
with folios.

I analysed the text size reduction using a config based on Oracle UEK
with all modules changed to built-in.  That's obviously not a kernel
which makes sense to run, but it serves to compare the effects on (many
common) filesystems & drivers, not just the core.

add/remove: 34266/34260 grow/shrink: 5220/3206 up/down: 1083860/-1088546 (-4686)

Current tree at:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio

(contains another ~100 patches on top of this batch, not all of which are
in good shape for submission)

v6:
 - Rebase on next-20210330
   - wait_bit_key patch merged by Linus
   - wait_on_page_writeback_killable() patches merged by Linus
   - Documentation patch merged by Andrew
 - Move folio_next_index() into this series
 - Move folio_offset() and folio_file_offset() into this series
 - Mirror members of struct page (for pagecache / anon) into struct folio,
   so (eg) you can use folio->mapping instead of folio->page.mapping
 - Add folio_ref_* functions, including kernel-doc for folio_ref_count().
 - Add count_memcg_folio_event()
 - Add put_folio_testzero()
 - Add folio_mapcount()
 - Add FolioKsm()
 - Fix afs_page_mkwrite() compilation
 - Fix/improve kernel-doc for
   - struct folio
   - add_folio_wait_queue()
   - wait_for_stable_folio()
   - wait_on_folio_writeback()
   - wait_on_folio_writeback_killable()
v5:
 - Rebase on next-20210319
 - Pull out three bug-fix patches to the front of the series, allowing
   them to be applied earlier.
 - Fix folio_page() against pages being moved between swap & page cache
 - Fix FolioDoubleMap to use the right page flags
 - Rename next_folio() to folio_next() (akpm)
 - Renamed folio stat functions (akpm)
 - Add 'mod' versions of the folio stats for users that already have 'nr'
 - Renamed folio_page() to folio_file_page() (akpm)
 - Added kernel-doc for struct folio, folio_next(), folio_index(),
   folio_file_page(), folio_contains(), folio_order(), folio_nr_pages(),
   folio_shift(), folio_size(), page_folio(), get_folio(), put_folio()
 - Make folio_private() work in terms of void * instead of unsigned long
 - Used page_folio() in attach/detach page_private() (hch)
 - Drop afs_page_mkwrite folio conversion from this series
 - Add wait_on_folio_writeback_killable()
 - Convert add_page_wait_queue() to add_folio_wait_queue()
 - Add folio_swap_entry() helper
 - Drop the additions of *FolioFsCache
 - Simplify the addition of lock_folio_memcg() et al
 - Drop test_clear_page_writeback() conversion from this series
 - Add FolioTransHuge() definition
 - Rename __folio_file_mapping() to swapcache_mapping()
 - Added swapcache_index() helper
 - Removed lock_folio_async()
 - Made __lock_folio_async() static to filemap.c
 - Converted unlock_page_private_2() to use a folio internally
v4:
 - Rebase on current Linus tree (including swap fix)
 - Analyse each patch in terms of its effects on kernel text size.
   A few were modified to improve their effect.  In particular, where
   pushing calls to page_folio() into the callers resulted in unacceptable
   size increases, the wrapper was placed in mm/folio-compat.c.  This lets
   us see all the places which are good targets for conversion to folios.
 - Some of the patches were reordered, split or merged in order to make
   more logical sense.
 - Use nth_page() for folio_next() if we're using SPARSEMEM and not
   VMEMMAP (Zi Yan)
 - Increment and decrement page stats in units of pages instead of units
   of folios (Zi Yan)
v3:
 - Rebase on next-20210127.  Two major sources of conflict, the
   generic_file_buffered_read refactoring (in akpm tree) and the
   fscache work (in dhowells tree).
v2:
 - Pare patch series back to just infrastructure and the page waiting
   parts.

Matthew Wilcox (Oracle) (27):
  mm: Introduce struct folio
  mm: Add folio_pgdat and folio_zone
  mm/vmstat: Add functions to account folio statistics
  mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
  mm: Add folio reference count functions
  mm: Add put_folio
  mm: Add get_folio
  mm: Create FolioFlags
  mm: Handle per-folio private data
  mm/filemap: Add folio_index, folio_file_page and folio_contains
  mm/filemap: Add folio_next_index
  mm/filemap: Add folio_offset and folio_file_offset
  mm/util: Add folio_mapping and folio_file_mapping
  mm: Add folio_mapcount
  mm/memcg: Add folio wrappers for various functions
  mm/filemap: Add unlock_folio
  mm/filemap: Add lock_folio
  mm/filemap: Add lock_folio_killable
  mm/filemap: Add __lock_folio_async
  mm/filemap: Add __lock_folio_or_retry
  mm/filemap: Add wait_on_folio_locked
  mm/filemap: Add end_folio_writeback
  mm/writeback: Add wait_on_folio_writeback
  mm/writeback: Add wait_for_stable_folio
  mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
  mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
  mm/filemap: Convert page wait queues to be folios

 Documentation/core-api/mm-api.rst |   3 +
 fs/afs/write.c                    |   7 +-
 fs/cachefiles/rdwr.c              |  16 +-
 fs/io_uring.c                     |   2 +-
 include/linux/memcontrol.h        |  30 ++++
 include/linux/mm.h                | 177 ++++++++++++++++----
 include/linux/mm_types.h          |  81 +++++++++
 include/linux/mmdebug.h           |  20 +++
 include/linux/netfs.h             |   2 +-
 include/linux/page-flags.h        | 130 +++++++++++---
 include/linux/page_ref.h          |  88 +++++++++-
 include/linux/pagemap.h           | 270 ++++++++++++++++++++++--------
 include/linux/swap.h              |   6 +
 include/linux/vmstat.h            | 107 ++++++++++++
 mm/Makefile                       |   2 +-
 mm/filemap.c                      | 242 +++++++++++++-------------
 mm/folio-compat.c                 |  37 ++++
 mm/memory.c                       |   8 +-
 mm/page-writeback.c               |  72 +++++---
 mm/swapfile.c                     |   8 +-
 mm/util.c                         |  49 ++++--
 21 files changed, 1051 insertions(+), 306 deletions(-)
 create mode 100644 mm/folio-compat.c

-- 
2.30.2



* [PATCH v6 01/27] mm: Introduce struct folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 12:29   ` Kirill A. Shutemov
  2021-04-08  9:01   ` Rasmus Villemoes
  2021-03-31 18:47 ` [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
                   ` (28 subsequent siblings)
  29 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

A struct folio is a new abstraction to replace the venerable struct page.
A function which takes a struct folio argument declares that it will
operate on the entire (possibly compound) page, not just PAGE_SIZE bytes.
In return, the caller guarantees that the pointer it is passing does
not point to a tail page.
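
As a sketch of what the type system now enforces (mark_dirty() is a
hypothetical function invented for this example, not part of the patch):

        void mark_dirty(struct folio *folio);   /* acts on the whole folio */

        mark_dirty(page_folio(page));   /* explicit, checked conversion */
        mark_dirty(page);               /* rejected: incompatible pointer type */

Passing the wrong kind of page becomes a compile-time diagnostic
instead of a runtime assertion.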

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h       | 78 ++++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h | 65 +++++++++++++++++++++++++++++++++
 mm/util.c                | 19 ++++++++++
 3 files changed, 162 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3e4dc6678eb2..761063e733bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -936,6 +936,20 @@ static inline unsigned int compound_order(struct page *page)
 	return page[1].compound_order;
 }
 
+/**
+ * folio_order - The allocation order of a folio.
+ * @folio: The folio.
+ *
+ * A folio is composed of 2^order pages.  See get_order() for the definition
+ * of order.
+ *
+ * Return: The order of the folio.
+ */
+static inline unsigned int folio_order(struct folio *folio)
+{
+	return compound_order(&folio->page);
+}
+
 static inline bool hpage_pincount_available(struct page *page)
 {
 	/*
@@ -1581,6 +1595,69 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
+/**
+ * folio_nr_pages - The number of pages in the folio.
+ * @folio: The folio.
+ *
+ * Return: A number which is a power of two.
+ */
+static inline unsigned long folio_nr_pages(struct folio *folio)
+{
+	return compound_nr(&folio->page);
+}
+
+/**
+ * folio_next - Move to the next physical folio.
+ * @folio: The folio we're currently operating on.
+ *
+ * If you have physically contiguous memory which may span more than
+ * one folio (eg a &struct bio_vec), use this function to move from one
+ * folio to the next.  Do not use it if the memory is only virtually
+ * contiguous as the folios are almost certainly not adjacent to each
+ * other.  This is the folio equivalent to writing ``page++``.
+ *
+ * Context: We assume that the folios are refcounted and/or locked at a
+ * higher level and do not adjust the reference counts.
+ * Return: The next struct folio.
+ */
+static inline struct folio *folio_next(struct folio *folio)
+{
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+	return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
+#else
+	return folio + folio_nr_pages(folio);
+#endif
+}
+
+/**
+ * folio_shift - The number of bits covered by this folio.
+ * @folio: The folio.
+ *
+ * A folio contains a number of bytes which is a power-of-two in size.
+ * This function tells you which power-of-two the folio is.
+ *
+ * Context: The caller should have a reference on the folio to prevent
+ * it from being split.  It is not necessary for the folio to be locked.
+ * Return: The base-2 logarithm of the size of this folio.
+ */
+static inline unsigned int folio_shift(struct folio *folio)
+{
+	return PAGE_SHIFT + folio_order(folio);
+}
+
+/**
+ * folio_size - The number of bytes in a folio.
+ * @folio: The folio.
+ *
+ * Context: The caller should have a reference on the folio to prevent
+ * it from being split.  It is not necessary for the folio to be locked.
+ * Return: The number of bytes in this folio.
+ */
+static inline size_t folio_size(struct folio *folio)
+{
+	return PAGE_SIZE << folio_order(folio);
+}
+
 /*
  * Some inline functions in vmstat.h depend on page_zone()
  */
@@ -1685,6 +1762,7 @@ extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
+#define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6613b26a8894..a0c7894fad1d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -224,6 +224,71 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+/**
+ * struct folio - Represents a contiguous set of bytes.
+ * @flags: Identical to the page flags.
+ * @lru: Least Recently Used list; tracks how recently this folio was used.
+ * @mapping: The file this page belongs to, or refers to the anon_vma for
+ *    anonymous pages.
+ * @index: Offset within the file, in units of pages.  For anonymous pages,
+ *    this is the index from the beginning of the mmap.
+ * @private: Filesystem per-folio data (see attach_folio_private()).
+ *    Used for swp_entry_t if FolioSwapCache().
+ * @_mapcount: How many times this folio is mapped to userspace.  Use
+ *    folio_mapcount() to access it.
+ * @_refcount: Number of references to this folio.  Use folio_ref_count()
+ *    to read it.
+ * @memcg_data: Memory Control Group data.
+ *
+ * A folio is a physically, virtually and logically contiguous set
+ * of bytes.  It is a power-of-two in size, and it is aligned to that
+ * same power-of-two.  It is at least as large as %PAGE_SIZE.  If it is
+ * in the page cache, it is at a file offset which is a multiple of that
+ * power-of-two.
+ */
+struct folio {
+	/* private: don't document the anon union */
+	union {
+		struct {
+	/* public: */
+			unsigned long flags;
+			struct list_head lru;
+			struct address_space *mapping;
+			pgoff_t index;
+			unsigned long private;
+			atomic_t _mapcount;
+			atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+			unsigned long memcg_data;
+#endif
+	/* private: the union with struct page is transitional */
+		};
+		struct page page;
+	};
+};
+
+/**
+ * page_folio - Converts from page to folio.
+ * @page: The page.
+ *
+ * Every page is part of a folio.  This function cannot be called on a
+ * NULL pointer.
+ *
+ * Context: No reference, nor lock is required on @page.  If the caller
+ * does not hold a reference, this call may race with a folio split, so
+ * it should re-check the folio still contains this page after gaining
+ * a reference on the folio.
+ * Return: The folio which contains this page.
+ */
+static inline struct folio *page_folio(struct page *page)
+{
+	unsigned long head = READ_ONCE(page->compound_head);
+
+	if (unlikely(head & 1))
+		return (struct folio *)(head - 1);
+	return (struct folio *)page;
+}
+
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
 {
 	return &page[1].compound_mapcount;
diff --git a/mm/util.c b/mm/util.c
index 0b6dd9d81da7..521a772f06eb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -686,6 +686,25 @@ struct anon_vma *page_anon_vma(struct page *page)
 	return __page_rmapping(page);
 }
 
+static inline void folio_build_bug(void)
+{
+#define FOLIO_MATCH(pg, fl)						\
+BUILD_BUG_ON(offsetof(struct page, pg) != offsetof(struct folio, fl));
+
+	FOLIO_MATCH(flags, flags);
+	FOLIO_MATCH(lru, lru);
+	FOLIO_MATCH(mapping, mapping);
+	FOLIO_MATCH(index, index);
+	FOLIO_MATCH(private, private);
+	FOLIO_MATCH(_mapcount, _mapcount);
+	FOLIO_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+	FOLIO_MATCH(memcg_data, memcg_data);
+#endif
+#undef FOLIO_MATCH
+	BUILD_BUG_ON(sizeof(struct page) != sizeof(struct folio));
+}
+
 struct address_space *page_mapping(struct page *page)
 {
 	struct address_space *mapping;
-- 
2.30.2




* [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
  2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:23   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

These are just convenience wrappers for callers with folios; pgdat and
zone can be reached from tail pages as well as head pages.
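
Hypothetical usage (not part of this patch); neither call needs to
canonicalise the pointer first:

        struct zone *zone = folio_zone(folio);
        pg_data_t *pgdat = folio_pgdat(folio);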

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 761063e733bf..195c4740522d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1546,6 +1546,16 @@ static inline pg_data_t *page_pgdat(const struct page *page)
 	return NODE_DATA(page_to_nid(page));
 }
 
+static inline struct zone *folio_zone(const struct folio *folio)
+{
+	return page_zone(&folio->page);
+}
+
+static inline pg_data_t *folio_pgdat(const struct folio *folio)
+{
+	return page_pgdat(&folio->page);
+}
+
 #ifdef SECTION_IN_PAGE_FLAGS
 static inline void set_page_section(struct page *page, unsigned long section)
 {
-- 
2.30.2




* [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
  2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
  2021-03-31 18:47 ` [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:25   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Allow page counters to be more readily modified by callers which have
a folio.  Name these wrappers with 'stat' instead of 'state' as requested
by Linus here:
https://lore.kernel.org/linux-mm/CAHk-=wj847SudR-kt+46fT3+xFFgiwpgThvm7DJWGdi4cVrbnQ@mail.gmail.com/
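
Hypothetical usage (NR_FILE_DIRTY is an existing node_stat_item; the
caller and 'nr' are invented): the 'add' and 'sub' forms account
folio_nr_pages() pages, while the 'mod' form takes an explicit count:

        node_stat_add_folio(folio, NR_FILE_DIRTY);      /* +folio_nr_pages() */
        node_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); /* explicit count */
        node_stat_sub_folio(folio, NR_FILE_DIRTY);      /* -folio_nr_pages() */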

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/vmstat.h | 107 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 3299cd69e4ca..d287d7c31b8f 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -402,6 +402,78 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */
 
+static inline void __zone_stat_mod_folio(struct folio *folio,
+		enum zone_stat_item item, long nr)
+{
+	__mod_zone_page_state(folio_zone(folio), item, nr);
+}
+
+static inline void __zone_stat_add_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	__mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline void __zone_stat_sub_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	__mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void zone_stat_mod_folio(struct folio *folio,
+		enum zone_stat_item item, long nr)
+{
+	mod_zone_page_state(folio_zone(folio), item, nr);
+}
+
+static inline void zone_stat_add_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline void zone_stat_sub_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void __node_stat_mod_folio(struct folio *folio,
+		enum node_stat_item item, long nr)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, nr);
+}
+
+static inline void __node_stat_add_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline void __node_stat_sub_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void node_stat_mod_folio(struct folio *folio,
+		enum node_stat_item item, long nr)
+{
+	mod_node_page_state(folio_pgdat(folio), item, nr);
+}
+
+static inline void node_stat_add_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline void node_stat_sub_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
 static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
 					     int migratetype)
 {
@@ -530,6 +602,24 @@ static inline void __dec_lruvec_page_state(struct page *page,
 	__mod_lruvec_page_state(page, idx, -1);
 }
 
+static inline void __lruvec_stat_mod_folio(struct folio *folio,
+					   enum node_stat_item idx, int val)
+{
+	__mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void __lruvec_stat_add_folio(struct folio *folio,
+					   enum node_stat_item idx)
+{
+	__lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void __lruvec_stat_sub_folio(struct folio *folio,
+					   enum node_stat_item idx)
+{
+	__lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
+}
+
 static inline void inc_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx)
 {
@@ -542,4 +632,21 @@ static inline void dec_lruvec_page_state(struct page *page,
 	mod_lruvec_page_state(page, idx, -1);
 }
 
+static inline void lruvec_stat_mod_folio(struct folio *folio,
+					 enum node_stat_item idx, int val)
+{
+	mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void lruvec_stat_add_folio(struct folio *folio,
+					 enum node_stat_item idx)
+{
+	lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void lruvec_stat_sub_folio(struct folio *folio,
+					 enum node_stat_item idx)
+{
+	lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
+}
 #endif /* _LINUX_VMSTAT_H */
-- 
2.30.2




* [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:26   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 05/27] mm: Add folio reference count functions Matthew Wilcox (Oracle)
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE.
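
Hypothetical usage (the assertions are invented for the example, and
FolioLocked() and folio_ref_count() are only added later in this series):

        VM_BUG_ON_FOLIO(!FolioLocked(folio), folio);
        VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio) == 0, folio);

On failure these dump the folio's struct page before BUGging or
WARNing, just as the page variants do.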

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmdebug.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 5d0767cb424a..77d24e1dcaec 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -23,6 +23,13 @@ void dump_mm(const struct mm_struct *mm);
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_BUG_ON_FOLIO(cond, folio)					\
+	do {								\
+		if (unlikely(cond)) {					\
+			dump_page(&folio->page, "VM_BUG_ON_FOLIO(" __stringify(cond)")");\
+			BUG();						\
+		}							\
+	} while (0)
 #define VM_BUG_ON_VMA(cond, vma)					\
 	do {								\
 		if (unlikely(cond)) {					\
@@ -48,6 +55,17 @@ void dump_mm(const struct mm_struct *mm);
 	}								\
 	unlikely(__ret_warn_once);					\
 })
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio)	({			\
+	static bool __section(".data.once") __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(&folio->page, "VM_WARN_ON_ONCE_FOLIO(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
 
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
@@ -56,11 +74,13 @@ void dump_mm(const struct mm_struct *mm);
 #else
 #define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
+#define VM_BUG_ON_FOLIO(cond, folio) VM_BUG_ON(cond)
 #define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond)
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
-- 
2.30.2




* [PATCH v6 05/27] mm: Add folio reference count functions
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:30   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 06/27] mm: Add put_folio Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

These functions mirror their page reference counterparts.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/mm-api.rst |  1 +
 include/linux/page_ref.h          | 88 ++++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index 34f46df91a8b..1ead2570b217 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -97,3 +97,4 @@ More Memory Management Functions
    :internal:
 .. kernel-doc:: include/linux/mm.h
    :internal:
+.. kernel-doc:: include/linux/page_ref.h
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index f3318f34fc54..f27005e760fd 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -69,7 +69,29 @@ static inline int page_ref_count(struct page *page)
 
 static inline int page_count(struct page *page)
 {
-	return atomic_read(&compound_head(page)->_refcount);
+	return page_ref_count(compound_head(page));
+}
+
+/**
+ * folio_ref_count - The reference count on this folio.
+ * @folio: The folio.
+ *
+ * The refcount is usually incremented by calls to get_folio() and
+ * decremented by calls to put_folio().  Some typical users of the
+ * folio refcount:
+ *
+ * - Each reference from a page table
+ * - The page cache
+ * - Filesystem private data
+ * - The LRU list
+ * - Pipes
+ * - Direct IO which references this page in the process address space
+ *
+ * Return: The number of references to this folio.
+ */
+static inline int folio_ref_count(struct folio *folio)
+{
+	return page_ref_count(&folio->page);
 }
 
 static inline void set_page_count(struct page *page, int v)
@@ -79,6 +101,11 @@ static inline void set_page_count(struct page *page, int v)
 		__page_ref_set(page, v);
 }
 
+static inline void set_folio_count(struct folio *folio, int v)
+{
+	set_page_count(&folio->page, v);
+}
+
 /*
  * Setup the page count before being freed into the page allocator for
  * the first time (boot or memory hotplug)
@@ -95,6 +122,11 @@ static inline void page_ref_add(struct page *page, int nr)
 		__page_ref_mod(page, nr);
 }
 
+static inline void folio_ref_add(struct folio *folio, int nr)
+{
+	page_ref_add(&folio->page, nr);
+}
+
 static inline void page_ref_sub(struct page *page, int nr)
 {
 	atomic_sub(nr, &page->_refcount);
@@ -102,6 +134,11 @@ static inline void page_ref_sub(struct page *page, int nr)
 		__page_ref_mod(page, -nr);
 }
 
+static inline void folio_ref_sub(struct folio *folio, int nr)
+{
+	page_ref_sub(&folio->page, nr);
+}
+
 static inline int page_ref_sub_return(struct page *page, int nr)
 {
 	int ret = atomic_sub_return(nr, &page->_refcount);
@@ -111,6 +148,11 @@ static inline int page_ref_sub_return(struct page *page, int nr)
 	return ret;
 }
 
+static inline int folio_ref_sub_return(struct folio *folio, int nr)
+{
+	return page_ref_sub_return(&folio->page, nr);
+}
+
 static inline void page_ref_inc(struct page *page)
 {
 	atomic_inc(&page->_refcount);
@@ -118,6 +160,11 @@ static inline void page_ref_inc(struct page *page)
 		__page_ref_mod(page, 1);
 }
 
+static inline void folio_ref_inc(struct folio *folio)
+{
+	page_ref_inc(&folio->page);
+}
+
 static inline void page_ref_dec(struct page *page)
 {
 	atomic_dec(&page->_refcount);
@@ -125,6 +172,11 @@ static inline void page_ref_dec(struct page *page)
 		__page_ref_mod(page, -1);
 }
 
+static inline void folio_ref_dec(struct folio *folio)
+{
+	page_ref_dec(&folio->page);
+}
+
 static inline int page_ref_sub_and_test(struct page *page, int nr)
 {
 	int ret = atomic_sub_and_test(nr, &page->_refcount);
@@ -134,6 +186,11 @@ static inline int page_ref_sub_and_test(struct page *page, int nr)
 	return ret;
 }
 
+static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
+{
+	return page_ref_sub_and_test(&folio->page, nr);
+}
+
 static inline int page_ref_inc_return(struct page *page)
 {
 	int ret = atomic_inc_return(&page->_refcount);
@@ -143,6 +200,11 @@ static inline int page_ref_inc_return(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_inc_return(struct folio *folio)
+{
+	return page_ref_inc_return(&folio->page);
+}
+
 static inline int page_ref_dec_and_test(struct page *page)
 {
 	int ret = atomic_dec_and_test(&page->_refcount);
@@ -152,6 +214,11 @@ static inline int page_ref_dec_and_test(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_dec_and_test(struct folio *folio)
+{
+	return page_ref_dec_and_test(&folio->page);
+}
+
 static inline int page_ref_dec_return(struct page *page)
 {
 	int ret = atomic_dec_return(&page->_refcount);
@@ -161,6 +228,11 @@ static inline int page_ref_dec_return(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_dec_return(struct folio *folio)
+{
+	return page_ref_dec_return(&folio->page);
+}
+
 static inline int page_ref_add_unless(struct page *page, int nr, int u)
 {
 	int ret = atomic_add_unless(&page->_refcount, nr, u);
@@ -170,6 +242,11 @@ static inline int page_ref_add_unless(struct page *page, int nr, int u)
 	return ret;
 }
 
+static inline int folio_ref_add_unless(struct folio *folio, int nr, int u)
+{
+	return page_ref_add_unless(&folio->page, nr, u);
+}
+
 static inline int page_ref_freeze(struct page *page, int count)
 {
 	int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
@@ -179,6 +256,11 @@ static inline int page_ref_freeze(struct page *page, int count)
 	return ret;
 }
 
+static inline int folio_ref_freeze(struct folio *folio, int count)
+{
+	return page_ref_freeze(&folio->page, count);
+}
+
 static inline void page_ref_unfreeze(struct page *page, int count)
 {
 	VM_BUG_ON_PAGE(page_count(page) != 0, page);
@@ -189,4 +271,8 @@ static inline void page_ref_unfreeze(struct page *page, int count)
 		__page_ref_unfreeze(page, count);
 }
 
+static inline void folio_ref_unfreeze(struct folio *folio, int count)
+{
+	page_ref_unfreeze(&folio->page, count);
+}
 #endif
-- 
2.30.2




* [PATCH v6 06/27] mm: Add put_folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 05/27] mm: Add folio reference count functions Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:31   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 07/27] mm: Add get_folio Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

If we know we have a folio, we can call put_folio() instead of put_page()
and save the overhead of calling compound_head().  It also skips the
devmap checks.

This commit looks like it should be a no-op, but actually saves 1312 bytes
of text with the distro-derived config that I'm testing.  Some functions
grow a little while others shrink.  I presume the compiler is making
different inlining decisions.
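
Sketch of the intended pattern (the caller is hypothetical): resolve
the head page once, then operate on the folio with no further lookups:

        struct folio *folio = page_folio(page); /* one lookup */

        /* ... work on the folio ... */
        put_folio(folio);                       /* no compound_head() here */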

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mm.h | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 195c4740522d..824acedc1253 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -737,6 +737,11 @@ static inline int put_page_testzero(struct page *page)
 	return page_ref_dec_and_test(page);
 }
 
+static inline int put_folio_testzero(struct folio *folio)
+{
+	return put_page_testzero(&folio->page);
+}
+
 /*
  * Try to grab a ref unless the page has a refcount of zero, return false if
  * that is the case.
@@ -1228,9 +1233,28 @@ static inline __must_check bool try_get_page(struct page *page)
 	return true;
 }
 
+/**
+ * put_folio - Decrement the reference count on a folio.
+ * @folio: The folio.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately.  Do not access the memory or the struct folio
+ * after calling put_folio() unless you can be sure that it wasn't the
+ * last reference.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
+ */
+static inline void put_folio(struct folio *folio)
+{
+	if (put_folio_testzero(folio))
+		__put_page(&folio->page);
+}
+
 static inline void put_page(struct page *page)
 {
-	page = compound_head(page);
+	struct folio *folio = page_folio(page);
 
 	/*
 	 * For devmap managed pages we need to catch refcount transition from
@@ -1238,13 +1262,12 @@ static inline void put_page(struct page *page)
 	 * need to inform the device driver through callback. See
 	 * include/linux/memremap.h and HMM for details.
 	 */
-	if (page_is_devmap_managed(page)) {
-		put_devmap_managed_page(page);
+	if (page_is_devmap_managed(&folio->page)) {
+		put_devmap_managed_page(&folio->page);
 		return;
 	}
 
-	if (put_page_testzero(page))
-		__put_page(page);
+	put_folio(folio);
 }
 
 /*
-- 
2.30.2




* [PATCH v6 07/27] mm: Add get_folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 06/27] mm: Add put_folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:32   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 08/27] mm: Create FolioFlags Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

If we know we have a folio, we can call get_folio() instead
of get_page() and save the overhead of calling compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mm.h | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 824acedc1253..818010a6b4c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1205,18 +1205,26 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
 }
 
 /* 127: arbitrary random number, small enough to assemble well */
-#define page_ref_zero_or_close_to_overflow(page) \
-	((unsigned int) page_ref_count(page) + 127u <= 127u)
+#define folio_ref_zero_or_close_to_overflow(folio) \
+	((unsigned int) folio_ref_count(folio) + 127u <= 127u)
+
+/**
+ * get_folio - Increment the reference count on a folio.
+ * @folio: The folio.
+ *
+ * Context: May be called in any context, as long as you know that
+ * you have a refcount on the folio.  If you do not already have one,
+ * try_grab_page() may be the right interface for you to use.
+ */
+static inline void get_folio(struct folio *folio)
+{
+	VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio);
+	folio_ref_inc(folio);
+}
 
 static inline void get_page(struct page *page)
 {
-	page = compound_head(page);
-	/*
-	 * Getting a normal page or the head of a compound page
-	 * requires to already have an elevated page->_refcount.
-	 */
-	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
-	page_ref_inc(page);
+	get_folio(page_folio(page));
 }
 
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
-- 
2.30.2




* [PATCH v6 08/27] mm: Create FolioFlags
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 07/27] mm: Add get_folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:34   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 09/27] mm: Handle per-folio private data Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

These new functions are the folio analogues of the PageFlags functions.
If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio is not a tail
page at every invocation.  Note that this will also catch the PagePoisoned
case as a poisoned page has every bit set, which would include PageTail.

This saves 1727 bytes of text with the distro-derived config that
I'm testing due to removing a double call to compound_head() in
PageSwapCache().
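
For reference, this is approximately what TESTPAGEFLAG(Dirty, dirty,
PF_HEAD) expands to (written out by hand, simplified):

        static __always_inline int FolioDirty(struct folio *folio)
        {
                return test_bit(PG_dirty, folio_flags(folio, FOLIO_PF_HEAD));
        }

        static __always_inline int PageDirty(struct page *page)
        {
                return test_bit(PG_dirty, &PF_HEAD(page, 0)->flags);
        }

The folio variant never has to consider tail pages; the FOLIO_*
policies resolve to a constant offset from &folio->page.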

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 130 ++++++++++++++++++++++++++++++-------
 1 file changed, 107 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..b923a90b3ba5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,6 +212,15 @@ static inline void page_init_poison(struct page *page, size_t size)
 }
 #endif
 
+static unsigned long *folio_flags(struct folio *folio, unsigned n)
+{
+	struct page *page = &folio->page;
+
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
+	return &page[n].flags;
+}
+
 /*
  * Page flags policies wrt compound pages
  *
@@ -256,34 +265,56 @@ static inline void page_init_poison(struct page *page, size_t size)
 		VM_BUG_ON_PGFLAGS(!PageHead(page), page);		\
 		PF_POISONED_CHECK(&page[1]); })
 
+/* Which page is the flag stored in */
+#define FOLIO_PF_ANY		0
+#define FOLIO_PF_HEAD		0
+#define FOLIO_PF_ONLY_HEAD	0
+#define FOLIO_PF_NO_TAIL	0
+#define FOLIO_PF_NO_COMPOUND	0
+#define FOLIO_PF_SECOND		1
+
 /*
  * Macros to create function definitions for page flags
  */
 #define TESTPAGEFLAG(uname, lname, policy)				\
+static __always_inline int Folio##uname(struct folio *folio)		\
+	{ return test_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline int Page##uname(struct page *page)		\
 	{ return test_bit(PG_##lname, &policy(page, 0)->flags); }
 
 #define SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline void SetFolio##uname(struct folio *folio)	\
+	{ set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
 static __always_inline void SetPage##uname(struct page *page)		\
 	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline void ClearFolio##uname(struct folio *folio)	\
+	{ clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
 static __always_inline void ClearPage##uname(struct page *page)		\
 	{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline void __SetFolio##uname(struct folio *folio)	\
+	{ __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
 static __always_inline void __SetPage##uname(struct page *page)		\
 	{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline void __ClearFolio##uname(struct folio *folio)	\
+	{ __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline void __ClearPage##uname(struct page *page)	\
 	{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTSETFLAG(uname, lname, policy)				\
+static __always_inline int TestSetFolio##uname(struct folio *folio)	\
+	{ return test_and_set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline int TestSetPage##uname(struct page *page)	\
 	{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTCLEARFLAG(uname, lname, policy)				\
+static __always_inline int TestClearFolio##uname(struct folio *folio)	\
+	{ return test_and_clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline int TestClearPage##uname(struct page *page)	\
 	{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
@@ -302,21 +333,27 @@ static __always_inline int TestClearPage##uname(struct page *page)	\
 	TESTCLEARFLAG(uname, lname, policy)
 
 #define TESTPAGEFLAG_FALSE(uname)					\
+static inline int Folio##uname(const struct folio *folio) { return 0; }	\
 static inline int Page##uname(const struct page *page) { return 0; }
 
 #define SETPAGEFLAG_NOOP(uname)						\
+static inline void SetFolio##uname(struct folio *folio) { }		\
 static inline void SetPage##uname(struct page *page) {  }
 
 #define CLEARPAGEFLAG_NOOP(uname)					\
+static inline void ClearFolio##uname(struct folio *folio) { }		\
 static inline void ClearPage##uname(struct page *page) {  }
 
 #define __CLEARPAGEFLAG_NOOP(uname)					\
+static inline void __ClearFolio##uname(struct folio *folio) { }		\
 static inline void __ClearPage##uname(struct page *page) {  }
 
 #define TESTSETFLAG_FALSE(uname)					\
+static inline int TestSetFolio##uname(struct folio *folio) { return 0; } \
 static inline int TestSetPage##uname(struct page *page) { return 0; }
 
 #define TESTCLEARFLAG_FALSE(uname)					\
+static inline int TestClearFolio##uname(struct folio *folio) { return 0; } \
 static inline int TestClearPage##uname(struct page *page) { return 0; }
 
 #define PAGEFLAG_FALSE(uname) TESTPAGEFLAG_FALSE(uname)			\
@@ -393,14 +430,18 @@ PAGEFLAG_FALSE(HighMem)
 #endif
 
 #ifdef CONFIG_SWAP
-static __always_inline int PageSwapCache(struct page *page)
+static __always_inline bool FolioSwapCache(struct folio *folio)
 {
-#ifdef CONFIG_THP_SWAP
-	page = compound_head(page);
-#endif
-	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
+	return FolioSwapBacked(folio) &&
+			test_bit(PG_swapcache, folio_flags(folio, 0));
 
 }
+
+static __always_inline bool PageSwapCache(struct page *page)
+{
+	return FolioSwapCache(page_folio(page));
+}
+
 SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 #else
@@ -478,10 +519,14 @@ static __always_inline int PageMappingFlags(struct page *page)
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
-static __always_inline int PageAnon(struct page *page)
+static __always_inline bool FolioAnon(struct folio *folio)
+{
+	return ((unsigned long)folio->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline bool PageAnon(struct page *page)
 {
-	page = compound_head(page);
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return FolioAnon(page_folio(page));
 }
 
 static __always_inline int __PageMovable(struct page *page)
@@ -497,30 +542,32 @@ static __always_inline int __PageMovable(struct page *page)
  * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
  * anon_vma, but to that page's node of the stable tree.
  */
-static __always_inline int PageKsm(struct page *page)
+static __always_inline bool FolioKsm(struct folio *folio)
 {
-	page = compound_head(page);
-	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+	return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) ==
 				PAGE_MAPPING_KSM;
 }
+
+static __always_inline bool PageKsm(struct page *page)
+{
+	return FolioKsm(page_folio(page));
+}
 #else
 TESTPAGEFLAG_FALSE(Ksm)
 #endif
 
 u64 stable_page_flags(struct page *page);
 
-static inline int PageUptodate(struct page *page)
+static inline int FolioUptodate(struct folio *folio)
 {
-	int ret;
-	page = compound_head(page);
-	ret = test_bit(PG_uptodate, &(page)->flags);
+	int ret = test_bit(PG_uptodate, folio_flags(folio, 0));
 	/*
 	 * Must ensure that the data we read out of the page is loaded
 	 * _after_ we've loaded page->flags to check for PageUptodate.
 	 * We can skip the barrier if the page is not uptodate, because
 	 * we wouldn't be reading anything from it.
 	 *
-	 * See SetPageUptodate() for the other side of the story.
+	 * See SetFolioUptodate() for the other side of the story.
 	 */
 	if (ret)
 		smp_rmb();
@@ -528,23 +575,36 @@ static inline int PageUptodate(struct page *page)
 	return ret;
 }
 
-static __always_inline void __SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+	return FolioUptodate(page_folio(page));
+}
+
+static __always_inline void __SetFolioUptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	smp_wmb();
-	__set_bit(PG_uptodate, &page->flags);
+	__set_bit(PG_uptodate, folio_flags(folio, 0));
 }
 
-static __always_inline void SetPageUptodate(struct page *page)
+static __always_inline void SetFolioUptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	/*
 	 * Memory barrier must be issued before setting the PG_uptodate bit,
 	 * so that all previous stores issued in order to bring the page
 	 * uptodate are actually visible before PageUptodate becomes true.
 	 */
 	smp_wmb();
-	set_bit(PG_uptodate, &page->flags);
+	set_bit(PG_uptodate, folio_flags(folio, 0));
+}
+
+static __always_inline void __SetPageUptodate(struct page *page)
+{
+	__SetFolioUptodate((struct folio *)page);
+}
+
+static __always_inline void SetPageUptodate(struct page *page)
+{
+	SetFolioUptodate((struct folio *)page);
 }
 
 CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)
@@ -569,6 +629,17 @@ static inline void set_page_writeback_keepwrite(struct page *page)
 
 __PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
 
+/* Whether there is one page or multiple pages in a folio */
+static inline bool FolioSingle(struct folio *folio)
+{
+	return !FolioHead(folio);
+}
+
+static inline bool FolioMulti(struct folio *folio)
+{
+	return FolioHead(folio);
+}
+
 static __always_inline void set_compound_head(struct page *page, struct page *head)
 {
 	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
@@ -592,12 +663,15 @@ static inline void ClearPageCompound(struct page *page)
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
 int PageHeadHuge(struct page *page);
+static inline bool FolioHuge(struct folio *folio)
+{
+	return PageHeadHuge(&folio->page);
+}
 #else
 TESTPAGEFLAG_FALSE(Huge)
 TESTPAGEFLAG_FALSE(HeadHuge)
 #endif
 
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
@@ -613,6 +687,11 @@ static inline int PageTransHuge(struct page *page)
 	return PageHead(page);
 }
 
+static inline bool FolioTransHuge(struct folio *folio)
+{
+	return FolioHead(folio);
+}
+
 /*
  * PageTransCompound returns true for both transparent huge pages
  * and hugetlbfs pages, so it should only be called when it's known
@@ -844,6 +923,11 @@ static inline int page_has_private(struct page *page)
 	return !!(page->flags & PAGE_FLAGS_PRIVATE);
 }
 
+static inline bool folio_has_private(struct folio *folio)
+{
+	return page_has_private(&folio->page);
+}
+
 #undef PF_ANY
 #undef PF_HEAD
 #undef PF_ONLY_HEAD
-- 
2.30.2




* [PATCH v6 09/27] mm: Handle per-folio private data
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 08/27] mm: Create FolioFlags Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:37   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Add folio_private() and set_folio_private() which mirror page_private()
and set_page_private() -- ie folio private data is the same as page
private data.  The only difference is that these return a void *
instead of an unsigned long, which matches the majority of users.

Turn attach_page_private() into attach_folio_private() and reimplement
attach_page_private() as a wrapper.  No filesystem which uses page private
data currently supports compound pages, so we're free to define the rules.
attach_page_private() may only be called on a head page; if you want
to add private data to a tail page, you can call set_page_private()
directly (and shouldn't increment the page refcount!  That should be
done when adding private data to the head page / folio).

This saves 597 bytes of text with the distro-derived config that I'm
testing due to removing the calls to compound_head() in get_page()
& put_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm_types.h | 16 ++++++++++++++
 include/linux/pagemap.h  | 48 ++++++++++++++++++++++++----------------
 2 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a0c7894fad1d..1fe32ada3dab 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -307,6 +307,12 @@ static inline atomic_t *compound_pincount_ptr(struct page *page)
 #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
 #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 
+/*
+ * page_private can be used on tail pages.  However, PagePrivate is only
+ * checked by the VM on the head page.  So page_private on the tail pages
+ * should be used for data that's ancillary to the head page (eg attaching
+ * buffer heads to tail pages after attaching buffer heads to the head page)
+ */
 #define page_private(page)		((page)->private)
 
 static inline void set_page_private(struct page *page, unsigned long private)
@@ -314,6 +320,16 @@ static inline void set_page_private(struct page *page, unsigned long private)
 	page->private = private;
 }
 
+static inline void *folio_private(struct folio *folio)
+{
+	return (void *)folio->private;
+}
+
+static inline void set_folio_private(struct folio *folio, void *v)
+{
+	folio->private = (unsigned long)v;
+}
+
 struct page_frag_cache {
 	void * va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8c844ba67785..6676210addf6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -260,42 +260,52 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 }
 
 /**
- * attach_page_private - Attach private data to a page.
- * @page: Page to attach data to.
- * @data: Data to attach to page.
+ * attach_folio_private - Attach private data to a folio.
+ * @folio: Folio to attach data to.
+ * @data: Data to attach to folio.
  *
- * Attaching private data to a page increments the page's reference count.
- * The data must be detached before the page will be freed.
+ * Attaching private data to a folio increments the page's reference count.
+ * The data must be detached before the folio will be freed.
  */
-static inline void attach_page_private(struct page *page, void *data)
+static inline void attach_folio_private(struct folio *folio, void *data)
 {
-	get_page(page);
-	set_page_private(page, (unsigned long)data);
-	SetPagePrivate(page);
+	get_folio(folio);
+	set_folio_private(folio, data);
+	SetFolioPrivate(folio);
 }
 
 /**
- * detach_page_private - Detach private data from a page.
- * @page: Page to detach data from.
+ * detach_folio_private - Detach private data from a folio.
+ * @folio: Folio to detach data from.
  *
- * Removes the data that was previously attached to the page and decrements
+ * Removes the data that was previously attached to the folio and decrements
  * the refcount on the page.
  *
- * Return: Data that was attached to the page.
+ * Return: Data that was attached to the folio.
  */
-static inline void *detach_page_private(struct page *page)
+static inline void *detach_folio_private(struct folio *folio)
 {
-	void *data = (void *)page_private(page);
+	void *data = folio_private(folio);
 
-	if (!PagePrivate(page))
+	if (!FolioPrivate(folio))
 		return NULL;
-	ClearPagePrivate(page);
-	set_page_private(page, 0);
-	put_page(page);
+	ClearFolioPrivate(folio);
+	set_folio_private(folio, NULL);
+	put_folio(folio);
 
 	return data;
 }
 
+static inline void attach_page_private(struct page *page, void *data)
+{
+	attach_folio_private(page_folio(page), data);
+}
+
+static inline void *detach_page_private(struct page *page)
+{
+	return detach_folio_private(page_folio(page));
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 09/27] mm: Handle per-folio private data Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:39   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 11/27] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

folio_index() is the equivalent of page_index() for folios.
folio_file_page() is the equivalent of find_subpage().
folio_contains() is the equivalent of thp_contains().
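
As a sketch of how these compose (my_fault_page() is hypothetical;
assumes the caller holds a reference to a folio found in the page
cache):

	static struct page *my_fault_page(struct folio *folio, pgoff_t index)
	{
		/* Bail out if this folio does not cover the faulting index. */
		if (!folio_contains(folio, index))
			return NULL;
		/* Pick the exact page within the (possibly large) folio. */
		return folio_file_page(folio, index);
	}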

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6676210addf6..6749c47d3c33 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -462,6 +462,59 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
 	return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
 }
 
+#define swapcache_index(folio)	__page_file_index(&(folio)->page)
+
+/**
+ * folio_index - File index of a folio.
+ * @folio: The folio.
+ *
+ * For a folio which is either in the page cache or the swap cache,
+ * return its index within the address_space it belongs to.  If you know
+ * the page is definitely in the page cache, you can look at the folio's
+ * index directly.
+ *
+ * Return: The index (offset in units of pages) of a folio in its file.
+ */
+static inline pgoff_t folio_index(struct folio *folio)
+{
+	if (unlikely(FolioSwapCache(folio)))
+		return swapcache_index(folio);
+	return folio->index;
+}
+
+/**
+ * folio_file_page - The page for a particular index.
+ * @folio: The folio which contains this index.
+ * @index: The index we want to look up.
+ *
+ * Sometimes after looking up a folio in the page cache, we need to
+ * obtain the specific page for an index (eg a page fault).
+ *
+ * Return: The page containing the file data for this index.
+ */
+static inline struct page *folio_file_page(struct folio *folio, pgoff_t index)
+{
+	return &folio->page + (index & (folio_nr_pages(folio) - 1));
+}
+
+/**
+ * folio_contains - Does this folio contain this index?
+ * @folio: The folio.
+ * @index: The page index within the file.
+ *
+ * Context: The caller should have the page locked in order to prevent
+ * (eg) shmem from moving the page between the page cache and swap cache
+ * and changing its index in the middle of the operation.
+ * Return: true or false.
+ */
+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+	/* HugeTLBfs indexes the page cache in units of hpage_size */
+	if (PageHuge(&folio->page))
+		return folio->index == index;
+	return index - folio_index(folio) < folio_nr_pages(folio);
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 11/27] mm/filemap: Add folio_next_index
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:40   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

This helper returns the page index of the next folio in the file (ie
the end of this folio, plus one).
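
For example, a caller can walk a file range folio-at-a-time (a sketch;
my_get_folio() is a hypothetical lookup returning a referenced folio
or NULL):

	static void my_walk_range(struct address_space *mapping,
				  pgoff_t start, pgoff_t end)
	{
		pgoff_t index = start;

		while (index <= end) {
			struct folio *folio = my_get_folio(mapping, index);

			if (!folio)
				break;
			/* ... operate on the entire folio ... */
			index = folio_next_index(folio);
			put_folio(folio);
		}
	}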

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6749c47d3c33..3aefe6558f7d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -482,6 +482,17 @@ static inline pgoff_t folio_index(struct folio *folio)
 	return folio->index;
 }
 
+/**
+ * folio_next_index - Get the index of the next folio.
+ * @folio: The current folio.
+ *
+ * Return: The index of the folio which follows this folio in the file.
+ */
+static inline pgoff_t folio_next_index(struct folio *folio)
+{
+	return folio->index + folio_nr_pages(folio);
+}
+
 /**
  * folio_file_page - The page for a particular index.
  * @folio: The folio which contains this index.
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 11/27] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:42   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

These are just wrappers around their page counterparts.
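
For instance, computing the first byte after a folio becomes (a
sketch; my_folio_end() is hypothetical):

	static loff_t my_folio_end(struct folio *folio)
	{
		/* Start offset plus the folio's size in bytes. */
		return folio_offset(folio) +
		       ((loff_t)folio_nr_pages(folio) << PAGE_SHIFT);
	}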

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 3aefe6558f7d..b4570422a691 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -634,6 +634,16 @@ static inline loff_t page_file_offset(struct page *page)
 	return ((loff_t)page_index(page)) << PAGE_SHIFT;
 }
 
+static inline loff_t folio_offset(struct folio *folio)
+{
+	return page_offset(&folio->page);
+}
+
+static inline loff_t folio_file_offset(struct folio *folio)
+{
+	return page_file_offset(&folio->page);
+}
+
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
 				     unsigned long address);
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:45   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 14/27] mm: Add folio_mapcount Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

These are the folio equivalent of page_mapping() and page_file_mapping().
Add an out-of-line page_mapping() wrapper around folio_mapping()
in order to prevent the page_folio() call from bloating every caller
of page_mapping().  Adjust page_file_mapping() and page_mapping_file()
to use folios internally.  Rename __page_file_mapping() to
swapcache_mapping() and change it to take a folio.

This ends up saving 186 bytes of text overall.  folio_mapping() is
45 bytes shorter than page_mapping() was, but the new page_mapping()
wrapper is 30 bytes.  The major reduction is a few bytes less in dozens
of nfs functions (which call page_file_mapping()).  Most of these appear
to be a slight change in gcc's register allocation decisions, which allow:

   48 8b 56 08         mov    0x8(%rsi),%rdx
   48 8d 42 ff         lea    -0x1(%rdx),%rax
   83 e2 01            and    $0x1,%edx
   48 0f 44 c6         cmove  %rsi,%rax

to become:

   48 8b 46 08         mov    0x8(%rsi),%rax
   48 8d 78 ff         lea    -0x1(%rax),%rdi
   a8 01               test   $0x1,%al
   48 0f 44 fe         cmove  %rsi,%rdi

for a reduction of a single byte.  Once the NFS client is converted to
use folios, this entire sequence will disappear.

Also add folio_mapping() documentation.
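
As a usage sketch (my_folio_has_mapping() is hypothetical):

	static bool my_folio_has_mapping(struct folio *folio)
	{
		/*
		 * NULL for anonymous and slab folios; swap cache folios
		 * return their swap address_space.
		 */
		return folio_mapping(folio) != NULL;
	}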

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/mm-api.rst |  2 ++
 include/linux/mm.h                | 14 -------------
 include/linux/pagemap.h           | 35 +++++++++++++++++++++++++++++--
 include/linux/swap.h              |  6 ++++++
 mm/Makefile                       |  2 +-
 mm/folio-compat.c                 | 13 ++++++++++++
 mm/swapfile.c                     |  8 +++----
 mm/util.c                         | 30 +++++++++++++++-----------
 8 files changed, 77 insertions(+), 33 deletions(-)
 create mode 100644 mm/folio-compat.c

diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index 1ead2570b217..b1f4e6d52199 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -98,3 +98,5 @@ More Memory Management Functions
 .. kernel-doc:: include/linux/mm.h
    :internal:
 .. kernel-doc:: include/linux/page_ref.h
+.. kernel-doc:: mm/util.c
+   :functions: folio_mapping
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 818010a6b4c9..a4f2818aeb1d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1739,19 +1739,6 @@ void page_address_init(void);
 
 extern void *page_rmapping(struct page *page);
 extern struct anon_vma *page_anon_vma(struct page *page);
-extern struct address_space *page_mapping(struct page *page);
-
-extern struct address_space *__page_file_mapping(struct page *);
-
-static inline
-struct address_space *page_file_mapping(struct page *page)
-{
-	if (unlikely(PageSwapCache(page)))
-		return __page_file_mapping(page);
-
-	return page->mapping;
-}
-
 extern pgoff_t __page_file_index(struct page *page);
 
 /*
@@ -1766,7 +1753,6 @@ static inline pgoff_t page_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
-struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b4570422a691..01082cda7e72 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -162,14 +162,45 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping)
 
 void release_pages(struct page **pages, int nr);
 
+struct address_space *page_mapping(struct page *);
+struct address_space *folio_mapping(struct folio *);
+struct address_space *swapcache_mapping(struct folio *);
+
+/**
+ * folio_file_mapping - Find the mapping this folio belongs to.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Folios in the swap cache return the mapping of the
+ * swap file or swap device where the data is stored.  This is different
+ * from the mapping returned by folio_mapping().  The only reason to
+ * use it is if, like NFS, you return 0 from ->activate_swapfile.
+ *
+ * Do not call this for folios which aren't in the page cache or swap cache.
+ */
+static inline struct address_space *folio_file_mapping(struct folio *folio)
+{
+	if (unlikely(FolioSwapCache(folio)))
+		return swapcache_mapping(folio);
+
+	return folio->mapping;
+}
+
+static inline struct address_space *page_file_mapping(struct page *page)
+{
+	return folio_file_mapping(page_folio(page));
+}
+
 /*
  * For file cache pages, return the address_space, otherwise return NULL
  */
 static inline struct address_space *page_mapping_file(struct page *page)
 {
-	if (unlikely(PageSwapCache(page)))
+	struct folio *folio = page_folio(page);
+
+	if (unlikely(FolioSwapCache(folio)))
 		return NULL;
-	return page_mapping(page);
+	return folio_mapping(folio);
 }
 
 /*
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 144727041e78..20766342845b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -314,6 +314,12 @@ struct vma_swap_readahead {
 #endif
 };
 
+static inline swp_entry_t folio_swap_entry(struct folio *folio)
+{
+	swp_entry_t entry = { .val = page_private(&folio->page) };
+	return entry;
+}
+
 /* linux/mm/workingset.c */
 void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg);
diff --git a/mm/Makefile b/mm/Makefile
index a9ad6122d468..434c2a46b6c5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,7 +46,7 @@ mmu-$(CONFIG_MMU)	+= process_vm_access.o
 endif
 
 obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
-			   maccess.o page-writeback.o \
+			   maccess.o page-writeback.o folio-compat.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   util.o mmzone.o vmstat.o backing-dev.o \
 			   mm_init.o percpu.o slab_common.o \
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
new file mode 100644
index 000000000000..5e107aa30a62
--- /dev/null
+++ b/mm/folio-compat.c
@@ -0,0 +1,13 @@
+/*
+ * Compatibility functions which bloat the callers too much to make inline.
+ * All of the callers of these functions should be converted to use folios
+ * eventually.
+ */
+
+#include <linux/pagemap.h>
+
+struct address_space *page_mapping(struct page *page)
+{
+	return folio_mapping(page_folio(page));
+}
+EXPORT_SYMBOL(page_mapping);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 149e77454e3c..d0ee24239a83 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3533,13 +3533,13 @@ struct swap_info_struct *page_swap_info(struct page *page)
 }
 
 /*
- * out-of-line __page_file_ methods to avoid include hell.
+ * out-of-line methods to avoid include hell.
  */
-struct address_space *__page_file_mapping(struct page *page)
+struct address_space *swapcache_mapping(struct folio *folio)
 {
-	return page_swap_info(page)->swap_file->f_mapping;
+	return page_swap_info(&folio->page)->swap_file->f_mapping;
 }
-EXPORT_SYMBOL_GPL(__page_file_mapping);
+EXPORT_SYMBOL_GPL(swapcache_mapping);
 
 pgoff_t __page_file_index(struct page *page)
 {
diff --git a/mm/util.c b/mm/util.c
index 521a772f06eb..b01b35429473 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -705,30 +705,36 @@ BUILD_BUG_ON(offsetof(struct page, pg) != offsetof(struct folio, fl));
 	BUILD_BUG_ON(sizeof(struct page) != sizeof(struct folio));
 }
 
-struct address_space *page_mapping(struct page *page)
+/**
+ * folio_mapping - Find the mapping where this folio is stored.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Folios in the swap cache return the swap mapping
+ * this page is stored in (which is different from the mapping for the
+ * swap file or swap device where the data is stored).
+ *
+ * You can call this for folios which aren't in the swap cache or page
+ * cache and it will return NULL.
+ */
+struct address_space *folio_mapping(struct folio *folio)
 {
 	struct address_space *mapping;
 
-	page = compound_head(page);
-
 	/* This happens if someone calls flush_dcache_page on slab page */
-	if (unlikely(PageSlab(page)))
+	if (unlikely(FolioSlab(folio)))
 		return NULL;
 
-	if (unlikely(PageSwapCache(page))) {
-		swp_entry_t entry;
-
-		entry.val = page_private(page);
-		return swap_address_space(entry);
-	}
+	if (unlikely(FolioSwapCache(folio)))
+		return swap_address_space(folio_swap_entry(folio));
 
-	mapping = page->mapping;
+	mapping = folio->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
 
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
-EXPORT_SYMBOL(page_mapping);
+EXPORT_SYMBOL(folio_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 14/27] mm: Add folio_mapcount
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:46   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

This is the folio equivalent of page_mapcount().
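
A sketch of a diagnostic use (my_dump_mapcount() is hypothetical):

	static void my_dump_mapcount(struct folio *folio)
	{
		pr_debug("folio index %lu mapped by %d PTEs\n",
			 folio->index, folio_mapcount(folio));
	}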

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a4f2818aeb1d..fc15a256e686 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -869,6 +869,22 @@ static inline int page_mapcount(struct page *page)
 	return atomic_read(&page->_mapcount) + 1;
 }
 
+/**
+ * folio_mapcount - The number of mappings of this folio.
+ * @folio: The folio.
+ *
+ * The result includes the number of times any of the pages in the
+ * folio are mapped to userspace.
+ *
+ * Return: The number of page table entries which refer to this folio.
+ */
+static inline int folio_mapcount(struct folio *folio)
+{
+	if (unlikely(FolioMulti(folio)))
+		return __page_mapcount(&folio->page);
+	return atomic_read(&folio->_mapcount) + 1;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
 int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 14/27] mm: Add folio_mapcount Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:48   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 16/27] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Add new wrapper functions folio_memcg(), lock_folio_memcg(),
unlock_folio_memcg(), mem_cgroup_folio_lruvec() and
count_memcg_folio_event().
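
For example, a fault path could account against the folio's cgroup
like this (a sketch; my_account_majfault() is hypothetical):

	static void my_account_majfault(struct folio *folio)
	{
		/* Counts folio_nr_pages() events against the folio's memcg. */
		count_memcg_folio_event(folio, PGMAJFAULT);
	}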

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e8907957227..d8027831a681 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -456,6 +456,11 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 		return __page_memcg(page);
 }
 
+static inline struct mem_cgroup *folio_memcg(struct folio *folio)
+{
+	return page_memcg(&folio->page);
+}
+
 /*
  * page_memcg_rcu - locklessly get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -1052,6 +1057,15 @@ static inline void count_memcg_page_event(struct page *page,
 		count_memcg_events(memcg, idx, 1);
 }
 
+static inline void count_memcg_folio_event(struct folio *folio,
+					  enum vm_event_item idx)
+{
+	struct mem_cgroup *memcg = folio_memcg(folio);
+
+	if (memcg)
+		count_memcg_events(memcg, idx, folio_nr_pages(folio));
+}
+
 static inline void count_memcg_event_mm(struct mm_struct *mm,
 					enum vm_event_item idx)
 {
@@ -1473,6 +1487,22 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 }
 #endif /* CONFIG_MEMCG */
 
+static inline void lock_folio_memcg(struct folio *folio)
+{
+	lock_page_memcg(&folio->page);
+}
+
+static inline void unlock_folio_memcg(struct folio *folio)
+{
+	unlock_page_memcg(&folio->page);
+}
+
+static inline struct lruvec *mem_cgroup_folio_lruvec(struct folio *folio,
+						    struct pglist_data *pgdat)
+{
+	return mem_cgroup_page_lruvec(&folio->page, pgdat);
+}
+
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
 	__mod_lruvec_kmem_state(p, idx, 1);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 16/27] mm/filemap: Add unlock_folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:51   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 17/27] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Convert unlock_page() to call unlock_folio().  By using a folio we
avoid a call to compound_head().  This shortens the function from 39
bytes to 25 and removes 4 instructions on x86-64.  Because we still
have unlock_page(), it's a net increase of 24 bytes of text for the
kernel as a whole, but any path that uses unlock_folio() will execute
4 fewer instructions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  3 ++-
 mm/filemap.c            | 27 ++++++++++-----------------
 mm/folio-compat.c       |  6 ++++++
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 01082cda7e72..ee83ada556e0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -719,7 +719,8 @@ extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+void unlock_page(struct page *page);
+void unlock_folio(struct folio *folio);
 void unlock_page_private_2(struct page *page);
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index c03463cb72d6..6d320264e5e0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1435,29 +1435,22 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
 #endif
 
 /**
- * unlock_page - unlock a locked page
- * @page: the page
+ * unlock_folio - Unlock a locked folio.
+ * @folio: The folio.
  *
- * Unlocks the page and wakes up sleepers in wait_on_page_locked().
- * Also wakes sleepers in wait_on_page_writeback() because the wakeup
- * mechanism between PageLocked pages and PageWriteback pages is shared.
- * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
+ * Unlocks the folio and wakes up any thread sleeping on the page lock.
  *
- * Note that this depends on PG_waiters being the sign bit in the byte
- * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
- * clear the PG_locked bit and test PG_waiters at the same time fairly
- * portably (architectures that do LL/SC can test any bit, while x86 can
- * test the sign bit).
+ * Context: May be called from interrupt or process context.  May not be
+ * called from NMI context.
  */
-void unlock_page(struct page *page)
+void unlock_folio(struct folio *folio)
 {
 	BUILD_BUG_ON(PG_waiters != 7);
-	page = compound_head(page);
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
-		wake_up_page_bit(page, PG_locked);
+	VM_BUG_ON_FOLIO(!FolioLocked(folio), folio);
+	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
+		wake_up_page_bit(&folio->page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(unlock_folio);
 
 /**
  * unlock_page_private_2 - Unlock a page that's locked with PG_private_2
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 5e107aa30a62..02798abf19a1 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -11,3 +11,9 @@ struct address_space *page_mapping(struct page *page)
 	return folio_mapping(page_folio(page));
 }
 EXPORT_SYMBOL(page_mapping);
+
+void unlock_page(struct page *page)
+{
+	return unlock_folio(page_folio(page));
+}
+EXPORT_SYMBOL(unlock_page);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 17/27] mm/filemap: Add lock_folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 16/27] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:52   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 18/27] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

This is like lock_page() but for use by callers who know they have a folio.
Convert __lock_page() to be __lock_folio().  This saves one call to
compound_head() per contended call to lock_page().

Saves 362 bytes of text; mostly from improved register allocation and
inlining decisions.  __lock_folio is 59 bytes while __lock_page was 79.
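
Together with unlock_folio() from the previous patch, the expected
calling pattern looks like this (a sketch; my_with_folio_locked() is
hypothetical):

	static void my_with_folio_locked(struct page *page)
	{
		struct folio *folio = page_folio(page);

		lock_folio(folio);	/* may sleep until the lock is free */
		/* ... operate on the locked folio ... */
		unlock_folio(folio);
	}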

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 24 +++++++++++++++++++-----
 mm/filemap.c            | 29 +++++++++++++++--------------
 2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ee83ada556e0..1e0705c74539 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -714,7 +714,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 	return true;
 }
 
-extern void __lock_page(struct page *page);
+void __lock_folio(struct folio *folio);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
@@ -723,13 +723,24 @@ void unlock_page(struct page *page);
 void unlock_folio(struct folio *folio);
 void unlock_page_private_2(struct page *page);
 
+static inline bool trylock_folio(struct folio *folio)
+{
+	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
+}
+
 /*
  * Return true if the page was successfully locked
  */
 static inline int trylock_page(struct page *page)
 {
-	page = compound_head(page);
-	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
+	return trylock_folio(page_folio(page));
+}
+
+static inline void lock_folio(struct folio *folio)
+{
+	might_sleep();
+	if (!trylock_folio(folio))
+		__lock_folio(folio);
 }
 
 /*
@@ -737,9 +748,12 @@ static inline int trylock_page(struct page *page)
  */
 static inline void lock_page(struct page *page)
 {
+	struct folio *folio;
 	might_sleep();
-	if (!trylock_page(page))
-		__lock_page(page);
+
+	folio = page_folio(page);
+	if (!trylock_folio(folio))
+		__lock_folio(folio);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 6d320264e5e0..daf66d00e57a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1187,7 +1187,7 @@ static void wake_up_page(struct page *page, int bit)
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
-			 * __lock_page() waiting on then setting PG_locked.
+			 * __lock_folio() waiting on then setting PG_locked.
 			 */
 	SHARED,		/* Hold ref to page and check the bit when woken, like
 			 * wait_on_page_writeback() waiting on PG_writeback.
@@ -1535,17 +1535,16 @@ void page_endio(struct page *page, bool is_write, int err)
 EXPORT_SYMBOL_GPL(page_endio);
 
 /**
- * __lock_page - get a lock on the page, assuming we need to sleep to get it
- * @__page: the page to lock
+ * __lock_folio - Get a lock on the folio, assuming we need to sleep to get it.
+ * @folio: The folio to lock
  */
-void __lock_page(struct page *__page)
+void __lock_folio(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
-EXPORT_SYMBOL(__lock_page);
+EXPORT_SYMBOL(__lock_folio);
 
 int __lock_page_killable(struct page *__page)
 {
@@ -1620,10 +1619,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			return 0;
 		}
 	} else {
-		__lock_page(page);
+		__lock_folio(page_folio(page));
 	}
-	return 1;
 
+	return 1;
 }
 
 /**
@@ -2767,7 +2766,9 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 				     struct file **fpin)
 {
-	if (trylock_page(page))
+	struct folio *folio = page_folio(page);
+
+	if (trylock_folio(folio))
 		return 1;
 
 	/*
@@ -2780,7 +2781,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(page)) {
+		if (__lock_page_killable(&folio->page)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
@@ -2792,11 +2793,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 			return 0;
 		}
 	} else
-		__lock_page(page);
+		__lock_folio(folio);
+
 	return 1;
 }
 
-
 /*
  * Synchronous readahead happens when we don't even find a page in the page
  * cache at all.  We don't want to perform IO under the mmap sem, so if we have
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 18/27] mm/filemap: Add lock_folio_killable
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 17/27] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:53   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 19/27] mm/filemap: Add __lock_folio_async Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

This is like lock_page_killable() but for use by callers who
know they have a folio.  Convert __lock_page_killable() to be
__lock_folio_killable().  This saves one call to compound_head() per
contended call to lock_page_killable().

__lock_folio_killable() is 20 bytes smaller than __lock_page_killable()
was.  lock_page_maybe_drop_mmap() shrinks by 68 bytes and
__lock_page_or_retry() shrinks by 66 bytes.  That's a total of 154 bytes
of text saved.
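
A sketch of a killable caller (my_lock_folio_interruptible() is
hypothetical):

	static int my_lock_folio_interruptible(struct folio *folio)
	{
		int err = lock_folio_killable(folio);

		if (err)
			return err;	/* -EINTR: fatal signal while waiting */
		/* ... the folio is now locked ... */
		unlock_folio(folio);
		return 0;
	}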

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 15 ++++++++++-----
 mm/filemap.c            | 17 +++++++++--------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1e0705c74539..75060da77be5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -715,7 +715,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 }
 
 void __lock_folio(struct folio *folio);
-extern int __lock_page_killable(struct page *page);
+int __lock_folio_killable(struct folio *folio);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
@@ -756,6 +756,14 @@ static inline void lock_page(struct page *page)
 		__lock_folio(folio);
 }
 
+static inline int lock_folio_killable(struct folio *folio)
+{
+	might_sleep();
+	if (!trylock_folio(folio))
+		return __lock_folio_killable(folio);
+	return 0;
+}
+
 /*
  * lock_page_killable is like lock_page but can be interrupted by fatal
  * signals.  It returns 0 if it locked the page and -EINTR if it was
@@ -763,10 +771,7 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
-	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
-	return 0;
+	return lock_folio_killable(page_folio(page));
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index daf66d00e57a..5675237c985a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1546,14 +1546,13 @@ void __lock_folio(struct folio *folio)
 }
 EXPORT_SYMBOL(__lock_folio);
 
-int __lock_page_killable(struct page *__page)
+int __lock_folio_killable(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, PG_locked, TASK_KILLABLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
-EXPORT_SYMBOL_GPL(__lock_page_killable);
+EXPORT_SYMBOL_GPL(__lock_folio_killable);
 
 int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 {
@@ -1595,6 +1594,8 @@ int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			 unsigned int flags)
 {
+	struct folio *folio = page_folio(page);
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
@@ -1613,13 +1614,13 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	if (flags & FAULT_FLAG_KILLABLE) {
 		int ret;
 
-		ret = __lock_page_killable(page);
+		ret = __lock_folio_killable(folio);
 		if (ret) {
 			mmap_read_unlock(mm);
 			return 0;
 		}
 	} else {
-		__lock_folio(page_folio(page));
+		__lock_folio(folio);
 	}
 
 	return 1;
@@ -2781,7 +2782,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(&folio->page)) {
+		if (__lock_folio_killable(folio)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 19/27] mm/filemap: Add __lock_folio_async
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 18/27] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:55   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

There aren't any actual callers of lock_page_async(), so remove it.
Convert filemap_update_page() to call __lock_folio_async().

__lock_folio_async() is 21 bytes smaller than __lock_page_async(),
but the real savings come from using a folio in filemap_update_page(),
shrinking it from 514 bytes to 403 bytes, saving 111 bytes.  The text
shrinks by 132 bytes in total.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/io_uring.c           |  2 +-
 include/linux/pagemap.h | 17 -----------------
 mm/filemap.c            | 31 ++++++++++++++++---------------
 3 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 962c121fa107..64a22b2ea6c5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3156,7 +3156,7 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 }
 
 /*
- * This is our waitqueue callback handler, registered through lock_page_async()
+ * This is our waitqueue callback handler, registered through __lock_folio_async()
  * when we initially tried to do the IO with the iocb armed our waitqueue.
  * This gets called when the page is unlocked, and we generally expect that to
  * happen when the page IO is completed and the page is now uptodate. This will
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 75060da77be5..054e9dd7628e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -716,7 +716,6 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __lock_folio(struct folio *folio);
 int __lock_folio_killable(struct folio *folio);
-extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 void unlock_page(struct page *page);
@@ -774,22 +773,6 @@ static inline int lock_page_killable(struct page *page)
 	return lock_folio_killable(page_folio(page));
 }
 
-/*
- * lock_page_async - Lock the page, unless this would block. If the page
- * is already locked, then queue a callback when the page becomes unlocked.
- * This callback can then retry the operation.
- *
- * Returns 0 if the page is locked successfully, or -EIOCBQUEUED if the page
- * was already locked and the callback defined in 'wait' was queued.
- */
-static inline int lock_page_async(struct page *page,
-				  struct wait_page_queue *wait)
-{
-	if (!trylock_page(page))
-		return __lock_page_async(page, wait);
-	return 0;
-}
-
 /*
  * lock_page_or_retry - Lock the page, unless this would block and the
  * caller indicated that it can handle a retry.
diff --git a/mm/filemap.c b/mm/filemap.c
index 5675237c985a..84642e41c6b5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1554,18 +1554,18 @@ int __lock_folio_killable(struct folio *folio)
 }
 EXPORT_SYMBOL_GPL(__lock_folio_killable);
 
-int __lock_page_async(struct page *page, struct wait_page_queue *wait)
+static int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)
 {
-	struct wait_queue_head *q = page_waitqueue(page);
+	struct wait_queue_head *q = page_waitqueue(&folio->page);
 	int ret = 0;
 
-	wait->page = page;
+	wait->page = &folio->page;
 	wait->bit_nr = PG_locked;
 
 	spin_lock_irq(&q->lock);
 	__add_wait_queue_entry_tail(q, &wait->wait);
-	SetPageWaiters(page);
-	ret = !trylock_page(page);
+	SetFolioWaiters(folio);
+	ret = !trylock_folio(folio);
 	/*
 	 * If we were successful now, we know we're still on the
 	 * waitqueue as we're still under the lock. This means it's
@@ -2312,41 +2312,42 @@ static int filemap_update_page(struct kiocb *iocb,
 		struct address_space *mapping, struct iov_iter *iter,
 		struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int error;
 
-	if (!trylock_page(page)) {
+	if (!trylock_folio(folio)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
 			return -EAGAIN;
 		if (!(iocb->ki_flags & IOCB_WAITQ)) {
-			put_and_wait_on_page_locked(page, TASK_KILLABLE);
+			put_and_wait_on_page_locked(&folio->page, TASK_KILLABLE);
 			return AOP_TRUNCATED_PAGE;
 		}
-		error = __lock_page_async(page, iocb->ki_waitq);
+		error = __lock_folio_async(folio, iocb->ki_waitq);
 		if (error)
 			return error;
 	}
 
-	if (!page->mapping)
+	if (!folio->mapping)
 		goto truncated;
 
 	error = 0;
-	if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, page))
+	if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, &folio->page))
 		goto unlock;
 
 	error = -EAGAIN;
 	if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT | IOCB_WAITQ))
 		goto unlock;
 
-	error = filemap_read_page(iocb->ki_filp, mapping, page);
+	error = filemap_read_page(iocb->ki_filp, mapping, &folio->page);
 	if (error == AOP_TRUNCATED_PAGE)
-		put_page(page);
+		put_folio(folio);
 	return error;
 truncated:
-	unlock_page(page);
-	put_page(page);
+	unlock_folio(folio);
+	put_folio(folio);
 	return AOP_TRUNCATED_PAGE;
 unlock:
-	unlock_page(page);
+	unlock_folio(folio);
 	return error;
 }
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (18 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 19/27] mm/filemap: Add __lock_folio_async Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 13:57   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Convert __lock_page_or_retry() to __lock_folio_or_retry().  This actually
saves 4 bytes in the only caller of lock_page_or_retry() (due to better
register allocation) and saves the 20 byte cost of calling page_folio()
in __lock_folio_or_retry() for a total saving of 24 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  9 ++++++---
 mm/filemap.c            | 10 ++++------
 mm/memory.c             |  8 ++++----
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 054e9dd7628e..43664bef7392 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -716,7 +716,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __lock_folio(struct folio *folio);
 int __lock_folio_killable(struct folio *folio);
-extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __lock_folio_or_retry(struct folio *folio, struct mm_struct *mm,
 				unsigned int flags);
 void unlock_page(struct page *page);
 void unlock_folio(struct folio *folio);
@@ -778,13 +778,16 @@ static inline int lock_page_killable(struct page *page)
  * caller indicated that it can handle a retry.
  *
  * Return value and mmap_lock implications depend on flags; see
- * __lock_page_or_retry().
+ * __lock_folio_or_retry().
  */
 static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
+	struct folio *folio;
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	folio = page_folio(page);
+	return trylock_folio(folio) || __lock_folio_or_retry(folio, mm, flags);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 84642e41c6b5..c0a986ac830f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1582,20 +1582,18 @@ static int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)
 
 /*
  * Return values:
- * 1 - page is locked; mmap_lock is still held.
- * 0 - page is not locked.
+ * 1 - folio is locked; mmap_lock is still held.
+ * 0 - folio is not locked.
  *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
  *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
  *     which case mmap_lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 1
- * with the page locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock unperturbed.
  */
-int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __lock_folio_or_retry(struct folio *folio, struct mm_struct *mm,
 			 unsigned int flags)
 {
-	struct folio *folio = page_folio(page);
-
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
diff --git a/mm/memory.c b/mm/memory.c
index e66b11ac1659..d538441ccb49 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4077,7 +4077,7 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults).
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __lock_folio_or_retry().
  * If mmap_lock is released, vma may become invalid (for example
  * by other thread calling munmap()).
  */
@@ -4309,7 +4309,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
  * concurrent faults).
  *
  * The mmap_lock may have been released depending on flags and our return value.
- * See filemap_fault() and __lock_page_or_retry().
+ * See filemap_fault() and __lock_folio_or_retry().
  */
 static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
@@ -4413,7 +4413,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
  * By the time we get here, we already hold the mm semaphore
  *
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __lock_folio_or_retry().
  */
 static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags)
@@ -4569,7 +4569,7 @@ static inline void mm_account_fault(struct pt_regs *regs,
  * By the time we get here, we already hold the mm semaphore
  *
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __lock_folio_or_retry().
  */
 vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 			   unsigned int flags, struct pt_regs *regs)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:11   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 22/27] mm/filemap: Add end_folio_writeback Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Also add wait_on_folio_locked_killable().  Turn wait_on_page_locked()
and wait_on_page_locked_killable() into wrappers.  This eliminates a
call to compound_head() from each call-site, reducing text size by 200
bytes for me.
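
As an illustration (my_wait_for_unlock() is hypothetical), for callers
that only need to wait for the current lock holder rather than take
the lock themselves:

	static int my_wait_for_unlock(struct folio *folio, bool killable)
	{
		/* The caller must hold a folio reference across the wait. */
		if (killable)
			return wait_on_folio_locked_killable(folio);
		wait_on_folio_locked(folio);
		return 0;
	}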

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 26 ++++++++++++++++++--------
 mm/filemap.c            |  4 ++--
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 43664bef7392..a7bbc34d92cb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -798,23 +798,33 @@ extern void wait_on_page_bit(struct page *page, int bit_nr);
 extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
 
 /* 
- * Wait for a page to be unlocked.
+ * Wait for a folio to be unlocked.
  *
- * This must be called with the caller "holding" the page,
- * ie with increased "page->count" so that the page won't
+ * This must be called with the caller "holding" the folio,
+ * ie with increased "page->count" so that the folio won't
  * go away during the wait..
  */
+static inline void wait_on_folio_locked(struct folio *folio)
+{
+	if (FolioLocked(folio))
+		wait_on_page_bit(&folio->page, PG_locked);
+}
+
+static inline int wait_on_folio_locked_killable(struct folio *folio)
+{
+	if (!FolioLocked(folio))
+		return 0;
+	return wait_on_page_bit_killable(&folio->page, PG_locked);
+}
+
 static inline void wait_on_page_locked(struct page *page)
 {
-	if (PageLocked(page))
-		wait_on_page_bit(compound_head(page), PG_locked);
+	wait_on_folio_locked(page_folio(page));
 }
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
-	if (!PageLocked(page))
-		return 0;
-	return wait_on_page_bit_killable(compound_head(page), PG_locked);
+	return wait_on_folio_locked_killable(page_folio(page));
 }
 
 int put_and_wait_on_page_locked(struct page *page, int state);
diff --git a/mm/filemap.c b/mm/filemap.c
index c0a986ac830f..21607db04835 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1604,9 +1604,9 @@ int __lock_folio_or_retry(struct folio *folio, struct mm_struct *mm,
 
 		mmap_read_unlock(mm);
 		if (flags & FAULT_FLAG_KILLABLE)
-			wait_on_page_locked_killable(page);
+			wait_on_folio_locked_killable(folio);
 		else
-			wait_on_page_locked(page);
+			wait_on_folio_locked(folio);
 		return 0;
 	}
 	if (flags & FAULT_FLAG_KILLABLE) {
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 22/27] mm/filemap: Add end_folio_writeback
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:13   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Add an end_page_writeback() wrapper function for users that are not yet
converted to folios.

end_folio_writeback() is less than half the size of end_page_writeback()
at just 105 bytes compared to 213 bytes, due to removing all the
compound_head() calls.  The 30 byte wrapper function makes this a net
saving of 70 bytes.
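
A sketch of a bio-based write completion path using the new helper
(my_write_endio() is hypothetical; error handling is elided):

	static void my_write_endio(struct bio *bio)
	{
		struct folio *folio = page_folio(bio_first_page_all(bio));

		/* Clears PG_writeback and wakes any writeback waiters. */
		end_folio_writeback(folio);
		bio_put(bio);
	}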

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  3 ++-
 mm/filemap.c            | 38 +++++++++++++++++++-------------------
 mm/folio-compat.c       |  6 ++++++
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a7bbc34d92cb..36c31cfa3e64 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -830,7 +830,8 @@ static inline int wait_on_page_locked_killable(struct page *page)
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
 int wait_on_page_writeback_killable(struct page *page);
-extern void end_page_writeback(struct page *page);
+void end_page_writeback(struct page *page);
+void end_folio_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
diff --git a/mm/filemap.c b/mm/filemap.c
index 21607db04835..4591974f2c28 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1175,11 +1175,11 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 	spin_unlock_irqrestore(&q->lock, flags);
 }
 
-static void wake_up_page(struct page *page, int bit)
+static void wake_up_folio(struct folio *folio, int bit)
 {
-	if (!PageWaiters(page))
+	if (!FolioWaiters(folio))
 		return;
-	wake_up_page_bit(page, bit);
+	wake_up_page_bit(&folio->page, bit);
 }
 
 /*
@@ -1473,38 +1473,38 @@ void unlock_page_private_2(struct page *page)
 EXPORT_SYMBOL(unlock_page_private_2);
 
 /**
- * end_page_writeback - end writeback against a page
- * @page: the page
+ * end_folio_writeback - End writeback against a folio.
+ * @folio: The folio.
  */
-void end_page_writeback(struct page *page)
+void end_folio_writeback(struct folio *folio)
 {
 	/*
 	 * TestClearPageReclaim could be used here but it is an atomic
 	 * operation and overkill in this particular case. Failing to
-	 * shuffle a page marked for immediate reclaim is too mild to
+	 * shuffle a folio marked for immediate reclaim is too mild to
 	 * justify taking an atomic operation penalty at the end of
-	 * ever page writeback.
+	 * every folio writeback.
 	 */
-	if (PageReclaim(page)) {
-		ClearPageReclaim(page);
-		rotate_reclaimable_page(page);
+	if (FolioReclaim(folio)) {
+		ClearFolioReclaim(folio);
+		rotate_reclaimable_page(&folio->page);
 	}
 
 	/*
-	 * Writeback does not hold a page reference of its own, relying
+	 * Writeback does not hold a folio reference of its own, relying
 	 * on truncation to wait for the clearing of PG_writeback.
-	 * But here we must make sure that the page is not freed and
-	 * reused before the wake_up_page().
+	 * But here we must make sure that the folio is not freed and
+	 * reused before the wake_up_folio().
 	 */
-	get_page(page);
-	if (!test_clear_page_writeback(page))
+	get_folio(folio);
+	if (!test_clear_page_writeback(&folio->page))
 		BUG();
 
 	smp_mb__after_atomic();
-	wake_up_page(page, PG_writeback);
-	put_page(page);
+	wake_up_folio(folio, PG_writeback);
+	put_folio(folio);
 }
-EXPORT_SYMBOL(end_page_writeback);
+EXPORT_SYMBOL(end_folio_writeback);
 
 /*
  * After completing I/O on a page, call this routine to update the page
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 02798abf19a1..d1a1dfe52589 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -17,3 +17,9 @@ void unlock_page(struct page *page)
 	return unlock_folio(page_folio(page));
 }
 EXPORT_SYMBOL(unlock_page);
+
+void end_page_writeback(struct page *page)
+{
+	return end_folio_writeback(page_folio(page));
+}
+EXPORT_SYMBOL(end_page_writeback);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 22/27] mm/filemap: Add end_folio_writeback Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:15   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

wait_on_page_writeback_killable() only has one caller, so convert it to
call wait_on_folio_writeback_killable().  For the wait_on_page_writeback()
callers, add a compatibility wrapper around wait_on_folio_writeback().

Turning PageWriteback() into FolioWriteback() eliminates a call to
compound_head(), saving 8 bytes in one function and 15 bytes in the
other.  That saving is more than offset by adding the
wait_on_page_writeback() compatibility wrapper, for a net text
increase of 15 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/afs/write.c          |  5 +++--
 include/linux/pagemap.h |  3 ++-
 mm/folio-compat.c       |  6 ++++++
 mm/page-writeback.c     | 48 ++++++++++++++++++++++++++++-------------
 4 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 106a864b6a93..7af2b57e601b 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -831,7 +831,8 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
  */
 vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 {
-	struct page *page = thp_head(vmf->page);
+	struct folio *folio = page_folio(vmf->page);
+	struct page *page = &folio->page;
 	struct file *file = vmf->vma->vm_file;
 	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -850,7 +851,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 		return VM_FAULT_RETRY;
 #endif
 
-	if (wait_on_page_writeback_killable(page))
+	if (wait_on_folio_writeback_killable(folio))
 		return VM_FAULT_RETRY;
 
 	if (lock_page_killable(page) < 0)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36c31cfa3e64..2be3a028eb3d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -829,7 +829,8 @@ static inline int wait_on_page_locked_killable(struct page *page)
 
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
-int wait_on_page_writeback_killable(struct page *page);
+void wait_on_folio_writeback(struct folio *folio);
+int wait_on_folio_writeback_killable(struct folio *folio);
 void end_page_writeback(struct page *page);
 void end_folio_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index d1a1dfe52589..6aadecc39fba 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -23,3 +23,9 @@ void end_page_writeback(struct page *page)
 	return end_folio_writeback(page_folio(page));
 }
 EXPORT_SYMBOL(end_page_writeback);
+
+void wait_on_page_writeback(struct page *page)
+{
+	return wait_on_folio_writeback(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_on_page_writeback);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0062d5c57d41..8271f9b24b69 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2818,33 +2818,51 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 }
 EXPORT_SYMBOL(__test_set_page_writeback);
 
-/*
- * Wait for a page to complete writeback
+/**
+ * wait_on_folio_writeback - Wait for a folio to finish writeback.
+ * @folio: The folio to wait for.
+ *
+ * If the folio is currently being written back to storage, wait for the
+ * I/O to complete.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
  */
-void wait_on_page_writeback(struct page *page)
+void wait_on_folio_writeback(struct folio *folio)
 {
-	while (PageWriteback(page)) {
-		trace_wait_on_page_writeback(page, page_mapping(page));
-		wait_on_page_bit(page, PG_writeback);
+	while (FolioWriteback(folio)) {
+		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
+		wait_on_page_bit(&folio->page, PG_writeback);
 	}
 }
-EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+EXPORT_SYMBOL_GPL(wait_on_folio_writeback);
 
-/*
- * Wait for a page to complete writeback.  Returns -EINTR if we get a
- * fatal signal while waiting.
+/**
+ * wait_on_folio_writeback_killable - Wait for a folio to finish writeback.
+ * @folio: The folio to wait for.
+ *
+ * If the folio is currently being written back to storage, wait for the
+ * I/O to complete or a fatal signal to arrive.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
+ * Return: 0 on success, -EINTR if we get a fatal signal while waiting.
  */
-int wait_on_page_writeback_killable(struct page *page)
+int wait_on_folio_writeback_killable(struct folio *folio)
 {
-	while (PageWriteback(page)) {
-		trace_wait_on_page_writeback(page, page_mapping(page));
-		if (wait_on_page_bit_killable(page, PG_writeback))
+	while (FolioWriteback(folio)) {
+		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
+		if (wait_on_page_bit_killable(&folio->page, PG_writeback))
 			return -EINTR;
 	}
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(wait_on_page_writeback_killable);
+EXPORT_SYMBOL_GPL(wait_on_folio_writeback_killable);
 
 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
-- 
2.30.2
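
Modelled on the afs_page_mkwrite() conversion above, a caller which
already has a folio can use the killable wait directly.  This is a
sketch only: myfs_page_mkwrite() is hypothetical, and
lock_folio_killable() is the variant added earlier in this series.

        static vm_fault_t myfs_page_mkwrite(struct folio *folio)
        {
                if (wait_on_folio_writeback_killable(folio))
                        return VM_FAULT_RETRY;
                if (lock_folio_killable(folio) < 0)
                        return VM_FAULT_RETRY;
                /* ... dirty the folio, then return with it locked ... */
                return VM_FAULT_LOCKED;
        }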




* [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:18   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Move wait_for_stable_page() into the folio compatibility file.
wait_for_stable_folio() avoids a call to compound_head() and is 14 bytes
smaller than wait_for_stable_page() was.  The net text size grows by 24
bytes as a result of this patch.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  1 +
 mm/folio-compat.c       |  6 ++++++
 mm/page-writeback.c     | 24 ++++++++++++++----------
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2be3a028eb3d..001f8ec67ee7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -834,6 +834,7 @@ int wait_on_folio_writeback_killable(struct folio *folio);
 void end_page_writeback(struct page *page);
 void end_folio_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
+void wait_for_stable_folio(struct folio *folio);
 
 void page_endio(struct page *page, bool is_write, int err);
 
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 6aadecc39fba..335594fe414e 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -29,3 +29,9 @@ void wait_on_page_writeback(struct page *page)
 	return wait_on_folio_writeback(page_folio(page));
 }
 EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+
+void wait_for_stable_page(struct page *page)
+{
+	return wait_for_stable_folio(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_for_stable_page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 8271f9b24b69..9d55ceec05c0 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2865,17 +2865,21 @@ int wait_on_folio_writeback_killable(struct folio *folio)
 EXPORT_SYMBOL_GPL(wait_on_folio_writeback_killable);
 
 /**
- * wait_for_stable_page() - wait for writeback to finish, if necessary.
- * @page:	The page to wait on.
+ * wait_for_stable_folio() - wait for writeback to finish, if necessary.
+ * @folio: The folio to wait on.
  *
- * This function determines if the given page is related to a backing device
- * that requires page contents to be held stable during writeback.  If so, then
- * it will wait for any pending writeback to complete.
+ * This function determines if the given folio is related to a backing
+ * device that requires folio contents to be held stable during writeback.
+ * If so, then it will wait for any pending writeback to complete.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
  */
-void wait_for_stable_page(struct page *page)
+void wait_for_stable_folio(struct folio *folio)
 {
-	page = thp_head(page);
-	if (page->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
-		wait_on_page_writeback(page);
+	if (folio->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
+		wait_on_folio_writeback(folio);
 }
-EXPORT_SYMBOL_GPL(wait_for_stable_page);
+EXPORT_SYMBOL_GPL(wait_for_stable_folio);
-- 
2.30.2
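
A minimal sketch of the intended calling convention (the myfs_* name
is hypothetical): a buffered write path waits for stability before
modifying the folio, and the call returns immediately unless the
backing device set SB_I_STABLE_WRITES:

        static void myfs_prepare_write(struct folio *folio)
        {
                /* Blocks only for backing devices that require stable
                 * folio contents during writeback. */
                wait_for_stable_folio(folio);
                /* ... safe to modify the folio now ... */
        }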




* [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (23 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:19   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

We must always wait on the folio, otherwise we won't be woken up: the
waker hashes the folio to find the wait queue, so a waiter which
hashed a tail page would sleep on the wrong queue and miss its wakeup.

This commit shrinks the kernel by 691 bytes, mostly due to moving
the page waitqueue lookup into wait_on_folio_bit_common().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/afs/write.c          |  2 +-
 include/linux/netfs.h   |  2 +-
 include/linux/pagemap.h | 10 ++++----
 mm/filemap.c            | 56 ++++++++++++++++++-----------------------
 mm/page-writeback.c     |  4 +--
 5 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 7af2b57e601b..93f15e5770f2 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -847,7 +847,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 */
 #ifdef CONFIG_AFS_FSCACHE
 	if (PageFsCache(page) &&
-	    wait_on_page_bit_killable(page, PG_fscache) < 0)
+	    wait_on_folio_bit_killable(folio, PG_fscache) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 9d3fbed4e30a..f44142dca767 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -54,7 +54,7 @@ static inline void unlock_page_fscache(struct page *page)
 static inline void wait_on_page_fscache(struct page *page)
 {
 	if (PageFsCache(page))
-		wait_on_page_bit(compound_head(page), PG_fscache);
+		wait_on_folio_bit(page_folio(page), PG_fscache);
 }
 
 enum netfs_read_source {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 001f8ec67ee7..d800fae55f98 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -791,11 +791,11 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 }
 
 /*
- * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
+ * This is exported only for wait_on_folio_locked/wait_on_folio_writeback, etc.,
  * and should not be used directly.
  */
-extern void wait_on_page_bit(struct page *page, int bit_nr);
-extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern void wait_on_folio_bit(struct folio *folio, int bit_nr);
+extern int wait_on_folio_bit_killable(struct folio *folio, int bit_nr);
 
 /* 
  * Wait for a folio to be unlocked.
@@ -807,14 +807,14 @@ extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
 static inline void wait_on_folio_locked(struct folio *folio)
 {
 	if (FolioLocked(folio))
-		wait_on_page_bit(&folio->page, PG_locked);
+		wait_on_folio_bit(folio, PG_locked);
 }
 
 static inline int wait_on_folio_locked_killable(struct folio *folio)
 {
 	if (!FolioLocked(folio))
 		return 0;
-	return wait_on_page_bit_killable(&folio->page, PG_locked);
+	return wait_on_folio_bit_killable(folio, PG_locked);
 }
 
 static inline void wait_on_page_locked(struct page *page)
diff --git a/mm/filemap.c b/mm/filemap.c
index 4591974f2c28..76e1c4be1205 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1102,7 +1102,7 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	 *
 	 * So update the flags atomically, and wake up the waiter
 	 * afterwards to avoid any races. This store-release pairs
-	 * with the load-acquire in wait_on_page_bit_common().
+	 * with the load-acquire in wait_on_folio_bit_common().
 	 */
 	smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
 	wake_up_state(wait->private, mode);
@@ -1183,7 +1183,7 @@ static void wake_up_folio(struct folio *folio, int bit)
 }
 
 /*
- * A choice of three behaviors for wait_on_page_bit_common():
+ * A choice of three behaviors for wait_on_folio_bit_common():
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
@@ -1217,9 +1217,10 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
 /* How many times do we accept lock stealing from under a waiter? */
 int sysctl_page_lock_unfairness = 5;
 
-static inline int wait_on_page_bit_common(wait_queue_head_t *q,
-	struct page *page, int bit_nr, int state, enum behavior behavior)
+static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
+		int state, enum behavior behavior)
 {
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
 	int unfairness = sysctl_page_lock_unfairness;
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1228,8 +1229,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	unsigned long pflags;
 
 	if (bit_nr == PG_locked &&
-	    !PageUptodate(page) && PageWorkingset(page)) {
-		if (!PageSwapBacked(page)) {
+	    !FolioUptodate(folio) && FolioWorkingset(folio)) {
+		if (!FolioSwapBacked(folio)) {
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
@@ -1239,7 +1240,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	init_wait(wait);
 	wait->func = wake_page_function;
-	wait_page.page = page;
+	wait_page.page = &folio->page;
 	wait_page.bit_nr = bit_nr;
 
 repeat:
@@ -1254,7 +1255,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * Do one last check whether we can get the
 	 * page bit synchronously.
 	 *
-	 * Do the SetPageWaiters() marking before that
+	 * Do the SetFolioWaiters() marking before that
 	 * to let any waker we _just_ missed know they
 	 * need to wake us up (otherwise they'll never
 	 * even go to the slow case that looks at the
@@ -1265,8 +1266,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * lock to avoid races.
 	 */
 	spin_lock_irq(&q->lock);
-	SetPageWaiters(page);
-	if (!trylock_page_bit_common(page, bit_nr, wait))
+	SetFolioWaiters(folio);
+	if (!trylock_page_bit_common(&folio->page, bit_nr, wait))
 		__add_wait_queue_entry_tail(q, wait);
 	spin_unlock_irq(&q->lock);
 
@@ -1276,10 +1277,10 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * see whether the page bit testing has already
 	 * been done by the wake function.
 	 *
-	 * We can drop our reference to the page.
+	 * We can drop our reference to the folio.
 	 */
 	if (behavior == DROP)
-		put_page(page);
+		put_folio(folio);
 
 	/*
 	 * Note that until the "finish_wait()", or until
@@ -1316,7 +1317,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 		 *
 		 * And if that fails, we'll have to retry this all.
 		 */
-		if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0))))
 			goto repeat;
 
 		wait->flags |= WQ_FLAG_DONE;
@@ -1325,7 +1326,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	/*
 	 * If a signal happened, this 'finish_wait()' may remove the last
-	 * waiter from the wait-queues, but the PageWaiters bit will remain
+	 * waiter from the wait-queues, but the FolioWaiters bit will remain
 	 * set. That's ok. The next wakeup will take care of it, and trying
 	 * to do it here would be difficult and prone to races.
 	 */
@@ -1356,19 +1357,17 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
-void wait_on_page_bit(struct page *page, int bit_nr)
+void wait_on_folio_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+	wait_on_folio_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit);
+EXPORT_SYMBOL(wait_on_folio_bit);
 
-int wait_on_page_bit_killable(struct page *page, int bit_nr)
+int wait_on_folio_bit_killable(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, bit_nr, TASK_KILLABLE, SHARED);
+	return wait_on_folio_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit_killable);
+EXPORT_SYMBOL(wait_on_folio_bit_killable);
 
 /**
  * put_and_wait_on_page_locked - Drop a reference and wait for it to be unlocked
@@ -1385,11 +1384,8 @@ EXPORT_SYMBOL(wait_on_page_bit_killable);
  */
 int put_and_wait_on_page_locked(struct page *page, int state)
 {
-	wait_queue_head_t *q;
-
-	page = compound_head(page);
-	q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, PG_locked, state, DROP);
+	return wait_on_folio_bit_common(page_folio(page), PG_locked, state,
+			DROP);
 }
 
 /**
@@ -1540,16 +1536,14 @@ EXPORT_SYMBOL_GPL(page_endio);
  */
 void __lock_folio(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
+	wait_on_folio_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
 EXPORT_SYMBOL(__lock_folio);
 
 int __lock_folio_killable(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
+	return wait_on_folio_bit_common(folio, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
 EXPORT_SYMBOL_GPL(__lock_folio_killable);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d55ceec05c0..7aed4feabdd2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2834,7 +2834,7 @@ void wait_on_folio_writeback(struct folio *folio)
 {
 	while (FolioWriteback(folio)) {
 		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
-		wait_on_page_bit(&folio->page, PG_writeback);
+		wait_on_folio_bit(folio, PG_writeback);
 	}
 }
 EXPORT_SYMBOL_GPL(wait_on_folio_writeback);
@@ -2856,7 +2856,7 @@ int wait_on_folio_writeback_killable(struct folio *folio)
 {
 	while (FolioWriteback(folio)) {
 		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
-		if (wait_on_page_bit_killable(&folio->page, PG_writeback))
+		if (wait_on_folio_bit_killable(folio, PG_writeback))
 			return -EINTR;
 	}
 
-- 
2.30.2
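
In the style of the wait_on_page_fscache() conversion above, a helper
which already has a folio can wait on an arbitrary bit directly.  A
sketch only: the myfs_* name is hypothetical, and since PG_fscache is
an alias for PG_private_2, FolioPrivate2() tests the same bit:

        static inline void myfs_wait_on_folio_fscache(struct folio *folio)
        {
                if (FolioPrivate2(folio))
                        wait_on_folio_bit(folio, PG_fscache);
        }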




* [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:23   ` Christoph Hellwig
  2021-03-31 18:47 ` [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

All callers have a folio, so use it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 76e1c4be1205..51b2091d402c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1121,14 +1121,14 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	return (flags & WQ_FLAG_EXCLUSIVE) != 0;
 }
 
-static void wake_up_page_bit(struct page *page, int bit_nr)
+static void wake_up_folio_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
 	struct wait_page_key key;
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
-	key.page = page;
+	key.page = &folio->page;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
@@ -1163,7 +1163,7 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 	 * page waiters.
 	 */
 	if (!waitqueue_active(q) || !key.page_match) {
-		ClearPageWaiters(page);
+		ClearFolioWaiters(folio);
 		/*
 		 * It's possible to miss clearing Waiters here, when we woke
 		 * our page waiters, but the hashed waitqueue has waiters for
@@ -1179,7 +1179,7 @@ static void wake_up_folio(struct folio *folio, int bit)
 {
 	if (!FolioWaiters(folio))
 		return;
-	wake_up_page_bit(&folio->page, bit);
+	wake_up_folio_bit(folio, bit);
 }
 
 /*
@@ -1444,7 +1444,7 @@ void unlock_folio(struct folio *folio)
 	BUILD_BUG_ON(PG_waiters != 7);
 	VM_BUG_ON_FOLIO(!FolioLocked(folio), folio);
 	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
-		wake_up_page_bit(&folio->page, PG_locked);
+		wake_up_folio_bit(folio, PG_locked);
 }
 EXPORT_SYMBOL(unlock_folio);
 
@@ -1461,10 +1461,10 @@ EXPORT_SYMBOL(unlock_folio);
  */
 void unlock_page_private_2(struct page *page)
 {
-	page = compound_head(page);
-	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
-	clear_bit_unlock(PG_private_2, &page->flags);
-	wake_up_page_bit(page, PG_private_2);
+	struct folio *folio = page_folio(page);
+	VM_BUG_ON_FOLIO(!FolioPrivate2(folio), folio);
+	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
+	wake_up_folio_bit(folio, PG_private_2);
 }
 EXPORT_SYMBOL(unlock_page_private_2);
 
-- 
2.30.2




* [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (25 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
@ 2021-03-31 18:47 ` Matthew Wilcox (Oracle)
  2021-04-06 14:25   ` Christoph Hellwig
  2021-04-01  7:05 ` [PATCH v6 00/27] Memory Folios Christoph Hellwig
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-03-31 18:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Reinforce that if we're waiting for a bit in a struct page, that bit
is actually in the head page by changing the type from page to folio.
This increases the size of cachefiles by two bytes, but the kernel
core is unchanged in size.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/cachefiles/rdwr.c    | 16 ++++++++--------
 include/linux/pagemap.h |  8 ++++----
 mm/filemap.c            | 38 +++++++++++++++++++-------------------
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 8ffc40e84a59..364af267ebaa 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -25,20 +25,20 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	struct cachefiles_object *object;
 	struct fscache_retrieval *op = monitor->op;
 	struct wait_page_key *key = _key;
-	struct page *page = wait->private;
+	struct folio *folio = wait->private;
 
 	ASSERT(key);
 
 	_enter("{%lu},%u,%d,{%p,%u}",
 	       monitor->netfs_page->index, mode, sync,
-	       key->page, key->bit_nr);
+	       key->folio, key->bit_nr);
 
-	if (key->page != page || key->bit_nr != PG_locked)
+	if (key->folio != folio || key->bit_nr != PG_locked)
 		return 0;
 
-	_debug("--- monitor %p %lx ---", page, page->flags);
+	_debug("--- monitor %p %lx ---", folio, folio->flags);
 
-	if (!PageUptodate(page) && !PageError(page)) {
+	if (!FolioUptodate(folio) && !FolioError(folio)) {
 		/* unlocked, not uptodate and not erronous? */
 		_debug("page probably truncated");
 	}
@@ -107,7 +107,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 	put_page(backpage2);
 
 	INIT_LIST_HEAD(&monitor->op_link);
-	add_page_wait_queue(backpage, &monitor->monitor);
+	add_folio_wait_queue(page_folio(backpage), &monitor->monitor);
 
 	if (trylock_page(backpage)) {
 		ret = -EIO;
@@ -294,7 +294,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	get_page(backpage);
 	monitor->back_page = backpage;
 	monitor->monitor.private = backpage;
-	add_page_wait_queue(backpage, &monitor->monitor);
+	add_folio_wait_queue(page_folio(backpage), &monitor->monitor);
 	monitor = NULL;
 
 	/* but the page may have been read before the monitor was installed, so
@@ -548,7 +548,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		get_page(backpage);
 		monitor->back_page = backpage;
 		monitor->monitor.private = backpage;
-		add_page_wait_queue(backpage, &monitor->monitor);
+		add_folio_wait_queue(page_folio(backpage), &monitor->monitor);
 		monitor = NULL;
 
 		/* but the page may have been read before the monitor was
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d800fae55f98..bf38ce40694d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -690,13 +690,13 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 }
 
 struct wait_page_key {
-	struct page *page;
+	struct folio *folio;
 	int bit_nr;
 	int page_match;
 };
 
 struct wait_page_queue {
-	struct page *page;
+	struct folio *folio;
 	int bit_nr;
 	wait_queue_entry_t wait;
 };
@@ -704,7 +704,7 @@ struct wait_page_queue {
 static inline bool wake_page_match(struct wait_page_queue *wait_page,
 				  struct wait_page_key *key)
 {
-	if (wait_page->page != key->page)
+	if (wait_page->folio != key->folio)
 	       return false;
 	key->page_match = 1;
 
@@ -841,7 +841,7 @@ void page_endio(struct page *page, bool is_write, int err);
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
-extern void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter);
+void add_folio_wait_queue(struct folio *folio, wait_queue_entry_t *waiter);
 
 /*
  * Fault everything in given userspace address range in.
diff --git a/mm/filemap.c b/mm/filemap.c
index 51b2091d402c..b93ea19afd89 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1019,11 +1019,11 @@ EXPORT_SYMBOL(__page_cache_alloc);
  */
 #define PAGE_WAIT_TABLE_BITS 8
 #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
-static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
 
-static wait_queue_head_t *page_waitqueue(struct page *page)
+static wait_queue_head_t *folio_waitqueue(struct folio *folio)
 {
-	return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)];
+	return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
 }
 
 void __init pagecache_init(void)
@@ -1031,7 +1031,7 @@ void __init pagecache_init(void)
 	int i;
 
 	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
-		init_waitqueue_head(&page_wait_table[i]);
+		init_waitqueue_head(&folio_wait_table[i]);
 
 	page_writeback_init();
 }
@@ -1086,10 +1086,10 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	 */
 	flags = wait->flags;
 	if (flags & WQ_FLAG_EXCLUSIVE) {
-		if (test_bit(key->bit_nr, &key->page->flags))
+		if (test_bit(key->bit_nr, &key->folio->flags))
 			return -1;
 		if (flags & WQ_FLAG_CUSTOM) {
-			if (test_and_set_bit(key->bit_nr, &key->page->flags))
+			if (test_and_set_bit(key->bit_nr, &key->folio->flags))
 				return -1;
 			flags |= WQ_FLAG_DONE;
 		}
@@ -1123,12 +1123,12 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 
 static void wake_up_folio_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	struct wait_page_key key;
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
-	key.page = &folio->page;
+	key.folio = folio;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
@@ -1220,7 +1220,7 @@ int sysctl_page_lock_unfairness = 5;
 static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
 		int state, enum behavior behavior)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	int unfairness = sysctl_page_lock_unfairness;
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1240,7 +1240,7 @@ static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
 
 	init_wait(wait);
 	wait->func = wake_page_function;
-	wait_page.page = &folio->page;
+	wait_page.folio = folio;
 	wait_page.bit_nr = bit_nr;
 
 repeat:
@@ -1389,23 +1389,23 @@ int put_and_wait_on_page_locked(struct page *page, int state)
 }
 
 /**
- * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
- * @page: Page defining the wait queue of interest
+ * add_folio_wait_queue - Add an arbitrary waiter to a folio's wait queue
+ * @folio: Folio defining the wait queue of interest
  * @waiter: Waiter to add to the queue
  *
- * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ * Add an arbitrary @waiter to the wait queue for the nominated @folio.
  */
-void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter)
+void add_folio_wait_queue(struct folio *folio, wait_queue_entry_t *waiter)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	unsigned long flags;
 
 	spin_lock_irqsave(&q->lock, flags);
 	__add_wait_queue_entry_tail(q, waiter);
-	SetPageWaiters(page);
+	SetFolioWaiters(folio);
 	spin_unlock_irqrestore(&q->lock, flags);
 }
-EXPORT_SYMBOL_GPL(add_page_wait_queue);
+EXPORT_SYMBOL_GPL(add_folio_wait_queue);
 
 #ifndef clear_bit_unlock_is_negative_byte
 
@@ -1550,10 +1550,10 @@ EXPORT_SYMBOL_GPL(__lock_folio_killable);
 
 static int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)
 {
-	struct wait_queue_head *q = page_waitqueue(&folio->page);
+	struct wait_queue_head *q = folio_waitqueue(folio);
 	int ret = 0;
 
-	wait->page = &folio->page;
+	wait->folio = folio;
 	wait->bit_nr = PG_locked;
 
 	spin_lock_irq(&q->lock);
-- 
2.30.2
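
For callers like cachefiles above which install their own waiters, the
pattern after this patch looks roughly like the following sketch (the
myfs_* name is hypothetical):

        /* The wait entry's ->private now carries a folio, and the
         * wake function is invoked with a wait_page_key holding a
         * folio rather than a page. */
        static void myfs_watch_folio(struct folio *folio,
                                     wait_queue_entry_t *waiter,
                                     wait_queue_func_t func)
        {
                init_waitqueue_func_entry(waiter, func);
                waiter->private = folio;
                add_folio_wait_queue(folio, waiter);
        }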




* Re: [PATCH v6 00/27] Memory Folios
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (26 preceding siblings ...)
  2021-03-31 18:47 ` [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
@ 2021-04-01  7:05 ` Christoph Hellwig
  2021-04-01 11:26   ` Matthew Wilcox
  2021-04-03  0:31 ` Kent Overstreet
  2021-04-05 19:14 ` Jeff Layton
  29 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-01  7:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
>  - Mirror members of struct page (for pagecache / anon) into struct folio,
>    so (eg) you can use folio->mapping instead of folio->page.mapping

Eww, why?



* Re: [PATCH v6 00/27] Memory Folios
  2021-04-01  7:05 ` [PATCH v6 00/27] Memory Folios Christoph Hellwig
@ 2021-04-01 11:26   ` Matthew Wilcox
  2021-04-01 12:28     ` Jason Gunthorpe
  2021-04-02 14:37     ` Christoph Hellwig
  0 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-01 11:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> >    so (eg) you can use folio->mapping instead of folio->page.mapping
> 
> Eww, why?

So that eventually we can rename page->mapping to page->_mapping and
prevent the bugs from people doing page->mapping on a tail page.  eg
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/




* Re: [PATCH v6 00/27] Memory Folios
  2021-04-01 11:26   ` Matthew Wilcox
@ 2021-04-01 12:28     ` Jason Gunthorpe
  2021-04-01 12:52       ` Matthew Wilcox
  2021-04-02 14:37     ` Christoph Hellwig
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 12:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Thu, Apr 01, 2021 at 12:26:56PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> > >    so (eg) you can use folio->mapping instead of folio->page.mapping
> > 
> > Eww, why?
> 
> So that eventually we can rename page->mapping to page->_mapping and
> prevent the bugs from people doing page->mapping on a tail page.  eg
> https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/

Is that gcc structure layout randomization stuff going to be a problem
here?

Add some 
  static_assert(offsetof(struct folio,..) == offsetof(struct page,..))

tests to force it?

Jason



* Re: [PATCH v6 00/27] Memory Folios
  2021-04-01 12:28     ` Jason Gunthorpe
@ 2021-04-01 12:52       ` Matthew Wilcox
  2021-04-01 13:30         ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-01 12:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Thu, Apr 01, 2021 at 09:28:03AM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 12:26:56PM +0100, Matthew Wilcox wrote:
> > On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> > > >    so (eg) you can use folio->mapping instead of folio->page.mapping
> > > 
> > > Eww, why?
> > 
> > So that eventually we can rename page->mapping to page->_mapping and
> > prevent the bugs from people doing page->mapping on a tail page.  eg
> > https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/
> 
> Is that gcc structure layout randomization stuff going to be a problem
> here?
> 
> Add some 
>   static_assert(offsetof(struct folio,..) == offsetof(struct page,..))
> 
> tests to force it?

You sound like the kind of person who hasn't read patch 1.

diff --git a/mm/util.c b/mm/util.c
index 0b6dd9d81da7..521a772f06eb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -686,6 +686,25 @@ struct anon_vma *page_anon_vma(struct page *page)
 	return __page_rmapping(page);
 }
 
+static inline void folio_build_bug(void)
+{
+#define FOLIO_MATCH(pg, fl)						\
+BUILD_BUG_ON(offsetof(struct page, pg) != offsetof(struct folio, fl));
+
+	FOLIO_MATCH(flags, flags);
+	FOLIO_MATCH(lru, lru);
+	FOLIO_MATCH(mapping, mapping);
+	FOLIO_MATCH(index, index);
+	FOLIO_MATCH(private, private);
+	FOLIO_MATCH(_mapcount, _mapcount);
+	FOLIO_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+	FOLIO_MATCH(memcg_data, memcg_data);
+#endif
+#undef FOLIO_MATCH
+	BUILD_BUG_ON(sizeof(struct page) != sizeof(struct folio));
+}
+
 struct address_space *page_mapping(struct page *page)
 {
 	struct address_space *mapping;
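
The static_assert() spelling suggested above expresses the same check;
the FOLIO_MATCH() macro is just that assertion, one line per shared
member.  For instance (assuming <linux/build_bug.h> for static_assert):

        static_assert(offsetof(struct folio, mapping) ==
                      offsetof(struct page, mapping));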




* Re: [PATCH v6 00/27] Memory Folios
  2021-04-01 12:52       ` Matthew Wilcox
@ 2021-04-01 13:30         ` Jason Gunthorpe
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 13:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Thu, Apr 01, 2021 at 01:52:01PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 01, 2021 at 09:28:03AM -0300, Jason Gunthorpe wrote:
> > On Thu, Apr 01, 2021 at 12:26:56PM +0100, Matthew Wilcox wrote:
> > > On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> > > > On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> > > > >    so (eg) you can use folio->mapping instead of folio->page.mapping
> > > > 
> > > > Eww, why?
> > > 
> > > So that eventually we can rename page->mapping to page->_mapping and
> > > prevent the bugs from people doing page->mapping on a tail page.  eg
> > > https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/
> > 
> > Is that gcc structure layout randomization stuff going to be a problem
> > here?
> > 
> > Add some 
> >   static_assert(offsetof(struct folio,..) == offsetof(struct page,..))
> > 
> > tests to force it?
> 
> You sound like the kind of person who hasn't read patch 1.

Yes, I missed this hunk :)

Jason



* Re: [PATCH v6 00/27] Memory Folios
  2021-04-01 11:26   ` Matthew Wilcox
  2021-04-01 12:28     ` Jason Gunthorpe
@ 2021-04-02 14:37     ` Christoph Hellwig
  2021-04-02 14:49       ` Matthew Wilcox
  1 sibling, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-02 14:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Thu, Apr 01, 2021 at 12:26:56PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> > >    so (eg) you can use folio->mapping instead of folio->page.mapping
> > 
> > Eww, why?
> 
> So that eventually we can rename page->mapping to page->_mapping and
> prevent the bugs from people doing page->mapping on a tail page.  eg
> https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/

I'm not sure I like this.  This whole concept of structures that need
to have the same layout is very problematic, even with the safeguards
you've added.  So if it were up to me I'd prefer the folio as a simple
container as it was in the previous revisions.  At some point members
should move from the page to the folio, but I'd rather do that over a
shorter period and in targeted series.  We need the basics to go in
first.



* Re: [PATCH v6 00/27] Memory Folios
  2021-04-02 14:37     ` Christoph Hellwig
@ 2021-04-02 14:49       ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-02 14:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Fri, Apr 02, 2021 at 03:37:55PM +0100, Christoph Hellwig wrote:
> On Thu, Apr 01, 2021 at 12:26:56PM +0100, Matthew Wilcox wrote:
> > On Thu, Apr 01, 2021 at 08:05:37AM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > >  - Mirror members of struct page (for pagecache / anon) into struct folio,
> > > >    so (eg) you can use folio->mapping instead of folio->page.mapping
> > > 
> > > Eww, why?
> > 
> > So that eventually we can rename page->mapping to page->_mapping and
> > prevent the bugs from people doing page->mapping on a tail page.  eg
> > https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2103102214170.7159@eggly.anvils/
> 
> I'm not sure I like this.  This whole concept of structures that do need
> the same layout is very problematic, even with the safe guards you've
> added.  So if it was up to me I'd prefer the folio as a simple container
> as it was in the previous revisions.  At some point members should move
> from the page to the folio, but I'd rather do that over a shorter period
> an in targeted series.  We need the basic to go in first.

That was my original plan, but it'll be another round of churn, and I'm
not sure there'll be the appetite for it.  There's not a lot of appetite
for this round, and this one has measurable performance gains!




* Re: [PATCH v6 00/27] Memory Folios
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (27 preceding siblings ...)
  2021-04-01  7:05 ` [PATCH v6 00/27] Memory Folios Christoph Hellwig
@ 2021-04-03  0:31 ` Kent Overstreet
  2021-04-05 19:14 ` Jeff Layton
  29 siblings, 0 replies; 76+ messages in thread
From: Kent Overstreet @ 2021-04-03  0:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> The medium-term goal is to convert all filesystems and some device
> drivers to work in terms of folios.  This series contains a lot of
> explicit conversions, but it's important to realise it's removing a lot
> of implicit conversions in some relatively hot paths.  There will be very
> few conversions from folios when this work is completed; filesystems,
> the page cache, the LRU and so on will generally only deal with folios.

I'm pretty excited for this to land - 4k page overhead has been a pain point for
me for quite some time. I know this is going to be a lot of churn but I think
leveraging the type system is exactly the right way to go about this, and I
can't wait to start converting bcachefs.



* Re: [PATCH v6 00/27] Memory Folios
  2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
                   ` (28 preceding siblings ...)
  2021-04-03  0:31 ` Kent Overstreet
@ 2021-04-05 19:14 ` Jeff Layton
  2021-04-05 19:31   ` Matthew Wilcox
  29 siblings, 1 reply; 76+ messages in thread
From: Jeff Layton @ 2021-04-05 19:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, 2021-03-31 at 19:47 +0100, Matthew Wilcox (Oracle) wrote:
> Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
> exist which show the benefits of a larger "page size".  As an example,
> an earlier iteration of this idea which used compound pages got a 7%
> performance boost when compiling the kernel using kernbench without any
> particular tuning.
> 
> Using compound pages or THPs exposes a serious weakness in our type
> system.  Functions are often unprepared for compound pages to be passed
> to them, and may only act on PAGE_SIZE chunks.  Even functions which are
> aware of compound pages may expect a head page, and do the wrong thing
> if passed a tail page.
> 
> There have been efforts to label function parameters as 'head' instead
> of 'page' to indicate that the function expects a head page, but this
> leaves us with runtime assertions instead of using the compiler to prove
> that nobody has mistakenly passed a tail page.  Calling a struct page
> 'head' is also inaccurate as they will work perfectly well on base pages.
> The term 'nottail' has not proven popular.
> 
> We also waste a lot of instructions ensuring that we're not looking at
> a tail page.  Almost every call to PageFoo() contains one or more hidden
> calls to compound_head().  This also happens for get_page(), put_page()
> and many more functions.  There does not appear to be a way to tell gcc
> that it can cache the result of compound_head(), nor is there a way to
> tell it that compound_head() is idempotent.
> 
> This series introduces the 'struct folio' as a replacement for
> head-or-base pages.  This initial set reduces the kernel size by
> approximately 5kB by removing conversions from tail pages to head pages.
> The real purpose of this series is adding infrastructure to enable
> further use of the folio.
> 
> The medium-term goal is to convert all filesystems and some device
> drivers to work in terms of folios.  This series contains a lot of
> explicit conversions, but it's important to realise it's removing a lot
> of implicit conversions in some relatively hot paths.  There will be very
> few conversions from folios when this work is completed; filesystems,
> the page cache, the LRU and so on will generally only deal with folios.
> 
> I analysed the text size reduction using a config based on Oracle UEK
> with all modules changed to built-in.  That's obviously not a kernel
> which makes sense to run, but it serves to compare the effects on (many
> common) filesystems & drivers, not just the core.
> 
> add/remove: 34266/34260 grow/shrink: 5220/3206 up/down: 1083860/-1088546 (-4686)
> 
> Current tree at:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio
> 
> (contains another ~100 patches on top of this batch, not all of which are
> in good shape for submission)
> 
> v6:
>  - Rebase on next-20210330
>    - wait_bit_key patch merged by Linus
>    - wait_on_page_writeback_killable() patches merged by Linus
>    - Documentation patch merged by Andrew
>  - Move folio_next_index() into this series
>  - Move folio_offset() and folio_file_offset() into this series
>  - Mirror members of struct page (for pagecache / anon) into struct folio,
>    so (eg) you can use folio->mapping instead of folio->page.mapping
>  - Add folio_ref_* functions, including kernel-doc for folio_ref_count().
>  - Add count_memcg_folio_event()
>  - Add put_folio_testzero()
>  - Add folio_mapcount()
>  - Add FolioKsm()
>  - Fix afs_page_mkwrite() compilation
>  - Fix/improve kernel-doc for
>    - struct folio
>    - add_folio_wait_queue()
>    - wait_for_stable_folio()
>    - wait_on_folio_writeback()
>    - wait_on_folio_writeback_killable()
> v5:
>  - Rebase on next-20210319
>  - Pull out three bug-fix patches to the front of the series, allowing
>    them to be applied earlier.
>  - Fix folio_page() against pages being moved between swap & page cache
>  - Fix FolioDoubleMap to use the right page flags
>  - Rename next_folio() to folio_next() (akpm)
>  - Renamed folio stat functions (akpm)
>  - Add 'mod' versions of the folio stats for users that already have 'nr'
>  - Renamed folio_page to folio_file_page() (akpm)
>  - Added kernel-doc for struct folio, folio_next(), folio_index(),
>    folio_file_page(), folio_contains(), folio_order(), folio_nr_pages(),
>    folio_shift(), folio_size(), page_folio(), get_folio(), put_folio()
>  - Make folio_private() work in terms of void * instead of unsigned long
>  - Used page_folio() in attach/detach page_private() (hch)
>  - Drop afs_page_mkwrite folio conversion from this series
>  - Add wait_on_folio_writeback_killable()
>  - Convert add_page_wait_queue() to add_folio_wait_queue()
>  - Add folio_swap_entry() helper
>  - Drop the additions of *FolioFsCache
>  - Simplify the addition of lock_folio_memcg() et al
>  - Drop test_clear_page_writeback() conversion from this series
>  - Add FolioTransHuge() definition
>  - Rename __folio_file_mapping() to swapcache_mapping()
>  - Added swapcache_index() helper
>  - Removed lock_folio_async()
>  - Made __lock_folio_async() static to filemap.c
>  - Converted unlock_page_private_2() to use a folio internally
> v4:
>  - Rebase on current Linus tree (including swap fix)
>  - Analyse each patch in terms of its effects on kernel text size.
>    A few were modified to improve their effect.  In particular, where
>    pushing calls to page_folio() into the callers resulted in unacceptable
>    size increases, the wrapper was placed in mm/folio-compat.c.  This lets
>    us see all the places which are good targets for conversion to folios.
>  - Some of the patches were reordered, split or merged in order to make
>    more logical sense.
>  - Use nth_page() for folio_next() if we're using SPARSEMEM and not
>    VMEMMAP (Zi Yan)
>  - Increment and decrement page stats in units of pages instead of units
>    of folios (Zi Yan)
> v3:
>  - Rebase on next-20210127.  Two major sources of conflict, the
>    generic_file_buffered_read refactoring (in akpm tree) and the
>    fscache work (in dhowells tree).
> v2:
>  - Pare patch series back to just infrastructure and the page waiting
>    parts.
> 
> Matthew Wilcox (Oracle) (27):
>   mm: Introduce struct folio
>   mm: Add folio_pgdat and folio_zone
>   mm/vmstat: Add functions to account folio statistics
>   mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
>   mm: Add folio reference count functions
>   mm: Add put_folio
>   mm: Add get_folio
>   mm: Create FolioFlags
>   mm: Handle per-folio private data
>   mm/filemap: Add folio_index, folio_file_page and folio_contains
>   mm/filemap: Add folio_next_index
>   mm/filemap: Add folio_offset and folio_file_offset
>   mm/util: Add folio_mapping and folio_file_mapping
>   mm: Add folio_mapcount
>   mm/memcg: Add folio wrappers for various functions
>   mm/filemap: Add unlock_folio
>   mm/filemap: Add lock_folio
>   mm/filemap: Add lock_folio_killable
>   mm/filemap: Add __lock_folio_async
>   mm/filemap: Add __lock_folio_or_retry
>   mm/filemap: Add wait_on_folio_locked
>   mm/filemap: Add end_folio_writeback
>   mm/writeback: Add wait_on_folio_writeback
>   mm/writeback: Add wait_for_stable_folio
>   mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
>   mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
>   mm/filemap: Convert page wait queues to be folios
> 
>  Documentation/core-api/mm-api.rst |   3 +
>  fs/afs/write.c                    |   7 +-
>  fs/cachefiles/rdwr.c              |  16 +-
>  fs/io_uring.c                     |   2 +-
>  include/linux/memcontrol.h        |  30 ++++
>  include/linux/mm.h                | 177 ++++++++++++++++----
>  include/linux/mm_types.h          |  81 +++++++++
>  include/linux/mmdebug.h           |  20 +++
>  include/linux/netfs.h             |   2 +-
>  include/linux/page-flags.h        | 130 +++++++++++---
>  include/linux/page_ref.h          |  88 +++++++++-
>  include/linux/pagemap.h           | 270 ++++++++++++++++++++++--------
>  include/linux/swap.h              |   6 +
>  include/linux/vmstat.h            | 107 ++++++++++++
>  mm/Makefile                       |   2 +-
>  mm/filemap.c                      | 242 +++++++++++++-------------
>  mm/folio-compat.c                 |  37 ++++
>  mm/memory.c                       |   8 +-
>  mm/page-writeback.c               |  72 +++++---
>  mm/swapfile.c                     |   8 +-
>  mm/util.c                         |  49 ++++--
>  21 files changed, 1051 insertions(+), 306 deletions(-)
>  create mode 100644 mm/folio-compat.c
> 
> -- 
> 2.30.2
> 
> 

I too am a little concerned about the amount of churn this is likely to
cause, but this does seem like a fairly promising way forward for
actually using THPs in the pagecache. The set is fairly straightforward.

That said, there are few callers of these new functions in here. Is this
set enough to allow converting some subsystem to use folios? It might be
good to do that if possible, so we can get an idea of how much work
we're in for.

-- 
Jeff Layton <jlayton@kernel.org>




* Re: [PATCH v6 00/27] Memory Folios
  2021-04-05 19:14 ` Jeff Layton
@ 2021-04-05 19:31   ` Matthew Wilcox
  2021-04-06 15:14     ` Jeff Layton
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-05 19:31 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Mon, Apr 05, 2021 at 03:14:29PM -0400, Jeff Layton wrote:
> On Wed, 2021-03-31 at 19:47 +0100, Matthew Wilcox (Oracle) wrote:
> > Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
> > exist which show the benefits of a larger "page size".  As an example,
> > an earlier iteration of this idea which used compound pages got a 7%
> > performance boost when compiling the kernel using kernbench without any
> > particular tuning.
> > 
> > Using compound pages or THPs exposes a serious weakness in our type
> > system.  Functions are often unprepared for compound pages to be passed
> > to them, and may only act on PAGE_SIZE chunks.  Even functions which are
> > aware of compound pages may expect a head page, and do the wrong thing
> > if passed a tail page.
> > 
> > There have been efforts to label function parameters as 'head' instead
> > of 'page' to indicate that the function expects a head page, but this
> > leaves us with runtime assertions instead of using the compiler to prove
> > that nobody has mistakenly passed a tail page.  Calling a struct page
> > 'head' is also inaccurate as they will work perfectly well on base pages.
> > The term 'nottail' has not proven popular.
> > 
> > We also waste a lot of instructions ensuring that we're not looking at
> > a tail page.  Almost every call to PageFoo() contains one or more hidden
> > calls to compound_head().  This also happens for get_page(), put_page()
> > and many more functions.  There does not appear to be a way to tell gcc
> > that it can cache the result of compound_head(), nor is there a way to
> > tell it that compound_head() is idempotent.
> > 
> > This series introduces the 'struct folio' as a replacement for
> > head-or-base pages.  This initial set reduces the kernel size by
> > approximately 5kB by removing conversions from tail pages to head pages.
> > The real purpose of this series is adding infrastructure to enable
> > further use of the folio.
> > 
> > The medium-term goal is to convert all filesystems and some device
> > drivers to work in terms of folios.  This series contains a lot of
> > explicit conversions, but it's important to realise it's removing a lot
> > of implicit conversions in some relatively hot paths.  There will be very
> > few conversions from folios when this work is completed; filesystems,
> > the page cache, the LRU and so on will generally only deal with folios.
> 
> I too am a little concerned about the amount of churn this is likely to
> cause, but this does seem like a fairly promising way forward for
> actually using THPs in the pagecache. The set is fairly straightforward.
> 
> That said, there are few callers of these new functions in here. Is this
> set enough to allow converting some subsystem to use folios? It might be
> good to do that if possible, so we can get an idea of how much work
> we're in for.

It isn't enough to start converting much.  There needs to be a second set
of patches which add all the infrastructure for converting a filesystem.
Then we can start working on the filesystems.  I have a start at that
here:

https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio

I don't know if it's exactly how I'll arrange it for submission.  It might
be better to convert all the filesystem implementations of readpage
to work on a folio, and then the big bang conversion of ->readpage to
->read_folio will look much more mechanical.

But if I can't convince people that a folio approach is what we need,
then I should stop working on it, and go back to fixing the endless
stream of bugs that the thp-based approach surfaces.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
@ 2021-04-06 12:29   ` Kirill A. Shutemov
  2021-04-06 12:48     ` Matthew Wilcox
  2021-04-08  9:01   ` Rasmus Villemoes
  1 sibling, 1 reply; 76+ messages in thread
From: Kirill A. Shutemov @ 2021-04-06 12:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:02PM +0100, Matthew Wilcox (Oracle) wrote:
> +/**
> + * folio_next - Move to the next physical folio.
> + * @folio: The folio we're currently operating on.
> + *
> + * If you have physically contiguous memory which may span more than
> + * one folio (eg a &struct bio_vec), use this function to move from one
> + * folio to the next.  Do not use it if the memory is only virtually
> + * contiguous as the folios are almost certainly not adjacent to each
> + * other.  This is the folio equivalent to writing ``page++``.
> + *
> + * Context: We assume that the folios are refcounted and/or locked at a
> + * higher level and do not adjust the reference counts.
> + * Return: The next struct folio.
> + */
> +static inline struct folio *folio_next(struct folio *folio)
> +{
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +	return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> +#else
> +	return folio + folio_nr_pages(folio);
> +#endif

Do we really need the #if here?

From a quick look at nth_page() and memory_model.h, the compiler should be
able to simplify the calculation for FLATMEM or SPARSEMEM_VMEMMAP to what
you do in the #else. No?

> @@ -224,6 +224,71 @@ struct page {
>  #endif
>  } _struct_page_alignment;
>  
> +/**
> + * struct folio - Represents a contiguous set of bytes.
> + * @flags: Identical to the page flags.
> + * @lru: Least Recently Used list; tracks how recently this folio was used.
> + * @mapping: The file this page belongs to, or refers to the anon_vma for
> + *    anonymous pages.
> + * @index: Offset within the file, in units of pages.  For anonymous pages,
> + *    this is the index from the beginning of the mmap.
> + * @private: Filesystem per-folio data (see attach_folio_private()).
> + *    Used for swp_entry_t if FolioSwapCache().
> + * @_mapcount: How many times this folio is mapped to userspace.  Use
> + *    folio_mapcount() to access it.
> + * @_refcount: Number of references to this folio.  Use folio_ref_count()
> + *    to read it.
> + * @memcg_data: Memory Control Group data.
> + *
> + * A folio is a physically, virtually and logically contiguous set
> + * of bytes.  It is a power-of-two in size, and it is aligned to that
> + * same power-of-two.  It is at least as large as %PAGE_SIZE.  If it is
> + * in the page cache, it is at a file offset which is a multiple of that
> + * power-of-two.
> + */
> +struct folio {
> +	/* private: don't document the anon union */
> +	union {
> +		struct {
> +	/* public: */
> +			unsigned long flags;
> +			struct list_head lru;
> +			struct address_space *mapping;
> +			pgoff_t index;
> +			unsigned long private;
> +			atomic_t _mapcount;
> +			atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> +			unsigned long memcg_data;
> +#endif

As Christoph, I'm not a fan of this :/

> +	/* private: the union with struct page is transitional */
> +		};
> +		struct page page;
> +	};
> +};

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 12:29   ` Kirill A. Shutemov
@ 2021-04-06 12:48     ` Matthew Wilcox
  2021-04-06 14:21       ` Kirill A. Shutemov
  2021-04-06 14:31       ` Christoph Hellwig
  0 siblings, 2 replies; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-06 12:48 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 03:29:18PM +0300, Kirill A. Shutemov wrote:
> On Wed, Mar 31, 2021 at 07:47:02PM +0100, Matthew Wilcox (Oracle) wrote:
> > +/**
> > + * folio_next - Move to the next physical folio.
> > + * @folio: The folio we're currently operating on.
> > + *
> > + * If you have physically contiguous memory which may span more than
> > + * one folio (eg a &struct bio_vec), use this function to move from one
> > + * folio to the next.  Do not use it if the memory is only virtually
> > + * contiguous as the folios are almost certainly not adjacent to each
> > + * other.  This is the folio equivalent to writing ``page++``.
> > + *
> > + * Context: We assume that the folios are refcounted and/or locked at a
> > + * higher level and do not adjust the reference counts.
> > + * Return: The next struct folio.
> > + */
> > +static inline struct folio *folio_next(struct folio *folio)
> > +{
> > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > +	return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> > +#else
> > +	return folio + folio_nr_pages(folio);
> > +#endif
> 
> Do we really need the #if here?
> 
> From a quick look at nth_page() and memory_model.h, the compiler should be
> able to simplify the calculation for FLATMEM or SPARSEMEM_VMEMMAP to what
> you do in the #else. No?

No.

0000000000001180 <a>:
struct page *a(struct page *p, unsigned long n)
{
    1180:       e8 00 00 00 00          callq  1185 <a+0x5>
                        1181: R_X86_64_PLT32    __fentry__-0x4
    1185:       55                      push   %rbp
        return nth_page(p, n);
    1186:       48 2b 3d 00 00 00 00    sub    0x0(%rip),%rdi
                        1189: R_X86_64_PC32     vmemmap_base-0x4
    118d:       48 c1 ff 06             sar    $0x6,%rdi
    1191:       48 8d 04 37             lea    (%rdi,%rsi,1),%rax
    1195:       48 89 e5                mov    %rsp,%rbp
        return nth_page(p, n);
    1198:       48 c1 e0 06             shl    $0x6,%rax
    119c:       48 03 05 00 00 00 00    add    0x0(%rip),%rax
                        119f: R_X86_64_PC32     vmemmap_base-0x4
    11a3:       5d                      pop    %rbp
    11a4:       c3                      retq   

vs

00000000000011b0 <b>:

struct page *b(struct page *p, unsigned long n)
{
    11b0:       e8 00 00 00 00          callq  11b5 <b+0x5>
                        11b1: R_X86_64_PLT32    __fentry__-0x4
    11b5:       55                      push   %rbp
        return p + n;
    11b6:       48 c1 e6 06             shl    $0x6,%rsi
    11ba:       48 8d 04 37             lea    (%rdi,%rsi,1),%rax
    11be:       48 89 e5                mov    %rsp,%rbp
    11c1:       5d                      pop    %rbp
    11c2:       c3                      retq   

Now, maybe we should put this optimisation into the definition of nth_page?
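
Something like this, perhaps (a sketch only, not a tested patch):

	/* If struct pages are virtually contiguous (FLATMEM or
	 * SPARSEMEM_VMEMMAP), nth_page() can degrade to plain pointer
	 * arithmetic, and callers like folio_next() lose their #if. */
	#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
	#define nth_page(page, n)	pfn_to_page(page_to_pfn(page) + (n))
	#else
	#define nth_page(page, n)	((page) + (n))
	#endif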

> > +struct folio {
> > +	/* private: don't document the anon union */
> > +	union {
> > +		struct {
> > +	/* public: */
> > +			unsigned long flags;
> > +			struct list_head lru;
> > +			struct address_space *mapping;
> > +			pgoff_t index;
> > +			unsigned long private;
> > +			atomic_t _mapcount;
> > +			atomic_t _refcount;
> > +#ifdef CONFIG_MEMCG
> > +			unsigned long memcg_data;
> > +#endif
> 
> As Christoph, I'm not a fan of this :/

What would you prefer?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone
  2021-03-31 18:47 ` [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
@ 2021-04-06 13:23   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

On Wed, Mar 31, 2021 at 07:47:03PM +0100, Matthew Wilcox (Oracle) wrote:
> These are just convenience wrappers for callers with folios; pgdat and
> zone can be reached from tail pages as well as head pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics
  2021-03-31 18:47 ` [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
@ 2021-04-06 13:25   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:04PM +0100, Matthew Wilcox (Oracle) wrote:
> Allow page counters to be more readily modified by callers which have
> a folio.  Name these wrappers with 'stat' instead of 'state' as requested
> by Linus here:

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
  2021-03-31 18:47 ` [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
@ 2021-04-06 13:26   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

On Wed, Mar 31, 2021 at 07:47:05PM +0100, Matthew Wilcox (Oracle) wrote:
> These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/27] mm: Add folio reference count functions
  2021-03-31 18:47 ` [PATCH v6 05/27] mm: Add folio reference count functions Matthew Wilcox (Oracle)
@ 2021-04-06 13:30   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:06PM +0100, Matthew Wilcox (Oracle) wrote:
> These functions mirror their page reference counterparts.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/core-api/mm-api.rst |  1 +
>  include/linux/page_ref.h          | 88 ++++++++++++++++++++++++++++++-
>  2 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
> index 34f46df91a8b..1ead2570b217 100644
> --- a/Documentation/core-api/mm-api.rst
> +++ b/Documentation/core-api/mm-api.rst
> @@ -97,3 +97,4 @@ More Memory Management Functions
>     :internal:
>  .. kernel-doc:: include/linux/mm.h
>     :internal:
> +.. kernel-doc:: include/linux/page_ref.h
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index f3318f34fc54..f27005e760fd 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -69,7 +69,29 @@ static inline int page_ref_count(struct page *page)
>  
>  static inline int page_count(struct page *page)
>  {
> -	return atomic_read(&compound_head(page)->_refcount);
> +	return page_ref_count(compound_head(page));
> +}

I don't think this change belongs in here.  It seems useful though,
so maybe split it into a standalone patch?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 06/27] mm: Add put_folio
  2021-03-31 18:47 ` [PATCH v6 06/27] mm: Add put_folio Matthew Wilcox (Oracle)
@ 2021-04-06 13:31   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 07/27] mm: Add get_folio
  2021-03-31 18:47 ` [PATCH v6 07/27] mm: Add get_folio Matthew Wilcox (Oracle)
@ 2021-04-06 13:32   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs, Zi Yan

On Wed, Mar 31, 2021 at 07:47:08PM +0100, Matthew Wilcox (Oracle) wrote:
> If we know we have a folio, we can call get_folio() instead
> of get_page() and save the overhead of calling compound_head().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 08/27] mm: Create FolioFlags
  2021-03-31 18:47 ` [PATCH v6 08/27] mm: Create FolioFlags Matthew Wilcox (Oracle)
@ 2021-04-06 13:34   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:09PM +0100, Matthew Wilcox (Oracle) wrote:
> These new functions are the folio analogues of the PageFlags functions.
> If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio is not a tail
> page at every invocation.  Note that this will also catch the PagePoisoned
> case as a poisoned page has every bit set, which would include PageTail.
> 
> This saves 1727 bytes of text with the distro-derived config that
> I'm testing due to removing a double call to compound_head() in
> PageSwapCache().
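
(For reference, a simplified sketch of the test half of the pattern; the
macro name is illustrative, and the real definitions also provide the
set/clear variants plus the CONFIG_DEBUG_VM_PGFLAGS tail-page check
described above:)

	#define TESTFOLIOFLAG(uname, lname)				\
	static __always_inline bool Folio##uname(struct folio *folio)	\
	{								\
		/* a folio is never a tail page: no compound_head() */	\
		return test_bit(PG_##lname, &folio->flags);		\
	}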

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 09/27] mm: Handle per-folio private data
  2021-03-31 18:47 ` [PATCH v6 09/27] mm: Handle per-folio private data Matthew Wilcox (Oracle)
@ 2021-04-06 13:37   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:10PM +0100, Matthew Wilcox (Oracle) wrote:
> Add folio_private() and set_folio_private() which mirror page_private()
> and set_page_private() -- ie folio private data is the same as page
> private data.  The only difference is that these return a void *
> instead of an unsigned long, which matches the majority of users.
> 
> Turn attach_page_private() into attach_folio_private() and reimplement
> attach_page_private() as a wrapper.  No filesystem which uses page private
> data currently supports compound pages, so we're free to define the rules.
> attach_page_private() may only be called on a head page; if you want
> to add private data to a tail page, you can call set_page_private()
> directly (and shouldn't increment the page refcount!  That should be
> done when adding private data to the head page / folio).
> 
> This saves 597 bytes of text with the distro-derived config that I'm
> testing due to removing the calls to compound_head() in get_page()
> & put_page().
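
(For concreteness, a sketch of how a filesystem might use the rule
described above; fs_state is a made-up type, and a matching
detach_folio_private() counterpart is assumed:)

	/* The folio reference is taken and dropped exactly once, for
	 * the head, when private data is attached and detached. */
	static void fs_attach_state(struct folio *folio, struct fs_state *state)
	{
		attach_folio_private(folio, state);	/* takes a folio ref */
	}

	static struct fs_state *fs_detach_state(struct folio *folio)
	{
		return detach_folio_private(folio);	/* drops the ref */
	}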

Except that this seems to be the first patch that uses a field in the
non-struct-page leg of the union in struct folio, which could be trivially
avoided, this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains
  2021-03-31 18:47 ` [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
@ 2021-04-06 13:39   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Except for the implementation details using the union's fields (I'm not
going to mention these going forward), this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 11/27] mm/filemap: Add folio_next_index
  2021-03-31 18:47 ` [PATCH v6 11/27] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
@ 2021-04-06 13:40   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:12PM +0100, Matthew Wilcox (Oracle) wrote:
> This helper returns the page index of the next folio in the file (ie
> the end of this folio, plus one).

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset
  2021-03-31 18:47 ` [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
@ 2021-04-06 13:42   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:13PM +0100, Matthew Wilcox (Oracle) wrote:
> These are just wrappers around their page counterpart.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping
  2021-03-31 18:47 ` [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
@ 2021-04-06 13:45   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:45 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 14/27] mm: Add folio_mapcount
  2021-03-31 18:47 ` [PATCH v6 14/27] mm: Add folio_mapcount Matthew Wilcox (Oracle)
@ 2021-04-06 13:46   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:15PM +0100, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapcount().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions
  2021-03-31 18:47 ` [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
@ 2021-04-06 13:48   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:16PM +0100, Matthew Wilcox (Oracle) wrote:
> Add new wrapper functions folio_memcg(), lock_folio_memcg(),
> unlock_folio_memcg(), mem_cgroup_folio_lruvec() and
> count_memcg_folio_event()

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 16/27] mm/filemap: Add unlock_folio
  2021-03-31 18:47 ` [PATCH v6 16/27] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
@ 2021-04-06 13:51   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:51 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:17PM +0100, Matthew Wilcox (Oracle) wrote:
> Convert unlock_page() to call unlock_folio().  By using a folio we
> avoid a call to compound_head().  This shortens the function from 39
> bytes to 25 and removes 4 instructions on x86-64.  Because we still
> have unlock_page(), it's a net increase of 24 bytes of text for the
> kernel as a whole, but any path that uses unlock_folio() will execute
> 4 fewer instructions.
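
(Roughly the compatibility shape being described -- the wrapper lives in
the new mm/folio-compat.c from the diffstat:)

	void unlock_page(struct page *page)
	{
		return unlock_folio(page_folio(page));
	}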

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 17/27] mm/filemap: Add lock_folio
  2021-03-31 18:47 ` [PATCH v6 17/27] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
@ 2021-04-06 13:52   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:52 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 18/27] mm/filemap: Add lock_folio_killable
  2021-03-31 18:47 ` [PATCH v6 18/27] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
@ 2021-04-06 13:53   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 19/27] mm/filemap: Add __lock_folio_async
  2021-03-31 18:47 ` [PATCH v6 19/27] mm/filemap: Add __lock_folio_async Matthew Wilcox (Oracle)
@ 2021-04-06 13:55   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry
  2021-03-31 18:47 ` [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
@ 2021-04-06 13:57   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 13:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:21PM +0100, Matthew Wilcox (Oracle) wrote:
> Convert __lock_page_or_retry() to __lock_folio_or_retry().  This actually
> saves 4 bytes in the only caller of lock_page_or_retry() (due to better
> register allocation) and saves the 20 byte cost of calling page_folio()
> in __lock_folio_or_retry() for a total saving of 24 bytes.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked
  2021-03-31 18:47 ` [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked Matthew Wilcox (Oracle)
@ 2021-04-06 14:11   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:22PM +0100, Matthew Wilcox (Oracle) wrote:
> Also add wait_on_folio_locked_killable().  Turn wait_on_page_locked()
> and wait_on_page_locked_killable() into wrappers.  This eliminates a
> call to compound_head() from each call-site, reducing text size by 200
> bytes for me.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 22/27] mm/filemap: Add end_folio_writeback
  2021-03-31 18:47 ` [PATCH v6 22/27] mm/filemap: Add end_folio_writeback Matthew Wilcox (Oracle)
@ 2021-04-06 14:13   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:13 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:23PM +0100, Matthew Wilcox (Oracle) wrote:
> Add an end_page_writeback() wrapper function for users that are not yet
> converted to folios.
> 
> end_folio_writeback() is less than half the size of end_page_writeback()
> at just 105 bytes compared to 213 bytes, due to removing all the
> compound_head() calls.  The 30 byte wrapper function makes this a net
> saving of 70 bytes.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback
  2021-03-31 18:47 ` [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
@ 2021-04-06 14:15   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:24PM +0100, Matthew Wilcox (Oracle) wrote:
> wait_on_page_writeback_killable() only has one caller, so convert it to
> call wait_on_folio_writeback_killable().  For the wait_on_page_writeback()
> callers, add a compatibility wrapper around wait_on_folio_writeback().
> 
> Turning PageWriteback() into FolioWriteback() eliminates a call to
> compound_head() which saves 8 bytes and 15 bytes in the two functions.
> That is more than offset by adding the wait_on_page_writeback
> compatibility wrapper for a net increase in text of 15 bytes.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio
  2021-03-31 18:47 ` [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
@ 2021-04-06 14:18   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:25PM +0100, Matthew Wilcox (Oracle) wrote:
> Move wait_for_stable_page() into the folio compatibility file.
> wait_for_stable_folio() avoids a call to compound_head() and is 14 bytes
> smaller than wait_for_stable_page() was.  The net text size grows by 24
> bytes as a result of this patch.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
  2021-03-31 18:47 ` [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
@ 2021-04-06 14:19   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:19 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 12:48     ` Matthew Wilcox
@ 2021-04-06 14:21       ` Kirill A. Shutemov
  2021-04-06 14:31       ` Christoph Hellwig
  1 sibling, 0 replies; 76+ messages in thread
From: Kirill A. Shutemov @ 2021-04-06 14:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 01:48:07PM +0100, Matthew Wilcox wrote:
> Now, maybe we should put this optimisation into the definition of nth_page?

Sounds like a good idea to me.

> > > +struct folio {
> > > +	/* private: don't document the anon union */
> > > +	union {
> > > +		struct {
> > > +	/* public: */
> > > +			unsigned long flags;
> > > +			struct list_head lru;
> > > +			struct address_space *mapping;
> > > +			pgoff_t index;
> > > +			unsigned long private;
> > > +			atomic_t _mapcount;
> > > +			atomic_t _refcount;
> > > +#ifdef CONFIG_MEMCG
> > > +			unsigned long memcg_data;
> > > +#endif
> > 
> > As Christoph, I'm not a fan of this :/
> 
> What would you prefer?

I liked the earlier approach with only struct page here. Once we know a
field should never be referenced from a raw struct page, we can move it
here.

But feel free to ignore my suggestion. It's not a show-stopper for me,
and reverting it back isn't worth it.

I went through the patchset and it looks good. You can use my

  Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

on all of them.

Thanks a lot for doing this.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
  2021-03-31 18:47 ` [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
@ 2021-04-06 14:23   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:27PM +0100, Matthew Wilcox (Oracle) wrote:
>  void unlock_page_private_2(struct page *page)
>  {
> -	page = compound_head(page);
> -	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
> -	clear_bit_unlock(PG_private_2, &page->flags);
> -	wake_up_page_bit(page, PG_private_2);
> +	struct folio *folio = page_folio(page);
> +	VM_BUG_ON_FOLIO(!FolioPrivate2(folio), folio);

A blank line between the declaration and the code would be nice.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios
  2021-03-31 18:47 ` [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
@ 2021-04-06 14:25   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Wed, Mar 31, 2021 at 07:47:28PM +0100, Matthew Wilcox (Oracle) wrote:
> Reinforce that if we're waiting for a bit in a struct page, that's
> actually in the head page by changing the type from page to folio.
> Increases the size of cachefiles by two bytes, but the kernel core
> is unchanged in size.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 12:48     ` Matthew Wilcox
  2021-04-06 14:21       ` Kirill A. Shutemov
@ 2021-04-06 14:31       ` Christoph Hellwig
  2021-04-06 14:40         ` Matthew Wilcox
  1 sibling, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 01:48:07PM +0100, Matthew Wilcox wrote:
> Now, maybe we should put this optimisation into the definition of nth_page?

That would be nice.

> > As Christoph, I'm not a fan of this :/
> 
> What would you prefer?

Looking at your full folio series on git.infradead.org, there are a
total of 12 references to non-page members of struct folio, assuming
my crude grep that expects a folio to be named folio did not miss any.

Except for one that prints folio->flags in cachefiles code, and which
should go away, they are all in core MM code in mm/ or include/.  With
enough file system conversions I do see potential uses for ->mapping
and ->index outside of core code, but IMHO we can ignore those for now
and just switch them over if/when we actually change the struct folio
internals to split them from tail pages.

So my opinion is: leave these fields out for now, and when the problem
of having a lot of references outside of core code arises, deal with it
once we know its scope.  Maybe we add wrappers for the few members
that are reasonably "public", maybe we then do the union trick you
have here because it is the least evil, or maybe we do not do anything
at all until these fields move over to the folio entirely.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 14:31       ` Christoph Hellwig
@ 2021-04-06 14:40         ` Matthew Wilcox
  2021-04-06 14:47           ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-06 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 03:31:50PM +0100, Christoph Hellwig wrote:
> > > As Christoph, I'm not a fan of this :/
> > 
> > What would you prefer?
> 
> Looking at your full folio series on git.infradead.org, there are a
> total of 12 references to non-page members of struct folio, assuming
> my crude grep that expects a folio to be named folio did not miss any.

Hmm ... I count more in the filesystems:

fs/afs/dir.c:   struct afs_vnode *dvnode = AFS_FS_I(folio->page.mapping->host);
fs/afs/dir.c:   _enter("{%lu},%zu,%zu", folio->page.index, offset, length);
fs/afs/file.c:  _enter("{%lu},%zu,%zu", folio->page.index, offset, length);
fs/afs/write.c:         folio->page.index);
fs/befs/linuxvfs.c:     struct inode *inode = folio->page.mapping->host;
fs/btrfs/disk-io.c:     tree = &BTRFS_I(folio->page.mapping->host)->io_tree;
fs/btrfs/disk-io.c:             btrfs_warn(BTRFS_I(folio->page.mapping->host)->root->fs_info,
fs/btrfs/extent_io.c:   struct btrfs_inode *inode = BTRFS_I(folios[0]->page.mapping->host);
fs/btrfs/file.c:                if (folio->page.mapping != inode->i_mapping) {
fs/btrfs/free-space-cache.c:                    if (folio->page.mapping != inode->i_mapping) {
fs/btrfs/inode.c:               if (folio->page.mapping != mapping) {
fs/btrfs/inode.c:       struct btrfs_inode *inode = BTRFS_I(folio->page.mapping->host);
fs/buffer.c:    spin_lock(&folio->page.mapping->private_lock);
fs/buffer.c:    spin_unlock(&folio->page.mapping->private_lock);
fs/buffer.c:    block_in_file = (sector_t)folio->page.index <<
fs/ceph/addr.c:              mapping->host, folio, folio->page.index);
fs/ceph/addr.c:      mapping->host, folio, folio->page.index,
fs/ceph/addr.c: folio->page.private = (unsigned long)snapc;
fs/ceph/addr.c: inode = folio->page.mapping->host;
fs/ceph/addr.c:              inode, folio, folio->page.index, offset, length);
fs/ceph/addr.c:      inode, folio, folio->page.index);
fs/cifs/file.c: struct cifsInodeInfo *cifsi = CIFS_I(folio->page.mapping->host);
fs/ext4/inode.c:        struct inode *inode = folio->page.mapping->host;
fs/f2fs/data.c: struct inode *inode = folio->page.mapping->host;
fs/fuse/dir.c:  int err = fuse_readlink_page(folio->page.mapping->host, &folio->page);
fs/gfs2/aops.c: struct gfs2_sbd *sdp = GFS2_SB(folio->page.mapping->host);
fs/iomap/buffered-io.c: unsigned int nr_blocks = i_blocks_per_folio(folio->page.mapping->host,
fs/iomap/buffered-io.c: struct inode *inode = folio->page.mapping->host;
fs/iomap/buffered-io.c: BUG_ON(folio->page.index);
fs/iomap/buffered-io.c:         gfp_t gfp = mapping_gfp_constraint(folio->page.mapping,
fs/iomap/buffered-io.c: struct inode *inode = folio->page.mapping->host;
fs/iomap/buffered-io.c: struct inode *inode = folio->page.mapping->host;
fs/iomap/buffered-io.c: trace_iomap_releasepage(folio->page.mapping->host, folio_offset(folio),
fs/iomap/buffered-io.c: trace_iomap_invalidatepage(folio->page.mapping->host, offset, len);
fs/jffs2/file.c:        struct inode *inode = folio->page.mapping->host;
fs/mpage.c:     struct inode *inode = folio->page.mapping->host;
fs/mpage.c:             gfp = readahead_gfp_mask(folio->page.mapping);
fs/mpage.c:             gfp = mapping_gfp_constraint(folio->page.mapping, GFP_KERNEL);
fs/mpage.c:     block_in_file = (sector_t)folio->page.index << (PAGE_SHIFT - blkbits);
fs/mpage.c:             prefetchw(&folio->page.flags);
fs/nfs/file.c:  nfs_fscache_invalidate_page(&folio->page, folio->page.mapping->host);
fs/nfs/fscache.c:                nfs_i_fscache(inode), folio, folio->page.index,
fs/nfs/fscache.c:                folio->page.flags, inode);
fs/reiserfs/inode.c:    struct inode *inode = folio->page.mapping->host;
fs/remap_range.c:       if (folio1->page.index > folio2->page.index)
fs/ubifs/file.c:        struct inode *inode = folio->page.mapping->host;
fs/xfs/xfs_aops.c:      struct inode            *inode = folio->page.mapping->host;

(I haven't yet gone through my whole series to do the conversion from
folio->page.x to folio->x.)



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 14:40         ` Matthew Wilcox
@ 2021-04-06 14:47           ` Christoph Hellwig
  2021-04-06 14:55             ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 14:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Kirill A. Shutemov, linux-mm, linux-kernel,
	linux-fsdevel, linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 03:40:22PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 06, 2021 at 03:31:50PM +0100, Christoph Hellwig wrote:
> > > > As Christoph, I'm not a fan of this :/
> > > 
> > > What would you prefer?
> > 
> > Looking at your full folio series on git.infradead.org, there are a
> > total of 12 references to non-page members of struct folio, assuming
> > my crude grep that expects a folio to be named folio did not miss any.
> 
> Hmm ... I count more in the filesystems:

I only counted the ones that you actually did convert.

This adds about 80 more.  IMHO it is still not worth doing the union.  I'd
rather sort this out properly if/when the structures get properly split.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 14:47           ` Christoph Hellwig
@ 2021-04-06 14:55             ` Matthew Wilcox
  2021-04-06 15:05               ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-06 14:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 03:47:12PM +0100, Christoph Hellwig wrote:
> On Tue, Apr 06, 2021 at 03:40:22PM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 06, 2021 at 03:31:50PM +0100, Christoph Hellwig wrote:
> > > > > As Christoph, I'm not a fan of this :/
> > > > 
> > > > What would you prefer?
> > > 
> > > Looking at your full folio series on git.infradead.org, there are a
> > > total of 12 references to non-page members of struct folio, assuming
> > > my crude grep that expects a folio to be named folio did not miss any.
> > 
> > Hmm ... I count more in the filesystems:
> 
> I only counted the ones that you actually did convert.
> 
> This adds about 80 more.  IMHO it is still not worth doing the union.  I'd
> rather sort this out properly if/when the structures get properly split.

Assuming we're getting rid of them all though, we have to include:

$ git grep 'page->mapping' fs |wc -l
358
$ git grep 'page->index' fs |wc -l
355



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 14:55             ` Matthew Wilcox
@ 2021-04-06 15:05               ` Christoph Hellwig
  2021-04-06 16:25                 ` Matthew Wilcox
  0 siblings, 1 reply; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-06 15:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Kirill A. Shutemov, linux-mm, linux-kernel,
	linux-fsdevel, linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 03:55:11PM +0100, Matthew Wilcox wrote:
> Assuming we're getting rid of them all though, we have to include:
> 
> $ git grep 'page->mapping' fs |wc -l
> 358
> $ git grep 'page->index' fs |wc -l
> 355

Are they all going to stay, or are we going to clean up some of that
mess?  A lot of ->index uses should be page_offset(), and on the mapping
side the page_mapping() and page_file_mapping() mess is also waiting to
be sorted out.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/27] Memory Folios
  2021-04-05 19:31   ` Matthew Wilcox
@ 2021-04-06 15:14     ` Jeff Layton
  0 siblings, 0 replies; 76+ messages in thread
From: Jeff Layton @ 2021-04-06 15:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On Mon, 2021-04-05 at 20:31 +0100, Matthew Wilcox wrote:
> On Mon, Apr 05, 2021 at 03:14:29PM -0400, Jeff Layton wrote:
> > On Wed, 2021-03-31 at 19:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
> > > exist which show the benefits of a larger "page size".  As an example,
> > > an earlier iteration of this idea which used compound pages got a 7%
> > > performance boost when compiling the kernel using kernbench without any
> > > particular tuning.
> > > 
> > > Using compound pages or THPs exposes a serious weakness in our type
> > > system.  Functions are often unprepared for compound pages to be passed
> > > to them, and may only act on PAGE_SIZE chunks.  Even functions which are
> > > aware of compound pages may expect a head page, and do the wrong thing
> > > if passed a tail page.
> > > 
> > > There have been efforts to label function parameters as 'head' instead
> > > of 'page' to indicate that the function expects a head page, but this
> > > leaves us with runtime assertions instead of using the compiler to prove
> > > that nobody has mistakenly passed a tail page.  Calling a struct page
> > > 'head' is also inaccurate as they will work perfectly well on base pages.
> > > The term 'nottail' has not proven popular.
> > > 
> > > We also waste a lot of instructions ensuring that we're not looking at
> > > a tail page.  Almost every call to PageFoo() contains one or more hidden
> > > calls to compound_head().  This also happens for get_page(), put_page()
> > > and many more functions.  There does not appear to be a way to tell gcc
> > > that it can cache the result of compound_head(), nor is there a way to
> > > tell it that compound_head() is idempotent.
> > > 
> > > This series introduces the 'struct folio' as a replacement for
> > > head-or-base pages.  This initial set reduces the kernel size by
> > > approximately 5kB by removing conversions from tail pages to head pages.
> > > The real purpose of this series is adding infrastructure to enable
> > > further use of the folio.
> > > 
> > > The medium-term goal is to convert all filesystems and some device
> > > drivers to work in terms of folios.  This series contains a lot of
> > > explicit conversions, but it's important to realise it's removing a lot
> > > of implicit conversions in some relatively hot paths.  There will be very
> > > few conversions from folios when this work is completed; filesystems,
> > > the page cache, the LRU and so on will generally only deal with folios.
> > 
> > I too am a little concerned about the amount of churn this is likely to
> > cause, but this does seem like a fairly promising way forward for
> > actually using THPs in the pagecache. The set is fairly straightforward.
> > 
> > That said, there are few callers of these new functions in here. Is this
> > set enough to allow converting some subsystem to use folios? It might be
> > good to do that if possible, so we can get an idea of how much work
> > we're in for.
> 
> It isn't enough to start converting much.  There needs to be a second set
> of patches which add all the infrastructure for converting a filesystem.
> Then we can start working on the filesystems.  I have a start at that
> here:
> 
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio
> 
> I don't know if it's exactly how I'll arrange it for submission.  It might
> be better to convert all the filesystem implementations of readpage
> to work on a folio, and then the big bang conversion of ->readpage to
> ->read_folio will look much more mechanical.
> 
> But if I can't convince people that a folio approach is what we need,
> then I should stop working on it, and go back to fixing the endless
> stream of bugs that the thp-based approach surfaces.

Fair enough. I generally prefer to see some callers added at the same
time as new functions, but I understand that the scale of this patchset
makes that difficult. You can add this to the whole series. I don't see
any major show-stoppers here:

Acked-by: Jeff Layton <jlayton@kernel.org>



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 15:05               ` Christoph Hellwig
@ 2021-04-06 16:25                 ` Matthew Wilcox
  2021-04-07  6:09                   ` Christoph Hellwig
  0 siblings, 1 reply; 76+ messages in thread
From: Matthew Wilcox @ 2021-04-06 16:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, linux-fsdevel,
	linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 04:05:50PM +0100, Christoph Hellwig wrote:
> On Tue, Apr 06, 2021 at 03:55:11PM +0100, Matthew Wilcox wrote:
> > Assuming we're getting rid of them all though, we have to include:
> > 
> > $ git grep 'page->mapping' fs |wc -l
> > 358
> > $ git grep 'page->index' fs |wc -l
> > 355
> 
> Are they all going to stay, or are we going to clean up some of that
> mess?  A lot of ->index uses should be page_offset(), and on the mapping
> side the page_mapping() and page_file_mapping() mess is also waiting to
> be sorted out.

About a third of ->index can be folio_offset(), based on a crude:

$ git grep 'page->index.*PAGE_' |wc -l
101

and I absolutely don't mind cleaning that up as part of the folio work,
but that still leaves 200-250 instances that would need to be changed
later.

I don't want to change the page->mapping to calls to folio_mapping().
That's a lot of extra work for a page which the filesystem knows belongs
to it.  folio_mapping() only needs to be used for pages which might not
belong to a filesystem.

page_file_mapping() absolutely needs to go away.  The way to do that
is to change swap-over-nfs to use direct IO, and then NFS can use
folio->mapping like all other filesystems.  f2fs is just terminally
confused and shouldn't be using page_file_mapping at all.  I'll fix
that as part of the folio work.
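
To make the split concrete, the checked helper amounts to roughly this
(a sketch; FolioSlab()/FolioSwapCache() follow this series's naming):

	struct address_space *folio_mapping(struct folio *folio)
	{
		struct address_space *mapping;

		/* Slab reuses ->mapping, so it never points at a file. */
		if (unlikely(FolioSlab(folio)))
			return NULL;

		/* Swapcache folios keep a swp_entry_t in ->private. */
		if (unlikely(FolioSwapCache(folio))) {
			swp_entry_t entry = { .val = folio->private };
			return swap_address_space(entry);
		}

		/* Anon folios point at an anon_vma, not a file. */
		mapping = folio->mapping;
		if ((unsigned long)mapping & PAGE_MAPPING_ANON)
			return NULL;

		return mapping;
	}

A filesystem that already knows the folio is its own can dereference
folio->mapping directly and skip all of that.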


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-04-06 16:25                 ` Matthew Wilcox
@ 2021-04-07  6:09                   ` Christoph Hellwig
  0 siblings, 0 replies; 76+ messages in thread
From: Christoph Hellwig @ 2021-04-07  6:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Kirill A. Shutemov, linux-mm, linux-kernel,
	linux-fsdevel, linux-cachefs, linux-afs

On Tue, Apr 06, 2021 at 05:25:30PM +0100, Matthew Wilcox wrote:
> About a third of ->index can be folio_offset(), based on a crude:
> 
> $ git grep 'page->index.*PAGE_' |wc -l
> 101
> 
> and I absolutely don't mind cleaning that up as part of the folio work,
> but that still leaves 200-250 instances that would need to be changed
> later.
> 
> I don't want to change the page->mapping to calls to folio_mapping().
> That's a lot of extra work for a page which the filesystem knows belongs
> to it.  folio_mapping() only needs to be used for pages which might not
> belong to a filesystem.
> 
> page_file_mapping() absolutely needs to go away.  The way to do that
> is to change swap-over-nfs to use direct IO, and then NFS can use
> folio->mapping like all other filesystems.  f2fs is just terminally
> confused and shouldn't be using page_file_mapping at all.  I'll fix
> that as part of the folio work.

Thanks.  So my opinion for now remains: preferably just don't add
the union, and dereference through the page.  But I'm not going to block
the series over it, as I think it is a huge and badly needed cleanup
required to make further use of larger pages / large chunks of memory
in the pagecache and the file systems.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/27] mm: Introduce struct folio
  2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
  2021-04-06 12:29   ` Kirill A. Shutemov
@ 2021-04-08  9:01   ` Rasmus Villemoes
  1 sibling, 0 replies; 76+ messages in thread
From: Rasmus Villemoes @ 2021-04-08  9:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: linux-kernel, linux-fsdevel, linux-cachefs, linux-afs

On 31/03/2021 20.47, Matthew Wilcox (Oracle) wrote:

> +static inline void folio_build_bug(void)
> +{
> +#define FOLIO_MATCH(pg, fl)						\
> +BUILD_BUG_ON(offsetof(struct page, pg) != offsetof(struct folio, fl));
> +
> +	FOLIO_MATCH(flags, flags);
> +	FOLIO_MATCH(lru, lru);
> +	FOLIO_MATCH(mapping, mapping);
> +	FOLIO_MATCH(index, index);
> +	FOLIO_MATCH(private, private);
> +	FOLIO_MATCH(_mapcount, _mapcount);
> +	FOLIO_MATCH(_refcount, _refcount);
> +#ifdef CONFIG_MEMCG
> +	FOLIO_MATCH(memcg_data, memcg_data);
> +#endif
> +#undef FOLIO_MATCH
> +	BUILD_BUG_ON(sizeof(struct page) != sizeof(struct folio));
> +}
> +

Perhaps do this next to the definition of struct folio instead of hiding
it in some arbitrary TU - hint, we have static_assert(), which doesn't
need to be in function context. And consider amending FOLIO_MATCH with a
static_assert(__same_type(typeof_member(...), typeof_member(...))).
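
Roughly like this, next to the struct definition (a sketch of that
suggestion, using the existing static_assert(), typeof_member() and
__same_type() helpers):

	#define FOLIO_MATCH(pg, fl)						\
		static_assert(offsetof(struct page, pg) ==			\
			      offsetof(struct folio, fl));			\
		static_assert(__same_type(typeof_member(struct page, pg),	\
					  typeof_member(struct folio, fl)))
	FOLIO_MATCH(flags, flags);
	FOLIO_MATCH(lru, lru);
	FOLIO_MATCH(mapping, mapping);
	FOLIO_MATCH(index, index);
	FOLIO_MATCH(private, private);
	FOLIO_MATCH(_mapcount, _mapcount);
	FOLIO_MATCH(_refcount, _refcount);
	#ifdef CONFIG_MEMCG
	FOLIO_MATCH(memcg_data, memcg_data);
	#endif
	#undef FOLIO_MATCH
	static_assert(sizeof(struct page) == sizeof(struct folio));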

Rasmus


^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2021-04-08  9:02 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31 18:47 [PATCH v6 00/27] Memory Folios Matthew Wilcox (Oracle)
2021-03-31 18:47 ` [PATCH v6 01/27] mm: Introduce struct folio Matthew Wilcox (Oracle)
2021-04-06 12:29   ` Kirill A. Shutemov
2021-04-06 12:48     ` Matthew Wilcox
2021-04-06 14:21       ` Kirill A. Shutemov
2021-04-06 14:31       ` Christoph Hellwig
2021-04-06 14:40         ` Matthew Wilcox
2021-04-06 14:47           ` Christoph Hellwig
2021-04-06 14:55             ` Matthew Wilcox
2021-04-06 15:05               ` Christoph Hellwig
2021-04-06 16:25                 ` Matthew Wilcox
2021-04-07  6:09                   ` Christoph Hellwig
2021-04-08  9:01   ` Rasmus Villemoes
2021-03-31 18:47 ` [PATCH v6 02/27] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
2021-04-06 13:23   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 03/27] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
2021-04-06 13:25   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 04/27] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
2021-04-06 13:26   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 05/27] mm: Add folio reference count functions Matthew Wilcox (Oracle)
2021-04-06 13:30   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 06/27] mm: Add put_folio Matthew Wilcox (Oracle)
2021-04-06 13:31   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 07/27] mm: Add get_folio Matthew Wilcox (Oracle)
2021-04-06 13:32   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 08/27] mm: Create FolioFlags Matthew Wilcox (Oracle)
2021-04-06 13:34   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 09/27] mm: Handle per-folio private data Matthew Wilcox (Oracle)
2021-04-06 13:37   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 10/27] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
2021-04-06 13:39   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 11/27] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
2021-04-06 13:40   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 12/27] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
2021-04-06 13:42   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 13/27] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
2021-04-06 13:45   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 14/27] mm: Add folio_mapcount Matthew Wilcox (Oracle)
2021-04-06 13:46   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 15/27] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
2021-04-06 13:48   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 16/27] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
2021-04-06 13:51   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 17/27] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
2021-04-06 13:52   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 18/27] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
2021-04-06 13:53   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 19/27] mm/filemap: Add __lock_folio_async Matthew Wilcox (Oracle)
2021-04-06 13:55   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 20/27] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
2021-04-06 13:57   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 21/27] mm/filemap: Add wait_on_folio_locked Matthew Wilcox (Oracle)
2021-04-06 14:11   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 22/27] mm/filemap: Add end_folio_writeback Matthew Wilcox (Oracle)
2021-04-06 14:13   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 23/27] mm/writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
2021-04-06 14:15   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 24/27] mm/writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
2021-04-06 14:18   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 25/27] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2021-04-06 14:19   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 26/27] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
2021-04-06 14:23   ` Christoph Hellwig
2021-03-31 18:47 ` [PATCH v6 27/27] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
2021-04-06 14:25   ` Christoph Hellwig
2021-04-01  7:05 ` [PATCH v6 00/27] Memory Folios Christoph Hellwig
2021-04-01 11:26   ` Matthew Wilcox
2021-04-01 12:28     ` Jason Gunthorpe
2021-04-01 12:52       ` Matthew Wilcox
2021-04-01 13:30         ` Jason Gunthorpe
2021-04-02 14:37     ` Christoph Hellwig
2021-04-02 14:49       ` Matthew Wilcox
2021-04-03  0:31 ` Kent Overstreet
2021-04-05 19:14 ` Jeff Layton
2021-04-05 19:31   ` Matthew Wilcox
2021-04-06 15:14     ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).