linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v10 00/33] Memory folios
@ 2021-05-11 21:47 Matthew Wilcox (Oracle)
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
benefit from a larger "page size".  As an example, an earlier iteration
of this idea which used compound pages (and wasn't particularly tuned)
got a 7% performance boost when compiling the kernel.

Using compound pages or THPs exposes a weakness of our type system.
Functions are often unprepared for compound pages to be passed to them,
and may only act on PAGE_SIZE chunks.  Even functions which are aware of
compound pages may expect a head page, and do the wrong thing if passed
a tail page.

We also waste a lot of instructions ensuring that we're not looking at
a tail page.  Almost every call to PageFoo() contains one or more hidden
calls to compound_head().  This also happens for get_page(), put_page()
and many more functions.  There does not appear to be a way to tell gcc
that it can cache the result of compound_head(), nor is there a way to
tell it that compound_head() is idempotent.

This patch series uses a new type, the struct folio, to manage memory.
It provides some basic infrastructure that's worthwhile in its own right,
shrinking the kernel by about 5kB of text.
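
To illustrate the point about hidden compound_head() calls, here is a
minimal userspace sketch, not code from this series.  Every toy_* name
is hypothetical; the only thing borrowed from the kernel is the idea
that bit 0 of compound_head marks a tail page.  It counts head lookups
to show that a folio-typed function needs none, because the conversion
happens once at the boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a page: bit 0 of compound_head marks a tail page. */
struct toy_page {
	unsigned long flags;
	uintptr_t compound_head;
};

/* The toy folio wraps a head page, so the type itself says "not a tail". */
struct toy_folio {
	struct toy_page page;
};

static int toy_head_lookups;	/* how many times we had to find the head */

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	toy_head_lookups++;
	if (page->compound_head & 1)
		return (struct toy_page *)(page->compound_head - 1);
	return page;
}

/* Page-based flag test: pays for a head lookup on every call. */
static int toy_page_test_flag(struct toy_page *page, unsigned int bit)
{
	return (toy_compound_head(page)->flags >> bit) & 1;
}

/* Convert once at the boundary ... */
static struct toy_folio *toy_page_folio(struct toy_page *page)
{
	return (struct toy_folio *)toy_compound_head(page);
}

/* ... and folio-based tests need no lookup at all. */
static int toy_folio_test_flag(struct toy_folio *folio, unsigned int bit)
{
	return (folio->page.flags >> bit) & 1;
}

/* Turn pages[0..nr-1] into one compound page headed by pages[0]. */
static void toy_make_compound(struct toy_page *pages, size_t nr)
{
	pages[0].compound_head = 0;
	for (size_t i = 1; i < nr; i++)
		pages[i].compound_head = (uintptr_t)&pages[0] | 1;
}
```

Testing two flags through a tail page costs two lookups with the
page-based helper, but only one (inside toy_page_folio()) when the
caller converts to a folio first.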

Since v9:
 - Rebase onto mmotm 2021-05-10-21-46
 - Add folio_memcg() definition for !MEMCG (intel lkp)
 - Change folio->private from an unsigned long to a void *
 - Use folio_page() to implement folio_file_page()
 - Add folio_try_get() and folio_try_get_rcu()
 - Trim back down to just the first few patches, which are better-reviewed.
v9: https://lore.kernel.org/linux-mm/20210505150628.111735-1-willy@infradead.org/
v8: https://lore.kernel.org/linux-mm/20210430180740.2707166-1-willy@infradead.org/

Matthew Wilcox (Oracle) (33):
  mm: Introduce struct folio
  mm: Add folio_pgdat and folio_zone
  mm/vmstat: Add functions to account folio statistics
  mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
  mm: Add folio reference count functions
  mm: Add folio_put
  mm: Add folio_get
  mm: Add folio_try_get_rcu
  mm: Add folio flag manipulation functions
  mm: Add folio_young and folio_idle
  mm: Handle per-folio private data
  mm/filemap: Add folio_index, folio_file_page and folio_contains
  mm/filemap: Add folio_next_index
  mm/filemap: Add folio_offset and folio_file_offset
  mm/util: Add folio_mapping and folio_file_mapping
  mm: Add folio_mapcount
  mm/memcg: Add folio wrappers for various functions
  mm/filemap: Add folio_unlock
  mm/filemap: Add folio_lock
  mm/filemap: Add folio_lock_killable
  mm/filemap: Add __folio_lock_async
  mm/filemap: Add __folio_lock_or_retry
  mm/filemap: Add folio_wait_locked
  mm/swap: Add folio_rotate_reclaimable
  mm/filemap: Add folio_end_writeback
  mm/writeback: Add folio_wait_writeback
  mm/writeback: Add folio_wait_stable
  mm/filemap: Add folio_wait_bit
  mm/filemap: Add folio_wake_bit
  mm/filemap: Convert page wait queues to be folios
  mm/filemap: Add folio private_2 functions
  fs/netfs: Add folio fscache functions
  mm: Add folio_mapped

 Documentation/core-api/mm-api.rst           |   4 +
 Documentation/filesystems/netfs_library.rst |   2 +
 fs/afs/write.c                              |   9 +-
 fs/cachefiles/rdwr.c                        |  16 +-
 fs/io_uring.c                               |   2 +-
 include/linux/memcontrol.h                  |  63 ++++
 include/linux/mm.h                          | 174 ++++++++--
 include/linux/mm_types.h                    |  71 ++++
 include/linux/mmdebug.h                     |  20 ++
 include/linux/netfs.h                       |  77 +++--
 include/linux/page-flags.h                  | 230 ++++++++++---
 include/linux/page_idle.h                   |  99 +++---
 include/linux/page_ref.h                    | 158 ++++++++-
 include/linux/pagemap.h                     | 358 ++++++++++++--------
 include/linux/swap.h                        |   7 +-
 include/linux/vmstat.h                      | 107 ++++++
 mm/Makefile                                 |   2 +-
 mm/filemap.c                                | 315 ++++++++---------
 mm/folio-compat.c                           |  43 +++
 mm/internal.h                               |   1 +
 mm/memory.c                                 |   8 +-
 mm/page-writeback.c                         |  72 ++--
 mm/page_io.c                                |   4 +-
 mm/swap.c                                   |  18 +-
 mm/swapfile.c                               |   8 +-
 mm/util.c                                   |  59 ++--
 26 files changed, 1374 insertions(+), 553 deletions(-)
 create mode 100644 mm/folio-compat.c

-- 
2.30.2



* [PATCH v10 01/33] mm: Introduce struct folio
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Jeff Layton

A struct folio is a new abstraction to replace the venerable struct page.
A function which takes a struct folio argument declares that it will
operate on the entire (possibly compound) page, not just PAGE_SIZE bytes.
In return, the caller guarantees that the pointer it is passing does
not point to a tail page.
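
The size helpers added below (folio_nr_pages(), folio_shift(),
folio_size() and offset_in_folio()) all derive from the folio's order.
The following is a userspace sketch of that arithmetic only, assuming
4KiB base pages; the sketch_* names and the struct holding a bare
order field are hypothetical stand-ins, since the real helpers read the
order from the compound page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4KiB base pages */
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in that stores the order directly. */
struct sketch_folio {
	unsigned int order;
};

/* A folio is composed of 2^order pages. */
static unsigned long sketch_folio_nr_pages(const struct sketch_folio *folio)
{
	return 1UL << folio->order;
}

/* Base-2 logarithm of the folio size in bytes. */
static unsigned int sketch_folio_shift(const struct sketch_folio *folio)
{
	return SKETCH_PAGE_SHIFT + folio->order;
}

/* Number of bytes in the folio. */
static size_t sketch_folio_size(const struct sketch_folio *folio)
{
	return SKETCH_PAGE_SIZE << folio->order;
}

/* Works only because the size is a power of two, so (size - 1) is a mask. */
static size_t sketch_offset_in_folio(const struct sketch_folio *folio,
				     unsigned long addr)
{
	return addr & (sketch_folio_size(folio) - 1);
}
```

For an order-2 folio this gives 4 pages, a shift of 14 and a size of
16384 bytes; address 0x12345 then sits at offset 0x2345 in the folio.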

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 Documentation/core-api/mm-api.rst |  1 +
 include/linux/mm.h                | 74 +++++++++++++++++++++++++++++++
 include/linux/mm_types.h          | 60 +++++++++++++++++++++++++
 include/linux/page-flags.h        | 27 +++++++++++
 4 files changed, 162 insertions(+)

diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index a42f9baddfbf..2a94e6164f80 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -95,6 +95,7 @@ More Memory Management Functions
 .. kernel-doc:: mm/mempolicy.c
 .. kernel-doc:: include/linux/mm_types.h
    :internal:
+.. kernel-doc:: include/linux/page-flags.h
 .. kernel-doc:: include/linux/mm.h
    :internal:
 .. kernel-doc:: include/linux/mmzone.h
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2327f99b121f..b29c86824e6b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -950,6 +950,20 @@ static inline unsigned int compound_order(struct page *page)
 	return page[1].compound_order;
 }
 
+/**
+ * folio_order - The allocation order of a folio.
+ * @folio: The folio.
+ *
+ * A folio is composed of 2^order pages.  See get_order() for the definition
+ * of order.
+ *
+ * Return: The order of the folio.
+ */
+static inline unsigned int folio_order(struct folio *folio)
+{
+	return compound_order(&folio->page);
+}
+
 static inline bool hpage_pincount_available(struct page *page)
 {
 	/*
@@ -1595,6 +1609,65 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 #endif
 }
 
+/**
+ * folio_nr_pages - The number of pages in the folio.
+ * @folio: The folio.
+ *
+ * Return: A number which is a power of two.
+ */
+static inline unsigned long folio_nr_pages(struct folio *folio)
+{
+	return compound_nr(&folio->page);
+}
+
+/**
+ * folio_next - Move to the next physical folio.
+ * @folio: The folio we're currently operating on.
+ *
+ * If you have physically contiguous memory which may span more than
+ * one folio (eg a &struct bio_vec), use this function to move from one
+ * folio to the next.  Do not use it if the memory is only virtually
+ * contiguous as the folios are almost certainly not adjacent to each
+ * other.  This is the folio equivalent to writing ``page++``.
+ *
+ * Context: We assume that the folios are refcounted and/or locked at a
+ * higher level and do not adjust the reference counts.
+ * Return: The next struct folio.
+ */
+static inline struct folio *folio_next(struct folio *folio)
+{
+	return (struct folio *)folio_page(folio, folio_nr_pages(folio));
+}
+
+/**
+ * folio_shift - The number of bits covered by this folio.
+ * @folio: The folio.
+ *
+ * A folio contains a number of bytes which is a power-of-two in size.
+ * This function tells you which power-of-two the folio is.
+ *
+ * Context: The caller should have a reference on the folio to prevent
+ * it from being split.  It is not necessary for the folio to be locked.
+ * Return: The base-2 logarithm of the size of this folio.
+ */
+static inline unsigned int folio_shift(struct folio *folio)
+{
+	return PAGE_SHIFT + folio_order(folio);
+}
+
+/**
+ * folio_size - The number of bytes in a folio.
+ * @folio: The folio.
+ *
+ * Context: The caller should have a reference on the folio to prevent
+ * it from being split.  It is not necessary for the folio to be locked.
+ * Return: The number of bytes in this folio.
+ */
+static inline size_t folio_size(struct folio *folio)
+{
+	return PAGE_SIZE << folio_order(folio);
+}
+
 /*
  * Some inline functions in vmstat.h depend on page_zone()
  */
@@ -1699,6 +1772,7 @@ extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
+#define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5aacc1c10a45..3118ba8b5a4e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -224,6 +224,66 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+/**
+ * struct folio - Represents a contiguous set of bytes.
+ * @flags: Identical to the page flags.
+ * @lru: Least Recently Used list; tracks how recently this folio was used.
+ * @mapping: The file this page belongs to, or the anon_vma for
+ *    anonymous pages.
+ * @index: Offset within the file, in units of pages.  For anonymous pages,
+ *    this is the index from the beginning of the mmap.
+ * @private: Filesystem per-folio data (see folio_attach_private()).
+ *    Used for swp_entry_t if folio_swapcache().
+ * @_mapcount: Do not access this member directly.  Use folio_mapcount() to
+ *    find out how many times this folio is mapped by userspace.
+ * @_refcount: Do not access this member directly.  Use folio_ref_count()
+ *    to find how many references there are to this folio.
+ * @memcg_data: Memory Control Group data.
+ *
+ * A folio is a physically, virtually and logically contiguous set
+ * of bytes.  It is a power-of-two in size, and it is aligned to that
+ * same power-of-two.  It is at least as large as %PAGE_SIZE.  If it is
+ * in the page cache, it is at a file offset which is a multiple of that
+ * power-of-two.  It may be mapped into userspace at an address which is
+ * at an arbitrary page offset, but its kernel virtual address is aligned
+ * to its size.
+ */
+struct folio {
+	/* private: don't document the anon union */
+	union {
+		struct {
+	/* public: */
+			unsigned long flags;
+			struct list_head lru;
+			struct address_space *mapping;
+			pgoff_t index;
+			void *private;
+			atomic_t _mapcount;
+			atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+			unsigned long memcg_data;
+#endif
+	/* private: the union with struct page is transitional */
+		};
+		struct page page;
+	};
+};
+
+static_assert(sizeof(struct page) == sizeof(struct folio));
+#define FOLIO_MATCH(pg, fl)						\
+	static_assert(offsetof(struct page, pg) == offsetof(struct folio, fl))
+FOLIO_MATCH(flags, flags);
+FOLIO_MATCH(lru, lru);
+FOLIO_MATCH(compound_head, lru);
+FOLIO_MATCH(index, index);
+FOLIO_MATCH(private, private);
+FOLIO_MATCH(_mapcount, _mapcount);
+FOLIO_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+FOLIO_MATCH(memcg_data, memcg_data);
+#endif
+#undef FOLIO_MATCH
+
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
 {
 	return &page[1].compound_mapcount;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d8e26243db25..e069aa8b11b7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -188,6 +188,33 @@ static inline unsigned long _compound_head(const struct page *page)
 
 #define compound_head(page)	((typeof(page))_compound_head(page))
 
+/**
+ * page_folio - Converts from page to folio.
+ * @p: The page.
+ *
+ * Every page is part of a folio.  This function cannot be called on a
+ * NULL pointer.
+ *
+ * Context: No reference, nor lock is required on @p.  If the caller
+ * does not hold a reference, this call may race with a folio split, so
+ * it should re-check the folio still contains this page after gaining
+ * a reference on the folio.
+ * Return: The folio which contains this page.
+ */
+#define page_folio(p)		(_Generic((p),				\
+	const struct page *:	(const struct folio *)_compound_head(p), \
+	struct page *:		(struct folio *)_compound_head(p)))
+
+/**
+ * folio_page - Return a page from a folio.
+ * @folio: The folio.
+ * @n: The page number to return.
+ *
+ * @n is relative to the start of the folio.  It should be between
+ * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.
+ */
+#define folio_page(folio, n)	nth_page(&(folio)->page, n)
+
 static __always_inline int PageTail(struct page *page)
 {
 	return READ_ONCE(page->compound_head) & 1;
-- 
2.30.2



* [PATCH v10 02/33] mm: Add folio_pgdat and folio_zone
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

These are just convenience wrappers for callers with folios; pgdat and
zone can be reached from tail pages as well as head pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/mm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b29c86824e6b..a55c2c0628b6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1560,6 +1560,16 @@ static inline pg_data_t *page_pgdat(const struct page *page)
 	return NODE_DATA(page_to_nid(page));
 }
 
+static inline struct zone *folio_zone(const struct folio *folio)
+{
+	return page_zone(&folio->page);
+}
+
+static inline pg_data_t *folio_pgdat(const struct folio *folio)
+{
+	return page_pgdat(&folio->page);
+}
+
 #ifdef SECTION_IN_PAGE_FLAGS
 static inline void set_page_section(struct page *page, unsigned long section)
 {
-- 
2.30.2



* [PATCH v10 03/33] mm/vmstat: Add functions to account folio statistics
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Allow page counters to be more readily modified by callers which have
a folio.  Name these wrappers with 'stat' instead of 'state' as requested
by Linus here:
https://lore.kernel.org/linux-mm/CAHk-=wj847SudR-kt+46fT3+xFFgiwpgThvm7DJWGdi4cVrbnQ@mail.gmail.com/
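
The pattern shared by the wrappers below is that _mod_ takes an
explicit count, while _add_ and _sub_ adjust the counter by the number
of pages in the folio.  Here is a userspace sketch of that pattern;
the acct_* types and names are hypothetical, and the model returns a
signed page count so that negating it needs no unsigned arithmetic.

```c
#include <assert.h>

enum acct_stat_item { ACCT_NR_FILE_PAGES, ACCT_NR_STAT_ITEMS };

/* Hypothetical zone holding one counter per statistic. */
struct acct_zone {
	long stat[ACCT_NR_STAT_ITEMS];
};

/* Hypothetical folio that knows its zone and its order. */
struct acct_folio {
	struct acct_zone *zone;
	unsigned int order;
};

static long acct_folio_nr_pages(const struct acct_folio *folio)
{
	return 1L << folio->order;
}

/* "mod": the caller supplies the delta. */
static void acct_zone_stat_mod_folio(struct acct_folio *folio,
				     enum acct_stat_item item, long nr)
{
	folio->zone->stat[item] += nr;
}

/* "add": account every page in the folio. */
static void acct_zone_stat_add_folio(struct acct_folio *folio,
				     enum acct_stat_item item)
{
	acct_zone_stat_mod_folio(folio, item, acct_folio_nr_pages(folio));
}

/* "sub": remove every page in the folio from the counter. */
static void acct_zone_stat_sub_folio(struct acct_folio *folio,
				     enum acct_stat_item item)
{
	acct_zone_stat_mod_folio(folio, item, -acct_folio_nr_pages(folio));
}
```

Adding an order-3 folio therefore bumps the counter by 8, not by 1.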

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/vmstat.h | 107 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 3299cd69e4ca..d287d7c31b8f 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -402,6 +402,78 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */
 
+static inline void __zone_stat_mod_folio(struct folio *folio,
+		enum zone_stat_item item, long nr)
+{
+	__mod_zone_page_state(folio_zone(folio), item, nr);
+}
+
+static inline void __zone_stat_add_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	__mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline void __zone_stat_sub_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	__mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void zone_stat_mod_folio(struct folio *folio,
+		enum zone_stat_item item, long nr)
+{
+	mod_zone_page_state(folio_zone(folio), item, nr);
+}
+
+static inline void zone_stat_add_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline void zone_stat_sub_folio(struct folio *folio,
+		enum zone_stat_item item)
+{
+	mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void __node_stat_mod_folio(struct folio *folio,
+		enum node_stat_item item, long nr)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, nr);
+}
+
+static inline void __node_stat_add_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline void __node_stat_sub_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	__mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
+static inline void node_stat_mod_folio(struct folio *folio,
+		enum node_stat_item item, long nr)
+{
+	mod_node_page_state(folio_pgdat(folio), item, nr);
+}
+
+static inline void node_stat_add_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline void node_stat_sub_folio(struct folio *folio,
+		enum node_stat_item item)
+{
+	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
 static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
 					     int migratetype)
 {
@@ -530,6 +602,24 @@ static inline void __dec_lruvec_page_state(struct page *page,
 	__mod_lruvec_page_state(page, idx, -1);
 }
 
+static inline void __lruvec_stat_mod_folio(struct folio *folio,
+					   enum node_stat_item idx, int val)
+{
+	__mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void __lruvec_stat_add_folio(struct folio *folio,
+					   enum node_stat_item idx)
+{
+	__lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void __lruvec_stat_sub_folio(struct folio *folio,
+					   enum node_stat_item idx)
+{
+	__lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
+}
+
 static inline void inc_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx)
 {
@@ -542,4 +632,21 @@ static inline void dec_lruvec_page_state(struct page *page,
 	mod_lruvec_page_state(page, idx, -1);
 }
 
+static inline void lruvec_stat_mod_folio(struct folio *folio,
+					 enum node_stat_item idx, int val)
+{
+	mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void lruvec_stat_add_folio(struct folio *folio,
+					 enum node_stat_item idx)
+{
+	lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void lruvec_stat_sub_folio(struct folio *folio,
+					 enum node_stat_item idx)
+{
+	lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
+}
 #endif /* _LINUX_VMSTAT_H */
-- 
2.30.2



* [PATCH v10 04/33] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/mmdebug.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 1935d4c72d10..d7285f8148a3 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -22,6 +22,13 @@ void dump_mm(const struct mm_struct *mm);
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_BUG_ON_FOLIO(cond, folio)					\
+	do {								\
+		if (unlikely(cond)) {					\
+			dump_page(&folio->page, "VM_BUG_ON_FOLIO(" __stringify(cond)")");\
+			BUG();						\
+		}							\
+	} while (0)
 #define VM_BUG_ON_VMA(cond, vma)					\
 	do {								\
 		if (unlikely(cond)) {					\
@@ -47,6 +54,17 @@ void dump_mm(const struct mm_struct *mm);
 	}								\
 	unlikely(__ret_warn_once);					\
 })
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio)	({			\
+	static bool __section(".data.once") __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(&folio->page, "VM_WARN_ON_ONCE_FOLIO(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
 
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
@@ -55,11 +73,13 @@ void dump_mm(const struct mm_struct *mm);
 #else
 #define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
+#define VM_BUG_ON_FOLIO(cond, folio) VM_BUG_ON(cond)
 #define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond)
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
-- 
2.30.2



* [PATCH v10 05/33] mm: Add folio reference count functions
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

These functions mirror their page reference counterparts.
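
For readers unfamiliar with the page_ref API being mirrored, here is a
userspace sketch of a few of its semantics using C11 atomics.  It is
an approximation, not the kernel implementation: the ref_* names are
hypothetical, the tracepoint hooks are left out, and the memory
ordering is simply the C11 default.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical folio carrying only a reference count. */
struct ref_folio {
	atomic_int refcount;
};

static int ref_folio_ref_count(struct ref_folio *folio)
{
	return atomic_load(&folio->refcount);
}

static void ref_folio_ref_inc(struct ref_folio *folio)
{
	atomic_fetch_add(&folio->refcount, 1);
}

/* Returns true if this call dropped the last reference. */
static bool ref_folio_ref_dec_and_test(struct ref_folio *folio)
{
	return atomic_fetch_sub(&folio->refcount, 1) == 1;
}

/* Add @nr unless the count is currently @u; returns true if it added. */
static bool ref_folio_ref_add_unless(struct ref_folio *folio, int nr, int u)
{
	int old = atomic_load(&folio->refcount);

	do {
		if (old == u)
			return false;
	} while (!atomic_compare_exchange_weak(&folio->refcount, &old, old + nr));
	return true;
}

/* Freeze: swap the count to 0, but only if it is exactly @count. */
static bool ref_folio_ref_freeze(struct ref_folio *folio, int count)
{
	int expected = count;

	return atomic_compare_exchange_strong(&folio->refcount, &expected, 0);
}

/* Unfreeze: restore the count that the freezer saw. */
static void ref_folio_ref_unfreeze(struct ref_folio *folio, int count)
{
	atomic_store(&folio->refcount, count);
}
```

While a folio is frozen its count reads as zero, so an add-unless-zero
style lookup fails until the freezer calls unfreeze.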

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 Documentation/core-api/mm-api.rst |  1 +
 include/linux/page_ref.h          | 88 ++++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index 2a94e6164f80..5c459ee2acce 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -98,4 +98,5 @@ More Memory Management Functions
 .. kernel-doc:: include/linux/page-flags.h
 .. kernel-doc:: include/linux/mm.h
    :internal:
+.. kernel-doc:: include/linux/page_ref.h
 .. kernel-doc:: include/linux/mmzone.h
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 7ad46f45df39..85816b2c0496 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -67,9 +67,31 @@ static inline int page_ref_count(const struct page *page)
 	return atomic_read(&page->_refcount);
 }
 
+/**
+ * folio_ref_count - The reference count on this folio.
+ * @folio: The folio.
+ *
+ * The refcount is usually incremented by calls to folio_get() and
+ * decremented by calls to folio_put().  Some typical users of the
+ * folio refcount:
+ *
+ * - Each reference from a page table
+ * - The page cache
+ * - Filesystem private data
+ * - The LRU list
+ * - Pipes
+ * - Direct IO which references this page in the process address space
+ *
+ * Return: The number of references to this folio.
+ */
+static inline int folio_ref_count(const struct folio *folio)
+{
+	return page_ref_count(&folio->page);
+}
+
 static inline int page_count(const struct page *page)
 {
-	return atomic_read(&compound_head(page)->_refcount);
+	return folio_ref_count(page_folio(page));
 }
 
 static inline void set_page_count(struct page *page, int v)
@@ -79,6 +101,11 @@ static inline void set_page_count(struct page *page, int v)
 		__page_ref_set(page, v);
 }
 
+static inline void folio_set_count(struct folio *folio, int v)
+{
+	set_page_count(&folio->page, v);
+}
+
 /*
  * Setup the page count before being freed into the page allocator for
  * the first time (boot or memory hotplug)
@@ -95,6 +122,11 @@ static inline void page_ref_add(struct page *page, int nr)
 		__page_ref_mod(page, nr);
 }
 
+static inline void folio_ref_add(struct folio *folio, int nr)
+{
+	page_ref_add(&folio->page, nr);
+}
+
 static inline void page_ref_sub(struct page *page, int nr)
 {
 	atomic_sub(nr, &page->_refcount);
@@ -102,6 +134,11 @@ static inline void page_ref_sub(struct page *page, int nr)
 		__page_ref_mod(page, -nr);
 }
 
+static inline void folio_ref_sub(struct folio *folio, int nr)
+{
+	page_ref_sub(&folio->page, nr);
+}
+
 static inline int page_ref_sub_return(struct page *page, int nr)
 {
 	int ret = atomic_sub_return(nr, &page->_refcount);
@@ -111,6 +148,11 @@ static inline int page_ref_sub_return(struct page *page, int nr)
 	return ret;
 }
 
+static inline int folio_ref_sub_return(struct folio *folio, int nr)
+{
+	return page_ref_sub_return(&folio->page, nr);
+}
+
 static inline void page_ref_inc(struct page *page)
 {
 	atomic_inc(&page->_refcount);
@@ -118,6 +160,11 @@ static inline void page_ref_inc(struct page *page)
 		__page_ref_mod(page, 1);
 }
 
+static inline void folio_ref_inc(struct folio *folio)
+{
+	page_ref_inc(&folio->page);
+}
+
 static inline void page_ref_dec(struct page *page)
 {
 	atomic_dec(&page->_refcount);
@@ -125,6 +172,11 @@ static inline void page_ref_dec(struct page *page)
 		__page_ref_mod(page, -1);
 }
 
+static inline void folio_ref_dec(struct folio *folio)
+{
+	page_ref_dec(&folio->page);
+}
+
 static inline int page_ref_sub_and_test(struct page *page, int nr)
 {
 	int ret = atomic_sub_and_test(nr, &page->_refcount);
@@ -134,6 +186,11 @@ static inline int page_ref_sub_and_test(struct page *page, int nr)
 	return ret;
 }
 
+static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
+{
+	return page_ref_sub_and_test(&folio->page, nr);
+}
+
 static inline int page_ref_inc_return(struct page *page)
 {
 	int ret = atomic_inc_return(&page->_refcount);
@@ -143,6 +200,11 @@ static inline int page_ref_inc_return(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_inc_return(struct folio *folio)
+{
+	return page_ref_inc_return(&folio->page);
+}
+
 static inline int page_ref_dec_and_test(struct page *page)
 {
 	int ret = atomic_dec_and_test(&page->_refcount);
@@ -152,6 +214,11 @@ static inline int page_ref_dec_and_test(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_dec_and_test(struct folio *folio)
+{
+	return page_ref_dec_and_test(&folio->page);
+}
+
 static inline int page_ref_dec_return(struct page *page)
 {
 	int ret = atomic_dec_return(&page->_refcount);
@@ -161,6 +228,11 @@ static inline int page_ref_dec_return(struct page *page)
 	return ret;
 }
 
+static inline int folio_ref_dec_return(struct folio *folio)
+{
+	return page_ref_dec_return(&folio->page);
+}
+
 static inline int page_ref_add_unless(struct page *page, int nr, int u)
 {
 	int ret = atomic_add_unless(&page->_refcount, nr, u);
@@ -170,6 +242,11 @@ static inline int page_ref_add_unless(struct page *page, int nr, int u)
 	return ret;
 }
 
+static inline int folio_ref_add_unless(struct folio *folio, int nr, int u)
+{
+	return page_ref_add_unless(&folio->page, nr, u);
+}
+
 static inline int page_ref_freeze(struct page *page, int count)
 {
 	int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
@@ -179,6 +256,11 @@ static inline int page_ref_freeze(struct page *page, int count)
 	return ret;
 }
 
+static inline int folio_ref_freeze(struct folio *folio, int count)
+{
+	return page_ref_freeze(&folio->page, count);
+}
+
 static inline void page_ref_unfreeze(struct page *page, int count)
 {
 	VM_BUG_ON_PAGE(page_count(page) != 0, page);
@@ -189,4 +271,8 @@ static inline void page_ref_unfreeze(struct page *page, int count)
 		__page_ref_unfreeze(page, count);
 }
 
+static inline void folio_ref_unfreeze(struct folio *folio, int count)
+{
+	page_ref_unfreeze(&folio->page, count);
+}
 #endif
-- 
2.30.2



* [PATCH v10 06/33] mm: Add folio_put
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

If we know we have a folio, we can call folio_put() instead of put_page()
and save the overhead of calling compound_head().  This also skips the
devmap checks.

This commit looks like it should be a no-op, but actually saves 1312 bytes
of text with the distro-derived config that I'm testing.  Some functions
grow a little while others shrink.  I presume the compiler is making
different inlining decisions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
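Not part of the commit: a rough userspace sketch of the saving described
above.  The struct layouts, the compound_head_calls counter and the "freed"
field are simplified stand-ins invented for illustration, not the kernel
definitions, and the devmap handling in put_page() is left out.  It only
shows that put_page() pays for one head lookup while folio_put() pays
for none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model; these are not the kernel's struct page/struct folio. */
struct page {
	int refcount;		/* stands in for page->_refcount */
	struct page *head;	/* NULL for a head or order-0 page */
	bool freed;		/* set where the kernel would call __put_page() */
};

struct folio {
	struct page page;	/* a folio never refers to a tail page */
};

static int compound_head_calls;	/* hypothetical counter of head lookups */

static struct page *compound_head(struct page *page)
{
	compound_head_calls++;
	return page->head ? page->head : page;
}

static struct folio *page_folio(struct page *page)
{
	/* Valid in this model because the head page is embedded in a folio. */
	return (struct folio *)compound_head(page);
}

static bool folio_put_testzero(struct folio *folio)
{
	folio->page.refcount -= 1;
	return folio->page.refcount == 0;
}

/* No head lookup: the type already guarantees this is not a tail page. */
static void folio_put(struct folio *folio)
{
	if (folio_put_testzero(folio))
		folio->page.freed = true;
}

/* The page-based entry point resolves the head exactly once. */
static void put_page(struct page *page)
{
	folio_put(page_folio(page));
}
```

A caller holding a tail page goes through put_page() and does one lookup;
a caller that already has the folio calls folio_put() directly and does none.
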
 include/linux/mm.h | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a55c2c0628b6..610948f0cb43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -751,6 +751,11 @@ static inline int put_page_testzero(struct page *page)
 	return page_ref_dec_and_test(page);
 }
 
+static inline int folio_put_testzero(struct folio *folio)
+{
+	return put_page_testzero(&folio->page);
+}
+
 /*
  * Try to grab a ref unless the page has a refcount of zero, return false if
  * that is the case.
@@ -1242,9 +1247,28 @@ static inline __must_check bool try_get_page(struct page *page)
 	return true;
 }
 
+/**
+ * folio_put - Decrement the reference count on a folio.
+ * @folio: The folio.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately.  Do not access the memory or the struct folio
+ * after calling folio_put() unless you can be sure that it wasn't the
+ * last reference.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
+ */
+static inline void folio_put(struct folio *folio)
+{
+	if (folio_put_testzero(folio))
+		__put_page(&folio->page);
+}
+
 static inline void put_page(struct page *page)
 {
-	page = compound_head(page);
+	struct folio *folio = page_folio(page);
 
 	/*
 	 * For devmap managed pages we need to catch refcount transition from
@@ -1252,13 +1276,12 @@ static inline void put_page(struct page *page)
 	 * need to inform the device driver through callback. See
 	 * include/linux/memremap.h and HMM for details.
 	 */
-	if (page_is_devmap_managed(page)) {
-		put_devmap_managed_page(page);
+	if (page_is_devmap_managed(&folio->page)) {
+		put_devmap_managed_page(&folio->page);
 		return;
 	}
 
-	if (put_page_testzero(page))
-		__put_page(page);
+	folio_put(folio);
 }
 
 /*
-- 
2.30.2


* [PATCH v10 07/33] mm: Add folio_get
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 06/33] mm: Add folio_put Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 11:56   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 08/33] mm: Add folio_try_get_rcu Matthew Wilcox (Oracle)
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

If we know we have a folio, we can call folio_get() instead
of get_page() and save the overhead of calling compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
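Not part of the commit: a small userspace sketch of the check that
folio_get() asserts on.  The expression is the one from
folio_ref_zero_or_close_to_overflow(); the wrapper function name and the
plain int argument are assumptions made for illustration.  With a 32-bit
int, the unsigned addition wraps so that the test is true exactly for
counts from -127 to 0 inclusive.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same arithmetic as folio_ref_zero_or_close_to_overflow(), applied to a
 * plain int rather than folio_ref_count(folio).
 */
static bool ref_zero_or_close_to_overflow(int count)
{
	/* Values in [-127, 0] wrap to [0, 127] after adding 127u. */
	return (unsigned int)count + 127u <= 127u;
}
```

A single unsigned comparison therefore covers both "refcount is zero" and
"refcount has gone slightly negative", which is why one VM_BUG_ON_FOLIO()
is enough in folio_get().
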
 include/linux/mm.h | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 610948f0cb43..feb4645ef4f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1219,18 +1219,26 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
 }
 
 /* 127: arbitrary random number, small enough to assemble well */
-#define page_ref_zero_or_close_to_overflow(page) \
-	((unsigned int) page_ref_count(page) + 127u <= 127u)
+#define folio_ref_zero_or_close_to_overflow(folio) \
+	((unsigned int) folio_ref_count(folio) + 127u <= 127u)
+
+/**
+ * folio_get - Increment the reference count on a folio.
+ * @folio: The folio.
+ *
+ * Context: May be called in any context, as long as you know that
+ * you have a refcount on the folio.  If you do not already have one,
+ * folio_try_get() may be the right interface for you to use.
+ */
+static inline void folio_get(struct folio *folio)
+{
+	VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio);
+	folio_ref_inc(folio);
+}
 
 static inline void get_page(struct page *page)
 {
-	page = compound_head(page);
-	/*
-	 * Getting a normal page or the head of a compound page
-	 * requires to already have an elevated page->_refcount.
-	 */
-	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
-	page_ref_inc(page);
+	folio_get(page_folio(page));
 }
 
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
-- 
2.30.2


* [PATCH v10 08/33] mm: Add folio_try_get_rcu
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 07/33] mm: Add folio_get Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 12:11   ` Vlastimil Babka
  2021-05-27  8:16   ` Christoph Hellwig
  2021-05-11 21:47 ` [PATCH v10 09/33] mm: Add folio flag manipulation functions Matthew Wilcox (Oracle)
                   ` (27 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

This is the equivalent of page_cache_get_speculative().  Also add
folio_ref_try_add_rcu() (the equivalent of page_cache_add_speculative())
and folio_try_get() (the equivalent of get_page_unless_zero()).

The new kernel-doc attempts to explain from the user's point of view
when to use folio_try_get_rcu() and when to use folio_try_get(),
because there seems to be some confusion currently between the users of
page_cache_get_speculative() and get_page_unless_zero().

Reimplement page_cache_add_speculative() and page_cache_get_speculative()
as wrappers around the folio equivalents, but leave get_page_unless_zero()
alone for now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
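Not part of the commit: a userspace sketch of the two refcount operations
the lockless lookup relies on, written with C11 atomics.  The struct and
the CAS loop are assumptions made for illustration (the kernel uses
atomic_add_unless() and atomic_cmpxchg() on page->_refcount), and
tracepoints, RCU and the TINY_RCU shortcut are all omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model; not the kernel's struct folio. */
struct folio {
	atomic_int refcount;
};

/* Add nr to the refcount unless it currently equals u. */
static bool folio_ref_add_unless(struct folio *folio, int nr, int u)
{
	int old = atomic_load(&folio->refcount);

	while (old != u) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&folio->refcount, &old,
						 old + nr))
			return true;
	}
	return false;
}

/* Lookup side: take a reference only if the count is not zero. */
static bool folio_try_get(struct folio *folio)
{
	return folio_ref_add_unless(folio, 1, 0);
}

/* Removal side: zero the count only if it has the expected value. */
static bool folio_ref_freeze(struct folio *folio, int count)
{
	int expected = count;

	return atomic_compare_exchange_strong(&folio->refcount, &expected, 0);
}
```

If the lookup wins the race, the freeze sees an elevated count and fails;
if the freeze wins, the lookup sees zero and has to retry.
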
 include/linux/page_ref.h | 72 ++++++++++++++++++++++++++++++++--
 include/linux/pagemap.h  | 84 ++--------------------------------------
 mm/filemap.c             | 20 ++++++++++
 3 files changed, 93 insertions(+), 83 deletions(-)

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 85816b2c0496..2e677e6ad09f 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -233,20 +233,86 @@ static inline int folio_ref_dec_return(struct folio *folio)
 	return page_ref_dec_return(&folio->page);
 }
 
-static inline int page_ref_add_unless(struct page *page, int nr, int u)
+static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 {
-	int ret = atomic_add_unless(&page->_refcount, nr, u);
+	bool ret = atomic_add_unless(&page->_refcount, nr, u);
 
 	if (page_ref_tracepoint_active(page_ref_mod_unless))
 		__page_ref_mod_unless(page, nr, ret);
 	return ret;
 }
 
-static inline int folio_ref_add_unless(struct folio *folio, int nr, int u)
+static inline bool folio_ref_add_unless(struct folio *folio, int nr, int u)
 {
 	return page_ref_add_unless(&folio->page, nr, u);
 }
 
+/**
+ * folio_try_get - Attempt to increase the refcount on a folio.
+ * @folio: The folio.
+ *
+ * If you do not already have a reference to a folio, you can attempt to
+ * get one using this function.  It may fail if, for example, the folio
+ * has been freed since you found a pointer to it, or it is frozen for
+ * the purposes of splitting or migration.
+ *
+ * Return: True if the reference count was successfully incremented.
+ */
+static inline bool folio_try_get(struct folio *folio)
+{
+	return folio_ref_add_unless(folio, 1, 0);
+}
+
+static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
+{
+#ifdef CONFIG_TINY_RCU
+	/*
+	 * The caller guarantees the folio will not be freed from interrupt
+	 * context, so (on !SMP) we only need preemption to be disabled
+	 * and TINY_RCU does that for us.
+	 */
+# ifdef CONFIG_PREEMPT_COUNT
+	VM_BUG_ON(!in_atomic() && !irqs_disabled());
+# endif
+	VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
+	folio_ref_add(folio, count);
+#else
+	if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
+		/* Either the folio has been freed, or will be freed. */
+		return false;
+	}
+#endif
+	return true;
+}
+
+/**
+ * folio_try_get_rcu - Attempt to increase the refcount on a folio.
+ * @folio: The folio.
+ *
+ * This is a version of folio_try_get() optimised for non-SMP kernels.
+ * If you are still holding the rcu_read_lock() after looking up the
+ * page and know that the page cannot have its refcount decreased to
+ * zero in interrupt context, you can use this instead of folio_try_get().
+ *
+ * Example users include get_user_pages_fast() (as pages are not unmapped
+ * from interrupt context) and the page cache lookups (as pages are not
+ * truncated from interrupt context).  We also know that pages are not
+ * frozen in interrupt context for the purposes of splitting or migration.
+ *
+ * You can also use this function if you're holding a lock that prevents
+ * pages being frozen & removed; eg the i_pages lock for the page cache
+ * or the mmap_lock or page table lock for page tables.  In this case,
+ * it will always succeed, and you could have used a plain folio_get(),
+ * but it's sometimes more convenient to have a common function called
+ * from both locked and RCU-protected contexts.
+ *
+ * Return: True if the reference count was successfully incremented.
+ */
+static inline bool folio_try_get_rcu(struct folio *folio)
+{
+	return folio_ref_try_add_rcu(folio, 1);
+}
+
 static inline int page_ref_freeze(struct page *page, int count)
 {
 	int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a4bd41128bf3..4900e64c880d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -172,91 +172,15 @@ static inline struct address_space *page_mapping_file(struct page *page)
 	return page_mapping(page);
 }
 
-/*
- * speculatively take a reference to a page.
- * If the page is free (_refcount == 0), then _refcount is untouched, and 0
- * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned.
- *
- * This function must be called inside the same rcu_read_lock() section as has
- * been used to lookup the page in the pagecache radix-tree (or page table):
- * this allows allocators to use a synchronize_rcu() to stabilize _refcount.
- *
- * Unless an RCU grace period has passed, the count of all pages coming out
- * of the allocator must be considered unstable. page_count may return higher
- * than expected, and put_page must be able to do the right thing when the
- * page has been finished with, no matter what it is subsequently allocated
- * for (because put_page is what is used here to drop an invalid speculative
- * reference).
- *
- * This is the interesting part of the lockless pagecache (and lockless
- * get_user_pages) locking protocol, where the lookup-side (eg. find_get_page)
- * has the following pattern:
- * 1. find page in radix tree
- * 2. conditionally increment refcount
- * 3. check the page is still in pagecache (if no, goto 1)
- *
- * Remove-side that cares about stability of _refcount (eg. reclaim) has the
- * following (with the i_pages lock held):
- * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
- * B. remove page from pagecache
- * C. free the page
- *
- * There are 2 critical interleavings that matter:
- * - 2 runs before A: in this case, A sees elevated refcount and bails out
- * - A runs before 2: in this case, 2 sees zero refcount and retries;
- *   subsequently, B will complete and 1 will find no page, causing the
- *   lookup to return NULL.
- *
- * It is possible that between 1 and 2, the page is removed then the exact same
- * page is inserted into the same position in pagecache. That's OK: the
- * old find_get_page using a lock could equally have run before or after
- * such a re-insertion, depending on order that locks are granted.
- *
- * Lookups racing against pagecache insertion isn't a big problem: either 1
- * will find the page or it will not. Likewise, the old find_get_page could run
- * either before the insertion or afterwards, depending on timing.
- */
-static inline int __page_cache_add_speculative(struct page *page, int count)
+static inline bool page_cache_add_speculative(struct page *page, int count)
 {
-#ifdef CONFIG_TINY_RCU
-# ifdef CONFIG_PREEMPT_COUNT
-	VM_BUG_ON(!in_atomic() && !irqs_disabled());
-# endif
-	/*
-	 * Preempt must be disabled here - we rely on rcu_read_lock doing
-	 * this for us.
-	 *
-	 * Pagecache won't be truncated from interrupt context, so if we have
-	 * found a page in the radix tree here, we have pinned its refcount by
-	 * disabling preempt, and hence no need for the "speculative get" that
-	 * SMP requires.
-	 */
-	VM_BUG_ON_PAGE(page_count(page) == 0, page);
-	page_ref_add(page, count);
-
-#else
-	if (unlikely(!page_ref_add_unless(page, count, 0))) {
-		/*
-		 * Either the page has been freed, or will be freed.
-		 * In either case, retry here and the caller should
-		 * do the right thing (see comments above).
-		 */
-		return 0;
-	}
-#endif
 	VM_BUG_ON_PAGE(PageTail(page), page);
-
-	return 1;
-}
-
-static inline int page_cache_get_speculative(struct page *page)
-{
-	return __page_cache_add_speculative(page, 1);
+	return folio_ref_try_add_rcu((struct folio *)page, count);
 }
 
-static inline int page_cache_add_speculative(struct page *page, int count)
+static inline bool page_cache_get_speculative(struct page *page)
 {
-	return __page_cache_add_speculative(page, count);
+	return page_cache_add_speculative(page, 1);
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 66f7e9fdfbc4..817a47059bd0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1746,6 +1746,26 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
+/*
+ * Lockless page cache protocol:
+ * On the lookup side:
+ * 1. Load the folio from i_pages
+ * 2. Increment the refcount if it's not zero
+ * 3. If the folio is not found by xas_reload(), put the refcount and retry
+ *
+ * On the removal side:
+ * A. Freeze the page (by zeroing the refcount if nobody else has a reference)
+ * B. Remove the page from i_pages
+ * C. Return the page to the page allocator
+ *
+ * This means that any page may have its reference count temporarily
+ * increased by a speculative page cache (or fast GUP) lookup as it can
+ * be allocated by another user before the RCU grace period expires.
+ * Because the refcount temporarily acquired here may end up being the
+ * last refcount on the page, any page allocation must be freeable by
+ * folio_put().
+ */
+
 /*
  * mapping_get_entry - Get a page cache entry.
  * @mapping: the address_space to search
-- 
2.30.2


* [PATCH v10 09/33] mm: Add folio flag manipulation functions
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 08/33] mm: Add folio_try_get_rcu Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 15:29   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 10/33] mm: Add folio_young and folio_idle Matthew Wilcox (Oracle)
                   ` (26 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

These new functions are the folio analogues of the various PageFlags
functions.  If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio
is not a tail page at every invocation.  This will also catch the
PagePoisoned case as a poisoned page has every bit set, which would
include PageTail.

This saves 1727 bytes of text with the distro-derived config that
I'm testing due to removing a double call to compound_head() in
PageSwapCache().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
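Not part of the commit: a userspace sketch of how one macro can stamp out
a family of folio flag accessors by token pasting.  FOLIO_FLAG(), the
two-entry enum and the one-word struct are assumptions made for
illustration.  Unlike set_bit() and clear_bit(), the bit operations here
are not atomic, and there is no tail-page check.

```c
#include <assert.h>
#include <stdbool.h>

enum pageflags { PG_locked, PG_dirty };

/* Simplified model; not the kernel's struct folio. */
struct folio {
	unsigned long flags;
};

/*
 * Generate folio_<lname>(), folio_set_<lname>_flag() and
 * folio_clear_<lname>_flag().  The parameter needs ## on both sides:
 * a bare "lname_flag" would be a single identifier and would not be
 * substituted.
 */
#define FOLIO_FLAG(lname)						\
static inline bool folio_##lname(const struct folio *folio)		\
{ return folio->flags & (1UL << PG_##lname); }				\
static inline void folio_set_##lname##_flag(struct folio *folio)	\
{ folio->flags |= 1UL << PG_##lname; }					\
static inline void folio_clear_##lname##_flag(struct folio *folio)	\
{ folio->flags &= ~(1UL << PG_##lname); }

FOLIO_FLAG(locked)
FOLIO_FLAG(dirty)
```

Each FOLIO_FLAG() line expands to three functions, so adding a flag costs
one enum entry and one macro invocation.
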
 include/linux/page-flags.h | 203 +++++++++++++++++++++++++++----------
 1 file changed, 148 insertions(+), 55 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e069aa8b11b7..ef8b7c6dc91c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -140,6 +140,8 @@ enum pageflags {
 #endif
 	__NR_PAGEFLAGS,
 
+	PG_readahead = PG_reclaim,
+
 	/* Filesystems */
 	PG_checked = PG_owner_priv_1,
 
@@ -239,6 +241,15 @@ static inline void page_init_poison(struct page *page, size_t size)
 }
 #endif
 
+static unsigned long *folio_flags(struct folio *folio, unsigned n)
+{
+	struct page *page = &folio->page;
+
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
+	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
+	return &page[n].flags;
+}
+
 /*
  * Page flags policies wrt compound pages
  *
@@ -283,34 +294,62 @@ static inline void page_init_poison(struct page *page, size_t size)
 		VM_BUG_ON_PGFLAGS(!PageHead(page), page);		\
 		PF_POISONED_CHECK(&page[1]); })
 
+/* Which page is the flag stored in */
+#define FOLIO_PF_ANY		0
+#define FOLIO_PF_HEAD		0
+#define FOLIO_PF_ONLY_HEAD	0
+#define FOLIO_PF_NO_TAIL	0
+#define FOLIO_PF_NO_COMPOUND	0
+#define FOLIO_PF_SECOND		1
+
 /*
  * Macros to create function definitions for page flags
  */
 #define TESTPAGEFLAG(uname, lname, policy)				\
+static __always_inline bool folio_##lname(struct folio *folio)		\
+{ return test_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
 static __always_inline int Page##uname(struct page *page)		\
 	{ return test_bit(PG_##lname, &policy(page, 0)->flags); }
 
 #define SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline							\
+void folio_set_##lname##_flag(struct folio *folio)			\
+{ set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
 static __always_inline void SetPage##uname(struct page *page)		\
 	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline							\
+void folio_clear_##lname##_flag(struct folio *folio)			\
+{ clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
 static __always_inline void ClearPage##uname(struct page *page)		\
 	{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __SETPAGEFLAG(uname, lname, policy)				\
+static __always_inline							\
+void __folio_set_##lname##_flag(struct folio *folio)			\
+{ __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
 static __always_inline void __SetPage##uname(struct page *page)		\
 	{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define __CLEARPAGEFLAG(uname, lname, policy)				\
+static __always_inline							\
+void __folio_clear_##lname##_flag(struct folio *folio)			\
+{ __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 	{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTSETFLAG(uname, lname, policy)				\
+static __always_inline							\
+bool folio_test_set_##lname##_flag(struct folio *folio)		\
+{ return test_and_set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline int TestSetPage##uname(struct page *page)	\
 	{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTCLEARFLAG(uname, lname, policy)				\
+static __always_inline							\
+bool folio_test_clear_##lname##_flag(struct folio *folio)		\
+{ return test_and_clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
 static __always_inline int TestClearPage##uname(struct page *page)	\
 	{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
@@ -328,29 +367,37 @@ static __always_inline int TestClearPage##uname(struct page *page)	\
 	TESTSETFLAG(uname, lname, policy)				\
 	TESTCLEARFLAG(uname, lname, policy)
 
-#define TESTPAGEFLAG_FALSE(uname)					\
+#define TESTPAGEFLAG_FALSE(uname, lname)				\
+static inline bool folio_##lname(const struct folio *folio) { return 0; } \
 static inline int Page##uname(const struct page *page) { return 0; }
 
-#define SETPAGEFLAG_NOOP(uname)						\
+#define SETPAGEFLAG_NOOP(uname, lname)					\
+static inline void folio_set_##lname##_flag(struct folio *folio) { }	\
 static inline void SetPage##uname(struct page *page) {  }
 
-#define CLEARPAGEFLAG_NOOP(uname)					\
+#define CLEARPAGEFLAG_NOOP(uname, lname)				\
+static inline void folio_clear_##lname##_flag(struct folio *folio) { }	\
 static inline void ClearPage##uname(struct page *page) {  }
 
-#define __CLEARPAGEFLAG_NOOP(uname)					\
+#define __CLEARPAGEFLAG_NOOP(uname, lname)				\
+static inline void __folio_clear_##lname##_flag(struct folio *folio) { }	\
 static inline void __ClearPage##uname(struct page *page) {  }
 
-#define TESTSETFLAG_FALSE(uname)					\
+#define TESTSETFLAG_FALSE(uname, lname)					\
+static inline bool folio_test_set_##lname##_flag(struct folio *folio)	\
+{ return 0; }								\
 static inline int TestSetPage##uname(struct page *page) { return 0; }
 
-#define TESTCLEARFLAG_FALSE(uname)					\
+#define TESTCLEARFLAG_FALSE(uname, lname)				\
+static inline bool folio_test_clear_##lname##_flag(struct folio *folio) \
+{ return 0; }								\
 static inline int TestClearPage##uname(struct page *page) { return 0; }
 
-#define PAGEFLAG_FALSE(uname) TESTPAGEFLAG_FALSE(uname)			\
-	SETPAGEFLAG_NOOP(uname) CLEARPAGEFLAG_NOOP(uname)
+#define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
+	SETPAGEFLAG_NOOP(uname, lname) CLEARPAGEFLAG_NOOP(uname, lname)
 
-#define TESTSCFLAG_FALSE(uname)						\
-	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
+#define TESTSCFLAG_FALSE(uname, lname)					\
+	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)
 
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
@@ -406,8 +453,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+PAGEFLAG(Readahead, readahead, PF_NO_COMPOUND)
+	TESTCLEARFLAG(Readahead, readahead, PF_NO_COMPOUND)
 
 #ifdef CONFIG_HIGHMEM
 /*
@@ -416,22 +463,25 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
  */
 #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
 #else
-PAGEFLAG_FALSE(HighMem)
+PAGEFLAG_FALSE(HighMem, highmem)
 #endif
 
 #ifdef CONFIG_SWAP
-static __always_inline int PageSwapCache(struct page *page)
+static __always_inline bool folio_swapcache(struct folio *folio)
 {
-#ifdef CONFIG_THP_SWAP
-	page = compound_head(page);
-#endif
-	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
+	return folio_swapbacked(folio) &&
+			test_bit(PG_swapcache, folio_flags(folio, 0));
+}
 
+static __always_inline bool PageSwapCache(struct page *page)
+{
+	return folio_swapcache(page_folio(page));
 }
+
 SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 #else
-PAGEFLAG_FALSE(SwapCache)
+PAGEFLAG_FALSE(SwapCache, swapcache)
 #endif
 
 PAGEFLAG(Unevictable, unevictable, PF_HEAD)
@@ -443,14 +493,14 @@ PAGEFLAG(Mlocked, mlocked, PF_NO_TAIL)
 	__CLEARPAGEFLAG(Mlocked, mlocked, PF_NO_TAIL)
 	TESTSCFLAG(Mlocked, mlocked, PF_NO_TAIL)
 #else
-PAGEFLAG_FALSE(Mlocked) __CLEARPAGEFLAG_NOOP(Mlocked)
-	TESTSCFLAG_FALSE(Mlocked)
+PAGEFLAG_FALSE(Mlocked, mlocked) __CLEARPAGEFLAG_NOOP(Mlocked, mlocked)
+	TESTSCFLAG_FALSE(Mlocked, mlocked)
 #endif
 
 #ifdef CONFIG_ARCH_USES_PG_UNCACHED
 PAGEFLAG(Uncached, uncached, PF_NO_COMPOUND)
 #else
-PAGEFLAG_FALSE(Uncached)
+PAGEFLAG_FALSE(Uncached, uncached)
 #endif
 
 #ifdef CONFIG_MEMORY_FAILURE
@@ -459,7 +509,7 @@ TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 extern bool take_page_off_buddy(struct page *page);
 #else
-PAGEFLAG_FALSE(HWPoison)
+PAGEFLAG_FALSE(HWPoison, hwpoison)
 #define __PG_HWPOISON 0
 #endif
 
@@ -505,10 +555,14 @@ static __always_inline int PageMappingFlags(struct page *page)
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
-static __always_inline int PageAnon(struct page *page)
+static __always_inline bool folio_anon(struct folio *folio)
+{
+	return ((unsigned long)folio->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline bool PageAnon(struct page *page)
 {
-	page = compound_head(page);
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return folio_anon(page_folio(page));
 }
 
 static __always_inline int __PageMovable(struct page *page)
@@ -524,30 +578,32 @@ static __always_inline int __PageMovable(struct page *page)
  * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
  * anon_vma, but to that page's node of the stable tree.
  */
-static __always_inline int PageKsm(struct page *page)
+static __always_inline bool folio_ksm(struct folio *folio)
 {
-	page = compound_head(page);
-	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+	return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) ==
 				PAGE_MAPPING_KSM;
 }
+
+static __always_inline bool PageKsm(struct page *page)
+{
+	return folio_ksm(page_folio(page));
+}
 #else
-TESTPAGEFLAG_FALSE(Ksm)
+TESTPAGEFLAG_FALSE(Ksm, ksm)
 #endif
 
 u64 stable_page_flags(struct page *page);
 
-static inline int PageUptodate(struct page *page)
+static inline bool folio_uptodate(struct folio *folio)
 {
-	int ret;
-	page = compound_head(page);
-	ret = test_bit(PG_uptodate, &(page)->flags);
+	bool ret = test_bit(PG_uptodate, folio_flags(folio, 0));
 	/*
-	 * Must ensure that the data we read out of the page is loaded
-	 * _after_ we've loaded page->flags to check for PageUptodate.
-	 * We can skip the barrier if the page is not uptodate, because
+	 * Must ensure that the data we read out of the folio is loaded
+	 * _after_ we've loaded folio->flags to check the uptodate bit.
+	 * We can skip the barrier if the folio is not uptodate, because
 	 * we wouldn't be reading anything from it.
 	 *
-	 * See SetPageUptodate() for the other side of the story.
+	 * See folio_mark_uptodate() for the other side of the story.
 	 */
 	if (ret)
 		smp_rmb();
@@ -555,23 +611,36 @@ static inline int PageUptodate(struct page *page)
 	return ret;
 }
 
-static __always_inline void __SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+	return folio_uptodate(page_folio(page));
+}
+
+static __always_inline void __folio_mark_uptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	smp_wmb();
-	__set_bit(PG_uptodate, &page->flags);
+	__set_bit(PG_uptodate, folio_flags(folio, 0));
 }
 
-static __always_inline void SetPageUptodate(struct page *page)
+static __always_inline void folio_mark_uptodate(struct folio *folio)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
 	/*
 	 * Memory barrier must be issued before setting the PG_uptodate bit,
-	 * so that all previous stores issued in order to bring the page
-	 * uptodate are actually visible before PageUptodate becomes true.
+	 * so that all previous stores issued in order to bring the folio
+	 * uptodate are actually visible before folio_uptodate becomes true.
 	 */
 	smp_wmb();
-	set_bit(PG_uptodate, &page->flags);
+	set_bit(PG_uptodate, folio_flags(folio, 0));
+}
+
+static __always_inline void __SetPageUptodate(struct page *page)
+{
+	__folio_mark_uptodate((struct folio *)page);
+}
+
+static __always_inline void SetPageUptodate(struct page *page)
+{
+	folio_mark_uptodate((struct folio *)page);
 }
 
 CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)
@@ -596,6 +665,17 @@ static inline void set_page_writeback_keepwrite(struct page *page)
 
 __PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
 
+/* Whether there are one or multiple pages in a folio */
+static inline bool folio_single(struct folio *folio)
+{
+	return !folio_head(folio);
+}
+
+static inline bool folio_multi(struct folio *folio)
+{
+	return folio_head(folio);
+}
+
 static __always_inline void set_compound_head(struct page *page, struct page *head)
 {
 	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
@@ -619,12 +699,15 @@ static inline void ClearPageCompound(struct page *page)
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
 int PageHeadHuge(struct page *page);
+static inline bool folio_hugetlb(struct folio *folio)
+{
+	return PageHeadHuge(&folio->page);
+}
 #else
-TESTPAGEFLAG_FALSE(Huge)
-TESTPAGEFLAG_FALSE(HeadHuge)
+TESTPAGEFLAG_FALSE(Huge, hugetlb)
+TESTPAGEFLAG_FALSE(HeadHuge, headhuge)
 #endif
 
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
@@ -640,6 +723,11 @@ static inline int PageTransHuge(struct page *page)
 	return PageHead(page);
 }
 
+static inline bool folio_transhuge(struct folio *folio)
+{
+	return folio_head(folio);
+}
+
 /*
  * PageTransCompound returns true for both transparent huge pages
  * and hugetlbfs pages, so it should only be called when it's known
@@ -713,12 +801,12 @@ static inline int PageTransTail(struct page *page)
 PAGEFLAG(DoubleMap, double_map, PF_SECOND)
 	TESTSCFLAG(DoubleMap, double_map, PF_SECOND)
 #else
-TESTPAGEFLAG_FALSE(TransHuge)
-TESTPAGEFLAG_FALSE(TransCompound)
-TESTPAGEFLAG_FALSE(TransCompoundMap)
-TESTPAGEFLAG_FALSE(TransTail)
-PAGEFLAG_FALSE(DoubleMap)
-	TESTSCFLAG_FALSE(DoubleMap)
+TESTPAGEFLAG_FALSE(TransHuge, transhuge)
+TESTPAGEFLAG_FALSE(TransCompound, transcompound)
+TESTPAGEFLAG_FALSE(TransCompoundMap, transcompoundmap)
+TESTPAGEFLAG_FALSE(TransTail, transtail)
+PAGEFLAG_FALSE(DoubleMap, double_map)
+	TESTSCFLAG_FALSE(DoubleMap, double_map)
 #endif
 
 /*
@@ -871,6 +959,11 @@ static inline int page_has_private(struct page *page)
 	return !!(page->flags & PAGE_FLAGS_PRIVATE);
 }
 
+static inline bool folio_has_private(struct folio *folio)
+{
+	return page_has_private(&folio->page);
+}
+
 #undef PF_ANY
 #undef PF_HEAD
 #undef PF_ONLY_HEAD
-- 
2.30.2


* [PATCH v10 10/33] mm: Add folio_young and folio_idle
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 09/33] mm: Add folio flag manipulation functions Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 15:33   ` Vlastimil Babka
  2021-05-27  8:17   ` Christoph Hellwig
  2021-05-11 21:47 ` [PATCH v10 11/33] mm: Handle per-folio private data Matthew Wilcox (Oracle)
                   ` (25 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

Idle page tracking is handled through page_ext on 32-bit architectures.
Add folio equivalents for 32-bit and move all the page compatibility
parts to common code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page_idle.h | 99 +++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 1e894d34bdce..bd957e818558 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -8,46 +8,16 @@
 
 #ifdef CONFIG_IDLE_PAGE_TRACKING
 
-#ifdef CONFIG_64BIT
-static inline bool page_is_young(struct page *page)
-{
-	return PageYoung(page);
-}
-
-static inline void set_page_young(struct page *page)
-{
-	SetPageYoung(page);
-}
-
-static inline bool test_and_clear_page_young(struct page *page)
-{
-	return TestClearPageYoung(page);
-}
-
-static inline bool page_is_idle(struct page *page)
-{
-	return PageIdle(page);
-}
-
-static inline void set_page_idle(struct page *page)
-{
-	SetPageIdle(page);
-}
-
-static inline void clear_page_idle(struct page *page)
-{
-	ClearPageIdle(page);
-}
-#else /* !CONFIG_64BIT */
+#ifndef CONFIG_64BIT
 /*
  * If there is not enough space to store Idle and Young bits in page flags, use
  * page ext flags instead.
  */
 extern struct page_ext_operations page_idle_ops;
 
-static inline bool page_is_young(struct page *page)
+static inline bool folio_young(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return false;
@@ -55,9 +25,9 @@ static inline bool page_is_young(struct page *page)
 	return test_bit(PAGE_EXT_YOUNG, &page_ext->flags);
 }
 
-static inline void set_page_young(struct page *page)
+static inline void folio_set_young_flag(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return;
@@ -65,9 +35,9 @@ static inline void set_page_young(struct page *page)
 	set_bit(PAGE_EXT_YOUNG, &page_ext->flags);
 }
 
-static inline bool test_and_clear_page_young(struct page *page)
+static inline bool folio_test_clear_young_flag(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return false;
@@ -75,9 +45,9 @@ static inline bool test_and_clear_page_young(struct page *page)
 	return test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags);
 }
 
-static inline bool page_is_idle(struct page *page)
+static inline bool folio_idle(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return false;
@@ -85,9 +55,9 @@ static inline bool page_is_idle(struct page *page)
 	return test_bit(PAGE_EXT_IDLE, &page_ext->flags);
 }
 
-static inline void set_page_idle(struct page *page)
+static inline void folio_set_idle_flag(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return;
@@ -95,46 +65,75 @@ static inline void set_page_idle(struct page *page)
 	set_bit(PAGE_EXT_IDLE, &page_ext->flags);
 }
 
-static inline void clear_page_idle(struct page *page)
+static inline void folio_clear_idle_flag(struct folio *folio)
 {
-	struct page_ext *page_ext = lookup_page_ext(page);
+	struct page_ext *page_ext = lookup_page_ext(&folio->page);
 
 	if (unlikely(!page_ext))
 		return;
 
 	clear_bit(PAGE_EXT_IDLE, &page_ext->flags);
 }
-#endif /* CONFIG_64BIT */
+#endif /* !CONFIG_64BIT */
 
 #else /* !CONFIG_IDLE_PAGE_TRACKING */
 
-static inline bool page_is_young(struct page *page)
+static inline bool folio_young(struct folio *folio)
 {
 	return false;
 }
 
-static inline void set_page_young(struct page *page)
+static inline void folio_set_young_flag(struct folio *folio)
 {
 }
 
-static inline bool test_and_clear_page_young(struct page *page)
+static inline bool folio_test_clear_young_flag(struct folio *folio)
 {
 	return false;
 }
 
-static inline bool page_is_idle(struct page *page)
+static inline bool folio_idle(struct folio *folio)
 {
 	return false;
 }
 
-static inline void set_page_idle(struct page *page)
+static inline void folio_set_idle_flag(struct folio *folio)
 {
 }
 
-static inline void clear_page_idle(struct page *page)
+static inline void folio_clear_idle_flag(struct folio *folio)
 {
 }
 
 #endif /* CONFIG_IDLE_PAGE_TRACKING */
 
+static inline bool page_is_young(struct page *page)
+{
+	return folio_young(page_folio(page));
+}
+
+static inline void set_page_young(struct page *page)
+{
+	folio_set_young_flag(page_folio(page));
+}
+
+static inline bool test_and_clear_page_young(struct page *page)
+{
+	return folio_test_clear_young_flag(page_folio(page));
+}
+
+static inline bool page_is_idle(struct page *page)
+{
+	return folio_idle(page_folio(page));
+}
+
+static inline void set_page_idle(struct page *page)
+{
+	folio_set_idle_flag(page_folio(page));
+}
+
+static inline void clear_page_idle(struct page *page)
+{
+	folio_clear_idle_flag(page_folio(page));
+}
 #endif /* _LINUX_MM_PAGE_IDLE_H */
-- 
2.30.2



* [PATCH v10 11/33] mm: Handle per-folio private data
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 10/33] mm: Add folio_young and folio_idle Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 15:41   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Add folio_get_private() which mirrors page_private() -- ie folio private
data is the same as page private data.  The only difference is that
folio_get_private() returns a void * instead of an unsigned long, which
matches the majority of users.

Turn attach_page_private() into folio_attach_private() and reimplement
attach_page_private() as a wrapper.  No filesystem which uses page private
data currently supports compound pages, so we're free to define the rules.
attach_page_private() may only be called on a head page; if you want
to add private data to a tail page, you can call set_page_private()
directly (and shouldn't increment the page refcount!  That should be
done when adding private data to the head page / folio).

This saves 597 bytes of text with the distro-derived config that I'm
testing due to removing the calls to compound_head() in get_page()
& put_page().
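
The refcount rule above (one reference taken on attach, dropped on
detach) can be modelled in userspace roughly as below.  This is a
sketch, not the kernel code: the struct folio here is a hypothetical
three-field stand-in, a plain int replaces the atomic refcount, and a
bool replaces the PG_private flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields this patch touches. */
struct folio {
	int refcount;		/* models the folio reference count */
	bool has_private;	/* models the PG_private flag */
	void *private;
};

static inline void folio_attach_private(struct folio *folio, void *data)
{
	folio->refcount++;		/* models folio_get() */
	folio->private = data;
	folio->has_private = true;	/* models folio_set_private_flag() */
}

static inline void *folio_detach_private(struct folio *folio)
{
	void *data = folio->private;

	/* Nothing attached: leave the refcount alone. */
	if (!folio->has_private)
		return NULL;
	folio->has_private = false;
	folio->private = NULL;
	assert(folio->refcount > 0);
	folio->refcount--;		/* models folio_put() */

	return data;
}
```

In this model a detach without a prior attach returns NULL and leaves
the reference count unchanged, which mirrors the early return in the
patch.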

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/mm_types.h | 11 +++++++++
 include/linux/pagemap.h  | 48 ++++++++++++++++++++++++----------------
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3118ba8b5a4e..943854268986 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -302,6 +302,12 @@ static inline atomic_t *compound_pincount_ptr(struct page *page)
 #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
 #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 
+/*
+ * page_private can be used on tail pages.  However, PagePrivate is only
+ * checked by the VM on the head page.  So page_private on the tail pages
+ * should be used for data that's ancillary to the head page (eg attaching
+ * buffer heads to tail pages after attaching buffer heads to the head page)
+ */
 #define page_private(page)		((page)->private)
 
 static inline void set_page_private(struct page *page, unsigned long private)
@@ -309,6 +315,11 @@ static inline void set_page_private(struct page *page, unsigned long private)
 	page->private = private;
 }
 
+static inline void *folio_get_private(struct folio *folio)
+{
+	return folio->private;
+}
+
 struct page_frag_cache {
 	void * va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4900e64c880d..bc5fa3d7204e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -184,42 +184,52 @@ static inline bool page_cache_get_speculative(struct page *page)
 }
 
 /**
- * attach_page_private - Attach private data to a page.
- * @page: Page to attach data to.
- * @data: Data to attach to page.
+ * folio_attach_private - Attach private data to a folio.
+ * @folio: Folio to attach data to.
+ * @data: Data to attach to folio.
  *
- * Attaching private data to a page increments the page's reference count.
- * The data must be detached before the page will be freed.
+ * Attaching private data to a folio increments the folio's reference count.
+ * The data must be detached before the folio will be freed.
  */
-static inline void attach_page_private(struct page *page, void *data)
+static inline void folio_attach_private(struct folio *folio, void *data)
 {
-	get_page(page);
-	set_page_private(page, (unsigned long)data);
-	SetPagePrivate(page);
+	folio_get(folio);
+	folio->private = data;
+	folio_set_private_flag(folio);
 }
 
 /**
- * detach_page_private - Detach private data from a page.
- * @page: Page to detach data from.
+ * folio_detach_private - Detach private data from a folio.
+ * @folio: Folio to detach data from.
  *
- * Removes the data that was previously attached to the page and decrements
+ * Removes the data that was previously attached to the folio and decrements
  * the refcount on the page.
  *
- * Return: Data that was attached to the page.
+ * Return: Data that was attached to the folio.
  */
-static inline void *detach_page_private(struct page *page)
+static inline void *folio_detach_private(struct folio *folio)
 {
-	void *data = (void *)page_private(page);
+	void *data = folio_get_private(folio);
 
-	if (!PagePrivate(page))
+	if (!folio_private(folio))
 		return NULL;
-	ClearPagePrivate(page);
-	set_page_private(page, 0);
-	put_page(page);
+	folio_clear_private_flag(folio);
+	folio->private = NULL;
+	folio_put(folio);
 
 	return data;
 }
 
+static inline void attach_page_private(struct page *page, void *data)
+{
+	folio_attach_private(page_folio(page), data);
+}
+
+static inline void *detach_page_private(struct page *page)
+{
+	return folio_detach_private(page_folio(page));
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
-- 
2.30.2



* [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 11/33] mm: Handle per-folio private data Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 15:55   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 13/33] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

folio_index() is the equivalent of page_index() for folios.
folio_file_page() is the equivalent of find_subpage().
folio_contains() is the equivalent of thp_contains().
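
The index arithmetic behind folio_file_page() and folio_contains() can
be tried out in userspace with the sketch below.  The two-field struct
folio and the helper name folio_page_offset() are hypothetical; the
sketch assumes a folio of a power-of-two number of pages that is
naturally aligned in the file, and it ignores the swap cache and
hugetlbfs cases that the real helpers handle.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* Hypothetical model: nr_pages pages starting at file index 'index'. */
struct folio {
	pgoff_t index;
	unsigned long nr_pages;
};

/*
 * Which page of the folio holds @index.  This is the value that
 * folio_file_page() passes on to folio_page() in the patch.
 */
static inline unsigned long folio_page_offset(const struct folio *folio,
					      pgoff_t index)
{
	/* The mask only works for a non-zero power of two. */
	assert(folio->nr_pages != 0 &&
	       (folio->nr_pages & (folio->nr_pages - 1)) == 0);
	return index & (folio->nr_pages - 1);
}

static inline bool folio_contains(const struct folio *folio, pgoff_t index)
{
	/*
	 * Unsigned subtraction wraps around, so an index below
	 * folio->index becomes a huge value and fails the comparison.
	 */
	return index - folio->index < folio->nr_pages;
}
```

For an 8-page folio at index 16, index 19 maps to page 3 of the folio,
and only indices 16 to 23 are reported as contained.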

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bc5fa3d7204e..8eaeffccfd38 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -386,6 +386,59 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
 	return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
 }
 
+#define swapcache_index(folio)	__page_file_index(&(folio)->page)
+
+/**
+ * folio_index - File index of a folio.
+ * @folio: The folio.
+ *
+ * For a folio which is either in the page cache or the swap cache,
+ * return its index within the address_space it belongs to.  If you know
+ * the page is definitely in the page cache, you can look at the folio's
+ * index directly.
+ *
+ * Return: The index (offset in units of pages) of a folio in its file.
+ */
+static inline pgoff_t folio_index(struct folio *folio)
+{
+        if (unlikely(folio_swapcache(folio)))
+                return swapcache_index(folio);
+        return folio->index;
+}
+
+/**
+ * folio_file_page - The page for a particular index.
+ * @folio: The folio which contains this index.
+ * @index: The index we want to look up.
+ *
+ * Sometimes after looking up a folio in the page cache, we need to
+ * obtain the specific page for an index (eg a page fault).
+ *
+ * Return: The page containing the file data for this index.
+ */
+static inline struct page *folio_file_page(struct folio *folio, pgoff_t index)
+{
+	return folio_page(folio, index & (folio_nr_pages(folio) - 1));
+}
+
+/**
+ * folio_contains - Does this folio contain this index?
+ * @folio: The folio.
+ * @index: The page index within the file.
+ *
+ * Context: The caller should have the page locked in order to prevent
+ * (eg) shmem from moving the page between the page cache and swap cache
+ * and changing its index in the middle of the operation.
+ * Return: true or false.
+ */
+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+	/* HugeTLBfs indexes the page cache in units of hpage_size */
+	if (folio_hugetlb(folio))
+		return folio->index == index;
+	return index - folio_index(folio) < folio_nr_pages(folio);
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
-- 
2.30.2



* [PATCH v10 13/33] mm/filemap: Add folio_next_index
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 17:07   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

This helper returns the page index of the next folio in the file (ie
the end of this folio, plus one).
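
As a small illustration (a userspace sketch with a hypothetical
two-field struct folio, not the kernel type), the helper makes it easy
to ask whether one folio directly follows another in the file.
folios_are_adjacent() is an invented name used only for this example.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* Hypothetical model: nr_pages pages starting at file index 'index'. */
struct folio {
	pgoff_t index;
	unsigned long nr_pages;
};

static inline pgoff_t folio_next_index(const struct folio *folio)
{
	assert(folio->nr_pages > 0);
	return folio->index + folio->nr_pages;
}

/* Invented helper: does @b start exactly where @a ends? */
static inline bool folios_are_adjacent(const struct folio *a,
				       const struct folio *b)
{
	return folio_next_index(a) == b->index;
}
```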

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8eaeffccfd38..3b82252d12fc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -406,6 +406,17 @@ static inline pgoff_t folio_index(struct folio *folio)
         return folio->index;
 }
 
+/**
+ * folio_next_index - Get the index of the next folio.
+ * @folio: The current folio.
+ *
+ * Return: The index of the folio which follows this folio in the file.
+ */
+static inline pgoff_t folio_next_index(struct folio *folio)
+{
+	return folio->index + folio_nr_pages(folio);
+}
+
 /**
  * folio_file_page - The page for a particular index.
  * @folio: The folio which contains this index.
-- 
2.30.2



* [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 13/33] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 17:08   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

folio_offset() and folio_file_offset() are just wrappers around their
page counterparts, page_offset() and page_file_offset().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 3b82252d12fc..448a2dfb5ff1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -558,6 +558,16 @@ static inline loff_t page_file_offset(struct page *page)
 	return ((loff_t)page_index(page)) << PAGE_SHIFT;
 }
 
+static inline loff_t folio_offset(struct folio *folio)
+{
+	return page_offset(&folio->page);
+}
+
+static inline loff_t folio_file_offset(struct folio *folio)
+{
+	return page_file_offset(&folio->page);
+}
+
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
 				     unsigned long address);
 
-- 
2.30.2



* [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 17:29   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 16/33] mm: Add folio_mapcount Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

These are the folio equivalent of page_mapping() and page_file_mapping().
Add an out-of-line page_mapping() wrapper around folio_mapping()
in order to prevent the page_folio() call from bloating every caller
of page_mapping().  Adjust page_file_mapping() and page_mapping_file()
to use folios internally.  Rename __page_file_mapping() to
swapcache_mapping() and change it to take a folio.

This ends up saving 186 bytes of text overall.  folio_mapping() is
45 bytes shorter than page_mapping() was, but the new page_mapping()
wrapper is 30 bytes.  The major reduction is a few bytes less in dozens
of nfs functions (which call page_file_mapping()).  Most of these appear
to come from a slight change in gcc's register allocation decisions,
which allows:

   48 8b 56 08         mov    0x8(%rsi),%rdx
   48 8d 42 ff         lea    -0x1(%rdx),%rax
   83 e2 01            and    $0x1,%edx
   48 0f 44 c6         cmove  %rsi,%rax

to become:

   48 8b 46 08         mov    0x8(%rsi),%rax
   48 8d 78 ff         lea    -0x1(%rax),%rdi
   a8 01               test   $0x1,%al
   48 0f 44 fe         cmove  %rsi,%rdi

for a reduction of a single byte.  Once the NFS client is converted to
use folios, this entire sequence will disappear.
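
Both instruction sequences appear to be the inlined compound_head()
lookup.  A userspace sketch of that logic is below, with each line
annotated with the instruction it would correspond to in the second
listing.  The two-field struct page is a hypothetical model that only
keeps the word at offset 8 on an LP64 target, and set_compound_head()
here is a simplified stand-in for the kernel helper of the same name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only the word the disassembly loads. */
struct page {
	unsigned long flags;		/* offset 0 */
	uintptr_t compound_head;	/* offset 8; tail pages store head | 1 */
};

static inline void set_compound_head(struct page *tail, struct page *head)
{
	/* Bit 0 is free because struct page is at least 2-byte aligned. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head + 1;
}

static inline struct page *compound_head(struct page *page)
{
	uintptr_t head = page->compound_head;	/* mov  0x8(%rsi),%rax  */

	if (head & 1)				/* test $0x1,%al        */
		return (struct page *)(head - 1); /* lea -0x1(%rax),%rdi */
	return page;				/* cmove %rsi,%rdi      */
}
```

A function that takes a struct folio never needs this lookup, because a
folio pointer is by definition not a tail page.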

Also add folio_mapping() documentation.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 Documentation/core-api/mm-api.rst |  2 ++
 include/linux/mm.h                | 14 -------------
 include/linux/pagemap.h           | 35 +++++++++++++++++++++++++++++--
 include/linux/swap.h              |  6 ++++++
 mm/Makefile                       |  2 +-
 mm/folio-compat.c                 | 13 ++++++++++++
 mm/swapfile.c                     |  8 +++----
 mm/util.c                         | 30 +++++++++++++++-----------
 8 files changed, 77 insertions(+), 33 deletions(-)
 create mode 100644 mm/folio-compat.c

diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index 5c459ee2acce..dcce6605947a 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -100,3 +100,5 @@ More Memory Management Functions
    :internal:
 .. kernel-doc:: include/linux/page_ref.h
 .. kernel-doc:: include/linux/mmzone.h
+.. kernel-doc:: mm/util.c
+   :functions: folio_mapping
diff --git a/include/linux/mm.h b/include/linux/mm.h
index feb4645ef4f2..dca39daf3495 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1749,19 +1749,6 @@ void page_address_init(void);
 
 extern void *page_rmapping(struct page *page);
 extern struct anon_vma *page_anon_vma(struct page *page);
-extern struct address_space *page_mapping(struct page *page);
-
-extern struct address_space *__page_file_mapping(struct page *);
-
-static inline
-struct address_space *page_file_mapping(struct page *page)
-{
-	if (unlikely(PageSwapCache(page)))
-		return __page_file_mapping(page);
-
-	return page->mapping;
-}
-
 extern pgoff_t __page_file_index(struct page *page);
 
 /*
@@ -1776,7 +1763,6 @@ static inline pgoff_t page_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
-struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 448a2dfb5ff1..1f37d7656955 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -162,14 +162,45 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping)
 
 void release_pages(struct page **pages, int nr);
 
+struct address_space *page_mapping(struct page *);
+struct address_space *folio_mapping(struct folio *);
+struct address_space *swapcache_mapping(struct folio *);
+
+/**
+ * folio_file_mapping - Find the mapping this folio belongs to.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Folios in the swap cache return the mapping of the
+ * swap file or swap device where the data is stored.  This is different
+ * from the mapping returned by folio_mapping().  The only reason to
+ * use it is if, like NFS, you return 0 from ->swap_activate.
+ *
+ * Do not call this for folios which aren't in the page cache or swap cache.
+ */
+static inline struct address_space *folio_file_mapping(struct folio *folio)
+{
+	if (unlikely(folio_swapcache(folio)))
+		return swapcache_mapping(folio);
+
+	return folio->mapping;
+}
+
+static inline struct address_space *page_file_mapping(struct page *page)
+{
+	return folio_file_mapping(page_folio(page));
+}
+
 /*
  * For file cache pages, return the address_space, otherwise return NULL
  */
 static inline struct address_space *page_mapping_file(struct page *page)
 {
-	if (unlikely(PageSwapCache(page)))
+	struct folio *folio = page_folio(page);
+
+	if (unlikely(folio_swapcache(folio)))
 		return NULL;
-	return page_mapping(page);
+	return folio_mapping(folio);
 }
 
 static inline bool page_cache_add_speculative(struct page *page, int count)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 144727041e78..20766342845b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -314,6 +314,12 @@ struct vma_swap_readahead {
 #endif
 };
 
+static inline swp_entry_t folio_swap_entry(struct folio *folio)
+{
+	swp_entry_t entry = { .val = page_private(&folio->page) };
+	return entry;
+}
+
 /* linux/mm/workingset.c */
 void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg);
diff --git a/mm/Makefile b/mm/Makefile
index a9ad6122d468..434c2a46b6c5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,7 +46,7 @@ mmu-$(CONFIG_MMU)	+= process_vm_access.o
 endif
 
 obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
-			   maccess.o page-writeback.o \
+			   maccess.o page-writeback.o folio-compat.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   util.o mmzone.o vmstat.o backing-dev.o \
 			   mm_init.o percpu.o slab_common.o \
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
new file mode 100644
index 000000000000..5e107aa30a62
--- /dev/null
+++ b/mm/folio-compat.c
@@ -0,0 +1,13 @@
+/*
+ * Compatibility functions which bloat the callers too much to make inline.
+ * All of the callers of these functions should be converted to use folios
+ * eventually.
+ */
+
+#include <linux/pagemap.h>
+
+struct address_space *page_mapping(struct page *page)
+{
+	return folio_mapping(page_folio(page));
+}
+EXPORT_SYMBOL(page_mapping);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 149e77454e3c..d0ee24239a83 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3533,13 +3533,13 @@ struct swap_info_struct *page_swap_info(struct page *page)
 }
 
 /*
- * out-of-line __page_file_ methods to avoid include hell.
+ * out-of-line methods to avoid include hell.
  */
-struct address_space *__page_file_mapping(struct page *page)
+struct address_space *swapcache_mapping(struct folio *folio)
 {
-	return page_swap_info(page)->swap_file->f_mapping;
+	return page_swap_info(&folio->page)->swap_file->f_mapping;
 }
-EXPORT_SYMBOL_GPL(__page_file_mapping);
+EXPORT_SYMBOL_GPL(swapcache_mapping);
 
 pgoff_t __page_file_index(struct page *page)
 {
diff --git a/mm/util.c b/mm/util.c
index 0b6dd9d81da7..245f5c7bedae 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -686,30 +686,36 @@ struct anon_vma *page_anon_vma(struct page *page)
 	return __page_rmapping(page);
 }
 
-struct address_space *page_mapping(struct page *page)
+/**
+ * folio_mapping - Find the mapping where this folio is stored.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Folios in the swap cache return the swap mapping
+ * this page is stored in (which is different from the mapping for the
+ * swap file or swap device where the data is stored).
+ *
+ * You can call this for folios which aren't in the swap cache or page
+ * cache and it will return NULL.
+ */
+struct address_space *folio_mapping(struct folio *folio)
 {
 	struct address_space *mapping;
 
-	page = compound_head(page);
-
 	/* This happens if someone calls flush_dcache_page on slab page */
-	if (unlikely(PageSlab(page)))
+	if (unlikely(folio_slab(folio)))
 		return NULL;
 
-	if (unlikely(PageSwapCache(page))) {
-		swp_entry_t entry;
-
-		entry.val = page_private(page);
-		return swap_address_space(entry);
-	}
+	if (unlikely(folio_swapcache(folio)))
+		return swap_address_space(folio_swap_entry(folio));
 
-	mapping = page->mapping;
+	mapping = folio->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
 
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
-EXPORT_SYMBOL(page_mapping);
+EXPORT_SYMBOL(folio_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
2.30.2



* [PATCH v10 16/33] mm: Add folio_mapcount
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-14 17:39   ` Vlastimil Babka
  2021-05-18 18:45   ` Matthew Wilcox
  2021-05-11 21:47 ` [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

This is the folio equivalent of page_mapcount().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/mm.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dca39daf3495..6e3dde81ecc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -883,6 +883,22 @@ static inline int page_mapcount(struct page *page)
 	return atomic_read(&page->_mapcount) + 1;
 }
 
+/**
+ * folio_mapcount - The number of mappings of this folio.
+ * @folio: The folio.
+ *
+ * The result includes the number of times any of the pages in the
+ * folio are mapped to userspace.
+ *
+ * Return: The number of page table entries which refer to this folio.
+ */
+static inline int folio_mapcount(struct folio *folio)
+{
+	if (unlikely(folio_multi(folio)))
+		return __page_mapcount(&folio->page);
+	return atomic_read(&folio->_mapcount) + 1;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
 int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
-- 
2.30.2



* [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 16/33] mm: Add folio_mapcount Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18  9:57   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 18/33] mm/filemap: Add folio_unlock Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Add new wrapper functions folio_memcg(), lock_folio_memcg(),
unlock_folio_memcg(), mem_cgroup_folio_lruvec() and
count_memcg_folio_event(), plus folio versions of the lruvec locking
helpers (folio_lock_lruvec() and friends).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/memcontrol.h | 63 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c193be760709..a3e627ea98e0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -456,6 +456,11 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 		return __page_memcg(page);
 }
 
+static inline struct mem_cgroup *folio_memcg(struct folio *folio)
+{
+	return page_memcg(&folio->page);
+}
+
 /*
  * page_memcg_rcu - locklessly get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -1058,6 +1063,15 @@ static inline void count_memcg_page_event(struct page *page,
 		count_memcg_events(memcg, idx, 1);
 }
 
+static inline void count_memcg_folio_event(struct folio *folio,
+					  enum vm_event_item idx)
+{
+	struct mem_cgroup *memcg = folio_memcg(folio);
+
+	if (memcg)
+		count_memcg_events(memcg, idx, folio_nr_pages(folio));
+}
+
 static inline void count_memcg_event_mm(struct mm_struct *mm,
 					enum vm_event_item idx)
 {
@@ -1129,6 +1143,11 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 	return NULL;
 }
 
+static inline struct mem_cgroup *folio_memcg(struct folio *folio)
+{
+	return NULL;
+}
+
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
@@ -1477,6 +1496,22 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 }
 #endif /* CONFIG_MEMCG */
 
+static inline void lock_folio_memcg(struct folio *folio)
+{
+	lock_page_memcg(&folio->page);
+}
+
+static inline void unlock_folio_memcg(struct folio *folio)
+{
+	unlock_page_memcg(&folio->page);
+}
+
+static inline struct lruvec *mem_cgroup_folio_lruvec(struct folio *folio,
+						    struct pglist_data *pgdat)
+{
+	return mem_cgroup_page_lruvec(&folio->page, pgdat);
+}
+
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
 	__mod_lruvec_kmem_state(p, idx, 1);
@@ -1544,6 +1579,34 @@ static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page,
 	return lock_page_lruvec_irqsave(page, flags);
 }
 
+static inline struct lruvec *folio_lock_lruvec(struct folio *folio)
+{
+	return lock_page_lruvec(&folio->page);
+}
+
+static inline struct lruvec *folio_lock_lruvec_irq(struct folio *folio)
+{
+	return lock_page_lruvec_irq(&folio->page);
+}
+
+static inline struct lruvec *folio_lock_lruvec_irqsave(struct folio *folio,
+		unsigned long *flagsp)
+{
+	return lock_page_lruvec_irqsave(&folio->page, flagsp);
+}
+
+static inline struct lruvec *folio_relock_lruvec_irq(struct folio *folio,
+		struct lruvec *locked_lruvec)
+{
+	return relock_page_lruvec_irq(&folio->page, locked_lruvec);
+}
+
+static inline struct lruvec *folio_relock_lruvec_irqsave(struct folio *folio,
+		struct lruvec *locked_lruvec, unsigned long *flagsp)
+{
+	return relock_page_lruvec_irqsave(&folio->page, locked_lruvec, flagsp);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb);
-- 
2.30.2


* [PATCH v10 18/33] mm/filemap: Add folio_unlock
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:06   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 19/33] mm/filemap: Add folio_lock Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Convert unlock_page() to call folio_unlock().  By using a folio we
avoid a call to compound_head().  This shortens the function from 39
bytes to 25 and removes 4 instructions on x86-64.  Because we still
have unlock_page(), it's a net increase of 24 bytes of text for the
kernel as a whole, but any path that uses folio_unlock() will execute
4 fewer instructions.
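
As a rough userspace illustration of why PG_waiters must be bit 7, the
sketch below models the flags word with C11 atomics.  It is not the
kernel implementation: the bit positions are taken from the
BUILD_BUG_ON() in this patch and from the assumption that PG_locked is
bit 0, and model_folio_unlock() is a hypothetical name.  One atomic
operation both drops the lock bit and reports whether a waiter was
advertised:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit positions: PG_locked in bit 0, PG_waiters in bit 7. */
enum { PG_locked = 0, PG_waiters = 7 };

/*
 * Clear bit nr with release ordering and report whether PG_waiters was
 * set in the value observed by that same atomic operation.
 */
static bool clear_bit_unlock_is_negative_byte(unsigned int nr,
					      atomic_ulong *mem)
{
	unsigned long old;

	old = atomic_fetch_and_explicit(mem, ~(1UL << nr),
					memory_order_release);
	return (old & (1UL << PG_waiters)) != 0;
}

/* Returns true when the caller would need to wake sleepers. */
static bool model_folio_unlock(atomic_ulong *flags)
{
	assert(atomic_load(flags) & (1UL << PG_locked));
	return clear_bit_unlock_is_negative_byte(PG_locked, flags);
}
```

Because the folio is known not to be a tail page, nothing in this path
needs to look up the head page first.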

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h |  3 ++-
 mm/filemap.c            | 27 ++++++++++-----------------
 mm/folio-compat.c       |  6 ++++++
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1f37d7656955..8dbba0074536 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -643,7 +643,8 @@ extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+void unlock_page(struct page *page);
+void folio_unlock(struct folio *folio);
 
 /*
  * Return true if the page was successfully locked
diff --git a/mm/filemap.c b/mm/filemap.c
index 817a47059bd0..e7a6a58d6cd9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1435,29 +1435,22 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
 #endif
 
 /**
- * unlock_page - unlock a locked page
- * @page: the page
+ * folio_unlock - Unlock a locked folio.
+ * @folio: The folio.
  *
- * Unlocks the page and wakes up sleepers in wait_on_page_locked().
- * Also wakes sleepers in wait_on_page_writeback() because the wakeup
- * mechanism between PageLocked pages and PageWriteback pages is shared.
- * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
+ * Unlocks the folio and wakes up any thread sleeping on the page lock.
  *
- * Note that this depends on PG_waiters being the sign bit in the byte
- * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
- * clear the PG_locked bit and test PG_waiters at the same time fairly
- * portably (architectures that do LL/SC can test any bit, while x86 can
- * test the sign bit).
+ * Context: May be called from interrupt or process context.  May not be
+ * called from NMI context.
  */
-void unlock_page(struct page *page)
+void folio_unlock(struct folio *folio)
 {
 	BUILD_BUG_ON(PG_waiters != 7);
-	page = compound_head(page);
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
-		wake_up_page_bit(page, PG_locked);
+	VM_BUG_ON_FOLIO(!folio_locked(folio), folio);
+	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
+		wake_up_page_bit(&folio->page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(folio_unlock);
 
 /**
  * end_page_private_2 - Clear PG_private_2 and release any waiters
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 5e107aa30a62..91b3d00a92f7 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -11,3 +11,9 @@ struct address_space *page_mapping(struct page *page)
 	return folio_mapping(page_folio(page));
 }
 EXPORT_SYMBOL(page_mapping);
+
+void unlock_page(struct page *page)
+{
+	return folio_unlock(page_folio(page));
+}
+EXPORT_SYMBOL(unlock_page);
-- 
2.30.2


* [PATCH v10 19/33] mm/filemap: Add folio_lock
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 18/33] mm/filemap: Add folio_unlock Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:26   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 20/33] mm/filemap: Add folio_lock_killable Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

folio_lock() is like lock_page(), but for callers who know they have a
folio.  Convert __lock_page() to be __folio_lock().  This saves one call
to compound_head() per contended call to lock_page().

Saves 362 bytes of text; mostly from improved register allocation and
inlining decisions.  __folio_lock is 59 bytes while __lock_page was 79.
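
The fast-path/slow-path split can be sketched in userspace as follows.
This is a model under stated assumptions, not the kernel code: the
model_* names are hypothetical, PG_locked is assumed to be bit 0, and
the slow path spins where the kernel would sleep on a wait queue:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { PG_locked = 0 };	/* assumed bit position */

/* Try to take the lock bit; true means we now own the lock. */
static bool model_folio_trylock(atomic_ulong *flags)
{
	unsigned long old;

	old = atomic_fetch_or_explicit(flags, 1UL << PG_locked,
				       memory_order_acquire);
	return (old & (1UL << PG_locked)) == 0;
}

/* Stand-in for the out-of-line slow path; the kernel sleeps here. */
static void model_folio_lock_slow(atomic_ulong *flags)
{
	while (!model_folio_trylock(flags))
		;
}

/* Returns true when the inline fast path was enough. */
static bool model_folio_lock(atomic_ulong *flags)
{
	if (model_folio_trylock(flags))
		return true;
	model_folio_lock_slow(flags);
	return false;
}
```

Only the contended case reaches the out-of-line function, which is why
the compound_head() saving applies per contended call.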

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 24 +++++++++++++++++++-----
 mm/filemap.c            | 29 +++++++++++++++--------------
 2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8dbba0074536..9a78397609b8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -638,7 +638,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 	return true;
 }
 
-extern void __lock_page(struct page *page);
+void __folio_lock(struct folio *folio);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
@@ -646,13 +646,24 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 void unlock_page(struct page *page);
 void folio_unlock(struct folio *folio);
 
+static inline bool folio_trylock(struct folio *folio)
+{
+	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
+}
+
 /*
  * Return true if the page was successfully locked
  */
 static inline int trylock_page(struct page *page)
 {
-	page = compound_head(page);
-	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
+	return folio_trylock(page_folio(page));
+}
+
+static inline void folio_lock(struct folio *folio)
+{
+	might_sleep();
+	if (!folio_trylock(folio))
+		__folio_lock(folio);
 }
 
 /*
@@ -660,9 +671,12 @@ static inline int trylock_page(struct page *page)
  */
 static inline void lock_page(struct page *page)
 {
+	struct folio *folio;
 	might_sleep();
-	if (!trylock_page(page))
-		__lock_page(page);
+
+	folio = page_folio(page);
+	if (!folio_trylock(folio))
+		__folio_lock(folio);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index e7a6a58d6cd9..c6e5ba176764 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1187,7 +1187,7 @@ static void wake_up_page(struct page *page, int bit)
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
-			 * __lock_page() waiting on then setting PG_locked.
+			 * __folio_lock() waiting on then setting PG_locked.
 			 */
 	SHARED,		/* Hold ref to page and check the bit when woken, like
 			 * wait_on_page_writeback() waiting on PG_writeback.
@@ -1576,17 +1576,16 @@ void page_endio(struct page *page, bool is_write, int err)
 EXPORT_SYMBOL_GPL(page_endio);
 
 /**
- * __lock_page - get a lock on the page, assuming we need to sleep to get it
- * @__page: the page to lock
+ * __folio_lock - Get a lock on the folio, assuming we need to sleep to get it.
+ * @folio: The folio to lock
  */
-void __lock_page(struct page *__page)
+void __folio_lock(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
-EXPORT_SYMBOL(__lock_page);
+EXPORT_SYMBOL(__folio_lock);
 
 int __lock_page_killable(struct page *__page)
 {
@@ -1661,10 +1660,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			return 0;
 		}
 	} else {
-		__lock_page(page);
+		__folio_lock(page_folio(page));
 	}
-	return 1;
 
+	return 1;
 }
 
 /**
@@ -2835,7 +2834,9 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 				     struct file **fpin)
 {
-	if (trylock_page(page))
+	struct folio *folio = page_folio(page);
+
+	if (folio_trylock(folio))
 		return 1;
 
 	/*
@@ -2848,7 +2849,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(page)) {
+		if (__lock_page_killable(&folio->page)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
@@ -2860,11 +2861,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 			return 0;
 		}
 	} else
-		__lock_page(page);
+		__folio_lock(folio);
+
 	return 1;
 }
 
-
 /*
  * Synchronous readahead happens when we don't even find a page in the page
  * cache at all.  We don't want to perform IO under the mmap sem, so if we have
-- 
2.30.2


* [PATCH v10 20/33] mm/filemap: Add folio_lock_killable
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (18 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 19/33] mm/filemap: Add folio_lock Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:31   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 21/33] mm/filemap: Add __folio_lock_async Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

folio_lock_killable() is like lock_page_killable(), but for callers who
know they have a folio.  Convert __lock_page_killable() to be
__folio_lock_killable().  This saves one call to compound_head() per
contended call to lock_page_killable().

__folio_lock_killable() is 20 bytes smaller than __lock_page_killable()
was.  lock_page_maybe_drop_mmap() shrinks by 68 bytes and
__lock_page_or_retry() shrinks by 66 bytes.  That's a total of 154 bytes
of text saved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 15 ++++++++++-----
 mm/filemap.c            | 17 +++++++++--------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9a78397609b8..21262e74fcd0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -639,7 +639,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 }
 
 void __folio_lock(struct folio *folio);
-extern int __lock_page_killable(struct page *page);
+int __folio_lock_killable(struct folio *folio);
 extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
@@ -679,6 +679,14 @@ static inline void lock_page(struct page *page)
 		__folio_lock(folio);
 }
 
+static inline int folio_lock_killable(struct folio *folio)
+{
+	might_sleep();
+	if (!folio_trylock(folio))
+		return __folio_lock_killable(folio);
+	return 0;
+}
+
 /*
  * lock_page_killable is like lock_page but can be interrupted by fatal
  * signals.  It returns 0 if it locked the page and -EINTR if it was
@@ -686,10 +694,7 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
-	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
-	return 0;
+	return folio_lock_killable(page_folio(page));
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index c6e5ba176764..ff4a2cd464f2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1587,14 +1587,13 @@ void __folio_lock(struct folio *folio)
 }
 EXPORT_SYMBOL(__folio_lock);
 
-int __lock_page_killable(struct page *__page)
+int __folio_lock_killable(struct folio *folio)
 {
-	struct page *page = compound_head(__page);
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, PG_locked, TASK_KILLABLE,
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
-EXPORT_SYMBOL_GPL(__lock_page_killable);
+EXPORT_SYMBOL_GPL(__folio_lock_killable);
 
 int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 {
@@ -1636,6 +1635,8 @@ int __lock_page_async(struct page *page, struct wait_page_queue *wait)
 int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 			 unsigned int flags)
 {
+	struct folio *folio = page_folio(page);
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
@@ -1654,13 +1655,13 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	if (flags & FAULT_FLAG_KILLABLE) {
 		int ret;
 
-		ret = __lock_page_killable(page);
+		ret = __folio_lock_killable(folio);
 		if (ret) {
 			mmap_read_unlock(mm);
 			return 0;
 		}
 	} else {
-		__folio_lock(page_folio(page));
+		__folio_lock(folio);
 	}
 
 	return 1;
@@ -2849,7 +2850,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
-		if (__lock_page_killable(&folio->page)) {
+		if (__folio_lock_killable(folio)) {
 			/*
 			 * We didn't have the right flags to drop the mmap_lock,
 			 * but all fault_handlers only check for fatal signals
-- 
2.30.2


* [PATCH v10 21/33] mm/filemap: Add __folio_lock_async
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 20/33] mm/filemap: Add folio_lock_killable Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:34   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

There aren't any actual callers of lock_page_async(), so remove it.
Convert filemap_update_page() to call __folio_lock_async().

__folio_lock_async() is 21 bytes smaller than __lock_page_async(),
but the real savings come from using a folio in filemap_update_page(),
shrinking it from 514 bytes to 403 bytes, saving 111 bytes.  The text
shrinks by 132 bytes in total.
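
To make the return convention concrete, here is a single-threaded
userspace sketch of the queue-then-trylock ordering.  It is a model
only: struct model_waiter and the model_* functions are hypothetical,
the bit positions are assumptions, and the kernel performs these steps
under the wait queue lock.  EIOCBQUEUED is a kernel-internal error code,
so the sketch defines it locally:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { PG_locked = 0, PG_waiters = 7 };	/* assumed bit positions */

/* Kernel-internal error code; not provided by userspace <errno.h>. */
#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529
#endif

/* Hypothetical stand-in for struct wait_page_queue. */
struct model_waiter { bool queued; };

static bool model_folio_trylock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_or(flags, 1UL << PG_locked);

	return (old & (1UL << PG_locked)) == 0;
}

/*
 * Queue the waiter first, advertise it, then retry the lock.  If the
 * lock is obtained after all, the waiter is dequeued and 0 is returned;
 * otherwise the callback stays queued and -EIOCBQUEUED is returned.
 */
static int model_folio_lock_async(atomic_ulong *flags,
				  struct model_waiter *wait)
{
	wait->queued = true;
	atomic_fetch_or(flags, 1UL << PG_waiters);
	if (model_folio_trylock(flags)) {
		wait->queued = false;
		return 0;
	}
	return -EIOCBQUEUED;
}
```

Queueing before the retry closes the window in which an unlock could
happen between a failed trylock and the waiter being added.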

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 fs/io_uring.c           |  2 +-
 include/linux/pagemap.h | 17 -----------------
 mm/filemap.c            | 31 ++++++++++++++++---------------
 3 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f46acbbeed57..d09bb3af1324 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3158,7 +3158,7 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 }
 
 /*
- * This is our waitqueue callback handler, registered through lock_page_async()
+ * This is our waitqueue callback handler, registered through __folio_lock_async()
  * when we initially tried to do the IO with the iocb armed our waitqueue.
  * This gets called when the page is unlocked, and we generally expect that to
  * happen when the page IO is completed and the page is now uptodate. This will
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 21262e74fcd0..41224e4ca8cc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -640,7 +640,6 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __folio_lock(struct folio *folio);
 int __folio_lock_killable(struct folio *folio);
-extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 void unlock_page(struct page *page);
@@ -697,22 +696,6 @@ static inline int lock_page_killable(struct page *page)
 	return folio_lock_killable(page_folio(page));
 }
 
-/*
- * lock_page_async - Lock the page, unless this would block. If the page
- * is already locked, then queue a callback when the page becomes unlocked.
- * This callback can then retry the operation.
- *
- * Returns 0 if the page is locked successfully, or -EIOCBQUEUED if the page
- * was already locked and the callback defined in 'wait' was queued.
- */
-static inline int lock_page_async(struct page *page,
-				  struct wait_page_queue *wait)
-{
-	if (!trylock_page(page))
-		return __lock_page_async(page, wait);
-	return 0;
-}
-
 /*
  * lock_page_or_retry - Lock the page, unless this would block and the
  * caller indicated that it can handle a retry.
diff --git a/mm/filemap.c b/mm/filemap.c
index ff4a2cd464f2..67334eb3fd94 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1595,18 +1595,18 @@ int __folio_lock_killable(struct folio *folio)
 }
 EXPORT_SYMBOL_GPL(__folio_lock_killable);
 
-int __lock_page_async(struct page *page, struct wait_page_queue *wait)
+static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 {
-	struct wait_queue_head *q = page_waitqueue(page);
+	struct wait_queue_head *q = page_waitqueue(&folio->page);
 	int ret = 0;
 
-	wait->page = page;
+	wait->page = &folio->page;
 	wait->bit_nr = PG_locked;
 
 	spin_lock_irq(&q->lock);
 	__add_wait_queue_entry_tail(q, &wait->wait);
-	SetPageWaiters(page);
-	ret = !trylock_page(page);
+	folio_set_waiters_flag(folio);
+	ret = !folio_trylock(folio);
 	/*
 	 * If we were successful now, we know we're still on the
 	 * waitqueue as we're still under the lock. This means it's
@@ -2379,41 +2379,42 @@ static int filemap_update_page(struct kiocb *iocb,
 		struct address_space *mapping, struct iov_iter *iter,
 		struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int error;
 
-	if (!trylock_page(page)) {
+	if (!folio_trylock(folio)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
 			return -EAGAIN;
 		if (!(iocb->ki_flags & IOCB_WAITQ)) {
-			put_and_wait_on_page_locked(page, TASK_KILLABLE);
+			put_and_wait_on_page_locked(&folio->page, TASK_KILLABLE);
 			return AOP_TRUNCATED_PAGE;
 		}
-		error = __lock_page_async(page, iocb->ki_waitq);
+		error = __folio_lock_async(folio, iocb->ki_waitq);
 		if (error)
 			return error;
 	}
 
-	if (!page->mapping)
+	if (!folio->mapping)
 		goto truncated;
 
 	error = 0;
-	if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, page))
+	if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, &folio->page))
 		goto unlock;
 
 	error = -EAGAIN;
 	if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT | IOCB_WAITQ))
 		goto unlock;
 
-	error = filemap_read_page(iocb->ki_filp, mapping, page);
+	error = filemap_read_page(iocb->ki_filp, mapping, &folio->page);
 	if (error == AOP_TRUNCATED_PAGE)
-		put_page(page);
+		folio_put(folio);
 	return error;
 truncated:
-	unlock_page(page);
-	put_page(page);
+	folio_unlock(folio);
+	folio_put(folio);
 	return AOP_TRUNCATED_PAGE;
 unlock:
-	unlock_page(page);
+	folio_unlock(folio);
 	return error;
 }
 
-- 
2.30.2


* [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 21/33] mm/filemap: Add __folio_lock_async Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:38   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 23/33] mm/filemap: Add folio_wait_locked Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Convert __lock_page_or_retry() to __folio_lock_or_retry().  This actually
saves 4 bytes in the only caller of lock_page_or_retry() (due to better
register allocation) and saves the 20 byte cost of calling page_folio()
in __folio_lock_or_retry() for a total saving of 24 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h |  9 ++++++---
 mm/filemap.c            | 10 ++++------
 mm/memory.c             |  8 ++++----
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41224e4ca8cc..21e394964288 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -640,7 +640,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __folio_lock(struct folio *folio);
 int __folio_lock_killable(struct folio *folio);
-extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 				unsigned int flags);
 void unlock_page(struct page *page);
 void folio_unlock(struct folio *folio);
@@ -701,13 +701,16 @@ static inline int lock_page_killable(struct page *page)
  * caller indicated that it can handle a retry.
  *
  * Return value and mmap_lock implications depend on flags; see
- * __lock_page_or_retry().
+ * __folio_lock_or_retry().
  */
 static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
+	struct folio *folio;
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	folio = page_folio(page);
+	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 67334eb3fd94..28bf50041671 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1623,20 +1623,18 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 
 /*
  * Return values:
- * 1 - page is locked; mmap_lock is still held.
- * 0 - page is not locked.
+ * 1 - folio is locked; mmap_lock is still held.
+ * 0 - folio is not locked.
  *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
  *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
  *     which case mmap_lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 1
- * with the page locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock unperturbed.
  */
-int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 			 unsigned int flags)
 {
-	struct folio *folio = page_folio(page);
-
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
diff --git a/mm/memory.c b/mm/memory.c
index 86ba6c1f6821..fc3f50d0702c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4065,7 +4065,7 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults).
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __folio_lock_or_retry().
  * If mmap_lock is released, vma may become invalid (for example
  * by other thread calling munmap()).
  */
@@ -4307,7 +4307,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
  * concurrent faults).
  *
  * The mmap_lock may have been released depending on flags and our return value.
- * See filemap_fault() and __lock_page_or_retry().
+ * See filemap_fault() and __folio_lock_or_retry().
  */
 static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
@@ -4411,7 +4411,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
  * By the time we get here, we already hold the mm semaphore
  *
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __folio_lock_or_retry().
  */
 static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags)
@@ -4567,7 +4567,7 @@ static inline void mm_account_fault(struct pt_regs *regs,
  * By the time we get here, we already hold the mm semaphore
  *
  * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __lock_page_or_retry().
+ * return value.  See filemap_fault() and __folio_lock_or_retry().
  */
 vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 			   unsigned int flags, struct pt_regs *regs)
-- 
2.30.2


* [PATCH v10 23/33] mm/filemap: Add folio_wait_locked
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:41   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Add folio_wait_locked() and folio_wait_locked_killable().  Turn
wait_on_page_locked() and wait_on_page_locked_killable() into wrappers.
This eliminates a call to compound_head() from each call-site, reducing
text size by 200 bytes in my build.
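
The difference between the folio entry point and the compatibility
wrapper can be modelled in userspace like this.  The structures are
simplified stand-ins (the real compound_head() decodes a tagged pointer
rather than reading a plain head field), the call counter exists only
for the demonstration, and the model_* functions report whether a wait
would be needed instead of sleeping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PG_locked = 0 };	/* assumed bit position */

/* Simplified stand-ins: head is NULL for a head page. */
struct page { unsigned long flags; struct page *head; };
struct folio { struct page page; };

static int compound_head_calls;	/* instrumentation for this sketch */

static struct page *compound_head(struct page *page)
{
	compound_head_calls++;
	return page->head ? page->head : page;
}

/* Resolve any page, head or tail, to its folio exactly once. */
static struct folio *page_folio(struct page *page)
{
	return (struct folio *)compound_head(page);
}

static bool folio_locked(struct folio *folio)
{
	return (folio->page.flags & (1UL << PG_locked)) != 0;
}

/* Folio callers never need a head-page lookup. */
static bool model_folio_wait_locked(struct folio *folio)
{
	return folio_locked(folio);
}

/* Page callers pay for one lookup in the wrapper. */
static bool model_wait_on_page_locked(struct page *page)
{
	return model_folio_wait_locked(page_folio(page));
}
```

Callers that already hold a folio skip the lookup entirely, which is
where the text savings at each converted call-site come from.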

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 26 ++++++++++++++++++--------
 mm/filemap.c            |  4 ++--
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 21e394964288..e2648d906a84 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -721,23 +721,33 @@ extern void wait_on_page_bit(struct page *page, int bit_nr);
 extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
 
 /* 
- * Wait for a page to be unlocked.
+ * Wait for a folio to be unlocked.
  *
- * This must be called with the caller "holding" the page,
- * ie with increased "page->count" so that the page won't
+ * This must be called with the caller "holding" the folio,
+ * ie with increased "page->count" so that the folio won't
  * go away during the wait..
  */
+static inline void folio_wait_locked(struct folio *folio)
+{
+	if (folio_locked(folio))
+		wait_on_page_bit(&folio->page, PG_locked);
+}
+
+static inline int folio_wait_locked_killable(struct folio *folio)
+{
+	if (!folio_locked(folio))
+		return 0;
+	return wait_on_page_bit_killable(&folio->page, PG_locked);
+}
+
 static inline void wait_on_page_locked(struct page *page)
 {
-	if (PageLocked(page))
-		wait_on_page_bit(compound_head(page), PG_locked);
+	folio_wait_locked(page_folio(page));
 }
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
-	if (!PageLocked(page))
-		return 0;
-	return wait_on_page_bit_killable(compound_head(page), PG_locked);
+	return folio_wait_locked_killable(page_folio(page));
 }
 
 int put_and_wait_on_page_locked(struct page *page, int state);
diff --git a/mm/filemap.c b/mm/filemap.c
index 28bf50041671..73c31b63392f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1645,9 +1645,9 @@ int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 
 		mmap_read_unlock(mm);
 		if (flags & FAULT_FLAG_KILLABLE)
-			wait_on_page_locked_killable(page);
+			folio_wait_locked_killable(folio);
 		else
-			wait_on_page_locked(page);
+			folio_wait_locked(folio);
 		return 0;
 	}
 	if (flags & FAULT_FLAG_KILLABLE) {
-- 
2.30.2


* [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 23/33] mm/filemap: Add folio_wait_locked Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 10:48   ` Vlastimil Babka
  2021-05-27  8:19   ` Christoph Hellwig
  2021-05-11 21:47 ` [PATCH v10 25/33] mm/filemap: Add folio_end_writeback Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

Move the declaration into mm/internal.h and rename
rotate_reclaimable_page() to folio_rotate_reclaimable().  This eliminates
all five of the calls to compound_head() in this function, saving 75 bytes
at the cost of adding 14 bytes to its one caller, end_page_writeback().
That is a net saving of 61 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/swap.h |  1 -
 mm/filemap.c         |  2 +-
 mm/internal.h        |  1 +
 mm/page_io.c         |  4 ++--
 mm/swap.c            | 18 +++++++++---------
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 20766342845b..76b2338ef24d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -365,7 +365,6 @@ extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
 extern void lru_add_drain_all(void);
-extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
 extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 73c31b63392f..63654a2f7d56 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1528,7 +1528,7 @@ void end_page_writeback(struct page *page)
 	 */
 	if (PageReclaim(page)) {
 		ClearPageReclaim(page);
-		rotate_reclaimable_page(page);
+		folio_rotate_reclaimable(page_folio(page));
 	}
 
 	/*
diff --git a/mm/internal.h b/mm/internal.h
index 46eb82eaa195..68d363a3a1f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -35,6 +35,7 @@
 void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
+void folio_rotate_reclaimable(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9ebcf5..d597bc6e6e45 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -38,7 +38,7 @@ void end_swap_bio_write(struct bio *bio)
 		 * Also print a dire warning that things will go BAD (tm)
 		 * very quickly.
 		 *
-		 * Also clear PG_reclaim to avoid rotate_reclaimable_page()
+		 * Also clear PG_reclaim to avoid folio_rotate_reclaimable()
 		 */
 		set_page_dirty(page);
 		pr_alert_ratelimited("Write-error on swap-device (%u:%u:%llu)\n",
@@ -317,7 +317,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 			 * temporary failure if the system has limited
 			 * memory for allocating transmit buffers.
 			 * Mark the page dirty and avoid
-			 * rotate_reclaimable_page but rate-limit the
+			 * folio_rotate_reclaimable but rate-limit the
 			 * messages but do not flag PageError like
 			 * the normal direct-to-bio case as it could
 			 * be temporary.
diff --git a/mm/swap.c b/mm/swap.c
index dfb48cf9c2c9..6caca11cd2ec 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -249,23 +249,23 @@ static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
 }
 
 /*
- * Writeback is about to end against a page which has been marked for immediate
- * reclaim.  If it still appears to be reclaimable, move it to the tail of the
- * inactive list.
+ * Writeback is about to end against a folio which has been marked for
+ * immediate reclaim.  If it still appears to be reclaimable, move it
+ * to the tail of the inactive list.
  *
- * rotate_reclaimable_page() must disable IRQs, to prevent nasty races.
+ * folio_rotate_reclaimable() must disable IRQs, to prevent nasty races.
  */
-void rotate_reclaimable_page(struct page *page)
+void folio_rotate_reclaimable(struct folio *folio)
 {
-	if (!PageLocked(page) && !PageDirty(page) &&
-	    !PageUnevictable(page) && PageLRU(page)) {
+	if (!folio_locked(folio) && !folio_dirty(folio) &&
+	    !folio_unevictable(folio) && folio_lru(folio)) {
 		struct pagevec *pvec;
 		unsigned long flags;
 
-		get_page(page);
+		folio_get(folio);
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate.pvec);
-		if (pagevec_add_and_need_flush(pvec, page))
+		if (pagevec_add_and_need_flush(pvec, &folio->page))
 			pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
 		local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
-- 
2.30.2



* [PATCH v10 25/33] mm/filemap: Add folio_end_writeback
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (23 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 11:08   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Add an end_page_writeback() wrapper function for users that are not yet
converted to folios.

folio_end_writeback() is less than half the size of end_page_writeback()
at just 105 bytes compared to 213 bytes, due to removing all the
compound_head() calls.  The 38 byte wrapper function makes this a net
saving of 70 bytes.
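
The wrapper pattern can be sketched in userspace as follows.  This is a
simplified model under stated assumptions, not the kernel code: the
struct layouts, model_wakeups and the model_*() names are hypothetical,
flag updates are plain (non-atomic) stores, and assert() stands in for
BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types imitate, but are not, the kernel's. */
struct page {
	unsigned long flags;
	struct page *head;	/* NULL for a head page, else the head */
	int refcount;
};

struct folio {
	struct page page;
};

enum { PG_writeback, PG_reclaim };

int model_wakeups;	/* counts simulated wakeups */

static struct folio *page_folio(struct page *page)
{
	return (struct folio *)(page->head ? page->head : page);
}

void model_folio_end_writeback(struct folio *folio)
{
	/* Test, then clear, as two steps rather than one test-and-clear. */
	if (folio->page.flags & (1UL << PG_reclaim))
		folio->page.flags &= ~(1UL << PG_reclaim);

	folio->page.refcount++;		/* pin the folio across the wakeup */
	/* Stands in for the BUG() on a folio that was not under writeback. */
	assert(folio->page.flags & (1UL << PG_writeback));
	folio->page.flags &= ~(1UL << PG_writeback);
	model_wakeups++;		/* the waiters would be woken here */
	folio->page.refcount--;		/* drop the pin */
}

/* Compatibility wrapper: one head lookup, then the folio function. */
void model_end_page_writeback(struct page *page)
{
	model_folio_end_writeback(page_folio(page));
}
```

Passing a tail page to the wrapper ends writeback on the head, which is
the behaviour unconverted callers rely on.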

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h |  3 ++-
 mm/filemap.c            | 40 ++++++++++++++++++++--------------------
 mm/folio-compat.c       |  6 ++++++
 3 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e2648d906a84..cbd86c952e25 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,8 @@ static inline int wait_on_page_locked_killable(struct page *page)
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
 int wait_on_page_writeback_killable(struct page *page);
-extern void end_page_writeback(struct page *page);
+void end_page_writeback(struct page *page);
+void folio_end_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
diff --git a/mm/filemap.c b/mm/filemap.c
index 63654a2f7d56..62312edba8ce 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1175,11 +1175,11 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 	spin_unlock_irqrestore(&q->lock, flags);
 }
 
-static void wake_up_page(struct page *page, int bit)
+static void folio_wake(struct folio *folio, int bit)
 {
-	if (!PageWaiters(page))
+	if (!folio_waiters(folio))
 		return;
-	wake_up_page_bit(page, bit);
+	wake_up_page_bit(&folio->page, bit);
 }
 
 /*
@@ -1514,38 +1514,38 @@ int wait_on_page_private_2_killable(struct page *page)
 EXPORT_SYMBOL(wait_on_page_private_2_killable);
 
 /**
- * end_page_writeback - end writeback against a page
- * @page: the page
+ * folio_end_writeback - End writeback against a folio.
+ * @folio: The folio.
  */
-void end_page_writeback(struct page *page)
+void folio_end_writeback(struct folio *folio)
 {
 	/*
-	 * TestClearPageReclaim could be used here but it is an atomic
+	 * folio_test_clear_reclaim_flag() could be used here but it is an atomic
 	 * operation and overkill in this particular case. Failing to
-	 * shuffle a page marked for immediate reclaim is too mild to
+	 * shuffle a folio marked for immediate reclaim is too mild to
 	 * justify taking an atomic operation penalty at the end of
-	 * ever page writeback.
+	 * every folio writeback.
 	 */
-	if (PageReclaim(page)) {
-		ClearPageReclaim(page);
-		folio_rotate_reclaimable(page_folio(page));
+	if (folio_reclaim(folio)) {
+		folio_clear_reclaim_flag(folio);
+		folio_rotate_reclaimable(folio);
 	}
 
 	/*
-	 * Writeback does not hold a page reference of its own, relying
+	 * Writeback does not hold a folio reference of its own, relying
 	 * on truncation to wait for the clearing of PG_writeback.
-	 * But here we must make sure that the page is not freed and
-	 * reused before the wake_up_page().
+	 * But here we must make sure that the folio is not freed and
+	 * reused before the folio_wake().
 	 */
-	get_page(page);
-	if (!test_clear_page_writeback(page))
+	folio_get(folio);
+	if (!test_clear_page_writeback(&folio->page))
 		BUG();
 
 	smp_mb__after_atomic();
-	wake_up_page(page, PG_writeback);
-	put_page(page);
+	folio_wake(folio, PG_writeback);
+	folio_put(folio);
 }
-EXPORT_SYMBOL(end_page_writeback);
+EXPORT_SYMBOL(folio_end_writeback);
 
 /*
  * After completing I/O on a page, call this routine to update the page
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 91b3d00a92f7..526843d03d58 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -17,3 +17,9 @@ void unlock_page(struct page *page)
 	return folio_unlock(page_folio(page));
 }
 EXPORT_SYMBOL(unlock_page);
+
+void end_page_writeback(struct page *page)
+{
+	return folio_end_writeback(page_folio(page));
+}
+EXPORT_SYMBOL(end_page_writeback);
-- 
2.30.2



* [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 25/33] mm/filemap: Add folio_end_writeback Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 11:12   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 27/33] mm/writeback: Add folio_wait_stable Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

wait_on_page_writeback_killable() only has one caller, so convert it to
call folio_wait_writeback_killable().  For the wait_on_page_writeback()
callers, add a compatibility wrapper around folio_wait_writeback().

Turning PageWriteback() into folio_writeback() eliminates a call to
compound_head() which saves 8 bytes and 15 bytes in the two functions.
That is more than offset by adding the wait_on_page_writeback
compatibility wrapper for a net increase in text of 15 bytes.
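
For readers unfamiliar with the killable variant, the loop can be
modelled in userspace like this.  It is only a sketch: the struct
fields, model_sleeps and model_wait_bit_killable() are hypothetical, and
the "sleep" is simulated by counting down ticks instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: not the kernel's struct folio. */
struct folio {
	bool writeback;		/* models the writeback flag */
	int io_ticks_left;	/* simulated sleeps until the I/O completes */
	bool fatal_signal;	/* models a pending fatal signal */
};

int model_sleeps;	/* counts simulated sleeps */

/* Hypothetical stand-in for the killable bit wait. */
static int model_wait_bit_killable(struct folio *folio)
{
	assert(folio->writeback);	/* callers only sleep while the flag is set */
	if (folio->fatal_signal)
		return -EINTR;
	model_sleeps++;
	if (folio->io_ticks_left > 1) {
		folio->io_ticks_left--;		/* woken, but I/O still in flight */
	} else {
		folio->io_ticks_left = 0;
		folio->writeback = false;	/* I/O completed */
	}
	return 0;
}

/* Same shape as the loop in the patch: re-test the flag after every wakeup. */
int model_wait_writeback_killable(struct folio *folio)
{
	while (folio->writeback) {
		if (model_wait_bit_killable(folio))
			return -EINTR;
	}
	return 0;
}
```

The flag is re-tested after each wakeup, so a folio that is not under
writeback returns 0 without sleeping, and a fatal signal ends the wait
with -EINTR while the flag is still set.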

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 fs/afs/write.c          |  9 ++++----
 include/linux/pagemap.h |  3 ++-
 mm/folio-compat.c       |  6 ++++++
 mm/page-writeback.c     | 48 ++++++++++++++++++++++++++++-------------
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 3edb6204b937..22b1c4d43687 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -832,7 +832,8 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
  */
 vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 {
-	struct page *page = thp_head(vmf->page);
+	struct folio *folio = page_folio(vmf->page);
+	struct page *page = &folio->page;
 	struct file *file = vmf->vma->vm_file;
 	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -851,7 +852,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 		return VM_FAULT_RETRY;
 #endif
 
-	if (wait_on_page_writeback_killable(page))
+	if (folio_wait_writeback_killable(folio))
 		return VM_FAULT_RETRY;
 
 	if (lock_page_killable(page) < 0)
@@ -861,8 +862,8 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 * details the portion of the page we need to write back and we might
 	 * need to redirty the page if there's a problem.
 	 */
-	if (wait_on_page_writeback_killable(page) < 0) {
-		unlock_page(page);
+	if (folio_wait_writeback_killable(folio) < 0) {
+		folio_unlock(folio);
 		return VM_FAULT_RETRY;
 	}
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cbd86c952e25..417efd7edd19 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -752,7 +752,8 @@ static inline int wait_on_page_locked_killable(struct page *page)
 
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
-int wait_on_page_writeback_killable(struct page *page);
+void folio_wait_writeback(struct folio *folio);
+int folio_wait_writeback_killable(struct folio *folio);
 void end_page_writeback(struct page *page);
 void folio_end_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 526843d03d58..41275dac7a92 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -23,3 +23,9 @@ void end_page_writeback(struct page *page)
 	return folio_end_writeback(page_folio(page));
 }
 EXPORT_SYMBOL(end_page_writeback);
+
+void wait_on_page_writeback(struct page *page)
+{
+	return folio_wait_writeback(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_on_page_writeback);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fe72d5f65688..d7ac428df68a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2813,33 +2813,51 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 }
 EXPORT_SYMBOL(__test_set_page_writeback);
 
-/*
- * Wait for a page to complete writeback
+/**
+ * folio_wait_writeback - Wait for a folio to finish writeback.
+ * @folio: The folio to wait for.
+ *
+ * If the folio is currently being written back to storage, wait for the
+ * I/O to complete.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
  */
-void wait_on_page_writeback(struct page *page)
+void folio_wait_writeback(struct folio *folio)
 {
-	while (PageWriteback(page)) {
-		trace_wait_on_page_writeback(page, page_mapping(page));
-		wait_on_page_bit(page, PG_writeback);
+	while (folio_writeback(folio)) {
+		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
+		wait_on_page_bit(&folio->page, PG_writeback);
 	}
 }
-EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+EXPORT_SYMBOL_GPL(folio_wait_writeback);
 
-/*
- * Wait for a page to complete writeback.  Returns -EINTR if we get a
- * fatal signal while waiting.
+/**
+ * folio_wait_writeback_killable - Wait for a folio to finish writeback.
+ * @folio: The folio to wait for.
+ *
+ * If the folio is currently being written back to storage, wait for the
+ * I/O to complete or a fatal signal to arrive.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
+ * Return: 0 on success, -EINTR if we get a fatal signal while waiting.
  */
-int wait_on_page_writeback_killable(struct page *page)
+int folio_wait_writeback_killable(struct folio *folio)
 {
-	while (PageWriteback(page)) {
-		trace_wait_on_page_writeback(page, page_mapping(page));
-		if (wait_on_page_bit_killable(page, PG_writeback))
+	while (folio_writeback(folio)) {
+		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
+		if (wait_on_page_bit_killable(&folio->page, PG_writeback))
 			return -EINTR;
 	}
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(wait_on_page_writeback_killable);
+EXPORT_SYMBOL_GPL(folio_wait_writeback_killable);
 
 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
-- 
2.30.2



* [PATCH v10 27/33] mm/writeback: Add folio_wait_stable
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (25 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 11:42   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 28/33] mm/filemap: Add folio_wait_bit Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Move wait_for_stable_page() into the folio compatibility file.
folio_wait_stable() avoids a call to compound_head() and is 14 bytes
smaller than wait_for_stable_page() was.  The net text size grows by 24
bytes as a result of this patch.
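
The check that folio_wait_stable() performs can be modelled in userspace
as below.  Everything here is a hypothetical stand-in: the structs carry
only the fields the sketch needs, MODEL_SB_I_STABLE_WRITES is an
arbitrary bit rather than the kernel's value, and the wait completes
immediately instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SB_I_STABLE_WRITES 0x1UL	/* arbitrary bit for this sketch */

/* Userspace model only: minimal stand-ins for the kernel structures. */
struct super_block { unsigned long s_iflags; };
struct inode { struct super_block *i_sb; };
struct address_space { struct inode *host; };
struct folio {
	struct address_space *mapping;
	bool writeback;
};

int model_waits;	/* counts simulated waits */

static void model_wait_writeback(struct folio *folio)
{
	while (folio->writeback) {
		model_waits++;
		folio->writeback = false;	/* pretend the I/O completed */
	}
}

/* Only wait when the backing device asks for stable page contents. */
void model_wait_stable(struct folio *folio)
{
	assert(folio->mapping != NULL);	/* dereferenced unconditionally below */
	if (folio->mapping->host->i_sb->s_iflags & MODEL_SB_I_STABLE_WRITES)
		model_wait_writeback(folio);
}
```

On a superblock without the flag the call returns immediately and leaves
writeback in progress.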

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h |  1 +
 mm/folio-compat.c       |  6 ++++++
 mm/page-writeback.c     | 24 ++++++++++++++----------
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 417efd7edd19..06b69cd03da3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -757,6 +757,7 @@ int folio_wait_writeback_killable(struct folio *folio);
 void end_page_writeback(struct page *page);
 void folio_end_writeback(struct folio *folio);
 void wait_for_stable_page(struct page *page);
+void folio_wait_stable(struct folio *folio);
 
 void page_endio(struct page *page, bool is_write, int err);
 
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 41275dac7a92..3c83f03b80d7 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -29,3 +29,9 @@ void wait_on_page_writeback(struct page *page)
 	return folio_wait_writeback(page_folio(page));
 }
 EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+
+void wait_for_stable_page(struct page *page)
+{
+	return folio_wait_stable(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_for_stable_page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d7ac428df68a..003b85813f7c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2860,17 +2860,21 @@ int folio_wait_writeback_killable(struct folio *folio)
 EXPORT_SYMBOL_GPL(folio_wait_writeback_killable);
 
 /**
- * wait_for_stable_page() - wait for writeback to finish, if necessary.
- * @page:	The page to wait on.
+ * folio_wait_stable() - wait for writeback to finish, if necessary.
+ * @folio: The folio to wait on.
  *
- * This function determines if the given page is related to a backing device
- * that requires page contents to be held stable during writeback.  If so, then
- * it will wait for any pending writeback to complete.
+ * This function determines if the given folio is related to a backing
+ * device that requires folio contents to be held stable during writeback.
+ * If so, then it will wait for any pending writeback to complete.
+ *
+ * Context: Sleeps.  Must be called in process context and with
+ * no spinlocks held.  Caller should hold a reference on the folio.
+ * If the folio is not locked, writeback may start again after writeback
+ * has finished.
  */
-void wait_for_stable_page(struct page *page)
+void folio_wait_stable(struct folio *folio)
 {
-	page = thp_head(page);
-	if (page->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
-		wait_on_page_writeback(page);
+	if (folio->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
+		folio_wait_writeback(folio);
 }
-EXPORT_SYMBOL_GPL(wait_for_stable_page);
+EXPORT_SYMBOL_GPL(folio_wait_stable);
-- 
2.30.2



* [PATCH v10 28/33] mm/filemap: Add folio_wait_bit
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (26 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 27/33] mm/writeback: Add folio_wait_stable Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 11:51   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 29/33] mm/filemap: Add folio_wake_bit Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Rename wait_on_page_bit() to folio_wait_bit().  We must always wait on
the folio.  A tail page hashes to a different wait queue bucket from its
head page, so a waiter queued on a tail page would never be woken up.

This commit shrinks the kernel by 691 bytes, mostly due to moving
the page waitqueue lookup into folio_wait_bit_common().
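
The bucket mismatch can be demonstrated with a small userspace model.
This is a sketch under assumptions: model_bucket() hashes a page's index
within an array with a multiplicative hash, standing in for the kernel
hashing the page address into its wait table, and the 8-bit table size
and all model_*() names are chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_WAIT_TABLE_BITS 8	/* 256 buckets in this sketch */

/* Userspace model only: a page knows its head, nothing else. */
struct page {
	struct page *head;	/* NULL for a head page, else the head */
};

static const struct page *compound_head(const struct page *page)
{
	return page->head ? page->head : page;
}

/* Hypothetical stand-in for hashing a page address to a bucket. */
unsigned int model_bucket(const struct page *base, const struct page *page)
{
	uint32_t idx;

	assert(page >= base);
	idx = (uint32_t)(page - base);
	return (idx * 0x9E3779B1u) >> (32 - MODEL_WAIT_TABLE_BITS);
}

/* Old hazard: queue on whichever page the caller happened to pass. */
unsigned int model_waiter_bucket_page(const struct page *base,
				      const struct page *page)
{
	return model_bucket(base, page);
}

/* Folio rule: resolve to the head first, then pick the bucket. */
unsigned int model_waiter_bucket_folio(const struct page *base,
				       const struct page *page)
{
	return model_bucket(base, compound_head(page));
}
```

A waker always hashes the head page, so only the folio-based waiter is
guaranteed to sit in the bucket the waker will scan.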

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/pagemap.h | 10 +++---
 mm/filemap.c            | 77 +++++++++++++++++++----------------------
 mm/page-writeback.c     |  4 +--
 3 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 06b69cd03da3..e524e1b7190a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -714,11 +714,11 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 }
 
 /*
- * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
+ * This is exported only for folio_wait_locked/folio_wait_writeback, etc.,
  * and should not be used directly.
  */
-extern void wait_on_page_bit(struct page *page, int bit_nr);
-extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern void folio_wait_bit(struct folio *folio, int bit_nr);
+extern int folio_wait_bit_killable(struct folio *folio, int bit_nr);
 
 /* 
  * Wait for a folio to be unlocked.
@@ -730,14 +730,14 @@ extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
 static inline void folio_wait_locked(struct folio *folio)
 {
 	if (folio_locked(folio))
-		wait_on_page_bit(&folio->page, PG_locked);
+		folio_wait_bit(folio, PG_locked);
 }
 
 static inline int folio_wait_locked_killable(struct folio *folio)
 {
 	if (!folio_locked(folio))
 		return 0;
-	return wait_on_page_bit_killable(&folio->page, PG_locked);
+	return folio_wait_bit_killable(folio, PG_locked);
 }
 
 static inline void wait_on_page_locked(struct page *page)
diff --git a/mm/filemap.c b/mm/filemap.c
index 62312edba8ce..60afd53fbeb3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1102,7 +1102,7 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	 *
 	 * So update the flags atomically, and wake up the waiter
 	 * afterwards to avoid any races. This store-release pairs
-	 * with the load-acquire in wait_on_page_bit_common().
+	 * with the load-acquire in folio_wait_bit_common().
 	 */
 	smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
 	wake_up_state(wait->private, mode);
@@ -1183,7 +1183,7 @@ static void folio_wake(struct folio *folio, int bit)
 }
 
 /*
- * A choice of three behaviors for wait_on_page_bit_common():
+ * A choice of three behaviors for folio_wait_bit_common():
  */
 enum behavior {
 	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
@@ -1198,16 +1198,16 @@ enum behavior {
 };
 
 /*
- * Attempt to check (or get) the page bit, and mark us done
+ * Attempt to check (or get) the folio flag, and mark us done
  * if successful.
  */
-static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
+static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
 					struct wait_queue_entry *wait)
 {
 	if (wait->flags & WQ_FLAG_EXCLUSIVE) {
-		if (test_and_set_bit(bit_nr, &page->flags))
+		if (test_and_set_bit(bit_nr, &folio->flags))
 			return false;
-	} else if (test_bit(bit_nr, &page->flags))
+	} else if (test_bit(bit_nr, &folio->flags))
 		return false;
 
 	wait->flags |= WQ_FLAG_WOKEN | WQ_FLAG_DONE;
@@ -1217,9 +1217,10 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
 /* How many times do we accept lock stealing from under a waiter? */
 int sysctl_page_lock_unfairness = 5;
 
-static inline int wait_on_page_bit_common(wait_queue_head_t *q,
-	struct page *page, int bit_nr, int state, enum behavior behavior)
+static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
+		int state, enum behavior behavior)
 {
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
 	int unfairness = sysctl_page_lock_unfairness;
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1228,8 +1229,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	unsigned long pflags;
 
 	if (bit_nr == PG_locked &&
-	    !PageUptodate(page) && PageWorkingset(page)) {
-		if (!PageSwapBacked(page)) {
+	    !folio_uptodate(folio) && folio_workingset(folio)) {
+		if (!folio_swapbacked(folio)) {
 			delayacct_thrashing_start();
 			delayacct = true;
 		}
@@ -1239,7 +1240,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	init_wait(wait);
 	wait->func = wake_page_function;
-	wait_page.page = page;
+	wait_page.page = &folio->page;
 	wait_page.bit_nr = bit_nr;
 
 repeat:
@@ -1254,7 +1255,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * Do one last check whether we can get the
 	 * page bit synchronously.
 	 *
-	 * Do the SetPageWaiters() marking before that
+	 * Do the folio_set_waiters_flag() marking before that
 	 * to let any waker we _just_ missed know they
 	 * need to wake us up (otherwise they'll never
 	 * even go to the slow case that looks at the
@@ -1265,8 +1266,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * lock to avoid races.
 	 */
 	spin_lock_irq(&q->lock);
-	SetPageWaiters(page);
-	if (!trylock_page_bit_common(page, bit_nr, wait))
+	folio_set_waiters_flag(folio);
+	if (!folio_trylock_flag(folio, bit_nr, wait))
 		__add_wait_queue_entry_tail(q, wait);
 	spin_unlock_irq(&q->lock);
 
@@ -1276,10 +1277,10 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	 * see whether the page bit testing has already
 	 * been done by the wake function.
 	 *
-	 * We can drop our reference to the page.
+	 * We can drop our reference to the folio.
 	 */
 	if (behavior == DROP)
-		put_page(page);
+		folio_put(folio);
 
 	/*
 	 * Note that until the "finish_wait()", or until
@@ -1316,7 +1317,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 		 *
 		 * And if that fails, we'll have to retry this all.
 		 */
-		if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0))))
 			goto repeat;
 
 		wait->flags |= WQ_FLAG_DONE;
@@ -1325,7 +1326,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	/*
 	 * If a signal happened, this 'finish_wait()' may remove the last
-	 * waiter from the wait-queues, but the PageWaiters bit will remain
+	 * waiter from the wait-queues, but the folio_waiters bit will remain
 	 * set. That's ok. The next wakeup will take care of it, and trying
 	 * to do it here would be difficult and prone to races.
 	 */
@@ -1356,19 +1357,17 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
-void wait_on_page_bit(struct page *page, int bit_nr)
+void folio_wait_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	wait_on_page_bit_common(q, page, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit);
+EXPORT_SYMBOL(folio_wait_bit);
 
-int wait_on_page_bit_killable(struct page *page, int bit_nr)
+int folio_wait_bit_killable(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, bit_nr, TASK_KILLABLE, SHARED);
+	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
 }
-EXPORT_SYMBOL(wait_on_page_bit_killable);
+EXPORT_SYMBOL(folio_wait_bit_killable);
 
 /**
  * put_and_wait_on_page_locked - Drop a reference and wait for it to be unlocked
@@ -1385,11 +1384,8 @@ EXPORT_SYMBOL(wait_on_page_bit_killable);
  */
 int put_and_wait_on_page_locked(struct page *page, int state)
 {
-	wait_queue_head_t *q;
-
-	page = compound_head(page);
-	q = page_waitqueue(page);
-	return wait_on_page_bit_common(q, page, PG_locked, state, DROP);
+	return folio_wait_bit_common(page_folio(page), PG_locked, state,
+			DROP);
 }
 
 /**
@@ -1481,9 +1477,10 @@ EXPORT_SYMBOL(end_page_private_2);
  */
 void wait_on_page_private_2(struct page *page)
 {
-	page = compound_head(page);
-	while (PagePrivate2(page))
-		wait_on_page_bit(page, PG_private_2);
+	struct folio *folio = page_folio(page);
+
+	while (folio_private_2(folio))
+		folio_wait_bit(folio, PG_private_2);
 }
 EXPORT_SYMBOL(wait_on_page_private_2);
 
@@ -1500,11 +1497,11 @@ EXPORT_SYMBOL(wait_on_page_private_2);
  */
 int wait_on_page_private_2_killable(struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	int ret = 0;
 
-	page = compound_head(page);
-	while (PagePrivate2(page)) {
-		ret = wait_on_page_bit_killable(page, PG_private_2);
+	while (folio_private_2(folio)) {
+		ret = folio_wait_bit_killable(folio, PG_private_2);
 		if (ret < 0)
 			break;
 	}
@@ -1581,16 +1578,14 @@ EXPORT_SYMBOL_GPL(page_endio);
  */
 void __folio_lock(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
+	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
 				EXCLUSIVE);
 }
 EXPORT_SYMBOL(__folio_lock);
 
 int __folio_lock_killable(struct folio *folio)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
-	return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
+	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
 					EXCLUSIVE);
 }
 EXPORT_SYMBOL_GPL(__folio_lock_killable);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 003b85813f7c..7f82235e60c3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2829,7 +2829,7 @@ void folio_wait_writeback(struct folio *folio)
 {
 	while (folio_writeback(folio)) {
 		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
-		wait_on_page_bit(&folio->page, PG_writeback);
+		folio_wait_bit(folio, PG_writeback);
 	}
 }
 EXPORT_SYMBOL_GPL(folio_wait_writeback);
@@ -2851,7 +2851,7 @@ int folio_wait_writeback_killable(struct folio *folio)
 {
 	while (folio_writeback(folio)) {
 		trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
-		if (wait_on_page_bit_killable(&folio->page, PG_writeback))
+		if (folio_wait_bit_killable(folio, PG_writeback))
 			return -EINTR;
 	}
 
-- 
2.30.2



* [PATCH v10 29/33] mm/filemap: Add folio_wake_bit
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (27 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 28/33] mm/filemap: Add folio_wait_bit Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 11:53   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Convert wake_up_page_bit() to folio_wake_bit().  All callers have a folio,
so use it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 mm/filemap.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 60afd53fbeb3..e974bca3e267 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1121,14 +1121,14 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	return (flags & WQ_FLAG_EXCLUSIVE) != 0;
 }
 
-static void wake_up_page_bit(struct page *page, int bit_nr)
+static void folio_wake_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
+	wait_queue_head_t *q = page_waitqueue(&folio->page);
 	struct wait_page_key key;
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
-	key.page = page;
+	key.page = &folio->page;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
@@ -1163,7 +1163,7 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 	 * page waiters.
 	 */
 	if (!waitqueue_active(q) || !key.page_match) {
-		ClearPageWaiters(page);
+		folio_clear_waiters_flag(folio);
 		/*
 		 * It's possible to miss clearing Waiters here, when we woke
 		 * our page waiters, but the hashed waitqueue has waiters for
@@ -1179,7 +1179,7 @@ static void folio_wake(struct folio *folio, int bit)
 {
 	if (!folio_waiters(folio))
 		return;
-	wake_up_page_bit(&folio->page, bit);
+	folio_wake_bit(folio, bit);
 }
 
 /*
@@ -1444,7 +1444,7 @@ void folio_unlock(struct folio *folio)
 	BUILD_BUG_ON(PG_waiters != 7);
 	VM_BUG_ON_FOLIO(!folio_locked(folio), folio);
 	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
-		wake_up_page_bit(&folio->page, PG_locked);
+		folio_wake_bit(folio, PG_locked);
 }
 EXPORT_SYMBOL(folio_unlock);
 
@@ -1461,11 +1461,12 @@ EXPORT_SYMBOL(folio_unlock);
  */
 void end_page_private_2(struct page *page)
 {
-	page = compound_head(page);
-	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
-	clear_bit_unlock(PG_private_2, &page->flags);
-	wake_up_page_bit(page, PG_private_2);
-	put_page(page);
+	struct folio *folio = page_folio(page);
+
+	VM_BUG_ON_FOLIO(!folio_private_2(folio), folio);
+	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
+	folio_wake_bit(folio, PG_private_2);
+	folio_put(folio);
 }
 EXPORT_SYMBOL(end_page_private_2);
 
-- 
2.30.2



* [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (28 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 29/33] mm/filemap: Add folio_wake_bit Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 12:23   ` Vlastimil Babka
  2021-05-11 21:47 ` [PATCH v10 31/33] mm/filemap: Add folio private_2 functions Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

Reinforce that page flags are actually in the head page by changing the
type from page to folio.  Increases the size of cachefiles by two bytes,
but the kernel core is unchanged in size.
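
A minimal user-space sketch of why the wait key should be the folio
(head) pointer, assuming a simplified page model.  All model_* names are
hypothetical, the hash is a stand-in multiplicative hash rather than the
kernel's hash_ptr(), and the match function is reduced to the pointer
and bit comparison.  A waiter and a waker only find each other if both
resolve to the same pointer before hashing and matching.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_WAIT_TABLE_BITS 8
#define MODEL_WAIT_TABLE_SIZE (1u << MODEL_WAIT_TABLE_BITS)

struct model_page {
	struct model_page *head;	/* a head page points at itself */
};

struct model_waiter {
	const struct model_page *folio;
	int bit_nr;
};

struct model_wait_key {
	const struct model_page *folio;
	int bit_nr;
};

/* Resolve any page of a compound page to its head. */
static const struct model_page *model_page_folio(const struct model_page *page)
{
	return page->head;
}

/* Stand-in pointer hash selecting one of the shared wait queues. */
static unsigned int model_hash_ptr(const void *ptr)
{
	uint64_t v = (uint64_t)(uintptr_t)ptr;

	return (unsigned int)((v * 0x61C8864680B583EBull) >>
			      (64 - MODEL_WAIT_TABLE_BITS));
}

/* A waiter is woken only if both the pointer and the bit number match. */
static int model_wake_match(const struct model_waiter *w,
			    const struct model_wait_key *k)
{
	return w->folio == k->folio && w->bit_nr == k->bit_nr;
}
```

If a waker built its key from a tail page pointer, the pointer
comparison would fail and the waiter would be missed; making the field
a folio pointer rules that mistake out at compile time.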

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
---
 fs/cachefiles/rdwr.c    | 16 ++++++++--------
 include/linux/pagemap.h |  8 ++++----
 mm/filemap.c            | 38 +++++++++++++++++++-------------------
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 8ffc40e84a59..e211a3d5ba44 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -25,20 +25,20 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	struct cachefiles_object *object;
 	struct fscache_retrieval *op = monitor->op;
 	struct wait_page_key *key = _key;
-	struct page *page = wait->private;
+	struct folio *folio = wait->private;
 
 	ASSERT(key);
 
 	_enter("{%lu},%u,%d,{%p,%u}",
 	       monitor->netfs_page->index, mode, sync,
-	       key->page, key->bit_nr);
+	       key->folio, key->bit_nr);
 
-	if (key->page != page || key->bit_nr != PG_locked)
+	if (key->folio != folio || key->bit_nr != PG_locked)
 		return 0;
 
-	_debug("--- monitor %p %lx ---", page, page->flags);
+	_debug("--- monitor %p %lx ---", folio, folio->flags);
 
-	if (!PageUptodate(page) && !PageError(page)) {
+	if (!folio_uptodate(folio) && !folio_error(folio)) {
 		/* unlocked, not uptodate and not erronous? */
 		_debug("page probably truncated");
 	}
@@ -107,7 +107,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 	put_page(backpage2);
 
 	INIT_LIST_HEAD(&monitor->op_link);
-	add_page_wait_queue(backpage, &monitor->monitor);
+	folio_add_wait_queue(page_folio(backpage), &monitor->monitor);
 
 	if (trylock_page(backpage)) {
 		ret = -EIO;
@@ -294,7 +294,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	get_page(backpage);
 	monitor->back_page = backpage;
 	monitor->monitor.private = backpage;
-	add_page_wait_queue(backpage, &monitor->monitor);
+	folio_add_wait_queue(page_folio(backpage), &monitor->monitor);
 	monitor = NULL;
 
 	/* but the page may have been read before the monitor was installed, so
@@ -548,7 +548,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		get_page(backpage);
 		monitor->back_page = backpage;
 		monitor->monitor.private = backpage;
-		add_page_wait_queue(backpage, &monitor->monitor);
+		folio_add_wait_queue(page_folio(backpage), &monitor->monitor);
 		monitor = NULL;
 
 		/* but the page may have been read before the monitor was
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e524e1b7190a..353df9aaa8e9 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -614,13 +614,13 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 }
 
 struct wait_page_key {
-	struct page *page;
+	struct folio *folio;
 	int bit_nr;
 	int page_match;
 };
 
 struct wait_page_queue {
-	struct page *page;
+	struct folio *folio;
 	int bit_nr;
 	wait_queue_entry_t wait;
 };
@@ -628,7 +628,7 @@ struct wait_page_queue {
 static inline bool wake_page_match(struct wait_page_queue *wait_page,
 				  struct wait_page_key *key)
 {
-	if (wait_page->page != key->page)
+	if (wait_page->folio != key->folio)
 	       return false;
 	key->page_match = 1;
 
@@ -784,7 +784,7 @@ int wait_on_page_private_2_killable(struct page *page);
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
-extern void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter);
+void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter);
 
 /*
  * Fault everything in given userspace address range in.
diff --git a/mm/filemap.c b/mm/filemap.c
index e974bca3e267..1396560dfde8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1019,11 +1019,11 @@ EXPORT_SYMBOL(__page_cache_alloc);
  */
 #define PAGE_WAIT_TABLE_BITS 8
 #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
-static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
 
-static wait_queue_head_t *page_waitqueue(struct page *page)
+static wait_queue_head_t *folio_waitqueue(struct folio *folio)
 {
-	return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)];
+	return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
 }
 
 void __init pagecache_init(void)
@@ -1031,7 +1031,7 @@ void __init pagecache_init(void)
 	int i;
 
 	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
-		init_waitqueue_head(&page_wait_table[i]);
+		init_waitqueue_head(&folio_wait_table[i]);
 
 	page_writeback_init();
 }
@@ -1086,10 +1086,10 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 	 */
 	flags = wait->flags;
 	if (flags & WQ_FLAG_EXCLUSIVE) {
-		if (test_bit(key->bit_nr, &key->page->flags))
+		if (test_bit(key->bit_nr, &key->folio->flags))
 			return -1;
 		if (flags & WQ_FLAG_CUSTOM) {
-			if (test_and_set_bit(key->bit_nr, &key->page->flags))
+			if (test_and_set_bit(key->bit_nr, &key->folio->flags))
 				return -1;
 			flags |= WQ_FLAG_DONE;
 		}
@@ -1123,12 +1123,12 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
 
 static void folio_wake_bit(struct folio *folio, int bit_nr)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	struct wait_page_key key;
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
-	key.page = &folio->page;
+	key.folio = folio;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
@@ -1220,7 +1220,7 @@ int sysctl_page_lock_unfairness = 5;
 static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		int state, enum behavior behavior)
 {
-	wait_queue_head_t *q = page_waitqueue(&folio->page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	int unfairness = sysctl_page_lock_unfairness;
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
@@ -1240,7 +1240,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 
 	init_wait(wait);
 	wait->func = wake_page_function;
-	wait_page.page = &folio->page;
+	wait_page.folio = folio;
 	wait_page.bit_nr = bit_nr;
 
 repeat:
@@ -1389,23 +1389,23 @@ int put_and_wait_on_page_locked(struct page *page, int state)
 }
 
 /**
- * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
- * @page: Page defining the wait queue of interest
+ * folio_add_wait_queue - Add an arbitrary waiter to a folio's wait queue
+ * @folio: Folio defining the wait queue of interest
  * @waiter: Waiter to add to the queue
  *
- * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ * Add an arbitrary @waiter to the wait queue for the nominated @folio.
  */
-void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter)
+void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter)
 {
-	wait_queue_head_t *q = page_waitqueue(page);
+	wait_queue_head_t *q = folio_waitqueue(folio);
 	unsigned long flags;
 
 	spin_lock_irqsave(&q->lock, flags);
 	__add_wait_queue_entry_tail(q, waiter);
-	SetPageWaiters(page);
+	folio_set_waiters_flag(folio);
 	spin_unlock_irqrestore(&q->lock, flags);
 }
-EXPORT_SYMBOL_GPL(add_page_wait_queue);
+EXPORT_SYMBOL_GPL(folio_add_wait_queue);
 
 #ifndef clear_bit_unlock_is_negative_byte
 
@@ -1593,10 +1593,10 @@ EXPORT_SYMBOL_GPL(__folio_lock_killable);
 
 static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 {
-	struct wait_queue_head *q = page_waitqueue(&folio->page);
+	struct wait_queue_head *q = folio_waitqueue(folio);
 	int ret = 0;
 
-	wait->page = &folio->page;
+	wait->folio = folio;
 	wait->bit_nr = PG_locked;
 
 	spin_lock_irq(&q->lock);
-- 
2.30.2



* [PATCH v10 31/33] mm/filemap: Add folio private_2 functions
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (29 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 12:26   ` Vlastimil Babka
  2021-05-27  8:21   ` Christoph Hellwig
  2021-05-11 21:47 ` [PATCH v10 32/33] fs/netfs: Add folio fscache functions Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

end_page_private_2() becomes folio_end_private_2(),
wait_on_page_private_2() becomes folio_wait_private_2() and
wait_on_page_private_2_killable() becomes folio_wait_private_2_killable().

Adjust the fscache equivalents to call page_folio() before calling these
functions to avoid adding wrappers.
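
A minimal user-space sketch of that call-site conversion, assuming a
simplified model in which each page carries a pointer to its folio (in
the kernel the folio overlays the head page instead).  All model_* names
are hypothetical.  The folio function is the only implementation; the
page-named helper converts once and calls it.

```c
#include <assert.h>
#include <stdbool.h>

struct model_folio {
	bool private_2;			/* stands in for PG_private_2 */
};

struct model_page {
	struct model_folio *folio;	/* folio containing this page */
};

/* Works for head and tail pages alike in this model. */
static struct model_folio *model_page_folio(struct model_page *page)
{
	return page->folio;
}

/* Folio-native operation: always acts on the folio's own flag. */
static void model_folio_end_private_2(struct model_folio *folio)
{
	assert(folio->private_2);
	folio->private_2 = false;
}

/* Page-named caller: converts at the call site, no extra wrapper. */
static void model_end_page_fscache(struct model_page *page)
{
	model_folio_end_private_2(model_page_folio(page));
}
```

Calling model_end_page_fscache() with a tail page clears the flag on the
containing folio, which is the behaviour the old compound_head() call
provided.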

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/netfs.h   |  6 +++---
 include/linux/pagemap.h |  6 +++---
 mm/filemap.c            | 37 ++++++++++++++++---------------------
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 9062adfa2fb9..fad8c6209edd 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -55,7 +55,7 @@ static inline void set_page_fscache(struct page *page)
  */
 static inline void end_page_fscache(struct page *page)
 {
-	end_page_private_2(page);
+	folio_end_private_2(page_folio(page));
 }
 
 /**
@@ -66,7 +66,7 @@ static inline void end_page_fscache(struct page *page)
  */
 static inline void wait_on_page_fscache(struct page *page)
 {
-	wait_on_page_private_2(page);
+	folio_wait_private_2(page_folio(page));
 }
 
 /**
@@ -82,7 +82,7 @@ static inline void wait_on_page_fscache(struct page *page)
  */
 static inline int wait_on_page_fscache_killable(struct page *page)
 {
-	return wait_on_page_private_2_killable(page);
+	return folio_wait_private_2_killable(page_folio(page));
 }
 
 enum netfs_read_source {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 353df9aaa8e9..fdb730950507 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -777,9 +777,9 @@ static inline void set_page_private_2(struct page *page)
 	SetPagePrivate2(page);
 }
 
-void end_page_private_2(struct page *page);
-void wait_on_page_private_2(struct page *page);
-int wait_on_page_private_2_killable(struct page *page);
+void folio_end_private_2(struct folio *folio);
+void folio_wait_private_2(struct folio *folio);
+int folio_wait_private_2_killable(struct folio *folio);
 
 /*
  * Add an arbitrary waiter to a page's wait queue
diff --git a/mm/filemap.c b/mm/filemap.c
index 1396560dfde8..0394b893bf9d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1449,56 +1449,51 @@ void folio_unlock(struct folio *folio)
 EXPORT_SYMBOL(folio_unlock);
 
 /**
- * end_page_private_2 - Clear PG_private_2 and release any waiters
- * @page: The page
+ * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
+ * @folio: The folio.
  *
- * Clear the PG_private_2 bit on a page and wake up any sleepers waiting for
- * this.  The page ref held for PG_private_2 being set is released.
+ * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
+ * it.  The page ref held for PG_private_2 being set is released.
  *
  * This is, for example, used when a netfs page is being written to a local
  * disk cache, thereby allowing writes to the cache for the same page to be
  * serialised.
  */
-void end_page_private_2(struct page *page)
+void folio_end_private_2(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
-
 	VM_BUG_ON_FOLIO(!folio_private_2(folio), folio);
 	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
 	folio_wake_bit(folio, PG_private_2);
 	folio_put(folio);
 }
-EXPORT_SYMBOL(end_page_private_2);
+EXPORT_SYMBOL(folio_end_private_2);
 
 /**
- * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
- * @page: The page to wait on
+ * folio_wait_private_2 - Wait for PG_private_2 to be cleared on a folio.
+ * @folio: The folio to wait on.
  *
- * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page.
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a folio.
  */
-void wait_on_page_private_2(struct page *page)
+void folio_wait_private_2(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
-
 	while (folio_private_2(folio))
 		folio_wait_bit(folio, PG_private_2);
 }
-EXPORT_SYMBOL(wait_on_page_private_2);
+EXPORT_SYMBOL(folio_wait_private_2);
 
 /**
- * wait_on_page_private_2_killable - Wait for PG_private_2 to be cleared on a page
- * @page: The page to wait on
+ * folio_wait_private_2_killable - Wait for PG_private_2 to be cleared on a folio.
+ * @folio: The folio to wait on.
  *
- * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page or until a
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a folio or until a
  * fatal signal is received by the calling task.
  *
  * Return:
  * - 0 if successful.
  * - -EINTR if a fatal signal was encountered.
  */
-int wait_on_page_private_2_killable(struct page *page)
+int folio_wait_private_2_killable(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	int ret = 0;
 
 	while (folio_private_2(folio)) {
@@ -1509,7 +1504,7 @@ int wait_on_page_private_2_killable(struct page *page)
 
 	return ret;
 }
-EXPORT_SYMBOL(wait_on_page_private_2_killable);
+EXPORT_SYMBOL(folio_wait_private_2_killable);
 
 /**
  * folio_end_writeback - End writeback against a folio.
-- 
2.30.2



* [PATCH v10 32/33] fs/netfs: Add folio fscache functions
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (30 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 31/33] mm/filemap: Add folio private_2 functions Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 13:48   ` Vlastimil Babka
  2021-05-27  8:23   ` Christoph Hellwig
  2021-05-11 21:47 ` [PATCH v10 33/33] mm: Add folio_mapped Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

Match the page writeback functions by adding
folio_start_fscache(), folio_end_fscache(), folio_wait_fscache() and
folio_wait_fscache_killable().  Also rewrite the kernel-doc to describe
when to use the function rather than what the function does, and include
the kernel-doc in the appropriate rst file.
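
A minimal single-threaded sketch of the start/end pairing, assuming a
plain integer refcount and a boolean in place of PG_fscache.  All
model_* names are hypothetical and waking sleepers is reduced to a
comment.  It shows the invariant the new kernel-doc describes: a start
takes a reference and sets the flag, an end clears the flag and drops
the reference, and a second start before the end is a bug.

```c
#include <assert.h>
#include <stdbool.h>

struct model_folio {
	int refcount;
	bool fscache;	/* stands in for PG_fscache (PG_private_2) */
};

static void model_folio_start_fscache(struct model_folio *folio)
{
	/* Starting a second operation before the first ends is not allowed. */
	assert(!folio->fscache);
	folio->refcount++;
	folio->fscache = true;
}

static void model_folio_end_fscache(struct model_folio *folio)
{
	assert(folio->fscache);
	folio->fscache = false;
	/* The kernel wakes any sleepers waiting on the flag at this point. */
	folio->refcount--;
}
```

After a matched start and end the refcount is back at its starting
value and the flag is clear, so another operation may begin.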

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/netfs_library.rst |  2 +
 include/linux/netfs.h                       | 75 +++++++++++++--------
 2 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index 57a641847818..bb68d39f03b7 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -524,3 +524,5 @@ Note that these methods are passed a pointer to the cache resource structure,
 not the read request structure as they could be used in other situations where
 there isn't a read request structure as well, such as writing dirty data to the
 cache.
+
+.. kernel-doc:: include/linux/netfs.h
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index fad8c6209edd..b0bbd343fc98 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -22,6 +22,7 @@
  * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
  * a page is currently backed by a local disk cache
  */
+#define folio_fscache(folio)		folio_private_2(folio)
 #define PageFsCache(page)		PagePrivate2((page))
 #define SetPageFsCache(page)		SetPagePrivate2((page))
 #define ClearPageFsCache(page)		ClearPagePrivate2((page))
@@ -29,57 +30,77 @@
 #define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
 
 /**
- * set_page_fscache - Set PG_fscache on a page and take a ref
- * @page: The page.
+ * folio_start_fscache - Start an fscache operation on a folio.
+ * @folio: The folio.
  *
- * Set the PG_fscache (PG_private_2) flag on a page and take the reference
- * needed for the VM to handle its lifetime correctly.  This sets the flag and
- * takes the reference unconditionally, so care must be taken not to set the
- * flag again if it's already set.
+ * Call this function before an fscache operation starts on a folio.
+ * Starting a second fscache operation before the first one finishes is
+ * not allowed.
  */
-static inline void set_page_fscache(struct page *page)
+static inline void folio_start_fscache(struct folio *folio)
 {
-	set_page_private_2(page);
+	VM_BUG_ON_FOLIO(folio_private_2(folio), folio);
+	folio_get(folio);
+	folio_set_private_2_flag(folio);
 }
 
 /**
- * end_page_fscache - Clear PG_fscache and release any waiters
- * @page: The page
- *
- * Clear the PG_fscache (PG_private_2) bit on a page and wake up any sleepers
- * waiting for this.  The page ref held for PG_private_2 being set is released.
+ * folio_end_fscache - End an fscache operation on a folio.
+ * @folio: The folio.
  *
- * This is, for example, used when a netfs page is being written to a local
- * disk cache, thereby allowing writes to the cache for the same page to be
- * serialised.
+ * Call this function after an fscache operation has finished.  This will
+ * wake any sleepers waiting on this folio.
  */
-static inline void end_page_fscache(struct page *page)
+static inline void folio_end_fscache(struct folio *folio)
 {
-	folio_end_private_2(page_folio(page));
+	folio_end_private_2(folio);
 }
 
 /**
- * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
- * @page: The page to wait on
+ * folio_wait_fscache - Wait for an fscache operation on this folio to end.
+ * @folio: The folio.
  *
- * Wait for PG_fscache (aka PG_private_2) to be cleared on a page.
+ * If an fscache operation is in progress on this folio, wait for it to
+ * finish.  Another fscache operation may start after this one finishes,
+ * unless the caller holds the folio lock.
  */
-static inline void wait_on_page_fscache(struct page *page)
+static inline void folio_wait_fscache(struct folio *folio)
 {
-	folio_wait_private_2(page_folio(page));
+	folio_wait_private_2(folio);
 }
 
 /**
- * wait_on_page_fscache_killable - Wait for PG_fscache to be cleared on a page
- * @page: The page to wait on
+ * folio_wait_fscache_killable - Wait for an fscache operation on this folio to end.
+ * @folio: The folio.
  *
- * Wait for PG_fscache (aka PG_private_2) to be cleared on a page or until a
- * fatal signal is received by the calling task.
+ * If an fscache operation is in progress on this folio, wait for it to
+ * finish or for a fatal signal to be received.  Another fscache operation
+ * may start after this one finishes, unless the caller holds the folio lock.
  *
  * Return:
  * - 0 if successful.
  * - -EINTR if a fatal signal was encountered.
  */
+static inline int folio_wait_fscache_killable(struct folio *folio)
+{
+	return folio_wait_private_2_killable(folio);
+}
+
+static inline void set_page_fscache(struct page *page)
+{
+	folio_start_fscache(page_folio(page));
+}
+
+static inline void end_page_fscache(struct page *page)
+{
+	folio_end_private_2(page_folio(page));
+}
+
+static inline void wait_on_page_fscache(struct page *page)
+{
+	folio_wait_private_2(page_folio(page));
+}
+
 static inline int wait_on_page_fscache_killable(struct page *page)
 {
 	return folio_wait_private_2_killable(page_folio(page));
-- 
2.30.2



* [PATCH v10 33/33] mm: Add folio_mapped
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (31 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 32/33] fs/netfs: Add folio fscache functions Matthew Wilcox (Oracle)
@ 2021-05-11 21:47 ` Matthew Wilcox (Oracle)
  2021-05-18 14:17   ` Vlastimil Babka
  2021-05-27  8:31   ` Christoph Hellwig
  2021-05-13 14:50 ` [PATCH v10 00/33] Memory folios Matthew Wilcox
                   ` (2 subsequent siblings)
  35 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2021-05-11 21:47 UTC (permalink / raw)
  To: akpm; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-mm, linux-kernel

This function is the equivalent of page_mapped().  It is slightly
shorter as we do not need to handle the PageTail() case.  Reimplement
page_mapped() as a wrapper around folio_mapped().
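
A minimal user-space sketch of the decision order in folio_mapped(),
assuming plain integers in place of the atomic mapcounts and a fixed
upper bound on folio size.  All model_* names are hypothetical.  As in
the kernel, a mapcount of -1 means "not mapped".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PAGES 8

struct model_folio {
	int nr_pages;
	bool hugetlb;
	int compound_mapcount;		/* -1: not mapped as a whole */
	int mapcount[MODEL_MAX_PAGES];	/* -1: this page is not mapped */
};

static bool model_folio_mapped(const struct model_folio *folio)
{
	/* Single-page folio: only its own mapcount matters. */
	if (folio->nr_pages == 1)
		return folio->mapcount[0] >= 0;
	/* Mapped as a whole. */
	if (folio->compound_mapcount >= 0)
		return true;
	/* hugetlb folios are never mapped page by page. */
	if (folio->hugetlb)
		return false;
	/* Otherwise any individually mapped page counts. */
	for (int i = 0; i < folio->nr_pages; i++) {
		if (folio->mapcount[i] >= 0)
			return true;
	}
	return false;
}
```

There is no tail-page branch: the caller has already promised a folio,
so the compound_head() step from page_mapped() disappears.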

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h |  1 +
 mm/folio-compat.c  |  6 ++++++
 mm/util.c          | 29 ++++++++++++++++-------------
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6e3dde81ecc9..4686107a4f96 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1779,6 +1779,7 @@ static inline pgoff_t page_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+bool folio_mapped(struct folio *folio);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 3c83f03b80d7..7044fcc8a8aa 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -35,3 +35,9 @@ void wait_for_stable_page(struct page *page)
 	return folio_wait_stable(page_folio(page));
 }
 EXPORT_SYMBOL_GPL(wait_for_stable_page);
+
+bool page_mapped(struct page *page)
+{
+	return folio_mapped(page_folio(page));
+}
+EXPORT_SYMBOL(page_mapped);
diff --git a/mm/util.c b/mm/util.c
index 245f5c7bedae..c2d22145ebae 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -652,28 +652,31 @@ void *page_rmapping(struct page *page)
 	return __page_rmapping(page);
 }
 
-/*
- * Return true if this page is mapped into pagetables.
- * For compound page it returns true if any subpage of compound page is mapped.
+/**
+ * folio_mapped - Is this folio mapped into userspace?
+ * @folio: The folio.
+ *
+ * Return: true if any page in this folio is mapped into pagetables.
  */
-bool page_mapped(struct page *page)
+bool folio_mapped(struct folio *folio)
 {
-	int i;
+	int i, nr;
 
-	if (likely(!PageCompound(page)))
-		return atomic_read(&page->_mapcount) >= 0;
-	page = compound_head(page);
-	if (atomic_read(compound_mapcount_ptr(page)) >= 0)
+	if (folio_single(folio))
+		return atomic_read(&folio->_mapcount) >= 0;
+	if (atomic_read(compound_mapcount_ptr(&folio->page)) >= 0)
 		return true;
-	if (PageHuge(page))
+	if (folio_hugetlb(folio))
 		return false;
-	for (i = 0; i < compound_nr(page); i++) {
-		if (atomic_read(&page[i]._mapcount) >= 0)
+
+	nr = folio_nr_pages(folio);
+	for (i = 0; i < nr; i++) {
+		if (atomic_read(&folio_page(folio, i)->_mapcount) >= 0)
 			return true;
 	}
 	return false;
 }
-EXPORT_SYMBOL(page_mapped);
+EXPORT_SYMBOL(folio_mapped);
 
 struct anon_vma *page_anon_vma(struct page *page)
 {
-- 
2.30.2



* Re: [PATCH v10 00/33] Memory folios
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (32 preceding siblings ...)
  2021-05-11 21:47 ` [PATCH v10 33/33] mm: Add folio_mapped Matthew Wilcox (Oracle)
@ 2021-05-13 14:50 ` Matthew Wilcox
  2021-05-15 10:26 ` William Kucharski
  2021-06-04  1:07 ` Matteo Croce
  35 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-13 14:50 UTC (permalink / raw)
  To: akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:02PM +0100, Matthew Wilcox (Oracle) wrote:
> We also waste a lot of instructions ensuring that we're not looking at
> a tail page.  Almost every call to PageFoo() contains one or more hidden
> calls to compound_head().  This also happens for get_page(), put_page()
> and many more functions.  There does not appear to be a way to tell gcc
> that it can cache the result of compound_head(), nor is there a way to
> tell it that compound_head() is idempotent.

I instrumented _compound_head() on a test VM:

+++ b/include/linux/page-flags.h
@@ -179,10 +179,13 @@ enum pageflags {

 #ifndef __GENERATING_BOUNDS_H

+extern atomic_t chcc;
+
 static inline unsigned long _compound_head(const struct page *page)
 {
        unsigned long head = READ_ONCE(page->compound_head);

+       atomic_inc(&chcc);
        if (unlikely(head & 1))
                return head - 1;
        return (unsigned long)page;

which means it catches both calls to compound_head() and page_folio().
Between patch 8/96 in folio_v9 and patch 96/96, the number of calls in
an idle VM went down from almost 7k/s to just over 5k/s; about 25%.
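
A minimal user-space sketch of the instrumented lookup, assuming C11
atomics and a cut-down page structure.  The names model_page,
model_chcc and model_compound_head() are hypothetical.  It models the
encoding the patch above relies on: a tail page stores the address of
its head with bit 0 set, and every lookup bumps the counter whether or
not the page turns out to be a tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct model_page {
	uintptr_t compound_head;	/* tail pages: head address plus 1 */
};

static atomic_int model_chcc;		/* call counter, like chcc above */

static uintptr_t model_compound_head(const struct model_page *page)
{
	uintptr_t head = page->compound_head;

	/* Count every call, including those made on head pages. */
	atomic_fetch_add(&model_chcc, 1);
	if (head & 1)
		return head - 1;
	return (uintptr_t)page;
}
```

Because the counter is incremented before the tail check, the rate
measures how often the lookup is performed, not how often it finds a
tail page.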


* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-11 21:47 ` [PATCH v10 01/33] mm: Introduce struct folio Matthew Wilcox (Oracle)
@ 2021-05-14 10:34   ` Vlastimil Babka
  2021-05-14 10:40   ` Vlastimil Babka
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 10:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> A struct folio is a new abstraction to replace the venerable struct page.
> A function which takes a struct folio argument declares that it will
> operate on the entire (possibly compound) page, not just PAGE_SIZE bytes.
> In return, the caller guarantees that the pointer it is passing does
> not point to a tail page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 02/33] mm: Add folio_pgdat and folio_zone
  2021-05-11 21:47 ` [PATCH v10 02/33] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
@ 2021-05-14 10:35   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 10:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These are just convenience wrappers for callers with folios; pgdat and
> zone can be reached from tail pages as well as head pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/mm.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b29c86824e6b..a55c2c0628b6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1560,6 +1560,16 @@ static inline pg_data_t *page_pgdat(const struct page *page)
>  	return NODE_DATA(page_to_nid(page));
>  }
>  
> +static inline struct zone *folio_zone(const struct folio *folio)
> +{
> +	return page_zone(&folio->page);
> +}
> +
> +static inline pg_data_t *folio_pgdat(const struct folio *folio)
> +{
> +	return page_pgdat(&folio->page);
> +}
> +
>  #ifdef SECTION_IN_PAGE_FLAGS
>  static inline void set_page_section(struct page *page, unsigned long section)
>  {
> 



* Re: [PATCH v10 03/33] mm/vmstat: Add functions to account folio statistics
  2021-05-11 21:47 ` [PATCH v10 03/33] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
@ 2021-05-14 10:36   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 10:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Allow page counters to be more readily modified by callers which have
> a folio.  Name these wrappers with 'stat' instead of 'state' as requested
> by Linus here:
> https://lore.kernel.org/linux-mm/CAHk-=wj847SudR-kt+46fT3+xFFgiwpgThvm7DJWGdi4cVrbnQ@mail.gmail.com/
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/vmstat.h | 107 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 107 insertions(+)
> 
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 3299cd69e4ca..d287d7c31b8f 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -402,6 +402,78 @@ static inline void drain_zonestat(struct zone *zone,
>  			struct per_cpu_pageset *pset) { }
>  #endif		/* CONFIG_SMP */
>  
> +static inline void __zone_stat_mod_folio(struct folio *folio,
> +		enum zone_stat_item item, long nr)
> +{
> +	__mod_zone_page_state(folio_zone(folio), item, nr);
> +}
> +
> +static inline void __zone_stat_add_folio(struct folio *folio,
> +		enum zone_stat_item item)
> +{
> +	__mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
> +}
> +
> +static inline void __zone_stat_sub_folio(struct folio *folio,
> +		enum zone_stat_item item)
> +{
> +	__mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
> +}
> +
> +static inline void zone_stat_mod_folio(struct folio *folio,
> +		enum zone_stat_item item, long nr)
> +{
> +	mod_zone_page_state(folio_zone(folio), item, nr);
> +}
> +
> +static inline void zone_stat_add_folio(struct folio *folio,
> +		enum zone_stat_item item)
> +{
> +	mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
> +}
> +
> +static inline void zone_stat_sub_folio(struct folio *folio,
> +		enum zone_stat_item item)
> +{
> +	mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
> +}
> +
> +static inline void __node_stat_mod_folio(struct folio *folio,
> +		enum node_stat_item item, long nr)
> +{
> +	__mod_node_page_state(folio_pgdat(folio), item, nr);
> +}
> +
> +static inline void __node_stat_add_folio(struct folio *folio,
> +		enum node_stat_item item)
> +{
> +	__mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
> +}
> +
> +static inline void __node_stat_sub_folio(struct folio *folio,
> +		enum node_stat_item item)
> +{
> +	__mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
> +}
> +
> +static inline void node_stat_mod_folio(struct folio *folio,
> +		enum node_stat_item item, long nr)
> +{
> +	mod_node_page_state(folio_pgdat(folio), item, nr);
> +}
> +
> +static inline void node_stat_add_folio(struct folio *folio,
> +		enum node_stat_item item)
> +{
> +	mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
> +}
> +
> +static inline void node_stat_sub_folio(struct folio *folio,
> +		enum node_stat_item item)
> +{
> +	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
> +}
> +
>  static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
>  					     int migratetype)
>  {
> @@ -530,6 +602,24 @@ static inline void __dec_lruvec_page_state(struct page *page,
>  	__mod_lruvec_page_state(page, idx, -1);
>  }
>  
> +static inline void __lruvec_stat_mod_folio(struct folio *folio,
> +					   enum node_stat_item idx, int val)
> +{
> +	__mod_lruvec_page_state(&folio->page, idx, val);
> +}
> +
> +static inline void __lruvec_stat_add_folio(struct folio *folio,
> +					   enum node_stat_item idx)
> +{
> +	__lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
> +}
> +
> +static inline void __lruvec_stat_sub_folio(struct folio *folio,
> +					   enum node_stat_item idx)
> +{
> +	__lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
> +}
> +
>  static inline void inc_lruvec_page_state(struct page *page,
>  					 enum node_stat_item idx)
>  {
> @@ -542,4 +632,21 @@ static inline void dec_lruvec_page_state(struct page *page,
>  	mod_lruvec_page_state(page, idx, -1);
>  }
>  
> +static inline void lruvec_stat_mod_folio(struct folio *folio,
> +					 enum node_stat_item idx, int val)
> +{
> +	mod_lruvec_page_state(&folio->page, idx, val);
> +}
> +
> +static inline void lruvec_stat_add_folio(struct folio *folio,
> +					 enum node_stat_item idx)
> +{
> +	lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
> +}
> +
> +static inline void lruvec_stat_sub_folio(struct folio *folio,
> +					 enum node_stat_item idx)
> +{
> +	lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
> +}
>  #endif /* _LINUX_VMSTAT_H */
> 
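
For readers skimming the thread: the _add/_sub variants quoted above differ
from _mod only in that the delta comes from the folio itself.  A minimal
userspace sketch of that pattern follows.  It is not kernel code: the toy_*
types are invented for illustration, the zone is stored directly in the toy
folio (the patch looks it up with folio_zone()/folio_pgdat()), and a plain
array stands in for the real counters.

```c
#include <assert.h>

/* Toy stand-ins; none of these are the kernel's definitions. */
enum toy_stat_item { TOY_NR_FILE_PAGES, TOY_NR_STAT_ITEMS };

struct toy_zone {
	long stat[TOY_NR_STAT_ITEMS];
};

struct toy_folio {
	struct toy_zone *zone;	/* assumption: zone stored directly */
	long nr_pages;		/* 1 for an order-0 folio */
};

static inline long toy_folio_nr_pages(const struct toy_folio *folio)
{
	return folio->nr_pages;
}

/* _mod: the caller supplies the delta. */
static inline void toy_zone_stat_mod_folio(struct toy_folio *folio,
		enum toy_stat_item item, long nr)
{
	folio->zone->stat[item] += nr;
}

/* _add/_sub: the delta is always the folio's own page count. */
static inline void toy_zone_stat_add_folio(struct toy_folio *folio,
		enum toy_stat_item item)
{
	toy_zone_stat_mod_folio(folio, item, toy_folio_nr_pages(folio));
}

static inline void toy_zone_stat_sub_folio(struct toy_folio *folio,
		enum toy_stat_item item)
{
	toy_zone_stat_mod_folio(folio, item, -toy_folio_nr_pages(folio));
}

/* Accounting a 4-page folio moves the counter by 4, not by 1. */
void toy_vmstat_demo(void)
{
	struct toy_zone zone = { { 0 } };
	struct toy_folio folio = { &zone, 4 };

	toy_zone_stat_add_folio(&folio, TOY_NR_FILE_PAGES);
	assert(zone.stat[TOY_NR_FILE_PAGES] == 4);
	toy_zone_stat_sub_folio(&folio, TOY_NR_FILE_PAGES);
	assert(zone.stat[TOY_NR_FILE_PAGES] == 0);
}
```

The point of the sketch is only that a caller holding a folio never has to
work out the page count itself.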


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-11 21:47 ` [PATCH v10 01/33] mm: Introduce struct folio Matthew Wilcox (Oracle)
  2021-05-14 10:34   ` Vlastimil Babka
@ 2021-05-14 10:40   ` Vlastimil Babka
  2021-05-14 11:47     ` Matthew Wilcox
  2021-05-15 10:55   ` William Kucharski
  2021-05-27  8:09   ` Christoph Hellwig
  3 siblings, 1 reply; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 10:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> +/**
> + * folio_page - Return a page from a folio.
> + * @folio: The folio.
> + * @n: The page number to return.
> + *
> + * @n is relative to the start of the folio.  It should be between
> + * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.
> + */
> +#define folio_page(folio, n)	nth_page(&(folio)->page, n)

BTW, would it make sense to have also a folio_page(folio) wrapper? Or is
"&folio->page" used in later patches sufficiently elegant and stable enough for
the future?

>  static __always_inline int PageTail(struct page *page)
>  {
>  	return READ_ONCE(page->compound_head) & 1;
> 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 04/33] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
  2021-05-11 21:47 ` [PATCH v10 04/33] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
@ 2021-05-14 10:44   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 10:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 05/33] mm: Add folio reference count functions
  2021-05-11 21:47 ` [PATCH v10 05/33] mm: Add folio reference count functions Matthew Wilcox (Oracle)
@ 2021-05-14 11:04   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 11:04 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These functions mirror their page reference counterparts.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-14 10:40   ` Vlastimil Babka
@ 2021-05-14 11:47     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-14 11:47 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Jeff Layton

On Fri, May 14, 2021 at 12:40:05PM +0200, Vlastimil Babka wrote:
> On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> > +/**
> > + * folio_page - Return a page from a folio.
> > + * @folio: The folio.
> > + * @n: The page number to return.
> > + *
> > + * @n is relative to the start of the folio.  It should be between
> > + * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.
> > + */
> > +#define folio_page(folio, n)	nth_page(&(folio)->page, n)
> 
> BTW, would it make sense to have also a folio_page(folio) wrapper? Or is
> "&folio->page" used in later patches sufficiently elegant and stable enough for
> the future?

Ah!  If you see &folio->page in a patch, it's "a bad smell" [1].  At
this stage, it probably indicates "This other thing I need isn't
converted entirely to folios yet".  I consider it fine in
implementations of utility functions like this:

+static inline unsigned int folio_order(struct folio *folio)
+{
+       return compound_order(&folio->page);
+}

but when we see it here:

+void folio_unlock(struct folio *folio)
 {
        BUILD_BUG_ON(PG_waiters != 7);
-       page = compound_head(page);
-       VM_BUG_ON_PAGE(!PageLocked(page), page);
-       if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
-               wake_up_page_bit(page, PG_locked);
+       VM_BUG_ON_FOLIO(!folio_locked(folio), folio);
+       if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
+               wake_up_page_bit(&folio->page, PG_locked);
 }

that's an indication that wake_up_page_bit() needs to be converted to
folio_wake_bit(), which happens in a later patch.  I could probably
avoid this temporary problem with a different ordering of the patches,
but it's not clear to me that's a good use of my time.

The existing folio_page() is a way of distinguishing between "this
function I need to call doesn't have a folio equivalent yet" and "this
function I need to call needs to deal specifically with one page in
this folio".  For the former, use &folio->page; for the latter, use
folio_page() or folio_file_page().

[1] https://en.wikipedia.org/wiki/Code_smell
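
The two cases distinguished above can be sketched in userspace as follows.
This is only an illustration under stated assumptions: the toy_* names are
invented, the pages are assumed to sit in one contiguous array (the real
folio_page() goes through nth_page()), and toy_legacy_mark() stands in for
any helper that has not been converted to folios yet.

```c
#include <assert.h>

/* Toy stand-ins; the layouts are invented for illustration. */
struct toy_page {
	unsigned long flags;
};

struct toy_folio {
	struct toy_page page;	/* first page of the folio */
};

/* Assumes a contiguous page array; the real macro uses nth_page(). */
#define toy_folio_page(folio, n)	(&(folio)->page + (n))

/* A helper that has not been converted to take a folio yet. */
static void toy_legacy_mark(struct toy_page *page)
{
	page->flags |= 1UL;
}

void toy_folio_page_demo(void)
{
	struct toy_page pages[4] = { { 0 } };
	struct toy_folio *folio = (struct toy_folio *)&pages[0];

	/* Former case: no folio equivalent yet, so pass &folio->page. */
	toy_legacy_mark(&folio->page);
	assert(pages[0].flags == 1UL);

	/* Latter case: one specific page of the folio is wanted. */
	assert(toy_folio_page(folio, 2) == &pages[2]);
}
```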

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 06/33] mm: Add folio_put
  2021-05-11 21:47 ` [PATCH v10 06/33] mm: Add folio_put Matthew Wilcox (Oracle)
@ 2021-05-14 11:52   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 11:52 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> If we know we have a folio, we can call folio_put() instead of put_page()
> and save the overhead of calling compound_head().  Also skips the
> devmap checks.
> 
> This commit looks like it should be a no-op, but actually saves 1312 bytes
> of text with the distro-derived config that I'm testing.  Some functions
> grow a little while others shrink.  I presume the compiler is making
> different inlining decisions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 07/33] mm: Add folio_get
  2021-05-11 21:47 ` [PATCH v10 07/33] mm: Add folio_get Matthew Wilcox (Oracle)
@ 2021-05-14 11:56   ` Vlastimil Babka
  2021-05-14 14:24     ` Matthew Wilcox
  0 siblings, 1 reply; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 11:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Zi Yan, Christoph Hellwig,
	Jeff Layton

Nitpick: function names in the subject should IMHO also end with ().  But
that's not a reason to resend all the patches that don't...

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> If we know we have a folio, we can call folio_get() instead
> of get_page() and save the overhead of calling compound_head().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/mm.h | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 610948f0cb43..feb4645ef4f2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1219,18 +1219,26 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
>  }
>  
>  /* 127: arbitrary random number, small enough to assemble well */
> -#define page_ref_zero_or_close_to_overflow(page) \
> -	((unsigned int) page_ref_count(page) + 127u <= 127u)
> +#define folio_ref_zero_or_close_to_overflow(folio) \
> +	((unsigned int) folio_ref_count(folio) + 127u <= 127u)
> +
> +/**
> + * folio_get - Increment the reference count on a folio.
> + * @folio: The folio.
> + *
> + * Context: May be called in any context, as long as you know that
> + * you have a refcount on the folio.  If you do not already have one,
> + * folio_try_get() may be the right interface for you to use.
> + */
> +static inline void folio_get(struct folio *folio)
> +{
> +	VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio);
> +	folio_ref_inc(folio);
> +}
>  
>  static inline void get_page(struct page *page)
>  {
> -	page = compound_head(page);
> -	/*
> -	 * Getting a normal page or the head of a compound page
> -	 * requires to already have an elevated page->_refcount.
> -	 */
> -	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
> -	page_ref_inc(page);
> +	folio_get(page_folio(page));
>  }
>  
>  bool __must_check try_grab_page(struct page *page, unsigned int flags);
> 
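
A worked example of the "+ 127u <= 127u" comparison in the macro quoted
above, as a userspace sketch (the toy_ name is invented; only the expression
is taken from the patch).  As I read it, the single unsigned comparison is
true exactly when the signed count lies in [-127, 0]: either no reference is
held, or the counter has wrapped to just below zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same expression as the macro in the patch, applied to a plain int. */
static inline bool toy_ref_zero_or_close_to_overflow(int refcount)
{
	return (unsigned int)refcount + 127u <= 127u;
}

void toy_refcount_demo(void)
{
	/* 0 + 127 == 127: a zero refcount is caught. */
	assert(toy_ref_zero_or_close_to_overflow(0));
	/* (unsigned)-1 is UINT_MAX; adding 127 wraps around to 126. */
	assert(toy_ref_zero_or_close_to_overflow(-1));
	/* (unsigned)-127 + 127 wraps around to exactly 0. */
	assert(toy_ref_zero_or_close_to_overflow(-127));
	/* (unsigned)-128 + 127 is UINT_MAX: outside the window. */
	assert(!toy_ref_zero_or_close_to_overflow(-128));
	/* Ordinary positive counts pass. */
	assert(!toy_ref_zero_or_close_to_overflow(1));
	assert(!toy_ref_zero_or_close_to_overflow(INT_MAX));
}
```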


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 08/33] mm: Add folio_try_get_rcu
  2021-05-11 21:47 ` [PATCH v10 08/33] mm: Add folio_try_get_rcu Matthew Wilcox (Oracle)
@ 2021-05-14 12:11   ` Vlastimil Babka
  2021-05-27  8:16   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 12:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This is the equivalent of page_cache_get_speculative().  Also add
> folio_ref_try_add_rcu (the equivalent of page_cache_add_speculative)
> and folio_get_unless_zero() (the equivalent of get_page_unless_zero()).
> 
> The new kernel-doc attempts to explain from the user's point of view
> when to use folio_try_get_rcu() and when to use folio_get_unless_zero(),
> because there seems to be some confusion currently between the users of
> page_cache_get_speculative() and get_page_unless_zero().
> 
> Reimplement page_cache_add_speculative() and page_cache_get_speculative()
> as wrappers around the folio equivalents, but leave get_page_unless_zero()
> alone for now.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 07/33] mm: Add folio_get
  2021-05-14 11:56   ` Vlastimil Babka
@ 2021-05-14 14:24     ` Matthew Wilcox
  2021-05-14 15:39       ` Vlastimil Babka
  2021-05-27  8:10       ` Christoph Hellwig
  0 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-14 14:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Zi Yan,
	Christoph Hellwig, Jeff Layton

On Fri, May 14, 2021 at 01:56:46PM +0200, Vlastimil Babka wrote:
> Nitpick: function names in the subject should IMHO also end with ().  But
> that's not a reason to resend all the patches that don't...

Hm, I thought it was preferred to not do that.  I can fix it
easily enough when I go through and add the R-b.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 09/33] mm: Add folio flag manipulation functions
  2021-05-11 21:47 ` [PATCH v10 09/33] mm: Add folio flag manipulation functions Matthew Wilcox (Oracle)
@ 2021-05-14 15:29   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 15:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These new functions are the folio analogues of the various PageFlags
> functions.  If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio
> is not a tail page at every invocation.  This will also catch the
> PagePoisoned case as a poisoned page has every bit set, which would
> include PageTail.
> 
> This saves 1727 bytes of text with the distro-derived config that
> I'm testing due to removing a double call to compound_head() in
> PageSwapCache().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some nits:

...

>   * Macros to create function definitions for page flags
>   */
>  #define TESTPAGEFLAG(uname, lname, policy)				\
> +static __always_inline bool folio_##lname(struct folio *folio)		\
> +{ return test_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
>  static __always_inline int Page##uname(struct page *page)		\
>  	{ return test_bit(PG_##lname, &policy(page, 0)->flags); }

Maybe unify these indents while at it?

>  
>  #define SETPAGEFLAG(uname, lname, policy)				\
> +static __always_inline							\
> +void folio_set_##lname##_flag(struct folio *folio)			\
> +{ set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
>  static __always_inline void SetPage##uname(struct page *page)		\
>  	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
>  
>  #define CLEARPAGEFLAG(uname, lname, policy)				\
> +static __always_inline							\
> +void folio_clear_##lname##_flag(struct folio *folio)			\
> +{ clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
>  static __always_inline void ClearPage##uname(struct page *page)		\
>  	{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
>  
>  #define __SETPAGEFLAG(uname, lname, policy)				\
> +static __always_inline							\
> +void __folio_set_##lname##_flag(struct folio *folio)			\
> +{ __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }		\
>  static __always_inline void __SetPage##uname(struct page *page)		\
>  	{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
>  
>  #define __CLEARPAGEFLAG(uname, lname, policy)				\
> +static __always_inline							\
> +void __folio_clear_##lname##_flag(struct folio *folio)			\
> +{ __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }	\
>  static __always_inline void __ClearPage##uname(struct page *page)	\
>  	{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
>  
>  #define TESTSETFLAG(uname, lname, policy)				\
> +static __always_inline							\
> +bool folio_test_set_##lname##_flag(struct folio *folio)		\

The line above seems to need extra tab before '\'
(used vimdiff on your git tree)
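
For anyone not used to the ## pasting in these macros, here is a cut-down
userspace sketch of how one macro invocation per flag produces a family of
folio_* accessors.  Everything named toy_/TOY_ is invented for illustration;
it uses plain (non-atomic) bit operations and leaves out the policy argument
and folio_flags() that the real macros take.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_pageflags { TOY_PG_locked, TOY_PG_dirty };

struct toy_folio {
	unsigned long flags;
};

/* lname is pasted into both the function name and the bit name. */
#define TOY_TESTFLAG(lname)						\
static inline bool toy_folio_##lname(const struct toy_folio *folio)	\
{ return (folio->flags >> TOY_PG_##lname) & 1UL; }

#define TOY_SETFLAG(lname)						\
static inline void toy_folio_set_##lname##_flag(struct toy_folio *folio) \
{ folio->flags |= 1UL << TOY_PG_##lname; }

#define TOY_CLEARFLAG(lname)						\
static inline void toy_folio_clear_##lname##_flag(struct toy_folio *folio) \
{ folio->flags &= ~(1UL << TOY_PG_##lname); }

/* Each line below expands to one complete function definition. */
TOY_TESTFLAG(locked)
TOY_SETFLAG(locked)
TOY_CLEARFLAG(locked)
TOY_TESTFLAG(dirty)
TOY_SETFLAG(dirty)
TOY_CLEARFLAG(dirty)

void toy_flags_demo(void)
{
	struct toy_folio folio = { 0 };

	toy_folio_set_dirty_flag(&folio);
	assert(toy_folio_dirty(&folio));
	assert(!toy_folio_locked(&folio));
	toy_folio_clear_dirty_flag(&folio);
	assert(!toy_folio_dirty(&folio));
}
```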


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 10/33] mm: Add folio_young and folio_idle
  2021-05-11 21:47 ` [PATCH v10 10/33] mm: Add folio_young and folio_idle Matthew Wilcox (Oracle)
@ 2021-05-14 15:33   ` Vlastimil Babka
  2021-05-27  8:17   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 15:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Idle page tracking is handled through page_ext on 32-bit architectures.
> Add folio equivalents for 32-bit and move all the page compatibility
> parts to common code.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 07/33] mm: Add folio_get
  2021-05-14 14:24     ` Matthew Wilcox
@ 2021-05-14 15:39       ` Vlastimil Babka
  2021-05-27  8:10       ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 15:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Zi Yan,
	Christoph Hellwig, Jeff Layton

On 5/14/21 4:24 PM, Matthew Wilcox wrote:
> On Fri, May 14, 2021 at 01:56:46PM +0200, Vlastimil Babka wrote:
>> Nitpick: function names in the subject should IMHO also end with ().  But
>> that's not a reason to resend all the patches that don't...
> 
> Hm, I thought it was preferred to not do that.

Hm, no idea if there's a consensus on that, actually.

> I can fix it
> easily enough when I go through and add the R-b.

If I was right...


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/33] mm: Handle per-folio private data
  2021-05-11 21:47 ` [PATCH v10 11/33] mm: Handle per-folio private data Matthew Wilcox (Oracle)
@ 2021-05-14 15:41   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 15:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Add folio_get_private() which mirrors page_private() -- ie folio private
> data is the same as page private data.  The only difference is that these
> return a void * instead of an unsigned long, which matches the majority
> of users.
> 
> Turn attach_page_private() into folio_attach_private() and reimplement
> attach_page_private() as a wrapper.  No filesystem which uses page private
> data currently supports compound pages, so we're free to define the rules.
> attach_page_private() may only be called on a head page; if you want
> to add private data to a tail page, you can call set_page_private()
> directly (and shouldn't increment the page refcount!  That should be
> done when adding private data to the head page / folio).
> 
> This saves 597 bytes of text with the distro-derived config that I'm
> testing due to removing the calls to compound_head() in get_page()
> & put_page().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains
  2021-05-11 21:47 ` [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
@ 2021-05-14 15:55   ` Vlastimil Babka
  2021-05-15 15:51     ` Matthew Wilcox
  0 siblings, 1 reply; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 15:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> folio_index() is the equivalent of page_index() for folios.
> folio_file_page() is the equivalent of find_subpage().

find_subpage() special-cases hugetlbfs; folio_file_page() doesn't.

> folio_contains() is the equivalent of thp_contains().

Yet here, both thp_contains() and folio_contains() do.

This patch doesn't add users, so maybe it becomes obvious later, but it's
perhaps worth explaining in the changelog or a comment?

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> ---
>  include/linux/pagemap.h | 53 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index bc5fa3d7204e..8eaeffccfd38 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -386,6 +386,59 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
>  	return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
>  }
>  
> +#define swapcache_index(folio)	__page_file_index(&(folio)->page)
> +
> +/**
> + * folio_index - File index of a folio.
> + * @folio: The folio.
> + *
> + * For a folio which is either in the page cache or the swap cache,
> + * return its index within the address_space it belongs to.  If you know
> + * the page is definitely in the page cache, you can look at the folio's
> + * index directly.
> + *
> + * Return: The index (offset in units of pages) of a folio in its file.
> + */
> +static inline pgoff_t folio_index(struct folio *folio)
> +{
> +        if (unlikely(folio_swapcache(folio)))
> +                return swapcache_index(folio);
> +        return folio->index;
> +}
> +
> +/**
> + * folio_file_page - The page for a particular index.
> + * @folio: The folio which contains this index.
> + * @index: The index we want to look up.
> + *
> + * Sometimes after looking up a folio in the page cache, we need to
> + * obtain the specific page for an index (eg a page fault).
> + *
> + * Return: The page containing the file data for this index.
> + */
> +static inline struct page *folio_file_page(struct folio *folio, pgoff_t index)
> +{
> +	return folio_page(folio, index & (folio_nr_pages(folio) - 1));
> +}
> +
> +/**
> + * folio_contains - Does this folio contain this index?
> + * @folio: The folio.
> + * @index: The page index within the file.
> + *
> + * Context: The caller should have the page locked in order to prevent
> + * (eg) shmem from moving the page between the page cache and swap cache
> + * and changing its index in the middle of the operation.
> + * Return: true or false.
> + */
> +static inline bool folio_contains(struct folio *folio, pgoff_t index)
> +{
> +	/* HugeTLBfs indexes the page cache in units of hpage_size */
> +	if (folio_hugetlb(folio))
> +		return folio->index == index;
> +	return index - folio_index(folio) < folio_nr_pages(folio);
> +}
> +
>  /*
>   * Given the page we found in the page cache, return the page corresponding
>   * to this index in the file
> 
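
The index arithmetic in folio_file_page() and in the non-hugetlb branch of
folio_contains() can be checked with a small userspace sketch.  The toy_*
names are invented; the sketch assumes the page count is a power of two
(which the mask relies on) and does not model the hugetlb branch, where the
comment in the patch says the page cache is indexed in units of hpage_size.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_folio {
	unsigned long index;	/* file index of the first page */
	unsigned long nr_pages;	/* assumed to be a power of two */
};

/* Which page of the folio holds this file index? */
static inline unsigned long toy_folio_subpage(const struct toy_folio *folio,
		unsigned long index)
{
	return index & (folio->nr_pages - 1);
}

/*
 * One unsigned comparison covers both ends of the range: an index
 * below folio->index wraps around to a huge value and fails the test.
 */
static inline bool toy_folio_contains(const struct toy_folio *folio,
		unsigned long index)
{
	return index - folio->index < folio->nr_pages;
}

void toy_index_demo(void)
{
	struct toy_folio folio = { 16, 4 };	/* covers indices 16..19 */

	assert(toy_folio_subpage(&folio, 18) == 2);
	assert(toy_folio_contains(&folio, 16));
	assert(toy_folio_contains(&folio, 19));
	assert(!toy_folio_contains(&folio, 20));
	assert(!toy_folio_contains(&folio, 15));
}
```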


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 13/33] mm/filemap: Add folio_next_index
  2021-05-11 21:47 ` [PATCH v10 13/33] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
@ 2021-05-14 17:07   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 17:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This helper returns the page index of the next folio in the file (ie
> the end of this folio, plus one).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/pagemap.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 8eaeffccfd38..3b82252d12fc 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -406,6 +406,17 @@ static inline pgoff_t folio_index(struct folio *folio)
>          return folio->index;
>  }
>  
> +/**
> + * folio_next_index - Get the index of the next folio.
> + * @folio: The current folio.
> + *
> + * Return: The index of the folio which follows this folio in the file.
> + */
> +static inline pgoff_t folio_next_index(struct folio *folio)
> +{
> +	return folio->index + folio_nr_pages(folio);
> +}
> +
>  /**
>   * folio_file_page - The page for a particular index.
>   * @folio: The folio which contains this index.
> 
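
A small userspace sketch of what folio_next_index() gives a caller: each
folio's next index is where the following folio in the file has to start.
The toy_* names and the contiguity check are invented for illustration; only
the "index + number of pages" computation comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	unsigned long index;	/* file index of the first page */
	unsigned long nr_pages;
};

static inline unsigned long toy_folio_next_index(const struct toy_folio *folio)
{
	return folio->index + folio->nr_pages;
}

/* True if every folio starts exactly where its predecessor ends. */
bool toy_folios_contiguous(const struct toy_folio *folios, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (folios[i].index != toy_folio_next_index(&folios[i - 1]))
			return false;
	return true;
}

void toy_next_index_demo(void)
{
	/* Folios of 4, 1 and 8 pages covering indices 0..12. */
	struct toy_folio folios[] = { { 0, 4 }, { 4, 1 }, { 5, 8 } };

	assert(toy_folios_contiguous(folios, 3));
	assert(toy_folio_next_index(&folios[2]) == 13);
}
```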


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset
  2021-05-11 21:47 ` [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
@ 2021-05-14 17:08   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 17:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These are just wrappers around their page counterpart.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/pagemap.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 3b82252d12fc..448a2dfb5ff1 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -558,6 +558,16 @@ static inline loff_t page_file_offset(struct page *page)
>  	return ((loff_t)page_index(page)) << PAGE_SHIFT;
>  }
>  
> +static inline loff_t folio_offset(struct folio *folio)
> +{
> +	return page_offset(&folio->page);
> +}
> +
> +static inline loff_t folio_file_offset(struct folio *folio)
> +{
> +	return page_file_offset(&folio->page);
> +}
> +
>  extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
>  				     unsigned long address);
>  
> 
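
Since both new helpers defer to the page versions, the part worth a second
look is the (loff_t) cast in the page_file_offset() context line above: it
widens the index before the shift.  A userspace sketch of why the order
matters when the index type is only 32 bits wide follows.  The toy_* names
are invented and a 12-bit page shift (4KiB pages) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4KiB pages */

/* Widen first, then shift: the high bits survive. */
static inline int64_t toy_offset(uint32_t index)
{
	return (int64_t)index << TOY_PAGE_SHIFT;
}

/* Shift in 32 bits, then widen: offsets of 4GiB and above are lost. */
static inline int64_t toy_offset_truncated(uint32_t index)
{
	return (int64_t)(uint32_t)(index << TOY_PAGE_SHIFT);
}

void toy_offset_demo(void)
{
	uint32_t index = 0x100000;	/* page 1Mi, i.e. byte offset 4GiB */

	assert(toy_offset(index) == INT64_C(0x100000000));
	assert(toy_offset_truncated(index) == 0);
	/* Small indices give the same answer either way. */
	assert(toy_offset(3) == toy_offset_truncated(3));
}
```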


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping
  2021-05-11 21:47 ` [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
@ 2021-05-14 17:29   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 17:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> These are the folio equivalent of page_mapping() and page_file_mapping().
> Add an out-of-line page_mapping() wrapper around folio_mapping()
> in order to prevent the page_folio() call from bloating every caller
> of page_mapping().  Adjust page_file_mapping() and page_mapping_file()
> to use folios internally.  Rename __page_file_mapping() to
> swapcache_mapping() and change it to take a folio.
> 
> This ends up saving 186 bytes of text overall.  folio_mapping() is
> 45 bytes shorter than page_mapping() was, but the new page_mapping()
> wrapper is 30 bytes.  The major reduction is a few bytes less in dozens
> of nfs functions (which call page_file_mapping()).  Most of these appear
> to be a slight change in gcc's register allocation decisions, which allow:
> 
>    48 8b 56 08         mov    0x8(%rsi),%rdx
>    48 8d 42 ff         lea    -0x1(%rdx),%rax
>    83 e2 01            and    $0x1,%edx
>    48 0f 44 c6         cmove  %rsi,%rax
> 
> to become:
> 
>    48 8b 46 08         mov    0x8(%rsi),%rax
>    48 8d 78 ff         lea    -0x1(%rax),%rdi
>    a8 01               test   $0x1,%al
>    48 0f 44 fe         cmove  %rsi,%rdi
> 
> for a reduction of a single byte.  Once the NFS client is converted to
> use folios, this entire sequence will disappear.
> 
> Also add folio_mapping() documentation.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
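
For reference, both instruction sequences in the changelog look like an
inlined compound_head(): load the word at offset 8 of the page, compute that
value minus one, test bit 0, and keep the original pointer if the bit is
clear.  A userspace sketch of that logic, under stated assumptions: the
toy_* names and the two-field layout are invented, pointers are assumed to
fit in an unsigned long, and the bit-0 convention is taken from the
PageTail() snippet quoted earlier in this thread.

```c
#include <assert.h>

struct toy_page {
	unsigned long flags;
	unsigned long compound_head;	/* bit 0 set: this is a tail page */
};

static inline struct toy_page *toy_compound_head(struct toy_page *page)
{
	unsigned long head = page->compound_head;	/* the mov */

	if (head & 1)					/* the and/test */
		return (struct toy_page *)(head - 1);	/* the lea */
	return page;					/* the cmove */
}

void toy_compound_head_demo(void)
{
	struct toy_page pages[2] = { { 0, 0 } };

	/* Make pages[1] a tail page pointing back at pages[0]. */
	pages[1].compound_head = (unsigned long)&pages[0] + 1;

	assert(toy_compound_head(&pages[1]) == &pages[0]);
	assert(toy_compound_head(&pages[0]) == &pages[0]);
}
```

A function that takes a folio can skip this lookup, which is the saving the
changelog expects once callers are converted.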

> ---
>  Documentation/core-api/mm-api.rst |  2 ++
>  include/linux/mm.h                | 14 -------------
>  include/linux/pagemap.h           | 35 +++++++++++++++++++++++++++++--
>  include/linux/swap.h              |  6 ++++++
>  mm/Makefile                       |  2 +-
>  mm/folio-compat.c                 | 13 ++++++++++++
>  mm/swapfile.c                     |  8 +++----
>  mm/util.c                         | 30 +++++++++++++++-----------
>  8 files changed, 77 insertions(+), 33 deletions(-)
>  create mode 100644 mm/folio-compat.c
> 
> diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
> index 5c459ee2acce..dcce6605947a 100644
> --- a/Documentation/core-api/mm-api.rst
> +++ b/Documentation/core-api/mm-api.rst
> @@ -100,3 +100,5 @@ More Memory Management Functions
>     :internal:
>  .. kernel-doc:: include/linux/page_ref.h
>  .. kernel-doc:: include/linux/mmzone.h
> +.. kernel-doc:: mm/util.c
> +   :functions: folio_mapping
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index feb4645ef4f2..dca39daf3495 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1749,19 +1749,6 @@ void page_address_init(void);
>  
>  extern void *page_rmapping(struct page *page);
>  extern struct anon_vma *page_anon_vma(struct page *page);
> -extern struct address_space *page_mapping(struct page *page);
> -
> -extern struct address_space *__page_file_mapping(struct page *);
> -
> -static inline
> -struct address_space *page_file_mapping(struct page *page)
> -{
> -	if (unlikely(PageSwapCache(page)))
> -		return __page_file_mapping(page);
> -
> -	return page->mapping;
> -}
> -
>  extern pgoff_t __page_file_index(struct page *page);
>  
>  /*
> @@ -1776,7 +1763,6 @@ static inline pgoff_t page_index(struct page *page)
>  }
>  
>  bool page_mapped(struct page *page);
> -struct address_space *page_mapping(struct page *page);
>  
>  /*
>   * Return true only if the page has been allocated with
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 448a2dfb5ff1..1f37d7656955 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -162,14 +162,45 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping)
>  
>  void release_pages(struct page **pages, int nr);
>  
> +struct address_space *page_mapping(struct page *);
> +struct address_space *folio_mapping(struct folio *);
> +struct address_space *swapcache_mapping(struct folio *);
> +
> +/**
> + * folio_file_mapping - Find the mapping this folio belongs to.
> + * @folio: The folio.
> + *
> + * For folios which are in the page cache, return the mapping that this
> + * page belongs to.  Folios in the swap cache return the mapping of the
> + * swap file or swap device where the data is stored.  This is different
> + * from the mapping returned by folio_mapping().  The only reason to
> + * use it is if, like NFS, you return 0 from ->activate_swapfile.
> + *
> + * Do not call this for folios which aren't in the page cache or swap cache.
> + */
> +static inline struct address_space *folio_file_mapping(struct folio *folio)
> +{
> +	if (unlikely(folio_swapcache(folio)))
> +		return swapcache_mapping(folio);
> +
> +	return folio->mapping;
> +}
> +
> +static inline struct address_space *page_file_mapping(struct page *page)
> +{
> +	return folio_file_mapping(page_folio(page));
> +}
> +
>  /*
>   * For file cache pages, return the address_space, otherwise return NULL
>   */
>  static inline struct address_space *page_mapping_file(struct page *page)
>  {
> -	if (unlikely(PageSwapCache(page)))
> +	struct folio *folio = page_folio(page);
> +
> +	if (unlikely(folio_swapcache(folio)))
>  		return NULL;
> -	return page_mapping(page);
> +	return folio_mapping(folio);
>  }
>  
>  static inline bool page_cache_add_speculative(struct page *page, int count)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 144727041e78..20766342845b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -314,6 +314,12 @@ struct vma_swap_readahead {
>  #endif
>  };
>  
> +static inline swp_entry_t folio_swap_entry(struct folio *folio)
> +{
> +	swp_entry_t entry = { .val = page_private(&folio->page) };
> +	return entry;
> +}
> +
>  /* linux/mm/workingset.c */
>  void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
>  void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg);
> diff --git a/mm/Makefile b/mm/Makefile
> index a9ad6122d468..434c2a46b6c5 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -46,7 +46,7 @@ mmu-$(CONFIG_MMU)	+= process_vm_access.o
>  endif
>  
>  obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
> -			   maccess.o page-writeback.o \
> +			   maccess.o page-writeback.o folio-compat.o \
>  			   readahead.o swap.o truncate.o vmscan.o shmem.o \
>  			   util.o mmzone.o vmstat.o backing-dev.o \
>  			   mm_init.o percpu.o slab_common.o \
> diff --git a/mm/folio-compat.c b/mm/folio-compat.c
> new file mode 100644
> index 000000000000..5e107aa30a62
> --- /dev/null
> +++ b/mm/folio-compat.c
> @@ -0,0 +1,13 @@
> +/*
> + * Compatibility functions which bloat the callers too much to make inline.
> + * All of the callers of these functions should be converted to use folios
> + * eventually.
> + */
> +
> +#include <linux/pagemap.h>
> +
> +struct address_space *page_mapping(struct page *page)
> +{
> +	return folio_mapping(page_folio(page));
> +}
> +EXPORT_SYMBOL(page_mapping);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 149e77454e3c..d0ee24239a83 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3533,13 +3533,13 @@ struct swap_info_struct *page_swap_info(struct page *page)
>  }
>  
>  /*
> - * out-of-line __page_file_ methods to avoid include hell.
> + * out-of-line methods to avoid include hell.
>   */
> -struct address_space *__page_file_mapping(struct page *page)
> +struct address_space *swapcache_mapping(struct folio *folio)
>  {
> -	return page_swap_info(page)->swap_file->f_mapping;
> +	return page_swap_info(&folio->page)->swap_file->f_mapping;
>  }
> -EXPORT_SYMBOL_GPL(__page_file_mapping);
> +EXPORT_SYMBOL_GPL(swapcache_mapping);
>  
>  pgoff_t __page_file_index(struct page *page)
>  {
> diff --git a/mm/util.c b/mm/util.c
> index 0b6dd9d81da7..245f5c7bedae 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -686,30 +686,36 @@ struct anon_vma *page_anon_vma(struct page *page)
>  	return __page_rmapping(page);
>  }
>  
> -struct address_space *page_mapping(struct page *page)
> +/**
> + * folio_mapping - Find the mapping where this folio is stored.
> + * @folio: The folio.
> + *
> + * For folios which are in the page cache, return the mapping that this
> + * page belongs to.  Folios in the swap cache return the swap mapping
> + * this page is stored in (which is different from the mapping for the
> + * swap file or swap device where the data is stored).
> + *
> + * You can call this for folios which aren't in the swap cache or page
> + * cache and it will return NULL.
> + */
> +struct address_space *folio_mapping(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> -	page = compound_head(page);
> -
>  	/* This happens if someone calls flush_dcache_page on slab page */
> -	if (unlikely(PageSlab(page)))
> +	if (unlikely(folio_slab(folio)))
>  		return NULL;
>  
> -	if (unlikely(PageSwapCache(page))) {
> -		swp_entry_t entry;
> -
> -		entry.val = page_private(page);
> -		return swap_address_space(entry);
> -	}
> +	if (unlikely(folio_swapcache(folio)))
> +		return swap_address_space(folio_swap_entry(folio));
>  
> -	mapping = page->mapping;
> +	mapping = folio->mapping;
>  	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
>  
>  	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>  }
> -EXPORT_SYMBOL(page_mapping);
> +EXPORT_SYMBOL(folio_mapping);
>  
>  /* Slow path of page_mapcount() for compound pages */
>  int __page_mapcount(struct page *page)
> 
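
As a reading aid for the folio_mapping() hunk quoted above, here is a
minimal userspace sketch of the low-bit pointer tagging that the last
few lines of that function rely on.  The TOY_MAPPING_* values and all
toy_* names are assumptions made for this illustration; they model the
idea behind PAGE_MAPPING_ANON and PAGE_MAPPING_FLAGS but are not the
kernel's definitions.  The slab and swap cache branches are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tag values for this sketch; stand-ins for PAGE_MAPPING_*. */
#define TOY_MAPPING_ANON	((uintptr_t)0x1)
#define TOY_MAPPING_FLAGS	((uintptr_t)0x3)

struct toy_address_space {
	int id;
};

struct toy_folio {
	void *mapping;	/* pointer with status bits kept in the low two bits */
};

/* Return the file mapping, or NULL if the pointer is tagged as anonymous. */
static struct toy_address_space *toy_folio_mapping(const struct toy_folio *folio)
{
	uintptr_t mapping = (uintptr_t)folio->mapping;

	if (mapping & TOY_MAPPING_ANON)
		return NULL;

	/* Strip any remaining tag bits before handing the pointer back. */
	return (struct toy_address_space *)(mapping & ~TOY_MAPPING_FLAGS);
}

/* Sketch-only helper: tag a pointer the way an anonymous mapping would be. */
static void *toy_tag_anon(void *ptr)
{
	return (void *)((uintptr_t)ptr | TOY_MAPPING_ANON);
}
```

The tagging only works because the pointed-to structure is aligned to at
least four bytes, so the low two bits of a real pointer are always zero.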



* Re: [PATCH v10 16/33] mm: Add folio_mapcount
  2021-05-11 21:47 ` [PATCH v10 16/33] mm: Add folio_mapcount Matthew Wilcox (Oracle)
@ 2021-05-14 17:39   ` Vlastimil Babka
  2021-05-18 18:45   ` Matthew Wilcox
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-14 17:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapcount().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 00/33] Memory folios
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (33 preceding siblings ...)
  2021-05-13 14:50 ` [PATCH v10 00/33] Memory folios Matthew Wilcox
@ 2021-05-15 10:26 ` William Kucharski
  2021-06-04  1:07 ` Matteo Croce
  35 siblings, 0 replies; 96+ messages in thread
From: William Kucharski @ 2021-05-15 10:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, linux-fsdevel, linux-mm, linux-kernel

I have a nit on part 01/33, but will respond directly there.

For the series:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On May 11, 2021, at 3:47 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> Managing memory in 4KiB pages is a serious overhead.  Many benchmarks
> benefit from a larger "page size".  As an example, an earlier iteration
> of this idea which used compound pages (and wasn't particularly tuned)
> got a 7% performance boost when compiling the kernel.
> 
> Using compound pages or THPs exposes a weakness of our type system.
> Functions are often unprepared for compound pages to be passed to them,
> and may only act on PAGE_SIZE chunks.  Even functions which are aware of
> compound pages may expect a head page, and do the wrong thing if passed
> a tail page.
> 
> We also waste a lot of instructions ensuring that we're not looking at
> a tail page.  Almost every call to PageFoo() contains one or more hidden
> calls to compound_head().  This also happens for get_page(), put_page()
> and many more functions.  There does not appear to be a way to tell gcc
> that it can cache the result of compound_head(), nor is there a way to
> tell it that compound_head() is idempotent.
> 
> This patch series uses a new type, the struct folio, to manage memory.
> It provides some basic infrastructure that's worthwhile in its own right,
> shrinking the kernel by about 5kB of text.
> 
> Since v9:
> - Rebase onto mmotm 2021-05-10-21-46
> - Add folio_memcg() definition for !MEMCG (intel lkp)
> - Change folio->private from an unsigned long to a void *
> - Use folio_page() to implement folio_file_page()
> - Add folio_try_get() and folio_try_get_rcu()
> - Trim back down to just the first few patches, which are better-reviewed.
> v9: https://lore.kernel.org/linux-mm/20210505150628.111735-1-willy@infradead.org/
> v8: https://lore.kernel.org/linux-mm/20210430180740.2707166-1-willy@infradead.org/
> 
> Matthew Wilcox (Oracle) (33):
>  mm: Introduce struct folio
>  mm: Add folio_pgdat and folio_zone
>  mm/vmstat: Add functions to account folio statistics
>  mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
>  mm: Add folio reference count functions
>  mm: Add folio_put
>  mm: Add folio_get
>  mm: Add folio_try_get_rcu
>  mm: Add folio flag manipulation functions
>  mm: Add folio_young and folio_idle
>  mm: Handle per-folio private data
>  mm/filemap: Add folio_index, folio_file_page and folio_contains
>  mm/filemap: Add folio_next_index
>  mm/filemap: Add folio_offset and folio_file_offset
>  mm/util: Add folio_mapping and folio_file_mapping
>  mm: Add folio_mapcount
>  mm/memcg: Add folio wrappers for various functions
>  mm/filemap: Add folio_unlock
>  mm/filemap: Add folio_lock
>  mm/filemap: Add folio_lock_killable
>  mm/filemap: Add __folio_lock_async
>  mm/filemap: Add __folio_lock_or_retry
>  mm/filemap: Add folio_wait_locked
>  mm/swap: Add folio_rotate_reclaimable
>  mm/filemap: Add folio_end_writeback
>  mm/writeback: Add folio_wait_writeback
>  mm/writeback: Add folio_wait_stable
>  mm/filemap: Add folio_wait_bit
>  mm/filemap: Add folio_wake_bit
>  mm/filemap: Convert page wait queues to be folios
>  mm/filemap: Add folio private_2 functions
>  fs/netfs: Add folio fscache functions
>  mm: Add folio_mapped
> 
> Documentation/core-api/mm-api.rst           |   4 +
> Documentation/filesystems/netfs_library.rst |   2 +
> fs/afs/write.c                              |   9 +-
> fs/cachefiles/rdwr.c                        |  16 +-
> fs/io_uring.c                               |   2 +-
> include/linux/memcontrol.h                  |  63 ++++
> include/linux/mm.h                          | 174 ++++++++--
> include/linux/mm_types.h                    |  71 ++++
> include/linux/mmdebug.h                     |  20 ++
> include/linux/netfs.h                       |  77 +++--
> include/linux/page-flags.h                  | 230 ++++++++++---
> include/linux/page_idle.h                   |  99 +++---
> include/linux/page_ref.h                    | 158 ++++++++-
> include/linux/pagemap.h                     | 358 ++++++++++++--------
> include/linux/swap.h                        |   7 +-
> include/linux/vmstat.h                      | 107 ++++++
> mm/Makefile                                 |   2 +-
> mm/filemap.c                                | 315 ++++++++---------
> mm/folio-compat.c                           |  43 +++
> mm/internal.h                               |   1 +
> mm/memory.c                                 |   8 +-
> mm/page-writeback.c                         |  72 ++--
> mm/page_io.c                                |   4 +-
> mm/swap.c                                   |  18 +-
> mm/swapfile.c                               |   8 +-
> mm/util.c                                   |  59 ++--
> 26 files changed, 1374 insertions(+), 553 deletions(-)
> create mode 100644 mm/folio-compat.c
> 
> -- 
> 2.30.2
> 
> 



* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-11 21:47 ` [PATCH v10 01/33] mm: Introduce struct folio Matthew Wilcox (Oracle)
  2021-05-14 10:34   ` Vlastimil Babka
  2021-05-14 10:40   ` Vlastimil Babka
@ 2021-05-15 10:55   ` William Kucharski
  2021-05-15 20:14     ` Matthew Wilcox
  2021-05-27  8:09   ` Christoph Hellwig
  3 siblings, 1 reply; 96+ messages in thread
From: William Kucharski @ 2021-05-15 10:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, Linux-Fsdevel, linux-mm, linux-kernel, Jeff Layton

Comment inline:

> On May 11, 2021, at 3:47 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> A struct folio is a new abstraction to replace the venerable struct page.
> A function which takes a struct folio argument declares that it will
> operate on the entire (possibly compound) page, not just PAGE_SIZE bytes.
> In return, the caller guarantees that the pointer it is passing does
> not point to a tail page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> ---
> Documentation/core-api/mm-api.rst |  1 +
> include/linux/mm.h                | 74 +++++++++++++++++++++++++++++++
> include/linux/mm_types.h          | 60 +++++++++++++++++++++++++
> include/linux/page-flags.h        | 27 +++++++++++
> 4 files changed, 162 insertions(+)
> 
> diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
> index a42f9baddfbf..2a94e6164f80 100644
> --- a/Documentation/core-api/mm-api.rst
> +++ b/Documentation/core-api/mm-api.rst
> @@ -95,6 +95,7 @@ More Memory Management Functions
> .. kernel-doc:: mm/mempolicy.c
> .. kernel-doc:: include/linux/mm_types.h
>    :internal:
> +.. kernel-doc:: include/linux/page-flags.h
> .. kernel-doc:: include/linux/mm.h
>    :internal:
> .. kernel-doc:: include/linux/mmzone.h
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2327f99b121f..b29c86824e6b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -950,6 +950,20 @@ static inline unsigned int compound_order(struct page *page)
> 	return page[1].compound_order;
> }
> 
> +/**
> + * folio_order - The allocation order of a folio.
> + * @folio: The folio.
> + *
> + * A folio is composed of 2^order pages.  See get_order() for the definition
> + * of order.
> + *
> + * Return: The order of the folio.
> + */
> +static inline unsigned int folio_order(struct folio *folio)
> +{
> +	return compound_order(&folio->page);
> +}
> +
> static inline bool hpage_pincount_available(struct page *page)
> {
> 	/*
> @@ -1595,6 +1609,65 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
> #endif
> }
> 
> +/**
> + * folio_nr_pages - The number of pages in the folio.
> + * @folio: The folio.
> + *
> + * Return: A number which is a power of two.
> + */
> +static inline unsigned long folio_nr_pages(struct folio *folio)
> +{
> +	return compound_nr(&folio->page);
> +}
> +
> +/**
> + * folio_next - Move to the next physical folio.
> + * @folio: The folio we're currently operating on.
> + *
> + * If you have physically contiguous memory which may span more than
> + * one folio (eg a &struct bio_vec), use this function to move from one
> + * folio to the next.  Do not use it if the memory is only virtually
> + * contiguous as the folios are almost certainly not adjacent to each
> + * other.  This is the folio equivalent to writing ``page++``.
> + *
> + * Context: We assume that the folios are refcounted and/or locked at a
> + * higher level and do not adjust the reference counts.
> + * Return: The next struct folio.
> + */
> +static inline struct folio *folio_next(struct folio *folio)
> +{
> +	return (struct folio *)folio_page(folio, folio_nr_pages(folio));
> +}
> +
> +/**
> + * folio_shift - The number of bits covered by this folio.
> + * @folio: The folio.
> + *
> + * A folio contains a number of bytes which is a power-of-two in size.
> + * This function tells you which power-of-two the folio is.
> + *
> + * Context: The caller should have a reference on the folio to prevent
> + * it from being split.  It is not necessary for the folio to be locked.
> + * Return: The base-2 logarithm of the size of this folio.
> + */
> +static inline unsigned int folio_shift(struct folio *folio)
> +{
> +	return PAGE_SHIFT + folio_order(folio);
> +}
> +
> +/**
> + * folio_size - The number of bytes in a folio.
> + * @folio: The folio.
> + *
> + * Context: The caller should have a reference on the folio to prevent
> + * it from being split.  It is not necessary for the folio to be locked.
> + * Return: The number of bytes in this folio.
> + */
> +static inline size_t folio_size(struct folio *folio)
> +{
> +	return PAGE_SIZE << folio_order(folio);
> +}
> +
> /*
>  * Some inline functions in vmstat.h depend on page_zone()
>  */
> @@ -1699,6 +1772,7 @@ extern void pagefault_out_of_memory(void);
> 
> #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
> #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
> +#define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1))
> 
> /*
>  * Flags passed to show_mem() and show_free_areas() to suppress output in
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5aacc1c10a45..3118ba8b5a4e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -224,6 +224,66 @@ struct page {
> #endif
> } _struct_page_alignment;
> 
> +/**
> + * struct folio - Represents a contiguous set of bytes.
> + * @flags: Identical to the page flags.
> + * @lru: Least Recently Used list; tracks how recently this folio was used.
> + * @mapping: The file this page belongs to, or refers to the anon_vma for
> + *    anonymous pages.
> + * @index: Offset within the file, in units of pages.  For anonymous pages,
> + *    this is the index from the beginning of the mmap.
> + * @private: Filesystem per-folio data (see folio_attach_private()).
> + *    Used for swp_entry_t if folio_swapcache().
> + * @_mapcount: Do not access this member directly.  Use folio_mapcount() to
> + *    find out how many times this folio is mapped by userspace.
> + * @_refcount: Do not access this member directly.  Use folio_ref_count()
> + *    to find how many references there are to this folio.
> + * @memcg_data: Memory Control Group data.
> + *
> + * A folio is a physically, virtually and logically contiguous set
> + * of bytes.  It is a power-of-two in size, and it is aligned to that
> + * same power-of-two.  It is at least as large as %PAGE_SIZE.  If it is
> + * in the page cache, it is at a file offset which is a multiple of that
> + * power-of-two.  It may be mapped into userspace at an address which is
> + * at an arbitrary page offset, but its kernel virtual address is aligned
> + * to its size.
> + */
> +struct folio {
> +	/* private: don't document the anon union */
> +	union {
> +		struct {
> +	/* public: */
> +			unsigned long flags;
> +			struct list_head lru;
> +			struct address_space *mapping;
> +			pgoff_t index;
> +			void *private;
> +			atomic_t _mapcount;
> +			atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> +			unsigned long memcg_data;
> +#endif
> +	/* private: the union with struct page is transitional */
> +		};
> +		struct page page;
> +	};
> +};
> +
> +static_assert(sizeof(struct page) == sizeof(struct folio));
> +#define FOLIO_MATCH(pg, fl)						\
> +	static_assert(offsetof(struct page, pg) == offsetof(struct folio, fl))
> +FOLIO_MATCH(flags, flags);
> +FOLIO_MATCH(lru, lru);
> +FOLIO_MATCH(compound_head, lru);
> +FOLIO_MATCH(index, index);
> +FOLIO_MATCH(private, private);
> +FOLIO_MATCH(_mapcount, _mapcount);
> +FOLIO_MATCH(_refcount, _refcount);
> +#ifdef CONFIG_MEMCG
> +FOLIO_MATCH(memcg_data, memcg_data);
> +#endif
> +#undef FOLIO_MATCH
> +
> static inline atomic_t *compound_mapcount_ptr(struct page *page)
> {
> 	return &page[1].compound_mapcount;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d8e26243db25..e069aa8b11b7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -188,6 +188,33 @@ static inline unsigned long _compound_head(const struct page *page)
> 
> #define compound_head(page)	((typeof(page))_compound_head(page))
> 
> +/**
> + * page_folio - Converts from page to folio.
> + * @p: The page.
> + *
> + * Every page is part of a folio.  This function cannot be called on a
> + * NULL pointer.
> + *
> + * Context: No reference, nor lock is required on @page.  If the caller
> + * does not hold a reference, this call may race with a folio split, so
> + * it should re-check the folio still contains this page after gaining
> + * a reference on the folio.
> + * Return: The folio which contains this page.
> + */
> +#define page_folio(p)		(_Generic((p),				\
> +	const struct page *:	(const struct folio *)_compound_head(p), \
> +	struct page *:		(struct folio *)_compound_head(p)))
> +
> +/**
> + * folio_page - Return a page from a folio.
> + * @folio: The folio.
> + * @n: The page number to return.
> + *
> + * @n is relative to the start of the folio.  It should be between
> + * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.

Please add a statement noting WHY @n isn't checked since you state it
should be. Something like "...but this is not checked for because this is
a hot path."

> + */
> +#define folio_page(folio, n)	nth_page(&(folio)->page, n)
> +
> static __always_inline int PageTail(struct page *page)
> {
> 	return READ_ONCE(page->compound_head) & 1;
> -- 
> 2.30.2
> 
> 

Thanks,
    Bill
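
For readers following the quoted hunk, the size helpers reduce to shift
arithmetic on the folio order.  A minimal userspace sketch, assuming a
4KiB base page (TOY_PAGE_SHIFT of 12) and taking the order as a plain
integer instead of reading it from a struct folio; every toy_* name is
hypothetical:

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT	12	/* assumed: 4KiB base pages */
#define TOY_PAGE_SIZE	((size_t)1 << TOY_PAGE_SHIFT)

/* Number of base pages in a folio of the given order: 2^order. */
static unsigned long toy_folio_nr_pages(unsigned int order)
{
	return 1UL << order;
}

/* Base-2 logarithm of the folio size in bytes. */
static unsigned int toy_folio_shift(unsigned int order)
{
	return TOY_PAGE_SHIFT + order;
}

/* Folio size in bytes. */
static size_t toy_folio_size(unsigned int order)
{
	return TOY_PAGE_SIZE << order;
}

/*
 * Byte offset of address p within its folio.  Masking is enough because
 * the size is a power of two and the folio is aligned to its size.
 */
static size_t toy_offset_in_folio(unsigned int order, unsigned long p)
{
	return p & (toy_folio_size(order) - 1);
}
```

With order 2 this gives 4 pages, a shift of 14 and a size of 16384 bytes.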


* Re: [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains
  2021-05-14 15:55   ` Vlastimil Babka
@ 2021-05-15 15:51     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-15 15:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

On Fri, May 14, 2021 at 05:55:46PM +0200, Vlastimil Babka wrote:
> On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> > folio_index() is the equivalent of page_index() for folios.
> > folio_file_page() is the equivalent of find_subpage().
> 
> find_subpage() special cases hugetlbfs, folio_file_page() doesn't.
> 
> > folio_contains() is the equivalent of thp_contains().
> 
> Yet here, both thp_contains() and folio_contains() do.
> 
> This patch doesn't add users so maybe it becomes obvious later, but perhaps
> worth explaining in the changelog or comment?

No, you're right, this is a bug.

I originally had it in my mind that hugetlbfs wouldn't need to do this
any more because it can just use the folio interfaces and never try to
find the subpage.

But I don't understand all the cases well enough to be sure that
they're all gone, and they certainly don't all go as part of this
patch series.  So I think I need to reintroduce the check-for-hugetlb
to folio_file_page() and we can look at removing it later once we're
sure that nobody is using the interfaces that return pages from the page
cache any more.  Or we convert hugetlbfs to use the page cache the same
way as every other filesystem ;-)

Thanks for spotting that.
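
To make the discrepancy concrete, here is a toy userspace model of the
two lookups.  It assumes that the hugetlbfs special case in
find_subpage() amounts to "return the head page", and that the posted
folio_file_page() always indexes by the low bits of the file index.  The
struct layout and all toy_* names are invented for this sketch and do
not match the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int dummy;
};

struct toy_folio {
	struct toy_page *pages;	/* first (head) page of the folio */
	unsigned long nr_pages;	/* always a power of two */
	bool is_hugetlb;	/* stands in for a PageHuge()-style test */
};

/* Models the posted folio_file_page(): no hugetlb special case. */
static struct toy_page *toy_file_page_v10(struct toy_folio *folio,
					  unsigned long index)
{
	return folio->pages + (index & (folio->nr_pages - 1));
}

/* Models the assumed find_subpage() behaviour: hugetlb gets the head page. */
static struct toy_page *toy_file_page_checked(struct toy_folio *folio,
					      unsigned long index)
{
	if (folio->is_hugetlb)
		return folio->pages;

	return toy_file_page_v10(folio, index);
}
```

For a non-hugetlb folio the two functions agree; they differ only when
is_hugetlb is set and the masked index is nonzero.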


* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-15 10:55   ` William Kucharski
@ 2021-05-15 20:14     ` Matthew Wilcox
  2021-05-16 19:26       ` William Kucharski
  0 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-15 20:14 UTC (permalink / raw)
  To: William Kucharski
  Cc: Andrew Morton, Linux-Fsdevel, linux-mm, linux-kernel, Jeff Layton

On Sat, May 15, 2021 at 10:55:19AM +0000, William Kucharski wrote:
> > +/**
> > + * folio_page - Return a page from a folio.
> > + * @folio: The folio.
> > + * @n: The page number to return.
> > + *
> > + * @n is relative to the start of the folio.  It should be between
> > + * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.
> 
> Please add a statement noting WHY @n isn't checked since you state it
> should be. Something like "...but this is not checked for because this is
> a hot path."

Hmm ... how about this:

/**
 * folio_page - Return a page from a folio.
 * @folio: The folio.
 * @n: The page number to return.
 *
 * @n is relative to the start of the folio.  This function does not
 * check that the page number lies within @folio; the caller is presumed
 * to have a reference to the page.
 */
#define folio_page(folio, n)    nth_page(&(folio)->page, n)

It occurred to me that it is actually useful (under some circumstances)
for referring to a page outside the base folio.  For example when
dealing with bios that have merged consecutive pages together into a
single bvec (ok, bios don't use folios, but it would be reasonable if
they did in future).
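
A minimal userspace sketch of that behaviour, assuming struct pages are
laid out as one flat array (which is roughly what nth_page() reduces to
when the memory map is contiguous).  The toy_* names and the checked
variant are hypothetical; the checked variant is there only to show
what a caller-side bounds check could look like.

```c
#include <stddef.h>

struct toy_page {
	int id;
};

struct toy_folio {
	struct toy_page *head;	/* first page of the folio */
	unsigned long nr_pages;
};

/* Unchecked, like folio_page(): n may refer to a page past the folio. */
static struct toy_page *toy_folio_page(struct toy_folio *folio,
				       unsigned long n)
{
	return folio->head + n;
}

/* Hypothetical caller-side check for code that must stay inside the folio. */
static struct toy_page *toy_folio_page_checked(struct toy_folio *folio,
					       unsigned long n)
{
	return n < folio->nr_pages ? toy_folio_page(folio, n) : NULL;
}
```

With two adjacent four-page folios, asking the first one for page 5
yields the same pointer as asking the second one for page 1, which is
the merged-bvec style of use described above.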


* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-15 20:14     ` Matthew Wilcox
@ 2021-05-16 19:26       ` William Kucharski
  0 siblings, 0 replies; 96+ messages in thread
From: William Kucharski @ 2021-05-16 19:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Linux-Fsdevel, linux-mm, linux-kernel, Jeff Layton



> On May 15, 2021, at 2:14 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Sat, May 15, 2021 at 10:55:19AM +0000, William Kucharski wrote:
>>> +/**
>>> + * folio_page - Return a page from a folio.
>>> + * @folio: The folio.
>>> + * @n: The page number to return.
>>> + *
>>> + * @n is relative to the start of the folio.  It should be between
>>> + * 0 and folio_nr_pages(@folio) - 1, but this is not checked for.
>> 
>> Please add a statement noting WHY @n isn't checked since you state it
>> should be. Something like "...but this is not checked for because this is
>> a hot path."
> 
> Hmm ... how about this:
> 
> /**
> * folio_page - Return a page from a folio.
> * @folio: The folio.
> * @n: The page number to return.
> *
> * @n is relative to the start of the folio.  This function does not
> * check that the page number lies within @folio; the caller is presumed
> * to have a reference to the page.
> */
> #define folio_page(folio, n)    nth_page(&(folio)->page, n)
> 
> It occurred to me that it is actually useful (under some circumstances)
> for referring to a page outside the base folio.  For example when
> dealing with bios that have merged consecutive pages together into a
> single bvec (ok, bios don't use folios, but it would be reasonable if
> they did in future).

I like that comment better.  Alternatively, you could simply state that
bounds checking of the page number is left to the caller; that would cover
both the normal case and possible future use for referring to pages
outside the base folio.



* Re: [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions
  2021-05-11 21:47 ` [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
@ 2021-05-18  9:57   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18  9:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Add new wrapper functions folio_memcg(), lock_folio_memcg(),
> unlock_folio_memcg(), mem_cgroup_folio_lruvec() and
> count_memcg_folio_event()
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 18/33] mm/filemap: Add folio_unlock
  2021-05-11 21:47 ` [PATCH v10 18/33] mm/filemap: Add folio_unlock Matthew Wilcox (Oracle)
@ 2021-05-18 10:06   ` Vlastimil Babka
  2021-05-18 11:30     ` Matthew Wilcox
  0 siblings, 1 reply; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Convert unlock_page() to call folio_unlock().  By using a folio we
> avoid a call to compound_head().  This shortens the function from 39
> bytes to 25 and removes 4 instructions on x86-64.  Because we still
> have unlock_page(), it's a net increase of 24 bytes of text for the
> kernel as a whole, but any path that uses folio_unlock() will execute
> 4 fewer instructions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> ---
>  include/linux/pagemap.h |  3 ++-
>  mm/filemap.c            | 27 ++++++++++-----------------
>  mm/folio-compat.c       |  6 ++++++
>  3 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 1f37d7656955..8dbba0074536 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -643,7 +643,8 @@ extern int __lock_page_killable(struct page *page);
>  extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
>  extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  				unsigned int flags);
> -extern void unlock_page(struct page *page);
> +void unlock_page(struct page *page);
> +void folio_unlock(struct folio *folio);
>  
>  /*
>   * Return true if the page was successfully locked
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 817a47059bd0..e7a6a58d6cd9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1435,29 +1435,22 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
>  #endif
>  
>  /**
> - * unlock_page - unlock a locked page
> - * @page: the page
> + * folio_unlock - Unlock a locked folio.
> + * @folio: The folio.
>   *
> - * Unlocks the page and wakes up sleepers in wait_on_page_locked().
> - * Also wakes sleepers in wait_on_page_writeback() because the wakeup
> - * mechanism between PageLocked pages and PageWriteback pages is shared.
> - * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
> + * Unlocks the folio and wakes up any thread sleeping on the page lock.
>   *
> - * Note that this depends on PG_waiters being the sign bit in the byte
> - * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
> - * clear the PG_locked bit and test PG_waiters at the same time fairly
> - * portably (architectures that do LL/SC can test any bit, while x86 can
> - * test the sign bit).

Was it necessary to remove the comments about wait_on_page_writeback() and
PG_waiters etc?

> + * Context: May be called from interrupt or process context.  May not be
> + * called from NMI context.

Where did the NMI part come from?

>   */
> -void unlock_page(struct page *page)
> +void folio_unlock(struct folio *folio)
>  {
>  	BUILD_BUG_ON(PG_waiters != 7);
> -	page = compound_head(page);
> -	VM_BUG_ON_PAGE(!PageLocked(page), page);
> -	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
> -		wake_up_page_bit(page, PG_locked);
> +	VM_BUG_ON_FOLIO(!folio_locked(folio), folio);
> +	if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0)))
> +		wake_up_page_bit(&folio->page, PG_locked);
>  }
> -EXPORT_SYMBOL(unlock_page);
> +EXPORT_SYMBOL(folio_unlock);
>  
>  /**
>   * end_page_private_2 - Clear PG_private_2 and release any waiters
> diff --git a/mm/folio-compat.c b/mm/folio-compat.c
> index 5e107aa30a62..91b3d00a92f7 100644
> --- a/mm/folio-compat.c
> +++ b/mm/folio-compat.c
> @@ -11,3 +11,9 @@ struct address_space *page_mapping(struct page *page)
>  	return folio_mapping(page_folio(page));
>  }
>  EXPORT_SYMBOL(page_mapping);
> +
> +void unlock_page(struct page *page)
> +{
> +	return folio_unlock(page_folio(page));
> +}
> +EXPORT_SYMBOL(unlock_page);
> 



* Re: [PATCH v10 19/33] mm/filemap: Add folio_lock
  2021-05-11 21:47 ` [PATCH v10 19/33] mm/filemap: Add folio_lock Matthew Wilcox (Oracle)
@ 2021-05-18 10:26   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This is like lock_page() but for use by callers who know they have a folio.
> Convert __lock_page() to be __folio_lock().  This saves one call to
> compound_head() per contended call to lock_page().
> 
> Saves 362 bytes of text; mostly from improved register allocation and
> inlining decisions.  __folio_lock is 59 bytes while __lock_page was 79.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/pagemap.h | 24 +++++++++++++++++++-----
>  mm/filemap.c            | 29 +++++++++++++++--------------
>  2 files changed, 34 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 8dbba0074536..9a78397609b8 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -638,7 +638,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
>  	return true;
>  }
>  
> -extern void __lock_page(struct page *page);
> +void __folio_lock(struct folio *folio);
>  extern int __lock_page_killable(struct page *page);
>  extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
>  extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> @@ -646,13 +646,24 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  void unlock_page(struct page *page);
>  void folio_unlock(struct folio *folio);
>  
> +static inline bool folio_trylock(struct folio *folio)
> +{
> +	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
> +}
> +
>  /*
>   * Return true if the page was successfully locked
>   */
>  static inline int trylock_page(struct page *page)
>  {
> -	page = compound_head(page);
> -	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
> +	return folio_trylock(page_folio(page));
> +}
> +
> +static inline void folio_lock(struct folio *folio)
> +{
> +	might_sleep();
> +	if (!folio_trylock(folio))
> +		__folio_lock(folio);
>  }
>  
>  /*
> @@ -660,9 +671,12 @@ static inline int trylock_page(struct page *page)
>   */
>  static inline void lock_page(struct page *page)
>  {
> +	struct folio *folio;
>  	might_sleep();
> -	if (!trylock_page(page))
> -		__lock_page(page);
> +
> +	folio = page_folio(page);
> +	if (!folio_trylock(folio))
> +		__folio_lock(folio);
>  }
>  
>  /*
> diff --git a/mm/filemap.c b/mm/filemap.c
> index e7a6a58d6cd9..c6e5ba176764 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1187,7 +1187,7 @@ static void wake_up_page(struct page *page, int bit)
>   */
>  enum behavior {
>  	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
> -			 * __lock_page() waiting on then setting PG_locked.
> +			 * __folio_lock() waiting on then setting PG_locked.
>  			 */
>  	SHARED,		/* Hold ref to page and check the bit when woken, like
>  			 * wait_on_page_writeback() waiting on PG_writeback.
> @@ -1576,17 +1576,16 @@ void page_endio(struct page *page, bool is_write, int err)
>  EXPORT_SYMBOL_GPL(page_endio);
>  
>  /**
> - * __lock_page - get a lock on the page, assuming we need to sleep to get it
> - * @__page: the page to lock
> + * __folio_lock - Get a lock on the folio, assuming we need to sleep to get it.
> + * @folio: The folio to lock
>   */
> -void __lock_page(struct page *__page)
> +void __folio_lock(struct folio *folio)
>  {
> -	struct page *page = compound_head(__page);
> -	wait_queue_head_t *q = page_waitqueue(page);
> -	wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE,
> +	wait_queue_head_t *q = page_waitqueue(&folio->page);
> +	wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
>  				EXCLUSIVE);
>  }
> -EXPORT_SYMBOL(__lock_page);
> +EXPORT_SYMBOL(__folio_lock);
>  
>  int __lock_page_killable(struct page *__page)
>  {
> @@ -1661,10 +1660,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  			return 0;
>  		}
>  	} else {
> -		__lock_page(page);
> +		__folio_lock(page_folio(page));
>  	}
> -	return 1;
>  
> +	return 1;
>  }
>  
>  /**
> @@ -2835,7 +2834,9 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
>  static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
>  				     struct file **fpin)
>  {
> -	if (trylock_page(page))
> +	struct folio *folio = page_folio(page);
> +
> +	if (folio_trylock(folio))
>  		return 1;
>  
>  	/*
> @@ -2848,7 +2849,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
>  
>  	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
>  	if (vmf->flags & FAULT_FLAG_KILLABLE) {
> -		if (__lock_page_killable(page)) {
> +		if (__lock_page_killable(&folio->page)) {
>  			/*
>  			 * We didn't have the right flags to drop the mmap_lock,
>  			 * but all fault_handlers only check for fatal signals
> @@ -2860,11 +2861,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
>  			return 0;
>  		}
>  	} else
> -		__lock_page(page);
> +		__folio_lock(folio);
> +
>  	return 1;
>  }
>  
> -
>  /*
>   * Synchronous readahead happens when we don't even find a page in the page
>   * cache at all.  We don't want to perform IO under the mmap sem, so if we have
> 
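
The "saves one call to compound_head() per contended call" claim in the
commit message can be illustrated with a small userspace model.  This is
only a sketch under stated assumptions: every demo_* name is hypothetical,
the slow path merely counts entries instead of sleeping, and nothing here
is the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
	struct demo_page *head;	/* NULL for a head (or order-0) page */
	bool locked;
};

/* In this model a "folio" is a page known not to be a tail page. */
struct demo_folio {
	struct demo_page page;
};

static int demo_head_lookups;		/* counts demo_compound_head() calls */
static int demo_slow_path_entries;	/* counts contended lock attempts */

static struct demo_page *demo_compound_head(struct demo_page *page)
{
	demo_head_lookups++;
	return page->head ? page->head : page;
}

static struct demo_folio *demo_page_folio(struct demo_page *page)
{
	return (struct demo_folio *)demo_compound_head(page);
}

/* Single-threaded stand-in for test_and_set_bit_lock(). */
static bool demo_folio_trylock(struct demo_folio *folio)
{
	if (folio->page.locked)
		return false;
	folio->page.locked = true;
	return true;
}

/* The kernel would sleep here; the model only records the entry. */
static void demo_folio_lock_slow(struct demo_folio *folio)
{
	assert(folio->page.head == NULL);	/* never a tail page */
	demo_slow_path_entries++;
}

/* Old shape: fast path and slow path each resolve the head page. */
static void demo_old_lock_page(struct demo_page *page)
{
	if (!demo_folio_trylock((struct demo_folio *)demo_compound_head(page)))
		demo_folio_lock_slow((struct demo_folio *)demo_compound_head(page));
}

/* New shape: resolve the folio once and pass it to both paths. */
static void demo_new_lock_page(struct demo_page *page)
{
	struct demo_folio *folio = demo_page_folio(page);

	if (!demo_folio_trylock(folio))
		demo_folio_lock_slow(folio);
}
```

Calling both variants on a tail page of an already-locked folio performs
two head lookups in the old shape and one in the new shape.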



* Re: [PATCH v10 20/33] mm/filemap: Add folio_lock_killable
  2021-05-11 21:47 ` [PATCH v10 20/33] mm/filemap: Add folio_lock_killable Matthew Wilcox (Oracle)
@ 2021-05-18 10:31   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This is like lock_page_killable() but for use by callers who
> know they have a folio.  Convert __lock_page_killable() to be
> __folio_lock_killable().  This saves one call to compound_head() per
> contended call to lock_page_killable().
> 
> __folio_lock_killable() is 20 bytes smaller than __lock_page_killable()
> was.  lock_page_maybe_drop_mmap() shrinks by 68 bytes and
> __lock_page_or_retry() shrinks by 66 bytes.  That's a total of 154 bytes
> of text saved.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 21/33] mm/filemap: Add __folio_lock_async
  2021-05-11 21:47 ` [PATCH v10 21/33] mm/filemap: Add __folio_lock_async Matthew Wilcox (Oracle)
@ 2021-05-18 10:34   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> There aren't any actual callers of lock_page_async(), so remove it.
> Convert filemap_update_page() to call __folio_lock_async().
> 
> __folio_lock_async() is 21 bytes smaller than __lock_page_async(),
> but the real savings come from using a folio in filemap_update_page(),
> shrinking it from 514 bytes to 403 bytes, saving 111 bytes.  The text
> shrinks by 132 bytes in total.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry
  2021-05-11 21:47 ` [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry Matthew Wilcox (Oracle)
@ 2021-05-18 10:38   ` Vlastimil Babka
  2021-05-18 10:45     ` Vlastimil Babka
  2021-05-18 13:35     ` Matthew Wilcox
  0 siblings, 2 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Convert __lock_page_or_retry() to __folio_lock_or_retry().  This actually
> saves 4 bytes in the only caller of lock_page_or_retry() (due to better
> register allocation) and saves the 20 byte cost of calling page_folio()
> in __folio_lock_or_retry() for a total saving of 24 bytes.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>
> ---
>  include/linux/pagemap.h |  9 ++++++---
>  mm/filemap.c            | 10 ++++------
>  mm/memory.c             |  8 ++++----
>  3 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41224e4ca8cc..21e394964288 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -640,7 +640,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
>  
>  void __folio_lock(struct folio *folio);
>  int __folio_lock_killable(struct folio *folio);
> -extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> +int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  				unsigned int flags);
>  void unlock_page(struct page *page);
>  void folio_unlock(struct folio *folio);
> @@ -701,13 +701,16 @@ static inline int lock_page_killable(struct page *page)
>   * caller indicated that it can handle a retry.
>   *
>   * Return value and mmap_lock implications depend on flags; see
> - * __lock_page_or_retry().
> + * __folio_lock_or_retry().
>   */
>  static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  				     unsigned int flags)
>  {
> +	struct folio *folio;
>  	might_sleep();
> -	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
> +
> +	folio = page_folio(page);
> +	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
>  }
>  
>  /*
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 67334eb3fd94..28bf50041671 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1623,20 +1623,18 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>  
>  /*
>   * Return values:
> - * 1 - page is locked; mmap_lock is still held.
> - * 0 - page is not locked.
> + * 1 - folio is locked; mmap_lock is still held.
> + * 0 - folio is not locked.
>   *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
>   *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
>   *     which case mmap_lock is still held.
>   *
>   * If neither ALLOW_RETRY nor KILLABLE are set, will always return 1
> - * with the page locked and the mmap_lock unperturbed.
> + * with the folio locked and the mmap_lock unperturbed.
>   */
> -int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> +int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  			 unsigned int flags)
>  {
> -	struct folio *folio = page_folio(page);
> -
>  	if (fault_flag_allow_retry_first(flags)) {
>  		/*
>  		 * CAUTION! In this case, mmap_lock is not released

A bit later in this branch, 'page' is accessed, but it no longer exists. And
thus as expected, it doesn't compile. Assuming it's fixed later, but
bisectability etc...

> diff --git a/mm/memory.c b/mm/memory.c
> index 86ba6c1f6821..fc3f50d0702c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4065,7 +4065,7 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
>   * We enter with non-exclusive mmap_lock (to exclude vma changes,
>   * but allow concurrent faults).
>   * The mmap_lock may have been released depending on flags and our
> - * return value.  See filemap_fault() and __lock_page_or_retry().
> + * return value.  See filemap_fault() and __folio_lock_or_retry().
>   * If mmap_lock is released, vma may become invalid (for example
>   * by other thread calling munmap()).
>   */
> @@ -4307,7 +4307,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
>   * concurrent faults).
>   *
>   * The mmap_lock may have been released depending on flags and our return value.
> - * See filemap_fault() and __lock_page_or_retry().
> + * See filemap_fault() and __folio_lock_or_retry().
>   */
>  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  {
> @@ -4411,7 +4411,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>   * By the time we get here, we already hold the mm semaphore
>   *
>   * The mmap_lock may have been released depending on flags and our
> - * return value.  See filemap_fault() and __lock_page_or_retry().
> + * return value.  See filemap_fault() and __folio_lock_or_retry().
>   */
>  static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags)
> @@ -4567,7 +4567,7 @@ static inline void mm_account_fault(struct pt_regs *regs,
>   * By the time we get here, we already hold the mm semaphore
>   *
>   * The mmap_lock may have been released depending on flags and our
> - * return value.  See filemap_fault() and __lock_page_or_retry().
> + * return value.  See filemap_fault() and __folio_lock_or_retry().
>   */
>  vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  			   unsigned int flags, struct pt_regs *regs)
> 
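
The return-value contract described in the comment above
__folio_lock_or_retry() can be written out as a small decision model.
This is a sketch under stated assumptions: the DEMO_* flag values and the
demo_* names are hypothetical, the "first retry" check is simplified to a
plain ALLOW_RETRY test, and the function models only the slow path that
runs after the trylock has failed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the kernel's FAULT_FLAG_* constants differ. */
#define DEMO_ALLOW_RETRY	0x1u
#define DEMO_RETRY_NOWAIT	0x2u
#define DEMO_KILLABLE		0x4u
#define DEMO_ALL_FLAGS		(DEMO_ALLOW_RETRY | DEMO_RETRY_NOWAIT | DEMO_KILLABLE)

struct demo_outcome {
	int ret;		/* 1: folio locked, 0: folio not locked */
	bool mmap_lock_held;	/* state of mmap_lock on return */
};

/*
 * fatal_signal says whether a killable sleep would be interrupted
 * by a fatal signal before the lock is obtained.
 */
static struct demo_outcome demo_lock_or_retry(unsigned int flags,
					      bool fatal_signal)
{
	assert((flags & ~DEMO_ALL_FLAGS) == 0);

	if (flags & DEMO_ALLOW_RETRY) {
		/* NOWAIT: give up at once, mmap_lock is left alone. */
		if (flags & DEMO_RETRY_NOWAIT)
			return (struct demo_outcome){ 0, true };
		/* Otherwise mmap_lock is dropped before waiting. */
		return (struct demo_outcome){ 0, false };
	}

	/* Killable lock attempt that is interrupted drops mmap_lock. */
	if ((flags & DEMO_KILLABLE) && fatal_signal)
		return (struct demo_outcome){ 0, false };

	/* Lock obtained; mmap_lock is unperturbed. */
	return (struct demo_outcome){ 1, true };
}
```

With neither ALLOW_RETRY nor KILLABLE set, the model always returns 1
with mmap_lock held, matching the last paragraph of the quoted comment.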



* Re: [PATCH v10 23/33] mm/filemap: Add folio_wait_locked
  2021-05-11 21:47 ` [PATCH v10 23/33] mm/filemap: Add folio_wait_locked Matthew Wilcox (Oracle)
@ 2021-05-18 10:41   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Also add folio_wait_locked_killable().  Turn wait_on_page_locked()
> and wait_on_page_locked_killable() into wrappers.  This eliminates a
> call to compound_head() from each call-site, reducing text size by 200
> bytes for me.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>

(and it fixes filemap.c to be compilable again)


* Re: [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry
  2021-05-18 10:38   ` Vlastimil Babka
@ 2021-05-18 10:45     ` Vlastimil Babka
  2021-05-18 13:35     ` Matthew Wilcox
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:45 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/18/21 12:38 PM, Vlastimil Babka wrote:
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1623,20 +1623,18 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>>  
>>  /*
>>   * Return values:
>> - * 1 - page is locked; mmap_lock is still held.
>> - * 0 - page is not locked.
>> + * 1 - folio is locked; mmap_lock is still held.
>> + * 0 - folio is not locked.
>>   *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
>>   *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
>>   *     which case mmap_lock is still held.
>>   *
>>   * If neither ALLOW_RETRY nor KILLABLE are set, will always return 1
>> - * with the page locked and the mmap_lock unperturbed.
>> + * with the folio locked and the mmap_lock unperturbed.
>>   */
>> -int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>> +int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>>  			 unsigned int flags)
>>  {
>> -	struct folio *folio = page_folio(page);
>> -
>>  	if (fault_flag_allow_retry_first(flags)) {
>>  		/*
>>  		 * CAUTION! In this case, mmap_lock is not released
> 
> A bit later in this branch, 'page' is accessed, but it no longer exists. And
> thus as expected, it doesn't compile. Assuming it's fixed later, but
> bisectability etc...

Also, the switch from 'page' to &folio->page in there should probably have been
done already in "[PATCH v10 20/33] mm/filemap: Add folio_lock_killable", not in
this patch?


* Re: [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable
  2021-05-11 21:47 ` [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable Matthew Wilcox (Oracle)
@ 2021-05-18 10:48   ` Vlastimil Babka
  2021-05-27  8:19   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 10:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Move the declaration into mm/internal.h and rename
> rotate_reclaimable_page() to folio_rotate_reclaimable().  This eliminates
> all five of the calls to compound_head() in this function, saving 75 bytes
> at the cost of adding 14 bytes to its one caller, end_page_writeback().
> Net 61 bytes savings.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 25/33] mm/filemap: Add folio_end_writeback
  2021-05-11 21:47 ` [PATCH v10 25/33] mm/filemap: Add folio_end_writeback Matthew Wilcox (Oracle)
@ 2021-05-18 11:08   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 11:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Add an end_page_writeback() wrapper function for users that are not yet
> converted to folios.
> 
> folio_end_writeback() is less than half the size of end_page_writeback()
> at just 105 bytes compared to 213 bytes, due to removing all the
> compound_head() calls.  The 30 byte wrapper function makes this a net
> saving of 70 bytes.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback
  2021-05-11 21:47 ` [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback Matthew Wilcox (Oracle)
@ 2021-05-18 11:12   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 11:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> wait_on_page_writeback_killable() only has one caller, so convert it to
> call folio_wait_writeback_killable().  For the wait_on_page_writeback()
> callers, add a compatibility wrapper around folio_wait_writeback().
> 
> Turning PageWriteback() into folio_writeback() eliminates a call to
> compound_head() which saves 8 bytes and 15 bytes in the two functions.
> That is more than offset by adding the wait_on_page_writeback
> compatibility wrapper for a net increase in text of 15 bytes.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 18/33] mm/filemap: Add folio_unlock
  2021-05-18 10:06   ` Vlastimil Babka
@ 2021-05-18 11:30     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-18 11:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

On Tue, May 18, 2021 at 12:06:42PM +0200, Vlastimil Babka wrote:
> >  /**
> > - * unlock_page - unlock a locked page
> > - * @page: the page
> > + * folio_unlock - Unlock a locked folio.
> > + * @folio: The folio.
> >   *
> > - * Unlocks the page and wakes up sleepers in wait_on_page_locked().
> > - * Also wakes sleepers in wait_on_page_writeback() because the wakeup
> > - * mechanism between PageLocked pages and PageWriteback pages is shared.
> > - * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
> > + * Unlocks the folio and wakes up any thread sleeping on the page lock.
> >   *
> > - * Note that this depends on PG_waiters being the sign bit in the byte
> > - * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
> > - * clear the PG_locked bit and test PG_waiters at the same time fairly
> > - * portably (architectures that do LL/SC can test any bit, while x86 can
> > - * test the sign bit).
> 
> Was it necessary to remove the comments about wait_on_page_writeback() and
> PG_waiters etc?

I think so.  This kernel-doc is for the person who wants to understand
how to use the function, not for the person who wants to understand why
the function is written the way it is.  For that person, we have git log
messages and other comments dotted throughout, e.g. the comment on
clear_bit_unlock_is_negative_byte() in mm/filemap.c and the comment
on PG_waiters in include/linux/page-flags.h.
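
For readers who want the gist of the removed paragraph, the "clear
PG_locked and test PG_waiters at the same time" trick can be sketched in
userspace with C11 atomics.  This is an illustration only: the demo_*
names and bit numbers are hypothetical stand-ins, and the kernel uses
per-architecture implementations rather than this code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's bit numbers. */
#define DEMO_PG_LOCKED	0
#define DEMO_PG_WAITERS	7	/* sign bit of the byte holding the lock bit */

/*
 * Clear the lock bit with release ordering and report whether the
 * waiters bit was set in the same byte at the moment of the clear.
 */
static bool demo_clear_unlock_is_negative_byte(_Atomic unsigned char *flags)
{
	unsigned char mask = (unsigned char)(1u << DEMO_PG_LOCKED);
	unsigned char old;

	old = atomic_fetch_and_explicit(flags, (unsigned char)~mask,
					memory_order_release);
	return (old & (1u << DEMO_PG_WAITERS)) != 0;
}

/* Returns true if the caller would need to wake waiters. */
static bool demo_unlock(_Atomic unsigned char *flags)
{
	/* Unlocking a byte that is not locked is a caller bug. */
	assert(atomic_load(flags) & (1u << DEMO_PG_LOCKED));
	return demo_clear_unlock_is_negative_byte(flags);
}
```

A single atomic operation both releases the lock and tells the caller
whether a wakeup is needed, so the uncontended unlock never touches the
wait queue.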

> > + * Context: May be called from interrupt or process context.  May not be
> > + * called from NMI context.
> 
> Where did the NMI part come from?

If you're in NMI context and call unlock_page() on a page that has a
waiter, we call folio_wake_bit(), which takes the wait_queue_head_t lock
with spin_lock_irqsave().  I believe that cannot be done safely from NMI
context, as the NMI may have interrupted a context that was holding
that lock.
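
The hazard can be modelled in userspace as follows.  This is a sketch
under stated assumptions: the demo_* names are hypothetical, and a
trylock stands in for the real spinning acquisition so that the model can
report the would-be deadlock instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a non-recursive spinlock. */
struct demo_spinlock {
	atomic_flag held;
};

static bool demo_spin_trylock(struct demo_spinlock *lock)
{
	return !atomic_flag_test_and_set_explicit(&lock->held,
						  memory_order_acquire);
}

static void demo_spin_unlock(struct demo_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/*
 * Model of an NMI handler that needs the wait-queue lock.  Disabling
 * interrupts does not mask an NMI, so the handler may run while the
 * interrupted context still owns the lock.  A real spinning lock would
 * then wait forever, because the owner cannot resume until the handler
 * returns.
 */
static bool demo_nmi_would_deadlock(struct demo_spinlock *wq_lock)
{
	assert(wq_lock != NULL);

	if (demo_spin_trylock(wq_lock)) {
		demo_spin_unlock(wq_lock);
		return false;	/* lock was free: no deadlock this time */
	}
	return true;		/* owner is the interrupted context */
}
```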



* Re: [PATCH v10 27/33] mm/writeback: Add folio_wait_stable
  2021-05-11 21:47 ` [PATCH v10 27/33] mm/writeback: Add folio_wait_stable Matthew Wilcox (Oracle)
@ 2021-05-18 11:42   ` Vlastimil Babka
  2021-05-18 13:55     ` Matthew Wilcox
  0 siblings, 1 reply; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 11:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Move wait_for_stable_page() into the folio compatibility file.
> folio_wait_stable() avoids a call to compound_head() and is 14 bytes
> smaller than wait_for_stable_page() was.  The net text size grows by 24
> bytes as a result of this patch.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

This seems to remove the last user of thp_head(). Remove it as obsolete?


* Re: [PATCH v10 28/33] mm/filemap: Add folio_wait_bit
  2021-05-11 21:47 ` [PATCH v10 28/33] mm/filemap: Add folio_wait_bit Matthew Wilcox (Oracle)
@ 2021-05-18 11:51   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 11:51 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Rename wait_on_page_bit() to folio_wait_bit().  We must always wait on
> the folio, otherwise we won't be woken up due to the tail page hashing
> to a different bucket from the head page.
> 
> This commit shrinks the kernel by 691 bytes, mostly due to moving
> the page waitqueue lookup into folio_wait_bit_common().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


Nit below.

> ---
>  include/linux/pagemap.h | 10 +++---
>  mm/filemap.c            | 77 +++++++++++++++++++----------------------
>  mm/page-writeback.c     |  4 +--
>  3 files changed, 43 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 06b69cd03da3..e524e1b7190a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -714,11 +714,11 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  }
>  
>  /*
> - * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
> + * This is exported only for folio_wait_locked/folio_wait_writeback, etc.,
>   * and should not be used directly.
>   */
> -extern void wait_on_page_bit(struct page *page, int bit_nr);
> -extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
> +extern void folio_wait_bit(struct folio *folio, int bit_nr);
> +extern int folio_wait_bit_killable(struct folio *folio, int bit_nr);

Nit: you remove these 'externs' in other patches, not here?
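
The commit message's point about a tail page hashing to a different
bucket from the head page can be shown with a toy wait table.  This is a
sketch under stated assumptions: the demo_* names are hypothetical and
the toy hash uses the page's array index, whereas the kernel hashes the
page pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_WAIT_TABLE_SIZE 8u

struct demo_page {
	struct demo_page *head;	/* NULL for a head page */
};

/* Toy hash: index of the page within its array, modulo the table size. */
static unsigned int demo_wait_bucket(const struct demo_page *base,
				     const struct demo_page *page)
{
	assert(page >= base);
	return (unsigned int)((size_t)(page - base) % DEMO_WAIT_TABLE_SIZE);
}

/* Resolve any page of a compound page to its head page. */
static const struct demo_page *demo_head(const struct demo_page *page)
{
	return page->head ? page->head : page;
}
```

A waiter that queued itself on the bucket for a tail page would miss the
wakeup, because the waker looks up the bucket for the head page.
Resolving to the head (the folio) before hashing puts both sides in the
same bucket.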


* Re: [PATCH v10 29/33] mm/filemap: Add folio_wake_bit
  2021-05-11 21:47 ` [PATCH v10 29/33] mm/filemap: Add folio_wake_bit Matthew Wilcox (Oracle)
@ 2021-05-18 11:53   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 11:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Convert wake_up_page_bit() to folio_wake_bit().  All callers have a folio,
> so use it directly.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>



* Re: [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios
  2021-05-11 21:47 ` [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
@ 2021-05-18 12:23   ` Vlastimil Babka
  0 siblings, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 12:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Reinforce that page flags are actually in the head page by changing the
> type from page to folio.  Increases the size of cachefiles by two bytes,
> but the kernel core is unchanged in size.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

More renaming of stuff could be possible, but not essential for functionality.


* Re: [PATCH v10 31/33] mm/filemap: Add folio private_2 functions
  2021-05-11 21:47 ` [PATCH v10 31/33] mm/filemap: Add folio private_2 functions Matthew Wilcox (Oracle)
@ 2021-05-18 12:26   ` Vlastimil Babka
  2021-05-27  8:21   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 12:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> end_page_private_2() becomes folio_end_private_2(),
> wait_on_page_private_2() becomes folio_wait_private_2() and
> wait_on_page_private_2_killable() becomes folio_wait_private_2_killable().
> 
> Adjust the fscache equivalents to call page_folio() before calling these
> functions to avoid adding wrappers.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry
  2021-05-18 10:38   ` Vlastimil Babka
  2021-05-18 10:45     ` Vlastimil Babka
@ 2021-05-18 13:35     ` Matthew Wilcox
  1 sibling, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-18 13:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

On Tue, May 18, 2021 at 12:38:46PM +0200, Vlastimil Babka wrote:
> > -int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> > +int __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> >  			 unsigned int flags)
> >  {
> > -	struct folio *folio = page_folio(page);
> > -
> >  	if (fault_flag_allow_retry_first(flags)) {
> >  		/*
> >  		 * CAUTION! In this case, mmap_lock is not released
> 
> A bit later in this branch, 'page' is accessed, but it no longer exists. And
> thus as expected, it doesn't compile. Assuming it's fixed later, but
> bisectability etc...

Oops.  Thanks for catching that; I've reordered this patch and the
folio_wait_locked() patch, which makes the entire problem go away.



* Re: [PATCH v10 32/33] fs/netfs: Add folio fscache functions
  2021-05-11 21:47 ` [PATCH v10 32/33] fs/netfs: Add folio fscache functions Matthew Wilcox (Oracle)
@ 2021-05-18 13:48   ` Vlastimil Babka
  2021-05-27  8:23   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 13:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> Match the page writeback functions by adding
> folio_start_fscache(), folio_end_fscache(), folio_wait_fscache() and
> folio_wait_fscache_killable().  Also rewrite the kernel-doc to describe
> when to use the function rather than what the function does, and include
> the kernel-doc in the appropriate rst file.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks like set_page_private_2() should be removed by this patch as it removes
the last caller, and the other functions were removed by the previous patch.

Other than that,

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v10 27/33] mm/writeback: Add folio_wait_stable
  2021-05-18 11:42   ` Vlastimil Babka
@ 2021-05-18 13:55     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-18 13:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig,
	Jeff Layton

On Tue, May 18, 2021 at 01:42:04PM +0200, Vlastimil Babka wrote:
> On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> > Move wait_for_stable_page() into the folio compatibility file.
> > folio_wait_stable() avoids a call to compound_head() and is 14 bytes
> > smaller than wait_for_stable_page() was.  The net text size grows by 24
> > bytes as a result of this patch.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Acked-by: Jeff Layton <jlayton@kernel.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> This seems to remove last user of thp_head(). Remove it as obsolete?

Good catch!  I'll squash that in.  We're down to just one user of
thp_order in my tree ...

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 33/33] mm: Add folio_mapped
  2021-05-11 21:47 ` [PATCH v10 33/33] mm: Add folio_mapped Matthew Wilcox (Oracle)
@ 2021-05-18 14:17   ` Vlastimil Babka
  2021-05-27  8:31   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Vlastimil Babka @ 2021-05-18 14:17 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), akpm; +Cc: linux-fsdevel, linux-mm, linux-kernel

On 5/11/21 11:47 PM, Matthew Wilcox (Oracle) wrote:
> This function is the equivalent of page_mapped().  It is slightly
> shorter as we do not need to handle the PageTail() case.  Reimplement
> page_mapped() as a wrapper around folio_mapped().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 16/33] mm: Add folio_mapcount
  2021-05-11 21:47 ` [PATCH v10 16/33] mm: Add folio_mapcount Matthew Wilcox (Oracle)
  2021-05-14 17:39   ` Vlastimil Babka
@ 2021-05-18 18:45   ` Matthew Wilcox
  1 sibling, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2021-05-18 18:45 UTC (permalink / raw)
  To: akpm
  Cc: linux-fsdevel, linux-mm, linux-kernel, Christoph Hellwig, Jeff Layton

On Tue, May 11, 2021 at 10:47:18PM +0100, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapcount().
[...]
>  
> +/**
> + * folio_mapcount - The number of mappings of this folio.
> + * @folio: The folio.
> + *
> + * The result includes the number of times any of the pages in the
> + * folio are mapped to userspace.

I thought it did, but it doesn't.  It returns the number of times
the head/base page of this folio is mapped into userspace, which is not
a terribly useful concept.  I suspect this should call total_mapcount()
instead.  Looking through the complete set of patches, it's only used
in debugging code (unaccount_page_cache_page() and dump_page()).
I'm going to withdraw this patch from the next submission until I've
had the chance to think about it some more.
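
To make the distinction concrete, here is a toy model, not kernel code.
It assumes each base page carries a counter biased by -1 in the style of
_mapcount, and it ignores the compound mapcount entirely.  struct
toy_folio and both helpers are invented for illustration.

```c
#include <stddef.h>

#define TOY_MAX_PAGES 8

/* Toy stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
	size_t nr_pages;		/* number of base pages in the folio */
	int mapcount[TOY_MAX_PAGES];	/* per-page count, -1 means unmapped */
};

/* Mappings of the head page only, which is what the patch returns. */
static int toy_head_mapcount(const struct toy_folio *folio)
{
	return folio->mapcount[0] + 1;
}

/* Mappings of any page in the folio, which is what the kernel-doc says. */
static int toy_total_mapcount(const struct toy_folio *folio)
{
	int total = 0;
	size_t i;

	for (i = 0; i < folio->nr_pages; i++)
		total += folio->mapcount[i] + 1;
	return total;
}
```

For a four-page folio whose head page is unmapped but whose second and
third pages are mapped, the first helper reports zero while the second
does not, which is the mismatch between the code and its kernel-doc.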


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/33] mm: Introduce struct folio
  2021-05-11 21:47 ` [PATCH v10 01/33] mm: Introduce struct folio Matthew Wilcox (Oracle)
                     ` (2 preceding siblings ...)
  2021-05-15 10:55   ` William Kucharski
@ 2021-05-27  8:09   ` Christoph Hellwig
  3 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: akpm, linux-fsdevel, linux-mm, linux-kernel, Jeff Layton

On Tue, May 11, 2021 at 10:47:03PM +0100, Matthew Wilcox (Oracle) wrote:
> A struct folio is a new abstraction to replace the venerable struct page.
> A function which takes a struct folio argument declares that it will
> operate on the entire (possibly compound) page, not just PAGE_SIZE bytes.
> In return, the caller guarantees that the pointer it is passing does
> not point to a tail page.

I still hate the overlay that must match struct page with a passion and
think it is going to come back and bite us.

But we really need to get out of the compound page mess and move forward
with large page support in the page cache.

So:

Reluctantly-Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 07/33] mm: Add folio_get
  2021-05-14 14:24     ` Matthew Wilcox
  2021-05-14 15:39       ` Vlastimil Babka
@ 2021-05-27  8:10       ` Christoph Hellwig
  2021-05-27 22:53         ` Andrew Morton
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, akpm, linux-fsdevel, linux-mm, linux-kernel,
	Zi Yan, Christoph Hellwig, Jeff Layton

On Fri, May 14, 2021 at 03:24:26PM +0100, Matthew Wilcox wrote:
> On Fri, May 14, 2021 at 01:56:46PM +0200, Vlastimil Babka wrote:
> > Nitpick: function names in subject should IMHO also end with (). But not a
> > reason for resend all patches that don't...
> 
> Hm, I thought it was preferred to not do that.  I can fix it
> easily enough when I go through and add the R-b.

I hate the pointless ().  Some maintainers insist on it.   No matter
what you do you'll make some folks happy and others not.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 08/33] mm: Add folio_try_get_rcu
  2021-05-11 21:47 ` [PATCH v10 08/33] mm: Add folio_try_get_rcu Matthew Wilcox (Oracle)
  2021-05-14 12:11   ` Vlastimil Babka
@ 2021-05-27  8:16   ` Christoph Hellwig
  2021-06-05  4:26     ` Matthew Wilcox
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:16 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:10PM +0100, Matthew Wilcox (Oracle) wrote:
> -static inline int page_ref_add_unless(struct page *page, int nr, int u)
> +static inline bool page_ref_add_unless(struct page *page, int nr, int u)
>  {
> -	int ret = atomic_add_unless(&page->_refcount, nr, u);
> +	bool ret = atomic_add_unless(&page->_refcount, nr, u);
>  
>  	if (page_ref_tracepoint_active(page_ref_mod_unless))
>  		__page_ref_mod_unless(page, nr, ret);
>  	return ret;
>  }

Unrelated but neat cleanup.

>  
> -static inline int folio_ref_add_unless(struct folio *folio, int nr, int u)
> +static inline bool folio_ref_add_unless(struct folio *folio, int nr, int u)
>  {
>  	return page_ref_add_unless(&folio->page, nr, u);
>  }

This should probably go into the patch adding folio_ref_add_unless.

> +static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)

Should this have a __ prefix and/or a "don't use directly" comment?

> +{
> +#ifdef CONFIG_TINY_RCU
> +	/*
> +	 * The caller guarantees the folio will not be freed from interrupt
> +	 * context, so (on !SMP) we only need preemption to be disabled
> +	 * and TINY_RCU does that for us.
> +	 */
> +# ifdef CONFIG_PREEMPT_COUNT
> +	VM_BUG_ON(!in_atomic() && !irqs_disabled());
> +# endif

	VM_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
		  !in_atomic() && !irqs_disabled());

?

> +	VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
> +	folio_ref_add(folio, count);
> +#else
> +	if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
> +		/* Either the folio has been freed, or will be freed. */
> +		return false;
> +	}
> +#endif
> +	return true;

but is this tiny rcu optimization really worth it?  I guess we're just
preserving it from the existing code and don't rock the boat..

> @@ -1746,6 +1746,26 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(page_cache_prev_miss);
>  
> +/*
> + * Lockless page cache protocol:
> + * On the lookup side:
> + * 1. Load the folio from i_pages
> + * 2. Increment the refcount if it's not zero
> + * 3. If the folio is not found by xas_reload(), put the refcount and retry
> + *
> + * On the removal side:
> + * A. Freeze the page (by zeroing the refcount if nobody else has a reference)
> + * B. Remove the page from i_pages
> + * C. Return the page to the page allocator
> + *
> + * This means that any page may have its reference count temporarily
> + * increased by a speculative page cache (or fast GUP) lookup as it can
> + * be allocated by another user before the RCU grace period expires.
> + * Because the refcount temporarily acquired here may end up being the
> + * last refcount on the page, any page allocation must be freeable by
> + * put_folio().
> + */
> +
>  /*
>   * mapping_get_entry - Get a page cache entry.
>   * @mapping: the address_space to search

Is this really a good place for the comment?  I'd expect it either near
a relevant function or at the top of a file.
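
For reference, the lookup side of the quoted protocol can be sketched
in userspace C11 roughly as follows.  This is a toy under stated
assumptions: a single atomic pointer stands in for i_pages, there is no
RCU (so the caller must keep the memory valid), and toy_folio,
toy_try_get(), toy_put() and toy_lookup() are invented names, not kernel
API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a folio; only the reference count is modelled. */
struct toy_folio {
	atomic_int refcount;
};

/* Step 2: take a reference unless the count is zero (frozen or freed). */
static bool toy_try_get(struct toy_folio *folio)
{
	int old = atomic_load(&folio->refcount);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&folio->refcount, &old, old + 1));
	return true;
}

/* Drop a reference; a real put would also free the folio at zero. */
static void toy_put(struct toy_folio *folio)
{
	atomic_fetch_sub(&folio->refcount, 1);
}

/* Lookup side of the protocol for one slot standing in for i_pages. */
static struct toy_folio *toy_lookup(_Atomic(struct toy_folio *) *slot)
{
	for (;;) {
		struct toy_folio *folio = atomic_load(slot);	/* step 1 */

		if (!folio)
			return NULL;
		if (!toy_try_get(folio))			/* step 2 */
			continue;
		if (folio == atomic_load(slot))			/* step 3 */
			return folio;
		toy_put(folio);	/* slot changed under us: drop and retry */
	}
}
```

The compare-and-swap loop is the "increment the refcount if it's not
zero" step; a remover that has frozen the count to zero makes
toy_try_get() fail, so the lookup never resurrects a folio that is on
its way back to the allocator.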

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 10/33] mm: Add folio_young and folio_idle
  2021-05-11 21:47 ` [PATCH v10 10/33] mm: Add folio_young and folio_idle Matthew Wilcox (Oracle)
  2021-05-14 15:33   ` Vlastimil Babka
@ 2021-05-27  8:17   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:17 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:12PM +0100, Matthew Wilcox (Oracle) wrote:
> Idle page tracking is handled through page_ext on 32-bit architectures.
> Add folio equivalents for 32-bit and move all the page compatibility
> parts to common code.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable
  2021-05-11 21:47 ` [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable Matthew Wilcox (Oracle)
  2021-05-18 10:48   ` Vlastimil Babka
@ 2021-05-27  8:19   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:19 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:26PM +0100, Matthew Wilcox (Oracle) wrote:
> Move the declaration into mm/internal.h and rename
> rotate_reclaimable_page() to folio_rotate_reclaimable().  This eliminates
> all five of the calls to compound_head() in this function, saving 75 bytes
> at the cost of adding 14 bytes to its one caller, end_page_writeback().
> Net 61 bytes savings.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 31/33] mm/filemap: Add folio private_2 functions
  2021-05-11 21:47 ` [PATCH v10 31/33] mm/filemap: Add folio private_2 functions Matthew Wilcox (Oracle)
  2021-05-18 12:26   ` Vlastimil Babka
@ 2021-05-27  8:21   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:33PM +0100, Matthew Wilcox (Oracle) wrote:
> end_page_private_2() becomes folio_end_private_2(),
> wait_on_page_private_2() becomes folio_wait_private_2() and
> wait_on_page_private_2_killable() becomes folio_wait_private_2_killable().
> 
> Adjust the fscache equivalents to call page_folio() before calling these
> functions to avoid adding wrappers.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 32/33] fs/netfs: Add folio fscache functions
  2021-05-11 21:47 ` [PATCH v10 32/33] fs/netfs: Add folio fscache functions Matthew Wilcox (Oracle)
  2021-05-18 13:48   ` Vlastimil Babka
@ 2021-05-27  8:23   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:34PM +0100, Matthew Wilcox (Oracle) wrote:
> Match the page writeback functions by adding
> folio_start_fscache(), folio_end_fscache(), folio_wait_fscache() and
> folio_wait_fscache_killable().  Also rewrite the kernel-doc to describe
> when to use the function rather than what the function does, and include
> the kernel-doc in the appropriate rst file.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks fine, but doesn't actually seem to be needed for this series.
I'd move it closer to actual users of the new helpers.

Otherwise:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 33/33] mm: Add folio_mapped
  2021-05-11 21:47 ` [PATCH v10 33/33] mm: Add folio_mapped Matthew Wilcox (Oracle)
  2021-05-18 14:17   ` Vlastimil Babka
@ 2021-05-27  8:31   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-27  8:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, May 11, 2021 at 10:47:35PM +0100, Matthew Wilcox (Oracle) wrote:
> This function is the equivalent of page_mapped().  It is slightly
> shorter as we do not need to handle the PageTail() case.  Reimplement
> page_mapped() as a wrapper around folio_mapped().

No byte savings numbers as for the other patches?

The patch itself looks good, although I'd go for a slightly more
readable structure:

bool folio_mapped(struct folio *folio)
{
	if (folio_single(folio))
		return atomic_read(&folio->_mapcount) >= 0;

	if (atomic_read(compound_mapcount_ptr(&folio->page)) >= 0)
		return true;

	if (!folio_hugetlb(folio)) {
		unsigned long i;

		for (i = 0; i < folio_nr_pages(folio); i++)
			if (atomic_read(&folio_page(folio, i)->_mapcount) >= 0)
				return true;
	}
	return false;
}

Shouldn't we also have a folio version of compound_mapcount_ptr?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 07/33] mm: Add folio_get
  2021-05-27  8:10       ` Christoph Hellwig
@ 2021-05-27 22:53         ` Andrew Morton
  0 siblings, 0 replies; 96+ messages in thread
From: Andrew Morton @ 2021-05-27 22:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, Vlastimil Babka, linux-fsdevel, linux-mm,
	linux-kernel, Zi Yan, Christoph Hellwig, Jeff Layton

On Thu, 27 May 2021 09:10:31 +0100 Christoph Hellwig <hch@infradead.org> wrote:

> On Fri, May 14, 2021 at 03:24:26PM +0100, Matthew Wilcox wrote:
> > On Fri, May 14, 2021 at 01:56:46PM +0200, Vlastimil Babka wrote:
> > > Nitpick: function names in subject should IMHO also end with (). But not a
> > > reason for resend all patches that don't...
> > 
> > Hm, I thought it was preferred to not do that.  I can fix it
> > easily enough when I go through and add the R-b.
> 
> I hate the pointless ().  Some maintainers insist on it.   No matter
> what you do you'll make some folks happy and others not.

I prefer it.  It succinctly says "this identifier is a function" which
is useful info.

I get many changelogs saying "the foo function" or "the function foo". 
"foo()" is better.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 00/33] Memory folios
  2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
                   ` (34 preceding siblings ...)
  2021-05-15 10:26 ` William Kucharski
@ 2021-06-04  1:07 ` Matteo Croce
  2021-06-04  2:13   ` Matthew Wilcox
  35 siblings, 1 reply; 96+ messages in thread
From: Matteo Croce @ 2021-06-04  1:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Tue, 11 May 2021 22:47:02 +0100
"Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> We also waste a lot of instructions ensuring that we're not looking at
> a tail page.  Almost every call to PageFoo() contains one or more
> hidden calls to compound_head().  This also happens for get_page(),
> put_page() and many more functions.  There does not appear to be a
> way to tell gcc that it can cache the result of compound_head(), nor
> is there a way to tell it that compound_head() is idempotent.
> 

Maybe it's not effective in all situations but the following hint to
the compiler seems to have an effect, at least according to bloat-o-meter:


--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -179,7 +179,7 @@ enum pageflags {
 
 struct page;   /* forward declaration */
 
-static inline struct page *compound_head(struct page *page)
+static inline __attribute_const__ struct page *compound_head(struct page *page)
 {
        unsigned long head = READ_ONCE(page->compound_head);
 

$ scripts/bloat-o-meter vmlinux.o.orig vmlinux.o
add/remove: 3/13 grow/shrink: 65/689 up/down: 21080/-198089 (-177009)
Function                                     old     new   delta
ntfs_mft_record_alloc                      14414   16627   +2213
migrate_pages                               8891   10819   +1928
ext2_get_page.isra                          1029    2343   +1314
kfence_init                                  180    1331   +1151
page_remove_rmap                             754    1893   +1139
f2fs_fsync_node_pages                       4378    5406   +1028
deferred_split_huge_page                    1279    2286   +1007
relock_page_lruvec_irqsave                     -     975    +975
f2fs_file_write_iter                        3508    4408    +900
__pagevec_lru_add                            704    1311    +607
[...]
pagevec_move_tail_fn                        5333    3215   -2118
__activate_page                             6183    4021   -2162
__unmap_and_move                            2190       -   -2190
__page_cache_release                        4738    2547   -2191
migrate_page_states                         7088    4842   -2246
lru_deactivate_fn                           5925    3652   -2273
move_pages_to_lru                           7259    4980   -2279
check_move_unevictable_pages                7131    4594   -2537
release_pages                               6940    4386   -2554
lru_lazyfree_fn                             6798    4198   -2600
ntfs_mft_record_format                      2940       -   -2940
lru_deactivate_file_fn                      9220    5631   -3589
shrink_page_list                           20653   15749   -4904
page_memcg                                  5149     193   -4956
Total: Before=388863526, After=388686517, chg -0.05%

I don't know if it breaks something though, nor if it gives some real
improvement.

-- 
per aspera ad upstream

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 00/33] Memory folios
  2021-06-04  1:07 ` Matteo Croce
@ 2021-06-04  2:13   ` Matthew Wilcox
  2021-06-08 14:56     ` Matteo Croce
  0 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox @ 2021-06-04  2:13 UTC (permalink / raw)
  To: Matteo Croce; +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Fri, Jun 04, 2021 at 03:07:12AM +0200, Matteo Croce wrote:
> On Tue, 11 May 2021 22:47:02 +0100
> "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> 
> > We also waste a lot of instructions ensuring that we're not looking at
> > a tail page.  Almost every call to PageFoo() contains one or more
> > hidden calls to compound_head().  This also happens for get_page(),
> > put_page() and many more functions.  There does not appear to be a
> > way to tell gcc that it can cache the result of compound_head(), nor
> > is there a way to tell it that compound_head() is idempotent.
> > 
> 
> Maybe it's not effective in all situations but the following hint to
> the compiler seems to have an effect, at least according to bloat-o-meter:

It definitely has an effect ;-)

     Note that a function that has pointer arguments and examines the
     data pointed to must _not_ be declared 'const' if the pointed-to
     data might change between successive invocations of the function.
     In general, since a function cannot distinguish data that might
     change from data that cannot, const functions should never take
     pointer or, in C++, reference arguments.  Likewise, a function that
     calls a non-const function usually must not be const itself.

So that's not going to work because a call to split_huge_page() won't
tell the compiler that it's changed.

Reading the documentation, we might be able to get away with marking the
function as pure:

     The 'pure' attribute imposes similar but looser restrictions on a
     function's definition than the 'const' attribute: 'pure' allows the
     function to read any non-volatile memory, even if it changes in
     between successive invocations of the function.

although that's going to miss opportunities, since taking a lock will
modify the contents of struct page, meaning the compiler won't cache
the results of compound_head().
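
A minimal userspace sketch of what 'pure' would permit, assuming GCC or
Clang.  struct toy_page and the toy_*() helpers are invented for
illustration, and the load is a plain one rather than READ_ONCE(); the
real function does a volatile read, which the text quoted above
excludes, so this shows the attribute's semantics rather than a drop-in
change.

```c
#include <stdint.h>

/* Toy stand-in for struct page; only the head-pointer word is modelled. */
struct toy_page {
	uintptr_t compound_head;	/* bit 0 set: tail page, rest is the head */
};

/*
 * 'pure': the result depends only on the argument and on memory the
 * function reads.  The compiler may merge repeated calls as long as
 * nothing in between may have written to memory.
 */
static __attribute__((pure))
struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t head = page->compound_head;	/* plain load, not READ_ONCE() */

	if (head & 1)
		return (struct toy_page *)(head - 1);
	return page;
}

/* Mark @tail as a tail page of @head. */
static void toy_set_tail(struct toy_page *tail, struct toy_page *head)
{
	tail->compound_head = (uintptr_t)head + 1;
}

/* Model of a split: @tail becomes an independent page again. */
static void toy_split(struct toy_page *tail)
{
	tail->compound_head = 0;
}
```

Because toy_split() is a visible store, a compiler honouring 'pure'
has to call toy_compound_head() again afterwards.  Under 'const' it
would be entitled to reuse the earlier result, which is the
split_huge_page() problem described above.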

> $ scripts/bloat-o-meter vmlinux.o.orig vmlinux.o
> add/remove: 3/13 grow/shrink: 65/689 up/down: 21080/-198089 (-177009)

I assume this is an allyesconfig kernel?    I think it's a good
indication of how much opportunity there is.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 08/33] mm: Add folio_try_get_rcu
  2021-05-27  8:16   ` Christoph Hellwig
@ 2021-06-05  4:26     ` Matthew Wilcox
  2021-06-06 14:13       ` Christoph Hellwig
  0 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox @ 2021-06-05  4:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: akpm, linux-fsdevel, linux-mm, linux-kernel

On Thu, May 27, 2021 at 09:16:42AM +0100, Christoph Hellwig wrote:
> On Tue, May 11, 2021 at 10:47:10PM +0100, Matthew Wilcox (Oracle) wrote:
> > +static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
> 
> Should this have a __ prefix and/or a "don't use directly" comment?

I think it will get used directly ... its page counterpart is:

mm/gup.c:       if (unlikely(!page_cache_add_speculative(head, refs)))

I deliberately left kernel-doc off this function so it's not described,
but described folio_try_get_rcu() in excruciating detail.  I hope that's
enough.  There's no comment on page_cache_add_speculative() today, so
again, we're status quo.

> > +{
> > +#ifdef CONFIG_TINY_RCU
> > +	/*
> > +	 * The caller guarantees the folio will not be freed from interrupt
> > +	 * context, so (on !SMP) we only need preemption to be disabled
> > +	 * and TINY_RCU does that for us.
> > +	 */
> > +# ifdef CONFIG_PREEMPT_COUNT
> > +	VM_BUG_ON(!in_atomic() && !irqs_disabled());
> > +# endif
> 
> 	VM_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> 		  !in_atomic() && !irqs_disabled());
> 
> ?

I'm just moving it over, and honestly, I think it's slightly clearer
this way.  We can't check it if PREEMPT_COUNT isn't enabled, and I
think that's expressed better by the ifdef than the IS_ENABLED().
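
The two forms can be compared in a toy like the one below.  Everything
in it is a stand-in: TOY_PREEMPT_COUNT is a 0/1 macro rather than a real
Kconfig symbol (the kernel's IS_ENABLED() tests defined versus
undefined), and the two booleans replace in_atomic() and
irqs_disabled().

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_PREEMPT_COUNT; set to 0 to disable. */
#define TOY_PREEMPT_COUNT 1

/* Hypothetical stand-ins for in_atomic() and irqs_disabled(). */
static bool toy_in_atomic;
static bool toy_irqs_disabled;

/* Preprocessor form: the check is not even compiled when the option is off. */
static bool toy_check_ifdef(void)
{
#if TOY_PREEMPT_COUNT
	if (!toy_in_atomic && !toy_irqs_disabled)
		return false;	/* would be a VM_BUG_ON() hit */
#endif
	return true;
}

/* Expression form: always compiled, folded away when the option is 0. */
static bool toy_check_expr(void)
{
	if (TOY_PREEMPT_COUNT && !toy_in_atomic && !toy_irqs_disabled)
		return false;	/* would be a VM_BUG_ON() hit */
	return true;
}
```

Both helpers return the same answer for every input, so the choice is
about what the reader sees: the #if form says the check cannot be made
without the option, while the expression form keeps the code visible to
the compiler in every configuration.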

> > +	VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
> > +	folio_ref_add(folio, count);
> > +#else
> > +	if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
> > +		/* Either the folio has been freed, or will be freed. */
> > +		return false;
> > +	}
> > +#endif
> > +	return true;
> 
> but is this tiny rcu optimization really worth it?  I guess we're just
> preserving it from the existing code and don't rock the boat..

I wondered that myself.  It's been there since Nick introduced it in
2008 with commit e286781d5f2e.  We certainly cared about small systems
more then, but apparently we still care about UP enough to maintain
CONFIG_TINY_RCU, so maybe this optimisation is still relevant.

> > @@ -1746,6 +1746,26 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
> >  }
> >  EXPORT_SYMBOL(page_cache_prev_miss);
> >  
> > +/*
> > + * Lockless page cache protocol:
> > + * On the lookup side:
> > + * 1. Load the folio from i_pages
> > + * 2. Increment the refcount if it's not zero
> > + * 3. If the folio is not found by xas_reload(), put the refcount and retry
> > + *
> > + * On the removal side:
> > + * A. Freeze the page (by zeroing the refcount if nobody else has a reference)
> > + * B. Remove the page from i_pages
> > + * C. Return the page to the page allocator
> > + *
> > + * This means that any page may have its reference count temporarily
> > + * increased by a speculative page cache (or fast GUP) lookup as it can
> > + * be allocated by another user before the RCU grace period expires.
> > + * Because the refcount temporarily acquired here may end up being the
> > + * last refcount on the page, any page allocation must be freeable by
> > + * put_folio().
> > + */
> > +
> >  /*
> >   * mapping_get_entry - Get a page cache entry.
> >   * @mapping: the address_space to search
> 
> Is this really a good place for the comment?  I'd expect it either near
> a relevant function or at the top of a file.

It's right before mapping_get_entry() which is the main lookup function
for the page cache, so I think it meets your first criterion?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 08/33] mm: Add folio_try_get_rcu
  2021-06-05  4:26     ` Matthew Wilcox
@ 2021-06-06 14:13       ` Christoph Hellwig
  0 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2021-06-06 14:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, akpm, linux-fsdevel, linux-mm, linux-kernel

On Sat, Jun 05, 2021 at 05:26:59AM +0100, Matthew Wilcox wrote:
> On Thu, May 27, 2021 at 09:16:42AM +0100, Christoph Hellwig wrote:
> > On Tue, May 11, 2021 at 10:47:10PM +0100, Matthew Wilcox (Oracle) wrote:
> > > +static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
> > 
> > Should this have a __ prefix and/or a "don't use directly" comment?
> 
> I think it will get used directly ... its page counterpart is:
> 
> mm/gup.c:       if (unlikely(!page_cache_add_speculative(head, refs)))
> 
> I deliberately left kernel-doc off this function so it's not described,
> but described folio_try_get_rcu() in excruciating detail.  I hope that's
> enough.  There's no comment on page_cache_add_speculative() today, so
> again, we're status quo.

Ok.  Seems a little weird, but so does much in this area.

> > Is this really a good place for the comment?  I'd expect it either near
> > a relevant function or at the top of a file.
> 
> It's right before mapping_get_entry() which is the main lookup function
> for the page cache, so I think it meets your first criterion?

I guess it does, yes.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 00/33] Memory folios
  2021-06-04  2:13   ` Matthew Wilcox
@ 2021-06-08 14:56     ` Matteo Croce
  0 siblings, 0 replies; 96+ messages in thread
From: Matteo Croce @ 2021-06-08 14:56 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, linux-fsdevel, linux-mm, linux-kernel

On Fri, Jun 4, 2021 at 4:13 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jun 04, 2021 at 03:07:12AM +0200, Matteo Croce wrote:
> > On Tue, 11 May 2021 22:47:02 +0100
> > "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> >
> > > We also waste a lot of instructions ensuring that we're not looking at
> > > a tail page.  Almost every call to PageFoo() contains one or more
> > > hidden calls to compound_head().  This also happens for get_page(),
> > > put_page() and many more functions.  There does not appear to be a
> > > way to tell gcc that it can cache the result of compound_head(), nor
> > > is there a way to tell it that compound_head() is idempotent.
> > >
> >
> > Maybe it's not effective in all situations but the following hint to
> > the compiler seems to have an effect, at least according to bloat-o-meter:
>
> It definitely has an effect ;-)
>
>      Note that a function that has pointer arguments and examines the
>      data pointed to must _not_ be declared 'const' if the pointed-to
>      data might change between successive invocations of the function.
>      In general, since a function cannot distinguish data that might
>      change from data that cannot, const functions should never take
>      pointer or, in C++, reference arguments.  Likewise, a function that
>      calls a non-const function usually must not be const itself.
>
> So that's not going to work because a call to split_huge_page() won't
> tell the compiler that it's changed.
>
> Reading the documentation, we might be able to get away with marking the
> function as pure:
>
>      The 'pure' attribute imposes similar but looser restrictions on a
>      function's definition than the 'const' attribute: 'pure' allows the
>      function to read any non-volatile memory, even if it changes in
>      between successive invocations of the function.
>
> although that's going to miss opportunities, since taking a lock will
> modify the contents of struct page, meaning the compiler won't cache
> the results of compound_head().
>
> > $ scripts/bloat-o-meter vmlinux.o.orig vmlinux.o
> > add/remove: 3/13 grow/shrink: 65/689 up/down: 21080/-198089 (-177009)
>
> I assume this is an allyesconfig kernel?    I think it's a good
> indication of how much opportunity there is.
>

Yes, it's an allyesconfig kernel.
I did the same with pure:

$ git diff
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..548b72b46eb1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -179,7 +179,7 @@ enum pageflags {

 struct page;   /* forward declaration */
 
-static inline struct page *compound_head(struct page *page)
+static inline __pure struct page *compound_head(struct page *page)
 {
         unsigned long head = READ_ONCE(page->compound_head);


$ scripts/bloat-o-meter vmlinux.o.orig vmlinux.o
add/remove: 3/13 grow/shrink: 63/689 up/down: 20910/-192081 (-171171)
Function                                     old     new   delta
ntfs_mft_record_alloc                      14414   16627   +2213
migrate_pages                               8891   10819   +1928
ext2_get_page.isra                          1029    2343   +1314
kfence_init                                  180    1331   +1151
page_remove_rmap                             754    1893   +1139
f2fs_fsync_node_pages                       4378    5406   +1028
[...]
migrate_page_states                         7088    4842   -2246
ntfs_mft_record_format                      2940       -   -2940
lru_deactivate_file_fn                      9220    6277   -2943
shrink_page_list                           20653   15749   -4904
page_memcg                                  5149     193   -4956
Total: Before=388869713, After=388698542, chg -0.04%

$ ls -l vmlinux.o.orig vmlinux.o
-rw-rw-r-- 1 mcroce mcroce 1295502680 Jun  8 16:47 vmlinux.o
-rw-rw-r-- 1 mcroce mcroce 1295934624 Jun  8 16:28 vmlinux.o.orig

vmlinux.o is ~420 KiB smaller.

-- 
per aspera ad upstream

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2021-06-08 14:57 UTC | newest]

Thread overview: 96+ messages
2021-05-11 21:47 [PATCH v10 00/33] Memory folios Matthew Wilcox (Oracle)
2021-05-11 21:47 ` [PATCH v10 01/33] mm: Introduce struct folio Matthew Wilcox (Oracle)
2021-05-14 10:34   ` Vlastimil Babka
2021-05-14 10:40   ` Vlastimil Babka
2021-05-14 11:47     ` Matthew Wilcox
2021-05-15 10:55   ` William Kucharski
2021-05-15 20:14     ` Matthew Wilcox
2021-05-16 19:26       ` William Kucharski
2021-05-27  8:09   ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 02/33] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
2021-05-14 10:35   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 03/33] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
2021-05-14 10:36   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 04/33] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
2021-05-14 10:44   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 05/33] mm: Add folio reference count functions Matthew Wilcox (Oracle)
2021-05-14 11:04   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 06/33] mm: Add folio_put Matthew Wilcox (Oracle)
2021-05-14 11:52   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 07/33] mm: Add folio_get Matthew Wilcox (Oracle)
2021-05-14 11:56   ` Vlastimil Babka
2021-05-14 14:24     ` Matthew Wilcox
2021-05-14 15:39       ` Vlastimil Babka
2021-05-27  8:10       ` Christoph Hellwig
2021-05-27 22:53         ` Andrew Morton
2021-05-11 21:47 ` [PATCH v10 08/33] mm: Add folio_try_get_rcu Matthew Wilcox (Oracle)
2021-05-14 12:11   ` Vlastimil Babka
2021-05-27  8:16   ` Christoph Hellwig
2021-06-05  4:26     ` Matthew Wilcox
2021-06-06 14:13       ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 09/33] mm: Add folio flag manipulation functions Matthew Wilcox (Oracle)
2021-05-14 15:29   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 10/33] mm: Add folio_young and folio_idle Matthew Wilcox (Oracle)
2021-05-14 15:33   ` Vlastimil Babka
2021-05-27  8:17   ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 11/33] mm: Handle per-folio private data Matthew Wilcox (Oracle)
2021-05-14 15:41   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 12/33] mm/filemap: Add folio_index, folio_file_page and folio_contains Matthew Wilcox (Oracle)
2021-05-14 15:55   ` Vlastimil Babka
2021-05-15 15:51     ` Matthew Wilcox
2021-05-11 21:47 ` [PATCH v10 13/33] mm/filemap: Add folio_next_index Matthew Wilcox (Oracle)
2021-05-14 17:07   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 14/33] mm/filemap: Add folio_offset and folio_file_offset Matthew Wilcox (Oracle)
2021-05-14 17:08   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 15/33] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
2021-05-14 17:29   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 16/33] mm: Add folio_mapcount Matthew Wilcox (Oracle)
2021-05-14 17:39   ` Vlastimil Babka
2021-05-18 18:45   ` Matthew Wilcox
2021-05-11 21:47 ` [PATCH v10 17/33] mm/memcg: Add folio wrappers for various functions Matthew Wilcox (Oracle)
2021-05-18  9:57   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 18/33] mm/filemap: Add folio_unlock Matthew Wilcox (Oracle)
2021-05-18 10:06   ` Vlastimil Babka
2021-05-18 11:30     ` Matthew Wilcox
2021-05-11 21:47 ` [PATCH v10 19/33] mm/filemap: Add folio_lock Matthew Wilcox (Oracle)
2021-05-18 10:26   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 20/33] mm/filemap: Add folio_lock_killable Matthew Wilcox (Oracle)
2021-05-18 10:31   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 21/33] mm/filemap: Add __folio_lock_async Matthew Wilcox (Oracle)
2021-05-18 10:34   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 22/33] mm/filemap: Add __folio_lock_or_retry Matthew Wilcox (Oracle)
2021-05-18 10:38   ` Vlastimil Babka
2021-05-18 10:45     ` Vlastimil Babka
2021-05-18 13:35     ` Matthew Wilcox
2021-05-11 21:47 ` [PATCH v10 23/33] mm/filemap: Add folio_wait_locked Matthew Wilcox (Oracle)
2021-05-18 10:41   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 24/33] mm/swap: Add folio_rotate_reclaimable Matthew Wilcox (Oracle)
2021-05-18 10:48   ` Vlastimil Babka
2021-05-27  8:19   ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 25/33] mm/filemap: Add folio_end_writeback Matthew Wilcox (Oracle)
2021-05-18 11:08   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 26/33] mm/writeback: Add folio_wait_writeback Matthew Wilcox (Oracle)
2021-05-18 11:12   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 27/33] mm/writeback: Add folio_wait_stable Matthew Wilcox (Oracle)
2021-05-18 11:42   ` Vlastimil Babka
2021-05-18 13:55     ` Matthew Wilcox
2021-05-11 21:47 ` [PATCH v10 28/33] mm/filemap: Add folio_wait_bit Matthew Wilcox (Oracle)
2021-05-18 11:51   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 29/33] mm/filemap: Add folio_wake_bit Matthew Wilcox (Oracle)
2021-05-18 11:53   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 30/33] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
2021-05-18 12:23   ` Vlastimil Babka
2021-05-11 21:47 ` [PATCH v10 31/33] mm/filemap: Add folio private_2 functions Matthew Wilcox (Oracle)
2021-05-18 12:26   ` Vlastimil Babka
2021-05-27  8:21   ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 32/33] fs/netfs: Add folio fscache functions Matthew Wilcox (Oracle)
2021-05-18 13:48   ` Vlastimil Babka
2021-05-27  8:23   ` Christoph Hellwig
2021-05-11 21:47 ` [PATCH v10 33/33] mm: Add folio_mapped Matthew Wilcox (Oracle)
2021-05-18 14:17   ` Vlastimil Babka
2021-05-27  8:31   ` Christoph Hellwig
2021-05-13 14:50 ` [PATCH v10 00/33] Memory folios Matthew Wilcox
2021-05-15 10:26 ` William Kucharski
2021-06-04  1:07 ` Matteo Croce
2021-06-04  2:13   ` Matthew Wilcox
2021-06-08 14:56     ` Matteo Croce
