* [PATCH 0/9] Various significant MM patches
@ 2024-03-21 14:24 Matthew Wilcox (Oracle)
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

These patches all interact in annoying ways which make it tricky to
send them out in any way other than a big batch, even though there's
not really an overarching theme to connect them.

The big effects of this patch series are:

 - folio_test_hugetlb() becomes reliable, even when called without a
   page reference
 - We free up PG_slab, and we could always use more page flags
 - We no longer need to check PageSlab before calling page_mapcount()
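
To illustrate the last point, here is a hypothetical caller (a sketch, not a
hunk from this series) that no longer needs to special-case slab pages:

	/* before this series: page_mapcount() was undefined for slab pages */
	mapcount = PageSlab(page) ? 0 : page_mapcount(page);

	/* after: page_mapcount() returns 0 for any page_has_type() page */
	mapcount = page_mapcount(page);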

Matthew Wilcox (Oracle) (9):
  mm: Always initialise folio->_deferred_list
  mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
  mm: Remove folio_prep_large_rmappable()
  mm: Support page_mapcount() on page_has_type() pages
  mm: Turn folio_test_hugetlb into a PageType
  mm: Remove a call to compound_head() from is_page_hwpoison()
  mm: Free up PG_slab
  mm: Improve dumping of mapcount and page_type
  hugetlb: Remove mention of destructors

 fs/proc/page.c                 |   7 +-
 include/linux/huge_mm.h        |   3 -
 include/linux/mm.h             |   8 +-
 include/linux/page-flags.h     | 172 ++++++++++++++++++++-------------
 include/trace/events/mmflags.h |   3 +-
 mm/debug.c                     |  19 ++--
 mm/huge_memory.c               |  11 +--
 mm/hugetlb.c                   |  65 ++++---------
 mm/internal.h                  |   5 +-
 mm/memcontrol.c                |   2 +
 mm/memory-failure.c            |   9 --
 mm/page_alloc.c                |   9 +-
 mm/slab.h                      |   2 +-
 13 files changed, 155 insertions(+), 160 deletions(-)

-- 
2.43.0




* [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22  8:23   ` Miaohe Lin
                     ` (2 more replies)
  2024-03-21 14:24 ` [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

For compound pages which are at least order-2 (and hence have a
deferred_list), initialise it and then we can check at free that the
page is not part of a deferred list.  We recently found this useful to
rule out a source of corruption.
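
Put differently, the invariant (paraphrasing the checks added to
uncharge_folio() and free_tail_page_prepare() below) is that whenever an
order > 1 folio is uncharged or freed:

	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
			!list_empty(&folio->_deferred_list), folio);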

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 2 --
 mm/hugetlb.c     | 3 ++-
 mm/internal.h    | 2 ++
 mm/memcontrol.c  | 2 ++
 mm/page_alloc.c  | 9 +++++----
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..04fb994a7b0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
 		return;
-	if (folio_order(folio) > 1)
-		INIT_LIST_HEAD(&folio->_deferred_list);
 	folio_set_large_rmappable(folio);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23ef240ba48a..7e9a766059aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		destroy_compound_gigantic_folio(folio, huge_page_order(h));
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
-		__free_pages(&folio->page, huge_page_order(h));
+		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio_put(folio);
 	}
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..10895ec52546 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
+	if (order > 1)
+		INIT_LIST_HEAD(&folio->_deferred_list);
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..a2a74d4ca0b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7448,6 +7448,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..4301146a5bf4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		}
 		break;
 	case 2:
-		/*
-		 * the second tail page: ->mapping is
-		 * deferred_list.next -- ignore value.
-		 */
+		/* the second tail page: deferred_list overlaps ->mapping */
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
+			goto out;
+		}
 		break;
 	default:
 		if (page->mapping != TAIL_MAPPING) {
-- 
2.43.0




* [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22  9:33   ` Vlastimil Babka
  2024-03-21 14:24 ` [PATCH 3/9] mm: Remove folio_prep_large_rmappable() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

Following the separation of FOLIO_FLAGS from PAGEFLAGS, separate
FOLIO_FLAG_FALSE from PAGEFLAG_FALSE and FOLIO_TYPE_OPS from
PAGE_TYPE_OPS.
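
For example, FOLIO_FLAG_FALSE(large_rmappable), which a later patch in this
series uses, expands to:

	static inline bool folio_test_large_rmappable(const struct folio *folio)
	{ return false; }
	static inline void folio_set_large_rmappable(struct folio *folio) { }
	static inline void folio_clear_large_rmappable(struct folio *folio) { }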

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 70 +++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 652d77805e99..dc1607f1415e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -458,30 +458,51 @@ static __always_inline int TestClearPage##uname(struct page *page)	\
 	TESTSETFLAG(uname, lname, policy)				\
 	TESTCLEARFLAG(uname, lname, policy)
 
+#define FOLIO_TEST_FLAG_FALSE(name)					\
+static inline bool folio_test_##name(const struct folio *folio)		\
+{ return false; }
+#define FOLIO_SET_FLAG_NOOP(name)					\
+static inline void folio_set_##name(struct folio *folio) { }
+#define FOLIO_CLEAR_FLAG_NOOP(name)					\
+static inline void folio_clear_##name(struct folio *folio) { }
+#define __FOLIO_SET_FLAG_NOOP(name)					\
+static inline void __folio_set_##name(struct folio *folio) { }
+#define __FOLIO_CLEAR_FLAG_NOOP(name)					\
+static inline void __folio_clear_##name(struct folio *folio) { }
+#define FOLIO_TEST_SET_FLAG_FALSE(name)					\
+static inline bool folio_test_set_##name(struct folio *folio)		\
+{ return false; }
+#define FOLIO_TEST_CLEAR_FLAG_FALSE(name)				\
+static inline bool folio_test_clear_##name(struct folio *folio)		\
+{ return false; }
+
+#define FOLIO_FLAG_FALSE(name)						\
+FOLIO_TEST_FLAG_FALSE(name)						\
+FOLIO_SET_FLAG_NOOP(name)						\
+FOLIO_CLEAR_FLAG_NOOP(name)
+
 #define TESTPAGEFLAG_FALSE(uname, lname)				\
-static inline bool folio_test_##lname(const struct folio *folio) { return false; } \
+FOLIO_TEST_FLAG_FALSE(lname)						\
 static inline int Page##uname(const struct page *page) { return 0; }
 
 #define SETPAGEFLAG_NOOP(uname, lname)					\
-static inline void folio_set_##lname(struct folio *folio) { }		\
+FOLIO_SET_FLAG_NOOP(lname)						\
 static inline void SetPage##uname(struct page *page) {  }
 
 #define CLEARPAGEFLAG_NOOP(uname, lname)				\
-static inline void folio_clear_##lname(struct folio *folio) { }		\
+FOLIO_CLEAR_FLAG_NOOP(lname)						\
 static inline void ClearPage##uname(struct page *page) {  }
 
 #define __CLEARPAGEFLAG_NOOP(uname, lname)				\
-static inline void __folio_clear_##lname(struct folio *folio) { }	\
+__FOLIO_CLEAR_FLAG_NOOP(lname)						\
 static inline void __ClearPage##uname(struct page *page) {  }
 
 #define TESTSETFLAG_FALSE(uname, lname)					\
-static inline bool folio_test_set_##lname(struct folio *folio)		\
-{ return 0; }								\
+FOLIO_TEST_SET_FLAG_FALSE(lname)					\
 static inline int TestSetPage##uname(struct page *page) { return 0; }
 
 #define TESTCLEARFLAG_FALSE(uname, lname)				\
-static inline bool folio_test_clear_##lname(struct folio *folio)	\
-{ return 0; }								\
+FOLIO_TEST_CLEAR_FLAG_FALSE(lname)					\
 static inline int TestClearPage##uname(struct page *page) { return 0; }
 
 #define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
@@ -977,35 +998,38 @@ static inline int page_has_type(const struct page *page)
 	return page_type_has_type(page->page_type);
 }
 
+#define FOLIO_TYPE_OPS(lname, fname)					\
+static __always_inline bool folio_test_##fname(const struct folio *folio)\
+{									\
+	return folio_test_type(folio, PG_##lname);			\
+}									\
+static __always_inline void __folio_set_##fname(struct folio *folio)	\
+{									\
+	VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);		\
+	folio->page.page_type &= ~PG_##lname;				\
+}									\
+static __always_inline void __folio_clear_##fname(struct folio *folio)	\
+{									\
+	VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);		\
+	folio->page.page_type |= PG_##lname;				\
+}
+
 #define PAGE_TYPE_OPS(uname, lname, fname)				\
+FOLIO_TYPE_OPS(lname, fname)						\
 static __always_inline int Page##uname(const struct page *page)		\
 {									\
 	return PageType(page, PG_##lname);				\
 }									\
-static __always_inline int folio_test_##fname(const struct folio *folio)\
-{									\
-	return folio_test_type(folio, PG_##lname);			\
-}									\
 static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
 	VM_BUG_ON_PAGE(!PageType(page, 0), page);			\
 	page->page_type &= ~PG_##lname;					\
 }									\
-static __always_inline void __folio_set_##fname(struct folio *folio)	\
-{									\
-	VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);		\
-	folio->page.page_type &= ~PG_##lname;				\
-}									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
 	page->page_type |= PG_##lname;					\
-}									\
-static __always_inline void __folio_clear_##fname(struct folio *folio)	\
-{									\
-	VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);		\
-	folio->page.page_type |= PG_##lname;				\
-}									\
+}
 
 /*
  * PageBuddy() indicates that the page is free and in the buddy system
-- 
2.43.0




* [PATCH 3/9] mm: Remove folio_prep_large_rmappable()
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
  2024-03-21 14:24 ` [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22  9:37   ` Vlastimil Babka
  2024-03-22 12:51   ` David Hildenbrand
  2024-03-21 14:24 ` [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

Now that prep_compound_page() initialises folio->_deferred_list,
folio_prep_large_rmappable()'s only purpose is to set the large_rmappable
flag, so inline it into the two callers.  Take the opportunity to convert
the large_rmappable definition from PAGEFLAG to FOLIO_FLAG and remove
PageTestLargeRmappable and friends.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h    | 3 ---
 include/linux/page-flags.h | 4 ++--
 mm/huge_memory.c           | 9 +--------
 mm/internal.h              | 3 ++-
 4 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..0e16451adaba 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -263,7 +263,6 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
-void folio_prep_large_rmappable(struct folio *folio);
 bool can_split_folio(struct folio *folio, int *pextra_pins);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
@@ -411,8 +410,6 @@ static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	return 0;
 }
 
-static inline void folio_prep_large_rmappable(struct folio *folio) {}
-
 #define transparent_hugepage_flags 0UL
 
 #define thp_get_unmapped_area	NULL
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index dc1607f1415e..8d0e6ce25ca2 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -869,9 +869,9 @@ static inline void ClearPageCompound(struct page *page)
 	BUG_ON(!PageHead(page));
 	ClearPageHead(page);
 }
-PAGEFLAG(LargeRmappable, large_rmappable, PF_SECOND)
+FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
 #else
-TESTPAGEFLAG_FALSE(LargeRmappable, large_rmappable)
+FOLIO_FLAG_FALSE(large_rmappable)
 #endif
 
 #define PG_head_mask ((1UL << PG_head))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 04fb994a7b0b..5cb025341d52 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -788,13 +788,6 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio)
 }
 #endif
 
-void folio_prep_large_rmappable(struct folio *folio)
-{
-	if (!folio || !folio_test_large(folio))
-		return;
-	folio_set_large_rmappable(folio);
-}
-
 static inline bool is_transparent_hugepage(struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -2861,7 +2854,7 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
 	clear_compound_head(page_tail);
 	if (new_order) {
 		prep_compound_page(page_tail, new_order);
-		folio_prep_large_rmappable(new_folio);
+		folio_set_large_rmappable(new_folio);
 	}
 
 	/* Finally unfreeze refcount. Additional reference from page cache. */
diff --git a/mm/internal.h b/mm/internal.h
index 10895ec52546..ee669963db15 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -513,7 +513,8 @@ static inline struct folio *page_rmappable_folio(struct page *page)
 {
 	struct folio *folio = (struct folio *)page;
 
-	folio_prep_large_rmappable(folio);
+	if (folio && folio_test_large(folio))
+		folio_set_large_rmappable(folio);
 	return folio;
 }
 
-- 
2.43.0




* [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 3/9] mm: Remove folio_prep_large_rmappable() Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22  9:43   ` Vlastimil Babka
  2024-03-22 15:04   ` David Hildenbrand
  2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

Return 0 for pages which can't be mapped.  This matches how page_mapped()
works.  It is more convenient for users to not have to filter out
these pages.
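
This works because page_type aliases _mapcount: page_type is initialised to
-1 and setting a type only clears bits, so a typed page always reads as a
deeply negative mapcount. A sketch of the arithmetic for a page-table page
(using PG_table == 0x200 from below):

	/* page_type == 0xffffffff & ~PG_table == 0xfffffdff, i.e. -513 */
	mapcount = atomic_read(&page->_mapcount) + 1;	/* -512 */
	if (mapcount < 0)
		mapcount = 0;	/* typed pages now report 0 */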

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/proc/page.c             | 7 ++-----
 include/linux/mm.h         | 8 +++++---
 include/linux/page-flags.h | 4 ++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 195b077c0fac..9223856c934b 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -67,7 +67,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 		 */
 		ppage = pfn_to_online_page(pfn);
 
-		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+		if (!ppage)
 			pcount = 0;
 		else
 			pcount = page_mapcount(ppage);
@@ -124,11 +124,8 @@ u64 stable_page_flags(struct page *page)
 
 	/*
 	 * pseudo flags for the well known (anonymous) memory mapped pages
-	 *
-	 * Note that page->_mapcount is overloaded in SLAB, so the
-	 * simple test in page_mapped() is not enough.
 	 */
-	if (!PageSlab(page) && page_mapped(page))
+	if (page_mapped(page))
 		u |= 1 << KPF_MMAP;
 	if (PageAnon(page))
 		u |= 1 << KPF_ANON;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0436b919f1c7..5ff3d687bc6c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1223,14 +1223,16 @@ static inline void page_mapcount_reset(struct page *page)
  * a large folio, it includes the number of times this page is mapped
  * as part of that folio.
  *
- * The result is undefined for pages which cannot be mapped into userspace.
- * For example SLAB or special types of pages. See function page_has_type().
- * They use this field in struct page differently.
+ * Will report 0 for pages which cannot be mapped into userspace, eg
+ * slab, page tables and similar.
  */
 static inline int page_mapcount(struct page *page)
 {
 	int mapcount = atomic_read(&page->_mapcount) + 1;
 
+	/* Handle page_has_type() pages */
+	if (mapcount < 0)
+		mapcount = 0;
 	if (unlikely(PageCompound(page)))
 		mapcount += folio_entire_mapcount(page_folio(page));
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8d0e6ce25ca2..5852f967c640 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -971,12 +971,12 @@ static inline bool is_page_hwpoison(struct page *page)
  * page_type may be used.  Because it is initialised to -1, we invert the
  * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
  * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
- * low bits so that an underflow or overflow of page_mapcount() won't be
+ * low bits so that an underflow or overflow of _mapcount won't be
  * mistaken for a page type value.
  */
 
 #define PAGE_TYPE_BASE	0xf0000000
-/* Reserve		0x0000007f to catch underflows of page_mapcount */
+/* Reserve		0x0000007f to catch underflows of _mapcount */
 #define PAGE_MAPCOUNT_RESERVE	-128
 #define PG_buddy	0x00000080
 #define PG_offline	0x00000100
-- 
2.43.0




* [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22 10:19   ` Vlastimil Babka
                     ` (2 more replies)
  2024-03-21 14:24 ` [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

The current folio_test_hugetlb() can be fooled by a concurrent folio split
into returning true for a folio which has never belonged to hugetlbfs.
This can't happen if the caller holds a refcount on it, but we have a
few places (memory-failure, compaction, procfs) which do not and should
not take a speculative reference.

Since hugetlb pages do not use individual page mapcounts (they are always
fully mapped and use the entire_mapcount field to record the number
of mappings), the PageType field is available now that page_mapcount()
ignores the value in this field.
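
Hand-expanding FOLIO_TYPE_OPS(hugetlb, hugetlb) from patch 2, the new test
becomes:

	static __always_inline bool folio_test_hugetlb(const struct folio *folio)
	{
		return folio_test_type(folio, PG_hugetlb);
	}

so the result no longer depends on a flag bit stored in the second page's
flags, which a concurrent split could repurpose.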

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-flags.h     | 70 ++++++++++++++++------------------
 include/trace/events/mmflags.h |  1 +
 mm/hugetlb.c                   | 22 ++---------
 3 files changed, 37 insertions(+), 56 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5852f967c640..6fb3cd42ee59 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -190,7 +190,6 @@ enum pageflags {
 
 	/* At least one page in this folio has the hwpoison flag set */
 	PG_has_hwpoisoned = PG_error,
-	PG_hugetlb = PG_active,
 	PG_large_rmappable = PG_workingset, /* anon or file-backed */
 };
 
@@ -876,29 +875,6 @@ FOLIO_FLAG_FALSE(large_rmappable)
 
 #define PG_head_mask ((1UL << PG_head))
 
-#ifdef CONFIG_HUGETLB_PAGE
-int PageHuge(const struct page *page);
-SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-
-/**
- * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
- * @folio: The folio to test.
- *
- * Context: Any context.  Caller should have a reference on the folio to
- * prevent it from being turned into a tail page.
- * Return: True for hugetlbfs folios, false for anon folios or folios
- * belonging to other filesystems.
- */
-static inline bool folio_test_hugetlb(const struct folio *folio)
-{
-	return folio_test_large(folio) &&
-		test_bit(PG_hugetlb, const_folio_flags(folio, 1));
-}
-#else
-TESTPAGEFLAG_FALSE(Huge, hugetlb)
-#endif
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
@@ -954,18 +930,6 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 	TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #endif
 
-/*
- * Check if a page is currently marked HWPoisoned. Note that this check is
- * best effort only and inherently racy: there is no way to synchronize with
- * failing hardware.
- */
-static inline bool is_page_hwpoison(struct page *page)
-{
-	if (PageHWPoison(page))
-		return true;
-	return PageHuge(page) && PageHWPoison(compound_head(page));
-}
-
 /*
  * For pages that are never mapped to userspace (and aren't PageSlab),
  * page_type may be used.  Because it is initialised to -1, we invert the
@@ -982,6 +946,7 @@ static inline bool is_page_hwpoison(struct page *page)
 #define PG_offline	0x00000100
 #define PG_table	0x00000200
 #define PG_guard	0x00000400
+#define PG_hugetlb	0x00000800
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1076,6 +1041,37 @@ PAGE_TYPE_OPS(Table, table, pgtable)
  */
 PAGE_TYPE_OPS(Guard, guard, guard)
 
+#ifdef CONFIG_HUGETLB_PAGE
+FOLIO_TYPE_OPS(hugetlb, hugetlb)
+#else
+FOLIO_TEST_FLAG_FALSE(hugetlb)
+#endif
+
+/**
+ * PageHuge - Determine if the page belongs to hugetlbfs
+ * @page: The page to test.
+ *
+ * Context: Any context.
+ * Return: True for hugetlbfs pages, false for anon pages or pages
+ * belonging to other filesystems.
+ */
+static inline bool PageHuge(const struct page *page)
+{
+	return folio_test_hugetlb(page_folio(page));
+}
+
+/*
+ * Check if a page is currently marked HWPoisoned. Note that this check is
+ * best effort only and inherently racy: there is no way to synchronize with
+ * failing hardware.
+ */
+static inline bool is_page_hwpoison(struct page *page)
+{
+	if (PageHWPoison(page))
+		return true;
+	return PageHuge(page) && PageHWPoison(compound_head(page));
+}
+
 extern bool is_free_buddy_page(struct page *page);
 
 PAGEFLAG(Isolated, isolated, PF_ANY);
@@ -1142,7 +1138,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
  */
 #define PAGE_FLAGS_SECOND						\
 	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
-	 1UL << PG_hugetlb		| 1UL << PG_large_rmappable)
+	 1UL << PG_large_rmappable)
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1UL << PG_private | 1UL << PG_private_2)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index d801409b33cf..d55e53ac91bd 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -135,6 +135,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
 #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
 
 #define __def_pagetype_names						\
+	DEF_PAGETYPE_NAME(hugetlb),					\
 	DEF_PAGETYPE_NAME(offline),					\
 	DEF_PAGETYPE_NAME(guard),					\
 	DEF_PAGETYPE_NAME(table),					\
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e9a766059aa..bdcbb62096cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1624,7 +1624,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
 {
 	lockdep_assert_held(&hugetlb_lock);
 
-	folio_clear_hugetlb(folio);
+	__folio_clear_hugetlb(folio);
 }
 
 /*
@@ -1711,7 +1711,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 		h->surplus_huge_pages_node[nid]++;
 	}
 
-	folio_set_hugetlb(folio);
+	__folio_set_hugetlb(folio);
 	folio_change_private(folio, NULL);
 	/*
 	 * We have to set hugetlb_vmemmap_optimized again as above
@@ -2050,7 +2050,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 
 static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
-	folio_set_hugetlb(folio);
+	__folio_set_hugetlb(folio);
 	INIT_LIST_HEAD(&folio->lru);
 	hugetlb_set_folio_subpool(folio, NULL);
 	set_hugetlb_cgroup(folio, NULL);
@@ -2160,22 +2160,6 @@ static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
 	return __prep_compound_gigantic_folio(folio, order, true);
 }
 
-/*
- * PageHuge() only returns true for hugetlbfs pages, but not for normal or
- * transparent huge pages.  See the PageTransHuge() documentation for more
- * details.
- */
-int PageHuge(const struct page *page)
-{
-	const struct folio *folio;
-
-	if (!PageCompound(page))
-		return 0;
-	folio = page_folio(page);
-	return folio_test_hugetlb(folio);
-}
-EXPORT_SYMBOL_GPL(PageHuge);
-
 /*
  * Find and lock address space (mapping) in write mode.
  *
-- 
2.43.0




* [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison()
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22 10:28   ` Vlastimil Babka
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

We can call it only once instead of twice.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/page-flags.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6fb3cd42ee59..94eb8a11a321 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1065,11 +1065,14 @@ static inline bool PageHuge(const struct page *page)
  * best effort only and inherently racy: there is no way to synchronize with
  * failing hardware.
  */
-static inline bool is_page_hwpoison(struct page *page)
+static inline bool is_page_hwpoison(const struct page *page)
 {
+	const struct folio *folio;
+
 	if (PageHWPoison(page))
 		return true;
-	return PageHuge(page) && PageHWPoison(compound_head(page));
+	folio = page_folio(page);
+	return folio_test_hugetlb(folio) && PageHWPoison(&folio->page);
 }
 
 extern bool is_free_buddy_page(struct page *page);
-- 
2.43.0




* [PATCH 7/9] mm: Free up PG_slab
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison() Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22  9:20   ` Miaohe Lin
                     ` (3 more replies)
  2024-03-21 14:24 ` [PATCH 8/9] mm: Improve dumping of mapcount and page_type Matthew Wilcox (Oracle)
  2024-03-21 14:24 ` [PATCH 9/9] hugetlb: Remove mention of destructors Matthew Wilcox (Oracle)
  8 siblings, 4 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

Reclaim the Slab page flag by using a spare bit in PageType.  We are
perennially short of page flags for various purposes, and now that
the original SLAB allocator has been retired, SLUB does not use the
mapcount/page_type field.  This lets us remove a number of special cases
for ignoring mapcount on Slab pages.
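
With FOLIO_TYPE_OPS(slab, slab), the slab accessors become ordinary
page-type operations; hand-expanded for illustration, the setter is:

	static __always_inline void __folio_set_slab(struct folio *folio)
	{
		VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);
		folio->page.page_type &= ~PG_slab;	/* 0xffffffff -> 0xffffefff */
	}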

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h     | 21 +++++++++++++++++----
 include/trace/events/mmflags.h |  2 +-
 mm/memory-failure.c            |  9 ---------
 mm/slab.h                      |  2 +-
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 94eb8a11a321..73e0b17c7728 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -109,7 +109,6 @@ enum pageflags {
 	PG_active,
 	PG_workingset,
 	PG_error,
-	PG_slab,
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
 	PG_arch_1,
 	PG_reserved,
@@ -524,7 +523,6 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
 	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
 
 /* Xen */
@@ -931,7 +929,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #endif
 
 /*
- * For pages that are never mapped to userspace (and aren't PageSlab),
+ * For pages that are never mapped to userspace,
  * page_type may be used.  Because it is initialised to -1, we invert the
  * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
  * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
@@ -947,6 +945,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #define PG_table	0x00000200
 #define PG_guard	0x00000400
 #define PG_hugetlb	0x00000800
+#define PG_slab		0x00001000
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1041,6 +1040,20 @@ PAGE_TYPE_OPS(Table, table, pgtable)
  */
 PAGE_TYPE_OPS(Guard, guard, guard)
 
+FOLIO_TYPE_OPS(slab, slab)
+
+/**
+ * PageSlab - Determine if the page belongs to the slab allocator
+ * @page: The page to test.
+ *
+ * Context: Any context.
+ * Return: True for slab pages, false for any other kind of page.
+ */
+static inline bool PageSlab(const struct page *page)
+{
+	return folio_test_slab(page_folio(page));
+}
+
 #ifdef CONFIG_HUGETLB_PAGE
 FOLIO_TYPE_OPS(hugetlb, hugetlb)
 #else
@@ -1121,7 +1134,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
 	(1UL << PG_lru		| 1UL << PG_locked	|	\
 	 1UL << PG_private	| 1UL << PG_private_2	|	\
 	 1UL << PG_writeback	| 1UL << PG_reserved	|	\
-	 1UL << PG_slab		| 1UL << PG_active 	|	\
+	 1UL << PG_active 	|				\
 	 1UL << PG_unevictable	| __PG_MLOCKED | LRU_GEN_MASK)
 
 /*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index d55e53ac91bd..e46d6e82765e 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -107,7 +107,6 @@
 	DEF_PAGEFLAG_NAME(lru),						\
 	DEF_PAGEFLAG_NAME(active),					\
 	DEF_PAGEFLAG_NAME(workingset),					\
-	DEF_PAGEFLAG_NAME(slab),					\
 	DEF_PAGEFLAG_NAME(owner_priv_1),				\
 	DEF_PAGEFLAG_NAME(arch_1),					\
 	DEF_PAGEFLAG_NAME(reserved),					\
@@ -135,6 +134,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
 #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
 
 #define __def_pagetype_names						\
+	DEF_PAGETYPE_NAME(slab),					\
 	DEF_PAGETYPE_NAME(hugetlb),					\
 	DEF_PAGETYPE_NAME(offline),					\
 	DEF_PAGETYPE_NAME(guard),					\
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9349948f1abf..1cb41ba7870c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1239,7 +1239,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 #define mlock		(1UL << PG_mlocked)
 #define lru		(1UL << PG_lru)
 #define head		(1UL << PG_head)
-#define slab		(1UL << PG_slab)
 #define reserved	(1UL << PG_reserved)
 
 static struct page_state error_states[] = {
@@ -1249,13 +1248,6 @@ static struct page_state error_states[] = {
 	 * PG_buddy pages only make a small fraction of all free pages.
 	 */
 
-	/*
-	 * Could in theory check if slab page is free or if we can drop
-	 * currently unused objects without touching them. But just
-	 * treat it as standard kernel for now.
-	 */
-	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
-
 	{ head,		head,		MF_MSG_HUGE,		me_huge_page },
 
 	{ sc|dirty,	sc|dirty,	MF_MSG_DIRTY_SWAPCACHE,	me_swapcache_dirty },
@@ -1282,7 +1274,6 @@ static struct page_state error_states[] = {
 #undef mlock
 #undef lru
 #undef head
-#undef slab
 #undef reserved
 
 static void update_per_node_mf_stats(unsigned long pfn,
diff --git a/mm/slab.h b/mm/slab.h
index d2bc9b191222..457b15da2a6b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -84,8 +84,8 @@ struct slab {
 		};
 		struct rcu_head rcu_head;
 	};
-	unsigned int __unused;
 
+	unsigned int __page_type;
 	atomic_t __page_refcount;
 #ifdef CONFIG_MEMCG
 	unsigned long memcg_data;
-- 
2.43.0




* [PATCH 8/9] mm: Improve dumping of mapcount and page_type
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22 11:05   ` Vlastimil Babka
  2024-03-22 15:10   ` David Hildenbrand
  2024-03-21 14:24 ` [PATCH 9/9] hugetlb: Remove mention of destructors Matthew Wilcox (Oracle)
  8 siblings, 2 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

For pages that have a page_type, set the mapcount to 0, which will
reduce confusion for people reading page dumps ("Why does this page
have a mapcount of -128?").  Now that hugetlbfs is a page_type, read the
entire_mapcount for any large folio; this is fine for all folios as no
user reuses the entire_mapcount field.

For pages which do not have a page type, do not print it to reduce
clutter.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/debug.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/debug.c b/mm/debug.c
index c1c1a6a484e4..e8a96b8b7197 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -55,18 +55,14 @@ static void __dump_folio(struct folio *folio, struct page *page,
 		unsigned long pfn, unsigned long idx)
 {
 	struct address_space *mapping = folio_mapping(folio);
-	int mapcount = 0;
+	int mapcount = atomic_read(&page->_mapcount) + 1;
 	char *type = "";
 
-	/*
-	 * page->_mapcount space in struct page is used by slab pages to
-	 * encode own info, and we must avoid calling page_folio() again.
-	 */
-	if (!folio_test_slab(folio)) {
-		mapcount = atomic_read(&page->_mapcount) + 1;
-		if (folio_test_large(folio))
-			mapcount += folio_entire_mapcount(folio);
-	}
+	/* Open-code page_mapcount() to avoid looking up a stale folio */
+	if (mapcount < 0)
+		mapcount = 0;
+	if (folio_test_large(folio))
+		mapcount += folio_entire_mapcount(folio);
 
 	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
 			folio_ref_count(folio), mapcount, mapping,
@@ -99,7 +95,8 @@ static void __dump_folio(struct folio *folio, struct page *page,
 	 */
 	pr_warn("%sflags: %pGp%s\n", type, &folio->flags,
 		is_migrate_cma_folio(folio, pfn) ? " CMA" : "");
-	pr_warn("page_type: %pGt\n", &folio->page.page_type);
+	if (page_has_type(&folio->page))
+		pr_warn("page_type: %pGt\n", &folio->page.page_type);
 
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,
-- 
2.43.0




* [PATCH 9/9] hugetlb: Remove mention of destructors
  2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2024-03-21 14:24 ` [PATCH 8/9] mm: Improve dumping of mapcount and page_type Matthew Wilcox (Oracle)
@ 2024-03-21 14:24 ` Matthew Wilcox (Oracle)
  2024-03-22 11:08   ` Vlastimil Babka
  2024-03-22 15:13   ` David Hildenbrand
  8 siblings, 2 replies; 45+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-21 14:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle),
	linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

We no longer have destructors or dtors, merely a page flag
(technically a page type flag, but that's an implementation detail).
Remove __clear_hugetlb_destructor, fix up comments and the occasional
variable name.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/hugetlb.c | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bdcbb62096cf..6ca9ac90ad35 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1619,19 +1619,11 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
 						unsigned int order) { }
 #endif
 
-static inline void __clear_hugetlb_destructor(struct hstate *h,
-						struct folio *folio)
-{
-	lockdep_assert_held(&hugetlb_lock);
-
-	__folio_clear_hugetlb(folio);
-}
-
 /*
  * Remove hugetlb folio from lists.
- * If vmemmap exists for the folio, update dtor so that the folio appears
- * as just a compound page.  Otherwise, wait until after allocating vmemmap
- * to update dtor.
+ * If vmemmap exists for the folio, clear the hugetlb flag so that the
+ * folio appears as just a compound page.  Otherwise, wait until after
+ * allocating vmemmap to clear the flag.
  *
  * A reference is held on the folio, except in the case of demote.
  *
@@ -1662,12 +1654,12 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
 	}
 
 	/*
-	 * We can only clear the hugetlb destructor after allocating vmemmap
+	 * We can only clear the hugetlb flag after allocating vmemmap
 	 * pages.  Otherwise, someone (memory error handling) may try to write
 	 * to tail struct pages.
 	 */
 	if (!folio_test_hugetlb_vmemmap_optimized(folio))
-		__clear_hugetlb_destructor(h, folio);
+		__folio_clear_hugetlb(folio);
 
 	 /*
 	  * In the case of demote we do not ref count the page as it will soon
@@ -1741,7 +1733,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 static void __update_and_free_hugetlb_folio(struct hstate *h,
 						struct folio *folio)
 {
-	bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
+	bool clear_flag = folio_test_hugetlb_vmemmap_optimized(folio);
 
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
@@ -1754,11 +1746,11 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		return;
 
 	/*
-	 * If folio is not vmemmap optimized (!clear_dtor), then the folio
+	 * If folio is not vmemmap optimized (!clear_flag), then the folio
 	 * is no longer identified as a hugetlb page.  hugetlb_vmemmap_restore_folio
 	 * can only be passed hugetlb pages and will BUG otherwise.
 	 */
-	if (clear_dtor && hugetlb_vmemmap_restore_folio(h, folio)) {
+	if (clear_flag && hugetlb_vmemmap_restore_folio(h, folio)) {
 		spin_lock_irq(&hugetlb_lock);
 		/*
 		 * If we cannot allocate vmemmap pages, just refuse to free the
@@ -1779,11 +1771,11 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 
 	/*
 	 * If vmemmap pages were allocated above, then we need to clear the
-	 * hugetlb destructor under the hugetlb lock.
+	 * hugetlb flag under the hugetlb lock.
 	 */
-	if (clear_dtor) {
+	if (clear_flag) {
 		spin_lock_irq(&hugetlb_lock);
-		__clear_hugetlb_destructor(h, folio);
+		__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);
 	}
 
@@ -1885,7 +1877,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h,
 		list_for_each_entry_safe(folio, t_folio, non_hvo_folios, lru) {
 			list_del(&folio->lru);
 			spin_lock_irq(&hugetlb_lock);
-			__clear_hugetlb_destructor(h, folio);
+			__folio_clear_hugetlb(folio);
 			spin_unlock_irq(&hugetlb_lock);
 			update_and_free_hugetlb_folio(h, folio, false);
 			cond_resched();
@@ -1910,7 +1902,7 @@ static void bulk_vmemmap_restore_error(struct hstate *h,
 			} else {
 				list_del(&folio->lru);
 				spin_lock_irq(&hugetlb_lock);
-				__clear_hugetlb_destructor(h, folio);
+				__folio_clear_hugetlb(folio);
 				spin_unlock_irq(&hugetlb_lock);
 				update_and_free_hugetlb_folio(h, folio, false);
 				cond_resched();
@@ -1943,14 +1935,14 @@ static void update_and_free_pages_bulk(struct hstate *h,
 	 * should only be pages on the non_hvo_folios list.
 	 * Do note that the non_hvo_folios list could be empty.
 	 * Without HVO enabled, ret will be 0 and there is no need to call
-	 * __clear_hugetlb_destructor as this was done previously.
+	 * __folio_clear_hugetlb as this was done previously.
 	 */
 	VM_WARN_ON(!list_empty(folio_list));
 	VM_WARN_ON(ret < 0);
 	if (!list_empty(&non_hvo_folios) && ret) {
 		spin_lock_irq(&hugetlb_lock);
 		list_for_each_entry(folio, &non_hvo_folios, lru)
-			__clear_hugetlb_destructor(h, folio);
+			__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);
 	}
 
@@ -1975,7 +1967,7 @@ void free_huge_folio(struct folio *folio)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
-	 * compound page destructor.
+	 * generic mm code.
 	 */
 	struct hstate *h = folio_hstate(folio);
 	int nid = folio_nid(folio);
@@ -2125,7 +2117,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 			set_compound_head(p, &folio->page);
 	}
 	__folio_set_head(folio);
-	/* we rely on prep_new_hugetlb_folio to set the destructor */
+	/* we rely on prep_new_hugetlb_folio to set the hugetlb flag */
 	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
-- 
2.43.0




* Re: [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
@ 2024-03-22  8:23   ` Miaohe Lin
  2024-03-22 13:00     ` Matthew Wilcox
  2024-03-22  9:30   ` Vlastimil Babka
  2024-03-22 12:49   ` David Hildenbrand
  2 siblings, 1 reply; 45+ messages in thread
From: Miaohe Lin @ 2024-03-22  8:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Muchun Song,
	Oscar Salvador, Andrew Morton

On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/huge_memory.c | 2 --
>  mm/hugetlb.c     | 3 ++-
>  mm/internal.h    | 2 ++
>  mm/memcontrol.c  | 2 ++
>  mm/page_alloc.c  | 9 +++++----
>  5 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..04fb994a7b0b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..7e9a766059aa 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>  		free_gigantic_folio(folio, huge_page_order(h));
>  	} else {
> -		__free_pages(&folio->page, huge_page_order(h));
> +		INIT_LIST_HEAD(&folio->_deferred_list);

Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

> +		folio_put(folio);

Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

Thanks.




* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
@ 2024-03-22  9:20   ` Miaohe Lin
  2024-03-22 10:41     ` Vlastimil Babka
  2024-03-22 15:09   ` David Hildenbrand
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Miaohe Lin @ 2024-03-22  9:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Muchun Song,
	Oscar Salvador, Andrew Morton

On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
> Reclaim the Slab page flag by using a spare bit in PageType.  We are
> perennially short of page flags for various purposes, and now that
> the original SLAB allocator has been retired, SLUB does not use the
> mapcount/page_type field.  This lets us remove a number of special cases
> for ignoring mapcount on Slab pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/page-flags.h     | 21 +++++++++++++++++----
>  include/trace/events/mmflags.h |  2 +-
>  mm/memory-failure.c            |  9 ---------
>  mm/slab.h                      |  2 +-
>  4 files changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 94eb8a11a321..73e0b17c7728 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -109,7 +109,6 @@ enum pageflags {
>  	PG_active,
>  	PG_workingset,
>  	PG_error,
> -	PG_slab,
>  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
>  	PG_arch_1,
>  	PG_reserved,
> @@ -524,7 +523,6 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
>  	TESTCLEARFLAG(Active, active, PF_HEAD)
>  PAGEFLAG(Workingset, workingset, PF_HEAD)
>  	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
> -__PAGEFLAG(Slab, slab, PF_NO_TAIL)
>  PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
>  
>  /* Xen */
> @@ -931,7 +929,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  #endif
>  
>  /*
> - * For pages that are never mapped to userspace (and aren't PageSlab),
> + * For pages that are never mapped to userspace,
>   * page_type may be used.  Because it is initialised to -1, we invert the
>   * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
>   * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
> @@ -947,6 +945,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  #define PG_table	0x00000200
>  #define PG_guard	0x00000400
>  #define PG_hugetlb	0x00000800
> +#define PG_slab		0x00001000
>  
>  #define PageType(page, flag)						\
>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> @@ -1041,6 +1040,20 @@ PAGE_TYPE_OPS(Table, table, pgtable)
>   */
>  PAGE_TYPE_OPS(Guard, guard, guard)
>  
> +FOLIO_TYPE_OPS(slab, slab)
> +
> +/**
> + * PageSlab - Determine if the page belongs to the slab allocator
> + * @page: The page to test.
> + *
> + * Context: Any context.
> + * Return: True for slab pages, false for any other kind of page.
> + */
> +static inline bool PageSlab(const struct page *page)
> +{
> +	return folio_test_slab(page_folio(page));
> +}
> +
>  #ifdef CONFIG_HUGETLB_PAGE
>  FOLIO_TYPE_OPS(hugetlb, hugetlb)
>  #else
> @@ -1121,7 +1134,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>  	(1UL << PG_lru		| 1UL << PG_locked	|	\
>  	 1UL << PG_private	| 1UL << PG_private_2	|	\
>  	 1UL << PG_writeback	| 1UL << PG_reserved	|	\
> -	 1UL << PG_slab		| 1UL << PG_active 	|	\
> +	 1UL << PG_active 	|				\
>  	 1UL << PG_unevictable	| __PG_MLOCKED | LRU_GEN_MASK)
>  
>  /*
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index d55e53ac91bd..e46d6e82765e 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -107,7 +107,6 @@
>  	DEF_PAGEFLAG_NAME(lru),						\
>  	DEF_PAGEFLAG_NAME(active),					\
>  	DEF_PAGEFLAG_NAME(workingset),					\
> -	DEF_PAGEFLAG_NAME(slab),					\
>  	DEF_PAGEFLAG_NAME(owner_priv_1),				\
>  	DEF_PAGEFLAG_NAME(arch_1),					\
>  	DEF_PAGEFLAG_NAME(reserved),					\
> @@ -135,6 +134,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
>  #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
>  
>  #define __def_pagetype_names						\
> +	DEF_PAGETYPE_NAME(slab),					\
>  	DEF_PAGETYPE_NAME(hugetlb),					\
>  	DEF_PAGETYPE_NAME(offline),					\
>  	DEF_PAGETYPE_NAME(guard),					\
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9349948f1abf..1cb41ba7870c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1239,7 +1239,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>  #define mlock		(1UL << PG_mlocked)
>  #define lru		(1UL << PG_lru)
>  #define head		(1UL << PG_head)
> -#define slab		(1UL << PG_slab)
>  #define reserved	(1UL << PG_reserved)
>  
>  static struct page_state error_states[] = {
> @@ -1249,13 +1248,6 @@ static struct page_state error_states[] = {
>  	 * PG_buddy pages only make a small fraction of all free pages.
>  	 */
>  
> -	/*
> -	 * Could in theory check if slab page is free or if we can drop
> -	 * currently unused objects without touching them. But just
> -	 * treat it as standard kernel for now.
> -	 */
> -	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },

Would it be better to leave the slab case above in place, to catch possible unhandled
obscure races with slab? Though it looks like slab pages shouldn't reach here.

Thanks.




* Re: [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
  2024-03-22  8:23   ` Miaohe Lin
@ 2024-03-22  9:30   ` Vlastimil Babka
  2024-03-22 12:49   ` David Hildenbrand
  2 siblings, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22  9:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
  2024-03-21 14:24 ` [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros Matthew Wilcox (Oracle)
@ 2024-03-22  9:33   ` Vlastimil Babka
  0 siblings, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22  9:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> Following the separation of FOLIO_FLAGS from PAGEFLAGS, separate
> FOLIO_FLAG_FALSE from PAGEFLAG_FALSE and FOLIO_TYPE_OPS from
> PAGE_TYPE_OPS.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH 3/9] mm: Remove folio_prep_large_rmappable()
  2024-03-21 14:24 ` [PATCH 3/9] mm: Remove folio_prep_large_rmappable() Matthew Wilcox (Oracle)
@ 2024-03-22  9:37   ` Vlastimil Babka
  2024-03-22 12:51   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22  9:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> Now that prep_compound_page() initialises folio->_deferred_list,
> folio_prep_large_rmappable()'s only purpose is to set the large_rmappable
> flag, so inline it into the two callers.  Take the opportunity to convert
> the large_rmappable definition from PAGEFLAG to FOLIO_FLAG and remove
> PageTestLargeRmappable and friends.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages
  2024-03-21 14:24 ` [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages Matthew Wilcox (Oracle)
@ 2024-03-22  9:43   ` Vlastimil Babka
  2024-03-22 12:43     ` Matthew Wilcox
  2024-03-22 15:04   ` David Hildenbrand
  1 sibling, 1 reply; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22  9:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> Return 0 for pages which can't be mapped.  This matches how page_mapped()
> works.  It is more convenient for users to not have to filter out
> these pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Hm, strictly speaking you shouldn't be removing those PageSlab tests until
PG_slab is changed to a PageType in 7/9? That is, if we're paranoid enough
about not breaking bisection between this and that patch.

Otherwise

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  fs/proc/page.c             | 7 ++-----
>  include/linux/mm.h         | 8 +++++---
>  include/linux/page-flags.h | 4 ++--
>  3 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 195b077c0fac..9223856c934b 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -67,7 +67,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>  		 */
>  		ppage = pfn_to_online_page(pfn);
>  
> -		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> +		if (!ppage)
>  			pcount = 0;
>  		else
>  			pcount = page_mapcount(ppage);
> @@ -124,11 +124,8 @@ u64 stable_page_flags(struct page *page)
>  
>  	/*
>  	 * pseudo flags for the well known (anonymous) memory mapped pages
> -	 *
> -	 * Note that page->_mapcount is overloaded in SLAB, so the
> -	 * simple test in page_mapped() is not enough.
>  	 */
> -	if (!PageSlab(page) && page_mapped(page))
> +	if (page_mapped(page))
>  		u |= 1 << KPF_MMAP;
>  	if (PageAnon(page))
>  		u |= 1 << KPF_ANON;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0436b919f1c7..5ff3d687bc6c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1223,14 +1223,16 @@ static inline void page_mapcount_reset(struct page *page)
>   * a large folio, it includes the number of times this page is mapped
>   * as part of that folio.
>   *
> - * The result is undefined for pages which cannot be mapped into userspace.
> - * For example SLAB or special types of pages. See function page_has_type().
> - * They use this field in struct page differently.
> + * Will report 0 for pages which cannot be mapped into userspace, eg
> + * slab, page tables and similar.
>   */
>  static inline int page_mapcount(struct page *page)
>  {
>  	int mapcount = atomic_read(&page->_mapcount) + 1;
>  
> +	/* Handle page_has_type() pages */
> +	if (mapcount < 0)
> +		mapcount = 0;
>  	if (unlikely(PageCompound(page)))
>  		mapcount += folio_entire_mapcount(page_folio(page));
>  
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8d0e6ce25ca2..5852f967c640 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -971,12 +971,12 @@ static inline bool is_page_hwpoison(struct page *page)
>   * page_type may be used.  Because it is initialised to -1, we invert the
>   * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
>   * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
> - * low bits so that an underflow or overflow of page_mapcount() won't be
> + * low bits so that an underflow or overflow of _mapcount won't be
>   * mistaken for a page type value.
>   */
>  
>  #define PAGE_TYPE_BASE	0xf0000000
> -/* Reserve		0x0000007f to catch underflows of page_mapcount */
> +/* Reserve		0x0000007f to catch underflows of _mapcount */
>  #define PAGE_MAPCOUNT_RESERVE	-128
>  #define PG_buddy	0x00000080
>  #define PG_offline	0x00000100



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
@ 2024-03-22 10:19   ` Vlastimil Babka
  2024-03-22 15:06     ` David Hildenbrand
  2024-03-23  3:24     ` Matthew Wilcox
  2024-03-25  7:57   ` Vlastimil Babka
  2024-03-25 15:14   ` Matthew Wilcox
  2 siblings, 2 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22 10:19 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song,
	Oscar Salvador, Luis Chamberlain

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> The current folio_test_hugetlb() can be fooled by a concurrent folio split
> into returning true for a folio which has never belonged to hugetlbfs.
> This can't happen if the caller holds a refcount on it, but we have a
> few places (memory-failure, compaction, procfs) which do not and should
> not take a speculative reference.

Should we add metadata wrt closing the bug report from Luis?

https://lore.kernel.org/all/8fa1c95c-4749-33dd-42ba-243e492ab109@suse.cz/

I assume this wouldn't be fun wrt stable...

> Since hugetlb pages do not use individual page mapcounts (they are always
> fully mapped and use the entire_mapcount field to record the number

Wasn't there some discussion about allowing partial mappings of hugetlb? What
would be the implications?

> of mappings), the PageType field is available now that page_mapcount()
> ignores the value in this field.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Other than that,
Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/page-flags.h     | 70 ++++++++++++++++------------------
>  include/trace/events/mmflags.h |  1 +
>  mm/hugetlb.c                   | 22 ++---------
>  3 files changed, 37 insertions(+), 56 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5852f967c640..6fb3cd42ee59 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -190,7 +190,6 @@ enum pageflags {
>  
>  	/* At least one page in this folio has the hwpoison flag set */
>  	PG_has_hwpoisoned = PG_error,
> -	PG_hugetlb = PG_active,
>  	PG_large_rmappable = PG_workingset, /* anon or file-backed */
>  };
>  
> @@ -876,29 +875,6 @@ FOLIO_FLAG_FALSE(large_rmappable)
>  
>  #define PG_head_mask ((1UL << PG_head))
>  
> -#ifdef CONFIG_HUGETLB_PAGE
> -int PageHuge(const struct page *page);
> -SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> -CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> -
> -/**
> - * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> - * @folio: The folio to test.
> - *
> - * Context: Any context.  Caller should have a reference on the folio to
> - * prevent it from being turned into a tail page.
> - * Return: True for hugetlbfs folios, false for anon folios or folios
> - * belonging to other filesystems.
> - */
> -static inline bool folio_test_hugetlb(const struct folio *folio)
> -{
> -	return folio_test_large(folio) &&
> -		test_bit(PG_hugetlb, const_folio_flags(folio, 1));
> -}
> -#else
> -TESTPAGEFLAG_FALSE(Huge, hugetlb)
> -#endif
> -
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
>   * PageHuge() only returns true for hugetlbfs pages, but not for
> @@ -954,18 +930,6 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  	TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  #endif
>  
> -/*
> - * Check if a page is currently marked HWPoisoned. Note that this check is
> - * best effort only and inherently racy: there is no way to synchronize with
> - * failing hardware.
> - */
> -static inline bool is_page_hwpoison(struct page *page)
> -{
> -	if (PageHWPoison(page))
> -		return true;
> -	return PageHuge(page) && PageHWPoison(compound_head(page));
> -}
> -
>  /*
>   * For pages that are never mapped to userspace (and aren't PageSlab),
>   * page_type may be used.  Because it is initialised to -1, we invert the
> @@ -982,6 +946,7 @@ static inline bool is_page_hwpoison(struct page *page)
>  #define PG_offline	0x00000100
>  #define PG_table	0x00000200
>  #define PG_guard	0x00000400
> +#define PG_hugetlb	0x00000800
>  
>  #define PageType(page, flag)						\
>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> @@ -1076,6 +1041,37 @@ PAGE_TYPE_OPS(Table, table, pgtable)
>   */
>  PAGE_TYPE_OPS(Guard, guard, guard)
>  
> +#ifdef CONFIG_HUGETLB_PAGE
> +FOLIO_TYPE_OPS(hugetlb, hugetlb)
> +#else
> +FOLIO_TEST_FLAG_FALSE(hugetlb)
> +#endif
> +
> +/**
> + * PageHuge - Determine if the page belongs to hugetlbfs
> + * @page: The page to test.
> + *
> + * Context: Any context.
> + * Return: True for hugetlbfs pages, false for anon pages or pages
> + * belonging to other filesystems.
> + */
> +static inline bool PageHuge(const struct page *page)
> +{
> +	return folio_test_hugetlb(page_folio(page));
> +}
> +
> +/*
> + * Check if a page is currently marked HWPoisoned. Note that this check is
> + * best effort only and inherently racy: there is no way to synchronize with
> + * failing hardware.
> + */
> +static inline bool is_page_hwpoison(struct page *page)
> +{
> +	if (PageHWPoison(page))
> +		return true;
> +	return PageHuge(page) && PageHWPoison(compound_head(page));
> +}
> +
>  extern bool is_free_buddy_page(struct page *page);
>  
>  PAGEFLAG(Isolated, isolated, PF_ANY);
> @@ -1142,7 +1138,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>   */
>  #define PAGE_FLAGS_SECOND						\
>  	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
> -	 1UL << PG_hugetlb		| 1UL << PG_large_rmappable)
> +	 1UL << PG_large_rmappable)
>  
>  #define PAGE_FLAGS_PRIVATE				\
>  	(1UL << PG_private | 1UL << PG_private_2)
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index d801409b33cf..d55e53ac91bd 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -135,6 +135,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
>  #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
>  
>  #define __def_pagetype_names						\
> +	DEF_PAGETYPE_NAME(hugetlb),					\
>  	DEF_PAGETYPE_NAME(offline),					\
>  	DEF_PAGETYPE_NAME(guard),					\
>  	DEF_PAGETYPE_NAME(table),					\
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7e9a766059aa..bdcbb62096cf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1624,7 +1624,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
>  {
>  	lockdep_assert_held(&hugetlb_lock);
>  
> -	folio_clear_hugetlb(folio);
> +	__folio_clear_hugetlb(folio);
>  }
>  
>  /*
> @@ -1711,7 +1711,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
>  		h->surplus_huge_pages_node[nid]++;
>  	}
>  
> -	folio_set_hugetlb(folio);
> +	__folio_set_hugetlb(folio);
>  	folio_change_private(folio, NULL);
>  	/*
>  	 * We have to set hugetlb_vmemmap_optimized again as above
> @@ -2050,7 +2050,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
>  
>  static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio)
>  {
> -	folio_set_hugetlb(folio);
> +	__folio_set_hugetlb(folio);
>  	INIT_LIST_HEAD(&folio->lru);
>  	hugetlb_set_folio_subpool(folio, NULL);
>  	set_hugetlb_cgroup(folio, NULL);
> @@ -2160,22 +2160,6 @@ static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
>  	return __prep_compound_gigantic_folio(folio, order, true);
>  }
>  
> -/*
> - * PageHuge() only returns true for hugetlbfs pages, but not for normal or
> - * transparent huge pages.  See the PageTransHuge() documentation for more
> - * details.
> - */
> -int PageHuge(const struct page *page)
> -{
> -	const struct folio *folio;
> -
> -	if (!PageCompound(page))
> -		return 0;
> -	folio = page_folio(page);
> -	return folio_test_hugetlb(folio);
> -}
> -EXPORT_SYMBOL_GPL(PageHuge);
> -
>  /*
>   * Find and lock address space (mapping) in write mode.
>   *



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison()
  2024-03-21 14:24 ` [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison() Matthew Wilcox (Oracle)
@ 2024-03-22 10:28   ` Vlastimil Babka
  0 siblings, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22 10:28 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> We can call it only once instead of twice.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/page-flags.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6fb3cd42ee59..94eb8a11a321 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -1065,11 +1065,14 @@ static inline bool PageHuge(const struct page *page)
>   * best effort only and inherently racy: there is no way to synchronize with
>   * failing hardware.
>   */
> -static inline bool is_page_hwpoison(struct page *page)
> +static inline bool is_page_hwpoison(const struct page *page)
>  {
> +	const struct folio *folio;
> +
>  	if (PageHWPoison(page))
>  		return true;
> -	return PageHuge(page) && PageHWPoison(compound_head(page));
> +	folio = page_folio(page);
> +	return folio_test_hugetlb(folio) && PageHWPoison(&folio->page);
>  }
>  
>  extern bool is_free_buddy_page(struct page *page);



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-22  9:20   ` Miaohe Lin
@ 2024-03-22 10:41     ` Vlastimil Babka
  2024-04-01  3:38       ` Miaohe Lin
  0 siblings, 1 reply; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22 10:41 UTC (permalink / raw)
  To: Miaohe Lin, Matthew Wilcox (Oracle), Naoya Horiguchi
  Cc: linux-mm, David Hildenbrand, Muchun Song, Oscar Salvador, Andrew Morton

On 3/22/24 10:20, Miaohe Lin wrote:
> On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
>> Reclaim the Slab page flag by using a spare bit in PageType.  We are
>> perennially short of page flags for various purposes, and now that
>> the original SLAB allocator has been retired, SLUB does not use the
>> mapcount/page_type field.  This lets us remove a number of special cases
>> for ignoring mapcount on Slab pages.
>> 
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

>> ---
>>  include/linux/page-flags.h     | 21 +++++++++++++++++----
>>  include/trace/events/mmflags.h |  2 +-
>>  mm/memory-failure.c            |  9 ---------
>>  mm/slab.h                      |  2 +-
>>  4 files changed, 19 insertions(+), 15 deletions(-)
>> 
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 94eb8a11a321..73e0b17c7728 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -109,7 +109,6 @@ enum pageflags {
>>  	PG_active,
>>  	PG_workingset,
>>  	PG_error,
>> -	PG_slab,
>>  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
>>  	PG_arch_1,
>>  	PG_reserved,
>> @@ -524,7 +523,6 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
>>  	TESTCLEARFLAG(Active, active, PF_HEAD)
>>  PAGEFLAG(Workingset, workingset, PF_HEAD)
>>  	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
>> -__PAGEFLAG(Slab, slab, PF_NO_TAIL)
>>  PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
>>  
>>  /* Xen */
>> @@ -931,7 +929,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>  #endif
>>  
>>  /*
>> - * For pages that are never mapped to userspace (and aren't PageSlab),
>> + * For pages that are never mapped to userspace,
>>   * page_type may be used.  Because it is initialised to -1, we invert the
>>   * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
>>   * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
>> @@ -947,6 +945,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>  #define PG_table	0x00000200
>>  #define PG_guard	0x00000400
>>  #define PG_hugetlb	0x00000800
>> +#define PG_slab		0x00001000
>>  
>>  #define PageType(page, flag)						\
>>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>> @@ -1041,6 +1040,20 @@ PAGE_TYPE_OPS(Table, table, pgtable)
>>   */
>>  PAGE_TYPE_OPS(Guard, guard, guard)
>>  
>> +FOLIO_TYPE_OPS(slab, slab)
>> +
>> +/**
>> + * PageSlab - Determine if the page belongs to the slab allocator
>> + * @page: The page to test.
>> + *
>> + * Context: Any context.
>> + * Return: True for slab pages, false for any other kind of page.
>> + */
>> +static inline bool PageSlab(const struct page *page)
>> +{
>> +	return folio_test_slab(page_folio(page));
>> +}
>> +
>>  #ifdef CONFIG_HUGETLB_PAGE
>>  FOLIO_TYPE_OPS(hugetlb, hugetlb)
>>  #else
>> @@ -1121,7 +1134,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>>  	(1UL << PG_lru		| 1UL << PG_locked	|	\
>>  	 1UL << PG_private	| 1UL << PG_private_2	|	\
>>  	 1UL << PG_writeback	| 1UL << PG_reserved	|	\
>> -	 1UL << PG_slab		| 1UL << PG_active 	|	\
>> +	 1UL << PG_active 	|				\
>>  	 1UL << PG_unevictable	| __PG_MLOCKED | LRU_GEN_MASK)
>>  
>>  /*
>> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
>> index d55e53ac91bd..e46d6e82765e 100644
>> --- a/include/trace/events/mmflags.h
>> +++ b/include/trace/events/mmflags.h
>> @@ -107,7 +107,6 @@
>>  	DEF_PAGEFLAG_NAME(lru),						\
>>  	DEF_PAGEFLAG_NAME(active),					\
>>  	DEF_PAGEFLAG_NAME(workingset),					\
>> -	DEF_PAGEFLAG_NAME(slab),					\
>>  	DEF_PAGEFLAG_NAME(owner_priv_1),				\
>>  	DEF_PAGEFLAG_NAME(arch_1),					\
>>  	DEF_PAGEFLAG_NAME(reserved),					\
>> @@ -135,6 +134,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
>>  #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
>>  
>>  #define __def_pagetype_names						\
>> +	DEF_PAGETYPE_NAME(slab),					\
>>  	DEF_PAGETYPE_NAME(hugetlb),					\
>>  	DEF_PAGETYPE_NAME(offline),					\
>>  	DEF_PAGETYPE_NAME(guard),					\
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 9349948f1abf..1cb41ba7870c 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1239,7 +1239,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>>  #define mlock		(1UL << PG_mlocked)
>>  #define lru		(1UL << PG_lru)
>>  #define head		(1UL << PG_head)
>> -#define slab		(1UL << PG_slab)
>>  #define reserved	(1UL << PG_reserved)
>>  
>>  static struct page_state error_states[] = {
>> @@ -1249,13 +1248,6 @@ static struct page_state error_states[] = {
>>  	 * PG_buddy pages only make a small fraction of all free pages.
>>  	 */
>>  
>> -	/*
>> -	 * Could in theory check if slab page is free or if we can drop
>> -	 * currently unused objects without touching them. But just
>> -	 * treat it as standard kernel for now.
>> -	 */
>> -	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
> 
> Would it be better to leave the above slab case here to catch possible unhandled obscure races with
> slab? Though it looks like slab pages shouldn't reach here.

The code would need to handle page types as it's no longer a page flag. I
guess that's your decision? If it's not necessary, then I guess MF_MSG_SLAB
itself could also be removed, along with a bunch of code referencing it.
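
If someone did want to keep it, a rough sketch of the shape (untested;
the struct fields and helper names are from memory-failure.c as I
remember them, so treat them as illustrative): page_type values can't be
expressed as the page flag masks that error_states[] matches on, so it
would have to be a separate check ahead of the table walk, e.g.

	static struct page_state slab_state = {
		.mask = 0, .type = MF_MSG_SLAB, .action = me_kernel,
	};

	/* in identify_page_state(), before walking error_states[] */
	if (PageSlab(p))
		return page_action(&slab_state, p, pfn);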

> Thanks.
> 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] mm: Improve dumping of mapcount and page_type
  2024-03-21 14:24 ` [PATCH 8/9] mm: Improve dumping of mapcount and page_type Matthew Wilcox (Oracle)
@ 2024-03-22 11:05   ` Vlastimil Babka
  2024-03-22 15:10   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22 11:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> For pages that have a page_type, set the mapcount to 0, which will
> reduce the confusion in people reading page dumps ("Why does this page
> have a mapcount of -128?").  Now that hugetlbfs is a page_type, read the
> entire_mapcount for any large folio; this is fine for all folios as no
> user reuses the entire_mapcount field.
> 
> For pages which do not have a page type, do not print it to reduce
> clutter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/debug.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/debug.c b/mm/debug.c
> index c1c1a6a484e4..e8a96b8b7197 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -55,18 +55,14 @@ static void __dump_folio(struct folio *folio, struct page *page,
>  		unsigned long pfn, unsigned long idx)
>  {
>  	struct address_space *mapping = folio_mapping(folio);
> -	int mapcount = 0;
> +	int mapcount = atomic_read(&page->_mapcount) + 1;
>  	char *type = "";
>  
> -	/*
> -	 * page->_mapcount space in struct page is used by slab pages to
> -	 * encode own info, and we must avoid calling page_folio() again.
> -	 */
> -	if (!folio_test_slab(folio)) {
> -		mapcount = atomic_read(&page->_mapcount) + 1;
> -		if (folio_test_large(folio))
> -			mapcount += folio_entire_mapcount(folio);
> -	}
> +	/* Open-code page_mapcount() to avoid looking up a stale folio */
> +	if (mapcount < 0)
> +		mapcount = 0;
> +	if (folio_test_large(folio))
> +		mapcount += folio_entire_mapcount(folio);
>  
>  	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
>  			folio_ref_count(folio), mapcount, mapping,
> @@ -99,7 +95,8 @@ static void __dump_folio(struct folio *folio, struct page *page,
>  	 */
>  	pr_warn("%sflags: %pGp%s\n", type, &folio->flags,
>  		is_migrate_cma_folio(folio, pfn) ? " CMA" : "");
> -	pr_warn("page_type: %pGt\n", &folio->page.page_type);
> +	if (page_has_type(&folio->page))
> +		pr_warn("page_type: %pGt\n", &folio->page.page_type);
>  
>  	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>  			sizeof(unsigned long), page,



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 9/9] hugetlb: Remove mention of destructors
  2024-03-21 14:24 ` [PATCH 9/9] hugetlb: Remove mention of destructors Matthew Wilcox (Oracle)
@ 2024-03-22 11:08   ` Vlastimil Babka
  2024-03-22 15:13   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-22 11:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> We no longer have destructors or dtors, merely a page flag
> (technically a page type flag, but that's an implementation detail).
> Remove __clear_hugetlb_destructor, fix up comments and the occasional
> variable name.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages
  2024-03-22  9:43   ` Vlastimil Babka
@ 2024-03-22 12:43     ` Matthew Wilcox
  0 siblings, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-22 12:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Fri, Mar 22, 2024 at 10:43:38AM +0100, Vlastimil Babka wrote:
> On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> > Return 0 for pages which can't be mapped.  This matches how page_mapped()
> > works.  It is more convenient for users to not have to filter out
> > these pages.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> Hm, strictly speaking you shouldn't be removing those PageSlab tests until
> it's changed to a PageType in 7/9? If we're paranoid enough about not
> breaking bisection between this and that patch.

I thought about that.  Slub currently doesn't use the field which will
become __page_type, so it's left set to -1 by the page allocator.  So
this is safe.
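
To spell out the arithmetic, here's a quick userspace sketch of why both
cases read back as a mapcount of 0 (the constants mirror the
page-flags.h encoding quoted above; this is illustration, not kernel
code):

#include <stdio.h>
#include <stdint.h>

#define PG_buddy	0x00000080

/* the clamp from page_mapcount() above; compound handling omitted */
static int page_mapcount(int32_t field)
{
	int mapcount = field + 1;

	/* Handle page_has_type() pages */
	if (mapcount < 0)
		mapcount = 0;
	return mapcount;
}

int main(void)
{
	int32_t untyped = -1;		/* fresh from the allocator; slab today */
	int32_t buddy = ~PG_buddy;	/* __SetPageBuddy clears the bit: -129 */

	printf("untyped: %d\n", page_mapcount(untyped));	/* prints 0 */
	printf("buddy:   %d\n", page_mapcount(buddy));		/* prints 0 */
	return 0;
}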

Thanks for checking that though ;-)

> Otherwise
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> > ---
> >  fs/proc/page.c             | 7 ++-----
> >  include/linux/mm.h         | 8 +++++---
> >  include/linux/page-flags.h | 4 ++--
> >  3 files changed, 9 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > index 195b077c0fac..9223856c934b 100644
> > --- a/fs/proc/page.c
> > +++ b/fs/proc/page.c
> > @@ -67,7 +67,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
> >  		 */
> >  		ppage = pfn_to_online_page(pfn);
> >  
> > -		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> > +		if (!ppage)
> >  			pcount = 0;
> >  		else
> >  			pcount = page_mapcount(ppage);
> > @@ -124,11 +124,8 @@ u64 stable_page_flags(struct page *page)
> >  
> >  	/*
> >  	 * pseudo flags for the well known (anonymous) memory mapped pages
> > -	 *
> > -	 * Note that page->_mapcount is overloaded in SLAB, so the
> > -	 * simple test in page_mapped() is not enough.
> >  	 */
> > -	if (!PageSlab(page) && page_mapped(page))
> > +	if (page_mapped(page))
> >  		u |= 1 << KPF_MMAP;
> >  	if (PageAnon(page))
> >  		u |= 1 << KPF_ANON;
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0436b919f1c7..5ff3d687bc6c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1223,14 +1223,16 @@ static inline void page_mapcount_reset(struct page *page)
> >   * a large folio, it includes the number of times this page is mapped
> >   * as part of that folio.
> >   *
> > - * The result is undefined for pages which cannot be mapped into userspace.
> > - * For example SLAB or special types of pages. See function page_has_type().
> > - * They use this field in struct page differently.
> > + * Will report 0 for pages which cannot be mapped into userspace, eg
> > + * slab, page tables and similar.
> >   */
> >  static inline int page_mapcount(struct page *page)
> >  {
> >  	int mapcount = atomic_read(&page->_mapcount) + 1;
> >  
> > +	/* Handle page_has_type() pages */
> > +	if (mapcount < 0)
> > +		mapcount = 0;
> >  	if (unlikely(PageCompound(page)))
> >  		mapcount += folio_entire_mapcount(page_folio(page));
> >  
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 8d0e6ce25ca2..5852f967c640 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -971,12 +971,12 @@ static inline bool is_page_hwpoison(struct page *page)
> >   * page_type may be used.  Because it is initialised to -1, we invert the
> >   * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
> >   * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
> > - * low bits so that an underflow or overflow of page_mapcount() won't be
> > + * low bits so that an underflow or overflow of _mapcount won't be
> >   * mistaken for a page type value.
> >   */
> >  
> >  #define PAGE_TYPE_BASE	0xf0000000
> > -/* Reserve		0x0000007f to catch underflows of page_mapcount */
> > +/* Reserve		0x0000007f to catch underflows of _mapcount */
> >  #define PAGE_MAPCOUNT_RESERVE	-128
> >  #define PG_buddy	0x00000080
> >  #define PG_offline	0x00000100
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
  2024-03-22  8:23   ` Miaohe Lin
  2024-03-22  9:30   ` Vlastimil Babka
@ 2024-03-22 12:49   ` David Hildenbrand
  2 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 12:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/9] mm: Remove folio_prep_large_rmappable()
  2024-03-21 14:24 ` [PATCH 3/9] mm: Remove folio_prep_large_rmappable() Matthew Wilcox (Oracle)
  2024-03-22  9:37   ` Vlastimil Babka
@ 2024-03-22 12:51   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 12:51 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> Now that prep_compound_page() initialises folio->_deferred_list,
> folio_prep_large_rmappable()'s only purpose is to set the large_rmappable
> flag, so inline it into the two callers.  Take the opportunity to convert
> the large_rmappable definition from PAGEFLAG to FOLIO_FLAG and remove
> the existence of PageTestLargeRmappable and friends.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-22  8:23   ` Miaohe Lin
@ 2024-03-22 13:00     ` Matthew Wilcox
  2024-04-01  3:14       ` Miaohe Lin
  0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-22 13:00 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Muchun Song,
	Oscar Salvador, Andrew Morton

On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
> > +++ b/mm/hugetlb.c
> > @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > -		__free_pages(&folio->page, huge_page_order(h));
> > +		INIT_LIST_HEAD(&folio->_deferred_list);
> 
> Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

Maybe?  Something like
		/* We reused this space for our own purposes */

> > +		folio_put(folio);
> 
> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

There's only one left, and indeed it can!

I'll drop this into my tree and send it as a proper patch later.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 333f6278ef63..43cc7e6bc374 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	int order = huge_page_order(h);
-	struct page *page;
+	struct folio *folio;
 	bool alloc_try_hard = true;
 	bool retry = true;
 
 	/*
-	 * By default we always try hard to allocate the page with
-	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating pages in
+	 * By default we always try hard to allocate the folio with
+	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating folios in
 	 * a loop (to adjust global huge page counts) and previous allocation
 	 * failed, do not continue to try hard on the same node.  Use the
 	 * node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 retry:
-	page = __alloc_pages(gfp_mask, order, nid, nmask);
+	folio = __folio_alloc(gfp_mask, order, nid, nmask);
 
-	/* Freeze head page */
-	if (page && !page_ref_freeze(page, 1)) {
-		__free_pages(page, order);
+	if (folio && !folio_ref_freeze(folio, 1)) {
+		folio_put(folio);
 		if (retry) {	/* retry once */
 			retry = false;
 			goto retry;
 		}
 		/* WOW!  twice in a row. */
-		pr_warn("HugeTLB head page unexpected inflated ref count\n");
-		page = NULL;
+		pr_warn("HugeTLB unexpected inflated folio ref count\n");
+		folio = NULL;
 	}
 
 	/*
-	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
-	 * indicates an overall state change.  Clear bit so that we resume
-	 * normal 'try hard' allocations.
+	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+	 * folio this indicates an overall state change.  Clear bit so
+	 * that we resume normal 'try hard' allocations.
 	 */
-	if (node_alloc_noretry && page && !alloc_try_hard)
+	if (node_alloc_noretry && folio && !alloc_try_hard)
 		node_clear(nid, *node_alloc_noretry);
 
 	/*
-	 * If we tried hard to get a page but failed, set bit so that
+	 * If we tried hard to get a folio but failed, set bit so that
 	 * subsequent attempts will not try as hard until there is an
 	 * overall state change.
 	 */
-	if (node_alloc_noretry && !page && alloc_try_hard)
+	if (node_alloc_noretry && !folio && alloc_try_hard)
 		node_set(nid, *node_alloc_noretry);
 
-	if (!page) {
+	if (!folio) {
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 		return NULL;
 	}
 
 	__count_vm_event(HTLB_BUDDY_PGALLOC);
-	return page_folio(page);
+	return folio;
 }
 
 static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages
  2024-03-21 14:24 ` [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages Matthew Wilcox (Oracle)
  2024-03-22  9:43   ` Vlastimil Babka
@ 2024-03-22 15:04   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:04 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> Return 0 for pages which can't be mapped.  This matches how page_mapped()
> works.  It is more convenient for users to not have to filter out
> these pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---


Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-22 10:19   ` Vlastimil Babka
@ 2024-03-22 15:06     ` David Hildenbrand
  2024-03-23  3:24     ` Matthew Wilcox
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:06 UTC (permalink / raw)
  To: Vlastimil Babka, Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Miaohe Lin, Muchun Song, Oscar Salvador, Luis Chamberlain

On 22.03.24 11:19, Vlastimil Babka wrote:
> On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
>> The current folio_test_hugetlb() can be fooled by a concurrent folio split
>> into returning true for a folio which has never belonged to hugetlbfs.
>> This can't happen if the caller holds a refcount on it, but we have a
>> few places (memory-failure, compaction, procfs) which do not and should
>> not take a speculative reference.
> 
> Should we add metadata wrt closing the bug report from Luis?
> 
> https://lore.kernel.org/all/8fa1c95c-4749-33dd-42ba-243e492ab109@suse.cz/
> 
> I assume this wouldn't be fun wrt stable...
> 
>> Since hugetlb pages do not use individual page mapcounts (they are always
>> fully mapped and use the entire_mapcount field to record the number
> 
> Wasn't there some discussion about allowing partial mappings of hugetlb? What
> would be the implications?

If we ever go that path, we really should avoid messing with any 
subpages right from the start. We should make it work using a single 
total mapcount per folio.

Anyhow, that's stuff for the future.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
  2024-03-22  9:20   ` Miaohe Lin
@ 2024-03-22 15:09   ` David Hildenbrand
  2024-03-25 15:19   ` Matthew Wilcox
  2024-03-31 15:11     ` [LTP] " kernel test robot
  3 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> Reclaim the Slab page flag by using a spare bit in PageType.  We are
> perennially short of page flags for various purposes, and now that
> the original SLAB allocator has been retired, SLUB does not use the
> mapcount/page_type field.  This lets us remove a number of special cases
> for ignoring mapcount on Slab pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---


Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] mm: Improve dumping of mapcount and page_type
  2024-03-21 14:24 ` [PATCH 8/9] mm: Improve dumping of mapcount and page_type Matthew Wilcox (Oracle)
  2024-03-22 11:05   ` Vlastimil Babka
@ 2024-03-22 15:10   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:10 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> For pages that have a page_type, set the mapcount to 0, which will
> reduce the confusion for people reading page dumps ("Why does this page
> have a mapcount of -128?").  Now that hugetlbfs is a page_type, read the
> entire_mapcount for any large folio; this is fine for all folios as no
> user reuses the entire_mapcount field.
> 
> For pages which do not have a page type, do not print it to reduce
> clutter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 9/9] hugetlb: Remove mention of destructors
  2024-03-21 14:24 ` [PATCH 9/9] hugetlb: Remove mention of destructors Matthew Wilcox (Oracle)
  2024-03-22 11:08   ` Vlastimil Babka
@ 2024-03-22 15:13   ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:13 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: linux-mm, Vlastimil Babka, Miaohe Lin, Muchun Song, Oscar Salvador

On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> We no longer have destructors or dtors, merely a page flag
> (technically a page type flag, but that's an implementation detail).
> Remove __clear_hugetlb_destructor, fix up comments and the occasional
> variable name.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/hugetlb.c | 42 +++++++++++++++++-------------------------
>   1 file changed, 17 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bdcbb62096cf..6ca9ac90ad35 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1619,19 +1619,11 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
>   						unsigned int order) { }
>   #endif
>   
> -static inline void __clear_hugetlb_destructor(struct hstate *h,
> -						struct folio *folio)
> -{
> -	lockdep_assert_held(&hugetlb_lock);
> -
> -	__folio_clear_hugetlb(folio);

We're losing that sanity check, which is a bit unfortunate.

But hugetlb maintainers can decide if they want to keep it.
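
E.g. something like the following would keep it (untested sketch, the
wrapper name is just illustrative):

	static inline void __folio_clear_hugetlb_locked(struct folio *folio)
	{
		lockdep_assert_held(&hugetlb_lock);
		__folio_clear_hugetlb(folio);
	}

with the former __clear_hugetlb_destructor() callers using that instead
of calling __folio_clear_hugetlb() directly.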

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-22 10:19   ` Vlastimil Babka
  2024-03-22 15:06     ` David Hildenbrand
@ 2024-03-23  3:24     ` Matthew Wilcox
  1 sibling, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-23  3:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, David Hildenbrand, Miaohe Lin,
	Muchun Song, Oscar Salvador, Luis Chamberlain

On Fri, Mar 22, 2024 at 11:19:34AM +0100, Vlastimil Babka wrote:
> Should we add metadata wrt closing the bug report from Luis?
> 
> https://lore.kernel.org/all/8fa1c95c-4749-33dd-42ba-243e492ab109@suse.cz/

Probably a good idea.

> I assume this wouldn't be fun wrt stable...

I don't think it should be too bad?  I think we only need to backport it
as far as v6.6 when I got rid of folio->dtor.  Yes, it's unreliable
before that, but it doesn't cause crashes, just bad decisions.

> > Since hugetlb pages do not use individual page mapcounts (they are always
> > fully mapped and use the entire_mapcount field to record the number
> 
> Wasn't there some discussion about allowing partial mappings of hugetlb? What
> would be the implications?

I think I'm hammering another nail into that coffin.  As I understand
it, everyone has given up on that proposal and they're looking to make
THP more reliable so they can use THP.  See Yu Zhao's recent proposals.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
  2024-03-22 10:19   ` Vlastimil Babka
@ 2024-03-25  7:57   ` Vlastimil Babka
  2024-03-25 18:48     ` Andrew Morton
  2024-03-25 15:14   ` Matthew Wilcox
  2 siblings, 1 reply; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-25  7:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton, Luis Chamberlain
  Cc: linux-mm, David Hildenbrand, Miaohe Lin, Muchun Song, Oscar Salvador

On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> The current folio_test_hugetlb() can be fooled by a concurrent folio split
> into returning true for a folio which has never belonged to hugetlbfs.
> This can't happen if the caller holds a refcount on it, but we have a
> few places (memory-failure, compaction, procfs) which do not and should
> not take a speculative reference.

In compaction and with CONFIG_DEBUG_VM enabled, the current implementation
can result in an oops, as reported by Luis. This happens since 9c5ccf2db04b
("mm: remove HUGETLB_PAGE_DTOR") effectively added some VM_BUG_ON() checks
in the PageHuge() testing path.

> Since hugetlb pages do not use individual page mapcounts (they are always
> fully mapped and use the entire_mapcount field to record the number
> of mappings), the PageType field is available now that page_mapcount()
> ignores the value in this field.

Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")
Cc: <stable@vger.kernel.org>

> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/page-flags.h     | 70 ++++++++++++++++------------------
>  include/trace/events/mmflags.h |  1 +
>  mm/hugetlb.c                   | 22 ++---------
>  3 files changed, 37 insertions(+), 56 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5852f967c640..6fb3cd42ee59 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -190,7 +190,6 @@ enum pageflags {
>  
>  	/* At least one page in this folio has the hwpoison flag set */
>  	PG_has_hwpoisoned = PG_error,
> -	PG_hugetlb = PG_active,
>  	PG_large_rmappable = PG_workingset, /* anon or file-backed */
>  };
>  
> @@ -876,29 +875,6 @@ FOLIO_FLAG_FALSE(large_rmappable)
>  
>  #define PG_head_mask ((1UL << PG_head))
>  
> -#ifdef CONFIG_HUGETLB_PAGE
> -int PageHuge(const struct page *page);
> -SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> -CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> -
> -/**
> - * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> - * @folio: The folio to test.
> - *
> - * Context: Any context.  Caller should have a reference on the folio to
> - * prevent it from being turned into a tail page.
> - * Return: True for hugetlbfs folios, false for anon folios or folios
> - * belonging to other filesystems.
> - */
> -static inline bool folio_test_hugetlb(const struct folio *folio)
> -{
> -	return folio_test_large(folio) &&
> -		test_bit(PG_hugetlb, const_folio_flags(folio, 1));
> -}
> -#else
> -TESTPAGEFLAG_FALSE(Huge, hugetlb)
> -#endif
> -
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
>   * PageHuge() only returns true for hugetlbfs pages, but not for
> @@ -954,18 +930,6 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  	TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>  #endif
>  
> -/*
> - * Check if a page is currently marked HWPoisoned. Note that this check is
> - * best effort only and inherently racy: there is no way to synchronize with
> - * failing hardware.
> - */
> -static inline bool is_page_hwpoison(struct page *page)
> -{
> -	if (PageHWPoison(page))
> -		return true;
> -	return PageHuge(page) && PageHWPoison(compound_head(page));
> -}
> -
>  /*
>   * For pages that are never mapped to userspace (and aren't PageSlab),
>   * page_type may be used.  Because it is initialised to -1, we invert the
> @@ -982,6 +946,7 @@ static inline bool is_page_hwpoison(struct page *page)
>  #define PG_offline	0x00000100
>  #define PG_table	0x00000200
>  #define PG_guard	0x00000400
> +#define PG_hugetlb	0x00000800
>  
>  #define PageType(page, flag)						\
>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> @@ -1076,6 +1041,37 @@ PAGE_TYPE_OPS(Table, table, pgtable)
>   */
>  PAGE_TYPE_OPS(Guard, guard, guard)
>  
> +#ifdef CONFIG_HUGETLB_PAGE
> +FOLIO_TYPE_OPS(hugetlb, hugetlb)
> +#else
> +FOLIO_TEST_FLAG_FALSE(hugetlb)
> +#endif
> +
> +/**
> + * PageHuge - Determine if the page belongs to hugetlbfs
> + * @page: The page to test.
> + *
> + * Context: Any context.
> + * Return: True for hugetlbfs pages, false for anon pages or pages
> + * belonging to other filesystems.
> + */
> +static inline bool PageHuge(const struct page *page)
> +{
> +	return folio_test_hugetlb(page_folio(page));
> +}
> +
> +/*
> + * Check if a page is currently marked HWPoisoned. Note that this check is
> + * best effort only and inherently racy: there is no way to synchronize with
> + * failing hardware.
> + */
> +static inline bool is_page_hwpoison(struct page *page)
> +{
> +	if (PageHWPoison(page))
> +		return true;
> +	return PageHuge(page) && PageHWPoison(compound_head(page));
> +}
> +
>  extern bool is_free_buddy_page(struct page *page);
>  
>  PAGEFLAG(Isolated, isolated, PF_ANY);
> @@ -1142,7 +1138,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>   */
>  #define PAGE_FLAGS_SECOND						\
>  	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
> -	 1UL << PG_hugetlb		| 1UL << PG_large_rmappable)
> +	 1UL << PG_large_rmappable)
>  
>  #define PAGE_FLAGS_PRIVATE				\
>  	(1UL << PG_private | 1UL << PG_private_2)
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index d801409b33cf..d55e53ac91bd 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -135,6 +135,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
>  #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
>  
>  #define __def_pagetype_names						\
> +	DEF_PAGETYPE_NAME(hugetlb),					\
>  	DEF_PAGETYPE_NAME(offline),					\
>  	DEF_PAGETYPE_NAME(guard),					\
>  	DEF_PAGETYPE_NAME(table),					\
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7e9a766059aa..bdcbb62096cf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1624,7 +1624,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
>  {
>  	lockdep_assert_held(&hugetlb_lock);
>  
> -	folio_clear_hugetlb(folio);
> +	__folio_clear_hugetlb(folio);
>  }
>  
>  /*
> @@ -1711,7 +1711,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
>  		h->surplus_huge_pages_node[nid]++;
>  	}
>  
> -	folio_set_hugetlb(folio);
> +	__folio_set_hugetlb(folio);
>  	folio_change_private(folio, NULL);
>  	/*
>  	 * We have to set hugetlb_vmemmap_optimized again as above
> @@ -2050,7 +2050,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
>  
>  static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio)
>  {
> -	folio_set_hugetlb(folio);
> +	__folio_set_hugetlb(folio);
>  	INIT_LIST_HEAD(&folio->lru);
>  	hugetlb_set_folio_subpool(folio, NULL);
>  	set_hugetlb_cgroup(folio, NULL);
> @@ -2160,22 +2160,6 @@ static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
>  	return __prep_compound_gigantic_folio(folio, order, true);
>  }
>  
> -/*
> - * PageHuge() only returns true for hugetlbfs pages, but not for normal or
> - * transparent huge pages.  See the PageTransHuge() documentation for more
> - * details.
> - */
> -int PageHuge(const struct page *page)
> -{
> -	const struct folio *folio;
> -
> -	if (!PageCompound(page))
> -		return 0;
> -	folio = page_folio(page);
> -	return folio_test_hugetlb(folio);
> -}
> -EXPORT_SYMBOL_GPL(PageHuge);
> -
>  /*
>   * Find and lock address space (mapping) in write mode.
>   *



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
  2024-03-22 10:19   ` Vlastimil Babka
  2024-03-25  7:57   ` Vlastimil Babka
@ 2024-03-25 15:14   ` Matthew Wilcox
  2024-03-25 15:18     ` Matthew Wilcox
  2 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-25 15:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Thu, Mar 21, 2024 at 02:24:43PM +0000, Matthew Wilcox (Oracle) wrote:
> The current folio_test_hugetlb() can be fooled by a concurrent folio split
> into returning true for a folio which has never belonged to hugetlbfs.
> This can't happen if the caller holds a refcount on it, but we have a
> few places (memory-failure, compaction, procfs) which do not and should
> not take a speculative reference.
> 
> Since hugetlb pages do not use individual page mapcounts (they are always
> fully mapped and use the entire_mapcount field to record the number
> of mappings), the PageType field is available now that page_mapcount()
> ignores the value in this field.

Update vmcoreinfo:

diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index f95516cd45bb..41372f5d5c19 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -205,11 +205,10 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_head_mask);
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
-#ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(PG_hugetlb);
+#define PAGE_HUGETLB_MAPCOUNT_VALUE(	(~PG_hugetlb)
+	VMCOREINFO_NUMBER(PAGE_HUGETLB_MAPCOUNT_VALUE);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
-#endif
 
 #ifdef CONFIG_KALLSYMS
 	VMCOREINFO_SYMBOL(kallsyms_names);


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-25 15:14   ` Matthew Wilcox
@ 2024-03-25 15:18     ` Matthew Wilcox
  2024-03-25 15:33       ` Matthew Wilcox
  0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-25 15:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Mon, Mar 25, 2024 at 03:14:53PM +0000, Matthew Wilcox wrote:
> Update vmcoreinfo:

Urgh, a typo slipped in.  Use this instead:

diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index f95516cd45bb..41372f5d5c19 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -205,11 +205,10 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_head_mask);
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
-#ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(PG_hugetlb);
+#define PAGE_HUGETLB_MAPCOUNT_VALUE(	(~PG_hugetlb)
+	VMCOREINFO_NUMBER(PAGE_HUGETLB_MAPCOUNT_VALUE);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
-#endif
 
 #ifdef CONFIG_KALLSYMS
 	VMCOREINFO_SYMBOL(kallsyms_names);


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
  2024-03-22  9:20   ` Miaohe Lin
  2024-03-22 15:09   ` David Hildenbrand
@ 2024-03-25 15:19   ` Matthew Wilcox
  2024-03-31 15:11     ` [LTP] " kernel test robot
  3 siblings, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-25 15:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Thu, Mar 21, 2024 at 02:24:45PM +0000, Matthew Wilcox (Oracle) wrote:
> Reclaim the Slab page flag by using a spare bit in PageType.  We are
> perennially short of page flags for various purposes, and now that
> the original SLAB allocator has been retired, SLUB does not use the
> mapcount/page_type field.  This lets us remove a number of special cases
> for ignoring mapcount on Slab pages.

Update vmcoreinfo:

diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index 23c125c2e243..1d5eadd9dd61 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -198,7 +198,8 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_private);
 	VMCOREINFO_NUMBER(PG_swapcache);
 	VMCOREINFO_NUMBER(PG_swapbacked);
-	VMCOREINFO_NUMBER(PG_slab);
+#define PAGE_SLAB_MAPCOUNT_VALUE	(~PG_slab)
+	VMCOREINFO_NUMBER(PAGE_SLAB_MAPCOUNT_VALUE);
 #ifdef CONFIG_MEMORY_FAILURE
 	VMCOREINFO_NUMBER(PG_hwpoison);
 #endif


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-25 15:18     ` Matthew Wilcox
@ 2024-03-25 15:33       ` Matthew Wilcox
  0 siblings, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-25 15:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Mon, Mar 25, 2024 at 03:18:39PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 25, 2024 at 03:14:53PM +0000, Matthew Wilcox wrote:
> > Update vmcoreinfo:
> 
> Urgh, a typo slipped in.  use this instead:

*sigh*.  Can I go back to bed now please?

diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index f95516cd45bb..23c125c2e243 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -205,11 +205,10 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_head_mask);
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
-#ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(PG_hugetlb);
+#define PAGE_HUGETLB_MAPCOUNT_VALUE	(~PG_hugetlb)
+	VMCOREINFO_NUMBER(PAGE_HUGETLB_MAPCOUNT_VALUE);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
-#endif
 
 #ifdef CONFIG_KALLSYMS
 	VMCOREINFO_SYMBOL(kallsyms_names);


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-25  7:57   ` Vlastimil Babka
@ 2024-03-25 18:48     ` Andrew Morton
  2024-03-25 20:41       ` Matthew Wilcox
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2024-03-25 18:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox (Oracle),
	Luis Chamberlain, linux-mm, David Hildenbrand, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Mon, 25 Mar 2024 08:57:52 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> > The current folio_test_hugetlb() can be fooled by a concurrent folio split
> > into returning true for a folio which has never belonged to hugetlbfs.
> > This can't happen if the caller holds a refcount on it, but we have a
> > few places (memory-failure, compaction, procfs) which do not and should
> > not take a speculative reference.
> 
> In compaction and with CONFIG_DEBUG_VM enabled, the current implementation
> can result in an oops, as reported by Luis. This happens since 9c5ccf2db04b
> ("mm: remove HUGETLB_PAGE_DTOR") effectively added some VM_BUG_ON() checks
> in the PageHuge() testing path.
> 
> > Since hugetlb pages do not use individual page mapcounts (they are always
> > fully mapped and use the entire_mapcount field to record the number
> > of mappings), the PageType field is available now that page_mapcount()
> > ignores the value in this field.
> 
> Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
> Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")
> Cc: <stable@vger.kernel.org>

Thanks.

The patch doesn't work as a standalone thing.

In file included from ./include/linux/mmzone.h:23,
                 from ./include/linux/gfp.h:7,
                 from ./include/linux/slab.h:16,
                 from ./include/linux/crypto.h:17,
                 from arch/x86/kernel/asm-offsets.c:9:
./include/linux/page-flags.h:1021:1: error: return type defaults to 'int' [-Werror=implicit-int]
 1021 | FOLIO_TYPE_OPS(hugetlb, hugetlb)
      | ^~~~~~~~~~~~~~
./include/linux/page-flags.h:1021:1: error: function declaration isn't a prototype [-Werror=strict-prototypes]
./include/linux/page-flags.h: In function 'FOLIO_TYPE_OPS':
./include/linux/page-flags.h:1035:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token

<a million more>

Matthew, could you please redo this patch (and its vmcore fix) and send
as a standalone -stable patch?  It could be that the "Various
significant MM patches" will need a redo afterwards.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-25 18:48     ` Andrew Morton
@ 2024-03-25 20:41       ` Matthew Wilcox
  2024-03-25 20:47         ` Vlastimil Babka
  0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2024-03-25 20:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Luis Chamberlain, linux-mm, David Hildenbrand,
	Miaohe Lin, Muchun Song, Oscar Salvador

On Mon, Mar 25, 2024 at 11:48:13AM -0700, Andrew Morton wrote:
> On Mon, 25 Mar 2024 08:57:52 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> > Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
> > Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")
> > Cc: <stable@vger.kernel.org>
> 
> Thanks.
> 
> The patch doesn't work as a standalone thing.

No, it depends on both
    mm: support page_mapcount() on page_has_type() pages
    mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros

I was assuming both would get backported as dependencies.  If you want a
standalone patch, something like this would do the trick.

> Matthew, could you please redo this patch (and its vmcore fix) and send
> as a standalone -stable patch?  It could be that the "Various
> significant MM patches" will need a redo afterwards.

I'd rather keep the mapcount patch separate for upstream purposes.
I've build-tested against 6.6.22 with allmodconfig and then with
HUGETLB=n (but otherwise allmodconfig).

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..5e15004eab8c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1192,6 +1192,9 @@ static inline int page_mapcount(struct page *page)
 {
 	int mapcount = atomic_read(&page->_mapcount) + 1;
 
+	/* Handle page_has_type() pages */
+	if (mapcount < 0)
+		mapcount = 0;
 	if (unlikely(PageCompound(page)))
 		mapcount += folio_entire_mapcount(page_folio(page));
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5c02720c53a5..4d5f750551c5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -190,7 +190,6 @@ enum pageflags {
 
 	/* At least one page in this folio has the hwpoison flag set */
 	PG_has_hwpoisoned = PG_error,
-	PG_hugetlb = PG_active,
 	PG_large_rmappable = PG_workingset, /* anon or file-backed */
 };
 
@@ -815,29 +814,6 @@ TESTPAGEFLAG_FALSE(LargeRmappable, large_rmappable)
 
 #define PG_head_mask ((1UL << PG_head))
 
-#ifdef CONFIG_HUGETLB_PAGE
-int PageHuge(struct page *page);
-SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-
-/**
- * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
- * @folio: The folio to test.
- *
- * Context: Any context.  Caller should have a reference on the folio to
- * prevent it from being turned into a tail page.
- * Return: True for hugetlbfs folios, false for anon folios or folios
- * belonging to other filesystems.
- */
-static inline bool folio_test_hugetlb(struct folio *folio)
-{
-	return folio_test_large(folio) &&
-		test_bit(PG_hugetlb, folio_flags(folio, 1));
-}
-#else
-TESTPAGEFLAG_FALSE(Huge, hugetlb)
-#endif
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for
@@ -893,18 +869,6 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 	TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
 #endif
 
-/*
- * Check if a page is currently marked HWPoisoned. Note that this check is
- * best effort only and inherently racy: there is no way to synchronize with
- * failing hardware.
- */
-static inline bool is_page_hwpoison(struct page *page)
-{
-	if (PageHWPoison(page))
-		return true;
-	return PageHuge(page) && PageHWPoison(compound_head(page));
-}
-
 /*
  * For pages that are never mapped to userspace (and aren't PageSlab),
  * page_type may be used.  Because it is initialised to -1, we invert the
@@ -921,6 +885,7 @@ static inline bool is_page_hwpoison(struct page *page)
 #define PG_offline	0x00000100
 #define PG_table	0x00000200
 #define PG_guard	0x00000400
+#define PG_hugetlb	0x00000800
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1012,6 +977,37 @@ PAGE_TYPE_OPS(Table, table, pgtable)
  */
 PAGE_TYPE_OPS(Guard, guard, guard)
 
+#ifdef CONFIG_HUGETLB_PAGE
+PAGE_TYPE_OPS(HeadHugeTLB, hugetlb, hugetlb)
+
+/**
+ * PageHuge - Determine if the page belongs to hugetlbfs.
+ * @page: The page to test.
+ *
+ * Context: Any context.
+ * Return: True for hugetlbfs folios, false for anon folios or folios
+ * belonging to other filesystems.
+ */
+static inline bool PageHuge(const struct page *page)
+{
+	return folio_test_hugetlb(page_folio(page));
+}
+#else
+TESTPAGEFLAG_FALSE(Huge, hugetlb)
+#endif
+
+/*
+ * Check if a page is currently marked HWPoisoned. Note that this check is
+ * best effort only and inherently racy: there is no way to synchronize with
+ * failing hardware.
+ */
+static inline bool is_page_hwpoison(struct page *page)
+{
+	if (PageHWPoison(page))
+		return true;
+	return PageHuge(page) && PageHWPoison(compound_head(page));
+}
+
 extern bool is_free_buddy_page(struct page *page);
 
 PAGEFLAG(Isolated, isolated, PF_ANY);
@@ -1078,7 +1074,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
  */
 #define PAGE_FLAGS_SECOND						\
 	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
-	 1UL << PG_hugetlb		| 1UL << PG_large_rmappable)
+	 1UL << PG_large_rmappable)
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1UL << PG_private | 1UL << PG_private_2)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 1478b9dd05fa..e010618f9326 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -135,6 +135,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
 #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
 
 #define __def_pagetype_names						\
+	DEF_PAGETYPE_NAME(hugetlb),					\
 	DEF_PAGETYPE_NAME(offline),					\
 	DEF_PAGETYPE_NAME(guard),					\
 	DEF_PAGETYPE_NAME(table),					\
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 2f675ef045d4..a2face7fbef8 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -675,8 +675,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_NUMBER(PG_head_mask);
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
-#ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(PG_hugetlb);
+#define PAGE_HUGETLB_MAPCOUNT_VALUE	(~PG_hugetlb)
+	VMCOREINFO_NUMBER(PAGE_HUGETLB_MAPCOUNT_VALUE);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
-#endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5e6c4d367d33..30b713d330ca 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1630,7 +1630,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
 {
 	lockdep_assert_held(&hugetlb_lock);
 
-	folio_clear_hugetlb(folio);
+	__folio_clear_hugetlb(folio);
 }
 
 /*
@@ -1717,7 +1717,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 		h->surplus_huge_pages_node[nid]++;
 	}
 
-	folio_set_hugetlb(folio);
+	__folio_set_hugetlb(folio);
 	folio_change_private(folio, NULL);
 	/*
 	 * We have to set hugetlb_vmemmap_optimized again as above
@@ -1971,7 +1971,7 @@ static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	hugetlb_vmemmap_optimize(h, &folio->page);
 	INIT_LIST_HEAD(&folio->lru);
-	folio_set_hugetlb(folio);
+	__folio_set_hugetlb(folio);
 	hugetlb_set_folio_subpool(folio, NULL);
 	set_hugetlb_cgroup(folio, NULL);
 	set_hugetlb_cgroup_rsvd(folio, NULL);
@@ -2074,22 +2074,6 @@ static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
 	return __prep_compound_gigantic_folio(folio, order, true);
 }
 
-/*
- * PageHuge() only returns true for hugetlbfs pages, but not for normal or
- * transparent huge pages.  See the PageTransHuge() documentation for more
- * details.
- */
-int PageHuge(struct page *page)
-{
-	struct folio *folio;
-
-	if (!PageCompound(page))
-		return 0;
-	folio = page_folio(page);
-	return folio_test_hugetlb(folio);
-}
-EXPORT_SYMBOL_GPL(PageHuge);
-
 /*
  * Find and lock address space (mapping) in write mode.
  *
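
To make the page_type mechanics above concrete: the field overlays
_mapcount and starts out as -1 (all bits set), so setting a type *clears*
a bit and the word then reads as a large negative mapcount, which is why
the mm.h hunk above clamps negative values to zero.  Below is a minimal
userspace sketch of the PageType() test; the PAGE_TYPE_BASE value is an
assumption recalled from page-flags.h of this era, not part of the patch:

#include <stdio.h>

#define PAGE_TYPE_BASE	0xf0000000u	/* assumed, see page-flags.h */
#define PG_hugetlb	0x00000800u

/* Adapted from the PageType() macro in the diff above, taking the raw
 * page_type word instead of a struct page. */
#define PageType(page_type, flag) \
	(((page_type) & (PAGE_TYPE_BASE | (flag))) == PAGE_TYPE_BASE)

int main(void)
{
	unsigned int page_type = ~0u;	/* freshly allocated: no type set */

	printf("%d\n", PageType(page_type, PG_hugetlb));	/* prints 0 */
	page_type &= ~PG_hugetlb;	/* what __folio_set_hugetlb() does */
	printf("%d\n", PageType(page_type, PG_hugetlb));	/* prints 1 */
	return 0;
}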


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType
  2024-03-25 20:41       ` Matthew Wilcox
@ 2024-03-25 20:47         ` Vlastimil Babka
  0 siblings, 0 replies; 45+ messages in thread
From: Vlastimil Babka @ 2024-03-25 20:47 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton
  Cc: Luis Chamberlain, linux-mm, David Hildenbrand, Miaohe Lin,
	Muchun Song, Oscar Salvador

On 3/25/24 9:41 PM, Matthew Wilcox wrote:
> On Mon, Mar 25, 2024 at 11:48:13AM -0700, Andrew Morton wrote:
>> On Mon, 25 Mar 2024 08:57:52 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
>> > Reported-by: Luis Chamberlain <mcgrof@kernel.org>
>> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
>> > Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")
>> > Cc: <stable@vger.kernel.org>
>> 
>> Thanks.
>> 
>> The patch doesn't work as a standalone thing.
> 
> No, it depends on both
>     mm: support page_mapcount() on page_has_type() pages
>     mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros

Stable maintainers are usually fine with dependency patches, and these two
are not hugely intrusive or risky, are they?  We should just order and mark
them in a way that makes it all obvious.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
@ 2024-03-31 15:11     ` kernel test robot
  2024-03-22 15:09   ` David Hildenbrand
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: kernel test robot @ 2024-03-31 15:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, linux-trace-kernel, ltp,
	Andrew Morton, Matthew Wilcox (Oracle),
	David Hildenbrand, Vlastimil Babka, Miaohe Lin, Muchun Song,
	Oscar Salvador, oliver.sang



Hello,

kernel test robot noticed "UBSAN:shift-out-of-bounds_in_fs/proc/page.c" on:

commit: 30e5296811312a13938b83956a55839ac1e3aa40 ("[PATCH 7/9] mm: Free up PG_slab")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Always-initialise-folio-_deferred_list/20240321-222800
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 23956900041d968f9ad0f30db6dede4daccd7aa9
patch link: https://lore.kernel.org/all/20240321142448.1645400-8-willy@infradead.org/
patch subject: [PATCH 7/9] mm: Free up PG_slab

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240323
with the following parameters:

	disk: 1HDD
	fs: ext4
	test: fs-00



compiler: gcc-12
test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202403312344.c0d273ab-oliver.sang@intel.com


kern  :warn  : [  528.627387] ------------[ cut here ]------------
kern  :err   : [  528.627589] UBSAN: shift-out-of-bounds in fs/proc/page.c:107:18
kern  :err   : [  528.627884] shift exponent 4096 is too large for 64-bit type 'long long unsigned int'
kern  :warn  : [  528.628200] CPU: 0 PID: 4703 Comm: proc01 Tainted: G S                 6.8.0-11774-g30e529681131 #1
kern  :warn  : [  528.628446] Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
kern  :warn  : [  528.628659] Call Trace:
kern  :warn  : [  528.628814]  <TASK>
kern :warn : [  528.628960] dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1)) 
kern :warn : [  528.629134] __ubsan_handle_shift_out_of_bounds (lib/ubsan.c:218 lib/ubsan.c:454) 
kern :warn : [  528.629360] stable_page_flags.part.0.cold (include/linux/page-flags.h:284 fs/proc/page.c:184) 
kern :warn : [  528.629506] kpageflags_read (fs/proc/page.c:238 fs/proc/page.c:250) 
kern :warn : [  528.629623] vfs_read (fs/read_write.c:474) 
kern :warn : [  528.629737] ? do_sys_openat2 (fs/open.c:1415) 
kern :warn : [  528.629898] ? kmem_cache_free (mm/slub.c:4280 mm/slub.c:4344) 
kern :warn : [  528.630063] ? __pfx_vfs_read (fs/read_write.c:457) 
kern :warn : [  528.630225] ? do_sys_openat2 (fs/open.c:1415) 
kern :warn : [  528.630388] ? __pfx_do_sys_openat2 (fs/open.c:1392) 
kern :warn : [  528.630552] ? __do_sys_newfstatat (fs/stat.c:464) 
kern :warn : [  528.630717] ? __fget_light (include/linux/atomic/atomic-arch-fallback.h:479 include/linux/atomic/atomic-instrumented.h:50 fs/file.c:1145) 
kern :warn : [  528.630888] ksys_read (fs/read_write.c:619) 
kern :warn : [  528.631051] ? __pfx_ksys_read (fs/read_write.c:609) 
kern :warn : [  528.631216] ? kmem_cache_free (mm/slub.c:4280 mm/slub.c:4344) 
kern :warn : [  528.631415] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) 
kern :warn : [  528.631555] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) 
kern  :warn  : [  528.631756] RIP: 0033:0x7f90bf2ba19d
kern :warn : [ 528.631913] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 54 0a 00 e8 49 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 24 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
All code
========
   0:	31 c0                	xor    %eax,%eax
   2:	e9 c6 fe ff ff       	jmpq   0xfffffffffffffecd
   7:	50                   	push   %rax
   8:	48 8d 3d 66 54 0a 00 	lea    0xa5466(%rip),%rdi        # 0xa5475
   f:	e8 49 ff 01 00       	callq  0x1ff5d
  14:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  1b:	00 00 
  1d:	80 3d 41 24 0e 00 00 	cmpb   $0x0,0xe2441(%rip)        # 0xe2465
  24:	74 17                	je     0x3d
  26:	31 c0                	xor    %eax,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 5b                	ja     0x8d
  32:	c3                   	retq   
  33:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  3a:	00 00 00 
  3d:	48                   	rex.W
  3e:	83                   	.byte 0x83
  3f:	ec                   	in     (%dx),%al

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 5b                	ja     0x63
   8:	c3                   	retq   
   9:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  10:	00 00 00 
  13:	48                   	rex.W
  14:	83                   	.byte 0x83
  15:	ec                   	in     (%dx),%al
kern  :warn  : [  528.632309] RSP: 002b:00007ffe2eb3c008 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
kern  :warn  : [  528.632540] RAX: ffffffffffffffda RBX: 00007ffe2eb3d1b0 RCX: 00007f90bf2ba19d
kern  :warn  : [  528.632757] RDX: 0000000000000400 RSI: 000055e284e68c40 RDI: 0000000000000005
kern  :warn  : [  528.632960] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000073
kern  :warn  : [  528.633156] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
kern  :warn  : [  528.633399] R13: 000055e284e68c40 R14: 000055e2a975f8cb R15: 00007ffe2eb3d1b0
kern  :warn  : [  528.633645]  </TASK>
kern  :warn  : [  528.633813] ---[ end trace ]---



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240331/202403312344.c0d273ab-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] mm: Always initialise folio->_deferred_list
  2024-03-22 13:00     ` Matthew Wilcox
@ 2024-04-01  3:14       ` Miaohe Lin
  0 siblings, 0 replies; 45+ messages in thread
From: Miaohe Lin @ 2024-04-01  3:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, David Hildenbrand, Vlastimil Babka, Muchun Song,
	Oscar Salvador, Andrew Morton

On 2024/3/22 21:00, Matthew Wilcox wrote:
> On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
>>> +++ b/mm/hugetlb.c
>>> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>>>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>>>  		free_gigantic_folio(folio, huge_page_order(h));
>>>  	} else {
>>> -		__free_pages(&folio->page, huge_page_order(h));
>>> +		INIT_LIST_HEAD(&folio->_deferred_list);
>>
>> Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

Sorry for the late reply; I was on off-the-job training last week.  It was really tiring. :(

> 
> Maybe?  Something like
> 		/* We reused this space for our own purposes */

This one looks good to me.

> 
>>> +		folio_put(folio);
>>
>> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?
> 
> There's only one left, and indeed it can!
> 
> I'll drop this into my tree and send it as a proper patch later.
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 333f6278ef63..43cc7e6bc374 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  		nodemask_t *node_alloc_noretry)
>  {
>  	int order = huge_page_order(h);
> -	struct page *page;
> +	struct folio *folio;
>  	bool alloc_try_hard = true;
>  	bool retry = true;
>  
>  	/*
> -	 * By default we always try hard to allocate the page with
> -	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating pages in
> +	 * By default we always try hard to allocate the folio with
> +	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating folios in
>  	 * a loop (to adjust global huge page counts) and previous allocation
>  	 * failed, do not continue to try hard on the same node.  Use the
>  	 * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>  retry:
> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
>  
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);
>  		if (retry) {	/* retry once */
>  			retry = false;
>  			goto retry;
>  		}
>  		/* WOW!  twice in a row. */
> -		pr_warn("HugeTLB head page unexpected inflated ref count\n");
> -		page = NULL;
> +		pr_warn("HugeTLB unexpected inflated folio ref count\n");
> +		folio = NULL;
>  	}
>  
>  	/*
> -	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> -	 * indicates an overall state change.  Clear bit so that we resume
> -	 * normal 'try hard' allocations.
> +	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> +	 * folio this indicates an overall state change.  Clear bit so
> +	 * that we resume normal 'try hard' allocations.
>  	 */
> -	if (node_alloc_noretry && page && !alloc_try_hard)
> +	if (node_alloc_noretry && folio && !alloc_try_hard)
>  		node_clear(nid, *node_alloc_noretry);
>  
>  	/*
> -	 * If we tried hard to get a page but failed, set bit so that
> +	 * If we tried hard to get a folio but failed, set bit so that
>  	 * subsequent attempts will not try as hard until there is an
>  	 * overall state change.
>  	 */
> -	if (node_alloc_noretry && !page && alloc_try_hard)
> +	if (node_alloc_noretry && !folio && alloc_try_hard)
>  		node_set(nid, *node_alloc_noretry);
>  
> -	if (!page) {
> +	if (!folio) {
>  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>  		return NULL;
>  	}
>  
>  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> -	return page_folio(page);
> +	return folio;
>  }
>  
>  static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
> .

This also looks good to me. Thanks for your work.
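
As a footnote to the INIT_LIST_HEAD question above: hugetlb overlays its
own fields on the words that normally hold folio->_deferred_list, so the
free path in __update_and_free_hugetlb_folio() ends up looking roughly
like this (a sketch combining the quoted hunks with the suggested
comment, not a verbatim excerpt):

	} else {
		/*
		 * We reused this space for our own purposes, so restore
		 * an empty list head before handing the folio back to
		 * the page allocator.
		 */
		INIT_LIST_HEAD(&folio->_deferred_list);
		folio_put(folio);	/* was __free_pages(&folio->page, order) */
	}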




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-22 10:41     ` Vlastimil Babka
@ 2024-04-01  3:38       ` Miaohe Lin
  0 siblings, 0 replies; 45+ messages in thread
From: Miaohe Lin @ 2024-04-01  3:38 UTC (permalink / raw)
  To: Vlastimil Babka, Matthew Wilcox (Oracle), Naoya Horiguchi
  Cc: linux-mm, David Hildenbrand, Muchun Song, Oscar Salvador, Andrew Morton

On 2024/3/22 18:41, Vlastimil Babka wrote:
> On 3/22/24 10:20, Miaohe Lin wrote:
>> On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
>>> Reclaim the Slab page flag by using a spare bit in PageType.  We are
>>> perennially short of page flags for various purposes, and now that
>>> the original SLAB allocator has been retired, SLUB does not use the
>>> mapcount/page_type field.  This lets us remove a number of special cases
>>> for ignoring mapcount on Slab pages.
>>>
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
>>> ---
>>>  include/linux/page-flags.h     | 21 +++++++++++++++++----
>>>  include/trace/events/mmflags.h |  2 +-
>>>  mm/memory-failure.c            |  9 ---------
>>>  mm/slab.h                      |  2 +-
>>>  4 files changed, 19 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index 94eb8a11a321..73e0b17c7728 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -109,7 +109,6 @@ enum pageflags {
>>>  	PG_active,
>>>  	PG_workingset,
>>>  	PG_error,
>>> -	PG_slab,
>>>  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
>>>  	PG_arch_1,
>>>  	PG_reserved,
>>> @@ -524,7 +523,6 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
>>>  	TESTCLEARFLAG(Active, active, PF_HEAD)
>>>  PAGEFLAG(Workingset, workingset, PF_HEAD)
>>>  	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
>>> -__PAGEFLAG(Slab, slab, PF_NO_TAIL)
>>>  PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
>>>  
>>>  /* Xen */
>>> @@ -931,7 +929,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>>  #endif
>>>  
>>>  /*
>>> - * For pages that are never mapped to userspace (and aren't PageSlab),
>>> + * For pages that are never mapped to userspace,
>>>   * page_type may be used.  Because it is initialised to -1, we invert the
>>>   * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
>>>   * __ClearPageFoo *sets* the bit used for PageFoo.  We reserve a few high and
>>> @@ -947,6 +945,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>>>  #define PG_table	0x00000200
>>>  #define PG_guard	0x00000400
>>>  #define PG_hugetlb	0x00000800
>>> +#define PG_slab		0x00001000
>>>  
>>>  #define PageType(page, flag)						\
>>>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>>> @@ -1041,6 +1040,20 @@ PAGE_TYPE_OPS(Table, table, pgtable)
>>>   */
>>>  PAGE_TYPE_OPS(Guard, guard, guard)
>>>  
>>> +FOLIO_TYPE_OPS(slab, slab)
>>> +
>>> +/**
>>> + * PageSlab - Determine if the page belongs to the slab allocator
>>> + * @page: The page to test.
>>> + *
>>> + * Context: Any context.
>>> + * Return: True for slab pages, false for any other kind of page.
>>> + */
>>> +static inline bool PageSlab(const struct page *page)
>>> +{
>>> +	return folio_test_slab(page_folio(page));
>>> +}
>>> +
>>>  #ifdef CONFIG_HUGETLB_PAGE
>>>  FOLIO_TYPE_OPS(hugetlb, hugetlb)
>>>  #else
>>> @@ -1121,7 +1134,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>>>  	(1UL << PG_lru		| 1UL << PG_locked	|	\
>>>  	 1UL << PG_private	| 1UL << PG_private_2	|	\
>>>  	 1UL << PG_writeback	| 1UL << PG_reserved	|	\
>>> -	 1UL << PG_slab		| 1UL << PG_active 	|	\
>>> +	 1UL << PG_active 	|				\
>>>  	 1UL << PG_unevictable	| __PG_MLOCKED | LRU_GEN_MASK)
>>>  
>>>  /*
>>> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
>>> index d55e53ac91bd..e46d6e82765e 100644
>>> --- a/include/trace/events/mmflags.h
>>> +++ b/include/trace/events/mmflags.h
>>> @@ -107,7 +107,6 @@
>>>  	DEF_PAGEFLAG_NAME(lru),						\
>>>  	DEF_PAGEFLAG_NAME(active),					\
>>>  	DEF_PAGEFLAG_NAME(workingset),					\
>>> -	DEF_PAGEFLAG_NAME(slab),					\
>>>  	DEF_PAGEFLAG_NAME(owner_priv_1),				\
>>>  	DEF_PAGEFLAG_NAME(arch_1),					\
>>>  	DEF_PAGEFLAG_NAME(reserved),					\
>>> @@ -135,6 +134,7 @@ IF_HAVE_PG_ARCH_X(arch_3)
>>>  #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) }
>>>  
>>>  #define __def_pagetype_names						\
>>> +	DEF_PAGETYPE_NAME(slab),					\
>>>  	DEF_PAGETYPE_NAME(hugetlb),					\
>>>  	DEF_PAGETYPE_NAME(offline),					\
>>>  	DEF_PAGETYPE_NAME(guard),					\
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 9349948f1abf..1cb41ba7870c 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1239,7 +1239,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>>>  #define mlock		(1UL << PG_mlocked)
>>>  #define lru		(1UL << PG_lru)
>>>  #define head		(1UL << PG_head)
>>> -#define slab		(1UL << PG_slab)
>>>  #define reserved	(1UL << PG_reserved)
>>>  
>>>  static struct page_state error_states[] = {
>>> @@ -1249,13 +1248,6 @@ static struct page_state error_states[] = {
>>>  	 * PG_buddy pages only make a small fraction of all free pages.
>>>  	 */
>>>  
>>> -	/*
>>> -	 * Could in theory check if slab page is free or if we can drop
>>> -	 * currently unused objects without touching them. But just
>>> -	 * treat it as standard kernel for now.
>>> -	 */
>>> -	{ slab,		slab,		MF_MSG_SLAB,	me_kernel },
>>
>> Would it be better to leave the above slab case here to catch possible unhandled obscure races with
>> slab?  Though it looks like a slab page shouldn't reach here.
> 

Sorry for the late reply; I was on intensive off-the-job training last week.

> The code would need to handle page types as it's no longer a page flag.  I
> guess that's your decision?  If it's not necessary, then I guess MF_MSG_SLAB
> itself could also be removed, along with a bunch of the code referencing it.

It might be overkill to add code to handle page types just for slab.  We only
intend to handle huge pages, LRU pages and free buddy pages anyway.  As the code
changes, I suspect MF_MSG_SLAB and MF_MSG_KERNEL are obsolete now, but it might
be better to leave them alone.  There might be some unhandled obscure races, and
some buggy kernel code might lead to something unexpected, e.g. slab pages with
LRU flags?

Thanks.

> 
>> Thanks.
>>
> 
> .
> 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 7/9] mm: Free up PG_slab
  2024-03-31 15:11     ` [LTP] " kernel test robot
@ 2024-04-02  5:26       ` Matthew Wilcox
  -1 siblings, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 2024-04-02  5:26 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, linux-trace-kernel, ltp,
	Andrew Morton, David Hildenbrand, Vlastimil Babka, Miaohe Lin,
	Muchun Song, Oscar Salvador

On Sun, Mar 31, 2024 at 11:11:10PM +0800, kernel test robot wrote:
> kernel test robot noticed "UBSAN:shift-out-of-bounds_in_fs/proc/page.c" on:
> 
> commit: 30e5296811312a13938b83956a55839ac1e3aa40 ("[PATCH 7/9] mm: Free up PG_slab")

Quite right.  Spotted another one while I was at it.  Not able to test
right now, but this should do the trick:

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 5bc82828c6aa..55b01535eb22 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -175,6 +175,8 @@ u64 stable_page_flags(const struct page *page)
 		u |= 1 << KPF_OFFLINE;
 	if (PageTable(page))
 		u |= 1 << KPF_PGTABLE;
+	if (folio_test_slab(folio))
+		u |= 1 << KPF_SLAB;
 
 #if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 	u |= kpf_copy_bit(k, KPF_IDLE,          PG_idle);
@@ -184,7 +186,6 @@ u64 stable_page_flags(const struct page *page)
 #endif
 
 	u |= kpf_copy_bit(k, KPF_LOCKED,	PG_locked);
-	u |= kpf_copy_bit(k, KPF_SLAB,		PG_slab);
 	u |= kpf_copy_bit(k, KPF_ERROR,		PG_error);
 	u |= kpf_copy_bit(k, KPF_DIRTY,		PG_dirty);
 	u |= kpf_copy_bit(k, KPF_UPTODATE,	PG_uptodate);
diff --git a/tools/cgroup/memcg_slabinfo.py b/tools/cgroup/memcg_slabinfo.py
index 1d3a90d93fe2..270c28a0d098 100644
--- a/tools/cgroup/memcg_slabinfo.py
+++ b/tools/cgroup/memcg_slabinfo.py
@@ -146,12 +146,11 @@ def detect_kernel_config():
 
 
 def for_each_slab(prog):
-    PGSlab = 1 << prog.constant('PG_slab')
-    PGHead = 1 << prog.constant('PG_head')
+    PGSlab = ~prog.constant('PG_slab')
 
     for page in for_each_page(prog):
         try:
-            if page.flags.value_() & PGSlab:
+            if page.page_type.value_() == PGSlab:
                 yield cast('struct slab *', page)
         except FaultError:
             pass
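
The root cause, spelled out: stable_page_flags() was still treating
PG_slab as a flag bit number and shifting by it, but after this series
PG_slab is a page_type mask, 0x00001000, so the shift exponent becomes
4096, exactly what UBSAN reported.  A short standalone illustration
(userspace C, values taken from the patches in this thread):

#include <stdio.h>

#define PG_slab	0x00001000u	/* now a page_type mask, not a bit number */

int main(void)
{
	/* The old kpf_copy_bit(k, KPF_SLAB, PG_slab) shifted by PG_slab;
	 * with PG_slab == 4096 that shift is undefined behaviour, hence
	 * the switch to folio_test_slab() in the hunk above. */
	printf("shift exponent would be %u\n", PG_slab);

	/* The drgn script can compare page_type for equality because a
	 * slab page sets no other type bits: it is exactly ~PG_slab. */
	printf("slab page_type is 0x%08x\n", ~PG_slab);
	return 0;
}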

^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2024-04-02  5:26 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-21 14:24 [PATCH 0/9] Various significant MM patches Matthew Wilcox (Oracle)
2024-03-21 14:24 ` [PATCH 1/9] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
2024-03-22  8:23   ` Miaohe Lin
2024-03-22 13:00     ` Matthew Wilcox
2024-04-01  3:14       ` Miaohe Lin
2024-03-22  9:30   ` Vlastimil Babka
2024-03-22 12:49   ` David Hildenbrand
2024-03-21 14:24 ` [PATCH 2/9] mm: Create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros Matthew Wilcox (Oracle)
2024-03-22  9:33   ` Vlastimil Babka
2024-03-21 14:24 ` [PATCH 3/9] mm: Remove folio_prep_large_rmappable() Matthew Wilcox (Oracle)
2024-03-22  9:37   ` Vlastimil Babka
2024-03-22 12:51   ` David Hildenbrand
2024-03-21 14:24 ` [PATCH 4/9] mm: Support page_mapcount() on page_has_type() pages Matthew Wilcox (Oracle)
2024-03-22  9:43   ` Vlastimil Babka
2024-03-22 12:43     ` Matthew Wilcox
2024-03-22 15:04   ` David Hildenbrand
2024-03-21 14:24 ` [PATCH 5/9] mm: Turn folio_test_hugetlb into a PageType Matthew Wilcox (Oracle)
2024-03-22 10:19   ` Vlastimil Babka
2024-03-22 15:06     ` David Hildenbrand
2024-03-23  3:24     ` Matthew Wilcox
2024-03-25  7:57   ` Vlastimil Babka
2024-03-25 18:48     ` Andrew Morton
2024-03-25 20:41       ` Matthew Wilcox
2024-03-25 20:47         ` Vlastimil Babka
2024-03-25 15:14   ` Matthew Wilcox
2024-03-25 15:18     ` Matthew Wilcox
2024-03-25 15:33       ` Matthew Wilcox
2024-03-21 14:24 ` [PATCH 6/9] mm: Remove a call to compound_head() from is_page_hwpoison() Matthew Wilcox (Oracle)
2024-03-22 10:28   ` Vlastimil Babka
2024-03-21 14:24 ` [PATCH 7/9] mm: Free up PG_slab Matthew Wilcox (Oracle)
2024-03-22  9:20   ` Miaohe Lin
2024-03-22 10:41     ` Vlastimil Babka
2024-04-01  3:38       ` Miaohe Lin
2024-03-22 15:09   ` David Hildenbrand
2024-03-25 15:19   ` Matthew Wilcox
2024-03-31 15:11   ` kernel test robot
2024-03-31 15:11     ` [LTP] " kernel test robot
2024-04-02  5:26     ` Matthew Wilcox
2024-04-02  5:26       ` [LTP] " Matthew Wilcox
2024-03-21 14:24 ` [PATCH 8/9] mm: Improve dumping of mapcount and page_type Matthew Wilcox (Oracle)
2024-03-22 11:05   ` Vlastimil Babka
2024-03-22 15:10   ` David Hildenbrand
2024-03-21 14:24 ` [PATCH 9/9] hugetlb: Remove mention of destructors Matthew Wilcox (Oracle)
2024-03-22 11:08   ` Vlastimil Babka
2024-03-22 15:13   ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.