* [RFC PATCH 0/3] create hugetlb flags to consolidate state
@ 2021-01-11 21:01 Mike Kravetz
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-11 21:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Hocko, Naoya Horiguchi, Muchun Song, David Hildenbrand,
	Andrew Morton, Mike Kravetz

While discussing a series of hugetlb fixes in [1], it became evident
that the hugetlb specific page state information is stored in a somewhat
haphazard manner.  Code dealing with state information would be easier
to read, understand and maintain if this information was stored in a
consistent manner.

This RFC series uses page.private of the hugetlb head page for storing a
set of hugetlb specific page flags.  Routines to manipulate the flags
are copied from normal page flag manipulation code and use atomic
operations.  This is likely overkill for the uses in hugetlb code, and
can be changed after additional auditing of code.

For now, only 3 state flags are added as part of this series.  However,
the suggested fix in [2] could use another flag.  In addition, a flag
could be used to track whether or not a huge page has been cleared to
optimize code paths if the init_on_alloc security feature is enabled.
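
The scheme described above can be modeled outside the kernel. Below is a hypothetical userspace sketch (all names invented; GCC `__atomic` builtins standing in for the kernel's set_bit/test_bit/clear_bit) of how a single private word doubles as a small set of atomic flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct page: only the field this sketch needs. */
struct fake_page {
	unsigned long private;	/* flag bits live here, as in hpage.private */
};

enum fake_hpage_flags {
	FAKE_HPAGE_RestoreReserve = 0,
	FAKE_HPAGE_Migratable,
};

/* Atomic bit helpers mirroring the semantics of the kernel's
 * set_bit()/clear_bit()/test_bit() on a private word. */
static inline void fake_set_bit(int nr, unsigned long *word)
{
	__atomic_fetch_or(word, 1UL << nr, __ATOMIC_RELAXED);
}

static inline void fake_clear_bit(int nr, unsigned long *word)
{
	__atomic_fetch_and(word, ~(1UL << nr), __ATOMIC_RELAXED);
}

static inline bool fake_test_bit(int nr, const unsigned long *word)
{
	return (__atomic_load_n(word, __ATOMIC_RELAXED) >> nr) & 1;
}
```

As the cover letter notes, fully atomic operations are likely overkill for hugetlb's uses, but they make the model safe by default until the call sites are audited.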

[1] https://lore.kernel.org/r/20210106084739.63318-1-songmuchun@bytedance.com
[2] https://lore.kernel.org/r/20210110124017.86750-4-songmuchun@bytedance.com

Mike Kravetz (3):
  hugetlb: use page.private for hugetlb specific page flags
  hugetlb: convert page_huge_active() to HPageMigratable flag
  hugetlb: convert PageHugeTemporary() to HPageTempSurplus

 fs/hugetlbfs/inode.c       |  11 +--
 include/linux/hugetlb.h    |  19 ++++
 include/linux/page-flags.h |   6 --
 mm/hugetlb.c               | 178 ++++++++++++++++++++-----------------
 mm/memory_hotplug.c        |   2 +-
 5 files changed, 121 insertions(+), 95 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-11 21:01 [RFC PATCH 0/3] create hugetlb flags to consolidate state Mike Kravetz
@ 2021-01-11 21:01 ` Mike Kravetz
  2021-01-12  3:24     ` Muchun Song
                     ` (3 more replies)
  2021-01-11 21:01 ` [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag Mike Kravetz
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-11 21:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Hocko, Naoya Horiguchi, Muchun Song, David Hildenbrand,
	Andrew Morton, Mike Kravetz

As hugetlbfs evolved, state information about hugetlb pages was added.
One 'convenient' way of doing this was to use available fields in tail
pages.  Over time, it has become difficult to know the meaning or contents
of fields simply by looking at a small bit of code.  Sometimes, the
naming is just confusing.  For example: The PagePrivate flag indicates
a huge page reservation was consumed and needs to be restored if an error
is encountered and the page is freed before it is instantiated.  The
page.private field contains the pointer to a subpool if the page is
associated with one.

In an effort to make the code more readable, use page.private to contain
hugetlb specific flags.  These flags will have test, set and clear functions
similar to those used for 'normal' page flags.  More importantly, the
flags will have names which actually reflect their purpose.

In this patch,
- Create infrastructure for huge page flag functions
- Move subpool pointer to page[1].private to make way for flags
  Create routines with meaningful names to modify subpool field
- Use new HPageRestoreReserve flag instead of PagePrivate

Conversion of other state information will happen in subsequent patches.
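
The subpool relocation can be pictured with a toy model: a compound huge page is an array of page structs, the first tail page's private field carries the subpool pointer, and the head's private field is freed up for the new flag bits. A hypothetical userspace sketch (names invented):

```c
#include <assert.h>
#include <stddef.h>

struct fake_subpool { long count; };

struct fake_page {
	unsigned long private;
};

/* The subpool pointer lives in the first tail page, hpage[1].private,
 * leaving hpage[0].private free for hugetlb-specific flag bits. */
static struct fake_subpool *fake_hpage_spool(struct fake_page *hpage)
{
	return (struct fake_subpool *)hpage[1].private;
}

static void fake_set_hpage_spool(struct fake_page *hpage,
				 struct fake_subpool *spool)
{
	hpage[1].private = (unsigned long)spool;
}
```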

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c    | 11 +++---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 80 +++++++++++++++++++++++++++++++----------
 3 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..8bfb80bc6e96 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -966,14 +966,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 		return rc;
 
 	/*
-	 * page_private is subpool pointer in hugetlb pages.  Transfer to
-	 * new page.  PagePrivate is not associated with page_private for
-	 * hugetlb pages and can not be set here as only page_huge_active
-	 * pages can be migrated.
+	 * Transfer any subpool pointer to the new page.
 	 */
-	if (page_private(page)) {
-		set_page_private(newpage, page_private(page));
-		set_page_private(page, 0);
+	if (hpage_spool(page)) {
+		set_hpage_spool(newpage, hpage_spool(page));
+		set_hpage_spool(page, 0);
 	}
 
 	if (mode != MIGRATE_SYNC_NO_COPY)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..4f0159f1b9cc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -104,6 +104,8 @@ extern int hugetlb_max_hstate __read_mostly;
 struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 						long min_hpages);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
+struct hugepage_subpool *hpage_spool(struct page *hpage);
+void set_hpage_spool(struct page *hpage, struct hugepage_subpool *spool);
 
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3b38ea958e95..3eb3b102c589 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -52,6 +52,49 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
 #endif
 static unsigned long hugetlb_cma_size __initdata;
 
+/*
+ * hugetlb specific state flags located in hpage.private
+ */
+enum htlb_page_flags {
+	HPAGE_RestoreReserve = 0,
+};
+
+/*
+ * Macros to create function definitions for hpage flags
+ */
+#define TESTHPAGEFLAG(flname)					\
+static inline int HPage##flname(struct page *page)		\
+	{ return test_bit(HPAGE_##flname, &(page->private)); }
+
+#define SETHPAGEFLAG(flname)					\
+static inline void SetHPage##flname(struct page *page)		\
+	{ set_bit(HPAGE_##flname, &(page->private)); }
+
+#define CLEARHPAGEFLAG(flname)					\
+static inline void ClearHPage##flname(struct page *page)	\
+	{ clear_bit(HPAGE_##flname, &(page->private)); }
+
+#define HPAGEFLAG(flname)					\
+	TESTHPAGEFLAG(flname)					\
+	SETHPAGEFLAG(flname)					\
+	CLEARHPAGEFLAG(flname)
+
+HPAGEFLAG(RestoreReserve)
+
+/*
+ * hugetlb page subpool pointer located in hpage[1].private
+ */
+struct hugepage_subpool *hpage_spool(struct page *hpage)
+{
+	return (struct hugepage_subpool *)(hpage+1)->private;
+}
+
+void set_hpage_spool(struct page *hpage,
+		struct hugepage_subpool *spool)
+{
+	set_page_private(hpage+1, (unsigned long)spool);
+}
+
 /*
  * Minimum page order among possible hugepage sizes, set to a proper value
  * at boot time.
@@ -1116,7 +1159,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
 	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
-		SetPagePrivate(page);
+		SetHPageRestoreReserve(page);
 		h->resv_huge_pages--;
 	}
 
@@ -1391,20 +1434,19 @@ static void __free_huge_page(struct page *page)
 	 */
 	struct hstate *h = page_hstate(page);
 	int nid = page_to_nid(page);
-	struct hugepage_subpool *spool =
-		(struct hugepage_subpool *)page_private(page);
+	struct hugepage_subpool *spool = hpage_spool(page);
 	bool restore_reserve;
 
 	VM_BUG_ON_PAGE(page_count(page), page);
 	VM_BUG_ON_PAGE(page_mapcount(page), page);
 
-	set_page_private(page, 0);
+	set_hpage_spool(page, 0);
 	page->mapping = NULL;
-	restore_reserve = PagePrivate(page);
-	ClearPagePrivate(page);
+	restore_reserve = HPageRestoreReserve(page);
+	ClearHPageRestoreReserve(page);
 
 	/*
-	 * If PagePrivate() was set on page, page allocation consumed a
+	 * If RestoreReserve was set on page, page allocation consumed a
 	 * reservation.  If the page was associated with a subpool, there
 	 * would have been a page reserved in the subpool before allocation
 	 * via hugepage_subpool_get_pages().  Since we are 'restoring' the
@@ -2213,9 +2255,9 @@ static long vma_add_reservation(struct hstate *h,
  * This routine is called to restore a reservation on error paths.  In the
  * specific error paths, a huge page was allocated (via alloc_huge_page)
  * and is about to be freed.  If a reservation for the page existed,
- * alloc_huge_page would have consumed the reservation and set PagePrivate
+ * alloc_huge_page would have consumed the reservation and set RestoreReserve
  * in the newly allocated page.  When the page is freed via free_huge_page,
- * the global reservation count will be incremented if PagePrivate is set.
+ * the global reservation count will be incremented if RestoreReserve is set.
  * However, free_huge_page can not adjust the reserve map.  Adjust the
  * reserve map here to be consistent with global reserve count adjustments
  * to be made by free_huge_page.
@@ -2224,13 +2266,13 @@ static void restore_reserve_on_error(struct hstate *h,
 			struct vm_area_struct *vma, unsigned long address,
 			struct page *page)
 {
-	if (unlikely(PagePrivate(page))) {
+	if (unlikely(HPageRestoreReserve(page))) {
 		long rc = vma_needs_reservation(h, vma, address);
 
 		if (unlikely(rc < 0)) {
 			/*
 			 * Rare out of memory condition in reserve map
-			 * manipulation.  Clear PagePrivate so that
+			 * manipulation.  Clear RestoreReserve so that
 			 * global reserve count will not be incremented
 			 * by free_huge_page.  This will make it appear
 			 * as though the reservation for this page was
@@ -2239,7 +2281,7 @@ static void restore_reserve_on_error(struct hstate *h,
 			 * is better than inconsistent global huge page
 			 * accounting of reserve counts.
 			 */
-			ClearPagePrivate(page);
+			ClearHPageRestoreReserve(page);
 		} else if (rc) {
 			rc = vma_add_reservation(h, vma, address);
 			if (unlikely(rc < 0))
@@ -2247,7 +2289,7 @@ static void restore_reserve_on_error(struct hstate *h,
 				 * See above comment about rare out of
 				 * memory condition.
 				 */
-				ClearPagePrivate(page);
+				ClearHPageRestoreReserve(page);
 		} else
 			vma_end_reservation(h, vma, address);
 	}
@@ -2328,7 +2370,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (!page)
 			goto out_uncharge_cgroup;
 		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
-			SetPagePrivate(page);
+			SetHPageRestoreReserve(page);
 			h->resv_huge_pages--;
 		}
 		spin_lock(&hugetlb_lock);
@@ -2346,7 +2388,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 	spin_unlock(&hugetlb_lock);
 
-	set_page_private(page, (unsigned long)spool);
+	set_hpage_spool(page, spool);
 
 	map_commit = vma_commit_reservation(h, vma, addr);
 	if (unlikely(map_chg > map_commit)) {
@@ -4150,7 +4192,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_lock(ptl);
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
-		ClearPagePrivate(new_page);
+		ClearHPageRestoreReserve(new_page);
 
 		/* Break COW */
 		huge_ptep_clear_flush(vma, haddr, ptep);
@@ -4217,7 +4259,7 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 
 	if (err)
 		return err;
-	ClearPagePrivate(page);
+	ClearHPageRestoreReserve(page);
 
 	/*
 	 * set page dirty so that it will not be removed from cache/file
@@ -4379,7 +4421,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		goto backout;
 
 	if (anon_rmap) {
-		ClearPagePrivate(page);
+		ClearHPageRestoreReserve(page);
 		hugepage_add_new_anon_rmap(page, vma, haddr);
 	} else
 		page_dup_rmap(page, true);
@@ -4693,7 +4735,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (vm_shared) {
 		page_dup_rmap(page, true);
 	} else {
-		ClearPagePrivate(page);
+		ClearHPageRestoreReserve(page);
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-- 
2.29.2


* [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-11 21:01 [RFC PATCH 0/3] create hugetlb flags to consolidate state Mike Kravetz
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
@ 2021-01-11 21:01 ` Mike Kravetz
  2021-01-12  3:45     ` Muchun Song
  2021-01-15  9:17   ` Oscar Salvador
  2021-01-11 21:01 ` [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus Mike Kravetz
  2021-01-12 10:41 ` [RFC PATCH 0/3] create hugetlb flags to consolidate state David Hildenbrand
  3 siblings, 2 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-11 21:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Hocko, Naoya Horiguchi, Muchun Song, David Hildenbrand,
	Andrew Morton, Mike Kravetz

Use the new hugetlb page specific flag to replace the page_huge_active
interfaces.  By its name, page_huge_active implied that a huge page
was on the active list.  However, that is not really what code checking
the flag wanted to know.  It really wanted to determine if the huge
page could be migrated.  This happens when the page is actually added
to the page cache and/or task page table.  This is the reasoning behind
the name change.

The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
as in all cases but one we KNOW the page is a hugetlb page.  Therefore,
they are removed.  In one call to HPageMigratable() it is possible for
the page to not be a hugetlb page due to a race.  However, the code
making the call (scan_movable_pages) is inherently racy, and page state
will be validated later in the migration process.

Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
static.  Therefore, a new set of hugetlb page flag macros is added for
non-static flag functions.
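
The static/non-static split uses the same token-pasting pattern twice. A hypothetical standalone sketch (names invented) of the two macro families:

```c
#include <assert.h>

static unsigned long fake_private;

/* One macro family emits static inline helpers for use within the
 * defining file only. */
#define FAKE_TESTFLAG(flname, bit)				\
static inline int Fake##flname(void)				\
	{ return (fake_private >> (bit)) & 1; }

/* The EXT_ variant emits external-linkage definitions, so the helper
 * can be declared in a header and called from other files. */
#define FAKE_EXT_TESTFLAG(flname, bit)				\
int FakeExt##flname(void)					\
	{ return (fake_private >> (bit)) & 1; }

FAKE_TESTFLAG(Migratable, 1)
FAKE_EXT_TESTFLAG(Migratable, 1)
```

Both expansions read the same bit; only the linkage of the generated function differs.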

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h    | 17 +++++++++++
 include/linux/page-flags.h |  6 ----
 mm/hugetlb.c               | 60 +++++++++++++++++---------------------
 mm/memory_hotplug.c        |  2 +-
 4 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4f0159f1b9cc..46e590552d55 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -190,6 +190,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 bool is_hugetlb_entry_migration(pte_t pte);
 
+int HPageMigratable(struct page *page);
+void SetHPageMigratable(struct page *page);
+void ClearHPageMigratable(struct page *page);
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
@@ -370,6 +373,20 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 	return 0;
 }
 
+static inline int HPageMigratable(struct page *page)
+{
+	return(0);
+}
+
+static inline void SetHPageMigratable(struct page *page)
+{
+	return;
+}
+
+static inline void ClearHPageMigratable(struct page *page)
+{
+	return;
+}
 #endif /* !CONFIG_HUGETLB_PAGE */
 /*
  * hugepages at page global directory. If arch support
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4f6ba9379112..167250466c9c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -593,15 +593,9 @@ static inline void ClearPageCompound(struct page *page)
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
 int PageHeadHuge(struct page *page);
-bool page_huge_active(struct page *page);
 #else
 TESTPAGEFLAG_FALSE(Huge)
 TESTPAGEFLAG_FALSE(HeadHuge)
-
-static inline bool page_huge_active(struct page *page)
-{
-	return 0;
-}
 #endif
 
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3eb3b102c589..34ce82f4823c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -57,6 +57,7 @@ static unsigned long hugetlb_cma_size __initdata;
  */
 enum htlb_page_flags {
 	HPAGE_RestoreReserve = 0,
+	HPAGE_Migratable,
 };
 
 /*
@@ -79,7 +80,25 @@ static inline void ClearHPage##flname(struct page *page)	\
 	SETHPAGEFLAG(flname)					\
 	CLEARHPAGEFLAG(flname)
 
+#define EXT_TESTHPAGEFLAG(flname)					\
+int HPage##flname(struct page *page)		\
+	{ return test_bit(HPAGE_##flname, &(page->private)); }
+
+#define EXT_SETHPAGEFLAG(flname)					\
+void SetHPage##flname(struct page *page)		\
+	{ set_bit(HPAGE_##flname, &(page->private)); }
+
+#define EXT_CLEARHPAGEFLAG(flname)					\
+void ClearHPage##flname(struct page *page)	\
+	{ clear_bit(HPAGE_##flname, &(page->private)); }
+
+#define EXT_HPAGEFLAG(flname)					\
+	EXT_TESTHPAGEFLAG(flname)					\
+	EXT_SETHPAGEFLAG(flname)					\
+	EXT_CLEARHPAGEFLAG(flname)
+
 HPAGEFLAG(RestoreReserve)
+EXT_HPAGEFLAG(Migratable)
 
 /*
  * hugetlb page subpool pointer located in hpage[1].private
@@ -1379,31 +1398,6 @@ struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
-/*
- * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
- * to hstate->hugepage_activelist.)
- *
- * This function can be called for tail pages, but never returns true for them.
- */
-bool page_huge_active(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHuge(page), page);
-	return PageHead(page) && PagePrivate(&page[1]);
-}
-
-/* never called for tail page */
-static void set_page_huge_active(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
-	SetPagePrivate(&page[1]);
-}
-
-static void clear_page_huge_active(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
-	ClearPagePrivate(&page[1]);
-}
-
 /*
  * Internal hugetlb specific page flag. Do not use outside of the hugetlb
  * code
@@ -1465,7 +1459,7 @@ static void __free_huge_page(struct page *page)
 	}
 
 	spin_lock(&hugetlb_lock);
-	clear_page_huge_active(page);
+	ClearHPageMigratable(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
 	hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
@@ -4201,7 +4195,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 				make_huge_pte(vma, new_page, 1));
 		page_remove_rmap(old_page, true);
 		hugepage_add_new_anon_rmap(new_page, vma, haddr);
-		set_page_huge_active(new_page);
+		SetHPageMigratable(new_page);
 		/* Make the old page be freed below */
 		new_page = old_page;
 	}
@@ -4439,11 +4433,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	/*
 	 * Only make newly allocated pages active.  Existing pages found
-	 * in the pagecache could be !page_huge_active() if they have been
+	 * in the pagecache could be !HPageMigratable if they have been
 	 * isolated for migration.
 	 */
 	if (new_page)
-		set_page_huge_active(page);
+		SetHPageMigratable(page);
 
 	unlock_page(page);
 out:
@@ -4754,7 +4748,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
 
 	spin_unlock(ptl);
-	set_page_huge_active(page);
+	SetHPageMigratable(page);
 	if (vm_shared)
 		unlock_page(page);
 	ret = 0;
@@ -5580,11 +5574,11 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+	if (!HPageMigratable(page) || !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
-	clear_page_huge_active(page);
+	ClearHPageMigratable(page);
 	list_move_tail(&page->lru, list);
 unlock:
 	spin_unlock(&hugetlb_lock);
@@ -5595,7 +5589,7 @@ void putback_active_hugepage(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	set_page_huge_active(page);
+	SetHPageMigratable(page);
 	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
 	spin_unlock(&hugetlb_lock);
 	put_page(page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0f855deea4b2..fefd43757017 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1261,7 +1261,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 		if (!PageHuge(page))
 			continue;
 		head = compound_head(page);
-		if (page_huge_active(head))
+		if (HPageMigratable(head))
 			goto found;
 		skip = compound_nr(head) - (page - head);
 		pfn += skip - 1;
-- 
2.29.2


* [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus
  2021-01-11 21:01 [RFC PATCH 0/3] create hugetlb flags to consolidate state Mike Kravetz
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
  2021-01-11 21:01 ` [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag Mike Kravetz
@ 2021-01-11 21:01 ` Mike Kravetz
  2021-01-15 10:16   ` Oscar Salvador
  2021-01-12 10:41 ` [RFC PATCH 0/3] create hugetlb flags to consolidate state David Hildenbrand
  3 siblings, 1 reply; 22+ messages in thread
From: Mike Kravetz @ 2021-01-11 21:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Hocko, Naoya Horiguchi, Muchun Song, David Hildenbrand,
	Andrew Morton, Mike Kravetz

Use new hugetlb specific flag HPageTempSurplus to replace the
PageHugeTemporary() interfaces.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 38 +++++++++-----------------------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34ce82f4823c..949e1f987319 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -58,6 +58,7 @@ static unsigned long hugetlb_cma_size __initdata;
 enum htlb_page_flags {
 	HPAGE_RestoreReserve = 0,
 	HPAGE_Migratable,
+	HPAGE_TempSurplus,
 };
 
 /*
@@ -99,6 +100,7 @@ void ClearHPage##flname(struct page *page)	\
 
 HPAGEFLAG(RestoreReserve)
 EXT_HPAGEFLAG(Migratable)
+HPAGEFLAG(TempSurplus)
 
 /*
  * hugetlb page subpool pointer located in hpage[1].private
@@ -1398,28 +1400,6 @@ struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
-/*
- * Internal hugetlb specific page flag. Do not use outside of the hugetlb
- * code
- */
-static inline bool PageHugeTemporary(struct page *page)
-{
-	if (!PageHuge(page))
-		return false;
-
-	return (unsigned long)page[2].mapping == -1U;
-}
-
-static inline void SetPageHugeTemporary(struct page *page)
-{
-	page[2].mapping = (void *)-1U;
-}
-
-static inline void ClearPageHugeTemporary(struct page *page)
-{
-	page[2].mapping = NULL;
-}
-
 static void __free_huge_page(struct page *page)
 {
 	/*
@@ -1467,9 +1447,9 @@ static void __free_huge_page(struct page *page)
 	if (restore_reserve)
 		h->resv_huge_pages++;
 
-	if (PageHugeTemporary(page)) {
+	if (HPageTempSurplus(page)) {
 		list_del(&page->lru);
-		ClearPageHugeTemporary(page);
+		ClearHPageTempSurplus(page);
 		update_and_free_page(h, page);
 	} else if (h->surplus_huge_pages_node[nid]) {
 		/* remove the page from active list */
@@ -1883,7 +1863,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	 * codeflow
 	 */
 	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
-		SetPageHugeTemporary(page);
+		SetHPageTempSurplus(page);
 		spin_unlock(&hugetlb_lock);
 		put_page(page);
 		return NULL;
@@ -1914,7 +1894,7 @@ static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 	 * We do not account these pages as surplus because they are only
 	 * temporary and will be released properly on the last reference
 	 */
-	SetPageHugeTemporary(page);
+	SetHPageTempSurplus(page);
 
 	return page;
 }
@@ -5612,12 +5592,12 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
 	 * here as well otherwise the global surplus count will not match
 	 * the per-node's.
 	 */
-	if (PageHugeTemporary(newpage)) {
+	if (HPageTempSurplus(newpage)) {
 		int old_nid = page_to_nid(oldpage);
 		int new_nid = page_to_nid(newpage);
 
-		SetPageHugeTemporary(oldpage);
-		ClearPageHugeTemporary(newpage);
+		SetHPageTempSurplus(oldpage);
+		ClearHPageTempSurplus(newpage);
 
 		spin_lock(&hugetlb_lock);
 		if (h->surplus_huge_pages_node[old_nid]) {
-- 
2.29.2


* Re: [External] [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
@ 2021-01-12  3:24     ` Muchun Song
  2021-01-12  5:23   ` kernel test robot
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2021-01-12  3:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: LKML, Linux Memory Management List, Michal Hocko,
	Naoya Horiguchi, David Hildenbrand, Andrew Morton

On Tue, Jan 12, 2021 at 5:04 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> As hugetlbfs evolved, state information about hugetlb pages was added.
> One 'convenient' way of doing this was to use available fields in tail
> pages.  Over time, it has become difficult to know the meaning or contents
> of fields simply by looking at a small bit of code.  Sometimes, the
> naming is just confusing.  For example: The PagePrivate flag indicates
> a huge page reservation was consumed and needs to be restored if an error
> is encountered and the page is freed before it is instantiated.  The
> page.private field contains the pointer to a subpool if the page is
> associated with one.
>
> In an effort to make the code more readable, use page.private to contain
> hugetlb specific flags.  These flags will have test, set and clear functions
> similar to those used for 'normal' page flags.  More importantly, the
> flags will have names which actually reflect their purpose.
>
> In this patch,
> - Create infrastructure for huge page flag functions
> - Move subpool pointer to page[1].private to make way for flags
>   Create routines with meaningful names to modify subpool field
> - Use new HPageRestoreReserve flag instead of PagePrivate
>
> Conversion of other state information will happen in subsequent patches.

Great. We can add more meta information easily in the future.
It also makes the code more readable. Thanks.

>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c    | 11 +++---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 80 +++++++++++++++++++++++++++++++----------
>  3 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..8bfb80bc6e96 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -966,14 +966,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
>                 return rc;
>
>         /*
> -        * page_private is subpool pointer in hugetlb pages.  Transfer to
> -        * new page.  PagePrivate is not associated with page_private for
> -        * hugetlb pages and can not be set here as only page_huge_active
> -        * pages can be migrated.
> +        * Transfer any subpool pointer to the new page.
>          */
> -       if (page_private(page)) {
> -               set_page_private(newpage, page_private(page));
> -               set_page_private(page, 0);
> +       if (hpage_spool(page)) {
> +               set_hpage_spool(newpage, hpage_spool(page));
> +               set_hpage_spool(page, 0);
>         }
>
>         if (mode != MIGRATE_SYNC_NO_COPY)
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..4f0159f1b9cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -104,6 +104,8 @@ extern int hugetlb_max_hstate __read_mostly;
>  struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
>                                                 long min_hpages);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);
> +struct hugepage_subpool *hpage_spool(struct page *hpage);
> +void set_hpage_spool(struct page *hpage, struct hugepage_subpool *spool);
>
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3b38ea958e95..3eb3b102c589 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -52,6 +52,49 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
>  #endif
>  static unsigned long hugetlb_cma_size __initdata;
>
> +/*
> + * hugetlb specific state flags located in hpage.private
> + */
> +enum htlb_page_flags {
> +       HPAGE_RestoreReserve = 0,

IMHO, can we rename it to HPG_restore_reserve? I just want to
make the name consistent with the page flags (see enum pageflags
in include/linux/page-flags.h).

Maybe we should also add a __NR_HPAGEFLAGS to indicate
how many bits are already in use, and a BUILD_BUG_ON
to check that the used bits do not exceed sizeof(unsigned long)
bits.  It is not strictly necessary now, but I think it would be
better if we can check it.
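
The suggested compile-time check could look roughly like this. A hypothetical sketch (names follow the review's proposal, with a C11 static assertion standing in for the kernel's BUILD_BUG_ON):

```c
#include <limits.h>

enum fake_hugetlb_page_flags {
	FAKE_HPG_restore_reserve = 0,
	FAKE_HPG_migratable,
	FAKE_HPG_temporary,
	__NR_FAKE_HPAGEFLAGS,	/* sentinel: number of flag bits in use */
};

/* Fail the build if the flag bits no longer fit in page.private. */
_Static_assert(__NR_FAKE_HPAGEFLAGS <= sizeof(unsigned long) * CHAR_BIT,
	       "hugetlb page flags overflow page.private");
```

Because the sentinel is the last enumerator, adding a flag above it automatically updates the count the assertion checks.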

> +};
> +
> +/*
> + * Macros to create function definitions for hpage flags
> + */
> +#define TESTHPAGEFLAG(flname)                                  \
> +static inline int HPage##flname(struct page *page)             \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define SETHPAGEFLAG(flname)                                   \
> +static inline void SetHPage##flname(struct page *page)         \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define CLEARHPAGEFLAG(flname)                                 \
> +static inline void ClearHPage##flname(struct page *page)       \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define HPAGEFLAG(flname)                                      \
> +       TESTHPAGEFLAG(flname)                                   \
> +       SETHPAGEFLAG(flname)                                    \
> +       CLEARHPAGEFLAG(flname)
> +
> +HPAGEFLAG(RestoreReserve)
> +
> +/*
> + * hugetlb page subpool pointer located in hpage[1].private
> + */
> +struct hugepage_subpool *hpage_spool(struct page *hpage)
> +{
> +       return (struct hugepage_subpool *)(hpage+1)->private;
> +}
> +
> +void set_hpage_spool(struct page *hpage,
> +               struct hugepage_subpool *spool)
> +{
> +       set_page_private(hpage+1, (unsigned long)spool);
> +}
> +
>  /*
>   * Minimum page order among possible hugepage sizes, set to a proper value
>   * at boot time.
> @@ -1116,7 +1159,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>         nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
>         page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
>         if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
> -               SetPagePrivate(page);
> +               SetHPageRestoreReserve(page);
>                 h->resv_huge_pages--;
>         }
>
> @@ -1391,20 +1434,19 @@ static void __free_huge_page(struct page *page)
>          */
>         struct hstate *h = page_hstate(page);
>         int nid = page_to_nid(page);
> -       struct hugepage_subpool *spool =
> -               (struct hugepage_subpool *)page_private(page);
> +       struct hugepage_subpool *spool = hpage_spool(page);
>         bool restore_reserve;
>
>         VM_BUG_ON_PAGE(page_count(page), page);
>         VM_BUG_ON_PAGE(page_mapcount(page), page);
>
> -       set_page_private(page, 0);
> +       set_hpage_spool(page, 0);
>         page->mapping = NULL;
> -       restore_reserve = PagePrivate(page);
> -       ClearPagePrivate(page);
> +       restore_reserve = HPageRestoreReserve(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
> -        * If PagePrivate() was set on page, page allocation consumed a
> +        * If RestoreReserve was set on page, page allocation consumed a
>          * reservation.  If the page was associated with a subpool, there
>          * would have been a page reserved in the subpool before allocation
>          * via hugepage_subpool_get_pages().  Since we are 'restoring' the
> @@ -2213,9 +2255,9 @@ static long vma_add_reservation(struct hstate *h,
>   * This routine is called to restore a reservation on error paths.  In the
>   * specific error paths, a huge page was allocated (via alloc_huge_page)
>   * and is about to be freed.  If a reservation for the page existed,
> - * alloc_huge_page would have consumed the reservation and set PagePrivate
> + * alloc_huge_page would have consumed the reservation and set RestoreReserve
>   * in the newly allocated page.  When the page is freed via free_huge_page,
> - * the global reservation count will be incremented if PagePrivate is set.
> + * the global reservation count will be incremented if RestoreReserve is set.
>   * However, free_huge_page can not adjust the reserve map.  Adjust the
>   * reserve map here to be consistent with global reserve count adjustments
>   * to be made by free_huge_page.
> @@ -2224,13 +2266,13 @@ static void restore_reserve_on_error(struct hstate *h,
>                         struct vm_area_struct *vma, unsigned long address,
>                         struct page *page)
>  {
> -       if (unlikely(PagePrivate(page))) {
> +       if (unlikely(HPageRestoreReserve(page))) {
>                 long rc = vma_needs_reservation(h, vma, address);
>
>                 if (unlikely(rc < 0)) {
>                         /*
>                          * Rare out of memory condition in reserve map
> -                        * manipulation.  Clear PagePrivate so that
> +                        * manipulation.  Clear RestoreReserve so that
>                          * global reserve count will not be incremented
>                          * by free_huge_page.  This will make it appear
>                          * as though the reservation for this page was
> @@ -2239,7 +2281,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                          * is better than inconsistent global huge page
>                          * accounting of reserve counts.
>                          */
> -                       ClearPagePrivate(page);
> +                       ClearHPageRestoreReserve(page);
>                 } else if (rc) {
>                         rc = vma_add_reservation(h, vma, address);
>                         if (unlikely(rc < 0))
> @@ -2247,7 +2289,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                                  * See above comment about rare out of
>                                  * memory condition.
>                                  */
> -                               ClearPagePrivate(page);
> +                               ClearHPageRestoreReserve(page);
>                 } else
>                         vma_end_reservation(h, vma, address);
>         }
> @@ -2328,7 +2370,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>                 if (!page)
>                         goto out_uncharge_cgroup;
>                 if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> -                       SetPagePrivate(page);
> +                       SetHPageRestoreReserve(page);
>                         h->resv_huge_pages--;
>                 }
>                 spin_lock(&hugetlb_lock);
> @@ -2346,7 +2388,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>
>         spin_unlock(&hugetlb_lock);
>
> -       set_page_private(page, (unsigned long)spool);
> +       set_hpage_spool(page, spool);
>
>         map_commit = vma_commit_reservation(h, vma, addr);
>         if (unlikely(map_chg > map_commit)) {
> @@ -4150,7 +4192,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>         spin_lock(ptl);
>         ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>         if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
> -               ClearPagePrivate(new_page);
> +               ClearHPageRestoreReserve(new_page);
>
>                 /* Break COW */
>                 huge_ptep_clear_flush(vma, haddr, ptep);
> @@ -4217,7 +4259,7 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>
>         if (err)
>                 return err;
> -       ClearPagePrivate(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
>          * set page dirty so that it will not be removed from cache/file
> @@ -4379,7 +4421,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>                 goto backout;
>
>         if (anon_rmap) {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, vma, haddr);
>         } else
>                 page_dup_rmap(page, true);
> @@ -4693,7 +4735,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         if (vm_shared) {
>                 page_dup_rmap(page, true);
>         } else {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
>         }
>
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [External] [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
@ 2021-01-12  3:24     ` Muchun Song
  0 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2021-01-12  3:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: LKML, Linux Memory Management List, Michal Hocko,
	Naoya Horiguchi, David Hildenbrand, Andrew Morton

On Tue, Jan 12, 2021 at 5:04 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> As hugetlbfs evolved, state information about hugetlb pages was added.
> One 'convenient' way of doing this was to use available fields in tail
> pages.  Over time, it has become difficult to know the meaning or contents
> of fields simply by looking at a small bit of code.  Sometimes, the
> naming is just confusing.  For example: The PagePrivate flag indicates
> a huge page reservation was consumed and needs to be restored if an error
> is encountered and the page is freed before it is instantiated.  The
> page.private field contains the pointer to a subpool if the page is
> associated with one.
>
> In an effort to make the code more readable, use page.private to contain
> hugetlb specific flags.  These flags will have test, set and clear functions
> similar to those used for 'normal' page flags.  More importantly, the
> flags will have names which actually reflect their purpose.
>
> In this patch,
> - Create infrastructure for huge page flag functions
> - Move subpool pointer to page[1].private to make way for flags
>   Create routines with meaningful names to modify subpool field
> - Use new HPageRestoreReserve reserve flag instead of PagePrivate
>
> Conversion of other state information will happen in subsequent patches.

Great. We can add more meta information easily in the future.
It also makes the code more readable. Thanks.

>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c    | 11 +++---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 80 +++++++++++++++++++++++++++++++----------
>  3 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..8bfb80bc6e96 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -966,14 +966,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
>                 return rc;
>
>         /*
> -        * page_private is subpool pointer in hugetlb pages.  Transfer to
> -        * new page.  PagePrivate is not associated with page_private for
> -        * hugetlb pages and can not be set here as only page_huge_active
> -        * pages can be migrated.
> +        * Transfer any subpool pointer to the new page.
>          */
> -       if (page_private(page)) {
> -               set_page_private(newpage, page_private(page));
> -               set_page_private(page, 0);
> +       if (hpage_spool(page)) {
> +               set_hpage_spool(newpage, hpage_spool(page));
> +               set_hpage_spool(page, 0);
>         }
>
>         if (mode != MIGRATE_SYNC_NO_COPY)
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..4f0159f1b9cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -104,6 +104,8 @@ extern int hugetlb_max_hstate __read_mostly;
>  struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
>                                                 long min_hpages);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);
> +struct hugepage_subpool *hpage_spool(struct page *hpage);
> +void set_hpage_spool(struct page *hpage, struct hugepage_subpool *spool);
>
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3b38ea958e95..3eb3b102c589 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -52,6 +52,49 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
>  #endif
>  static unsigned long hugetlb_cma_size __initdata;
>
> +/*
> + * hugetlb specific state flags located in hpage.private
> + */
> +enum htlb_page_flags {
> +       HPAGE_RestoreReserve = 0,

IMHO, can we rename it to HPG_restore_reserve? I just want to
make the name consistent with the page flags (see enum pageflags
in include/linux/page-flags.h).

Maybe we should also add a __NR_HPAGEFLAGS to indicate
how many bits we have already used, and add a BUILD_BUG_ON
to check that the used bits do not exceed the number of bits
in an unsigned long. Although it is not necessary now, if we
can check it, I think it would be better.
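
For reference, the suggestion above can be sketched in plain userspace C.
Everything here is a stand-in, not kernel code: struct page is reduced to its
private field, set_bit/test_bit/clear_bit are non-atomic substitutes for the
kernel helpers, and _Static_assert plays the role of BUILD_BUG_ON:

```c
#include <assert.h>

/* Minimal stand-in for struct page: only the private field matters here. */
struct page {
	unsigned long private;
};

/* Flag enum with a trailing sentinel counting the bits in use. */
enum htlb_page_flags {
	HPAGE_RestoreReserve = 0,
	HPAGE_Migratable,
	__NR_HPAGEFLAGS,	/* number of flag bits used so far */
};

/* Compile-time check: all flags must fit in page->private. */
_Static_assert(__NR_HPAGEFLAGS <= sizeof(unsigned long) * 8,
	       "too many hugetlb page flags");

/* Non-atomic userspace stand-ins for the kernel bit helpers. */
static void set_bit(int nr, unsigned long *addr)   { *addr |=  (1UL << nr); }
static void clear_bit(int nr, unsigned long *addr) { *addr &= ~(1UL << nr); }
static int  test_bit(int nr, unsigned long *addr)  { return (*addr >> nr) & 1; }

#define TESTHPAGEFLAG(flname)					\
static int HPage##flname(struct page *page)			\
	{ return test_bit(HPAGE_##flname, &page->private); }

#define SETHPAGEFLAG(flname)					\
static void SetHPage##flname(struct page *page)			\
	{ set_bit(HPAGE_##flname, &page->private); }

#define CLEARHPAGEFLAG(flname)					\
static void ClearHPage##flname(struct page *page)		\
	{ clear_bit(HPAGE_##flname, &page->private); }

#define HPAGEFLAG(flname)					\
	TESTHPAGEFLAG(flname)					\
	SETHPAGEFLAG(flname)					\
	CLEARHPAGEFLAG(flname)

HPAGEFLAG(RestoreReserve)
HPAGEFLAG(Migratable)
```

Adding a flag past the word size would then fail the build rather than
silently corrupt a neighboring bit, which is the point of the sentinel.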

> +};
> +
> +/*
> + * Macros to create function definitions for hpage flags
> + */
> +#define TESTHPAGEFLAG(flname)                                  \
> +static inline int HPage##flname(struct page *page)             \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define SETHPAGEFLAG(flname)                                   \
> +static inline void SetHPage##flname(struct page *page)         \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define CLEARHPAGEFLAG(flname)                                 \
> +static inline void ClearHPage##flname(struct page *page)       \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define HPAGEFLAG(flname)                                      \
> +       TESTHPAGEFLAG(flname)                                   \
> +       SETHPAGEFLAG(flname)                                    \
> +       CLEARHPAGEFLAG(flname)
> +
> +HPAGEFLAG(RestoreReserve)
> +
> +/*
> + * hugetlb page subpool pointer located in hpage[1].private
> + */
> +struct hugepage_subpool *hpage_spool(struct page *hpage)
> +{
> +       return (struct hugepage_subpool *)(hpage+1)->private;
> +}
> +
> +void set_hpage_spool(struct page *hpage,
> +               struct hugepage_subpool *spool)
> +{
> +       set_page_private(hpage+1, (unsigned long)spool);
> +}
> +
>  /*
>   * Minimum page order among possible hugepage sizes, set to a proper value
>   * at boot time.
> @@ -1116,7 +1159,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>         nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
>         page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
>         if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
> -               SetPagePrivate(page);
> +               SetHPageRestoreReserve(page);
>                 h->resv_huge_pages--;
>         }
>
> @@ -1391,20 +1434,19 @@ static void __free_huge_page(struct page *page)
>          */
>         struct hstate *h = page_hstate(page);
>         int nid = page_to_nid(page);
> -       struct hugepage_subpool *spool =
> -               (struct hugepage_subpool *)page_private(page);
> +       struct hugepage_subpool *spool = hpage_spool(page);
>         bool restore_reserve;
>
>         VM_BUG_ON_PAGE(page_count(page), page);
>         VM_BUG_ON_PAGE(page_mapcount(page), page);
>
> -       set_page_private(page, 0);
> +       set_hpage_spool(page, 0);
>         page->mapping = NULL;
> -       restore_reserve = PagePrivate(page);
> -       ClearPagePrivate(page);
> +       restore_reserve = HPageRestoreReserve(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
> -        * If PagePrivate() was set on page, page allocation consumed a
> +        * If RestoreReserve was set on page, page allocation consumed a
>          * reservation.  If the page was associated with a subpool, there
>          * would have been a page reserved in the subpool before allocation
>          * via hugepage_subpool_get_pages().  Since we are 'restoring' the
> @@ -2213,9 +2255,9 @@ static long vma_add_reservation(struct hstate *h,
>   * This routine is called to restore a reservation on error paths.  In the
>   * specific error paths, a huge page was allocated (via alloc_huge_page)
>   * and is about to be freed.  If a reservation for the page existed,
> - * alloc_huge_page would have consumed the reservation and set PagePrivate
> + * alloc_huge_page would have consumed the reservation and set RestoreReserve
>   * in the newly allocated page.  When the page is freed via free_huge_page,
> - * the global reservation count will be incremented if PagePrivate is set.
> + * the global reservation count will be incremented if RestoreReserve is set.
>   * However, free_huge_page can not adjust the reserve map.  Adjust the
>   * reserve map here to be consistent with global reserve count adjustments
>   * to be made by free_huge_page.
> @@ -2224,13 +2266,13 @@ static void restore_reserve_on_error(struct hstate *h,
>                         struct vm_area_struct *vma, unsigned long address,
>                         struct page *page)
>  {
> -       if (unlikely(PagePrivate(page))) {
> +       if (unlikely(HPageRestoreReserve(page))) {
>                 long rc = vma_needs_reservation(h, vma, address);
>
>                 if (unlikely(rc < 0)) {
>                         /*
>                          * Rare out of memory condition in reserve map
> -                        * manipulation.  Clear PagePrivate so that
> +                        * manipulation.  Clear RestoreReserve so that
>                          * global reserve count will not be incremented
>                          * by free_huge_page.  This will make it appear
>                          * as though the reservation for this page was
> @@ -2239,7 +2281,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                          * is better than inconsistent global huge page
>                          * accounting of reserve counts.
>                          */
> -                       ClearPagePrivate(page);
> +                       ClearHPageRestoreReserve(page);
>                 } else if (rc) {
>                         rc = vma_add_reservation(h, vma, address);
>                         if (unlikely(rc < 0))
> @@ -2247,7 +2289,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                                  * See above comment about rare out of
>                                  * memory condition.
>                                  */
> -                               ClearPagePrivate(page);
> +                               ClearHPageRestoreReserve(page);
>                 } else
>                         vma_end_reservation(h, vma, address);
>         }
> @@ -2328,7 +2370,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>                 if (!page)
>                         goto out_uncharge_cgroup;
>                 if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> -                       SetPagePrivate(page);
> +                       SetHPageRestoreReserve(page);
>                         h->resv_huge_pages--;
>                 }
>                 spin_lock(&hugetlb_lock);
> @@ -2346,7 +2388,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>
>         spin_unlock(&hugetlb_lock);
>
> -       set_page_private(page, (unsigned long)spool);
> +       set_hpage_spool(page, spool);
>
>         map_commit = vma_commit_reservation(h, vma, addr);
>         if (unlikely(map_chg > map_commit)) {
> @@ -4150,7 +4192,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>         spin_lock(ptl);
>         ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>         if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
> -               ClearPagePrivate(new_page);
> +               ClearHPageRestoreReserve(new_page);
>
>                 /* Break COW */
>                 huge_ptep_clear_flush(vma, haddr, ptep);
> @@ -4217,7 +4259,7 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>
>         if (err)
>                 return err;
> -       ClearPagePrivate(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
>          * set page dirty so that it will not be removed from cache/file
> @@ -4379,7 +4421,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>                 goto backout;
>
>         if (anon_rmap) {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, vma, haddr);
>         } else
>                 page_dup_rmap(page, true);
> @@ -4693,7 +4735,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         if (vm_shared) {
>                 page_dup_rmap(page, true);
>         } else {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
>         }
>
> --
> 2.29.2
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [External] [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-11 21:01 ` [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag Mike Kravetz
@ 2021-01-12  3:45     ` Muchun Song
  2021-01-15  9:17   ` Oscar Salvador
  1 sibling, 0 replies; 22+ messages in thread
From: Muchun Song @ 2021-01-12  3:45 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: LKML, Linux Memory Management List, Michal Hocko,
	Naoya Horiguchi, David Hildenbrand, Andrew Morton

On Tue, Jan 12, 2021 at 5:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Use the new hugetlb page specific flag to replace the page_huge_active
> interfaces.  By its name, page_huge_active implied that a huge page
> was on the active list.  However, that is not really what code checking
> the flag wanted to know.  It really wanted to determine if the huge
> page could be migrated.  This happens when the page is actually added to
> the page cache and/or task page table.  This is the reasoning behind the
> name change.
>
> The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
> as in all cases but one we KNOW the page is a hugetlb page.  Therefore,
> they are removed.  In one call to HPageMigratable() it is possible for
> the page to not be a hugetlb page due to a race.  However, the code
> making the call (scan_movable_pages) is inherently racy, and page state
> will be validated later in the migration process.
>
> Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
> static.  Therefore, a new set of hugetlb page flag macros is added for
> non-static flag functions.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/hugetlb.h    | 17 +++++++++++
>  include/linux/page-flags.h |  6 ----
>  mm/hugetlb.c               | 60 +++++++++++++++++---------------------
>  mm/memory_hotplug.c        |  2 +-
>  4 files changed, 45 insertions(+), 40 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 4f0159f1b9cc..46e590552d55 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -190,6 +190,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>
>  bool is_hugetlb_entry_migration(pte_t pte);
>
> +int HPageMigratable(struct page *page);
> +void SetHPageMigratable(struct page *page);
> +void ClearHPageMigratable(struct page *page);
>  #else /* !CONFIG_HUGETLB_PAGE */
>
>  static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> @@ -370,6 +373,20 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
>         return 0;
>  }
>
> +static inline int HPageMigratable(struct page *page)
> +{
> +       return(0);
> +}
> +
> +static inline void SetHPageMigratable(struct page *page)
> +{
> +       return;
> +}
> +
> +static inline void ClearHPageMigratable(struct page *page)
> +{
> +       return;
> +}

How about introducing the HPAGEFLAG_NOOP macro to do
that?

#define TESTHPAGEFLAG_FALSE(flname) \
static inline int HPage##flname(struct page *page) { return 0; }

#define SETHPAGEFLAG_NOOP(flname) \
static inline void SetHPage##flname(struct page *page) {}

#define CLEARHPAGEFLAG_NOOP(flname) \
static inline void ClearHPage##flname(struct page *page) {}

#define HPAGEFLAG_NOOP(flname) \
TESTHPAGEFLAG_FALSE(flname) \
SETHPAGEFLAG_NOOP(flname) \
CLEARHPAGEFLAG_NOOP(flname)

HPAGEFLAG_NOOP(Migratable)
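
As a quick sanity check, the stubs above compile and behave as expected in a
standalone userspace sketch (struct page is a minimal stand-in here; in the
kernel these would sit under the !CONFIG_HUGETLB_PAGE branch):

```c
/* Minimal stand-in for struct page. */
struct page {
	unsigned long private;
};

/* No-op flag stubs for builds without hugetlb support, as suggested above. */
#define TESTHPAGEFLAG_FALSE(flname)				\
static inline int HPage##flname(struct page *page) { return 0; }

#define SETHPAGEFLAG_NOOP(flname)				\
static inline void SetHPage##flname(struct page *page) { }

#define CLEARHPAGEFLAG_NOOP(flname)				\
static inline void ClearHPage##flname(struct page *page) { }

#define HPAGEFLAG_NOOP(flname)					\
	TESTHPAGEFLAG_FALSE(flname)				\
	SETHPAGEFLAG_NOOP(flname)				\
	CLEARHPAGEFLAG_NOOP(flname)

HPAGEFLAG_NOOP(Migratable)
```

Callers compile unchanged, the test always reports false, and set/clear touch
nothing, matching the open-coded stubs in the patch.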

>  #endif /* !CONFIG_HUGETLB_PAGE */
>  /*
>   * hugepages at page global directory. If arch support
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 4f6ba9379112..167250466c9c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -593,15 +593,9 @@ static inline void ClearPageCompound(struct page *page)
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
>  int PageHeadHuge(struct page *page);
> -bool page_huge_active(struct page *page);
>  #else
>  TESTPAGEFLAG_FALSE(Huge)
>  TESTPAGEFLAG_FALSE(HeadHuge)
> -
> -static inline bool page_huge_active(struct page *page)
> -{
> -       return 0;
> -}
>  #endif
>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3eb3b102c589..34ce82f4823c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -57,6 +57,7 @@ static unsigned long hugetlb_cma_size __initdata;
>   */
>  enum htlb_page_flags {
>         HPAGE_RestoreReserve = 0,
> +       HPAGE_Migratable,
>  };
>
>  /*
> @@ -79,7 +80,25 @@ static inline void ClearHPage##flname(struct page *page)     \
>         SETHPAGEFLAG(flname)                                    \
>         CLEARHPAGEFLAG(flname)
>
> +#define EXT_TESTHPAGEFLAG(flname)                                      \
> +int HPage##flname(struct page *page)           \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_SETHPAGEFLAG(flname)                                       \
> +void SetHPage##flname(struct page *page)               \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_CLEARHPAGEFLAG(flname)                                     \
> +void ClearHPage##flname(struct page *page)     \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_HPAGEFLAG(flname)                                  \
> +       EXT_TESTHPAGEFLAG(flname)                                       \
> +       EXT_SETHPAGEFLAG(flname)                                        \
> +       EXT_CLEARHPAGEFLAG(flname)
> +
>  HPAGEFLAG(RestoreReserve)
> +EXT_HPAGEFLAG(Migratable)

How about moving HPAGEFLAG to include/linux/hugetlb.h
(including enum htlb_page_flags)? In that case, we do not need
EXT_HPAGEFLAG, although other modules do not need
HPAGEFLAG(RestoreReserve). IMHO, it can make the code
cleaner. Right?

>
>  /*
>   * hugetlb page subpool pointer located in hpage[1].private
> @@ -1379,31 +1398,6 @@ struct hstate *size_to_hstate(unsigned long size)
>         return NULL;
>  }
>
> -/*
> - * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
> - * to hstate->hugepage_activelist.)
> - *
> - * This function can be called for tail pages, but never returns true for them.
> - */
> -bool page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHuge(page), page);
> -       return PageHead(page) && PagePrivate(&page[1]);
> -}
> -
> -/* never called for tail page */
> -static void set_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       SetPagePrivate(&page[1]);
> -}
> -
> -static void clear_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       ClearPagePrivate(&page[1]);
> -}
> -
>  /*
>   * Internal hugetlb specific page flag. Do not use outside of the hugetlb
>   * code
> @@ -1465,7 +1459,7 @@ static void __free_huge_page(struct page *page)
>         }
>
>         spin_lock(&hugetlb_lock);
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         hugetlb_cgroup_uncharge_page(hstate_index(h),
>                                      pages_per_huge_page(h), page);
>         hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> @@ -4201,7 +4195,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>                                 make_huge_pte(vma, new_page, 1));
>                 page_remove_rmap(old_page, true);
>                 hugepage_add_new_anon_rmap(new_page, vma, haddr);
> -               set_page_huge_active(new_page);
> +               SetHPageMigratable(new_page);
>                 /* Make the old page be freed below */
>                 new_page = old_page;
>         }
> @@ -4439,11 +4433,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>
>         /*
>          * Only make newly allocated pages active.  Existing pages found
> -        * in the pagecache could be !page_huge_active() if they have been
> +        * in the pagecache could be !HPageMigratable if they have been
>          * isolated for migration.
>          */
>         if (new_page)
> -               set_page_huge_active(page);
> +               SetHPageMigratable(page);
>
>         unlock_page(page);
>  out:
> @@ -4754,7 +4748,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         update_mmu_cache(dst_vma, dst_addr, dst_pte);
>
>         spin_unlock(ptl);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         if (vm_shared)
>                 unlock_page(page);
>         ret = 0;
> @@ -5580,11 +5574,11 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>
>         VM_BUG_ON_PAGE(!PageHead(page), page);
>         spin_lock(&hugetlb_lock);
> -       if (!page_huge_active(page) || !get_page_unless_zero(page)) {
> +       if (!HPageMigratable(page) || !get_page_unless_zero(page)) {
>                 ret = false;
>                 goto unlock;
>         }
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         list_move_tail(&page->lru, list);
>  unlock:
>         spin_unlock(&hugetlb_lock);
> @@ -5595,7 +5589,7 @@ void putback_active_hugepage(struct page *page)
>  {
>         VM_BUG_ON_PAGE(!PageHead(page), page);
>         spin_lock(&hugetlb_lock);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
>         spin_unlock(&hugetlb_lock);
>         put_page(page);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0f855deea4b2..fefd43757017 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1261,7 +1261,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>                 if (!PageHuge(page))
>                         continue;
>                 head = compound_head(page);
> -               if (page_huge_active(head))
> +               if (HPageMigratable(head))
>                         goto found;
>                 skip = compound_nr(head) - (page - head);
>                 pfn += skip - 1;
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [External] [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
@ 2021-01-12  3:45     ` Muchun Song
  0 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2021-01-12  3:45 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: LKML, Linux Memory Management List, Michal Hocko,
	Naoya Horiguchi, David Hildenbrand, Andrew Morton

On Tue, Jan 12, 2021 at 5:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Use the new hugetlb page specific flag to replace the page_huge_active
> interfaces.  By it's name, page_huge_active implied that a huge page
> was on the active list.  However, that is not really what code checking
> the flag wanted to know.  It really wanted to determine if the huge
> page could be migrated.  This happens when the page is actually added
> the page cache and/or task page table.  This is the reasoning behind the
> name change.
>
> The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
> as in all case but one we KNOW the page is a hugetlb page.  Therefore,
> they are removed.  In one call to HPageMigratable() is it possible for
> the page to not be a hugetlb page due to a race.  However, the code
> making the call (scan_movable_pages) is inherently racy, and page state
> will be validated later in the migration process.
>
> Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
> static.  Therefore, a new set of hugetlb page flag macros is added for
> non-static flag functions.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/hugetlb.h    | 17 +++++++++++
>  include/linux/page-flags.h |  6 ----
>  mm/hugetlb.c               | 60 +++++++++++++++++---------------------
>  mm/memory_hotplug.c        |  2 +-
>  4 files changed, 45 insertions(+), 40 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 4f0159f1b9cc..46e590552d55 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -190,6 +190,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>
>  bool is_hugetlb_entry_migration(pte_t pte);
>
> +int HPageMigratable(struct page *page);
> +void SetHPageMigratable(struct page *page);
> +void ClearHPageMigratable(struct page *page);
>  #else /* !CONFIG_HUGETLB_PAGE */
>
>  static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> @@ -370,6 +373,20 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
>         return 0;
>  }
>
> +static inline int HPageMigratable(struct page *page)
> +{
> +       return(0);
> +}
> +
> +static inline void SetHPageMigratable(struct page *page)
> +{
> +       return;
> +}
> +
> +static inline void ClearHPageMigratable(struct page *page)
> +{
> +       return;
> +}

How about introducing the HPAGEFLAG_NOOP macro to do
that?

#define TESTHPAGEFLAG_FALSE(flname) \
static inline int HPage##flname(struct page *page) { return 0; }

#define SETHPAGEFLAG_NOOP(flname) \
static inline void SetHPage##flname(struct page *page) {}

#define CLEARHPAGEFLAG_NOOP(flname) \
static inline void ClearHPage##flname(struct page *page) {}

#define HPAGEFLAG_NOOP(flname) \
TESTHPAGEFLAG_FALSE(flname) \
SETHPAGEFLAG_NOOP(flname) \
CLEARHPAGEFLAG_NOOP(flname)

HPAGEFLAG_NOOP(Migratable)

>  #endif /* !CONFIG_HUGETLB_PAGE */
>  /*
>   * hugepages at page global directory. If arch support
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 4f6ba9379112..167250466c9c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -593,15 +593,9 @@ static inline void ClearPageCompound(struct page *page)
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
>  int PageHeadHuge(struct page *page);
> -bool page_huge_active(struct page *page);
>  #else
>  TESTPAGEFLAG_FALSE(Huge)
>  TESTPAGEFLAG_FALSE(HeadHuge)
> -
> -static inline bool page_huge_active(struct page *page)
> -{
> -       return 0;
> -}
>  #endif
>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3eb3b102c589..34ce82f4823c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -57,6 +57,7 @@ static unsigned long hugetlb_cma_size __initdata;
>   */
>  enum htlb_page_flags {
>         HPAGE_RestoreReserve = 0,
> +       HPAGE_Migratable,
>  };
>
>  /*
> @@ -79,7 +80,25 @@ static inline void ClearHPage##flname(struct page *page)     \
>         SETHPAGEFLAG(flname)                                    \
>         CLEARHPAGEFLAG(flname)
>
> +#define EXT_TESTHPAGEFLAG(flname)                                      \
> +int HPage##flname(struct page *page)           \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_SETHPAGEFLAG(flname)                                       \
> +void SetHPage##flname(struct page *page)               \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_CLEARHPAGEFLAG(flname)                                     \
> +void ClearHPage##flname(struct page *page)     \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define EXT_HPAGEFLAG(flname)                                  \
> +       EXT_TESTHPAGEFLAG(flname)                                       \
> +       EXT_SETHPAGEFLAG(flname)                                        \
> +       EXT_CLEARHPAGEFLAG(flname)
> +
>  HPAGEFLAG(RestoreReserve)
> +EXT_HPAGEFLAG(Migratable)

How about moving HPAGEFLAG to include/linux/hugetlb.h
(including enum htlb_page_flags)? In that case, we would not need
EXT_HPAGEFLAG, although other modules do not need
the functions from HPAGEFLAG(RestoreReserve). IMHO, I think
it would make the code cleaner. Right?

>
>  /*
>   * hugetlb page subpool pointer located in hpage[1].private
> @@ -1379,31 +1398,6 @@ struct hstate *size_to_hstate(unsigned long size)
>         return NULL;
>  }
>
> -/*
> - * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
> - * to hstate->hugepage_activelist.)
> - *
> - * This function can be called for tail pages, but never returns true for them.
> - */
> -bool page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHuge(page), page);
> -       return PageHead(page) && PagePrivate(&page[1]);
> -}
> -
> -/* never called for tail page */
> -static void set_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       SetPagePrivate(&page[1]);
> -}
> -
> -static void clear_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       ClearPagePrivate(&page[1]);
> -}
> -
>  /*
>   * Internal hugetlb specific page flag. Do not use outside of the hugetlb
>   * code
> @@ -1465,7 +1459,7 @@ static void __free_huge_page(struct page *page)
>         }
>
>         spin_lock(&hugetlb_lock);
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         hugetlb_cgroup_uncharge_page(hstate_index(h),
>                                      pages_per_huge_page(h), page);
>         hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> @@ -4201,7 +4195,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>                                 make_huge_pte(vma, new_page, 1));
>                 page_remove_rmap(old_page, true);
>                 hugepage_add_new_anon_rmap(new_page, vma, haddr);
> -               set_page_huge_active(new_page);
> +               SetHPageMigratable(new_page);
>                 /* Make the old page be freed below */
>                 new_page = old_page;
>         }
> @@ -4439,11 +4433,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>
>         /*
>          * Only make newly allocated pages active.  Existing pages found
> -        * in the pagecache could be !page_huge_active() if they have been
> +        * in the pagecache could be !HPageMigratable if they have been
>          * isolated for migration.
>          */
>         if (new_page)
> -               set_page_huge_active(page);
> +               SetHPageMigratable(page);
>
>         unlock_page(page);
>  out:
> @@ -4754,7 +4748,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         update_mmu_cache(dst_vma, dst_addr, dst_pte);
>
>         spin_unlock(ptl);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         if (vm_shared)
>                 unlock_page(page);
>         ret = 0;
> @@ -5580,11 +5574,11 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>
>         VM_BUG_ON_PAGE(!PageHead(page), page);
>         spin_lock(&hugetlb_lock);
> -       if (!page_huge_active(page) || !get_page_unless_zero(page)) {
> +       if (!HPageMigratable(page) || !get_page_unless_zero(page)) {
>                 ret = false;
>                 goto unlock;
>         }
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         list_move_tail(&page->lru, list);
>  unlock:
>         spin_unlock(&hugetlb_lock);
> @@ -5595,7 +5589,7 @@ void putback_active_hugepage(struct page *page)
>  {
>         VM_BUG_ON_PAGE(!PageHead(page), page);
>         spin_lock(&hugetlb_lock);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
>         spin_unlock(&hugetlb_lock);
>         put_page(page);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0f855deea4b2..fefd43757017 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1261,7 +1261,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>                 if (!PageHuge(page))
>                         continue;
>                 head = compound_head(page);
> -               if (page_huge_active(head))
> +               if (HPageMigratable(head))
>                         goto found;
>                 skip = compound_nr(head) - (page - head);
>                 pfn += skip - 1;
> --
> 2.29.2
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
  2021-01-12  3:24     ` Muchun Song
@ 2021-01-12  5:23   ` kernel test robot
  2021-01-13 13:54   ` Oscar Salvador
  2021-01-13 14:45   ` Matthew Wilcox
  3 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-01-12  5:23 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2684 bytes --]

Hi Mike,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linux/master]
[also build test WARNING on linus/master hnaz-linux-mm/master v5.11-rc3 next-20210111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/create-hugetlb-flags-to-consolidate-state/20210112-050554
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 09162bc32c880a791c6c0668ce0745cf7958f576
config: s390-randconfig-s032-20210111 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/749ba97a58afa340623c1c3042c64b206a23128e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Kravetz/create-hugetlb-flags-to-consolidate-state/20210112-050554
        git checkout 749ba97a58afa340623c1c3042c64b206a23128e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> fs/hugetlbfs/inode.c:973:39: sparse: sparse: Using plain integer as NULL pointer

vim +973 fs/hugetlbfs/inode.c

   957	
   958	static int hugetlbfs_migrate_page(struct address_space *mapping,
   959					struct page *newpage, struct page *page,
   960					enum migrate_mode mode)
   961	{
   962		int rc;
   963	
   964		rc = migrate_huge_page_move_mapping(mapping, newpage, page);
   965		if (rc != MIGRATEPAGE_SUCCESS)
   966			return rc;
   967	
   968		/*
   969		 * Transfer any subpool pointer to the new page.
   970		 */
   971		if (hpage_spool(page)) {
   972			set_hpage_spool(newpage, hpage_spool(page));
 > 973			set_hpage_spool(page, 0);
   974		}
   975	
   976		if (mode != MIGRATE_SYNC_NO_COPY)
   977			migrate_page_copy(newpage, page);
   978		else
   979			migrate_page_states(newpage, page);
   980	
   981		return MIGRATEPAGE_SUCCESS;
   982	}
   983	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 16398 bytes --]


* Re: [RFC PATCH 0/3] create hugetlb flags to consolidate state
  2021-01-11 21:01 [RFC PATCH 0/3] create hugetlb flags to consolidate state Mike Kravetz
                   ` (2 preceding siblings ...)
  2021-01-11 21:01 ` [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus Mike Kravetz
@ 2021-01-12 10:41 ` David Hildenbrand
  3 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2021-01-12 10:41 UTC (permalink / raw)
  To: Mike Kravetz, linux-kernel, linux-mm
  Cc: Michal Hocko, Naoya Horiguchi, Muchun Song, Andrew Morton

On 11.01.21 22:01, Mike Kravetz wrote:
> While discussing a series of hugetlb fixes in [1], it became evident
> that the hugetlb specific page state information is stored in a somewhat
> haphazard manner.  Code dealing with state information would be easier
> to read, understand and maintain if this information was stored in a
> consistent manner.
> 
> This RFC series uses page.private of the hugetlb head page for storing a
> set of hugetlb specific page flags.  Routines to manipulate the flags
> are copied from normal page flag manipulation code and use atomic
> operations.  This is likely overkill for the uses in hugetlb code, and
> can be changed after additional auditing of code.

Haven't looked into the code but that sounds like a good idea.

> 
> For now, only 3 state flags are added as part of this series.  However,
> the suggested fix in [2] could use another flag.  In addition, a flag
> could be used to track whether or not a huge page has been cleared to
> optimize code paths if the init_on_alloc security feature is enabled.

Right, that will be helpful: indicating pages that were either cleared
directly when allocating (init_on_alloc) or via some asynchronous
mechanism in the future.

-- 
Thanks,

David / dhildenb



* Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
  2021-01-12  3:24     ` Muchun Song
  2021-01-12  5:23   ` kernel test robot
@ 2021-01-13 13:54   ` Oscar Salvador
  2021-01-13 17:49     ` Mike Kravetz
  2021-01-13 14:45   ` Matthew Wilcox
  3 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2021-01-13 13:54 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
> As hugetlbfs evolved, state information about hugetlb pages was added.
> One 'convenient' way of doing this was to use available fields in tail
> pages.  Over time, it has become difficult to know the meaning or contents
> of fields simply by looking at a small bit of code.  Sometimes, the
> naming is just confusing.  For example: The PagePrivate flag indicates
> a huge page reservation was consumed and needs to be restored if an error
> is encountered and the page is freed before it is instantiated.  The
> page.private field contains the pointer to a subpool if the page is
> associated with one.
> 
> In an effort to make the code more readable, use page.private to contain
> hugetlb specific flags.  These flags will have test, set and clear functions
> similar to those used for 'normal' page flags.  More importantly, the
> flags will have names which actually reflect their purpose.
> 
> In this patch,
> - Create infrastructure for huge page flag functions
> - Move subpool pointer to page[1].private to make way for flags
>   Create routines with meaningful names to modify subpool field
> - Use new HPageRestoreReserve reserve flag instead of PagePrivate
> 
> Conversion of other state information will happen in subsequent patches.

I like this idea; it would make the code much easier to follow, and together
with Muchun's gathering of the discrete indexes, hugetlb code will start
looking less scary.

I do have a question below:

> +enum htlb_page_flags {
> +	HPAGE_RestoreReserve = 0,
> +};
> +
> +/*
> + * Macros to create function definitions for hpage flags
> + */
> +#define TESTHPAGEFLAG(flname)					\
> +static inline int HPage##flname(struct page *page)		\
> +	{ return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define SETHPAGEFLAG(flname)					\
> +static inline void SetHPage##flname(struct page *page)		\
> +	{ set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define CLEARHPAGEFLAG(flname)					\
> +static inline void ClearHPage##flname(struct page *page)	\
> +	{ clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define HPAGEFLAG(flname)					\
> +	TESTHPAGEFLAG(flname)					\
> +	SETHPAGEFLAG(flname)					\
> +	CLEARHPAGEFLAG(flname)
> +
> +HPAGEFLAG(RestoreReserve)

I have mixed feelings about this.
Could we have a single function that sets/clears the bit/flag?
e.g:

 static inline void hugetlb_set_flag(struct page *page, enum htlb_page_flags flag)
 {
         set_bit(flag, &(page->private));
 }

etc.
It would look like less of an overkill?
 
-- 
Oscar Salvador
SUSE L3


* Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
                     ` (2 preceding siblings ...)
  2021-01-13 13:54   ` Oscar Salvador
@ 2021-01-13 14:45   ` Matthew Wilcox
  2021-01-13 17:51     ` Mike Kravetz
  3 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2021-01-13 14:45 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
> +	if (hpage_spool(page)) {

Could this be named hpage_subpool(), please?



* Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-13 13:54   ` Oscar Salvador
@ 2021-01-13 17:49     ` Mike Kravetz
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-13 17:49 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/13/21 5:54 AM, Oscar Salvador wrote:
> On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
>> As hugetlbfs evolved, state information about hugetlb pages was added.
>> One 'convenient' way of doing this was to use available fields in tail
>> pages.  Over time, it has become difficult to know the meaning or contents
>> of fields simply by looking at a small bit of code.  Sometimes, the
>> naming is just confusing.  For example: The PagePrivate flag indicates
>> a huge page reservation was consumed and needs to be restored if an error
>> is encountered and the page is freed before it is instantiated.  The
>> page.private field contains the pointer to a subpool if the page is
>> associated with one.
>>
>> In an effort to make the code more readable, use page.private to contain
>> hugetlb specific flags.  These flags will have test, set and clear functions
>> similar to those used for 'normal' page flags.  More importantly, the
>> flags will have names which actually reflect their purpose.
>>
>> In this patch,
>> - Create infrastructure for huge page flag functions
>> - Move subpool pointer to page[1].private to make way for flags
>>   Create routines with meaningful names to modify subpool field
>> - Use new HPageRestoreReserve reserve flag instead of PagePrivate
>>
>> Conversion of other state information will happen in subsequent patches.
> 
> I like this idea, it would make the code much easier to follow, and together
> with Muchun's gathering indiscrete index hugetlb code will start looking less
> scarier.

Thanks for taking a look.

> 
> I do have a question below:
> 
>> +enum htlb_page_flags {
>> +	HPAGE_RestoreReserve = 0,
>> +};
>> +
>> +/*
>> + * Macros to create function definitions for hpage flags
>> + */
>> +#define TESTHPAGEFLAG(flname)					\
>> +static inline int HPage##flname(struct page *page)		\
>> +	{ return test_bit(HPAGE_##flname, &(page->private)); }
>> +
>> +#define SETHPAGEFLAG(flname)					\
>> +static inline void SetHPage##flname(struct page *page)		\
>> +	{ set_bit(HPAGE_##flname, &(page->private)); }
>> +
>> +#define CLEARHPAGEFLAG(flname)					\
>> +static inline void ClearHPage##flname(struct page *page)	\
>> +	{ clear_bit(HPAGE_##flname, &(page->private)); }
>> +
>> +#define HPAGEFLAG(flname)					\
>> +	TESTHPAGEFLAG(flname)					\
>> +	SETHPAGEFLAG(flname)					\
>> +	CLEARHPAGEFLAG(flname)
>> +
>> +HPAGEFLAG(RestoreReserve)
> 
> I have mixed feelings about this.
> Could we have a single function that sets/clears the bit/flag?
> e.g:
> 
>  static inline void hugetlb_set_flag(struct page *page, enum htlb_page_flags flag)
>  {
>          set_bit(flag, &(page->private));
>  }
> 
> etc.
> It would look less of an overkill?

Sure, we could do this.  As noted, I simply patterned this after the
page flag routines in page-flags.h.  If we go to single functions as
you suggest, I would perhaps change the name a bit to indicate the flags
were associated with the page.  A comparison of the invoking code would be:

SetHPageRestoreReserve(page)
	-vs-
hugetlb_set_pageflag(page, HP_Restore_Reserve)

In either case, code would be more readable and you can easily grep for
a specific flag.

If we do go with single functions as above, then they would certainly be
moved to hugetlb.h as some flags need to be accessed outside hugetlb.c.
Muchun has already suggested this movement.
-- 
Mike Kravetz


* Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
  2021-01-13 14:45   ` Matthew Wilcox
@ 2021-01-13 17:51     ` Mike Kravetz
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-13 17:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/13/21 6:45 AM, Matthew Wilcox wrote:
> On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
>> +	if (hpage_spool(page)) {
> 
> Could this be named hpage_subpool(), please?
> 

Of course!

-- 
Mike Kravetz


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-11 21:01 ` [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag Mike Kravetz
  2021-01-12  3:45     ` Muchun Song
@ 2021-01-15  9:17   ` Oscar Salvador
  2021-01-15 17:43     ` Mike Kravetz
  1 sibling, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2021-01-15  9:17 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Mon, Jan 11, 2021 at 01:01:51PM -0800, Mike Kravetz wrote:
> Use the new hugetlb page specific flag to replace the page_huge_active
> interfaces.  By its name, page_huge_active implied that a huge page
> was on the active list.  However, that is not really what code checking
> the flag wanted to know.  It really wanted to determine if the huge
> page could be migrated.  This happens when the page is actually added
> to the page cache and/or task page table.  This is the reasoning behind
> the name change.
> 
> The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
> as in all cases but one we KNOW the page is a hugetlb page.  Therefore,
> they are removed.  In one call to HPageMigratable() it is possible for
> the page to not be a hugetlb page due to a race.  However, the code
> making the call (scan_movable_pages) is inherently racy, and page state
> will be validated later in the migration process.
> 
> Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
> static.  Therefore, a new set of hugetlb page flag macros is added for
> non-static flag functions.

Two things about this one:

I am not sure about the name of this one.
It is true that page_huge_active() was only called by memory-hotplug and all
it wanted to know was whether the page was in-use and so if it made sense
to migrate it, so I see some value in the new PageMigratable flag.

However, not all in-use hugetlb can be migrated, e.g: we might have constraints
when it comes to migrating certain sizes of hugetlb, right?
So setting HPageMigratable to all active hugetlb pages might be a bit misleading?
HPageActive maybe? (Sorry, don't have a replacement)

The other thing is that you are right that scan_movable_pages is racy, but
page_huge_active() was checking if the page had the Head flag set before
retrieving page[1].

Before the page_huge_active() in scan_movable_pages() we have the
if (!PageHuge(page)) check, but could it be that between that check and
the page_huge_active(), the page gets dissolved, and so we are checking
a wrong page[1]? Am I making sense? 


-- 
Oscar Salvador
SUSE L3


* Re: [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus
  2021-01-11 21:01 ` [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus Mike Kravetz
@ 2021-01-15 10:16   ` Oscar Salvador
  2021-01-15 17:47     ` Mike Kravetz
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2021-01-15 10:16 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Mon, Jan 11, 2021 at 01:01:52PM -0800, Mike Kravetz wrote:
> Use new hugetlb specific flag HPageTempSurplus to replace the
> PageHugeTemporary() interfaces.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 38 +++++++++-----------------------------
>  1 file changed, 9 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 34ce82f4823c..949e1f987319 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -58,6 +58,7 @@ static unsigned long hugetlb_cma_size __initdata;
>  enum htlb_page_flags {
>  	HPAGE_RestoreReserve = 0,
>  	HPAGE_Migratable,
> +	HPAGE_TempSurplus,
>  };
>  
>  /*
> @@ -99,6 +100,7 @@ void ClearHPage##flname(struct page *page)	\
>  
>  HPAGEFLAG(RestoreReserve)
>  EXT_HPAGEFLAG(Migratable)
> +HPAGEFLAG(TempSurplus)

Would HPAGE_Temporary/Temporary not be a better fit?
The point about current PageHugeTemporary is that these pages are temporary as
they do not belong to the pool and will vanish once the last reference gets dropped.
We do not really care that much about the surplus part?

Besides, alloc_migrate_huge_page seems to not want to treat these pages as surplus.

Also, I would add a comment either next to each flag or above
the enum htlb_page_flags (probably the latter) with a brief explanation
of each flag.

Besides that, it looks fine to me.
Here I do not see the same problem in
stripping the PageHuge check in PageHugeTemporary, as I did in the previous patch,
because all callers of it make sure they operate on a hugetlb page.

-- 
Oscar Salvador
SUSE L3


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-15  9:17   ` Oscar Salvador
@ 2021-01-15 17:43     ` Mike Kravetz
  2021-01-15 20:05       ` Mike Kravetz
  2021-01-15 20:38       ` Oscar Salvador
  0 siblings, 2 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-15 17:43 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/15/21 1:17 AM, Oscar Salvador wrote:
> On Mon, Jan 11, 2021 at 01:01:51PM -0800, Mike Kravetz wrote:
>> Use the new hugetlb page specific flag to replace the page_huge_active
>> interfaces.  By its name, page_huge_active implied that a huge page
>> was on the active list.  However, that is not really what code checking
>> the flag wanted to know.  It really wanted to determine if the huge
>> page could be migrated.  This happens when the page is actually added
>> to the page cache and/or task page table.  This is the reasoning behind
>> the name change.
>>
>> The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
>> as in all cases but one we KNOW the page is a hugetlb page.  Therefore,
>> they are removed.  In one call to HPageMigratable() it is possible for
>> the page to not be a hugetlb page due to a race.  However, the code
>> making the call (scan_movable_pages) is inherently racy, and page state
>> will be validated later in the migration process.
>>
>> Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
>> static.  Therefore, a new set of hugetlb page flag macros is added for
>> non-static flag functions.
> 
> Two things about this one:
> 
> I am not sure about the name of this one.
> It is true that page_huge_active() was only called by memory-hotplug and all
> it wanted to know was whether the page was in-use and so if it made sense
> to migrate it, so I see some value in the new PageMigratable flag.
> 
> However, not all in-use hugetlb can be migrated, e.g: we might have constraints
> when it comes to migrate certain sizes of hugetlb, right?
> So setting HPageMigratable to all active hugetlb pages might be a bit misleading?
> HPageActive maybe? (Sorry, don't have a replacement)

Your concerns about the name change are correct.

The reason for the change came about from discussions about Muchun's series
of fixes and the need for a new 'page is freed' status to fix a race.  In
that discussion, Michal asked 'Why can't we simply set page_huge_active when
the page is allocated and put on the active list?'.  That is mentioned above,
but we really do not want to try and migrate pages after they are allocated
and before they are in use.  That causes problems in the fault handling code.

Anyway, that is how the suggestion for Migration came about.

In that discussion David Hildenbrand noted that code in alloc_contig_range
should migrate free hugetlb pages, but there is no support for that today.
I plan to look at that if nobody else does.  When such code is added, the
name 'Migratable' will become less applicable.

I'm not great at naming.  Perhaps 'In_Use' as a flag name might fit better.

> The other thing is that you are right that scan_movable_pages is racy, but
> page_huge_active() was checking if the page had the Head flag set before
> retrieving page[1].
> 
> Before the page_huge_active() in scan_movable_pages() we have the
> if (!PageHuge(page)) check, but could it be that between that check and
> the page_huge_active(), the page gets dissolved, and so we are checking
> a wrong page[1]? Am I making sense? 

Yes, you are making sense.

The reason I decided to drop the check is because it does not eliminate the
race.  Even with that check in page_huge_active, the page could be dissolved
between that check and the check of page[1].  There really is no way to eliminate
the race without holding a reference to the page (or hugetlb_lock).  That
check in page_huge_active just shortens the race window.

-- 
Mike Kravetz


* Re: [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus
  2021-01-15 10:16   ` Oscar Salvador
@ 2021-01-15 17:47     ` Mike Kravetz
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-15 17:47 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/15/21 2:16 AM, Oscar Salvador wrote:
> On Mon, Jan 11, 2021 at 01:01:52PM -0800, Mike Kravetz wrote:
>> Use new hugetlb specific flag HPageTempSurplus to replace the
>> PageHugeTemporary() interfaces.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  mm/hugetlb.c | 38 +++++++++-----------------------------
>>  1 file changed, 9 insertions(+), 29 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 34ce82f4823c..949e1f987319 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -58,6 +58,7 @@ static unsigned long hugetlb_cma_size __initdata;
>>  enum htlb_page_flags {
>>  	HPAGE_RestoreReserve = 0,
>>  	HPAGE_Migratable,
>> +	HPAGE_TempSurplus,
>>  };
>>  
>>  /*
>> @@ -99,6 +100,7 @@ void ClearHPage##flname(struct page *page)	\
>>  
>>  HPAGEFLAG(RestoreReserve)
>>  EXT_HPAGEFLAG(Migratable)
>> +HPAGEFLAG(TempSurplus)
> 
> Would HPAGE_Temporary/Temporary not be a better fit?

Yes it would.

> The point about current PageHugeTemporary is that these pages are temporary as
> they do not belong to the pool and will vanish once the last reference gets dropped.
> We do not really care that much about the surplus part?
> 
> Besides, alloc_migrate_huge_page seems to not want to treat these pages as surplus.
> 

All correct, not sure why I was thinking 'surplus' when naming the flag.

> Also, I would add a comment either next to each flag or above
> the enum htlb_page_flags (probably the latter) with a brief explanation
> of each flag.

Will do.

-- 
Mike Kravetz

> Besides that, it looks fine to me.
> Here I do not see the same problem in
> stripping the PageHuge check in PageHugeTemporary, as I did in previous patch,
> because all callers of it make sure they operate on a hugetlb page.
> 


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-15 17:43     ` Mike Kravetz
@ 2021-01-15 20:05       ` Mike Kravetz
  2021-01-15 20:29         ` Oscar Salvador
  2021-01-15 20:38       ` Oscar Salvador
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Kravetz @ 2021-01-15 20:05 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/15/21 9:43 AM, Mike Kravetz wrote:
> On 1/15/21 1:17 AM, Oscar Salvador wrote:
>> On Mon, Jan 11, 2021 at 01:01:51PM -0800, Mike Kravetz wrote:
>>> Use the new hugetlb page specific flag to replace the page_huge_active
>>> interfaces.  By its name, page_huge_active implied that a huge page
>>> was on the active list.  However, that is not really what code checking
>>> the flag wanted to know.  It really wanted to determine if the huge
>>> page could be migrated.  This happens when the page is actually added
>>> to the page cache and/or task page table.  This is the reasoning behind
>>> the name change.
>>>
>>> The VM_BUG_ON_PAGE() calls in the interfaces were not really necessary
>>> as in all cases but one we KNOW the page is a hugetlb page.  Therefore,
>>> they are removed.  In one call to HPageMigratable() it is possible for
>>> the page to not be a hugetlb page due to a race.  However, the code
>>> making the call (scan_movable_pages) is inherently racy, and page state
>>> will be validated later in the migration process.
>>>
>>> Note:  Since HPageMigratable is used outside hugetlb.c, it can not be
>>> static.  Therefore, a new set of hugetlb page flag macros is added for
>>> non-static flag functions.
>>
>> Two things about this one:
>>
>> I am not sure about the name of this one.
>> It is true that page_huge_active() was only called by memory-hotplug and all
>> it wanted to know was whether the page was in-use and so if it made sense
>> to migrate it, so I see some value in the new PageMigratable flag.
>>
>> However, not all in-use hugetlb pages can be migrated, e.g., we might have
>> constraints when it comes to migrating certain sizes of hugetlb, right?
>> So setting HPageMigratable to all active hugetlb pages might be a bit misleading?
>> HPageActive maybe? (Sorry, don't have a replacement)
> 
> Your concerns about the name change are correct.
> 
> The reason for the change came from discussions about Muchun's series
> of fixes and the need for a new 'page is freed' status to fix a race.  In
> that discussion, Michal asked 'Why can't we simply set page_huge_active when
> the page is allocated and put on the active list?'.  That is mentioned above,
> but we really do not want to try and migrate pages after they are allocated
> and before they are in use.  That causes problems in the fault handling code.
> 
> Anyway, that is how the suggestion for Migration came about.
> 
> In that discussion David Hildenbrand noted that code in alloc_contig_range
> should migrate free hugetlb pages, but there is no support for that today.
> I plan to look at that if nobody else does.  When such code is added, the
> name 'Migratable' will become less applicable.
> 
> I'm not great at naming.  Perhaps 'In_Use' as a flag name might fit better.
> 

I went back and took a closer look.  Migration is the reason the existing
page_huge_active interfaces were introduced.  And, the only use of the
page_huge_active check is to determine if a page can be migrated.  So,
I think 'Migratable' may be the most suitable name.

To address the concern that not all hugetlb sizes are migratable, we can
just make a check before setting the flag.  This should even help in the
migration/offline paths as we will know sooner if the page can be
migrated or not.
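
A sketch of that ordering (userspace model; hstate_migration_supported()
below is a hypothetical stand-in for the kernel's per-architecture support
check, similar in spirit to hugepage_migration_supported(), and the order
threshold is an arbitrary illustration):

```c
#include <stdbool.h>

struct hstate { unsigned int order; };	/* size of this hugetlb pool */
struct hpage { bool migratable; };

/*
 * Hypothetical stand-in for the real per-architecture check;
 * actual support depends on the architecture and page size.
 */
static bool hstate_migration_supported(const struct hstate *h)
{
	return h->order <= 9;	/* e.g. 2MB pages yes, 1GB pages no */
}

/*
 * Only mark a page migratable when its size can actually be
 * migrated; the migration/offline paths can then trust the flag
 * instead of discovering the constraint later.
 */
static void set_hpage_migratable(const struct hstate *h, struct hpage *p)
{
	if (hstate_migration_supported(h))
		p->migratable = true;
}
```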

We can address naming in the 'migrating free hugetlb pages' issue when
that code is written.
-- 
Mike Kravetz


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-15 20:05       ` Mike Kravetz
@ 2021-01-15 20:29         ` Oscar Salvador
  2021-01-15 21:25           ` Mike Kravetz
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2021-01-15 20:29 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Fri, Jan 15, 2021 at 12:05:29PM -0800, Mike Kravetz wrote:
> I went back and took a closer look.  Migration is the reason the existing
> page_huge_active interfaces were introduced.  And, the only use of the
> page_huge_active check is to determine if a page can be migrated.  So,
> I think 'Migratable' may be the most suitable name.

Ok, I did not know that. Let us stick with 'Migratable' then.

> To address the concern that not all hugetlb sizes are migratable, we can
> just make a check before setting the flag.  This should even help in the
> migration/offline paths as we will know sooner if the page can be
> migrated or not.

This sounds like a good idea to me.

> We can address naming in the 'migrating free hugetlb pages' issue when
> that code is written.

Sure, it was just a suggestion; when I thought about it, something like
'InUse' or 'Active' made more sense to me, but your point is valid.

Sorry for the confusion.

About that alloc_contig_range topic, I would like to take a look unless
someone is already on it or about to be.

Thanks Mike for the time ;-)


-- 
Oscar Salvador
SUSE L3


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-15 17:43     ` Mike Kravetz
  2021-01-15 20:05       ` Mike Kravetz
@ 2021-01-15 20:38       ` Oscar Salvador
  1 sibling, 0 replies; 22+ messages in thread
From: Oscar Salvador @ 2021-01-15 20:38 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On Fri, Jan 15, 2021 at 09:43:36AM -0800, Mike Kravetz wrote:
> > Before the page_huge_active() in scan_movable_pages() we have the
> > if (!PageHuge(page)) check, but could it be that between that check and
> > the page_huge_active(), the page gets dissolved, and so we are checking
> > a wrong page[1]? Am I making sense? 
> 
> Yes, you are making sense.
> 
> The reason I decided to drop the check is because it does not eliminate the
> race.  Even with that check in page_huge_active, the page could be dissolved
> between that check and check of page[1].  There really is no way to eliminate
> the race without holding a reference to the page (or hugetlb_lock).  That
> check in page_huge_active just shortens the race window.

Yeah, you are right, the race already exists.
Anyway, do_migrate_range should take care of verifying what it is
handling, so I think we are good.

-- 
Oscar Salvador
SUSE L3


* Re: [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag
  2021-01-15 20:29         ` Oscar Salvador
@ 2021-01-15 21:25           ` Mike Kravetz
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Kravetz @ 2021-01-15 21:25 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Michal Hocko, Naoya Horiguchi,
	Muchun Song, David Hildenbrand, Andrew Morton

On 1/15/21 12:29 PM, Oscar Salvador wrote:
> 
> About that alloc_contig_range topic, I would like to take a look unless
> someone is already on it or about to be.
> 
> Thanks Mike for the time ;-)

Feel free.

My first thought is that migration of a free hugetlb page would need to
be something like:
1) allocate a fresh hugetlb page from buddy
2) free the 'migrated' free huge page back to buddy

I do not think we can use the existing 'isolate-migrate' flow.  Isolating
a page would make it unavailable for allocation and that could cause
application issues.  
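
A rough model of that two-step flow (userspace sketch; all names here are
hypothetical stand-ins, not existing kernel functions), showing that the
pool's free count never dips, so the page stays allocatable throughout:

```c
#include <stdbool.h>

/* Minimal model of a hugetlb pool: just its free-page count. */
struct hpool { int free_huge_pages; };

static bool buddy_alloc_ok = true;	/* toggle to model buddy failure */

/* 1) allocate a fresh huge page from buddy and add it to the pool */
static bool alloc_fresh_huge_page(struct hpool *pool)
{
	if (!buddy_alloc_ok)
		return false;
	pool->free_huge_pages++;
	return true;
}

/* 2) release the 'migrated' free huge page back to buddy */
static void free_huge_page_to_buddy(struct hpool *pool)
{
	pool->free_huge_pages--;
}

/*
 * The two-step flow from the mail above.  Because the fresh page is
 * added before the old one is released, the free count never drops
 * below its starting value -- unlike an isolate-then-migrate flow,
 * which would make the page unavailable for allocation meanwhile.
 */
static bool migrate_free_huge_page(struct hpool *pool)
{
	if (!alloc_fresh_huge_page(pool))
		return false;	/* allocation failed; pool unchanged */
	free_huge_page_to_buddy(pool);
	return true;
}
```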

-- 
Mike Kravetz


end of thread, other threads:[~2021-01-15 21:26 UTC | newest]

Thread overview: 22+ messages
2021-01-11 21:01 [RFC PATCH 0/3] create hugetlb flags to consolidate state Mike Kravetz
2021-01-11 21:01 ` [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags Mike Kravetz
2021-01-12  3:24   ` [External] " Muchun Song
2021-01-12  3:24     ` Muchun Song
2021-01-12  5:23   ` kernel test robot
2021-01-13 13:54   ` Oscar Salvador
2021-01-13 17:49     ` Mike Kravetz
2021-01-13 14:45   ` Matthew Wilcox
2021-01-13 17:51     ` Mike Kravetz
2021-01-11 21:01 ` [RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag Mike Kravetz
2021-01-12  3:45   ` [External] " Muchun Song
2021-01-12  3:45     ` Muchun Song
2021-01-15  9:17   ` Oscar Salvador
2021-01-15 17:43     ` Mike Kravetz
2021-01-15 20:05       ` Mike Kravetz
2021-01-15 20:29         ` Oscar Salvador
2021-01-15 21:25           ` Mike Kravetz
2021-01-15 20:38       ` Oscar Salvador
2021-01-11 21:01 ` [RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus Mike Kravetz
2021-01-15 10:16   ` Oscar Salvador
2021-01-15 17:47     ` Mike Kravetz
2021-01-12 10:41 ` [RFC PATCH 0/3] create hugetlb flags to consolidate state David Hildenbrand
