linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
@ 2022-09-21 22:36 Doug Berger
  2022-09-21 22:36 ` [PATCH 1/3] mm/hugetlb: refactor alloc_and_dissolve_huge_page Doug Berger
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Doug Berger @ 2022-09-21 22:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Muchun Song, Florian Fainelli, linux-mm,
	linux-kernel, Doug Berger

This patch set was included as patches [04/21-06/21] in my
previous patch set to introduce Designated Movable Blocks [1].
They were included there for context, but they have value on
their own and are being resubmitted here for consideration on
their own merits.

The alloc_contig_range() function attempts to isolate the free
pages within the range and migrate the data from non-free pages
within the range to allow those pages to become isolated free
pages. If all of the pages in the range are on isolated free
page lists they can be allocated to the caller.

When free hugepages are encountered in the range an attempt is
made to allocate a new compound page to become a replacement
hugepage and to dissolve the free hugepage so that its pages
within isolated pageblocks can be added to the isolated free
page lists. Hugepages that are not free and encountered within
the range must be migrated out of the range and dissolved to
allow the underlying pages to be added to the isolated free
page lists.
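
For reference, the overall flow is roughly the outline below. This is
only a sketch for discussion: argument lists are trimmed and the
draining, retry and error handling in mm/page_alloc.c are elided.

/* Rough outline of alloc_contig_range(); not the exact upstream code. */
int alloc_contig_range_outline(unsigned long start, unsigned long end)
{
	int ret;

	/* Mark every pageblock in [start, end) MIGRATE_ISOLATE. */
	ret = start_isolate_page_range(...);
	if (ret)
		return ret;

	/* Migrate data out of any in-use pages within the range. */
	ret = __alloc_contig_migrate_range(...);
	if (ret && ret != -EBUSY)
		goto done;
	ret = 0;

	/*
	 * Every page in the range must now sit on an isolated free
	 * list, otherwise the attempt fails.
	 */
	if (test_pages_isolated(...)) {
		ret = -EBUSY;
		goto done;
	}

	/* Pull the isolated free pages off the free lists for the caller. */
	if (!isolate_freepages_range(...))
		ret = -EBUSY;
done:
	undo_isolate_page_range(...);
	return ret;
}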

Moving the data from hugepages within the range and freeing the
hugepages is not sufficient: the allocation of migration target
hugepages is allowed to come from free hugepages that may contain
isolated pageblocks, and freed hugepages will not be on isolated
free page lists, so the alloc_contig_range() will fail.

To address these shortcomings, the HPG_dissolve hugepage flag is
introduced to tag hugepages that must be dissolved when they are
freed so that their underlying pages can be moved to the page
allocator's free lists. This prevents hugepages that have had
their data migrated to new hugepages from being made available
to subsequent hugepage allocations and allows the isolated free
page test of alloc_contig_range() to succeed.
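
Concretely, the flag's lifecycle (condensed from the hunks in patch 2
below, surrounding code elided) looks like this:

/* include/linux/hugetlb.h: generates HPageDissolve(), SetHPageDissolve()
 * and ClearHPageDissolve() */
HPAGEFLAG(Dissolve, dissolve)

/* alloc_and_dissolve_huge_page()/isolate_or_dissolve_huge_page(): an
 * in-use hugepage that has been isolated for migration is tagged */
	ret = isolate_hugetlb(old_page, list);
	SetHPageDissolve(old_page);

/* putback_movable_pages(): if migration fails the tag is dropped again */
	ClearHPageDissolve(page);
	putback_active_hugepage(page);

/* free_huge_page(): a tagged page is not returned to the hugepage free
 * lists; its dissolution is deferred to the workqueue described below */
	dissolve = HPageDissolve(page);
	ClearHPageDissolve(page);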

Dissolved hugepages must be replaced with new hugepages to keep
the hugetlbfs balanced. To support blocking allocation, a new
workqueue is introduced that is analogous to the workqueue
introduced to support the hugetlb vmemmap optimization. This new
workqueue allows the allocation and dissolution of the hugepage
to be offloaded to a separate context from the freeing of the
hugepage. The sync_hugetlb_dissolve() function is introduced to
allow outstanding dissolution of hugepages to complete before
the isolated free page check is made by alloc_contig_range().
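
The deferral itself is the usual lockless-list plus workqueue pattern,
condensed here from patch 2. Since free_huge_page() may be called from
contexts that cannot sleep, the page is only queued there and the
blocking allocation happens in the worker:

static LLIST_HEAD(hpage_dissolvelist);
static void dissolve_hpage_workfn(struct work_struct *work);
static DECLARE_WORK(dissolve_hpage_work, dissolve_hpage_workfn);

/* free_huge_page() */
	} else if (dissolve) {
		spin_unlock_irqrestore(&hugetlb_lock, flags);
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_dissolvelist))
			schedule_work(&dissolve_hpage_work);
	}

/* dissolve_hpage_workfn() drains the list, allocates a replacement via
 * alloc_replacement_page() and dissolves the old hugepage */

/* alloc_contig_range() waits for pending dissolutions before it runs
 * test_pages_isolated() */
void sync_hugetlb_dissolve(void)
{
	flush_work(&free_hpage_work);
	flush_work(&dissolve_hpage_work);
}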

In addition, a check is added to hugepage allocation to prevent
free hugepages within an isolated pageblock range from being
used to satisfy migration target allocations, preventing circular
migrations.

[1] https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@gmail.com/

Doug Berger (3):
  mm/hugetlb: refactor alloc_and_dissolve_huge_page
  mm/hugetlb: allow migrated hugepage to dissolve when freed
  mm/hugetlb: add hugepage isolation support

 include/linux/hugetlb.h |   5 ++
 mm/hugetlb.c            | 161 +++++++++++++++++++++++++++++++---------
 mm/migrate.c            |   1 +
 mm/page_alloc.c         |   1 +
 4 files changed, 131 insertions(+), 37 deletions(-)


base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
-- 
2.25.1



* [PATCH 1/3] mm/hugetlb: refactor alloc_and_dissolve_huge_page
  2022-09-21 22:36 [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Doug Berger
@ 2022-09-21 22:36 ` Doug Berger
  2022-09-21 22:36 ` [PATCH 2/3] mm/hugetlb: allow migrated hugepage to dissolve when freed Doug Berger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Doug Berger @ 2022-09-21 22:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Muchun Song, Florian Fainelli, linux-mm,
	linux-kernel, Doug Berger

The alloc_replacement_page() and replace_hugepage() functions are
created from code in the alloc_and_dissolve_huge_page() function
to allow their reuse by the next commit.

Signed-off-by: Doug Berger <opendmb@gmail.com>
---
 mm/hugetlb.c | 84 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e070b8593b37..2b60de78007c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2709,32 +2709,22 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
 }
 
 /*
- * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one
- * @h: struct hstate old page belongs to
- * @old_page: Old page to dissolve
- * @list: List to isolate the page in case we need to
- * Returns 0 on success, otherwise negated error.
+ * Before dissolving the page, we need to allocate a new one for the
+ * pool to remain stable.  Here, we allocate the page and 'prep' it
+ * by doing everything but actually updating counters and adding to
+ * the pool.  This simplifies and let us do most of the processing
+ * under the lock.
  */
-static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
-					struct list_head *list)
+static struct page *alloc_replacement_page(struct hstate *h, int nid)
 {
 	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
-	int nid = page_to_nid(old_page);
 	bool alloc_retry = false;
 	struct page *new_page;
-	int ret = 0;
 
-	/*
-	 * Before dissolving the page, we need to allocate a new one for the
-	 * pool to remain stable.  Here, we allocate the page and 'prep' it
-	 * by doing everything but actually updating counters and adding to
-	 * the pool.  This simplifies and let us do most of the processing
-	 * under the lock.
-	 */
 alloc_retry:
 	new_page = alloc_buddy_huge_page(h, gfp_mask, nid, NULL, NULL);
 	if (!new_page)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	/*
 	 * If all goes well, this page will be directly added to the free
 	 * list in the pool.  For this the ref count needs to be zero.
@@ -2748,7 +2738,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 	SetHPageTemporary(new_page);
 	if (!put_page_testzero(new_page)) {
 		if (alloc_retry)
-			return -EBUSY;
+			return ERR_PTR(-EBUSY);
 
 		alloc_retry = true;
 		goto alloc_retry;
@@ -2757,6 +2747,48 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 
 	__prep_new_huge_page(h, new_page);
 
+	return new_page;
+}
+
+static void replace_hugepage(struct hstate *h, int nid, struct page *old_page,
+			     struct page *new_page)
+{
+	lockdep_assert_held(&hugetlb_lock);
+	/*
+	 * Ok, old_page is still a genuine free hugepage. Remove it from
+	 * the freelist and decrease the counters. These will be
+	 * incremented again when calling __prep_account_new_huge_page()
+	 * and enqueue_huge_page() for new_page. The counters will remain
+	 * stable since this happens under the lock.
+	 */
+	remove_hugetlb_page(h, old_page, false);
+
+	/*
+	 * Ref count on new page is already zero as it was dropped
+	 * earlier.  It can be directly added to the pool free list.
+	 */
+	__prep_account_new_huge_page(h, nid);
+	enqueue_huge_page(h, new_page);
+}
+
+/*
+ * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one
+ * @h: struct hstate old page belongs to
+ * @old_page: Old page to dissolve
+ * @list: List to isolate the page in case we need to
+ * Returns 0 on success, otherwise negated error.
+ */
+static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
+					struct list_head *list)
+{
+	int nid = page_to_nid(old_page);
+	struct page *new_page;
+	int ret = 0;
+
+	new_page = alloc_replacement_page(h, nid);
+	if (IS_ERR(new_page))
+		return PTR_ERR(new_page);
+
 retry:
 	spin_lock_irq(&hugetlb_lock);
 	if (!PageHuge(old_page)) {
@@ -2783,21 +2815,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		cond_resched();
 		goto retry;
 	} else {
-		/*
-		 * Ok, old_page is still a genuine free hugepage. Remove it from
-		 * the freelist and decrease the counters. These will be
-		 * incremented again when calling __prep_account_new_huge_page()
-		 * and enqueue_huge_page() for new_page. The counters will remain
-		 * stable since this happens under the lock.
-		 */
-		remove_hugetlb_page(h, old_page, false);
-
-		/*
-		 * Ref count on new page is already zero as it was dropped
-		 * earlier.  It can be directly added to the pool free list.
-		 */
-		__prep_account_new_huge_page(h, nid);
-		enqueue_huge_page(h, new_page);
+		replace_hugepage(h, nid, old_page, new_page);
 
 		/*
 		 * Pages have been replaced, we can safely free the old one.
-- 
2.25.1



* [PATCH 2/3] mm/hugetlb: allow migrated hugepage to dissolve when freed
  2022-09-21 22:36 [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Doug Berger
  2022-09-21 22:36 ` [PATCH 1/3] mm/hugetlb: refactor alloc_and_dissolve_huge_page Doug Berger
@ 2022-09-21 22:36 ` Doug Berger
  2022-09-21 22:36 ` [PATCH 3/3] mm/hugetlb: add hugepage isolation support Doug Berger
  2022-09-22 20:25 ` [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Mike Kravetz
  3 siblings, 0 replies; 9+ messages in thread
From: Doug Berger @ 2022-09-21 22:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Muchun Song, Florian Fainelli, linux-mm,
	linux-kernel, Doug Berger

There is no isolation mechanism for hugepages so a hugepage that
is migrated is returned to its hugepage freelist. This creates
problems for alloc_contig_range() because migrated hugepages can
be allocated as migrate targets for subsequent hugepage migration
attempts.

Even if the migration succeeds, the alloc_contig_range() attempt
will still fail because test_pages_isolated() will find that the
now-free hugepages have not been dissolved.

A subsequent attempt by alloc_contig_range() is necessary for the
isolate_migratepages_range() function to find the freed hugepage
and dissolve it (assuming it has not been reallocated).

A workqueue is introduced to perform the equivalent functionality
of alloc_and_dissolve_huge_page() for a migrated hugepage when it
is freed so that the pages can be released to the isolated page
lists of the buddy allocator, allowing the alloc_contig_range()
attempt to succeed.

The HPG_dissolve hugepage flag is introduced to allow tagging
migratable hugepages that should be dissolved when freed.

Signed-off-by: Doug Berger <opendmb@gmail.com>
---
 include/linux/hugetlb.h |  5 +++
 mm/hugetlb.c            | 72 ++++++++++++++++++++++++++++++++++++++---
 mm/migrate.c            |  1 +
 mm/page_alloc.c         |  1 +
 4 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3ec981a0d8b3..0e6e21805e51 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -222,6 +222,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
+void sync_hugetlb_dissolve(void);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
@@ -430,6 +431,8 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 
 static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
 
+static inline void sync_hugetlb_dissolve(void) { }
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 /*
  * hugepages at page global directory. If arch support
@@ -574,6 +577,7 @@ enum hugetlb_page_flags {
 	HPG_freed,
 	HPG_vmemmap_optimized,
 	HPG_raw_hwp_unreliable,
+	HPG_dissolve,
 	__NR_HPAGEFLAGS,
 };
 
@@ -621,6 +625,7 @@ HPAGEFLAG(Temporary, temporary)
 HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
 HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
+HPAGEFLAG(Dissolve, dissolve)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2b60de78007c..eab812760fbe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1582,6 +1582,10 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 	}
 }
 
+static LLIST_HEAD(hpage_dissolvelist);
+static void dissolve_hpage_workfn(struct work_struct *work);
+static DECLARE_WORK(dissolve_hpage_work, dissolve_hpage_workfn);
+
 /*
  * As update_and_free_page() can be called under any context, so we cannot
  * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
@@ -1628,6 +1632,8 @@ static inline void flush_free_hpage_work(struct hstate *h)
 {
 	if (hugetlb_vmemmap_optimizable(h))
 		flush_work(&free_hpage_work);
+	if (!hstate_is_gigantic(h))
+		flush_work(&dissolve_hpage_work);
 }
 
 static void update_and_free_page(struct hstate *h, struct page *page,
@@ -1679,7 +1685,7 @@ void free_huge_page(struct page *page)
 	struct hstate *h = page_hstate(page);
 	int nid = page_to_nid(page);
 	struct hugepage_subpool *spool = hugetlb_page_subpool(page);
-	bool restore_reserve;
+	bool restore_reserve, dissolve;
 	unsigned long flags;
 
 	VM_BUG_ON_PAGE(page_count(page), page);
@@ -1691,6 +1697,8 @@ void free_huge_page(struct page *page)
 	page->mapping = NULL;
 	restore_reserve = HPageRestoreReserve(page);
 	ClearHPageRestoreReserve(page);
+	dissolve = HPageDissolve(page);
+	ClearHPageDissolve(page);
 
 	/*
 	 * If HPageRestoreReserve was set on page, page allocation consumed a
@@ -1729,6 +1737,11 @@ void free_huge_page(struct page *page)
 		remove_hugetlb_page(h, page, true);
 		spin_unlock_irqrestore(&hugetlb_lock, flags);
 		update_and_free_page(h, page, true);
+	} else if (dissolve) {
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
+		if (llist_add((struct llist_node *)&page->mapping,
+			      &hpage_dissolvelist))
+			schedule_work(&dissolve_hpage_work);
 	} else {
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
@@ -2771,6 +2784,49 @@ static void replace_hugepage(struct hstate *h, int nid, struct page *old_page,
 	enqueue_huge_page(h, new_page);
 }
 
+static void dissolve_hpage_workfn(struct work_struct *work)
+{
+	struct llist_node *node;
+
+	node = llist_del_all(&hpage_dissolvelist);
+
+	while (node) {
+		struct page *oldpage, *newpage;
+		struct hstate *h;
+		int nid;
+
+		oldpage = container_of((struct address_space **)node,
+				       struct page, mapping);
+		node = node->next;
+		oldpage->mapping = NULL;
+
+		h = page_hstate(oldpage);
+		nid = page_to_nid(oldpage);
+
+		newpage = alloc_replacement_page(h, nid);
+
+		spin_lock_irq(&hugetlb_lock);
+		/* finish freeing oldpage */
+		arch_clear_hugepage_flags(oldpage);
+		enqueue_huge_page(h, oldpage);
+		if (IS_ERR(newpage)) {
+			/* cannot dissolve so just leave free */
+			spin_unlock_irq(&hugetlb_lock);
+			goto next;
+		}
+
+		replace_hugepage(h, nid, oldpage, newpage);
+
+		/*
+		 * Pages have been replaced, we can safely free the old one.
+		 */
+		spin_unlock_irq(&hugetlb_lock);
+		__update_and_free_page(h, oldpage);
+next:
+		cond_resched();
+	}
+}
+
 /*
  * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one
  * @h: struct hstate old page belongs to
@@ -2803,6 +2859,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		 */
 		spin_unlock_irq(&hugetlb_lock);
 		ret = isolate_hugetlb(old_page, list);
+		SetHPageDissolve(old_page);
 		spin_lock_irq(&hugetlb_lock);
 		goto free_new;
 	} else if (!HPageFreed(old_page)) {
@@ -2864,14 +2921,21 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	if (hstate_is_gigantic(h))
 		return -ENOMEM;
 
-	if (page_count(head) && !isolate_hugetlb(head, list))
+	if (page_count(head) && !isolate_hugetlb(head, list)) {
+		SetHPageDissolve(head);
 		ret = 0;
-	else if (!page_count(head))
+	} else if (!page_count(head)) {
 		ret = alloc_and_dissolve_huge_page(h, head, list);
-
+	}
 	return ret;
 }
 
+void sync_hugetlb_dissolve(void)
+{
+	flush_work(&free_hpage_work);
+	flush_work(&dissolve_hpage_work);
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
diff --git a/mm/migrate.c b/mm/migrate.c
index 6a1597c92261..b6c6123e614c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -141,6 +141,7 @@ void putback_movable_pages(struct list_head *l)
 
 	list_for_each_entry_safe(page, page2, l, lru) {
 		if (unlikely(PageHuge(page))) {
+			ClearHPageDissolve(page);
 			putback_active_hugepage(page);
 			continue;
 		}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e5486d47406e..6bf76bbc0308 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9235,6 +9235,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	if (ret && ret != -EBUSY)
 		goto done;
 	ret = 0;
+	sync_hugetlb_dissolve();
 
 	/*
 	 * Pages from [start, end) are within a pageblock_nr_pages
-- 
2.25.1



* [PATCH 3/3] mm/hugetlb: add hugepage isolation support
  2022-09-21 22:36 [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Doug Berger
  2022-09-21 22:36 ` [PATCH 1/3] mm/hugetlb: refactor alloc_and_dissolve_huge_page Doug Berger
  2022-09-21 22:36 ` [PATCH 2/3] mm/hugetlb: allow migrated hugepage to dissolve when freed Doug Berger
@ 2022-09-21 22:36 ` Doug Berger
  2022-09-22 20:25 ` [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Mike Kravetz
  3 siblings, 0 replies; 9+ messages in thread
From: Doug Berger @ 2022-09-21 22:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Muchun Song, Florian Fainelli, linux-mm,
	linux-kernel, Doug Berger

When a range of pageblocks is isolated there is at most one
hugepage that has only tail pages overlapping that range (i.e.
a hugepage that overlaps the beginning of the range).

However, that hugepage is the first migration target for an
alloc_contig_range() attempt so it already receives special
attention.

Checking whether the pageblock containing the head of a hugepage
is isolated is an inexpensive way to avoid hugepage allocations
from isolated pageblocks, which makes alloc_contig_range() more
efficient.

Signed-off-by: Doug Berger <opendmb@gmail.com>
---
 mm/hugetlb.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eab812760fbe..3a2f0b55059d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -33,6 +33,7 @@
 #include <linux/migrate.h>
 #include <linux/nospec.h>
 #include <linux/delayacct.h>
+#include <linux/page-isolation.h>
 
 #include <asm/page.h>
 #include <asm/pgalloc.h>
@@ -1135,6 +1136,10 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 		if (PageHWPoison(page))
 			continue;
 
+		/* Check head pageblock isolation */
+		if (is_migrate_isolate_page(page))
+			continue;
+
 		list_move(&page->lru, &h->hugepage_activelist);
 		set_page_refcounted(page);
 		ClearHPageFreed(page);
-- 
2.25.1



* Re: [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
  2022-09-21 22:36 [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Doug Berger
                   ` (2 preceding siblings ...)
  2022-09-21 22:36 ` [PATCH 3/3] mm/hugetlb: add hugepage isolation support Doug Berger
@ 2022-09-22 20:25 ` Mike Kravetz
  2022-09-22 22:41   ` Mike Kravetz
  2022-09-22 23:14   ` Doug Berger
  3 siblings, 2 replies; 9+ messages in thread
From: Mike Kravetz @ 2022-09-22 20:25 UTC (permalink / raw)
  To: Doug Berger
  Cc: Andrew Morton, Muchun Song, Oscar Salvador, Michal Hocko,
	David Hildenbrand, Florian Fainelli, linux-mm, linux-kernel

On 09/21/22 15:36, Doug Berger wrote:
> This patch set was included as patches [04/21-06/21] in my
> previous patch set to introduce Designated Movable Blocks [1].
> They were included there for context, but they have value on
> their own and are being resubmitted here for consideration on
> their own merits.
> 
> The alloc_contig_range() function attempts to isolate the free
> pages within the range and migrate the data from non-free pages
> within the range to allow those pages to become isolated free
> pages. If all of the pages in the range are on isolated free
> page lists they can be allocated to the caller.
> 
> When free hugepages are encountered in the range an attempt is
> made to allocate a new compound page to become a replacement
> hugepage and to dissolve the free hugepage so that its pages
> within isolated pageblocks can be added to the isolated free
> page lists. Hugepages that are not free and encountered within
> the range must be migrated out of the range and dissolved to
> allow the underlying pages to be added to the isolated free
> page lists.
> 
> Moving the data from hugepages within the range and freeing the
> hugepages is not sufficient: the allocation of migration target
> hugepages is allowed to come from free hugepages that may contain
> isolated pageblocks, and freed hugepages will not be on isolated
> free page lists, so the alloc_contig_range() will fail.

Thanks for looking into this!  I am adding Oscar, Michal and David on
Cc: as they have looked at similar issues in the past.

Before jumping into the details of your changes, I just want to make one
observation.  When migrating (or dissolving) hugetlb pages that are in a
range operated on by alloc_contig_range(), we should not change the number
of hugetlb pages available as noted here.  This includes the number of
total huge pages and the number of huge pages on the node.  Therefore,
we should allocate another huge page from buddy either as the migration
target or to take the place of the dissolved page.

For free huge pages, we do this via alloc_and_dissolve_huge_page.  IIUC,
there are no issues with this code path?

As noted above, for pages to be migrated we first try to use an existing
free huge page as the target.  Quite some time ago, Michal added code to
allocate a new page from buddy as the target if no free huge pages were
available.  This change also included a special flag to dissolve the
source huge page when it is freed.  It seems like this is the exact
behavior we want here?  I wonder if it might be easier just to use this
existing code?
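
For reference, the existing mechanism (paraphrased from memory, details
trimmed) works like this:

/* move_hugetlb_state(): when the migration target was freshly allocated
 * from buddy it carries HPageTemporary; the flag is transferred to the
 * source page because the new page is the one that becomes permanent */
	if (HPageTemporary(newpage)) {
		SetHPageTemporary(oldpage);
		ClearHPageTemporary(newpage);
	}

/* free_huge_page(): a temporary page is never put back on the hugepage
 * free lists; it is removed from the pool (remove_hugetlb_page()) and
 * handed straight back to buddy (update_and_free_page()), i.e. it is
 * dissolved when freed. */
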
-- 
Mike Kravetz

> To address these shortcomings, the HPG_dissolve hugepage flag is
> introduced to tag hugepages that must be dissolved when they are
> freed so that their underlying pages can be moved to the page
> allocator's free lists. This prevents hugepages that have had
> their data migrated to new hugepages from being made available
> to subsequent hugepage allocations and allows the isolated free
> page test of alloc_contig_range() to succeed.
> 
> Dissolved hugepages must be replaced with new hugepages to keep
> the hugetlbfs balanced. To support blocking allocation, a new
> workqueue is introduced that is analogous to the workqueue
> introduced to support the hugetlb vmemmap optimization. This new
> workqueue allows the allocation and dissolution of the hugepage
> to be offloaded to a separate context from the freeing of the
> hugepage. The sync_hugetlb_dissolve() function is introduced to
> allow outstanding dissolution of hugepages to complete before
> the isolated free page check is made by alloc_contig_range().
> 
> In addition, a check is added to hugepage allocation to prevent
> free hugepages within an isolated pageblock range from being
> used to satisfy migration target allocations, preventing circular
> migrations.
> 
> [1] https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@gmail.com/
> 
> Doug Berger (3):
>   mm/hugetlb: refactor alloc_and_dissolve_huge_page
>   mm/hugetlb: allow migrated hugepage to dissolve when freed
>   mm/hugetlb: add hugepage isolation support
> 
>  include/linux/hugetlb.h |   5 ++
>  mm/hugetlb.c            | 161 +++++++++++++++++++++++++++++++---------
>  mm/migrate.c            |   1 +
>  mm/page_alloc.c         |   1 +
>  4 files changed, 131 insertions(+), 37 deletions(-)
> 
> 
> base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
> -- 
> 2.25.1
> 


* Re: [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
  2022-09-22 20:25 ` [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Mike Kravetz
@ 2022-09-22 22:41   ` Mike Kravetz
  2022-09-22 23:27     ` Doug Berger
  2022-09-22 23:14   ` Doug Berger
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Kravetz @ 2022-09-22 22:41 UTC (permalink / raw)
  To: Doug Berger
  Cc: Andrew Morton, Muchun Song, Oscar Salvador, Michal Hocko,
	David Hildenbrand, Florian Fainelli, linux-mm, linux-kernel

On 09/22/22 13:25, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
> 
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target.  Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available.  This change also included a special flag to dissolve the
> source huge page when it is freed.  It seems like this is the exact
> behavior we want here?  I wonder if it might be easier just to use this
> existing code?

Totally untested, but I believe the patch below would accomplish this.

From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 22 Sep 2022 15:32:10 -0700
Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
 allocate new pages

When migrating hugetlb pages as the result of an alloc_contig_range
operation, allocate a new page from buddy for the migration target.
This guarantees that the number of hugetlb pages is not decreased by
the operation.  In addition, this will result in the special HPageTemporary
flag being set in the source page so that it will be dissolved when
freed.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |  5 +++--
 mm/hugetlb.c            | 12 ++++++++++--
 mm/internal.h           |  1 +
 mm/migrate.c            |  3 ++-
 mm/page_alloc.c         |  1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..558831bf1087 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask,
+				bool temporary);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
 int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
@@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-			nodemask_t *nmask, gfp_t gfp_mask)
+			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bcaf66defc5..19de8ae79ec8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 
 /* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-		nodemask_t *nmask, gfp_t gfp_mask)
+		nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
+	if (temporary)
+		goto temporary_alloc;
+
 	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0) {
 		struct page *page;
@@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	}
 	spin_unlock_irq(&hugetlb_lock);
 
+temporary_alloc:
+	/*
+	 * Try to allocate a fresh page with special HPageTemporary
+	 * characteristics
+	 */
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
 
@@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 
 	gfp_mask = htlb_alloc_mask(h);
 	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
+	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
 	mpol_cond_put(mpol);
 
 	return page;
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..3ebf8885e24f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -800,6 +800,7 @@ struct migration_target_control {
 	int nid;		/* preferred node id */
 	nodemask_t *nmask;
 	gfp_t gfp_mask;
+	bool alloc_contig;	/* alloc_contig_range allocation */
 };
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
index c228afba0963..6505ba2070d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		struct hstate *h = page_hstate(&folio->page);
 
 		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
+		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
+						mtc->alloc_contig);
 	}
 
 	if (folio_test_large(folio)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7b20bf09c1c..2b8a5a2b51cd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
 	struct migration_target_control mtc = {
 		.nid = zone_to_nid(cc->zone),
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+		.alloc_contig = true,
 	};
 
 	lru_cache_disable();
-- 
2.37.3



* Re: [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
  2022-09-22 20:25 ` [PATCH 0/3] mm/hugetlb: hugepage migration enhancements Mike Kravetz
  2022-09-22 22:41   ` Mike Kravetz
@ 2022-09-22 23:14   ` Doug Berger
  1 sibling, 0 replies; 9+ messages in thread
From: Doug Berger @ 2022-09-22 23:14 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Muchun Song, Oscar Salvador, Michal Hocko,
	David Hildenbrand, Florian Fainelli, linux-mm, linux-kernel

On 9/22/2022 1:25 PM, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
>> This patch set was included as patches [04/21-06/21] in my
>> previous patch set to introduce Designated Movable Blocks [1].
>> They were included there for context, but they have value on
>> their own and are being resubmitted here for consideration on
>> their own merits.
>>
>> The alloc_contig_range() function attempts to isolate the free
>> pages within the range and migrate the data from non-free pages
>> within the range to allow those pages to become isolated free
>> pages. If all of the pages in the range are on isolated free
>> page lists they can be allocated to the caller.
>>
>> When free hugepages are encountered in the range an attempt is
>> made to allocate a new compound page to become a replacement
>> hugepage and to dissolve the free hugepage so that its pages
>> within isolated pageblocks can be added to the isolated free
>> page lists. Hugepages that are not free and encountered within
>> the range must be migrated out of the range and dissolved to
>> allow the underlying pages to be added to the isolated free
>> page lists.
>>
>> Moving the data from hugepages within the range and freeing the
>> hugepages is not sufficient: the allocation of migration target
>> hugepages is allowed to come from free hugepages that may contain
>> isolated pageblocks, and freed hugepages will not be on isolated
>> free page lists, so the alloc_contig_range() will fail.
> 
> Thanks for looking into this!  I am adding Oscar, Michal and David on
> Cc: as they have looked at similar issues in the past.
Thanks for helping to get the right eyeballs looking at this.

> 
> Before jumping into the details of your changes, I just want to make one
> observation.  When migrating (or dissolving) hugetlb pages that are in a
> range operated on by alloc_contig_range(), we should not change the number
> of hugetlb pages available as noted here.  This includes the number of
> total huge pages and the number of huge pages on the node.  Therefore,
> we should allocate another huge page from buddy either as the migration
> target or to take the place of the dissolved page.
> 
> For free huge pages, we do this via alloc_and_dissolve_huge_page.  IIUC,
> there are no issues with this code path?
Yes, I did not observe any issue with the free hugepage handling
except that your recent improvements with freezing allocated pages 
(thanks for those) will make merging patch 1 more difficult ;).

> 
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target.  Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available.  This change also included a special flag to dissolve the
> source huge page when it is freed.  It seems like this is the exact
> behavior we want here?  I wonder if it might be easier just to use this
> existing code?
Yes, the temporary page flag can be made to work here and I did 
experiment with it, but it is dependent on the current design decisions.
I decided to move to the approach suggested here because I believe it 
could conceivably scale if support for migrating gigantic pages is 
desired in the future. The circular dependency on alloc_contig_range 
will likely keep that from happening, but that was my thinking.

Thanks for taking the time,
-Doug



* Re: [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
  2022-09-22 22:41   ` Mike Kravetz
@ 2022-09-22 23:27     ` Doug Berger
  2022-09-23 22:51       ` Mike Kravetz
  0 siblings, 1 reply; 9+ messages in thread
From: Doug Berger @ 2022-09-22 23:27 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Muchun Song, Oscar Salvador, Michal Hocko,
	David Hildenbrand, Florian Fainelli, linux-mm, linux-kernel

On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> On 09/22/22 13:25, Mike Kravetz wrote:
>> On 09/21/22 15:36, Doug Berger wrote:
>>
>> As noted above, for pages to be migrated we first try to use an existing
>> free huge page as the target.  Quite some time ago, Michal added code to
>> allocate a new page from buddy as the target if no free huge pages were
>> available.  This change also included a special flag to dissolve the
>> source huge page when it is freed.  It seems like this is the exact
>> behavior we want here?  I wonder if it might be easier just to use this
>> existing code?
> 
> Totally untested, but I believe the patch below would accomplish this.
> 
>  From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Thu, 22 Sep 2022 15:32:10 -0700
> Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
>   allocate new pages
> 
> When migrating hugetlb pages as the result of an alloc_contig_range
> operation, allocate a new page from buddy for the migration target.
> This guarantees that the number of hugetlb pages is not decreased by
> the operation.  In addition, this will result in the special HPageTemporary
> flag being set in the source page so that it will be dissolved when
> freed.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   include/linux/hugetlb.h |  5 +++--
>   mm/hugetlb.c            | 12 ++++++++++--
>   mm/internal.h           |  1 +
>   mm/migrate.c            |  3 ++-
>   mm/page_alloc.c         |  1 +
>   5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cfe15b32e2d4..558831bf1087 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>   struct page *alloc_huge_page(struct vm_area_struct *vma,
>   				unsigned long addr, int avoid_reserve);
>   struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -				nodemask_t *nmask, gfp_t gfp_mask);
> +				nodemask_t *nmask, gfp_t gfp_mask,
> +				bool temporary);
>   struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>   				unsigned long address);
>   int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
> @@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>   
>   static inline struct page *
>   alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -			nodemask_t *nmask, gfp_t gfp_mask)
> +			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>   {
>   	return NULL;
>   }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bcaf66defc5..19de8ae79ec8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>   
>   /* page migration callback function */
>   struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -		nodemask_t *nmask, gfp_t gfp_mask)
> +		nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>   {
> +	if (temporary)
> +		goto temporary_alloc;
> +
>   	spin_lock_irq(&hugetlb_lock);
>   	if (h->free_huge_pages - h->resv_huge_pages > 0) {
>   		struct page *page;
> @@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>   	}
>   	spin_unlock_irq(&hugetlb_lock);
>   
> +temporary_alloc:
> +	/*
> +	 * Try to allocate a fresh page with special HPageTemporary
> +	 * characteristics
> +	 */
>   	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>   }
>   
> @@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>   
>   	gfp_mask = htlb_alloc_mask(h);
>   	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> -	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
>   	mpol_cond_put(mpol);
>   
>   	return page;
> diff --git a/mm/internal.h b/mm/internal.h
> index b3002e03c28f..3ebf8885e24f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -800,6 +800,7 @@ struct migration_target_control {
>   	int nid;		/* preferred node id */
>   	nodemask_t *nmask;
>   	gfp_t gfp_mask;
> +	bool alloc_contig;	/* alloc_contig_range allocation */
>   };
>   
>   /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c228afba0963..6505ba2070d7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>   		struct hstate *h = page_hstate(&folio->page);
>   
>   		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> -		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> +		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
> +						mtc->alloc_contig);
>   	}
>   
>   	if (folio_test_large(folio)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d7b20bf09c1c..2b8a5a2b51cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
>   	struct migration_target_control mtc = {
>   		.nid = zone_to_nid(cc->zone),
>   		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> +		.alloc_contig = true,
>   	};
>   
>   	lru_cache_disable();
I believe I exposed alloc_migrate_huge_page() and conditionally invoked 
it from alloc_migration_target() when in alloc_contig, which is roughly 
equivalent. I didn't consider modifying the mtc to pass the information 
so my logic in alloc_migration_target() was a little kludgy.
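
Roughly something like the below (illustrative only -- I am borrowing
your mtc->alloc_contig flag here for clarity; my actual test for the
alloc_contig case was uglier):

	/* alloc_migration_target(), hugetlb branch, with
	 * alloc_migrate_huge_page() made non-static so it can be
	 * called from here */
		struct hstate *h = page_hstate(&folio->page);

		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
		if (mtc->alloc_contig)
			/* force a fresh buddy allocation so the source
			 * page is marked HPageTemporary and dissolves
			 * when freed */
			return alloc_migrate_huge_page(h, gfp_mask, nid,
						       mtc->nmask);
		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);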

Like I said, this can be made to work and I'm happy to accept an 
alternative if others agree. I think the isolation test of patch 3 is 
also still desirable.

Thanks again!
-Doug


* Re: [PATCH 0/3] mm/hugetlb: hugepage migration enhancements
  2022-09-22 23:27     ` Doug Berger
@ 2022-09-23 22:51       ` Mike Kravetz
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Kravetz @ 2022-09-23 22:51 UTC (permalink / raw)
  To: Doug Berger
  Cc: Andrew Morton, Muchun Song, Oscar Salvador, Michal Hocko,
	David Hildenbrand, Florian Fainelli, linux-mm, linux-kernel

On 09/22/22 16:27, Doug Berger wrote:
> On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> > On 09/22/22 13:25, Mike Kravetz wrote:
> > > On 09/21/22 15:36, Doug Berger wrote:
> > > 
> > > As noted above, for pages to be migrated we first try to use an existing
> > > free huge page as the target.  Quite some time ago, Michal added code to
> > > allocate a new page from buddy as the target if no free huge pages were
> > > available.  This change also included a special flag to dissolve the
> > > source huge page when it is freed.  It seems like this is the exact
> > > behavior we want here?  I wonder if it might be easier just to use this
> > > existing code?
> > 
> > Totally untested, but I believe the patch below would accomplish this.
> > 
> >  From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@oracle.com>
> > Date: Thu, 22 Sep 2022 15:32:10 -0700
> > Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
> >   allocate new pages
> > 
> > When migrating hugetlb pages as the result of an alloc_contig_range
> > operation, allocate a new page from buddy for the migration target.
> > This guarantees that the number of hugetlb pages is not decreased by
> > the operation.  In addition, this will result in the special HPageTemporary
> > flag being set in the source page so that it will be dissolved when
> > freed.
> > 
<snip>
> I believe I exposed alloc_migrate_huge_page() and conditionally invoked it
> from alloc_migration_target() when in alloc_contig, which is roughly
> equivalent. I didn't consider modifying the mtc to pass the information so
> my logic in alloc_migration_target() was a little kludgy.
> 
> Like I said, this can be made to work and I'm happy to accept an alternative
> if others agree. I think the isolation test of patch 3 is also still
> desirable.

Yes, hoping to get some other opinions as well.

I do agree that patch 3 is still a good idea.
-- 
Mike Kravetz


