* [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment.
@ 2022-04-06 15:18 Zi Yan
  2022-04-06 15:18 ` [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan

From: Zi Yan <ziy@nvidia.com>

Hi David,

This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-04-05-15-54.

I also added "Content-Type: text/plain; charset=UTF-8" to all email bodies
explicitly; please let me know if you still cannot see the emails in a
proper format.

Changelog
===
V10
---
1. Reverted to the original outer_start, outer_end range for
   test_pages_isolated() and isolate_freepages_range() in Patch 3;
   otherwise isolation will fail if start in alloc_contig_range() is in
   the middle of a free page.

V9
---
1. Limited has_unmovable_pages() check within a pageblock.
2. Added a check to ensure page isolation is done within a single zone
   in isolate_single_pageblock().
3. Fixed an off-by-one bug in isolate_single_pageblock().
4. Fixed a NULL-dereferencing bug in isolate_single_pageblock() when the pages
   before the to-be-isolated pageblock are not online.

V8
---
1. Cleaned up has_unmovable_pages() to remove page argument.

V7
---
1. Added page validity check in isolate_single_pageblock() to avoid out
   of zone pages.
2. Fixed a bug in split_free_page() to split and free pages in correct
   page order.

V6
---
1. Resolved compilation error/warning reported by kernel test robot.
2. Tried to solve the coding concerns from Christophe Leroy.
3. Shortened lengthy lines (pointed out by Christoph Hellwig).

V5
---
1. Moved isolation address alignment handling into start_isolate_page_range().
2. Rewrote and simplified how alloc_contig_range() works at pageblock
   granularity (Patch 3). Only two pageblock migratetypes need to be saved and
   restored. start_isolate_page_range() might need to migrate pages in this
   version, but it spares the caller from worrying about
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
   is isolated.

V4
---
1. Dropped two irrelevant patches on non-lru compound page handling, as
   it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
   __free_one_page() when order is >= pageblock_order, as the case (not
   mergeable pageblocks are isolated, CMA, and HIGHATOMIC) becomes more common.
4. Moving has_unmovable_pages() is now a separate patch.
5. Removed MAX_ORDER-1 alignment requirement in the comment in virtio_mem code.

Description
===

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the buddy
allocator, but isolating only a subset of the pageblocks within a page that
spans multiple pageblocks causes free page accounting issues: an isolated page
might not be put on the right free list, since the code assumes the migratetype
of the first pageblock applies to the whole free page. This is based on the
discussion at [2].

To remove the requirement, this patchset:
1. isolates pages at pageblock granularity instead of
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
2. splits free pages that cross the specified range boundaries, or migrates
   in-use pages that cross them and then splits the resulting free pages, to
   avoid free page accounting issues (which happen when multiple pageblocks
   within a single page have different migratetypes);
3. only checks unmovable pages within the range instead of the MAX_ORDER - 1
   aligned range during isolation, to avoid alloc_contig_range() failure when
   pageblocks within a MAX_ORDER - 1 aligned range are allocated separately;
4. returns pages not in the range as it did before.
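
To illustrate, here is a minimal userspace sketch (not kernel code) of how
much narrower the isolated range becomes; pageblock_order = 9 and
MAX_ORDER = 11 below are assumed x86-64 defaults, and the pfn values are
made up:

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	(1UL << 9)	/* 512 pages = 2MiB */
#define MAX_ORDER_NR_PAGES	(1UL << 10)	/* 1024 pages = 4MiB */

#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long start = 1536, end = 2560;	/* requested pfn range */

	/* before this series: MAX_ORDER_NR_PAGES-aligned isolation */
	printf("old isolation range: [%lu, %lu)\n",
	       ALIGN_DOWN(start, MAX_ORDER_NR_PAGES), ALIGN(end, MAX_ORDER_NR_PAGES));
	/* after this series: pageblock-aligned isolation */
	printf("new isolation range: [%lu, %lu)\n",
	       ALIGN_DOWN(start, PAGEBLOCK_NR_PAGES), ALIGN(end, PAGEBLOCK_NR_PAGES));
	return 0;
}

With these example numbers, the old code isolates [1024, 3072) while the new
code isolates only [1536, 2560).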

One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
   migratetypes when isolation fails in the middle of the range.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/

Zi Yan (5):
  mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  mm: page_isolation: check specified range for unmovable pages
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.

 drivers/virtio/virtio_mem.c    |   6 +-
 include/linux/cma.h            |   4 +-
 include/linux/mmzone.h         |   5 +-
 include/linux/page-isolation.h |   6 +-
 mm/internal.h                  |   6 +
 mm/memory_hotplug.c            |   3 +-
 mm/page_alloc.c                | 191 +++++-------------
 mm/page_isolation.c            | 348 +++++++++++++++++++++++++++++++--
 8 files changed, 395 insertions(+), 174 deletions(-)

-- 
2.35.1


* [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
@ 2022-04-06 15:18 ` Zi Yan
  2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan, Mike Rapoport

From: Zi Yan <ziy@nvidia.com>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f199931784e4..dafef8099acd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8872,125 +8872,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it.  But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index ff0ea6308299..df49f86a6ed1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may be not in buddy system, and
+		 * page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * it.  But now, memory offline itself doesn't call
+		 * shrink_node_slabs() and it still to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.35.1


* [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2022-04-06 15:18 ` [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
@ 2022-04-06 15:18 ` Zi Yan
  2022-04-12 13:10     ` David Hildenbrand
  2022-04-06 15:18 ` [PATCH v10 3/5] mm: make alloc_contig_range work at pageblock granularity Zi Yan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan

From: Zi Yan <ziy@nvidia.com>

Enable set_migratetype_isolate() to check specified sub-range for
unmovable pages during isolation. Page isolation is done
at MAX_ORDER_NR_PAGES granularity, but not all pages within that
granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes unmovable page check only look for
interesting pages, so that page isolation can succeed for any
non-overlapping ranges.
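
As a rough illustration, here is a userspace sketch of the per-pageblock
intersection this patch computes; the pageblock size and pfn values are
made-up example numbers:

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	512UL		/* assumed example value */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define MAX(a, b)		((a) > (b) ? (a) : (b))
#define MIN(a, b)		((a) < (b) ? (a) : (b))

int main(void)
{
	unsigned long start_pfn = 1600, end_pfn = 2500;	/* range to isolate */
	unsigned long block_pfn = 1536;	/* first pfn of one pageblock in it */

	/* only the part of the pageblock inside [start_pfn, end_pfn) is checked */
	unsigned long check_start = MAX(block_pfn, start_pfn);
	unsigned long check_end = MIN(ALIGN(block_pfn + 1, PAGEBLOCK_NR_PAGES),
				      end_pfn);

	printf("check [%lu, %lu) for unmovable pages\n", check_start, check_end);
	return 0;
}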

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c     | 16 ++--------
 mm/page_isolation.c | 78 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dafef8099acd..cacb2a16145e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8873,16 +8873,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
-	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
-}
-
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
 	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
@@ -9027,8 +9017,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0);
 	if (ret)
 		return ret;
 
@@ -9109,8 +9098,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		free_contig_range(end, outer_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	undo_isolate_page_range(start, end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index df49f86a6ed1..b8f96ed49d8b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,9 @@
 #include <trace/events/page_isolation.h>
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether the range [start_pfn, end_pfn) includes
+ * unmovable pages or not. The range must fall into a single pageblock and
+ * consequently belong to a single zone.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -28,12 +30,14 @@
  * cannot get removed (e.g., via memory unplug) concurrently.
  *
  */
-static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
+static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
+				int migratetype, int flags)
 {
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
+	unsigned long pfn = start_pfn;
+	struct page *page = pfn_to_page(pfn);
+
+	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
+		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
 
 	if (is_migrate_cma_page(page)) {
 		/*
@@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}
 
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		struct zone *zone;
+
+		page = pfn_to_page(pfn);
+		zone = page_zone(page);
 
 		/*
 		 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 			}
 
 			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
+			pfn += skip_pages - 1;
 			continue;
 		}
 
@@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 */
 		if (!page_ref_count(page)) {
 			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
+				pfn += (1 << buddy_order(page)) - 1;
 			continue;
 		}
 
@@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	return NULL;
 }
 
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function sets pageblock migratetype to isolate if no unmovable page is
+ * present in [start_pfn, end_pfn). The pageblock must intersect with
+ * [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+			unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct page *unmovable;
 	unsigned long flags;
+	unsigned long check_unmovable_start, check_unmovable_end;
 
 	spin_lock_irqsave(&zone->lock, flags);
 
@@ -155,8 +169,16 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
+	 *
+	 * Pass the intersection of [start_pfn, end_pfn) and the page's pageblock
+	 * to avoid redundant checks.
 	 */
-	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	check_unmovable_start = max(page_to_pfn(page), start_pfn);
+	check_unmovable_end = min(ALIGN(page_to_pfn(page) + 1, pageblock_nr_pages),
+				  end_pfn);
+
+	unmovable = has_unmovable_pages(check_unmovable_start, check_unmovable_end,
+			migratetype, isol_flags);
 	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
@@ -259,12 +281,21 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 	return NULL;
 }
 
+static unsigned long pfn_max_align_down(unsigned long pfn)
+{
+	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static unsigned long pfn_max_align_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
  * @start_pfn:		The lower PFN of the range to be isolated.
  * @end_pfn:		The upper PFN of the range to be isolated.
- *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
  *			a bit mask)
@@ -306,15 +337,16 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long pfn;
 	struct page *page;
 
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
+	unsigned long isolate_start = pfn_max_align_down(start_pfn);
+	unsigned long isolate_end = pfn_max_align_up(end_pfn);
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
+	for (pfn = isolate_start;
+	     pfn < isolate_end;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page && set_migratetype_isolate(page, migratetype, flags)) {
-			undo_isolate_page_range(start_pfn, pfn, migratetype);
+		if (page && set_migratetype_isolate(page, migratetype, flags,
+					start_pfn, end_pfn)) {
+			undo_isolate_page_range(isolate_start, pfn, migratetype);
 			return -EBUSY;
 		}
 	}
@@ -329,12 +361,12 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn;
 	struct page *page;
+	unsigned long isolate_start = pfn_max_align_down(start_pfn);
+	unsigned long isolate_end = pfn_max_align_up(end_pfn);
 
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
+	for (pfn = isolate_start;
+	     pfn < isolate_end;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || !is_migrate_isolate_page(page))
-- 
2.35.1


* [PATCH v10 3/5] mm: make alloc_contig_range work at pageblock granularity
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2022-04-06 15:18 ` [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
  2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
@ 2022-04-06 15:18 ` Zi Yan
  2022-04-06 15:18 ` [PATCH v10 4/5] mm: cma: use pageblock_order as the single alignment Zi Yan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan, kernel test robot

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid
merging pageblocks with different migratetypes. It might unnecessarily
convert extra pageblocks at the beginning and at the end of the range.
Change alloc_contig_range() to work at pageblock granularity.

Special handling is needed for free pages and in-use pages that cross the
boundaries of the range specified by alloc_contig_range(), because these
partially isolated pages cause free page accounting issues. The free pages
will be split and freed into separate migratetype lists; the in-use pages
will be migrated and then the freed pages will be handled in the same way.
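
The free-page side of that handling boils down to splitting a high-order free
page at the pageblock boundary into buddy-order chunks, which the new
split_free_page() below does. A standalone toy of the same arithmetic (the
order and offset values are just examples):

#include <stdio.h>
#include <strings.h>	/* ffs() */

int main(void)
{
	int order = 10;				/* a MAX_ORDER-1 free page */
	unsigned long split_pfn_offset = 512;	/* pageblock boundary inside it */
	unsigned long pfn;

	for (pfn = 0; pfn < (1UL << order);) {
		int chunk_order = ffs((int)split_pfn_offset) - 1;

		printf("free chunk at offset %lu, order %d\n", pfn, chunk_order);
		pfn += 1UL << chunk_order;
		split_pfn_offset -= 1UL << chunk_order;
		/* first part done, switch to the remainder of the page */
		if (split_pfn_offset == 0)
			split_pfn_offset = (1UL << order) - pfn;
	}
	return 0;
}

With order = 10 and an offset of 512, the page ends up as two order-9 chunks,
each of which can then sit on its own pageblock's free list.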

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |   4 +-
 mm/internal.h                  |   6 +
 mm/memory_hotplug.c            |   3 +-
 mm/page_alloc.c                |  54 +++++++--
 mm/page_isolation.c            | 195 ++++++++++++++++++++++++++++++---
 5 files changed, 234 insertions(+), 28 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..5456b7be38ae 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -42,7 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, int flags);
+			 int migratetype, int flags, gfp_t gfp_flags);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
@@ -50,7 +50,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  */
 void
 undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+			int migratetype);
 
 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/mm/internal.h b/mm/internal.h
index 1d3fb3c0f971..d2ff37e7b5a5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -359,6 +359,9 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
 			  phys_addr_t min_addr,
 			  int nid, bool exact_nid);
 
+void split_free_page(struct page *free_page,
+				int order, unsigned long split_pfn_offset);
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
@@ -422,6 +425,9 @@ isolate_freepages_range(struct compact_control *cc,
 int
 isolate_migratepages_range(struct compact_control *cc,
 			   unsigned long low_pfn, unsigned long end_pfn);
+
+int __alloc_contig_migrate_range(struct compact_control *cc,
+					unsigned long start, unsigned long end);
 #endif
 int find_suitable_fallback(struct free_area *area, unsigned int order,
 			int migratetype, bool only_stealable, bool *can_steal);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 74430f88853d..3f55af6d1aa7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1844,7 +1844,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
-				       MEMORY_OFFLINE | REPORT_FAILURE);
+				       MEMORY_OFFLINE | REPORT_FAILURE,
+				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
 	if (ret) {
 		reason = "failure to isolate range";
 		goto failed_removal_pcplists_disabled;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cacb2a16145e..2148d3d00a70 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1094,6 +1094,43 @@ static inline void __free_one_page(struct page *page,
 		page_reporting_notify_free(order);
 }
 
+/**
+ * split_free_page() -- split a free page at split_pfn_offset
+ * @free_page:		the original free page
+ * @order:		the order of the page
+ * @split_pfn_offset:	split offset within the page
+ *
+ * It is used when the free page crosses two pageblocks with different migratetypes
+ * at split_pfn_offset within the page. The split free page will be put into
+ * separate migratetype lists afterwards. Otherwise, the function achieves
+ * nothing.
+ */
+void split_free_page(struct page *free_page,
+				int order, unsigned long split_pfn_offset)
+{
+	struct zone *zone = page_zone(free_page);
+	unsigned long free_page_pfn = page_to_pfn(free_page);
+	unsigned long pfn;
+	unsigned long flags;
+	int free_page_order;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	del_page_from_free_list(free_page, zone, order);
+	for (pfn = free_page_pfn;
+	     pfn < free_page_pfn + (1UL << order);) {
+		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+		free_page_order = ffs(split_pfn_offset) - 1;
+		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
+				mt, FPI_NONE);
+		pfn += 1UL << free_page_order;
+		split_pfn_offset -= (1UL << free_page_order);
+		/* we have done the first part, now switch to second part */
+		if (split_pfn_offset == 0)
+			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
 /*
  * A bad page could be due to a number of fields. Instead of multiple branches,
  * try and check multiple fields with one check. The caller must do a detailed
@@ -8895,7 +8932,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
 #endif
 
 /* [start, end) must belong to a single zone. */
-static int __alloc_contig_migrate_range(struct compact_control *cc,
+int __alloc_contig_migrate_range(struct compact_control *cc,
 					unsigned long start, unsigned long end)
 {
 	/* This function is based on compact_zone() from compaction.c. */
@@ -8978,7 +9015,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
-	unsigned int order;
+	int order;
 	int ret = 0;
 
 	struct compact_control cc = {
@@ -8997,14 +9034,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
 	 * have different sizes, and due to the way page allocator
-	 * work, we align the range to biggest of the two pages so
-	 * that page allocator won't try to merge buddies from
-	 * different pageblocks and change MIGRATE_ISOLATE to some
-	 * other migration type.
+	 * work, start_isolate_page_range() has special handling for this.
 	 *
 	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
 	 * migrate the pages from an unaligned range (ie. pages that
-	 * we are interested in).  This will put all the pages in
+	 * we are interested in). This will put all the pages in
 	 * range back to page allocator as MIGRATE_ISOLATE.
 	 *
 	 * When this is done, we take the pages in range from page
@@ -9017,9 +9051,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(start, end, migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
 	if (ret)
-		return ret;
+		goto done;
 
 	drain_all_pages(cc.zone);
 
@@ -9039,7 +9073,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = 0;
 
 	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * Pages from [start, end) are within a pageblock_nr_pages
 	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 	 * more, all pages in [start, end) are free in page allocator.
 	 * What we are going to do is to allocate all pages from
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b8f96ed49d8b..f107bc236e6a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -205,7 +205,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	return -EBUSY;
 }
 
-static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
+static void unset_migratetype_isolate(struct page *page, int migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
@@ -281,16 +281,158 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 	return NULL;
 }
 
-static unsigned long pfn_max_align_down(unsigned long pfn)
+/**
+ * isolate_single_pageblock() -- tries to isolate a pageblock that might be
+ * within a free or in-use page.
+ * @boundary_pfn:		pageblock-aligned pfn that a page might cross
+ * @gfp_flags:			GFP flags used for migrating pages
+ * @isolate_before:	isolate the pageblock before the boundary_pfn
+ *
+ * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
+ * pageblock. When not all pageblocks within a page are isolated at the same
+ * time, free page accounting can go wrong. For example, in the case of
+ * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pageblocks.
+ * [         MAX_ORDER-1         ]
+ * [  pageblock0  |  pageblock1  ]
+ * When either pageblock is isolated, a free page spanning both is not split
+ * onto separate migratetype free lists as it should be; if it is an in-use
+ * page that is freed later, __free_one_page() does not split the free page
+ * either. The function handles this by splitting the free page, or by
+ * migrating the in-use page and then splitting the resulting free page.
+ */
+static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
+			bool isolate_before)
 {
-	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
+	unsigned char saved_mt;
+	unsigned long start_pfn;
+	unsigned long isolate_pageblock;
+	unsigned long pfn;
+	struct zone *zone;
 
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
+
+	if (isolate_before)
+		isolate_pageblock = boundary_pfn - pageblock_nr_pages;
+	else
+		isolate_pageblock = boundary_pfn;
+
+	/*
+	 * scan at the beginning of MAX_ORDER_NR_PAGES aligned range to avoid
+	 * only isolating a subset of pageblocks from a bigger than pageblock
+	 * free or in-use page. Also make sure all to-be-isolated pageblocks
+	 * are within the same zone.
+	 */
+	zone  = page_zone(pfn_to_page(isolate_pageblock));
+	start_pfn  = max(ALIGN_DOWN(isolate_pageblock, MAX_ORDER_NR_PAGES),
+				      zone->zone_start_pfn);
+
+	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
+	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
+
+	/*
+	 * Bail out early when the to-be-isolated pageblock does not form
+	 * a free or in-use page across boundary_pfn:
+	 *
+	 * 1. isolate before boundary_pfn: the page after is not online
+	 * 2. isolate after boundary_pfn: the page before is not online
+	 *
+	 * This also ensures correctness. Without it, when isolate after
+	 * boundary_pfn and [start_pfn, boundary_pfn) are not online,
+	 * __first_valid_page() will return unexpected NULL in the for loop
+	 * below.
+	 */
+	if (isolate_before) {
+		if (!pfn_to_online_page(boundary_pfn))
+			return 0;
+	} else {
+		if (!pfn_to_online_page(boundary_pfn - 1))
+			return 0;
+	}
+
+	for (pfn = start_pfn; pfn < boundary_pfn;) {
+		struct page *page = __first_valid_page(pfn, boundary_pfn - pfn);
+
+		VM_BUG_ON(!page);
+		pfn = page_to_pfn(page);
+		/*
+		 * start_pfn is MAX_ORDER_NR_PAGES aligned; if there is any
+		 * free page in [start_pfn, boundary_pfn), its head page will
+		 * always be in the range.
+		 */
+		if (PageBuddy(page)) {
+			int order = buddy_order(page);
+
+			if (pfn + (1UL << order) > boundary_pfn)
+				split_free_page(page, order, boundary_pfn - pfn);
+			pfn += (1UL << order);
+			continue;
+		}
+		/*
+		 * migrate compound pages then let the free page handling code
+		 * above do the rest. If migration is not enabled, just fail.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+			unsigned long nr_pages = compound_nr(page);
+			int order = compound_order(page);
+			struct page *head = compound_head(page);
+			unsigned long head_pfn = page_to_pfn(head);
+			int ret;
+			struct compact_control cc = {
+				.nr_migratepages = 0,
+				.order = -1,
+				.zone = page_zone(pfn_to_page(head_pfn)),
+				.mode = MIGRATE_SYNC,
+				.ignore_skip_hint = true,
+				.no_set_skip_hint = true,
+				.gfp_mask = gfp_flags,
+				.alloc_contig = true,
+			};
+			INIT_LIST_HEAD(&cc.migratepages);
+
+			if (head_pfn + nr_pages < boundary_pfn) {
+				pfn += nr_pages;
+				continue;
+			}
+
+			ret = __alloc_contig_migrate_range(&cc, head_pfn,
+						head_pfn + nr_pages);
+
+			if (ret)
+				goto failed;
+			/*
+			 * reset pfn, let the free page handling code above
+			 * split the free page to the right migratetype list.
+			 *
+			 * head_pfn is not used here as a hugetlb page order
+			 * can be bigger than MAX_ORDER-1, but after it is
+			 * freed, the free page order is not. Use pfn within
+			 * the range to find the head of the free page and
+			 * reset order to 0 if a hugetlb page with
+			 * >MAX_ORDER-1 order is encountered.
+			 */
+			if (order > MAX_ORDER-1)
+				order = 0;
+			while (!PageBuddy(pfn_to_page(pfn))) {
+				order++;
+				pfn &= ~0UL << order;
+			}
+			continue;
+#else
+			goto failed;
+#endif
+		}
+
+		pfn++;
+	}
+	return 0;
+failed:
+	/* restore the original migratetype */
+	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
+	return -EBUSY;
 }
 
+
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
@@ -304,6 +446,8 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  *					 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
+ * @gfp_flags:		GFP flags used for migrating pages that sit across the
+ *			range boundaries.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -312,6 +456,10 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  * pages in the range finally, the caller have to free all pages in the range.
  * test_page_isolated() can be used for test it.
  *
+ * The function first tries to isolate the pageblocks at the beginning and end
+ * of the range, since there might be pages across the range boundaries.
+ * Afterwards, it isolates the rest of the range.
+ *
  * There is no high level synchronization mechanism that prevents two threads
  * from trying to isolate overlapping ranges. If this happens, one thread
  * will notice pageblocks in the overlapping range already set to isolate.
@@ -332,21 +480,38 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, int flags)
+			     int migratetype, int flags, gfp_t gfp_flags)
 {
 	unsigned long pfn;
 	struct page *page;
+	/* isolation is done at page block granularity */
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
+	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
+	int ret;
 
-	unsigned long isolate_start = pfn_max_align_down(start_pfn);
-	unsigned long isolate_end = pfn_max_align_up(end_pfn);
+	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
+	ret = isolate_single_pageblock(isolate_start, gfp_flags, false);
+	if (ret)
+		return ret;
 
-	for (pfn = isolate_start;
-	     pfn < isolate_end;
+	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
+	ret = isolate_single_pageblock(isolate_end, gfp_flags, true);
+	if (ret) {
+		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
+		return ret;
+	}
+
+	/* skip isolated pageblocks at the beginning and end */
+	for (pfn = isolate_start + pageblock_nr_pages;
+	     pfn < isolate_end - pageblock_nr_pages;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page && set_migratetype_isolate(page, migratetype, flags,
 					start_pfn, end_pfn)) {
 			undo_isolate_page_range(isolate_start, pfn, migratetype);
+			unset_migratetype_isolate(
+				pfn_to_page(isolate_end - pageblock_nr_pages),
+				migratetype);
 			return -EBUSY;
 		}
 	}
@@ -357,12 +522,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  * Make isolated pages available again.
  */
 void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			    unsigned migratetype)
+			    int migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
-	unsigned long isolate_start = pfn_max_align_down(start_pfn);
-	unsigned long isolate_end = pfn_max_align_up(end_pfn);
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
+	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
 
 
 	for (pfn = isolate_start;
-- 
2.35.1


* [PATCH v10 4/5] mm: cma: use pageblock_order as the single alignment
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (2 preceding siblings ...)
  2022-04-06 15:18 ` [PATCH v10 3/5] mm: make alloc_contig_range work at pageblock granularity Zi Yan
@ 2022-04-06 15:18 ` Zi Yan
  2022-04-06 15:18 ` [PATCH v10 5/5] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
  2022-04-12 12:35   ` David Hildenbrand
  5 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan

From: Zi Yan <ziy@nvidia.com>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_nr_pages
alignment.
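
For a sense of scale, a toy computation of the minimum CMA alignment before
and after this change, assuming x86-64 defaults (4KiB pages,
pageblock_order = 9, MAX_ORDER = 11):

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;			/* 4KiB pages */
	unsigned long max_order_nr_pages = 1UL << 10;	/* MAX_ORDER = 11 */
	unsigned long pageblock_nr_pages = 1UL << 9;	/* pageblock_order = 9 */

	printf("old CMA_MIN_ALIGNMENT_BYTES: %lu MiB\n",
	       page_size * max_order_nr_pages >> 20);
	printf("new CMA_MIN_ALIGNMENT_BYTES: %lu MiB\n",
	       page_size * pageblock_nr_pages >> 20);
	return 0;
}

With those assumptions, the minimum CMA area alignment drops from 4MiB to 2MiB.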

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/cma.h    | 4 ++--
 include/linux/mmzone.h | 5 +----
 mm/page_alloc.c        | 4 ++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index a6f637342740..63873b93deaa 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -17,11 +17,11 @@
 #define CMA_MAX_NAME 64
 
 /*
- * TODO: once the buddy -- especially pageblock merging and alloc_contig_range()
+ * Now that the buddy -- especially pageblock merging and alloc_contig_range()
  * -- can deal with only some pageblocks of a higher-order page being
  *  MIGRATE_CMA, we can use pageblock_nr_pages.
  */
-#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES
+#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
 #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
 
 struct cma;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 46ffab808f03..aab70355d64f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2148d3d00a70..29bb5177a7f5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9000,8 +9000,8 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
-- 
2.35.1


* [PATCH v10 5/5] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (3 preceding siblings ...)
  2022-04-06 15:18 ` [PATCH v10 4/5] mm: cma: use pageblock_order as the single alignment Zi Yan
@ 2022-04-06 15:18 ` Zi Yan
  2022-04-12 12:35   ` David Hildenbrand
  5 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-06 15:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy,
	Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() now only needs to be aligned to pageblock_nr_pages, so
drop the virtio_mem size requirement that it be MAX_ORDER_NR_PAGES.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 drivers/virtio/virtio_mem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e7d6b679596d..e07486f01999 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,10 +2476,10 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * TODO: once alloc_contig_range() works reliably with pageblock
-	 * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
+	 * alloc_contig_range() works reliably with pageblock
+	 * granularity on ZONE_NORMAL, so use pageblock_nr_pages.
 	 */
-	sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
+	sb_size = PAGE_SIZE * pageblock_nr_pages;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.35.1


* Re: [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment.
  2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
@ 2022-04-12 12:35   ` David Hildenbrand
  2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2022-04-12 12:35 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy

On 06.04.22 17:18, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi David,

Hi!

> 
> This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make
> MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-04-05-15-54.

Sorry for the late reply, I've got way too many irons in the fire right now.

> 
> I also added "Content-Type: text/plain; charset=UTF-8" to all email bodies
> explicitly, please let me know if you still cannot see the emails in a
> proper format.

Oh, thanks! But no need to work around Mimecast mailing issues on your
side. This just has to be fixed for good on the RH side ...


I have yet to give #3 a thorough review, sorry for not commenting on
that earlier. It's a bit more involved, especially with all the
possible corner cases :)

-- 
Thanks,

David / dhildenb



* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
@ 2022-04-12 13:10     ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2022-04-12 13:10 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: linux-kernel, virtualization, Vlastimil Babka, Mel Gorman,
	Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy

On 06.04.22 17:18, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
> granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

[...]

>  /*
> - * This function checks whether pageblock includes unmovable pages or not.
> + * This function checks whether the range [start_pfn, end_pfn) includes
> + * unmovable pages or not. The range must fall into a single pageblock and
> + * consequently belong to a single zone.
>   *
>   * PageLRU check without isolation or lru_lock could race so that
>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> @@ -28,12 +30,14 @@
>   * cannot get removed (e.g., via memory unplug) concurrently.
>   *
>   */
> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> -				 int migratetype, int flags)
> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
> +				int migratetype, int flags)
>  {
> -	unsigned long iter = 0;
> -	unsigned long pfn = page_to_pfn(page);
> -	unsigned long offset = pfn % pageblock_nr_pages;
> +	unsigned long pfn = start_pfn;
> +	struct page *page = pfn_to_page(pfn);


Just do

struct page *page = pfn_to_page(start_pfn);
struct zone *zone = page_zone(page);

here. No need to look up the zone again in the loop because, as you
document, the range "must ... belong to a single zone.".

Then, there is also no need to initialize "pfn" here; doing it in the
loop header is sufficient.

> +
> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>  
>  	if (is_migrate_cma_page(page)) {
>  		/*
> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		return page;
>  	}
>  
> -	for (; iter < pageblock_nr_pages - offset; iter++) {
> -		page = pfn_to_page(pfn + iter);
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		struct zone *zone;
> +
> +		page = pfn_to_page(pfn);
> +		zone = page_zone(page);
>  
>  		/*
>  		 * Both, bootmem allocations and memory holes are marked
> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  			}
>  
>  			skip_pages = compound_nr(head) - (page - head);
> -			iter += skip_pages - 1;
> +			pfn += skip_pages - 1;
>  			continue;
>  		}
>  
> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		 */
>  		if (!page_ref_count(page)) {
>  			if (PageBuddy(page))
> -				iter += (1 << buddy_order(page)) - 1;
> +				pfn += (1 << buddy_order(page)) - 1;
>  			continue;
>  		}
>  
> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  	return NULL;
>  }
>  
> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
> +/*
> + * This function sets pageblock migratetype to isolate if no unmovable page is
> + * present in [start_pfn, end_pfn). The pageblock must intersect with
> + * [start_pfn, end_pfn).
> + */
> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
> +			unsigned long start_pfn, unsigned long end_pfn)

I think we might be able to do better, eventually not passing start_pfn at
all. Hmm.

I think we want to pull out the
start_isolate_page_range()/undo_isolate_page_range() interface change
into a separate patch.

Let me try to give it a shot, I'll try hacking something up real quick
to see if we can do better.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 13:10     ` David Hildenbrand
  (?)
@ 2022-04-12 14:07     ` Zi Yan
  2022-04-12 14:49         ` David Hildenbrand
  -1 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2022-04-12 14:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy

[-- Attachment #1: Type: text/plain, Size: 4678 bytes --]

On 12 Apr 2022, at 9:10, David Hildenbrand wrote:

> On 06.04.22 17:18, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>> granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>
> [...]
>
>>  /*
>> - * This function checks whether pageblock includes unmovable pages or not.
>> + * This function checks whether the range [start_pfn, end_pfn) includes
>> + * unmovable pages or not. The range must fall into a single pageblock and
>> + * consequently belong to a single zone.
>>   *
>>   * PageLRU check without isolation or lru_lock could race so that
>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>> @@ -28,12 +30,14 @@
>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>   *
>>   */
>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> -				 int migratetype, int flags)
>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>> +				int migratetype, int flags)
>>  {
>> -	unsigned long iter = 0;
>> -	unsigned long pfn = page_to_pfn(page);
>> -	unsigned long offset = pfn % pageblock_nr_pages;
>> +	unsigned long pfn = start_pfn;
>> +	struct page *page = pfn_to_page(pfn);
>
>
> Just do
>
> struct page *page = pfn_to_page(start_pfn);
> struct zone *zone = page_zone(page);
>
> here. No need to lookup the zone again in the loop because, as you
> document "must ... belong to a single zone.".
>
> Then, there is also no need to initialize "pfn" here. In the loop header
> is sufficient.
>

Sure.

>> +
>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>
>>  	if (is_migrate_cma_page(page)) {
>>  		/*
>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  		return page;
>>  	}
>>
>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>> -		page = pfn_to_page(pfn + iter);
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +		struct zone *zone;
>> +
>> +		page = pfn_to_page(pfn);
>> +		zone = page_zone(page);
>>
>>  		/*
>>  		 * Both, bootmem allocations and memory holes are marked
>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  			}
>>
>>  			skip_pages = compound_nr(head) - (page - head);
>> -			iter += skip_pages - 1;
>> +			pfn += skip_pages - 1;
>>  			continue;
>>  		}
>>
>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  		 */
>>  		if (!page_ref_count(page)) {
>>  			if (PageBuddy(page))
>> -				iter += (1 << buddy_order(page)) - 1;
>> +				pfn += (1 << buddy_order(page)) - 1;
>>  			continue;
>>  		}
>>
>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  	return NULL;
>>  }
>>
>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>> +/*
>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>> + * [start_pfn, end_pfn).
>> + */
>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>> +			unsigned long start_pfn, unsigned long end_pfn)
>
> I think we might be able to do better, eventually not passing start_pfn
> at all. Hmm.

IMHO, having start_pfn and end_pfn in the parameter list would make the
interface easier to understand. Otherwise if we remove start_pfn,
the caller needs to adjust @page to be within the range of [start_pfn,
end_pfn)

>
> I think we want to pull out the
> start_isolate_page_range()/undo_isolate_page_range() interface change
> into a separate patch.

You mean a patch just adding

unsigned long isolate_start = pfn_max_align_down(start_pfn);
unsigned long isolate_end = pfn_max_align_up(end_pfn);

in start_isolate_page_range()/undo_isolate_page_range()?

Yes I can do that.

>
> Let me try to give it a shot, I'll try hacking something up real quick
> to see if we can do better.

Sure. Thanks.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread
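
To make the prologue simplification agreed on above concrete, the function
would start out roughly as follows once the zone is looked up a single time
and pfn lives only in the loop header (an editorial sketch of the shape,
not a hunk from either patch):

	static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
						int migratetype, int flags)
	{
		struct page *page = pfn_to_page(start_pfn);
		struct zone *zone = page_zone(page);	/* whole range is in one zone */
		unsigned long pfn;

		VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
			  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			page = pfn_to_page(pfn);
			/* the existing per-pfn checks go here, with no further
			 * page_zone() lookups inside the loop */
		}
		return NULL;
	}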

* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 14:07     ` Zi Yan
@ 2022-04-12 14:49         ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2022-04-12 14:49 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy

On 12.04.22 16:07, Zi Yan wrote:
> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
> 
>> On 06.04.22 17:18, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Enable set_migratetype_isolate() to check specified sub-range for
>>> unmovable pages during isolation. Page isolation is done
>>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>> granularity are intended to be isolated. For example,
>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>> alignment. This commit makes unmovable page check only look for
>>> interesting pages, so that page isolation can succeed for any
>>> non-overlapping ranges.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>
>> [...]
>>
>>>  /*
>>> - * This function checks whether pageblock includes unmovable pages or not.
>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>> + * consequently belong to a single zone.
>>>   *
>>>   * PageLRU check without isolation or lru_lock could race so that
>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>> @@ -28,12 +30,14 @@
>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>   *
>>>   */
>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>> -				 int migratetype, int flags)
>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>> +				int migratetype, int flags)
>>>  {
>>> -	unsigned long iter = 0;
>>> -	unsigned long pfn = page_to_pfn(page);
>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>> +	unsigned long pfn = start_pfn;
>>> +	struct page *page = pfn_to_page(pfn);
>>
>>
>> Just do
>>
>> struct page *page = pfn_to_page(start_pfn);
>> struct zone *zone = page_zone(page);
>>
>> here. No need to lookup the zone again in the loop because, as you
>> document "must ... belong to a single zone.".
>>
>> Then, there is also no need to initialize "pfn" here. In the loop header
>> is sufficient.
>>
> 
> Sure.
> 
>>> +
>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>
>>>  	if (is_migrate_cma_page(page)) {
>>>  		/*
>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>  		return page;
>>>  	}
>>>
>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>> -		page = pfn_to_page(pfn + iter);
>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>> +		struct zone *zone;
>>> +
>>> +		page = pfn_to_page(pfn);
>>> +		zone = page_zone(page);
>>>
>>>  		/*
>>>  		 * Both, bootmem allocations and memory holes are marked
>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>  			}
>>>
>>>  			skip_pages = compound_nr(head) - (page - head);
>>> -			iter += skip_pages - 1;
>>> +			pfn += skip_pages - 1;
>>>  			continue;
>>>  		}
>>>
>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>  		 */
>>>  		if (!page_ref_count(page)) {
>>>  			if (PageBuddy(page))
>>> -				iter += (1 << buddy_order(page)) - 1;
>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>  			continue;
>>>  		}
>>>
>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>  	return NULL;
>>>  }
>>>
>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>> +/*
>>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>> + * [start_pfn, end_pfn).
>>> + */
>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>
>> I think we might be able to do better, eventually not passing start_pfn
>> at all. Hmm.
> 
> IMHO, having start_pfn and end_pfn in the parameter list would make the
> interface easier to understand. Otherwise if we remove start_pfn,
> the caller needs to adjust @page to be within the range of [start_pfn,
> end_pfn)
> 
>>
>> I think we want to pull out the
>> start_isolate_page_range()/undo_isolate_page_range() interface change
>> into a separate patch.
> 
> You mean a patch just adding
> 
> unsigned long isolate_start = pfn_max_align_down(start_pfn);
> unsigned long isolate_end = pfn_max_align_up(end_pfn);
> 
> in start_isolate_page_range()/undo_isolate_page_range()?
> 
> Yes I can do that.

I think we have to be careful with memory onlining/offlining. There are
corner cases where we get called with only pageblock alignment and
must not adjust the range.


Something like this as a base for the next cleanups/extensions:


From 18d29b53600d6d0d6ac87cdc6b7517e989fa3dac Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 12 Apr 2022 15:51:50 +0200
Subject: [PATCH] mm: page-isolation: Move alignment logic into
 start_isolate_page_range()/undo_isolate_page_range()

For ordinary range allocations, we actually have to isolate all pageblocks
in a MAX_ORDER - 1 range. Only memory onlining/offlining is special: it
knows exactly which pageblocks to isolate/unisolate and we must not mess
with the pageblocks to isolate (memory onlining/offlining always passes
MAX_ORDER - 1 - aligned ranges, unless we're dealing with vmemmap
located on hotplugged memory, whereby the start pfn might only be
pageblock aligned).

Further, for ordinary allocations, we'll want to know the exact range
we want to allocate -- to check only that range for unmovable pages.
Right now we lose that information.

So let's move the alignment logic into start_isolate_page_range() /
undo_isolate_page_range(), such that we have the actual range of
interest available and the alignment logic contained in there.

Provide start_isolate_pageblocks()/undo_isolate_pageblocks() for memory
onlining/offlining.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-isolation.h | 23 ++++-----
 mm/memory_hotplug.c            |  8 ++--
 mm/page_alloc.c                | 15 +-----
 mm/page_isolation.c            | 85 ++++++++++++++++++++++++++--------
 4 files changed, 81 insertions(+), 50 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..8e9e9e80ba67 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -37,20 +37,15 @@ void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
 
-/*
- * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, int flags);
-
-/*
- * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
- * target range is [start_pfn, end_pfn)
- */
-void
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype, int flags);
+int start_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype, int flags);
+
+void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype);
+void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype);
 
 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 416b38ca8def..fb7f63c800d1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1089,7 +1089,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 
 	/*
 	 * Fixup the number of isolated pageblocks before marking the sections
-	 * onlining, such that undo_isolate_page_range() works correctly.
+	 * onlining, such that undo_isolate_pageblocks() works correctly.
 	 */
 	spin_lock_irqsave(&zone->lock, flags);
 	zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages;
@@ -1113,7 +1113,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		build_all_zonelists(NULL);
 
 	/* Basic onlining is complete, allow allocation of onlined pages. */
-	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
+	undo_isolate_pageblocks(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
 
 	/*
 	 * Freshly onlined pages aren't shuffled (e.g., all pages are placed to
@@ -1834,7 +1834,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	lru_cache_disable();
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
+	ret = start_isolate_pageblocks(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
 				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
@@ -1937,7 +1937,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 failed_removal_isolated:
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index badc05fc1b2f..76f8c19e37a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8950,15 +8950,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
-	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
-}
 
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
 	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
@@ -9104,8 +9095,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0);
 	if (ret)
 		return ret;
 
@@ -9186,8 +9176,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		free_contig_range(end, outer_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	undo_isolate_page_range(start, end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b34f1310aeaa..7c1f7724c5bb 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,16 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+static unsigned long pfn_max_align_down(unsigned long pfn)
+{
+	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static unsigned long pfn_max_align_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
 /*
  * This function checks whether pageblock includes unmovable pages or not.
  *
@@ -262,12 +272,40 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 	return NULL;
 }
 
+/*
+ * Make page-allocation-type of pageblocks to be MIGRATE_ISOLATE.
+ *
+ * Most users should actually use start_isolate_page_range(). Only memory
+ * onlining/offlining that knows exactly what it's doing in regard to
+ * isolating only some pageblocks of MAX_ORDER - 1 pages (for the vmemmap)
+ * should use this interface.
+ */
+int start_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype, int flags)
+{
+	unsigned long pfn;
+	struct page *page;
+
+	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
+	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
+
+	for (pfn = start_pfn;
+	     pfn < end_pfn;
+	     pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (page && set_migratetype_isolate(page, migratetype, flags)) {
+			undo_isolate_pageblocks(start_pfn, pfn, migratetype);
+			return -EBUSY;
+		}
+	}
+	return 0;
+}
+
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
  * @start_pfn:		The lower PFN of the range to be isolated.
  * @end_pfn:		The upper PFN of the range to be isolated.
- *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
  *			a bit mask)
@@ -306,29 +344,23 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, int flags)
 {
-	unsigned long pfn;
-	struct page *page;
-
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
+	start_pfn = pfn_max_align_down(start_pfn);
+	end_pfn = pfn_max_align_up(end_pfn);
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
-		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page && set_migratetype_isolate(page, migratetype, flags)) {
-			undo_isolate_page_range(start_pfn, pfn, migratetype);
-			return -EBUSY;
-		}
-	}
-	return 0;
+	return start_isolate_pageblocks(start_pfn, end_pfn, migratetype, flags);
 }
 
 /*
- * Make isolated pages available again.
+ * Make isolated pageblocks, isolated via start_isolate_pageblocks, available
+ * again.
+ *
+ * Most users should actually use undo_isolate_page_range(). Only memory
+ * onlining/offlining that knows exactly what it's doing in regard to
+ * isolating only some pageblocks of MAX_ORDER - 1 pages (for the vmemmap)
+ * should use this interface.
  */
-void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			    unsigned migratetype)
+void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -345,6 +377,21 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		unset_migratetype_isolate(page, migratetype);
 	}
 }
+
+/*
+ * Make isolated pageblocks, isolated via start_isolate_page_range(), available
+ * again. The pageblock isolation range will be extended just like for
+ * start_isolate_page_range().
+ */
+void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype)
+{
+	start_pfn = pfn_max_align_down(start_pfn);
+	end_pfn = pfn_max_align_up(end_pfn);
+
+	return undo_isolate_pageblocks(start_pfn, end_pfn, migratetype);
+}
+
 /*
  * Test all pages in the range is free(means isolated) or not.
  * all pages in [start_pfn...end_pfn) must be in the same zone.
-- 
2.35.1



-- 
Thanks,

David / dhildenb


^ permalink raw reply related	[flat|nested] 18+ messages in thread
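
The split interface above leaves two calling conventions. Memory
onlining/offlining keeps passing already pageblock-aligned boundaries to
start_isolate_pageblocks(), which must not widen them, while ordinary callers
hand their exact range to start_isolate_page_range(), which widens it
internally. As an illustration only (the pfn values are made up, and
MAX_ORDER_NR_PAGES == 1024 is assumed, the common x86-64 default):

	/* ordinary allocation: pass the exact range, alignment is internal */
	ret = start_isolate_page_range(5000, 5100, MIGRATE_CMA, 0);
	/* pfn_max_align_down(5000) == 4096 and pfn_max_align_up(5100) == 5120,
	 * so the pageblocks in [4096, 5120) end up isolated */

	/* memory offlining: boundaries are already pageblock aligned and must
	 * be used as-is */
	ret = start_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE,
				       MEMORY_OFFLINE | REPORT_FAILURE);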

* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 14:49         ` David Hildenbrand
  (?)
@ 2022-04-12 15:01         ` Zi Yan
  2022-04-12 15:06             ` David Hildenbrand
  -1 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2022-04-12 15:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy

[-- Attachment #1: Type: text/plain, Size: 16600 bytes --]

On 12 Apr 2022, at 10:49, David Hildenbrand wrote:

> On 12.04.22 16:07, Zi Yan wrote:
>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>
>>> On 06.04.22 17:18, Zi Yan wrote:
>>>> From: Zi Yan <ziy@nvidia.com>
>>>>
>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>> unmovable pages during isolation. Page isolation is done
>>>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>>> granularity are intended to be isolated. For example,
>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>> alignment. This commit makes unmovable page check only look for
>>>> interesting pages, so that page isolation can succeed for any
>>>> non-overlapping ranges.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>
>>> [...]
>>>
>>>>  /*
>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>> + * consequently belong to a single zone.
>>>>   *
>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>> @@ -28,12 +30,14 @@
>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>   *
>>>>   */
>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>> -				 int migratetype, int flags)
>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>> +				int migratetype, int flags)
>>>>  {
>>>> -	unsigned long iter = 0;
>>>> -	unsigned long pfn = page_to_pfn(page);
>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>> +	unsigned long pfn = start_pfn;
>>>> +	struct page *page = pfn_to_page(pfn);
>>>
>>>
>>> Just do
>>>
>>> struct page *page = pfn_to_page(start_pfn);
>>> struct zone *zone = page_zone(page);
>>>
>>> here. No need to lookup the zone again in the loop because, as you
>>> document "must ... belong to a single zone.".
>>>
>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>> is sufficient.
>>>
>>
>> Sure.
>>
>>>> +
>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>
>>>>  	if (is_migrate_cma_page(page)) {
>>>>  		/*
>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  		return page;
>>>>  	}
>>>>
>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>> -		page = pfn_to_page(pfn + iter);
>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +		struct zone *zone;
>>>> +
>>>> +		page = pfn_to_page(pfn);
>>>> +		zone = page_zone(page);
>>>>
>>>>  		/*
>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  			}
>>>>
>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>> -			iter += skip_pages - 1;
>>>> +			pfn += skip_pages - 1;
>>>>  			continue;
>>>>  		}
>>>>
>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  		 */
>>>>  		if (!page_ref_count(page)) {
>>>>  			if (PageBuddy(page))
>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>  			continue;
>>>>  		}
>>>>
>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  	return NULL;
>>>>  }
>>>>
>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>> +/*
>>>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>> + * [start_pfn, end_pfn).
>>>> + */
>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>
>>> I think we might be able to do better, eventually not passing start_pfn at
>>> all. Hmm.
>>
>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>> interface easier to understand. Otherwise if we remove start_pfn,
>> the caller needs to adjust @page to be within the range of [start_pfn,
>> end_pfn)
>>
>>>
>>> I think we want to pull out the
>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>> into a separate patch.
>>
>> You mean a patch just adding
>>
>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>
>> in start_isolate_page_range()/undo_isolate_page_range()?
>>
>> Yes I can do that.
>
> I think we have to be careful with memory onlining/offlining. There are
> corner cases where we get called with only pageblock alignment and
> must not adjust the range.

In the patch below, you added a new pair, start_isolate_pageblocks()
and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
calls set_migratetype_isolate(), and you noted that their range should
not be adjusted. But in my patch, set_migratetype_isolate() adjusts
the range for the has_unmovable_pages() check. Do you mean
start_isolate_pageblocks() should call a different version of
set_migratetype_isolate() without range adjustment? That can be done
with an additional parameter in start_isolate_page_range(), like
bool strict, right?
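
A rough sketch of that idea (the extra parameter and its name are made up
here, this is not code from either patch):

	static int set_migratetype_isolate(struct page *page, int migratetype,
			int isol_flags, unsigned long start_pfn,
			unsigned long end_pfn, bool strict)
	{
		unsigned long check_start = ALIGN_DOWN(page_to_pfn(page), pageblock_nr_pages);
		unsigned long check_end = check_start + pageblock_nr_pages;

		if (!strict) {
			/* ordinary allocation: only the caller's sub-range matters */
			check_start = max(check_start, start_pfn);
			check_end = min(check_end, end_pfn);
		}
		/*
		 * strict callers (i.e. start_isolate_pageblocks()) keep checking
		 * the whole pageblock, exactly as before; the rest of the function,
		 * including the has_unmovable_pages(check_start, check_end, ...)
		 * call, stays unchanged.
		 */
		return 0;
	}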


>
>
> Something like this as a base for the next cleanups/extensions:
>
>
> From 18d29b53600d6d0d6ac87cdc6b7517e989fa3dac Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Tue, 12 Apr 2022 15:51:50 +0200
> Subject: [PATCH] mm: page-isolation: Move alignment logic into
>  start_isolate_page_range()/undo_isolate_page_range()
>
> For ordinary range allocations, we actually have to isolate all pageblocks
> in a MAX_ORDER - 1 range. Only memory onlining/offlining is special: it
> knows exactly which pageblocks to isolate/unisolate and we must not mess
> with the pageblocks to isolate (memory onlining/offlining always passes
> MAX_ORDER - 1 - aligned ranges, unless we're dealing with vmemmap
> located on hotplugged memory, whereby the start pfn might only be
> pageblock aligned).
>
> Further, for ordinary allocations, we'll want to know the exact range
> we want to allocate -- to check only that range for unmovable pages.
> Right now we lose that information.
>
> So let's move the alignment logic into start_isolate_page_range() /
> undo_isolate_page_range(), such that we have the actual range of
> interest available and the alignment logic contained in there.
>
> Provide start_isolate_pageblocks()/undo_isolate_pageblocks() for memory
> onlining/offlining.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/page-isolation.h | 23 ++++-----
>  mm/memory_hotplug.c            |  8 ++--
>  mm/page_alloc.c                | 15 +-----
>  mm/page_isolation.c            | 85 ++++++++++++++++++++++++++--------
>  4 files changed, 81 insertions(+), 50 deletions(-)
>
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index e14eddf6741a..8e9e9e80ba67 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -37,20 +37,15 @@ void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
>
> -/*
> - * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
> - */
> -int
> -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> -			 unsigned migratetype, int flags);
> -
> -/*
> - * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
> - * target range is [start_pfn, end_pfn)
> - */
> -void
> -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> -			unsigned migratetype);
> +int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype, int flags);
> +int start_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype, int flags);
> +
> +void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype);
> +void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype);
>
>  /*
>   * Test all pages in [start_pfn, end_pfn) are isolated or not.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 416b38ca8def..fb7f63c800d1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1089,7 +1089,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>
>  	/*
>  	 * Fixup the number of isolated pageblocks before marking the sections
> -	 * onlining, such that undo_isolate_page_range() works correctly.
> +	 * onlining, such that undo_isolate_pageblocks() works correctly.
>  	 */
>  	spin_lock_irqsave(&zone->lock, flags);
>  	zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages;
> @@ -1113,7 +1113,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  		build_all_zonelists(NULL);
>
>  	/* Basic onlining is complete, allow allocation of onlined pages. */
> -	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
> +	undo_isolate_pageblocks(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
>
>  	/*
>  	 * Freshly onlined pages aren't shuffled (e.g., all pages are placed to
> @@ -1834,7 +1834,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	lru_cache_disable();
>
>  	/* set above range as isolated */
> -	ret = start_isolate_page_range(start_pfn, end_pfn,
> +	ret = start_isolate_pageblocks(start_pfn, end_pfn,
>  				       MIGRATE_MOVABLE,
>  				       MEMORY_OFFLINE | REPORT_FAILURE);
>  	if (ret) {
> @@ -1937,7 +1937,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>
>  failed_removal_isolated:
>  	/* pushback to free area */
> -	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> +	undo_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE);
>  	memory_notify(MEM_CANCEL_OFFLINE, &arg);
>  failed_removal_pcplists_disabled:
>  	lru_cache_enable();
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index badc05fc1b2f..76f8c19e37a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8950,15 +8950,6 @@ void *__init alloc_large_system_hash(const char *tablename,
>  }
>
>  #ifdef CONFIG_CONTIG_ALLOC
> -static unsigned long pfn_max_align_down(unsigned long pfn)
> -{
> -	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
> -}
> -
> -static unsigned long pfn_max_align_up(unsigned long pfn)
> -{
> -	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> -}
>
>  #if defined(CONFIG_DYNAMIC_DEBUG) || \
>  	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> @@ -9104,8 +9095,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	 * put back to page allocator so that buddy can use them.
>  	 */
>
> -	ret = start_isolate_page_range(pfn_max_align_down(start),
> -				       pfn_max_align_up(end), migratetype, 0);
> +	ret = start_isolate_page_range(start, end, migratetype, 0);
>  	if (ret)
>  		return ret;
>
> @@ -9186,8 +9176,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  		free_contig_range(end, outer_end - end);
>
>  done:
> -	undo_isolate_page_range(pfn_max_align_down(start),
> -				pfn_max_align_up(end), migratetype);
> +	undo_isolate_page_range(start, end, migratetype);
>  	return ret;
>  }
>  EXPORT_SYMBOL(alloc_contig_range);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b34f1310aeaa..7c1f7724c5bb 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -15,6 +15,16 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>
> +static unsigned long pfn_max_align_down(unsigned long pfn)
> +{
> +	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> +static unsigned long pfn_max_align_up(unsigned long pfn)
> +{
> +	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
>  /*
>   * This function checks whether pageblock includes unmovable pages or not.
>   *
> @@ -262,12 +272,40 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>  	return NULL;
>  }
>
> +/*
> + * Make page-allocation-type of pageblocks to be MIGRATE_ISOLATE.
> + *
> + * Most users should actually use start_isolate_page_range(). Only memory
> + * onlining/offlining that knows exactly what it's doing in regard to
> + * isolating only some pageblocks of MAX_ORDER - 1 pages (for the vmemmap)
> + * should use this interface.
> + */
> +int start_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype, int flags)
> +{
> +	unsigned long pfn;
> +	struct page *page;
> +
> +	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
> +	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
> +
> +	for (pfn = start_pfn;
> +	     pfn < end_pfn;
> +	     pfn += pageblock_nr_pages) {
> +		page = __first_valid_page(pfn, pageblock_nr_pages);
> +		if (page && set_migratetype_isolate(page, migratetype, flags)) {
> +			undo_isolate_pageblocks(start_pfn, pfn, migratetype);
> +			return -EBUSY;
> +		}
> +	}
> +	return 0;
> +}
> +
>  /**
>   * start_isolate_page_range() - make page-allocation-type of range of pages to
>   * be MIGRATE_ISOLATE.
>   * @start_pfn:		The lower PFN of the range to be isolated.
>   * @end_pfn:		The upper PFN of the range to be isolated.
> - *			start_pfn/end_pfn must be aligned to pageblock_order.
>   * @migratetype:	Migrate type to set in error recovery.
>   * @flags:		The following flags are allowed (they can be combined in
>   *			a bit mask)
> @@ -306,29 +344,23 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  			     unsigned migratetype, int flags)
>  {
> -	unsigned long pfn;
> -	struct page *page;
> -
> -	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
> -	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
> +	start_pfn = pfn_max_align_down(start_pfn);
> +	end_pfn = pfn_max_align_up(end_pfn);
>
> -	for (pfn = start_pfn;
> -	     pfn < end_pfn;
> -	     pfn += pageblock_nr_pages) {
> -		page = __first_valid_page(pfn, pageblock_nr_pages);
> -		if (page && set_migratetype_isolate(page, migratetype, flags)) {
> -			undo_isolate_page_range(start_pfn, pfn, migratetype);
> -			return -EBUSY;
> -		}
> -	}
> -	return 0;
> +	return start_isolate_pageblocks(start_pfn, end_pfn, migratetype, flags);
>  }
>
>  /*
> - * Make isolated pages available again.
> + * Make isolated pageblocks, isolated via start_isolate_pageblocks, available
> + * again.
> + *
> + * Most users should actually use undo_isolate_page_range(). Only memory
> + * onlining/offlining that knows exactly what it's doing in regard to
> + * isolating only some pageblocks of MAX_ORDER - 1 pages (for the vmemmap)
> + * should use this interface.
>   */
> -void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> -			    unsigned migratetype)
> +void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype)
>  {
>  	unsigned long pfn;
>  	struct page *page;
> @@ -345,6 +377,21 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  		unset_migratetype_isolate(page, migratetype);
>  	}
>  }
> +
> +/*
> + * Make isolated pageblocks, isolated via start_isolate_page_range(), available
> + * again. The pageblock isolation range will be extended just like for
> + * start_isolate_page_range().
> + */
> +void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> +			     unsigned migratetype)
> +{
> +	start_pfn = pfn_max_align_down(start_pfn);
> +	end_pfn = pfn_max_align_up(end_pfn);
> +
> +	return undo_isolate_pageblocks(start_pfn, end_pfn, migratetype);
> +}
> +
>  /*
>   * Test all pages in the range is free(means isolated) or not.
>   * all pages in [start_pfn...end_pfn) must be in the same zone.
> -- 
> 2.35.1
>
>
>
> -- 
> Thanks,
>
> David / dhildenb


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 15:01         ` Zi Yan
@ 2022-04-12 15:06             ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2022-04-12 15:06 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy

On 12.04.22 17:01, Zi Yan wrote:
> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
> 
>> On 12.04.22 16:07, Zi Yan wrote:
>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>
>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>> unmovable pages during isolation. Page isolation is done
>>>>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>>>> granularity are intended to be isolated. For example,
>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>> alignment. This commit makes unmovable page check only look for
>>>>> interesting pages, so that page isolation can succeed for any
>>>>> non-overlapping ranges.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>  /*
>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>> + * consequently belong to a single zone.
>>>>>   *
>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>> @@ -28,12 +30,14 @@
>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>   *
>>>>>   */
>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>> -				 int migratetype, int flags)
>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>> +				int migratetype, int flags)
>>>>>  {
>>>>> -	unsigned long iter = 0;
>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>> +	unsigned long pfn = start_pfn;
>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>
>>>>
>>>> Just do
>>>>
>>>> struct page *page = pfn_to_page(start_pfn);
>>>> struct zone *zone = page_zone(page);
>>>>
>>>> here. No need to lookup the zone again in the loop because, as you
>>>> document "must ... belong to a single zone.".
>>>>
>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>> is sufficient.
>>>>
>>>
>>> Sure.
>>>
>>>>> +
>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>
>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>  		/*
>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		return page;
>>>>>  	}
>>>>>
>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>> -		page = pfn_to_page(pfn + iter);
>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>> +		struct zone *zone;
>>>>> +
>>>>> +		page = pfn_to_page(pfn);
>>>>> +		zone = page_zone(page);
>>>>>
>>>>>  		/*
>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  			}
>>>>>
>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>> -			iter += skip_pages - 1;
>>>>> +			pfn += skip_pages - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		 */
>>>>>  		if (!page_ref_count(page)) {
>>>>>  			if (PageBuddy(page))
>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	return NULL;
>>>>>  }
>>>>>
>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>> +/*
>>>>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>> + * [start_pfn, end_pfn).
>>>>> + */
>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>
>>>> I think we might be able do better, eventually not passing start_pfn at
>>>> all. Hmm.
>>>
>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>> interface easier to understand. Otherwise if we remove start_pfn,
>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>> end_pfn)
>>>
>>>>
>>>> I think we want to pull out the
>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>> into a separate patch.
>>>
>>> You mean a patch just adding
>>>
>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>
>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>
>>> Yes I can do that.
>>
>> I think we have to be careful with memory onlining/offlining. There are
>> corner cases where we get called with only pageblock alignment and
>> must not adjust the range.
> 
> In the patch below, you added a new set of start_isolate_pageblocks()
> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
> calls set_migratetype_isolate() and noted their range should not be
> adjusted. But in my patch, set_migratetype_isolate() adjusts
> the range for has_unmovable_pages() check. Do you mean

Right, that's broken in your patch. Memory onlining/offlining behavior
changed recently when "vmemmap on memory" was added. The start range
might only be aligned to pageblocks but not to MAX_ORDER - 1 -- and we
must not align the range any further.

The core thing is that there are two types of users: memory offlining,
which knows what it's doing when it aligns to less than MAX_ORDER - 1,
and range allocators, which just pass in the range of interest.

> start_isolate_pageblocks() should call a different version of
> set_migratetype_isolate() without range adjustment? That can be done
> with an additional parameter in start_isolate_page_range(), like
> bool strict, right?

Random boolean flags are in general frowned upon ;)

-- 
Thanks,

David / dhildenb
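
To make the two kinds of users concrete, here is a hedged illustration of how
each would call into the split interface sketched earlier in the thread. The
two wrapper names below are hypothetical; start_isolate_page_range(),
start_isolate_pageblocks(), MIGRATE_MOVABLE, MEMORY_OFFLINE and REPORT_FAILURE
are the identifiers used in the thread and in the kernel of that time.

/* Hypothetical wrappers for illustration only -- not from any posted patch. */

/* Range allocators (alloc_contig_range()/CMA): pass the range of interest
 * and let the interface extend it to the isolation granularity; CMA would
 * pass MIGRATE_CMA instead of MIGRATE_MOVABLE. */
static int isolate_for_contig_alloc(unsigned long start_pfn,
				    unsigned long end_pfn)
{
	return start_isolate_page_range(start_pfn, end_pfn,
					MIGRATE_MOVABLE, 0);
}

/* Memory offlining: the range may only be pageblock aligned (e.g. with
 * vmemmap pages at the start of the range) and must not be extended. */
static int isolate_for_offlining(unsigned long start_pfn,
				 unsigned long end_pfn)
{
	return start_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE,
					MEMORY_OFFLINE | REPORT_FAILURE);
}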


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 15:06             ` David Hildenbrand
@ 2022-04-12 17:41             ` Zi Yan
  2022-04-12 19:34               ` Zi Yan
  -1 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2022-04-12 17:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy


On 12 Apr 2022, at 11:06, David Hildenbrand wrote:

> On 12.04.22 17:01, Zi Yan wrote:
>> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
>>
>>> On 12.04.22 16:07, Zi Yan wrote:
>>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>>
>>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>>
>>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>>> unmovable pages during isolation. Page isolation is done
>>>>>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>>>>> granularity are intended to be isolated. For example,
>>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>>> alignment. This commit makes unmovable page check only look for
>>>>>> interesting pages, so that page isolation can succeed for any
>>>>>> non-overlapping ranges.
>>>>>>
>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>> ---
>>>>>
>>>>> [...]
>>>>>
>>>>>>  /*
>>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>>> + * consequently belong to a single zone.
>>>>>>   *
>>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>>> @@ -28,12 +30,14 @@
>>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>>   *
>>>>>>   */
>>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>> -				 int migratetype, int flags)
>>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>>> +				int migratetype, int flags)
>>>>>>  {
>>>>>> -	unsigned long iter = 0;
>>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>>> +	unsigned long pfn = start_pfn;
>>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>>
>>>>>
>>>>> Just do
>>>>>
>>>>> struct page *page = pfn_to_page(start_pfn);
>>>>> struct zone *zone = page_zone(page);
>>>>>
>>>>> here. No need to lookup the zone again in the loop because, as you
>>>>> document "must ... belong to a single zone.".
>>>>>
>>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>>> is sufficient.
>>>>>
>>>>
>>>> Sure.
>>>>
>>>>>> +
>>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>>
>>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>>  		/*
>>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>  		return page;
>>>>>>  	}
>>>>>>
>>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>>> -		page = pfn_to_page(pfn + iter);
>>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>>> +		struct zone *zone;
>>>>>> +
>>>>>> +		page = pfn_to_page(pfn);
>>>>>> +		zone = page_zone(page);
>>>>>>
>>>>>>  		/*
>>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>  			}
>>>>>>
>>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>>> -			iter += skip_pages - 1;
>>>>>> +			pfn += skip_pages - 1;
>>>>>>  			continue;
>>>>>>  		}
>>>>>>
>>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>  		 */
>>>>>>  		if (!page_ref_count(page)) {
>>>>>>  			if (PageBuddy(page))
>>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>>  			continue;
>>>>>>  		}
>>>>>>
>>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>  	return NULL;
>>>>>>  }
>>>>>>
>>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>>> +/*
>>>>>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>>> + * [start_pfn, end_pfn).
>>>>>> + */
>>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>>
>>>>> I think we might be able do better, eventually not passing start_pfn at
>>>>> all. Hmm.
>>>>
>>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>>> interface easier to understand. Otherwise if we remove start_pfn,
>>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>>> end_pfn)
>>>>
>>>>>
>>>>> I think we want to pull out the
>>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>>> into a separate patch.
>>>>
>>>> You mean a patch just adding
>>>>
>>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>>
>>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>>
>>>> Yes I can do that.
>>>
>>> I think we have to be careful with memory onlining/offlining. There are
>>> corner cases where we get called with only pageblock alignment and
>>> must not adjust the range.
>>
>> In the patch below, you added a new set of start_isolate_pageblocks()
>> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
>> calls set_migratetype_isolate() and noted their range should not be
>> adjusted. But in my patch, set_migratetype_isolate() adjusts
>> the range for has_unmovable_pages() check. Do you mean
>
> Right, that's broken in your patch. Memory onlining/offlining behavior
> changed recently when "vmemmap on memory" was added. The start range
> might only be aligned to pageblocks but not to MAX_ORDER - 1 -- and we
> must not align the range any further.
>
> The core thing is that there are two types of users: memory offlining,
> which knows what it's doing when it aligns to less than MAX_ORDER - 1,
> and range allocators, which just pass in the range of interest.

Oh, you mean the pfn_max_align_down() and pfn_max_align_up() are wrong
for memory onlining/offlining callers. Got it. And in patch 3, this is
not a concern any more, since we move to pageblock_nr_pages alignment.

>
>> start_isolate_pageblocks() should call a different version of
>> set_migratetype_isolate() without range adjustment? That can be done
>> with an additional parameter in start_isolate_page_range(), like
>> bool strict, right?
>
> Random boolean flags are in general frowned upon ;)

I misunderstood the alignment issue. No need for this additional
parameter.

Thanks. Will take your patch and adjust this one based on yours.

I will wait for your reviews on patch 3 and onwards before sending
out a new version.

--
Best Regards,
Yan, Zi
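
A minimal sketch of what "move to pageblock_nr_pages alignment" in patch 3
amounts to, reusing the isolate_start/isolate_end names mentioned earlier in
the thread (assumed form, not the exact diff):

/* Patch 3: the isolation range only needs pageblock granularity. */
unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);

Memory onlining/offlining already passes pageblock-aligned ranges, so this
adjustment is a no-op for those callers and only extends the ranges coming
from alloc_contig_range()/CMA.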


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
  2022-04-12 17:41             ` Zi Yan
@ 2022-04-12 19:34               ` Zi Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2022-04-12 19:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, virtualization, Vlastimil Babka,
	Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador,
	Christophe Leroy


On 12 Apr 2022, at 13:41, Zi Yan wrote:

> On 12 Apr 2022, at 11:06, David Hildenbrand wrote:
>
>> On 12.04.22 17:01, Zi Yan wrote:
>>> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
>>>
>>>> On 12.04.22 16:07, Zi Yan wrote:
>>>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>>>
>>>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>>>
>>>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>>>> unmovable pages during isolation. Page isolation is done
>>>>>>> at MAX_ORDER_NR_PAGES granularity, but not all pages within that
>>>>>>> granularity are intended to be isolated. For example,
>>>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>>>> alignment. This commit makes unmovable page check only look for
>>>>>>> interesting pages, so that page isolation can succeed for any
>>>>>>> non-overlapping ranges.
>>>>>>>
>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>> ---
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>  /*
>>>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>>>> + * consequently belong to a single zone.
>>>>>>>   *
>>>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>>>> @@ -28,12 +30,14 @@
>>>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>>>   *
>>>>>>>   */
>>>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>> -				 int migratetype, int flags)
>>>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>>>> +				int migratetype, int flags)
>>>>>>>  {
>>>>>>> -	unsigned long iter = 0;
>>>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>>>> +	unsigned long pfn = start_pfn;
>>>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>>>
>>>>>>
>>>>>> Just do
>>>>>>
>>>>>> struct page *page = pfn_to_page(start_pfn);
>>>>>> struct zone *zone = page_zone(page);
>>>>>>
>>>>>> here. No need to lookup the zone again in the loop because, as you
>>>>>> document "must ... belong to a single zone.".
>>>>>>
>>>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>>>> is sufficient.
>>>>>>
>>>>>
>>>>> Sure.
>>>>>
>>>>>>> +
>>>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>>>
>>>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>>>  		/*
>>>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>>  		return page;
>>>>>>>  	}
>>>>>>>
>>>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>>>> -		page = pfn_to_page(pfn + iter);
>>>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>>>> +		struct zone *zone;
>>>>>>> +
>>>>>>> +		page = pfn_to_page(pfn);
>>>>>>> +		zone = page_zone(page);
>>>>>>>
>>>>>>>  		/*
>>>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>>  			}
>>>>>>>
>>>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>>>> -			iter += skip_pages - 1;
>>>>>>> +			pfn += skip_pages - 1;
>>>>>>>  			continue;
>>>>>>>  		}
>>>>>>>
>>>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>>  		 */
>>>>>>>  		if (!page_ref_count(page)) {
>>>>>>>  			if (PageBuddy(page))
>>>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>>>  			continue;
>>>>>>>  		}
>>>>>>>
>>>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>>>  	return NULL;
>>>>>>>  }
>>>>>>>
>>>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>>>> +/*
>>>>>>> + * This function sets pageblock migratetype to isolate if no unmovable page is
>>>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>>>> + * [start_pfn, end_pfn).
>>>>>>> + */
>>>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>>>
>>>>>> I think we might be able do better, eventually not passing start_pfn at
>>>>>> all. Hmm.
>>>>>
>>>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>>>> interface easier to understand. Otherwise if we remove start_pfn,
>>>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>>>> end_pfn)
>>>>>
>>>>>>
>>>>>> I think we want to pull out the
>>>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>>>> into a separate patch.
>>>>>
>>>>> You mean a patch just adding
>>>>>
>>>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>>>
>>>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>>>
>>>>> Yes I can do that.
>>>>
>>>> I think we have to be careful with memory onlining/offlining. There are
>>>> corner cases where we get called with only pageblock alignment and
>>>> must not adjust the range.
>>>
>>> In the patch below, you added a new set of start_isolate_pageblocks()
>>> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
>>> calls set_migratetype_isolate() and noted their range should not be
>>> adjusted. But in my patch, set_migratetype_isolate() adjusts
>>> the range for has_unmovable_pages() check. Do you mean
>>
>> Right, that's broken in your patch. Memory onlining/offlining behavior
>> changed recently when "vmemmap on memory" was added. The start range
>> might only be aligned to pageblocks but not to MAX_ORDER - 1 -- and we
>> must not align the range any further.
>>
>> The core thing is that there are two types of users: memory offlining,
>> which knows what it's doing when it aligns to less than MAX_ORDER - 1,
>> and range allocators, which just pass in the range of interest.
>
> Oh, you mean the pfn_max_align_down() and pfn_max_align_up() are wrong
> for memory onlining/offlining callers. Got it. And in patch 3, this is
> not a concern any more, since we move to pageblock_nr_pages alignment.
>
>>
>>> start_isolate_pageblocks() should call a different version of
>>> set_migratetype_isolate() without range adjustment? That can be done
>>> with an additional parameter in start_isolate_page_range(), like
>>> bool strict, right?
>>
>> Random boolean flags are in general frowned upon ;)
>
> I misunderstood the alignment issue. No need for this additional
> parameter.
>
> Thanks. Will take your patch and adjust this one based on yours.
>
> I will wait for your reviews on patch 3 and onwards before sending
> out a new version.

As I am editing my patches 2 and 3 based on your patch, it becomes redundant
to have start_isolate_pageblocks() in patch 3, since start_isolate_page_range()
will only require pageblock-size alignment after patch 3;
start_isolate_pageblocks() and start_isolate_page_range() do the same thing.

Since moving the MAX_ORDER-1 alignment into start_isolate_page_range() is wrong
in patch 2, I think it is better to move it in patch 3, when
start_isolate_page_range() is ready for pageblock-size alignment. Patch 2
then will have no functional change, since the range does not change, and
the has_unmovable_pages() change will only take effect in patch 3.

If you think moving the range alignment adjustment should be a separate
patch, I can make it a new patch 4, after patch 3, when all the related
functions are ready for the new pageblock-size alignment.

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-04-12 19:34 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-04-06 15:18 ` [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-04-12 13:10   ` David Hildenbrand
2022-04-12 13:10     ` David Hildenbrand
2022-04-12 14:07     ` Zi Yan
2022-04-12 14:49       ` David Hildenbrand
2022-04-12 14:49         ` David Hildenbrand
2022-04-12 15:01         ` Zi Yan
2022-04-12 15:06           ` David Hildenbrand
2022-04-12 17:41             ` Zi Yan
2022-04-12 19:34               ` Zi Yan
2022-04-06 15:18 ` [PATCH v10 3/5] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-04-06 15:18 ` [PATCH v10 4/5] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-04-06 15:18 ` [PATCH v10 5/5] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-04-12 12:35 ` [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
2022-04-12 12:35   ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.