* [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
@ 2022-02-11 16:41 Zi Yan
  2022-02-11 16:41 ` [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-02-08-15-31.

Changelog
===
V5
---
1. Moved isolation address alignment handling into start_isolate_page_range().
2. Rewrote and simplified how alloc_contig_range() works at pageblock
   granularity (Patch 3). Only two pageblock migratetypes need to be saved and
   restored. start_isolate_page_range() might need to migrate pages in this
   version, but it prevents the caller from worrying about
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
   is isolated.

V4
---
1. Dropped two irrelevant patches on non-lru compound page handling, as
   it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
   __free_one_page() when the order is >= pageblock_order, as the cases
   where pageblocks are not mergeable (isolated, CMA, and HIGHATOMIC)
   become more common.
4. Moving has_unmovable_pages() is now a separate patch.
5. Removed MAX_ORDER-1 alignment requirement in the comment in virtio_mem code.

Description
===

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the buddy
allocator, but isolating only a subset of the pageblocks within a free page
spanning multiple pageblocks causes free page accounting issues. An isolated
page might not be put back on the right free list, since the code takes the
migratetype of the first pageblock as the migratetype of the whole free page.
This is based on the discussion at [2].

To remove the requirement, this patchset:
1. isolates pages at pageblock granularity instead of
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) (see the sketch after this
   list);
2. splits free pages that cross the specified range boundaries, or migrates
   in-use pages that cross them and then splits the freed pages, to avoid
   free page accounting issues (they happen when multiple pageblocks within
   a single page have different migratetypes);
3. only checks unmovable pages within the range instead of the MAX_ORDER - 1
   aligned range during isolation, to avoid alloc_contig_range() failures
   when pageblocks within a MAX_ORDER - 1 aligned range are allocated
   separately;
4. returns pages not in the range as it did before.
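
For illustration only, here is a minimal userspace sketch (not kernel code;
the values below are assumed x86_64 defaults) contrasting the range that gets
isolated for a requested pfn range before and after this patchset:

#include <stdio.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* example value: MAX_ORDER = 11 */
#define pageblock_nr_pages	512UL	/* example value: pageblock_order = 9 */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define MAX(a, b)		((a) > (b) ? (a) : (b))

int main(void)
{
	unsigned long start = 1600, end = 2500;	/* an arbitrary pfn range */
	unsigned long big = MAX(MAX_ORDER_NR_PAGES, pageblock_nr_pages);

	/* before this patchset: isolation is max(MAX_ORDER_NR_PAGES,
	 * pageblock_nr_pages) aligned, i.e. [1024, 3072) here */
	printf("old isolation range: [%lu, %lu)\n",
	       ALIGN_DOWN(start, big), ALIGN(end, big));
	/* after this patchset: only pageblock aligned, i.e. [1536, 2560) here */
	printf("new isolation range: [%lu, %lu)\n",
	       ALIGN_DOWN(start, pageblock_nr_pages),
	       ALIGN(end, pageblock_nr_pages));
	return 0;
}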

One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
   migratetypes when isolation fails in the middle of the range.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/

Zi Yan (6):
  mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  mm: page_isolation: check specified range for unmovable pages
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c                |   7 +-
 include/linux/mmzone.h                     |   5 +-
 include/linux/page-isolation.h             |  16 +-
 kernel/dma/contiguous.c                    |   2 +-
 mm/cma.c                                   |   6 +-
 mm/internal.h                              |   3 +
 mm/memory_hotplug.c                        |   3 +-
 mm/page_alloc.c                            | 371 ++++++++++-----------
 mm/page_isolation.c                        | 172 +++++++++-
 10 files changed, 367 insertions(+), 222 deletions(-)

-- 
2.34.1



* [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  2022-02-14 10:44   ` Mike Rapoport
  2022-02-11 16:41 ` [PATCH v5 2/6] mm: page_isolation: check specified range for unmovable pages Zi Yan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cface1d38093..e2c6a67fc386 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8962,125 +8962,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it.  But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f67c4c70f17f..b34f1310aeaa 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may be not in buddy system, and
+		 * page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * it.  But now, memory offline itself doesn't call
+		 * shrink_node_slabs() and it still to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.34.1



* [PATCH v5 2/6] mm: page_isolation: check specified range for unmovable pages
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2022-02-11 16:41 ` [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  2022-02-11 16:41 ` [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

Enable set_migratetype_isolate() to check the specified sub-range for
unmovable pages during isolation. Page isolation is done at
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes the unmovable page check cover only the
specified range, so that page isolation can succeed for any
non-overlapping ranges.
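
As a rough illustration (a standalone userspace sketch with assumed example
values, not kernel code), the unmovable page scan within a pageblock is now
clamped to the caller's [start_pfn, end_pfn), mirroring the first_pfn/last_pfn
computation in the diff below:

#include <stdio.h>

#define pageblock_nr_pages	512UL	/* example value: pageblock_order = 9 */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long start_pfn = 1800, end_pfn = 2300;	/* caller's range */
	unsigned long block_pfn = 1536;		/* first pfn of one pageblock */
	unsigned long first_pfn = block_pfn > start_pfn ? block_pfn : start_pfn;
	unsigned long last_pfn = ALIGN(first_pfn + 1, pageblock_nr_pages);

	if (last_pfn > end_pfn)
		last_pfn = end_pfn;

	/* scans [1800, 2048) instead of the whole pageblock [1536, 2048) */
	printf("scan [%lu, %lu) within pageblock [%lu, %lu)\n",
	       first_pfn, last_pfn, block_pfn, block_pfn + pageblock_nr_pages);
	return 0;
}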

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h | 12 +++++++++
 mm/page_alloc.c                | 15 +----------
 mm/page_isolation.c            | 46 +++++++++++++++++++++-------------
 3 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..4ef7be6def83 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -15,6 +15,18 @@ static inline bool is_migrate_isolate(int migratetype)
 {
 	return migratetype == MIGRATE_ISOLATE;
 }
+static inline unsigned long pfn_max_align_down(unsigned long pfn)
+{
+	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+				     pageblock_nr_pages));
+}
+
+static inline unsigned long pfn_max_align_up(unsigned long pfn)
+{
+	return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+				pageblock_nr_pages));
+}
+
 #else
 static inline bool has_isolate_pageblock(struct zone *zone)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2c6a67fc386..62ef78f3d771 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8963,18 +8963,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
-	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-			     pageblock_nr_pages) - 1);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
-				pageblock_nr_pages));
-}
-
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
 	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
@@ -9119,8 +9107,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0);
 	if (ret)
 		return ret;
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b34f1310aeaa..64d093ab83ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,8 @@
 #include <trace/events/page_isolation.h>
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether pageblock within [start_pfn, end_pfn) includes
+ * unmovable pages or not.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -29,11 +30,14 @@
  *
  */
 static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
+				 int migratetype, int flags,
+				 unsigned long start_pfn, unsigned long end_pfn)
 {
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
+	unsigned long first_pfn = max(page_to_pfn(page), start_pfn);
+	unsigned long pfn = first_pfn;
+	unsigned long last_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages), end_pfn);
+
+	page = pfn_to_page(pfn);
 
 	if (is_migrate_cma_page(page)) {
 		/*
@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}
 
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
+	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
+		page = pfn_to_page(pfn);
 
 		/*
 		 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +89,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 			}
 
 			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
+			pfn += skip_pages - 1;
 			continue;
 		}
 
@@ -97,7 +101,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 */
 		if (!page_ref_count(page)) {
 			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
+				pfn += (1 << buddy_order(page)) - 1;
 			continue;
 		}
 
@@ -134,7 +138,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	return NULL;
 }
 
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function set pageblock migratetype to isolate if no unmovable page is
+ * present in [start_pfn, end_pfn). The pageblock must be within
+ * [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+			unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct page *unmovable;
@@ -156,7 +166,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags, start_pfn, end_pfn);
 	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
@@ -267,7 +277,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * be MIGRATE_ISOLATE.
  * @start_pfn:		The lower PFN of the range to be isolated.
  * @end_pfn:		The upper PFN of the range to be isolated.
- *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
  *			a bit mask)
@@ -309,15 +318,16 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long pfn;
 	struct page *page;
 
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
+	unsigned long isolate_start = pfn_max_align_down(start_pfn);
+	unsigned long isolate_end = pfn_max_align_up(end_pfn);
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
+	for (pfn = isolate_start;
+	     pfn < isolate_end;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page && set_migratetype_isolate(page, migratetype, flags)) {
-			undo_isolate_page_range(start_pfn, pfn, migratetype);
+		if (page && set_migratetype_isolate(page, migratetype, flags,
+					start_pfn, end_pfn)) {
+			undo_isolate_page_range(isolate_start, pfn, migratetype);
 			return -EBUSY;
 		}
 	}
-- 
2.34.1



* [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2022-02-11 16:41 ` [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
  2022-02-11 16:41 ` [PATCH v5 2/6] mm: page_isolation: check specified range for unmovable pages Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  2022-02-14  7:26   ` Christoph Hellwig
  2022-02-14  7:59   ` Christophe Leroy
  2022-02-11 16:41 ` [PATCH v5 4/6] mm: cma: use pageblock_order as the single alignment Zi Yan
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

Special handling is needed for free pages and in-use pages across the
boundaries of the range specified by alloc_contig_range(), because these
partially isolated pages cause free page accounting issues. The free
pages will be split and freed into separate migratetype lists; the
in-use pages will be migrated and then the freed pages will be handled.
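
As a rough sketch of the splitting step (a standalone userspace example with
assumed values, not kernel code; the chunk selection is simplified compared
with split_free_page() in this patch), a MAX_ORDER-1 free page whose pageblock
boundary falls in its middle is freed as two pageblock-sized chunks, so each
chunk ends up on its own pageblock's free list:

#include <stdio.h>

int main(void)
{
	unsigned long order = 10;	/* example MAX_ORDER-1 page: 1024 pages */
	unsigned long pfn = 0;		/* first pfn of the free page */
	unsigned long split = 512;	/* pageblock boundary offset inside it */
	unsigned long end = pfn + (1UL << order);

	while (pfn < end) {
		/* largest power-of-two chunk ending at or before the split point */
		unsigned long chunk_order = 0;

		while ((1UL << (chunk_order + 1)) <= split)
			chunk_order++;
		printf("free [%lu, %lu) at order %lu\n",
		       pfn, pfn + (1UL << chunk_order), chunk_order);
		pfn += 1UL << chunk_order;
		split -= 1UL << chunk_order;
		if (split == 0)		/* first part done, switch to the rest */
			split = end - pfn;
	}
	return 0;
}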

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |   2 +-
 mm/internal.h                  |   3 +
 mm/memory_hotplug.c            |   3 +-
 mm/page_alloc.c                | 235 +++++++++++++++++++++++++--------
 mm/page_isolation.c            |  33 ++++-
 5 files changed, 211 insertions(+), 65 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 4ef7be6def83..78ff940cc169 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -54,7 +54,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, int flags);
+			 unsigned migratetype, int flags, gfp_t gfp_flags);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
diff --git a/mm/internal.h b/mm/internal.h
index 0d240e876831..509cbdc25992 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -319,6 +319,9 @@ isolate_freepages_range(struct compact_control *cc,
 int
 isolate_migratepages_range(struct compact_control *cc,
 			   unsigned long low_pfn, unsigned long end_pfn);
+
+int
+isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);
 #endif
 int find_suitable_fallback(struct free_area *area, unsigned int order,
 			int migratetype, bool only_stealable, bool *can_steal);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ce68098832aa..82406d2f3e46 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1863,7 +1863,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
-				       MEMORY_OFFLINE | REPORT_FAILURE);
+				       MEMORY_OFFLINE | REPORT_FAILURE,
+				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
 	if (ret) {
 		reason = "failure to isolate range";
 		goto failed_removal_pcplists_disabled;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62ef78f3d771..7a4fa21aea5c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8985,7 +8985,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
 #endif
 
 /* [start, end) must belong to a single zone. */
-static int __alloc_contig_migrate_range(struct compact_control *cc,
+int __alloc_contig_migrate_range(struct compact_control *cc,
 					unsigned long start, unsigned long end)
 {
 	/* This function is based on compact_zone() from compaction.c. */
@@ -9043,6 +9043,167 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	return 0;
 }
 
+/**
+ * split_free_page() -- split a free page at split_pfn_offset
+ * @free_page:		the original free page
+ * @order:		the order of the page
+ * @split_pfn_offset:	split offset within the page
+ *
+ * It is used when the free page crosses two pageblocks with different migratetypes
+ * at split_pfn_offset within the page. The split free page will be put into
+ * separate migratetype lists afterwards. Otherwise, the function achieves
+ * nothing.
+ */
+static inline void split_free_page(struct page *free_page,
+				int order, unsigned long split_pfn_offset)
+{
+	struct zone *zone = page_zone(free_page);
+	unsigned long free_page_pfn = page_to_pfn(free_page);
+	unsigned long pfn;
+	unsigned long flags;
+	int free_page_order;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	del_page_from_free_list(free_page, zone, order);
+	for (pfn = free_page_pfn;
+	     pfn < free_page_pfn + (1UL << order);) {
+		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+		free_page_order = order_base_2(split_pfn_offset);
+		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
+				mt, FPI_NONE);
+		pfn += 1UL << free_page_order;
+		split_pfn_offset -= (1UL << free_page_order);
+		/* we have done the first part, now switch to second part */
+		if (split_pfn_offset == 0)
+			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+/**
+ * isolate_single_pageblock() -- tries to isolate a pageblock that might be
+ * within a free or in-use page.
+ * @boundary_pfn:		pageblock-aligned pfn that a page might cross
+ * @gfp_flags:			GFP flags used for migrating pages
+ * @isolate_before_boundary:	isolate the pageblock before (1) or after (0)
+ *				the boundary_pfn
+ *
+ * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
+ * pageblock. When not all pageblocks within a page are isolated at the same
+ * time, free page accounting can go wrong. For example, in the case of
+ * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pageblocks.
+ * [        MAX_ORDER-1          ]
+ * [  pageblock0  |  pageblock1  ]
+ * When either pageblock is isolated, if it is a free page, the page is not
+ * split into separate migratetype lists, which is supposed to; if it is an
+ * in-use page and freed later, __free_one_page() does not split the free page
+ * either. The function handles this by splitting the free page or migrating
+ * the in-use page then splitting the free page.
+ */
+int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
+			int isolate_before_boundary)
+{
+	unsigned char saved_mt;
+	/*
+	 * scan at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range to
+	 * avoid isolate pageblocks belonging to a bigger free or in-use page
+	 */
+	unsigned long start_pfn = pfn_max_align_down(boundary_pfn);
+	unsigned long isolated_pageblock_pfn;
+	unsigned long pfn;
+
+	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
+
+	if (isolate_before_boundary)
+		isolated_pageblock_pfn = boundary_pfn - pageblock_nr_pages;
+	else
+		isolated_pageblock_pfn = boundary_pfn;
+
+	saved_mt = get_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn));
+	set_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn), MIGRATE_ISOLATE);
+
+	for (pfn = start_pfn; pfn < boundary_pfn;) {
+		struct page *page = pfn_to_page(pfn);
+
+		/*
+		 * start_pfn is max(MAX_ORDER_NR_PAGES, pageblock_nr_pages)
+		 * aligned, if there is any free pages in [start_pfn, boundary_pfn),
+		 * its head page will always be in the range.
+		 */
+		if (PageBuddy(page)) {
+			int order = buddy_order(page);
+
+			if (pfn + (1UL << order) > boundary_pfn)
+				split_free_page(page, order, boundary_pfn - pfn);
+			pfn += (1UL << order);
+			continue;
+		}
+		/*
+		 * migrate compound pages then let the free page handling code
+		 * above do the rest
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			unsigned long nr_pages = compound_nr(page);
+			int order = compound_order(page);
+			struct page *head = compound_head(page);
+			unsigned long head_pfn = page_to_pfn(head);
+
+			if (head_pfn + nr_pages >= boundary_pfn) {
+				int ret;
+				struct compact_control cc = {
+					.nr_migratepages = 0,
+					.order = -1,
+					.zone = page_zone(pfn_to_page(head_pfn)),
+					.mode = MIGRATE_SYNC,
+					.ignore_skip_hint = true,
+					.no_set_skip_hint = true,
+					.gfp_mask = current_gfp_context(gfp_flags),
+					.alloc_contig = true,
+				};
+
+				INIT_LIST_HEAD(&cc.migratepages);
+
+				ret = __alloc_contig_migrate_range(&cc, head_pfn,
+							head_pfn + nr_pages);
+
+				if (ret) {
+					/* restore the original migratetype */
+					set_pageblock_migratetype(
+						pfn_to_page(isolated_pageblock_pfn),
+						saved_mt);
+					return -EBUSY;
+				}
+				/*
+				 * reset pfn, let the free page handling code
+				 * above split the free page to the right
+				 * migratetype list.
+				 *
+				 * head_pfn is not used here as a hugetlb page
+				 * order can be bigger than MAX_ORDER-1, but
+				 * after it is freed, the free page order is not.
+				 * Use pfn within the range to find the head of
+				 * the free page and reset order to 0 if a hugetlb
+				 * page with >MAX_ORDER-1 order is encountered.
+				 */
+				if (order > MAX_ORDER-1)
+					order = 0;
+				while (!PageBuddy(pfn_to_page(pfn))) {
+					order++;
+					pfn &= ~0UL << order;
+				}
+				continue;
+			}
+			pfn += nr_pages;
+			continue;
+		}
+
+		pfn++;
+	}
+	return 0;
+}
+
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -9067,8 +9228,9 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
-	unsigned long outer_start, outer_end;
-	unsigned int order;
+	unsigned long outer_end;
+	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
 	int ret = 0;
 
 	struct compact_control cc = {
@@ -9087,14 +9249,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
 	 * have different sizes, and due to the way page allocator
-	 * work, we align the range to biggest of the two pages so
-	 * that page allocator won't try to merge buddies from
-	 * different pageblocks and change MIGRATE_ISOLATE to some
-	 * other migration type.
+	 * work, start_isolate_page_range() has special handlings for this.
 	 *
 	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
 	 * migrate the pages from an unaligned range (ie. pages that
-	 * we are interested in).  This will put all the pages in
+	 * we are interested in). This will put all the pages in
 	 * range back to page allocator as MIGRATE_ISOLATE.
 	 *
 	 * When this is done, we take the pages in range from page
@@ -9107,9 +9266,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(start, end, migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
 	if (ret)
-		return ret;
+		goto done;
 
 	drain_all_pages(cc.zone);
 
@@ -9128,68 +9287,28 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		goto done;
 	ret = 0;
 
-	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
-	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
-	 * more, all pages in [start, end) are free in page allocator.
-	 * What we are going to do is to allocate all pages from
-	 * [start, end) (that is remove them from page allocator).
-	 *
-	 * The only problem is that pages at the beginning and at the
-	 * end of interesting range may be not aligned with pages that
-	 * page allocator holds, ie. they can be part of higher order
-	 * pages.  Because of this, we reserve the bigger range and
-	 * once this is done free the pages we are not interested in.
-	 *
-	 * We don't have to hold zone->lock here because the pages are
-	 * isolated thus they won't get removed from buddy.
-	 */
-
-	order = 0;
-	outer_start = start;
-	while (!PageBuddy(pfn_to_page(outer_start))) {
-		if (++order >= MAX_ORDER) {
-			outer_start = start;
-			break;
-		}
-		outer_start &= ~0UL << order;
-	}
-
-	if (outer_start != start) {
-		order = buddy_order(pfn_to_page(outer_start));
-
-		/*
-		 * outer_start page could be small order buddy page and
-		 * it doesn't include start page. Adjust outer_start
-		 * in this case to report failed page properly
-		 * on tracepoint in test_pages_isolated()
-		 */
-		if (outer_start + (1UL << order) <= start)
-			outer_start = start;
-	}
-
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, 0)) {
+	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Grab isolated pages from freelists. */
-	outer_end = isolate_freepages_range(&cc, outer_start, end);
+	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
 	if (!outer_end) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (start != alloc_start)
+		free_contig_range(alloc_start, start - alloc_start);
+	if (end != alloc_end)
+		free_contig_range(end, alloc_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	undo_isolate_page_range(alloc_start,
+				alloc_end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 64d093ab83ec..0256d5e1032c 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -285,6 +285,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  *					 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
+ * @gfp_flags:		GFP flags used for migrating pages that sit across the
+ *			range boundaries.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -293,6 +295,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pages in the range finally, the caller have to free all pages in the range.
  * test_page_isolated() can be used for test it.
  *
+ * The function first tries to isolate the pageblocks at the beginning and end
+ * of the range, since there might be pages across the range boundaries.
+ * Afterwards, it isolates the rest of the range.
+ *
  * There is no high level synchronization mechanism that prevents two threads
  * from trying to isolate overlapping ranges. If this happens, one thread
  * will notice pageblocks in the overlapping range already set to isolate.
@@ -313,21 +319,38 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, int flags)
+			     unsigned migratetype, int flags, gfp_t gfp_flags)
 {
 	unsigned long pfn;
 	struct page *page;
+	/* isolation is done at page block granularity */
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
+	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
+	int ret;
 
-	unsigned long isolate_start = pfn_max_align_down(start_pfn);
-	unsigned long isolate_end = pfn_max_align_up(end_pfn);
+	/* isolate [isolate_start, isolate_start + pageblock_nr_pages] pageblock */
+	ret = isolate_single_pageblock(isolate_start, gfp_flags, 0);
+	if (ret)
+		return ret;
+
+	/* isolate [isolate_end - pageblock_nr_pages, isolate_end] pageblock */
+	ret = isolate_single_pageblock(isolate_end, gfp_flags, 1);
+	if (ret) {
+		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
+		return ret;
+	}
 
-	for (pfn = isolate_start;
-	     pfn < isolate_end;
+	/* skip isolated pageblocks at the beginning and end */
+	for (pfn = isolate_start + pageblock_nr_pages;
+	     pfn < isolate_end - pageblock_nr_pages;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page && set_migratetype_isolate(page, migratetype, flags,
 					start_pfn, end_pfn)) {
 			undo_isolate_page_range(isolate_start, pfn, migratetype);
+			unset_migratetype_isolate(
+				pfn_to_page(isolate_end - pageblock_nr_pages),
+				migratetype);
 			return -EBUSY;
 		}
 	}
-- 
2.34.1



* [PATCH v5 4/6] mm: cma: use pageblock_order as the single alignment
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (2 preceding siblings ...)
  2022-02-11 16:41 ` [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  2022-02-11 16:41 ` [PATCH v5 5/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
  2022-02-11 16:41 ` [PATCH v5 6/6] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
  5 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.
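
For reference, a trivial userspace sketch (assumed example values, not kernel
code) of what the new minimum CMA alignment works out to, compared with the
old one:

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;		/* example: 4KB pages */
	unsigned int pageblock_order = 9;	/* example value */
	unsigned int max_order = 11;		/* example value */

	/* new: PAGE_SIZE << pageblock_order -> 2MB */
	printf("new CMA alignment: %lu bytes\n", page_size << pageblock_order);
	/* old: PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order) -> 4MB */
	printf("old CMA alignment: %lu bytes\n",
	       page_size << (max_order - 1 > pageblock_order ?
			     max_order - 1 : pageblock_order));
	return 0;
}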

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h  | 5 +----
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c                | 6 ++----
 mm/page_alloc.c         | 4 ++--
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3fff6deca2c0..da38c8436493 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-	phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	phys_addr_t align = PAGE_SIZE << pageblock_order;
 	phys_addr_t mask = align - 1;
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index 766f1b82b532..b2e927fab7b5 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -187,8 +187,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 		return -EINVAL;
 
 	/* ensure minimal alignment required by mm core */
-	alignment = PAGE_SIZE <<
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+	alignment = PAGE_SIZE << pageblock_order;
 
 	/* alignment should be aligned with order_per_bit */
 	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -275,8 +274,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 	 * migratetype page by page allocator's buddy algorithm. In the case,
 	 * you couldn't get a contiguous memory, which is not what we want.
 	 */
-	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
-			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
 	if (fixed && base & (alignment - 1)) {
 		ret = -EINVAL;
 		pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a4fa21aea5c..ac9432e63ce1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9214,8 +9214,8 @@ int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
-- 
2.34.1



* [PATCH v5 5/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (3 preceding siblings ...)
  2022-02-11 16:41 ` [PATCH v5 4/6] mm: cma: use pageblock_order as the single alignment Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  2022-02-11 16:41 ` [PATCH v5 6/6] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
  5 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() now only needs pageblock_order alignment, so drop
the virtio_mem requirement that the minimum subblock size be the max of
MAX_ORDER_NR_PAGES and pageblock_nr_pages.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 drivers/virtio/virtio_mem.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 38becd8d578c..2307e65d18c2 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,12 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
-	 * pageblock_nr_pages pages. This:
+	 * We want subblocks to span at least pageblock_nr_pages pages.
+	 * This:
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = pageblock_nr_pages * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.34.1



* [PATCH v5 6/6] arch: powerpc: adjust fadump alignment to be pageblock aligned.
  2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (4 preceding siblings ...)
  2022-02-11 16:41 ` [PATCH v5 5/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
@ 2022-02-11 16:41 ` Zi Yan
  5 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-11 16:41 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Oscar Salvador, Robin Murphy, linux-kernel,
	iommu, Mike Rapoport, Eric Ren, virtualization, linuxppc-dev,
	Christoph Hellwig, Vlastimil Babka, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 52189928ec08..fbfca85b4200 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER			1
-- 
2.34.1



* Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity
  2022-02-11 16:41 ` [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
@ 2022-02-14  7:26   ` Christoph Hellwig
  2022-02-14 15:46     ` Zi Yan
  2022-02-14  7:59   ` Christophe Leroy
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-02-14  7:26 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mel Gorman, David Hildenbrand, linuxppc-dev, linux-kernel,
	virtualization, linux-mm, iommu, Mike Rapoport, Eric Ren,
	Oscar Salvador, Robin Murphy, Christoph Hellwig, Vlastimil Babka,
	Marek Szyprowski

> +int
> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);

Please avoid the completely unreadably long line. i.e.

int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
		int isolate_before_boundary);

Same in various other spots.


* Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity
  2022-02-11 16:41 ` [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
  2022-02-14  7:26   ` Christoph Hellwig
@ 2022-02-14  7:59   ` Christophe Leroy
  2022-02-14 16:03     ` Zi Yan
  1 sibling, 1 reply; 12+ messages in thread
From: Christophe Leroy @ 2022-02-14  7:59 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: Mel Gorman, linuxppc-dev, linux-kernel, virtualization, iommu,
	Vlastimil Babka, Eric Ren, Marek Szyprowski, Robin Murphy,
	Christoph Hellwig, Mike Rapoport, Oscar Salvador



On 11/02/2022 at 17:41, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
> 
> Special handling is needed for free pages and in-use pages across the
> boundaries of the range specified by alloc_contig_range(), because these
> partially isolated pages cause free page accounting issues. The free
> pages will be split and freed into separate migratetype lists; the
> in-use pages will be migrated and then the freed pages will be handled.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   include/linux/page-isolation.h |   2 +-
>   mm/internal.h                  |   3 +
>   mm/memory_hotplug.c            |   3 +-
>   mm/page_alloc.c                | 235 +++++++++++++++++++++++++--------
>   mm/page_isolation.c            |  33 ++++-
>   5 files changed, 211 insertions(+), 65 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 4ef7be6def83..78ff940cc169 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -54,7 +54,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
>    */
>   int
>   start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> -			 unsigned migratetype, int flags);
> +			 unsigned migratetype, int flags, gfp_t gfp_flags);
>   
>   /*
>    * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
> diff --git a/mm/internal.h b/mm/internal.h
> index 0d240e876831..509cbdc25992 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -319,6 +319,9 @@ isolate_freepages_range(struct compact_control *cc,
>   int
>   isolate_migratepages_range(struct compact_control *cc,
>   			   unsigned long low_pfn, unsigned long end_pfn);
> +
> +int
> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);
>   #endif
>   int find_suitable_fallback(struct free_area *area, unsigned int order,
>   			int migratetype, bool only_stealable, bool *can_steal);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ce68098832aa..82406d2f3e46 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1863,7 +1863,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>   	/* set above range as isolated */
>   	ret = start_isolate_page_range(start_pfn, end_pfn,
>   				       MIGRATE_MOVABLE,
> -				       MEMORY_OFFLINE | REPORT_FAILURE);
> +				       MEMORY_OFFLINE | REPORT_FAILURE,
> +				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>   	if (ret) {
>   		reason = "failure to isolate range";
>   		goto failed_removal_pcplists_disabled;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62ef78f3d771..7a4fa21aea5c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8985,7 +8985,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
>   #endif
>   
>   /* [start, end) must belong to a single zone. */
> -static int __alloc_contig_migrate_range(struct compact_control *cc,
> +int __alloc_contig_migrate_range(struct compact_control *cc,
>   					unsigned long start, unsigned long end)
>   {
>   	/* This function is based on compact_zone() from compaction.c. */
> @@ -9043,6 +9043,167 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   	return 0;
>   }
>   
> +/**
> + * split_free_page() -- split a free page at split_pfn_offset
> + * @free_page:		the original free page
> + * @order:		the order of the page
> + * @split_pfn_offset:	split offset within the page
> + *
> + * It is used when the free page crosses two pageblocks with different migratetypes
> + * at split_pfn_offset within the page. The split free page will be put into
> + * separate migratetype lists afterwards. Otherwise, the function achieves
> + * nothing.
> + */
> +static inline void split_free_page(struct page *free_page,
> +				int order, unsigned long split_pfn_offset)
> +{
> +	struct zone *zone = page_zone(free_page);
> +	unsigned long free_page_pfn = page_to_pfn(free_page);
> +	unsigned long pfn;
> +	unsigned long flags;
> +	int free_page_order;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +	del_page_from_free_list(free_page, zone, order);
> +	for (pfn = free_page_pfn;
> +	     pfn < free_page_pfn + (1UL << order);) {
> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> +		free_page_order = order_base_2(split_pfn_offset);
> +		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
> +				mt, FPI_NONE);
> +		pfn += 1UL << free_page_order;
> +		split_pfn_offset -= (1UL << free_page_order);
> +		/* we have done the first part, now switch to second part */
> +		if (split_pfn_offset == 0)
> +			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
> +	}
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
> +/**
> + * isolate_single_pageblock() -- tries to isolate a pageblock that might be
> + * within a free or in-use page.
> + * @boundary_pfn:		pageblock-aligned pfn that a page might cross
> + * @gfp_flags:			GFP flags used for migrating pages
> + * @isolate_before_boundary:	isolate the pageblock before (1) or after (0)
> + *				the boundary_pfn
> + *
> + * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
> + * pageblock. When not all pageblocks within a page are isolated at the same
> + * time, free page accounting can go wrong. For example, in the case of
> + * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pageblocks.
> + * [        MAX_ORDER-1          ]
> + * [  pageblock0  |  pageblock1  ]
> + * When either pageblock is isolated, if it is a free page, the page is not
> + * split into separate migratetype lists, which is supposed to; if it is an
> + * in-use page and freed later, __free_one_page() does not split the free page
> + * either. The function handles this by splitting the free page or migrating
> + * the in-use page then splitting the free page.
> + */
> +int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
> +			int isolate_before_boundary)

Do you need such a big param name?

See 
https://www.kernel.org/doc/html/latest/process/coding-style.html?highlight=style#naming

isolate_before_boundary could probably be shorter.

And should be a bool by the way.
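
Something like this, perhaps (just a sketch of the suggested prototype, with a
shorter, hypothetical parameter name):

int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
			     bool isolate_before);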

> +{
> +	unsigned char saved_mt;
> +	/*
> +	 * scan at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range to
> +	 * avoid isolate pageblocks belonging to a bigger free or in-use page
> +	 */
> +	unsigned long start_pfn = pfn_max_align_down(boundary_pfn);
> +	unsigned long isolated_pageblock_pfn;

Variable name too long.

> +	unsigned long pfn;
> +
> +	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
> +
> +	if (isolate_before_boundary)
> +		isolated_pageblock_pfn = boundary_pfn - pageblock_nr_pages;
> +	else
> +		isolated_pageblock_pfn = boundary_pfn;
> +
> +	saved_mt = get_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn));
> +	set_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn), MIGRATE_ISOLATE);
> +
> +	for (pfn = start_pfn; pfn < boundary_pfn;) {

This loop is a bit long and deep. Isn't there a way to put what's in "if 
(PageHuge(page) || PageTransCompound(page))" into a sub-function?

See 
https://www.kernel.org/doc/html/latest/process/coding-style.html?highlight=style#functions
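
For example, the compound-page handling could move into a helper along these
lines (hypothetical name and signature, only to illustrate the split):

static int isolate_migrate_compound(struct page *page, unsigned long boundary_pfn,
				    unsigned long *pfn, gfp_t gfp_flags);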

> +		struct page *page = pfn_to_page(pfn);
> +
> +		/*
> +		 * start_pfn is max(MAX_ORDER_NR_PAGES, pageblock_nr_pages)
> +		 * aligned, if there is any free pages in [start_pfn, boundary_pfn),
> +		 * its head page will always be in the range.
> +		 */
> +		if (PageBuddy(page)) {
> +			int order = buddy_order(page);
> +
> +			if (pfn + (1UL << order) > boundary_pfn)
> +				split_free_page(page, order, boundary_pfn - pfn);
> +			pfn += (1UL << order);
> +			continue;
> +		}
> +		/*
> +		 * migrate compound pages then let the free page handling code
> +		 * above do the rest
> +		 */
> +		if (PageHuge(page) || PageTransCompound(page)) {
> +			unsigned long nr_pages = compound_nr(page);
> +			int order = compound_order(page);
> +			struct page *head = compound_head(page);
> +			unsigned long head_pfn = page_to_pfn(head);
> +
> +			if (head_pfn + nr_pages >= boundary_pfn) {
> +				int ret;
> +				struct compact_control cc = {
> +					.nr_migratepages = 0,
> +					.order = -1,
> +					.zone = page_zone(pfn_to_page(head_pfn)),
> +					.mode = MIGRATE_SYNC,
> +					.ignore_skip_hint = true,
> +					.no_set_skip_hint = true,
> +					.gfp_mask = current_gfp_context(gfp_flags),
> +					.alloc_contig = true,
> +				};
> +
> +				INIT_LIST_HEAD(&cc.migratepages);
> +
> +				ret = __alloc_contig_migrate_range(&cc, head_pfn,
> +							head_pfn + nr_pages);
> +
> +				if (ret) {
> +					/* restore the original migratetype */
> +					set_pageblock_migratetype(
> +						pfn_to_page(isolated_pageblock_pfn),
> +						saved_mt);
> +					return -EBUSY;
> +				}
> +				/*
> +				 * reset pfn, let the free page handling code
> +				 * above split the free page to the right
> +				 * migratetype list.
> +				 *
> +				 * head_pfn is not used here as a hugetlb page
> +				 * order can be bigger than MAX_ORDER-1, but
> +				 * after it is freed, the free page order is not.
> +				 * Use pfn within the range to find the head of
> +				 * the free page and reset order to 0 if a hugetlb
> +				 * page with >MAX_ORDER-1 order is encountered.
> +				 */
> +				if (order > MAX_ORDER-1)
> +					order = 0;
> +				while (!PageBuddy(pfn_to_page(pfn))) {
> +					order++;
> +					pfn &= ~0UL << order;
> +				}
> +				continue;
> +			}
> +			pfn += nr_pages;
> +			continue;
> +		}
> +
> +		pfn++;
> +	}
> +	return 0;
> +}
> +
> +
>   /**
>    * alloc_contig_range() -- tries to allocate given range of pages
>    * @start:	start PFN to allocate
> @@ -9067,8 +9228,9 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   int alloc_contig_range(unsigned long start, unsigned long end,
>   		       unsigned migratetype, gfp_t gfp_mask)
>   {
> -	unsigned long outer_start, outer_end;
> -	unsigned int order;
> +	unsigned long outer_end;
> +	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
> +	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
>   	int ret = 0;
>   
>   	struct compact_control cc = {
> @@ -9087,14 +9249,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	 * What we do here is we mark all pageblocks in range as
>   	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>   	 * have different sizes, and due to the way page allocator
> -	 * work, we align the range to biggest of the two pages so
> -	 * that page allocator won't try to merge buddies from
> -	 * different pageblocks and change MIGRATE_ISOLATE to some
> -	 * other migration type.
> +	 * work, start_isolate_page_range() has special handlings for this.
>   	 *
>   	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
>   	 * migrate the pages from an unaligned range (ie. pages that
> -	 * we are interested in).  This will put all the pages in
> +	 * we are interested in). This will put all the pages in
>   	 * range back to page allocator as MIGRATE_ISOLATE.
>   	 *
>   	 * When this is done, we take the pages in range from page
> @@ -9107,9 +9266,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	 * put back to page allocator so that buddy can use them.
>   	 */
>   
> -	ret = start_isolate_page_range(start, end, migratetype, 0);
> +	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
>   	if (ret)
> -		return ret;
> +		goto done;
>   
>   	drain_all_pages(cc.zone);
>   
> @@ -9128,68 +9287,28 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   		goto done;
>   	ret = 0;
>   
> -	/*
> -	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
> -	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
> -	 * more, all pages in [start, end) are free in page allocator.
> -	 * What we are going to do is to allocate all pages from
> -	 * [start, end) (that is remove them from page allocator).
> -	 *
> -	 * The only problem is that pages at the beginning and at the
> -	 * end of interesting range may be not aligned with pages that
> -	 * page allocator holds, ie. they can be part of higher order
> -	 * pages.  Because of this, we reserve the bigger range and
> -	 * once this is done free the pages we are not interested in.
> -	 *
> -	 * We don't have to hold zone->lock here because the pages are
> -	 * isolated thus they won't get removed from buddy.
> -	 */
> -
> -	order = 0;
> -	outer_start = start;
> -	while (!PageBuddy(pfn_to_page(outer_start))) {
> -		if (++order >= MAX_ORDER) {
> -			outer_start = start;
> -			break;
> -		}
> -		outer_start &= ~0UL << order;
> -	}
> -
> -	if (outer_start != start) {
> -		order = buddy_order(pfn_to_page(outer_start));
> -
> -		/*
> -		 * outer_start page could be small order buddy page and
> -		 * it doesn't include start page. Adjust outer_start
> -		 * in this case to report failed page properly
> -		 * on tracepoint in test_pages_isolated()
> -		 */
> -		if (outer_start + (1UL << order) <= start)
> -			outer_start = start;
> -	}
> -
>   	/* Make sure the range is really isolated. */
> -	if (test_pages_isolated(outer_start, end, 0)) {
> +	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
>   		ret = -EBUSY;
>   		goto done;
>   	}
>   
>   	/* Grab isolated pages from freelists. */
> -	outer_end = isolate_freepages_range(&cc, outer_start, end);
> +	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
>   	if (!outer_end) {
>   		ret = -EBUSY;
>   		goto done;
>   	}
>   
>   	/* Free head and tail (if any) */
> -	if (start != outer_start)
> -		free_contig_range(outer_start, start - outer_start);
> -	if (end != outer_end)
> -		free_contig_range(end, outer_end - end);
> +	if (start != alloc_start)
> +		free_contig_range(alloc_start, start - alloc_start);
> +	if (end != alloc_end)
> +		free_contig_range(end, alloc_end - end);
>   
>   done:
> -	undo_isolate_page_range(pfn_max_align_down(start),
> -				pfn_max_align_up(end), migratetype);
> +	undo_isolate_page_range(alloc_start,
> +				alloc_end, migratetype);
>   	return ret;
>   }
>   EXPORT_SYMBOL(alloc_contig_range);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 64d093ab83ec..0256d5e1032c 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -285,6 +285,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>    *					 and PageOffline() pages.
>    *			REPORT_FAILURE - report details about the failure to
>    *			isolate the range
> + * @gfp_flags:		GFP flags used for migrating pages that sit across the
> + *			range boundaries.
>    *
>    * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
>    * the range will never be allocated. Any free pages and pages freed in the
> @@ -293,6 +295,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>    * pages in the range finally, the caller have to free all pages in the range.
>    * test_page_isolated() can be used for test it.
>    *
> + * The function first tries to isolate the pageblocks at the beginning and end
> + * of the range, since there might be pages across the range boundaries.
> + * Afterwards, it isolates the rest of the range.
> + *
>    * There is no high level synchronization mechanism that prevents two threads
>    * from trying to isolate overlapping ranges. If this happens, one thread
>    * will notice pageblocks in the overlapping range already set to isolate.
> @@ -313,21 +319,38 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>    * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
>    */
>   int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> -			     unsigned migratetype, int flags)
> +			     unsigned migratetype, int flags, gfp_t gfp_flags)
>   {
>   	unsigned long pfn;
>   	struct page *page;
> +	/* isolation is done at page block granularity */
> +	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
> +	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
> +	int ret;
>   
> -	unsigned long isolate_start = pfn_max_align_down(start_pfn);
> -	unsigned long isolate_end = pfn_max_align_up(end_pfn);
> +	/* isolate [isolate_start, isolate_start + pageblock_nr_pages] pageblock */
> +	ret = isolate_single_pageblock(isolate_start, gfp_flags, 0);
> +	if (ret)
> +		return ret;
> +
> +	/* isolate [isolate_end - pageblock_nr_pages, isolate_end] pageblock */
> +	ret = isolate_single_pageblock(isolate_end, gfp_flags, 1);
> +	if (ret) {
> +		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
> +		return ret;
> +	}
>   
> -	for (pfn = isolate_start;
> -	     pfn < isolate_end;
> +	/* skip isolated pageblocks at the beginning and end */
> +	for (pfn = isolate_start + pageblock_nr_pages;
> +	     pfn < isolate_end - pageblock_nr_pages;
>   	     pfn += pageblock_nr_pages) {
>   		page = __first_valid_page(pfn, pageblock_nr_pages);
>   		if (page && set_migratetype_isolate(page, migratetype, flags,
>   					start_pfn, end_pfn)) {
>   			undo_isolate_page_range(isolate_start, pfn, migratetype);
> +			unset_migratetype_isolate(
> +				pfn_to_page(isolate_end - pageblock_nr_pages),
> +				migratetype);
>   			return -EBUSY;
>   		}
>   	}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  2022-02-11 16:41 ` [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
@ 2022-02-14 10:44   ` Mike Rapoport
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Rapoport @ 2022-02-14 10:44 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mel Gorman, David Hildenbrand, linuxppc-dev, linux-kernel,
	virtualization, linux-mm, iommu, Eric Ren, Oscar Salvador,
	Robin Murphy, Christoph Hellwig, Vlastimil Babka,
	Marek Szyprowski

On Fri, Feb 11, 2022 at 11:41:30AM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
> mm/page_alloc.c and make it static.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  include/linux/page-isolation.h |   2 -
>  mm/page_alloc.c                | 119 ---------------------------------
>  mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
>  3 files changed, 119 insertions(+), 121 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 572458016331..e14eddf6741a 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
>  #define MEMORY_OFFLINE	0x1
>  #define REPORT_FAILURE	0x2
>  
> -struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> -				 int migratetype, int flags);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cface1d38093..e2c6a67fc386 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8962,125 +8962,6 @@ void *__init alloc_large_system_hash(const char *tablename,
>  	return table;
>  }
>  
> -/*
> - * This function checks whether pageblock includes unmovable pages or not.
> - *
> - * PageLRU check without isolation or lru_lock could race so that
> - * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> - * check without lock_page also may miss some movable non-lru pages at
> - * race condition. So you can't expect this function should be exact.
> - *
> - * Returns a page without holding a reference. If the caller wants to
> - * dereference that page (e.g., dumping), it has to make sure that it
> - * cannot get removed (e.g., via memory unplug) concurrently.
> - *
> - */
> -struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> -				 int migratetype, int flags)
> -{
> -	unsigned long iter = 0;
> -	unsigned long pfn = page_to_pfn(page);
> -	unsigned long offset = pfn % pageblock_nr_pages;
> -
> -	if (is_migrate_cma_page(page)) {
> -		/*
> -		 * CMA allocations (alloc_contig_range) really need to mark
> -		 * isolate CMA pageblocks even when they are not movable in fact
> -		 * so consider them movable here.
> -		 */
> -		if (is_migrate_cma(migratetype))
> -			return NULL;
> -
> -		return page;
> -	}
> -
> -	for (; iter < pageblock_nr_pages - offset; iter++) {
> -		page = pfn_to_page(pfn + iter);
> -
> -		/*
> -		 * Both, bootmem allocations and memory holes are marked
> -		 * PG_reserved and are unmovable. We can even have unmovable
> -		 * allocations inside ZONE_MOVABLE, for example when
> -		 * specifying "movablecore".
> -		 */
> -		if (PageReserved(page))
> -			return page;
> -
> -		/*
> -		 * If the zone is movable and we have ruled out all reserved
> -		 * pages then it should be reasonably safe to assume the rest
> -		 * is movable.
> -		 */
> -		if (zone_idx(zone) == ZONE_MOVABLE)
> -			continue;
> -
> -		/*
> -		 * Hugepages are not in LRU lists, but they're movable.
> -		 * THPs are on the LRU, but need to be counted as #small pages.
> -		 * We need not scan over tail pages because we don't
> -		 * handle each tail page individually in migration.
> -		 */
> -		if (PageHuge(page) || PageTransCompound(page)) {
> -			struct page *head = compound_head(page);
> -			unsigned int skip_pages;
> -
> -			if (PageHuge(page)) {
> -				if (!hugepage_migration_supported(page_hstate(head)))
> -					return page;
> -			} else if (!PageLRU(head) && !__PageMovable(head)) {
> -				return page;
> -			}
> -
> -			skip_pages = compound_nr(head) - (page - head);
> -			iter += skip_pages - 1;
> -			continue;
> -		}
> -
> -		/*
> -		 * We can't use page_count without pin a page
> -		 * because another CPU can free compound page.
> -		 * This check already skips compound tails of THP
> -		 * because their page->_refcount is zero at all time.
> -		 */
> -		if (!page_ref_count(page)) {
> -			if (PageBuddy(page))
> -				iter += (1 << buddy_order(page)) - 1;
> -			continue;
> -		}
> -
> -		/*
> -		 * The HWPoisoned page may be not in buddy system, and
> -		 * page_count() is not 0.
> -		 */
> -		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
> -			continue;
> -
> -		/*
> -		 * We treat all PageOffline() pages as movable when offlining
> -		 * to give drivers a chance to decrement their reference count
> -		 * in MEM_GOING_OFFLINE in order to indicate that these pages
> -		 * can be offlined as there are no direct references anymore.
> -		 * For actually unmovable PageOffline() where the driver does
> -		 * not support this, we will fail later when trying to actually
> -		 * move these pages that still have a reference count > 0.
> -		 * (false negatives in this function only)
> -		 */
> -		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
> -			continue;
> -
> -		if (__PageMovable(page) || PageLRU(page))
> -			continue;
> -
> -		/*
> -		 * If there are RECLAIMABLE pages, we need to check
> -		 * it.  But now, memory offline itself doesn't call
> -		 * shrink_node_slabs() and it still to be fixed.
> -		 */
> -		return page;
> -	}
> -	return NULL;
> -}
> -
>  #ifdef CONFIG_CONTIG_ALLOC
>  static unsigned long pfn_max_align_down(unsigned long pfn)
>  {
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index f67c4c70f17f..b34f1310aeaa 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -15,6 +15,125 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>  
> +/*
> + * This function checks whether pageblock includes unmovable pages or not.
> + *
> + * PageLRU check without isolation or lru_lock could race so that
> + * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> + * check without lock_page also may miss some movable non-lru pages at
> + * race condition. So you can't expect this function should be exact.
> + *
> + * Returns a page without holding a reference. If the caller wants to
> + * dereference that page (e.g., dumping), it has to make sure that it
> + * cannot get removed (e.g., via memory unplug) concurrently.
> + *
> + */
> +static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> +				 int migratetype, int flags)
> +{
> +	unsigned long iter = 0;
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long offset = pfn % pageblock_nr_pages;
> +
> +	if (is_migrate_cma_page(page)) {
> +		/*
> +		 * CMA allocations (alloc_contig_range) really need to mark
> +		 * isolate CMA pageblocks even when they are not movable in fact
> +		 * so consider them movable here.
> +		 */
> +		if (is_migrate_cma(migratetype))
> +			return NULL;
> +
> +		return page;
> +	}
> +
> +	for (; iter < pageblock_nr_pages - offset; iter++) {
> +		page = pfn_to_page(pfn + iter);
> +
> +		/*
> +		 * Both, bootmem allocations and memory holes are marked
> +		 * PG_reserved and are unmovable. We can even have unmovable
> +		 * allocations inside ZONE_MOVABLE, for example when
> +		 * specifying "movablecore".
> +		 */
> +		if (PageReserved(page))
> +			return page;
> +
> +		/*
> +		 * If the zone is movable and we have ruled out all reserved
> +		 * pages then it should be reasonably safe to assume the rest
> +		 * is movable.
> +		 */
> +		if (zone_idx(zone) == ZONE_MOVABLE)
> +			continue;
> +
> +		/*
> +		 * Hugepages are not in LRU lists, but they're movable.
> +		 * THPs are on the LRU, but need to be counted as #small pages.
> +		 * We need not scan over tail pages because we don't
> +		 * handle each tail page individually in migration.
> +		 */
> +		if (PageHuge(page) || PageTransCompound(page)) {
> +			struct page *head = compound_head(page);
> +			unsigned int skip_pages;
> +
> +			if (PageHuge(page)) {
> +				if (!hugepage_migration_supported(page_hstate(head)))
> +					return page;
> +			} else if (!PageLRU(head) && !__PageMovable(head)) {
> +				return page;
> +			}
> +
> +			skip_pages = compound_nr(head) - (page - head);
> +			iter += skip_pages - 1;
> +			continue;
> +		}
> +
> +		/*
> +		 * We can't use page_count without pin a page
> +		 * because another CPU can free compound page.
> +		 * This check already skips compound tails of THP
> +		 * because their page->_refcount is zero at all time.
> +		 */
> +		if (!page_ref_count(page)) {
> +			if (PageBuddy(page))
> +				iter += (1 << buddy_order(page)) - 1;
> +			continue;
> +		}
> +
> +		/*
> +		 * The HWPoisoned page may be not in buddy system, and
> +		 * page_count() is not 0.
> +		 */
> +		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
> +			continue;
> +
> +		/*
> +		 * We treat all PageOffline() pages as movable when offlining
> +		 * to give drivers a chance to decrement their reference count
> +		 * in MEM_GOING_OFFLINE in order to indicate that these pages
> +		 * can be offlined as there are no direct references anymore.
> +		 * For actually unmovable PageOffline() where the driver does
> +		 * not support this, we will fail later when trying to actually
> +		 * move these pages that still have a reference count > 0.
> +		 * (false negatives in this function only)
> +		 */
> +		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
> +			continue;
> +
> +		if (__PageMovable(page) || PageLRU(page))
> +			continue;
> +
> +		/*
> +		 * If there are RECLAIMABLE pages, we need to check
> +		 * it.  But now, memory offline itself doesn't call
> +		 * shrink_node_slabs() and it still to be fixed.
> +		 */
> +		return page;
> +	}
> +	return NULL;
> +}
> +
>  static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>  {
>  	struct zone *zone = page_zone(page);
> -- 
> 2.34.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity
  2022-02-14  7:26   ` Christoph Hellwig
@ 2022-02-14 15:46     ` Zi Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-14 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mel Gorman, David Hildenbrand, linuxppc-dev, linux-kernel,
	virtualization, linux-mm, iommu, Mike Rapoport, Eric Ren,
	Oscar Salvador, Robin Murphy, Vlastimil Babka, Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On 14 Feb 2022, at 2:26, Christoph Hellwig wrote:

>> +int
>> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);
>
> Please avoid the completely unreadably long line. i.e.
>
> int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
> 		int isolate_before_boundary);
>
> Same in various other spots.

OK. Thanks for pointing it out. checkpatch.pl did not report any
warning about this. It seems that the column limit has been relaxed
to 100. Anyway, I will make it shorter.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity
  2022-02-14  7:59   ` Christophe Leroy
@ 2022-02-14 16:03     ` Zi Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2022-02-14 16:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: David Hildenbrand, Robin Murphy, linuxppc-dev, linux-kernel,
	virtualization, linux-mm, iommu, Vlastimil Babka, Eric Ren,
	Marek Szyprowski, Mel Gorman, Christoph Hellwig, Mike Rapoport,
	Oscar Salvador

[-- Attachment #1: Type: text/plain, Size: 18627 bytes --]

On 14 Feb 2022, at 2:59, Christophe Leroy wrote:

> On 11/02/2022 at 17:41, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> Special handling is needed for free pages and in-use pages across the
>> boundaries of the range specified by alloc_contig_range(), because these
>> partially isolated pages cause free page accounting issues. The free
>> pages will be split and freed into separate migratetype lists; the
>> in-use pages will be migrated, and then the freed pages will be handled.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   include/linux/page-isolation.h |   2 +-
>>   mm/internal.h                  |   3 +
>>   mm/memory_hotplug.c            |   3 +-
>>   mm/page_alloc.c                | 235 +++++++++++++++++++++++++--------
>>   mm/page_isolation.c            |  33 ++++-
>>   5 files changed, 211 insertions(+), 65 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index 4ef7be6def83..78ff940cc169 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -54,7 +54,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
>>    */
>>   int
>>   start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> -			 unsigned migratetype, int flags);
>> +			 unsigned migratetype, int flags, gfp_t gfp_flags);
>>
>>   /*
>>    * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 0d240e876831..509cbdc25992 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -319,6 +319,9 @@ isolate_freepages_range(struct compact_control *cc,
>>   int
>>   isolate_migratepages_range(struct compact_control *cc,
>>   			   unsigned long low_pfn, unsigned long end_pfn);
>> +
>> +int
>> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);
>>   #endif
>>   int find_suitable_fallback(struct free_area *area, unsigned int order,
>>   			int migratetype, bool only_stealable, bool *can_steal);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index ce68098832aa..82406d2f3e46 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1863,7 +1863,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>   	/* set above range as isolated */
>>   	ret = start_isolate_page_range(start_pfn, end_pfn,
>>   				       MIGRATE_MOVABLE,
>> -				       MEMORY_OFFLINE | REPORT_FAILURE);
>> +				       MEMORY_OFFLINE | REPORT_FAILURE,
>> +				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>>   	if (ret) {
>>   		reason = "failure to isolate range";
>>   		goto failed_removal_pcplists_disabled;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 62ef78f3d771..7a4fa21aea5c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8985,7 +8985,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
>>   #endif
>>
>>   /* [start, end) must belong to a single zone. */
>> -static int __alloc_contig_migrate_range(struct compact_control *cc,
>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>>   					unsigned long start, unsigned long end)
>>   {
>>   	/* This function is based on compact_zone() from compaction.c. */
>> @@ -9043,6 +9043,167 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>   	return 0;
>>   }
>>
>> +/**
>> + * split_free_page() -- split a free page at split_pfn_offset
>> + * @free_page:		the original free page
>> + * @order:		the order of the page
>> + * @split_pfn_offset:	split offset within the page
>> + *
>> + * It is used when the free page crosses two pageblocks with different migratetypes
>> + * at split_pfn_offset within the page. The split free page will be put into
>> + * separate migratetype lists afterwards. Otherwise, the function achieves
>> + * nothing.
>> + */
>> +static inline void split_free_page(struct page *free_page,
>> +				int order, unsigned long split_pfn_offset)
>> +{
>> +	struct zone *zone = page_zone(free_page);
>> +	unsigned long free_page_pfn = page_to_pfn(free_page);
>> +	unsigned long pfn;
>> +	unsigned long flags;
>> +	int free_page_order;
>> +
>> +	spin_lock_irqsave(&zone->lock, flags);
>> +	del_page_from_free_list(free_page, zone, order);
>> +	for (pfn = free_page_pfn;
>> +	     pfn < free_page_pfn + (1UL << order);) {
>> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> +		free_page_order = order_base_2(split_pfn_offset);
>> +		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
>> +				mt, FPI_NONE);
>> +		pfn += 1UL << free_page_order;
>> +		split_pfn_offset -= (1UL << free_page_order);
>> +		/* we have done the first part, now switch to the second part */
>> +		if (split_pfn_offset == 0)
>> +			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
>> +	}
>> +	spin_unlock_irqrestore(&zone->lock, flags);
>> +}
>> +
>> +/**
>> + * isolate_single_pageblock() -- tries to isolate a pageblock that might be
>> + * within a free or in-use page.
>> + * @boundary_pfn:		pageblock-aligned pfn that a page might cross
>> + * @gfp_flags:			GFP flags used for migrating pages
>> + * @isolate_before_boundary:	isolate the pageblock before (1) or after (0)
>> + *				the boundary_pfn
>> + *
>> + * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
>> + * pageblock. When not all pageblocks within a page are isolated at the same
>> + * time, free page accounting can go wrong. For example, in the case of
>> + * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pageblocks.
>> + * [        MAX_ORDER-1          ]
>> + * [  pageblock0  |  pageblock1  ]
>> + * When either pageblock is isolated, if it is a free page, the page is not
>> + * split into separate migratetype lists as it is supposed to be; if it is an
>> + * in-use page and freed later, __free_one_page() does not split the free page
>> + * either. The function handles this by splitting the free page or migrating
>> + * the in-use page then splitting the free page.
>> + */
>> +int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>> +			int isolate_before_boundary)
>
> Do you need such a big param name?

I am happy to take any suggestion.

>
> See
> https://www.kernel.org/doc/html/latest/process/coding-style.html?highlight=style#naming
>
> isolate_before_boundary could probably be shorter.

isolate_before instead?

>
> And should be a bool by the way.

Sure.
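
Something like this, then (just a sketch of the revised prototype; the
shorter name here is a placeholder rather than the final choice):

int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
			     bool isolate_before);
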
>
>> +{
>> +	unsigned char saved_mt;
>> +	/*
>> +	 * scan at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range to
>> +	 * avoid isolating pageblocks belonging to a bigger free or in-use page
>> +	 */
>> +	unsigned long start_pfn = pfn_max_align_down(boundary_pfn);
>> +	unsigned long isolated_pageblock_pfn;
>
> Variable name too long.
>
>> +	unsigned long pfn;
>> +
>> +	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
>> +
>> +	if (isolate_before_boundary)
>> +		isolated_pageblock_pfn = boundary_pfn - pageblock_nr_pages;
>> +	else
>> +		isolated_pageblock_pfn = boundary_pfn;
>> +
>> +	saved_mt = get_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn));
>> +	set_pageblock_migratetype(pfn_to_page(isolated_pageblock_pfn), MIGRATE_ISOLATE);
>> +
>> +	for (pfn = start_pfn; pfn < boundary_pfn;) {
>
> This loop is a bit long and deep. Isn't there a way to put what's in "if
> (PageHuge(page) || PageTransCompound(page))" into a sub-function?
>

Let me give it a try.
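
Something along these lines, maybe (only a sketch against the v5 code quoted
above; the helper name and its exact interface are made up, and restoring the
saved migratetype on failure would stay in the caller):

/*
 * Try to migrate the compound page that *pfn points into and advance *pfn:
 * past the page when it lies fully below boundary_pfn, or to the buddy head
 * of the resulting free page when it straddles boundary_pfn.
 * Returns 0 on success, -EBUSY if migration fails.
 */
static int isolate_migrate_compound(unsigned long *pfn,
			unsigned long boundary_pfn, gfp_t gfp_flags)
{
	struct page *page = pfn_to_page(*pfn);
	struct page *head = compound_head(page);
	unsigned long head_pfn = page_to_pfn(head);
	unsigned long nr_pages = compound_nr(page);
	int order = compound_order(page);
	int ret;
	struct compact_control cc = {
		.nr_migratepages = 0,
		.order = -1,
		.zone = page_zone(head),
		.mode = MIGRATE_SYNC,
		.ignore_skip_hint = true,
		.no_set_skip_hint = true,
		.gfp_mask = current_gfp_context(gfp_flags),
		.alloc_contig = true,
	};

	if (head_pfn + nr_pages < boundary_pfn) {
		/* the whole compound page sits inside the scanned range */
		*pfn += nr_pages;
		return 0;
	}

	INIT_LIST_HEAD(&cc.migratepages);
	ret = __alloc_contig_migrate_range(&cc, head_pfn, head_pfn + nr_pages);
	if (ret)
		return -EBUSY;

	/*
	 * Find the buddy head of the now-free page by rounding pfn down;
	 * reset order to 0 first in case this was a hugetlb page with an
	 * order above MAX_ORDER-1.
	 */
	if (order > MAX_ORDER - 1)
		order = 0;
	while (!PageBuddy(pfn_to_page(*pfn))) {
		order++;
		*pfn &= ~0UL << order;
	}
	return 0;
}

The main loop would then shrink to: call the helper, restore saved_mt and
return -EBUSY on error, otherwise continue.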

> See
> https://www.kernel.org/doc/html/latest/process/coding-style.html?highlight=style#functions
>

Thanks for the review.



--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-02-14 16:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-11 16:41 [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-02-11 16:41 ` [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-02-14 10:44   ` Mike Rapoport
2022-02-11 16:41 ` [PATCH v5 2/6] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-02-11 16:41 ` [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-02-14  7:26   ` Christoph Hellwig
2022-02-14 15:46     ` Zi Yan
2022-02-14  7:59   ` Christophe Leroy
2022-02-14 16:03     ` Zi Yan
2022-02-11 16:41 ` [PATCH v5 4/6] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-02-11 16:41 ` [PATCH v5 5/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-02-11 16:41 ` [PATCH v5 6/6] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).