* [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
@ 2021-12-09 23:04 Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
adjustable at boot time[1].

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the buddy
allocator, but isolating only a subset of the pageblocks within a page that
spans multiple pageblocks causes free page accounting issues. An isolated page
might not be put into the right free list, since the code assumes the
migratetype of the first pageblock applies to the whole free page. This is
based on the discussion at [2].

To remove the requirement, this patchset:
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
   alloc_contig_range() and restores them once all pages within the range
   have been freed by __alloc_contig_migrate_range();
3. splits free pages spanning multiple pageblocks at the beginning and the end
   of the range and puts the split pages to the right migratetype free lists
   based on the pageblock migratetypes;
4. returns pages not in the range as it did before this patch.

Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
to make sure all pageblocks belonging to a single page are isolated together,
and pageblocks outside the range would later need their migratetypes restored;
or 2) extra logic would need to be added at page free time to split a free
page whose pageblocks have different migratetypes.
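
A simplified sketch of the resulting alloc_contig_range() flow, following patch
4 ("mm: make alloc_contig_range work at pageblock granularity"); error
handling, locking, and the existing compaction setup are omitted:

	/* MAX_ORDER-1 aligned isolation range vs. pageblock aligned alloc range */
	isolate_start = pfn_max_align_down(start);
	isolate_end   = pfn_max_align_up(end);
	alloc_start   = ALIGN_DOWN(start, pageblock_nr_pages);
	alloc_end     = ALIGN(end, pageblock_nr_pages);

	/* 1. save migratetypes of pageblocks outside [alloc_start, alloc_end)
	 *    but inside the isolation range, then isolate at MAX_ORDER-1
	 *    granularity */
	save_migratetypes(saved_mt, isolate_start, alloc_start);
	save_migratetypes(..., alloc_end, isolate_end);
	start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);

	/* 2. migrate everything in the requested range away */
	__alloc_contig_migrate_range(&cc, start, end);

	/* 3. restore the saved migratetypes and split free pages straddling
	 *    the boundaries into pageblock_order pieces on the right lists */
	restore_migratetypes(...);
	split_free_page_into_pageblocks(...);

	/* 4. grab the isolated free pages and give back the head and tail */
	isolate_freepages_range(&cc, alloc_start, alloc_end);
	free_contig_range(alloc_start, start - alloc_start);
	free_contig_range(end, alloc_end - end);
	undo_isolate_page_range(alloc_start, alloc_end, migratetype);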

Two optimizations might come later:
1. only check unmovable pages within the range instead of the MAX_ORDER - 1
   aligned range during isolation to increase the success rate of
   alloc_contig_range().
2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
   migratetypes before and after isolation respectively.

Feel free to give comments and suggestions. Thanks.


[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/


Zi Yan (7):
  mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  mm: compaction: handle non-lru compound pages properly in
    isolate_migratepages_block().
  mm: migrate: allocate the right size of non hugetlb or THP compound
    pages.
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c                |   6 +-
 include/linux/mmzone.h                     |  11 +-
 kernel/dma/contiguous.c                    |   2 +-
 mm/cma.c                                   |   6 +-
 mm/compaction.c                            |  10 +-
 mm/migrate.c                               |   8 +-
 mm/page_alloc.c                            | 203 +++++++++++++++++----
 8 files changed, 196 insertions(+), 54 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-10  7:43   ` Eric Ren
  2021-12-09 23:04 ` [RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block() Zi Yan
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.

[1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h |  6 ++++++
 mm/page_alloc.c        | 28 ++++++++++++++++++----------
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..b925431b0123 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
 	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
 }
 
+/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
+static inline bool migratetype_has_fallback(int mt)
+{
+	return mt < MIGRATE_PCPTYPES;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..107a5f186d3b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 	return page_is_buddy(higher_page, higher_buddy, order + 1);
 }
 
+static inline bool has_non_fallback_pageblock(struct zone *zone)
+{
+	return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
+		zone->nr_reserved_highatomic != 0;
+}
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
 	}
 	if (order < MAX_ORDER - 1) {
 		/* If we are here, it means order is >= pageblock_order.
-		 * We want to prevent merge between freepages on isolate
-		 * pageblock and normal pageblock. Without this, pageblock
-		 * isolation could cause incorrect freepage or CMA accounting.
+		 * We want to prevent merge between freepages on pageblock
+		 * without fallbacks and normal pageblock. Without this,
+		 * pageblock isolation could cause incorrect freepage or CMA
+		 * accounting or HIGHATOMIC accounting.
 		 *
 		 * We don't want to hit this code for the more frequent
 		 * low-order merging.
 		 */
-		if (unlikely(has_isolate_pageblock(zone))) {
+		if (unlikely(has_non_fallback_pageblock(zone))) {
 			int buddy_mt;
 
 			buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
 			buddy_mt = get_pageblock_migratetype(buddy);
 
 			if (migratetype != buddy_mt
-					&& (is_migrate_isolate(migratetype) ||
-						is_migrate_isolate(buddy_mt)))
+					&& (!migratetype_has_fallback(migratetype) ||
+						!migratetype_has_fallback(buddy_mt)))
 				goto done_merging;
 		}
 		max_order = order + 1;
@@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
-	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
-	    && !is_migrate_cma(mt)) {
+	/* Only reserve normal pageblock */
+	if (migratetype_has_fallback(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
-			    && !is_migrate_highatomic(mt))
+			/* Only change normal pageblock */
+			if (migratetype_has_fallback(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block().
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages Zi Yan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

In isolate_migratepages_block(), a !PageLRU tail page can be encountered
when the page is larger than a pageblock. Use compound head page for the
checks inside and skip the entire compound page when isolation succeeds.
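
As a concrete (hypothetical) example: for a non-LRU movable compound page of
order 10 whose head is at pfn H, if the scan hits the tail page at pfn H + 5
and isolation of the head succeeds, low_pfn is advanced by
(1 << 10) - 1 - 5, so the loop's low_pfn++ moves the scan to pfn H + 1024,
the first pfn after the compound page.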

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/compaction.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b4e94cda3019..ad9053fbbe06 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -979,19 +979,23 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * Skip any other type of page
 		 */
 		if (!PageLRU(page)) {
+			struct page *head = compound_head(page);
 			/*
 			 * __PageMovable can return false positive so we need
 			 * to verify it under page_lock.
 			 */
-			if (unlikely(__PageMovable(page)) &&
-					!PageIsolated(page)) {
+			if (unlikely(__PageMovable(head)) &&
+					!PageIsolated(head)) {
 				if (locked) {
 					unlock_page_lruvec_irqrestore(locked, flags);
 					locked = NULL;
 				}
 
-				if (!isolate_movable_page(page, isolate_mode))
+				if (!isolate_movable_page(head, isolate_mode)) {
+					low_pfn += (1 << compound_order(head)) - 1 - (page - head);
+					page = head;
 					goto isolate_success;
+				}
 			}
 
 			goto isolate_fail;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block() Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-10  7:53   ` Eric Ren
  2021-12-09 23:04 ` [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_migration_target() is used by alloc_contig_range(), and non-LRU
movable compound pages can be migrated. The current code does not allocate
the right page size for such pages. Check for THPs precisely using
is_transparent_hugepage() and add allocation support for non-LRU compound
pages.
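
The distinction matters because PageTransHuge() only checks that the page is a
compound head, so a non-LRU movable compound page that is not a THP would also
take the THP branch and be allocated at HPAGE_PMD_ORDER.
is_transparent_hugepage() additionally checks the compound destructor, roughly
(paraphrasing the existing helper, not new code):

	if (!PageCompound(page))
		return false;
	page = compound_head(page);
	return is_huge_zero_page(page) ||
	       page[1].compound_dtor == TRANSHUGE_PAGE_DTOR;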

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/migrate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d487a399253b..2ce3c771b1de 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
 	}
 
-	if (PageTransHuge(page)) {
+	if (is_transparent_hugepage(page)) {
 		/*
 		 * clear __GFP_RECLAIM to make the migration callback
 		 * consistent with regular THP allocations.
@@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		gfp_mask |= GFP_TRANSHUGE;
 		order = HPAGE_PMD_ORDER;
 	}
+	if (PageCompound(page)) {
+		gfp_mask |= __GFP_COMP;
+		order = compound_order(page);
+	}
 	zidx = zone_idx(page_zone(page));
 	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
 		gfp_mask |= __GFP_HIGHMEM;
 
 	new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
 
-	if (new_page && PageTransHuge(new_page))
+	if (new_page && is_transparent_hugepage(page))
 		prep_transhuge_page(new_page);
 
 	return new_page;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (2 preceding siblings ...)
  2021-12-09 23:04 ` [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-10  8:12   ` Eric Ren
  2021-12-09 23:04 ` [RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

This is done by restoring pageblock migratetypes and splitting >pageblock_order
free pages after isolating at MAX_ORDER-1 granularity and migrating pages
away at pageblock granularity. The reason for this process is that during
isolation, some pages, either free or in-use, might be larger than a
pageblock, and isolating only part of them can cause free page accounting
issues. Restoring the migratetypes of the pageblocks outside the range of
interest afterwards is much easier.
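
As a concrete example, assuming typical x86_64 defaults (MAX_ORDER = 11 and
pageblock_order = 9, i.e. MAX_ORDER_NR_PAGES = 1024 and pageblock_nr_pages =
512): for a request of [start, end) = [3000, 4500), the isolation range
[isolate_start, isolate_end) is the MAX_ORDER-1 aligned [2048, 5120), while
the allocation range [alloc_start, alloc_end) is the pageblock aligned
[2560, 4608). Only the pageblocks in [2048, 2560) and [4608, 5120) need their
migratetypes saved and restored, and the pages in [2560, 3000) and
[4500, 4608) are freed back to the buddy allocator at the end.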

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 149 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 107a5f186d3b..5ffbeb1b7512 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
-	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-			     pageblock_nr_pages) - 1);
+	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+				     pageblock_nr_pages));
 }
 
 static unsigned long pfn_max_align_up(unsigned long pfn)
@@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	return 0;
 }
 
+static inline int save_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline int restore_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+				int order, struct zone *zone)
+{
+	unsigned long pfn;
+
+	spin_lock(&zone->lock);
+	del_page_from_free_list(free_page, zone, order);
+	for (pfn = page_to_pfn(free_page);
+	     pfn < page_to_pfn(free_page) + (1UL << order);
+	     pfn += pageblock_nr_pages) {
+		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+				mt, FPI_NONE);
+	}
+	spin_unlock(&zone->lock);
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
+	unsigned long isolate_start = pfn_max_align_down(start);
+	unsigned long isolate_end = pfn_max_align_up(end);
+	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
+	unsigned long num_pageblock_to_save;
 	unsigned int order;
 	int ret = 0;
+	unsigned char *saved_mt;
+	int num;
 
 	struct compact_control cc = {
 		.nr_migratepages = 0,
@@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	/*
+	 * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
+	 * the existing migratetype. Then, we will not need the save and restore
+	 * process here.
+	 */
+
+	/* Save the migratetypes of the pageblocks before start and after end */
+	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
+				+ (isolate_end - alloc_end) / pageblock_nr_pages;
+	saved_mt =
+		kmalloc_array(num_pageblock_to_save,
+			      sizeof(unsigned char), GFP_KERNEL);
+	if (!saved_mt)
+		return -ENOMEM;
+
+	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
+
+	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
 	 * have different sizes, and due to the way page allocator
-	 * work, we align the range to biggest of the two pages so
+	 * work, we align the isolation range to biggest of the two so
 	 * that page allocator won't try to merge buddies from
 	 * different pageblocks and change MIGRATE_ISOLATE to some
 	 * other migration type.
@@ -9125,6 +9197,20 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * we are interested in).  This will put all the pages in
 	 * range back to page allocator as MIGRATE_ISOLATE.
 	 *
+	 * Afterwards, we restore the migratetypes of the pageblocks not
+	 * in range, split free pages spanning outside the range,
+	 * and put split free pages (at pageblock_order) to the right
+	 * migratetype list.
+	 *
+	 * NOTE: the above approach is used because it can cause free
+	 * page accounting issues during isolation, if a page, either
+	 * free or in-use, contains multiple pageblocks and we only
+	 * isolate a subset of them. For example, if only the second
+	 * pageblock is isolated from a page with 2 pageblocks, after
+	 * the page is free, it will be put in the first pageblock
+	 * migratetype list instead of having 2 pageblocks in two
+	 * separate migratetype lists.
+	 *
 	 * When this is done, we take the pages in range from page
 	 * allocator removing them from the buddy system.  This way
 	 * page allocator will never consider using them.
@@ -9135,10 +9221,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);
 	if (ret)
-		return ret;
+		goto done;
 
 	drain_all_pages(cc.zone);
 
@@ -9174,6 +9259,19 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * isolated thus they won't get removed from buddy.
 	 */
 
+	/*
+	 * Restore migratetypes of pageblocks outside [start, end)
+	 * TODO: remove it when MIGRATE_ISOLATE becomes a standalone bit
+	 */
+
+	num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
+
+	num = restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
+	/*
+	 * Split free page spanning [isolate_start, alloc_start) and put the
+	 * pageblocks in the right migratetype lists.
+	 */
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
@@ -9188,37 +9286,68 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		order = buddy_order(pfn_to_page(outer_start));
 
 		/*
-		 * outer_start page could be small order buddy page and
-		 * it doesn't include start page. Adjust outer_start
-		 * in this case to report failed page properly
-		 * on tracepoint in test_pages_isolated()
+		 * split the free page that contains the start page and put
+		 * the pageblocks in the right migratetype lists
 		 */
-		if (outer_start + (1UL << order) <= start)
-			outer_start = start;
+		if (outer_start + (1UL << order) > start) {
+			struct page *free_page = pfn_to_page(outer_start);
+
+			split_free_page_into_pageblocks(free_page, order, cc.zone);
+		}
+	}
+
+	/*
+	 * Split free page spanning [alloc_end, isolate_end) and put the
+	 * pageblocks in the right migratetype list
+	 */
+	order = 0;
+	outer_end = end;
+	while (!PageBuddy(pfn_to_page(outer_end))) {
+		if (++order >= MAX_ORDER) {
+			outer_end = end;
+			break;
+		}
+		outer_end &= ~0UL << order;
+	}
+
+	if (outer_end != end) {
+		order = buddy_order(pfn_to_page(outer_end));
+
+		/*
+		 * split the free page that contains the end page and put
+		 * the pageblocks in the right migratetype lists
+		 */
+		VM_BUG_ON(outer_end + (1UL << order) <= end);
+		{
+			struct page *free_page = pfn_to_page(outer_end);
+
+			split_free_page_into_pageblocks(free_page, order, cc.zone);
+		}
 	}
 
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, 0)) {
+	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Grab isolated pages from freelists. */
-	outer_end = isolate_freepages_range(&cc, outer_start, end);
+	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
 	if (!outer_end) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (start != alloc_start)
+		free_contig_range(alloc_start, start - alloc_start);
+	if (end != alloc_end)
+		free_contig_range(end, alloc_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	kfree(saved_mt);
+	undo_isolate_page_range(alloc_start,
+				alloc_end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (3 preceding siblings ...)
  2021-12-09 23:04 ` [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.
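
With typical defaults (4KB base pages, MAX_ORDER = 11, pageblock_order = 9),
the minimum CMA area alignment therefore drops from 4MB
(MAX_ORDER_NR_PAGES * PAGE_SIZE) to 2MB (pageblock_nr_pages * PAGE_SIZE).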

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h  | 5 +----
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c                | 6 ++----
 mm/page_alloc.c         | 6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b925431b0123..71830af35745 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-	phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	phys_addr_t align = PAGE_SIZE << pageblock_order;
 	phys_addr_t mask = align - 1;
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 		return -EINVAL;
 
 	/* ensure minimal alignment required by mm core */
-	alignment = PAGE_SIZE <<
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+	alignment = PAGE_SIZE << pageblock_order;
 
 	/* alignment should be aligned with order_per_bit */
 	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 	 * migratetype page by page allocator's buddy algorithm. In the case,
 	 * you couldn't get a contiguous memory, which is not what we want.
 	 */
-	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
-			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
 	if (fixed && base & (alignment - 1)) {
 		ret = -EINVAL;
 		pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ffbeb1b7512..3317f2064105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9127,8 +9127,8 @@ static inline void split_free_page_into_pageblocks(struct page *free_page,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
@@ -9243,7 +9243,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = 0;
 
 	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * Pages from [start, end) are within a pageblock_nr_pages
 	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 	 * more, all pages in [start, end) are free in page allocator.
 	 * What we are going to do is to allocate all pages from
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (4 preceding siblings ...)
  2021-12-09 23:04 ` [RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-09 23:04 ` [RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() now only needs to be aligned to pageblock_order, so drop
the virtio_mem requirement that its minimum subblock size be the maximum of
MAX_ORDER_NR_PAGES and pageblock_nr_pages.
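
With typical defaults (4KB base pages, MAX_ORDER = 11, pageblock_order = 9),
this lowers the minimum subblock size from 4MB to 2MB, unless the device block
size is larger.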

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 drivers/virtio/virtio_mem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e5a8782769..dab4bce417fd 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2436,15 +2436,13 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
-	 * pageblock_nr_pages pages. This:
+	 * We want subblocks to span at least pageblock_nr_pages pages. This:
 	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
 	 *   and fake page onlining code (virtio_mem_fake_online).
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = pageblock_nr_pages * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (5 preceding siblings ...)
  2021-12-09 23:04 ` [RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
@ 2021-12-09 23:04 ` Zi Yan
  2021-12-10  7:30 ` [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Eric Ren
  2021-12-10 18:36 ` David Hildenbrand
  8 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-09 23:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER			1
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (6 preceding siblings ...)
  2021-12-09 23:04 ` [RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
@ 2021-12-10  7:30 ` Eric Ren
  2021-12-10 15:30   ` Zi Yan
  2021-12-10 18:36 ` David Hildenbrand
  8 siblings, 1 reply; 18+ messages in thread
From: Eric Ren @ 2021-12-10  7:30 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman

Hi Zi Yan,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> Hi all,
>
> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
> adjustable at boot time[1].
>
> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
> isolates pageblocks to remove free memory from buddy allocator but isolating
> only a subset of pageblocks within a page spanning across multiple pageblocks
> causes free page accounting issues. Isolated page might not be put into the
> right free list, since the code assumes the migratetype of the first pageblock
> as the whole free page migratetype. This is based on the discussion at [2].
>
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
Then, unplug fails if either pageblock of the MAX_ORDER - 1 page has an
unmovable page, right?

Thanks,
Eric
> 2. but saves the pageblock migratetypes outside the specified range of
>     alloc_contig_range() and restores them after all pages within the range
>     become free after __alloc_contig_migrate_range();
> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>     of the range and puts the split pages to the right migratetype free lists
>     based on the pageblock migratetypes;
> 4. returns pages not in the range as it did before this patch.
>
> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
> to make sure all pageblocks belonging to a single page are isolated together
> and later pageblocks outside the range need to have their migratetypes restored;
> or 2) extra logic will need to be added during page free time to split a free
> page with multi-migratetype pageblocks.
>
> Two optimizations might come later:
> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>     range during isolation to increase successful rate of alloc_contig_range().
> 2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
>     migratetypes before and after isolation respectively.
>
> Feel free to give comments and suggestions. Thanks.
>
>
> [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
> [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/
>
>
> Zi Yan (7):
>    mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
>    mm: compaction: handle non-lru compound pages properly in
>      isolate_migratepages_block().
>    mm: migrate: allocate the right size of non hugetlb or THP compound
>      pages.
>    mm: make alloc_contig_range work at pageblock granularity
>    mm: cma: use pageblock_order as the single alignment
>    drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>      size.
>    arch: powerpc: adjust fadump alignment to be pageblock aligned.
>
>   arch/powerpc/include/asm/fadump-internal.h |   4 +-
>   drivers/virtio/virtio_mem.c                |   6 +-
>   include/linux/mmzone.h                     |  11 +-
>   kernel/dma/contiguous.c                    |   2 +-
>   mm/cma.c                                   |   6 +-
>   mm/compaction.c                            |  10 +-
>   mm/migrate.c                               |   8 +-
>   mm/page_alloc.c                            | 203 +++++++++++++++++----
>   8 files changed, 196 insertions(+), 54 deletions(-)
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2021-12-09 23:04 ` [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
@ 2021-12-10  7:43   ` Eric Ren
  2021-12-10 15:39     ` Zi Yan
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Ren @ 2021-12-10  7:43 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> requirement for CMA and alloc_contig_range().
>
> MIGRARTE_HIGHATOMIC should not merge with other migratetypes like
> MIGRATE_ISOLATE and MIGRARTE_CMA[1], so this commit prevents that too.
> Also add MIGRARTE_HIGHATOMIC to fallbacks array for completeness.
>
> [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   include/linux/mmzone.h |  6 ++++++
>   mm/page_alloc.c        | 28 ++++++++++++++++++----------
>   2 files changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..b925431b0123 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
>   	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
>   }
>   
> +/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
> +static inline bool migratetype_has_fallback(int mt)
> +{
> +	return mt < MIGRATE_PCPTYPES;
> +}
> +

I would suggest splitting the patch into 2 parts. The first part: no
functional change, just introduce migratetype_has_fallback()
and convert the places it applies to.

>   #define for_each_migratetype_order(order, type) \
>   	for (order = 0; order < MAX_ORDER; order++) \
>   		for (type = 0; type < MIGRATE_TYPES; type++)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index edfd6c81af82..107a5f186d3b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
>   	return page_is_buddy(higher_page, higher_buddy, order + 1);
>   }
>   
> +static inline bool has_non_fallback_pageblock(struct zone *zone)
> +{
> +	return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
> +		zone->nr_reserved_highatomic != 0;

Make zone->nr_reserved_highatomic != 0 a helper, like zone_cma_pages()?
> +}
> +
>   /*
>    * Freeing function for a buddy system allocator.
>    *
> @@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
>   	}
>   	if (order < MAX_ORDER - 1) {
>   		/* If we are here, it means order is >= pageblock_order.
> -		 * We want to prevent merge between freepages on isolate
> -		 * pageblock and normal pageblock. Without this, pageblock
> -		 * isolation could cause incorrect freepage or CMA accounting.
> +		 * We want to prevent merge between freepages on pageblock
> +		 * without fallbacks and normal pageblock. Without this,
> +		 * pageblock isolation could cause incorrect freepage or CMA
> +		 * accounting or HIGHATOMIC accounting.
>   		 *
>   		 * We don't want to hit this code for the more frequent
>   		 * low-order merging.
>   		 */
> -		if (unlikely(has_isolate_pageblock(zone))) {
> +		if (unlikely(has_non_fallback_pageblock(zone))) {
I'm not familiar with the code details, just wondering if this change
would have side effects on CMA
pageblock merging, as it makes the condition stronger?

Thanks,
Eric
>   			int buddy_mt;
>   
>   			buddy_pfn = __find_buddy_pfn(pfn, order);
> @@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
>   			buddy_mt = get_pageblock_migratetype(buddy);
>   
>   			if (migratetype != buddy_mt
> -					&& (is_migrate_isolate(migratetype) ||
> -						is_migrate_isolate(buddy_mt)))
> +					&& (!migratetype_has_fallback(migratetype) ||
> +						!migratetype_has_fallback(buddy_mt)))
>   				goto done_merging;
>   		}
>   		max_order = order + 1;
> @@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>   	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
>   	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
>   	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
> +	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>   #ifdef CONFIG_CMA
>   	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
>   #endif
> @@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>   
>   	/* Yoink! */
>   	mt = get_pageblock_migratetype(page);
> -	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
> -	    && !is_migrate_cma(mt)) {
> +	/* Only reserve normal pageblock */
> +	if (migratetype_has_fallback(mt)) {
>   		zone->nr_reserved_highatomic += pageblock_nr_pages;
>   		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>   		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
> @@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>   		struct page *endpage = page + (1 << order) - 1;
>   		for (; page < endpage; page += pageblock_nr_pages) {
>   			int mt = get_pageblock_migratetype(page);
> -			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
> -			    && !is_migrate_highatomic(mt))
> +			/* Only change normal pageblock */
> +			if (migratetype_has_fallback(mt))
>   				set_pageblock_migratetype(page,
>   							  MIGRATE_MOVABLE);
>   		}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.
  2021-12-09 23:04 ` [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages Zi Yan
@ 2021-12-10  7:53   ` Eric Ren
  2021-12-10 15:48     ` Zi Yan
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Ren @ 2021-12-10  7:53 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> alloc_migration_target() is used by alloc_contig_range() and non-LRU
> movable compound pages can be migrated. Current code does not allocate the
> right page size for such pages. Check THP precisely using
> is_transparent_huge() and add allocation support for non-LRU compound
> pages.
Could you elaborate on why the current code doesn't get the right size and
how this patch fixes it?

The description sounds like it's an existing bug; if so, should the patch
subject be changed to
"Fixes ..."?

>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/migrate.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index d487a399253b..2ce3c771b1de 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>   		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
>   	}
>   
> -	if (PageTransHuge(page)) {
> +	if (is_transparent_hugepage(page)) {
>   		/*
>   		 * clear __GFP_RECLAIM to make the migration callback
>   		 * consistent with regular THP allocations.
> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
if (PageTransHuge(page)) {  // just give more code context
...
>   		gfp_mask |= GFP_TRANSHUGE;
>   		order = HPAGE_PMD_ORDER;
order assigned here
>   	}
> +	if (PageCompound(page)) {
> +		gfp_mask |= __GFP_COMP;
> +		order = compound_order(page);
re-assigned again, as a THP is also a compound page?

Thanks,
Eric
> +	}
>   	zidx = zone_idx(page_zone(page));
>   	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
>   		gfp_mask |= __GFP_HIGHMEM;
>   
>   	new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
>   
> -	if (new_page && PageTransHuge(new_page))
> +	if (new_page && is_transparent_hugepage(page))
>   		prep_transhuge_page(new_page);
>   
>   	return new_page;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity
  2021-12-09 23:04 ` [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
@ 2021-12-10  8:12   ` Eric Ren
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Ren @ 2021-12-10  8:12 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
Could you elaborate in detail on how the accounting issue would happen?


> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/page_alloc.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 149 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 107a5f186d3b..5ffbeb1b7512 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>   #ifdef CONFIG_CONTIG_ALLOC
>   static unsigned long pfn_max_align_down(unsigned long pfn)
>   {
> -	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
> -			     pageblock_nr_pages) - 1);
> +	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
> +				     pageblock_nr_pages));
>   }
>   
>   static unsigned long pfn_max_align_up(unsigned long pfn)
> @@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   	return 0;
>   }
>   
> +static inline int save_migratetypes(unsigned char *migratetypes,
> +				unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn = start_pfn;
> +	int num = 0;
> +
> +	while (pfn < end_pfn) {
> +		migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
> +		num++;
> +		pfn += pageblock_nr_pages;
> +	}
> +	return num;
> +}
> +
> +static inline int restore_migratetypes(unsigned char *migratetypes,
> +				unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn = start_pfn;
> +	int num = 0;
> +
> +	while (pfn < end_pfn) {
> +		set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
> +		num++;
> +		pfn += pageblock_nr_pages;
> +	}
> +	return num;
> +}
> +
> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> +				int order, struct zone *zone)
> +{
> +	unsigned long pfn;
> +
> +	spin_lock(&zone->lock);
> +	del_page_from_free_list(free_page, zone, order);
> +	for (pfn = page_to_pfn(free_page);
> +	     pfn < page_to_pfn(free_page) + (1UL << order);
> +	     pfn += pageblock_nr_pages) {
> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> +		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> +				mt, FPI_NONE);
> +	}
> +	spin_unlock(&zone->lock);
> +}
> +
>   /**
>    * alloc_contig_range() -- tries to allocate given range of pages
>    * @start:	start PFN to allocate
> @@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   		       unsigned migratetype, gfp_t gfp_mask)
>   {
>   	unsigned long outer_start, outer_end;
> +	unsigned long isolate_start = pfn_max_align_down(start);
> +	unsigned long isolate_end = pfn_max_align_up(end);
> +	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
> +	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
What is the difference between isolate_* and alloc_*?
> +	unsigned long num_pageblock_to_save;
>   	unsigned int order;
>   	int ret = 0;
> +	unsigned char *saved_mt;
> +	int num;
>   
>   	struct compact_control cc = {
>   		.nr_migratepages = 0,
> @@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	};
>   	INIT_LIST_HEAD(&cc.migratepages);
>   
> +	/*
> +	 * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
> +	 * the exiting migratetype. Then, we will not need the save and restore
> +	 * process here.
> +	 */
> +
> +	/* Save the migratepages of the pageblocks before start and after end */
> +	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> +				+ (isolate_end - alloc_end) / pageblock_nr_pages;
> +	saved_mt =
> +		kmalloc_array(num_pageblock_to_save,
> +			      sizeof(unsigned char), GFP_KERNEL);
> +	if (!saved_mt)
> +		return -ENOMEM;
> +
> +	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> +	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
The two lines above make the logic confusing to follow. Could you
help explain a bit?

Thanks,
Eric
> +
>   	/*
>   	 * What we do here is we mark all pageblocks in range as
>   	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>   	 * have different sizes, and due to the way page allocator
> -	 * work, we align the range to biggest of the two pages so
> +	 * work, we align the isolation range to biggest of the two so
>   	 * that page allocator won't try to merge buddies from
>   	 * different pageblocks and change MIGRATE_ISOLATE to some
>   	 * other migration type.
> @@ -9125,6 +9197,20 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	 * we are interested in).  This will put all the pages in
>   	 * range back to page allocator as MIGRATE_ISOLATE.
>   	 *
> +	 * Afterwards, we restore the migratetypes of the pageblocks not
> +	 * in range, split free pages spanning outside the range,
> +	 * and put split free pages (at pageblock_order) to the right
> +	 * migratetype list.
> +	 *
> +	 * NOTE: the above approach is used because it can cause free
> +	 * page accounting issues during isolation, if a page, either
> +	 * free or in-use, contains multiple pageblocks and we only
> +	 * isolate a subset of them. For example, if only the second
> +	 * pageblock is isolated from a page with 2 pageblocks, after
> +	 * the page is free, it will be put in the first pageblock
> +	 * migratetype list instead of having 2 pageblocks in two
> +	 * separate migratetype lists.
> +	 *
>   	 * When this is done, we take the pages in range from page
>   	 * allocator removing them from the buddy system.  This way
>   	 * page allocator will never consider using them.
> @@ -9135,10 +9221,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	 * put back to page allocator so that buddy can use them.
>   	 */
>   
> -	ret = start_isolate_page_range(pfn_max_align_down(start),
> -				       pfn_max_align_up(end), migratetype, 0);
> +	ret = start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);
>   	if (ret)
> -		return ret;
> +		goto done;
>   
>   	drain_all_pages(cc.zone);
>   
> @@ -9174,6 +9259,19 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   	 * isolated thus they won't get removed from buddy.
>   	 */
>   
> +	/*
> +	 * Restore migratetypes of pageblocks outside [start, end)
> +	 * TODO: remove it when MIGRATE_ISOLATE becomes a standalone bit
> +	 */
> +
> +	num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> +	num = restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);
> +
> +	/*
> +	 * Split free page spanning [isolate_start, alloc_start) and put the
> +	 * pageblocks in the right migratetype lists.
> +	 */
>   	order = 0;
>   	outer_start = start;
>   	while (!PageBuddy(pfn_to_page(outer_start))) {
> @@ -9188,37 +9286,68 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>   		order = buddy_order(pfn_to_page(outer_start));
>   
>   		/*
> -		 * outer_start page could be small order buddy page and
> -		 * it doesn't include start page. Adjust outer_start
> -		 * in this case to report failed page properly
> -		 * on tracepoint in test_pages_isolated()
> +		 * split the free page has start page and put the pageblocks
> +		 * in the right migratetype list
>   		 */
> -		if (outer_start + (1UL << order) <= start)
> -			outer_start = start;
> +		if (outer_start + (1UL << order) > start) {
> +			struct page *free_page = pfn_to_page(outer_start);
> +
> +			split_free_page_into_pageblocks(free_page, order, cc.zone);
> +		}
> +	}
> +
> +	/*
> +	 * Split free page spanning [alloc_end, isolate_end) and put the
> +	 * pageblocks in the right migratetype list
> +	 */
> +	order = 0;
> +	outer_end = end;
> +	while (!PageBuddy(pfn_to_page(outer_end))) {
> +		if (++order >= MAX_ORDER) {
> +			outer_end = end;
> +			break;
> +		}
> +		outer_end &= ~0UL << order;
> +	}
> +
> +	if (outer_end != end) {
> +		order = buddy_order(pfn_to_page(outer_end));
> +
> +		/*
> +		 * split the free page has start page and put the pageblocks
> +		 * in the right migratetype list
> +		 */
> +		VM_BUG_ON(outer_end + (1UL << order) <= end);
> +		{
> +			struct page *free_page = pfn_to_page(outer_end);
> +
> +			split_free_page_into_pageblocks(free_page, order, cc.zone);
> +		}
>   	}
>   
>   	/* Make sure the range is really isolated. */
> -	if (test_pages_isolated(outer_start, end, 0)) {
> +	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
>   		ret = -EBUSY;
>   		goto done;
>   	}
>   
>   	/* Grab isolated pages from freelists. */
> -	outer_end = isolate_freepages_range(&cc, outer_start, end);
> +	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
>   	if (!outer_end) {
>   		ret = -EBUSY;
>   		goto done;
>   	}
>   
>   	/* Free head and tail (if any) */
> -	if (start != outer_start)
> -		free_contig_range(outer_start, start - outer_start);
> -	if (end != outer_end)
> -		free_contig_range(end, outer_end - end);
> +	if (start != alloc_start)
> +		free_contig_range(alloc_start, start - alloc_start);
> +	if (end != alloc_end)
> +		free_contig_range(end, alloc_end - end);
>   
>   done:
> -	undo_isolate_page_range(pfn_max_align_down(start),
> -				pfn_max_align_up(end), migratetype);
> +	kfree(saved_mt);
> +	undo_isolate_page_range(alloc_start,
> +				alloc_end, migratetype);
>   	return ret;
>   }
>   EXPORT_SYMBOL(alloc_contig_range);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-12-10  7:30 ` [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Eric Ren
@ 2021-12-10 15:30   ` Zi Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-10 15:30 UTC (permalink / raw)
  To: Eric Ren
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 4035 bytes --]

On 10 Dec 2021, at 2:30, Eric Ren wrote:

> Hi Zi Yan,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi all,
>>
>> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
>> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
>> adjustable at boot time[1].
>>
>> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
>> isolates pageblocks to remove free memory from buddy allocator but isolating
>> only a subset of pageblocks within a page spanning across multiple pageblocks
>> causes free page accounting issues. Isolated page might not be put into the
>> right free list, since the code assumes the migratetype of the first pageblock
>> as the whole free page migratetype. This is based on the discussion at [2].
>>
>> To remove the requirement, this patchset:
>> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> Then, unplug fails if either pageblock of the MAX_ORDER - 1 page has an unmovable page, right?

Right. One of the optimizations mentioned targets exactly this: pass the actual
range instead of the MAX_ORDER-1 aligned range, so that has_unmovable_pages()
does not give false positives, minimizing the isolation failure rate.
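
To make it concrete, here is roughly what I have in mind (pseudo-code only;
range_has_unmovable_pages() is a made-up name for illustration, not an
existing helper):

	for (pfn = isolate_start; pfn < isolate_end;
	     pfn += pageblock_nr_pages) {
		/* only scan the part of the pageblock the caller asked for */
		unsigned long scan_start = max(pfn, start);
		unsigned long scan_end = min(pfn + pageblock_nr_pages, end);

		/* illustrative helper: checks [scan_start, scan_end) only */
		if (scan_start < scan_end &&
		    range_has_unmovable_pages(zone, scan_start, scan_end))
			return -EBUSY;
	}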

>
> Thanks,
> Eric
>> 2. but saves the pageblock migratetypes outside the specified range of
>>     alloc_contig_range() and restores them after all pages within the range
>>     become free after __alloc_contig_migrate_range();
>> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>>     of the range and puts the split pages to the right migratetype free lists
>>     based on the pageblock migratetypes;
>> 4. returns pages not in the range as it did before this patch.
>>
>> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
>> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
>> to make sure all pageblocks belonging to a single page are isolated together
>> and later pageblocks outside the range need to have their migratetypes restored;
>> or 2) extra logic will need to be added during page free time to split a free
>> page with multi-migratetype pageblocks.
>>
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>>     range during isolation to increase successful rate of alloc_contig_range().
^^^^^^^^^^^^^^

>> 2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
>>     migratetypes before and after isolation respectively.
>>
>> Feel free to give comments and suggestions. Thanks.
>>
>>
>> [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
>> [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/
>>
>>
>> Zi Yan (7):
>>    mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
>>    mm: compaction: handle non-lru compound pages properly in
>>      isolate_migratepages_block().
>>    mm: migrate: allocate the right size of non hugetlb or THP compound
>>      pages.
>>    mm: make alloc_contig_range work at pageblock granularity
>>    mm: cma: use pageblock_order as the single alignment
>>    drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>>      size.
>>    arch: powerpc: adjust fadump alignment to be pageblock aligned.
>>
>>   arch/powerpc/include/asm/fadump-internal.h |   4 +-
>>   drivers/virtio/virtio_mem.c                |   6 +-
>>   include/linux/mmzone.h                     |  11 +-
>>   kernel/dma/contiguous.c                    |   2 +-
>>   mm/cma.c                                   |   6 +-
>>   mm/compaction.c                            |  10 +-
>>   mm/migrate.c                               |   8 +-
>>   mm/page_alloc.c                            | 203 +++++++++++++++++----
>>   8 files changed, 196 insertions(+), 54 deletions(-)
>>


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2021-12-10  7:43   ` Eric Ren
@ 2021-12-10 15:39     ` Zi Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-10 15:39 UTC (permalink / raw)
  To: Eric Ren
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 5820 bytes --]

Hi Eric,

Thanks for looking into my patch.

On 10 Dec 2021, at 2:43, Eric Ren wrote:

> Hi,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
>> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
>> requirement for CMA and alloc_contig_range().
>>
>> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
>> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
>> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
>>
>> [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   include/linux/mmzone.h |  6 ++++++
>>   mm/page_alloc.c        | 28 ++++++++++++++++++----------
>>   2 files changed, 24 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 58e744b78c2c..b925431b0123 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
>>   	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
>>   }
>>  +/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
>> +static inline bool migratetype_has_fallback(int mt)
>> +{
>> +	return mt < MIGRATE_PCPTYPES;
>> +}
>> +
>
> I would suggest splitting the patch into 2 parts.  The first part: no functional change, just introduce migratetype_has_fallback()
> and replace where it applies.

OK. I can do that.

>
>>   #define for_each_migratetype_order(order, type) \
>>   	for (order = 0; order < MAX_ORDER; order++) \
>>   		for (type = 0; type < MIGRATE_TYPES; type++)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index edfd6c81af82..107a5f186d3b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
>>   	return page_is_buddy(higher_page, higher_buddy, order + 1);
>>   }
>>  +static inline bool has_non_fallback_pageblock(struct zone *zone)
>> +{
>> +	return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
>> +		zone->nr_reserved_highatomic != 0;
>
> Make zone->nr_reserved_highatomic != 0 a helper as zone_cma_pages()?

I am not sure. We have zone_cma_pages() because, when CMA is not enabled, it can
simply return 0. But MIGRATE_HIGHATOMIC is always present, so a helper function
is not that useful.
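
(For reference, zone_cma_pages() is just a tiny wrapper so that the
!CONFIG_CMA case folds into a constant 0. Roughly, paraphrased from memory
rather than copied verbatim:

	static inline unsigned long zone_cma_pages(struct zone *zone)
	{
	#ifdef CONFIG_CMA
		return zone->cma_pages;	/* field only exists with CONFIG_CMA */
	#else
		return 0;
	#endif
	}

zone->nr_reserved_highatomic, on the other hand, exists unconditionally, so a
wrapper would not buy us anything.)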

>> +}
>> +
>>   /*
>>    * Freeing function for a buddy system allocator.
>>    *
>> @@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
>>   	}
>>   	if (order < MAX_ORDER - 1) {
>>   		/* If we are here, it means order is >= pageblock_order.
>> -		 * We want to prevent merge between freepages on isolate
>> -		 * pageblock and normal pageblock. Without this, pageblock
>> -		 * isolation could cause incorrect freepage or CMA accounting.
>> +		 * We want to prevent merge between freepages on pageblock
>> +		 * without fallbacks and normal pageblock. Without this,
>> +		 * pageblock isolation could cause incorrect freepage or CMA
>> +		 * accounting or HIGHATOMIC accounting.
>>   		 *
>>   		 * We don't want to hit this code for the more frequent
>>   		 * low-order merging.
>>   		 */
>> -		if (unlikely(has_isolate_pageblock(zone))) {
>> +		if (unlikely(has_non_fallback_pageblock(zone))) {
> I'm not familiar with the code details, just wondering if this change would have side effects on cma
> pageblock merging, as it makes the condition stronger?

No impact on cma pageblock merging, AFAICT.

>
> Thanks,
> Eric
>>   			int buddy_mt;
>>    			buddy_pfn = __find_buddy_pfn(pfn, order);
>> @@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
>>   			buddy_mt = get_pageblock_migratetype(buddy);
>>    			if (migratetype != buddy_mt
>> -					&& (is_migrate_isolate(migratetype) ||
>> -						is_migrate_isolate(buddy_mt)))
>> +					&& (!migratetype_has_fallback(migratetype) ||
>> +						!migratetype_has_fallback(buddy_mt)))
>>   				goto done_merging;
>>   		}
>>   		max_order = order + 1;
>> @@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>>   	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
>>   	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
>>   	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
>> +	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>>   #ifdef CONFIG_CMA
>>   	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
>>   #endif
>> @@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>>    	/* Yoink! */
>>   	mt = get_pageblock_migratetype(page);
>> -	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
>> -	    && !is_migrate_cma(mt)) {
>> +	/* Only reserve normal pageblock */
>> +	if (migratetype_has_fallback(mt)) {
>>   		zone->nr_reserved_highatomic += pageblock_nr_pages;
>>   		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>>   		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
>> @@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>   		struct page *endpage = page + (1 << order) - 1;
>>   		for (; page < endpage; page += pageblock_nr_pages) {
>>   			int mt = get_pageblock_migratetype(page);
>> -			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
>> -			    && !is_migrate_highatomic(mt))
>> +			/* Only change normal pageblock */
>> +			if (migratetype_has_fallback(mt))
>>   				set_pageblock_migratetype(page,
>>   							  MIGRATE_MOVABLE);
>>   		}


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.
  2021-12-10  7:53   ` Eric Ren
@ 2021-12-10 15:48     ` Zi Yan
  2021-12-10 17:59       ` Yang Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2021-12-10 15:48 UTC (permalink / raw)
  To: Eric Ren
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 2669 bytes --]

On 10 Dec 2021, at 2:53, Eric Ren wrote:

> Hi,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> alloc_migration_target() is used by alloc_contig_range() and non-LRU
>> movable compound pages can be migrated. Current code does not allocate the
>> right page size for such pages. Check THP precisely using
>> is_transparent_hugepage() and add allocation support for non-LRU compound
>> pages.
> Could you elaborate on why the current code doesn't get the right size, and how this patch fixes it?

The current code only checks PageHuge() and PageTransHuge(). Non-LRU compound
pages are regarded as PageTransHuge(), so an order-9 page is always allocated
regardless of the actual page order. This patch makes the THP check exact by
using is_transparent_hugepage() instead of PageTransHuge(), and checks PageCompound()
after PageHuge() and is_transparent_hugepage() to cover non-LRU compound pages.

>
> The description sounds like it's an existing bug, if so, the patch subject should be changed to
> "Fixes ..."?

I have not seen any related bug report.

>
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   mm/migrate.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index d487a399253b..2ce3c771b1de 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>>   		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
>>   	}
>>  -	if (PageTransHuge(page)) {
>> +	if (is_transparent_hugepage(page)) {
>>   		/*
>>   		 * clear __GFP_RECLAIM to make the migration callback
>>   		 * consistent with regular THP allocations.
>> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> if (PageTransHuge(page)) {  // just give more code context
> ...
>>   		gfp_mask |= GFP_TRANSHUGE;
>>   		order = HPAGE_PMD_ORDER;
> order assigned here
>>   	}
>> +	if (PageCompound(page)) {
>> +		gfp_mask |= __GFP_COMP;
>> +		order = compound_order(page);
> re-assigned again, as THP is a compound page?

Ah, you are right. Will use else if instead. Thanks.
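
I.e. something along these lines (just a sketch of the intent, not the final
code):

	if (is_transparent_hugepage(page)) {
		/* keep the existing THP handling */
		gfp_mask &= ~__GFP_RECLAIM;
		gfp_mask |= GFP_TRANSHUGE;
		order = HPAGE_PMD_ORDER;
	} else if (PageCompound(page)) {
		/* other compound pages, e.g. non-LRU movable ones */
		gfp_mask |= __GFP_COMP;
		order = compound_order(page);
	}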

> Thanks,
> Eric
>> +	}
>>   	zidx = zone_idx(page_zone(page));
>>   	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
>>   		gfp_mask |= __GFP_HIGHMEM;
>>    	new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
>>  -	if (new_page && PageTransHuge(new_page))
>> +	if (new_page && is_transparent_hugepage(page))
>>   		prep_transhuge_page(new_page);
>>    	return new_page;


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.
  2021-12-10 15:48     ` Zi Yan
@ 2021-12-10 17:59       ` Yang Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Yang Shi @ 2021-12-10 17:59 UTC (permalink / raw)
  To: Zi Yan
  Cc: Eric Ren, David Hildenbrand, Linux MM, Linux Kernel Mailing List,
	Michael Ellerman, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, linuxppc-dev, virtualization, iommu,
	Vlastimil Babka, Mel Gorman

On Fri, Dec 10, 2021 at 8:08 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 10 Dec 2021, at 2:53, Eric Ren wrote:
>
> > Hi,
> >
> > On 2021/12/10 07:04, Zi Yan wrote:
> >> From: Zi Yan <ziy@nvidia.com>
> >>
> >> alloc_migration_target() is used by alloc_contig_range() and non-LRU
> >> movable compound pages can be migrated. Current code does not allocate the
> >> right page size for such pages. Check THP precisely using
> >> is_transparent_hugepage() and add allocation support for non-LRU compound
> >> pages.
> > Could you elaborate on why the current code doesn't get the right size, and how this patch fixes it?
>
> The current code only checks PageHuge() and PageTransHuge(). Non-LRU compound
> pages are regarded as PageTransHuge(), so an order-9 page is always allocated
> regardless of the actual page order. This patch makes the THP check exact by
> using is_transparent_hugepage() instead of PageTransHuge(), and checks PageCompound()
> after PageHuge() and is_transparent_hugepage() to cover non-LRU compound pages.
>
> >
> > The description sounds like it's an existing bug, if so, the patch subject should be changed to
> > "Fixes ..."?
>
> I have not seen any related bug report.
>
> >
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> ---
> >>   mm/migrate.c | 8 ++++++--
> >>   1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index d487a399253b..2ce3c771b1de 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> >>              return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> >>      }
> >>  -   if (PageTransHuge(page)) {
> >> +    if (is_transparent_hugepage(page)) {
> >>              /*
> >>               * clear __GFP_RECLAIM to make the migration callback
> >>               * consistent with regular THP allocations.
> >> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> > if (PageTransHuge(page)) {  // just give more code context
> > ...
> >>              gfp_mask |= GFP_TRANSHUGE;
> >>              order = HPAGE_PMD_ORDER;
> > order assigned here
> >>      }
> >> +    if (PageCompound(page)) {
> >> +            gfp_mask |= __GFP_COMP;
> >> +            order = compound_order(page);
> > re-assigned again, as THP is a compound page?
>
> Ah, you are right. Will use else if instead. Thanks.

You don't have to use else if, you could just do:

if (PageCompound(page)) {
    gfp_mask |= __GFP_COMP;
    order = compound_order(page);
}

if (is_transparent_hugepage(page)) {
    /* Manipulate THP specific gfp mask */
    ....
}


>
> > Thanks,
> > Eric
> >> +    }
> >>      zidx = zone_idx(page_zone(page));
> >>      if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
> >>              gfp_mask |= __GFP_HIGHMEM;
> >>      new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
> >>  -   if (new_page && PageTransHuge(new_page))
> >> +    if (new_page && is_transparent_hugepage(page))
> >>              prep_transhuge_page(new_page);
> >>      return new_page;
>
>
> --
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (7 preceding siblings ...)
  2021-12-10  7:30 ` [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Eric Ren
@ 2021-12-10 18:36 ` David Hildenbrand
  2021-12-10 20:17   ` Zi Yan
  8 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2021-12-10 18:36 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On 10.12.21 00:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi all,

Hi,

thanks for working on that!

> 
> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
> adjustable at boot time[1].
> 
> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
> isolates pageblocks to remove free memory from buddy allocator but isolating
> only a subset of pageblocks within a page spanning across multiple pageblocks
> causes free page accounting issues. Isolated page might not be put into the
> right free list, since the code assumes the migratetype of the first pageblock
> as the whole free page migratetype. This is based on the discussion at [2].
> 
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> 2. but saves the pageblock migratetypes outside the specified range of
>    alloc_contig_range() and restores them after all pages within the range
>    become free after __alloc_contig_migrate_range();
> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>    of the range and puts the split pages to the right migratetype free lists
>    based on the pageblock migratetypes;
> 4. returns pages not in the range as it did before this patch.
> 
> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
> to make sure all pageblocks belonging to a single page are isolated together 
> and later pageblocks outside the range need to have their migratetypes restored;
> or 2) extra logic will need to be added during page free time to split a free
> page with multi-migratetype pageblocks.
> 
> Two optimizations might come later:
> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>    range during isolation to increase successful rate of alloc_contig_range().

The issue with virtio-mem is that we'll need that as soon as we change
the granularity to pageblocks, because otherwise, you can heavily
degrade unplug reliability in sane setups:

Previous:
* Try unplug free 4M range (2 pageblocks): succeeds

Now:
* Try unplug 2M range (first pageblock): succeeds.
* Try unplug next 2M range (second pageblock): fails because first
contains unmovable allocations.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-12-10 18:36 ` David Hildenbrand
@ 2021-12-10 20:17   ` Zi Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2021-12-10 20:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren

[-- Attachment #1: Type: text/plain, Size: 2603 bytes --]

On 10 Dec 2021, at 13:36, David Hildenbrand wrote:

> On 10.12.21 00:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi all,
>
> Hi,
>
> thanks for working on that!
>
>>
>> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
>> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
>> adjustable at boot time[1].
>>
>> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
>> isolates pageblocks to remove free memory from buddy allocator but isolating
>> only a subset of pageblocks within a page spanning across multiple pageblocks
>> causes free page accounting issues. Isolated page might not be put into the
>> right free list, since the code assumes the migratetype of the first pageblock
>> as the whole free page migratetype. This is based on the discussion at [2].
>>
>> To remove the requirement, this patchset:
>> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
>> 2. but saves the pageblock migratetypes outside the specified range of
>>    alloc_contig_range() and restores them after all pages within the range
>>    become free after __alloc_contig_migrate_range();
>> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>>    of the range and puts the split pages to the right migratetype free lists
>>    based on the pageblock migratetypes;
>> 4. returns pages not in the range as it did before this patch.
>>
>> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
>> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
>> to make sure all pageblocks belonging to a single page are isolated together
>> and later pageblocks outside the range need to have their migratetypes restored;
>> or 2) extra logic will need to be added during page free time to split a free
>> page with multi-migratetype pageblocks.
>>
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>>    range during isolation to increase successful rate of alloc_contig_range().
>
> The issue with virtio-mem is that we'll need that as soon as we change
> the granularity to pageblocks, because otherwise, you can heavily
> degrade unplug reliability in sane setups:
>
> Previous:
> * Try unplug free 4M range (2 pageblocks): succeeds
>
> Now:
> * Try unplug 2M range (first pageblock): succeeds.
> * Try unplug next 2M range (second pageblock): fails because first
> contains unmovable allocations.
>

OK. Makes sense. I will add it in the next version.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-12-10 20:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-09 23:04 [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2021-12-09 23:04 ` [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
2021-12-10  7:43   ` Eric Ren
2021-12-10 15:39     ` Zi Yan
2021-12-09 23:04 ` [RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block() Zi Yan
2021-12-09 23:04 ` [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages Zi Yan
2021-12-10  7:53   ` Eric Ren
2021-12-10 15:48     ` Zi Yan
2021-12-10 17:59       ` Yang Shi
2021-12-09 23:04 ` [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2021-12-10  8:12   ` Eric Ren
2021-12-09 23:04 ` [RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
2021-12-09 23:04 ` [RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2021-12-09 23:04 ` [RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
2021-12-10  7:30 ` [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment Eric Ren
2021-12-10 15:30   ` Zi Yan
2021-12-10 18:36 ` David Hildenbrand
2021-12-10 20:17   ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).