* [PATCH v4 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
From: Zi Yan @ 2022-01-19 19:06 UTC
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2021-12-29-20-07.

Changelog from RFC
===
1. Dropped two irrelevant patches on non-lru compound page handling, as
   it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
   __free_one_page() when the order is >= pageblock_order, as the cases
   where pageblocks cannot merge (isolated, CMA, and HIGHATOMIC) become
   more common.
4. Moving has_unmovable_pages() is now a separate patch.
5. Removed the MAX_ORDER-1 alignment requirement in the comment in the
   virtio_mem code.

Description
===

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the
buddy allocator, but isolating only a subset of the pageblocks within a
free page that spans multiple pageblocks causes free page accounting
issues: the isolated page might not be put on the right free list, since
the code assumes the migratetype of the first pageblock is the
migratetype of the whole free page. This is based on the discussion
at [2].
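
To make the accounting issue concrete, here is a toy userspace model (an
illustration only; the names and simplified logic below are made up, not
kernel code). It mimics a free path that reads only the first
pageblock's migratetype for a free page spanning two pageblocks:

	#include <stdio.h>

	enum { MT_MOVABLE, MT_ISOLATE, MT_TYPES };

	/* a free page straddling a movable and an isolated pageblock */
	static int pageblock_mt[2] = { MT_MOVABLE, MT_ISOLATE };
	static long nr_free[MT_TYPES];

	/* models the free path: the whole page is accounted to the
	 * migratetype of its FIRST pageblock */
	static void free_spanning_page(int pages_per_block)
	{
		int mt = pageblock_mt[0];
		nr_free[mt] += 2L * pages_per_block;
	}

	int main(void)
	{
		free_spanning_page(512);
		printf("accounted as MOVABLE: %ld (only 512 are)\n",
		       nr_free[MT_MOVABLE]);
		printf("accounted as ISOLATE: %ld (512 are)\n",
		       nr_free[MT_ISOLATE]);
		return 0;
	}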

To remove the requirement, this patchset (a sketch of the resulting flow
follows the list):
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
   alloc_contig_range() and restores them once all pages within the range
   become free after __alloc_contig_migrate_range();
3. only checks for unmovable pages within the range instead of the
   MAX_ORDER - 1 aligned range during isolation, to avoid
   alloc_contig_range() failures when pageblocks within a MAX_ORDER - 1
   aligned range are allocated separately;
4. splits free pages spanning multiple pageblocks at the beginning and
   the end of the range and puts the split pages onto the right
   migratetype free lists based on the pageblock migratetypes;
5. returns pages not in the range as it did before.
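
A rough outline of that flow (an illustration only: the helpers marked
"hypothetical" are placeholders, not functions added by this series; the
start_isolate_page_range() arguments follow the new signature introduced
in patch 3):

	int alloc_contig_range_outline(unsigned long start, unsigned long end,
				       unsigned migratetype)
	{
		/* 1. isolate at MAX_ORDER - 1 granularity, as before */
		unsigned long isolate_start = pfn_max_align_down(start);
		unsigned long isolate_end = pfn_max_align_up(end);
		int ret;

		/* 2. save pageblock migratetypes outside [start, end) */
		save_migratetypes(isolate_start, start);	/* hypothetical */
		save_migratetypes(end, isolate_end);		/* hypothetical */

		/* 3. only [start, end) is checked for unmovable pages */
		ret = start_isolate_page_range(start, end, isolate_start,
					       isolate_end, migratetype, 0);
		if (ret)
			return ret;

		/* migrate everything out of [start, end) */
		ret = migrate_range_until_free(start, end);	/* hypothetical */

		/* 2. + 4. restore outside migratetypes; split free pages
		 * straddling the range boundaries */
		restore_migratetypes(isolate_start, start);	/* hypothetical */
		restore_migratetypes(end, isolate_end);		/* hypothetical */
		split_boundary_free_pages(start, end);		/* hypothetical */

		/* 5. pages outside [start, end) go back to the buddy */
		return ret;
	}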

Isolation needs to be done at MAX_ORDER - 1 granularity, because
otherwise either 1) the size of each to-be-isolated page (free, PageHuge,
THP, or other PageCompound) would need to be detected, to make sure all
pageblocks belonging to a single page are isolated together and the
pageblock migratetypes outside the range can be restored later, or 2) if
isolation happened at pageblock granularity, a free page with pageblocks
of multiple migratetypes could be seen in the free page path and would
need to be split and freed at pageblock granularity.
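
For concreteness, assuming x86-64 defaults (an assumption; the cover
letter does not state a configuration): with 4KB base pages,
pageblock_order = 9 and MAX_ORDER = 11,

	pageblock     = 2^9  pages = 2MB
	MAX_ORDER - 1 = 2^10 pages = 4MB

so CMA areas and alloc_contig_range() callers currently need 4MB
alignment, while after this series 2MB alignment suffices (isolation is
still performed over the surrounding 4MB internally).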

One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
   migratetypes before and after isolation respectively.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/


Zi Yan (7):
  mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  mm: page_isolation: check specified range for unmovable pages
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c                |   7 +-
 include/linux/mmzone.h                     |  16 +-
 include/linux/page-isolation.h             |   3 +-
 kernel/dma/contiguous.c                    |   2 +-
 mm/cma.c                                   |   6 +-
 mm/memory_hotplug.c                        |  12 +-
 mm/page_alloc.c                            | 337 +++++++++++----------
 mm/page_isolation.c                        | 154 +++++++++-
 9 files changed, 352 insertions(+), 189 deletions(-)

-- 
2.34.1


* [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
From: Zi Yan @ 2022-01-19 19:06 UTC
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.

[1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
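
For reference, the effect of the new helper on each migratetype (a
summary derived from the hunks below; the list itself is not part of the
patch):

	migratetype_is_mergeable(MIGRATE_UNMOVABLE)   -> true
	migratetype_is_mergeable(MIGRATE_MOVABLE)     -> true
	migratetype_is_mergeable(MIGRATE_RECLAIMABLE) -> true
	migratetype_is_mergeable(MIGRATE_HIGHATOMIC)  -> false
	migratetype_is_mergeable(MIGRATE_CMA)         -> false
	migratetype_is_mergeable(MIGRATE_ISOLATE)     -> false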

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h | 11 +++++++++++
 mm/page_alloc.c        | 37 ++++++++++++++++++++-----------------
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aed44e9b5d89..71b77aab748d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -83,6 +83,17 @@ static inline bool is_migrate_movable(int mt)
 	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
 }
 
+/*
+ * Check whether a migratetype can be merged with another migratetype.
+ *
+ * It is only mergeable when it can fall back to other migratetypes for
+ * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c.
+ */
+static inline bool migratetype_is_mergeable(int mt)
+{
+	return mt < MIGRATE_PCPTYPES;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8dd6399bafb5..15de65215c02 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1117,25 +1117,24 @@ static inline void __free_one_page(struct page *page,
 	}
 	if (order < MAX_ORDER - 1) {
 		/* If we are here, it means order is >= pageblock_order.
-		 * We want to prevent merge between freepages on isolate
-		 * pageblock and normal pageblock. Without this, pageblock
-		 * isolation could cause incorrect freepage or CMA accounting.
+		 * We want to prevent merge between freepages on pageblock
+		 * without fallbacks and normal pageblock. Without this,
+		 * pageblock isolation could cause incorrect freepage or CMA
+		 * accounting or HIGHATOMIC accounting.
 		 *
 		 * We don't want to hit this code for the more frequent
 		 * low-order merging.
 		 */
-		if (unlikely(has_isolate_pageblock(zone))) {
-			int buddy_mt;
+		int buddy_mt;
 
-			buddy_pfn = __find_buddy_pfn(pfn, order);
-			buddy = page + (buddy_pfn - pfn);
-			buddy_mt = get_pageblock_migratetype(buddy);
+		buddy_pfn = __find_buddy_pfn(pfn, order);
+		buddy = page + (buddy_pfn - pfn);
+		buddy_mt = get_pageblock_migratetype(buddy);
 
-			if (migratetype != buddy_mt
-					&& (is_migrate_isolate(migratetype) ||
-						is_migrate_isolate(buddy_mt)))
-				goto done_merging;
-		}
+		if (migratetype != buddy_mt
+				&& (!migratetype_is_mergeable(migratetype) ||
+					!migratetype_is_mergeable(buddy_mt)))
+			goto done_merging;
 		max_order = order + 1;
 		goto continue_merging;
 	}
@@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -2795,8 +2795,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
-	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
-	    && !is_migrate_cma(mt)) {
+	/* Only reserve normal pageblocks (i.e., they can merge with others) */
+	if (migratetype_is_mergeable(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3545,8 +3545,11 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
-			    && !is_migrate_highatomic(mt))
+			/*
+			 * Only change normal pageblocks (i.e., they can merge
+			 * with others)
+			 */
+			if (migratetype_is_mergeable(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
-- 
2.34.1


* [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
From: Zi Yan @ 2022-01-19 19:06 UTC
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15de65215c02..1d812268c2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8859,125 +8859,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it.  But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6a0ddda6b3c5..6c841274bf46 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may be not in buddy system, and
+		 * page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * it.  But now, memory offline itself doesn't call
+		 * shrink_node_slabs() and it still to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Michael Ellerman, Robin Murphy, linux-kernel,
	iommu, Eric Ren, virtualization, linuxppc-dev, Christoph Hellwig,
	Vlastimil Babka

From: Zi Yan <ziy@nvidia.com>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15de65215c02..1d812268c2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8859,125 +8859,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it.  But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6a0ddda6b3c5..6c841274bf46 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may be not in buddy system, and
+		 * page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * it.  But now, memory offline itself doesn't call
+		 * shrink_node_slabs() and it still to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.34.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Mel Gorman, Zi Yan, Robin Murphy, linux-kernel, iommu, Eric Ren,
	virtualization, linuxppc-dev, Christoph Hellwig, Vlastimil Babka,
	Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15de65215c02..1d812268c2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8859,125 +8859,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it.  But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6a0ddda6b3c5..6c841274bf46 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may not be in the buddy system, and
+		 * its page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * them.  But for now, memory offlining itself doesn't
+		 * call shrink_node_slabs(); this still needs to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 72+ messages in thread
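
To make the skip-ahead in has_unmovable_pages() above concrete: when the
scan hits the head of a free buddy page, it advances past all of that
page's tail pfns in one step instead of visiting them one by one. Below
is a minimal userspace sketch of that technique; the order[] array is a
hypothetical stand-in for buddy_order(), and none of this is kernel code.

#include <stdio.h>

#define NR_PFNS 16

int main(void)
{
	/* order[pfn] > 0 marks pfn as the head of a free page of that order */
	int order[NR_PFNS] = { [4] = 2, [8] = 3 };
	unsigned long pfn;
	int visited = 0;

	for (pfn = 0; pfn < NR_PFNS; pfn++) {
		visited++;
		if (order[pfn])
			/* jump over the tail pfns, as pfn += (1 << order) - 1 */
			pfn += (1UL << order[pfn]) - 1;
	}
	printf("visited %d of %d pfns\n", visited, NR_PFNS);	/* 6 of 16 */
	return 0;
}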

* [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-19 19:06 ` Zi Yan
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

Enable set_migratetype_isolate() to check a specified sub-range for
unmovable pages during isolation. Page isolation is done at
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
that alignment. This commit makes the unmovable page check look only at
the pages of interest, so that page isolation can succeed for any
non-overlapping range.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/page-isolation.h |  1 +
 mm/memory_hotplug.c            | 12 +++++++-
 mm/page_alloc.c                |  2 +-
 mm/page_isolation.c            | 53 +++++++++++++++++++++-------------
 4 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..a4d2687ed4e6 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -42,6 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			 unsigned long isolate_start, unsigned long isolate_end,
 			 unsigned migratetype, int flags);
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0139b77c51d5..5db84c3fa882 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1901,8 +1901,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	zone_pcp_disable(zone);
 	lru_cache_disable();
 
-	/* set above range as isolated */
+	/*
+	 * set above range as isolated
+	 *
+	 * start_pfn and end_pfn are the same as isolate_start and isolate_end,
+	 * because start_pfn and end_pfn are already PAGES_PER_SECTION
+	 * (>= MAX_ORDER_NR_PAGES) aligned; if start_pfn is
+	 * only pageblock_nr_pages aligned in the memmap_on_memory case,
+	 * there is no need to isolate pages before start_pfn, since those
+	 * pages are used by the memmap and thus are not user visible.
+	 */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
+				       start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
 				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1d812268c2a9..812cf557b20f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9016,7 +9016,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
+	ret = start_isolate_page_range(start, end, pfn_max_align_down(start),
 				       pfn_max_align_up(end), migratetype, 0);
 	if (ret)
 		return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6c841274bf46..d17ad9a7d4bf 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,8 @@
 #include <trace/events/page_isolation.h>
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether the part of the pageblock within
+ * [start_pfn, end_pfn) includes unmovable pages or not.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -29,11 +30,14 @@
  *
  */
 static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
+				 int migratetype, int flags,
+				 unsigned long start_pfn, unsigned long end_pfn)
 {
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
+	unsigned long first_pfn = max(page_to_pfn(page), start_pfn);
+	unsigned long pfn = first_pfn;
+	unsigned long last_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages), end_pfn);
+
+	page = pfn_to_page(pfn);
 
 	if (is_migrate_cma_page(page)) {
 		/*
@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}
 
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
+	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
+		page = pfn_to_page(pfn);
 
 		/*
 		 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +89,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 			}
 
 			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
+			pfn += skip_pages - 1;
 			continue;
 		}
 
@@ -97,7 +101,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 */
 		if (!page_ref_count(page)) {
 			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
+				pfn += (1 << buddy_order(page)) - 1;
 			continue;
 		}
 
@@ -134,7 +138,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	return NULL;
 }
 
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function sets the pageblock migratetype to MIGRATE_ISOLATE if no
+ * unmovable page is present in [start_pfn, end_pfn). The pageblock must
+ * be within [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+			unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct page *unmovable;
@@ -156,7 +166,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags, start_pfn, end_pfn);
 	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
@@ -265,8 +275,12 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
- * @start_pfn:		The lower PFN of the range to be isolated.
- * @end_pfn:		The upper PFN of the range to be isolated.
+ * @start_pfn:		The lower PFN of the range to be checked for
+ *			possibility of isolation.
+ * @end_pfn:		The upper PFN of the range to be checked for
+ *			possibility of isolation.
+ * @isolate_start:	The lower PFN of the range to be isolated.
+ * @isolate_end:	The upper PFN of the range to be isolated.
  *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
@@ -304,20 +318,19 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned long isolate_start, unsigned long isolate_end,
 			     unsigned migratetype, int flags)
 {
 	unsigned long pfn;
 	struct page *page;
 
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
-
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
+	for (pfn = isolate_start;
+	     pfn < isolate_end;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page && set_migratetype_isolate(page, migratetype, flags)) {
-			undo_isolate_page_range(start_pfn, pfn, migratetype);
+		if (page && set_migratetype_isolate(page, migratetype, flags,
+					start_pfn, end_pfn)) {
+			undo_isolate_page_range(isolate_start, pfn, migratetype);
 			return -EBUSY;
 		}
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 72+ messages in thread
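
The core of this patch is clamping the scanned window to the
intersection of the current pageblock and the caller's range. Here is a
small userspace sketch of that computation; ALIGN is redefined locally
to mirror the kernel macro, and the pfn values are made up for
illustration only.

#include <stdio.h>

#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define pageblock_nr_pages	512UL	/* 2MB pageblocks with 4KB pages */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long block_start = 1024;		/* pageblock-aligned pfn */
	unsigned long start_pfn = 1100, end_pfn = 1400;	/* caller's range */
	unsigned long first_pfn = max_ul(block_start, start_pfn);
	unsigned long last_pfn = min_ul(ALIGN(first_pfn + 1, pageblock_nr_pages),
					end_pfn);

	/* only [1100, 1400) is scanned, not the whole [1024, 1536) block */
	printf("check [%lu, %lu)\n", first_pfn, last_pfn);
	return 0;
}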


* [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
  2022-01-19 19:06 ` Zi Yan
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. This might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

This is done by isolating at MAX_ORDER-1 granularity, migrating pages
away at pageblock granularity, and then restoring the pageblock types
and splitting >pageblock_order free pages. The reason for this process
is that during isolation some pages, either free or in-use, might span
multiple pageblocks, and isolating only part of them can cause free
page accounting issues. Restoring the migratetypes of the pageblocks
outside the range of interest afterwards is much easier.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c | 175 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 155 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 812cf557b20f..6ed506234efa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8862,8 +8862,8 @@ void *__init alloc_large_system_hash(const char *tablename,
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
-	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-			     pageblock_nr_pages) - 1);
+	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+				     pageblock_nr_pages));
 }
 
 static unsigned long pfn_max_align_up(unsigned long pfn)
@@ -8952,6 +8952,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	return 0;
 }
 
+static inline int save_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline int restore_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+				int order, struct zone *zone)
+{
+	unsigned long pfn;
+
+	spin_lock(&zone->lock);
+	del_page_from_free_list(free_page, zone, order);
+	for (pfn = page_to_pfn(free_page);
+	     pfn < page_to_pfn(free_page) + (1UL << order);
+	     pfn += pageblock_nr_pages) {
+		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+				mt, FPI_NONE);
+	}
+	spin_unlock(&zone->lock);
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -8977,8 +9023,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
+	unsigned long isolate_start = pfn_max_align_down(start);
+	unsigned long isolate_end = pfn_max_align_up(end);
+	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
+	unsigned long num_pageblock_to_save;
 	unsigned int order;
 	int ret = 0;
+	unsigned char *saved_mt;
+	int num;
 
 	struct compact_control cc = {
 		.nr_migratepages = 0,
@@ -8992,11 +9045,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	/*
+	 * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
+	 * the existing migratetype. Then, we will not need the save and restore
+	 * process here.
+	 */
+
+	/* Save the migratetypes of the pageblocks before start and after end */
+	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
+				+ (isolate_end - alloc_end) / pageblock_nr_pages;
+	saved_mt =
+		kmalloc_array(num_pageblock_to_save,
+			      sizeof(unsigned char), GFP_KERNEL);
+	if (!saved_mt)
+		return -ENOMEM;
+
+	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
+
+	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
 	 * have different sizes, and due to the way page allocator
-	 * work, we align the range to biggest of the two pages so
+	 * work, we align the isolation range to the biggest of the two so
 	 * that page allocator won't try to merge buddies from
 	 * different pageblocks and change MIGRATE_ISOLATE to some
 	 * other migration type.
@@ -9006,6 +9078,20 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * we are interested in).  This will put all the pages in
 	 * range back to page allocator as MIGRATE_ISOLATE.
 	 *
+	 * Afterwards, we restore the migratetypes of the pageblocks not
+	 * in range, split free pages spanning outside the range,
+	 * and put split free pages (at pageblock_order) to the right
+	 * migratetype list.
+	 *
+	 * NOTE: the above approach is used because isolating only a
+	 * subset of a page's pageblocks, whether the page is free or
+	 * in-use, can cause free page accounting issues. For example,
+	 * if only the second pageblock is isolated from a page with 2
+	 * pageblocks, after the page is freed, it will be put on the
+	 * first pageblock's migratetype list instead of having its 2
+	 * pageblocks on two separate migratetype lists.
+	 *
 	 * When this is done, we take the pages in range from page
 	 * allocator removing them from the buddy system.  This way
 	 * page allocator will never consider using them.
@@ -9016,10 +9102,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(start, end, pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(start, end, isolate_start, isolate_end,
+				migratetype, 0);
 	if (ret)
-		return ret;
+		goto done;
 
 	drain_all_pages(cc.zone);
 
@@ -9055,6 +9141,19 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * isolated thus they won't get removed from buddy.
 	 */
 
+	/*
+	 * Restore migratetypes of pageblocks outside [start, end)
+	 * TODO: remove it when MIGRATE_ISOLATE becomes a standalone bit
+	 */
+
+	num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
+
+	num = restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
+	/*
+	 * Split free page spanning [isolate_start, alloc_start) and put the
+	 * pageblocks in the right migratetype lists.
+	 */
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
@@ -9069,37 +9168,73 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		order = buddy_order(pfn_to_page(outer_start));
 
 		/*
-		 * outer_start page could be small order buddy page and
-		 * it doesn't include start page. Adjust outer_start
-		 * in this case to report failed page properly
-		 * on tracepoint in test_pages_isolated()
+		 * split the free page that contains the start page and put
+		 * the pageblocks on the right migratetype lists
 		 */
-		if (outer_start + (1UL << order) <= start)
-			outer_start = start;
+		if (outer_start + (1UL << order) > start) {
+			struct page *free_page = pfn_to_page(outer_start);
+
+			split_free_page_into_pageblocks(free_page, order, cc.zone);
+		}
+	}
+
+	/*
+	 * Split free page spanning [alloc_end, isolate_end) and put the
+	 * pageblocks in the right migratetype list
+	 */
+	for (outer_end = alloc_end; outer_end < isolate_end;) {
+		unsigned long begin_pfn = outer_end;
+
+		order = 0;
+		while (!PageBuddy(pfn_to_page(outer_end))) {
+			if (++order >= MAX_ORDER) {
+				outer_end = begin_pfn;
+				break;
+			}
+			outer_end &= ~0UL << order;
+		}
+
+		if (outer_end != begin_pfn) {
+			order = buddy_order(pfn_to_page(outer_end));
+
+			/*
+			 * split the free page that contains begin_pfn and put
+			 * the pageblocks on the right migratetype lists
+			 */
+			VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
+			{
+				struct page *free_page = pfn_to_page(outer_end);
+
+				split_free_page_into_pageblocks(free_page, order, cc.zone);
+			}
+			outer_end += 1UL << order;
+		} else
+			outer_end = begin_pfn + 1;
 	}
 
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, 0)) {
+	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Grab isolated pages from freelists. */
-	outer_end = isolate_freepages_range(&cc, outer_start, end);
+	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
 	if (!outer_end) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (start != alloc_start)
+		free_contig_range(alloc_start, start - alloc_start);
+	if (end != alloc_end)
+		free_contig_range(end, alloc_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	kfree(saved_mt);
+	undo_isolate_page_range(alloc_start,
+				alloc_end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 72+ messages in thread
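
The save/restore bookkeeping above packs the migratetypes of the leading
pageblocks in [isolate_start, alloc_start) and the trailing pageblocks in
[alloc_end, isolate_end) back to back into one kmalloc'ed array, using
the count returned by the first call as the offset for the second. Below
is a userspace model of that layout; fake_migratetype() and all pfn
values are invented for the example, and the model accumulates num only
to print it, where the patch reassigns it.

#include <stdio.h>
#include <stdlib.h>

#define pageblock_nr_pages 512UL

/* toy stand-in for get_pageblock_migratetype() */
static unsigned char fake_migratetype(unsigned long pfn)
{
	return (pfn / pageblock_nr_pages) % 5;
}

static int save_migratetypes(unsigned char *mt, unsigned long start,
			     unsigned long end)
{
	int num = 0;
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn += pageblock_nr_pages)
		mt[num++] = fake_migratetype(pfn);
	return num;
}

int main(void)
{
	unsigned long isolate_start = 0, alloc_start = 1024;
	unsigned long alloc_end = 4096, isolate_end = 5120;
	unsigned long nr = (alloc_start - isolate_start) / pageblock_nr_pages +
			   (isolate_end - alloc_end) / pageblock_nr_pages;
	unsigned char *saved_mt = malloc(nr);
	int num;

	if (!saved_mt)
		return 1;
	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
	num += save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
	printf("saved %d of %lu pageblock migratetypes\n", num, nr);
	free(saved_mt);
	return 0;
}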


* [PATCH v4 5/7] mm: cma: use pageblock_order as the single alignment
  2022-01-19 19:06 ` Zi Yan
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h  | 5 +----
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c                | 6 ++----
 mm/page_alloc.c         | 6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 71b77aab748d..7bd3694b24b4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-	phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	phys_addr_t align = PAGE_SIZE << pageblock_order;
 	phys_addr_t mask = align - 1;
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 		return -EINVAL;
 
 	/* ensure minimal alignment required by mm core */
-	alignment = PAGE_SIZE <<
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+	alignment = PAGE_SIZE << pageblock_order;
 
 	/* alignment should be aligned with order_per_bit */
 	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 	 * migratetype page by page allocator's buddy algorithm. In the case,
 	 * you couldn't get a contiguous memory, which is not what we want.
 	 */
-	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
-			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
 	if (fixed && base & (alignment - 1)) {
 		ret = -EINVAL;
 		pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6ed506234efa..a8ced1a00ce8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9008,8 +9008,8 @@ static inline void split_free_page_into_pageblocks(struct page *free_page,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
@@ -9125,7 +9125,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = 0;
 
 	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * Pages from [start, end) are within pageblock_nr_pages
 	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 	 * more, all pages in [start, end) are free in page allocator.
 	 * What we are going to do is to allocate all pages from
-- 
2.34.1


* [PATCH v4 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
  2022-01-19 19:06 ` Zi Yan
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() now only needs to be aligned to pageblock_order,
so drop the virtio_mem size requirement that it be the max of
pageblock_order and MAX_ORDER.
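
As a concrete example, assuming x86_64 defaults (4K pages,
pageblock_order = 9, MAX_ORDER = 11), the minimum subblock size drops
from max(2^10, 2^9) * 4K = 4 MiB to 2^9 * 4K = 2 MiB, before being
raised to at least vm->device_block_size as the code already does.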

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 drivers/virtio/virtio_mem.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index a6a78685cfbe..eafba2119ae3 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,12 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
-	 * pageblock_nr_pages pages. This:
+	 * We want subblocks to span at least pageblock_nr_pages pages.
+	 * This:
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = pageblock_nr_pages * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.34.1


* [PATCH v4 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.
  2022-01-19 19:06 ` Zi Yan
@ 2022-01-19 19:06   ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-19 19:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren, Zi Yan

From: Zi Yan <ziy@nvidia.com>

CMA only requires pageblock alignment now. Change the CMA alignment in
fadump accordingly.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 52189928ec08..fbfca85b4200 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER			1
-- 
2.34.1


* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-19 19:06   ` Zi Yan
@ 2022-01-24  9:55     ` Oscar Salvador
  0 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-01-24  9:55 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On 2022-01-19 20:06, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.

Hi Zi Yan,

I had to re-read this several times as I found this a bit misleading.
I was mainly confused by the fact that memory_hotplug does isolation on
PAGES_PER_SECTION granularity, and reading the above seems to indicate
that it is done at either MAX_ORDER_NR_PAGES or pageblock_nr_pages
granularity.

It is true that start_isolate_page_range() expects the range to be
pageblock aligned and works in pageblock_nr_pages chunks, but I do not
think that is what you meant to say here.

Now, to the change itself, below:


> @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone
> *zone, struct page *page,
>  		return page;
>  	}
> 
> -	for (; iter < pageblock_nr_pages - offset; iter++) {
> -		page = pfn_to_page(pfn + iter);
> +	for (pfn = first_pfn; pfn < last_pfn; pfn++) {

You already did pfn = first_pfn before.

>  /**
>   * start_isolate_page_range() - make page-allocation-type of range of 
> pages to
>   * be MIGRATE_ISOLATE.
> - * @start_pfn:		The lower PFN of the range to be isolated.
> - * @end_pfn:		The upper PFN of the range to be isolated.
> + * @start_pfn:		The lower PFN of the range to be checked for
> + *			possibility of isolation.
> + * @end_pfn:		The upper PFN of the range to be checked for
> + *			possibility of isolation.
> + * @isolate_start:		The lower PFN of the range to be isolated.
> + * @isolate_end:		The upper PFN of the range to be isolated.

So, what does "possibility" mean here? I think this needs to be
clarified a bit better.

From what you pointed out in the commit message, I think what you are
doing is:

- alloc_contig_range() gets a range to be isolated.
- then you pass two ranges to start_isolate_page_range()
   (start_pfn, end_pfn]: which is the unaligned range you got in 
alloc_contig_range()
   (isolate_start, isolate_end]: which got aligned, let's say, to
MAX_ORDER_NR_PAGES

Now, most likely, (start_pfn, end_pfn] only covers a sub-range of 
(isolate_start, isolate_end], and that
sub-range is what you really want to isolate (so (start_pfn, end_pfn])?

If so, should not the names be reversed?


-- 
Oscar Salvador
SUSE Labs

* Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2022-01-19 19:06   ` Zi Yan
@ 2022-01-24 14:02     ` Mel Gorman
  0 siblings, 0 replies; 72+ messages in thread
From: Mel Gorman @ 2022-01-24 14:02 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Eric Ren

On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> requirement for CMA and alloc_contig_range().
> 
> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
> 
> [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> <SNIP>
>
> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
> +	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>  #ifdef CONFIG_CMA
>  	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
>  #endif

If it's never used, why is it added?

Otherwise looks fine so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs
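
For context on the "Never used" comments: the fallback scan consumes
this array by walking fallbacks[mt] until it hits the MIGRATE_TYPES
sentinel, so an entry whose first slot is already the sentinel can never
yield a fallback. A standalone sketch of that walk (illustrative only;
the enum names are abbreviated stand-ins, not the kernel's):

	#include <stdio.h>

	enum mt { UNMOVABLE, MOVABLE, RECLAIMABLE, HIGHATOMIC, CMA,
		  ISOLATE, MT_SENTINEL /* stands in for MIGRATE_TYPES */ };

	static const int fallbacks[MT_SENTINEL][3] = {
		[UNMOVABLE]   = { RECLAIMABLE, MOVABLE,   MT_SENTINEL },
		[MOVABLE]     = { RECLAIMABLE, UNMOVABLE, MT_SENTINEL },
		[RECLAIMABLE] = { UNMOVABLE,   MOVABLE,   MT_SENTINEL },
		[HIGHATOMIC]  = { MT_SENTINEL },	/* never used */
		[CMA]         = { MT_SENTINEL },	/* never used */
		[ISOLATE]     = { MT_SENTINEL },	/* never used */
	};

	int main(void)
	{
		/* MOVABLE yields two fallbacks before the sentinel stops the walk. */
		for (int i = 0; fallbacks[MOVABLE][i] != MT_SENTINEL; i++)
			printf("MOVABLE fallback %d -> %d\n", i, fallbacks[MOVABLE][i]);

		/* HIGHATOMIC stops immediately: it never falls back. */
		for (int i = 0; fallbacks[HIGHATOMIC][i] != MT_SENTINEL; i++)
			printf("unreachable\n");
		return 0;
	}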

* Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2022-01-24 14:02     ` Mel Gorman
@ 2022-01-24 16:12       ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-24 16:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Eric Ren

On 24 Jan 2022, at 9:02, Mel Gorman wrote:

> On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
>> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
>> requirement for CMA and alloc_contig_range().
>>
>> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
>> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
>> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
>>
>> [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>
>> <SNIP>
>>
>> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
>>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
>>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
>> +	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>>  #ifdef CONFIG_CMA
>>  	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
>>  #endif
>
> If it's never used, why is it added?

Just to make the fallbacks list complete, since MIGRATE_CMA and
MIGRATE_ISOLATE are in the list. Instead, I can remove MIGRATE_CMA and
MIGRATE_ISOLATE. WDYT?

>
> Otherwise looks fine so
>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>

Thanks.

--
Best Regards,
Yan, Zi

* Re: [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  2022-01-24 16:12       ` Zi Yan
@ 2022-01-24 16:43         ` Mel Gorman
  0 siblings, 0 replies; 72+ messages in thread
From: Mel Gorman @ 2022-01-24 16:43 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Eric Ren

On Mon, Jan 24, 2022 at 11:12:07AM -0500, Zi Yan wrote:
> On 24 Jan 2022, at 9:02, Mel Gorman wrote:
> 
> > On Wed, Jan 19, 2022 at 02:06:17PM -0500, Zi Yan wrote:
> >> From: Zi Yan <ziy@nvidia.com>
> >>
> >> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> >> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> >> requirement for CMA and alloc_contig_range().
> >>
> >> MIGRATE_HIGHATOMIC should not merge with other migratetypes like
> >> MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
> >> Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.
> >>
> >> [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>
> >> <SNIP>
> >>
> >> @@ -2484,6 +2483,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
> >>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
> >>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
> >>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
> >> +	[MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
> >>  #ifdef CONFIG_CMA
> >>  	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
> >>  #endif
> >
> > If it's never used, why is it added?
> 
> Just to make the fallbacks list complete, since MIGRATE_CMA and
> MIGRATE_ISOLATE are in the list. Instead, I can remove MIGRATE_CMA and
> MIGRATE_ISOLATE. WDYT?
> 

It probably makes more sense to remove them or replace them with a comment
stating what migratetypes do not have a fallback list. Do it as a separate
patch that stands alone. It does not need to be part of this series.

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-24  9:55     ` Oscar Salvador
@ 2022-01-24 17:17       ` Zi Yan
  0 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-24 17:17 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On 24 Jan 2022, at 4:55, Oscar Salvador wrote:

> On 2022-01-19 20:06, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>
> Hi Zi Yan,
>
> I had to re-read this several times as I found this a bit misleading.
> I was mainly confused by the fact that memory_hotplug does isolation on PAGES_PER_SECTION granularity, and reading the above seems to indicate that either do it at MAX_ORDER_NR_PAGES or at pageblock_nr_pages granularity.

You are right. Sorry for the confusion. I think it should be
“Page isolation is done at least at max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) granularity.”

memory_hotplug uses PAGES_PER_SECTION, which is greater than that.


>
> True is that start_isolate_page_range() expects the range to be pageblock aligned and works in pageblock_nr_pages chunks, but I do not think that is what you meant to say here.

Actually, start_isolate_page_range() should expect max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) alignment instead of pageblock alignment. It seems to
be a latent bug in the current code, since all callers use at least
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment.

The reason is that if start_isolate_page_range() is only pageblock aligned
and a caller wants to isolate one pageblock from a MAX_ORDER-1
(2 pageblocks on x86_64 systems) free page, this will lead to a
MIGRATE_ISOLATE accounting error. To avoid it, start_isolate_page_range()
needs to isolate the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned
range.


>
> Now, to the change itself, below:
>
>
>> @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone
>> *zone, struct page *page,
>>  		return page;
>>  	}
>>
>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>> -		page = pfn_to_page(pfn + iter);
>> +	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
>
> You already did pfn = first_pfn before.

Got it. Will remove the redundant code.

>
>>  /**
>>   * start_isolate_page_range() - make page-allocation-type of range of pages to
>>   * be MIGRATE_ISOLATE.
>> - * @start_pfn:		The lower PFN of the range to be isolated.
>> - * @end_pfn:		The upper PFN of the range to be isolated.
>> + * @start_pfn:		The lower PFN of the range to be checked for
>> + *			possibility of isolation.
>> + * @end_pfn:		The upper PFN of the range to be checked for
>> + *			possibility of isolation.
>> + * @isolate_start:		The lower PFN of the range to be isolated.
>> + * @isolate_end:		The upper PFN of the range to be isolated.
>
> So, what does "possibility" mean here? I think this needs to be clarified a bit better.

start_isolate_page_range() needs to check if unmovable pages exist in the
range [start_pfn, end_pfn) but mark all pageblocks within [isolate_start,
isolate_end) MIGRATE_ISOLATE (isolate_* need to be max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) aligned). But now I realize “possibility” here is very
confusing, since both ranges decide whether the isolation can succeed.

>
> From what you pointed out in the commit message I think what you are doing is:
>
> - alloc_contig_range() gets a range to be isolated.
> - then you pass two ranges to start_isolate_page_range()
>   (start_pfn, end_pfn]: which is the unaligned range you got in alloc_contig_range()
>   (isolate_start, isolate_end]: which got aligned to, let's say, to MAX_ORDER_NR_PAGES
>
> Now, most likely, (start_pfn, end_pfn] only covers a sub-range of (isolate_start, isolate_end], and that
> sub-range is what you really want to isolate (so (start_pfn, end_pfn])?

Correct.

I agree that isolate_start and isolate_end are pretty confusing here.
They are implementation details of start_isolate_page_range() and should
not be exposed. I will remove them from the parameter list and produce
them inside start_isolate_page_range(). They are pfn_max_align_down()
and pfn_max_align_up() of start_pfn and end_pfn, respectively.

In alloc_contig_range(), the code is still needed to save and restore
migratetypes for [isolate_start, start_pfn) and (end_pfn, isolate_end],
because [start_pfn, end_pfn) is not required to be max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) aligned. Like I said in the patch, the code will
go away once MIGRATE_ISOLATE becomes a standalone bit without overwriting
existing migratetypes during page isolation. And then isolate_start
and isolate_end here will be completely transparent to callers of
start_isolate_page_range().
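
Put together, the reworked entry point might look roughly like this. This
is a sketch of the plan only, not a posted patch; the extra
start_pfn/end_pfn arguments to set_migratetype_isolate() are assumed from
the diff quoted above:

int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned migratetype, int flags)
{
	unsigned long isolate_start = pfn_max_align_down(start_pfn);
	unsigned long isolate_end = pfn_max_align_up(end_pfn);
	unsigned long pfn;

	/*
	 * Mark every pageblock in the aligned range MIGRATE_ISOLATE, but
	 * only check [start_pfn, end_pfn) for unmovable pages.
	 */
	for (pfn = isolate_start; pfn < isolate_end;
	     pfn += pageblock_nr_pages) {
		if (set_migratetype_isolate(pfn_to_page(pfn), migratetype,
					    flags, start_pfn, end_pfn)) {
			undo_isolate_page_range(isolate_start, pfn,
						migratetype);
			return -EBUSY;
		}
	}
	return 0;
}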

Thanks for your review and comment.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  2022-01-19 19:06   ` Zi Yan
@ 2022-01-25  6:23     ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-01-25  6:23 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On 2022-01-19 20:06, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
> mm/page_alloc.c and make it static.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-24 17:17       ` Zi Yan via iommu
@ 2022-01-25 13:19         ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-01-25 13:19 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On Mon, Jan 24, 2022 at 12:17:23PM -0500, Zi Yan wrote:
> You are right. Sorry for the confusion. I think it should be
> “Page isolation is done at least on max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) granularity.”
> 
> memory_hotplug uses PAGES_PER_SECTION, which is greater than that.

Or just specify that the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity
only comes from alloc_contig_range() at the moment. Other callers might want
to work at a different granularity (e.g. memory-hotplug), although ultimately
the range has to be aligned to something.

> > It is true that start_isolate_page_range() expects the range to be pageblock aligned and works in pageblock_nr_pages chunks, but I do not think that is what you meant to say here.
> 
> Actually, start_isolate_page_range() should expect max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) alignment instead of pageblock alignment. It seems to
> be an uncovered bug in the current code, since all callers use at least
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment.
> 
> The reason is that if start_isolate_page_range() is only pageblock aligned
> and a caller wants to isolate one pageblock from a MAX_ORDER-1
> (2 pageblocks on x86_64 systems) free page, this will lead to a
> MIGRATE_ISOLATE accounting error. To avoid it, start_isolate_page_range()
> needs to isolate the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range.

So, let me see if I get this straight:

You are saying that, currently, alloc_contig_range() works on the biggest
alignment otherwise we might have this scenario:

[      MAX_ORDER-1       ]
[pageblock#0][pageblock#1]

We only want to isolate pageblock#1, so we pass a pageblock-aligned range to
start_isolate_page_range(), but the page belonging to pageblock#1 spans
pageblock#0 and pageblock#1 because it is a MAX_ORDER-1 page.

So when we call set_migratetype_isolate()->set_pageblock_migratetype(), this will
mark either pageblock#0 or pageblock#1 as isolated, but the whole page will be put
in the MIGRATE_ISOLATE freelist by move_freepages_block()->move_freepages().
Meaning, we will effectively have two pageblocks isolated, but only one marked
as such?

Did I get it right or did I miss something?
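
For reference, the scenario above in one picture, assuming
pageblock_order == MAX_ORDER - 2 so that a single free page covers two
pageblocks:

/*
 *   [          free MAX_ORDER-1 page          ]
 *   [ pageblock#0        ][ pageblock#1       ]
 *     MIGRATE_MOVABLE       MIGRATE_ISOLATE
 *
 * The whole buddy page sits on a single free list, chosen from one
 * pageblock's migratetype, so the accounting for the other pageblock
 * is off: two pageblocks are effectively isolated, only one is marked.
 */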

I know that this has been discussed previously, and the cover-letter already
mentions it, but I think it would be great to have some sort of information about
the problem in the commit message as well, so people do not have to go and find
it somewhere else.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-25 13:19         ` Oscar Salvador
@ 2022-01-25 13:21           ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-01-25 13:21 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote:
> I know that this has been discussed previously, and the cover-letter already
> mentions it, but I think it would be great to have some sort of information about
> the problem in the commit message as well, so people do not have to go and find
> it somewhere else.

Sorry, the commit message already points it out, but I meant it could elaborate some more.

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-25 13:21           ` Oscar Salvador
@ 2022-01-25 16:31             ` Zi Yan via iommu
  -1 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-01-25 16:31 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

On 25 Jan 2022, at 8:21, Oscar Salvador wrote:

> On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote:
>> I know that this has been discussed previously, and the cover-letter already
>> mentions it, but I think it would be great to have some sort of information about
>> the problem in the commit message as well, so people do not have to go and find
>> it somewhere else.
>
> Sorry, the commit message already points it out, but I meant it could elaborate some more.

You got it right about the issue.

And I will add more text in the commit message and function comments to clarify
the situation.

Thanks for your suggestions.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-01-19 19:06   ` Zi Yan
@ 2022-02-02 12:18     ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-02-02 12:18 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.

Another thing that came to my mind.
Prior to this patch, has_unmovable_pages() was checking on pageblock
granularity, starting at pfn#0 of the pageblock.
With this patch, you no longer check on pageblock granularity, which
means you might isolate a pageblock, but some pages that sneaked in
might actually be unmovable.

E.g:

Let's say you have a pageblock that spans (pfn#512,pfn#1024),
and you pass alloc_contig_range() (pfn#514,pfn#1024).
has_unmovable_pages() will start checking the pageblock at pfn#514,
and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
pfns turn out to be actually unmovable?
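
For reference, a sketch of the check window after this patch, using the
loop from the diff quoted earlier (how first_pfn/last_pfn are derived
here is an assumption for illustration):

	unsigned long pageblock_start = 512, pageblock_end = 1024;
	unsigned long start_pfn = 514, end_pfn = 1024;
	unsigned long first_pfn = max(pageblock_start, start_pfn); /* 514 */
	unsigned long last_pfn = min(pageblock_end, end_pfn);	   /* 1024 */
	unsigned long pfn;

	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
		/* unmovable checks run here; pfn#512/513 are skipped */
	}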


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-02-02 12:18     ` Oscar Salvador
@ 2022-02-02 12:25       ` David Hildenbrand
  -1 siblings, 0 replies; 72+ messages in thread
From: David Hildenbrand @ 2022-02-02 12:25 UTC (permalink / raw)
  To: Oscar Salvador, Zi Yan
  Cc: linux-mm, linux-kernel, Michael Ellerman, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, linuxppc-dev, virtualization,
	iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On 02.02.22 13:18, Oscar Salvador wrote:
> On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
> 
> Another thing that came to my mind.
> Prior to this patch, has_unmovable_pages() was checking on pageblock
> granularity, starting at pfn#0 of the pageblock.
> With this patch, you no longer check on pageblock granularity, which
> means you might isolate a pageblock, but some pages that sneaked in
> might actually be unmovable.
> 
> E.g:
> 
> Let's say you have a pageblock that spans (pfn#512,pfn#1024),
> and you pass alloc_contig_range() (pfn#514,pfn#1024).
> has_unmovable_pages() will start checking the pageblock at pfn#514,
> and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
> pfns turn out to be actually unmovable?

That's the whole idea: being able to allocate the movable parts of an
otherwise unmovable pageblock.

If the first part is unmovable but the second part is movable, nothing
should stop us from trying to allocate the second part.

Of course, we want to remember the original migratetype of the
pageblock, to restore that after isolation -- otherwise we'll end up
converting all such pageblocks to MIGRATE_MOVABLE. The next patch does
that IIRC.
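
A minimal sketch of that save/restore idea (the helper names here are
made up for illustration, not taken from the series):

static void save_migratetypes(unsigned char *mts, unsigned long start_pfn,
			      unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		*mts++ = get_pageblock_migratetype(pfn_to_page(pfn));
}

static void restore_migratetypes(unsigned char *mts, unsigned long start_pfn,
				 unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		set_pageblock_migratetype(pfn_to_page(pfn), *mts++);
}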

However, devil is in the detail, and I still have to review those parts
of this series.


Note that there are no current users of alloc_contig_range() that
allocate < MAX_ORDER - 1 -- except CMA, but for CMA we immediately exit
has_unmovable_pages() either way.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-02-02 12:25       ` David Hildenbrand
@ 2022-02-02 16:25         ` Zi Yan via iommu
  -1 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-02-02 16:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

[-- Attachment #1: Type: text/plain, Size: 2996 bytes --]

On 2 Feb 2022, at 7:25, David Hildenbrand wrote:

> On 02.02.22 13:18, Oscar Salvador wrote:
>> On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Enable set_migratetype_isolate() to check specified sub-range for
>>> unmovable pages during isolation. Page isolation is done
>>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>>> pages within that granularity are intended to be isolated. For example,
>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>> alignment. This commit makes unmovable page check only look for
>>> interesting pages, so that page isolation can succeed for any
>>> non-overlapping ranges.
>>
>> Another thing that came to my mind.
>> Prior to this patch, has_unmovable_pages() was checking on pageblock
>> granularity, starting at pfn#0 of the pageblock.
>> With this patch, you no longer check on pageblock granularity, which
>> means you might isolate a pageblock, but some pages that sneaked in
>> might actually be unmovable.
>>
>> E.g:
>>
>> Let's say you have a pageblock that spans (pfn#512,pfn#1024),
>> and you pass alloc_contig_range() (pfn#514,pfn#1024).
>> has_unmovable_pages() will start checking the pageblock at pfn#514,
>> and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
>> pfns turn out to be actually unmovable?
>
> That's the whole idea: being able to allocate the movable parts of an
> otherwise unmovable pageblock.
>
> If the first part is unmovable but the second part is movable, nothing
> should stop us from trying to allocate the second part.
>
> Of course, we want to remember the original migratetype of the
> pageblock, to restore that after isolation -- otherwise we'll end up
> converting all such pageblocks to MIGRATE_MOVABLE. The next patch does
> that IIRC.

Yes. A desirable optimization is to make MIGRATE_ISOLATE a standalone bit,
so isolating a pageblock will not remove its original migratetype. It is
on my todo list.

>
> However, devil is in the detail, and I still have to review those parts
> of this series.

Thanks. You can wait for my next version. I am planning to make
start_isolate_page_range() accept any address range and move migratetype
save and restore into it, so that callers do not need to worry about
alignment.

Basically, start_isolate_page_range() will need to migrate compound pages
at the beginning and/or the end of the given range [start_pfn, end_pfn) if
start_pfn and/or end_pfn-1 is in the middle of a compound page.
If start_pfn and/or end_pfn-1 is in the middle of a free page, the free
page will need to be split and put into separate migratetype lists.
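
A rough sketch of that boundary handling; every helper name below is a
placeholder for illustration, not an existing kernel function:

static int fixup_isolation_boundary(unsigned long boundary_pfn)
{
	/*
	 * pfn_in_free_page()/pfn_in_compound_page() stand for lookups
	 * that find the (head of the) page containing boundary_pfn.
	 */
	if (pfn_in_free_page(boundary_pfn))
		/*
		 * A free page straddles the boundary: split it and put
		 * each piece on the free list matching its own
		 * pageblock's migratetype.
		 */
		return split_free_page_at(boundary_pfn);

	if (pfn_in_compound_page(boundary_pfn))
		/*
		 * A compound page (THP, hugetlb) straddles the boundary:
		 * migrate it away so the boundary lands on a clean edge.
		 */
		return migrate_page_containing(boundary_pfn);

	return 0;
}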

>
>
> Note that there are no current users of alloc_contig_range() that
> allocate < MAX_ORDER - 1 -- except CMA, but for CMA we immediately exit
> has_unmovable_pages() either way.
>
> -- 
> Thanks,
>
> David / dhildenb


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages
  2022-02-02 12:25       ` David Hildenbrand
  (?)
@ 2022-02-02 16:35         ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-02-02 16:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On Wed, Feb 02, 2022 at 01:25:28PM +0100, David Hildenbrand wrote:
> That's the whole idea for being able to allocate parts of an unmovable
> pageblock that are movable.
> 
> If the first part is unmovable but the second part is movable, nothing
> should stop us from trying to allocate the second part.

Yeah, I see, I was a bit slow there, but I see the point now.
 
Thanks David

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
  2022-01-19 19:06   ` Zi Yan
  (?)
@ 2022-02-04 13:56     ` Oscar Salvador
  -1 siblings, 0 replies; 72+ messages in thread
From: Oscar Salvador @ 2022-02-04 13:56 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
> 
> It is done by restoring pageblock types and splitting >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.

Hi Zi Yan,

Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:

> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> +				int order, struct zone *zone)
> +{
> +	unsigned long pfn;
> +
> +	spin_lock(&zone->lock);
> +	del_page_from_free_list(free_page, zone, order);
> +	for (pfn = page_to_pfn(free_page);
> +	     pfn < page_to_pfn(free_page) + (1UL << order);

It might make sense to have an end_pfn variable so that the loop bound does
not have to be re-evaluated on every iteration. Or maybe the compiler is
clever enough to evaluate it only once.

> +	     pfn += pageblock_nr_pages) {
> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> +		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> +				mt, FPI_NONE);
> +	}
> +	spin_unlock(&zone->lock);

It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.
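
Taken together, the two suggestions might look like this (an untested
sketch against the quoted function):

	static inline void split_free_page_into_pageblocks(struct page *free_page,
					int order, struct zone *zone)
	{
		unsigned long pfn = page_to_pfn(free_page);
		unsigned long end_pfn = pfn + (1UL << order);	/* hoisted bound */

		/* already a single pageblock: nothing to split */
		if (order == pageblock_order)
			return;

		spin_lock(&zone->lock);
		del_page_from_free_list(free_page, zone, order);
		for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
			int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

			__free_one_page(pfn_to_page(pfn), pfn, zone,
					pageblock_order, mt, FPI_NONE);
		}
		spin_unlock(&zone->lock);
	}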

> +	/* Save the migratetypes of the pageblocks before start and after end */
> +	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> +				+ (isolate_end - alloc_end) / pageblock_nr_pages;
> +	saved_mt =
> +		kmalloc_array(num_pageblock_to_save,
> +			      sizeof(unsigned char), GFP_KERNEL);
> +	if (!saved_mt)
> +		return -ENOMEM;
> +
> +	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> +	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);

I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.

Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I
guess we want that function to be as generic as possible.

> +	/*
> +	 * Split free page spanning [alloc_end, isolate_end) and put the
> +	 * pageblocks in the right migratetype list
> +	 */
> +	for (outer_end = alloc_end; outer_end < isolate_end;) {
> +		unsigned long begin_pfn = outer_end;
> +
> +		order = 0;
> +		while (!PageBuddy(pfn_to_page(outer_end))) {
> +			if (++order >= MAX_ORDER) {
> +				outer_end = begin_pfn;
> +				break;
> +			}
> +			outer_end &= ~0UL << order;
> +		}
> +
> +		if (outer_end != begin_pfn) {
> +			order = buddy_order(pfn_to_page(outer_end));
> +
> +			/*
> +			 * split the free page that contains the start page and put
> +			 * the pageblocks in the right migratetype list
> +			 */
> +			VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);

How could this possibily happen?

> +			{
> +				struct page *free_page = pfn_to_page(outer_end);
> +
> +				split_free_page_into_pageblocks(free_page, order, cc.zone);
> +			}
> +			outer_end += 1UL << order;
> +		} else
> +			outer_end = begin_pfn + 1;
>  	}

I think there are cases we could optimize for. If the page has already been
split into pageblocks by the outer_start loop, we could skip this outer_end
logic altogether.

E.g.: an order-10 page is split into two pageblocks. There's nothing else
to be done, right? We could skip this.
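
One possible shape for the skip (sketch; 'split_done' would be a new flag
set by the outer_start loop once it has split the straddling free page at
pageblock boundaries):

	if (!split_done) {
		for (outer_end = alloc_end; outer_end < isolate_end;) {
			/* existing PageBuddy scan and split logic */
		}
	}

On x86, for instance, pageblock_order is 9, so an order-10 page is exactly
two pageblocks: once the outer_start pass has cut it, the outer_end pass
has nothing left to do.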


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
  2022-02-04 13:56     ` Oscar Salvador
  (?)
@ 2022-02-04 15:19       ` Zi Yan via iommu
  -1 siblings, 0 replies; 72+ messages in thread
From: Zi Yan @ 2022-02-04 15:19 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, linux-mm, linux-kernel, Michael Ellerman,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, linuxppc-dev,
	virtualization, iommu, Vlastimil Babka, Mel Gorman, Eric Ren

[-- Attachment #1: Type: text/plain, Size: 4612 bytes --]

On 4 Feb 2022, at 8:56, Oscar Salvador wrote:

> On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> It is done by restoring pageblock types and splitting >pageblock_order free
>> pages after isolating at MAX_ORDER-1 granularity and migrating pages
>> away at pageblock granularity. The reason for this process is that
>> during isolation, some pages, either free or in-use, might have >pageblock
>> sizes and isolating part of them can cause free accounting issues.
>> Restoring the migratetypes of the pageblocks not in the interesting
>> range later is much easier.
>
> Hi Zi Yan,
>
> Due to time constraints I only glanced over, so some comments below
> about stuff that caught my eye:

Thanks for looking at the patches!

>
>> +static inline void split_free_page_into_pageblocks(struct page *free_page,
>> +				int order, struct zone *zone)
>> +{
>> +	unsigned long pfn;
>> +
>> +	spin_lock(&zone->lock);
>> +	del_page_from_free_list(free_page, zone, order);
>> +	for (pfn = page_to_pfn(free_page);
>> +	     pfn < page_to_pfn(free_page) + (1UL << order);
>
> It might make sense to have an end_pfn variable so that the loop bound does
> not have to be re-evaluated on every iteration. Or maybe the compiler is
> clever enough to evaluate it only once.

Sure. Will add end_pfn.

>
>> +	     pfn += pageblock_nr_pages) {
>> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> +		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
>> +				mt, FPI_NONE);
>> +	}
>> +	spin_unlock(&zone->lock);
>
> It is possible that free_page's order is already pageblock_order, so I
> would add a one-liner upfront to catch that case and return, otherwise
> we do the delete_from_freelist-and-free_it_again dance.

Makes sense. Will do.

>
>> +	/* Save the migratepages of the pageblocks before start and after end */
>> +	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
>> +				+ (isolate_end - alloc_end) / pageblock_nr_pages;
>> +	saved_mt =
>> +		kmalloc_array(num_pageblock_to_save,
>> +			      sizeof(unsigned char), GFP_KERNEL);
>> +	if (!saved_mt)
>> +		return -ENOMEM;
>> +
>> +	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
>> +
>> +	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
>
> I really hope we can put all this magic within start_isolate_page_range,
> and the counterparts in undo_isolate_page_range.
>

That is my plan too.

> Also, I kinda dislike the &saved_mt thing. I thought about some other
> approaches but nothing that wasn't too specific for this case, and I
> guess we want that function to be as generic as possible.
>
I do not like it either. This whole save and restore thing should go away
once I make MIGRATE_ISOLATE a standalone bit, so that the original
migratetypes will not be overwritten during isolation. Hopefully, I can
work on it soon to get rid of this hunk completely.

>> +	/*
>> +	 * Split free page spanning [alloc_end, isolate_end) and put the
>> +	 * pageblocks in the right migratetype list
>> +	 */
>> +	for (outer_end = alloc_end; outer_end < isolate_end;) {
>> +		unsigned long begin_pfn = outer_end;
>> +
>> +		order = 0;
>> +		while (!PageBuddy(pfn_to_page(outer_end))) {
>> +			if (++order >= MAX_ORDER) {
>> +				outer_end = begin_pfn;
>> +				break;
>> +			}
>> +			outer_end &= ~0UL << order;
>> +		}
>> +
>> +		if (outer_end != begin_pfn) {
>> +			order = buddy_order(pfn_to_page(outer_end));
>> +
>> +			/*
>> +			 * split the free page that contains the start page and put
>> +			 * the pageblocks in the right migratetype list
>> +			 */
>> +			VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
>
> How could this possibily happen?

Right. It is not possible. Will remove it.

>
>> +			{
>> +				struct page *free_page = pfn_to_page(outer_end);
>> +
>> +				split_free_page_into_pageblocks(free_page, order, cc.zone);
>> +			}
>> +			outer_end += 1UL << order;
>> +		} else
>> +			outer_end = begin_pfn + 1;
>>  	}
>
> I think there are cases we could optimize for. If the page has already been
> split into pageblocks by the outer_start loop, we could skip this outer_end
> logic altogether.
>
> E.g.: an order-10 page is split into two pageblocks. There's nothing else
> to be done, right? We could skip this.

Yes. I will think about it more and do some optimization.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2022-02-04 15:21 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-19 19:06 [PATCH v4 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-24 14:02   ` Mel Gorman
2022-01-24 14:02     ` Mel Gorman
2022-01-24 14:02     ` Mel Gorman
2022-01-24 14:02     ` Mel Gorman
2022-01-24 16:12     ` Zi Yan
2022-01-24 16:12       ` Zi Yan
2022-01-24 16:12       ` Zi Yan via iommu
2022-01-24 16:43       ` Mel Gorman
2022-01-24 16:43         ` Mel Gorman
2022-01-24 16:43         ` Mel Gorman
2022-01-24 16:43         ` Mel Gorman
2022-01-19 19:06 ` [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-25  6:23   ` Oscar Salvador
2022-01-25  6:23     ` Oscar Salvador
2022-01-25  6:23     ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-24  9:55   ` Oscar Salvador
2022-01-24  9:55     ` Oscar Salvador
2022-01-24  9:55     ` Oscar Salvador
2022-01-24 17:17     ` Zi Yan
2022-01-24 17:17       ` Zi Yan
2022-01-24 17:17       ` Zi Yan via iommu
2022-01-25 13:19       ` Oscar Salvador
2022-01-25 13:19         ` Oscar Salvador
2022-01-25 13:19         ` Oscar Salvador
2022-01-25 13:21         ` Oscar Salvador
2022-01-25 13:21           ` Oscar Salvador
2022-01-25 13:21           ` Oscar Salvador
2022-01-25 16:31           ` Zi Yan
2022-01-25 16:31             ` Zi Yan
2022-01-25 16:31             ` Zi Yan via iommu
2022-02-02 12:18   ` Oscar Salvador
2022-02-02 12:18     ` Oscar Salvador
2022-02-02 12:18     ` Oscar Salvador
2022-02-02 12:25     ` David Hildenbrand
2022-02-02 12:25       ` David Hildenbrand
2022-02-02 12:25       ` David Hildenbrand
2022-02-02 12:25       ` David Hildenbrand
2022-02-02 16:25       ` Zi Yan
2022-02-02 16:25         ` Zi Yan
2022-02-02 16:25         ` Zi Yan via iommu
2022-02-02 16:35       ` Oscar Salvador
2022-02-02 16:35         ` Oscar Salvador
2022-02-02 16:35         ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-02-04 13:56   ` Oscar Salvador
2022-02-04 13:56     ` Oscar Salvador
2022-02-04 13:56     ` Oscar Salvador
2022-02-04 15:19     ` Zi Yan
2022-02-04 15:19       ` Zi Yan
2022-02-04 15:19       ` Zi Yan via iommu
2022-01-19 19:06 ` [PATCH v4 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06 ` [PATCH v4 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
2022-01-19 19:06   ` Zi Yan
2022-01-19 19:06   ` Zi Yan
