[RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
@ 2021-11-15 19:37 Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 1/3] mm: cma: alloc_contig_range: use pageblock_order as the single alignment Zi Yan
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Zi Yan @ 2021-11-15 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, Zi Yan, virtualization,
	linuxppc-dev, Christoph Hellwig, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

Hi David,

You suggested to make alloc_contig_range() deal with pageblock_order instead of
MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
patchset is my attempt to achieve that. Please take a look and let me know if
I am doing it correctly or not.

From what my understanding, cma required alignment of
max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
__free_one_page() does not prevent merging two different pageblocks, when
MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
does prevent that. It should be OK to just align cma to pageblock_order.
alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
pageblock_order as alignment too.

In terms of virtio_mem, if I understand correctly, it relies on
alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
guest memory size. As the result of alloc_contig_range() alignment change,
virtio_mem should be able to just align PFNs to pageblock_order.

Thanks.

[1] https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com/

Zi Yan (3):
  mm: cma: alloc_contig_range: use pageblock_order as the single
    alignment.
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |  4 +---
 drivers/virtio/virtio_mem.c                |  6 ++----
 include/linux/mmzone.h                     |  5 +----
 kernel/dma/contiguous.c                    |  2 +-
 mm/cma.c                                   |  6 ++----
 mm/page_alloc.c                            | 12 +++++-------
 6 files changed, 12 insertions(+), 23 deletions(-)

-- 
2.33.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/3] mm: cma: alloc_contig_range: use pageblock_order as the single alignment.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
@ 2021-11-15 19:37 ` Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 2/3] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2021-11-15 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, Zi Yan, virtualization,
	linuxppc-dev, Christoph Hellwig, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

Pages are grouped in unit of pageblock_order for mobility. cma and
alloc_contig_range() uses pageblock for isolation and migration, so
aligning PFNs to pageblock_order is necessary. But the PFNs used in cma
and alloc_contig_range() were aligned to max(pageblock_order, MAX_ORDER-1).
This makes no difference than aligning only to pageblock_order, since
if pageblock_order > MAX_ORDER, the PFN is pageblock aligned,
otherwise, the PFN is still pageblock aligned. Drop MAX_ORDER alignment
requirement.

When commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
introduced MIGRATE_CMA, __free_one_page() did not prevent merging
freepages on isolate pagelbock and normal pageblock, thus it required
max(pageblock_order, MAX_ORDER-1) alignment. __free_one_page() has
changed and does prevent such merges.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/mmzone.h  |  5 +----
 kernel/dma/contiguous.c |  2 +-
 mm/cma.c                |  6 ++----
 mm/page_alloc.c         | 12 +++++-------
 4 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..6c61aa41a779 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-	phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	phys_addr_t align = PAGE_SIZE << pageblock_order;
 	phys_addr_t mask = align - 1;
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 		return -EINVAL;
 
 	/* ensure minimal alignment required by mm core */
-	alignment = PAGE_SIZE <<
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+	alignment = PAGE_SIZE << pageblock_order;
 
 	/* alignment should be aligned with order_per_bit */
 	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 	 * migratetype page by page allocator's buddy algorithm. In the case,
 	 * you couldn't get a contiguous memory, which is not what we want.
 	 */
-	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
-			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+	alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
 	if (fixed && base & (alignment - 1)) {
 		ret = -EINVAL;
 		pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..5700f5502d59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8922,14 +8922,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
-	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-			     pageblock_nr_pages) - 1);
+	return pfn & ~(pageblock_nr_pages - 1);
 }
 
 static unsigned long pfn_max_align_up(unsigned long pfn)
 {
-	return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
-				pageblock_nr_pages));
+	return ALIGN(pfn, pageblock_nr_pages);
 }
 
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
@@ -9022,8 +9020,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
@@ -9099,7 +9097,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = 0;
 
 	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * Pages from [start, end) are within a pageblock_nr_pages
 	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 	 * more, all pages in [start, end) are free in page allocator.
 	 * What we are going to do is to allocate all pages from
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/3] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 1/3] mm: cma: alloc_contig_range: use pageblock_order as the single alignment Zi Yan
@ 2021-11-15 19:37 ` Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2021-11-15 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, Zi Yan, virtualization,
	linuxppc-dev, Christoph Hellwig, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

alloc_contig_range() now only needs to be aligned to pageblock_order,
drop virtio_mem size requirement that it needs to be the max of
pageblock_order and MAX_ORDER.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 drivers/virtio/virtio_mem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 0da0af251c73..a0a0994b42e2 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2436,15 +2436,13 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
-	 * pageblock_nr_pages pages. This:
+	 * We want subblocks to span at least pageblock_nr_pages pages. This:
 	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
 	 *   and fake page onlining code (virtio_mem_fake_online).
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = pageblock_nr_pages * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to be pageblock aligned.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 1/3] mm: cma: alloc_contig_range: use pageblock_order as the single alignment Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 2/3] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
@ 2021-11-15 19:37 ` Zi Yan
  2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to " Zi Yan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2021-11-15 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, Zi Yan, virtualization,
	linuxppc-dev, Christoph Hellwig, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER			1
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to pageblock aligned.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (2 preceding siblings ...)
  2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
@ 2021-11-15 19:37 ` Zi Yan
  2021-11-16  8:58 ` [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
  2021-11-19 12:33 ` Vlastimil Babka
  5 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2021-11-15 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, Zi Yan, virtualization,
	linuxppc-dev, Christoph Hellwig, Marek Szyprowski

From: Zi Yan <ziy@nvidia.com>

CMA only requires pageblock alignment. Change fadump too.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER			1
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (3 preceding siblings ...)
  2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to " Zi Yan
@ 2021-11-16  8:58 ` David Hildenbrand
  2021-11-17  3:04   ` Zi Yan
  2021-11-19 12:33 ` Vlastimil Babka
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2021-11-16  8:58 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, virtualization, linuxppc-dev,
	Christoph Hellwig, Marek Szyprowski

On 15.11.21 20:37, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi David,

Hi,

thanks for looking into this.

> 
> You suggested to make alloc_contig_range() deal with pageblock_order instead of
> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
> patchset is my attempt to achieve that. Please take a look and let me know if
> I am doing it correctly or not.
> 
> From what my understanding, cma required alignment of
> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
> __free_one_page() does not prevent merging two different pageblocks, when
> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
> does prevent that. It should be OK to just align cma to pageblock_order.
> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
> pageblock_order as alignment too.

I wonder if that's sufficient. Especially the outer_start logic in
alloc_contig_range() might be problematic. There are some ugly corner
cases with free pages/allocations spanning multiple pageblocks and we
only isolated a single pageblock.

Regarding CMA, we have to keep the following cases working:

a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
   - 1 page:
   [       MAX_ ORDER - 1     ]
   [ pageblock 0 | pageblock 1]

Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
but not both. We have to make sure alloc_contig_range() keeps working
correctly. This should be the case even with your change, as we won't
merging pages accross differing migratetypes.

b) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
   [       MAX_ ORDER - 1     ]
   [ pageblock 0 | pageblock 1]

Assume both are MIGRATE_CMA. Assume we want to either allocate from
pageblock 0 or pageblock 1. Especially, assume we want to allocate from
pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
pageblock 0.

What happens if we either have a free page spanning the MAX_ORDER - 1
range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
in a free MAX_ORDER - 1 page of which only the second pageblock is
isolated? We would end up essentially freeing a page that has mixed
pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
might be wrong but I have the feeling that this would be problematic.

c) Concurrent allocations:
    [       MAX_ ORDER - 1     ]
    [ pageblock 0 | pageblock 1]

Assume b) but we have two concurrent CMA allocations to pageblock 0 and
pageblock 1, which would now be possible as start_isolate_page_range()
isolate would succeed on both.

Regarding virtio-mem, we care about the following cases:

a) Allocating parts from completely movable MAX_ ORDER - 1 page:
   [       MAX_ ORDER - 1     ]
   [ pageblock 0 | pageblock 1]

Assume pageblock 0 and pageblock 1 are either free or contain only
movable pages. Assume we allocated pageblock 0. We have to make sure we
can allocate pageblock 1. The other way around, assume we allocated
pageblock 1, we have to make sure we can allocate pageblock 0.

Free pages spanning both pageblocks might be problematic.

b) Allocate parts of partially movable MAX_ ORDER - 1 page:
   [       MAX_ ORDER - 1     ]
   [ pageblock 0 | pageblock 1]

Assume pageblock 0 contains unmovable data but pageblock 1 not: we have
to make sure we can allocate pageblock 1. Similarly, assume pageblock 1
contains unmovable data but pageblock 0 no: we have to make sure we can
allocate pageblock 1. has_unmovable_pages() might allow for that.

But, we want to fail early in case we want to allocate a single
pageblock but it contains unmovable data. This could be either directly
or indirectly.

If we have an unmovable (compound) MAX_ ORDER - 1 and we'd try isolating
pageblock 1, has_unmovable_pages() would always return "false" because
we'd simply be skiping over any tail pages, and not detect the
un-movability.

c) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:

Same concern as for CMA b)

So the biggest concern I have is dealing with migrating/freeing >
pageblock_order pages while only having isolated a single pageblock.

> 
> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
> guest memory size. As the result of alloc_contig_range() alignment change,
> virtio_mem should be able to just align PFNs to pageblock_order.

For virtio-mem it will most probably be desirable to first try
allocating the MAX_ORDER -1 range if possible and then fallback to
pageblock_order. But that's an additional change on top in virtio-mem code.

My take to teach alloc_contig_range() to properly handle would be the
following:

a) Convert MIGRATE_ISOLATE into a separate pageblock flag

We would want to convert MIGRATE_ISOLATE into a separate pageblock
flags, such that when we isolate a page block we preserve the original
migratetype. While start_isolate_page_range() would set that bit,
undo_isolate_page_range() would simply clear that bit. The buddy would
use a single MIGRATE_ISOLATE queue as is: the original migratetype is
only used for restoring the correct migratetype. This would allow for
restoring e.g., MIGRATE_UNMOVABLE after isolating an unmovable pageblock
(below) and not simply setting all such pageblocks to MIGRATE_MOVABLE
when un-isolating.

Ideally, we'd get rid of the "migratetype" parameter for
alloc_contig_range(). However, even with the above change we have to
make sure that memory offlining and ordinary alloc_contig_range() users
will fail on MIGRATE_CMA pageblocks (has_unmovable_page() checks that as
of today). We could achieve that differently, though (e.g., bool
cma_alloc parameter instead).

b) Allow isolating pageblocks with unmovable pages

We'd pass the actual range of interest to start_isolate_page_range() and
rework the code to check has_unmovable_pages() only on the range of
interest, but considering overlapping larger allocations. E.g., if we
stumble over a compound page, lookup the head an test if that page is
movable/unmovable.

c) Change alloc_contig_range() to not "extend" the range of interest to
include pageblock of different type. Assume we're isolating a
MIGRATE_CMA pageblock, only isolate a neighboring MIGRATE_CMA pageblock,
not other pageblocks.

So we'd keep isolating complete MAX_ORDER - 1 pages unless c) prevents
it. We'd allow isolating even pageblocks that contain unmovable pages on
ZONE_NORMAL, and check via has_unmovable_pages() only if the range of
interest contains unmovable pages, not the whole MAX_ORDER -1 range or
even the whole pageblock. We'd not silently overwrite the pageblock type
when restoring but instead restore the old migratetype.

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-16  8:58 ` [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
@ 2021-11-17  3:04   ` Zi Yan
  2021-11-23 16:54     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2021-11-17  3:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linuxppc-dev, linux-kernel, virtualization, linux-mm, iommu,
	Robin Murphy, Christoph Hellwig, Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 10499 bytes --]

On 16 Nov 2021, at 3:58, David Hildenbrand wrote:

> On 15.11.21 20:37, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi David,
>
> Hi,
>
> thanks for looking into this.
>
>>
>> You suggested to make alloc_contig_range() deal with pageblock_order instead of
>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>> patchset is my attempt to achieve that. Please take a look and let me know if
>> I am doing it correctly or not.
>>
>> From what my understanding, cma required alignment of
>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>> __free_one_page() does not prevent merging two different pageblocks, when
>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>> does prevent that. It should be OK to just align cma to pageblock_order.
>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>> pageblock_order as alignment too.
>
> I wonder if that's sufficient. Especially the outer_start logic in
> alloc_contig_range() might be problematic. There are some ugly corner
> cases with free pages/allocations spanning multiple pageblocks and we
> only isolated a single pageblock.

Thank you a lot for writing the list of these corner cases. They are
very helpful!

>
>
> Regarding CMA, we have to keep the following cases working:
>
> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
>    - 1 page:
>    [       MAX_ ORDER - 1     ]
>    [ pageblock 0 | pageblock 1]
>
> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
> but not both. We have to make sure alloc_contig_range() keeps working
> correctly. This should be the case even with your change, as we won't
> merging pages accross differing migratetypes.

Yes.

>
> b) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>    [       MAX_ ORDER - 1     ]
>    [ pageblock 0 | pageblock 1]
>
> Assume both are MIGRATE_CMA. Assume we want to either allocate from
> pageblock 0 or pageblock 1. Especially, assume we want to allocate from
> pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
> pageblock 0.
>
> What happens if we either have a free page spanning the MAX_ORDER - 1
> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
> in a free MAX_ORDER - 1 page of which only the second pageblock is
> isolated? We would end up essentially freeing a page that has mixed
> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
> might be wrong but I have the feeling that this would be problematic.
>

This could happen when start_isolate_page_range() stumbles upon a compound
page with order >= pageblock_order or a free page with order >=
pageblock_order, but should not. start_isolate_page_range() should check
the actual page size, either compound page size or free page size, and set
the migratetype across pageblocks if the page is bigger than pageblock size.
More precisely set_migratetype_isolate() should do that.


> c) Concurrent allocations:
>     [       MAX_ ORDER - 1     ]
>     [ pageblock 0 | pageblock 1]
>
> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
> pageblock 1, which would now be possible as start_isolate_page_range()
> isolate would succeed on both.

Two isolations will be serialized by the zone lock taken by
set_migratetype_isolate(), so the concurrent allocation would not be a problem.
If it is a MAX_ORDER-1 free page, the first comer should split it and only
isolate one of the pageblock then second one can isolate the other pageblock.
If it is a MAX_ORDER-1 compound page, the first comer should isolate both
pageblocks, then the second one would fail. WDYT?


In sum, it seems to me that the issue is page isolation code only sees
pageblock without check the actual page. When there are multiple pageblocks
belonging to one page, the problem appears. This should be fixed.

>
>
> Regarding virtio-mem, we care about the following cases:
>
> a) Allocating parts from completely movable MAX_ ORDER - 1 page:
>    [       MAX_ ORDER - 1     ]
>    [ pageblock 0 | pageblock 1]
>
> Assume pageblock 0 and pageblock 1 are either free or contain only
> movable pages. Assume we allocated pageblock 0. We have to make sure we
> can allocate pageblock 1. The other way around, assume we allocated
> pageblock 1, we have to make sure we can allocate pageblock 0.
>
> Free pages spanning both pageblocks might be problematic.

Can you elaborate a bit? If either of pageblock 0 and 1 is used by
virtio-mem, why do we care the other? If pageblock 0 and 1 belong to
the same page (either free or compound), they should have the same
migratetype. If we want to just allocate one of them, we can split
the free page or migrate the compound page then split the remaining
free page.

>
> b) Allocate parts of partially movable MAX_ ORDER - 1 page:
>    [       MAX_ ORDER - 1     ]
>    [ pageblock 0 | pageblock 1]
>
> Assume pageblock 0 contains unmovable data but pageblock 1 not: we have
> to make sure we can allocate pageblock 1. Similarly, assume pageblock 1
> contains unmovable data but pageblock 0 no: we have to make sure we can
> allocate pageblock 1. has_unmovable_pages() might allow for that.
>
> But, we want to fail early in case we want to allocate a single
> pageblock but it contains unmovable data. This could be either directly
> or indirectly.
>
> If we have an unmovable (compound) MAX_ ORDER - 1 and we'd try isolating
> pageblock 1, has_unmovable_pages() would always return "false" because
> we'd simply be skiping over any tail pages, and not detect the
> un-movability.

OK. It seems to me that has_unmovable_pages() needs to be fixed to handle
such a situation.

>
> c) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>
> Same concern as for CMA b)
>
>
> So the biggest concern I have is dealing with migrating/freeing >
> pageblock_order pages while only having isolated a single pageblock.

I agree. I think isolation code needs to be aware of >pageblock_order
pages and act accordingly. If it is a free page, split the page to
avoid isolating a subset of the page. If it is a compound page, either
fail the isolation or isolate the entire compound page instead.

>
>>
>> In terms of virtio_mem, if I understand correctly, it relies on
>> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
>> guest memory size. As the result of alloc_contig_range() alignment change,
>> virtio_mem should be able to just align PFNs to pageblock_order.
>
> For virtio-mem it will most probably be desirable to first try
> allocating the MAX_ORDER -1 range if possible and then fallback to
> pageblock_order. But that's an additional change on top in virtio-mem code.
>

Just to understand the motivation, is this because MAX_ORDER-1 range
would be faster than pageblock_order? What if MAX_ORDER-1 goes beyond
a memory section size (like my WIP patchset to increase MAX_ORDER
beyond the memory section size)? virtio-mem could first try PAGES_PER_SECTION,
then fall back to pageblock_order, right?
>
>
> My take to teach alloc_contig_range() to properly handle would be the
> following:
>
> a) Convert MIGRATE_ISOLATE into a separate pageblock flag
>
> We would want to convert MIGRATE_ISOLATE into a separate pageblock
> flags, such that when we isolate a page block we preserve the original
> migratetype. While start_isolate_page_range() would set that bit,
> undo_isolate_page_range() would simply clear that bit. The buddy would
> use a single MIGRATE_ISOLATE queue as is: the original migratetype is
> only used for restoring the correct migratetype. This would allow for
> restoring e.g., MIGRATE_UNMOVABLE after isolating an unmovable pageblock
> (below) and not simply setting all such pageblocks to MIGRATE_MOVABLE
> when un-isolating.
>
> Ideally, we'd get rid of the "migratetype" parameter for
> alloc_contig_range(). However, even with the above change we have to
> make sure that memory offlining and ordinary alloc_contig_range() users
> will fail on MIGRATE_CMA pageblocks (has_unmovable_page() checks that as
> of today). We could achieve that differently, though (e.g., bool
> cma_alloc parameter instead).

This might need to be done in a separate patch, since pageblock bits require
to be word aligned and it is 4 now. To convert MIGRATE_ISOLATE to a separate
bit, either NR_PAGEBLOCK_BITS needs to be increased to 8 or a separate
isolation bitmap array needs to be allocated. Or the migratetype information
can be stored temporarily during isolation process. I can look into it later.


>
>
> b) Allow isolating pageblocks with unmovable pages
>
> We'd pass the actual range of interest to start_isolate_page_range() and
> rework the code to check has_unmovable_pages() only on the range of
> interest, but considering overlapping larger allocations. E.g., if we
> stumble over a compound page, lookup the head an test if that page is
> movable/unmovable.

This is an optimization to reduce isolation failure rate, right? This only
applies to the pageblocks at the beginning and the end of a range of interest.

>
> c) Change alloc_contig_range() to not "extend" the range of interest to
> include pageblock of different type. Assume we're isolating a
> MIGRATE_CMA pageblock, only isolate a neighboring MIGRATE_CMA pageblock,
> not other pageblocks.

But alloc_contig_range() would return these extended pageblocks at the end.
And if pageblock migratetype can be preserved during isolation (item (a) above),
this would not be a problem, right?

>
>
> So we'd keep isolating complete MAX_ORDER - 1 pages unless c) prevents
> it. We'd allow isolating even pageblocks that contain unmovable pages on
> ZONE_NORMAL, and check via has_unmovable_pages() only if the range of
> interest contains unmovable pages, not the whole MAX_ORDER -1 range or
> even the whole pageblock. We'd not silently overwrite the pageblock type
> when restoring but instead restore the old migratetype.
>
I assume MAX_ORDER - 1 is an optimization for faster isolation speed.
If MAX_ORDER goes beyond a memory section size, I guess PAGES_PER_SECTION
is what you want, right? FYI, I am preparing a follow-up patch to replace
any MAX_ORDER use that is intended to indicate maximum physically contiguous
size with a new variable, MAX_PHYS_CONTIG_ORDER, which is PFN_SECTION_SHIFT
when SPARSEMEM and MAX_ORDER when FLATMEM. I would replace MAX_ORDER here
with the new variable.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
                   ` (4 preceding siblings ...)
  2021-11-16  8:58 ` [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
@ 2021-11-19 12:33 ` Vlastimil Babka
  2021-11-19 15:15   ` Zi Yan
  5 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2021-11-19 12:33 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand, linux-mm
  Cc: Robin Murphy, linux-kernel, iommu, virtualization, linuxppc-dev,
	Christoph Hellwig, Marek Szyprowski

On 11/15/21 20:37, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi David,
> 
> You suggested to make alloc_contig_range() deal with pageblock_order instead of
> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
> patchset is my attempt to achieve that. Please take a look and let me know if
> I am doing it correctly or not.
> 
> From what my understanding, cma required alignment of
> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
> __free_one_page() does not prevent merging two different pageblocks, when
> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
> does prevent that.

But it does prevent that only for isolated pageblock, not CMA, and yout
patchset doesn't seem to expand that to CMA? Or am I missing something.


> It should be OK to just align cma to pageblock_order.
> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
> pageblock_order as alignment too.
> 
> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
> guest memory size. As the result of alloc_contig_range() alignment change,
> virtio_mem should be able to just align PFNs to pageblock_order.
> 
> Thanks.
> 
> 
> [1] https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com/
> 
> Zi Yan (3):
>   mm: cma: alloc_contig_range: use pageblock_order as the single
>     alignment.
>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>     size.
>   arch: powerpc: adjust fadump alignment to be pageblock aligned.
> 
>  arch/powerpc/include/asm/fadump-internal.h |  4 +---
>  drivers/virtio/virtio_mem.c                |  6 ++----
>  include/linux/mmzone.h                     |  5 +----
>  kernel/dma/contiguous.c                    |  2 +-
>  mm/cma.c                                   |  6 ++----
>  mm/page_alloc.c                            | 12 +++++-------
>  6 files changed, 12 insertions(+), 23 deletions(-)
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-19 12:33 ` Vlastimil Babka
@ 2021-11-19 15:15   ` Zi Yan
  2021-11-23 16:35     ` Zi Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2021-11-19 15:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linuxppc-dev, linux-kernel, virtualization,
	linux-mm, iommu, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 2495 bytes --]

On 19 Nov 2021, at 7:33, Vlastimil Babka wrote:

> On 11/15/21 20:37, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi David,
>>
>> You suggested to make alloc_contig_range() deal with pageblock_order instead of
>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>> patchset is my attempt to achieve that. Please take a look and let me know if
>> I am doing it correctly or not.
>>
>> From what my understanding, cma required alignment of
>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>> __free_one_page() does not prevent merging two different pageblocks, when
>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>> does prevent that.
>
> But it does prevent that only for isolated pageblock, not CMA, and yout
> patchset doesn't seem to expand that to CMA? Or am I missing something.

Yeah, you are right. Originally, I thought preventing merging isolated pageblock
with other types of pageblocks is sufficient, since MIGRATE_CMA is always
converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
Thanks for pointing this out.

>
>
>> It should be OK to just align cma to pageblock_order.
>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>> pageblock_order as alignment too.
>>
>> In terms of virtio_mem, if I understand correctly, it relies on
>> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
>> guest memory size. As the result of alloc_contig_range() alignment change,
>> virtio_mem should be able to just align PFNs to pageblock_order.
>>
>> Thanks.
>>
>>
>> [1] https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com/
>>
>> Zi Yan (3):
>>   mm: cma: alloc_contig_range: use pageblock_order as the single
>>     alignment.
>>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>>     size.
>>   arch: powerpc: adjust fadump alignment to be pageblock aligned.
>>
>>  arch/powerpc/include/asm/fadump-internal.h |  4 +---
>>  drivers/virtio/virtio_mem.c                |  6 ++----
>>  include/linux/mmzone.h                     |  5 +----
>>  kernel/dma/contiguous.c                    |  2 +-
>>  mm/cma.c                                   |  6 ++----
>>  mm/page_alloc.c                            | 12 +++++-------
>>  6 files changed, 12 insertions(+), 23 deletions(-)
>>

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-19 15:15   ` Zi Yan
@ 2021-11-23 16:35     ` Zi Yan
  2021-11-23 17:32       ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2021-11-23 16:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linuxppc-dev, linux-kernel, virtualization,
	linux-mm, iommu, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

On 19 Nov 2021, at 10:15, Zi Yan wrote:

> On 19 Nov 2021, at 7:33, Vlastimil Babka wrote:
>
>> On 11/15/21 20:37, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Hi David,
>>>
>>> You suggested to make alloc_contig_range() deal with pageblock_order instead of
>>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>>> patchset is my attempt to achieve that. Please take a look and let me know if
>>> I am doing it correctly or not.
>>>
>>> From what my understanding, cma required alignment of
>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>> __free_one_page() does not prevent merging two different pageblocks, when
>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>> does prevent that.
>>
>> But it does prevent that only for isolated pageblock, not CMA, and yout
>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>
> Yeah, you are right. Originally, I thought preventing merging isolated pageblock
> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
> Thanks for pointing this out.
>

I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
and MIGRATE_MOVABLE can be merged into a single free page after I checked
__free_one_page() in detail and printed pageblock information during buddy page
merging. I am not sure what consequence it will cause. Do you have any idea?

I will fix it in the next version of this patchset.

>>
>>
>>> It should be OK to just align cma to pageblock_order.
>>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>>> pageblock_order as alignment too.
>>>
>>> In terms of virtio_mem, if I understand correctly, it relies on
>>> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
>>> guest memory size. As the result of alloc_contig_range() alignment change,
>>> virtio_mem should be able to just align PFNs to pageblock_order.
>>>
>>> Thanks.
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com/
>>>
>>> Zi Yan (3):
>>>   mm: cma: alloc_contig_range: use pageblock_order as the single
>>>     alignment.
>>>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>>>     size.
>>>   arch: powerpc: adjust fadump alignment to be pageblock aligned.
>>>
>>>  arch/powerpc/include/asm/fadump-internal.h |  4 +---
>>>  drivers/virtio/virtio_mem.c                |  6 ++----
>>>  include/linux/mmzone.h                     |  5 +----
>>>  kernel/dma/contiguous.c                    |  2 +-
>>>  mm/cma.c                                   |  6 ++----
>>>  mm/page_alloc.c                            | 12 +++++-------
>>>  6 files changed, 12 insertions(+), 23 deletions(-)
>>>
>
> --
> Best Regards,
> Yan, Zi


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-17  3:04   ` Zi Yan
@ 2021-11-23 16:54     ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2021-11-23 16:54 UTC (permalink / raw)
  To: Zi Yan
  Cc: linuxppc-dev, linux-kernel, virtualization, linux-mm, iommu,
	Robin Murphy, Christoph Hellwig, Marek Szyprowski

On 17.11.21 04:04, Zi Yan wrote:
> On 16 Nov 2021, at 3:58, David Hildenbrand wrote:
> 
>> On 15.11.21 20:37, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Hi David,
>>
>> Hi,
>>
>> thanks for looking into this.
>>

Hi,

sorry for the delay, I wasn't "actually working" last week, so now I'm
back from holiday :)

>>>
>>> You suggested to make alloc_contig_range() deal with pageblock_order instead of
>>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>>> patchset is my attempt to achieve that. Please take a look and let me know if
>>> I am doing it correctly or not.
>>>
>>> From what my understanding, cma required alignment of
>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>> __free_one_page() does not prevent merging two different pageblocks, when
>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>> does prevent that. It should be OK to just align cma to pageblock_order.
>>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>>> pageblock_order as alignment too.
>>
>> I wonder if that's sufficient. Especially the outer_start logic in
>> alloc_contig_range() might be problematic. There are some ugly corner
>> cases with free pages/allocations spanning multiple pageblocks and we
>> only isolated a single pageblock.
> 
> Thank you a lot for writing the list of these corner cases. They are
> very helpful!
> 
>>
>>
>> Regarding CMA, we have to keep the following cases working:
>>
>> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
>>    - 1 page:
>>    [       MAX_ ORDER - 1     ]
>>    [ pageblock 0 | pageblock 1]
>>
>> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
>> but not both. We have to make sure alloc_contig_range() keeps working
>> correctly. This should be the case even with your change, as we won't
>> merging pages accross differing migratetypes.
> 
> Yes.
> 
>>
>> b) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>>    [       MAX_ ORDER - 1     ]
>>    [ pageblock 0 | pageblock 1]
>>
>> Assume both are MIGRATE_CMA. Assume we want to either allocate from
>> pageblock 0 or pageblock 1. Especially, assume we want to allocate from
>> pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
>> pageblock 0.
>>
>> What happens if we either have a free page spanning the MAX_ORDER - 1
>> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
>> in a free MAX_ORDER - 1 page of which only the second pageblock is
>> isolated? We would end up essentially freeing a page that has mixed
>> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
>> might be wrong but I have the feeling that this would be problematic.
>>
> 
> This could happen when start_isolate_page_range() stumbles upon a compound
> page with order >= pageblock_order or a free page with order >=
> pageblock_order, but should not. start_isolate_page_range() should check
> the actual page size, either compound page size or free page size, and set
> the migratetype across pageblocks if the page is bigger than pageblock size.
> More precisely set_migratetype_isolate() should do that.

Right -- but then we have to extend the isolation range from within
set_migratetype_isolate() :/ E.g., how should we know what we have to
unisolate later ..

> 
> 
>> c) Concurrent allocations:
>>     [       MAX_ ORDER - 1     ]
>>     [ pageblock 0 | pageblock 1]
>>
>> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
>> pageblock 1, which would now be possible as start_isolate_page_range()
>> isolate would succeed on both.
> 
> Two isolations will be serialized by the zone lock taken by
> set_migratetype_isolate(), so the concurrent allocation would not be a problem.
> If it is a MAX_ORDER-1 free page, the first comer should split it and only
> isolate one of the pageblock then second one can isolate the other pageblock.

Right, the issue is essentially b) above.

> If it is a MAX_ORDER-1 compound page, the first comer should isolate both
> pageblocks, then the second one would fail. WDYT?

Possibly we could even have two independent CMA areas "colliding" within
a MAX_ ORDER - 1 page. I guess the surprise would be getting an
"-EAGAIN" from isolation, but the caller should properly handle that.

Maybe it's really easier to do something along the lines I proposed
below and always isolate the complete MAX_ORDER-1 range in
alloc_contig_range(). We just have to "fix" isolation as I drafted.

> 
> 
> In sum, it seems to me that the issue is page isolation code only sees
> pageblock without check the actual page. When there are multiple pageblocks
> belonging to one page, the problem appears. This should be fixed.
> 
>>
>>
>> Regarding virtio-mem, we care about the following cases:
>>
>> a) Allocating parts from completely movable MAX_ ORDER - 1 page:
>>    [       MAX_ ORDER - 1     ]
>>    [ pageblock 0 | pageblock 1]
>>
>> Assume pageblock 0 and pageblock 1 are either free or contain only
>> movable pages. Assume we allocated pageblock 0. We have to make sure we
>> can allocate pageblock 1. The other way around, assume we allocated
>> pageblock 1, we have to make sure we can allocate pageblock 0.
>>
>> Free pages spanning both pageblocks might be problematic.
> 
> Can you elaborate a bit? If either of pageblock 0 and 1 is used by
> virtio-mem, why do we care the other? If pageblock 0 and 1 belong to
> the same page (either free or compound), they should have the same
> migratetype. If we want to just allocate one of them, we can split
> the free page or migrate the compound page then split the remaining
> free page.

The thing is: it has to work on ZONE_NORMAL and ZONE_MOVABLE as well.
It's essentially the same issue as a) and b) in the CMA case, so it
should be covered by these.

> 
>>
>> b) Allocate parts of partially movable MAX_ ORDER - 1 page:
>>    [       MAX_ ORDER - 1     ]
>>    [ pageblock 0 | pageblock 1]
>>
>> Assume pageblock 0 contains unmovable data but pageblock 1 not: we have
>> to make sure we can allocate pageblock 1. Similarly, assume pageblock 1
>> contains unmovable data but pageblock 0 no: we have to make sure we can
>> allocate pageblock 1. has_unmovable_pages() might allow for that.
>>
>> But, we want to fail early in case we want to allocate a single
>> pageblock but it contains unmovable data. This could be either directly
>> or indirectly.
>>
>> If we have an unmovable (compound) MAX_ ORDER - 1 and we'd try isolating
>> pageblock 1, has_unmovable_pages() would always return "false" because
>> we'd simply be skiping over any tail pages, and not detect the
>> un-movability.
> 
> OK. It seems to me that has_unmovable_pages() needs to be fixed to handle
> such a situation.

Right.

> 
>>
>> c) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>>
>> Same concern as for CMA b)
>>
>>
>> So the biggest concern I have is dealing with migrating/freeing >
>> pageblock_order pages while only having isolated a single pageblock.
> 
> I agree. I think isolation code needs to be aware of >pageblock_order
> pages and act accordingly. If it is a free page, split the page to
> avoid isolating a subset of the page. If it is a compound page, either
> fail the isolation or isolate the entire compound page instead.
> 
>>
>>>
>>> In terms of virtio_mem, if I understand correctly, it relies on
>>> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
>>> guest memory size. As the result of alloc_contig_range() alignment change,
>>> virtio_mem should be able to just align PFNs to pageblock_order.
>>
>> For virtio-mem it will most probably be desirable to first try
>> allocating the MAX_ORDER -1 range if possible and then fallback to
>> pageblock_order. But that's an additional change on top in virtio-mem code.
>>
> 
> Just to understand the motivation, is this because MAX_ORDER-1 range
> would be faster than pageblock_order? What if MAX_ORDER-1 goes beyond
> a memory section size (like my WIP patchset to increase MAX_ORDER
> beyond the memory section size)? virtio-mem could first try PAGES_PER_SECTION,
> then fall back to pageblock_order, right?

My comment is only in the context of this patch series, not in context
of eventually raising MAX_ORDER to exceed eventually a single memory
section or even a memory block.

Yes, it would be faster. What we do right now (if the complete memory
block is used by Linux and thus not allocated by virtio-mem yet):

a) Try allocating the the complete memory block.
b) If it fails, fallback to essentially MAX_ORDER - 1 chunks

In the context of this patch it would be reasonable to

a) Try allocating the the complete memory block.
b) If it fails, fallback to essentially MAX_ORDER - 1 chunks
c) If it fails, fallback to essentially pageblock order chunks

Things will be different if we change MAX_ORDER - 1.

>>
>>
>> My take to teach alloc_contig_range() to properly handle would be the
>> following:
>>
>> a) Convert MIGRATE_ISOLATE into a separate pageblock flag
>>
>> We would want to convert MIGRATE_ISOLATE into a separate pageblock
>> flags, such that when we isolate a page block we preserve the original
>> migratetype. While start_isolate_page_range() would set that bit,
>> undo_isolate_page_range() would simply clear that bit. The buddy would
>> use a single MIGRATE_ISOLATE queue as is: the original migratetype is
>> only used for restoring the correct migratetype. This would allow for
>> restoring e.g., MIGRATE_UNMOVABLE after isolating an unmovable pageblock
>> (below) and not simply setting all such pageblocks to MIGRATE_MOVABLE
>> when un-isolating.
>>
>> Ideally, we'd get rid of the "migratetype" parameter for
>> alloc_contig_range(). However, even with the above change we have to
>> make sure that memory offlining and ordinary alloc_contig_range() users
>> will fail on MIGRATE_CMA pageblocks (has_unmovable_page() checks that as
>> of today). We could achieve that differently, though (e.g., bool
>> cma_alloc parameter instead).
> 
> This might need to be done in a separate patch, since pageblock bits require
> to be word aligned and it is 4 now. To convert MIGRATE_ISOLATE to a separate
> bit, either NR_PAGEBLOCK_BITS needs to be increased to 8 or a separate
> isolation bitmap array needs to be allocated. Or the migratetype information
> can be stored temporarily during isolation process. I can look into it later.

Right, we'd need 8 instead of 4 bits. But we could preserve the previous
migratettype (MOVABLE, UNMOVABLE, CMA) ... when isolating and wouldn't
have to magically punch in whatever someone told us.

> 
> 
>>
>>
>> b) Allow isolating pageblocks with unmovable pages
>>
>> We'd pass the actual range of interest to start_isolate_page_range() and
>> rework the code to check has_unmovable_pages() only on the range of
>> interest, but considering overlapping larger allocations. E.g., if we
>> stumble over a compound page, lookup the head an test if that page is
>> movable/unmovable.
> 
> This is an optimization to reduce isolation failure rate, right? This only
> applies to the pageblocks at the beginning and the end of a range of interest.

Right. And with a) we can simply isolate+unisolate without always
changing the migratetype e.g., to MIGRATE_MOVABLE in case of virtio-mem.

> 
>>
>> c) Change alloc_contig_range() to not "extend" the range of interest to
>> include pageblock of different type. Assume we're isolating a
>> MIGRATE_CMA pageblock, only isolate a neighboring MIGRATE_CMA pageblock,
>> not other pageblocks.
> 
> But alloc_contig_range() would return these extended pageblocks at the end.
> And if pageblock migratetype can be preserved during isolation (item (a) above),
> this would not be a problem, right?

We have to make sure that ordinary alloc_contig_range() and memory
offlining don't allocate MIGRATE_CMA ranges. So when actually isolating
we have to refuse isolating MIGRATE_CMA pageblocks. Handling that in the
caller when adjusting the range keeps the logic in the actual isolation
code is one option (cma=false - bail out when wanting to isolate
MIGRATE_CMA).

But there might be alternatives. Most probably we'd just have to check
for the "range of interest". If cma=false we just have to make sure to
not isolate MIGRATE_CMA inside the "range of interest". Yes that should
work as well.

> 
>>
>>
>> So we'd keep isolating complete MAX_ORDER - 1 pages unless c) prevents
>> it. We'd allow isolating even pageblocks that contain unmovable pages on
>> ZONE_NORMAL, and check via has_unmovable_pages() only if the range of
>> interest contains unmovable pages, not the whole MAX_ORDER -1 range or
>> even the whole pageblock. We'd not silently overwrite the pageblock type
>> when restoring but instead restore the old migratetype.
>>
> I assume MAX_ORDER - 1 is an optimization for faster isolation speed.
> If MAX_ORDER goes beyond a memory section size, I guess PAGES_PER_SECTION
> is what you want, right? FYI, I am preparing a follow-up patch to replace
> any MAX_ORDER use that is intended to indicate maximum physically contiguous
> size with a new variable, MAX_PHYS_CONTIG_ORDER, which is PFN_SECTION_SHIFT
> when SPARSEMEM and MAX_ORDER when FLATMEM. I would replace MAX_ORDER here
> with the new variable.

Yes, with MAX_ORDER changes it will be a different story. We could
detect at runtime what actually makes sense.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-23 16:35     ` Zi Yan
@ 2021-11-23 17:32       ` Vlastimil Babka
  2021-11-29 22:08         ` Zi Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2021-11-23 17:32 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, linuxppc-dev, linux-kernel, virtualization,
	linux-mm, iommu, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

On 11/23/21 17:35, Zi Yan wrote:
> On 19 Nov 2021, at 10:15, Zi Yan wrote:
>>>> From what my understanding, cma required alignment of
>>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>>> __free_one_page() does not prevent merging two different pageblocks, when
>>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>>> does prevent that.
>>>
>>> But it does prevent that only for isolated pageblock, not CMA, and yout
>>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>>
>> Yeah, you are right. Originally, I thought preventing merging isolated pageblock
>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
>> Thanks for pointing this out.
>>
> 
> I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
> and MIGRATE_MOVABLE can be merged into a single free page after I checked
> __free_one_page() in detail and printed pageblock information during buddy page
> merging.

Yes, that can happen.

I am not sure what consequence it will cause. Do you have any idea?

For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
absolutely fine. As long as these pageblocks are fully free (and they are if
it's a single free page spanning 2 pageblocks), they can be of any of these
type, as they can be reused as needed without causing fragmentation.

But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
break the specifics of those types. That's why the code is careful for
MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-23 17:32       ` Vlastimil Babka
@ 2021-11-29 22:08         ` Zi Yan
  2021-11-30  9:11           ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2021-11-29 22:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linuxppc-dev, linux-kernel, virtualization,
	linux-mm, iommu, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 2062 bytes --]

On 23 Nov 2021, at 12:32, Vlastimil Babka wrote:

> On 11/23/21 17:35, Zi Yan wrote:
>> On 19 Nov 2021, at 10:15, Zi Yan wrote:
>>>>> From what my understanding, cma required alignment of
>>>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>>>> __free_one_page() does not prevent merging two different pageblocks, when
>>>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>>>> does prevent that.
>>>>
>>>> But it does prevent that only for isolated pageblock, not CMA, and yout
>>>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>>>
>>> Yeah, you are right. Originally, I thought preventing merging isolated pageblock
>>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
>>> Thanks for pointing this out.
>>>
>>
>> I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
>> and MIGRATE_MOVABLE can be merged into a single free page after I checked
>> __free_one_page() in detail and printed pageblock information during buddy page
>> merging.
>
> Yes, that can happen.
>
> I am not sure what consequence it will cause. Do you have any idea?
>
> For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
> absolutely fine. As long as these pageblocks are fully free (and they are if
> it's a single free page spanning 2 pageblocks), they can be of any of these
> type, as they can be reused as needed without causing fragmentation.
>
> But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
> break the specifics of those types. That's why the code is careful for
> MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.

Thanks for the explanation. Basically migratetypes that can fall back to each
other can be merged into a single free page, right?

How about MIGRATE_HIGHATOMIC? It should not be merged with other migratetypes
from my understanding.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-29 22:08         ` Zi Yan
@ 2021-11-30  9:11           ` Vlastimil Babka
  2021-11-30 10:08             ` Mel Gorman
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2021-11-30  9:11 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mel Gorman, David Hildenbrand, linuxppc-dev, linux-kernel,
	virtualization, linux-mm, iommu, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

On 11/29/21 23:08, Zi Yan wrote:
> On 23 Nov 2021, at 12:32, Vlastimil Babka wrote:
> 
>> On 11/23/21 17:35, Zi Yan wrote:
>>> On 19 Nov 2021, at 10:15, Zi Yan wrote:
>>>>>> From what my understanding, cma required alignment of
>>>>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>>>>> __free_one_page() does not prevent merging two different pageblocks, when
>>>>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>>>>> does prevent that.
>>>>>
>>>>> But it does prevent that only for isolated pageblock, not CMA, and yout
>>>>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>>>>
>>>> Yeah, you are right. Originally, I thought preventing merging isolated pageblock
>>>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>>>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
>>>> Thanks for pointing this out.
>>>>
>>>
>>> I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
>>> and MIGRATE_MOVABLE can be merged into a single free page after I checked
>>> __free_one_page() in detail and printed pageblock information during buddy page
>>> merging.
>>
>> Yes, that can happen.
>>
>> I am not sure what consequence it will cause. Do you have any idea?
>>
>> For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
>> absolutely fine. As long as these pageblocks are fully free (and they are if
>> it's a single free page spanning 2 pageblocks), they can be of any of these
>> type, as they can be reused as needed without causing fragmentation.
>>
>> But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
>> break the specifics of those types. That's why the code is careful for
>> MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.
> 
> Thanks for the explanation. Basically migratetypes that can fall back to each
> other can be merged into a single free page, right?

Yes.

> How about MIGRATE_HIGHATOMIC? It should not be merged with other migratetypes
> from my understanding.

Hmm it shouldn't minimally because it has an accounting that would become
broken. So it should prevent merging or make sure the reservations are with
MAX_ORDER granularity, but seems that neither is true? CCing Mel.

> --
> Best Regards,
> Yan, Zi
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.
  2021-11-30  9:11           ` Vlastimil Babka
@ 2021-11-30 10:08             ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2021-11-30 10:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linuxppc-dev, linux-kernel, virtualization,
	linux-mm, iommu, Zi Yan, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski

On Tue, Nov 30, 2021 at 10:11:36AM +0100, Vlastimil Babka wrote:
> >>> I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
> >>> and MIGRATE_MOVABLE can be merged into a single free page after I checked
> >>> __free_one_page() in detail and printed pageblock information during buddy page
> >>> merging.
> >>
> >> Yes, that can happen.
> >>
> >> I am not sure what consequence it will cause. Do you have any idea?
> >>
> >> For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
> >> absolutely fine. As long as these pageblocks are fully free (and they are if
> >> it's a single free page spanning 2 pageblocks), they can be of any of these
> >> type, as they can be reused as needed without causing fragmentation.
> >>
> >> But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
> >> break the specifics of those types. That's why the code is careful for
> >> MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.
> > 
> > Thanks for the explanation. Basically migratetypes that can fall back to each
> > other can be merged into a single free page, right?
> 
> Yes.
> 
> > How about MIGRATE_HIGHATOMIC? It should not be merged with other migratetypes
> > from my understanding.
> 
> Hmm it shouldn't minimally because it has an accounting that would become
> broken. So it should prevent merging or make sure the reservations are with
> MAX_ORDER granularity, but seems that neither is true? CCing Mel.
> 

MIGRATE_HIGHATOMIC pageblocks can have pages allocated of different
types, particularly UNMOVABLE and potentially RECLAIMABLE. The
reserving or releasing MIGRATE_HIGHATOMIC pageblocks should be done with
reserve_highatomic_pageblock and unreserve_highatomic_pageblock to get
the accounting right.

However, there does not appear to be any special protection against a
page in a highatomic pageblock getting merged with a buddy of another
pageblock type. The pageblock would still have the right setting but on
allocation, the pages could split to the wrong free list and be lost
until the pages belonging to MIGRATE_HIGHATOMIC were freed again.

Not sure how much of a problem that is in practice, it's been a while
since I've heard of high-order atomic allocation failures.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-11-30 10:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-15 19:37 [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2021-11-15 19:37 ` [RFC PATCH 1/3] mm: cma: alloc_contig_range: use pageblock_order as the single alignment Zi Yan
2021-11-15 19:37 ` [RFC PATCH 2/3] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan
2021-11-15 19:37 ` [RFC PATCH 3/3] arch: powerpc: adjust fadump alignment to " Zi Yan
2021-11-16  8:58 ` [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
2021-11-17  3:04   ` Zi Yan
2021-11-23 16:54     ` David Hildenbrand
2021-11-19 12:33 ` Vlastimil Babka
2021-11-19 15:15   ` Zi Yan
2021-11-23 16:35     ` Zi Yan
2021-11-23 17:32       ` Vlastimil Babka
2021-11-29 22:08         ` Zi Yan
2021-11-30  9:11           ` Vlastimil Babka
2021-11-30 10:08             ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).