linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
@ 2021-04-21  6:51 Mike Rapoport
  2021-04-21  6:51 ` [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
                   ` (4 more replies)
  0 siblings, 5 replies; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21  6:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

Hi,

These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1. 

The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of struct
page for a pfn.

With this, the core mm will be able to cope with the fact that it cannot use
NOMAP pages, and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without pfn_valid_within().
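
A rough sketch of the resulting model (illustrative only, not from the
patches): a pfn walker checks pfn_valid() to see whether a struct page
exists at all, and PageReserved() to see whether it may use the page:

	unsigned long pfn;
	struct page *page;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;	/* no memory map entry at all */

		page = pfn_to_page(pfn);
		if (PageReserved(page))
			continue;	/* NOMAP or otherwise unusable */

		/* safe to treat as an ordinary page */
	}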

The patches are only boot-tested on qemu-system-aarch64, so I'd really
appreciate memory stress tests on real hardware.

If this actually works, we'll be one step closer to dropping the custom
pfn_valid() on arm64 altogether.

v2:
* Add check for PFN overflow in pfn_is_map_memory()
* Add Acked-by and Reviewed-by tags, thanks David.

v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
* Add comment about the semantics of pfn_valid() as Anshuman suggested
* Extend comments about MEMBLOCK_NOMAP, per Anshuman
* Use pfn_is_map_memory() name for the exported wrapper for
  memblock_is_map_memory(). It is still local to arch/arm64 in the end
  because of header dependency issues.

rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org

Mike Rapoport (4):
  include/linux/mmzone.h: add documentation for pfn_valid()
  memblock: update initialization of reserved pages
  arm64: decouple check whether pfn is in linear map from pfn_valid()
  arm64: drop pfn_valid_within() and simplify pfn_valid()

 arch/arm64/Kconfig              |  3 ---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 15 +++++++++++++--
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 include/linux/memblock.h        |  4 +++-
 include/linux/mmzone.h          | 11 +++++++++++
 mm/memblock.c                   | 28 ++++++++++++++++++++++++++--
 10 files changed, 59 insertions(+), 13 deletions(-)

base-commit: e49d033bddf5b565044e2abe4241353959bc9120
-- 
2.28.0


* [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
  2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-21  6:51 ` Mike Rapoport
  2021-04-21 10:49   ` Anshuman Khandual
  2021-04-21  6:51 ` [PATCH v2 2/4] memblock: update initialization of reserved pages Mike Rapoport
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21  6:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

Add a comment describing the semantics of pfn_valid() that clarifies that
pfn_valid() only checks for availability of a memory map entry (i.e. struct
page) for a PFN rather than availability of usable memory backing that PFN.

The most "generic" version of pfn_valid() used by the configurations with
SPARSEMEM enabled resides in include/linux/mmzone.h, so this is the most
suitable place for documentation about semantics of pfn_valid().
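
To illustrate the documented distinction (a sketch, not part of the
patch): a PFN inside a NOMAP or boot-time reserved region still has a
memory map entry, so pfn_to_page() is safe, but the page is not usable
RAM:

	if (pfn_valid(pfn)) {
		struct page *page = pfn_to_page(pfn);	/* safe: memmap entry exists */

		if (PageReserved(page))
			return;		/* valid entry, but not usable memory */
		/* only here may the page be treated as usable RAM */
	}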

Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 include/linux/mmzone.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47946cec7584..961f0eeefb62 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 #endif
 
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
+/**
+ * pfn_valid - check if there is a valid memory map entry for a PFN
+ * @pfn: the page frame number to check
+ *
+ * Check if there is a valid memory map entry aka struct page for the @pfn.
 * Note that the availability of the memory map entry does not imply that
+ * there is actual usable memory at that @pfn. The struct page may
+ * represent a hole or an unusable page frame.
+ *
+ * Return: 1 for PFNs that have memory map entries and 0 otherwise
+ */
 static inline int pfn_valid(unsigned long pfn)
 {
 	struct mem_section *ms;
-- 
2.28.0



* [PATCH v2 2/4] memblock: update initialization of reserved pages
  2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-21  6:51 ` [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
@ 2021-04-21  6:51 ` Mike Rapoport
  2021-04-21  7:49   ` David Hildenbrand
  2021-04-21 10:51   ` Anshuman Khandual
  2021-04-21  6:51 ` [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21  6:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The struct pages representing a reserved memory region are initialized
using the reserve_bootmem_region() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.

The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization, which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().

Split out the initialization of the reserved pages into a function with a
meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions, and mark the struct pages for the NOMAP regions as
PageReserved.
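
The observable effect, as a hedged sketch (nomap_pfn is a hypothetical
PFN inside a MEMBLOCK_NOMAP region, not something from the patch):

	struct page *page = pfn_to_page(nomap_pfn);

	/*
	 * Before this change the struct page kept the default values left
	 * by memmap initialization; after it, reserve_bootmem_region() has
	 * run over the NOMAP ranges too, so the page is initialized and:
	 */
	WARN_ON(!PageReserved(page));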

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 include/linux/memblock.h |  4 +++-
 mm/memblock.c            | 28 ++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5984fff3f175..634c1a578db8 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_NONE: no special request
  * @MEMBLOCK_HOTPLUG: hotpluggable region
  * @MEMBLOCK_MIRROR: mirrored region
- * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
+ * reserved in the memory map; refer to memblock_mark_nomap() description
+ * for futher details
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3abf2c3fea7f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
+ * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
+ * direct mapping of the physical memory. These regions will still be
+ * covered by the memory map. The struct page representing NOMAP memory
+ * frames in the memory map will be PageReserved()
+ *
  * Return: 0 on success, -errno on failure.
  */
 int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
@@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 	return end_pfn - start_pfn;
 }
 
+static void __init memmap_init_reserved_pages(void)
+{
+	struct memblock_region *region;
+	phys_addr_t start, end;
+	u64 i;
+
+	/* initialize struct pages for the reserved regions */
+	for_each_reserved_mem_range(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
+	/* and also treat struct pages for the NOMAP regions as PageReserved */
+	for_each_mem_region(region) {
+		if (memblock_is_nomap(region)) {
+			start = region->base;
+			end = start + region->size;
+			reserve_bootmem_region(start, end);
+		}
+	}
+}
+
 static unsigned long __init free_low_memory_core_early(void)
 {
 	unsigned long count = 0;
@@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
 
 	memblock_clear_hotplug(0, -1);
 
-	for_each_reserved_mem_range(i, &start, &end)
-		reserve_bootmem_region(start, end);
+	memmap_init_reserved_pages();
 
 	/*
 	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
-- 
2.28.0



* [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
  2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-21  6:51 ` [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
  2021-04-21  6:51 ` [PATCH v2 2/4] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-04-21  6:51 ` Mike Rapoport
  2021-04-21 10:59   ` Anshuman Khandual
  2021-04-21  6:51 ` [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-22  7:00 ` [PATCH v2 0/4] " Kefeng Wang
  4 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21  6:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.

Yet, on arm64 it is used to distinguish memory areas that are mapped in the
linear map from those that require ioremap() to access them.

Introduce a dedicated pfn_is_map_memory() wrapper for
memblock_is_map_memory() to perform such a check and use it where
appropriate.

Using a wrapper avoids cyclic include dependencies.
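
The distinction the helper captures, as a rough sketch (not from the
patch; the surrounding code is illustrative):

	void __iomem *io;
	void *va;

	if (pfn_is_map_memory(pfn))
		va = page_address(pfn_to_page(pfn));	/* covered by the linear map */
	else
		io = ioremap(PFN_PHYS(pfn), PAGE_SIZE);	/* must be mapped explicitly */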

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 11 +++++++++++
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 6 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0aabc3be9a75..194f9f993d30 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
 
 #define virt_addr_valid(addr)	({					\
 	__typeof__(addr) __addr = __tag_reset(addr);			\
-	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
+	__is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));	\
 })
 
 void dump_mem_limit(void);
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..99a6da91f870 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
 typedef struct page *pgtable_t;
 
 extern int pfn_valid(unsigned long);
+extern int pfn_is_map_memory(unsigned long);
 
 #include <asm/memory.h>
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..23dd99e29b23 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
-	return !pfn_valid(pfn);
+	return !pfn_is_map_memory(pfn);
 }
 
 /*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3685e12aba9b..dc03bdc12c0f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
 }
 EXPORT_SYMBOL(pfn_valid);
 
+int pfn_is_map_memory(unsigned long pfn)
+{
+	phys_addr_t addr = PFN_PHYS(pfn);
+
+	if (PHYS_PFN(addr) != pfn)
+		return 0;
+	
+	return memblock_is_map_memory(addr);
+}
+EXPORT_SYMBOL(pfn_is_map_memory);
+
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
 
 /*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..b7c81dacabf0 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
 	/*
 	 * Don't allow RAM to be mapped.
 	 */
-	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+	if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr))))
 		return NULL;
 
 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
 void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 {
 	/* For normal memory we already have a cacheable mapping. */
-	if (pfn_valid(__phys_to_pfn(phys_addr)))
+	if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
 		return (void __iomem *)__phys_to_virt(phys_addr);
 
 	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5d9550fdb9cf..26045e9adbd7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
-	if (!pfn_valid(pfn))
+	if (!pfn_is_map_memory(pfn))
 		return pgprot_noncached(vma_prot);
 	else if (file->f_flags & O_SYNC)
 		return pgprot_writecombine(vma_prot);
-- 
2.28.0



* [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
                   ` (2 preceding siblings ...)
  2021-04-21  6:51 ` [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
@ 2021-04-21  6:51 ` Mike Rapoport
  2021-04-21  7:49   ` David Hildenbrand
  2021-04-21 11:06   ` Anshuman Khandual
  2021-04-22  7:00 ` [PATCH v2 0/4] " Kefeng Wang
  4 siblings, 2 replies; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21  6:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The arm64 version of pfn_valid() differs from the generic one for two
reasons:

* Parts of the memory map are freed during boot. This makes it necessary to
  verify that there is actual physical memory that corresponds to a pfn
  which is done by querying memblock.

* There are NOMAP memory regions. These regions are not mapped in the
  linear map and until the previous commit the struct pages representing
  these areas had default values.

As a consequence of the absence of special treatment of NOMAP regions in
the memory map, it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
generic mm functionality would not treat a NOMAP page as a normal page.

Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.
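
For reference, the generic definition in include/linux/mmzone.h (as it
looked at the time of this series) that removing CONFIG_HOLES_IN_ZONE
flips to the constant branch:

	#ifdef CONFIG_HOLES_IN_ZONE
	#define pfn_valid_within(pfn) pfn_valid(pfn)
	#else
	#define pfn_valid_within(pfn) (1)
	#endif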

pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/Kconfig   | 3 ---
 arch/arm64/mm/init.c | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y
 	depends on NUMA
 
-config HOLES_IN_ZONE
-	def_bool y
-
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dc03bdc12c0f..eb3f56fb8c7c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
 
 	/*
 	 * ZONE_DEVICE memory does not have the memblock entries.
-	 * memblock_is_map_memory() check for ZONE_DEVICE based
+	 * memblock_is_memory() check for ZONE_DEVICE based
 	 * addresses will always fail. Even the normal hotplugged
 	 * memory will never have MEMBLOCK_NOMAP flag set in their
 	 * memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
 		return pfn_section_valid(ms, pfn);
 }
 #endif
-	return memblock_is_map_memory(addr);
+	return memblock_is_memory(addr);
 }
 EXPORT_SYMBOL(pfn_valid);
 
-- 
2.28.0



* Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21  6:51 ` [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-21  7:49   ` David Hildenbrand
  2021-04-21 11:06   ` Anshuman Khandual
  1 sibling, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21  7:49 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On 21.04.21 08:51, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The arm64 version of pfn_valid() differs from the generic one for two
> reasons:
> 
> * Parts of the memory map are freed during boot. This makes it necessary to
>    verify that there is actual physical memory that corresponds to a pfn
>    which is done by querying memblock.
> 
> * There are NOMAP memory regions. These regions are not mapped in the
>    linear map and until the previous commit the struct pages representing
>    these areas had default values.
> 
> As a consequence of the absence of special treatment of NOMAP regions in
> the memory map, it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
> 
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
> 
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>   arch/arm64/Kconfig   | 3 ---
>   arch/arm64/mm/init.c | 4 ++--
>   2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>   	def_bool y
>   	depends on NUMA
>   
> -config HOLES_IN_ZONE
> -	def_bool y
> -
>   source "kernel/Kconfig.hz"
>   
>   config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index dc03bdc12c0f..eb3f56fb8c7c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>   
>   	/*
>   	 * ZONE_DEVICE memory does not have the memblock entries.
> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * memblock_is_memory() check for ZONE_DEVICE based
>   	 * addresses will always fail. Even the normal hotplugged
>   	 * memory will never have MEMBLOCK_NOMAP flag set in their
>   	 * memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>   		return pfn_section_valid(ms, pfn);
>   }
>   #endif
> -	return memblock_is_map_memory(addr);
> +	return memblock_is_memory(addr);
>   }
>   EXPORT_SYMBOL(pfn_valid);
>   
> 

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/4] memblock: update initialization of reserved pages
  2021-04-21  6:51 ` [PATCH v2 2/4] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-04-21  7:49   ` David Hildenbrand
  2021-04-21 10:51   ` Anshuman Khandual
  1 sibling, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-04-21  7:49 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On 21.04.21 08:51, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The struct pages representing a reserved memory region are initialized
> using the reserve_bootmem_region() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization, which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out the initialization of the reserved pages into a function with a
> meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions, and mark the struct pages for the NOMAP regions as
> PageReserved.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>   include/linux/memblock.h |  4 +++-
>   mm/memblock.c            | 28 ++++++++++++++++++++++++++--
>   2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..634c1a578db8 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>    * @MEMBLOCK_NONE: no special request
>    * @MEMBLOCK_HOTPLUG: hotpluggable region
>    * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for futher details
>    */
>   enum memblock_flags {
>   	MEMBLOCK_NONE		= 0x0,	/* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>    * @base: the base phys addr of the region
>    * @size: the size of the region
>    *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>    * Return: 0 on success, -errno on failure.
>    */
>   int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>   	return end_pfn - start_pfn;
>   }
>   
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}
> +}
> +
>   static unsigned long __init free_low_memory_core_early(void)
>   {
>   	unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
>   
>   	memblock_clear_hotplug(0, -1);
>   
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>   
>   	/*
>   	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
  2021-04-21  6:51 ` [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
@ 2021-04-21 10:49   ` Anshuman Khandual
  0 siblings, 0 replies; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 10:49 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Add a comment describing the semantics of pfn_valid() that clarifies that
> pfn_valid() only checks for availability of a memory map entry (i.e. struct
> page) for a PFN rather than availability of usable memory backing that PFN.
> 
> The most "generic" version of pfn_valid() used by the configurations with
> SPARSEMEM enabled resides in include/linux/mmzone.h, so this is the most
> suitable place for documentation about semantics of pfn_valid().
> 
> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  include/linux/mmzone.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 47946cec7584..961f0eeefb62 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>  #endif
>  
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +/**
> + * pfn_valid - check if there is a valid memory map entry for a PFN
> + * @pfn: the page frame number to check
> + *
> + * Check if there is a valid memory map entry aka struct page for the @pfn.
> + * Note that the availability of the memory map entry does not imply that
> + * there is actual usable memory at that @pfn. The struct page may
> + * represent a hole or an unusable page frame.
> + *
> + * Return: 1 for PFNs that have memory map entries and 0 otherwise
> + */
>  static inline int pfn_valid(unsigned long pfn)
>  {
>  	struct mem_section *ms;
> 


* Re: [PATCH v2 2/4] memblock: update initialization of reserved pages
  2021-04-21  6:51 ` [PATCH v2 2/4] memblock: update initialization of reserved pages Mike Rapoport
  2021-04-21  7:49   ` David Hildenbrand
@ 2021-04-21 10:51   ` Anshuman Khandual
  1 sibling, 0 replies; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 10:51 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The struct pages representing a reserved memory region are initialized
> using the reserve_bootmem_region() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization, which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out the initialization of the reserved pages into a function with a
> meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions, and mark the struct pages for the NOMAP regions as
> PageReserved.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c            | 28 ++++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..634c1a578db8 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for futher details

Small nit - s/futher/further

>   */
>  enum memblock_flags {
>  	MEMBLOCK_NONE		= 0x0,	/* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  	return end_pfn - start_pfn;
>  }
>  
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}

I guess there is no feasible way to unify these two loops.

> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>  	unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
>  
>  	memblock_clear_hotplug(0, -1);
>  
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>  
>  	/*
>  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 


Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>


* Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
  2021-04-21  6:51 ` [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
@ 2021-04-21 10:59   ` Anshuman Khandual
  2021-04-21 12:19     ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 10:59 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.
> 
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map from those that require ioremap() to access them.
> 
> Introduce a dedicated pfn_is_map_memory() wrapper for
> memblock_is_map_memory() to perform such a check and use it where
> appropriate.
> 
> Using a wrapper avoids cyclic include dependencies.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/include/asm/memory.h |  2 +-
>  arch/arm64/include/asm/page.h   |  1 +
>  arch/arm64/kvm/mmu.c            |  2 +-
>  arch/arm64/mm/init.c            | 11 +++++++++++
>  arch/arm64/mm/ioremap.c         |  4 ++--
>  arch/arm64/mm/mmu.c             |  2 +-
>  6 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0aabc3be9a75..194f9f993d30 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>  
>  #define virt_addr_valid(addr)	({					\
>  	__typeof__(addr) __addr = __tag_reset(addr);			\
> -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> +	__is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));	\
>  })
>  
>  void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..99a6da91f870 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>  typedef struct page *pgtable_t;
>  
>  extern int pfn_valid(unsigned long);
> +extern int pfn_is_map_memory(unsigned long);

checkpatch is complaining about this.

WARNING: function definition argument 'unsigned long' should also have an identifier name
#50: FILE: arch/arm64/include/asm/page.h:41:
+extern int pfn_is_map_memory(unsigned long);


>  
>  #include <asm/memory.h>
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8711894db8c2..23dd99e29b23 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
>  {
> -	return !pfn_valid(pfn);
> +	return !pfn_is_map_memory(pfn);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 3685e12aba9b..dc03bdc12c0f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> +int pfn_is_map_memory(unsigned long pfn)
> +{
> +	phys_addr_t addr = PFN_PHYS(pfn);
> +

Should also bring with it, the comment regarding upper bits in
the pfn from arm64 pfn_valid().
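
(For reference, that comment reads roughly:

	/*
	 * Ensure the upper PAGE_SHIFT bits are clear in the
	 * pfn. Else it might lead to false positives when
	 * some of the upper bits are set, but the lower bits
	 * match a valid pfn.
	 */
)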

> +	if (PHYS_PFN(addr) != pfn)
> +		return 0;
> +	

 ^^^^^ trailing spaces here.

ERROR: trailing whitespace
#81: FILE: arch/arm64/mm/init.c:263:
+^I$

> +	return memblock_is_map_memory(addr);
> +}
> +EXPORT_SYMBOL(pfn_is_map_memory);
> +

Is the EXPORT_SYMBOL() required to build drivers which will use
pfn_is_map_memory() but currently use pfn_valid() ?


* Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21  6:51 ` [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-21  7:49   ` David Hildenbrand
@ 2021-04-21 11:06   ` Anshuman Khandual
  2021-04-21 12:24     ` Mike Rapoport
  1 sibling, 1 reply; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 11:06 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The arm64 version of pfn_valid() differs from the generic one for two
> reasons:
> 
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
> 
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
> 
> As a consequence of the absence of special treatment of NOMAP regions in
> the memory map, it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
> 
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.

This makes sense.

> 
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 4 ++--
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
>  
> -config HOLES_IN_ZONE
> -	def_bool y
> -

Right.

>  source "kernel/Kconfig.hz"
>  
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index dc03bdc12c0f..eb3f56fb8c7c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>  
>  	/*
>  	 * ZONE_DEVICE memory does not have the memblock entries.
> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * memblock_is_memory() check for ZONE_DEVICE based
>  	 * addresses will always fail. Even the normal hotplugged
>  	 * memory will never have MEMBLOCK_NOMAP flag set in their
>  	 * memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>  		return pfn_section_valid(ms, pfn);
>  }
>  #endif
> -	return memblock_is_map_memory(addr);
> +	return memblock_is_memory(addr);

Wondering if MEMBLOCK_NOMAP is now being treated similarly to other
memory pfns for page table walking purpose but with PageReserved(),
why memblock_is_memory() is still required ? At this point, should
not we just return valid for early_section() memory. As pfn_valid()
now just implies that pfn has a struct page backing which has been
already verified with valid_section() etc.

>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> 


* Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
  2021-04-21 10:59   ` Anshuman Khandual
@ 2021-04-21 12:19     ` Mike Rapoport
  2021-04-21 13:13       ` Anshuman Khandual
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21 12:19 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote:
> 
> On 4/21/21 12:21 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The intended semantics of pfn_valid() is to verify whether there is a
> > struct page for the pfn in question and nothing else.
> > 
> > Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> > linear map from those that require ioremap() to access them.
> > 
> > Introduce a dedicated pfn_is_map_memory() wrapper for
> > memblock_is_map_memory() to perform such a check and use it where
> > appropriate.
> > 
> > Using a wrapper avoids cyclic include dependencies.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm64/include/asm/memory.h |  2 +-
> >  arch/arm64/include/asm/page.h   |  1 +
> >  arch/arm64/kvm/mmu.c            |  2 +-
> >  arch/arm64/mm/init.c            | 11 +++++++++++
> >  arch/arm64/mm/ioremap.c         |  4 ++--
> >  arch/arm64/mm/mmu.c             |  2 +-
> >  6 files changed, 17 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 0aabc3be9a75..194f9f993d30 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> >  
> >  #define virt_addr_valid(addr)	({					\
> >  	__typeof__(addr) __addr = __tag_reset(addr);			\
> > -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> > +	__is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));	\
> >  })
> >  
> >  void dump_mem_limit(void);
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index 012cffc574e8..99a6da91f870 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
> >  typedef struct page *pgtable_t;
> >  
> >  extern int pfn_valid(unsigned long);
> > +extern int pfn_is_map_memory(unsigned long);
> 
> checkpatch is complaining about this.
> 
> WARNING: function definition argument 'unsigned long' should also have an identifier name
> #50: FILE: arch/arm64/include/asm/page.h:41:
> +extern int pfn_is_map_memory(unsigned long);
> 
> 
> >  
> >  #include <asm/memory.h>
> >  
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8711894db8c2..23dd99e29b23 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> >  
> >  static bool kvm_is_device_pfn(unsigned long pfn)
> >  {
> > -	return !pfn_valid(pfn);
> > +	return !pfn_is_map_memory(pfn);
> >  }
> >  
> >  /*
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 3685e12aba9b..dc03bdc12c0f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > +int pfn_is_map_memory(unsigned long pfn)
> > +{
> > +	phys_addr_t addr = PFN_PHYS(pfn);
> > +
> 
> Should also bring with it, the comment regarding upper bits in
> the pfn from arm64 pfn_valid().

I think a reference to the comment in pfn_valid() will suffice.

BTW, I wonder how it is that other architectures do not need this check?
 
> > +	if (PHYS_PFN(addr) != pfn)
> > +		return 0;
> > +	
> 
>  ^^^^^ trailing spaces here.
> 
> ERROR: trailing whitespace
> #81: FILE: arch/arm64/mm/init.c:263:
> +^I$

Oops :)
 
> > +	return memblock_is_map_memory(addr);
> > +}
> > +EXPORT_SYMBOL(pfn_is_map_memory);
> > +
> 
> Is the EXPORT_SYMBOL() required to build drivers which will use
> pfn_is_map_memory() but currently use pfn_valid() ?

Yes, this is required for virt_addr_valid() that is used by modules.
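
A minimal sketch of why (hypothetical module code; demo_init() and
some_kernel_ptr are made-up names):

	/*
	 * virt_addr_valid() now expands to pfn_is_map_memory(), so a module
	 * calling it needs the symbol exported:
	 */
	static int __init demo_init(void)
	{
		return virt_addr_valid(some_kernel_ptr) ? 0 : -EINVAL;
	}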

-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21 11:06   ` Anshuman Khandual
@ 2021-04-21 12:24     ` Mike Rapoport
  2021-04-21 13:15       ` Anshuman Khandual
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-21 12:24 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote:
> 
> On 4/21/21 12:21 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The arm64 version of pfn_valid() differs from the generic one for two
> > reasons:
> > 
> > * Parts of the memory map are freed during boot. This makes it necessary to
> >   verify that there is actual physical memory that corresponds to a pfn
> >   which is done by querying memblock.
> > 
> > * There are NOMAP memory regions. These regions are not mapped in the
> >   linear map and until the previous commit the struct pages representing
> >   these areas had default values.
> > 
> > As a consequence of the absence of special treatment of NOMAP regions in
> > the memory map, it was necessary to use memblock_is_map_memory() in
> > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> > generic mm functionality would not treat a NOMAP page as a normal page.
> > 
> > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> > the rest of core mm will treat them as unusable memory and thus
> > pfn_valid_within() is no longer required at all and can be disabled by
> > removing CONFIG_HOLES_IN_ZONE on arm64.
> 
> This makes sense.
> 
> > 
> > pfn_valid() can be slightly simplified by replacing
> > memblock_is_map_memory() with memblock_is_memory().
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm64/Kconfig   | 3 ---
> >  arch/arm64/mm/init.c | 4 ++--
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e4e1b6550115..58e439046d05 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> >  	def_bool y
> >  	depends on NUMA
> >  
> > -config HOLES_IN_ZONE
> > -	def_bool y
> > -
> 
> Right.
> 
> >  source "kernel/Kconfig.hz"
> >  
> >  config ARCH_SPARSEMEM_ENABLE
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index dc03bdc12c0f..eb3f56fb8c7c 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
> >  
> >  	/*
> >  	 * ZONE_DEVICE memory does not have the memblock entries.
> > -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> > +	 * memblock_is_memory() check for ZONE_DEVICE based
> >  	 * addresses will always fail. Even the normal hotplugged
> >  	 * memory will never have MEMBLOCK_NOMAP flag set in their
> >  	 * memblock entries. Skip memblock search for all non early
> > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
> >  		return pfn_section_valid(ms, pfn);
> >  }
> >  #endif
> > -	return memblock_is_map_memory(addr);
> > +	return memblock_is_memory(addr);
> 
> Wondering if MEMBLOCK_NOMAP is now being treated similarly to other
> memory pfns for page table walking purpose but with PageReserved(),
> why memblock_is_memory() is still required ? At this point, should
> not we just return valid for early_section() memory. As pfn_valid()
> now just implies that pfn has a struct page backing which has been
> already verified with valid_section() etc.

memblock_is_memory() is required because arm64 frees unused parts of the
memory map. So, for instance, if we have 64M out of 128M populated in a
section, the section-based calculation would return 1 for a pfn in the
second half of the section, but there would be no memory map there.
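
A concrete sketch with made-up numbers (128M sections, RAM present only
in the first half of one section):

	RAM:  0x80000000 - 0x83ffffff	/* 64M, memory map present */
	hole: 0x84000000 - 0x87ffffff	/* memory map freed at boot */

	pfn_valid(PHYS_PFN(0x86000000)):
		early_section(ms) is true, so the section-based check is
		skipped and memblock_is_memory(0x86000000) == false makes
		pfn_valid() return 0, even though valid_section() alone
		would have said 1.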


> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
  2021-04-21 12:19     ` Mike Rapoport
@ 2021-04-21 13:13       ` Anshuman Khandual
  0 siblings, 0 replies; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 13:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On 4/21/21 5:49 PM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote:
>>
>> On 4/21/21 12:21 PM, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> The intended semantics of pfn_valid() is to verify whether there is a
>>> struct page for the pfn in question and nothing else.
>>>
>>> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
>>> linear map from those that require ioremap() to access them.
>>>
>>> Introduce a dedicated pfn_is_map_memory() wrapper for
>>> memblock_is_map_memory() to perform such a check and use it where
>>> appropriate.
>>>
>>> Using a wrapper avoids cyclic include dependencies.
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>  arch/arm64/include/asm/memory.h |  2 +-
>>>  arch/arm64/include/asm/page.h   |  1 +
>>>  arch/arm64/kvm/mmu.c            |  2 +-
>>>  arch/arm64/mm/init.c            | 11 +++++++++++
>>>  arch/arm64/mm/ioremap.c         |  4 ++--
>>>  arch/arm64/mm/mmu.c             |  2 +-
>>>  6 files changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index 0aabc3be9a75..194f9f993d30 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>>>  
>>>  #define virt_addr_valid(addr)	({					\
>>>  	__typeof__(addr) __addr = __tag_reset(addr);			\
>>> -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
>>> +	__is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));	\
>>>  })
>>>  
>>>  void dump_mem_limit(void);
>>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>>> index 012cffc574e8..99a6da91f870 100644
>>> --- a/arch/arm64/include/asm/page.h
>>> +++ b/arch/arm64/include/asm/page.h
>>> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>>>  typedef struct page *pgtable_t;
>>>  
>>>  extern int pfn_valid(unsigned long);
>>> +extern int pfn_is_map_memory(unsigned long);
>>
>> checkpatch is complaining about this.
>>
>> WARNING: function definition argument 'unsigned long' should also have an identifier name
>> #50: FILE: arch/arm64/include/asm/page.h:41:
>> +extern int pfn_is_map_memory(unsigned long);
>>
>>
>>>  
>>>  #include <asm/memory.h>
>>>  
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 8711894db8c2..23dd99e29b23 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>>>  
>>>  static bool kvm_is_device_pfn(unsigned long pfn)
>>>  {
>>> -	return !pfn_valid(pfn);
>>> +	return !pfn_is_map_memory(pfn);
>>>  }
>>>  
>>>  /*
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 3685e12aba9b..dc03bdc12c0f 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
>>>  }
>>>  EXPORT_SYMBOL(pfn_valid);
>>>  
>>> +int pfn_is_map_memory(unsigned long pfn)
>>> +{
>>> +	phys_addr_t addr = PFN_PHYS(pfn);
>>> +
>>
>> Should also bring with it, the comment regarding upper bits in
>> the pfn from arm64 pfn_valid().
> 
> I think a reference to the comment in pfn_valid() will suffice.

Okay.

> 
> BTW, I wonder how it is that other architectures do not need this check?

Trying to move that into the generic pfn_valid() in mmzone.h; will resend
the RFC patch after this series.

https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khandual@arm.com/

>  
>>> +	if (PHYS_PFN(addr) != pfn)
>>> +		return 0;
>>> +	
>>
>>  ^^^^^ trailing spaces here.
>>
>> ERROR: trailing whitespace
>> #81: FILE: arch/arm64/mm/init.c:263:
>> +^I$
> 
> Oops :)
>  
>>> +	return memblock_is_map_memory(addr);
>>> +}
>>> +EXPORT_SYMBOL(pfn_is_map_memory);
>>> +
>>
>> Is the EXPORT_SYMBOL() required to build drivers which will use
>> pfn_is_map_memory() but currently use pfn_valid() ?
> 
> Yes, this is required for virt_addr_valid() that is used by modules.
> 

There will be two adjacent EXPORT_SYMBOL()s, one for pfn_valid() and
one for pfn_is_map_memory(). But it's okay I guess, can't help it.


* Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21 12:24     ` Mike Rapoport
@ 2021-04-21 13:15       ` Anshuman Khandual
  0 siblings, 0 replies; 47+ messages in thread
From: Anshuman Khandual @ 2021-04-21 13:15 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm



On 4/21/21 5:54 PM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote:
>>
>> On 4/21/21 12:21 PM, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> The arm64's version of pfn_valid() differs from the generic because of two
>>> reasons:
>>>
>>> * Parts of the memory map are freed during boot. This makes it necessary to
>>>   verify that there is actual physical memory that corresponds to a pfn
>>>   which is done by querying memblock.
>>>
>>> * There are NOMAP memory regions. These regions are not mapped in the
>>>   linear map and until the previous commit the struct pages representing
>>>   these areas had default values.
>>>
>>> As the consequence of absence of the special treatment of NOMAP regions in
>>> the memory map it was necessary to use memblock_is_map_memory() in
>>> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
>>> generic mm functionality would not treat a NOMAP page as a normal page.
>>>
>>> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
>>> the rest of core mm will treat them as unusable memory and thus
>>> pfn_valid_within() is no longer required at all and can be disabled by
>>> removing CONFIG_HOLES_IN_ZONE on arm64.
>>
>> This makes sense.
>>
>>>
>>> pfn_valid() can be slightly simplified by replacing
>>> memblock_is_map_memory() with memblock_is_memory().
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>  arch/arm64/Kconfig   | 3 ---
>>>  arch/arm64/mm/init.c | 4 ++--
>>>  2 files changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index e4e1b6550115..58e439046d05 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>>  	def_bool y
>>>  	depends on NUMA
>>>  
>>> -config HOLES_IN_ZONE
>>> -	def_bool y
>>> -
>>
>> Right.
>>
>>>  source "kernel/Kconfig.hz"
>>>  
>>>  config ARCH_SPARSEMEM_ENABLE
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index dc03bdc12c0f..eb3f56fb8c7c 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>>>  
>>>  	/*
>>>  	 * ZONE_DEVICE memory does not have the memblock entries.
>>> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
>>> +	 * memblock_is_memory() check for ZONE_DEVICE based
>>>  	 * addresses will always fail. Even the normal hotplugged
>>>  	 * memory will never have MEMBLOCK_NOMAP flag set in their
>>>  	 * memblock entries. Skip memblock search for all non early
>>> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>>>  		return pfn_section_valid(ms, pfn);
>>>  }
>>>  #endif
>>> -	return memblock_is_map_memory(addr);
>>> +	return memblock_is_memory(addr);
>>
>> Wondering if MEMBLOCK_NOMAP is now being treated similarly to other
>> memory pfns for page table walking purpose but with PageReserved(),
>> why memblock_is_memory() is still required ? At this point, should
>> not we just return valid for early_section() memory. As pfn_valid()
>> now just implies that pfn has a struct page backing which has been
>> already verified with valid_section() etc.
> 
> memblock_is_memory() is required because arm64 frees unused parts of the
> memory map. So, for instance, if we have 64M out of 128M populated in a
> section the section based calculation would return 1 for a pfn in the
> second half of the section, but there would be no memory map there.

Understood.
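
Putting the hunks in this thread together, the resulting arm64
pfn_valid() looks roughly like this (a sketch reconstructed from the
diffs above, not a verbatim copy of the final code):

int pfn_valid(unsigned long pfn)
{
	phys_addr_t addr = PFN_PHYS(pfn);

	/* Reject pfns whose upper bits do not survive PFN_PHYS(). */
	if (PHYS_PFN(addr) != pfn)
		return 0;

#ifdef CONFIG_SPARSEMEM
{
	struct mem_section *ms;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;

	ms = __pfn_to_section(pfn);
	if (!valid_section(ms))
		return 0;

	/*
	 * ZONE_DEVICE and hotplugged memory have no memblock entries,
	 * so only early sections are checked against memblock below.
	 */
	if (!early_section(ms))
		return pfn_section_valid(ms, pfn);
}
#endif
	/*
	 * Parts of the memory map of early sections may have been
	 * freed, so a valid section is not enough; ask memblock whether
	 * there is actual memory (and hence a memory map) for this pfn.
	 */
	return memblock_is_memory(addr);
}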


* Re: [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
                   ` (3 preceding siblings ...)
  2021-04-21  6:51 ` [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-22  7:00 ` Kefeng Wang
  2021-04-22  7:29   ` Mike Rapoport
  4 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-22  7:00 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm


On 2021/4/21 14:51, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are only boot tested on qemu-system-aarch64 so I'd really
> appreciate memory stress tests on real hardware.
>
> If this actually works we'll be one step closer to drop custom pfn_valid()
> on arm64 altogether.

Hi Mike, I have a question: without HOLES_IN_ZONE, the pfn_valid_within()
in move_freepages_block()->move_freepages() will be optimized out. If
there are holes in a zone, the 'struct page's (memory map) for the pfn
ranges of the holes will be freed by free_memmap(), and then the page
traversal over the zone (with holes) in move_freepages() will hit a wrong
page and can panic at the PageLRU(page) test, see link [1].

"The idea is to mark NOMAP pages as reserved in the memory map" -- I see
that patch 2 checks memblock_is_nomap() on the memory regions of
memblock, but it seems that memblock_mark_nomap() is never called (maybe
I missed it), so memmap_init_reserved_pages() won't do anything. Should
HOLES_IN_ZONE still be needed for generic mm code?

[1] 
https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/



* Re: [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-22  7:00 ` [PATCH v2 0/4] " Kefeng Wang
@ 2021-04-22  7:29   ` Mike Rapoport
  2021-04-22 15:28     ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-22  7:29 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/21 14:51, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Hi,
> > 
> > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > pfn_valid_within() to 1.
> > 
> > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > the intended semantics of pfn_valid() to designate availability of struct
> > page for a pfn.
> > 
> > With this the core mm will be able to cope with the fact that it cannot use
> > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > will be treated correctly even without the need for pfn_valid_within.
> > 
> > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > appreciate memory stress tests on real hardware.
> > 
> > If this actually works we'll be one step closer to drop custom pfn_valid()
> > on arm64 altogether.
> 
> Hi Mike, I have a question: without HOLES_IN_ZONE, the pfn_valid_within()
> in move_freepages_block()->move_freepages() will be optimized out. If
> there are holes in a zone, the 'struct page's (memory map) for the pfn
> ranges of the holes will be freed by free_memmap(), and then the page
> traversal over the zone (with holes) in move_freepages() will hit a wrong
> page and can panic at the PageLRU(page) test, see link [1].

First, the HOLES_IN_ZONE name is hugely misleading; this configuration
option has nothing to do with memory holes, but rather it is there to deal
with holes or undefined struct pages in the memory map, when these holes
can be inside a MAX_ORDER_NR_PAGES region.

In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
accessing *missing* struct pages, like those that are freed at
free_memmap(). But on arm64 these tests also filter out the nomap entries
because their struct pages are not initialized.

The panic you refer to happened because there was an uninitialized struct
page in the middle of MAX_ORDER_NR_PAGES region because it corresponded to
nomap memory.

With these changes I make sure that such pages will be properly initialized
as PageReserved and the pfn walkers will be able to rely on the memory map.

Note also, that free_memmap() aligns the parts being freed on MAX_ORDER
boundaries, so there will be no missing parts in the memory map within a
MAX_ORDER_NR_PAGES region.
 
> "The idea is to mark NOMAP pages as reserved in the memory map", I see the
> patch2 check memblock_is_nomap() in memory region
> of memblock, but it seems that memblock_mark_nomap() is not called(maybe I
> missed), then memmap_init_reserved_pages() won't
> work, so should the HOLES_IN_ZONE still be needed for generic mm code?
> 
> [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
> 

-- 
Sincerely yours,
Mike.
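
For reference, the pfn walker pattern being discussed looks roughly like
this (a simplified sketch of move_freepages() around v5.10, not the
exact kernel code):

static int move_freepages(struct zone *zone, unsigned long start_pfn,
			  unsigned long end_pfn, int migratetype)
{
	unsigned long pfn = start_pfn;
	struct page *page;
	int order, pages_moved = 0;

	while (pfn <= end_pfn) {
		/* Compiles to 1 when HOLES_IN_ZONE is not set, so the
		 * whole test vanishes after this series. */
		if (!pfn_valid_within(pfn)) {
			pfn++;
			continue;
		}
		page = pfn_to_page(pfn);
		if (!PageBuddy(page)) {
			/*
			 * The real code peeks at PageLRU() here to count
			 * movable pages; on a struct page that was freed
			 * by free_memmap() or never initialized, that
			 * read dereferences garbage.
			 */
			pfn++;
			continue;
		}
		order = buddy_order(page);
		move_to_free_list(page, zone, order, migratetype);
		pfn += 1 << order;
		pages_moved += 1 << order;
	}

	return pages_moved;
}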


* Re: [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-22  7:29   ` Mike Rapoport
@ 2021-04-22 15:28     ` Kefeng Wang
  2021-04-23  8:11       ` Kefeng Wang
  2021-04-25  6:59       ` [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  0 siblings, 2 replies; 47+ messages in thread
From: Kefeng Wang @ 2021-04-22 15:28 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/22 15:29, Mike Rapoport wrote:
> On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
>> On 2021/4/21 14:51, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> Hi,
>>>
>>> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
>>> pfn_valid_within() to 1.
>>>
>>> The idea is to mark NOMAP pages as reserved in the memory map and restore
>>> the intended semantics of pfn_valid() to designate availability of struct
>>> page for a pfn.
>>>
>>> With this the core mm will be able to cope with the fact that it cannot use
>>> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
>>> will be treated correctly even without the need for pfn_valid_within.
>>>
>>> The patches are only boot tested on qemu-system-aarch64 so I'd really
>>> appreciate memory stress tests on real hardware.
>>>
>>> If this actually works we'll be one step closer to drop custom pfn_valid()
>>> on arm64 altogether.
>> Hi Mike, I have a question: without HOLES_IN_ZONE, the pfn_valid_within()
>> in move_freepages_block()->move_freepages() will be optimized out. If
>> there are holes in a zone, the 'struct page's (memory map) for the pfn
>> ranges of the holes will be freed by free_memmap(), and then the page
>> traversal over the zone (with holes) in move_freepages() will hit a wrong
>> page and can panic at the PageLRU(page) test, see link [1].
> First, the HOLES_IN_ZONE name is hugely misleading; this configuration
> option has nothing to do with memory holes, but rather it is there to deal
> with holes or undefined struct pages in the memory map, when these holes
> can be inside a MAX_ORDER_NR_PAGES region.
>
> In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
> accessing *missing* struct pages, like those that are freed at
> free_memmap(). But on arm64 these tests also filter out the nomap entries
> because their struct pages are not initialized.
>
> The panic you refer to happened because there was an uninitialized struct
> page in the middle of MAX_ORDER_NR_PAGES region because it corresponded to
> nomap memory.
>
> With these changes I make sure that such pages will be properly initialized
> as PageReserved and the pfn walkers will be able to rely on the memory map.
>
> Note also, that free_memmap() aligns the parts being freed on MAX_ORDER
> boundaries, so there will be no missing parts in the memory map within a
> MAX_ORDER_NR_PAGES region.

Ok, thanks, we met the same panic as in the link on arm32 (without
HOLES_IN_ZONE). The scheme for arm64 could suit arm32 as well, right? I
will try the patchset with some changes on arm32 and give some feedback.

Again, a stupid question: where does a memblock region get marked with
the MEMBLOCK_NOMAP flag?


>   
>> "The idea is to mark NOMAP pages as reserved in the memory map", I see the
>> patch2 check memblock_is_nomap() in memory region
>> of memblock, but it seems that memblock_mark_nomap() is not called(maybe I
>> missed), then memmap_init_reserved_pages() won't
>> work, so should the HOLES_IN_ZONE still be needed for generic mm code?
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
>>


* Re: [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-22 15:28     ` Kefeng Wang
@ 2021-04-23  8:11       ` Kefeng Wang
  2021-04-25  7:19         ` arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()) Mike Rapoport
  2021-04-25  6:59       ` [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  1 sibling, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-23  8:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/22 23:28, Kefeng Wang wrote:
>
> On 2021/4/22 15:29, Mike Rapoport wrote:
>> On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
>>> On 2021/4/21 14:51, Mike Rapoport wrote:
>>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>>
>>>> Hi,
>>>>
>>>> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially 
>>>> hardwire
>>>> pfn_valid_within() to 1.
>>>>
>>>> The idea is to mark NOMAP pages as reserved in the memory map and 
>>>> restore
>>>> the intended semantics of pfn_valid() to designate availability of 
>>>> struct
>>>> page for a pfn.
>>>>
>>>> With this the core mm will be able to cope with the fact that it 
>>>> cannot use
>>>> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER 
>>>> blocks
>>>> will be treated correctly even without the need for pfn_valid_within.
>>>>
>>>> The patches are only boot tested on qemu-system-aarch64 so I'd really
>>>> appreciate memory stress tests on real hardware.
>>>>
>>>> If this actually works we'll be one step closer to drop custom 
>>>> pfn_valid()
>>>> on arm64 altogether.
...
>
> Ok, thanks, we met the same panic as in the link on arm32 (without
> HOLES_IN_ZONE). The scheme for arm64 could suit arm32 as well, right? I
> will try the patchset with some changes on arm32 and give some feedback.

I tested this patchset (plus an arm32 change, like arm64 does) based on
LTS 5.10 and added some debug logs; the useful info is shown below. If we
enable HOLES_IN_ZONE, there is no panic. Any idea? Thanks.

Zone ranges:
   Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
   HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000080a00000-0x00000000855fffff]
   node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
   node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
   node   0: [mem 0x000000008e300000-0x000000008ecfffff]
   node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
   node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
   node   0: [mem 0x00000000de700000-0x00000000de9fffff]
   node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
   node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
   node   0: [mem 0x00000000fda00000-0x00000000ffffefff]

----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
=== >move_freepages: start_pfn/end_pfn [de600, de7ff], [de600000, 
de7ff000] :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags 
= ffffffff
8<--- cut here ---
Unable to handle kernel paging request at virtual address fffffffe
pgd = 5dd50df5
[fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
Internal error: Oops: 37 [#1] SMP ARM
Modules linked in: gmac(O)
CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
Hardware name: Hisilicon A9
PC is at move_freepages_block+0x150/0x278
LR is at move_freepages_block+0x150/0x278
pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
sp : c4179cf8  ip : 00000000  fp : 00000001
r10: c4179d58  r9 : 000de7ff  r8 : 00000000
r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
Process test-oom (pid: 635, stack limit = 0x25d667df)



* Re: [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-22 15:28     ` Kefeng Wang
  2021-04-23  8:11       ` Kefeng Wang
@ 2021-04-25  6:59       ` Mike Rapoport
  1 sibling, 0 replies; 47+ messages in thread
From: Mike Rapoport @ 2021-04-25  6:59 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Thu, Apr 22, 2021 at 11:28:24PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/22 15:29, Mike Rapoport wrote:
> > On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
> > > On 2021/4/21 14:51, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > 
> > > > Hi,
> > > > 
> > > > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > > > pfn_valid_within() to 1.
> > > > 
> > > > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > > > the intended semantics of pfn_valid() to designate availability of struct
> > > > page for a pfn.
> > > > 
> > > > With this the core mm will be able to cope with the fact that it cannot use
> > > > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > > > will be treated correctly even without the need for pfn_valid_within.
> > > > 
> > > > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > > > appreciate memory stress tests on real hardware.
> > > > 
> > > > If this actually works we'll be one step closer to drop custom pfn_valid()
> > > > on arm64 altogether.
> > > Hi Mike, I have a question: without HOLES_IN_ZONE, the pfn_valid_within()
> > > in move_freepages_block()->move_freepages() will be optimized out. If
> > > there are holes in a zone, the 'struct page's (memory map) for the pfn
> > > ranges of the holes will be freed by free_memmap(), and then the page
> > > traversal over the zone (with holes) in move_freepages() will hit a wrong
> > > page and can panic at the PageLRU(page) test, see link [1].
> > First, the HOLES_IN_ZONE name is hugely misleading; this configuration
> > option has nothing to do with memory holes, but rather it is there to deal
> > with holes or undefined struct pages in the memory map, when these holes
> > can be inside a MAX_ORDER_NR_PAGES region.
> > 
> > In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
> > accessing *missing* struct pages, like those that are freed at
> > free_memmap(). But on arm64 these tests also filter out the nomap entries
> > because their struct pages are not initialized.
> > 
> > The panic you refer to happened because there was an uninitialized struct
> > page in the middle of MAX_ORDER_NR_PAGES region because it corresponded to
> > nomap memory.
> > 
> > With these changes I make sure that such pages will be properly initialized
> > as PageReserved and the pfn walkers will be able to rely on the memory map.
> > 
> > Note also, that free_memmap() aligns the parts being freed on MAX_ORDER
> > boundaries, so there will be no missing parts in the memory map within a
> > MAX_ORDER_NR_PAGES region.
> 
> Ok, thanks, we met the same panic as in the link on arm32 (without
> HOLES_IN_ZONE). The scheme for arm64 could suit arm32 as well, right?

In general yes. You just need to make sure that usage of pfn_valid() in
arch/arm does not presume that it tests something beyond availability of
struct page for a pfn.
 
> I will try the patchset with some changes on arm32 and give some
> feedback.
> 
> Again, a stupid question: where does a memblock region get marked with
> the MEMBLOCK_NOMAP flag?
 
Not sure I understand the question. The memory regions with "nomap"
property in the device tree will be marked MEMBLOCK_NOMAP.
 
> > > "The idea is to mark NOMAP pages as reserved in the memory map", I see the
> > > patch2 check memblock_is_nomap() in memory region
> > > of memblock, but it seems that memblock_mark_nomap() is not called(maybe I
> > > missed), then memmap_init_reserved_pages() won't
> > > work, so should the HOLES_IN_ZONE still be needed for generic mm code?
> > > 
> > > [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
> > > 

-- 
Sincerely yours,
Mike.
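
To make the answer above concrete: a /reserved-memory child node carrying
a "no-map" property reaches the early FDT reservation helper with
nomap == true, roughly like this (a simplified sketch modeled on
drivers/of/fdt.c; older kernels removed the region from memblock here
instead of marking it):

static int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
						    phys_addr_t size,
						    bool nomap)
{
	if (nomap)
		return memblock_mark_nomap(base, size);
	return memblock_reserve(base, size);
}

Patch 2 of this series then walks memblock.memory in
memmap_init_reserved_pages() and initializes the struct pages of
MEMBLOCK_NOMAP ranges as PageReserved, which is what lets the pfn
walkers rely on the memory map.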


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-23  8:11       ` Kefeng Wang
@ 2021-04-25  7:19         ` Mike Rapoport
       [not found]           ` <52f7d03b-7219-46bc-c62d-b976bc31ebd5@huawei.com>
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-25  7:19 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> 
> I tested this patchset (plus an arm32 change, like arm64 does) based on
> LTS 5.10 and added some debug logs; the useful info is shown below. If
> we enable HOLES_IN_ZONE, there is no panic. Any idea? Thanks.
 
Are there any changes on top of 5.10 except for pfn_valid() patch?
Do you see this panic on 5.10 without the changes?
Can you see stack backtrace beyond move_freepages_block?

> Zone ranges:
>   Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>   HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>   node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>   node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>   node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>   node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>   node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>   node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>   node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>   node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>   node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
> 
> ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
> ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
> ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
> ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
> ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
> ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
> ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
> === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
> :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address fffffffe
> pgd = 5dd50df5
> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
> Internal error: Oops: 37 [#1] SMP ARM
> Modules linked in: gmac(O)
> CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
> Hardware name: Hisilicon A9
> PC is at move_freepages_block+0x150/0x278
> LR is at move_freepages_block+0x150/0x278
> pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
> sp : c4179cf8  ip : 00000000  fp : 00000001
> r10: c4179d58  r9 : 000de7ff  r8 : 00000000
> r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
> r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
> Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
> Process test-oom (pid: 635, stack limit = 0x25d667df)
> 

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
       [not found]           ` <52f7d03b-7219-46bc-c62d-b976bc31ebd5@huawei.com>
@ 2021-04-26  5:20             ` Mike Rapoport
  2021-04-26 15:26               ` Kefeng Wang
  2021-05-12  3:50             ` Matthew Wilcox
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-26  5:20 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/25 15:19, Mike Rapoport wrote:
> 
>     On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> 
>         I tested this patchset (plus an arm32 change, like arm64 does)
>         based on LTS 5.10 and added some debug logs; the useful info is
>         shown below. If we enable HOLES_IN_ZONE, there is no panic. Any
>         idea? Thanks.
> 
> 
>     Are there any changes on top of 5.10 except for pfn_valid() patch?
>     Do you see this panic on 5.10 without the changes?
> 
> Yes, there is some BSP support for an arm board based on 5.10; with or
> without your patch we get the same panic. The panicking pfn = de600 is in
> the range [dcc00, de700] which is freed by free_memmap (start_pfn =
> dcc00, dcc00000, end_pfn = de700, de700000).
> 
> We see the PC is at PageLRU, the same reason as in the arm64 panic log:
> 
>    "PageBuddy in move_freepages returns false.
>     Then we call PageLRU; the macro calls PF_HEAD, which is compound_page().
>     compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
>     returns 0xfffffffffffffffe - and accessing this address causes the crash."
> 
>     Can you see stack backtrace beyond move_freepages_block?
> 
> I did some OOM tests, so the log is about memory allocation,
> 
> [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)

Hmm, this is called with a page from the free list; having a page from a
freed part of the memory map passed to steal_suitable_fallback() means
that there is an issue with the creation of the free list.

Can you please add "memblock=debug" to the kernel command line and post the
log?

> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)
> 
> 
> 
>         Zone ranges:
>           Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>           HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
>         Movable zone start for each node
>         Early memory node ranges
>           node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>           node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>           node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>           node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>           node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>           node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>           node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>           node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>           node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>           node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
> 
>         ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
>         ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
>         ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>         ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
>         ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>         ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>         ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
>         === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
>         :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
>         8<--- cut here ---
>         Unable to handle kernel paging request at virtual address fffffffe
>         pgd = 5dd50df5
>         [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
>         Internal error: Oops: 37 [#1] SMP ARM
>         Modules linked in: gmac(O)
>         CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
>         Hardware name: Hisilicon A9
>         PC is at move_freepages_block+0x150/0x278
>         LR is at move_freepages_block+0x150/0x278
>         pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
>         sp : c4179cf8  ip : 00000000  fp : 00000001
>         r10: c4179d58  r9 : 000de7ff  r8 : 00000000
>         r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
>         r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
>         Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
>         Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
>         Process test-oom (pid: 635, stack limit = 0x25d667df)
> 
> 

-- 
Sincerely yours,
Mike.
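
The "returns 0xfffffffffffffffe" arithmetic quoted above comes from
compound_head(), simplified here from include/linux/page-flags.h:

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	/* Bit 0 set marks a tail page; the head is at head - 1. */
	if (unlikely(head & 1))
		return (struct page *)(head - 1);
	return page;
}

With the memory map freed, page->compound_head reads back as all ones,
bit 0 looks set, and PageLRU() then dereferences all ones minus one. On
32-bit arm the same logic gives 0xffffffff -> 0xfffffffe, matching the
fault address in the oops above.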


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-26  5:20             ` Mike Rapoport
@ 2021-04-26 15:26               ` Kefeng Wang
  2021-04-27  6:23                 ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-26 15:26 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/26 13:20, Mike Rapoport wrote:
> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>> On 2021/4/25 15:19, Mike Rapoport wrote:
>>
>>      On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>>
>>          I tested this patchset (plus an arm32 change, like arm64 does)
>>          based on LTS 5.10 and added some debug logs; the useful info is
>>          shown below. If we enable HOLES_IN_ZONE, there is no panic. Any
>>          idea? Thanks.
>>
>>
>>      Are there any changes on top of 5.10 except for pfn_valid() patch?
>>      Do you see this panic on 5.10 without the changes?
>>
>> Yes, there is some BSP support for an arm board based on 5.10; with or
>> without your patch we get the same panic. The panicking pfn = de600 is
>> in the range [dcc00, de700] which is freed by free_memmap (start_pfn =
>> dcc00, dcc00000, end_pfn = de700, de700000).
>>
>> We see the PC is at PageLRU, the same reason as in the arm64 panic log:
>>
>>     "PageBuddy in move_freepages returns false.
>>      Then we call PageLRU; the macro calls PF_HEAD, which is compound_page().
>>      compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
>>      returns 0xfffffffffffffffe - and accessing this address causes the crash."
>>
>>      Can you see stack backtrace beyond move_freepages_block?
>>
>> I did some OOM tests, so the log is about memory allocation,
>>
>> [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
>> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)
> Hmm, this is called with a page from free list, having a page from a freed
> part of the memory map passed to steal_suitable_fallback() means that there
> is an issue with creation of the free list.
>
> Can you please add "memblock=debug" to the kernel command line and post the
> log?

Here is the log,

CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d

CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
OF: fdt: Machine model: HISI-CA9
memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188
memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4
Malformed early option 'vecpage_wrprotect'
Memory policy: Data cache writealloc
memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c
memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c
memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c
memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30
MEMBLOCK configuration:
  memory size = 0x4c0fffff reserved size = 0x027ef058
  memory.cnt  = 0xa
  memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
  memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
  memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
  memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
  memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
  memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
  memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
  memory[0x7]    [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0
  memory[0x8]    [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0
  memory[0x9]    [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0
  reserved.cnt  = 0x4
  reserved[0x0]    [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0
  reserved[0x1]    [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0
  reserved[0x2]    [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0
  reserved[0x3]    [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0
memblock_alloc_try_nid: 2097152 bytes align=0x200000 nid=-1 
from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xb0000000-0xb01fffff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xaffff000-0xafffffff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 40 bytes align=0x4 nid=-1 from=0x00000000 
max_addr=0x00000000 iotable_init+0x34/0xf0
memblock_reserve: [0xafffefd8-0xafffefff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xafffd000-0xafffdfff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xafffc000-0xafffcfff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xafffb000-0xafffbfff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 early_alloc+0x20/0x4c
memblock_reserve: [0xafffa000-0xafffafff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 384 bytes align=0x20 nid=0 from=0x00000000 
max_addr=0x00000000 sparse_init_nid+0x34/0x1d8
memblock_reserve: [0xafffee40-0xafffefbf] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_exact_nid_raw: 12582912 bytes align=0x80000 nid=0 
from=0xc09fffff max_addr=0x00000000 sparse_init_nid+0xec/0x1d8
memblock_reserve: [0xaf380000-0xaff7ffff] 
memblock_alloc_range_nid+0x104/0x13c
Zone ranges:
   Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
   HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000080a00000-0x00000000855fffff]
   node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
   node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
   node   0: [mem 0x000000008e300000-0x000000008ecfffff]
   node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
   node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
   node   0: [mem 0x00000000de700000-0x00000000de9fffff]
   node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
   node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
   node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
Zeroed struct page in unavailable ranges: 513 pages
Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff]
On node 0 totalpages: 311551
   Normal zone: 1230 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 157440 pages, LIFO batch:31
   HighMem zone: 154111 pages, LIFO batch:31
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffee20-0xafffee3f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffee00-0xafffee1f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffede0-0xafffedff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffedc0-0xafffeddf] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffeda0-0xafffedbf] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffed80-0xafffed9f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffed60-0xafffed7f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffed40-0xafffed5f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffed20-0xafffed3f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_arch+0x440/0x5c4
memblock_reserve: [0xafffed00-0xafffed1f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 22396 bytes align=0x4 nid=-1 from=0x00000000 
max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x30/0x64
memblock_reserve: [0xafff4884-0xafff9fff] 
memblock_alloc_range_nid+0x104/0x13c
[dts]:cpu type is 1380
memblock_alloc_try_nid: 404 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc.constprop.8+0x1c/0x24
memblock_reserve: [0xafffeb60-0xafffecf3] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 404 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc.constprop.8+0x1c/0x24
memblock_reserve: [0xafffe9c0-0xafffeb53] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafff3000-0xafff3fff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4096 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafff2000-0xafff2fff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 278528 bytes align=0x1000 nid=-1 from=0xc09fffff 
max_addr=0x00000000 pcpu_dfl_fc_alloc+0x28/0x34
memblock_reserve: [0xaffae000-0xafff1fff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_free: [0xaffbf000-0xaffbefff] pcpu_embed_first_chunk+0x5ec/0x6a8
memblock_free: [0xaffd0000-0xaffcffff] pcpu_embed_first_chunk+0x5ec/0x6a8
memblock_free: [0xaffe1000-0xaffe0fff] pcpu_embed_first_chunk+0x5ec/0x6a8
memblock_free: [0xafff2000-0xafff1fff] pcpu_embed_first_chunk+0x5ec/0x6a8
percpu: Embedded 17 pages/cpu s37044 r8192 d24396 u69632
memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffefc0-0xafffefc3] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe9a0-0xafffe9a3] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 16 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe980-0xafffe98f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 16 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe960-0xafffe96f] 
memblock_alloc_range_nid+0x104/0x13c
pcpu-alloc: s37044 r8192 d24396 u69632 alloc=17*4096
pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
memblock_alloc_try_nid: 128 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe8e0-0xafffe95f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 92 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe880-0xafffe8db] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 384 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe700-0xafffe87f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 388 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe560-0xafffe6e3] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 96 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe500-0xafffe55f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 92 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe4a0-0xafffe4fb] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 768 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe1a0-0xafffe49f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 772 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafff4580-0xafff4883] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 192 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 memblock_alloc+0x18/0x20
memblock_reserve: [0xafffe0e0-0xafffe19f] 
memblock_alloc_range_nid+0x104/0x13c
memblock_free: [0xafff3000-0xafff3fff] pcpu_embed_first_chunk+0x570/0x6a8
memblock_free: [0xafff2000-0xafff2fff] pcpu_embed_first_chunk+0x58c/0x6a8
Built 1 zonelists, mobility grouping on.  Total pages: 310321
Kernel command line: console=ttyAMA0,9600n8N lpj=8000000 
initrd=0x83000000,0x2000000 maxcpus=4 master_cpu=1 quiet highres=off  
oops=panic vecpage_wrprotect ksm=1 ramdisk_size=30720 kmemleak=off 
min_loop=128 lockd.nlm_tcpport=13001 lockd.nlm_udpport=13001 
rdinit=/sbin/init root=/dev/ram0 vmalloc=256M
printk: log_buf_len individual max cpu contribution: 4096 bytes
printk: log_buf_len total cpu_extra contributions: 12288 bytes
printk: log_buf_len min size: 16384 bytes
memblock_alloc_try_nid: 32768 bytes align=0x4 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_log_buf+0xe4/0x404
memblock_reserve: [0xaffa6000-0xaffadfff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 12288 bytes align=0x4 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_log_buf+0x130/0x404
memblock_reserve: [0xaffa3000-0xaffa5fff] 
memblock_alloc_range_nid+0x104/0x13c
memblock_alloc_try_nid: 90112 bytes align=0x4 nid=-1 from=0x00000000 
max_addr=0x00000000 setup_log_buf+0x180/0x404
memblock_reserve: [0xaff8d000-0xaffa2fff] 
memblock_alloc_range_nid+0x104/0x13c
printk: log_buf_len: 32768 bytes
printk: early log buf free: 2492(15%)
memblock_alloc_try_nid: 524288 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 alloc_large_system_hash+0x1b0/0x2e8
memblock_reserve: [0xaf300000-0xaf37ffff] 
memblock_alloc_range_nid+0x104/0x13c
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
memblock_alloc_try_nid: 262144 bytes align=0x20 nid=-1 from=0x00000000 
max_addr=0x00000000 alloc_large_system_hash+0x1b0/0x2e8
memblock_reserve: [0xaf2c0000-0xaf2fffff] 
memblock_alloc_range_nid+0x104/0x13c
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
memblock_free: [0xaf430000-0xaf453fff] mem_init+0x154/0x238
memblock_free: [0xaf510000-0xaf545fff] mem_init+0x154/0x238
memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x154/0x238
memblock_free: [0xafd98000-0xafdcdfff] mem_init+0x154/0x238
memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x154/0x238
memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x154/0x238
memblock_free: [0xafee0000-0xafefffff] mem_init+0x154/0x238
Memory: 1191160K/1246204K available (4096K kernel code, 436K rwdata, 
1120K rodata, 1024K init, 491K bss, 55044K reserved, 0K cma-reserved, 
616444K highmem)

>> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
>> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
>> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
>> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
>> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
>> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)
>>
>>
>>
>>          Zone ranges:
>>            Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>>            HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
>>          Movable zone start for each node
>>          Early memory node ranges
>>            node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>>            node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>>            node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>>            node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>>            node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>>            node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>>            node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>>            node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>>            node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>>            node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
>>
>>          ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
>>          ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
>>          ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>>          ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
>>          ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>>          ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>>          ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
>>          === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
>>          :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
>>          8<--- cut here ---
>>          Unable to handle kernel paging request at virtual address fffffffe
>>          pgd = 5dd50df5
>>          [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
>>          Internal error: Oops: 37 [#1] SMP ARM
>>          Modules linked in: gmac(O)
>>          CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
>>          Hardware name: Hisilicon A9
>>          PC is at move_freepages_block+0x150/0x278
>>          LR is at move_freepages_block+0x150/0x278
>>          pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
>>          sp : c4179cf8  ip : 00000000  fp : 00000001
>>          r10: c4179d58  r9 : 000de7ff  r8 : 00000000
>>          r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
>>          r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
>>          Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
>>          Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
>>          Process test-oom (pid: 635, stack limit = 0x25d667df)
>>
>>
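
For context on where these free lists come from, the memblock release
path looks roughly like this (lightly trimmed from mm/memblock.c around
v5.10; __free_memory_core() is the function the debug patch in the next
reply instruments):

static void __init __free_pages_memory(unsigned long start, unsigned long end)
{
	int order;

	while (start < end) {
		/* Largest order the alignment of 'start' allows... */
		order = min(MAX_ORDER - 1UL, __ffs(start));

		/* ...shrunk until it fits below 'end'. */
		while (start + (1UL << order) > end)
			order--;

		memblock_free_pages(pfn_to_page(start), start, order);

		start += (1UL << order);
	}
}

static unsigned long __init __free_memory_core(phys_addr_t start,
					       phys_addr_t end)
{
	unsigned long start_pfn = PFN_UP(start);
	unsigned long end_pfn = min_t(unsigned long,
				      PFN_DOWN(end), max_low_pfn);

	if (start_pfn >= end_pfn)
		return 0;

	__free_pages_memory(start_pfn, end_pfn);

	return end_pfn - start_pfn;
}

The pr_info() in the next reply is meant to show exactly which pfn ranges
get handed to the buddy allocator here, i.e. whether any of them overlap
memmap that free_memmap() dropped.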


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-26 15:26               ` Kefeng Wang
@ 2021-04-27  6:23                 ` Mike Rapoport
  2021-04-27 11:08                   ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-27  6:23 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/26 13:20, Mike Rapoport wrote:
> > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> > > On 2021/4/25 15:19, Mike Rapoport wrote:
> > > 
> > >      On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> > > 
> > >          I tested this patchset (plus an arm32 change, like arm64
> > >          does) based on LTS 5.10 and added some debug logs; the
> > >          useful info is shown below. If we enable HOLES_IN_ZONE,
> > >          there is no panic. Any idea? Thanks.
> > > 
> > >      Are there any changes on top of 5.10 except for pfn_valid() patch?
> > >      Do you see this panic on 5.10 without the changes?
> > > 
> > > Yes, there is some BSP support for an arm board based on 5.10;

Is it possible to test 5.12?

> > > with or without your patch we get the same panic. The panicking
> > > pfn = de600 is in the range [dcc00, de700] which is freed by
> > > free_memmap (start_pfn = dcc00, dcc00000, end_pfn = de700, de700000).
> > > 
> > > We see the PC is at PageLRU, the same reason as in the arm64 panic log:
> > > 
> > >     "PageBuddy in move_freepages returns false.
> > >      Then we call PageLRU; the macro calls PF_HEAD, which is compound_page().
> > >      compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
> > >      returns 0xfffffffffffffffe - and accessing this address causes the crash."
> > > 
> > >      Can you see stack backtrace beyond move_freepages_block?
> > > 
> > > I did some OOM tests, so the log is about memory allocation,
> > > 
> > > [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
> > > [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)
> >
> > Hmm, this is called with a page from free list, having a page from a freed
> > part of the memory map passed to steal_suitable_fallback() means that there
> > is an issue with creation of the free list.
> > 
> > Can you please add "memblock=debug" to the kernel command line and post the
> > log?
> 
> Here is the log,
> 
> CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d
> 
> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> OF: fdt: Machine model: HISI-CA9
> memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188
> memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4
> Malformed early option 'vecpage_wrprotect'
> Memory policy: Data cache writealloc
> memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c
> memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c
> memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c
> memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30
> MEMBLOCK configuration:
>  memory size = 0x4c0fffff reserved size = 0x027ef058
>  memory.cnt  = 0xa
>  memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>  memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>  memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>  memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>  memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>  memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>  memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>  memory[0x7]    [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0
>  memory[0x8]    [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0
>  memory[0x9]    [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0
>  reserved.cnt  = 0x4
>  reserved[0x0]    [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0
>  reserved[0x1]    [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0
>  reserved[0x2]    [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0
>  reserved[0x3]    [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0
...
> Zone ranges:
>   Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>   HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>   node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>   node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>   node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>   node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>   node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>   node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>   node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>   node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>   node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
> Zeroed struct page in unavailable ranges: 513 pages
> Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff]
> On node 0 totalpages: 311551
>   Normal zone: 1230 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 157440 pages, LIFO batch:31
>   HighMem zone: 154111 pages, LIFO batch:31

AFAICT the range [de600000, de7ff000] should not be added to the free
lists.

Can you try with the below patch:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..7f3c33d53f87 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1994,6 +1994,8 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 	unsigned long end_pfn = min_t(unsigned long,
 				      PFN_DOWN(end), max_low_pfn);
 
+	pr_info("%s: range: %pa - %pa, pfn: %lx - %lx\n", __func__, &start, &end, start_pfn, end_pfn);
+
 	if (start_pfn >= end_pfn)
 		return 0;
 
 
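Note the min_t() clamp above: for a range that lies entirely in highmem,
end_pfn is clamped to max_low_pfn, so start_pfn >= end_pfn and the range is
skipped. A quick worked example (assuming max_low_pfn == 0xb0200, per the
zone ranges above):

	/* pure highmem range [0xcc000000, 0xdca00000) */
	start_pfn = PFN_UP(0xcc000000);			/* 0xcc000 */
	end_pfn = min_t(unsigned long,
			PFN_DOWN(0xdca00000), 0xb0200);	/* 0xb0200 */
	/* start_pfn >= end_pfn -> nothing is freed here */

so the debug output will show end pfns of b0200 for the highmem ranges.
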
> > > [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
> > > [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
> > > [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
> > > [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
> > > [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
> > > [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)
> > > 
> > >          Zone ranges:
> > >            Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
> > >            HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
> > >          Movable zone start for each node
> > >          Early memory node ranges
> > >            node   0: [mem 0x0000000080a00000-0x00000000855fffff]
> > >            node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
> > >            node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
> > >            node   0: [mem 0x000000008e300000-0x000000008ecfffff]
> > >            node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
> > >            node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
> > >            node   0: [mem 0x00000000de700000-0x00000000de9fffff]
> > >            node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
> > >            node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
> > >            node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
> > > 
> > >          ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
> > >          ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
> > >          ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
> > >          ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
> > >          ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
> > >          ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
> > >          ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
> > >          === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
> > >          :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
> > >          8<--- cut here ---
> > >          Unable to handle kernel paging request at virtual address fffffffe
> > >          pgd = 5dd50df5
> > >          [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
> > >          Internal error: Oops: 37 [#1] SMP ARM
> > >          Modules linked in: gmac(O)
> > >          CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
> > >          Hardware name: Hisilicon A9
> > >          PC is at move_freepages_block+0x150/0x278
> > >          LR is at move_freepages_block+0x150/0x278
> > >          pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
> > >          sp : c4179cf8  ip : 00000000  fp : 00000001
> > >          r10: c4179d58  r9 : 000de7ff  r8 : 00000000
> > >          r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
> > >          r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
> > >          Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> > >          Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
> > >          Process test-oom (pid: 635, stack limit = 0x25d667df)
> > > 
> > > 

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-27  6:23                 ` Mike Rapoport
@ 2021-04-27 11:08                   ` Kefeng Wang
  2021-04-28  5:59                     ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-27 11:08 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/27 14:23, Mike Rapoport wrote:
> On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
>> On 2021/4/26 13:20, Mike Rapoport wrote:
>>> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>>>> On 2021/4/25 15:19, Mike Rapoport wrote:
>>>>
>>>>       On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>>>>
>>>>           I tested this patchset (plus an arm32 change, like arm64 does)
>>>>           based on LTS 5.10 and added some debug logs; the useful info is
>>>>           shown below. If we enable HOLES_IN_ZONE there is no panic. Any
>>>>           ideas? Thanks.
>>>>
>>>>       Are there any changes on top of 5.10 except for pfn_valid() patch?
>>>>       Do you see this panic on 5.10 without the changes?
>>>>
>>>> Yes, there is some BSP support for an arm board based on 5.10,
> Is it possible to test 5.12?
>
>>>> with or without your patch we get the same panic; the panic pfn=de600 is
>>>> in the range [dcc00, de700] which is freed by free_memmap, start_pfn
>>>> = dcc00,  dcc00000 end_pfn = de700, de700000
>>>>
>>>> we see the PC is at PageLRU, the same reason as in the arm64 panic log,
>>>>
>>>>      "PageBuddy in move_freepages returns false
>>>>       Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
>>>>       compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
>>>>       returns 0xfffffffffffffffe - and accessing this address causes the crash"
>>>>
>>>>       Can you see stack backtrace beyond move_freepages_block?
>>>>
>>>> I did some OOM tests, so the log is about memory allocation,
>>>>
>>>> [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
>>>>
>>>> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)
>>> Hmm, this is called with a page from the free list; having a page from a freed
>>> part of the memory map passed to steal_suitable_fallback() means that there
>>> is an issue with the creation of the free lists.
>>>
>>> Can you please add "memblock=debug" to the kernel command line and post the
>>> log?
>> Here is the log,
>>
>> CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d
>>
>> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
>> OF: fdt: Machine model: HISI-CA9
>> memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188
>> memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4
>> Malformed early option 'vecpage_wrprotect'
>> Memory policy: Data cache writealloc
>> memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c
>> memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c
>> memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c
>> memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30
>> MEMBLOCK configuration:
>>   memory size = 0x4c0fffff reserved size = 0x027ef058
>>   memory.cnt  = 0xa
>>   memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>>   memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>>   memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>>   memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>>   memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>>   memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>>   memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>>   memory[0x7]    [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0
>>   memory[0x8]    [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0
>>   memory[0x9]    [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0
>>   reserved.cnt  = 0x4
>>   reserved[0x0]    [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0
>>   reserved[0x1]    [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0
>>   reserved[0x2]    [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0
>>   reserved[0x3]    [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0
> ...
>> Zone ranges:
>>    Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>>    HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
>> Movable zone start for each node
>> Early memory node ranges
>>    node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>>    node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>>    node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>>    node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>>    node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>>    node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>>    node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>>    node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>>    node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>>    node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
>> Zeroed struct page in unavailable ranges: 513 pages
>> Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff]
>> On node 0 totalpages: 311551
>>    Normal zone: 1230 pages used for memmap
>>    Normal zone: 0 pages reserved
>>    Normal zone: 157440 pages, LIFO batch:31
>>    HighMem zone: 154111 pages, LIFO batch:31
> AFAICT the range [de600000, de7ff000] should not be added to the free
> lists.
>
> Can you try with the below patch:
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..7f3c33d53f87 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1994,6 +1994,8 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>   	unsigned long end_pfn = min_t(unsigned long,
>   				      PFN_DOWN(end), max_low_pfn);
>   
> +	pr_info("%s: range: %pa - %pa, pfn: %lx - %lx\n", __func__, &start, &end, start_pfn, end_pfn);
> +
>   	if (start_pfn >= end_pfn)
>   		return 0;
>   
__free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
__free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
__free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
__free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
__free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
__free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
__free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
__free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
__free_memory_core, range: 0xaf430000 - 0xaf454000, pfn: af430 - af454
__free_memory_core, range: 0xaf510000 - 0xaf546000, pfn: af510 - af546
__free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
__free_memory_core, range: 0xafd98000 - 0xafdce000, pfn: afd98 - afdce
__free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
__free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
__free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
__free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
__free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
__free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
__free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
__free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
__free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
__free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
__free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
__free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
__free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
__free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
__free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
__free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
__free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
__free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
__free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
__free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
__free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200

>   
>>>> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
>>>> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
>>>> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
>>>> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
>>>> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
>>>> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)
>>>>
>>>>           Zone ranges:
>>>>             Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>>>>             HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
>>>>           Movable zone start for each node
>>>>           Early memory node ranges
>>>>             node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>>>>             node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>>>>             node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>>>>             node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>>>>             node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>>>>             node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>>>>             node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>>>>             node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>>>>             node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>>>>             node   0: [mem 0x00000000fda00000-0x00000000ffffefff]
>>>>
>>>>           ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
>>>>           ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
>>>>           ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>>>>           ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
>>>>           ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>>>>           ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>>>>           ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
>>>>           === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
>>>>           :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
>>>>           8<--- cut here ---
>>>>           Unable to handle kernel paging request at virtual address fffffffe
>>>>           pgd = 5dd50df5
>>>>           [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
>>>>           Internal error: Oops: 37 [#1] SMP ARM
>>>>           Modules linked in: gmac(O)
>>>>           CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
>>>>           Hardware name: Hisilicon A9
>>>>           PC is at move_freepages_block+0x150/0x278
>>>>           LR is at move_freepages_block+0x150/0x278
>>>>           pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
>>>>           sp : c4179cf8  ip : 00000000  fp : 00000001
>>>>           r10: c4179d58  r9 : 000de7ff  r8 : 00000000
>>>>           r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
>>>>           r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
>>>>           Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
>>>>           Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
>>>>           Process test-oom (pid: 635, stack limit = 0x25d667df)
>>>>
>>>>


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-27 11:08                   ` Kefeng Wang
@ 2021-04-28  5:59                     ` Mike Rapoport
  2021-04-29  0:48                       ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-28  5:59 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/27 14:23, Mike Rapoport wrote:
> > On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
> > > On 2021/4/26 13:20, Mike Rapoport wrote:
> > > > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> > > > > On 2021/4/25 15:19, Mike Rapoport wrote:
> > > > > 
> > > > >       On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> > > > > 
> > > > >           I tested this patchset (plus an arm32 change, like arm64 does)
> > > > >           based on LTS 5.10 and added some debug logs; the useful info is
> > > > >           shown below. If we enable HOLES_IN_ZONE there is no panic. Any
> > > > >           ideas? Thanks.
> > > > > 
> > > > >       Are there any changes on top of 5.10 except for pfn_valid() patch?
> > > > >       Do you see this panic on 5.10 without the changes?
> > > > > 
> > > > > Yes, there is some BSP support for an arm board based on 5.10,
> > Is it possible to test 5.12?

Do you use SPARSEMEM? If yes, what is your section size?
What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-28  5:59                     ` Mike Rapoport
@ 2021-04-29  0:48                       ` Kefeng Wang
  2021-04-29  6:57                         ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-29  0:48 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/28 13:59, Mike Rapoport wrote:
> On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
>> On 2021/4/27 14:23, Mike Rapoport wrote:
>>> On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
>>>> On 2021/4/26 13:20, Mike Rapoport wrote:
>>>>> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>>>>>> On 2021/4/25 15:19, Mike Rapoport wrote:
>>>>>>
>>>>>>        On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>>>>>>
>>>>>>            I tested this patchset (plus an arm32 change, like arm64 does)
>>>>>>            based on LTS 5.10 and added some debug logs; the useful info is
>>>>>>            shown below. If we enable HOLES_IN_ZONE there is no panic. Any
>>>>>>            ideas? Thanks.
>>>>>>
>>>>>>        Are there any changes on top of 5.10 except for pfn_valid() patch?
>>>>>>        Do you see this panic on 5.10 without the changes?
>>>>>>
>>>>>> Yes, there is some BSP support for an arm board based on 5.10,
>>> Is it possible to test 5.12?
> Do you use SPARSEMEM? If yes, what is your section size?
> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?

Yes,

CONFIG_SPARSEMEM=y

CONFIG_SPARSEMEM_STATIC=y

CONFIG_FORCE_MAX_ZONEORDER = 11

CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HAVE_ARCH_PFN_VALID=y
CONFIG_HIGHMEM=y
#define SECTION_SIZE_BITS    26
#define MAX_PHYSADDR_BITS    32
#define MAX_PHYSMEM_BITS     32


>


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-29  0:48                       ` Kefeng Wang
@ 2021-04-29  6:57                         ` Mike Rapoport
  2021-04-29 10:22                           ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-29  6:57 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Thu, Apr 29, 2021 at 08:48:26AM +0800, Kefeng Wang wrote:
> 
> On 2021/4/28 13:59, Mike Rapoport wrote:
> > On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
> > > On 2021/4/27 14:23, Mike Rapoport wrote:
> > > > On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
> > > > > On 2021/4/26 13:20, Mike Rapoport wrote:
> > > > > > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> > > > > > > On 2021/4/25 15:19, Mike Rapoport wrote:
> > > > > > > 
> > > > > > >        On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> > > > > > > 
> > > > > > >            I tested this patchset (plus an arm32 change, like arm64 does)
> > > > > > >            based on LTS 5.10 and added some debug logs; the useful info is
> > > > > > >            shown below. If we enable HOLES_IN_ZONE there is no panic. Any
> > > > > > >            ideas? Thanks.
> > > > > > > 
> > > > > > >        Are there any changes on top of 5.10 except for pfn_valid() patch?
> > > > > > >        Do you see this panic on 5.10 without the changes?
> > > > > > > 
> > > > > > > Yes, there is some BSP support for an arm board based on 5.10,
> > > > Is it possible to test 5.12?
> > Do you use SPARSEMEM? If yes, what is your section size?
> > What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
> 
> Yes,
> 
> CONFIG_SPARSEMEM=y
> 
> CONFIG_SPARSEMEM_STATIC=y
> 
> CONFIG_FORCE_MAX_ZONEORDER = 11
> 
> CONFIG_PAGE_OFFSET=0xC0000000
> CONFIG_HAVE_ARCH_PFN_VALID=y
> CONFIG_HIGHMEM=y
> #define SECTION_SIZE_BITS    26
> #define MAX_PHYSADDR_BITS    32
> #define MAX_PHYSMEM_BITS     32

It seems that with SPARSEMEM we don't align the freed parts on pageblock
boundaries.

Can you try the patch below:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..1926369b52ec 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
 		 * due to SPARSEMEM sections which aren't present.
 		 */
 		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
-#else
+#endif
 		/*
 		 * Align down here since the VM subsystem insists that the
 		 * memmap entries are valid from the bank start aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
 		start = round_down(start, MAX_ORDER_NR_PAGES);
-#endif
 
 		/*
 		 * If we had a previous bank, and there is a space
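
For your config the arithmetic of this change works out as follows (a
sketch, assuming PAGE_SHIFT == 12):

	/*
	 * SECTION_SIZE_BITS = 26, PAGE_SHIFT = 12:
	 *	PAGES_PER_SECTION  = 1 << (26 - 12) = 0x4000 pfns (64MiB)
	 * MAX_ORDER = 11:
	 *	MAX_ORDER_NR_PAGES = 1 << (11 - 1)  = 0x400 pfns (4MiB)
	 *
	 * so with round_down() now applied in the SPARSEMEM case too, a
	 * bank starting at pfn 0xde700 keeps its memmap from:
	 */
	start = round_down(0xde700, 0x400);	/* == 0xde400 */

which would cover the crashing pfn 0xde600.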
 

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-29  6:57                         ` Mike Rapoport
@ 2021-04-29 10:22                           ` Kefeng Wang
  2021-04-30  9:51                             ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-29 10:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 2021/4/29 14:57, Mike Rapoport wrote:

>>> Do you use SPARSEMEM? If yes, what is your section size?
>>> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
>> Yes,
>>
>> CONFIG_SPARSEMEM=y
>>
>> CONFIG_SPARSEMEM_STATIC=y
>>
>> CONFIG_FORCE_MAX_ZONEORDER = 11
>>
>> CONFIG_PAGE_OFFSET=0xC0000000
>> CONFIG_HAVE_ARCH_PFN_VALID=y
>> CONFIG_HIGHMEM=y
>> #define SECTION_SIZE_BITS    26
>> #define MAX_PHYSADDR_BITS    32
>> #define MAX_PHYSMEM_BITS     32


With the patch, the addr is aligned, but the panic still occurred.

The new free memory log is below:

memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c
memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c
memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c
memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c
memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c
memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c
memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c
__free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
__free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
__free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
__free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
__free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
__free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
__free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
__free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
__free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450
__free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540
__free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
__free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8
__free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
__free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
__free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
__free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
__free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
__free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
__free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
__free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
__free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
__free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
__free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
__free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
__free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
__free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
__free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
__free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
__free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
__free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
__free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
__free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
__free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
> It seems that with SPARSEMEM we don't align the freed parts on pageblock
> boundaries.
>
> Can you try the patch below:
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..1926369b52ec 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
>   		 * due to SPARSEMEM sections which aren't present.
>   		 */
>   		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
> -#else
> +#endif
>   		/*
>   		 * Align down here since the VM subsystem insists that the
>   		 * memmap entries are valid from the bank start aligned to
>   		 * MAX_ORDER_NR_PAGES.
>   		 */
>   		start = round_down(start, MAX_ORDER_NR_PAGES);
> -#endif
>   
>   		/*
>   		 * If we had a previous bank, and there is a space
>   
>


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-29 10:22                           ` Kefeng Wang
@ 2021-04-30  9:51                             ` Mike Rapoport
  2021-04-30 11:24                               ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-04-30  9:51 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
> 
> On 2021/4/29 14:57, Mike Rapoport wrote:
> 
> > > > Do you use SPARSEMEM? If yes, what is your section size?
> > > > What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
> > > Yes,
> > > 
> > > CONFIG_SPARSEMEM=y
> > > 
> > > CONFIG_SPARSEMEM_STATIC=y
> > > 
> > > CONFIG_FORCE_MAX_ZONEORDER = 11
> > > 
> > > CONFIG_PAGE_OFFSET=0xC0000000
> > > CONFIG_HAVE_ARCH_PFN_VALID=y
> > > CONFIG_HIGHMEM=y
> > > #define SECTION_SIZE_BITS    26
> > > #define MAX_PHYSADDR_BITS    32
> > > #define MAX_PHYSMEM_BITS     32
> 
> 
> With the patch, the addr is aligned, but the panic still occurred.

Is this the same panic at move_freepages() for range [de600, de7ff]?

Do you enable CONFIG_ARM_LPAE?

> new free memory log is below,
> 
> memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c
> 
> memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c
> memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c
> memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c
> memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c
> memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c
> memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c
> __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
> __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
> __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
> __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
> __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
> __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
> __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
> __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
> __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450
> __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540
> __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
> __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8
> __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
> __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
> __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
> __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
> __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
> __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
> __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
> __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
> __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
> __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
> __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
> __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
> __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
> __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
> __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200

The range [de600, de7ff] does not show up here, so it was never freed to the buddy.

> __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
> __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
> __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
> > It seems that with SPARSEMEM we don't align the freed parts on pageblock
> > boundaries.
> > 
> > Can you try the patch below:
> > 
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..1926369b52ec 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
> >   		 * due to SPARSEMEM sections which aren't present.
> >   		 */
> >   		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
> > -#else
> > +#endif
> >   		/*
> >   		 * Align down here since the VM subsystem insists that the
> >   		 * memmap entries are valid from the bank start aligned to
> >   		 * MAX_ORDER_NR_PAGES.
> >   		 */
> >   		start = round_down(start, MAX_ORDER_NR_PAGES);
> > -#endif
> >   		/*
> >   		 * If we had a previous bank, and there is a space
> > 

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-30  9:51                             ` Mike Rapoport
@ 2021-04-30 11:24                               ` Kefeng Wang
  2021-05-03  6:26                                 ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-04-30 11:24 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 2021/4/30 17:51, Mike Rapoport wrote:
> On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
>>
>> On 2021/4/29 14:57, Mike Rapoport wrote:
>>
>>>>> Do you use SPARSEMEM? If yes, what is your section size?
>>>>> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
>>>> Yes,
>>>>
>>>> CONFIG_SPARSEMEM=y
>>>>
>>>> CONFIG_SPARSEMEM_STATIC=y
>>>>
>>>> CONFIG_FORCE_MAX_ZONEORDER = 11
>>>>
>>>> CONFIG_PAGE_OFFSET=0xC0000000
>>>> CONFIG_HAVE_ARCH_PFN_VALID=y
>>>> CONFIG_HIGHMEM=y
>>>> #define SECTION_SIZE_BITS    26
>>>> #define MAX_PHYSADDR_BITS    32
>>>> #define MAX_PHYSMEM_BITS     32
>>
>>
>> With the patch, the addr is aligned, but the panic still occurred.
> 
> Is this the same panic at move_freepages() for range [de600, de7ff]?
> 
> Do you enable CONFIG_ARM_LPAE?

No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
move_freepages at

start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]: pfn = de600,
page = ef3cc000, page-flags = ffffffff, pfn2phy = de600000



> 
>> new free memory log is below,
>>
>> memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c
>>
>> memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c
>> memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c
>> memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c
>> memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c
>> memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c
>> memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c
>> __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
>> __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
>> __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
>> __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
>> __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
>> __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
>> __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
>> __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
>> __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450
>> __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540
>> __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
>> __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8
>> __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
>> __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
>> __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
>> __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
>> __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
>> __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
>> __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
>> __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
>> __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
>> __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
>> __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
>> __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
>> __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
>> __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
>> __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
> 
> The range [de600, de7ff] does not show up here, so it was never freed to the buddy.
__free_memory_core() checks the start and end pfn,

  if (start_pfn >= end_pfn)
          return 0;

  __free_pages_memory(start_pfn, end_pfn);

so the memory will not be freed to the buddy allocator; confusing...
> 
>> __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
>> __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
>> __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
>>> It seems that with SPARSEMEM we don't align the freed parts on pageblock
>>> boundaries.
>>>
>>> Can you try the patch below:
>>>
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index afaefa8fc6ab..1926369b52ec 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
>>>    		 * due to SPARSEMEM sections which aren't present.
>>>    		 */
>>>    		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
>>> -#else
>>> +#endif
>>>    		/*
>>>    		 * Align down here since the VM subsystem insists that the
>>>    		 * memmap entries are valid from the bank start aligned to
>>>    		 * MAX_ORDER_NR_PAGES.
>>>    		 */
>>>    		start = round_down(start, MAX_ORDER_NR_PAGES);
>>> -#endif
>>>    		/*
>>>    		 * If we had a previous bank, and there is a space
>>>
> 


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-04-30 11:24                               ` Kefeng Wang
@ 2021-05-03  6:26                                 ` Mike Rapoport
  2021-05-03  8:07                                   ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-03  6:26 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote:
> 
> 
> On 2021/4/30 17:51, Mike Rapoport wrote:
> > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
> > > 
> > > On 2021/4/29 14:57, Mike Rapoport wrote:
> > > 
> > > > > > Do you use SPARSEMEM? If yes, what is your section size?
> > > > > > What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
> > > > > Yes,
> > > > > 
> > > > > CONFIG_SPARSEMEM=y
> > > > > 
> > > > > CONFIG_SPARSEMEM_STATIC=y
> > > > > 
> > > > > CONFIG_FORCE_MAX_ZONEORDER = 11
> > > > > 
> > > > > CONFIG_PAGE_OFFSET=0xC0000000
> > > > > CONFIG_HAVE_ARCH_PFN_VALID=y
> > > > > CONFIG_HIGHMEM=y
> > > > > #define SECTION_SIZE_BITS    26
> > > > > #define MAX_PHYSADDR_BITS    32
> > > > > #define MAX_PHYSMEM_BITS     32
> > > 
> > > 
> > > With the patch, the addr is aligned, but the panic still occurred.
> > 
> > Is this the same panic at move_freepages() for range [de600, de7ff]?
> > 
> > Do you enable CONFIG_ARM_LPAE?
> 
> No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
> move_freepages at
> 
> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]: pfn = de600, page
> = ef3cc000, page-flags = ffffffff, pfn2phy = de600000
> 
> > > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
> > > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
> > > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200

Hmm, [de600, de7ff] is not added to the free lists, which is correct. But
then it's unclear how the page for de600 gets to move_freepages()...

Can't say I have any bright ideas to try here...

> __free_memory_core() checks the start and end pfn,
> 
>  if (start_pfn >= end_pfn)
>          return 0;
> 
>  __free_pages_memory(start_pfn, end_pfn);
> 
> so the memory will not be freed to the buddy allocator; confusing...

It's a check for range validity; all valid ranges are added.
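
For reference, __free_pages_memory() then hands the range to the buddy in
max-order chunks, roughly (mm/memblock.c in 5.10):

static void __init __free_pages_memory(unsigned long start, unsigned long end)
{
	int order;

	while (start < end) {
		/* largest order-aligned chunk that still fits */
		order = min(MAX_ORDER - 1UL, __ffs(start));

		while (start + (1UL << order) > end)
			order--;

		memblock_free_pages(pfn_to_page(start), start, order);

		start += (1UL << order);
	}
}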

> > > __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
> > > __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
> > > __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
> > > > It seems that with SPARSEMEM we don't align the freed parts on pageblock
> > > > boundaries.
> > > > 
> > > > Can you try the patch below:
> > > > 
> > > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > > index afaefa8fc6ab..1926369b52ec 100644
> > > > --- a/mm/memblock.c
> > > > +++ b/mm/memblock.c
> > > > @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
> > > >    		 * due to SPARSEMEM sections which aren't present.
> > > >    		 */
> > > >    		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
> > > > -#else
> > > > +#endif
> > > >    		/*
> > > >    		 * Align down here since the VM subsystem insists that the
> > > >    		 * memmap entries are valid from the bank start aligned to
> > > >    		 * MAX_ORDER_NR_PAGES.
> > > >    		 */
> > > >    		start = round_down(start, MAX_ORDER_NR_PAGES);
> > > > -#endif
> > > >    		/*
> > > >    		 * If we had a previous bank, and there is a space
> > > > 
> > 

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-03  6:26                                 ` Mike Rapoport
@ 2021-05-03  8:07                                   ` David Hildenbrand
  2021-05-03  8:44                                     ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-05-03  8:07 UTC (permalink / raw)
  To: Mike Rapoport, Kefeng Wang
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

On 03.05.21 08:26, Mike Rapoport wrote:
> On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2021/4/30 17:51, Mike Rapoport wrote:
>>> On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
>>>>
>>>> On 2021/4/29 14:57, Mike Rapoport wrote:
>>>>
>>>>>>> Do you use SPARSEMEM? If yes, what is your section size?
>>>>>>> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
>>>>>> Yes,
>>>>>>
>>>>>> CONFIG_SPARSEMEM=y
>>>>>>
>>>>>> CONFIG_SPARSEMEM_STATIC=y
>>>>>>
>>>>>> CONFIG_FORCE_MAX_ZONEORDER = 11
>>>>>>
>>>>>> CONFIG_PAGE_OFFSET=0xC0000000
>>>>>> CONFIG_HAVE_ARCH_PFN_VALID=y
>>>>>> CONFIG_HIGHMEM=y
>>>>>> #define SECTION_SIZE_BITS    26
>>>>>> #define MAX_PHYSADDR_BITS    32
>>>>>> #define MAX_PHYSMEM_BITS     32
>>>>
>>>>
>>>> With the patch, the addr is aligned, but the panic still occurred.
>>>
>>> Is this the same panic at move_freepages() for range [de600, de7ff]?
>>>
>>> Do you enable CONFIG_ARM_LPAE?
>>
>> No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
>> move_freepages at
>>
>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]: pfn = de600, page
>> = ef3cc000, page-flags = ffffffff, pfn2phy = de600000
>>
>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
> 
> Hmm, [de600, de7ff] is not added to the free lists, which is correct. But
> then it's unclear how the page for de600 gets to move_freepages()...
> 
> Can't say I have any bright ideas to try here...

Are we missing some checks (e.g., PageReserved()) that 
pfn_valid_within() would have "caught" before?

-- 
Thanks,

David / dhildenb



* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-03  8:07                                   ` David Hildenbrand
@ 2021-05-03  8:44                                     ` Mike Rapoport
  2021-05-06 12:47                                       ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-03  8:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Kefeng Wang, linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

On Mon, May 03, 2021 at 10:07:01AM +0200, David Hildenbrand wrote:
> On 03.05.21 08:26, Mike Rapoport wrote:
> > On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote:
> > > 
> > > 
> > > On 2021/4/30 17:51, Mike Rapoport wrote:
> > > > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
> > > > > 
> > > > > On 2021/4/29 14:57, Mike Rapoport wrote:
> > > > > 
> > > > > > > > Do you use SPARSEMEM? If yes, what is your section size?
> > > > > > > > What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
> > > > > > > Yes,
> > > > > > > 
> > > > > > > CONFIG_SPARSEMEM=y
> > > > > > > 
> > > > > > > CONFIG_SPARSEMEM_STATIC=y
> > > > > > > 
> > > > > > > CONFIG_FORCE_MAX_ZONEORDER = 11
> > > > > > > 
> > > > > > > CONFIG_PAGE_OFFSET=0xC0000000
> > > > > > > CONFIG_HAVE_ARCH_PFN_VALID=y
> > > > > > > CONFIG_HIGHMEM=y
> > > > > > > #define SECTION_SIZE_BITS    26
> > > > > > > #define MAX_PHYSADDR_BITS    32
> > > > > > > #define MAX_PHYSMEM_BITS     32
> > > > > 
> > > > > 
> > > > > With the patch, the addr is aligned, but the panic still occurred.
> > > > 
> > > > Is this the same panic at move_freepages() for range [de600, de7ff]?
> > > > 
> > > > Do you enable CONFIG_ARM_LPAE?
> > > 
> > > No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
> > > move_freepages at
> > > 
> > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]: pfn = de600, page
> > > = ef3cc000, page-flags = ffffffff, pfn2phy = de600000
> > > 
> > > > > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
> > > > > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
> > > > > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
> > 
> > Hmm, [de600, de7ff] is not added to the free lists, which is correct. But
> > then it's unclear how the page for de600 gets to move_freepages()...
> > 
> > Can't say I have any bright ideas to try here...
> 
> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
> would have "caught" before?

Unless I'm missing something, the crash happens in __rmqueue_fallback():

do_steal:
	page = get_page_from_free_area(area, fallback_mt);

	steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
								can_steal);
		-> move_freepages() 
			-> BUG()
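
(Note that move_freepages_block() first widens the range to pageblock
boundaries, so a single stray page on the free list is enough for
move_freepages() to walk pfns below it; roughly, from mm/page_alloc.c:)

	/* Sketch of the range setup in move_freepages_block(): start_pfn
	 * is rounded down to the pageblock boundary, so the scan can hit
	 * pfns whose memmap was freed by free_unused_memmap(). */
	start_pfn = page_to_pfn(page);
	start_pfn = start_pfn & ~(pageblock_nr_pages - 1);
	end_pfn = start_pfn + pageblock_nr_pages - 1;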

So a page from the free area should be sane, as the freed range was never
added to the free lists.

And honestly, with the memory layout reported elsewhere in the stack I'd
say that the bootloader/fdt beg for fixes...

-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-03  8:44                                     ` Mike Rapoport
@ 2021-05-06 12:47                                       ` Kefeng Wang
  2021-05-07  7:17                                         ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-06 12:47 UTC (permalink / raw)
  To: Mike Rapoport, David Hildenbrand
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm



On 2021/5/3 16:44, Mike Rapoport wrote:
> On Mon, May 03, 2021 at 10:07:01AM +0200, David Hildenbrand wrote:
>> On 03.05.21 08:26, Mike Rapoport wrote:
>>> On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2021/4/30 17:51, Mike Rapoport wrote:
>>>>> On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
>>>>>>
>>>>>> On 2021/4/29 14:57, Mike Rapoport wrote:
>>>>>>
>>>>>>>>> Do you use SPARSEMEM? If yes, what is your section size?
>>>>>>>>> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
>>>>>>>> Yes,
>>>>>>>>
>>>>>>>> CONFIG_SPARSEMEM=y
>>>>>>>>
>>>>>>>> CONFIG_SPARSEMEM_STATIC=y
>>>>>>>>
>>>>>>>> CONFIG_FORCE_MAX_ZONEORDER = 11
>>>>>>>>
>>>>>>>> CONFIG_PAGE_OFFSET=0xC0000000
>>>>>>>> CONFIG_HAVE_ARCH_PFN_VALID=y
>>>>>>>> CONFIG_HIGHMEM=y
>>>>>>>> #define SECTION_SIZE_BITS    26
>>>>>>>> #define MAX_PHYSADDR_BITS    32
>>>>>>>> #define MAX_PHYSMEM_BITS     32
>>>>>>
>>>>>>
>>>>>> With the patch, the addr is aligned, but the panic still occurred.
>>>>>
>>>>> Is this the same panic at move_freepages() for range [de600, de7ff]?
>>>>>
>>>>> Do you enable CONFIG_ARM_LPAE?
>>>>
>>>> No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
>>>> move_freepages at
>>>>
>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]: pfn = de600, page
>>>> = ef3cc000, page-flags = ffffffff, pfn2phy = de600000
>>>>
>>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
>>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
>>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
>>>
>>> Hmm, [de600, de7ff] is not added to the free lists, which is correct. But
>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>
>>> Can't say I have any bright ideas to try here...
>>
>> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
>> would have "caught" before?
> 
> Unless I'm missing something, the crash happens in __rmqueue_fallback():
> 
> do_steal:
> 	page = get_page_from_free_area(area, fallback_mt);
> 
> 	steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
> 								can_steal);
> 		-> move_freepages()
> 			-> BUG()
> 
> So a page from free area should be sane as the freed range was never added
> it to the free lists.

Sorry for the late response; I was on vacation.

The pfns in range [de600, de7ff] won't be added to the free lists via
__free_memory_core(), but a pfn in that range can still be added to the
free lists via free_highmem_page().

I added some debug output [1] in add_to_free_list(), and we can see the
calltrace:

free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
add_to_free_list, ===> pfn = de700
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
pfn = de700
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
Hardware name: Hisilicon A9
[<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
[<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
[<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
[<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] (add_to_free_list+0x8c/0xec)
[<c023721c>] (add_to_free_list) from [<c0237e00>] (free_pcppages_bulk+0x200/0x278)
[<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] (free_unref_page+0x58/0x68)
[<c0238d14>] (free_unref_page) from [<c023bb54>] (free_highmem_page+0xc/0x50)
[<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
[<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
[<c0700b38>] (start_kernel) from [<00000000>] (0x0)

So, any ideas?

[1] debug
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 1ba9f9f9dbd8..ee3619c04f93 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -286,7 +286,7 @@ static void __init free_highpages(void)
                 /* Truncate partial highmem entries */
                 if (start < max_low)
                         start = max_low;
-
+               pr_info("%s, range_pfn [%lx, %lx], range_addr [%x, %x]\n", __func__, start, end, range_start, range_end);
                 for (; start < end; start++)
                         free_highmem_page(pfn_to_page(start));

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 592479f43c74..920f041f0c6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -892,7 +892,14 @@ compaction_capture(struct capture_control *capc, struct page *page,
  static inline void add_to_free_list(struct page *page, struct zone *zone,
                                     unsigned int order, int migratetype)
  {
+       unsigned long pfn;
         struct free_area *area = &zone->free_area[order];
+       pfn = page_to_pfn(page);
+       if (pfn >= 0xde600 && pfn < 0xde7ff) {
+               pr_info("%s, ===> pfn = %lx", __func__, pfn);
+               WARN_ONCE(pfn == 0xde700, "pfn = %lx", pfn);
+       }



> 
> And honestly, with the memory layout reported elsewhere in the stack I'd
> say that the bootloader/fdt beg for fixes...
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-06 12:47                                       ` Kefeng Wang
@ 2021-05-07  7:17                                         ` Kefeng Wang
  2021-05-07 10:30                                           ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-07  7:17 UTC (permalink / raw)
  To: Mike Rapoport, David Hildenbrand
  Cc: linux-arm-kernel, Andrew Morton, Anshuman Khandual,
	Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm



On 2021/5/6 20:47, Kefeng Wang wrote:
> 
> 
>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
>>>>> move_freepages at
>>>>>
>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] :  pfn 
>>>>> =de600, page
>>>>> =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
>>>>>
>>>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - 
>>>>>>> b0200
>>>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - 
>>>>>>> b0200
>>>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - 
>>>>>>> b0200
>>>>
>>>> Hmm, [de600, de7ff] is not added to the free lists which is correct. 
>>>> But
>>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>>
>>>> Can't say I have any bright ideas to try here...
>>>
>>> Are we missing some checks (e.g., PageReserved()) that 
>>> pfn_valid_within()
>>> would have "caught" before?
>>
>> Unless I'm missing something the crash happens in __rmqueue_fallback():
>>
>> do_steal:
>>     page = get_page_from_free_area(area, fallback_mt);
>>
>>     steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>>                                 can_steal);
>>         -> move_freepages()
>>             -> BUG()
>>
>> So a page from free area should be sane as the freed range was never 
>> added
>> it to the free lists.
> 
> Sorry for the late response due to the vacation.
> 
> The pfn in range [de600, de7ff] won't be added into the free lists via 
> __free_memory_core(), but the pfn could be added into freelists via 
> free_highmem_page()
> 
> I add some debug[1] in add_to_free_list(), we could see the calltrace
> 
> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
> add_to_free_list, ===> pfn = de700
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
> pfn = de700
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
> Hardware name: Hisilicon A9
> [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
> [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
> [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
> [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] 
> (add_to_free_list+0x8c/0xec)
> [<c023721c>] (add_to_free_list) from [<c0237e00>] 
> (free_pcppages_bulk+0x200/0x278)
> [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] 
> (free_unref_page+0x58/0x68)
> [<c0238d14>] (free_unref_page) from [<c023bb54>] 
> (free_highmem_page+0xc/0x50)
> [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
> [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
> [<c0700b38>] (start_kernel) from [<00000000>] (0x0)
> 
> so any idea?

If pfn = 0xde700 then, because pageblock_nr_pages = 0x200, the
start_pfn/end_pfn passed to move_freepages() will be [de600, de7ff],
and the sub-range [de600, de700] has no 'struct page', which leads to
this panic when pfn_valid_within() is not enabled, i.e. without
HOLES_IN_ZONE. The same issue can occur in isolate_freepages_block(),
and maybe in other places too, so I selected HOLES_IN_ZONE in
ARCH_HISI (ARM) to solve this issue in our 5.10. Should we select
HOLES_IN_ZONE for all of ARM or only in ARCH_HISI, or is there a
better solution?  Thanks.
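
To make the rounding concrete, here is a minimal standalone sketch (an
illustration, not kernel code) using the pageblock_nr_pages = 0x200
from this configuration:

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 0x200UL	/* pageblock_nr_pages here */

int main(void)
{
	unsigned long pfn = 0xde700;	/* first valid pfn of the bank */
	unsigned long start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long end = start + PAGEBLOCK_NR_PAGES - 1;

	/* prints [de600, de7ff]: the pageblock around de700 also covers
	 * the hole [de600, de6ff] that has no usable memory behind it */
	printf("move_freepages range: [%lx, %lx]\n", start, end);
	return 0;
}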


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-07  7:17                                         ` Kefeng Wang
@ 2021-05-07 10:30                                           ` Mike Rapoport
  2021-05-07 12:34                                             ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-07 10:30 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote:
> 
> On 2021/5/6 20:47, Kefeng Wang wrote:
> > 
> > 
> > > > > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
> > > > > > move_freepages at
> > > > > > 
> > > > > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
> > > > > > :  pfn =de600, page
> > > > > > =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
> > > > > > 
> > > > > > > > __free_memory_core, range: 0xb0200000 -
> > > > > > > > 0xc0000000, pfn: b0200 - b0200
> > > > > > > > __free_memory_core, range: 0xcc000000 -
> > > > > > > > 0xdca00000, pfn: cc000 - b0200
> > > > > > > > __free_memory_core, range: 0xde700000 -
> > > > > > > > 0xdea00000, pfn: de700 - b0200
> > > > > 
> > > > > Hmm, [de600, de7ff] is not added to the free lists which is
> > > > > correct. But
> > > > > then it's unclear how the page for de600 gets to move_freepages()...
> > > > > 
> > > > > Can't say I have any bright ideas to try here...
> > > > 
> > > > Are we missing some checks (e.g., PageReserved()) that
> > > > pfn_valid_within()
> > > > would have "caught" before?
> > > 
> > > Unless I'm missing something the crash happens in __rmqueue_fallback():
> > > 
> > > do_steal:
> > >     page = get_page_from_free_area(area, fallback_mt);
> > > 
> > >     steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
> > >                                 can_steal);
> > >         -> move_freepages()
> > >             -> BUG()
> > > 
> > > So a page from free area should be sane as the freed range was never
> > > added
> > > it to the free lists.
> > 
> > Sorry for the late response due to the vacation.
> > 
> > The pfn in range [de600, de7ff] won't be added into the free lists via
> > __free_memory_core(), but the pfn could be added into freelists via
> > free_highmem_page()
> > 
> > I add some debug[1] in add_to_free_list(), we could see the calltrace
> > 
> > free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
> > free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
> > free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
> > add_to_free_list, ===> pfn = de700
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
> > pfn = de700
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
> > Hardware name: Hisilicon A9
> > [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
> > [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
> > [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
> > [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>]
> > (add_to_free_list+0x8c/0xec)
> > [<c023721c>] (add_to_free_list) from [<c0237e00>]
> > (free_pcppages_bulk+0x200/0x278)
> > [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>]
> > (free_unref_page+0x58/0x68)
> > [<c0238d14>] (free_unref_page) from [<c023bb54>]
> > (free_highmem_page+0xc/0x50)
> > [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
> > [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
> > [<c0700b38>] (start_kernel) from [<00000000>] (0x0)
> > 
> > so any idea?
> 
> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the
> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
> but the range of [de600,de700] without ‘struct page' will lead to
> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
> and the same issue will occurred in isolate_freepages_block(), maybe

I think your analysis is correct except for one minor detail. With the
#ifdef fix I've proposed earlier [1] the memmap for [0xde600, 0xde700]
should not be freed, so there should be a struct page. Did you check what
parts of the memmap are actually freed with this patch applied?
Would you get a panic if you add

	dump_page(pfn_to_page(0xde600), "");

say, at the end of memblock_free_all()?

> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
> ARCH_HISI, any better solution?  Thanks.

I don't think that HOLES_IN_ZONE is the right solution. I believe that we
must keep the memory map aligned on pageblock boundaries. That's surely not the
case for SPARSEMEM as of now, and if my fix is not enough we need to find
where it went wrong.

Besides, I'd say that if it is possible to update your firmware to make
the memory layout reported to the kernel less, hmm, esoteric, you would
hit fewer corner cases.

[1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-07 10:30                                           ` Mike Rapoport
@ 2021-05-07 12:34                                             ` Kefeng Wang
  2021-05-09  5:59                                               ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-07 12:34 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 2021/5/7 18:30, Mike Rapoport wrote:
> On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote:
>>
>> On 2021/5/6 20:47, Kefeng Wang wrote:
>>>
>>>
>>>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
>>>>>>> move_freepages at
>>>>>>>
>>>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
>>>>>>> :  pfn =de600, page
>>>>>>> =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
>>>>>>>
>>>>>>>>> __free_memory_core, range: 0xb0200000 -
>>>>>>>>> 0xc0000000, pfn: b0200 - b0200
>>>>>>>>> __free_memory_core, range: 0xcc000000 -
>>>>>>>>> 0xdca00000, pfn: cc000 - b0200
>>>>>>>>> __free_memory_core, range: 0xde700000 -
>>>>>>>>> 0xdea00000, pfn: de700 - b0200
>>>>>>
>>>>>> Hmm, [de600, de7ff] is not added to the free lists which is
>>>>>> correct. But
>>>>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>>>>
>>>>>> Can't say I have any bright ideas to try here...
>>>>>
>>>>> Are we missing some checks (e.g., PageReserved()) that
>>>>> pfn_valid_within()
>>>>> would have "caught" before?
>>>>
>>>> Unless I'm missing something the crash happens in __rmqueue_fallback():
>>>>
>>>> do_steal:
>>>>      page = get_page_from_free_area(area, fallback_mt);
>>>>
>>>>      steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>>>>                                  can_steal);
>>>>          -> move_freepages()
>>>>              -> BUG()
>>>>
>>>> So a page from free area should be sane as the freed range was never
>>>> added
>>>> it to the free lists.
>>>
>>> Sorry for the late response due to the vacation.
>>>
>>> The pfn in range [de600, de7ff] won't be added into the free lists via
>>> __free_memory_core(), but the pfn could be added into freelists via
>>> free_highmem_page()
>>>
>>> I add some debug[1] in add_to_free_list(), we could see the calltrace
>>>
>>> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
>>> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
>>> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
>>> add_to_free_list, ===> pfn = de700
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
>>> pfn = de700
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
>>> Hardware name: Hisilicon A9
>>> [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
>>> [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
>>> [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
>>> [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>]
>>> (add_to_free_list+0x8c/0xec)
>>> [<c023721c>] (add_to_free_list) from [<c0237e00>]
>>> (free_pcppages_bulk+0x200/0x278)
>>> [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>]
>>> (free_unref_page+0x58/0x68)
>>> [<c0238d14>] (free_unref_page) from [<c023bb54>]
>>> (free_highmem_page+0xc/0x50)
>>> [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
>>> [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
>>> [<c0700b38>] (start_kernel) from [<00000000>] (0x0)
>>>
>>> so any idea?
>>
>> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the
>> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
>> but the range of [de600,de700] without ‘struct page' will lead to
>> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
>> and the same issue will occurred in isolate_freepages_block(), maybe
> 
> I think your analysis is correct except one minor detail. With the #ifdef
> fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not
> be freed so there should be a struct page. Did you check what parts of the
> memmap are actually freed with this patch applied?
> Would you get a panic if you add
> 
> 	dump_page(pfn_to_page(0xde600), "");
> 
> say, in the end of memblock_free_all()?

The memory is not contiguous, see MEMBLOCK:
  memory size = 0x4c0fffff reserved size = 0x027ef058
  memory.cnt  = 0xa
  memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
  memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
  memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
  memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
  memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
  memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
  memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
...

The pfn range [0xde600, 0xde700] => addr range [0xde600000, 0xde700000]
is not available memory, and we don't create a memmap for it, so with or
without your patch we can't see the range in free_memmap(), right?

> 
>> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
>> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
>> ARCH_HISI, any better solution?  Thanks.
> 
> I don't think that HOLES_IN_ZONE is the right solution. I believe that we
> must keep the memory map aligned on pageblock boundaries. That's surely not the
> case for SPARSEMEM as of now, and if my fix is not enough we need to find
> where it went wrong.
> 
> Besides, I'd say that if it is possible to update your firmware to make the
> memory layout reported to the kernel less, hmm, esoteric, you would hit
> less corner cases.

Sorry, the memory layout is customized and we can't change it; some
memory is used for special purposes by our product.
> 
> [1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-07 12:34                                             ` Kefeng Wang
@ 2021-05-09  5:59                                               ` Mike Rapoport
  2021-05-10  3:10                                                 ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-09  5:59 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Fri, May 07, 2021 at 08:34:52PM +0800, Kefeng Wang wrote:
> 
> 
> On 2021/5/7 18:30, Mike Rapoport wrote:
> > On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote:
> > > 
> > > On 2021/5/6 20:47, Kefeng Wang wrote:
> > > > 
> > > > > > > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
> > > > > > > > move_freepages at
> > > > > > > > 
> > > > > > > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
> > > > > > > > :  pfn =de600, page
> > > > > > > > =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
> > > > > > > > 
> > > > > > > > > > __free_memory_core, range: 0xb0200000 -
> > > > > > > > > > 0xc0000000, pfn: b0200 - b0200
> > > > > > > > > > __free_memory_core, range: 0xcc000000 -
> > > > > > > > > > 0xdca00000, pfn: cc000 - b0200
> > > > > > > > > > __free_memory_core, range: 0xde700000 -
> > > > > > > > > > 0xdea00000, pfn: de700 - b0200
> > > > > > > 
> > > > > > > Hmm, [de600, de7ff] is not added to the free lists which is
> > > > > > > correct. But
> > > > > > > then it's unclear how the page for de600 gets to move_freepages()...
> > > > > > > 
> > > > > > > Can't say I have any bright ideas to try here...
> > > > > > 
> > > > > > Are we missing some checks (e.g., PageReserved()) that
> > > > > > pfn_valid_within()
> > > > > > would have "caught" before?
> > > > > 
> > > > > Unless I'm missing something the crash happens in __rmqueue_fallback():
> > > > > 
> > > > > do_steal:
> > > > >      page = get_page_from_free_area(area, fallback_mt);
> > > > > 
> > > > >      steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
> > > > >                                  can_steal);
> > > > >          -> move_freepages()
> > > > >              -> BUG()
> > > > > 
> > > > > So a page from free area should be sane as the freed range was never
> > > > > added
> > > > > it to the free lists.
> > > > 
> > > > Sorry for the late response due to the vacation.
> > > > 
> > > > The pfn in range [de600, de7ff] won't be added into the free lists via
> > > > __free_memory_core(), but the pfn could be added into freelists via
> > > > free_highmem_page()
> > > > 
> > > > I add some debug[1] in add_to_free_list(), we could see the calltrace
> > > > 
> > > > free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
> > > > free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
> > > > free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
> > > > add_to_free_list, ===> pfn = de700
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
> > > > pfn = de700
> > > > Modules linked in:
> > > > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
> > > > Hardware name: Hisilicon A9
> > > > [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
> > > > [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
> > > > [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
> > > > [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>]
> > > > (add_to_free_list+0x8c/0xec)
> > > > [<c023721c>] (add_to_free_list) from [<c0237e00>]
> > > > (free_pcppages_bulk+0x200/0x278)
> > > > [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>]
> > > > (free_unref_page+0x58/0x68)
> > > > [<c0238d14>] (free_unref_page) from [<c023bb54>]
> > > > (free_highmem_page+0xc/0x50)
> > > > [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
> > > > [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
> > > > [<c0700b38>] (start_kernel) from [<00000000>] (0x0)
> > > > 
> > > > so any idea?
> > > 
> > > If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the
> > > start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
> > > but the range of [de600,de700] without ‘struct page' will lead to
> > > this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
> > > and the same issue will occurred in isolate_freepages_block(), maybe
> > 
> > I think your analysis is correct except one minor detail. With the #ifdef
> > fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not
> > be freed so there should be a struct page. Did you check what parts of the
> > memmap are actually freed with this patch applied?
> > Would you get a panic if you add
> > 
> > 	dump_page(pfn_to_page(0xde600), "");
> > 
> > say, in the end of memblock_free_all()?
> 
> The memory is not continuous, see MEMBLOCK:
>  memory size = 0x4c0fffff reserved size = 0x027ef058
>  memory.cnt  = 0xa
>  memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>  memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>  memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>  memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>  memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>  memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>  memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
> ...
> 
> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
> is not available memory, and we won't create memmap , so with or without
> your patch, we can't see the range in free_memmap(), right?
 

This is not available memory and we won't see the range in free_memmap(),
but we should still create a memmap for it, and that's what my patch tried
to do.

There are a lot of places in core mm that operate on pageblocks and
free_unused_memmap() should make sure that any pageblock has a valid memory
map.

Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
it.
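
The invariant, in a rough sketch (an illustration of the idea with
hypothetical variable names, not the actual patch [1]): a hole's memory
map may only be freed after rounding the hole inward to pageblock
boundaries, so the pageblocks at its edges keep a complete memory map:

	unsigned long start = ALIGN(hole_start_pfn, pageblock_nr_pages);
	unsigned long end = ALIGN_DOWN(hole_end_pfn, pageblock_nr_pages);

	if (start < end)	/* free only whole pageblocks of memmap */
		free_memmap(start, end);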

Can you please send a log with my patch applied, including the printing of
the ranges freed in free_unused_memmap() that you've used in previous
mails?
 
> > > there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
> > > this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
> > > ARCH_HISI, any better solution?  Thanks.
> > 
> > I don't think that HOLES_IN_ZONE is the right solution. I believe that we
> > must keep the memory map aligned on pageblock boundaries. That's surely not the
> > case for SPARSEMEM as of now, and if my fix is not enough we need to find
> > where it went wrong.
> > 
> > Besides, I'd say that if it is possible to update your firmware to make the
> > memory layout reported to the kernel less, hmm, esoteric, you would hit
> > less corner cases.
> 
> Sorry, memory layout is customized and we can't change it, some memory is
> for special purposes by our production.
 
I understand that this memory cannot be used by Linux, but the firmware may
supply the kernel with actual physical memory layout and then mark all
the special purpose memory that kernel should not touch as reserved.

> > [1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org
> > 

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-09  5:59                                               ` Mike Rapoport
@ 2021-05-10  3:10                                                 ` Kefeng Wang
  2021-05-11  8:48                                                   ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-10  3:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 2021/5/9 13:59, Mike Rapoport wrote:
> On Fri, May 07, 2021 at 08:34:52PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2021/5/7 18:30, Mike Rapoport wrote:
>>> On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote:
>>>>
>>>> On 2021/5/6 20:47, Kefeng Wang wrote:
>>>>>
>>>>>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
>>>>>>>>> move_freepages at
>>>>>>>>>
>>>>>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
>>>>>>>>> :  pfn =de600, page
>>>>>>>>> =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
>>>>>>>>>
>>>>>>>>>>> __free_memory_core, range: 0xb0200000 -
>>>>>>>>>>> 0xc0000000, pfn: b0200 - b0200
>>>>>>>>>>> __free_memory_core, range: 0xcc000000 -
>>>>>>>>>>> 0xdca00000, pfn: cc000 - b0200
>>>>>>>>>>> __free_memory_core, range: 0xde700000 -
>>>>>>>>>>> 0xdea00000, pfn: de700 - b0200
>>>>>>>>
>>>>>>>> Hmm, [de600, de7ff] is not added to the free lists which is
>>>>>>>> correct. But
>>>>>>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>>>>>>
>>>>>>>> Can't say I have any bright ideas to try here...
>>>>>>>
>>>>>>> Are we missing some checks (e.g., PageReserved()) that
>>>>>>> pfn_valid_within()
>>>>>>> would have "caught" before?
>>>>>>
>>>>>> Unless I'm missing something the crash happens in __rmqueue_fallback():
>>>>>>
>>>>>> do_steal:
>>>>>>       page = get_page_from_free_area(area, fallback_mt);
>>>>>>
>>>>>>       steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>>>>>>                                   can_steal);
>>>>>>           -> move_freepages()
>>>>>>               -> BUG()
>>>>>>
>>>>>> So a page from free area should be sane as the freed range was never
>>>>>> added
>>>>>> it to the free lists.
>>>>>
>>>>> Sorry for the late response due to the vacation.
>>>>>
>>>>> The pfn in range [de600, de7ff] won't be added into the free lists via
>>>>> __free_memory_core(), but the pfn could be added into freelists via
>>>>> free_highmem_page()
>>>>>
>>>>> I add some debug[1] in add_to_free_list(), we could see the calltrace
>>>>>
>>>>> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
>>>>> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
>>>>> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
>>>>> add_to_free_list, ===> pfn = de700
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
>>>>> pfn = de700
>>>>> Modules linked in:
>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
>>>>> Hardware name: Hisilicon A9
>>>>> [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
>>>>> [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
>>>>> [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
>>>>> [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>]
>>>>> (add_to_free_list+0x8c/0xec)
>>>>> [<c023721c>] (add_to_free_list) from [<c0237e00>]
>>>>> (free_pcppages_bulk+0x200/0x278)
>>>>> [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>]
>>>>> (free_unref_page+0x58/0x68)
>>>>> [<c0238d14>] (free_unref_page) from [<c023bb54>]
>>>>> (free_highmem_page+0xc/0x50)
>>>>> [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
>>>>> [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
>>>>> [<c0700b38>] (start_kernel) from [<00000000>] (0x0)
>>>>>
>>>>> so any idea?
>>>>
>>>> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the
>>>> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
>>>> but the range of [de600,de700] without ‘struct page' will lead to
>>>> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
>>>> and the same issue will occurred in isolate_freepages_block(), maybe
>>>
>>> I think your analysis is correct except one minor detail. With the #ifdef
>>> fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not
>>> be freed so there should be a struct page. Did you check what parts of the
>>> memmap are actually freed with this patch applied?
>>> Would you get a panic if you add
>>>
>>> 	dump_page(pfn_to_page(0xde600), "");
>>>
>>> say, in the end of memblock_free_all()?
>>
>> The memory is not continuous, see MEMBLOCK:
>>   memory size = 0x4c0fffff reserved size = 0x027ef058
>>   memory.cnt  = 0xa
>>   memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>>   memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>>   memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>>   memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>>   memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>>   memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>>   memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>> ...
>>
>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
>> is not available memory, and we won't create memmap , so with or without
>> your patch, we can't see the range in free_memmap(), right?
>   
> 
> This is not available memory and we won't see the reange in free_memmap(),
> but we still should create memmap for it and that's what my patch tried to
> do.
> 
> There are a lot of places in core mm that operate on pageblocks and
> free_unused_memmap() should make sure that any pageblock has a valid memory
> map.
> 
> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
> it.
> 
> Can you please send log with my patch applied and with the printing of
> ranges that are freed in free_unused_memmap() you've used in previous
> mails?
With your patch [1] and the debug print in free_memmap():
----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86800, 86800000
----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e000, 8e000000
----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de400, de400000
----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
__free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
__free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
__free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
__free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
__free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
__free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
__free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
__free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
__free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450
__free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540
__free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
__free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8
__free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
__free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
__free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
__free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
__free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
__free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
__free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
__free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
__free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
__free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
__free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
__free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
__free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
__free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
__free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
__free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
__free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
__free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
__free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
__free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
__free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
free_highpages, range_pfn [e0800, e0c00], range_addr [e0800000, e0c00000]
free_highpages, range_pfn [f4b00, f7000], range_addr [f4b00000, f7000000]
free_highpages, range_pfn [fda00, fffff], range_addr [fda00000, ffffffff]

>   
>>>> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
>>>> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
>>>> ARCH_HISI, any better solution?  Thanks.
>>>
>>> I don't think that HOLES_IN_ZONE is the right solution. I believe that we
>>> must keep the memory map aligned on pageblock boundaries. That's surely not the
>>> case for SPARSEMEM as of now, and if my fix is not enough we need to find
>>> where it went wrong.
>>>
>>> Besides, I'd say that if it is possible to update your firmware to make the
>>> memory layout reported to the kernel less, hmm, esoteric, you would hit
>>> less corner cases.
>>
>> Sorry, memory layout is customized and we can't change it, some memory is
>> for special purposes by our production.
>   
> I understand that this memory cannot be used by Linux, but the firmware may
> supply the kernel with actual physical memory layout and then mark all
> the special purpose memory that kernel should not touch as reserved.
We can only modify the kernel, so that is not practicable for our
product, and it looks like a workaround anyway; we need to find a way to
solve the issue from the kernel side.

[1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-10  3:10                                                 ` Kefeng Wang
@ 2021-05-11  8:48                                                   ` Mike Rapoport
  2021-05-12  3:08                                                     ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-11  8:48 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote:
>
> > > The memory is not continuous, see MEMBLOCK:
> > >   memory size = 0x4c0fffff reserved size = 0x027ef058
> > >   memory.cnt  = 0xa
> > >   memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
> > >   memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
> > >   memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
> > >   memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
> > >   memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
> > >   memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
> > >   memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
> > > ...
> > > 
> > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
> > > is not available memory, and we won't create memmap , so with or without
> > > your patch, we can't see the range in free_memmap(), right?
> > 
> > This is not available memory and we won't see the reange in free_memmap(),
> > but we still should create memmap for it and that's what my patch tried to
> > do.
> > 
> > There are a lot of places in core mm that operate on pageblocks and
> > free_unused_memmap() should make sure that any pageblock has a valid memory
> > map.
> > 
> > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
> > it.
> > 
> > Can you please send log with my patch applied and with the printing of
> > ranges that are freed in free_unused_memmap() you've used in previous
> > mails?

> with your patch[1] and debug print in free_memmap,
> ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86800, 86800000
> ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e000, 8e000000
> ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
> ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de400, de400000
> ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
> ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
> ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000

It seems that the freeing of the memory map is still suboptimal, because
that code was not designed for a memory layout that has more holes than
Swiss cheese.

Still, the range [0xde600,0xde700] is not freed and there should be struct
pages for this range.

Can you add 

	dump_page(pfn_to_page(0xde600), "");

say, at the end of memblock_free_all()?
 
-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-11  8:48                                                   ` Mike Rapoport
@ 2021-05-12  3:08                                                     ` Kefeng Wang
  2021-05-12  8:26                                                       ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-12  3:08 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 2021/5/11 16:48, Mike Rapoport wrote:
> On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote:
>>
>>>> The memory is not continuous, see MEMBLOCK:
>>>>    memory size = 0x4c0fffff reserved size = 0x027ef058
>>>>    memory.cnt  = 0xa
>>>>    memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>>>>    memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>>>>    memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>>>>    memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>>>>    memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>>>>    memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>>>>    memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>>>> ...
>>>>
>>>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
>>>> is not available memory, and we won't create memmap , so with or without
>>>> your patch, we can't see the range in free_memmap(), right?
>>>
>>> This is not available memory and we won't see the reange in free_memmap(),
>>> but we still should create memmap for it and that's what my patch tried to
>>> do.
>>>
>>> There are a lot of places in core mm that operate on pageblocks and
>>> free_unused_memmap() should make sure that any pageblock has a valid memory
>>> map.
>>>
>>> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
>>> it.
>>>
>>> Can you please send log with my patch applied and with the printing of
>>> ranges that are freed in free_unused_memmap() you've used in previous
>>> mails?
> 
>> with your patch[1] and debug print in free_memmap,
>> ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86800, 86800000
>> ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e000, 8e000000
>> ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>> ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de400, de400000
>> ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>> ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>> ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
> 
> It seems that freeing of the memory map is suboptimal still because that
> code was not designed for memory layout that has more holes than Swiss
> cheese.
> 
> Still, the range [0xde600,0xde700] is not freed and there should be struct
> pages for this range.
> 
> Can you add
> 
> 	dump_page(pfn_to_page(0xde600), "");
> 
> say, in the end of memblock_free_all()?
>   
> 
The range [0xde600, 0xde700] is not memory, so sparse_init() won't create
struct pages for it, will it?

After applying patch [1], the dump_page() log is:

page:ef3cc000 is uninitialized and poisoned
raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
page dumped because:


[1] 
https://lore.kernel.org/linux-mm/20210512031057.13580-3-wangkefeng.wang@huawei.com/T/#u

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
       [not found]           ` <52f7d03b-7219-46bc-c62d-b976bc31ebd5@huawei.com>
  2021-04-26  5:20             ` Mike Rapoport
@ 2021-05-12  3:50             ` Matthew Wilcox
  1 sibling, 0 replies; 47+ messages in thread
From: Matthew Wilcox @ 2021-05-12  3:50 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Mike Rapoport, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> we see the PC is at PageLRU, same reason like arm64 panic log,
> 
> "PageBuddy in move_freepages returns false Then we call PageLRU, the macro
> calls PF_HEAD which is compound_page() compound_page reads
> page->compound_head, it is 0xffffffffffffffff, so it resturns
> 0xfffffffffffffffe - and accessing this address causes crash"

Oh.  I posted patches to fix this back in 2018.

https://lore.kernel.org/linux-mm/20180414043145.3953-6-willy@infradead.org/

and 2019.

https://lore.kernel.org/linux-mm/20190501202433.GC28500@bombadil.infradead.org/

and 2020.

https://lore.kernel.org/linux-mm/20200408150148.25290-6-willy@infradead.org/

Looks like it's about that time of year for me to try to fix this again.
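
For reference, the mechanics quoted above come from the tail-page
encoding in compound_head(); a trimmed paraphrase (bit 0 of
page->compound_head marks a tail page, and the all-ones poison pattern
has it set):

	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		/* a poisoned page has head == 0xffffffffffffffff: bit 0
		 * is set, so this returns 0xfffffffffffffffe and the
		 * caller dereferences a wild pointer */
		if (unlikely(head & 1))
			return (struct page *)(head - 1);
		return page;
	}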


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-12  3:08                                                     ` Kefeng Wang
@ 2021-05-12  8:26                                                       ` Mike Rapoport
  2021-05-13  3:44                                                         ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-12  8:26 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote:
> 
> On 2021/5/11 16:48, Mike Rapoport wrote:
> > On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote:
> > > 
> > > > > The memory is not continuous, see MEMBLOCK:
> > > > >    memory size = 0x4c0fffff reserved size = 0x027ef058
> > > > >    memory.cnt  = 0xa
> > > > >    memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
> > > > >    memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
> > > > >    memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
> > > > >    memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
> > > > >    memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
> > > > >    memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
> > > > >    memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
> > > > > ...
> > > > > 
> > > > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
> > > > > is not available memory, and we won't create memmap , so with or without
> > > > > your patch, we can't see the range in free_memmap(), right?
> > > > 
> > > > This is not available memory and we won't see the reange in free_memmap(),
> > > > but we still should create memmap for it and that's what my patch tried to
> > > > do.
> > > > 
> > > > There are a lot of places in core mm that operate on pageblocks and
> > > > free_unused_memmap() should make sure that any pageblock has a valid memory
> > > > map.
> > > > 
> > > > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
> > > > it.
> > > > 
> > > > Can you please send log with my patch applied and with the printing of
> > > > ranges that are freed in free_unused_memmap() you've used in previous
> > > > mails?
> > 
> > > with your patch[1] and debug print in free_memmap,
> > > ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86800, 86800000
> > > ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e000, 8e000000
> > > ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
> > > ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de400, de400000
> > > ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
> > > ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
> > > ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
> > 
> > It seems that freeing of the memory map is suboptimal still because that
> > code was not designed for memory layout that has more holes than Swiss
> > cheese.
> > 
> > Still, the range [0xde600,0xde700] is not freed and there should be struct
> > pages for this range.
> > 
> > Can you add
> > 
> > 	dump_page(pfn_to_page(0xde600), "");
> > 
> > say, in the end of memblock_free_all()?
> > 
> The range [0xde600,0xde700] is not memory, so it won't create struct page
> for it when sparse_init?

sparse_init() indeed does not create a memory map for unpopulated memory,
but it has pretty coarse granularity, i.e. 64M in your configuration. A
hole must be at least 64M for the allocation of its part of the memory map
to be skipped.

For example, your memory layout has a hole of 192M at pfn 0xc0000 and this
hole won't have the memory map.

However, the hole 0xdca00 - 0xde700 will still have a memory map in the
section that covers 0xdc000 - 0xe0000.

I've tried to outline this in a sketch below; hope it helps.

Memory:
                          c0000      cc000                      dca00
--------------------------+          +--------------------------+ +----+
 memory bank              |<- hole ->| memory bank              | | mb |
--------------------------+          +--------------------------+ +----+
                                                                de700  dea00

Memory map:

b0000    b4000            c0000      cc000   d0000    d8000    dc000
+--------+--------+- ... -+          +--------+- ... -+--------+---------+
| memmap | memmap | ...   |<- hole ->| memmap |  ...  | memmap | memmap  |
+--------+--------+- ... -+          +--------+- ... -+--------+---------+
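
A tiny standalone sketch of the section arithmetic (illustration only;
assumes 4K pages, i.e. PAGE_SHIFT = 12, with the SECTION_SIZE_BITS = 26
quoted earlier in the thread):

#include <stdio.h>

#define SECTION_SIZE_BITS 26
#define PAGE_SHIFT 12
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)

int main(void)
{
	unsigned long pfns[] = { 0xde600, 0xde700, 0xc0000 };

	for (int i = 0; i < 3; i++) {
		unsigned long sec = pfns[i] >> PFN_SECTION_SHIFT;

		/* 0xde600 and 0xde700 land in the same section,
		 * [0xdc000, 0xdffff], so the hole shares its memory map
		 * with the populated bank; 0xc0000 is in a fully
		 * unpopulated section whose memory map can be skipped */
		printf("pfn %lx -> section %lu [%lx, %lx]\n",
		       pfns[i], sec,
		       sec << PFN_SECTION_SHIFT,
		       ((sec + 1UL) << PFN_SECTION_SHIFT) - 1);
	}
	return 0;
}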


> After apply patch[1], the dump_page log,
> 
> page:ef3cc000 is uninitialized and poisoned
> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> page dumped because:

This means that there is a memory map entry, and it got poisoned during the
initialization and never got reinitialized to sensible values, which would
be PageReserved() in this case.

I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor
initialization of struct page for holes in memory layout") in the mainline
tree.

Can you backport it to your 5.10 tree and check if it helps?
 
-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-12  8:26                                                       ` Mike Rapoport
@ 2021-05-13  3:44                                                         ` Kefeng Wang
  2021-05-13 10:55                                                           ` Mike Rapoport
  0 siblings, 1 reply; 47+ messages in thread
From: Kefeng Wang @ 2021-05-13  3:44 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm, wangkefeng.wang



On 2021/5/12 16:26, Mike Rapoport wrote:
> On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote:
>>
>> On 2021/5/11 16:48, Mike Rapoport wrote:
>>> On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote:
>>>>
>>>>>> The memory is not continuous, see MEMBLOCK:
>>>>>>     memory size = 0x4c0fffff reserved size = 0x027ef058
>>>>>>     memory.cnt  = 0xa
>>>>>>     memory[0x0]    [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>>>>>>     memory[0x1]    [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>>>>>>     memory[0x2]    [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>>>>>>     memory[0x3]    [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>>>>>>     memory[0x4]    [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>>>>>>     memory[0x5]    [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>>>>>>     memory[0x6]    [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>>>>>> ...
>>>>>>
>>>>>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
>>>>>> is not available memory, and we won't create memmap , so with or without
>>>>>> your patch, we can't see the range in free_memmap(), right?
>>>>>
>>>>> This is not available memory and we won't see the reange in free_memmap(),
>>>>> but we still should create memmap for it and that's what my patch tried to
>>>>> do.
>>>>>
>>>>> There are a lot of places in core mm that operate on pageblocks and
>>>>> free_unused_memmap() should make sure that any pageblock has a valid memory
>>>>> map.
>>>>>
>>>>> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
>>>>> it.
>>>>>
>>>>> Can you please send log with my patch applied and with the printing of
>>>>> ranges that are freed in free_unused_memmap() you've used in previous
>>>>> mails?
>>>
>>>> with your patch[1] and debug print in free_memmap,
>>>> ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86800, 86800000
>>>> ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e000, 8e000000
>>>> ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>>>> ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de400, de400000
>>>> ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>>>> ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>>>> ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
>>>
>>> It seems that freeing of the memory map is suboptimal still because that
>>> code was not designed for memory layout that has more holes than Swiss
>>> cheese.
>>>
>>> Still, the range [0xde600,0xde700] is not freed and there should be struct
>>> pages for this range.
>>>
>>> Can you add
>>>
>>> 	dump_page(pfn_to_page(0xde600), "");
>>>
>>> say, in the end of memblock_free_all()?
>>>
>> The range [0xde600,0xde700] is not memory, so sparse_init() won't create
>> struct pages for it?
> 
> sparse_init() indeed does not create a memory map for unpopulated memory,
> but it has pretty coarse granularity, i.e. 64M in your configuration. A
> hole must be at least 64M for the allocation of its memory map to be
> skipped.
> 
> For example, your memory layout has a hole of 192M at pfn 0xc0000 and this
> hole won't have the memory map.
> 
> However, the hole 0xdca00 - 0xde700 will still have a memory map in the
> section that covers 0xdc000 - 0xe0000.
> 
> I've tried to outline this in a sketch below, hope it helps.
> 
> Memory:
>                            c0000      cc000                      dca00
> --------------------------+          +--------------------------+ +----+
>   memory bank              |<- hole ->| memory bank              | | mb |
> --------------------------+          +--------------------------+ +----+
>                                                                  de700  dea00
> 
> Memory map:
> 
> b0000    b4000            c0000      cc000   d0000    d8000    dc000
> +--------+--------+- ... -+          +--------+- ... -+--------+---------+
> | memmap | memmap | ...   |<- hole ->| memmap |  ...  | memmap | memmap  |
> +--------+--------+- ... -+          +--------+- ... -+--------+---------+
> 
> 
Thanks for the sketch, that makes it clearer.

>> After applying patch [1], the dump_page() log:
>>
>> page:ef3cc000 is uninitialized and poisoned
>> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>> page dumped because:
> 
> This means that there is a memory map entry, and it got poisoned during the
> initialization and never got reinitialized to sensible values, which would
> be PageReserved() in this case.
> 
> I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor
> initialization of struct page for holes in memory layout") in the mainline
> tree.
> 
> Can you backport it to your 5.10 tree and check if it helps?
>   
Hi Mike, commit 0740a50b9baa is already in 5.10, as tags/v5.10.24~5:

commit 4c84191cbc3eff49568d3c5cccb628fa382cf7fb
Author: Mike Rapoport <rppt@kernel.org>
Date:   Fri Mar 12 21:07:12 2021 -0800

    mm/page_alloc.c: refactor initialization of struct page for holes in memory layout

     commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream.

but looking at init_unavailable_range(), we still need to deal with a
hole inside a single pageblock.

In our case the pageblock spans pfns [0xde600,0xde7ff], but the available
memory only begins at pfn 0xde700.

If a pfn (e.g. 0xde600) is not valid, init_unavailable_range() skips ahead
by pageblock_nr_pages, and ALIGN_DOWN(pfn, pageblock_nr_pages) is the same
for every pfn from 0xde600 to 0xde700, so the page range [0xde600,0xde700]
never gets initialized.
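
To spell out the arithmetic, here is a tiny standalone sketch (taking
pageblock_nr_pages == 512, which is what the pageblock above implies; the
ALIGN_DOWN() below is the power-of-two case of the kernel macro):

	#include <stdio.h>

	#define ALIGN_DOWN(x, a)	((x) & ~((unsigned long)(a) - 1))

	int main(void)
	{
		unsigned long epfn = 0xde700;			/* first available pfn */
		unsigned long pfn = 0xde600;			/* start of the hole */
		unsigned long block = ALIGN_DOWN(pfn, 512);	/* == 0xde600 */

		/* arm's pfn_valid(block) is false, so the loop does
		 * pfn = block + 512 - 1; continue; and the next iteration
		 * already starts past epfn */
		pfn = block + 512 - 1;
		printf("next pfn %#lx >= epfn %#lx, so [0xde600,0xde700) is never\n"
		       "passed to __init_single_page()\n", pfn + 1, epfn);
		return 0;
	}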

After applying the following patch, the OOM test passes:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..0c7e04f86f9f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6484,13 +6484,14 @@ static u64 __meminit init_unavailable_range(unsigned long spfn,
 					    unsigned long epfn,
 					    int zone, int node)
 {
-	unsigned long pfn;
+	unsigned long pfn, pfn_down;
+	unsigned long epfn_down = ALIGN_DOWN(epfn, pageblock_nr_pages);
 	u64 pgcnt = 0;
 
 	for (pfn = spfn; pfn < epfn; pfn++) {
-		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
-			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
-				+ pageblock_nr_pages - 1;
+		pfn_down = ALIGN_DOWN(pfn, pageblock_nr_pages);
+		if (!pfn_valid(pfn_down) && pfn_down != epfn_down) {
+			pfn = pfn_down + pageblock_nr_pages - 1;
 			continue;
 		}
 		__init_single_page(pfn_to_page(pfn), pfn, zone, node);
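
The point of the extra pfn_down != epfn_down check is that the trailing,
only partially backed pageblock (here pfns [0xde600,0xde700), whose
ALIGN_DOWN() equals epfn_down) is still walked pfn by pfn instead of being
skipped wholesale, so its struct pages do get initialized.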


Before:
On node 0 totalpages: 311551
   Normal zone: 1230 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 157440 pages, LIFO batch:31
   Normal zone: 16384 pages in unavailable ranges
   HighMem zone: 154111 pages, LIFO batch:31
   HighMem zone: 1 pages in unavailable ranges

page:ef3cc000 is uninitialized and poisoned
raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

After:
On node 0 totalpages: 311551
   Normal zone: 1230 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 157440 pages, LIFO batch:31
   Normal zone: 17152 pages in unavailable ranges
   HighMem zone: 154111 pages, LIFO batch:31
   HighMem zone: 513 pages in unavailable ranges
...
page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xde600
flags: 0xdd001000(reserved)
raw: dd001000 ef3cc004 ef3cc004 00000000 00000000 00000000 ffffffff 00000001



* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-13  3:44                                                         ` Kefeng Wang
@ 2021-05-13 10:55                                                           ` Mike Rapoport
  2021-05-14  2:18                                                             ` Kefeng Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2021-05-13 10:55 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On Thu, May 13, 2021 at 11:44:00AM +0800, Kefeng Wang wrote:
> On 2021/5/12 16:26, Mike Rapoport wrote:
> [...]
> > I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor
> > initialization of struct page for holes in memory layout") in the mainline
> > tree.
> > 
> > Can you backport it to your 5.10 tree and check if it helps?
> Hi Mike, commit 0740a50b9baa is already in 5.10, as tags/v5.10.24~5:

Ah, you are using stable 5.10.y.
 
> commit 4c84191cbc3eff49568d3c5cccb628fa382cf7fb
> Author: Mike Rapoport <rppt@kernel.org>
> Date:   Fri Mar 12 21:07:12 2021 -0800
> 
>     mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
> 
>     commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream.
> 
> but looking at init_unavailable_range(), we still need to deal with a
> hole inside a single pageblock.
> 
> In our case the pageblock spans pfns [0xde600,0xde7ff], but the available
> memory only begins at pfn 0xde700.
> 
> If a pfn (e.g. 0xde600) is not valid, init_unavailable_range() skips ahead
> by pageblock_nr_pages, and ALIGN_DOWN(pfn, pageblock_nr_pages) is the same
> for every pfn from 0xde600 to 0xde700, so the page range [0xde600,0xde700]
> never gets initialized.

The pfn 0xde600 is valid in the sense that there is a memory map for that
pfn. Yet ARM's custom pfn_valid() treats it as invalid because the pfn
falls in a hole.
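
To make the difference concrete, this is roughly what the two flavours
look like (simplified from include/linux/mmzone.h and arch/arm/mm/init.c
around v5.10, so a sketch rather than the exact code):

	/* generic SPARSEMEM pfn_valid(): any pfn whose section has a
	 * memory map is valid, holes inside the section included */
	static inline int pfn_valid(unsigned long pfn)
	{
		if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
			return 0;
		return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
	}

	/* arm's custom pfn_valid(): only pfns backed by actual mapped
	 * memory pass, so 0xde600 is rejected even though its section
	 * (0xdc000 - 0xe0000) does have a memory map */
	int pfn_valid(unsigned long pfn)
	{
		return memblock_is_map_memory(__pfn_to_phys(pfn));
	}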
 
> After applying the following patch, the OOM test passes:
 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..0c7e04f86f9f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6484,13 +6484,14 @@ static u64 __meminit init_unavailable_range(unsigned long spfn,
>  					    unsigned long epfn,
>  					    int zone, int node)
>  {
> -	unsigned long pfn;
> +	unsigned long pfn, pfn_down;
> +	unsigned long epfn_down = ALIGN_DOWN(epfn, pageblock_nr_pages);
>  	u64 pgcnt = 0;
> 
>  	for (pfn = spfn; pfn < epfn; pfn++) {
> -		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> -			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> -				+ pageblock_nr_pages - 1;
> +		pfn_down = ALIGN_DOWN(pfn, pageblock_nr_pages);
> +		if (!pfn_valid(pfn_down) && pfn_down != epfn_down) {
> +			pfn = pfn_down + pageblock_nr_pages - 1;
>  			continue;
>  		}
>  		__init_single_page(pfn_to_page(pfn), pfn, zone, node);

I'd prefer to keep init_unavailable_range() as is and preserve the
assumption that the memory map always covers an entire pageblock.

Can you please try the hack below? Essentially, it makes arm with SPARSEMEM
use the generic pfn_valid() and updates the freeing of the memory map so
that entire pageblocks remain covered.

If this works I'll send formal patches for those changes.


diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 24804f11302d..86ee711a3fdb 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -73,7 +73,7 @@ config ARM
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
-	select HAVE_ARCH_PFN_VALID
+#	select HAVE_ARCH_PFN_VALID
 	select HAVE_ARCH_SECCOMP
 	select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
diff --git a/mm/memblock.c b/mm/memblock.c
index 504435753259..0d7bef1b49c3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1928,9 +1928,11 @@ static void __init free_unused_memmap(void)
 	unsigned long start, end, prev_end = 0;
 	int i;
 
+#ifndef CONFIG_ARM
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
 	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
 		return;
+#endif
 
 	/*
 	 * This relies on each bank being in address order.
@@ -1943,14 +1945,13 @@ static void __init free_unused_memmap(void)
 		 * due to SPARSEMEM sections which aren't present.
 		 */
 		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
-#else
+#endif
 		/*
 		 * Align down here since the VM subsystem insists that the
 		 * memmap entries are valid from the bank start aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
 		start = round_down(start, MAX_ORDER_NR_PAGES);
-#endif
 
 		/*
 		 * If we had a previous bank, and there is a space
 
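To double-check the result you can reuse the probe from earlier in the
thread, e.g. at the end of memblock_free_all(); with the generic
pfn_valid() it should now show an initialized, PageReserved() page instead
of poison:

	dump_page(pfn_to_page(0xde600), "");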


-- 
Sincerely yours,
Mike.


* Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
  2021-05-13 10:55                                                           ` Mike Rapoport
@ 2021-05-14  2:18                                                             ` Kefeng Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Kefeng Wang @ 2021-05-14  2:18 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-arm-kernel, Andrew Morton,
	Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 2021/5/13 18:55, Mike Rapoport wrote:
> [...]
> I'd prefer to keep init_unavailable_range() as is and preserve the
> assumption that the memory map always covers an entire pageblock.
> 
> Can you please try the hack below? Essentially, it makes arm with SPARSEMEM
> use the generic pfn_valid() and updates the freeing of the memory map so
> that entire pageblocks remain covered.
> 
> If this works I'll send formal patches for those changes.
> 
> [...]

Without HAVE_ARCH_PFN_VALID, init_unavailable_range() now marks those pages
with the Reserved flag, and yes, the OOM test passes.
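
That matches the tail of the loop in init_unavailable_range(), which
(roughly, in the 0740a50b9baa version) does:

	__init_single_page(pfn_to_page(pfn), pfn, zone, node);
	__SetPageReserved(pfn_to_page(pfn));
	pgcnt++;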

On node 0 totalpages: 311551
   Normal zone: 1230 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 157440 pages, LIFO batch:31
   Normal zone: 55552 pages in unavailable ranges
   HighMem zone: 154111 pages, LIFO batch:31
   HighMem zone: 41985 pages in unavailable ranges

Thanks for your kind guidance.




Thread overview: 47+ messages
2021-04-21  6:51 [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-04-21  6:51 ` [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid() Mike Rapoport
2021-04-21 10:49   ` Anshuman Khandual
2021-04-21  6:51 ` [PATCH v2 2/4] memblock: update initialization of reserved pages Mike Rapoport
2021-04-21  7:49   ` David Hildenbrand
2021-04-21 10:51   ` Anshuman Khandual
2021-04-21  6:51 ` [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid() Mike Rapoport
2021-04-21 10:59   ` Anshuman Khandual
2021-04-21 12:19     ` Mike Rapoport
2021-04-21 13:13       ` Anshuman Khandual
2021-04-21  6:51 ` [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-04-21  7:49   ` David Hildenbrand
2021-04-21 11:06   ` Anshuman Khandual
2021-04-21 12:24     ` Mike Rapoport
2021-04-21 13:15       ` Anshuman Khandual
2021-04-22  7:00 ` [PATCH v2 0/4] " Kefeng Wang
2021-04-22  7:29   ` Mike Rapoport
2021-04-22 15:28     ` Kefeng Wang
2021-04-23  8:11       ` Kefeng Wang
2021-04-25  7:19         ` arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()) Mike Rapoport
     [not found]           ` <52f7d03b-7219-46bc-c62d-b976bc31ebd5@huawei.com>
2021-04-26  5:20             ` Mike Rapoport
2021-04-26 15:26               ` Kefeng Wang
2021-04-27  6:23                 ` Mike Rapoport
2021-04-27 11:08                   ` Kefeng Wang
2021-04-28  5:59                     ` Mike Rapoport
2021-04-29  0:48                       ` Kefeng Wang
2021-04-29  6:57                         ` Mike Rapoport
2021-04-29 10:22                           ` Kefeng Wang
2021-04-30  9:51                             ` Mike Rapoport
2021-04-30 11:24                               ` Kefeng Wang
2021-05-03  6:26                                 ` Mike Rapoport
2021-05-03  8:07                                   ` David Hildenbrand
2021-05-03  8:44                                     ` Mike Rapoport
2021-05-06 12:47                                       ` Kefeng Wang
2021-05-07  7:17                                         ` Kefeng Wang
2021-05-07 10:30                                           ` Mike Rapoport
2021-05-07 12:34                                             ` Kefeng Wang
2021-05-09  5:59                                               ` Mike Rapoport
2021-05-10  3:10                                                 ` Kefeng Wang
2021-05-11  8:48                                                   ` Mike Rapoport
2021-05-12  3:08                                                     ` Kefeng Wang
2021-05-12  8:26                                                       ` Mike Rapoport
2021-05-13  3:44                                                         ` Kefeng Wang
2021-05-13 10:55                                                           ` Mike Rapoport
2021-05-14  2:18                                                             ` Kefeng Wang
2021-05-12  3:50             ` Matthew Wilcox
2021-04-25  6:59       ` [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
