linux-kernel.vger.kernel.org archive mirror
* [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
@ 2021-04-07 17:26 Mike Rapoport
  2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-07 17:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

Hi,

These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1. 

The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of struct
page for a pfn.

With this the core mm will be able to cope with the fact that it cannot use
NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without the need for pfn_valid_within.

The patches are only boot tested on qemu-system-aarch64 so I'd really
appreciate memory stress tests on real hardware.

If this actually works, we'll be one step closer to dropping the custom
pfn_valid() on arm64 altogether.

Mike Rapoport (3):
  memblock: update initialization of reserved pages
  arm64: decouple check whether pfn is normal memory from pfn_valid()
  arm64: drop pfn_valid_within() and simplify pfn_valid()

 arch/arm64/Kconfig              |  3 ---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 10 ++++++++--
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 mm/memblock.c                   | 23 +++++++++++++++++++++--
 8 files changed, 35 insertions(+), 12 deletions(-)


base-commit: e49d033bddf5b565044e2abe4241353959bc9120
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-07 17:26 [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-07 17:26 ` Mike Rapoport
  2021-04-08  5:16   ` Anshuman Khandual
  2021-04-14 15:12   ` David Hildenbrand
  2021-04-07 17:26 ` [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid() Mike Rapoport
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-07 17:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The struct pages representing a reserved memory region are initialized
using the reserve_bootmem_region() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.

The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().

Split out initialization of the reserved pages to a function with a
meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions and mark struct pages for the NOMAP regions as
PageReserved.
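
For illustration only, the resulting marking can be modelled in userspace C.
This is a minimal sketch under stated assumptions, not kernel code: toy_page,
toy_region, the two region tables and every toy_* helper are hypothetical
stand-ins for struct page, the memblock region lists and
reserve_bootmem_region().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_NR_PAGES 16
#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct toy_page {
	bool reserved;			/* models PageReserved */
};

struct toy_region {
	unsigned long base_pfn;		/* first pfn of the region */
	unsigned long nr_pages;		/* region length in pages */
	bool nomap;			/* models MEMBLOCK_NOMAP */
};

static struct toy_page toy_memmap[TOY_NR_PAGES];

/* Models memblock.memory: pfns 0..15, with pfns 8..11 flagged NOMAP. */
static const struct toy_region toy_memory[] = {
	{ 0, 8, false }, { 8, 4, true }, { 12, 4, false },
};

/* Models memblock.reserved: pfns 2..3. */
static const struct toy_region toy_reserved[] = {
	{ 2, 2, false },
};

/* Models reserve_bootmem_region(): mark every page in [start, end). */
static void toy_reserve_range(unsigned long start, unsigned long end)
{
	assert(start <= end && end <= TOY_NR_PAGES);
	for (unsigned long pfn = start; pfn < end; pfn++)
		toy_memmap[pfn].reserved = true;
}

/* Mirrors the shape of the new helper: reserved ranges first, then NOMAP. */
static void toy_memmap_init_reserved_pages(void)
{
	size_t i;

	for (i = 0; i < TOY_ARRAY_SIZE(toy_reserved); i++)
		toy_reserve_range(toy_reserved[i].base_pfn,
				  toy_reserved[i].base_pfn + toy_reserved[i].nr_pages);

	for (i = 0; i < TOY_ARRAY_SIZE(toy_memory); i++) {
		if (!toy_memory[i].nomap)
			continue;
		toy_reserve_range(toy_memory[i].base_pfn,
				  toy_memory[i].base_pfn + toy_memory[i].nr_pages);
	}
}
```

In this model, calling toy_memmap_init_reserved_pages() leaves pfns 2..3
(reserved) and 8..11 (NOMAP) marked as reserved and every other page untouched.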

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 mm/memblock.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..6b7ea9d86310 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 	return end_pfn - start_pfn;
 }
 
+static void __init memmap_init_reserved_pages(void)
+{
+	struct memblock_region *region;
+	phys_addr_t start, end;
+	u64 i;
+
+	/* initialize struct pages for the reserved regions */
+	for_each_reserved_mem_range(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
+	/* and also treat struct pages for the NOMAP regions as PageReserved */
+	for_each_mem_region(region) {
+		if (memblock_is_nomap(region)) {
+			start = region->base;
+			end = start + region->size;
+			reserve_bootmem_region(start, end);
+		}
+	}
+}
+
 static unsigned long __init free_low_memory_core_early(void)
 {
 	unsigned long count = 0;
@@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
 
 	memblock_clear_hotplug(0, -1);
 
-	for_each_reserved_mem_range(i, &start, &end)
-		reserve_bootmem_region(start, end);
+	memmap_init_reserved_pages();
 
 	/*
 	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-07 17:26 [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-04-07 17:26 ` Mike Rapoport
  2021-04-08  5:14   ` Anshuman Khandual
  2021-04-07 17:26 ` [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-08  5:19 ` [RFC/RFT PATCH 0/3] " Anshuman Khandual
  3 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2021-04-07 17:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.

Yet, on arm64 it is used to distinguish memory areas that are mapped in the
linear map vs those that require ioremap() to access them.

Introduce a dedicated pfn_is_memory() to perform such a check and use it
where appropriate.
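
The difference between the two questions can be sketched with a toy
userspace model. Everything here is an assumption made for illustration:
toy_region, the toy_memory table, the pfn values and the toy_* functions are
hypothetical and only mimic the intended split between "has a struct page"
and "is mapped in the linear map".

```c
#include <assert.h>	/* for the assert()-based checks in the note below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_region {
	unsigned long base_pfn;		/* first pfn of the region */
	unsigned long nr_pages;		/* region length in pages */
	bool nomap;			/* models MEMBLOCK_NOMAP */
};

/* Three adjacent memory regions; the middle one is NOMAP. */
static const struct toy_region toy_memory[] = {
	{ 0x100, 0x40, false },
	{ 0x140, 0x10, true },
	{ 0x150, 0x30, false },
};

static const struct toy_region *toy_find_region(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(toy_memory) / sizeof(toy_memory[0]); i++) {
		const struct toy_region *r = &toy_memory[i];

		if (pfn >= r->base_pfn && pfn - r->base_pfn < r->nr_pages)
			return r;
	}
	return NULL;
}

/* Intended pfn_valid() semantics: is there a struct page for this pfn? */
static bool toy_pfn_valid(unsigned long pfn)
{
	return toy_find_region(pfn) != NULL;
}

/* pfn_is_memory() semantics: is the pfn memory covered by the linear map? */
static bool toy_pfn_is_memory(unsigned long pfn)
{
	const struct toy_region *r = toy_find_region(pfn);

	return r && !r->nomap;
}

/* Shape of the converted kvm_is_device_pfn() caller. */
static bool toy_is_device_pfn(unsigned long pfn)
{
	return !toy_pfn_is_memory(pfn);
}
```

For the NOMAP range in this model, assert(toy_pfn_valid(0x140) &&
!toy_pfn_is_memory(0x140)) holds, while a pfn outside all regions such as
0x200 fails both checks.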

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/memory.h | 2 +-
 arch/arm64/include/asm/page.h   | 1 +
 arch/arm64/kvm/mmu.c            | 2 +-
 arch/arm64/mm/init.c            | 6 ++++++
 arch/arm64/mm/ioremap.c         | 4 ++--
 arch/arm64/mm/mmu.c             | 2 +-
 6 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0aabc3be9a75..7e77fdf71b9d 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
 
 #define virt_addr_valid(addr)	({					\
 	__typeof__(addr) __addr = __tag_reset(addr);			\
-	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
+	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
 })
 
 void dump_mem_limit(void);
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..32b485bcc6ff 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
 typedef struct page *pgtable_t;
 
 extern int pfn_valid(unsigned long);
+extern int pfn_is_memory(unsigned long);
 
 #include <asm/memory.h>
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..ad2ea65a3937 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
-	return !pfn_valid(pfn);
+	return !pfn_is_memory(pfn);
 }
 
 /*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3685e12aba9b..258b1905ed4a 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
 }
 EXPORT_SYMBOL(pfn_valid);
 
+int pfn_is_memory(unsigned long pfn)
+{
+	return memblock_is_map_memory(PFN_PHYS(pfn));
+}
+EXPORT_SYMBOL(pfn_is_memory);
+
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
 
 /*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..82a369b22ef5 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
 	/*
 	 * Don't allow RAM to be mapped.
 	 */
-	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+	if (WARN_ON(pfn_is_memory(__phys_to_pfn(phys_addr))))
 		return NULL;
 
 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
 void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 {
 	/* For normal memory we already have a cacheable mapping. */
-	if (pfn_valid(__phys_to_pfn(phys_addr)))
+	if (pfn_is_memory(__phys_to_pfn(phys_addr)))
 		return (void __iomem *)__phys_to_virt(phys_addr);
 
 	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5d9550fdb9cf..038d20fe163f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
-	if (!pfn_valid(pfn))
+	if (!pfn_is_memory(pfn))
 		return pgprot_noncached(vma_prot);
 	else if (file->f_flags & O_SYNC)
 		return pgprot_writecombine(vma_prot);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-07 17:26 [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
  2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
  2021-04-07 17:26 ` [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid() Mike Rapoport
@ 2021-04-07 17:26 ` Mike Rapoport
  2021-04-08  5:12   ` Anshuman Khandual
  2021-04-08  5:19 ` [RFC/RFT PATCH 0/3] " Anshuman Khandual
  3 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2021-04-07 17:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

From: Mike Rapoport <rppt@linux.ibm.com>

The arm64 version of pfn_valid() differs from the generic one for two
reasons:

* Parts of the memory map are freed during boot. This makes it necessary to
  verify that there is actual physical memory that corresponds to a pfn
  which is done by querying memblock.

* There are NOMAP memory regions. These regions are not mapped in the
  linear map and until the previous commit the struct pages representing
  these areas had default values.

Because the memory map had no special treatment of NOMAP regions, it was
necessary to use memblock_is_map_memory() in pfn_valid() and to have
pfn_valid_within() aliased to pfn_valid() so that generic mm functionality
would not treat a NOMAP page as a normal page.

Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.

pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().
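
As a rough sketch of why a per-pfn validity check becomes unnecessary, the
userspace model below walks a block in which every pfn has a struct page and
the NOMAP hole is only marked reserved. The names toy_page, toy_block,
TOY_BLOCK_PAGES and toy_count_usable() are hypothetical and do not correspond
to real kernel symbols; real pfn walkers are considerably more involved.

```c
#include <assert.h>	/* for the assert()-based check in the note below */
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_BLOCK_PAGES 8	/* stands in for a MAX_ORDER-sized block */

struct toy_page {
	bool reserved;		/* models PageReserved */
};

/*
 * Every pfn in the block has a struct page. Indexes 3 and 4 model a
 * NOMAP hole that has already been marked reserved during boot.
 */
static const struct toy_page toy_block[TOY_BLOCK_PAGES] = {
	[3] = { .reserved = true },
	[4] = { .reserved = true },
};

/*
 * A walker that never asks "is this pfn valid within the block?".
 * It only skips pages that are marked reserved.
 */
static unsigned int toy_count_usable(const struct toy_page *block,
				     unsigned int nr_pages)
{
	unsigned int usable = 0;

	for (unsigned int i = 0; i < nr_pages; i++) {
		if (block[i].reserved)
			continue;
		usable++;
	}
	return usable;
}
```

With the two-page hole above, assert(toy_count_usable(toy_block,
TOY_BLOCK_PAGES) == 6) holds in this model.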

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/Kconfig   | 3 ---
 arch/arm64/mm/init.c | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y
 	depends on NUMA
 
-config HOLES_IN_ZONE
-	def_bool y
-
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 258b1905ed4a..bb6dd406b1f0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
 
 	/*
 	 * ZONE_DEVICE memory does not have the memblock entries.
-	 * memblock_is_map_memory() check for ZONE_DEVICE based
+	 * memblock_is_memory() check for ZONE_DEVICE based
 	 * addresses will always fail. Even the normal hotplugged
 	 * memory will never have MEMBLOCK_NOMAP flag set in their
 	 * memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
 		return pfn_section_valid(ms, pfn);
 }
 #endif
-	return memblock_is_map_memory(addr);
+	return memblock_is_memory(addr);
 }
 EXPORT_SYMBOL(pfn_valid);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-07 17:26 ` [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-08  5:12   ` Anshuman Khandual
  2021-04-08  6:17     ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: Anshuman Khandual @ 2021-04-08  5:12 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The arm64 version of pfn_valid() differs from the generic one for two
> reasons:
> 
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
> 
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
> 
> Because the memory map had no special treatment of NOMAP regions, it was
> necessary to use memblock_is_map_memory() in pfn_valid() and to have
> pfn_valid_within() aliased to pfn_valid() so that generic mm functionality
> would not treat a NOMAP page as a normal page.
> 
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.

But what about the parts of the memory map that are freed during boot
(mentioned above)? Would they not still make CONFIG_HOLES_IN_ZONE, and hence
pfn_valid_within(), applicable?

> 
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().

Just to understand this better: pfn_valid() will now return true for all
MEMBLOCK_NOMAP memory, but that is okay because core MM will still treat it
as unusable memory since it is PageReserved().

> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 4 ++--
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
>  
> -config HOLES_IN_ZONE
> -	def_bool y
> -
>  source "kernel/Kconfig.hz"
>  
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 258b1905ed4a..bb6dd406b1f0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>  
>  	/*
>  	 * ZONE_DEVICE memory does not have the memblock entries.
> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * memblock_is_memory() check for ZONE_DEVICE based
>  	 * addresses will always fail. Even the normal hotplugged
>  	 * memory will never have MEMBLOCK_NOMAP flag set in their
>  	 * memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>  		return pfn_section_valid(ms, pfn);
>  }
>  #endif
> -	return memblock_is_map_memory(addr);
> +	return memblock_is_memory(addr);
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-07 17:26 ` [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid() Mike Rapoport
@ 2021-04-08  5:14   ` Anshuman Khandual
  2021-04-08  6:00     ` Mike Rapoport
  2021-04-14 15:58     ` David Hildenbrand
  0 siblings, 2 replies; 25+ messages in thread
From: Anshuman Khandual @ 2021-04-08  5:14 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm


On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.

Should there be a comment affirming this semantics interpretation, above the
generic pfn_valid() in include/linux/mmzone.h ?

> 
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map vs those that require ioremap() to access them.
> 
> Introduce a dedicated pfn_is_memory() to perform such a check and use it
> where appropriate.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/include/asm/memory.h | 2 +-
>  arch/arm64/include/asm/page.h   | 1 +
>  arch/arm64/kvm/mmu.c            | 2 +-
>  arch/arm64/mm/init.c            | 6 ++++++
>  arch/arm64/mm/ioremap.c         | 4 ++--
>  arch/arm64/mm/mmu.c             | 2 +-
>  6 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0aabc3be9a75..7e77fdf71b9d 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>  
>  #define virt_addr_valid(addr)	({					\
>  	__typeof__(addr) __addr = __tag_reset(addr);			\
> -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
>  })
>  
>  void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..32b485bcc6ff 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>  typedef struct page *pgtable_t;
>  
>  extern int pfn_valid(unsigned long);
> +extern int pfn_is_memory(unsigned long);
>  
>  #include <asm/memory.h>
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8711894db8c2..ad2ea65a3937 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
>  {
> -	return !pfn_valid(pfn);
> +	return !pfn_is_memory(pfn);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 3685e12aba9b..258b1905ed4a 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> +int pfn_is_memory(unsigned long pfn)
> +{
> +	return memblock_is_map_memory(PFN_PHYS(pfn));
> +}
> +EXPORT_SYMBOL(pfn_is_memory);
> +

Should this not be generic, though? There is nothing platform or arm64
specific in here. Also, since pfn_is_memory() just indicates that the pfn
is linear mapped, should it not be renamed to pfn_is_linear_memory()
instead? Regardless, it's fine either way.

>  static phys_addr_t memory_limit = PHYS_ADDR_MAX;
>  
>  /*
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index b5e83c46b23e..82a369b22ef5 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
>  	/*
>  	 * Don't allow RAM to be mapped.
>  	 */
> -	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
> +	if (WARN_ON(pfn_is_memory(__phys_to_pfn(phys_addr))))
>  		return NULL;
>  
>  	area = get_vm_area_caller(size, VM_IOREMAP, caller);
> @@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
>  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>  {
>  	/* For normal memory we already have a cacheable mapping. */
> -	if (pfn_valid(__phys_to_pfn(phys_addr)))
> +	if (pfn_is_memory(__phys_to_pfn(phys_addr)))
>  		return (void __iomem *)__phys_to_virt(phys_addr);
>  
>  	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 5d9550fdb9cf..038d20fe163f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  			      unsigned long size, pgprot_t vma_prot)
>  {
> -	if (!pfn_valid(pfn))
> +	if (!pfn_is_memory(pfn))
>  		return pgprot_noncached(vma_prot);
>  	else if (file->f_flags & O_SYNC)
>  		return pgprot_writecombine(vma_prot);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
@ 2021-04-08  5:16   ` Anshuman Khandual
  2021-04-08  5:48     ` Mike Rapoport
  2021-04-14 15:12   ` David Hildenbrand
  1 sibling, 1 reply; 25+ messages in thread
From: Anshuman Khandual @ 2021-04-08  5:16 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm



On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The struct pages representing a reserved memory region are initialized
> using the reserve_bootmem_region() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.

This would definitely need an update to the comment on the MEMBLOCK_NOMAP
definition in include/linux/memblock.h, just to make the semantics clear,
though arm64 is currently the only user of MEMBLOCK_NOMAP.

> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  mm/memblock.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..6b7ea9d86310 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  	return end_pfn - start_pfn;
>  }
>  
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}
> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>  	unsigned long count = 0;
> @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
>  
>  	memblock_clear_hotplug(0, -1);
>  
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>  
>  	/*
>  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-07 17:26 [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
                   ` (2 preceding siblings ...)
  2021-04-07 17:26 ` [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
@ 2021-04-08  5:19 ` Anshuman Khandual
  2021-04-08  6:27   ` Mike Rapoport
  3 siblings, 1 reply; 25+ messages in thread
From: Anshuman Khandual @ 2021-04-08  5:19 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm, James Morse

Adding James here.

+ James Morse <james.morse@arm.com>

On 4/7/21 10:56 PM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Hi,
> 
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1. 

That would be really great for the arm64 platform, as it will save CPU cycles
on many generic MM paths, given that our pfn_valid() has been expensive.

> 
> The idea is to mark NOMAP pages as reserved in the memory map and restore

Though I am not really sure, could that be problematic for UEFI/EFI use
cases, since they might have treated these as normal struct pages until now?

> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.

Right, that would be better as the current semantics is not ideal.

> 
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
> 
> The patches are only boot tested on qemu-system-aarch64 so I'd really
> appreciate memory stress tests on real hardware.

I did some preliminary memory stress tests on a guest with portions of memory
marked as MEMBLOCK_NOMAP and did not find any obvious problems. But this might
require some testing on a real UEFI environment with firmware using
MEMBLOCK_NOMAP memory, to make sure that changing these struct pages to
PageReserved() is safe.


> 
> If this actually works, we'll be one step closer to dropping the custom
> pfn_valid() on arm64 altogether.

Right, planning to rework and respin the RFC originally sent last month.

https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khandual@arm.com/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-08  5:16   ` Anshuman Khandual
@ 2021-04-08  5:48     ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-08  5:48 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Thu, Apr 08, 2021 at 10:46:18AM +0530, Anshuman Khandual wrote:
> 
> 
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The struct pages representing a reserved memory region are initialized
> > using the reserve_bootmem_region() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> > 
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
> > 
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> 
> This would definitely need an update to the comment on the MEMBLOCK_NOMAP
> definition in include/linux/memblock.h, just to make the semantics clear,

Sure

> though arm64 is currently the only user for MEMBLOCK_NOMAP.

> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  mm/memblock.c | 23 +++++++++++++++++++++--
> >  1 file changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> >  	return end_pfn - start_pfn;
> >  }
> >  
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > +	struct memblock_region *region;
> > +	phys_addr_t start, end;
> > +	u64 i;
> > +
> > +	/* initialize struct pages for the reserved regions */
> > +	for_each_reserved_mem_range(i, &start, &end)
> > +		reserve_bootmem_region(start, end);
> > +
> > +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> > +	for_each_mem_region(region) {
> > +		if (memblock_is_nomap(region)) {
> > +			start = region->base;
> > +			end = start + region->size;
> > +			reserve_bootmem_region(start, end);
> > +		}
> > +	}
> > +}
> > +
> >  static unsigned long __init free_low_memory_core_early(void)
> >  {
> >  	unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> >  
> >  	memblock_clear_hotplug(0, -1);
> >  
> > -	for_each_reserved_mem_range(i, &start, &end)
> > -		reserve_bootmem_region(start, end);
> > +	memmap_init_reserved_pages();
> >  
> >  	/*
> >  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> > 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-08  5:14   ` Anshuman Khandual
@ 2021-04-08  6:00     ` Mike Rapoport
  2021-04-14 15:58     ` David Hildenbrand
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-08  6:00 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Thu, Apr 08, 2021 at 10:44:58AM +0530, Anshuman Khandual wrote:
> 
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The intended semantics of pfn_valid() is to verify whether there is a
> > struct page for the pfn in question and nothing else.
> 
> Should there be a comment affirming this semantics interpretation, above the
> generic pfn_valid() in include/linux/mmzone.h ?

Yeah, that would have been helpful :)
 
> > 
> > Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> > linear map vs those that require ioremap() to access them.
> > 
> > Introduce a dedicated pfn_is_memory() to perform such a check and use it
> > where appropriate.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm64/include/asm/memory.h | 2 +-
> >  arch/arm64/include/asm/page.h   | 1 +
> >  arch/arm64/kvm/mmu.c            | 2 +-
> >  arch/arm64/mm/init.c            | 6 ++++++
> >  arch/arm64/mm/ioremap.c         | 4 ++--
> >  arch/arm64/mm/mmu.c             | 2 +-
> >  6 files changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 0aabc3be9a75..7e77fdf71b9d 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> >  
> >  #define virt_addr_valid(addr)	({					\
> >  	__typeof__(addr) __addr = __tag_reset(addr);			\
> > -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> > +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
> >  })
> >  
> >  void dump_mem_limit(void);
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index 012cffc574e8..32b485bcc6ff 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
> >  typedef struct page *pgtable_t;
> >  
> >  extern int pfn_valid(unsigned long);
> > +extern int pfn_is_memory(unsigned long);
> >  
> >  #include <asm/memory.h>
> >  
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8711894db8c2..ad2ea65a3937 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> >  
> >  static bool kvm_is_device_pfn(unsigned long pfn)
> >  {
> > -	return !pfn_valid(pfn);
> > +	return !pfn_is_memory(pfn);
> >  }
> >  
> >  /*
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 3685e12aba9b..258b1905ed4a 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > +int pfn_is_memory(unsigned long pfn)
> > +{
> > +	return memblock_is_map_memory(PFN_PHYS(pfn));
> > +}
> > +EXPORT_SYMBOL(pfn_is_memory);
> > +
> 
> Should not this be generic though ? There is nothing platform or arm64
> specific in here.

As NOMAP itself is quite ARM specific, this check is currently only
relevant for arm64 and maybe arm32.
But probably having an EXPORT_SYMBOL wrapper for memblock_is_map_memory(),
say in memblock does make sense for all architectures that have
KEEP_MEMBLOCK.

> Wondering as pfn_is_memory() just indicates that the
> pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
> instead ? Regardless, it's fine either way.

Yeah, I agree that naming could be better here. I think that for a generic name
we'd need pfn_is_directly_mapped() so that it can be used on x86 ;-)
 
> >  static phys_addr_t memory_limit = PHYS_ADDR_MAX;
> >  
> >  /*
> > diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> > index b5e83c46b23e..82a369b22ef5 100644
> > --- a/arch/arm64/mm/ioremap.c
> > +++ b/arch/arm64/mm/ioremap.c
> > @@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
> >  	/*
> >  	 * Don't allow RAM to be mapped.
> >  	 */
> > -	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
> > +	if (WARN_ON(pfn_is_memory(__phys_to_pfn(phys_addr))))
> >  		return NULL;
> >  
> >  	area = get_vm_area_caller(size, VM_IOREMAP, caller);
> > @@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
> >  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
> >  {
> >  	/* For normal memory we already have a cacheable mapping. */
> > -	if (pfn_valid(__phys_to_pfn(phys_addr)))
> > +	if (pfn_is_memory(__phys_to_pfn(phys_addr)))
> >  		return (void __iomem *)__phys_to_virt(phys_addr);
> >  
> >  	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 5d9550fdb9cf..038d20fe163f 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
> >  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >  			      unsigned long size, pgprot_t vma_prot)
> >  {
> > -	if (!pfn_valid(pfn))
> > +	if (!pfn_is_memory(pfn))
> >  		return pgprot_noncached(vma_prot);
> >  	else if (file->f_flags & O_SYNC)
> >  		return pgprot_writecombine(vma_prot);
> > 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-08  5:12   ` Anshuman Khandual
@ 2021-04-08  6:17     ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-08  6:17 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Thu, Apr 08, 2021 at 10:42:43AM +0530, Anshuman Khandual wrote:
> 
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The arm64's version of pfn_valid() differs from the generic because of two
> > reasons:
> > 
> > * Parts of the memory map are freed during boot. This makes it necessary to
> >   verify that there is actual physical memory that corresponds to a pfn
> >   which is done by querying memblock.
> > 
> > * There are NOMAP memory regions. These regions are not mapped in the
> >   linear map and until the previous commit the struct pages representing
> >   these areas had default values.
> > 
> > As the consequence of absence of the special treatment of NOMAP regions in
> > the memory map it was necessary to use memblock_is_map_memory() in
> > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> > generic mm functionality would not treat a NOMAP page as a normal page.
> > 
> > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> > the rest of core mm will treat them as unusable memory and thus
> > pfn_valid_within() is no longer required at all and can be disabled by
> > removing CONFIG_HOLES_IN_ZONE on arm64.
> 
> But what about the memory map that are freed during boot (mentioned above).
> Would not they still cause CONFIG_HOLES_IN_ZONE to be applicable and hence
> pfn_valid_within() ?

The CONFIG_HOLES_IN_ZONE name is misleading, as pfn_valid_within() is
actually only required for holes within MAX_ORDER_NR_PAGES blocks (see the
comment near the pfn_valid_within() definition in mmzone.h). The freeing of
the memory map during boot avoids breaking MAX_ORDER blocks, and the holes
for which the memory map is freed are always aligned at MAX_ORDER.

AFAIU, the only case when there could be a hole in a MAX_ORDER block is
when EFI/ACPI reserves memory for its use and this memory becomes NOMAP in
the kernel. We still create struct pages for this memory, but they never
get values other than defaults, so core mm has no idea that this memory
should not be touched, hence the need for pfn_valid_within() aliased to
pfn_valid() on arm64.
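
To make the hole-in-a-block case concrete, here is a minimal user-space
sketch, not kernel code. It assumes a toy page descriptor (struct toy_page)
with a single flag standing in for PageReserved(), and a 16-page block
standing in for a MAX_ORDER block; every toy_* name is hypothetical. It
illustrates the approach of this series: once the pages in the hole are
marked reserved, a walker can visit every page in the block and skip the
hole by looking at the flag, without a per-pfn validity check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed block size; stands in for MAX_ORDER_NR_PAGES. */
#define TOY_BLOCK_PAGES 16

/* Toy stand-in for struct page; only the flag of interest is modelled. */
struct toy_page {
	bool reserved;	/* models PageReserved() */
};

/* Mark pages [start, end) of the block as reserved (the NOMAP hole). */
static void toy_reserve_range(struct toy_page *block, size_t start, size_t end)
{
	for (size_t i = start; i < end && i < TOY_BLOCK_PAGES; i++)
		block[i].reserved = true;
}

/* Walk the whole block and count the pages the walker may use. */
static size_t toy_count_usable(const struct toy_page *block)
{
	size_t usable = 0;

	for (size_t i = 0; i < TOY_BLOCK_PAGES; i++)
		if (!block[i].reserved)
			usable++;

	return usable;
}
```

Every page in the block has a descriptor, so the walker never needs to ask
whether a descriptor exists; it only asks whether the page is usable.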
 
> > pfn_valid() can be slightly simplified by replacing
> > memblock_is_map_memory() with memblock_is_memory().
> 
> Just to understand this better, pfn_valid() will now return true for all
> MEMBLOCK_NOMAP based memory but that is okay as core MM would still ignore
> them as unusable memory for being PageReserved().

Right, pfn_valid() will return true for all memory, including
MEMBLOCK_NOMAP. Since core mm deals with PageReserved() for memory used by
the firmware, e.g. on x86, I don't see why it won't work on arm64.
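
The difference between the two memblock queries can be sketched with a
small user-space model. This is only an illustration under assumptions:
the region table, its addresses and all toy_* names are made up, and the
two predicates merely mimic what memblock_is_memory() and
memblock_is_map_memory() report (any registered memory vs. registered
memory without the NOMAP flag).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memblock memory region. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool nomap;	/* models MEMBLOCK_NOMAP */
};

/* Hypothetical layout: RAM, a firmware-owned NOMAP range, more RAM. */
static const struct toy_region toy_memory[] = {
	{ 0x80000000ULL, 0x10000000ULL, false },
	{ 0x90000000ULL, 0x00100000ULL, true  },
	{ 0x90100000ULL, 0x0ff00000ULL, false },
};

/* Return the region containing addr, or NULL if addr is not memory. */
static const struct toy_region *toy_find(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(toy_memory) / sizeof(toy_memory[0]); i++) {
		const struct toy_region *r = &toy_memory[i];

		if (addr >= r->base && addr - r->base < r->size)
			return r;
	}
	return NULL;
}

/* Mimics memblock_is_memory(): true for any registered memory. */
static bool toy_is_memory(uint64_t addr)
{
	return toy_find(addr) != NULL;
}

/* Mimics memblock_is_map_memory(): registered memory that is not NOMAP. */
static bool toy_is_map_memory(uint64_t addr)
{
	const struct toy_region *r = toy_find(addr);

	return r != NULL && !r->nomap;
}
```

In this model an address inside the NOMAP range is memory but not mapped
memory, which is the case where the simplified pfn_valid() and
pfn_is_memory() give different answers.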
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm64/Kconfig   | 3 ---
> >  arch/arm64/mm/init.c | 4 ++--
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e4e1b6550115..58e439046d05 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> >  	def_bool y
> >  	depends on NUMA
> >  
> > -config HOLES_IN_ZONE
> > -	def_bool y
> > -
> >  source "kernel/Kconfig.hz"
> >  
> >  config ARCH_SPARSEMEM_ENABLE
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 258b1905ed4a..bb6dd406b1f0 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
> >  
> >  	/*
> >  	 * ZONE_DEVICE memory does not have the memblock entries.
> > -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> > +	 * memblock_is_memory() check for ZONE_DEVICE based
> >  	 * addresses will always fail. Even the normal hotplugged
> >  	 * memory will never have MEMBLOCK_NOMAP flag set in their
> >  	 * memblock entries. Skip memblock search for all non early
> > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
> >  		return pfn_section_valid(ms, pfn);
> >  }
> >  #endif
> > -	return memblock_is_map_memory(addr);
> > +	return memblock_is_memory(addr);
> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid()
  2021-04-08  5:19 ` [RFC/RFT PATCH 0/3] " Anshuman Khandual
@ 2021-04-08  6:27   ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-08  6:27 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Ard Biesheuvel, Catalin Marinas,
	David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm, James Morse

On Thu, Apr 08, 2021 at 10:49:02AM +0530, Anshuman Khandual wrote:
> Adding James here.
> 
> + James Morse <james.morse@arm.com>
> 
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Hi,
> > 
> > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > pfn_valid_within() to 1. 
> 
> That would be really great for arm64 platform as it will save CPU cycles on
> many generic MM paths, given that our pfn_valid() has been expensive.
> 
> > 
> > The idea is to mark NOMAP pages as reserved in the memory map and restore
> 
> Though I am not really sure, would that possibly be problematic for UEFI/EFI
> use cases as it might have just treated them as normal struct pages till now.

I don't think there should be a problem because until now the struct pages
for UEFI/ACPI memory were never used by the core mm. They were (rightfully)
skipped by memblock_free_all() on one side, and pfn_valid() and
pfn_valid_within() return false for them in various pfn walkers on the
other side.
 
> > the intended semantics of pfn_valid() to designate availability of struct
> > page for a pfn.
> 
> Right, that would be better as the current semantics is not ideal.
> 
> > 
> > With this the core mm will be able to cope with the fact that it cannot use
> > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > will be treated correctly even without the need for pfn_valid_within.
> > 
> > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > appreciate memory stress tests on real hardware.
> 
> Did some preliminary memory stress tests on a guest with portions of memory
> marked as MEMBLOCK_NOMAP and did not find any obvious problem. But this might
> require some testing on real UEFI environment with firmware using MEMBLOCK_NOMAP
> memory to make sure that changing these struct pages to PageReserved() is safe.

I surely have no access to such machines :)
 
> > If this actually works we'll be one step closer to drop custom pfn_valid()
> > on arm64 altogether.
> 
> Right, planning to rework and respin the RFC originally sent last month.
> 
> https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khandual@arm.com/

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
  2021-04-08  5:16   ` Anshuman Khandual
@ 2021-04-14 15:12   ` David Hildenbrand
  2021-04-14 15:27     ` Ard Biesheuvel
  2021-04-14 20:06     ` Mike Rapoport
  1 sibling, 2 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-04-14 15:12 UTC (permalink / raw)
  To: Mike Rapoport, linux-arm-kernel
  Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
	Mark Rutland, Mike Rapoport, Will Deacon, kvmarm, linux-kernel,
	linux-mm

On 07.04.21 19:26, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_region() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().

I assume these pages are never given to the buddy, because we don't have 
a direct mapping. So to the kernel, it's essentially just like a memory 
hole with benefits.

I can spot that we want to export such memory like any special memory 
thingy/hole in /proc/iomem -- "reserved", which makes sense.

I would assume that MEMBLOCK_NOMAP is a special type of *reserved* 
memory. IOW, that for_each_reserved_mem_range() should already succeed 
on these as well -- we should mark anything that is MEMBLOCK_NOMAP 
implicitly as reserved. Or are there valid reasons not to do so? What 
can anyone do with that memory?

I assume they are pretty much useless for the kernel, right? Like other 
reserved memory ranges.


> 
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>   mm/memblock.c | 23 +++++++++++++++++++++--
>   1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..6b7ea9d86310 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>   	return end_pfn - start_pfn;
>   }
>   
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}
> +}
> +
>   static unsigned long __init free_low_memory_core_early(void)
>   {
>   	unsigned long count = 0;
> @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
>   
>   	memblock_clear_hotplug(0, -1);
>   
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>   
>   	/*
>   	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 15:12   ` David Hildenbrand
@ 2021-04-14 15:27     ` Ard Biesheuvel
  2021-04-14 15:52       ` David Hildenbrand
  2021-04-14 20:11       ` Mike Rapoport
  2021-04-14 20:06     ` Mike Rapoport
  1 sibling, 2 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2021-04-14 15:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <david@redhat.com> wrote:
>
> On 07.04.21 19:26, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_region() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> >
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
>
> I assume these pages are never given to the buddy, because we don't have
> a direct mapping. So to the kernel, it's essentially just like a memory
> hole with benefits.
>
> I can spot that we want to export such memory like any special memory
> thingy/hole in /proc/iomem -- "reserved", which makes sense.
>
> I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> memory. IOW, that for_each_reserved_mem_range() should already succeed
> on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> implicitly as reserved. Or are there valid reasons not to do so? What
> can anyone do with that memory?
>
> I assume they are pretty much useless for the kernel, right? Like other
> reserved memory ranges.
>

On ARM, we need to know whether any physical regions that do not
contain system memory contain something with device semantics or not.
One of the examples is ACPI tables: these are in reserved memory, and
so they are not covered by the linear region. However, when the ACPI
core ioremap()s an arbitrary memory region, we don't know whether it
is mapping a memory region or a device region unless we keep track of
this in some way. (Device mappings require device attributes, but
firmware tables require memory attributes, as they might be accessed
using misaligned reads)
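
The decision described above can be sketched as a small user-space model.
It is only an illustration under assumptions: the three address kinds, the
two attribute classes and all toy_* names are hypothetical and do not
correspond to real kernel types. The point it shows is that a range which
is known to be memory but is not in the linear map still needs memory
attributes, while a range that is not memory at all gets device attributes.

```c
#include <stdbool.h>

/* What a physical range turns out to be in this toy model. */
enum toy_kind {
	TOY_KIND_RAM,	/* memory covered by the linear map */
	TOY_KIND_NOMAP,	/* memory kept out of the linear map */
	TOY_KIND_NONE,	/* not memory at all, e.g. device registers */
};

/* Attribute class to use when a mapping has to be created. */
enum toy_attr {
	TOY_ATTR_DEVICE,
	TOY_ATTR_NORMAL,
};

struct toy_mapping {
	bool use_linear_map;	/* an existing mapping can be reused */
	enum toy_attr attr;
};

/* Pick how a range of the given kind should be accessed. */
static struct toy_mapping toy_plan_mapping(enum toy_kind kind)
{
	switch (kind) {
	case TOY_KIND_RAM:
		return (struct toy_mapping){ true, TOY_ATTR_NORMAL };
	case TOY_KIND_NOMAP:
		/* Memory without a linear mapping: map it, but as memory. */
		return (struct toy_mapping){ false, TOY_ATTR_NORMAL };
	default:
		return (struct toy_mapping){ false, TOY_ATTR_DEVICE };
	}
}
```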


>
> >
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >   mm/memblock.c | 23 +++++++++++++++++++++--
> >   1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> >       return end_pfn - start_pfn;
> >   }
> >
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > +     struct memblock_region *region;
> > +     phys_addr_t start, end;
> > +     u64 i;
> > +
> > +     /* initialize struct pages for the reserved regions */
> > +     for_each_reserved_mem_range(i, &start, &end)
> > +             reserve_bootmem_region(start, end);
> > +
> > +     /* and also treat struct pages for the NOMAP regions as PageReserved */
> > +     for_each_mem_region(region) {
> > +             if (memblock_is_nomap(region)) {
> > +                     start = region->base;
> > +                     end = start + region->size;
> > +                     reserve_bootmem_region(start, end);
> > +             }
> > +     }
> > +}
> > +
> >   static unsigned long __init free_low_memory_core_early(void)
> >   {
> >       unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> >
> >       memblock_clear_hotplug(0, -1);
> >
> > -     for_each_reserved_mem_range(i, &start, &end)
> > -             reserve_bootmem_region(start, end);
> > +     memmap_init_reserved_pages();
> >
> >       /*
> >        * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> >
>
>
> --
> Thanks,
>
> David / dhildenb
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 15:27     ` Ard Biesheuvel
@ 2021-04-14 15:52       ` David Hildenbrand
  2021-04-14 20:24         ` Mike Rapoport
  2021-04-14 20:11       ` Mike Rapoport
  1 sibling, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-04-14 15:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mike Rapoport, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On 14.04.21 17:27, Ard Biesheuvel wrote:
> On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.04.21 19:26, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> The struct pages representing a reserved memory region are initialized
>>> using reserve_bootmem_region() function. This function is called for each
>>> reserved region just before the memory is freed from memblock to the buddy
>>> page allocator.
>>>
>>> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
>>> values set by the memory map initialization which makes it necessary to
>>> have a special treatment for such pages in pfn_valid() and
>>> pfn_valid_within().
>>
>> I assume these pages are never given to the buddy, because we don't have
>> a direct mapping. So to the kernel, it's essentially just like a memory
>> hole with benefits.
>>
>> I can spot that we want to export such memory like any special memory
>> thingy/hole in /proc/iomem -- "reserved", which makes sense.
>>
>> I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
>> memory. IOW, that for_each_reserved_mem_range() should already succeed
>> on these as well -- we should mark anything that is MEMBLOCK_NOMAP
>> implicitly as reserved. Or are there valid reasons not to do so? What
>> can anyone do with that memory?
>>
>> I assume they are pretty much useless for the kernel, right? Like other
>> reserved memory ranges.
>>
> 
> On ARM, we need to know whether any physical regions that do not
> contain system memory contain something with device semantics or not.
> One of the examples is ACPI tables: these are in reserved memory, and
> so they are not covered by the linear region. However, when the ACPI
> core ioremap()s an arbitrary memory region, we don't know whether it
> is mapping a memory region or a device region unless we keep track of
> this in some way. (Device mappings require device attributes, but
> firmware tables require memory attributes, as they might be accessed
> using misaligned reads)

Using the generic-sounding NOMAP ("don't create direct mapping") to
identify device regions feels like a hack. I know, it was introduced
just for that purpose.

Looking at memblock_mark_nomap(), we consider "device regions"

1) ACPI tables

2) VIDEO_TYPE_EFI memory

3) some device-tree regions in of/fdt.c


IIUC, right now we end up creating a memmap for this NOMAP memory, but 
hide it away in pfn_valid(). This patch set at least fixes that.

Assuming these pages are never mapped to user space via the struct page 
(which better be the case), we could further use a new pagetype to mark 
these pages in a special way, such that we can identify them directly 
via pfn_to_page().

Then, we could mostly avoid having to query memblock at runtime to 
figure out that this is special memory. This would obviously be an 
extension to this series. Just a thought.
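
A minimal user-space sketch of that idea follows. It is an assumption-laden
illustration, not a proposal for real kernel code: the page type
TOY_PAGE_FIRMWARE, the 32-entry toy_memmap array and all toy_* helpers are
hypothetical. It shows pages being tagged once at boot, after which a
runtime query is a single array lookup instead of a region search.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page types; only the distinction matters here. */
enum toy_page_type {
	TOY_PAGE_NORMAL,
	TOY_PAGE_FIRMWARE,	/* stands in for the suggested new page type */
};

/* Toy stand-in for struct page. */
struct toy_page {
	enum toy_page_type type;
};

#define TOY_NR_PAGES 32

/* Toy memory map indexed by pfn; zero-initialized to TOY_PAGE_NORMAL. */
static struct toy_page toy_memmap[TOY_NR_PAGES];

/* Boot-time step: tag every page in [start_pfn, end_pfn) once. */
static void toy_tag_firmware(size_t start_pfn, size_t end_pfn)
{
	for (size_t pfn = start_pfn; pfn < end_pfn && pfn < TOY_NR_PAGES; pfn++)
		toy_memmap[pfn].type = TOY_PAGE_FIRMWARE;
}

/* Runtime query: one array lookup, no search through a region list. */
static bool toy_pfn_is_firmware(size_t pfn)
{
	return pfn < TOY_NR_PAGES && toy_memmap[pfn].type == TOY_PAGE_FIRMWARE;
}
```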

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-08  5:14   ` Anshuman Khandual
  2021-04-08  6:00     ` Mike Rapoport
@ 2021-04-14 15:58     ` David Hildenbrand
  2021-04-14 20:29       ` Mike Rapoport
  1 sibling, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-04-14 15:58 UTC (permalink / raw)
  To: Anshuman Khandual, Mike Rapoport, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland,
	Mike Rapoport, Will Deacon, kvmarm, linux-kernel, linux-mm

On 08.04.21 07:14, Anshuman Khandual wrote:
> 
> On 4/7/21 10:56 PM, Mike Rapoport wrote:
>> From: Mike Rapoport <rppt@linux.ibm.com>
>>
>> The intended semantics of pfn_valid() is to verify whether there is a
>> struct page for the pfn in question and nothing else.
> 
> Should there be a comment affirming this semantics interpretation, above the
> generic pfn_valid() in include/linux/mmzone.h ?
> 
>>
>> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
>> linear map vs those that require ioremap() to access them.
>>
>> Introduce a dedicated pfn_is_memory() to perform such check and use it
>> where appropriate.
>>
>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>> ---
>>   arch/arm64/include/asm/memory.h | 2 +-
>>   arch/arm64/include/asm/page.h   | 1 +
>>   arch/arm64/kvm/mmu.c            | 2 +-
>>   arch/arm64/mm/init.c            | 6 ++++++
>>   arch/arm64/mm/ioremap.c         | 4 ++--
>>   arch/arm64/mm/mmu.c             | 2 +-
>>   6 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index 0aabc3be9a75..7e77fdf71b9d 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>>   
>>   #define virt_addr_valid(addr)	({					\
>>   	__typeof__(addr) __addr = __tag_reset(addr);			\
>> -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
>> +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
>>   })
>>   
>>   void dump_mem_limit(void);
>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>> index 012cffc574e8..32b485bcc6ff 100644
>> --- a/arch/arm64/include/asm/page.h
>> +++ b/arch/arm64/include/asm/page.h
>> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>>   typedef struct page *pgtable_t;
>>   
>>   extern int pfn_valid(unsigned long);
>> +extern int pfn_is_memory(unsigned long);
>>   
>>   #include <asm/memory.h>
>>   
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 8711894db8c2..ad2ea65a3937 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>>   
>>   static bool kvm_is_device_pfn(unsigned long pfn)
>>   {
>> -	return !pfn_valid(pfn);
>> +	return !pfn_is_memory(pfn);
>>   }
>>   
>>   /*
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 3685e12aba9b..258b1905ed4a 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
>>   }
>>   EXPORT_SYMBOL(pfn_valid);
>>   
>> +int pfn_is_memory(unsigned long pfn)
>> +{
>> +	return memblock_is_map_memory(PFN_PHYS(pfn));
>> +}
>> +EXPORT_SYMBOL(pfn_is_memory);
>> +
> 
> Should not this be generic though ? There is nothing platform or arm64
> specific in here. Wondering as pfn_is_memory() just indicates that the
> pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
> instead ? Regardless, it's fine either way.

TBH, I dislike (generic) pfn_is_memory(). It feels like we're mixing 
concepts. NOMAP memory vs !NOMAP memory; even NOMAP is some kind of 
memory after all. pfn_is_map_memory() would be more expressive, although 
still sub-optimal.

We'd actually want some kind of arm64-specific pfn_is_system_memory() or 
the inverse pfn_is_device_memory() -- to be improved.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 15:12   ` David Hildenbrand
  2021-04-14 15:27     ` Ard Biesheuvel
@ 2021-04-14 20:06     ` Mike Rapoport
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-14 20:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-arm-kernel, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Wed, Apr 14, 2021 at 05:12:11PM +0200, David Hildenbrand wrote:
> On 07.04.21 19:26, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_region() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> > 
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
> 
> I assume these pages are never given to the buddy, because we don't have a
> direct mapping. So to the kernel, it's essentially just like a memory hole
> with benefits.

The pages should not be accessed as normal memory so they do not have a
direct (or, in ARMish, linear) mapping and are never given to buddy.
After looking at the ACPI standard I don't see a fundamental reason for this
but they've already made this mess and we need to cope with it.
 
> I can spot that we want to export such memory like any special memory
> thingy/hole in /proc/iomem -- "reserved", which makes sense.

It does, but let's wait with /proc/iomem changes. We don't really have a
100% consistent view of it on different architectures, so adding yet
another type there does not seem, well, urgent.
 
> I would assume that MEMBLOCK_NOMAP is a special type of *reserved* memory.
> IOW, that for_each_reserved_mem_range() should already succeed on these as
> well -- we should mark anything that is MEMBLOCK_NOMAP implicitly as
> reserved. Or are there valid reasons not to do so? What can anyone do with
> that memory?
> 
> I assume they are pretty much useless for the kernel, right? Like other
> reserved memory ranges.

I agree that there is a lot of commonality between NOMAP and reserved. The
problem is that even the semantics of reserved differ between
architectures. Moreover, on the same architecture there could be
E820_TYPE_RESERVED and memblock.reserved with different properties.

I'd really prefer moving in baby steps here because any change in the boot
mm can bear several months of debugging early hangs ;-)

> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >   mm/memblock.c | 23 +++++++++++++++++++++--
> >   1 file changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> >   	return end_pfn - start_pfn;
> >   }
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > +	struct memblock_region *region;
> > +	phys_addr_t start, end;
> > +	u64 i;
> > +
> > +	/* initialize struct pages for the reserved regions */
> > +	for_each_reserved_mem_range(i, &start, &end)
> > +		reserve_bootmem_region(start, end);
> > +
> > +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> > +	for_each_mem_region(region) {
> > +		if (memblock_is_nomap(region)) {
> > +			start = region->base;
> > +			end = start + region->size;
> > +			reserve_bootmem_region(start, end);
> > +		}
> > +	}
> > +}
> > +
> >   static unsigned long __init free_low_memory_core_early(void)
> >   {
> >   	unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> >   	memblock_clear_hotplug(0, -1);
> > -	for_each_reserved_mem_range(i, &start, &end)
> > -		reserve_bootmem_region(start, end);
> > +	memmap_init_reserved_pages();
> >   	/*
> >   	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 15:27     ` Ard Biesheuvel
  2021-04-14 15:52       ` David Hildenbrand
@ 2021-04-14 20:11       ` Mike Rapoport
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-14 20:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: David Hildenbrand, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On Wed, Apr 14, 2021 at 05:27:53PM +0200, Ard Biesheuvel wrote:
> On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <david@redhat.com> wrote:
> >
> > On 07.04.21 19:26, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > >
> > > The struct pages representing a reserved memory region are initialized
> > > > using reserve_bootmem_region() function. This function is called for each
> > > reserved region just before the memory is freed from memblock to the buddy
> > > page allocator.
> > >
> > > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > > values set by the memory map initialization which makes it necessary to
> > > have a special treatment for such pages in pfn_valid() and
> > > pfn_valid_within().
> >
> > I assume these pages are never given to the buddy, because we don't have
> > a direct mapping. So to the kernel, it's essentially just like a memory
> > hole with benefits.
> >
> > I can spot that we want to export such memory like any special memory
> > thingy/hole in /proc/iomem -- "reserved", which makes sense.
> >
> > I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> > memory. IOW, that for_each_reserved_mem_range() should already succeed
> > on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> > implicitly as reserved. Or are there valid reasons not to do so? What
> > can anyone do with that memory?
> >
> > I assume they are pretty much useless for the kernel, right? Like other
> > reserved memory ranges.
> >
> 
> On ARM, we need to know whether any physical regions that do not
> contain system memory contain something with device semantics or not.
> One of the examples is ACPI tables: these are in reserved memory, and
> so they are not covered by the linear region. However, when the ACPI
> core ioremap()s an arbitrary memory region, we don't know whether it
> is mapping a memory region or a device region unless we keep track of
> this in some way. (Device mappings require device attributes, but
> firmware tables require memory attributes, as they might be accessed
> using misaligned reads)

I mostly agree, but my understanding is that regions of *physical* memory
that are occupied by various pieces of EFI/ACPI information require special
treatment because it was defined this way in the ACPI spec.
And since ARM cannot tolerate aliased mappings with different caching modes,
the whole bunch of firmware memory should be ioremap()ed to access it.

> > > Split out initialization of the reserved pages to a function with a
> > > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > > reserved regions and mark struct pages for the NOMAP regions as
> > > PageReserved.
> > >
> > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > ---
> > >   mm/memblock.c | 23 +++++++++++++++++++++--
> > >   1 file changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index afaefa8fc6ab..6b7ea9d86310 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > >       return end_pfn - start_pfn;
> > >   }
> > >
> > > +static void __init memmap_init_reserved_pages(void)
> > > +{
> > > +     struct memblock_region *region;
> > > +     phys_addr_t start, end;
> > > +     u64 i;
> > > +
> > > +     /* initialize struct pages for the reserved regions */
> > > +     for_each_reserved_mem_range(i, &start, &end)
> > > +             reserve_bootmem_region(start, end);
> > > +
> > > +     /* and also treat struct pages for the NOMAP regions as PageReserved */
> > > +     for_each_mem_region(region) {
> > > +             if (memblock_is_nomap(region)) {
> > > +                     start = region->base;
> > > +                     end = start + region->size;
> > > +                     reserve_bootmem_region(start, end);
> > > +             }
> > > +     }
> > > +}
> > > +
> > >   static unsigned long __init free_low_memory_core_early(void)
> > >   {
> > >       unsigned long count = 0;
> > > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> > >
> > >       memblock_clear_hotplug(0, -1);
> > >
> > > -     for_each_reserved_mem_range(i, &start, &end)
> > > -             reserve_bootmem_region(start, end);
> > > +     memmap_init_reserved_pages();
> > >
> > >       /*
> > >        * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> > >
> >
> >
> > --
> > Thanks,
> >
> > David / dhildenb
> >
> >

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 15:52       ` David Hildenbrand
@ 2021-04-14 20:24         ` Mike Rapoport
  2021-04-15  9:30           ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2021-04-14 20:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ard Biesheuvel, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On Wed, Apr 14, 2021 at 05:52:57PM +0200, David Hildenbrand wrote:
> On 14.04.21 17:27, Ard Biesheuvel wrote:
> > On Wed, 14 Apr 2021 at 17:14, David Hildenbrand <david@redhat.com> wrote:
> > > 
> > > On 07.04.21 19:26, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > 
> > > > The struct pages representing a reserved memory region are initialized
> > > > > using reserve_bootmem_region() function. This function is called for each
> > > > reserved region just before the memory is freed from memblock to the buddy
> > > > page allocator.
> > > > 
> > > > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > > > values set by the memory map initialization which makes it necessary to
> > > > have a special treatment for such pages in pfn_valid() and
> > > > pfn_valid_within().
> > > 
> > > I assume these pages are never given to the buddy, because we don't have
> > > a direct mapping. So to the kernel, it's essentially just like a memory
> > > hole with benefits.
> > > 
> > > I can spot that we want to export such memory like any special memory
> > > thingy/hole in /proc/iomem -- "reserved", which makes sense.
> > > 
> > > I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> > > memory. IOW, that for_each_reserved_mem_range() should already succeed
> > > on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> > > implicitly as reserved. Or are there valid reasons not to do so? What
> > > can anyone do with that memory?
> > > 
> > > I assume they are pretty much useless for the kernel, right? Like other
> > > reserved memory ranges.
> > > 
> > 
> > On ARM, we need to know whether any physical regions that do not
> > contain system memory contain something with device semantics or not.
> > One of the examples is ACPI tables: these are in reserved memory, and
> > so they are not covered by the linear region. However, when the ACPI
> > core ioremap()s an arbitrary memory region, we don't know whether it
> > is mapping a memory region or a device region unless we keep track of
> > this in some way. (Device mappings require device attributes, but
> > firmware tables require memory attributes, as they might be accessed
> > using misaligned reads)
> 
> Using generically sounding NOMAP ("don't create direct mapping") to identify
> device regions feels like a hack. I know, it was introduced just for that
> purpose.
> 
> Looking at memblock_mark_nomap(), we consider "device regions"
> 
> 1) ACPI tables
> 
> 2) VIDEO_TYPE_EFI memory
> 
> 3) some device-tree regions in of/fdt.c
> 
> 
> IIUC, right now we end up creating a memmap for this NOMAP memory, but hide
> it away in pfn_valid(). This patch set at least fixes that.

Currently we have memmap entries with struct page set to defaults for the
NOMAP memory. AFAIU hiding them in pfn_valid()/pfn_valid_within() was a
solution to failures in pfn walkers that presumed that any pfn passing
pfn_valid() has a struct page that really reflects the state of that page.
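
To make the walker problem concrete, here is a minimal userspace sketch
(not kernel code) of such a pfn walker, assuming NOMAP pages have been
marked reserved the way patch 1/3 proposes. Everything prefixed with
model_/MODEL_ is made up for illustration; model_reserve_range() is only a
stand-in for the page-flag part of reserve_bootmem_region().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy memory map; every name in this block is illustrative only. */
#define MODEL_NR_PAGES		8
#define MODEL_PG_RESERVED	(1UL << 0)

struct model_page {
	unsigned long flags;
};

static struct model_page model_memmap[MODEL_NR_PAGES];

/* pfn_valid() in its intended sense: is there a struct page for this pfn? */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_NR_PAGES;
}

static bool model_page_reserved(const struct model_page *page)
{
	return (page->flags & MODEL_PG_RESERVED) != 0;
}

/* Stand-in for marking every page of a reserved or NOMAP range as reserved. */
static void model_reserve_range(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn && pfn < MODEL_NR_PAGES; pfn++)
		model_memmap[pfn].flags |= MODEL_PG_RESERVED;
}

/*
 * A pfn walker that trusts the memory map: it skips pfns without a
 * struct page and pages that are marked reserved, and counts the rest.
 */
static unsigned long model_count_usable(unsigned long start_pfn,
					unsigned long end_pfn)
{
	unsigned long pfn, count = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!model_pfn_valid(pfn))
			continue;
		if (model_page_reserved(&model_memmap[pfn]))
			continue;
		count++;
	}

	return count;
}
```

With the hole marked as reserved, the walker needs no extra knowledge
about NOMAP: the state stored in the struct page is enough to skip it.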

> Assuming these pages are never mapped to user space via the struct page
> (which better be the case), we could further use a new pagetype to mark
> these pages in a special way, such that we can identify them directly via
> pfn_to_page().

Not sure we really need a new pagetype here, PG_Reserved seems to be quite
enough to say "don't touch this".  I generally agree that we could make
PG_Reserved a PageType and then have several sub-types for reserved memory.
This definitely will add clarity but I'm not sure that this justifies the
amount of churn and effort required to audit uses of PageReserved().
 
> Then, we could mostly avoid having to query memblock at runtime to figure
> out that this is special memory. This would obviously be an extension to
> this series. Just a thought. 

Stop pushing memblock out of kernel! ;-)

Now, seriously, we can minimize memblock involvement at run time and this
series is yet another step in that direction.

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-14 15:58     ` David Hildenbrand
@ 2021-04-14 20:29       ` Mike Rapoport
  2021-04-15  9:31         ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2021-04-14 20:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, linux-arm-kernel, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Wed, Apr 14, 2021 at 05:58:26PM +0200, David Hildenbrand wrote:
> On 08.04.21 07:14, Anshuman Khandual wrote:
> > 
> > On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > 
> > > The intended semantics of pfn_valid() is to verify whether there is a
> > > struct page for the pfn in question and nothing else.
> > 
> > Should there be a comment affirming this semantics interpretation, above the
> > generic pfn_valid() in include/linux/mmzone.h ?
> > 
> > > 
> > > Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> > > linear map vs those that require ioremap() to access them.
> > > 
> > > Introduce a dedicated pfn_is_memory() to perform such check and use it
> > > where appropriate.
> > > 
> > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > ---
> > >   arch/arm64/include/asm/memory.h | 2 +-
> > >   arch/arm64/include/asm/page.h   | 1 +
> > >   arch/arm64/kvm/mmu.c            | 2 +-
> > >   arch/arm64/mm/init.c            | 6 ++++++
> > >   arch/arm64/mm/ioremap.c         | 4 ++--
> > >   arch/arm64/mm/mmu.c             | 2 +-
> > >   6 files changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > > index 0aabc3be9a75..7e77fdf71b9d 100644
> > > --- a/arch/arm64/include/asm/memory.h
> > > +++ b/arch/arm64/include/asm/memory.h
> > > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> > >   #define virt_addr_valid(addr)	({					\
> > >   	__typeof__(addr) __addr = __tag_reset(addr);			\
> > > -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> > > +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
> > >   })
> > >   void dump_mem_limit(void);
> > > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > > index 012cffc574e8..32b485bcc6ff 100644
> > > --- a/arch/arm64/include/asm/page.h
> > > +++ b/arch/arm64/include/asm/page.h
> > > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
> > >   typedef struct page *pgtable_t;
> > >   extern int pfn_valid(unsigned long);
> > > +extern int pfn_is_memory(unsigned long);
> > >   #include <asm/memory.h>
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 8711894db8c2..ad2ea65a3937 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> > >   static bool kvm_is_device_pfn(unsigned long pfn)
> > >   {
> > > -	return !pfn_valid(pfn);
> > > +	return !pfn_is_memory(pfn);
> > >   }
> > >   /*
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 3685e12aba9b..258b1905ed4a 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
> > >   }
> > >   EXPORT_SYMBOL(pfn_valid);
> > > +int pfn_is_memory(unsigned long pfn)
> > > +{
> > > +	return memblock_is_map_memory(PFN_PHYS(pfn));
> > > +}
> > > > +EXPORT_SYMBOL(pfn_is_memory);
> > > > +
> > 
> > Should not this be generic though ? There is nothing platform or arm64
> > specific in here. Wondering as pfn_is_memory() just indicates that the
> > pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
> > instead ? Regardless, it's fine either way.
> 
> TBH, I dislike (generic) pfn_is_memory(). It feels like we're mixing
> concepts.

Yeah, at the moment NOMAP is very much arm specific so I'd keep it this way
for now.

>  NOMAP memory vs !NOMAP memory; even NOMAP is some kind of memory
> after all. pfn_is_map_memory() would be more expressive, although still
> sub-optimal.
>
> We'd actually want some kind of arm64-specific pfn_is_system_memory() or the
> inverse pfn_is_device_memory() -- to be improved.

In my current version (to be posted soon) I've started with
pfn_lineary_mapped() but then ended up with pfn_mapped() to make it
"upward" compatible with architectures that use direct rather than linear
map :)

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-14 20:24         ` Mike Rapoport
@ 2021-04-15  9:30           ` David Hildenbrand
  2021-04-16 11:44             ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-04-15  9:30 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ard Biesheuvel, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

> Not sure we really need a new pagetype here, PG_Reserved seems to be quite
> enough to say "don't touch this".  I generally agree that we could make
> PG_Reserved a PageType and then have several sub-types for reserved memory.
> This definitely will add clarity but I'm not sure that this justifies the
> amount of churn and effort required to audit uses of PageReserved().
>   
>> Then, we could mostly avoid having to query memblock at runtime to figure
>> out that this is special memory. This would obviously be an extension to
>> this series. Just a thought.
> 
> Stop pushing memblock out of kernel! ;-)

Can't stop. Won't stop. :D

It's lovely for booting up a kernel until we have other data-structures 
in place ;)


-- 
Thanks,

David / dhildenb



* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-14 20:29       ` Mike Rapoport
@ 2021-04-15  9:31         ` David Hildenbrand
  2021-04-16 11:40           ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2021-04-15  9:31 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Anshuman Khandual, linux-arm-kernel, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On 14.04.21 22:29, Mike Rapoport wrote:
> On Wed, Apr 14, 2021 at 05:58:26PM +0200, David Hildenbrand wrote:
>> On 08.04.21 07:14, Anshuman Khandual wrote:
>>>
>>> On 4/7/21 10:56 PM, Mike Rapoport wrote:
>>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>>
>>>> The intended semantics of pfn_valid() is to verify whether there is a
>>>> struct page for the pfn in question and nothing else.
>>>
>>> Should there be a comment affirming this semantics interpretation, above the
>>> generic pfn_valid() in include/linux/mmzone.h ?
>>>
>>>>
>>>> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
>>>> linear map vs those that require ioremap() to access them.
>>>>
>>>> Introduce a dedicated pfn_is_memory() to perform such check and use it
>>>> where appropriate.
>>>>
>>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>>> ---
>>>>    arch/arm64/include/asm/memory.h | 2 +-
>>>>    arch/arm64/include/asm/page.h   | 1 +
>>>>    arch/arm64/kvm/mmu.c            | 2 +-
>>>>    arch/arm64/mm/init.c            | 6 ++++++
>>>>    arch/arm64/mm/ioremap.c         | 4 ++--
>>>>    arch/arm64/mm/mmu.c             | 2 +-
>>>>    6 files changed, 12 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>>> index 0aabc3be9a75..7e77fdf71b9d 100644
>>>> --- a/arch/arm64/include/asm/memory.h
>>>> +++ b/arch/arm64/include/asm/memory.h
>>>> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>>>>    #define virt_addr_valid(addr)	({					\
>>>>    	__typeof__(addr) __addr = __tag_reset(addr);			\
>>>> -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
>>>> +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
>>>>    })
>>>>    void dump_mem_limit(void);
>>>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>>>> index 012cffc574e8..32b485bcc6ff 100644
>>>> --- a/arch/arm64/include/asm/page.h
>>>> +++ b/arch/arm64/include/asm/page.h
>>>> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>>>>    typedef struct page *pgtable_t;
>>>>    extern int pfn_valid(unsigned long);
>>>> +extern int pfn_is_memory(unsigned long);
>>>>    #include <asm/memory.h>
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index 8711894db8c2..ad2ea65a3937 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>>>>    static bool kvm_is_device_pfn(unsigned long pfn)
>>>>    {
>>>> -	return !pfn_valid(pfn);
>>>> +	return !pfn_is_memory(pfn);
>>>>    }
>>>>    /*
>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>> index 3685e12aba9b..258b1905ed4a 100644
>>>> --- a/arch/arm64/mm/init.c
>>>> +++ b/arch/arm64/mm/init.c
>>>> @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
>>>>    }
>>>>    EXPORT_SYMBOL(pfn_valid);
>>>> +int pfn_is_memory(unsigned long pfn)
>>>> +{
>>>> +	return memblock_is_map_memory(PFN_PHYS(pfn));
>>>> +}
>>>> +EXPORT_SYMBOL(pfn_is_memory);
>>>> +
>>>
>>> Should not this be generic though ? There is nothing platform or arm64
>>> specific in here. Wondering as pfn_is_memory() just indicates that the
>>> pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
>>> instead ? Regardless, it's fine either way.
>>
>> TBH, I dislike (generic) pfn_is_memory(). It feels like we're mixing
>> concepts.
> 
> Yeah, at the moment NOMAP is very much arm specific so I'd keep it this way
> for now.
> 
>>   NOMAP memory vs !NOMAP memory; even NOMAP is some kind of memory
>> after all. pfn_is_map_memory() would be more expressive, although still
>> sub-optimal.
>>
>> We'd actually want some kind of arm64-specific pfn_is_system_memory() or the
>> inverse pfn_is_device_memory() -- to be improved.
> 
> In my current version (to be posted soon) I've started with
> pfn_lineary_mapped() but then ended up with pfn_mapped() to make it
> "upward" compatible with architectures that use direct rather than linear
> map :)

And even that is moot. It doesn't tell you if a PFN is *actually* mapped 
(hello secretmem).

I'd suggest just using memblock_is_map_memory() in arch-specific code.
Then it's clear what we are querying exactly and what the semantics
might be.
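
As a rough illustration of that suggestion, here is a userspace sketch
(not kernel code) of a KVM-style "is this a device pfn?" helper built
directly on a memblock-like lookup, with no pfn_is_memory() wrapper in
between. The region table, the model_/MODEL_ names and the flag value are
assumptions made up for this example; the lookup only mimics the idea of
memblock_is_map_memory(): the address must be in the memory list and the
region must not be NOMAP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are illustrative, not the kernel's. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_NOMAP		0x4UL

typedef uint64_t model_phys_addr_t;

struct model_region {
	model_phys_addr_t base;
	model_phys_addr_t size;
	unsigned long flags;
};

/* Toy stand-in for the memblock.memory region list. */
static const struct model_region model_memory[] = {
	{ 0x80000000ULL, 0x00100000ULL, 0 },		/* ordinary RAM */
	{ 0x80100000ULL, 0x00010000ULL, MODEL_NOMAP },	/* e.g. firmware tables */
	{ 0x80110000ULL, 0x00100000ULL, 0 },		/* ordinary RAM */
};

/* True only for addresses in a memory region that is not marked NOMAP. */
static bool model_is_map_memory(model_phys_addr_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(model_memory) / sizeof(model_memory[0]); i++) {
		const struct model_region *r = &model_memory[i];

		if (addr >= r->base && addr - r->base < r->size)
			return !(r->flags & MODEL_NOMAP);
	}

	return false;
}

/* The caller queries the region list directly, so the semantics are explicit. */
static bool model_is_device_pfn(unsigned long pfn)
{
	return !model_is_map_memory((model_phys_addr_t)pfn << MODEL_PAGE_SHIFT);
}
```

Written this way, the call site says exactly what is being asked: "is this
address in a memblock memory region that is not NOMAP?", rather than
implying anything about whether the pfn is currently mapped.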

-- 
Thanks,

David / dhildenb



* Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()
  2021-04-15  9:31         ` David Hildenbrand
@ 2021-04-16 11:40           ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2021-04-16 11:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, linux-arm-kernel, Ard Biesheuvel,
	Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport,
	Will Deacon, kvmarm, linux-kernel, linux-mm

On Thu, Apr 15, 2021 at 11:31:26AM +0200, David Hildenbrand wrote:
> On 14.04.21 22:29, Mike Rapoport wrote:
> > On Wed, Apr 14, 2021 at 05:58:26PM +0200, David Hildenbrand wrote:
> > > On 08.04.21 07:14, Anshuman Khandual wrote:
> > > > 
> > > > On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > > 
> > > > > The intended semantics of pfn_valid() is to verify whether there is a
> > > > > struct page for the pfn in question and nothing else.
> > > > 
> > > > Should there be a comment affirming this semantics interpretation, above the
> > > > generic pfn_valid() in include/linux/mmzone.h ?
> > > > 
> > > > > 
> > > > > Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> > > > > linear map vs those that require ioremap() to access them.
> > > > > 
> > > > > Introduce a dedicated pfn_is_memory() to perform such check and use it
> > > > > where appropriate.
> > > > > 
> > > > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > > > ---
> > > > >    arch/arm64/include/asm/memory.h | 2 +-
> > > > >    arch/arm64/include/asm/page.h   | 1 +
> > > > >    arch/arm64/kvm/mmu.c            | 2 +-
> > > > >    arch/arm64/mm/init.c            | 6 ++++++
> > > > >    arch/arm64/mm/ioremap.c         | 4 ++--
> > > > >    arch/arm64/mm/mmu.c             | 2 +-
> > > > >    6 files changed, 12 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > > > > index 0aabc3be9a75..7e77fdf71b9d 100644
> > > > > --- a/arch/arm64/include/asm/memory.h
> > > > > +++ b/arch/arm64/include/asm/memory.h
> > > > > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> > > > >    #define virt_addr_valid(addr)	({					\
> > > > >    	__typeof__(addr) __addr = __tag_reset(addr);			\
> > > > > -	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
> > > > > +	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
> > > > >    })
> > > > >    void dump_mem_limit(void);
> > > > > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > > > > index 012cffc574e8..32b485bcc6ff 100644
> > > > > --- a/arch/arm64/include/asm/page.h
> > > > > +++ b/arch/arm64/include/asm/page.h
> > > > > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
> > > > >    typedef struct page *pgtable_t;
> > > > >    extern int pfn_valid(unsigned long);
> > > > > +extern int pfn_is_memory(unsigned long);
> > > > >    #include <asm/memory.h>
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index 8711894db8c2..ad2ea65a3937 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> > > > >    static bool kvm_is_device_pfn(unsigned long pfn)
> > > > >    {
> > > > > -	return !pfn_valid(pfn);
> > > > > +	return !pfn_is_memory(pfn);
> > > > >    }
> > > > >    /*
> > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > index 3685e12aba9b..258b1905ed4a 100644
> > > > > --- a/arch/arm64/mm/init.c
> > > > > +++ b/arch/arm64/mm/init.c
> > > > > @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
> > > > >    }
> > > > >    EXPORT_SYMBOL(pfn_valid);
> > > > > +int pfn_is_memory(unsigned long pfn)
> > > > > +{
> > > > > +	return memblock_is_map_memory(PFN_PHYS(pfn));
> > > > > +}
> > > > > > +EXPORT_SYMBOL(pfn_is_memory);
> > > > > > +
> > > > 
> > > > Should not this be generic though ? There is nothing platform or arm64
> > > > specific in here. Wondering as pfn_is_memory() just indicates that the
> > > > pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
> > > > instead ? Regardless, it's fine either way.
> > > 
> > > TBH, I dislike (generic) pfn_is_memory(). It feels like we're mixing
> > > concepts.
> > 
> > Yeah, at the moment NOMAP is very much arm specific so I'd keep it this way
> > for now.
> > 
> > >   NOMAP memory vs !NOMAP memory; even NOMAP is some kind of memory
> > > after all. pfn_is_map_memory() would be more expressive, although still
> > > sub-optimal.
> > > 
> > > We'd actually want some kind of arm64-specific pfn_is_system_memory() or the
> > > inverse pfn_is_device_memory() -- to be improved.
> > 
> > In my current version (to be posted soon) I've started with
> > pfn_lineary_mapped() but then ended up with pfn_mapped() to make it
> > "upward" compatible with architectures that use direct rather than linear
> > map :)
> 
> And even that is moot. It doesn't tell you if a PFN is *actually* mapped
> (hello secretmem).
> 
> I'd suggest to just use memblock_is_map_memory() in arch specific code. Then
> it's clear what we are querying exactly and what the semantics might be.

Ok, let's export memblock_is_map_memory() for the KEEP_MEMBLOCK case.

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-15  9:30           ` David Hildenbrand
@ 2021-04-16 11:44             ` Mike Rapoport
  2021-04-16 11:54               ` David Hildenbrand
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2021-04-16 11:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ard Biesheuvel, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
> > Not sure we really need a new pagetype here, PG_Reserved seems to be quite
> > enough to say "don't touch this".  I generally agree that we could make
> > PG_Reserved a PageType and then have several sub-types for reserved memory.
> > This definitely will add clarity but I'm not sure that this justifies the
> > amount of churn and effort required to audit uses of PageReserved().
> > > Then, we could mostly avoid having to query memblock at runtime to figure
> > > out that this is special memory. This would obviously be an extension to
> > > this series. Just a thought.
> > 
> > Stop pushing memblock out of kernel! ;-)
> 
> Can't stop. Won't stop. :D
> 
> It's lovely for booting up a kernel until we have other data-structures in
> place ;)

A bit more seriously, we don't have any data structure that reliably
represents physical memory layout in an arch-independent fashion.
memblock is probably the best starting point for eventually having one.

-- 
Sincerely yours,
Mike.


* Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
  2021-04-16 11:44             ` Mike Rapoport
@ 2021-04-16 11:54               ` David Hildenbrand
  0 siblings, 0 replies; 25+ messages in thread
From: David Hildenbrand @ 2021-04-16 11:54 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ard Biesheuvel, Linux ARM, Anshuman Khandual, Catalin Marinas,
	Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon, kvmarm,
	Linux Kernel Mailing List, Linux Memory Management List

On 16.04.21 13:44, Mike Rapoport wrote:
> On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
>>> Not sure we really need a new pagetype here, PG_Reserved seems to be quite
>>> enough to say "don't touch this".  I generally agree that we could make
>>> PG_Reserved a PageType and then have several sub-types for reserved memory.
>>> This definitely will add clarity but I'm not sure that this justifies the
>>> amount of churn and effort required to audit uses of PageReserved().
>>>> Then, we could mostly avoid having to query memblock at runtime to figure
>>>> out that this is special memory. This would obviously be an extension to
>>>> this series. Just a thought.
>>>
>>> Stop pushing memblock out of kernel! ;-)
>>
>> Can't stop. Won't stop. :D
>>
>> It's lovely for booting up a kernel until we have other data-structures in
>> place ;)
> 
> A bit more seriously, we don't have any data structure that reliably
> represents physical memory layout in an arch-independent fashion.
> memblock is probably the best starting point for eventually having one.

We have the (slowish) kernel resource tree after boot and the (faster)
memmap. I don't see why we need another slowish variant.

We might be better off just extending and speeding up the kernel resource
tree.

Memblock as it stands is not a reasonable data structure to keep around
after boot: for example, it tracks boot-time allocations and reserved
regions the same way, both as reserved.

-- 
Thanks,

David / dhildenb



Thread overview: 25+ messages
2021-04-07 17:26 [RFC/RFT PATCH 0/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-04-07 17:26 ` [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages Mike Rapoport
2021-04-08  5:16   ` Anshuman Khandual
2021-04-08  5:48     ` Mike Rapoport
2021-04-14 15:12   ` David Hildenbrand
2021-04-14 15:27     ` Ard Biesheuvel
2021-04-14 15:52       ` David Hildenbrand
2021-04-14 20:24         ` Mike Rapoport
2021-04-15  9:30           ` David Hildenbrand
2021-04-16 11:44             ` Mike Rapoport
2021-04-16 11:54               ` David Hildenbrand
2021-04-14 20:11       ` Mike Rapoport
2021-04-14 20:06     ` Mike Rapoport
2021-04-07 17:26 ` [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid() Mike Rapoport
2021-04-08  5:14   ` Anshuman Khandual
2021-04-08  6:00     ` Mike Rapoport
2021-04-14 15:58     ` David Hildenbrand
2021-04-14 20:29       ` Mike Rapoport
2021-04-15  9:31         ` David Hildenbrand
2021-04-16 11:40           ` Mike Rapoport
2021-04-07 17:26 ` [RFC/RFT PATCH 3/3] arm64: drop pfn_valid_within() and simplify pfn_valid() Mike Rapoport
2021-04-08  5:12   ` Anshuman Khandual
2021-04-08  6:17     ` Mike Rapoport
2021-04-08  5:19 ` [RFC/RFT PATCH 0/3] " Anshuman Khandual
2021-04-08  6:27   ` Mike Rapoport
