* [PATCH v12 0/2] introduce memblock_next_valid_pfn() (again) for arm64
@ 2019-07-23  5:51 Hanjun Guo
  2019-07-23  5:51 ` [PATCH v12 1/2] mm: page_alloc: " Hanjun Guo
  2019-07-23  5:51 ` [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn Hanjun Guo
  0 siblings, 2 replies; 8+ messages in thread
From: Hanjun Guo @ 2019-07-23  5:51 UTC (permalink / raw)
  To: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Mike Rapoport, Will Deacon
  Cc: linux-arm-kernel, linux-mm, linux-kernel, Hanjun Guo

Here is a new version of "[PATCH v11 0/3] remain and optimize
memblock_next_valid_pfn on arm and arm64" from Jia He. Ard suggested
respinning this patch set [1].

In this version, I squashed patches 1/3 and 2/3 from v11 into one
patch, fixed a possible out-of-bounds access to the regions array, and
introduced memblock_next_valid_pfn() for arm64 only, as I don't have an
arm32 platform to test on.

Ard asked for the series to come "with the new data points added for
documentation, and crystal clear about how the meaning of PFN validity
differs between ARM and other architectures, and why the assumptions
that the optimization is based on are guaranteed to hold". To be
honest, I don't see how PFN validity differs between ARM and x86.
However, commit b92df1de5d28 ("mm: page_alloc: skip over regions of
invalid pfns where possible") also has a possible out-of-bounds access
to the regions, so I am not sure whether that is the root cause.

Testing on a HiSilicon ARM64 server (a 4-socket system) shows a
significant speedup for bootmem_init() at boot:
    
with 384G memory,
before: 13310ms
after:  1415ms
   
with 1T memory,
before: 20s
after:  2s

[1]: https://lkml.org/lkml/2019/6/10/412

Jia He (2):
  mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  mm: page_alloc: reduce unnecessary binary search in
    memblock_next_valid_pfn

 arch/arm64/Kconfig     |  1 +
 include/linux/mmzone.h |  9 +++++++
 mm/Kconfig             |  3 +++
 mm/memblock.c          | 56 ++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c        |  4 ++-
 5 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.19.1



* [PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  2019-07-23  5:51 [PATCH v12 0/2] introduce memblock_next_valid_pfn() (again) for arm64 Hanjun Guo
@ 2019-07-23  5:51 ` Hanjun Guo
  2019-07-23  8:30   ` Mike Rapoport
  2019-08-01  8:06   ` Ard Biesheuvel
  2019-07-23  5:51 ` [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn Hanjun Guo
  1 sibling, 2 replies; 8+ messages in thread
From: Hanjun Guo @ 2019-07-23  5:51 UTC (permalink / raw)
  To: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Mike Rapoport, Will Deacon
  Cc: linux-arm-kernel, linux-mm, linux-kernel, Hanjun Guo

From: Jia He <hejianet@gmail.com>

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). However, it
could cause a panic on x86_64, where the specific memory mapping makes
it skip valid pfns as well, so Daniel Vacek later reverted it.

But as suggested by Daniel Vacek, it is fine to use memblock to skip
gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.

Daniel said:
"On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
not always return the next valid one but skips more resulting in some
valid frames to be skipped (as if they were invalid). And that's why
kernel was eventually crashing on some !arm machines."

Introduce a new config option, CONFIG_HAVE_MEMBLOCK_PFN_VALID, which is
selected only for arm64, and use it to guard memblock_next_valid_pfn().

This was tested on a HiSilicon Kunpeng920 based ARM64 server; the
speedup for bootmem_init() at boot is substantial:

with 384G memory,
before: 13310ms
after:  1415ms

with 1T memory,
before: 20s
after:  2s

Suggested-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
---
 arch/arm64/Kconfig     |  1 +
 include/linux/mmzone.h |  9 +++++++++
 mm/Kconfig             |  3 +++
 mm/memblock.c          | 31 +++++++++++++++++++++++++++++++
 mm/page_alloc.c        |  4 +++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 697ea0510729..058eb26579be 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
 
 config HAVE_ARCH_PFN_VALID
 	def_bool y
+	select HAVE_MEMBLOCK_PFN_VALID
 
 config HW_PERF_EVENTS
 	def_bool y
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..24cb6bdb1759 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)	pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)	(1)
 #endif
 
+/* fallback to default definitions */
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn)	(pfn + 1)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index f0c76ba47695..c578374b6413 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
+config HAVE_MEMBLOCK_PFN_VALID
+	bool
+
 config HAVE_GENERIC_GUP
 	bool
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61ae666a..d57ba51bb9cd 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 	return 0;
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 /**
  * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..70933c40380a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5811,8 +5811,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		 * function.  They do not exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
-			if (!early_pfn_valid(pfn))
+			if (!early_pfn_valid(pfn)) {
+				pfn = next_valid_pfn(pfn) - 1;
 				continue;
+			}
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 			if (overlap_memmap_init(zone, &pfn))
-- 
2.19.1



* [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn
  2019-07-23  5:51 [PATCH v12 0/2] introduce memblock_next_valid_pfn() (again) for arm64 Hanjun Guo
  2019-07-23  5:51 ` [PATCH v12 1/2] mm: page_alloc: " Hanjun Guo
@ 2019-07-23  5:51 ` Hanjun Guo
  2019-07-23  8:33   ` Mike Rapoport
  1 sibling, 1 reply; 8+ messages in thread
From: Hanjun Guo @ 2019-07-23  5:51 UTC (permalink / raw)
  To: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Mike Rapoport, Will Deacon
  Cc: linux-arm-kernel, linux-mm, linux-kernel, Hanjun Guo

From: Jia He <hejianet@gmail.com>

After skipping some invalid pfns in memmap_init_zone(), there is still
some room for improvement.

E.g. if pfn and pfn+1 are in the same memblock region, we can simply do
pfn++ instead of doing the binary search in memblock_next_valid_pfn().

Furthermore, if the pfn is in a gap between two memory regions, skip to
the next region directly and avoid the binary search.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
---
 mm/memblock.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index d57ba51bb9cd..95d5916716a0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1256,28 +1256,53 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
 {
 	struct memblock_type *type = &memblock.memory;
+	struct memblock_region *regions = type->regions;
 	unsigned int right = type->cnt;
 	unsigned int mid, left = 0;
+	unsigned long start_pfn, end_pfn, next_start_pfn;
 	phys_addr_t addr = PFN_PHYS(++pfn);
+	static int early_region_idx __initdata_memblock = -1;
 
+	/* fast path, return pfn+1 if next pfn is in the same region */
+	if (early_region_idx != -1) {
+		start_pfn = PFN_DOWN(regions[early_region_idx].base);
+		end_pfn = PFN_DOWN(regions[early_region_idx].base +
+				regions[early_region_idx].size);
+
+		if (pfn >= start_pfn && pfn < end_pfn)
+			return pfn;
+
+		/* try slow path */
+		if (++early_region_idx == type->cnt)
+			goto slow_path;
+
+		next_start_pfn = PFN_DOWN(regions[early_region_idx].base);
+
+		if (pfn >= end_pfn && pfn <= next_start_pfn)
+			return next_start_pfn;
+	}
+
+slow_path:
+	/* slow path, do the binary searching */
 	do {
 		mid = (right + left) / 2;
 
-		if (addr < type->regions[mid].base)
+		if (addr < regions[mid].base)
 			right = mid;
-		else if (addr >= (type->regions[mid].base +
-				  type->regions[mid].size))
+		else if (addr >= (regions[mid].base + regions[mid].size))
 			left = mid + 1;
 		else {
-			/* addr is within the region, so pfn is valid */
+			early_region_idx = mid;
 			return pfn;
 		}
 	} while (left < right);
 
 	if (right == type->cnt)
 		return -1UL;
-	else
-		return PHYS_PFN(type->regions[right].base);
+
+	early_region_idx = right;
+
+	return PHYS_PFN(regions[early_region_idx].base);
 }
 EXPORT_SYMBOL(memblock_next_valid_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
-- 
2.19.1



* Re: [PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  2019-07-23  5:51 ` [PATCH v12 1/2] mm: page_alloc: " Hanjun Guo
@ 2019-07-23  8:30   ` Mike Rapoport
  2019-07-24  8:29     ` Hanjun Guo
  2019-08-01  8:06   ` Ard Biesheuvel
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2019-07-23  8:30 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Will Deacon, linux-arm-kernel, linux-mm, linux-kernel

On Tue, Jul 23, 2019 at 01:51:12PM +0800, Hanjun Guo wrote:
> From: Jia He <hejianet@gmail.com>
> 
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic on x86 due to specific memory mapping on x86_64 which will
> skip valid pfns as well, so Daniel Vacek reverted it later.
> 
> But as suggested by Daniel Vacek, it is fine to using memblock to skip
> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
> 
> Daniel said:
> "On arm and arm64, memblock is used by default. But generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more resulting in some
> valid frames to be skipped (as if they were invalid). And that's why
> kernel was eventually crashing on some !arm machines."

I think that the crash on x86 was not related to CONFIG_HAVE_ARCH_PFN_VALID
but rather to the way x86 sets up memblock.  Some of the x86 reserved
memory areas were never added to memblock.memory, which makes memblock's
view of the physical memory incomplete, and that's why
memblock_next_valid_pfn() could skip valid PFNs on x86.
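
To illustrate the point above, here is a minimal user-space sketch, not
kernel code. The struct, the function and both region tables are
hypothetical stand-ins for memblock.memory, and the pfn values are made
up. It only shows that a "next valid pfn" lookup can be no more complete
than the region list it walks: once a populated range is missing from
the list, the lookup jumps straight over it.

```c
#include <stddef.h>

/* Hypothetical stand-in for one memblock.memory region, in pfn units. */
struct toy_region {
	unsigned long start_pfn;	/* first pfn in the region */
	unsigned long end_pfn;		/* one past the last pfn */
};

/* Every populated range is listed. */
static const struct toy_region toy_complete[] = {
	{ 0x000, 0x100 }, { 0x200, 0x300 }, { 0x400, 0x500 },
};

/* Same machine, but the middle range was never registered. */
static const struct toy_region toy_incomplete[] = {
	{ 0x000, 0x100 }, { 0x400, 0x500 },
};

/*
 * Return the first pfn greater than @pfn that lies inside a listed
 * region, or -1UL if there is none. Regions must be sorted by address
 * and must not overlap. A linear scan keeps the sketch short.
 */
static unsigned long toy_next_valid_pfn(const struct toy_region *regions,
					size_t cnt, unsigned long pfn)
{
	size_t i;

	pfn++;
	for (i = 0; i < cnt; i++) {
		if (pfn < regions[i].start_pfn)
			return regions[i].start_pfn;	/* pfn is in a gap */
		if (pfn < regions[i].end_pfn)
			return pfn;			/* pfn is inside */
	}
	return -1UL;
}
```

Starting from pfn 0x150, the complete table yields 0x200, while the
incomplete one yields 0x400, so the range 0x200..0x2ff would never be
visited by a caller that trusts the result.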

> Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID and only
> selected for arm64, using the new config option to guard the
> memblock_next_valid_pfn().
 
As far as I can tell, memblock_next_valid_pfn() should work on most
architectures and not only on ARM. There should certainly be no
dependency between CONFIG_HAVE_ARCH_PFN_VALID and memblock_next_valid_pfn().

I believe that the configuration option to guard memblock_next_valid_pfn()
should be opt-out and that only x86 will require it.

> This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
> is pretty impressive for bootmem_init() at boot:
> 
> with 384G memory,
> before: 13310ms
> after:  1415ms
> 
> with 1T memory,
> before: 20s
> after:  2s
> 
> Suggested-by: Daniel Vacek <neelx@redhat.com>
> Signed-off-by: Jia He <hejianet@gmail.com>
> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> ---
>  arch/arm64/Kconfig     |  1 +
>  include/linux/mmzone.h |  9 +++++++++
>  mm/Kconfig             |  3 +++
>  mm/memblock.c          | 31 +++++++++++++++++++++++++++++++
>  mm/page_alloc.c        |  4 +++-
>  5 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 697ea0510729..058eb26579be 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
>  
>  config HAVE_ARCH_PFN_VALID
>  	def_bool y
> +	select HAVE_MEMBLOCK_PFN_VALID
>
>  config HW_PERF_EVENTS
>  	def_bool y
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 70394cabaf4e..24cb6bdb1759 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
>  #endif
>  
>  #define early_pfn_valid(pfn)	pfn_valid(pfn)
> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> +extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
> +#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)

Please make it 'static inline' and move out of '#ifdef CONFIG_SPARSEMEM'

> +#endif
>  void sparse_init(void);
>  #else
>  #define sparse_init()	do {} while (0)
> @@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
>  #define early_pfn_valid(pfn)	(1)
>  #endif
>  
> +/* fallback to default definitions */
> +#ifndef next_valid_pfn
> +#define next_valid_pfn(pfn)	(pfn + 1)

static inline as well.
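
A rough sketch of what the two 'static inline' comments could look
like, assuming the names from the patch (next_valid_pfn(),
memblock_next_valid_pfn(), CONFIG_HAVE_MEMBLOCK_PFN_VALID). The
placement within mmzone.h is not shown, and this is only an
illustration, not the code from any later revision:

```c
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
/* Prototype as declared in the patch; defined in mm/memblock.c. */
unsigned long memblock_next_valid_pfn(unsigned long pfn);

static inline unsigned long next_valid_pfn(unsigned long pfn)
{
	return memblock_next_valid_pfn(pfn);
}
#else
/* Fallback: no memblock-based skipping, just step to the next pfn. */
static inline unsigned long next_valid_pfn(unsigned long pfn)
{
	return pfn + 1;
}
#endif
```

Compared with the macro form, the inline functions type-check the
argument, and the fallback no longer depends on how the caller's
expression expands inside an unparenthesized (pfn + 1).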

> +#endif
> +
>  void memory_present(int nid, unsigned long start, unsigned long end);
>  
>  /*
> diff --git a/mm/Kconfig b/mm/Kconfig
> index f0c76ba47695..c578374b6413 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	bool
>  
> +config HAVE_MEMBLOCK_PFN_VALID
> +	bool
> +
>  config HAVE_GENERIC_GUP
>  	bool
>  
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7d4f61ae666a..d57ba51bb9cd 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>  	return 0;
>  }
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +
> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
> +{
> +	struct memblock_type *type = &memblock.memory;
> +	unsigned int right = type->cnt;
> +	unsigned int mid, left = 0;
> +	phys_addr_t addr = PFN_PHYS(++pfn);
> +
> +	do {
> +		mid = (right + left) / 2;
> +
> +		if (addr < type->regions[mid].base)
> +			right = mid;
> +		else if (addr >= (type->regions[mid].base +
> +				  type->regions[mid].size))
> +			left = mid + 1;
> +		else {
> +			/* addr is within the region, so pfn is valid */
> +			return pfn;
> +		}
> +	} while (left < right);
> +

We have memblock_search() for this.
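
For illustration, a user-space sketch of how the lookup might be split
once the binary search is delegated to a helper. toy_search() is only
modelled on memblock_search(), which returns the index of the region
containing the address or -1. The struct, TOY_PAGE_SHIFT and the linear
scan for the following region are assumptions made for this sketch:

```c
#define TOY_PAGE_SHIFT	12	/* assumed 4K pages */

/* Hypothetical stand-in for struct memblock_region. */
struct toy_region {
	unsigned long long base;
	unsigned long long size;
};

/* Index of the region containing @addr, or -1 if it lies in a gap. */
static int toy_search(const struct toy_region *r, unsigned int cnt,
		      unsigned long long addr)
{
	unsigned int left = 0, right = cnt;

	while (left < right) {
		unsigned int mid = left + (right - left) / 2;

		if (addr < r[mid].base)
			right = mid;
		else if (addr >= r[mid].base + r[mid].size)
			left = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}

/* First pfn after @pfn that is covered by a region, or -1UL. */
static unsigned long toy_next_valid_pfn(const struct toy_region *r,
					unsigned int cnt, unsigned long pfn)
{
	unsigned long long addr = (unsigned long long)(++pfn) << TOY_PAGE_SHIFT;
	unsigned int i;

	if (toy_search(r, cnt, addr) >= 0)
		return pfn;		/* pfn + 1 is inside a region */

	/* In a gap: take the first region that starts above addr. */
	for (i = 0; i < cnt; i++)
		if (r[i].base > addr)
			return (unsigned long)(r[i].base >> TOY_PAGE_SHIFT);

	return -1UL;
}
```

Because the helper reports only a hit or a miss, the gap case needs a
second step to find the following region; the open-coded loop in the
patch obtains that index as a by-product of the search.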

> +	if (right == type->cnt)
> +		return -1UL;
> +	else
> +		return PHYS_PFN(type->regions[right].base);
> +}
> +EXPORT_SYMBOL(memblock_next_valid_pfn);
> +#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
> +
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  /**
>   * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..70933c40380a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5811,8 +5811,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		 * function.  They do not exist on hotplugged memory.
>  		 */
>  		if (context == MEMMAP_EARLY) {
> -			if (!early_pfn_valid(pfn))
> +			if (!early_pfn_valid(pfn)) {
> +				pfn = next_valid_pfn(pfn) - 1;
>  				continue;
> +			}
>  			if (!early_pfn_in_nid(pfn, nid))
>  				continue;
>  			if (overlap_memmap_init(zone, &pfn))
> -- 
> 2.19.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn
  2019-07-23  5:51 ` [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn Hanjun Guo
@ 2019-07-23  8:33   ` Mike Rapoport
  2019-07-24  8:33     ` Hanjun Guo
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2019-07-23  8:33 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Will Deacon, linux-arm-kernel, linux-mm, linux-kernel

On Tue, Jul 23, 2019 at 01:51:13PM +0800, Hanjun Guo wrote:
> From: Jia He <hejianet@gmail.com>
> 
> After skipping some invalid pfns in memmap_init_zone(), there is still
> some room for improvement.
> 
> E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++
> instead of doing the binary search in memblock_next_valid_pfn.
> 
> Furthermore, if the pfn is in a gap of two memory region, skip to next
> region directly to speedup the binary search.

How much speedup do you see with these improvements relative to the
simple binary search in memblock_next_valid_pfn()?
  
> Signed-off-by: Jia He <hejianet@gmail.com>
> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> ---
>  mm/memblock.c | 37 +++++++++++++++++++++++++++++++------
>  1 file changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index d57ba51bb9cd..95d5916716a0 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1256,28 +1256,53 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>  unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>  {
>  	struct memblock_type *type = &memblock.memory;
> +	struct memblock_region *regions = type->regions;
>  	unsigned int right = type->cnt;
>  	unsigned int mid, left = 0;
> +	unsigned long start_pfn, end_pfn, next_start_pfn;
>  	phys_addr_t addr = PFN_PHYS(++pfn);
> +	static int early_region_idx __initdata_memblock = -1;
>  
> +	/* fast path, return pfn+1 if next pfn is in the same region */
> +	if (early_region_idx != -1) {
> +		start_pfn = PFN_DOWN(regions[early_region_idx].base);
> +		end_pfn = PFN_DOWN(regions[early_region_idx].base +
> +				regions[early_region_idx].size);
> +
> +		if (pfn >= start_pfn && pfn < end_pfn)
> +			return pfn;
> +
> +		/* try slow path */
> +		if (++early_region_idx == type->cnt)
> +			goto slow_path;
> +
> +		next_start_pfn = PFN_DOWN(regions[early_region_idx].base);
> +
> +		if (pfn >= end_pfn && pfn <= next_start_pfn)
> +			return next_start_pfn;
> +	}
> +
> +slow_path:
> +	/* slow path, do the binary searching */
>  	do {
>  		mid = (right + left) / 2;
>  
> -		if (addr < type->regions[mid].base)
> +		if (addr < regions[mid].base)
>  			right = mid;
> -		else if (addr >= (type->regions[mid].base +
> -				  type->regions[mid].size))
> +		else if (addr >= (regions[mid].base + regions[mid].size))
>  			left = mid + 1;
>  		else {
> -			/* addr is within the region, so pfn is valid */
> +			early_region_idx = mid;
>  			return pfn;
>  		}
>  	} while (left < right);
>  
>  	if (right == type->cnt)
>  		return -1UL;
> -	else
> -		return PHYS_PFN(type->regions[right].base);
> +
> +	early_region_idx = right;
> +
> +	return PHYS_PFN(regions[early_region_idx].base);
>  }
>  EXPORT_SYMBOL(memblock_next_valid_pfn);
>  #endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
> -- 
> 2.19.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  2019-07-23  8:30   ` Mike Rapoport
@ 2019-07-24  8:29     ` Hanjun Guo
  0 siblings, 0 replies; 8+ messages in thread
From: Hanjun Guo @ 2019-07-24  8:29 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ard Biesheuvel, Andrew Morton, Catalin Marinas, Jia He,
	Will Deacon, linux-arm-kernel, linux-mm, linux-kernel

On 2019/7/23 16:30, Mike Rapoport wrote:
> On Tue, Jul 23, 2019 at 01:51:12PM +0800, Hanjun Guo wrote:
>> From: Jia He <hejianet@gmail.com>
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic on x86 due to specific memory mapping on x86_64 which will
>> skip valid pfns as well, so Daniel Vacek reverted it later.
>>
>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>
>> Daniel said:
>> "On arm and arm64, memblock is used by default. But generic version of
>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>> not always return the next valid one but skips more resulting in some
>> valid frames to be skipped (as if they were invalid). And that's why
>> kernel was eventually crashing on some !arm machines."
> 
> I think that the crash on x86 was not related to CONFIG_HAVE_ARCH_PFN_VALID
> but rather to the x86 way to setup memblock.  Some of the x86 reserved
> memory areas were never added to memblock.memory, which makes memblock's
> view of the physical memory incomplete and that's why
> memblock_next_valid_pfn() could skip valid PFNs on x86.

Thank you for the clarification. I will update the patch with your
comments in the next version.

> 
>> Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID and only
>> selected for arm64, using the new config option to guard the
>> memblock_next_valid_pfn().
>  
> As far as I can tell, the memblock_next_valid_pfn() should work on most
> architectures and not only on ARM. For sure there is should be no
> dependency between CONFIG_HAVE_ARCH_PFN_VALID and memblock_next_valid_pfn().
> 
> I believe that the configuration option to guard memblock_next_valid_pfn()
> should be opt-out and that only x86 will require it.

So how about introducing a configuration option, say, CONFIG_HAVE_ARCH_PFN_INVALID,
selected by x86 and left unselected by default for all other architectures?

> 
>> This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
>> is pretty impressive for bootmem_init() at boot:
>>
>> with 384G memory,
>> before: 13310ms
>> after:  1415ms
>>
>> with 1T memory,
>> before: 20s
>> after:  2s
>>
>> Suggested-by: Daniel Vacek <neelx@redhat.com>
>> Signed-off-by: Jia He <hejianet@gmail.com>
>> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
>> ---
>>  arch/arm64/Kconfig     |  1 +
>>  include/linux/mmzone.h |  9 +++++++++
>>  mm/Kconfig             |  3 +++
>>  mm/memblock.c          | 31 +++++++++++++++++++++++++++++++
>>  mm/page_alloc.c        |  4 +++-
>>  5 files changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 697ea0510729..058eb26579be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
>>  
>>  config HAVE_ARCH_PFN_VALID
>>  	def_bool y
>> +	select HAVE_MEMBLOCK_PFN_VALID
>>
>>  config HW_PERF_EVENTS
>>  	def_bool y
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 70394cabaf4e..24cb6bdb1759 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
>>  #endif
>>  
>>  #define early_pfn_valid(pfn)	pfn_valid(pfn)
>> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
>> +extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
> 
> Please make it 'static inline' and move out of '#ifdef CONFIG_SPARSEMEM'

Will do.

> 
>> +#endif
>>  void sparse_init(void);
>>  #else
>>  #define sparse_init()	do {} while (0)
>> @@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
>>  #define early_pfn_valid(pfn)	(1)
>>  #endif
>>  
>> +/* fallback to default definitions */
>> +#ifndef next_valid_pfn
>> +#define next_valid_pfn(pfn)	(pfn + 1)
> 
> static inline as well.

OK.

> 
>> +#endif
>> +
>>  void memory_present(int nid, unsigned long start, unsigned long end);
>>  
>>  /*
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index f0c76ba47695..c578374b6413 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>>  config HAVE_MEMBLOCK_PHYS_MAP
>>  	bool
>>  
>> +config HAVE_MEMBLOCK_PFN_VALID
>> +	bool
>> +
>>  config HAVE_GENERIC_GUP
>>  	bool
>>  
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7d4f61ae666a..d57ba51bb9cd 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>>  	return 0;
>>  }
>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>> +
>> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> +{
>> +	struct memblock_type *type = &memblock.memory;
>> +	unsigned int right = type->cnt;
>> +	unsigned int mid, left = 0;
>> +	phys_addr_t addr = PFN_PHYS(++pfn);
>> +
>> +	do {
>> +		mid = (right + left) / 2;
>> +
>> +		if (addr < type->regions[mid].base)
>> +			right = mid;
>> +		else if (addr >= (type->regions[mid].base +
>> +				  type->regions[mid].size))
>> +			left = mid + 1;
>> +		else {
>> +			/* addr is within the region, so pfn is valid */
>> +			return pfn;
>> +		}
>> +	} while (left < right);
>> +
> 
> We have memblock_search() for this.

I will update my patch as you suggested.

Thanks
Hanjun



* Re: [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn
  2019-07-23  8:33   ` Mike Rapoport
@ 2019-07-24  8:33     ` Hanjun Guo
  0 siblings, 0 replies; 8+ messages in thread
From: Hanjun Guo @ 2019-07-24  8:33 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Ard Biesheuvel, Catalin Marinas, linux-kernel, linux-mm, Jia He,
	Andrew Morton, Will Deacon, linux-arm-kernel

On 2019/7/23 16:33, Mike Rapoport wrote:
> On Tue, Jul 23, 2019 at 01:51:13PM +0800, Hanjun Guo wrote:
>> From: Jia He <hejianet@gmail.com>
>>
>> After skipping some invalid pfns in memmap_init_zone(), there is still
>> some room for improvement.
>>
>> E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++
>> instead of doing the binary search in memblock_next_valid_pfn.
>>
>> Furthermore, if the pfn is in a gap of two memory region, skip to next
>> region directly to speedup the binary search.
> How much speed up do you see with this improvements relatively to simple
> binary search in memblock_next_valid_pfn()?

On my platform, the major speedup comes from the previous patch in this
series, not from this one. I think how much this one helps depends on
the sparse memory mode on different platforms.

Thanks
Hanjun




* Re: [PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  2019-07-23  5:51 ` [PATCH v12 1/2] mm: page_alloc: " Hanjun Guo
  2019-07-23  8:30   ` Mike Rapoport
@ 2019-08-01  8:06   ` Ard Biesheuvel
  1 sibling, 0 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2019-08-01  8:06 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Andrew Morton, Catalin Marinas, Jia He, Mike Rapoport,
	Will Deacon, linux-arm-kernel, Linux-MM,
	Linux Kernel Mailing List

On Tue, 23 Jul 2019 at 08:53, Hanjun Guo <guohanjun@huawei.com> wrote:
>
> From: Jia He <hejianet@gmail.com>
>
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic on x86 due to specific memory mapping on x86_64 which will
> skip valid pfns as well, so Daniel Vacek reverted it later.
>
> But as suggested by Daniel Vacek, it is fine to using memblock to skip
> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>
> Daniel said:
> "On arm and arm64, memblock is used by default. But generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more resulting in some
> valid frames to be skipped (as if they were invalid). And that's why
> kernel was eventually crashing on some !arm machines."
>
> Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID and only
> selected for arm64, using the new config option to guard the
> memblock_next_valid_pfn().
>
> This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
> is pretty impressive for bootmem_init() at boot:
>
> with 384G memory,
> before: 13310ms
> after:  1415ms
>
> with 1T memory,
> before: 20s
> after:  2s
>
> Suggested-by: Daniel Vacek <neelx@redhat.com>
> Signed-off-by: Jia He <hejianet@gmail.com>
> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> ---
>  arch/arm64/Kconfig     |  1 +
>  include/linux/mmzone.h |  9 +++++++++
>  mm/Kconfig             |  3 +++
>  mm/memblock.c          | 31 +++++++++++++++++++++++++++++++
>  mm/page_alloc.c        |  4 +++-
>  5 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 697ea0510729..058eb26579be 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
>
>  config HAVE_ARCH_PFN_VALID
>         def_bool y
> +       select HAVE_MEMBLOCK_PFN_VALID
>
>  config HW_PERF_EVENTS
>         def_bool y
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 70394cabaf4e..24cb6bdb1759 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
>  #endif
>
>  #define early_pfn_valid(pfn)   pfn_valid(pfn)
> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> +extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
> +#define next_valid_pfn(pfn)    memblock_next_valid_pfn(pfn)
> +#endif
>  void sparse_init(void);
>  #else
>  #define sparse_init()  do {} while (0)
> @@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
>  #define early_pfn_valid(pfn)   (1)
>  #endif
>
> +/* fallback to default definitions */
> +#ifndef next_valid_pfn
> +#define next_valid_pfn(pfn)    (pfn + 1)
> +#endif
> +
>  void memory_present(int nid, unsigned long start, unsigned long end);
>
>  /*
> diff --git a/mm/Kconfig b/mm/Kconfig
> index f0c76ba47695..c578374b6413 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>         bool
>
> +config HAVE_MEMBLOCK_PFN_VALID
> +       bool
> +
>  config HAVE_GENERIC_GUP
>         bool
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7d4f61ae666a..d57ba51bb9cd 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>         return 0;
>  }
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +
> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
> +{
> +       struct memblock_type *type = &memblock.memory;
> +       unsigned int right = type->cnt;
> +       unsigned int mid, left = 0;
> +       phys_addr_t addr = PFN_PHYS(++pfn);
> +
> +       do {
> +               mid = (right + left) / 2;
> +
> +               if (addr < type->regions[mid].base)
> +                       right = mid;
> +               else if (addr >= (type->regions[mid].base +
> +                                 type->regions[mid].size))
> +                       left = mid + 1;
> +               else {
> +                       /* addr is within the region, so pfn is valid */
> +                       return pfn;
> +               }
> +       } while (left < right);
> +
> +       if (right == type->cnt)
> +               return -1UL;
> +       else
> +               return PHYS_PFN(type->regions[right].base);
> +}
> +EXPORT_SYMBOL(memblock_next_valid_pfn);
> +#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
> +
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  /**
>   * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..70933c40380a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5811,8 +5811,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                  * function.  They do not exist on hotplugged memory.
>                  */
>                 if (context == MEMMAP_EARLY) {
> -                       if (!early_pfn_valid(pfn))
> +                       if (!early_pfn_valid(pfn)) {
> +                               pfn = next_valid_pfn(pfn) - 1;

This is the thing I objected to previously: subtracting 1 so the pfn++
in the for() produces the correct value.

Could we instead pull the next() operation into the for() construct as
the third argument?
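
As a toy illustration of that shape (user-space only, with a made-up
memory layout of pfns 0..3 and 16..17; toy_pfn_valid(),
toy_next_valid_pfn() and toy_count_initialised() are hypothetical
helpers, not the memmap_init_zone() code), the loop could look roughly
like this:

```c
#include <stdbool.h>

/* Made-up layout: pfns 0..3 and 16..17 are valid. */
static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < 4 || (pfn >= 16 && pfn < 18);
}

/* First valid pfn after @pfn, or -1UL when nothing is left. */
static unsigned long toy_next_valid_pfn(unsigned long pfn)
{
	pfn++;
	if (pfn < 4)
		return pfn;
	if (pfn < 16)
		return 16;
	if (pfn < 18)
		return pfn;
	return -1UL;
}

/* Count the pfns in [start, end) that the loop body would initialise. */
static unsigned long toy_count_initialised(unsigned long start,
					   unsigned long end)
{
	unsigned long pfn, n = 0;

	/* The advance step lives in the for(), so no "- 1" fixup. */
	for (pfn = start; pfn < end; pfn = toy_next_valid_pfn(pfn)) {
		if (!toy_pfn_valid(pfn))
			continue;	/* only possible for the first pfn */
		n++;
	}
	return n;
}
```

The -1UL sentinel terminates the loop through the ordinary pfn < end
test. The trade-off is that the lookup now runs on every iteration
rather than only after an invalid pfn, so the cost of the common
"next pfn is in the same region" case matters more.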

>                                 continue;
> +                       }
>                         if (!early_pfn_in_nid(pfn, nid))
>                                 continue;
>                         if (overlap_memmap_init(zone, &pfn))
> --
> 2.19.1
>
