linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64
@ 2018-04-04  2:56 Jia He
  2018-04-04  2:56 ` [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 remain the memblock_next_valid_pfn on arm and arm64
Patch 2 optimizes the memblock_next_valid_pfn()
Patch 3~5 optimizes the early_pfn_valid()

As for the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 41313 us to 24345 us in my
armv8a server(QDF2400 with 96G memory).

Attached the memblock region information in my server.
[   86.956758] Zone ranges:
[   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
[   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
[   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
[   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
[   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
[   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
[   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
[   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
[   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
[   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
[   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
[   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
[   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
[   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
[   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
[   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
[   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
[   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
[   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
[   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
[   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
[   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
[   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
[   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
[   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
[   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
[   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
[   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
[   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
[   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
[   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
[   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
[   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Changelog:
V6: - simplify the codes, move arm/arm64 common codes to one file.
    - refine patches as suggested by Danial Vacek and Ard Biesheuvel
V5: - further refining as suggested by Danial Vacek. Make codes
      arm/arm64 more arch specific
V4: - refine patches as suggested by Danial Vacek and Wei Yang
    - optimized on arm besides arm64
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to mmotm latest
    - remain memblock_next_valid_pfn on arm64
    - refine memblock_search_pfn_regions and pfn_valid_region

Jia He (5):
  mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  arm: arm64: page_alloc: reduce unnecessary binary search in
    memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm: arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm/mm/init.c           |  1 +
 arch/arm64/mm/init.c         |  1 +
 include/linux/arm96_common.h | 76 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memblock.h     |  2 ++
 include/linux/mmzone.h       | 18 ++++++++++-
 mm/memblock.c                |  9 ++++++
 mm/page_alloc.c              |  2 +-
 7 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/arm96_common.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
@ 2018-04-04  2:56 ` Jia He
  2018-04-04 14:19   ` kbuild test robot
  2018-04-04  2:56 ` [PATCH v6 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic bug. So Daniel Vacek reverted it later.

But as suggested by Daniel Vacek, it is fine to using memblock to skip
gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.

On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
not always return the next valid one but skips more resulting in some
valid frames to be skipped (as if they were invalid). And that's why
kernel was eventually crashing on some !arm machines.

And as verified by Eugeniu Rosca, arm can benifit from commit
b92df1de5d28. So remain the memblock_next_valid_pfn on arm{,64} and move
the related codes to arm64 arch directory.

Suggested-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 arch/arm/mm/init.c           |  1 +
 arch/arm64/mm/init.c         |  1 +
 include/linux/arm96_common.h | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/mmzone.h       | 11 +++++++++++
 mm/page_alloc.c              |  2 +-
 5 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/arm96_common.h

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index a1f11a7..296cc52 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -25,6 +25,7 @@
 #include <linux/dma-contiguous.h>
 #include <linux/sizes.h>
 #include <linux/stop_machine.h>
+#include <linux/arm96_common.h>
 
 #include <asm/cp15.h>
 #include <asm/mach-types.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..6efab80 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -40,6 +40,7 @@
 #include <linux/mm.h>
 #include <linux/kexec.h>
 #include <linux/crash_dump.h>
+#include <linux/arm96_common.h>
 
 #include <asm/boot.h>
 #include <asm/fixmap.h>
diff --git a/include/linux/arm96_common.h b/include/linux/arm96_common.h
new file mode 100644
index 0000000..a6f68ea
--- /dev/null
+++ b/include/linux/arm96_common.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Common definitions of arm and arm64
+ * Copyright (C) 2018 HXT-semitech Corp.
+ */
+#ifndef __ARM96_COMMON_H
+#define __ARM96_COMMON_H
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+/* HAVE_MEMBLOCK is always enabled on arm and arm64 */
+ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+#endif /*__ARM96_COMMON_H*/
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..a517d43 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1245,6 +1245,8 @@ static inline int pfn_valid(unsigned long pfn)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+
+#define next_valid_pfn(pfn)	(pfn++)
 #endif
 
 static inline int pfn_present(unsigned long pfn)
@@ -1270,6 +1272,10 @@ static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)	pfn_valid(pfn)
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+extern ulong memblock_next_valid_pfn(ulong pfn);
+#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1291,6 +1297,11 @@ struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)	(1)
 #endif
 
+/* fallback to default defitions*/
+#ifndef next_valid_pfn
+#define next_valid_pfn	(pfn++)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c19f5ac..9d05f29 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5475,7 +5475,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	if (altmap && start_pfn == altmap->base_pfn)
 		start_pfn += altmap->reserve;
 
-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; next_valid_pfn(pfn)) {
 		/*
 		 * There can be holes in boot-time mem_map[]s handed to this
 		 * function.  They do not exist on hotplugged memory.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v6 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
  2018-04-04  2:56 ` [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
@ 2018-04-04  2:56 ` Jia He
  2018-04-04  2:56 ` [PATCH v6 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply pfn++ instead of doing the binary search
in memblock_next_valid_pfn.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/arm96_common.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/linux/arm96_common.h b/include/linux/arm96_common.h
index a6f68ea..2f4dea4 100644
--- a/include/linux/arm96_common.h
+++ b/include/linux/arm96_common.h
@@ -5,32 +5,47 @@
 #ifndef __ARM96_COMMON_H
 #define __ARM96_COMMON_H
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
+static int early_region_idx __init_memblock = -1;
 /* HAVE_MEMBLOCK is always enabled on arm and arm64 */
 ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
 {
 	struct memblock_type *type = &memblock.memory;
-	unsigned int right = type->cnt;
-	unsigned int mid, left = 0;
+	struct memblock_region *regions = type->regions;
+	uint right = type->cnt;
+	uint mid, left = 0;
+	ulong start_pfn, end_pfn;
 	phys_addr_t addr = PFN_PHYS(++pfn);
 
+	/* fast path, return pfn+1 if next pfn is in the same region */
+	if (early_region_idx != -1) {
+		start_pfn = PFN_DOWN(regions[early_region_idx].base);
+		end_pfn = PFN_DOWN(regions[early_region_idx].base +
+				regions[early_region_idx].size);
+
+		if (pfn >= start_pfn && pfn < end_pfn)
+			return pfn;
+	}
+
+	/* slow path, do the binary searching */
 	do {
 		mid = (right + left) / 2;
 
-		if (addr < type->regions[mid].base)
+		if (addr < regions[mid].base)
 			right = mid;
-		else if (addr >= (type->regions[mid].base +
-				  type->regions[mid].size))
+		else if (addr >= (regions[mid].base + regions[mid].size))
 			left = mid + 1;
 		else {
-			/* addr is within the region, so pfn is valid */
+			early_region_idx = mid;
 			return pfn;
 		}
 	} while (left < right);
 
 	if (right == type->cnt)
 		return -1UL;
-	else
-		return PHYS_PFN(type->regions[right].base);
+
+	early_region_idx = right;
+
+	return PHYS_PFN(regions[early_region_idx].base);
 }
 EXPORT_SYMBOL(memblock_next_valid_pfn);
 #endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v6 3/5] mm/memblock: introduce memblock_search_pfn_regions()
  2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
  2018-04-04  2:56 ` [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
  2018-04-04  2:56 ` [PATCH v6 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
@ 2018-04-04  2:56 ` Jia He
  2018-04-04  2:56 ` [PATCH v6 4/5] arm: arm64: introduce pfn_valid_region() Jia He
  2018-04-04  2:56 ` [PATCH v6 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
  4 siblings, 0 replies; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

This api is to find the region start_pfn and end_pfn of the input pfn.
With this helper, we can improve the loop in early_pfn_valid by reducing
the unnecessary binary searches.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/memblock.h | 2 ++
 mm/memblock.c            | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0257aee..a0127b3 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+int memblock_search_pfn_regions(unsigned long pfn);
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index ba7c878..0f4004c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1617,6 +1617,15 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
 	return -1;
 }
 
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	int mid = memblock_search(type, PFN_PHYS(pfn));
+
+	return mid;
+}
+
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
 	return memblock_search(&memblock.reserved, addr) != -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v6 4/5] arm: arm64: introduce pfn_valid_region()
  2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
                   ` (2 preceding siblings ...)
  2018-04-04  2:56 ` [PATCH v6 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
@ 2018-04-04  2:56 ` Jia He
  2018-04-04  2:56 ` [PATCH v6 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
  4 siblings, 0 replies; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check pfn++ is still in the same region. Thus
we can avoid do the slow binary searches.

Currently it only improve the performance on arm/arm64 and will have no
impact on other arches.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/arm96_common.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/arm96_common.h b/include/linux/arm96_common.h
index 2f4dea4..bb86bd3 100644
--- a/include/linux/arm96_common.h
+++ b/include/linux/arm96_common.h
@@ -48,5 +48,29 @@ ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
 	return PHYS_PFN(regions[early_region_idx].base);
 }
 EXPORT_SYMBOL(memblock_next_valid_pfn);
+
+int pfn_valid_region(ulong pfn)
+{
+	ulong start_pfn, end_pfn;
+	struct memblock_type *type = &memblock.memory;
+	struct memblock_region *regions = type->regions;
+
+	if (early_region_idx != -1) {
+		start_pfn = PFN_DOWN(regions[early_region_idx].base);
+		end_pfn = PFN_DOWN(regions[early_region_idx].base +
+					regions[early_region_idx].size);
+
+		if (pfn >= start_pfn && pfn < end_pfn)
+			return !memblock_is_nomap(
+					&regions[early_region_idx]);
+	}
+
+	early_region_idx = memblock_search_pfn_regions(pfn);
+	if (early_region_idx == -1)
+		return false;
+
+	return !memblock_is_nomap(&regions[early_region_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
 #endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 #endif /*__ARM96_COMMON_H*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v6 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
  2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
                   ` (3 preceding siblings ...)
  2018-04-04  2:56 ` [PATCH v6 4/5] arm: arm64: introduce pfn_valid_region() Jia He
@ 2018-04-04  2:56 ` Jia He
  4 siblings, 0 replies; 8+ messages in thread
From: Jia He @ 2018-04-04  2:56 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check check pfn++ is still in the same region.

Currently it only improve the performance on arm64 and will have no
impact on other arches.

For the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 41313 us to 24345 us in my
armv8a server(QDF2400 with 96G memory).

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/mmzone.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a517d43..516ffb49 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1271,11 +1271,16 @@ static inline int pfn_present(unsigned long pfn)
 #define pfn_to_nid(pfn)		(0)
 #endif
 
-#define early_pfn_valid(pfn)	pfn_valid(pfn)
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 extern ulong memblock_next_valid_pfn(ulong pfn);
 #define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
-#endif
+
+extern int pfn_valid_region(ulong pfn);
+#define early_pfn_valid(pfn)	pfn_valid_region(pfn)
+#else
+#define early_pfn_valid(pfn)    pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  2018-04-04  2:56 ` [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
@ 2018-04-04 14:19   ` kbuild test robot
  2018-04-04 14:34     ` Jia He
  0 siblings, 1 reply; 8+ messages in thread
From: kbuild test robot @ 2018-04-04 14:19 UTC (permalink / raw)
  To: Jia He
  Cc: kbuild-all, Russell King, Catalin Marinas, Will Deacon,
	Mark Rutland, Ard Biesheuvel, Andrew Morton, Michal Hocko,
	Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He,
	Jia He

[-- Attachment #1: Type: text/plain, Size: 1645 bytes --]

Hi Jia,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.16 next-20180403]
[cannot apply to linus/master mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jia-He/mm-page_alloc-remain-memblock_next_valid_pfn-on-arm-and-arm64/20180404-200732
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: i386-randconfig-x013-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/gfp.h:6:0,
                    from include/linux/mm.h:10,
                    from mm/page_alloc.c:18:
   mm/page_alloc.c: In function 'memmap_init_zone':
>> include/linux/mmzone.h:1299:28: error: called object is not a function or function pointer
    #define next_valid_pfn (pfn++)
                           ~~~~^~~
>> mm/page_alloc.c:5349:39: note: in expansion of macro 'next_valid_pfn'
     for (pfn = start_pfn; pfn < end_pfn; next_valid_pfn(pfn)) {
                                          ^~~~~~~~~~~~~~

vim +1299 include/linux/mmzone.h

  1296	
  1297	/* fallback to default defitions*/
  1298	#ifndef next_valid_pfn
> 1299	#define next_valid_pfn	(pfn++)
  1300	#endif
  1301	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31450 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  2018-04-04 14:19   ` kbuild test robot
@ 2018-04-04 14:34     ` Jia He
  0 siblings, 0 replies; 8+ messages in thread
From: Jia He @ 2018-04-04 14:34 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Russell King, Catalin Marinas, Will Deacon,
	Mark Rutland, Ard Biesheuvel, Andrew Morton, Michal Hocko,
	Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He

sorry, will fix it right now

Cheer,

Jia


On 4/4/2018 10:19 PM, kbuild test robot Wrote:
> Hi Jia,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on v4.16 next-20180403]
> [cannot apply to linus/master mmotm/master]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Jia-He/mm-page_alloc-remain-memblock_next_valid_pfn-on-arm-and-arm64/20180404-200732
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> config: i386-randconfig-x013-201813 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>          # save the attached .config to linux build tree
>          make ARCH=i386
>
> All error/warnings (new ones prefixed by >>):
>
>     In file included from include/linux/gfp.h:6:0,
>                      from include/linux/mm.h:10,
>                      from mm/page_alloc.c:18:
>     mm/page_alloc.c: In function 'memmap_init_zone':
>>> include/linux/mmzone.h:1299:28: error: called object is not a function or function pointer
>      #define next_valid_pfn (pfn++)
>                             ~~~~^~~
>>> mm/page_alloc.c:5349:39: note: in expansion of macro 'next_valid_pfn'
>       for (pfn = start_pfn; pfn < end_pfn; next_valid_pfn(pfn)) {
>                                            ^~~~~~~~~~~~~~
>
> vim +1299 include/linux/mmzone.h
>
>    1296	
>    1297	/* fallback to default defitions*/
>    1298	#ifndef next_valid_pfn
>> 1299	#define next_valid_pfn	(pfn++)
>    1300	#endif
>    1301	
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-04-04 14:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-04  2:56 [PATCH v6 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
2018-04-04  2:56 ` [PATCH v6 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
2018-04-04 14:19   ` kbuild test robot
2018-04-04 14:34     ` Jia He
2018-04-04  2:56 ` [PATCH v6 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
2018-04-04  2:56 ` [PATCH v6 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
2018-04-04  2:56 ` [PATCH v6 4/5] arm: arm64: introduce pfn_valid_region() Jia He
2018-04-04  2:56 ` [PATCH v6 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).