All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
@ 2018-03-26  3:02 Jia He
  2018-03-26  3:02 ` [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable Jia He
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
        is enabled
Patch 2 optimizes the memblock_next_valid_pfn()
Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
        because the changes are located across subsystems.

I tested the pfn loop process in memmap_init(), the same as before.
As for the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 41313 us to 24345 us in my
armv8a server(QDF2400 with 96G memory).

Attached the memblock region information in my server.
[   86.956758] Zone ranges:
[   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
[   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
[   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
[   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
[   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
[   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
[   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
[   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
[   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
[   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
[   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
[   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
[   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
[   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
[   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
[   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
[   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
[   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
[   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
[   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
[   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
[   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
[   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
[   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
[   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
[   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
[   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
[   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
[   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
[   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
[   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
[   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
[   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Changelog:
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to mmotm latest
    - remain memblock_next_valid_pfn on arm64
    - refine memblock_search_pfn_regions and pfn_valid_region

Jia He (5):
  mm: page_alloc: remain memblock_next_valid_pfn() when
    CONFIG_HAVE_ARCH_PFN_VALID is enable
  mm: page_alloc: reduce unnecessary binary search in
    memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm64/include/asm/page.h    |  3 ++-
 arch/arm64/mm/init.c             | 25 ++++++++++++++++++-
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/memblock.h         |  6 +++++
 include/linux/mmzone.h           | 12 ++++++---
 mm/memblock.c                    | 53 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c                  | 12 ++++++++-
 7 files changed, 106 insertions(+), 7 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
@ 2018-03-26  3:02 ` Jia He
  2018-03-28  9:18   ` Wei Yang
  2018-03-26  3:02 ` [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic bug. So Daniel Vacek reverted it later.

But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
enable. And as verified by Eugeniu Rosca, arm can benifit from this
commit. So remain the memblock_next_valid_pfn.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/memblock.h |  4 ++++
 mm/memblock.c            | 29 +++++++++++++++++++++++++++++
 mm/page_alloc.c          | 11 ++++++++++-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0257aee..efbbe4b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#endif
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index ba7c878..bea5a9c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 		*out_nid = r->nid;
 }
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c19f5ac..2a967f7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn))
+		if (!early_pfn_valid(pfn)) {
+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
+			/*
+			 * Skip to the pfn preceding the next valid one (or
+			 * end_pfn), such that we hit a valid pfn (or end_pfn)
+			 * on our next iteration of the loop.
+			 */
+			pfn = memblock_next_valid_pfn(pfn) - 1;
+#endif
 			continue;
+		}
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
  2018-03-26  3:02 ` [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable Jia He
@ 2018-03-26  3:02 ` Jia He
  2018-03-27 17:17   ` Daniel Vacek
  2018-03-28  9:26   ` Wei Yang
  2018-03-26  3:02 ` [PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply pfn++ instead of doing the binary search
in memblock_next_valid_pfn. This patch only works when
CONFIG_HAVE_ARCH_PFN_VALID is enable.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/memblock.h |  2 +-
 mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
 mm/page_alloc.c          |  3 +-
 3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index efbbe4b..a8fb2ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long memblock_next_valid_pfn(unsigned long pfn);
+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
 /**
diff --git a/mm/memblock.c b/mm/memblock.c
index bea5a9c..06c1a08 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 		*out_nid = r->nid;
 }
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
-{
-	struct memblock_type *type = &memblock.memory;
-	unsigned int right = type->cnt;
-	unsigned int mid, left = 0;
-	phys_addr_t addr = PFN_PHYS(++pfn);
-
-	do {
-		mid = (right + left) / 2;
-
-		if (addr < type->regions[mid].base)
-			right = mid;
-		else if (addr >= (type->regions[mid].base +
-				  type->regions[mid].size))
-			left = mid + 1;
-		else {
-			/* addr is within the region, so pfn is valid */
-			return pfn;
-		}
-	} while (left < right);
-
-	if (right == type->cnt)
-		return -1UL;
-	else
-		return PHYS_PFN(type->regions[right].base);
-}
-#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
@@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
+							int *last_idx)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	unsigned long start_pfn, end_pfn;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	/* fast path, return pfh+1 if next pfn is in the same region */
+	if (*last_idx != -1) {
+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+				type->regions[*last_idx].size);
+
+		if (pfn < end_pfn && pfn > start_pfn)
+			return pfn;
+	}
+
+	/* slow path, do the binary searching */
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			*last_idx = mid;
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+
+	*last_idx = right;
+
+	return PHYS_PFN(type->regions[*last_idx].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2a967f7..0bb0274 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
+	int idx = -1;
 	unsigned long nr_initialised = 0;
 	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
 			 * on our next iteration of the loop.
 			 */
-			pfn = memblock_next_valid_pfn(pfn) - 1;
+			pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
 #endif
 			continue;
 		}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions()
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
  2018-03-26  3:02 ` [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable Jia He
  2018-03-26  3:02 ` [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
@ 2018-03-26  3:02 ` Jia He
  2018-03-26  3:02 ` [PATCH v3 4/5] arm64: introduce pfn_valid_region() Jia He
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He, Jia He

This api is the preparation for further optimizing early_pfn_valid

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/memblock.h | 2 ++
 mm/memblock.c            | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a8fb2ab..104bca6 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -207,6 +207,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
+int memblock_search_pfn_regions(unsigned long pfn);
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index 06c1a08..15fcde2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1661,6 +1661,15 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
 	return -1;
 }
 
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	int mid = memblock_search(type, PFN_PHYS(pfn));
+
+	return mid;
+}
+
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
 	return memblock_search(&memblock.reserved, addr) != -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/5] arm64: introduce pfn_valid_region()
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
                   ` (2 preceding siblings ...)
  2018-03-26  3:02 ` [PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
@ 2018-03-26  3:02 ` Jia He
  2018-03-28  9:38   ` Wei Yang
  2018-03-26  3:02 ` [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
  2018-03-27  1:02 ` [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Wei Yang
  5 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He, Jia He

This is the preparation for further optimizing in early_pfn_valid
on arm64.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 arch/arm64/include/asm/page.h |  3 ++-
 arch/arm64/mm/init.c          | 25 ++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 60d02c8..da2cba3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,7 +38,8 @@ extern void clear_page(void *to);
 typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
+extern int pfn_valid(unsigned long pfn);
+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
 #endif
 
 #include <asm/memory.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..06433d5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -290,7 +290,30 @@ int pfn_valid(unsigned long pfn)
 	return memblock_is_map_memory(pfn << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(pfn_valid);
-#endif
+
+int pfn_valid_region(unsigned long pfn, int *last_idx)
+{
+	unsigned long start_pfn, end_pfn;
+	struct memblock_type *type = &memblock.memory;
+
+	if (*last_idx != -1) {
+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+		end_pfn= PFN_DOWN(type->regions[*last_idx].base +
+					type->regions[*last_idx].size);
+
+		if (pfn >= start_pfn && pfn < end_pfn)
+			return !memblock_is_nomap(
+				&memblock.memory.regions[*last_idx]);
+	}
+
+	*last_idx = memblock_search_pfn_regions(pfn);
+	if (*last_idx == -1)
+		return false;
+
+	return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
 #ifndef CONFIG_SPARSEMEM
 static void __init arm64_memory_present(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
                   ` (3 preceding siblings ...)
  2018-03-26  3:02 ` [PATCH v3 4/5] arm64: introduce pfn_valid_region() Jia He
@ 2018-03-26  3:02 ` Jia He
  2018-03-27 17:51   ` Daniel Vacek
  2018-03-27  1:02 ` [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Wei Yang
  5 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-03-26  3:02 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin
  Cc: Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
	Steven Sistare, Daniel Vacek, Eugeniu Rosca, Vlastimil Babka,
	linux-kernel, linux-mm, James Morse, Steve Capper, x86,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check check pfn++ is still in the same region.

Currently it only improve the performance on arm64 and will have no
impact on other arches.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/mmzone.h           | 12 +++++++++---
 mm/page_alloc.c                  |  2 +-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73d8dd1..329d3ba 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
 	return 0;
 }
 
-#define early_pfn_valid(pfn)	pfn_valid((pfn))
+#define early_pfn_valid(pfn, last_region_idx)	pfn_valid((pfn))
 
 #endif /* CONFIG_DISCONTIGMEM */
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..3a686af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
 })
 #else
 #define pfn_to_nid(pfn)		(0)
-#endif
+#endif /*CONFIG_NUMA*/
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+#define early_pfn_valid(pfn, last_region_idx) \
+				pfn_valid_region(pfn, last_region_idx)
+#else
+#define early_pfn_valid(pfn, last_region_idx)	pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
-#define early_pfn_valid(pfn)	pfn_valid(pfn)
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
 };
 
 #ifndef early_pfn_valid
-#define early_pfn_valid(pfn)	(1)
+#define early_pfn_valid(pfn, last_region_idx)	(1)
 #endif
 
 void memory_present(int nid, unsigned long start, unsigned long end);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0bb0274..debccf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5484,7 +5484,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn)) {
+		if (!early_pfn_valid(pfn, &idx)) {
 #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
 			/*
 			 * Skip to the pfn preceding the next valid one (or
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
  2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
                   ` (4 preceding siblings ...)
  2018-03-26  3:02 ` [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
@ 2018-03-27  1:02 ` Wei Yang
  2018-03-27  7:15     ` Jia He
  5 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2018-03-27  1:02 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov

On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>where possible") tried to optimize the loop in memmap_init_zone(). But
>there is still some room for improvement.
>
>Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>        is enabled
>Patch 2 optimizes the memblock_next_valid_pfn()
>Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>        because the changes are located across subsystems.
>
>I tested the pfn loop process in memmap_init(), the same as before.
>As for the performance improvement, after this set, I can see the time
>overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>armv8a server(QDF2400 with 96G memory).
>
>Attached the memblock region information in my server.
>[   86.956758] Zone ranges:
>[   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>[   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>[   86.972631] Movable zone start for each node
>[   86.977179] Early memory node ranges
>[   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>[   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>[   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>[   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>[   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>[   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>[   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>[   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>[   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>[   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>[   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>[   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>[   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>[   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>[   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>[   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>[   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>[   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>[   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>[   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>[   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>[   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>[   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>[   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>[   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>[   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>[   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>[   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>[   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>[   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>[   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>[   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]

Hi, Jia

I haven't taken a deep look into your code, just one curious question on your
memory layout.

The log above is printed out in free_area_init_nodes(), which iterates on
memblock.memory and prints them. If I am not wrong, memory regions added to
memblock.memory are ordered and merged if possible.

While from your log, I see many regions could be merged but are isolated. For
example, the last two region:

  node   0: [mem 0x000000007eff0000-0x000000007effffff]
  node   0: [mem 0x000000007f000000-0x00000017ffffffff]

So I am curious why they are isolated instead of combined to one.

>From the code, the possible reason is the region's flag differs from each
other. If you have time, would you mind taking a look into this?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
  2018-03-27  1:02 ` [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Wei Yang
@ 2018-03-27  7:15     ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-27  7:15 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov



On 3/27/2018 9:02 AM, Wei Yang Wrote:
> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") tried to optimize the loop in memmap_init_zone(). But
>> there is still some room for improvement.
>>
>> Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>>         is enabled
>> Patch 2 optimizes the memblock_next_valid_pfn()
>> Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>>         because the changes are located across subsystems.
>>
>> I tested the pfn loop process in memmap_init(), the same as before.
>> As for the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> armv8a server(QDF2400 with 96G memory).
>>
>> Attached the memblock region information in my server.
>> [   86.956758] Zone ranges:
>> [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> [   86.972631] Movable zone start for each node
>> [   86.977179] Early memory node ranges
>> [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
> Hi, Jia
>
> I haven't taken a deep look into your code, just one curious question on your
> memory layout.
>
> The log above is printed out in free_area_init_nodes(), which iterates on
> memblock.memory and prints them. If I am not wrong, memory regions added to
> memblock.memory are ordered and merged if possible.
>
> While from your log, I see many regions could be merged but are isolated. For
> example, the last two region:
>
>    node   0: [mem 0x000000007eff0000-0x000000007effffff]
>    node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>
> So I am curious why they are isolated instead of combined to one.
>
> >From the code, the possible reason is the region's flag differs from each
> other. If you have time, would you mind taking a look into this?
>
Hi Wei
I thought these 2 have different flags
[    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka 
MEMBLOCK_NOMAP
[    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
[    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka 
MEMBLOCK_NONE
[    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
@ 2018-03-27  7:15     ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-27  7:15 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov



On 3/27/2018 9:02 AM, Wei Yang Wrote:
> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") tried to optimize the loop in memmap_init_zone(). But
>> there is still some room for improvement.
>>
>> Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>>         is enabled
>> Patch 2 optimizes the memblock_next_valid_pfn()
>> Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>>         because the changes are located across subsystems.
>>
>> I tested the pfn loop process in memmap_init(), the same as before.
>> As for the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> armv8a server(QDF2400 with 96G memory).
>>
>> Attached the memblock region information in my server.
>> [   86.956758] Zone ranges:
>> [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> [   86.972631] Movable zone start for each node
>> [   86.977179] Early memory node ranges
>> [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
> Hi, Jia
>
> I haven't taken a deep look into your code, just one curious question on your
> memory layout.
>
> The log above is printed out in free_area_init_nodes(), which iterates on
> memblock.memory and prints them. If I am not wrong, memory regions added to
> memblock.memory are ordered and merged if possible.
>
> While from your log, I see many regions could be merged but are isolated. For
> example, the last two region:
>
>    node   0: [mem 0x000000007eff0000-0x000000007effffff]
>    node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>
> So I am curious why they are isolated instead of combined to one.
>
> >From the code, the possible reason is the region's flag differs from each
> other. If you have time, would you mind taking a look into this?
>
Hi Wei
I thought these 2 have different flags
[A A A  0.000000] idx=30,region [7eff0000:10000]flag=4A A A A  <--- aka 
MEMBLOCK_NOMAP
[A A A  0.000000]A A  nodeA A  0: [mem 0x000000007eff0000-0x000000007effffff]
[A A A  0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka 
MEMBLOCK_NONE
[A A A  0.000000]A A  nodeA A  0: [mem 0x000000007f000000-0x00000017ffffffff]

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-26  3:02 ` [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
@ 2018-03-27 17:17   ` Daniel Vacek
  2018-03-28  2:09     ` Jia He
  2018-03-28  9:26   ` Wei Yang
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Vacek @ 2018-03-27 17:17 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Eugeniu Rosca,
	Vlastimil Babka, open list, linux-mm, James Morse, Steve Capper,
	x86, Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He

On Mon, Mar 26, 2018 at 5:02 AM, Jia He <hejianet@gmail.com> wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But there is
> still some room for improvement. E.g. if pfn and pfn+1 are in the same
> memblock region, we can simply pfn++ instead of doing the binary search
> in memblock_next_valid_pfn. This patch only works when
> CONFIG_HAVE_ARCH_PFN_VALID is enable.
>
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
> ---
>  include/linux/memblock.h |  2 +-
>  mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
>  mm/page_alloc.c          |  3 +-
>  3 files changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index efbbe4b..a8fb2ab 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> -unsigned long memblock_next_valid_pfn(unsigned long pfn);
> +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
>  #endif
>
>  /**
> diff --git a/mm/memblock.c b/mm/memblock.c
> index bea5a9c..06c1a08 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>                 *out_nid = r->nid;
>  }
>
> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
> -{
> -       struct memblock_type *type = &memblock.memory;
> -       unsigned int right = type->cnt;
> -       unsigned int mid, left = 0;
> -       phys_addr_t addr = PFN_PHYS(++pfn);
> -
> -       do {
> -               mid = (right + left) / 2;
> -
> -               if (addr < type->regions[mid].base)
> -                       right = mid;
> -               else if (addr >= (type->regions[mid].base +
> -                                 type->regions[mid].size))
> -                       left = mid + 1;
> -               else {
> -                       /* addr is within the region, so pfn is valid */
> -                       return pfn;
> -               }
> -       } while (left < right);
> -
> -       if (right == type->cnt)
> -               return -1UL;
> -       else
> -               return PHYS_PFN(type->regions[right].base);
> -}
> -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
> -
>  /**
>   * memblock_set_node - set node ID on memblock regions
>   * @base: base of area to set node ID for
> @@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>  }
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
> +                                                       int *last_idx)
> +{
> +       struct memblock_type *type = &memblock.memory;
> +       unsigned int right = type->cnt;
> +       unsigned int mid, left = 0;
> +       unsigned long start_pfn, end_pfn;
> +       phys_addr_t addr = PFN_PHYS(++pfn);
> +
> +       /* fast path, return pfh+1 if next pfn is in the same region */
> +       if (*last_idx != -1) {
> +               start_pfn = PFN_DOWN(type->regions[*last_idx].base);
> +               end_pfn = PFN_DOWN(type->regions[*last_idx].base +
> +                               type->regions[*last_idx].size);
> +
> +               if (pfn < end_pfn && pfn > start_pfn)
> +                       return pfn;
> +       }
> +
> +       /* slow path, do the binary searching */
> +       do {
> +               mid = (right + left) / 2;
> +
> +               if (addr < type->regions[mid].base)
> +                       right = mid;
> +               else if (addr >= (type->regions[mid].base +
> +                                 type->regions[mid].size))
> +                       left = mid + 1;
> +               else {
> +                       *last_idx = mid;
> +                       return pfn;
> +               }
> +       } while (left < right);
> +
> +       if (right == type->cnt)
> +               return -1UL;
> +
> +       *last_idx = right;
> +
> +       return PHYS_PFN(type->regions[*last_idx].base);
> +}
> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
> +
>  static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>                                         phys_addr_t align, phys_addr_t start,
>                                         phys_addr_t end, int nid, ulong flags)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2a967f7..0bb0274 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>         unsigned long end_pfn = start_pfn + size;
>         pg_data_t *pgdat = NODE_DATA(nid);
>         unsigned long pfn;
> +       int idx = -1;
>         unsigned long nr_initialised = 0;
>         struct page *page;
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                          * end_pfn), such that we hit a valid pfn (or end_pfn)
>                          * on our next iteration of the loop.
>                          */
> -                       pfn = memblock_next_valid_pfn(pfn) - 1;
> +                       pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
>  #endif
>                         continue;
>                 }
> --
> 2.7.4
>

So the function is only defined with CONFIG_HAVE_ARCH_PFN_VALID but
it's called with CONFIG_HAVE_MEMBLOCK_NODE_MAP? The definition should
likely depend on both options as the function really depends on both
conditions. Otherwise it should be defined nop.

--nX

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
  2018-03-26  3:02 ` [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
@ 2018-03-27 17:51   ` Daniel Vacek
  2018-03-28  2:10     ` Jia He
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vacek @ 2018-03-27 17:51 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Eugeniu Rosca,
	Vlastimil Babka, open list, linux-mm, James Morse, Steve Capper,
	x86, Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He

On Mon, Mar 26, 2018 at 5:02 AM, Jia He <hejianet@gmail.com> wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But there is
> still some room for improvement. E.g. in early_pfn_valid(), if pfn and
> pfn+1 are in the same memblock region, we can record the last returned
> memblock region index and check check pfn++ is still in the same region.
>
> Currently it only improve the performance on arm64 and will have no
> impact on other arches.
>
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
> ---
>  arch/x86/include/asm/mmzone_32.h |  2 +-
>  include/linux/mmzone.h           | 12 +++++++++---
>  mm/page_alloc.c                  |  2 +-
>  3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
> index 73d8dd1..329d3ba 100644
> --- a/arch/x86/include/asm/mmzone_32.h
> +++ b/arch/x86/include/asm/mmzone_32.h
> @@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
>         return 0;
>  }
>
> -#define early_pfn_valid(pfn)   pfn_valid((pfn))
> +#define early_pfn_valid(pfn, last_region_idx)  pfn_valid((pfn))
>
>  #endif /* CONFIG_DISCONTIGMEM */
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d797716..3a686af 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
>  })
>  #else
>  #define pfn_to_nid(pfn)                (0)
> -#endif
> +#endif /*CONFIG_NUMA*/
> +
> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +#define early_pfn_valid(pfn, last_region_idx) \
> +                               pfn_valid_region(pfn, last_region_idx)
> +#else
> +#define early_pfn_valid(pfn, last_region_idx)  pfn_valid(pfn)
> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>
> -#define early_pfn_valid(pfn)   pfn_valid(pfn)
>  void sparse_init(void);
>  #else
>  #define sparse_init()  do {} while (0)
> @@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
>  };
>
>  #ifndef early_pfn_valid
> -#define early_pfn_valid(pfn)   (1)
> +#define early_pfn_valid(pfn, last_region_idx)  (1)
>  #endif
>
>  void memory_present(int nid, unsigned long start, unsigned long end);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0bb0274..debccf3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5484,7 +5484,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                 if (context != MEMMAP_EARLY)
>                         goto not_early;
>
> -               if (!early_pfn_valid(pfn)) {
> +               if (!early_pfn_valid(pfn, &idx)) {
>  #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
>                         /*
>                          * Skip to the pfn preceding the next valid one (or
> --
> 2.7.4
>

Hmm, what about making index global variable instead of changing all
the prototypes? Similar to early_pfnnid_cache for example. Something
like:

#ifdef CONFIG_HAVE_ARCH_PFN_VALID
extern int early_region_idx __meminitdata;
#define early_pfn_valid(pfn) \
                               pfn_valid_region(pfn, &early_region_idx)
#else
#define early_pfn_valid(pfn)  pfn_valid(pfn)
#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/

And move this to arch/arm64/include/asm/page.h ?

--nX

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
  2018-03-27  7:15     ` Jia He
@ 2018-03-28  0:30       ` Wei Yang
  -1 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-03-28  0:30 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov

On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>
>
>On 3/27/2018 9:02 AM, Wei Yang Wrote:
>> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > where possible") tried to optimize the loop in memmap_init_zone(). But
>> > there is still some room for improvement.
>> > 
>> > Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>> >         is enabled
>> > Patch 2 optimizes the memblock_next_valid_pfn()
>> > Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>> >         because the changes are located across subsystems.
>> > 
>> > I tested the pfn loop process in memmap_init(), the same as before.
>> > As for the performance improvement, after this set, I can see the time
>> > overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> > armv8a server(QDF2400 with 96G memory).
>> > 
>> > Attached the memblock region information in my server.
>> > [   86.956758] Zone ranges:
>> > [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> > [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> > [   86.972631] Movable zone start for each node
>> > [   86.977179] Early memory node ranges
>> > [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> > [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> > [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> > [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> > [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> > [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> > [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> > [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> > [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> > [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> > [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> > [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> > [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> > [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> > [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> > [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> > [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> > [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> > [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> > [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> > [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> > [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> > [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> > [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> > [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> > [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> > [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> > [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> > [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> > [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> > [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> Hi, Jia
>> 
>> I haven't taken a deep look into your code, just one curious question on your
>> memory layout.
>> 
>> The log above is printed out in free_area_init_nodes(), which iterates on
>> memblock.memory and prints them. If I am not wrong, memory regions added to
>> memblock.memory are ordered and merged if possible.
>> 
>> While from your log, I see many regions could be merged but are isolated. For
>> example, the last two region:
>> 
>>    node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>    node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> 
>> So I am curious why they are isolated instead of combined to one.
>> 
>> >From the code, the possible reason is the region's flag differs from each
>> other. If you have time, would you mind taking a look into this?
>> 
>Hi Wei
>I thought these 2 have different flags
>[    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka
>MEMBLOCK_NOMAP
>[    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>[    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>[    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]

Thanks.

Hmm, I am not that familiar with those flags, while they look like to indicate
the physical capability of this range.

	MEMBLOCK_NONE		no special
	MEMBLOCK_HOTPLUG	hotplug-able
	MEMBLOCK_MIRROR		high reliable
	MEMBLOCK_NOMAP		no direct map

While these flags are not there when they are first added into the memory
region. When you look at the memblock_add_range(), the last parameter passed
is always 0. This means current several separated ranges reflect the physical
memory capability layout.

Then, why this layout is so scattered? As you can see several ranges are less
than 1M.

If, just my assumption, we could merge some of them, we could have a better
performance. Less ranges, less searching time.

>
>-- 
>Cheers,
>Jia

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
@ 2018-03-28  0:30       ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-03-28  0:30 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov

On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>
>
>On 3/27/2018 9:02 AM, Wei Yang Wrote:
>> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > where possible") tried to optimize the loop in memmap_init_zone(). But
>> > there is still some room for improvement.
>> > 
>> > Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>> >         is enabled
>> > Patch 2 optimizes the memblock_next_valid_pfn()
>> > Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>> >         because the changes are located across subsystems.
>> > 
>> > I tested the pfn loop process in memmap_init(), the same as before.
>> > As for the performance improvement, after this set, I can see the time
>> > overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> > armv8a server(QDF2400 with 96G memory).
>> > 
>> > Attached the memblock region information in my server.
>> > [   86.956758] Zone ranges:
>> > [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> > [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> > [   86.972631] Movable zone start for each node
>> > [   86.977179] Early memory node ranges
>> > [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> > [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> > [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> > [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> > [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> > [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> > [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> > [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> > [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> > [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> > [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> > [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> > [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> > [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> > [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> > [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> > [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> > [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> > [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> > [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> > [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> > [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> > [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> > [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> > [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> > [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> > [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> > [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> > [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> > [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> > [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> Hi, Jia
>> 
>> I haven't taken a deep look into your code, just one curious question on your
>> memory layout.
>> 
>> The log above is printed out in free_area_init_nodes(), which iterates on
>> memblock.memory and prints them. If I am not wrong, memory regions added to
>> memblock.memory are ordered and merged if possible.
>> 
>> While from your log, I see many regions could be merged but are isolated. For
>> example, the last two region:
>> 
>>    node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>    node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> 
>> So I am curious why they are isolated instead of combined to one.
>> 
>> >From the code, the possible reason is the region's flag differs from each
>> other. If you have time, would you mind taking a look into this?
>> 
>Hi Wei
>I thought these 2 have different flags
>[    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka
>MEMBLOCK_NOMAP
>[    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>[    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>[    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]

Thanks.

Hmm, I am not that familiar with those flags, while they look like to indicate
the physical capability of this range.

	MEMBLOCK_NONE		no special
	MEMBLOCK_HOTPLUG	hotplug-able
	MEMBLOCK_MIRROR		high reliable
	MEMBLOCK_NOMAP		no direct map

While these flags are not there when they are first added into the memory
region. When you look at the memblock_add_range(), the last parameter passed
is always 0. This means current several separated ranges reflect the physical
memory capability layout.

Then, why this layout is so scattered? As you can see several ranges are less
than 1M.

If, just my assumption, we could merge some of them, we could have a better
performance. Less ranges, less searching time.

>
>-- 
>Cheers,
>Jia

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
  2018-03-28  0:30       ` Wei Yang
@ 2018-03-28  1:45         ` Jia He
  -1 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-28  1:45 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov



On 3/28/2018 8:30 AM, Wei Yang Wrote:
> On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>>
>> On 3/27/2018 9:02 AM, Wei Yang Wrote:
>>> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>> where possible") tried to optimize the loop in memmap_init_zone(). But
>>>> there is still some room for improvement.
>>>>
>>>> Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>>>>          is enabled
>>>> Patch 2 optimizes the memblock_next_valid_pfn()
>>>> Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>>>>          because the changes are located across subsystems.
>>>>
>>>> I tested the pfn loop process in memmap_init(), the same as before.
>>>> As for the performance improvement, after this set, I can see the time
>>>> overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>>>> armv8a server(QDF2400 with 96G memory).
>>>>
>>>> Attached the memblock region information in my server.
>>>> [   86.956758] Zone ranges:
>>>> [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>>>> [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>>>> [   86.972631] Movable zone start for each node
>>>> [   86.977179] Early memory node ranges
>>>> [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>>>> [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>>>> [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>>>> [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>>>> [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>>>> [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>>>> [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>>>> [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>>>> [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>>>> [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>>>> [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>>>> [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>>>> [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>>>> [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>>>> [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>>>> [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>>>> [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>>>> [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>>>> [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>>>> [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>>>> [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>>>> [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>>>> [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>>>> [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>>>> [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>>>> [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>>>> [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>>>> [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>>>> [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>>>> [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>>>> [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>>> [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>>> Hi, Jia
>>>
>>> I haven't taken a deep look into your code, just one curious question on your
>>> memory layout.
>>>
>>> The log above is printed out in free_area_init_nodes(), which iterates on
>>> memblock.memory and prints them. If I am not wrong, memory regions added to
>>> memblock.memory are ordered and merged if possible.
>>>
>>> While from your log, I see many regions could be merged but are isolated. For
>>> example, the last two region:
>>>
>>>     node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>>     node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>>>
>>> So I am curious why they are isolated instead of combined to one.
>>>
>>> >From the code, the possible reason is the region's flag differs from each
>>> other. If you have time, would you mind taking a look into this?
>>>
>> Hi Wei
>> I thought these 2 have different flags
>> [    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka
>> MEMBLOCK_NOMAP
>> [    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> [    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>> [    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
> Thanks.
>
> Hmm, I am not that familiar with those flags, while they look like to indicate
> the physical capability of this range.
>
> 	MEMBLOCK_NONE		no special
> 	MEMBLOCK_HOTPLUG	hotplug-able
> 	MEMBLOCK_MIRROR		high reliable
> 	MEMBLOCK_NOMAP		no direct map
>
> While these flags are not there when they are first added into the memory
> region. When you look at the memblock_add_range(), the last parameter passed
> is always 0. This means current several separated ranges reflect the physical
> memory capability layout.
>
> Then, why this layout is so scattered? As you can see several ranges are less
> than 1M.
>
> If, just my assumption, we could merge some of them, we could have a better
> performance. Less ranges, less searching time.
Thanks for your suggestions, Wei
Need further digging and will consider to improve it in another patchset.

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
@ 2018-03-28  1:45         ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-28  1:45 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov



On 3/28/2018 8:30 AM, Wei Yang Wrote:
> On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>>
>> On 3/27/2018 9:02 AM, Wei Yang Wrote:
>>> On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>> where possible") tried to optimize the loop in memmap_init_zone(). But
>>>> there is still some room for improvement.
>>>>
>>>> Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>>>>          is enabled
>>>> Patch 2 optimizes the memblock_next_valid_pfn()
>>>> Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>>>>          because the changes are located across subsystems.
>>>>
>>>> I tested the pfn loop process in memmap_init(), the same as before.
>>>> As for the performance improvement, after this set, I can see the time
>>>> overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>>>> armv8a server(QDF2400 with 96G memory).
>>>>
>>>> Attached the memblock region information in my server.
>>>> [   86.956758] Zone ranges:
>>>> [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>>>> [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>>>> [   86.972631] Movable zone start for each node
>>>> [   86.977179] Early memory node ranges
>>>> [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>>>> [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>>>> [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>>>> [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>>>> [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>>>> [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>>>> [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>>>> [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>>>> [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>>>> [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>>>> [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>>>> [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>>>> [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>>>> [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>>>> [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>>>> [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>>>> [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>>>> [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>>>> [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>>>> [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>>>> [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>>>> [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>>>> [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>>>> [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>>>> [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>>>> [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>>>> [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>>>> [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>>>> [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>>>> [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>>>> [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>>> [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>>> Hi, Jia
>>>
>>> I haven't taken a deep look into your code, just one curious question on your
>>> memory layout.
>>>
>>> The log above is printed out in free_area_init_nodes(), which iterates on
>>> memblock.memory and prints them. If I am not wrong, memory regions added to
>>> memblock.memory are ordered and merged if possible.
>>>
>>> While from your log, I see many regions could be merged but are isolated. For
>>> example, the last two region:
>>>
>>>     node   0: [mem 0x000000007eff0000-0x000000007effffff]
>>>     node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>>>
>>> So I am curious why they are isolated instead of combined to one.
>>>
>>> >From the code, the possible reason is the region's flag differs from each
>>> other. If you have time, would you mind taking a look into this?
>>>
>> Hi Wei
>> I thought these 2 have different flags
>> [A A A  0.000000] idx=30,region [7eff0000:10000]flag=4A A A A  <--- aka
>> MEMBLOCK_NOMAP
>> [A A A  0.000000]A A  nodeA A  0: [mem 0x000000007eff0000-0x000000007effffff]
>> [A A A  0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>> [A A A  0.000000]A A  nodeA A  0: [mem 0x000000007f000000-0x00000017ffffffff]
> Thanks.
>
> Hmm, I am not that familiar with those flags, while they look like to indicate
> the physical capability of this range.
>
> 	MEMBLOCK_NONE		no special
> 	MEMBLOCK_HOTPLUG	hotplug-able
> 	MEMBLOCK_MIRROR		high reliable
> 	MEMBLOCK_NOMAP		no direct map
>
> While these flags are not there when they are first added into the memory
> region. When you look at the memblock_add_range(), the last parameter passed
> is always 0. This means current several separated ranges reflect the physical
> memory capability layout.
>
> Then, why this layout is so scattered? As you can see several ranges are less
> than 1M.
>
> If, just my assumption, we could merge some of them, we could have a better
> performance. Less ranges, less searching time.
Thanks for your suggestions, Wei
Need further digging and will consider to improve it in another patchset.

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-27 17:17   ` Daniel Vacek
@ 2018-03-28  2:09     ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-28  2:09 UTC (permalink / raw)
  To: Daniel Vacek
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Eugeniu Rosca,
	Vlastimil Babka, open list, linux-mm, James Morse, Steve Capper,
	x86, Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He



On 3/28/2018 1:17 AM, Daniel Vacek Wrote:
> On Mon, Mar 26, 2018 at 5:02 AM, Jia He <hejianet@gmail.com> wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But there is
>> still some room for improvement. E.g. if pfn and pfn+1 are in the same
>> memblock region, we can simply pfn++ instead of doing the binary search
>> in memblock_next_valid_pfn. This patch only works when
>> CONFIG_HAVE_ARCH_PFN_VALID is enable.
>>
>> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>> ---
>>   include/linux/memblock.h |  2 +-
>>   mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
>>   mm/page_alloc.c          |  3 +-
>>   3 files changed, 47 insertions(+), 31 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index efbbe4b..a8fb2ab 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>>   #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> -unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
>>   #endif
>>
>>   /**
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index bea5a9c..06c1a08 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>>                  *out_nid = r->nid;
>>   }
>>
>> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> -{
>> -       struct memblock_type *type = &memblock.memory;
>> -       unsigned int right = type->cnt;
>> -       unsigned int mid, left = 0;
>> -       phys_addr_t addr = PFN_PHYS(++pfn);
>> -
>> -       do {
>> -               mid = (right + left) / 2;
>> -
>> -               if (addr < type->regions[mid].base)
>> -                       right = mid;
>> -               else if (addr >= (type->regions[mid].base +
>> -                                 type->regions[mid].size))
>> -                       left = mid + 1;
>> -               else {
>> -                       /* addr is within the region, so pfn is valid */
>> -                       return pfn;
>> -               }
>> -       } while (left < right);
>> -
>> -       if (right == type->cnt)
>> -               return -1UL;
>> -       else
>> -               return PHYS_PFN(type->regions[right].base);
>> -}
>> -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> -
>>   /**
>>    * memblock_set_node - set node ID on memblock regions
>>    * @base: base of area to set node ID for
>> @@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>>   }
>>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
>> +                                                       int *last_idx)
>> +{
>> +       struct memblock_type *type = &memblock.memory;
>> +       unsigned int right = type->cnt;
>> +       unsigned int mid, left = 0;
>> +       unsigned long start_pfn, end_pfn;
>> +       phys_addr_t addr = PFN_PHYS(++pfn);
>> +
>> +       /* fast path, return pfh+1 if next pfn is in the same region */
>> +       if (*last_idx != -1) {
>> +               start_pfn = PFN_DOWN(type->regions[*last_idx].base);
>> +               end_pfn = PFN_DOWN(type->regions[*last_idx].base +
>> +                               type->regions[*last_idx].size);
>> +
>> +               if (pfn < end_pfn && pfn > start_pfn)
>> +                       return pfn;
>> +       }
>> +
>> +       /* slow path, do the binary searching */
>> +       do {
>> +               mid = (right + left) / 2;
>> +
>> +               if (addr < type->regions[mid].base)
>> +                       right = mid;
>> +               else if (addr >= (type->regions[mid].base +
>> +                                 type->regions[mid].size))
>> +                       left = mid + 1;
>> +               else {
>> +                       *last_idx = mid;
>> +                       return pfn;
>> +               }
>> +       } while (left < right);
>> +
>> +       if (right == type->cnt)
>> +               return -1UL;
>> +
>> +       *last_idx = right;
>> +
>> +       return PHYS_PFN(type->regions[*last_idx].base);
>> +}
>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> +
>>   static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>>                                          phys_addr_t align, phys_addr_t start,
>>                                          phys_addr_t end, int nid, ulong flags)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2a967f7..0bb0274 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>          unsigned long end_pfn = start_pfn + size;
>>          pg_data_t *pgdat = NODE_DATA(nid);
>>          unsigned long pfn;
>> +       int idx = -1;
>>          unsigned long nr_initialised = 0;
>>          struct page *page;
>>   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>                           * end_pfn), such that we hit a valid pfn (or end_pfn)
>>                           * on our next iteration of the loop.
>>                           */
>> -                       pfn = memblock_next_valid_pfn(pfn) - 1;
>> +                       pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
>>   #endif
>>                          continue;
>>                  }
>> --
>> 2.7.4
>>
> So the function is only defined with CONFIG_HAVE_ARCH_PFN_VALID but
> it's called with CONFIG_HAVE_MEMBLOCK_NODE_MAP? The definition should
> likely depend on both options as the function really depends on both
> conditions. Otherwise it should be defined nop.
>
> --nX
>
Yes, thanks

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
  2018-03-27 17:51   ` Daniel Vacek
@ 2018-03-28  2:10     ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-28  2:10 UTC (permalink / raw)
  To: Daniel Vacek
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Eugeniu Rosca,
	Vlastimil Babka, open list, linux-mm, James Morse, Steve Capper,
	x86, Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Jia He



On 3/28/2018 1:51 AM, Daniel Vacek Wrote:
> On Mon, Mar 26, 2018 at 5:02 AM, Jia He <hejianet@gmail.com> wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But there is
>> still some room for improvement. E.g. in early_pfn_valid(), if pfn and
>> pfn+1 are in the same memblock region, we can record the last returned
>> memblock region index and check check pfn++ is still in the same region.
>>
>> Currently it only improve the performance on arm64 and will have no
>> impact on other arches.
>>
>> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>> ---
>>   arch/x86/include/asm/mmzone_32.h |  2 +-
>>   include/linux/mmzone.h           | 12 +++++++++---
>>   mm/page_alloc.c                  |  2 +-
>>   3 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
>> index 73d8dd1..329d3ba 100644
>> --- a/arch/x86/include/asm/mmzone_32.h
>> +++ b/arch/x86/include/asm/mmzone_32.h
>> @@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
>>          return 0;
>>   }
>>
>> -#define early_pfn_valid(pfn)   pfn_valid((pfn))
>> +#define early_pfn_valid(pfn, last_region_idx)  pfn_valid((pfn))
>>
>>   #endif /* CONFIG_DISCONTIGMEM */
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index d797716..3a686af 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
>>   })
>>   #else
>>   #define pfn_to_nid(pfn)                (0)
>> -#endif
>> +#endif /*CONFIG_NUMA*/
>> +
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +#define early_pfn_valid(pfn, last_region_idx) \
>> +                               pfn_valid_region(pfn, last_region_idx)
>> +#else
>> +#define early_pfn_valid(pfn, last_region_idx)  pfn_valid(pfn)
>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>>
>> -#define early_pfn_valid(pfn)   pfn_valid(pfn)
>>   void sparse_init(void);
>>   #else
>>   #define sparse_init()  do {} while (0)
>> @@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
>>   };
>>
>>   #ifndef early_pfn_valid
>> -#define early_pfn_valid(pfn)   (1)
>> +#define early_pfn_valid(pfn, last_region_idx)  (1)
>>   #endif
>>
>>   void memory_present(int nid, unsigned long start, unsigned long end);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 0bb0274..debccf3 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5484,7 +5484,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>                  if (context != MEMMAP_EARLY)
>>                          goto not_early;
>>
>> -               if (!early_pfn_valid(pfn)) {
>> +               if (!early_pfn_valid(pfn, &idx)) {
>>   #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
>>                          /*
>>                           * Skip to the pfn preceding the next valid one (or
>> --
>> 2.7.4
>>
> Hmm, what about making index global variable instead of changing all
> the prototypes? Similar to early_pfnnid_cache for example. Something
> like:
>
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> extern int early_region_idx __meminitdata;
> #define early_pfn_valid(pfn) \
>                                 pfn_valid_region(pfn, &early_region_idx)
> #else
> #define early_pfn_valid(pfn)  pfn_valid(pfn)
> #endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>
> And move this to arch/arm64/include/asm/page.h ?
>
> --nX
>
Yes. ok with me

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
  2018-03-28  1:45         ` Jia He
@ 2018-03-28  2:36           ` Wei Yang
  -1 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-03-28  2:36 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov

On Wed, Mar 28, 2018 at 09:45:33AM +0800, Jia He wrote:
>
>
>On 3/28/2018 8:30 AM, Wei Yang Wrote:
>> On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>> > 
>> > On 3/27/2018 9:02 AM, Wei Yang Wrote:
>> > > On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> > > > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > > > where possible") tried to optimize the loop in memmap_init_zone(). But
>> > > > there is still some room for improvement.
>> > > > 
>> > > > Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>> > > >          is enabled
>> > > > Patch 2 optimizes the memblock_next_valid_pfn()
>> > > > Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>> > > >          because the changes are located across subsystems.
>> > > > 
>> > > > I tested the pfn loop process in memmap_init(), the same as before.
>> > > > As for the performance improvement, after this set, I can see the time
>> > > > overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> > > > armv8a server(QDF2400 with 96G memory).
>> > > > 
>> > > > Attached the memblock region information in my server.
>> > > > [   86.956758] Zone ranges:
>> > > > [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> > > > [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> > > > [   86.972631] Movable zone start for each node
>> > > > [   86.977179] Early memory node ranges
>> > > > [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> > > > [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> > > > [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> > > > [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> > > > [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> > > > [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> > > > [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> > > > [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> > > > [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> > > > [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> > > > [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> > > > [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> > > > [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> > > > [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> > > > [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> > > > [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> > > > [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> > > > [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> > > > [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> > > > [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> > > > [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> > > > [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> > > > [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> > > > [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> > > > [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> > > > [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> > > > [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> > > > [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> > > > [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> > > > [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> > > > [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > > > [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> > > Hi, Jia
>> > > 
>> > > I haven't taken a deep look into your code, just one curious question on your
>> > > memory layout.
>> > > 
>> > > The log above is printed out in free_area_init_nodes(), which iterates on
>> > > memblock.memory and prints them. If I am not wrong, memory regions added to
>> > > memblock.memory are ordered and merged if possible.
>> > > 
>> > > While from your log, I see many regions could be merged but are isolated. For
>> > > example, the last two region:
>> > > 
>> > >     node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > >     node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> > > 
>> > > So I am curious why they are isolated instead of combined to one.
>> > > 
>> > > >From the code, the possible reason is the region's flag differs from each
>> > > other. If you have time, would you mind taking a look into this?
>> > > 
>> > Hi Wei
>> > I thought these 2 have different flags
>> > [    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka
>> > MEMBLOCK_NOMAP
>> > [    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > [    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>> > [    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> Thanks.
>> 
>> Hmm, I am not that familiar with those flags, while they look like to indicate
>> the physical capability of this range.
>> 
>> 	MEMBLOCK_NONE		no special
>> 	MEMBLOCK_HOTPLUG	hotplug-able
>> 	MEMBLOCK_MIRROR		high reliable
>> 	MEMBLOCK_NOMAP		no direct map
>> 
>> While these flags are not there when they are first added into the memory
>> region. When you look at the memblock_add_range(), the last parameter passed
>> is always 0. This means current several separated ranges reflect the physical
>> memory capability layout.
>> 
>> Then, why this layout is so scattered? As you can see several ranges are less
>> than 1M.
>> 
>> If, just my assumption, we could merge some of them, we could have a better
>> performance. Less ranges, less searching time.
>Thanks for your suggestions, Wei
>Need further digging and will consider to improve it in another patchset.
>

You are welcome :-)

I am glad to see your further patchset or investigation, if you are willing me
to involve.

>-- 
>Cheers,
>Jia

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
@ 2018-03-28  2:36           ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-03-28  2:36 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov

On Wed, Mar 28, 2018 at 09:45:33AM +0800, Jia He wrote:
>
>
>On 3/28/2018 8:30 AM, Wei Yang Wrote:
>> On Tue, Mar 27, 2018 at 03:15:08PM +0800, Jia He wrote:
>> > 
>> > On 3/27/2018 9:02 AM, Wei Yang Wrote:
>> > > On Sun, Mar 25, 2018 at 08:02:14PM -0700, Jia He wrote:
>> > > > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > > > where possible") tried to optimize the loop in memmap_init_zone(). But
>> > > > there is still some room for improvement.
>> > > > 
>> > > > Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
>> > > >          is enabled
>> > > > Patch 2 optimizes the memblock_next_valid_pfn()
>> > > > Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
>> > > >          because the changes are located across subsystems.
>> > > > 
>> > > > I tested the pfn loop process in memmap_init(), the same as before.
>> > > > As for the performance improvement, after this set, I can see the time
>> > > > overhead of memmap_init() is reduced from 41313 us to 24345 us in my
>> > > > armv8a server(QDF2400 with 96G memory).
>> > > > 
>> > > > Attached the memblock region information in my server.
>> > > > [   86.956758] Zone ranges:
>> > > > [   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> > > > [   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
>> > > > [   86.972631] Movable zone start for each node
>> > > > [   86.977179] Early memory node ranges
>> > > > [   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
>> > > > [   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
>> > > > [   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
>> > > > [   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
>> > > > [   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
>> > > > [   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
>> > > > [   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
>> > > > [   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
>> > > > [   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
>> > > > [   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
>> > > > [   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
>> > > > [   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
>> > > > [   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
>> > > > [   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
>> > > > [   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
>> > > > [   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
>> > > > [   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
>> > > > [   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
>> > > > [   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
>> > > > [   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
>> > > > [   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
>> > > > [   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
>> > > > [   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
>> > > > [   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
>> > > > [   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
>> > > > [   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
>> > > > [   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
>> > > > [   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
>> > > > [   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
>> > > > [   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
>> > > > [   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > > > [   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> > > Hi, Jia
>> > > 
>> > > I haven't taken a deep look into your code, just one curious question on your
>> > > memory layout.
>> > > 
>> > > The log above is printed out in free_area_init_nodes(), which iterates on
>> > > memblock.memory and prints them. If I am not wrong, memory regions added to
>> > > memblock.memory are ordered and merged if possible.
>> > > 
>> > > While from your log, I see many regions could be merged but are isolated. For
>> > > example, the last two region:
>> > > 
>> > >     node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > >     node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> > > 
>> > > So I am curious why they are isolated instead of combined to one.
>> > > 
>> > > >From the code, the possible reason is the region's flag differs from each
>> > > other. If you have time, would you mind taking a look into this?
>> > > 
>> > Hi Wei
>> > I thought these 2 have different flags
>> > [    0.000000] idx=30,region [7eff0000:10000]flag=4     <--- aka
>> > MEMBLOCK_NOMAP
>> > [    0.000000]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
>> > [    0.000000] idx=31,region [7f000000:81000000]flag=0 <--- aka MEMBLOCK_NONE
>> > [    0.000000]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
>> Thanks.
>> 
>> Hmm, I am not that familiar with those flags, while they look like to indicate
>> the physical capability of this range.
>> 
>> 	MEMBLOCK_NONE		no special
>> 	MEMBLOCK_HOTPLUG	hotplug-able
>> 	MEMBLOCK_MIRROR		high reliable
>> 	MEMBLOCK_NOMAP		no direct map
>> 
>> While these flags are not there when they are first added into the memory
>> region. When you look at the memblock_add_range(), the last parameter passed
>> is always 0. This means current several separated ranges reflect the physical
>> memory capability layout.
>> 
>> Then, why this layout is so scattered? As you can see several ranges are less
>> than 1M.
>> 
>> If, just my assumption, we could merge some of them, we could have a better
>> performance. Less ranges, less searching time.
>Thanks for your suggestions, Wei
>Need further digging and will consider to improve it in another patchset.
>

You are welcome :-)

I am glad to see your further patchset or investigation, if you are willing me
to involve.

>-- 
>Cheers,
>Jia

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-03-26  3:02 ` [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable Jia He
@ 2018-03-28  9:18   ` Wei Yang
  2018-03-28  9:49     ` Jia He
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2018-03-28  9:18 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He

Oops, I should reply this thread. Forget about the reply on another thread.

On Sun, Mar 25, 2018 at 08:02:15PM -0700, Jia He wrote:
>Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>where possible") optimized the loop in memmap_init_zone(). But it causes
>possible panic bug. So Daniel Vacek reverted it later.
>

Why this has a bug? Do you have some link about it?

If the audience could know the potential risk, it would be helpful to review
the code and decide whether to take it back.

>But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>enable. And as verified by Eugeniu Rosca, arm can benifit from this
>commit. So remain the memblock_next_valid_pfn.
>
>Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>---
> include/linux/memblock.h |  4 ++++
> mm/memblock.c            | 29 +++++++++++++++++++++++++++++
> mm/page_alloc.c          | 11 ++++++++++-
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
>diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>index 0257aee..efbbe4b 100644
>--- a/include/linux/memblock.h
>+++ b/include/linux/memblock.h
>@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
>+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>+unsigned long memblock_next_valid_pfn(unsigned long pfn);
>+#endif
>+
> /**
>  * for_each_free_mem_range - iterate through free memblock areas
>  * @i: u64 used as loop variable
>diff --git a/mm/memblock.c b/mm/memblock.c
>index ba7c878..bea5a9c 100644
>--- a/mm/memblock.c
>+++ b/mm/memblock.c
>@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
> 		*out_nid = r->nid;
> }
> 
>+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>+{
>+	struct memblock_type *type = &memblock.memory;
>+	unsigned int right = type->cnt;
>+	unsigned int mid, left = 0;
>+	phys_addr_t addr = PFN_PHYS(++pfn);
>+
>+	do {
>+		mid = (right + left) / 2;
>+
>+		if (addr < type->regions[mid].base)
>+			right = mid;
>+		else if (addr >= (type->regions[mid].base +
>+				  type->regions[mid].size))
>+			left = mid + 1;
>+		else {
>+			/* addr is within the region, so pfn is valid */
>+			return pfn;
>+		}
>+	} while (left < right);
>+
>+	if (right == type->cnt)
>+		return -1UL;
>+	else
>+		return PHYS_PFN(type->regions[right].base);
>+}
>+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>+
> /**
>  * memblock_set_node - set node ID on memblock regions
>  * @base: base of area to set node ID for
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index c19f5ac..2a967f7 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> 		if (context != MEMMAP_EARLY)
> 			goto not_early;
> 
>-		if (!early_pfn_valid(pfn))
>+		if (!early_pfn_valid(pfn)) {
>+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)

In commit b92df1de5d28, it use CONFIG_HAVE_MEMBLOCK_NODE_MAP.

Not get the point of your change.

>+			/*
>+			 * Skip to the pfn preceding the next valid one (or
>+			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>+			 * on our next iteration of the loop.
>+			 */
>+			pfn = memblock_next_valid_pfn(pfn) - 1;
>+#endif
> 			continue;
>+		}
> 		if (!early_pfn_in_nid(pfn, nid))
> 			continue;
> 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>-- 
>2.7.4

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-26  3:02 ` [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
  2018-03-27 17:17   ` Daniel Vacek
@ 2018-03-28  9:26   ` Wei Yang
  2018-03-29  8:06     ` Jia He
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Yang @ 2018-03-28  9:26 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He

On Sun, Mar 25, 2018 at 08:02:16PM -0700, Jia He wrote:
>Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>where possible") optimized the loop in memmap_init_zone(). But there is
>still some room for improvement. E.g. if pfn and pfn+1 are in the same
>memblock region, we can simply pfn++ instead of doing the binary search
>in memblock_next_valid_pfn. This patch only works when
>CONFIG_HAVE_ARCH_PFN_VALID is enable.
>
>Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>---
> include/linux/memblock.h |  2 +-
> mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
> mm/page_alloc.c          |  3 +-
> 3 files changed, 47 insertions(+), 31 deletions(-)
>
>diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>index efbbe4b..a8fb2ab 100644
>--- a/include/linux/memblock.h
>+++ b/include/linux/memblock.h
>@@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>-unsigned long memblock_next_valid_pfn(unsigned long pfn);
>+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
> #endif
> 
> /**
>diff --git a/mm/memblock.c b/mm/memblock.c
>index bea5a9c..06c1a08 100644
>--- a/mm/memblock.c
>+++ b/mm/memblock.c
>@@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
> 		*out_nid = r->nid;
> }
> 
>-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>-{
>-	struct memblock_type *type = &memblock.memory;
>-	unsigned int right = type->cnt;
>-	unsigned int mid, left = 0;
>-	phys_addr_t addr = PFN_PHYS(++pfn);
>-
>-	do {
>-		mid = (right + left) / 2;
>-
>-		if (addr < type->regions[mid].base)
>-			right = mid;
>-		else if (addr >= (type->regions[mid].base +
>-				  type->regions[mid].size))
>-			left = mid + 1;
>-		else {
>-			/* addr is within the region, so pfn is valid */
>-			return pfn;
>-		}
>-	} while (left < right);
>-
>-	if (right == type->cnt)
>-		return -1UL;
>-	else
>-		return PHYS_PFN(type->regions[right].base);
>-}
>-#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>-
> /**
>  * memblock_set_node - set node ID on memblock regions
>  * @base: base of area to set node ID for
>@@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
> }
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
>+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
>+							int *last_idx)
>+{
>+	struct memblock_type *type = &memblock.memory;
>+	unsigned int right = type->cnt;
>+	unsigned int mid, left = 0;
>+	unsigned long start_pfn, end_pfn;
>+	phys_addr_t addr = PFN_PHYS(++pfn);
>+
>+	/* fast path, return pfh+1 if next pfn is in the same region */
                             ^^^  pfn

>+	if (*last_idx != -1) {
>+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);

To me, it should be PFN_UP().

>+		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
>+				type->regions[*last_idx].size);
>+
>+		if (pfn < end_pfn && pfn > start_pfn)

Could be (pfn < end_pfn && pfn >= start_pfn)?

pfn == start_pfn is also a valid address.

>+			return pfn;
>+	}
>+
>+	/* slow path, do the binary searching */
>+	do {
>+		mid = (right + left) / 2;
>+
>+		if (addr < type->regions[mid].base)
>+			right = mid;
>+		else if (addr >= (type->regions[mid].base +
>+				  type->regions[mid].size))
>+			left = mid + 1;
>+		else {
>+			*last_idx = mid;
>+			return pfn;
>+		}
>+	} while (left < right);
>+
>+	if (right == type->cnt)
>+		return -1UL;
>+
>+	*last_idx = right;
>+
>+	return PHYS_PFN(type->regions[*last_idx].base);
>+}
>+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/

The same comment as Daniel, you are moving the function out of
CONFIG_HAVE_MEMBLOCK_NODE_MAP.

>+
> static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
> 					phys_addr_t align, phys_addr_t start,
> 					phys_addr_t end, int nid, ulong flags)
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 2a967f7..0bb0274 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> 	unsigned long end_pfn = start_pfn + size;
> 	pg_data_t *pgdat = NODE_DATA(nid);
> 	unsigned long pfn;
>+	int idx = -1;
> 	unsigned long nr_initialised = 0;
> 	struct page *page;
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>@@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
> 			 * on our next iteration of the loop.
> 			 */
>-			pfn = memblock_next_valid_pfn(pfn) - 1;
>+			pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
> #endif
> 			continue;
> 		}
>-- 
>2.7.4

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/5] arm64: introduce pfn_valid_region()
  2018-03-26  3:02 ` [PATCH v3 4/5] arm64: introduce pfn_valid_region() Jia He
@ 2018-03-28  9:38   ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-03-28  9:38 UTC (permalink / raw)
  To: Jia He
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He

On Sun, Mar 25, 2018 at 08:02:18PM -0700, Jia He wrote:
>This is the preparation for further optimizing in early_pfn_valid
>on arm64.
>
>Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>---
> arch/arm64/include/asm/page.h |  3 ++-
> arch/arm64/mm/init.c          | 25 ++++++++++++++++++++++++-
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
>diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>index 60d02c8..da2cba3 100644
>--- a/arch/arm64/include/asm/page.h
>+++ b/arch/arm64/include/asm/page.h
>@@ -38,7 +38,8 @@ extern void clear_page(void *to);
> typedef struct page *pgtable_t;
> 
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>-extern int pfn_valid(unsigned long);
>+extern int pfn_valid(unsigned long pfn);
>+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
> #endif
> 
> #include <asm/memory.h>
>diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>index 00e7b90..06433d5 100644
>--- a/arch/arm64/mm/init.c
>+++ b/arch/arm64/mm/init.c
>@@ -290,7 +290,30 @@ int pfn_valid(unsigned long pfn)
> 	return memblock_is_map_memory(pfn << PAGE_SHIFT);
> }
> EXPORT_SYMBOL(pfn_valid);
>-#endif
>+
>+int pfn_valid_region(unsigned long pfn, int *last_idx)
>+{
>+	unsigned long start_pfn, end_pfn;
>+	struct memblock_type *type = &memblock.memory;
>+
>+	if (*last_idx != -1) {
>+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);

PFN_UP() should be used.

>+		end_pfn= PFN_DOWN(type->regions[*last_idx].base +
>+					type->regions[*last_idx].size);
>+
>+		if (pfn >= start_pfn && pfn < end_pfn)
>+			return !memblock_is_nomap(
>+				&memblock.memory.regions[*last_idx]);

Could use type->regions directly.

>+	}
>+
>+	*last_idx = memblock_search_pfn_regions(pfn);
>+	if (*last_idx == -1)
>+		return false;
>+
>+	return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);

Could use type->regions directly.

Well, since your check memblock.memory.regions, how about use a variable
equals memblock.memory.regions directly instead of type->regions?

For example:

struct memblock_region *regions = memblock.memory.regions;

>+}
>+EXPORT_SYMBOL(pfn_valid_region);
>+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
> 
> #ifndef CONFIG_SPARSEMEM
> static void __init arm64_memory_present(void)
>-- 
>2.7.4

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-03-28  9:18   ` Wei Yang
@ 2018-03-28  9:49     ` Jia He
  2018-04-02  8:12       ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-03-28  9:49 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He



On 3/28/2018 5:18 PM, Wei Yang Wrote:
> Oops, I should reply this thread. Forget about the reply on another thread.
>
> On Sun, Mar 25, 2018 at 08:02:15PM -0700, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic bug. So Daniel Vacek reverted it later.
>>
> Why this has a bug? Do you have some link about it?
>
> If the audience could know the potential risk, it would be helpful to review
> the code and decide whether to take it back.
Hi Wei
Paul firstly submit a commit b92df1de5 to improve the loop in 
memmap_init_zone.
And Daniel tried to fix a bug_on panic issue on X86 in commit 
864b75f9d6b because
there is evidence that this bug_on was caused by b92df1de5 [1].

But things didn't get better, 864b75f9d6b caused booting hang issue on 
arm{64} [2]
So maintainer decided to reverted both b92df1de5 and 864b75f9d6b

[1] https://patchwork.kernel.org/patch/10251145/
[2] https://lkml.org/lkml/2018/3/14/469
>
>> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>> enable. And as verified by Eugeniu Rosca, arm can benifit from this
>> commit. So remain the memblock_next_valid_pfn.
>>
>> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>> ---
>> include/linux/memblock.h |  4 ++++
>> mm/memblock.c            | 29 +++++++++++++++++++++++++++++
>> mm/page_alloc.c          | 11 ++++++++++-
>> 3 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index 0257aee..efbbe4b 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>> 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +#endif
>> +
>> /**
>>   * for_each_free_mem_range - iterate through free memblock areas
>>   * @i: u64 used as loop variable
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index ba7c878..bea5a9c 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>> 		*out_nid = r->nid;
>> }
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> +{
>> +	struct memblock_type *type = &memblock.memory;
>> +	unsigned int right = type->cnt;
>> +	unsigned int mid, left = 0;
>> +	phys_addr_t addr = PFN_PHYS(++pfn);
>> +
>> +	do {
>> +		mid = (right + left) / 2;
>> +
>> +		if (addr < type->regions[mid].base)
>> +			right = mid;
>> +		else if (addr >= (type->regions[mid].base +
>> +				  type->regions[mid].size))
>> +			left = mid + 1;
>> +		else {
>> +			/* addr is within the region, so pfn is valid */
>> +			return pfn;
>> +		}
>> +	} while (left < right);
>> +
>> +	if (right == type->cnt)
>> +		return -1UL;
>> +	else
>> +		return PHYS_PFN(type->regions[right].base);
>> +}
>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> +
>> /**
>>   * memblock_set_node - set node ID on memblock regions
>>   * @base: base of area to set node ID for
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c19f5ac..2a967f7 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> 		if (context != MEMMAP_EARLY)
>> 			goto not_early;
>>
>> -		if (!early_pfn_valid(pfn))
>> +		if (!early_pfn_valid(pfn)) {
>> +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
> In commit b92df1de5d28, it use CONFIG_HAVE_MEMBLOCK_NODE_MAP.
>
> Not get the point of your change.
Please get more information about the reason why using 
CONFIG_HAVE_MEMBLOCK in
d49d47e mm: page_alloc: skip over regions of invalid pfns on UMA

And this commit is dependent on b92df1de, so it is also reverted.

Cheers,
Jia
>
>> +			/*
>> +			 * Skip to the pfn preceding the next valid one (or
>> +			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>> +			 * on our next iteration of the loop.
>> +			 */
>> +			pfn = memblock_next_valid_pfn(pfn) - 1;
>> +#endif
>> 			continue;
>> +		}
>> 		if (!early_pfn_in_nid(pfn, nid))
>> 			continue;
>> 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>> -- 
>> 2.7.4

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-28  9:26   ` Wei Yang
@ 2018-03-29  8:06     ` Jia He
  2018-03-30  1:43       ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-03-29  8:06 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He



On 3/28/2018 5:26 PM, Wei Yang Wrote:
> On Sun, Mar 25, 2018 at 08:02:16PM -0700, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But there is
>> still some room for improvement. E.g. if pfn and pfn+1 are in the same
>> memblock region, we can simply pfn++ instead of doing the binary search
>> in memblock_next_valid_pfn. This patch only works when
>> CONFIG_HAVE_ARCH_PFN_VALID is enable.
>>
>> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>> ---
>> include/linux/memblock.h |  2 +-
>> mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
>> mm/page_alloc.c          |  3 +-
>> 3 files changed, 47 insertions(+), 31 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index efbbe4b..a8fb2ab 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> -unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
>> #endif
>>
>> /**
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index bea5a9c..06c1a08 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>> 		*out_nid = r->nid;
>> }
>>
>> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> -{
>> -	struct memblock_type *type = &memblock.memory;
>> -	unsigned int right = type->cnt;
>> -	unsigned int mid, left = 0;
>> -	phys_addr_t addr = PFN_PHYS(++pfn);
>> -
>> -	do {
>> -		mid = (right + left) / 2;
>> -
>> -		if (addr < type->regions[mid].base)
>> -			right = mid;
>> -		else if (addr >= (type->regions[mid].base +
>> -				  type->regions[mid].size))
>> -			left = mid + 1;
>> -		else {
>> -			/* addr is within the region, so pfn is valid */
>> -			return pfn;
>> -		}
>> -	} while (left < right);
>> -
>> -	if (right == type->cnt)
>> -		return -1UL;
>> -	else
>> -		return PHYS_PFN(type->regions[right].base);
>> -}
>> -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> -
>> /**
>>   * memblock_set_node - set node ID on memblock regions
>>   * @base: base of area to set node ID for
>> @@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>> }
>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
>> +							int *last_idx)
>> +{
>> +	struct memblock_type *type = &memblock.memory;
>> +	unsigned int right = type->cnt;
>> +	unsigned int mid, left = 0;
>> +	unsigned long start_pfn, end_pfn;
>> +	phys_addr_t addr = PFN_PHYS(++pfn);
>> +
>> +	/* fast path, return pfh+1 if next pfn is in the same region */
>                               ^^^  pfn
Thanks
>> +	if (*last_idx != -1) {
>> +		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
> To me, it should be PFN_UP().
hmm.., seems all the base of memory region is pfn aligned (0x10000 
aligned). So

PFN_UP is the same as PFN_DOWN here?
I got this logic from memblock_search_pfn_nid()

Cheers,
Jia

>
>> +		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
>> +				type->regions[*last_idx].size);
>> +
>> +		if (pfn < end_pfn && pfn > start_pfn)
> Could be (pfn < end_pfn && pfn >= start_pfn)?
>
> pfn == start_pfn is also a valid address.
No, pfn=pfn+1 at the beginning, so pfn != start_pfn
>
>> +			return pfn;
>> +	}
>> +
>> +	/* slow path, do the binary searching */
>> +	do {
>> +		mid = (right + left) / 2;
>> +
>> +		if (addr < type->regions[mid].base)
>> +			right = mid;
>> +		else if (addr >= (type->regions[mid].base +
>> +				  type->regions[mid].size))
>> +			left = mid + 1;
>> +		else {
>> +			*last_idx = mid;
>> +			return pfn;
>> +		}
>> +	} while (left < right);
>> +
>> +	if (right == type->cnt)
>> +		return -1UL;
>> +
>> +	*last_idx = right;
>> +
>> +	return PHYS_PFN(type->regions[*last_idx].base);
>> +}
>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
> The same comment as Daniel, you are moving the function out of
> CONFIG_HAVE_MEMBLOCK_NODE_MAP.
>> +
>> static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>> 					phys_addr_t align, phys_addr_t start,
>> 					phys_addr_t end, int nid, ulong flags)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2a967f7..0bb0274 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> 	unsigned long end_pfn = start_pfn + size;
>> 	pg_data_t *pgdat = NODE_DATA(nid);
>> 	unsigned long pfn;
>> +	int idx = -1;
>> 	unsigned long nr_initialised = 0;
>> 	struct page *page;
>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>> 			 * on our next iteration of the loop.
>> 			 */
>> -			pfn = memblock_next_valid_pfn(pfn) - 1;
>> +			pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
>> #endif
>> 			continue;
>> 		}
>> -- 
>> 2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-29  8:06     ` Jia He
@ 2018-03-30  1:43       ` Wei Yang
  2018-03-30  2:12         ` Jia He
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2018-03-30  1:43 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov, Jia He

On Thu, Mar 29, 2018 at 04:06:38PM +0800, Jia He wrote:
>
>
>On 3/28/2018 5:26 PM, Wei Yang Wrote:
>> On Sun, Mar 25, 2018 at 08:02:16PM -0700, Jia He wrote:
>> > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > where possible") optimized the loop in memmap_init_zone(). But there is
>> > still some room for improvement. E.g. if pfn and pfn+1 are in the same
>> > memblock region, we can simply pfn++ instead of doing the binary search
>> > in memblock_next_valid_pfn. This patch only works when
>> > CONFIG_HAVE_ARCH_PFN_VALID is enable.
>> > 
>> > Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>> > ---
>> > include/linux/memblock.h |  2 +-
>> > mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
>> > mm/page_alloc.c          |  3 +-
>> > 3 files changed, 47 insertions(+), 31 deletions(-)
>> > 
>> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> > index efbbe4b..a8fb2ab 100644
>> > --- a/include/linux/memblock.h
>> > +++ b/include/linux/memblock.h
>> > @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>> > #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>> > 
>> > #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> > -unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> > +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
>> > #endif
>> > 
>> > /**
>> > diff --git a/mm/memblock.c b/mm/memblock.c
>> > index bea5a9c..06c1a08 100644
>> > --- a/mm/memblock.c
>> > +++ b/mm/memblock.c
>> > @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>> > 		*out_nid = r->nid;
>> > }
>> > 
>> > -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> > -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> > -{
>> > -	struct memblock_type *type = &memblock.memory;
>> > -	unsigned int right = type->cnt;
>> > -	unsigned int mid, left = 0;
>> > -	phys_addr_t addr = PFN_PHYS(++pfn);
>> > -
>> > -	do {
>> > -		mid = (right + left) / 2;
>> > -
>> > -		if (addr < type->regions[mid].base)
>> > -			right = mid;
>> > -		else if (addr >= (type->regions[mid].base +
>> > -				  type->regions[mid].size))
>> > -			left = mid + 1;
>> > -		else {
>> > -			/* addr is within the region, so pfn is valid */
>> > -			return pfn;
>> > -		}
>> > -	} while (left < right);
>> > -
>> > -	if (right == type->cnt)
>> > -		return -1UL;
>> > -	else
>> > -		return PHYS_PFN(type->regions[right].base);
>> > -}
>> > -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> > -
>> > /**
>> >   * memblock_set_node - set node ID on memblock regions
>> >   * @base: base of area to set node ID for
>> > @@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>> > }
>> > #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>> > 
>> > +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> > +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
>> > +							int *last_idx)
>> > +{
>> > +	struct memblock_type *type = &memblock.memory;
>> > +	unsigned int right = type->cnt;
>> > +	unsigned int mid, left = 0;
>> > +	unsigned long start_pfn, end_pfn;
>> > +	phys_addr_t addr = PFN_PHYS(++pfn);
>> > +
>> > +	/* fast path, return pfh+1 if next pfn is in the same region */
>>                               ^^^  pfn
>Thanks
>> > +	if (*last_idx != -1) {
>> > +		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
>> To me, it should be PFN_UP().
>hmm.., seems all the base of memory region is pfn aligned (0x10000 aligned).
>So
>
>PFN_UP is the same as PFN_DOWN here?
>I got this logic from memblock_search_pfn_nid()

Ok, I guess here hide some buggy code.

When you look at __next_mem_pfn_range(), it uses PFN_UP() for base. The reason
is try to clip some un-page-aligned memory. While PFN_DOWN() will introduce
some unavailable memory to system.

Even mostly those address are page-aligned, we need to be careful for this.

Let me drop a patch to fix the original one.

>
>Cheers,
>Jia
>
>> 
>> > +		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
>> > +				type->regions[*last_idx].size);
>> > +
>> > +		if (pfn < end_pfn && pfn > start_pfn)
>> Could be (pfn < end_pfn && pfn >= start_pfn)?
>> 
>> pfn == start_pfn is also a valid address.
>No, pfn=pfn+1 at the beginning, so pfn != start_pfn

This is a little bit tricky.

There is no requirement to pass a valid pfn to memblock_next_valid_pfn(). So
suppose we have memory layout like this:

    [0x100, 0x1ff]
    [0x300, 0x3ff]

And I call memblock_next_valid_pfn(0x2ff, 1), would this fits the fast path
logic?

Well, since memblock_next_valid_pfn() only used memmap_init_zone(), the
situation as I mentioned seems will not happen.

Even though, I suggest to chagne this, otherwise your logic in slow path and
fast path differs. In the case above, your slow path returns 0x300 at last.

>> 
>> > +			return pfn;
>> > +	}
>> > +
>> > +	/* slow path, do the binary searching */
>> > +	do {
>> > +		mid = (right + left) / 2;
>> > +
>> > +		if (addr < type->regions[mid].base)
>> > +			right = mid;
>> > +		else if (addr >= (type->regions[mid].base +
>> > +				  type->regions[mid].size))
>> > +			left = mid + 1;
>> > +		else {
>> > +			*last_idx = mid;
>> > +			return pfn;
>> > +		}
>> > +	} while (left < right);
>> > +
>> > +	if (right == type->cnt)
>> > +		return -1UL;
>> > +
>> > +	*last_idx = right;
>> > +
>> > +	return PHYS_PFN(type->regions[*last_idx].base);
>> > +}
>> > +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> The same comment as Daniel, you are moving the function out of
>> CONFIG_HAVE_MEMBLOCK_NODE_MAP.
>> > +
>> > static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>> > 					phys_addr_t align, phys_addr_t start,
>> > 					phys_addr_t end, int nid, ulong flags)
>> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> > index 2a967f7..0bb0274 100644
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> > 	unsigned long end_pfn = start_pfn + size;
>> > 	pg_data_t *pgdat = NODE_DATA(nid);
>> > 	unsigned long pfn;
>> > +	int idx = -1;
>> > 	unsigned long nr_initialised = 0;
>> > 	struct page *page;
>> > #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> > @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> > 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>> > 			 * on our next iteration of the loop.
>> > 			 */
>> > -			pfn = memblock_next_valid_pfn(pfn) - 1;
>> > +			pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
>> > #endif
>> > 			continue;
>> > 		}
>> > -- 
>> > 2.7.4

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  2018-03-30  1:43       ` Wei Yang
@ 2018-03-30  2:12         ` Jia He
  0 siblings, 0 replies; 29+ messages in thread
From: Jia He @ 2018-03-30  2:12 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He



On 3/30/2018 9:43 AM, Wei Yang Wrote:
> On Thu, Mar 29, 2018 at 04:06:38PM +0800, Jia He wrote:
>>
>> On 3/28/2018 5:26 PM, Wei Yang Wrote:
>>> On Sun, Mar 25, 2018 at 08:02:16PM -0700, Jia He wrote:
>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>> where possible") optimized the loop in memmap_init_zone(). But there is
>>>> still some room for improvement. E.g. if pfn and pfn+1 are in the same
>>>> memblock region, we can simply pfn++ instead of doing the binary search
>>>> in memblock_next_valid_pfn. This patch only works when
>>>> CONFIG_HAVE_ARCH_PFN_VALID is enable.
>>>>
>>>> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
>>>> ---
>>>> include/linux/memblock.h |  2 +-
>>>> mm/memblock.c            | 73 +++++++++++++++++++++++++++++-------------------
>>>> mm/page_alloc.c          |  3 +-
>>>> 3 files changed, 47 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>>> index efbbe4b..a8fb2ab 100644
>>>> --- a/include/linux/memblock.h
>>>> +++ b/include/linux/memblock.h
>>>> @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>>>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>
>>>> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>>> -unsigned long memblock_next_valid_pfn(unsigned long pfn);
>>>> +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
>>>> #endif
>>>>
>>>> /**
>>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>>> index bea5a9c..06c1a08 100644
>>>> --- a/mm/memblock.c
>>>> +++ b/mm/memblock.c
>>>> @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>>>> 		*out_nid = r->nid;
>>>> }
>>>>
>>>> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>>> -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>>>> -{
>>>> -	struct memblock_type *type = &memblock.memory;
>>>> -	unsigned int right = type->cnt;
>>>> -	unsigned int mid, left = 0;
>>>> -	phys_addr_t addr = PFN_PHYS(++pfn);
>>>> -
>>>> -	do {
>>>> -		mid = (right + left) / 2;
>>>> -
>>>> -		if (addr < type->regions[mid].base)
>>>> -			right = mid;
>>>> -		else if (addr >= (type->regions[mid].base +
>>>> -				  type->regions[mid].size))
>>>> -			left = mid + 1;
>>>> -		else {
>>>> -			/* addr is within the region, so pfn is valid */
>>>> -			return pfn;
>>>> -		}
>>>> -	} while (left < right);
>>>> -
>>>> -	if (right == type->cnt)
>>>> -		return -1UL;
>>>> -	else
>>>> -		return PHYS_PFN(type->regions[right].base);
>>>> -}
>>>> -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>>>> -
>>>> /**
>>>>    * memblock_set_node - set node ID on memblock regions
>>>>    * @base: base of area to set node ID for
>>>> @@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>>>> }
>>>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>
>>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
>>>> +							int *last_idx)
>>>> +{
>>>> +	struct memblock_type *type = &memblock.memory;
>>>> +	unsigned int right = type->cnt;
>>>> +	unsigned int mid, left = 0;
>>>> +	unsigned long start_pfn, end_pfn;
>>>> +	phys_addr_t addr = PFN_PHYS(++pfn);
>>>> +
>>>> +	/* fast path, return pfh+1 if next pfn is in the same region */
>>>                                ^^^  pfn
>> Thanks
>>>> +	if (*last_idx != -1) {
>>>> +		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
>>> To me, it should be PFN_UP().
>> hmm.., seems all the base of memory region is pfn aligned (0x10000 aligned).
>> So
>>
>> PFN_UP is the same as PFN_DOWN here?
>> I got this logic from memblock_search_pfn_nid()
> Ok, I guess here hide some buggy code.
>
> When you look at __next_mem_pfn_range(), it uses PFN_UP() for base. The reason
> is try to clip some un-page-aligned memory. While PFN_DOWN() will introduce
> some unavailable memory to system.
>
> Even mostly those address are page-aligned, we need to be careful for this.
>
> Let me drop a patch to fix the original one.
Ok, please cc me, I will change the related code when your patch is 
accepted. ;-)
>> Cheers,
>> Jia
>>
>>>> +		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
>>>> +				type->regions[*last_idx].size);
>>>> +
>>>> +		if (pfn < end_pfn && pfn > start_pfn)
>>> Could be (pfn < end_pfn && pfn >= start_pfn)?
>>>
>>> pfn == start_pfn is also a valid address.
>> No, pfn=pfn+1 at the beginning, so pfn != start_pfn
> This is a little bit tricky.
>
> There is no requirement to pass a valid pfn to memblock_next_valid_pfn(). So
> suppose we have memory layout like this:
>
>      [0x100, 0x1ff]
>      [0x300, 0x3ff]
>
> And I call memblock_next_valid_pfn(0x2ff, 1), would this fits the fast path
> logic?
>
> Well, since memblock_next_valid_pfn() only used memmap_init_zone(), the
> situation as I mentioned seems will not happen.
>
> Even though, I suggest to chagne this, otherwise your logic in slow path and
> fast path differs. In the case above, your slow path returns 0x300 at last.
ok. looks like it does no harm, even I thought the code will guarantee 
it to skip from 0x1ff
to 0x300 directly.
I will change it after fuctional test.

--
Cheers,
Jia

>>>> +			return pfn;
>>>> +	}
>>>> +
>>>> +	/* slow path, do the binary searching */
>>>> +	do {
>>>> +		mid = (right + left) / 2;
>>>> +
>>>> +		if (addr < type->regions[mid].base)
>>>> +			right = mid;
>>>> +		else if (addr >= (type->regions[mid].base +
>>>> +				  type->regions[mid].size))
>>>> +			left = mid + 1;
>>>> +		else {
>>>> +			*last_idx = mid;
>>>> +			return pfn;
>>>> +		}
>>>> +	} while (left < right);
>>>> +
>>>> +	if (right == type->cnt)
>>>> +		return -1UL;
>>>> +
>>>> +	*last_idx = right;
>>>> +
>>>> +	return PHYS_PFN(type->regions[*last_idx].base);
>>>> +}
>>>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>>> The same comment as Daniel, you are moving the function out of
>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP.
>>>> +
>>>> static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>>>> 					phys_addr_t align, phys_addr_t start,
>>>> 					phys_addr_t end, int nid, ulong flags)
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 2a967f7..0bb0274 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>> 	unsigned long end_pfn = start_pfn + size;
>>>> 	pg_data_t *pgdat = NODE_DATA(nid);
>>>> 	unsigned long pfn;
>>>> +	int idx = -1;
>>>> 	unsigned long nr_initialised = 0;
>>>> 	struct page *page;
>>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>> @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>> 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>>>> 			 * on our next iteration of the loop.
>>>> 			 */
>>>> -			pfn = memblock_next_valid_pfn(pfn) - 1;
>>>> +			pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
>>>> #endif
>>>> 			continue;
>>>> 		}
>>>> -- 
>>>> 2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-03-28  9:49     ` Jia He
@ 2018-04-02  8:12       ` Wei Yang
  2018-04-02  9:17         ` Jia He
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Yang @ 2018-04-02  8:12 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov, Jia He

On Wed, Mar 28, 2018 at 05:49:23PM +0800, Jia He wrote:
>
>
>On 3/28/2018 5:18 PM, Wei Yang Wrote:
>> Oops, I should reply this thread. Forget about the reply on another thread.
>> 
>> On Sun, Mar 25, 2018 at 08:02:15PM -0700, Jia He wrote:
>> > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > where possible") optimized the loop in memmap_init_zone(). But it causes
>> > possible panic bug. So Daniel Vacek reverted it later.
>> > 
>> Why this has a bug? Do you have some link about it?
>> 
>> If the audience could know the potential risk, it would be helpful to review
>> the code and decide whether to take it back.
>Hi Wei
>Paul firstly submit a commit b92df1de5 to improve the loop in
>memmap_init_zone.
>And Daniel tried to fix a bug_on panic issue on X86 in commit 864b75f9d6b
>because
>there is evidence that this bug_on was caused by b92df1de5 [1].
>
>But things didn't get better, 864b75f9d6b caused booting hang issue on
>arm{64} [2]
>So maintainer decided to reverted both b92df1de5 and 864b75f9d6b
>
>[1] https://patchwork.kernel.org/patch/10251145/
>[2] https://lkml.org/lkml/2018/3/14/469

I took some time to look into the discussion, while the root cause seems not
clear now?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-04-02  8:12       ` Wei Yang
@ 2018-04-02  9:17         ` Jia He
  2018-04-03  0:14           ` Wei Yang
  0 siblings, 1 reply; 29+ messages in thread
From: Jia He @ 2018-04-02  9:17 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman,
	Will Deacon, Mark Rutland, Ard Biesheuvel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Pavel Tatashin, Daniel Jordan,
	AKASHI Takahiro, Gioh Kim, Steven Sistare, Daniel Vacek,
	Eugeniu Rosca, Vlastimil Babka, linux-kernel, linux-mm,
	James Morse, Steve Capper, x86, Greg Kroah-Hartman, Kate Stewart,
	Philippe Ombredanne, Johannes Weiner, Kemi Wang, Petr Tesarik,
	YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Jia He



On 4/2/2018 4:12 PM, Wei Yang Wrote:
> On Wed, Mar 28, 2018 at 05:49:23PM +0800, Jia He wrote:
>>
>> On 3/28/2018 5:18 PM, Wei Yang Wrote:
>>> Oops, I should reply this thread. Forget about the reply on another thread.
>>>
>>> On Sun, Mar 25, 2018 at 08:02:15PM -0700, Jia He wrote:
>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>>> possible panic bug. So Daniel Vacek reverted it later.
>>>>
>>> Why this has a bug? Do you have some link about it?
>>>
>>> If the audience could know the potential risk, it would be helpful to review
>>> the code and decide whether to take it back.
>> Hi Wei
>> Paul firstly submit a commit b92df1de5 to improve the loop in
>> memmap_init_zone.
>> And Daniel tried to fix a bug_on panic issue on X86 in commit 864b75f9d6b
>> because
>> there is evidence that this bug_on was caused by b92df1de5 [1].
>>
>> But things didn't get better, 864b75f9d6b caused booting hang issue on
>> arm{64} [2]
>> So maintainer decided to reverted both b92df1de5 and 864b75f9d6b
>>
>> [1] https://patchwork.kernel.org/patch/10251145/
>> [2] https://lkml.org/lkml/2018/3/14/469
> I took some time to look into the discussion, while the root cause seems not
> clear now?
>
Frankly speaking, to me the root cause of that bug_on is not completedly
clear :-) Daniel ever gave me some hints as followed, but currently I have
no x86 platform to understand the details.

"On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn()
does not always return the next valid one but skips more resulting in
some valid frames to be skipped (as if they were invalid). And that's
why kernel was eventually crashing on some !arm machines."

-- 
Cheers,
Jia

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
  2018-04-02  9:17         ` Jia He
@ 2018-04-03  0:14           ` Wei Yang
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2018-04-03  0:14 UTC (permalink / raw)
  To: Jia He
  Cc: Wei Yang, Andrew Morton, Michal Hocko, Catalin Marinas,
	Mel Gorman, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Tatashin,
	Daniel Jordan, AKASHI Takahiro, Gioh Kim, Steven Sistare,
	Daniel Vacek, Eugeniu Rosca, Vlastimil Babka, linux-kernel,
	linux-mm, James Morse, Steve Capper, x86, Greg Kroah-Hartman,
	Kate Stewart, Philippe Ombredanne, Johannes Weiner, Kemi Wang,
	Petr Tesarik, YASUAKI ISHIMATSU, Andrey Ryabinin,
	Nikolay Borisov, Jia He

On Mon, Apr 02, 2018 at 05:17:35PM +0800, Jia He wrote:
>
>
>On 4/2/2018 4:12 PM, Wei Yang Wrote:
>> On Wed, Mar 28, 2018 at 05:49:23PM +0800, Jia He wrote:
>> > 
>> > On 3/28/2018 5:18 PM, Wei Yang Wrote:
>> > > Oops, I should reply this thread. Forget about the reply on another thread.
>> > > 
>> > > On Sun, Mar 25, 2018 at 08:02:15PM -0700, Jia He wrote:
>> > > > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> > > > where possible") optimized the loop in memmap_init_zone(). But it causes
>> > > > possible panic bug. So Daniel Vacek reverted it later.
>> > > > 
>> > > Why this has a bug? Do you have some link about it?
>> > > 
>> > > If the audience could know the potential risk, it would be helpful to review
>> > > the code and decide whether to take it back.
>> > Hi Wei
>> > Paul firstly submit a commit b92df1de5 to improve the loop in
>> > memmap_init_zone.
>> > And Daniel tried to fix a bug_on panic issue on X86 in commit 864b75f9d6b
>> > because
>> > there is evidence that this bug_on was caused by b92df1de5 [1].
>> > 
>> > But things didn't get better, 864b75f9d6b caused booting hang issue on
>> > arm{64} [2]
>> > So maintainer decided to reverted both b92df1de5 and 864b75f9d6b
>> > 
>> > [1] https://patchwork.kernel.org/patch/10251145/
>> > [2] https://lkml.org/lkml/2018/3/14/469
>> I took some time to look into the discussion, while the root cause seems not
>> clear now?
>> 
>Frankly speaking, to me the root cause of that bug_on is not completedly
>clear :-) Daniel ever gave me some hints as followed, but currently I have
>no x86 platform to understand the details.
>
>"On arm and arm64, memblock is used by default. But generic version of
>pfn_valid() is based on mem sections and memblock_next_valid_pfn()
>does not always return the next valid one but skips more resulting in
>some valid frames to be skipped (as if they were invalid). And that's
>why kernel was eventually crashing on some !arm machines."
>

This means a system with memblock is safe to use this function?

As I know, mem_section is based on memblock, so in which case
memblock_next_valid_pfn() skips a valid pfn? A little confused.

>-- 
>Cheers,
>Jia

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-04-03  0:14 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-26  3:02 [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Jia He
2018-03-26  3:02 ` [PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable Jia He
2018-03-28  9:18   ` Wei Yang
2018-03-28  9:49     ` Jia He
2018-04-02  8:12       ` Wei Yang
2018-04-02  9:17         ` Jia He
2018-04-03  0:14           ` Wei Yang
2018-03-26  3:02 ` [PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
2018-03-27 17:17   ` Daniel Vacek
2018-03-28  2:09     ` Jia He
2018-03-28  9:26   ` Wei Yang
2018-03-29  8:06     ` Jia He
2018-03-30  1:43       ` Wei Yang
2018-03-30  2:12         ` Jia He
2018-03-26  3:02 ` [PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
2018-03-26  3:02 ` [PATCH v3 4/5] arm64: introduce pfn_valid_region() Jia He
2018-03-28  9:38   ` Wei Yang
2018-03-26  3:02 ` [PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
2018-03-27 17:51   ` Daniel Vacek
2018-03-28  2:10     ` Jia He
2018-03-27  1:02 ` [PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid Wei Yang
2018-03-27  7:15   ` Jia He
2018-03-27  7:15     ` Jia He
2018-03-28  0:30     ` Wei Yang
2018-03-28  0:30       ` Wei Yang
2018-03-28  1:45       ` Jia He
2018-03-28  1:45         ` Jia He
2018-03-28  2:36         ` Wei Yang
2018-03-28  2:36           ` Wei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.