linux-mm.kvack.org archive mirror
* [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup
@ 2020-12-20  8:27 Baoquan He
  2020-12-20  8:27 ` [PATCH v2 1/5] mm: memmap defer init doesn't work as expected Baoquan He
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

VMware reported a performance regression during memmap_init() invocation,
and bisected it to commit 73a6e474cb376 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN").

https://lore.kernel.org/linux-mm/DM6PR05MB52921FF90FA01CC337DD23A1A4080@DM6PR05MB5292.namprd05.prod.outlook.com/

After investigation, it's caused by incorrect memmap init defer handling
in memmap_init_zone() after commit 73a6e474cb376. The current
memmap_init_zone() only handles one memory region of one zone, while
memmap_init() iterates over all the zone's memory regions and passes them
one by one into memmap_init_zone() to handle.
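
To illustrate, here is a simplified sketch of the call flow after that
commit (condensed from the memmap_init() hunk touched by patch 1/5; hole
handling and other details are omitted, so this is not the complete
function):

  void __init memmap_init(unsigned long size, int nid, unsigned long zone,
                          unsigned long range_start_pfn)
  {
          unsigned long start_pfn, end_pfn;
          unsigned long range_end_pfn = range_start_pfn + size;
          int i;

          for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
                  start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
                  end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
                  if (end_pfn > start_pfn)
                          /* one call per memblock region, not per zone */
                          memmap_init_zone(end_pfn - start_pfn, nid, zone,
                                           start_pfn, MEMINIT_EARLY, NULL,
                                           MIGRATE_MOVABLE);
          }
  }

So any defer decision made inside memmap_init_zone() only sees one region's
boundaries rather than the whole zone's.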

So in this patchset, patch 1/5 fixes the bug observed by VMware. Patches
2~5/5 clean up the code accordingly.

VMware helped test patch 1 of v1, which was based on the master branch of
Linus's tree, on their VMware ESXi platform; patch 1 is functionally
unchanged in v2. I don't have an ia64 machine to compile or test on, so I
would really appreciate it if anyone could help compile this patchset on
one. This patchset is based on the latest next/master and has only had
basic testing.

Baoquan He (5):
  mm: memmap defer init doesn't work as expected
  mm: rename memmap_init() and memmap_init_zone()
  mm: simplify parameter of function memmap_init_zone()
  mm: simplify parameter of setup_usemap()
  mm: remove unneeded local variable in free_area_init_core

 arch/ia64/include/asm/pgtable.h |  3 +-
 arch/ia64/mm/init.c             | 16 +++++----
 include/linux/mm.h              |  5 +--
 mm/memory_hotplug.c             |  2 +-
 mm/page_alloc.c                 | 60 ++++++++++++++++-----------------
 5 files changed, 43 insertions(+), 43 deletions(-)

-- 
2.17.2




* [PATCH v2 1/5] mm: memmap defer init doesn't work as expected
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
@ 2020-12-20  8:27 ` Baoquan He
  2020-12-21  6:32   ` Mike Rapoport
  2020-12-20  8:27 ` [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone() Baoquan He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

VMware observed a performance regression during memmap init on their platform,
and bisected it to commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock
regions rather that check each PFN").

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With the commit:

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448
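
That is, the gap between the last "Normal zone" line and the ACPI PM-Timer
line grows from roughly 3 ms to roughly 2 s, i.e. about two extra seconds
are spent on memmap init at boot.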

The root cause is that the current memmap defer init doesn't work as
expected. Before, memmap_init_zone() was used to do the memmap init of one
whole zone: it fully initializes all low zones of a NUMA node, but defers
the memmap init of the last zone in that node. However, since commit
73a6e474cb376, memmap_init() iterates over the memblock regions inside one
zone and calls memmap_init_zone() to do the memmap init of each region.

E.g., on VMware's system, the memory layout is as below; there are two memory
regions in node 2. The current code mistakenly initializes the whole 1st
region [mem 0xab00000000-0xfcffffffff], and only then applies memmap defer,
initializing just one memory section of the 2nd region
[mem 0x10000000000-0x1033fffffff]. In fact, we expect only one memory
section's memmap to be initialized eagerly for that node. That's why so much
more time is spent there.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
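
As a rough cross-check of the numbers above: the 1st region spans
0xfd00000 - 0xab00000 = 85,983,232 pfns and the 2nd region spans
0x10340000 - 0x10000000 = 3,407,872 pfns, which together give the
89,391,104 pages reported for the Normal zone in the log. So roughly 86
million struct pages are initialized eagerly instead of being deferred,
consistent with the extra ~2 seconds seen above.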

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge whether
deferred init should be applied zone wide.
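
For illustration, a simplified sketch of the deferral check in defer_init()
(trimmed from the hunks below; the real function also counts how many pages
have been initialized so far):

  static bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
  {
          /* a range ending below the node end is never deferred */
          if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
                  return false;

          /* added by this patch: once deferral has started, keep deferring */
          if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
                  return true;

          /* otherwise init one section of pages, then start deferring */
          ...
  }

With the per-region end_pfn, node 2's 1st region ends below pgdat_end_pfn()
and is therefore initialized in full; passing the real zone end down as
'zone_end_pfn' lets the check work zone wide again.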

Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: stable@vger.kernel.org
---
 arch/ia64/mm/init.c | 4 ++--
 include/linux/mm.h  | 5 +++--
 mm/memory_hotplug.c | 2 +-
 mm/page_alloc.c     | 8 +++++---
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 9b5acf8fb092..e76386a3479e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
 
 	if (map_start < map_end)
 		memmap_init_zone((unsigned long)(map_end - map_start),
-				 args->nid, args->zone, page_to_pfn(map_start),
+				 args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
 				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 	return 0;
 }
@@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
 	     unsigned long start_pfn)
 {
 	if (!vmem_map) {
-		memmap_init_zone(size, nid, zone, start_pfn,
+		memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
 				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 	} else {
 		struct page *start;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e4e5be20b0c2..92e06ea053f4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2432,8 +2432,9 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
-		enum meminit_context, struct vmem_altmap *, int migratetype);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+		unsigned long, unsigned long, enum meminit_context,
+		struct vmem_altmap *, int migratetype);
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index af41fb990820..f9d57b9be8c7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	 * expects the zone spans the pfn range. All the pages in the range
 	 * are reserved so nobody should be touching them so we should be safe
 	 */
-	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
 			 MEMINIT_HOTPLUG, altmap, migratetype);
 
 	set_zone_contiguous(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8cea0823b70e..32645f2e7b96 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -423,6 +423,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
 		return false;
 
+	if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+		return true;
 	/*
 	 * We start only with one section of pages, more pages are added as
 	 * needed until the rest of deferred pages are initialized.
@@ -6116,7 +6118,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
  * zone stats (e.g., nr_isolate_pageblock) are touched.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-		unsigned long start_pfn,
+		unsigned long start_pfn, unsigned long zone_end_pfn,
 		enum meminit_context context,
 		struct vmem_altmap *altmap, int migratetype)
 {
@@ -6152,7 +6154,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context == MEMINIT_EARLY) {
 			if (overlap_memmap_init(zone, &pfn))
 				continue;
-			if (defer_init(nid, pfn, end_pfn))
+			if (defer_init(nid, pfn, zone_end_pfn))
 				break;
 		}
 
@@ -6307,7 +6309,7 @@ void __init __weak memmap_init(unsigned long size, int nid,
 
 		if (end_pfn > start_pfn) {
 			size = end_pfn - start_pfn;
-			memmap_init_zone(size, nid, zone, start_pfn,
+			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
 					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 		}
 
-- 
2.17.2




* [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone()
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
  2020-12-20  8:27 ` [PATCH v2 1/5] mm: memmap defer init doesn't work as expected Baoquan He
@ 2020-12-20  8:27 ` Baoquan He
  2020-12-21  6:33   ` Mike Rapoport
  2020-12-20  8:27 ` [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone() Baoquan He
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

The current memmap_init_zone() only handles a memory region inside one zone,
while it is actually memmap_init() that does the memmap init of one whole
zone. So rename both of them accordingly.

Also rename the function parameter 'range_start_pfn' and local variable
'range_end_pfn' of memmap_init() to zone_start_pfn/zone_end_pfn.
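
In short, after this patch the naming maps as follows:

  memmap_init_zone()  ->  memmap_init_range()   /* init one pfn range inside a zone */
  memmap_init()       ->  memmap_init_zone()    /* init the memmap of one whole zone */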

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/ia64/include/asm/pgtable.h |  2 +-
 arch/ia64/mm/init.c             |  6 +++---
 include/linux/mm.h              |  2 +-
 mm/memory_hotplug.c             |  2 +-
 mm/page_alloc.c                 | 24 ++++++++++++------------
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 779b6972aa84..dce2ff37df65 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -520,7 +520,7 @@ extern struct page *zero_page_memmap_ptr;
 
 #  ifdef CONFIG_VIRTUAL_MEM_MAP
   /* arch mem_map init routine is needed due to holes in a virtual mem_map */
-    extern void memmap_init (unsigned long size, int nid, unsigned long zone,
+    extern void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			     unsigned long start_pfn);
 #  endif /* CONFIG_VIRTUAL_MEM_MAP */
 # endif /* !__ASSEMBLY__ */
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index e76386a3479e..c8e68e92beb3 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -535,18 +535,18 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
 		    / sizeof(struct page));
 
 	if (map_start < map_end)
-		memmap_init_zone((unsigned long)(map_end - map_start),
+		memmap_init_range((unsigned long)(map_end - map_start),
 				 args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
 				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 	return 0;
 }
 
 void __meminit
-memmap_init (unsigned long size, int nid, unsigned long zone,
+memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	     unsigned long start_pfn)
 {
 	if (!vmem_map) {
-		memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
+		memmap_init_range(size, nid, zone, start_pfn, start_pfn + size,
 				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 	} else {
 		struct page *start;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 92e06ea053f4..f72c138c2272 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2432,7 +2432,7 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long,
+extern void memmap_init_range(unsigned long, int, unsigned long,
 		unsigned long, unsigned long, enum meminit_context,
 		struct vmem_altmap *, int migratetype);
 extern void setup_per_zone_wmarks(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f9d57b9be8c7..ddcb1cd24c60 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	 * expects the zone spans the pfn range. All the pages in the range
 	 * are reserved so nobody should be touching them so we should be safe
 	 */
-	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
+	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
 			 MEMINIT_HOTPLUG, altmap, migratetype);
 
 	set_zone_contiguous(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 32645f2e7b96..4b46326099d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6117,7 +6117,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
  * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
  * zone stats (e.g., nr_isolate_pageblock) are touched.
  */
-void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
+void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, unsigned long zone_end_pfn,
 		enum meminit_context context,
 		struct vmem_altmap *altmap, int migratetype)
@@ -6292,24 +6292,24 @@ static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
 }
 #endif
 
-void __init __weak memmap_init(unsigned long size, int nid,
+void __init __weak memmap_init_zone(unsigned long size, int nid,
 			       unsigned long zone,
-			       unsigned long range_start_pfn)
+			       unsigned long zone_start_pfn)
 {
 	unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
-	unsigned long range_end_pfn = range_start_pfn + size;
+	unsigned long zone_end_pfn = zone_start_pfn + size;
 	u64 pgcnt = 0;
 	int i;
 
 	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
-		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
-		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
-		hole_start_pfn = clamp(hole_start_pfn, range_start_pfn,
-				       range_end_pfn);
+		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
+		end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
+		hole_start_pfn = clamp(hole_start_pfn, zone_start_pfn,
+				       zone_end_pfn);
 
 		if (end_pfn > start_pfn) {
 			size = end_pfn - start_pfn;
-			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
+			memmap_init_range(size, nid, zone, start_pfn, zone_end_pfn,
 					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 		}
 
@@ -6326,8 +6326,8 @@ void __init __weak memmap_init(unsigned long size, int nid,
 	 * considered initialized. Make sure that memmap has a well defined
 	 * state.
 	 */
-	if (hole_start_pfn < range_end_pfn)
-		pgcnt += init_unavailable_range(hole_start_pfn, range_end_pfn,
+	if (hole_start_pfn < zone_end_pfn)
+		pgcnt += init_unavailable_range(hole_start_pfn, zone_end_pfn,
 						zone, nid);
 
 	if (pgcnt)
@@ -7039,7 +7039,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		set_pageblock_order();
 		setup_usemap(pgdat, zone, zone_start_pfn, size);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
-		memmap_init(size, nid, j, zone_start_pfn);
+		memmap_init_zone(size, nid, j, zone_start_pfn);
 	}
 }
 
-- 
2.17.2




* [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone()
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
  2020-12-20  8:27 ` [PATCH v2 1/5] mm: memmap defer init doesn't work as expected Baoquan He
  2020-12-20  8:27 ` [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone() Baoquan He
@ 2020-12-20  8:27 ` Baoquan He
  2020-12-21  6:34   ` Mike Rapoport
  2020-12-20  8:27 ` [PATCH v2 4/5] mm: simplify parameter of setup_usemap() Baoquan He
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

As David suggested, simply passing 'struct zone *zone' is enough. We can
get all needed information from 'struct zone*' easily.
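
For reference, a short sketch of what the function can now derive from the
zone itself (this is what the hunks below do):

  int nid = zone_to_nid(zone);
  int zone_id = zone_idx(zone);
  unsigned long start_pfn = zone->zone_start_pfn;
  unsigned long size = zone->spanned_pages;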

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/ia64/include/asm/pgtable.h |  3 +--
 arch/ia64/mm/init.c             | 12 +++++++-----
 mm/page_alloc.c                 | 20 ++++++++++----------
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index dce2ff37df65..2c81394a2430 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -520,8 +520,7 @@ extern struct page *zero_page_memmap_ptr;
 
 #  ifdef CONFIG_VIRTUAL_MEM_MAP
   /* arch mem_map init routine is needed due to holes in a virtual mem_map */
-    extern void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-			     unsigned long start_pfn);
+    extern void memmap_init_zone(struct zone *zone);
 #  endif /* CONFIG_VIRTUAL_MEM_MAP */
 # endif /* !__ASSEMBLY__ */
 
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index c8e68e92beb3..ccbda1a74c95 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -541,12 +541,14 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
 	return 0;
 }
 
-void __meminit
-memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-	     unsigned long start_pfn)
+void __meminit memmap_init_zone(struct zone *zone)
 {
+	unsigned long size = zone->spanned_pages;
+	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+	unsigned long start_pfn = zone->zone_start_pfn;
+
 	if (!vmem_map) {
-		memmap_init_range(size, nid, zone, start_pfn, start_pfn + size,
+		memmap_init_range(size, nid, zone_id, start_pfn, start_pfn + size,
 				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 	} else {
 		struct page *start;
@@ -556,7 +558,7 @@ memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		args.start = start;
 		args.end = start + size;
 		args.nid = nid;
-		args.zone = zone;
+		args.zone = zone_id;
 
 		efi_memmap_walk(virtual_memmap_init, &args);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b46326099d9..7a6626351ed7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6292,16 +6292,16 @@ static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
 }
 #endif
 
-void __init __weak memmap_init_zone(unsigned long size, int nid,
-			       unsigned long zone,
-			       unsigned long zone_start_pfn)
+void __init __weak memmap_init_zone(struct zone *zone)
 {
 	unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
-	unsigned long zone_end_pfn = zone_start_pfn + size;
+	int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 	u64 pgcnt = 0;
-	int i;
 
 	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
+		unsigned long size;
 		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
 		end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
 		hole_start_pfn = clamp(hole_start_pfn, zone_start_pfn,
@@ -6309,13 +6309,13 @@ void __init __weak memmap_init_zone(unsigned long size, int nid,
 
 		if (end_pfn > start_pfn) {
 			size = end_pfn - start_pfn;
-			memmap_init_range(size, nid, zone, start_pfn, zone_end_pfn,
+			memmap_init_range(size, nid, zone_id, start_pfn, zone_end_pfn,
 					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
 		}
 
 		if (hole_start_pfn < start_pfn)
 			pgcnt += init_unavailable_range(hole_start_pfn,
-							start_pfn, zone, nid);
+							start_pfn, zone_id, nid);
 		hole_start_pfn = end_pfn;
 	}
 
@@ -6328,11 +6328,11 @@ void __init __weak memmap_init_zone(unsigned long size, int nid,
 	 */
 	if (hole_start_pfn < zone_end_pfn)
 		pgcnt += init_unavailable_range(hole_start_pfn, zone_end_pfn,
-						zone, nid);
+						zone_id, nid);
 
 	if (pgcnt)
 		pr_info("%s: Zeroed struct page in unavailable ranges: %lld\n",
-			zone_names[zone], pgcnt);
+			zone_names[zone_id], pgcnt);
 }
 
 static int zone_batchsize(struct zone *zone)
@@ -7039,7 +7039,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		set_pageblock_order();
 		setup_usemap(pgdat, zone, zone_start_pfn, size);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
-		memmap_init_zone(size, nid, j, zone_start_pfn);
+		memmap_init_zone(zone);
 	}
 }
 
-- 
2.17.2




* [PATCH v2 4/5] mm: simplify parameter of setup_usemap()
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
                   ` (2 preceding siblings ...)
  2020-12-20  8:27 ` [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone() Baoquan He
@ 2020-12-20  8:27 ` Baoquan He
  2020-12-21  6:34   ` Mike Rapoport
  2020-12-20  8:27 ` [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core Baoquan He
  2020-12-23  1:46 ` [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Andrew Morton
  5 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

Parameter 'zone' already carries all the needed information, so let's remove
the other, unnecessary parameters.
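
In short, the call site and the information sources become (as in the hunks
below):

  setup_usemap(zone);   /* was: setup_usemap(pgdat, zone, zone_start_pfn, size) */

with usemap_size() taking zone->zone_start_pfn and zone->spanned_pages, and
the node id coming from zone_to_nid(zone) instead of pgdat->node_id.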

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a6626351ed7..7f0a917ab858 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6824,25 +6824,22 @@ static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned l
 	return usemapsize / 8;
 }
 
-static void __ref setup_usemap(struct pglist_data *pgdat,
-				struct zone *zone,
-				unsigned long zone_start_pfn,
-				unsigned long zonesize)
+static void __ref setup_usemap(struct zone *zone)
 {
-	unsigned long usemapsize = usemap_size(zone_start_pfn, zonesize);
+	unsigned long usemapsize = usemap_size(zone->zone_start_pfn,
+					       zone->spanned_pages);
 	zone->pageblock_flags = NULL;
 	if (usemapsize) {
 		zone->pageblock_flags =
 			memblock_alloc_node(usemapsize, SMP_CACHE_BYTES,
-					    pgdat->node_id);
+					    zone_to_nid(zone));
 		if (!zone->pageblock_flags)
 			panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n",
-			      usemapsize, zone->name, pgdat->node_id);
+			      usemapsize, zone->name, zone_to_nid(zone));
 	}
 }
 #else
-static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone,
-				unsigned long zone_start_pfn, unsigned long zonesize) {}
+static inline void setup_usemap(struct zone *zone) {}
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
@@ -7037,7 +7034,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 			continue;
 
 		set_pageblock_order();
-		setup_usemap(pgdat, zone, zone_start_pfn, size);
+		setup_usemap(zone);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
 		memmap_init_zone(zone);
 	}
-- 
2.17.2




* [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
                   ` (3 preceding siblings ...)
  2020-12-20  8:27 ` [PATCH v2 4/5] mm: simplify parameter of setup_usemap() Baoquan He
@ 2020-12-20  8:27 ` Baoquan He
  2020-12-21  6:35   ` Mike Rapoport
  2020-12-23  1:46 ` [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Andrew Morton
  5 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-20  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, gopakumarr, rppt, david, bhe

Local variable 'zone_start_pfn' is not needed since it is only used once in
free_area_init_core(). Let's remove it and pass zone->zone_start_pfn directly
to init_currently_empty_zone().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7f0a917ab858..189a86253c93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6986,7 +6986,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, freesize, memmap_pages;
-		unsigned long zone_start_pfn = zone->zone_start_pfn;
 
 		size = zone->spanned_pages;
 		freesize = zone->present_pages;
@@ -7035,7 +7034,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 
 		set_pageblock_order();
 		setup_usemap(zone);
-		init_currently_empty_zone(zone, zone_start_pfn, size);
+		init_currently_empty_zone(zone, zone->zone_start_pfn, size);
 		memmap_init_zone(zone);
 	}
 }
-- 
2.17.2




* Re: [PATCH v2 1/5] mm: memmap defer init doesn't work as expected
  2020-12-20  8:27 ` [PATCH v2 1/5] mm: memmap defer init doesn't work as expected Baoquan He
@ 2020-12-21  6:32   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-12-21  6:32 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, akpm, gopakumarr, david

On Sun, Dec 20, 2020 at 04:27:50PM +0800, Baoquan He wrote:
> VMware observed a performance regression during memmap init on their platform,
> and bisected to commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock
> regions rather that check each PFN") causing it.
> 
> Before the commit:
> 
>   [0.033176] Normal zone: 1445888 pages used for memmap
>   [0.033176] Normal zone: 89391104 pages, LIFO batch:63
>   [0.035851] ACPI: PM-Timer IO Port: 0x448
> 
> With commit
> 
>   [0.026874] Normal zone: 1445888 pages used for memmap
>   [0.026875] Normal zone: 89391104 pages, LIFO batch:63
>   [2.028450] ACPI: PM-Timer IO Port: 0x448
> 
> The root cause is the current memmap defer init doesn't work as expected.
> Before, memmap_init_zone() was used to do memmap init of one whole zone, to
> initialize all low zones of one numa node, but defer memmap init of the
> last zone in that numa node. However, since commit 73a6e474cb376, function
> memmap_init() is adapted to iterater over memblock regions inside one zone,
> then call memmap_init_zone() to do memmap init for each region.
> 
> E.g, on VMware's system, the memory layout is as below, there are two memory
> regions in node 2. The current code will mistakenly initialize the whole 1st
> region [mem 0xab00000000-0xfcffffffff], then do memmap defer to iniatialize
> only one memmory section on the 2nd region [mem 0x10000000000-0x1033fffffff].
> In fact, we only expect to see that there's only one memory section's memmap
> initialized. That's why more time is costed at the time.
> 
> [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> [    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
> [    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
> [    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
> [    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
> 
> Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
> down the real zone end pfn so that defer_init() can use it to judge whether
> defer need be taken in zone wide.
> 
> Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  arch/ia64/mm/init.c | 4 ++--
>  include/linux/mm.h  | 5 +++--
>  mm/memory_hotplug.c | 2 +-
>  mm/page_alloc.c     | 8 +++++---
>  4 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 9b5acf8fb092..e76386a3479e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
>  
>  	if (map_start < map_end)
>  		memmap_init_zone((unsigned long)(map_end - map_start),
> -				 args->nid, args->zone, page_to_pfn(map_start),
> +				 args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	return 0;
>  }
> @@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
>  	     unsigned long start_pfn)
>  {
>  	if (!vmem_map) {
> -		memmap_init_zone(size, nid, zone, start_pfn,
> +		memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	} else {
>  		struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e4e5be20b0c2..92e06ea053f4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2432,8 +2432,9 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
>  #endif
>  
>  extern void set_dma_reserve(unsigned long new_dma_reserve);
> -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> -		enum meminit_context, struct vmem_altmap *, int migratetype);
> +extern void memmap_init_zone(unsigned long, int, unsigned long,
> +		unsigned long, unsigned long, enum meminit_context,
> +		struct vmem_altmap *, int migratetype);
>  extern void setup_per_zone_wmarks(void);
>  extern int __meminit init_per_zone_wmark_min(void);
>  extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index af41fb990820..f9d57b9be8c7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	 * expects the zone spans the pfn range. All the pages in the range
>  	 * are reserved so nobody should be touching them so we should be safe
>  	 */
> -	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> +	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
>  			 MEMINIT_HOTPLUG, altmap, migratetype);
>  
>  	set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8cea0823b70e..32645f2e7b96 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -423,6 +423,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
>  		return false;
>  
> +	if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> +		return true;
>  	/*
>  	 * We start only with one section of pages, more pages are added as
>  	 * needed until the rest of deferred pages are initialized.
> @@ -6116,7 +6118,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>   * zone stats (e.g., nr_isolate_pageblock) are touched.
>   */
>  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -		unsigned long start_pfn,
> +		unsigned long start_pfn, unsigned long zone_end_pfn,
>  		enum meminit_context context,
>  		struct vmem_altmap *altmap, int migratetype)
>  {
> @@ -6152,7 +6154,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		if (context == MEMINIT_EARLY) {
>  			if (overlap_memmap_init(zone, &pfn))
>  				continue;
> -			if (defer_init(nid, pfn, end_pfn))
> +			if (defer_init(nid, pfn, zone_end_pfn))
>  				break;
>  		}
>  
> @@ -6307,7 +6309,7 @@ void __init __weak memmap_init(unsigned long size, int nid,
>  
>  		if (end_pfn > start_pfn) {
>  			size = end_pfn - start_pfn;
> -			memmap_init_zone(size, nid, zone, start_pfn,
> +			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
>  					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  		}
>  
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone()
  2020-12-20  8:27 ` [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone() Baoquan He
@ 2020-12-21  6:33   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-12-21  6:33 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, akpm, gopakumarr, david

On Sun, Dec 20, 2020 at 04:27:51PM +0800, Baoquan He wrote:
> The current memmap_init_zone() only handles memory region inside one zone,
> actually memmap_init() does the memmap init of one zone. So rename both of
> them accordingly.
> 
> And also rename the function parameter 'range_start_pfn' and local variable
> 'range_end_pfn' of memmap_init() to zone_start_pfn/zone_end_pfn.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  arch/ia64/include/asm/pgtable.h |  2 +-
>  arch/ia64/mm/init.c             |  6 +++---
>  include/linux/mm.h              |  2 +-
>  mm/memory_hotplug.c             |  2 +-
>  mm/page_alloc.c                 | 24 ++++++++++++------------
>  5 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
> index 779b6972aa84..dce2ff37df65 100644
> --- a/arch/ia64/include/asm/pgtable.h
> +++ b/arch/ia64/include/asm/pgtable.h
> @@ -520,7 +520,7 @@ extern struct page *zero_page_memmap_ptr;
>  
>  #  ifdef CONFIG_VIRTUAL_MEM_MAP
>    /* arch mem_map init routine is needed due to holes in a virtual mem_map */
> -    extern void memmap_init (unsigned long size, int nid, unsigned long zone,
> +    extern void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			     unsigned long start_pfn);
>  #  endif /* CONFIG_VIRTUAL_MEM_MAP */
>  # endif /* !__ASSEMBLY__ */
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index e76386a3479e..c8e68e92beb3 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -535,18 +535,18 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
>  		    / sizeof(struct page));
>  
>  	if (map_start < map_end)
> -		memmap_init_zone((unsigned long)(map_end - map_start),
> +		memmap_init_range((unsigned long)(map_end - map_start),
>  				 args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	return 0;
>  }
>  
>  void __meminit
> -memmap_init (unsigned long size, int nid, unsigned long zone,
> +memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  	     unsigned long start_pfn)
>  {
>  	if (!vmem_map) {
> -		memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
> +		memmap_init_range(size, nid, zone, start_pfn, start_pfn + size,
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	} else {
>  		struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 92e06ea053f4..f72c138c2272 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2432,7 +2432,7 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
>  #endif
>  
>  extern void set_dma_reserve(unsigned long new_dma_reserve);
> -extern void memmap_init_zone(unsigned long, int, unsigned long,
> +extern void memmap_init_range(unsigned long, int, unsigned long,
>  		unsigned long, unsigned long, enum meminit_context,
>  		struct vmem_altmap *, int migratetype);
>  extern void setup_per_zone_wmarks(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f9d57b9be8c7..ddcb1cd24c60 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	 * expects the zone spans the pfn range. All the pages in the range
>  	 * are reserved so nobody should be touching them so we should be safe
>  	 */
> -	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> +	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
>  			 MEMINIT_HOTPLUG, altmap, migratetype);
>  
>  	set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 32645f2e7b96..4b46326099d9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6117,7 +6117,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>   * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
>   * zone stats (e.g., nr_isolate_pageblock) are touched.
>   */
> -void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> +void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone,
>  		unsigned long start_pfn, unsigned long zone_end_pfn,
>  		enum meminit_context context,
>  		struct vmem_altmap *altmap, int migratetype)
> @@ -6292,24 +6292,24 @@ static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
>  }
>  #endif
>  
> -void __init __weak memmap_init(unsigned long size, int nid,
> +void __init __weak memmap_init_zone(unsigned long size, int nid,
>  			       unsigned long zone,
> -			       unsigned long range_start_pfn)
> +			       unsigned long zone_start_pfn)
>  {
>  	unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
> -	unsigned long range_end_pfn = range_start_pfn + size;
> +	unsigned long zone_end_pfn = zone_start_pfn + size;
>  	u64 pgcnt = 0;
>  	int i;
>  
>  	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> -		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> -		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> -		hole_start_pfn = clamp(hole_start_pfn, range_start_pfn,
> -				       range_end_pfn);
> +		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> +		end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> +		hole_start_pfn = clamp(hole_start_pfn, zone_start_pfn,
> +				       zone_end_pfn);
>  
>  		if (end_pfn > start_pfn) {
>  			size = end_pfn - start_pfn;
> -			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
> +			memmap_init_range(size, nid, zone, start_pfn, zone_end_pfn,
>  					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  		}
>  
> @@ -6326,8 +6326,8 @@ void __init __weak memmap_init(unsigned long size, int nid,
>  	 * considered initialized. Make sure that memmap has a well defined
>  	 * state.
>  	 */
> -	if (hole_start_pfn < range_end_pfn)
> -		pgcnt += init_unavailable_range(hole_start_pfn, range_end_pfn,
> +	if (hole_start_pfn < zone_end_pfn)
> +		pgcnt += init_unavailable_range(hole_start_pfn, zone_end_pfn,
>  						zone, nid);
>  
>  	if (pgcnt)
> @@ -7039,7 +7039,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		set_pageblock_order();
>  		setup_usemap(pgdat, zone, zone_start_pfn, size);
>  		init_currently_empty_zone(zone, zone_start_pfn, size);
> -		memmap_init(size, nid, j, zone_start_pfn);
> +		memmap_init_zone(size, nid, j, zone_start_pfn);
>  	}
>  }
>  
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone()
  2020-12-20  8:27 ` [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone() Baoquan He
@ 2020-12-21  6:34   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-12-21  6:34 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, akpm, gopakumarr, david

On Sun, Dec 20, 2020 at 04:27:52PM +0800, Baoquan He wrote:
> As David suggested, simply passing 'struct zone *zone' is enough. We can
> get all needed information from 'struct zone*' easily.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  arch/ia64/include/asm/pgtable.h |  3 +--
>  arch/ia64/mm/init.c             | 12 +++++++-----
>  mm/page_alloc.c                 | 20 ++++++++++----------
>  3 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
> index dce2ff37df65..2c81394a2430 100644
> --- a/arch/ia64/include/asm/pgtable.h
> +++ b/arch/ia64/include/asm/pgtable.h
> @@ -520,8 +520,7 @@ extern struct page *zero_page_memmap_ptr;
>  
>  #  ifdef CONFIG_VIRTUAL_MEM_MAP
>    /* arch mem_map init routine is needed due to holes in a virtual mem_map */
> -    extern void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -			     unsigned long start_pfn);
> +    extern void memmap_init_zone(struct zone *zone);
>  #  endif /* CONFIG_VIRTUAL_MEM_MAP */
>  # endif /* !__ASSEMBLY__ */
>  
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index c8e68e92beb3..ccbda1a74c95 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -541,12 +541,14 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
>  	return 0;
>  }
>  
> -void __meminit
> -memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -	     unsigned long start_pfn)
> +void __meminit memmap_init_zone(struct zone *zone)
>  {
> +	unsigned long size = zone->spanned_pages;
> +	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	unsigned long start_pfn = zone->zone_start_pfn;
> +
>  	if (!vmem_map) {
> -		memmap_init_range(size, nid, zone, start_pfn, start_pfn + size,
> +		memmap_init_range(size, nid, zone_id, start_pfn, start_pfn + size,
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	} else {
>  		struct page *start;
> @@ -556,7 +558,7 @@ memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		args.start = start;
>  		args.end = start + size;
>  		args.nid = nid;
> -		args.zone = zone;
> +		args.zone = zone_id;
>  
>  		efi_memmap_walk(virtual_memmap_init, &args);
>  	}
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4b46326099d9..7a6626351ed7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6292,16 +6292,16 @@ static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
>  }
>  #endif
>  
> -void __init __weak memmap_init_zone(unsigned long size, int nid,
> -			       unsigned long zone,
> -			       unsigned long zone_start_pfn)
> +void __init __weak memmap_init_zone(struct zone *zone)
>  {
>  	unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
> -	unsigned long zone_end_pfn = zone_start_pfn + size;
> +	int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	unsigned long zone_start_pfn = zone->zone_start_pfn;
> +	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>  	u64 pgcnt = 0;
> -	int i;
>  
>  	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> +		unsigned long size;
>  		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>  		end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
>  		hole_start_pfn = clamp(hole_start_pfn, zone_start_pfn,
> @@ -6309,13 +6309,13 @@ void __init __weak memmap_init_zone(unsigned long size, int nid,
>  
>  		if (end_pfn > start_pfn) {
>  			size = end_pfn - start_pfn;
> -			memmap_init_range(size, nid, zone, start_pfn, zone_end_pfn,
> +			memmap_init_range(size, nid, zone_id, start_pfn, zone_end_pfn,
>  					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  		}
>  
>  		if (hole_start_pfn < start_pfn)
>  			pgcnt += init_unavailable_range(hole_start_pfn,
> -							start_pfn, zone, nid);
> +							start_pfn, zone_id, nid);
>  		hole_start_pfn = end_pfn;
>  	}
>  
> @@ -6328,11 +6328,11 @@ void __init __weak memmap_init_zone(unsigned long size, int nid,
>  	 */
>  	if (hole_start_pfn < zone_end_pfn)
>  		pgcnt += init_unavailable_range(hole_start_pfn, zone_end_pfn,
> -						zone, nid);
> +						zone_id, nid);
>  
>  	if (pgcnt)
>  		pr_info("%s: Zeroed struct page in unavailable ranges: %lld\n",
> -			zone_names[zone], pgcnt);
> +			zone_names[zone_id], pgcnt);
>  }
>  
>  static int zone_batchsize(struct zone *zone)
> @@ -7039,7 +7039,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		set_pageblock_order();
>  		setup_usemap(pgdat, zone, zone_start_pfn, size);
>  		init_currently_empty_zone(zone, zone_start_pfn, size);
> -		memmap_init_zone(size, nid, j, zone_start_pfn);
> +		memmap_init_zone(zone);
>  	}
>  }
>  
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 4/5] mm: simplify parameter of setup_usemap()
  2020-12-20  8:27 ` [PATCH v2 4/5] mm: simplify parameter of setup_usemap() Baoquan He
@ 2020-12-21  6:34   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-12-21  6:34 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, akpm, gopakumarr, david

On Sun, Dec 20, 2020 at 04:27:53PM +0800, Baoquan He wrote:
> Parameter 'zone' has got needed information, let's remove other
> unnecessary parameters.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/page_alloc.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7a6626351ed7..7f0a917ab858 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6824,25 +6824,22 @@ static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned l
>  	return usemapsize / 8;
>  }
>  
> -static void __ref setup_usemap(struct pglist_data *pgdat,
> -				struct zone *zone,
> -				unsigned long zone_start_pfn,
> -				unsigned long zonesize)
> +static void __ref setup_usemap(struct zone *zone)
>  {
> -	unsigned long usemapsize = usemap_size(zone_start_pfn, zonesize);
> +	unsigned long usemapsize = usemap_size(zone->zone_start_pfn,
> +					       zone->spanned_pages);
>  	zone->pageblock_flags = NULL;
>  	if (usemapsize) {
>  		zone->pageblock_flags =
>  			memblock_alloc_node(usemapsize, SMP_CACHE_BYTES,
> -					    pgdat->node_id);
> +					    zone_to_nid(zone));
>  		if (!zone->pageblock_flags)
>  			panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n",
> -			      usemapsize, zone->name, pgdat->node_id);
> +			      usemapsize, zone->name, zone_to_nid(zone));
>  	}
>  }
>  #else
> -static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone,
> -				unsigned long zone_start_pfn, unsigned long zonesize) {}
> +static inline void setup_usemap(struct zone *zone) {}
>  #endif /* CONFIG_SPARSEMEM */
>  
>  #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -7037,7 +7034,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  			continue;
>  
>  		set_pageblock_order();
> -		setup_usemap(pgdat, zone, zone_start_pfn, size);
> +		setup_usemap(zone);
>  		init_currently_empty_zone(zone, zone_start_pfn, size);
>  		memmap_init_zone(zone);
>  	}
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core
  2020-12-20  8:27 ` [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core Baoquan He
@ 2020-12-21  6:35   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-12-21  6:35 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, akpm, gopakumarr, david

On Sun, Dec 20, 2020 at 04:27:54PM +0800, Baoquan He wrote:
> Local variable 'zone_start_pfn' is not needed since there's only
> one call site in free_area_init_core(). Let's remove it and pass
> zone->zone_start_pfn directly to init_currently_empty_zone().
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/page_alloc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f0a917ab858..189a86253c93 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6986,7 +6986,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  	for (j = 0; j < MAX_NR_ZONES; j++) {
>  		struct zone *zone = pgdat->node_zones + j;
>  		unsigned long size, freesize, memmap_pages;
> -		unsigned long zone_start_pfn = zone->zone_start_pfn;
>  
>  		size = zone->spanned_pages;
>  		freesize = zone->present_pages;
> @@ -7035,7 +7034,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  
>  		set_pageblock_order();
>  		setup_usemap(zone);
> -		init_currently_empty_zone(zone, zone_start_pfn, size);
> +		init_currently_empty_zone(zone, zone->zone_start_pfn, size);
>  		memmap_init_zone(zone);
>  	}
>  }
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup
  2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
                   ` (4 preceding siblings ...)
  2020-12-20  8:27 ` [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core Baoquan He
@ 2020-12-23  1:46 ` Andrew Morton
  2020-12-23  2:05   ` Baoquan He
  5 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2020-12-23  1:46 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, linux-mm, gopakumarr, rppt, david

On Sun, 20 Dec 2020 16:27:49 +0800 Baoquan He <bhe@redhat.com> wrote:

> VMware reported the performance regression during memmap_init() invocation.
> And they bisected to commit 73a6e474cb376 ("mm: memmap_init: iterate over
> memblock regions rather that check each PFN") causing it.
> 
> https://lore.kernel.org/linux-mm/DM6PR05MB52921FF90FA01CC337DD23A1A4080@DM6PR05MB5292.namprd05.prod.outlook.com/
> 
> After investigation, it's caused by incorrect memmap init defer handling
> in memmap_init_zone() after commit 73a6e474cb376. The current
> memmap_init_zone() only handle one memory region of one zone, while
> memmap_init() iterates over all its memory regions and pass them one by
> one into memmap_init_zone() to handle.
> 
> So in this patchset, patch 1/5 fixes the bug observed by VMware. Patch
> 2~5/5 clean up codes.
> accordingly.

This series doesn't apply well to current mainline (plus, perhaps,
material which I sent to Linus today).

So please check all that against mainline in a day or so, refresh,
retest and resend.

Please separate the fix for the performance regression (1/5) into a
single standalone patch, ready for -stable backporting.  And then a
separate 4-patch series with the cleanups for a 5.11 merge.

Thanks.



* Re: [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup
  2020-12-23  1:46 ` [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Andrew Morton
@ 2020-12-23  2:05   ` Baoquan He
  2020-12-23  8:12     ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-12-23  2:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, gopakumarr, rppt, david

On 12/22/20 at 05:46pm, Andrew Morton wrote:
> On Sun, 20 Dec 2020 16:27:49 +0800 Baoquan He <bhe@redhat.com> wrote:
> 
> > VMware reported the performance regression during memmap_init() invocation.
> > And they bisected to commit 73a6e474cb376 ("mm: memmap_init: iterate over
> > memblock regions rather that check each PFN") causing it.
> > 
> > https://lore.kernel.org/linux-mm/DM6PR05MB52921FF90FA01CC337DD23A1A4080@DM6PR05MB5292.namprd05.prod.outlook.com/
> > 
> > After investigation, it's caused by incorrect memmap init defer handling
> > in memmap_init_zone() after commit 73a6e474cb376. The current
> > memmap_init_zone() only handle one memory region of one zone, while
> > memmap_init() iterates over all its memory regions and pass them one by
> > one into memmap_init_zone() to handle.
> > 
> > So in this patchset, patch 1/5 fixes the bug observed by VMware. Patch
> > 2~5/5 clean up codes.
> > accordingly.
> 
> This series doesn't apply well to current mainline (plus, perhaps,
> material which I sent to Linus today).
> 
> So please check all that against mainline in a day or so, refresh,
> retest and resend.
> 
> Please separate the fix for the performance regression (1/5) into a
> single standalone patch, ready for -stable backporting.  And then a
> separate 4-patch series with the cleanups for a 5.11 merge.

Sure, doing that now.

By the way, when sending patches to the linux-mm ML, which branch should I
rebase them on? I usually take your akpm/master as the base, thinking that
would make picking the patches up easier for you. It seems my understanding
was wrong: akpm/master changes very quickly, so we should always base
patches on Linus's master branch, whether or not they are sent to linux-mm,
right?

Thanks
Baoquan




* Re: [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup
  2020-12-23  2:05   ` Baoquan He
@ 2020-12-23  8:12     ` Baoquan He
  0 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-12-23  8:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, gopakumarr, rppt, david

On 12/23/20 at 10:05am, Baoquan He wrote:
> On 12/22/20 at 05:46pm, Andrew Morton wrote:
> > On Sun, 20 Dec 2020 16:27:49 +0800 Baoquan He <bhe@redhat.com> wrote:
> > 
> > > VMware reported the performance regression during memmap_init() invocation.
> > > And they bisected to commit 73a6e474cb376 ("mm: memmap_init: iterate over
> > > memblock regions rather that check each PFN") causing it.
> > > 
> > > https://lore.kernel.org/linux-mm/DM6PR05MB52921FF90FA01CC337DD23A1A4080@DM6PR05MB5292.namprd05.prod.outlook.com/
> > > 
> > > After investigation, it's caused by incorrect memmap init defer handling
> > > in memmap_init_zone() after commit 73a6e474cb376. The current
> > > memmap_init_zone() only handle one memory region of one zone, while
> > > memmap_init() iterates over all its memory regions and pass them one by
> > > one into memmap_init_zone() to handle.
> > > 
> > > So in this patchset, patch 1/5 fixes the bug observed by VMware. Patch
> > > 2~5/5 clean up codes.
> > > accordingly.
> > 
> > This series doesn't apply well to current mainline (plus, perhaps,
> > material which I sent to Linus today).
> > 
> > So please check all that against mainline in a day or so, refresh,
> > retest and resend.
> > 
> > Please separate the fix for the performance regression (1/5) into a
> > single standalone patch, ready for -stable backporting.  And then a
> > separate 4-patch series with the cleanups for a 5.11 merge.

I have sent patch 1/5 as a standalone patch, and will send the remaining 4
patches as a patchset once it is merged into linux-next. Thanks, Andrew.

> 
> Sure, doing now. 
> 
> By the way, when sending patches to linux-mm ML, which branch should I
> rebase them on? I usually take your akpm/master as base, thought this
> will make your patch picking easier. Seems my understanding is not true,
> akpm/master is changed very soon, we should always base patch on linus's
> master branch, whether patch is sending to linux-mm or not, right?





Thread overview: 14+ messages
2020-12-20  8:27 [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Baoquan He
2020-12-20  8:27 ` [PATCH v2 1/5] mm: memmap defer init doesn't work as expected Baoquan He
2020-12-21  6:32   ` Mike Rapoport
2020-12-20  8:27 ` [PATCH v2 2/5] mm: rename memmap_init() and memmap_init_zone() Baoquan He
2020-12-21  6:33   ` Mike Rapoport
2020-12-20  8:27 ` [PATCH v2 3/5] mm: simplify parameter of function memmap_init_zone() Baoquan He
2020-12-21  6:34   ` Mike Rapoport
2020-12-20  8:27 ` [PATCH v2 4/5] mm: simplify parameter of setup_usemap() Baoquan He
2020-12-21  6:34   ` Mike Rapoport
2020-12-20  8:27 ` [PATCH v2 5/5] mm: remove unneeded local variable in free_area_init_core Baoquan He
2020-12-21  6:35   ` Mike Rapoport
2020-12-23  1:46 ` [PATCH v2 0/5] Fix the incorrect memmap defer init handling and do some cleanup Andrew Morton
2020-12-23  2:05   ` Baoquan He
2020-12-23  8:12     ` Baoquan He
