linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] mm/init: minor clean up and improvement
@ 2024-03-26  6:11 Baoquan He
  2024-03-26  6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

These are all observed when going through code flow during mm init.

Baoquan He (7):
  mm: move array mem_section init code out of memory_present()
  mm/init: remove the unnecessary special treatment for memory-less node
  mm: make __absent_pages_in_range() as static
  mm/page_alloc.c: remove unneeded codes in !NUMA version of
    build_zonelists()
  mm/mm_init.c: remove the outdated code comment above
    deferred_grow_zone()
  mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[]
    for empty zone
  mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES

 include/linux/mm.h |  2 --
 mm/mm_init.c       | 26 +++++---------------------
 mm/page_alloc.c    | 28 ++++------------------------
 mm/sparse.c        | 26 +++++++++++++-------------
 4 files changed, 22 insertions(+), 60 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] mm: move array mem_section init code out of memory_present()
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-04-02  8:19   ` Mike Rapoport
  2024-03-26  6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
to point at a two-dimensional array, and its 1st dimension of length
NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
done, it's available for all nodes.

So take the 1st dimension of mem_section initialization out of
memory_present()(), and put it into memblocks_present() which is a more
appripriate place.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index aed0951b87fa..46e88549d1a6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
 {
 	unsigned long pfn;
 
-#ifdef CONFIG_SPARSEMEM_EXTREME
-	if (unlikely(!mem_section)) {
-		unsigned long size, align;
-
-		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
-		align = 1 << (INTERNODE_CACHE_SHIFT);
-		mem_section = memblock_alloc(size, align);
-		if (!mem_section)
-			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
-			      __func__, size, align);
-	}
-#endif
-
 	start &= PAGE_SECTION_MASK;
 	mminit_validate_memmodel_limits(&start, &end);
 	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
@@ -267,6 +254,19 @@ static void __init memblocks_present(void)
 	unsigned long start, end;
 	int i, nid;
 
+#ifdef CONFIG_SPARSEMEM_EXTREME
+	if (unlikely(!mem_section)) {
+		unsigned long size, align;
+
+		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
+		align = 1 << (INTERNODE_CACHE_SHIFT);
+		mem_section = memblock_alloc(size, align);
+		if (!mem_section)
+			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
+			      __func__, size, align);
+	}
+#endif
+
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
 		memory_present(nid, start, end);
 }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
  2024-03-26  6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-04-02  8:32   ` Mike Rapoport
  2024-04-10  3:35   ` [PATCH v2 " Baoquan He
  2024-03-26  6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

Because memory-less node's ->node_present_pages and its
zone's ->present_pages are all 0, the judgement before calling
node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
node is enough to skip memory-less node. The 'continue;' statement
inside for_each_node() loop of free_area_init() is gilding the lily.

Here, remove the special handling to make memory-less node share the
same code flow as normal node. And the code comment above the 'continue'
statement is not needed either.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/mm_init.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 089dc60159e9..99681ffb9091 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 				panic("Cannot allocate %zuB for node %d.\n",
 				       sizeof(*pgdat), nid);
 			arch_refresh_nodedata(nid, pgdat);
-			free_area_init_node(nid);
-
-			/*
-			 * We do not want to confuse userspace by sysfs
-			 * files/directories for node without any memory
-			 * attached to it, so this node is not marked as
-			 * N_MEMORY and not marked online so that no sysfs
-			 * hierarchy will be created via register_one_node for
-			 * it. The pgdat will get fully initialized by
-			 * hotadd_init_pgdat() when memory is hotplugged into
-			 * this node.
-			 */
-			continue;
 		}
 
 		pgdat = NODE_DATA(nid);
 		free_area_init_node(nid);
 
 		/* Any memory on that node */
-		if (pgdat->node_present_pages)
+		if (pgdat->node_present_pages) {
 			node_set_state(nid, N_MEMORY);
-		check_for_memory(pgdat);
+			check_for_memory(pgdat);
+		}
 	}
 
 	calc_nr_kernel_pages();
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] mm: make __absent_pages_in_range() as static
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
  2024-03-26  6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
  2024-03-26  6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-04-02  8:29   ` Mike Rapoport
  2024-03-26  6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

It's only called in mm/mm_init.c now.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/mm.h | 2 --
 mm/mm_init.c       | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ab1ba0a31429..74f71e802e0c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3193,8 +3193,6 @@ static inline unsigned long get_num_physpages(void)
  */
 void free_area_init(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 99681ffb9091..cbdb49cdd2ce 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1143,7 +1143,7 @@ static void __init adjust_zone_range_for_zone_movable(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __init __absent_pages_in_range(int nid,
+static unsigned long __init __absent_pages_in_range(int nid,
 				unsigned long range_start_pfn,
 				unsigned long range_end_pfn)
 {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists()
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
                   ` (2 preceding siblings ...)
  2024-03-26  6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-03-27 12:06   ` [PATCH v2 " Baoquan He
  2024-03-26  6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

When CONFIG_NUMA=n, MAX_NUMNODES is always 1 because Kconfig item
NODES_SHIFT depends on NUMA. So in !NUMA version of build_zonelists(),
no need to bother with the two for loop because code execution won't
enter them ever.

Here, remove those unneeded codes in !NUMA version of build_zonelists().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..fd2b49aed59e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5190,27 +5190,6 @@ static void build_zonelists(pg_data_t *pgdat)
 	nr_zones = build_zonerefs_node(pgdat, zonerefs);
 	zonerefs += nr_zones;
 
-	/*
-	 * Now we build the zonelist so that it contains the zones
-	 * of all the other nodes.
-	 * We don't want to pressure a particular node, so when
-	 * building the zones for node N, we make sure that the
-	 * zones coming right after the local ones are those from
-	 * node N+1 (modulo N)
-	 */
-	for (node = local_node + 1; node < MAX_NUMNODES; node++) {
-		if (!node_online(node))
-			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
-		zonerefs += nr_zones;
-	}
-	for (node = 0; node < local_node; node++) {
-		if (!node_online(node))
-			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
-		zonerefs += nr_zones;
-	}
-
 	zonerefs->zone = NULL;
 	zonerefs->zone_idx = 0;
 }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone()
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
                   ` (3 preceding siblings ...)
  2024-03-26  6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-04-02  8:30   ` Mike Rapoport
  2024-03-26  6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
  2024-03-26  6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He
  6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

The noinline attribute has been taken off in commit 9420f89db2dd ("mm:
move most of core MM initialization to mm/mm_init.c"). So remove the
unneeded code comment above deferred_grow_zone().

And also remove the unneeded bracket in deferred_init_pages().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/mm_init.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index cbdb49cdd2ce..cc24e7958c0c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2006,7 +2006,7 @@ static unsigned long  __init deferred_init_pages(struct zone *zone,
 		__init_single_page(page, pfn, zid, nid);
 		nr_pages++;
 	}
-	return (nr_pages);
+	return nr_pages;
 }
 
 /*
@@ -2208,10 +2208,6 @@ static int __init deferred_init_memmap(void *data)
  * Return true when zone was grown, otherwise return false. We return true even
  * when we grow less than requested, to let the caller decide if there are
  * enough pages to satisfy the allocation.
- *
- * Note: We use noinline because this function is needed only during boot, and
- * it is called from a __ref function _deferred_grow_zone. This way we are
- * making sure that it is not inlined into permanent text section.
  */
 bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
                   ` (4 preceding siblings ...)
  2024-03-26  6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  2024-03-26  6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He
  6 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

On one node, for lower zone's ->lowmem_reserve[], it will show how
much memory is reserved in this lower zone to avoid excessive page
allocation from the relevant higher zone's fallback allocation.

However, currently lower zone's lowmem_reserve[] element will be filled
even though the relevant higher zone is empty. That doesnt' make sense
and can cause confusion.

E.g on node 0 of one system as below, it has zone
DMA/DMA32/NORMAL/MOVABLE/DEVICE, among them zone MOVABLE/DEVICE are the
highest and both are empty. In zone DMA/DMA32's protection array, we can
see that it has value for zone MOVABLE and DEVICE.

Node 0, zone      DMA
  ......
  pages free     2816
        boost    0
        min      7
        low      10
        high     13
        spanned  4095
        present  3998
        managed  3840
        cma      0
        protection: (0, 1582, 23716, 23716, 23716)
   ......
Node 0, zone    DMA32
  pages free     403269
        boost    0
        min      753
        low      1158
        high     1563
        spanned  1044480
        present  487039
        managed  405070
        cma      0
        protection: (0, 0, 22134, 22134, 22134)
   ......
Node 0, zone   Normal
  pages free     5423879
        boost    0
        min      10539
        low      16205
        high     21871
        spanned  5767168
        present  5767168
        managed  5666438
        cma      0
        protection: (0, 0, 0, 0, 0)
   ......
Node 0, zone  Movable
  pages free     0
        boost    0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0, 0)
Node 0, zone   Device
  pages free     0
        boost    0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0, 0)

Here, clear out the element value in lower zone's ->lowmem_reserve[] if the
relevant higher zone is empty.

And also replace space with tab in _deferred_grow_zone()

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd2b49aed59e..ce1d12cf2ec7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -332,7 +332,7 @@ static inline bool deferred_pages_enabled(void)
 static bool __ref
 _deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-       return deferred_grow_zone(zone, order);
+	return deferred_grow_zone(zone, order);
 }
 #else
 static inline bool deferred_pages_enabled(void)
@@ -5806,10 +5806,11 @@ static void setup_per_zone_lowmem_reserve(void)
 
 			for (j = i + 1; j < MAX_NR_ZONES; j++) {
 				struct zone *upper_zone = &pgdat->node_zones[j];
+				bool empty = !zone_managed_pages(upper_zone);
 
 				managed_pages += zone_managed_pages(upper_zone);
 
-				if (clear)
+				if (clear || empty)
 					zone->lowmem_reserve[j] = 0;
 				else
 					zone->lowmem_reserve[j] = managed_pages / ratio;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES
  2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
                   ` (5 preceding siblings ...)
  2024-03-26  6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
@ 2024-03-26  6:11 ` Baoquan He
  6 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26  6:11 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He

Earlier, in commit 1dd214b8f21c ("mm: page_alloc: avoid merging
non-fallbackable pageblocks with others"), migrate type MIGRATE_CMA and
MIGRATE_ISOLATE are removed from fallbacks list since they are never
used.

Later on, in commit ("aa02d3c174ab mm/page_alloc: reduce fallbacks to
(MIGRATE_PCPTYPES - 1)"), the array column size is reduced to
'MIGRATE_PCPTYPES - 1'. In fact, the array row size need be reduced to
MIGRATE_PCPTYPES too since it's only covering rows of the number
MIGRATE_PCPTYPES. Even though the current code has handled cases
when the migratetype is CMA, HIGHATOMIC and MEMORY_ISOLATION, making
the row size right is still good to avoid future error and confusion.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce1d12cf2ec7..b60cdcda46d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1592,7 +1592,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  *
  * The other migratetypes do not have fallbacks.
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_PCPTYPES - 1] = {
+static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE   },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists()
  2024-03-26  6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
@ 2024-03-27 12:06   ` Baoquan He
  0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-27 12:06 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman

When CONFIG_NUMA=n, MAX_NUMNODES is always 1 because Kconfig item
NODES_SHIFT depends on NUMA. So in !NUMA version of build_zonelists(),
no need to bother with the two for loop because code execution won't
enter them ever.

Here, remove those unneeded codes in !NUMA version of build_zonelists().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
Remove the unused local variables to fix lkp reported warning.

 mm/page_alloc.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..d35b4cbeffbd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5180,37 +5180,13 @@ static void setup_min_slab_ratio(void);
 
 static void build_zonelists(pg_data_t *pgdat)
 {
-	int node, local_node;
 	struct zoneref *zonerefs;
 	int nr_zones;
 
-	local_node = pgdat->node_id;
-
 	zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
 	nr_zones = build_zonerefs_node(pgdat, zonerefs);
 	zonerefs += nr_zones;
 
-	/*
-	 * Now we build the zonelist so that it contains the zones
-	 * of all the other nodes.
-	 * We don't want to pressure a particular node, so when
-	 * building the zones for node N, we make sure that the
-	 * zones coming right after the local ones are those from
-	 * node N+1 (modulo N)
-	 */
-	for (node = local_node + 1; node < MAX_NUMNODES; node++) {
-		if (!node_online(node))
-			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
-		zonerefs += nr_zones;
-	}
-	for (node = 0; node < local_node; node++) {
-		if (!node_online(node))
-			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
-		zonerefs += nr_zones;
-	}
-
 	zonerefs->zone = NULL;
 	zonerefs->zone_idx = 0;
 }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/7] mm: move array mem_section init code out of memory_present()
  2024-03-26  6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
@ 2024-04-02  8:19   ` Mike Rapoport
  2024-04-04  1:38     ` Baoquan He
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02  8:19 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-mm, akpm, mgorman

On Tue, Mar 26, 2024 at 02:11:27PM +0800, Baoquan He wrote:
> When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
> to point at a two-dimensional array, and its 1st dimension of length
> NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
> done, it's available for all nodes.
> 
> So take the 1st dimension of mem_section initialization out of
> memory_present()(), and put it into memblocks_present() which is a more
> appripriate place.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aed0951b87fa..46e88549d1a6 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
>  {
>  	unsigned long pfn;
>  
> -#ifdef CONFIG_SPARSEMEM_EXTREME
> -	if (unlikely(!mem_section)) {
> -		unsigned long size, align;
> -
> -		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> -		align = 1 << (INTERNODE_CACHE_SHIFT);
> -		mem_section = memblock_alloc(size, align);
> -		if (!mem_section)
> -			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> -			      __func__, size, align);
> -	}
> -#endif
> -
>  	start &= PAGE_SECTION_MASK;
>  	mminit_validate_memmodel_limits(&start, &end);
>  	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> @@ -267,6 +254,19 @@ static void __init memblocks_present(void)
>  	unsigned long start, end;
>  	int i, nid;
>  
> +#ifdef CONFIG_SPARSEMEM_EXTREME
> +	if (unlikely(!mem_section)) {

With the new calling sequence sparse_init() -> memblocks_present() ->
allocate mem_section, mem_section here will be NULL, so this if is not
really relevant. We might want to add WARN_ON_ONCE(mem_section) just in
case to catch multiple calls to sparse_init().

> +		unsigned long size, align;
> +
> +		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> +		align = 1 << (INTERNODE_CACHE_SHIFT);
> +		mem_section = memblock_alloc(size, align);
> +		if (!mem_section)
> +			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> +			      __func__, size, align);
> +	}
> +#endif

> +
>  	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
>  		memory_present(nid, start, end);
>  }
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] mm: make __absent_pages_in_range() as static
  2024-03-26  6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
@ 2024-04-02  8:29   ` Mike Rapoport
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02  8:29 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-mm, akpm, mgorman

On Tue, Mar 26, 2024 at 02:11:29PM +0800, Baoquan He wrote:
> It's only called in mm/mm_init.c now.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  include/linux/mm.h | 2 --
>  mm/mm_init.c       | 2 +-
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ab1ba0a31429..74f71e802e0c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3193,8 +3193,6 @@ static inline unsigned long get_num_physpages(void)
>   */
>  void free_area_init(unsigned long *max_zone_pfn);
>  unsigned long node_map_pfn_alignment(void);
> -unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
> -						unsigned long end_pfn);
>  extern unsigned long absent_pages_in_range(unsigned long start_pfn,
>  						unsigned long end_pfn);
>  extern void get_pfn_range_for_nid(unsigned int nid,
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 99681ffb9091..cbdb49cdd2ce 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1143,7 +1143,7 @@ static void __init adjust_zone_range_for_zone_movable(int nid,
>   * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
>   * then all holes in the requested range will be accounted for.
>   */
> -unsigned long __init __absent_pages_in_range(int nid,
> +static unsigned long __init __absent_pages_in_range(int nid,
>  				unsigned long range_start_pfn,
>  				unsigned long range_end_pfn)
>  {
>
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone()
  2024-03-26  6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
@ 2024-04-02  8:30   ` Mike Rapoport
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02  8:30 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-mm, akpm, mgorman

On Tue, Mar 26, 2024 at 02:11:31PM +0800, Baoquan He wrote:
> The noinline attribute has been taken off in commit 9420f89db2dd ("mm:
> move most of core MM initialization to mm/mm_init.c"). So remove the
> unneeded code comment above deferred_grow_zone().
> 
> And also remove the unneeded bracket in deferred_init_pages().
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  mm/mm_init.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cbdb49cdd2ce..cc24e7958c0c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2006,7 +2006,7 @@ static unsigned long  __init deferred_init_pages(struct zone *zone,
>  		__init_single_page(page, pfn, zid, nid);
>  		nr_pages++;
>  	}
> -	return (nr_pages);
> +	return nr_pages;
>  }
>  
>  /*
> @@ -2208,10 +2208,6 @@ static int __init deferred_init_memmap(void *data)
>   * Return true when zone was grown, otherwise return false. We return true even
>   * when we grow less than requested, to let the caller decide if there are
>   * enough pages to satisfy the allocation.
> - *
> - * Note: We use noinline because this function is needed only during boot, and
> - * it is called from a __ref function _deferred_grow_zone. This way we are
> - * making sure that it is not inlined into permanent text section.
>   */
>  bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
>  {
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-03-26  6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
@ 2024-04-02  8:32   ` Mike Rapoport
  2024-04-04  3:23     ` Baoquan He
  2024-04-10  3:35   ` [PATCH v2 " Baoquan He
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02  8:32 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-mm, akpm, mgorman

On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> Because memory-less node's ->node_present_pages and its
> zone's ->present_pages are all 0, the judgement before calling
> node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> node is enough to skip memory-less node. The 'continue;' statement
> inside for_each_node() loop of free_area_init() is gilding the lily.
> 
> Here, remove the special handling to make memory-less node share the
> same code flow as normal node. And the code comment above the 'continue'
> statement is not needed either.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/mm_init.c | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 089dc60159e9..99681ffb9091 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
>  				panic("Cannot allocate %zuB for node %d.\n",
>  				       sizeof(*pgdat), nid);
>  			arch_refresh_nodedata(nid, pgdat);
> -			free_area_init_node(nid);
> -
> -			/*
> -			 * We do not want to confuse userspace by sysfs
> -			 * files/directories for node without any memory
> -			 * attached to it, so this node is not marked as
> -			 * N_MEMORY and not marked online so that no sysfs
> -			 * hierarchy will be created via register_one_node for
> -			 * it. The pgdat will get fully initialized by
> -			 * hotadd_init_pgdat() when memory is hotplugged into
> -			 * this node.
> -			 */

I think this comment is still valuable. Maybe rephrase it a bit and move it
before 'if (pgdat->node_present_pages)'?

> -			continue;
>  		}
>  
>  		pgdat = NODE_DATA(nid);
>  		free_area_init_node(nid);
>  
>  		/* Any memory on that node */
> -		if (pgdat->node_present_pages)
> +		if (pgdat->node_present_pages) {
>  			node_set_state(nid, N_MEMORY);
> -		check_for_memory(pgdat);
> +			check_for_memory(pgdat);
> +		}
>  	}
>  
>  	calc_nr_kernel_pages();
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/7] mm: move array mem_section init code out of memory_present()
  2024-04-02  8:19   ` Mike Rapoport
@ 2024-04-04  1:38     ` Baoquan He
  0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-04  1:38 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, akpm, mgorman

On 04/02/24 at 11:19am, Mike Rapoport wrote:
> On Tue, Mar 26, 2024 at 02:11:27PM +0800, Baoquan He wrote:
> > When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
> > to point at a two-dimensional array, and its 1st dimension of length
> > NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
> > done, it's available for all nodes.
> > 
> > So take the 1st dimension of mem_section initialization out of
> > memory_present()(), and put it into memblocks_present() which is a more
> > appripriate place.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse.c | 26 +++++++++++++-------------
> >  1 file changed, 13 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aed0951b87fa..46e88549d1a6 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
> >  {
> >  	unsigned long pfn;
> >  
> > -#ifdef CONFIG_SPARSEMEM_EXTREME
> > -	if (unlikely(!mem_section)) {
> > -		unsigned long size, align;
> > -
> > -		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> > -		align = 1 << (INTERNODE_CACHE_SHIFT);
> > -		mem_section = memblock_alloc(size, align);
> > -		if (!mem_section)
> > -			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> > -			      __func__, size, align);
> > -	}
> > -#endif
> > -
> >  	start &= PAGE_SECTION_MASK;
> >  	mminit_validate_memmodel_limits(&start, &end);
> >  	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> > @@ -267,6 +254,19 @@ static void __init memblocks_present(void)
> >  	unsigned long start, end;
> >  	int i, nid;
> >  
> > +#ifdef CONFIG_SPARSEMEM_EXTREME
> > +	if (unlikely(!mem_section)) {
> 
> With the new calling sequence sparse_init() -> memblocks_present() ->
> allocate mem_section, mem_section here will be NULL, so this if is not
> really relevant. We might want to add WARN_ON_ONCE(mem_section) just in
> case to catch multiple calls to sparse_init().

Sorry for late reply.

Hmm, I am not very eager to add WARN_ON_ONCE(mem_section) becasue as you
said it's not very relevant. Calling sparse_init() miltiple times is not
acceptable, while it may not be good to let checking mem_section in
memblocks_present() to catch it.

> 
> > +		unsigned long size, align;
> > +
> > +		size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> > +		align = 1 << (INTERNODE_CACHE_SHIFT);
> > +		mem_section = memblock_alloc(size, align);
> > +		if (!mem_section)
> > +			panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> > +			      __func__, size, align);
> > +	}
> > +#endif
> 
> > +
> >  	for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
> >  		memory_present(nid, start, end);
> >  }
> > -- 
> > 2.41.0
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-04-02  8:32   ` Mike Rapoport
@ 2024-04-04  3:23     ` Baoquan He
  2024-04-09 15:40       ` Mike Rapoport
  0 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-04-04  3:23 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, akpm, mgorman

On 04/02/24 at 11:32am, Mike Rapoport wrote:
> On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > Because memory-less node's ->node_present_pages and its
> > zone's ->present_pages are all 0, the judgement before calling
> > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > node is enough to skip memory-less node. The 'continue;' statement
> > inside for_each_node() loop of free_area_init() is gilding the lily.
> > 
> > Here, remove the special handling to make memory-less node share the
> > same code flow as normal node. And the code comment above the 'continue'
> > statement is not needed either.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/mm_init.c | 18 +++---------------
> >  1 file changed, 3 insertions(+), 15 deletions(-)
> > 
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 089dc60159e9..99681ffb9091 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> >  				panic("Cannot allocate %zuB for node %d.\n",
> >  				       sizeof(*pgdat), nid);
> >  			arch_refresh_nodedata(nid, pgdat);
> > -			free_area_init_node(nid);
> > -
> > -			/*
> > -			 * We do not want to confuse userspace by sysfs
> > -			 * files/directories for node without any memory
> > -			 * attached to it, so this node is not marked as
> > -			 * N_MEMORY and not marked online so that no sysfs
> > -			 * hierarchy will be created via register_one_node for
> > -			 * it. The pgdat will get fully initialized by
> > -			 * hotadd_init_pgdat() when memory is hotplugged into
> > -			 * this node.
> > -			 */
> 
> I think this comment is still valuable. Maybe rephrase it a bit and move it
> before 'if (pgdat->node_present_pages)'?

Fair enough.

Do you think below paragraph is OK to you? Please help polish or
rephrase it.

diff --git a/mm/mm_init.c b/mm/mm_init.c
index dd875f943cbb..3ce0f29637ba 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 		pgdat = NODE_DATA(nid);
 		free_area_init_node(nid);
 
-		/* Any memory on that node */
+		/*
+		 * No sysfs hierarcy will be created via register_one_node()
+		 *for memory-less node because here it's not marked as N_MEMORY
+		 *and won't be set online later. The benefit is userspace
+		 *program won't be confused by sysfs files/directories of
+		 *memory-less node. The pgdat will get fully initialized by
+		 *hotadd_init_pgdat() when memory is hotplugged into this node.
+		 */
 		if (pgdat->node_present_pages) {
 			node_set_state(nid, N_MEMORY);
 			check_for_memory(pgdat);



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-04-04  3:23     ` Baoquan He
@ 2024-04-09 15:40       ` Mike Rapoport
  2024-04-10  3:38         ` Baoquan He
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-09 15:40 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-mm, akpm, mgorman

On Thu, Apr 04, 2024 at 11:23:51AM +0800, Baoquan He wrote:
> On 04/02/24 at 11:32am, Mike Rapoport wrote:
> > On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > > Because memory-less node's ->node_present_pages and its
> > > zone's ->present_pages are all 0, the judgement before calling
> > > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > > node is enough to skip memory-less node. The 'continue;' statement
> > > inside for_each_node() loop of free_area_init() is gilding the lily.
> > > 
> > > Here, remove the special handling to make memory-less node share the
> > > same code flow as normal node. And the code comment above the 'continue'
> > > statement is not needed either.
> > > 
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  mm/mm_init.c | 18 +++---------------
> > >  1 file changed, 3 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > index 089dc60159e9..99681ffb9091 100644
> > > --- a/mm/mm_init.c
> > > +++ b/mm/mm_init.c
> > > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > >  				panic("Cannot allocate %zuB for node %d.\n",
> > >  				       sizeof(*pgdat), nid);
> > >  			arch_refresh_nodedata(nid, pgdat);
> > > -			free_area_init_node(nid);
> > > -
> > > -			/*
> > > -			 * We do not want to confuse userspace by sysfs
> > > -			 * files/directories for node without any memory
> > > -			 * attached to it, so this node is not marked as
> > > -			 * N_MEMORY and not marked online so that no sysfs
> > > -			 * hierarchy will be created via register_one_node for
> > > -			 * it. The pgdat will get fully initialized by
> > > -			 * hotadd_init_pgdat() when memory is hotplugged into
> > > -			 * this node.
> > > -			 */
> > 
> > I think this comment is still valuable. Maybe rephrase it a bit and move it
> > before 'if (pgdat->node_present_pages)'?
> 
> Fair enough.
> 
> Do you think below paragraph is OK to you? Please help polish or
> rephrase it.
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index dd875f943cbb..3ce0f29637ba 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
>  		pgdat = NODE_DATA(nid);
>  		free_area_init_node(nid);
>  
> -		/* Any memory on that node */
> +		/*
> +		 * No sysfs hierarcy will be created via register_one_node()
> +		 *for memory-less node because here it's not marked as N_MEMORY
> +		 *and won't be set online later. The benefit is userspace
> +		 *program won't be confused by sysfs files/directories of
> +		 *memory-less node. The pgdat will get fully initialized by
> +		 *hotadd_init_pgdat() when memory is hotplugged into this node.
> +		 */

Ack

>  		if (pgdat->node_present_pages) {
>  			node_set_state(nid, N_MEMORY);
>  			check_for_memory(pgdat);
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-03-26  6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
  2024-04-02  8:32   ` Mike Rapoport
@ 2024-04-10  3:35   ` Baoquan He
  1 sibling, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-10  3:35 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, rppt, mgorman

Because memory-less node's ->node_present_pages and its
zone's ->present_pages are all 0, the judgement before calling
node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
node is enough to skip memory-less node. The 'continue;' statement
inside for_each_node() loop of free_area_init() is gilding the lily.

Here, remove the special handling to make memory-less node share the
same code flow as normal node.

And also rephrase the code comments above the 'continue' statement
and move them above above line 'if (pgdat->node_present_pages)'.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
- As Mike suggested, the old code comments above the 'continue'
  statement is still useful for easier understanding code and system
  behaviour. So rephrase and move them above line 'if
  (pgdat->node_present_pages)'. Thanks to Mike.

 mm/mm_init.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2016ca8031e9..32ede966e609 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1834,28 +1834,23 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 				panic("Cannot allocate %zuB for node %d.\n",
 				       sizeof(*pgdat), nid);
 			arch_refresh_nodedata(nid, pgdat);
-			free_area_init_node(nid);
-
-			/*
-			 * We do not want to confuse userspace by sysfs
-			 * files/directories for node without any memory
-			 * attached to it, so this node is not marked as
-			 * N_MEMORY and not marked online so that no sysfs
-			 * hierarchy will be created via register_one_node for
-			 * it. The pgdat will get fully initialized by
-			 * hotadd_init_pgdat() when memory is hotplugged into
-			 * this node.
-			 */
-			continue;
 		}
 
 		pgdat = NODE_DATA(nid);
 		free_area_init_node(nid);
 
-		/* Any memory on that node */
-		if (pgdat->node_present_pages)
+		/*
+		 * No sysfs hierarcy will be created via register_one_node()
+		 *for memory-less node because here it's not marked as N_MEMORY
+		 *and won't be set online later. The benefit is userspace
+		 *program won't be confused by sysfs files/directories of
+		 *memory-less node. The pgdat will get fully initialized by
+		 *hotadd_init_pgdat() when memory is hotplugged into this node.
+		 */
+		if (pgdat->node_present_pages) {
 			node_set_state(nid, N_MEMORY);
-		check_for_memory(pgdat);
+			check_for_memory(pgdat);
+		}
 	}
 
 	calc_nr_kernel_pages();
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
  2024-04-09 15:40       ` Mike Rapoport
@ 2024-04-10  3:38         ` Baoquan He
  0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-10  3:38 UTC (permalink / raw)
  To: Mike Rapoport, akpm; +Cc: linux-mm, mgorman

On 04/09/24 at 06:40pm, Mike Rapoport wrote:
> On Thu, Apr 04, 2024 at 11:23:51AM +0800, Baoquan He wrote:
> > On 04/02/24 at 11:32am, Mike Rapoport wrote:
> > > On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > > > Because memory-less node's ->node_present_pages and its
> > > > zone's ->present_pages are all 0, the judgement before calling
> > > > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > > > node is enough to skip memory-less node. The 'continue;' statement
> > > > inside for_each_node() loop of free_area_init() is gilding the lily.
> > > > 
> > > > Here, remove the special handling to make memory-less node share the
> > > > same code flow as normal node. And the code comment above the 'continue'
> > > > statement is not needed either.
> > > > 
> > > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > > ---
> > > >  mm/mm_init.c | 18 +++---------------
> > > >  1 file changed, 3 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > index 089dc60159e9..99681ffb9091 100644
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > > >  				panic("Cannot allocate %zuB for node %d.\n",
> > > >  				       sizeof(*pgdat), nid);
> > > >  			arch_refresh_nodedata(nid, pgdat);
> > > > -			free_area_init_node(nid);
> > > > -
> > > > -			/*
> > > > -			 * We do not want to confuse userspace by sysfs
> > > > -			 * files/directories for node without any memory
> > > > -			 * attached to it, so this node is not marked as
> > > > -			 * N_MEMORY and not marked online so that no sysfs
> > > > -			 * hierarchy will be created via register_one_node for
> > > > -			 * it. The pgdat will get fully initialized by
> > > > -			 * hotadd_init_pgdat() when memory is hotplugged into
> > > > -			 * this node.
> > > > -			 */
> > > 
> > > I think this comment is still valuable. Maybe rephrase it a bit and move it
> > > before 'if (pgdat->node_present_pages)'?
> > 
> > Fair enough.
> > 
> > Do you think below paragraph is OK to you? Please help polish or
> > rephrase it.
> > 
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index dd875f943cbb..3ce0f29637ba 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> >  		pgdat = NODE_DATA(nid);
> >  		free_area_init_node(nid);
> >  
> > -		/* Any memory on that node */
> > +		/*
> > +		 * No sysfs hierarcy will be created via register_one_node()
> > +		 *for memory-less node because here it's not marked as N_MEMORY
> > +		 *and won't be set online later. The benefit is userspace
> > +		 *program won't be confused by sysfs files/directories of
> > +		 *memory-less node. The pgdat will get fully initialized by
> > +		 *hotadd_init_pgdat() when memory is hotplugged into this node.
> > +		 */
> 
> Ack

Thanks a lot for reviewing and confirming, Mike.

Hi Andrew,

I have sent out v2 to include above code comment changing, please feel
free to pick the draft patch and append, or take the newly post v2.
Thanks.

> 
> >  		if (pgdat->node_present_pages) {
> >  			node_set_state(nid, N_MEMORY);
> >  			check_for_memory(pgdat);
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-04-10  3:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-26  6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
2024-03-26  6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
2024-04-02  8:19   ` Mike Rapoport
2024-04-04  1:38     ` Baoquan He
2024-03-26  6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
2024-04-02  8:32   ` Mike Rapoport
2024-04-04  3:23     ` Baoquan He
2024-04-09 15:40       ` Mike Rapoport
2024-04-10  3:38         ` Baoquan He
2024-04-10  3:35   ` [PATCH v2 " Baoquan He
2024-03-26  6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
2024-04-02  8:29   ` Mike Rapoport
2024-03-26  6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
2024-03-27 12:06   ` [PATCH v2 " Baoquan He
2024-03-26  6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
2024-04-02  8:30   ` Mike Rapoport
2024-03-26  6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
2024-03-26  6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).