* [PATCH 0/7] mm/init: minor clean up and improvement
@ 2024-03-26 6:11 Baoquan He
2024-03-26 6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
These are all observed when going through code flow during mm init.
Baoquan He (7):
mm: move array mem_section init code out of memory_present()
mm/init: remove the unnecessary special treatment for memory-less node
mm: make __absent_pages_in_range() as static
mm/page_alloc.c: remove unneeded codes in !NUMA version of
build_zonelists()
mm/mm_init.c: remove the outdated code comment above
deferred_grow_zone()
mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[]
for empty zone
mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES
include/linux/mm.h | 2 --
mm/mm_init.c | 26 +++++---------------------
mm/page_alloc.c | 28 ++++------------------------
mm/sparse.c | 26 +++++++++++++-------------
4 files changed, 22 insertions(+), 60 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/7] mm: move array mem_section init code out of memory_present()
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-04-02 8:19 ` Mike Rapoport
2024-03-26 6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
to point at a two-dimensional array, and its 1st dimension of length
NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
done, it's available for all nodes.
So take the 1st dimension of mem_section initialization out of
memory_present()(), and put it into memblocks_present() which is a more
appripriate place.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/sparse.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index aed0951b87fa..46e88549d1a6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
{
unsigned long pfn;
-#ifdef CONFIG_SPARSEMEM_EXTREME
- if (unlikely(!mem_section)) {
- unsigned long size, align;
-
- size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
- align = 1 << (INTERNODE_CACHE_SHIFT);
- mem_section = memblock_alloc(size, align);
- if (!mem_section)
- panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
- __func__, size, align);
- }
-#endif
-
start &= PAGE_SECTION_MASK;
mminit_validate_memmodel_limits(&start, &end);
for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
@@ -267,6 +254,19 @@ static void __init memblocks_present(void)
unsigned long start, end;
int i, nid;
+#ifdef CONFIG_SPARSEMEM_EXTREME
+ if (unlikely(!mem_section)) {
+ unsigned long size, align;
+
+ size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
+ align = 1 << (INTERNODE_CACHE_SHIFT);
+ mem_section = memblock_alloc(size, align);
+ if (!mem_section)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
+ __func__, size, align);
+ }
+#endif
+
for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
memory_present(nid, start, end);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
2024-03-26 6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-04-02 8:32 ` Mike Rapoport
2024-04-10 3:35 ` [PATCH v2 " Baoquan He
2024-03-26 6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
` (4 subsequent siblings)
6 siblings, 2 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
Because memory-less node's ->node_present_pages and its
zone's ->present_pages are all 0, the judgement before calling
node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
node is enough to skip memory-less node. The 'continue;' statement
inside for_each_node() loop of free_area_init() is gilding the lily.
Here, remove the special handling to make memory-less node share the
same code flow as normal node. And the code comment above the 'continue'
statement is not needed either.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/mm_init.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 089dc60159e9..99681ffb9091 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
panic("Cannot allocate %zuB for node %d.\n",
sizeof(*pgdat), nid);
arch_refresh_nodedata(nid, pgdat);
- free_area_init_node(nid);
-
- /*
- * We do not want to confuse userspace by sysfs
- * files/directories for node without any memory
- * attached to it, so this node is not marked as
- * N_MEMORY and not marked online so that no sysfs
- * hierarchy will be created via register_one_node for
- * it. The pgdat will get fully initialized by
- * hotadd_init_pgdat() when memory is hotplugged into
- * this node.
- */
- continue;
}
pgdat = NODE_DATA(nid);
free_area_init_node(nid);
/* Any memory on that node */
- if (pgdat->node_present_pages)
+ if (pgdat->node_present_pages) {
node_set_state(nid, N_MEMORY);
- check_for_memory(pgdat);
+ check_for_memory(pgdat);
+ }
}
calc_nr_kernel_pages();
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/7] mm: make __absent_pages_in_range() as static
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
2024-03-26 6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
2024-03-26 6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-04-02 8:29 ` Mike Rapoport
2024-03-26 6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
It's only called in mm/mm_init.c now.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
include/linux/mm.h | 2 --
mm/mm_init.c | 2 +-
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ab1ba0a31429..74f71e802e0c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3193,8 +3193,6 @@ static inline unsigned long get_num_physpages(void)
*/
void free_area_init(unsigned long *max_zone_pfn);
unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
- unsigned long end_pfn);
extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 99681ffb9091..cbdb49cdd2ce 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1143,7 +1143,7 @@ static void __init adjust_zone_range_for_zone_movable(int nid,
* Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
* then all holes in the requested range will be accounted for.
*/
-unsigned long __init __absent_pages_in_range(int nid,
+static unsigned long __init __absent_pages_in_range(int nid,
unsigned long range_start_pfn,
unsigned long range_end_pfn)
{
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists()
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
` (2 preceding siblings ...)
2024-03-26 6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-03-27 12:06 ` [PATCH v2 " Baoquan He
2024-03-26 6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
When CONFIG_NUMA=n, MAX_NUMNODES is always 1 because Kconfig item
NODES_SHIFT depends on NUMA. So in !NUMA version of build_zonelists(),
no need to bother with the two for loop because code execution won't
enter them ever.
Here, remove those unneeded codes in !NUMA version of build_zonelists().
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/page_alloc.c | 21 ---------------------
1 file changed, 21 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..fd2b49aed59e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5190,27 +5190,6 @@ static void build_zonelists(pg_data_t *pgdat)
nr_zones = build_zonerefs_node(pgdat, zonerefs);
zonerefs += nr_zones;
- /*
- * Now we build the zonelist so that it contains the zones
- * of all the other nodes.
- * We don't want to pressure a particular node, so when
- * building the zones for node N, we make sure that the
- * zones coming right after the local ones are those from
- * node N+1 (modulo N)
- */
- for (node = local_node + 1; node < MAX_NUMNODES; node++) {
- if (!node_online(node))
- continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
- zonerefs += nr_zones;
- }
- for (node = 0; node < local_node; node++) {
- if (!node_online(node))
- continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
- zonerefs += nr_zones;
- }
-
zonerefs->zone = NULL;
zonerefs->zone_idx = 0;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone()
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
` (3 preceding siblings ...)
2024-03-26 6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-04-02 8:30 ` Mike Rapoport
2024-03-26 6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
2024-03-26 6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He
6 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
The noinline attribute has been taken off in commit 9420f89db2dd ("mm:
move most of core MM initialization to mm/mm_init.c"). So remove the
unneeded code comment above deferred_grow_zone().
And also remove the unneeded bracket in deferred_init_pages().
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/mm_init.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cbdb49cdd2ce..cc24e7958c0c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2006,7 +2006,7 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
__init_single_page(page, pfn, zid, nid);
nr_pages++;
}
- return (nr_pages);
+ return nr_pages;
}
/*
@@ -2208,10 +2208,6 @@ static int __init deferred_init_memmap(void *data)
* Return true when zone was grown, otherwise return false. We return true even
* when we grow less than requested, to let the caller decide if there are
* enough pages to satisfy the allocation.
- *
- * Note: We use noinline because this function is needed only during boot, and
- * it is called from a __ref function _deferred_grow_zone. This way we are
- * making sure that it is not inlined into permanent text section.
*/
bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
{
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
` (4 preceding siblings ...)
2024-03-26 6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
2024-03-26 6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He
6 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
On one node, for lower zone's ->lowmem_reserve[], it will show how
much memory is reserved in this lower zone to avoid excessive page
allocation from the relevant higher zone's fallback allocation.
However, currently lower zone's lowmem_reserve[] element will be filled
even though the relevant higher zone is empty. That doesnt' make sense
and can cause confusion.
E.g on node 0 of one system as below, it has zone
DMA/DMA32/NORMAL/MOVABLE/DEVICE, among them zone MOVABLE/DEVICE are the
highest and both are empty. In zone DMA/DMA32's protection array, we can
see that it has value for zone MOVABLE and DEVICE.
Node 0, zone DMA
......
pages free 2816
boost 0
min 7
low 10
high 13
spanned 4095
present 3998
managed 3840
cma 0
protection: (0, 1582, 23716, 23716, 23716)
......
Node 0, zone DMA32
pages free 403269
boost 0
min 753
low 1158
high 1563
spanned 1044480
present 487039
managed 405070
cma 0
protection: (0, 0, 22134, 22134, 22134)
......
Node 0, zone Normal
pages free 5423879
boost 0
min 10539
low 16205
high 21871
spanned 5767168
present 5767168
managed 5666438
cma 0
protection: (0, 0, 0, 0, 0)
......
Node 0, zone Movable
pages free 0
boost 0
min 32
low 32
high 32
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0, 0)
Node 0, zone Device
pages free 0
boost 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0, 0)
Here, clear out the element value in lower zone's ->lowmem_reserve[] if the
relevant higher zone is empty.
And also replace space with tab in _deferred_grow_zone()
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd2b49aed59e..ce1d12cf2ec7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -332,7 +332,7 @@ static inline bool deferred_pages_enabled(void)
static bool __ref
_deferred_grow_zone(struct zone *zone, unsigned int order)
{
- return deferred_grow_zone(zone, order);
+ return deferred_grow_zone(zone, order);
}
#else
static inline bool deferred_pages_enabled(void)
@@ -5806,10 +5806,11 @@ static void setup_per_zone_lowmem_reserve(void)
for (j = i + 1; j < MAX_NR_ZONES; j++) {
struct zone *upper_zone = &pgdat->node_zones[j];
+ bool empty = !zone_managed_pages(upper_zone);
managed_pages += zone_managed_pages(upper_zone);
- if (clear)
+ if (clear || empty)
zone->lowmem_reserve[j] = 0;
else
zone->lowmem_reserve[j] = managed_pages / ratio;
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
` (5 preceding siblings ...)
2024-03-26 6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
@ 2024-03-26 6:11 ` Baoquan He
6 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-26 6:11 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman, Baoquan He
Earlier, in commit 1dd214b8f21c ("mm: page_alloc: avoid merging
non-fallbackable pageblocks with others"), migrate type MIGRATE_CMA and
MIGRATE_ISOLATE are removed from fallbacks list since they are never
used.
Later on, in commit ("aa02d3c174ab mm/page_alloc: reduce fallbacks to
(MIGRATE_PCPTYPES - 1)"), the array column size is reduced to
'MIGRATE_PCPTYPES - 1'. In fact, the array row size need be reduced to
MIGRATE_PCPTYPES too since it's only covering rows of the number
MIGRATE_PCPTYPES. Even though the current code has handled cases
when the migratetype is CMA, HIGHATOMIC and MEMORY_ISOLATION, making
the row size right is still good to avoid future error and confusion.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce1d12cf2ec7..b60cdcda46d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1592,7 +1592,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
*
* The other migratetypes do not have fallbacks.
*/
-static int fallbacks[MIGRATE_TYPES][MIGRATE_PCPTYPES - 1] = {
+static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE },
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE },
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists()
2024-03-26 6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
@ 2024-03-27 12:06 ` Baoquan He
0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-03-27 12:06 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman
When CONFIG_NUMA=n, MAX_NUMNODES is always 1 because Kconfig item
NODES_SHIFT depends on NUMA. So in !NUMA version of build_zonelists(),
no need to bother with the two for loop because code execution won't
enter them ever.
Here, remove those unneeded codes in !NUMA version of build_zonelists().
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
Remove the unused local variables to fix lkp reported warning.
mm/page_alloc.c | 24 ------------------------
1 file changed, 24 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..d35b4cbeffbd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5180,37 +5180,13 @@ static void setup_min_slab_ratio(void);
static void build_zonelists(pg_data_t *pgdat)
{
- int node, local_node;
struct zoneref *zonerefs;
int nr_zones;
- local_node = pgdat->node_id;
-
zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
nr_zones = build_zonerefs_node(pgdat, zonerefs);
zonerefs += nr_zones;
- /*
- * Now we build the zonelist so that it contains the zones
- * of all the other nodes.
- * We don't want to pressure a particular node, so when
- * building the zones for node N, we make sure that the
- * zones coming right after the local ones are those from
- * node N+1 (modulo N)
- */
- for (node = local_node + 1; node < MAX_NUMNODES; node++) {
- if (!node_online(node))
- continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
- zonerefs += nr_zones;
- }
- for (node = 0; node < local_node; node++) {
- if (!node_online(node))
- continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
- zonerefs += nr_zones;
- }
-
zonerefs->zone = NULL;
zonerefs->zone_idx = 0;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/7] mm: move array mem_section init code out of memory_present()
2024-03-26 6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
@ 2024-04-02 8:19 ` Mike Rapoport
2024-04-04 1:38 ` Baoquan He
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02 8:19 UTC (permalink / raw)
To: Baoquan He; +Cc: linux-mm, akpm, mgorman
On Tue, Mar 26, 2024 at 02:11:27PM +0800, Baoquan He wrote:
> When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
> to point at a two-dimensional array, and its 1st dimension of length
> NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
> done, it's available for all nodes.
>
> So take the 1st dimension of mem_section initialization out of
> memory_present()(), and put it into memblocks_present() which is a more
> appripriate place.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
> mm/sparse.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aed0951b87fa..46e88549d1a6 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
> {
> unsigned long pfn;
>
> -#ifdef CONFIG_SPARSEMEM_EXTREME
> - if (unlikely(!mem_section)) {
> - unsigned long size, align;
> -
> - size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> - align = 1 << (INTERNODE_CACHE_SHIFT);
> - mem_section = memblock_alloc(size, align);
> - if (!mem_section)
> - panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> - __func__, size, align);
> - }
> -#endif
> -
> start &= PAGE_SECTION_MASK;
> mminit_validate_memmodel_limits(&start, &end);
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> @@ -267,6 +254,19 @@ static void __init memblocks_present(void)
> unsigned long start, end;
> int i, nid;
>
> +#ifdef CONFIG_SPARSEMEM_EXTREME
> + if (unlikely(!mem_section)) {
With the new calling sequence sparse_init() -> memblocks_present() ->
allocate mem_section, mem_section here will be NULL, so this if is not
really relevant. We might want to add WARN_ON_ONCE(mem_section) just in
case to catch multiple calls to sparse_init().
> + unsigned long size, align;
> +
> + size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> + align = 1 << (INTERNODE_CACHE_SHIFT);
> + mem_section = memblock_alloc(size, align);
> + if (!mem_section)
> + panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> + __func__, size, align);
> + }
> +#endif
> +
> for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
> memory_present(nid, start, end);
> }
> --
> 2.41.0
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/7] mm: make __absent_pages_in_range() as static
2024-03-26 6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
@ 2024-04-02 8:29 ` Mike Rapoport
0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02 8:29 UTC (permalink / raw)
To: Baoquan He; +Cc: linux-mm, akpm, mgorman
On Tue, Mar 26, 2024 at 02:11:29PM +0800, Baoquan He wrote:
> It's only called in mm/mm_init.c now.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
> include/linux/mm.h | 2 --
> mm/mm_init.c | 2 +-
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ab1ba0a31429..74f71e802e0c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3193,8 +3193,6 @@ static inline unsigned long get_num_physpages(void)
> */
> void free_area_init(unsigned long *max_zone_pfn);
> unsigned long node_map_pfn_alignment(void);
> -unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
> - unsigned long end_pfn);
> extern unsigned long absent_pages_in_range(unsigned long start_pfn,
> unsigned long end_pfn);
> extern void get_pfn_range_for_nid(unsigned int nid,
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 99681ffb9091..cbdb49cdd2ce 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1143,7 +1143,7 @@ static void __init adjust_zone_range_for_zone_movable(int nid,
> * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
> * then all holes in the requested range will be accounted for.
> */
> -unsigned long __init __absent_pages_in_range(int nid,
> +static unsigned long __init __absent_pages_in_range(int nid,
> unsigned long range_start_pfn,
> unsigned long range_end_pfn)
> {
>
> --
> 2.41.0
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone()
2024-03-26 6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
@ 2024-04-02 8:30 ` Mike Rapoport
0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02 8:30 UTC (permalink / raw)
To: Baoquan He; +Cc: linux-mm, akpm, mgorman
On Tue, Mar 26, 2024 at 02:11:31PM +0800, Baoquan He wrote:
> The noinline attribute has been taken off in commit 9420f89db2dd ("mm:
> move most of core MM initialization to mm/mm_init.c"). So remove the
> unneeded code comment above deferred_grow_zone().
>
> And also remove the unneeded bracket in deferred_init_pages().
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
> mm/mm_init.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cbdb49cdd2ce..cc24e7958c0c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2006,7 +2006,7 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
> __init_single_page(page, pfn, zid, nid);
> nr_pages++;
> }
> - return (nr_pages);
> + return nr_pages;
> }
>
> /*
> @@ -2208,10 +2208,6 @@ static int __init deferred_init_memmap(void *data)
> * Return true when zone was grown, otherwise return false. We return true even
> * when we grow less than requested, to let the caller decide if there are
> * enough pages to satisfy the allocation.
> - *
> - * Note: We use noinline because this function is needed only during boot, and
> - * it is called from a __ref function _deferred_grow_zone. This way we are
> - * making sure that it is not inlined into permanent text section.
> */
> bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
> {
> --
> 2.41.0
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-03-26 6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
@ 2024-04-02 8:32 ` Mike Rapoport
2024-04-04 3:23 ` Baoquan He
2024-04-10 3:35 ` [PATCH v2 " Baoquan He
1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-02 8:32 UTC (permalink / raw)
To: Baoquan He; +Cc: linux-mm, akpm, mgorman
On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> Because memory-less node's ->node_present_pages and its
> zone's ->present_pages are all 0, the judgement before calling
> node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> node is enough to skip memory-less node. The 'continue;' statement
> inside for_each_node() loop of free_area_init() is gilding the lily.
>
> Here, remove the special handling to make memory-less node share the
> same code flow as normal node. And the code comment above the 'continue'
> statement is not needed either.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
> mm/mm_init.c | 18 +++---------------
> 1 file changed, 3 insertions(+), 15 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 089dc60159e9..99681ffb9091 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> panic("Cannot allocate %zuB for node %d.\n",
> sizeof(*pgdat), nid);
> arch_refresh_nodedata(nid, pgdat);
> - free_area_init_node(nid);
> -
> - /*
> - * We do not want to confuse userspace by sysfs
> - * files/directories for node without any memory
> - * attached to it, so this node is not marked as
> - * N_MEMORY and not marked online so that no sysfs
> - * hierarchy will be created via register_one_node for
> - * it. The pgdat will get fully initialized by
> - * hotadd_init_pgdat() when memory is hotplugged into
> - * this node.
> - */
I think this comment is still valuable. Maybe rephrase it a bit and move it
before 'if (pgdat->node_present_pages)'?
> - continue;
> }
>
> pgdat = NODE_DATA(nid);
> free_area_init_node(nid);
>
> /* Any memory on that node */
> - if (pgdat->node_present_pages)
> + if (pgdat->node_present_pages) {
> node_set_state(nid, N_MEMORY);
> - check_for_memory(pgdat);
> + check_for_memory(pgdat);
> + }
> }
>
> calc_nr_kernel_pages();
> --
> 2.41.0
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/7] mm: move array mem_section init code out of memory_present()
2024-04-02 8:19 ` Mike Rapoport
@ 2024-04-04 1:38 ` Baoquan He
0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-04 1:38 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, akpm, mgorman
On 04/02/24 at 11:19am, Mike Rapoport wrote:
> On Tue, Mar 26, 2024 at 02:11:27PM +0800, Baoquan He wrote:
> > When CONFIG_SPARSEMEM_EXTREME is enabled, mem_section need be initialized
> > to point at a two-dimensional array, and its 1st dimension of length
> > NR_SECTION_ROOTS will be dynamically allocated. Once the allocation is
> > done, it's available for all nodes.
> >
> > So take the 1st dimension of mem_section initialization out of
> > memory_present()(), and put it into memblocks_present() which is a more
> > appripriate place.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> > mm/sparse.c | 26 +++++++++++++-------------
> > 1 file changed, 13 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aed0951b87fa..46e88549d1a6 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -226,19 +226,6 @@ static void __init memory_present(int nid, unsigned long start, unsigned long en
> > {
> > unsigned long pfn;
> >
> > -#ifdef CONFIG_SPARSEMEM_EXTREME
> > - if (unlikely(!mem_section)) {
> > - unsigned long size, align;
> > -
> > - size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> > - align = 1 << (INTERNODE_CACHE_SHIFT);
> > - mem_section = memblock_alloc(size, align);
> > - if (!mem_section)
> > - panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> > - __func__, size, align);
> > - }
> > -#endif
> > -
> > start &= PAGE_SECTION_MASK;
> > mminit_validate_memmodel_limits(&start, &end);
> > for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> > @@ -267,6 +254,19 @@ static void __init memblocks_present(void)
> > unsigned long start, end;
> > int i, nid;
> >
> > +#ifdef CONFIG_SPARSEMEM_EXTREME
> > + if (unlikely(!mem_section)) {
>
> With the new calling sequence sparse_init() -> memblocks_present() ->
> allocate mem_section, mem_section here will be NULL, so this if is not
> really relevant. We might want to add WARN_ON_ONCE(mem_section) just in
> case to catch multiple calls to sparse_init().
Sorry for late reply.
Hmm, I am not very eager to add WARN_ON_ONCE(mem_section) becasue as you
said it's not very relevant. Calling sparse_init() miltiple times is not
acceptable, while it may not be good to let checking mem_section in
memblocks_present() to catch it.
>
> > + unsigned long size, align;
> > +
> > + size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
> > + align = 1 << (INTERNODE_CACHE_SHIFT);
> > + mem_section = memblock_alloc(size, align);
> > + if (!mem_section)
> > + panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
> > + __func__, size, align);
> > + }
> > +#endif
>
> > +
> > for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
> > memory_present(nid, start, end);
> > }
> > --
> > 2.41.0
> >
>
> --
> Sincerely yours,
> Mike.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-04-02 8:32 ` Mike Rapoport
@ 2024-04-04 3:23 ` Baoquan He
2024-04-09 15:40 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Baoquan He @ 2024-04-04 3:23 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, akpm, mgorman
On 04/02/24 at 11:32am, Mike Rapoport wrote:
> On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > Because memory-less node's ->node_present_pages and its
> > zone's ->present_pages are all 0, the judgement before calling
> > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > node is enough to skip memory-less node. The 'continue;' statement
> > inside for_each_node() loop of free_area_init() is gilding the lily.
> >
> > Here, remove the special handling to make memory-less node share the
> > same code flow as normal node. And the code comment above the 'continue'
> > statement is not needed either.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> > mm/mm_init.c | 18 +++---------------
> > 1 file changed, 3 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 089dc60159e9..99681ffb9091 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > panic("Cannot allocate %zuB for node %d.\n",
> > sizeof(*pgdat), nid);
> > arch_refresh_nodedata(nid, pgdat);
> > - free_area_init_node(nid);
> > -
> > - /*
> > - * We do not want to confuse userspace by sysfs
> > - * files/directories for node without any memory
> > - * attached to it, so this node is not marked as
> > - * N_MEMORY and not marked online so that no sysfs
> > - * hierarchy will be created via register_one_node for
> > - * it. The pgdat will get fully initialized by
> > - * hotadd_init_pgdat() when memory is hotplugged into
> > - * this node.
> > - */
>
> I think this comment is still valuable. Maybe rephrase it a bit and move it
> before 'if (pgdat->node_present_pages)'?
Fair enough.
Do you think below paragraph is OK to you? Please help polish or
rephrase it.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index dd875f943cbb..3ce0f29637ba 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
pgdat = NODE_DATA(nid);
free_area_init_node(nid);
- /* Any memory on that node */
+ /*
+ * No sysfs hierarcy will be created via register_one_node()
+ *for memory-less node because here it's not marked as N_MEMORY
+ *and won't be set online later. The benefit is userspace
+ *program won't be confused by sysfs files/directories of
+ *memory-less node. The pgdat will get fully initialized by
+ *hotadd_init_pgdat() when memory is hotplugged into this node.
+ */
if (pgdat->node_present_pages) {
node_set_state(nid, N_MEMORY);
check_for_memory(pgdat);
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-04-04 3:23 ` Baoquan He
@ 2024-04-09 15:40 ` Mike Rapoport
2024-04-10 3:38 ` Baoquan He
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2024-04-09 15:40 UTC (permalink / raw)
To: Baoquan He; +Cc: linux-mm, akpm, mgorman
On Thu, Apr 04, 2024 at 11:23:51AM +0800, Baoquan He wrote:
> On 04/02/24 at 11:32am, Mike Rapoport wrote:
> > On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > > Because memory-less node's ->node_present_pages and its
> > > zone's ->present_pages are all 0, the judgement before calling
> > > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > > node is enough to skip memory-less node. The 'continue;' statement
> > > inside for_each_node() loop of free_area_init() is gilding the lily.
> > >
> > > Here, remove the special handling to make memory-less node share the
> > > same code flow as normal node. And the code comment above the 'continue'
> > > statement is not needed either.
> > >
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > > mm/mm_init.c | 18 +++---------------
> > > 1 file changed, 3 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > index 089dc60159e9..99681ffb9091 100644
> > > --- a/mm/mm_init.c
> > > +++ b/mm/mm_init.c
> > > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > > panic("Cannot allocate %zuB for node %d.\n",
> > > sizeof(*pgdat), nid);
> > > arch_refresh_nodedata(nid, pgdat);
> > > - free_area_init_node(nid);
> > > -
> > > - /*
> > > - * We do not want to confuse userspace by sysfs
> > > - * files/directories for node without any memory
> > > - * attached to it, so this node is not marked as
> > > - * N_MEMORY and not marked online so that no sysfs
> > > - * hierarchy will be created via register_one_node for
> > > - * it. The pgdat will get fully initialized by
> > > - * hotadd_init_pgdat() when memory is hotplugged into
> > > - * this node.
> > > - */
> >
> > I think this comment is still valuable. Maybe rephrase it a bit and move it
> > before 'if (pgdat->node_present_pages)'?
>
> Fair enough.
>
> Do you think below paragraph is OK to you? Please help polish or
> rephrase it.
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index dd875f943cbb..3ce0f29637ba 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> pgdat = NODE_DATA(nid);
> free_area_init_node(nid);
>
> - /* Any memory on that node */
> + /*
> + * No sysfs hierarcy will be created via register_one_node()
> + *for memory-less node because here it's not marked as N_MEMORY
> + *and won't be set online later. The benefit is userspace
> + *program won't be confused by sysfs files/directories of
> + *memory-less node. The pgdat will get fully initialized by
> + *hotadd_init_pgdat() when memory is hotplugged into this node.
> + */
Ack
> if (pgdat->node_present_pages) {
> node_set_state(nid, N_MEMORY);
> check_for_memory(pgdat);
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-03-26 6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
2024-04-02 8:32 ` Mike Rapoport
@ 2024-04-10 3:35 ` Baoquan He
1 sibling, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-10 3:35 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, rppt, mgorman
Because memory-less node's ->node_present_pages and its
zone's ->present_pages are all 0, the judgement before calling
node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
node is enough to skip memory-less node. The 'continue;' statement
inside for_each_node() loop of free_area_init() is gilding the lily.
Here, remove the special handling to make memory-less node share the
same code flow as normal node.
And also rephrase the code comments above the 'continue' statement
and move them above above line 'if (pgdat->node_present_pages)'.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
- As Mike suggested, the old code comments above the 'continue'
statement is still useful for easier understanding code and system
behaviour. So rephrase and move them above line 'if
(pgdat->node_present_pages)'. Thanks to Mike.
mm/mm_init.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2016ca8031e9..32ede966e609 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1834,28 +1834,23 @@ void __init free_area_init(unsigned long *max_zone_pfn)
panic("Cannot allocate %zuB for node %d.\n",
sizeof(*pgdat), nid);
arch_refresh_nodedata(nid, pgdat);
- free_area_init_node(nid);
-
- /*
- * We do not want to confuse userspace by sysfs
- * files/directories for node without any memory
- * attached to it, so this node is not marked as
- * N_MEMORY and not marked online so that no sysfs
- * hierarchy will be created via register_one_node for
- * it. The pgdat will get fully initialized by
- * hotadd_init_pgdat() when memory is hotplugged into
- * this node.
- */
- continue;
}
pgdat = NODE_DATA(nid);
free_area_init_node(nid);
- /* Any memory on that node */
- if (pgdat->node_present_pages)
+ /*
+ * No sysfs hierarcy will be created via register_one_node()
+ *for memory-less node because here it's not marked as N_MEMORY
+ *and won't be set online later. The benefit is userspace
+ *program won't be confused by sysfs files/directories of
+ *memory-less node. The pgdat will get fully initialized by
+ *hotadd_init_pgdat() when memory is hotplugged into this node.
+ */
+ if (pgdat->node_present_pages) {
node_set_state(nid, N_MEMORY);
- check_for_memory(pgdat);
+ check_for_memory(pgdat);
+ }
}
calc_nr_kernel_pages();
--
2.41.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node
2024-04-09 15:40 ` Mike Rapoport
@ 2024-04-10 3:38 ` Baoquan He
0 siblings, 0 replies; 18+ messages in thread
From: Baoquan He @ 2024-04-10 3:38 UTC (permalink / raw)
To: Mike Rapoport, akpm; +Cc: linux-mm, mgorman
On 04/09/24 at 06:40pm, Mike Rapoport wrote:
> On Thu, Apr 04, 2024 at 11:23:51AM +0800, Baoquan He wrote:
> > On 04/02/24 at 11:32am, Mike Rapoport wrote:
> > > On Tue, Mar 26, 2024 at 02:11:28PM +0800, Baoquan He wrote:
> > > > Because memory-less node's ->node_present_pages and its
> > > > zone's ->present_pages are all 0, the judgement before calling
> > > > node_set_state() to set N_MEMORY, N_HIGH_MEMORY, N_NORMAL_MEMORY for
> > > > node is enough to skip memory-less node. The 'continue;' statement
> > > > inside for_each_node() loop of free_area_init() is gilding the lily.
> > > >
> > > > Here, remove the special handling to make memory-less node share the
> > > > same code flow as normal node. And the code comment above the 'continue'
> > > > statement is not needed either.
> > > >
> > > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > > ---
> > > > mm/mm_init.c | 18 +++---------------
> > > > 1 file changed, 3 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > index 089dc60159e9..99681ffb9091 100644
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -1834,28 +1834,16 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > > > panic("Cannot allocate %zuB for node %d.\n",
> > > > sizeof(*pgdat), nid);
> > > > arch_refresh_nodedata(nid, pgdat);
> > > > - free_area_init_node(nid);
> > > > -
> > > > - /*
> > > > - * We do not want to confuse userspace by sysfs
> > > > - * files/directories for node without any memory
> > > > - * attached to it, so this node is not marked as
> > > > - * N_MEMORY and not marked online so that no sysfs
> > > > - * hierarchy will be created via register_one_node for
> > > > - * it. The pgdat will get fully initialized by
> > > > - * hotadd_init_pgdat() when memory is hotplugged into
> > > > - * this node.
> > > > - */
> > >
> > > I think this comment is still valuable. Maybe rephrase it a bit and move it
> > > before 'if (pgdat->node_present_pages)'?
> >
> > Fair enough.
> >
> > Do you think below paragraph is OK to you? Please help polish or
> > rephrase it.
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index dd875f943cbb..3ce0f29637ba 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1839,7 +1839,14 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> > pgdat = NODE_DATA(nid);
> > free_area_init_node(nid);
> >
> > - /* Any memory on that node */
> > + /*
> > + * No sysfs hierarcy will be created via register_one_node()
> > + *for memory-less node because here it's not marked as N_MEMORY
> > + *and won't be set online later. The benefit is userspace
> > + *program won't be confused by sysfs files/directories of
> > + *memory-less node. The pgdat will get fully initialized by
> > + *hotadd_init_pgdat() when memory is hotplugged into this node.
> > + */
>
> Ack
Thanks a lot for reviewing and confirming, Mike.
Hi Andrew,
I have sent out v2 to include above code comment changing, please feel
free to pick the draft patch and append, or take the newly post v2.
Thanks.
>
> > if (pgdat->node_present_pages) {
> > node_set_state(nid, N_MEMORY);
> > check_for_memory(pgdat);
> >
>
> --
> Sincerely yours,
> Mike.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-04-10 3:38 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-26 6:11 [PATCH 0/7] mm/init: minor clean up and improvement Baoquan He
2024-03-26 6:11 ` [PATCH 1/7] mm: move array mem_section init code out of memory_present() Baoquan He
2024-04-02 8:19 ` Mike Rapoport
2024-04-04 1:38 ` Baoquan He
2024-03-26 6:11 ` [PATCH 2/7] mm/init: remove the unnecessary special treatment for memory-less node Baoquan He
2024-04-02 8:32 ` Mike Rapoport
2024-04-04 3:23 ` Baoquan He
2024-04-09 15:40 ` Mike Rapoport
2024-04-10 3:38 ` Baoquan He
2024-04-10 3:35 ` [PATCH v2 " Baoquan He
2024-03-26 6:11 ` [PATCH 3/7] mm: make __absent_pages_in_range() as static Baoquan He
2024-04-02 8:29 ` Mike Rapoport
2024-03-26 6:11 ` [PATCH 4/7] mm/page_alloc.c: remove unneeded codes in !NUMA version of build_zonelists() Baoquan He
2024-03-27 12:06 ` [PATCH v2 " Baoquan He
2024-03-26 6:11 ` [PATCH 5/7] mm/mm_init.c: remove the outdated code comment above deferred_grow_zone() Baoquan He
2024-04-02 8:30 ` Mike Rapoport
2024-03-26 6:11 ` [PATCH 6/7] mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone Baoquan He
2024-03-26 6:11 ` [PATCH 7/7] mm/page_alloc.c: change the array-length to MIGRATE_PCPTYPES Baoquan He
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).