All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] A minor hotplug refactoring
@ 2022-03-07 15:07 Oscar Salvador
  2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Oscar Salvador @ 2022-03-07 15:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel, Oscar Salvador

Hi,

These are a few cleanups that go on top of Michal's work [1].

Thanks

[1] https://lore.kernel.org/lkml/20220127085305.20890-1-mhocko@kernel.org/

Oscar Salvador (3):
  mm/page_alloc: Do not calculate node's total pages and memmap pages
    when empty
  mm/memory_hotplug: Reset node's state when empty during offline
  mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node

 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 93 ++++++++++++++++++----------------
 mm/page_alloc.c                | 59 +++++++--------------
 3 files changed, 69 insertions(+), 85 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
@ 2022-03-07 15:07 ` Oscar Salvador
  2022-04-28 12:13   ` David Hildenbrand
  2022-03-07 15:07 ` [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2022-03-07 15:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel, Oscar Salvador

free_area_init_node() calls calculate_node_totalpages() and
free_area_init_core(). The former to get node's {spanned,present}_pages,
and the latter to calculate, among other things, how many pages per zone
we spent on memmap_pages, which is used to substract zone's free pages.

On memoryless-nodes, it is pointless to perform such a bunch of work, so
make sure we skip the calculations when having a node or empty zone.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 967085c1c78a..0b7d176a8990 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7312,6 +7312,10 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 	unsigned long realtotalpages = 0, totalpages = 0;
 	enum zone_type i;
 
+	/* Skip calculation for memoryless nodes */
+	if (node_start_pfn == node_end_pfn)
+		goto no_pages;
+
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		struct zone *zone = pgdat->node_zones + i;
 		unsigned long zone_start_pfn, zone_end_pfn;
@@ -7344,6 +7348,7 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 		realtotalpages += real_size;
 	}
 
+no_pages:
 	pgdat->node_spanned_pages = totalpages;
 	pgdat->node_present_pages = realtotalpages;
 	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
@@ -7562,6 +7567,10 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		size = zone->spanned_pages;
 		freesize = zone->present_pages;
 
+		/* No pages? Nothing to calculate then. */
+		if (!size)
+			goto no_pages;
+
 		/*
 		 * Adjust freesize so that it accounts for how much memory
 		 * is used by this zone for memmap. This affects the watermark
@@ -7597,6 +7606,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
+no_pages:
 		zone_init_internals(zone, j, nid, freesize);
 
 		if (!size)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline
  2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
  2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
@ 2022-03-07 15:07 ` Oscar Salvador
  2022-04-28 12:30   ` David Hildenbrand
  2022-03-07 15:07 ` [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node Oscar Salvador
  2022-04-27 21:05 ` [PATCH 0/3] A minor hotplug refactoring Andrew Morton
  3 siblings, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2022-03-07 15:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel, Oscar Salvador

All possible nodes are now pre-allocated at boot time by free_area_init()->
free_area_init_node(), and those which are to be hot-plugged are initialized
later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
become online.

free_area_init_core_hotplug() calls pgdat_init_internals() and
zone_init_internals() to initialize some internal data structures
and zeroes a few pgdat fields.

But we do already call pgdat_init_internals() and zone_init_internals()
for all possible nodes back in free_area_init_core(), and pgdat fields
are already zeroed because the pre-allocation memsets with 0s the
structure, meaning we do not need to repeat the process when
the node becomes online.

So initialize it only once when booting, and make sure to reset
the fields we care about to 0 when the node goes empty.
The only thing we need to check for is to allocate per_cpu_nodestats
struct the very first time this node goes online.

node_reset_state() is the function in charge of resetting pgdat's fields,
and it is called when offline_pages() detects that the node becomes empty
worth of memory.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 58 +++++++++++++++++++++-------------
 mm/page_alloc.c                | 49 +++++-----------------------
 3 files changed, 45 insertions(+), 64 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 76bf2de86def..fcf4c9a023cc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
+extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
 extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory_resource(int nid, struct resource *resource,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ddc62f8b591f..07cece9e22e4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1164,18 +1164,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
 static pg_data_t __ref *hotadd_init_pgdat(int nid)
 {
-	struct pglist_data *pgdat;
+	struct pglist_data *pgdat = NODE_DATA(nid);
 
 	/*
-	 * NODE_DATA is preallocated (free_area_init) but its internal
-	 * state is not allocated completely. Add missing pieces.
-	 * Completely offline nodes stay around and they just need
-	 * reintialization.
+	 * NODE_DATA is preallocated (free_area_init), the only thing missing
+	 * is to allocate its per_cpu_nodestats struct and to build node's
+	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
+	 * the very first time this node is brought up, as we reset its state
+	 * when all node's memory goes offline.
 	 */
-	pgdat = NODE_DATA(nid);
-
-	/* init node's zones as empty zones, we don't have any present pages.*/
-	free_area_init_core_hotplug(pgdat);
+	if (pgdat_has_boot_nodestats(pgdat))
+		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
+							    __GFP_ZERO);
 
 	/*
 	 * The node we allocated has no zone fallback lists. For avoiding
@@ -1183,15 +1183,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
 	 */
 	build_all_zonelists(pgdat);
 
-	/*
-	 * When memory is hot-added, all the memory is in offline state. So
-	 * clear all zones' present_pages because they will be updated in
-	 * online_pages() and offline_pages().
-	 * TODO: should be in free_area_init_core_hotplug?
-	 */
-	reset_node_managed_pages(pgdat);
-	reset_node_present_pages(pgdat);
-
 	return pgdat;
 }
 
@@ -1799,6 +1790,30 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 		node_clear_state(node, N_MEMORY);
 }
 
+static void node_reset_state(int node)
+{
+	pg_data_t *pgdat = NODE_DATA(node);
+	int cpu;
+
+	kswapd_stop(node);
+	kcompactd_stop(node);
+
+	reset_node_managed_pages(pgdat);
+	reset_node_present_pages(pgdat);
+
+	pgdat->nr_zones = 0;
+	pgdat->kswapd_order = 0;
+	pgdat->kswapd_highest_zoneidx = 0;
+	pgdat->node_start_pfn = 0;
+
+	for_each_online_cpu(cpu) {
+		struct per_cpu_nodestat *p;
+
+		p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
+		memset(p, 0, sizeof(*p));
+	}
+}
+
 static int count_system_ram_pages_cb(unsigned long start_pfn,
 				     unsigned long nr_pages, void *data)
 {
@@ -1956,10 +1971,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	}
 
 	node_states_clear_node(node, &arg);
-	if (arg.status_change_nid >= 0) {
-		kswapd_stop(node);
-		kcompactd_stop(node);
-	}
+	if (arg.status_change_nid >= 0)
+		/* Reset node's state as all its memory went offline. */
+		node_reset_state(node);
 
 	writeback_set_ratelimit();
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b7d176a8990..36e67418a6f0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6382,6 +6382,13 @@ static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+bool pgdat_has_boot_nodestats(pg_data_t *pgdat)
+{
+	return pgdat->per_cpu_nodestats == &boot_nodestats;
+}
+#endif
+
 static void __build_all_zonelists(void *data)
 {
 	int nid;
@@ -7491,7 +7498,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	lruvec_init(&pgdat->__lruvec);
 }
 
-static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+static void __init zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
 							unsigned long remaining_pages)
 {
 	atomic_long_set(&zone->managed_pages, remaining_pages);
@@ -7503,46 +7510,6 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
 	zone_pcp_init(zone);
 }
 
-/*
- * Set up the zone data structures
- * - init pgdat internals
- * - init all zones belonging to this node
- *
- * NOTE: this function is only called during memory hotplug
- */
-#ifdef CONFIG_MEMORY_HOTPLUG
-void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
-{
-	int nid = pgdat->node_id;
-	enum zone_type z;
-	int cpu;
-
-	pgdat_init_internals(pgdat);
-
-	if (pgdat->per_cpu_nodestats == &boot_nodestats)
-		pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
-
-	/*
-	 * Reset the nr_zones, order and highest_zoneidx before reuse.
-	 * Note that kswapd will init kswapd_highest_zoneidx properly
-	 * when it starts in the near future.
-	 */
-	pgdat->nr_zones = 0;
-	pgdat->kswapd_order = 0;
-	pgdat->kswapd_highest_zoneidx = 0;
-	pgdat->node_start_pfn = 0;
-	for_each_online_cpu(cpu) {
-		struct per_cpu_nodestat *p;
-
-		p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
-		memset(p, 0, sizeof(*p));
-	}
-
-	for (z = 0; z < MAX_NR_ZONES; z++)
-		zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
-}
-#endif
-
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node
  2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
  2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
  2022-03-07 15:07 ` [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
@ 2022-03-07 15:07 ` Oscar Salvador
  2022-04-28 12:35   ` David Hildenbrand
  2022-04-27 21:05 ` [PATCH 0/3] A minor hotplug refactoring Andrew Morton
  3 siblings, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2022-03-07 15:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel, Oscar Salvador

Since we pre-allocate all nodes now, hotadd_init_pgdat() does
not need to return a pgdat struct, as that was meant for
__try_online_node() to check whether the node was succesfully
allocated.

Also get rid of the __ref as all functions hotadd_init_pgdat()
calls fall within the same section.

Also try to make more clear the return codes from __try_online_node().
__try_online_node() can return either 0, 1 or -errno (the latter not really
as the BUG_ON() would catch it before we return) but depending on the caller
that has different meanings.
For add_memory_resource(), when __try_online_node() returns non-zero,
it means that the node was already allocated and it does not need to bring
it up. It is fine not to check for -errno values because we do not
get to call register_one_node() when !set_node_online.
For those who call try_online_node(), so set_node_online is true, a value
other than zero means a failure (e.g: cpu_up() or find_and_online_cpu_nid()).

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory_hotplug.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 07cece9e22e4..5c92ac81a399 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1161,8 +1161,7 @@ static void reset_node_present_pages(pg_data_t *pgdat)
 	pgdat->node_present_pages = 0;
 }
 
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-static pg_data_t __ref *hotadd_init_pgdat(int nid)
+static void hotadd_init_pgdat(int nid)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 
@@ -1182,8 +1181,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
 	 * to access not-initialized zonelist, build here.
 	 */
 	build_all_zonelists(pgdat);
-
-	return pgdat;
 }
 
 /*
@@ -1193,31 +1190,27 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
  * called by cpu_up() to online a node without onlined memory.
  *
  * Returns:
- * 1 -> a new node has been allocated
- * 0 -> the node is already online
- * -ENOMEM -> the node could not be allocated
+ * 1 -> The node has been initialized.
+ * 0 -> Either the node was already online, or we succesfully registered a new
+ *      one.
+ * -errno -> register_one_node() failed.
  */
 static int __try_online_node(int nid, bool set_node_online)
 {
-	pg_data_t *pgdat;
-	int ret = 1;
+	int ret;
 
 	if (node_online(nid))
 		return 0;
 
-	pgdat = hotadd_init_pgdat(nid);
-	if (!pgdat) {
-		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
-		ret = -ENOMEM;
-		goto out;
-	}
+	hotadd_init_pgdat(nid);
+
+	if (!set_node_online)
+		return 1;
+
+	node_set_online(nid);
+	ret = register_one_node(nid);
+	BUG_ON(ret);
 
-	if (set_node_online) {
-		node_set_online(nid);
-		ret = register_one_node(nid);
-		BUG_ON(ret);
-	}
-out:
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] A minor hotplug refactoring
  2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
                   ` (2 preceding siblings ...)
  2022-03-07 15:07 ` [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node Oscar Salvador
@ 2022-04-27 21:05 ` Andrew Morton
  3 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2022-04-27 21:05 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel

On Mon,  7 Mar 2022 16:07:22 +0100 Oscar Salvador <osalvador@suse.de> wrote:

> Hi,
> 
> These are a few cleanups that go on top of Michal's work [1].
> 

Has anyone (else) had a chance to review these?

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
@ 2022-04-28 12:13   ` David Hildenbrand
  2022-05-05  9:09     ` Oscar Salvador
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-04-28 12:13 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Michal Hocko, Wei Yang, Miaohe Lin, linux-mm, linux-kernel

On 07.03.22 16:07, Oscar Salvador wrote:
> free_area_init_node() calls calculate_node_totalpages() and
> free_area_init_core(). The former to get node's {spanned,present}_pages,
> and the latter to calculate, among other things, how many pages per zone
> we spent on memmap_pages, which is used to substract zone's free pages.
> 
> On memoryless-nodes, it is pointless to perform such a bunch of work, so
> make sure we skip the calculations when having a node or empty zone.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Sorry, I'm late with review. My mailbox got flooded.

> ---
>  mm/page_alloc.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 967085c1c78a..0b7d176a8990 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7312,6 +7312,10 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>  	unsigned long realtotalpages = 0, totalpages = 0;
>  	enum zone_type i;
>  
> +	/* Skip calculation for memoryless nodes */
> +	if (node_start_pfn == node_end_pfn)
> +		goto no_pages;
> +

Just a NIT:

E.g., in zone_spanned_pages_in_node() we test for
	!node_start_pfn && !node_end_pfn

In update_pgdat_span(), we set
	node_start_pfn = node_end_pfn = 0;
when we find an empty node during memory unplug.

Therefore, I wonder if a helper "is_memoryless_node()" or "node_empty()"
might be reasonable, that just checks for either
	!node_start_pfn && !node_end_pfn
or
	node_start_pfn == node_end_pfn



>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		struct zone *zone = pgdat->node_zones + i;
>  		unsigned long zone_start_pfn, zone_end_pfn;
> @@ -7344,6 +7348,7 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>  		realtotalpages += real_size;
>  	}
>  
> +no_pages:
>  	pgdat->node_spanned_pages = totalpages;
>  	pgdat->node_present_pages = realtotalpages;
>  	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
> @@ -7562,6 +7567,10 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		size = zone->spanned_pages;
>  		freesize = zone->present_pages;
>  
> +		/* No pages? Nothing to calculate then. */
> +		if (!size)
> +			goto no_pages;
> +
>  		/*
>  		 * Adjust freesize so that it accounts for how much memory
>  		 * is used by this zone for memmap. This affects the watermark
> @@ -7597,6 +7606,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		 * when the bootmem allocator frees pages into the buddy system.
>  		 * And all highmem pages will be managed by the buddy system.
>  		 */
> +no_pages:
>  		zone_init_internals(zone, j, nid, freesize);
>  
>  		if (!size)

We have another size check below. We essentially have right now:

"
	if (!size)
		goto no_pages;

	[code]
no_pages:
	zone_init_internals(zone, j, nid, freesize);

	if (!size)
		continue
	[more code]
"

IMHO, it would be nicer to avoid the label/goto by just doing a:

"
	if (!size) {
		zone_init_internals(zone, j, nid, 0);
		continue;
	}

	[code]
	zone_init_internals(zone, j, nid, freesize);
	[more code]
"

Or factoring out [code] into a separate function.


Anyhow, the change itself looks sane.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline
  2022-03-07 15:07 ` [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
@ 2022-04-28 12:30   ` David Hildenbrand
  2022-05-05 10:37     ` Oscar Salvador
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-04-28 12:30 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Michal Hocko, Wei Yang, Miaohe Lin, linux-mm, linux-kernel

On 07.03.22 16:07, Oscar Salvador wrote:
> All possible nodes are now pre-allocated at boot time by free_area_init()->
> free_area_init_node(), and those which are to be hot-plugged are initialized
> later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
> become online.
> 
> free_area_init_core_hotplug() calls pgdat_init_internals() and
> zone_init_internals() to initialize some internal data structures
> and zeroes a few pgdat fields.
> 
> But we do already call pgdat_init_internals() and zone_init_internals()
> for all possible nodes back in free_area_init_core(), and pgdat fields
> are already zeroed because the pre-allocation memsets with 0s the
> structure, meaning we do not need to repeat the process when
> the node becomes online.
> 
> So initialize it only once when booting, and make sure to reset
> the fields we care about to 0 when the node goes empty.
> The only thing we need to check for is to allocate per_cpu_nodestats
> struct the very first time this node goes online.
> 
> node_reset_state() is the function in charge of resetting pgdat's fields,
> and it is called when offline_pages() detects that the node becomes empty
> worth of memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 58 +++++++++++++++++++++-------------
>  mm/page_alloc.c                | 49 +++++-----------------------
>  3 files changed, 45 insertions(+), 64 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 76bf2de86def..fcf4c9a023cc 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
> +extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
>  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory_resource(int nid, struct resource *resource,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ddc62f8b591f..07cece9e22e4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1164,18 +1164,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
>  static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  {
> -	struct pglist_data *pgdat;
> +	struct pglist_data *pgdat = NODE_DATA(nid);
>  
>  	/*
> -	 * NODE_DATA is preallocated (free_area_init) but its internal
> -	 * state is not allocated completely. Add missing pieces.
> -	 * Completely offline nodes stay around and they just need
> -	 * reintialization.
> +	 * NODE_DATA is preallocated (free_area_init), the only thing missing
> +	 * is to allocate its per_cpu_nodestats struct and to build node's
> +	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
> +	 * the very first time this node is brought up, as we reset its state
> +	 * when all node's memory goes offline.
>  	 */
> -	pgdat = NODE_DATA(nid);
> -
> -	/* init node's zones as empty zones, we don't have any present pages.*/
> -	free_area_init_core_hotplug(pgdat);
> +	if (pgdat_has_boot_nodestats(pgdat))
> +		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
> +							    __GFP_ZERO);
>  
>  	/*
>  	 * The node we allocated has no zone fallback lists. For avoiding
> @@ -1183,15 +1183,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  	 */
>  	build_all_zonelists(pgdat);
>  
> -	/*
> -	 * When memory is hot-added, all the memory is in offline state. So
> -	 * clear all zones' present_pages because they will be updated in
> -	 * online_pages() and offline_pages().
> -	 * TODO: should be in free_area_init_core_hotplug?
> -	 */
> -	reset_node_managed_pages(pgdat);
> -	reset_node_present_pages(pgdat);
> -
>  	return pgdat;
>  }
>  
> @@ -1799,6 +1790,30 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  		node_clear_state(node, N_MEMORY);
>  }
>  
> +static void node_reset_state(int node)
> +{
> +	pg_data_t *pgdat = NODE_DATA(node);
> +	int cpu;
> +
> +	kswapd_stop(node);
> +	kcompactd_stop(node);
> +
> +	reset_node_managed_pages(pgdat);
> +	reset_node_present_pages(pgdat);
> +
> +	pgdat->nr_zones = 0;
> +	pgdat->kswapd_order = 0;
> +	pgdat->kswapd_highest_zoneidx = 0;
> +	pgdat->node_start_pfn = 0;


I'm confused why we have to mess with
* present pages
* managed pages
* node_start_pfn

here at all.

1) If there would be any present page left, calling node_reset_state()
would be a BUG.
2) If there would be any manged page left, calling node_reset_state()
would be a BUG.
3) node_start_pfn will be properly updated by
remove_pfn_range_from_zone()->update_pgdat_span()


To make it clearer, I *think* touching node_start_pfn is very wrong.

What if the node still has ZONE_DEVICE? They don't account towards
present pages but only towards spanned pages, and we're messing with the
start range.

remove_pfn_range_from_zone()->update_pgdat_span() should be the only
place that modifies the spanned range when offlining.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node
  2022-03-07 15:07 ` [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node Oscar Salvador
@ 2022-04-28 12:35   ` David Hildenbrand
  2022-05-05  9:17     ` Oscar Salvador
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-04-28 12:35 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Michal Hocko, Wei Yang, Miaohe Lin, linux-mm, linux-kernel

On 07.03.22 16:07, Oscar Salvador wrote:
> Since we pre-allocate all nodes now, hotadd_init_pgdat() does
> not need to return a pgdat struct, as that was meant for
> __try_online_node() to check whether the node was succesfully
> allocated.
> 
> Also get rid of the __ref as all functions hotadd_init_pgdat()
> calls fall within the same section.
> 
> Also try to make more clear the return codes from __try_online_node().
> __try_online_node() can return either 0, 1 or -errno (the latter not really
> as the BUG_ON() would catch it before we return) but depending on the caller
> that has different meanings.
> For add_memory_resource(), when __try_online_node() returns non-zero,
> it means that the node was already allocated and it does not need to bring
> it up. It is fine not to check for -errno values because we do not
> get to call register_one_node() when !set_node_online.
> For those who call try_online_node(), so set_node_online is true, a value
> other than zero means a failure (e.g: cpu_up() or find_and_online_cpu_nid()).
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/memory_hotplug.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 07cece9e22e4..5c92ac81a399 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1161,8 +1161,7 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  	pgdat->node_present_pages = 0;
>  }
>  
> -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> -static pg_data_t __ref *hotadd_init_pgdat(int nid)
> +static void hotadd_init_pgdat(int nid)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
>  
> @@ -1182,8 +1181,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  	 * to access not-initialized zonelist, build here.
>  	 */
>  	build_all_zonelists(pgdat);
> -
> -	return pgdat;
>  }
>  
>  /*
> @@ -1193,31 +1190,27 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>   * called by cpu_up() to online a node without onlined memory.
>   *
>   * Returns:
> - * 1 -> a new node has been allocated
> - * 0 -> the node is already online
> - * -ENOMEM -> the node could not be allocated
> + * 1 -> The node has been initialized.
> + * 0 -> Either the node was already online, or we succesfully registered a new
> + *      one.
> + * -errno -> register_one_node() failed.
>   */
>  static int __try_online_node(int nid, bool set_node_online)
>  {
> -	pg_data_t *pgdat;
> -	int ret = 1;
> +	int ret;
>  
>  	if (node_online(nid))
>  		return 0;
>  
> -	pgdat = hotadd_init_pgdat(nid);
> -	if (!pgdat) {
> -		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> +	hotadd_init_pgdat(nid);
> +
> +	if (!set_node_online)
> +		return 1;
> +
> +	node_set_online(nid);
> +	ret = register_one_node(nid);
> +	BUG_ON(ret);
>  
> -	if (set_node_online) {
> -		node_set_online(nid);
> -		ret = register_one_node(nid);
> -		BUG_ON(ret);
> -	}
> -out:
>  	return ret;

BUG_ON(ret);
return ret;

hm? This will never return :)

So either leave the old code flow or "return 1;" I'd actually prefer the
old code flow to then "return 1;" here:

if (set_node_online) {
	node_set_online(nid);
	BUG_ON(register_one_node(nid));
}
return 1;

We can then let go of "ret".

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-04-28 12:13   ` David Hildenbrand
@ 2022-05-05  9:09     ` Oscar Salvador
  0 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2022-05-05  9:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel

On Thu, Apr 28, 2022 at 02:13:46PM +0200, David Hildenbrand wrote:
> Sorry, I'm late with review. My mailbox got flooded.

Hi David :)

> > +	/* Skip calculation for memoryless nodes */
> > +	if (node_start_pfn == node_end_pfn)
> > +		goto no_pages;
> > +
> 
> Just a NIT:
> 
> E.g., in zone_spanned_pages_in_node() we test for
> 	!node_start_pfn && !node_end_pfn
> 
> In update_pgdat_span(), we set
> 	node_start_pfn = node_end_pfn = 0;
> when we find an empty node during memory unplug.
> 
> Therefore, I wonder if a helper "is_memoryless_node()" or "node_empty()"
> might be reasonable, that just checks for either
> 	!node_start_pfn && !node_end_pfn
> or
> 	node_start_pfn == node_end_pfn

Yeah, I thoguth about that as well, but given the few places we check
for it I was hesitant to add it.
But it might make the situation more clear, so I will go with a helper.

> > +no_pages:
> >  		zone_init_internals(zone, j, nid, freesize);
> >  
> >  		if (!size)
> 
> We have another size check below. We essentially have right now:
> 
> "
> 	if (!size)
> 		goto no_pages;
> 
> 	[code]
> no_pages:
> 	zone_init_internals(zone, j, nid, freesize);
> 
> 	if (!size)
> 		continue
> 	[more code]
> "
> 
> IMHO, it would be nicer to avoid the label/goto by just doing a:
> 
> "
> 	if (!size) {
> 		zone_init_internals(zone, j, nid, 0);
> 		continue;
> 	}
> 
> 	[code]
> 	zone_init_internals(zone, j, nid, freesize);
> 	[more code]
> "
> 
> Or factoring out [code] into a separate function.

I did not think about how a refactor would look, so for now I will go
with your first proposal. If I see that a refactor is due, I will think
more about it.

thanks!


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node
  2022-04-28 12:35   ` David Hildenbrand
@ 2022-05-05  9:17     ` Oscar Salvador
  0 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2022-05-05  9:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel

On Thu, Apr 28, 2022 at 02:35:11PM +0200, David Hildenbrand wrote:
> BUG_ON(ret);
> return ret;
> 
> hm? This will never return :)

Yes, this looks like a brainfart.
It will not return __only__ in the case where register_node_node()
fails, which is what we do right now, but then I do not know why added
"errno -> register_one_node() failed." in the comment.

I might have managed to confuse myself.

Thanks 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline
  2022-04-28 12:30   ` David Hildenbrand
@ 2022-05-05 10:37     ` Oscar Salvador
  0 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2022-05-05 10:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Michal Hocko, Wei Yang, Miaohe Lin, linux-mm,
	linux-kernel

On Thu, Apr 28, 2022 at 02:30:20PM +0200, David Hildenbrand wrote:
> On 07.03.22 16:07, Oscar Salvador wrote:
> > +static void node_reset_state(int node)
> > +{
> > +	pg_data_t *pgdat = NODE_DATA(node);
> > +	int cpu;
> > +
> > +	kswapd_stop(node);
> > +	kcompactd_stop(node);
> > +
> > +	reset_node_managed_pages(pgdat);
> > +	reset_node_present_pages(pgdat);
> > +
> > +	pgdat->nr_zones = 0;
> > +	pgdat->kswapd_order = 0;
> > +	pgdat->kswapd_highest_zoneidx = 0;
> > +	pgdat->node_start_pfn = 0;
> 
> 
> I'm confused why we have to mess with
> * present pages
> * managed pages
> * node_start_pfn
> 
> here at all.
> 
> 1) If there would be any present page left, calling node_reset_state()
> would be a BUG.
> 2) If there would be any manged page left, calling node_reset_state()
> would be a BUG.
> 3) node_start_pfn will be properly updated by
> remove_pfn_range_from_zone()->update_pgdat_span()

Yes, you are right, trusting update_pgdat_span() is the right to do
here.

> To make it clearer, I *think* touching node_start_pfn is very wrong.
> 
> What if the node still has ZONE_DEVICE? They don't account towards
> present pages but only towards spanned pages, and we're messing with the
> start range.

Did not think of that scenario, but as you said, we should be leaving
node/zone's pages accounting alone here.

> remove_pfn_range_from_zone()->update_pgdat_span() should be the only
> place that modifies the spanned range when offlining.

Will update the patch.

Thanks for the review David!


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-05 10:37 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
2022-04-28 12:13   ` David Hildenbrand
2022-05-05  9:09     ` Oscar Salvador
2022-03-07 15:07 ` [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
2022-04-28 12:30   ` David Hildenbrand
2022-05-05 10:37     ` Oscar Salvador
2022-03-07 15:07 ` [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node Oscar Salvador
2022-04-28 12:35   ` David Hildenbrand
2022-05-05  9:17     ` Oscar Salvador
2022-04-27 21:05 ` [PATCH 0/3] A minor hotplug refactoring Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.