All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Minor memoryhotplug refactoring
@ 2022-06-21  4:17 Oscar Salvador
  2022-06-21  4:17 ` [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
  2022-06-21  4:17 ` [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
  0 siblings, 2 replies; 12+ messages in thread
From: Oscar Salvador @ 2022-06-21  4:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, linux-mm, linux-kernel, Oscar Salvador

Hi,

these are a few cleanups.

The first one is to spare us with some operations when dealing with empty nodes,
and the second one is to refactor memory-hotplug code taking advantatge of the
fact that we initialize all nodes (empty or not) when booting the system.
That gives us the chance to only have to reset some fields when the node
goes offline again.

More information can be found in the respective patches.

v1 -> v2:
          - Addressed feedback from David

Oscar Salvador (2):
  mm/page_alloc: Do not calculate node's total pages and memmap pages
    when empty
  mm/memory_hotplug: Reset node's state when empty during offline

 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 54 ++++++++++++++++------------
 mm/page_alloc.c                | 65 +++++++++++-----------------------
 3 files changed, 53 insertions(+), 68 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-21  4:17 [PATCH v2 0/2] Minor memoryhotplug refactoring Oscar Salvador
@ 2022-06-21  4:17 ` Oscar Salvador
  2022-06-21  7:44   ` David Hildenbrand
  2022-06-21  4:17 ` [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
  1 sibling, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2022-06-21  4:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, linux-mm, linux-kernel, Oscar Salvador

free_area_init_node() calls calculate_node_totalpages() and
free_area_init_core(). The former to get node's {spanned,present}_pages,
and the latter to calculate, among other things, how many pages per zone
we spent on memmap_pages, which is used to substract zone's free pages.

On memoryless-nodes, it is pointless to perform such a bunch of work, so
make sure we skip the calculations when having a node or empty zone.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..2b9b2422ba32 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7361,6 +7361,10 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 	unsigned long realtotalpages = 0, totalpages = 0;
 	enum zone_type i;
 
+	/* Skip calculation for memoryless nodes */
+	if (pgdat_is_empty(pgdat))
+		goto no_pages;
+
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		struct zone *zone = pgdat->node_zones + i;
 		unsigned long zone_start_pfn, zone_end_pfn;
@@ -7393,6 +7397,7 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 		realtotalpages += real_size;
 	}
 
+no_pages:
 	pgdat->node_spanned_pages = totalpages;
 	pgdat->node_present_pages = realtotalpages;
 	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
@@ -7610,6 +7615,12 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		size = zone->spanned_pages;
 		freesize = zone->present_pages;
 
+		/* No pages? Nothing to calculate then. */
+		if (!size) {
+			zone_init_internals(zone, j, nid, 0);
+			continue;
+		}
+
 		/*
 		 * Adjust freesize so that it accounts for how much memory
 		 * is used by this zone for memmap. This affects the watermark
@@ -7647,9 +7658,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		 */
 		zone_init_internals(zone, j, nid, freesize);
 
-		if (!size)
-			continue;
-
 		set_pageblock_order();
 		setup_usemap(zone);
 		init_currently_empty_zone(zone, zone->zone_start_pfn, size);
@@ -7730,7 +7738,7 @@ static void __init free_area_init_node(int nid)
 	pgdat->node_start_pfn = start_pfn;
 	pgdat->per_cpu_nodestats = NULL;
 
-	if (start_pfn != end_pfn) {
+	if (!pgdat_is_empty(pgdat)) {
 		pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline
  2022-06-21  4:17 [PATCH v2 0/2] Minor memoryhotplug refactoring Oscar Salvador
  2022-06-21  4:17 ` [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
@ 2022-06-21  4:17 ` Oscar Salvador
  2022-06-21  7:59   ` David Hildenbrand
  1 sibling, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2022-06-21  4:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, linux-mm, linux-kernel, Oscar Salvador

All possible nodes are now pre-allocated at boot time by free_area_init()->
free_area_init_node(), and those which are to be hot-plugged are initialized
later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
become online.

free_area_init_core_hotplug() calls pgdat_init_internals() and
zone_init_internals() to initialize some internal data structures
and zeroes a few pgdat fields.

But we do already call pgdat_init_internals() and zone_init_internals()
for all possible nodes back in free_area_init_core(), and pgdat fields
are already zeroed because the pre-allocation memsets with 0s the
structure, meaning we do not need to repeat the process when
the node becomes online.

So initialize it only once when booting, and make sure to reset
the fields we care about to 0 when the node goes empty.
The only thing we need to check for is to allocate per_cpu_nodestats
struct the very first time this node goes online.

node_reset_state() is the function in charge of resetting pgdat's fields,
and it is called when offline_pages() detects that the node becomes empty
worth of memory.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 54 ++++++++++++++++++++--------------
 mm/page_alloc.c                | 49 +++++-------------------------
 3 files changed, 41 insertions(+), 64 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d7edf62a6a..917112661b5c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
+extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
 extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory_resource(int nid, struct resource *resource,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1213d0c67a53..8a464cdd44ad 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1176,18 +1176,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
 static pg_data_t __ref *hotadd_init_pgdat(int nid)
 {
-	struct pglist_data *pgdat;
+	struct pglist_data *pgdat = NODE_DATA(nid);
 
 	/*
-	 * NODE_DATA is preallocated (free_area_init) but its internal
-	 * state is not allocated completely. Add missing pieces.
-	 * Completely offline nodes stay around and they just need
-	 * reintialization.
+	 * NODE_DATA is preallocated (free_area_init), the only thing missing
+	 * is to allocate its per_cpu_nodestats struct and to build node's
+	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
+	 * the very first time this node is brought up, as we reset its state
+	 * when all node's memory goes offline.
 	 */
-	pgdat = NODE_DATA(nid);
-
-	/* init node's zones as empty zones, we don't have any present pages.*/
-	free_area_init_core_hotplug(pgdat);
+	if (pgdat_has_boot_nodestats(pgdat))
+		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
+							    __GFP_ZERO);
 
 	/*
 	 * The node we allocated has no zone fallback lists. For avoiding
@@ -1195,15 +1195,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
 	 */
 	build_all_zonelists(pgdat);
 
-	/*
-	 * When memory is hot-added, all the memory is in offline state. So
-	 * clear all zones' present_pages because they will be updated in
-	 * online_pages() and offline_pages().
-	 * TODO: should be in free_area_init_core_hotplug?
-	 */
-	reset_node_managed_pages(pgdat);
-	reset_node_present_pages(pgdat);
-
 	return pgdat;
 }
 
@@ -1780,6 +1771,26 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 		node_clear_state(node, N_MEMORY);
 }
 
+static void node_reset_state(int node)
+{
+	pg_data_t *pgdat = NODE_DATA(node);
+	int cpu;
+
+	kswapd_stop(node);
+	kcompactd_stop(node);
+
+	pgdat->nr_zones = 0;
+	pgdat->kswapd_order = 0;
+	pgdat->kswapd_highest_zoneidx = 0;
+
+	for_each_online_cpu(cpu) {
+		struct per_cpu_nodestat *p;
+
+		p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
+		memset(p, 0, sizeof(*p));
+	}
+}
+
 static int count_system_ram_pages_cb(unsigned long start_pfn,
 				     unsigned long nr_pages, void *data)
 {
@@ -1940,10 +1951,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	}
 
 	node_states_clear_node(node, &arg);
-	if (arg.status_change_nid >= 0) {
-		kswapd_stop(node);
-		kcompactd_stop(node);
-	}
+	if (arg.status_change_nid >= 0)
+		/* Reset node's state as all its memory went offline. */
+		node_reset_state(node);
 
 	writeback_set_ratelimit();
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b9b2422ba32..384bb5a50743 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6416,6 +6416,13 @@ static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats);
 DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+bool pgdat_has_boot_nodestats(pg_data_t *pgdat)
+{
+	return pgdat->per_cpu_nodestats == &boot_nodestats;
+}
+#endif
+
 static void __build_all_zonelists(void *data)
 {
 	int nid;
@@ -7539,7 +7546,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	lruvec_init(&pgdat->__lruvec);
 }
 
-static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+static void __init zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
 							unsigned long remaining_pages)
 {
 	atomic_long_set(&zone->managed_pages, remaining_pages);
@@ -7551,46 +7558,6 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
 	zone_pcp_init(zone);
 }
 
-/*
- * Set up the zone data structures
- * - init pgdat internals
- * - init all zones belonging to this node
- *
- * NOTE: this function is only called during memory hotplug
- */
-#ifdef CONFIG_MEMORY_HOTPLUG
-void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
-{
-	int nid = pgdat->node_id;
-	enum zone_type z;
-	int cpu;
-
-	pgdat_init_internals(pgdat);
-
-	if (pgdat->per_cpu_nodestats == &boot_nodestats)
-		pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
-
-	/*
-	 * Reset the nr_zones, order and highest_zoneidx before reuse.
-	 * Note that kswapd will init kswapd_highest_zoneidx properly
-	 * when it starts in the near future.
-	 */
-	pgdat->nr_zones = 0;
-	pgdat->kswapd_order = 0;
-	pgdat->kswapd_highest_zoneidx = 0;
-	pgdat->node_start_pfn = 0;
-	for_each_online_cpu(cpu) {
-		struct per_cpu_nodestat *p;
-
-		p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
-		memset(p, 0, sizeof(*p));
-	}
-
-	for (z = 0; z < MAX_NR_ZONES; z++)
-		zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
-}
-#endif
-
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-21  4:17 ` [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
@ 2022-06-21  7:44   ` David Hildenbrand
  2022-06-22  3:47     ` Oscar Salvador
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2022-06-21  7:44 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel

On 21.06.22 06:17, Oscar Salvador wrote:
> free_area_init_node() calls calculate_node_totalpages() and
> free_area_init_core(). The former to get node's {spanned,present}_pages,
> and the latter to calculate, among other things, how many pages per zone
> we spent on memmap_pages, which is used to substract zone's free pages.
> 
> On memoryless-nodes, it is pointless to perform such a bunch of work, so
> make sure we skip the calculations when having a node or empty zone.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/page_alloc.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..2b9b2422ba32 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7361,6 +7361,10 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>  	unsigned long realtotalpages = 0, totalpages = 0;
>  	enum zone_type i;
>  
> +	/* Skip calculation for memoryless nodes */
> +	if (pgdat_is_empty(pgdat))
> +		goto no_pages;
> +
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		struct zone *zone = pgdat->node_zones + i;
>  		unsigned long zone_start_pfn, zone_end_pfn;
> @@ -7393,6 +7397,7 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>  		realtotalpages += real_size;
>  	}
>  
> +no_pages:
>  	pgdat->node_spanned_pages = totalpages;
>  	pgdat->node_present_pages = realtotalpages;
>  	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
> @@ -7610,6 +7615,12 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		size = zone->spanned_pages;
>  		freesize = zone->present_pages;
>  
> +		/* No pages? Nothing to calculate then. */
> +		if (!size) {
> +			zone_init_internals(zone, j, nid, 0);
> +			continue;
> +		}
> +
>  		/*
>  		 * Adjust freesize so that it accounts for how much memory
>  		 * is used by this zone for memmap. This affects the watermark
> @@ -7647,9 +7658,6 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
>  		 */
>  		zone_init_internals(zone, j, nid, freesize);
>  
> -		if (!size)
> -			continue;
> -
>  		set_pageblock_order();
>  		setup_usemap(zone);
>  		init_currently_empty_zone(zone, zone->zone_start_pfn, size);
> @@ -7730,7 +7738,7 @@ static void __init free_area_init_node(int nid)
>  	pgdat->node_start_pfn = start_pfn;
>  	pgdat->per_cpu_nodestats = NULL;
>  
> -	if (start_pfn != end_pfn) {
> +	if (!pgdat_is_empty(pgdat)) {
>  		pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>  			(u64)start_pfn << PAGE_SHIFT,
>  			end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);


It's worth noting that the check in pgdat_is_empty() is slightly
different. I *think* it doesn't matter in practice, yet I wonder if we
should simply fixup (currently unused) pgdat_is_empty().

Anyhow

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline
  2022-06-21  4:17 ` [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
@ 2022-06-21  7:59   ` David Hildenbrand
  2022-06-22  4:25     ` Oscar Salvador
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2022-06-21  7:59 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton; +Cc: Michal Hocko, linux-mm, linux-kernel

On 21.06.22 06:17, Oscar Salvador wrote:
> All possible nodes are now pre-allocated at boot time by free_area_init()->
> free_area_init_node(), and those which are to be hot-plugged are initialized
> later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
> become online.
> 
> free_area_init_core_hotplug() calls pgdat_init_internals() and
> zone_init_internals() to initialize some internal data structures
> and zeroes a few pgdat fields.
> 
> But we do already call pgdat_init_internals() and zone_init_internals()
> for all possible nodes back in free_area_init_core(), and pgdat fields
> are already zeroed because the pre-allocation memsets with 0s the
> structure, meaning we do not need to repeat the process when
> the node becomes online.
> 
> So initialize it only once when booting, and make sure to reset
> the fields we care about to 0 when the node goes empty.
> The only thing we need to check for is to allocate per_cpu_nodestats
> struct the very first time this node goes online.
> 
> node_reset_state() is the function in charge of resetting pgdat's fields,
> and it is called when offline_pages() detects that the node becomes empty
> worth of memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 54 ++++++++++++++++++++--------------
>  mm/page_alloc.c                | 49 +++++-------------------------
>  3 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 20d7edf62a6a..917112661b5c 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
> +extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
>  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory_resource(int nid, struct resource *resource,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1213d0c67a53..8a464cdd44ad 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1176,18 +1176,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
>  static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  {
> -	struct pglist_data *pgdat;
> +	struct pglist_data *pgdat = NODE_DATA(nid);
>  
>  	/*
> -	 * NODE_DATA is preallocated (free_area_init) but its internal
> -	 * state is not allocated completely. Add missing pieces.
> -	 * Completely offline nodes stay around and they just need
> -	 * reintialization.
> +	 * NODE_DATA is preallocated (free_area_init), the only thing missing
> +	 * is to allocate its per_cpu_nodestats struct and to build node's
> +	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
> +	 * the very first time this node is brought up, as we reset its state
> +	 * when all node's memory goes offline.
>  	 */
> -	pgdat = NODE_DATA(nid);
> -
> -	/* init node's zones as empty zones, we don't have any present pages.*/
> -	free_area_init_core_hotplug(pgdat);
> +	if (pgdat_has_boot_nodestats(pgdat))
> +		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
> +							    __GFP_ZERO);
>  
>  	/*
>  	 * The node we allocated has no zone fallback lists. For avoiding
> @@ -1195,15 +1195,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  	 */
>  	build_all_zonelists(pgdat);
>  
> -	/*
> -	 * When memory is hot-added, all the memory is in offline state. So
> -	 * clear all zones' present_pages because they will be updated in
> -	 * online_pages() and offline_pages().
> -	 * TODO: should be in free_area_init_core_hotplug?
> -	 */
> -	reset_node_managed_pages(pgdat);
> -	reset_node_present_pages(pgdat);
> -
>  	return pgdat;
>  }
>  
> @@ -1780,6 +1771,26 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  		node_clear_state(node, N_MEMORY);
>  }
>  
> +static void node_reset_state(int node)
> +{
> +	pg_data_t *pgdat = NODE_DATA(node);
> +	int cpu;
> +
> +	kswapd_stop(node);
> +	kcompactd_stop(node);
> +
> +	pgdat->nr_zones = 0;

^ what is that? it should be "highest_zone_idx" and I don't see any
reason that we really need this.

To detect if a node is empty we can use pgdat_is_empty(). To detect if a
zone is empty we can use zone_is_empty().

The usage of "pgdat->nr_zones" as an optimization is questionable,
especially when iterating over our handful of zones where most nodes
miss the *lower* zones like ZONE_DMA* in practice and have ZONE_NORMAL.

Can we get rid of that and just check pgdat_is_empty() and
zone_is_empty() and iterate all applicable zones from 0..X?


If it amkes sense what I'm saying, that could be done before this patch.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-21  7:44   ` David Hildenbrand
@ 2022-06-22  3:47     ` Oscar Salvador
  2022-06-22  3:56       ` Muchun Song
  0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2022-06-22  3:47 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, Jun 21, 2022 at 09:44:47AM +0200, David Hildenbrand wrote:
> 
> 
> It's worth noting that the check in pgdat_is_empty() is slightly
> different. I *think* it doesn't matter in practice, yet I wonder if we
> should simply fixup (currently unused) pgdat_is_empty().

I guess we could change it to

 static inline bool pgdat_is_empty(pg_data_t *pgdat)
 {
	 return node_start_pfn(pgdat->node_id) == node_end_pfn(pgdat->node_id)
 }

? And maybe even rename it to to node_is_empty (not sure why but I tend to like
that more than pgdat) 

I could squeeze a "fixup" patch for that before this one. 

> 
> Anyhow
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks!


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-22  3:47     ` Oscar Salvador
@ 2022-06-22  3:56       ` Muchun Song
  2022-06-22  8:31         ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Muchun Song @ 2022-06-22  3:56 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Wed, Jun 22, 2022 at 05:47:22AM +0200, Oscar Salvador wrote:
> On Tue, Jun 21, 2022 at 09:44:47AM +0200, David Hildenbrand wrote:
> > 
> > 
> > It's worth noting that the check in pgdat_is_empty() is slightly
> > different. I *think* it doesn't matter in practice, yet I wonder if we
> > should simply fixup (currently unused) pgdat_is_empty().
> 
> I guess we could change it to
> 
>  static inline bool pgdat_is_empty(pg_data_t *pgdat)
>  {
> 	 return node_start_pfn(pgdat->node_id) == node_end_pfn(pgdat->node_id)
>  }
> 
> ? And maybe even rename it to to node_is_empty (not sure why but I tend to like

At least I like this name (node_is_empty) as well.

Thanks.

> that more than pgdat) 
> 
> I could squeeze a "fixup" patch for that before this one. 
> 
> > 
> > Anyhow
> > 
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline
  2022-06-21  7:59   ` David Hildenbrand
@ 2022-06-22  4:25     ` Oscar Salvador
  2022-06-22  8:44       ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2022-06-22  4:25 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, Jun 21, 2022 at 09:59:07AM +0200, David Hildenbrand wrote:
> > +static void node_reset_state(int node)
> > +{
> > +	pg_data_t *pgdat = NODE_DATA(node);
> > +	int cpu;
> > +
> > +	kswapd_stop(node);
> > +	kcompactd_stop(node);
> > +
> > +	pgdat->nr_zones = 0;
> 
> ^ what is that? it should be "highest_zone_idx" and I don't see any
> reason that we really need this.

Uhm, I thought we need to reset this, otherwise init_currently_empty_zone()
might not set it to a right value:

...
 if (zone_idx > pgdat->nr_zones)
    pgdat->nr_zones = zone_idx
...

At least we set it to 0 in free_area_init_core_hotplug() (before this patch).

> To detect if a node is empty we can use pgdat_is_empty(). To detect if a
> zone is empty we can use zone_is_empty().
> 
> The usage of "pgdat->nr_zones" as an optimization is questionable,
> especially when iterating over our handful of zones where most nodes
> miss the *lower* zones like ZONE_DMA* in practice and have ZONE_NORMAL.
> 
> Can we get rid of that and just check pgdat_is_empty() and
> zone_is_empty() and iterate all applicable zones from 0..X?

So, lemme see if I get you.
You mean to e.g: replace the following (code snippet from set_pgdat_percpu_threshold)

  for (i = 0; i < pgdat->nr_zones; i++) {
           zone = &pgdat->node_zones[i];

		    [some code]
  }

with this:

  for (zid = 0; zid < MAX_NR_ZONES; i++) {
            struct zone *zone = pgdat->node_zones + i;

            if (zone_is_empty(zone))
                    continue; 
  }

I guess we can, and I can see that we have a mix of both usages, so it might be
good to consolidate one.
And actually, I think we do the same amount of work, right? So not really an
optimization in those pieces of code.

The only thing that unsettles me is the compaction part.
We set pgdat->kcompactd_highest_zoneidx by checking pgdat->nr_zones, and use
that as our compact_control->highest_zoneidx. (kcompactd->kcompactd_do_work)

Now, I do not really see any reason we could not adapt that code to not
realy on pgdat->nr_zones, but I would have to check further how this
interacts with highest_zoneidx down the road, and where else should
we rewrite code.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-22  3:56       ` Muchun Song
@ 2022-06-22  8:31         ` David Hildenbrand
  2022-06-22  8:54           ` Muchun Song
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2022-06-22  8:31 UTC (permalink / raw)
  To: Muchun Song, Oscar Salvador
  Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On 22.06.22 05:56, Muchun Song wrote:
> On Wed, Jun 22, 2022 at 05:47:22AM +0200, Oscar Salvador wrote:
>> On Tue, Jun 21, 2022 at 09:44:47AM +0200, David Hildenbrand wrote:
>>>
>>>
>>> It's worth noting that the check in pgdat_is_empty() is slightly
>>> different. I *think* it doesn't matter in practice, yet I wonder if we
>>> should simply fixup (currently unused) pgdat_is_empty().
>>
>> I guess we could change it to
>>
>>  static inline bool pgdat_is_empty(pg_data_t *pgdat)
>>  {
>> 	 return node_start_pfn(pgdat->node_id) == node_end_pfn(pgdat->node_id)
>>  }
>>
>> ? And maybe even rename it to to node_is_empty (not sure why but I tend to like
> 
> At least I like this name (node_is_empty) as well.
> 

Let's try keeping it consistent. I think node_is_empty() might indicate
that we're punching in a node id instead of a pgdat.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline
  2022-06-22  4:25     ` Oscar Salvador
@ 2022-06-22  8:44       ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2022-06-22  8:44 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On 22.06.22 06:25, Oscar Salvador wrote:
> On Tue, Jun 21, 2022 at 09:59:07AM +0200, David Hildenbrand wrote:
>>> +static void node_reset_state(int node)
>>> +{
>>> +	pg_data_t *pgdat = NODE_DATA(node);
>>> +	int cpu;
>>> +
>>> +	kswapd_stop(node);
>>> +	kcompactd_stop(node);
>>> +
>>> +	pgdat->nr_zones = 0;
>>
>> ^ what is that? it should be "highest_zone_idx" and I don't see any
>> reason that we really need this.
> 
> Uhm, I thought we need to reset this, otherwise init_currently_empty_zone()
> might not set it to a right value:
> 
> ...
>  if (zone_idx > pgdat->nr_zones)
>     pgdat->nr_zones = zone_idx
> ...
> 
> At least we set it to 0 in free_area_init_core_hotplug() (before this patch).

Yeah, I don't think we need to, the nodes+zones should be empty either
way. Maybe a micro-optimization that doesn't really optimize anything
that much?

free_area_init_node() warns if it isn't reset, but my gut feeling is
this is completely unnecessary. We're not dealing with 200 zones ...

> 
>> To detect if a node is empty we can use pgdat_is_empty(). To detect if a
>> zone is empty we can use zone_is_empty().
>>
>> The usage of "pgdat->nr_zones" as an optimization is questionable,
>> especially when iterating over our handful of zones where most nodes
>> miss the *lower* zones like ZONE_DMA* in practice and have ZONE_NORMAL.
>>
>> Can we get rid of that and just check pgdat_is_empty() and
>> zone_is_empty() and iterate all applicable zones from 0..X?
> 
> So, lemme see if I get you.
> You mean to e.g: replace the following (code snippet from set_pgdat_percpu_threshold)
> 
>   for (i = 0; i < pgdat->nr_zones; i++) {
>            zone = &pgdat->node_zones[i];
> 
> 		    [some code]
>   }
> 
> with this:
> 
>   for (zid = 0; zid < MAX_NR_ZONES; i++) {
>             struct zone *zone = pgdat->node_zones + i;
> 
>             if (zone_is_empty(zone))
>                     continue; 
>   }

Yes, some places that are not interested in ZONE_DEVICE might want to
skip that completely. See below.

> 
> I guess we can, and I can see that we have a mix of both usages, so it might be
> good to consolidate one.
> And actually, I think we do the same amount of work, right? So not really an
> optimization in those pieces of code.

That's my understanding.

> 
> The only thing that unsettles me is the compaction part.
> We set pgdat->kcompactd_highest_zoneidx by checking pgdat->nr_zones, and use
> that as our compact_control->highest_zoneidx. (kcompactd->kcompactd_do_work)

I wonder why we would want to use ZONE_DEVICE there ...

move_pfn_range_to_zone()->init_currently_empty_zone() would set
pgdat->nr_zones = ZONE_DEVICE.

Which looks unnecessary.

> 
> Now, I do not really see any reason we could not adapt that code to not
> realy on pgdat->nr_zones, but I would have to check further how this
> interacts with highest_zoneidx down the road, and where else should
> we rewrite code.

I wonder if all we want is:


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..905919683025 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -497,8 +497,9 @@ enum zone_type {
         * there can be false negatives).
         */
        ZONE_MOVABLE,
+       __MAX_NR_BUDDY_ZONES,
 #ifdef CONFIG_ZONE_DEVICE
-       ZONE_DEVICE,
+       ZONE_DEVICE = __MAX_NR_BUDDY_ZONES,
 #endif
        __MAX_NR_ZONES

diff --git a/tools/testing/memblock/linux/mmzone.h
b/tools/testing/memblock/linux/mmzone.h
index 7c2eb5c9bb54..2c3492239e45 100644
--- a/tools/testing/memblock/linux/mmzone.h
+++ b/tools/testing/memblock/linux/mmzone.h
@@ -17,6 +17,7 @@ enum zone_type {
 };

 #define MAX_NR_ZONES __MAX_NR_ZONES
+#define MAX_NR_BUDDY_ZONES __MAX_NR_BUDDY_ZONES
 #define MAX_ORDER 11
 #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))



Then, all relevant buddy-related code (compaction ...) can simply run

for (i = 0; i < MAX_NR_BUDDY_ZONES; i++)

or

for (i = MAX_NR_BUDDY_ZONES - 1; i >= 0; i--)

and check if the relevant zone is empty.


There will still be users of MAX_NR_ZONES, though, that have to consider
ZONE_DEVICE as well.

-- 
Thanks,

David / dhildenb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-22  8:31         ` David Hildenbrand
@ 2022-06-22  8:54           ` Muchun Song
  2022-06-22 10:49             ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Muchun Song @ 2022-06-22  8:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Wed, Jun 22, 2022 at 10:31:12AM +0200, David Hildenbrand wrote:
> On 22.06.22 05:56, Muchun Song wrote:
> > On Wed, Jun 22, 2022 at 05:47:22AM +0200, Oscar Salvador wrote:
> >> On Tue, Jun 21, 2022 at 09:44:47AM +0200, David Hildenbrand wrote:
> >>>
> >>>
> >>> It's worth noting that the check in pgdat_is_empty() is slightly
> >>> different. I *think* it doesn't matter in practice, yet I wonder if we
> >>> should simply fixup (currently unused) pgdat_is_empty().
> >>
> >> I guess we could change it to
> >>
> >>  static inline bool pgdat_is_empty(pg_data_t *pgdat)
> >>  {
> >> 	 return node_start_pfn(pgdat->node_id) == node_end_pfn(pgdat->node_id)
> >>  }
> >>
> >> ? And maybe even rename it to to node_is_empty (not sure why but I tend to like
> > 
> > At least I like this name (node_is_empty) as well.
> > 
> 
> Let's try keeping it consistent. I think node_is_empty() might indicate
> that we're punching in a node id instead of a pgdat.
>

I suspect Oscar will change the argument to "nid" as well, like:

static inline bool node_is_empty(int nid)
{
	return node_start_pfn(nid) == node_end_pfn(nid);
}

Does this look good?

Thanks.
 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty
  2022-06-22  8:54           ` Muchun Song
@ 2022-06-22 10:49             ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2022-06-22 10:49 UTC (permalink / raw)
  To: Muchun Song
  Cc: Oscar Salvador, Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On 22.06.22 10:54, Muchun Song wrote:
> On Wed, Jun 22, 2022 at 10:31:12AM +0200, David Hildenbrand wrote:
>> On 22.06.22 05:56, Muchun Song wrote:
>>> On Wed, Jun 22, 2022 at 05:47:22AM +0200, Oscar Salvador wrote:
>>>> On Tue, Jun 21, 2022 at 09:44:47AM +0200, David Hildenbrand wrote:
>>>>>
>>>>>
>>>>> It's worth noting that the check in pgdat_is_empty() is slightly
>>>>> different. I *think* it doesn't matter in practice, yet I wonder if we
>>>>> should simply fixup (currently unused) pgdat_is_empty().
>>>>
>>>> I guess we could change it to
>>>>
>>>>  static inline bool pgdat_is_empty(pg_data_t *pgdat)
>>>>  {
>>>> 	 return node_start_pfn(pgdat->node_id) == node_end_pfn(pgdat->node_id)
>>>>  }
>>>>
>>>> ? And maybe even rename it to to node_is_empty (not sure why but I tend to like
>>>
>>> At least I like this name (node_is_empty) as well.
>>>
>>
>> Let's try keeping it consistent. I think node_is_empty() might indicate
>> that we're punching in a node id instead of a pgdat.
>>
> 
> I suspect Oscar will change the argument to "nid" as well, like:
> 
> static inline bool node_is_empty(int nid)
> {
> 	return node_start_pfn(nid) == node_end_pfn(nid);
> }
> 
> Does this look good?

Then we have to lookup the pgdat multiple times for (IMHO) no real
compelling reason.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-22 10:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-21  4:17 [PATCH v2 0/2] Minor memoryhotplug refactoring Oscar Salvador
2022-06-21  4:17 ` [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
2022-06-21  7:44   ` David Hildenbrand
2022-06-22  3:47     ` Oscar Salvador
2022-06-22  3:56       ` Muchun Song
2022-06-22  8:31         ` David Hildenbrand
2022-06-22  8:54           ` Muchun Song
2022-06-22 10:49             ` David Hildenbrand
2022-06-21  4:17 ` [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
2022-06-21  7:59   ` David Hildenbrand
2022-06-22  4:25     ` Oscar Salvador
2022-06-22  8:44       ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.