From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Wei Yang <richard.weiyang@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline
Date: Thu, 28 Apr 2022 14:30:20 +0200
Message-ID: <0d2853f6-66e5-251a-2d9e-c229f0ebcd5e@redhat.com>
In-Reply-To: <20220307150725.6810-3-osalvador@suse.de>

On 07.03.22 16:07, Oscar Salvador wrote:
> All possible nodes are now pre-allocated at boot time by free_area_init()->
> free_area_init_node(), and those which are to be hot-plugged are initialized
> later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
> become online.
> 
> free_area_init_core_hotplug() calls pgdat_init_internals() and
> zone_init_internals() to initialize some internal data structures
> and zeroes a few pgdat fields.
> 
> But we do already call pgdat_init_internals() and zone_init_internals()
> for all possible nodes back in free_area_init_core(), and pgdat fields
> are already zeroed because the pre-allocation memsets with 0s the
> structure, meaning we do not need to repeat the process when
> the node becomes online.
> 
> So initialize it only once when booting, and make sure to reset
> the fields we care about to 0 when the node goes empty.
> The only thing we need to check for is to allocate per_cpu_nodestats
> struct the very first time this node goes online.
> 
> node_reset_state() is the function in charge of resetting pgdat's fields,
> and it is called when offline_pages() detects that the node becomes empty
> worth of memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 58 +++++++++++++++++++++-------------
>  mm/page_alloc.c                | 49 +++++-----------------------
>  3 files changed, 45 insertions(+), 64 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 76bf2de86def..fcf4c9a023cc 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
> +extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
>  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory_resource(int nid, struct resource *resource,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ddc62f8b591f..07cece9e22e4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1164,18 +1164,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
>  static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  {
> -	struct pglist_data *pgdat;
> +	struct pglist_data *pgdat = NODE_DATA(nid);
>  
>  	/*
> -	 * NODE_DATA is preallocated (free_area_init) but its internal
> -	 * state is not allocated completely. Add missing pieces.
> -	 * Completely offline nodes stay around and they just need
> -	 * reintialization.
> +	 * NODE_DATA is preallocated (free_area_init), the only thing missing
> +	 * is to allocate its per_cpu_nodestats struct and to build node's
> +	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
> +	 * the very first time this node is brought up, as we reset its state
> +	 * when all node's memory goes offline.
>  	 */
> -	pgdat = NODE_DATA(nid);
> -
> -	/* init node's zones as empty zones, we don't have any present pages.*/
> -	free_area_init_core_hotplug(pgdat);
> +	if (pgdat_has_boot_nodestats(pgdat))
> +		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
> +							    __GFP_ZERO);
>  
>  	/*
>  	 * The node we allocated has no zone fallback lists. For avoiding
> @@ -1183,15 +1183,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  	 */
>  	build_all_zonelists(pgdat);
>  
> -	/*
> -	 * When memory is hot-added, all the memory is in offline state. So
> -	 * clear all zones' present_pages because they will be updated in
> -	 * online_pages() and offline_pages().
> -	 * TODO: should be in free_area_init_core_hotplug?
> -	 */
> -	reset_node_managed_pages(pgdat);
> -	reset_node_present_pages(pgdat);
> -
>  	return pgdat;
>  }
>  
> @@ -1799,6 +1790,30 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  		node_clear_state(node, N_MEMORY);
>  }
>  
> +static void node_reset_state(int node)
> +{
> +	pg_data_t *pgdat = NODE_DATA(node);
> +	int cpu;
> +
> +	kswapd_stop(node);
> +	kcompactd_stop(node);
> +
> +	reset_node_managed_pages(pgdat);
> +	reset_node_present_pages(pgdat);
> +
> +	pgdat->nr_zones = 0;
> +	pgdat->kswapd_order = 0;
> +	pgdat->kswapd_highest_zoneidx = 0;
> +	pgdat->node_start_pfn = 0;


I'm confused why we have to mess with
* present pages
* managed pages
* node_start_pfn

here at all.

1) If any present pages were left, calling node_reset_state()
would be a BUG.
2) If any managed pages were left, calling node_reset_state()
would be a BUG.
3) node_start_pfn will be properly updated by
remove_pfn_range_from_zone()->update_pgdat_span()
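
Points 1) and 2) can be illustrated with a small user-space model. This is
only a sketch, not kernel code: struct toy_node, struct toy_zone and all
toy_*() helpers are hypothetical names made up for the illustration, and
offlining is simplified to "zero both counters of a zone". It shows that once
a node qualifies as empty, resetting the counters cannot change anything.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ZONES 3

/* Hypothetical stand-ins for struct zone / pg_data_t counters. */
struct toy_zone {
	unsigned long present_pages;
	unsigned long managed_pages;
};

struct toy_node {
	struct toy_zone zones[TOY_MAX_ZONES];
};

static unsigned long toy_node_present_pages(const struct toy_node *node)
{
	unsigned long sum = 0;

	for (int i = 0; i < TOY_MAX_ZONES; i++)
		sum += node->zones[i].present_pages;
	return sum;
}

static unsigned long toy_node_managed_pages(const struct toy_node *node)
{
	unsigned long sum = 0;

	for (int i = 0; i < TOY_MAX_ZONES; i++)
		sum += node->zones[i].managed_pages;
	return sum;
}

/* Simplified model of offlining all memory of one zone. */
static void toy_offline_zone(struct toy_node *node, int zid)
{
	node->zones[zid].present_pages = 0;
	node->zones[zid].managed_pages = 0;
}

/* The node counts as empty only when no present or managed page is left. */
static bool toy_node_is_empty(const struct toy_node *node)
{
	return toy_node_present_pages(node) == 0 &&
	       toy_node_managed_pages(node) == 0;
}

/*
 * Zero the counters and report whether anything changed. Calling this on
 * a non-empty node is treated as a bug (the assert), so for every legal
 * call the loop finds only zeroes and the function returns false.
 */
static bool toy_node_reset_counters(struct toy_node *node)
{
	bool changed = false;

	assert(toy_node_is_empty(node));
	for (int i = 0; i < TOY_MAX_ZONES; i++) {
		if (node->zones[i].present_pages || node->zones[i].managed_pages)
			changed = true;
		node->zones[i].present_pages = 0;
		node->zones[i].managed_pages = 0;
	}
	return changed;
}
```

In this model, every call that passes the precondition is a no-op, which is
the sense in which the two reset calls look redundant.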


To make it clearer, I *think* touching node_start_pfn is very wrong.

What if the node still has ZONE_DEVICE memory? Such pages don't count
towards present pages, only towards spanned pages, and we'd be messing
with the start of the spanned range.

remove_pfn_range_from_zone()->update_pgdat_span() should be the only
place that modifies the spanned range when offlining.
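
The ZONE_DEVICE concern can be sketched in user space as well. Again, this is
a simplified model under stated assumptions, not the kernel implementation:
struct toy_range, struct toy_pgdat, toy_update_span() and toy_span_contains()
are hypothetical, and the span is recomputed from a flat list of ranges
rather than from the node's zones. Device ranges contribute to the span but
not to the present page count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory range; is_device marks ZONE_DEVICE-like memory. */
struct toy_range {
	unsigned long start_pfn;
	unsigned long nr_pages;	/* 0 means the range has been removed */
	bool is_device;
};

/* Hypothetical subset of the pg_data_t fields discussed here. */
struct toy_pgdat {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
	unsigned long node_present_pages;
};

/*
 * Recompute the span from whatever ranges remain. Device ranges extend
 * the span but are not accounted as present pages.
 */
static void toy_update_span(struct toy_pgdat *pgdat,
			    const struct toy_range *ranges, size_t n)
{
	unsigned long start = 0, end = 0, present = 0;
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		unsigned long r_end;

		if (!ranges[i].nr_pages)
			continue;
		r_end = ranges[i].start_pfn + ranges[i].nr_pages;
		if (!found || ranges[i].start_pfn < start)
			start = ranges[i].start_pfn;
		if (!found || r_end > end)
			end = r_end;
		found = true;
		if (!ranges[i].is_device)
			present += ranges[i].nr_pages;
	}

	pgdat->node_start_pfn = found ? start : 0;
	pgdat->node_spanned_pages = found ? end - start : 0;
	pgdat->node_present_pages = present;
}

/* True if pfn lies inside [node_start_pfn, node_start_pfn + spanned). */
static bool toy_span_contains(const struct toy_pgdat *pgdat, unsigned long pfn)
{
	return pfn >= pgdat->node_start_pfn &&
	       pfn - pgdat->node_start_pfn < pgdat->node_spanned_pages;
}
```

With one regular range and one device range, removing the regular range
leaves zero present pages but a non-empty span that still covers the device
range. Forcing node_start_pfn to 0 at that point makes the recorded span no
longer cover the device PFNs, which is the kind of breakage described above.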

-- 
Thanks,

David / dhildenb

