linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
@ 2020-04-16 10:47 David Hildenbrand
  2020-04-16 10:47 ` [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn David Hildenbrand
  2020-04-16 10:47 ` [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
  0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-16 10:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Anshuman Khandual,
	Baoquan He, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Pankaj Gupta

While working on some other cleanups I stumbled over the creation/removal
of memblocks in hotplug code and wondered why we still need that. Turns
out, we only need that handling with CONFIG_ARCH_KEEP_MEMBLOCK - unless
I am missing something important (-> RFC).

I'll be sending out patches to remove CONFIG_ARCH_KEEP_MEMBLOCK on s390x
soonish, after finding a way to test them.

Gave it a quick test on x86-64.

David Hildenbrand (2):
  mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  mm/memory_hotplug: handle memblocks only with
    CONFIG_ARCH_KEEP_MEMBLOCK

 mm/Kconfig          |  3 +++
 mm/memory_hotplug.c | 28 +++++++++++++---------------
 2 files changed, 16 insertions(+), 15 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-16 10:47 [PATCH RFC 0/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
@ 2020-04-16 10:47 ` David Hildenbrand
  2020-04-16 14:11   ` Pankaj Gupta
  2020-04-21 12:30   ` Michal Hocko
  2020-04-16 10:47 ` [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
  1 sibling, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-16 10:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Baoquan He, Oscar Salvador, Pankaj Gupta

A hotadded node/pgdat will span no pages at all, until memory is moved to
the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
when onlining memory blocks. We don't have to initialize the
node_start_pfn to the memory we are adding.

Note: we'll also end up with pgdat->node_start_pfn == 0 when offlined the
last memory block belonging to a node (via remove_pfn_range_from_zone()->
update_pgdat_span()).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 47cf6036eb31..9b15ce465be2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,10 +866,9 @@ static void reset_node_present_pages(pg_data_t *pgdat)
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
+static pg_data_t __ref *hotadd_new_pgdat(int nid)
 {
 	struct pglist_data *pgdat;
-	unsigned long start_pfn = PFN_DOWN(start);
 
 	pgdat = NODE_DATA(nid);
 	if (!pgdat) {
@@ -899,9 +898,8 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 	}
 
 	/* we can use NODE_DATA(nid) from here */
-
 	pgdat->node_id = nid;
-	pgdat->node_start_pfn = start_pfn;
+	pgdat->node_start_pfn = 0;
 
 	/* init node's zones as empty zones, we don't have any present pages.*/
 	free_area_init_core_hotplug(nid);
@@ -936,7 +934,6 @@ static void rollback_node_hotadd(int nid)
 /**
  * try_online_node - online a node if offlined
  * @nid: the node ID
- * @start: start addr of the node
  * @set_node_online: Whether we want to online the node
  * called by cpu_up() to online a node without onlined memory.
  *
@@ -945,7 +942,7 @@ static void rollback_node_hotadd(int nid)
  * 0 -> the node is already online
  * -ENOMEM -> the node could not be allocated
  */
-static int __try_online_node(int nid, u64 start, bool set_node_online)
+static int __try_online_node(int nid, bool set_node_online)
 {
 	pg_data_t *pgdat;
 	int ret = 1;
@@ -953,7 +950,7 @@ static int __try_online_node(int nid, u64 start, bool set_node_online)
 	if (node_online(nid))
 		return 0;
 
-	pgdat = hotadd_new_pgdat(nid, start);
+	pgdat = hotadd_new_pgdat(nid);
 	if (!pgdat) {
 		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
 		ret = -ENOMEM;
@@ -977,7 +974,7 @@ int try_online_node(int nid)
 	int ret;
 
 	mem_hotplug_begin();
-	ret =  __try_online_node(nid, 0, true);
+	ret =  __try_online_node(nid, true);
 	mem_hotplug_done();
 	return ret;
 }
@@ -1031,7 +1028,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	 */
 	memblock_add_node(start, size, nid);
 
-	ret = __try_online_node(nid, start, false);
+	ret = __try_online_node(nid, false);
 	if (ret < 0)
 		goto error;
 	new_node = ret;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
  2020-04-16 10:47 [PATCH RFC 0/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
  2020-04-16 10:47 ` [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn David Hildenbrand
@ 2020-04-16 10:47 ` David Hildenbrand
  2020-04-16 17:09   ` Mike Rapoport
  2020-04-21 12:39   ` Michal Hocko
  1 sibling, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-16 10:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Baoquan He, Oscar Salvador, Pankaj Gupta, Mike Rapoport,
	Anshuman Khandual

The comment in add_memory_resource() is stale: hotadd_new_pgdat() will
no longer call get_pfn_range_for_nid(), as a hotadded pgdat will simply
span no pages at all, until memory is moved to the zone/node via
move_pfn_range_to_zone() - e.g., when onlining memory blocks.

The only archs that care about memblocks for hotplugged memory (either
for iterating over all system RAM or testing for memory validity) are
arm64, s390x, and powerpc - due to CONFIG_ARCH_KEEP_MEMBLOCK. Without
CONFIG_ARCH_KEEP_MEMBLOCK, we can simply stop messing with memblocks.

For s390x, it seems to be fairly easy to avoid CONFIG_ARCH_KEEP_MEMBLOCK.
arm64 could rework most code (esp., pfn_valid(), valid_phys_addr_range()
and kexec_file_load()) to not require memblocks for hotplugged
memory. E.g., as hotplugged memory has no holes and can be identified
using !early_section(), arm64's variant of pfn_valid() could be reworked
fairly easily to not require memblocks for hotadded memory. powerpc might
be more involed.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/Kconfig          |  3 +++
 mm/memory_hotplug.c | 13 +++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..a063fd9cdff4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -136,6 +136,9 @@ config HAVE_FAST_GUP
 	depends on MMU
 	bool
 
+# Don't discard allocated memory used to track "memory" and "reserved" memblocks
+# after early boot, so it can still be used to test for validity of memory.
+# Also, memblocks are updated with memory hot(un)plug.
 config ARCH_KEEP_MEMBLOCK
 	bool
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9b15ce465be2..104285ee9ae8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1020,13 +1020,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 
 	mem_hotplug_begin();
 
-	/*
-	 * Add new range to memblock so that when hotadd_new_pgdat() is called
-	 * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
-	 * this new range and calculate total pages correctly.  The range will
-	 * be removed at hot-remove time.
-	 */
+#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
 	memblock_add_node(start, size, nid);
+#endif
 
 	ret = __try_online_node(nid, false);
 	if (ret < 0)
@@ -1075,7 +1071,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	/* rollback pgdat allocation and others */
 	if (new_node)
 		rollback_node_hotadd(nid);
+#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
 	memblock_remove(start, size);
+#endif
 	mem_hotplug_done();
 	return ret;
 }
@@ -1751,8 +1749,11 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 	mem_hotplug_begin();
 
 	arch_remove_memory(nid, start, size, NULL);
+
+#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
 	memblock_free(start, size);
 	memblock_remove(start, size);
+#endif
 	__release_memory_resource(start, size);
 
 	try_offline_node(nid);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-16 10:47 ` [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn David Hildenbrand
@ 2020-04-16 14:11   ` Pankaj Gupta
  2020-04-21 12:30   ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Pankaj Gupta @ 2020-04-16 14:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, Linux MM, Andrew Morton, Michal Hocko, Baoquan He, Oscar Salvador

> A hotadded node/pgdat will span no pages at all, until memory is moved to
> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> when onlining memory blocks. We don't have to initialize the
> node_start_pfn to the memory we are adding.
>
> Note: we'll also end up with pgdat->node_start_pfn == 0 when offlined the
> last memory block belonging to a node (via remove_pfn_range_from_zone()->
> update_pgdat_span()).
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 47cf6036eb31..9b15ce465be2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -866,10 +866,9 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  }
>
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> -static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
> +static pg_data_t __ref *hotadd_new_pgdat(int nid)
>  {
>         struct pglist_data *pgdat;
> -       unsigned long start_pfn = PFN_DOWN(start);
>
>         pgdat = NODE_DATA(nid);
>         if (!pgdat) {
> @@ -899,9 +898,8 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>         }
>
>         /* we can use NODE_DATA(nid) from here */
> -
>         pgdat->node_id = nid;
> -       pgdat->node_start_pfn = start_pfn;
> +       pgdat->node_start_pfn = 0;
>
>         /* init node's zones as empty zones, we don't have any present pages.*/
>         free_area_init_core_hotplug(nid);
> @@ -936,7 +934,6 @@ static void rollback_node_hotadd(int nid)
>  /**
>   * try_online_node - online a node if offlined
>   * @nid: the node ID
> - * @start: start addr of the node
>   * @set_node_online: Whether we want to online the node
>   * called by cpu_up() to online a node without onlined memory.
>   *
> @@ -945,7 +942,7 @@ static void rollback_node_hotadd(int nid)
>   * 0 -> the node is already online
>   * -ENOMEM -> the node could not be allocated
>   */
> -static int __try_online_node(int nid, u64 start, bool set_node_online)
> +static int __try_online_node(int nid, bool set_node_online)
>  {
>         pg_data_t *pgdat;
>         int ret = 1;
> @@ -953,7 +950,7 @@ static int __try_online_node(int nid, u64 start, bool set_node_online)
>         if (node_online(nid))
>                 return 0;
>
> -       pgdat = hotadd_new_pgdat(nid, start);
> +       pgdat = hotadd_new_pgdat(nid);
>         if (!pgdat) {
>                 pr_err("Cannot online node %d due to NULL pgdat\n", nid);
>                 ret = -ENOMEM;
> @@ -977,7 +974,7 @@ int try_online_node(int nid)
>         int ret;
>
>         mem_hotplug_begin();
> -       ret =  __try_online_node(nid, 0, true);
> +       ret =  __try_online_node(nid, true);
>         mem_hotplug_done();
>         return ret;
>  }
> @@ -1031,7 +1028,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>          */
>         memblock_add_node(start, size, nid);
>
> -       ret = __try_online_node(nid, start, false);
> +       ret = __try_online_node(nid, false);
>         if (ret < 0)
>                 goto error;
>         new_node = ret;
> --

Looks right thing to me. Will wait for others to comment.

Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

> 2.25.1
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
  2020-04-16 10:47 ` [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
@ 2020-04-16 17:09   ` Mike Rapoport
  2020-04-21 12:39   ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2020-04-16 17:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Baoquan He,
	Oscar Salvador, Pankaj Gupta, Anshuman Khandual

On Thu, Apr 16, 2020 at 12:47:07PM +0200, David Hildenbrand wrote:
> The comment in add_memory_resource() is stale: hotadd_new_pgdat() will
> no longer call get_pfn_range_for_nid(), as a hotadded pgdat will simply
> span no pages at all, until memory is moved to the zone/node via
> move_pfn_range_to_zone() - e.g., when onlining memory blocks.
> 
> The only archs that care about memblocks for hotplugged memory (either
> for iterating over all system RAM or testing for memory validity) are
> arm64, s390x, and powerpc - due to CONFIG_ARCH_KEEP_MEMBLOCK. Without
> CONFIG_ARCH_KEEP_MEMBLOCK, we can simply stop messing with memblocks.
> 
> For s390x, it seems to be fairly easy to avoid CONFIG_ARCH_KEEP_MEMBLOCK.
> arm64 could rework most code (esp., pfn_valid(), valid_phys_addr_range()
> and kexec_file_load()) to not require memblocks for hotplugged
> memory. E.g., as hotplugged memory has no holes and can be identified
> using !early_section(), arm64's variant of pfn_valid() could be reworked
> fairly easily to not require memblocks for hotadded memory. powerpc might
> be more involed.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/Kconfig          |  3 +++
>  mm/memory_hotplug.c | 13 +++++++------
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index c1acc34c1c35..a063fd9cdff4 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -136,6 +136,9 @@ config HAVE_FAST_GUP
>  	depends on MMU
>  	bool
>  
> +# Don't discard allocated memory used to track "memory" and "reserved" memblocks
> +# after early boot, so it can still be used to test for validity of memory.
> +# Also, memblocks are updated with memory hot(un)plug.
>  config ARCH_KEEP_MEMBLOCK
>  	bool
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9b15ce465be2..104285ee9ae8 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1020,13 +1020,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  
>  	mem_hotplug_begin();
>  
> -	/*
> -	 * Add new range to memblock so that when hotadd_new_pgdat() is called
> -	 * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
> -	 * this new range and calculate total pages correctly.  The range will
> -	 * be removed at hot-remove time.
> -	 */
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
>  	memblock_add_node(start, size, nid);
> +#endif
>  
>  	ret = __try_online_node(nid, false);
>  	if (ret < 0)
> @@ -1075,7 +1071,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	/* rollback pgdat allocation and others */
>  	if (new_node)
>  		rollback_node_hotadd(nid);
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
>  	memblock_remove(start, size);
> +#endif
>  	mem_hotplug_done();
>  	return ret;
>  }
> @@ -1751,8 +1749,11 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	mem_hotplug_begin();
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
>  	memblock_free(start, size);
>  	memblock_remove(start, size);
> +#endif
>  	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);
> -- 
> 2.25.1
> 

-- 
Sincerely yours,
Mike.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-16 10:47 ` [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn David Hildenbrand
  2020-04-16 14:11   ` Pankaj Gupta
@ 2020-04-21 12:30   ` Michal Hocko
  2020-04-21 12:35     ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-21 12:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

Sorry for the late reply

On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
> A hotadded node/pgdat will span no pages at all, until memory is moved to
> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> when onlining memory blocks. We don't have to initialize the
> node_start_pfn to the memory we are adding.

You are right that the node is empty at this phase but that is already
reflected by zero present pages (hmm, I do not see spanned pages to be
set 0 though). What I am missing here is why this is an improvement. The
new node is already visible here and I do not see why we hide the
information we already know.
 
> Note: we'll also end up with pgdat->node_start_pfn == 0 when offlined the
> last memory block belonging to a node (via remove_pfn_range_from_zone()->
> update_pgdat_span()).
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 47cf6036eb31..9b15ce465be2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -866,10 +866,9 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  }
>  
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> -static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
> +static pg_data_t __ref *hotadd_new_pgdat(int nid)
>  {
>  	struct pglist_data *pgdat;
> -	unsigned long start_pfn = PFN_DOWN(start);
>  
>  	pgdat = NODE_DATA(nid);
>  	if (!pgdat) {
> @@ -899,9 +898,8 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>  	}
>  
>  	/* we can use NODE_DATA(nid) from here */
> -
>  	pgdat->node_id = nid;
> -	pgdat->node_start_pfn = start_pfn;
> +	pgdat->node_start_pfn = 0;
>  
>  	/* init node's zones as empty zones, we don't have any present pages.*/
>  	free_area_init_core_hotplug(nid);
> @@ -936,7 +934,6 @@ static void rollback_node_hotadd(int nid)
>  /**
>   * try_online_node - online a node if offlined
>   * @nid: the node ID
> - * @start: start addr of the node
>   * @set_node_online: Whether we want to online the node
>   * called by cpu_up() to online a node without onlined memory.
>   *
> @@ -945,7 +942,7 @@ static void rollback_node_hotadd(int nid)
>   * 0 -> the node is already online
>   * -ENOMEM -> the node could not be allocated
>   */
> -static int __try_online_node(int nid, u64 start, bool set_node_online)
> +static int __try_online_node(int nid, bool set_node_online)
>  {
>  	pg_data_t *pgdat;
>  	int ret = 1;
> @@ -953,7 +950,7 @@ static int __try_online_node(int nid, u64 start, bool set_node_online)
>  	if (node_online(nid))
>  		return 0;
>  
> -	pgdat = hotadd_new_pgdat(nid, start);
> +	pgdat = hotadd_new_pgdat(nid);
>  	if (!pgdat) {
>  		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
>  		ret = -ENOMEM;
> @@ -977,7 +974,7 @@ int try_online_node(int nid)
>  	int ret;
>  
>  	mem_hotplug_begin();
> -	ret =  __try_online_node(nid, 0, true);
> +	ret =  __try_online_node(nid, true);
>  	mem_hotplug_done();
>  	return ret;
>  }
> @@ -1031,7 +1028,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	 */
>  	memblock_add_node(start, size, nid);
>  
> -	ret = __try_online_node(nid, start, false);
> +	ret = __try_online_node(nid, false);
>  	if (ret < 0)
>  		goto error;
>  	new_node = ret;
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-21 12:30   ` Michal Hocko
@ 2020-04-21 12:35     ` David Hildenbrand
  2020-04-21 12:52       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-21 12:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On 21.04.20 14:30, Michal Hocko wrote:
> Sorry for the late reply
> 
> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
>> A hotadded node/pgdat will span no pages at all, until memory is moved to
>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
>> when onlining memory blocks. We don't have to initialize the
>> node_start_pfn to the memory we are adding.
> 
> You are right that the node is empty at this phase but that is already
> reflected by zero present pages (hmm, I do not see spanned pages to be
> set 0 though). What I am missing here is why this is an improvement. The
> new node is already visible here and I do not see why we hide the
> information we already know.

"information we already know" - no, not before we online the memory.

Before onlining, it's just setting node_start_pfn to *some value* to be
overwritten in move_pfn_range_to_zone()->resize_pgdat_range().

(I have some more patches to clean up __try_online_node(), and this
change here makes it clear that there isn't any magic happening in
__try_online_node() related to the start pfn and memblocks - see patch #2)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
  2020-04-16 10:47 ` [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
  2020-04-16 17:09   ` Mike Rapoport
@ 2020-04-21 12:39   ` Michal Hocko
  2020-04-21 12:41     ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-21 12:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta, Mike Rapoport, Anshuman Khandual

On Thu 16-04-20 12:47:07, David Hildenbrand wrote:
> The comment in add_memory_resource() is stale: hotadd_new_pgdat() will
> no longer call get_pfn_range_for_nid(), as a hotadded pgdat will simply
> span no pages at all, until memory is moved to the zone/node via
> move_pfn_range_to_zone() - e.g., when onlining memory blocks.
> 
> The only archs that care about memblocks for hotplugged memory (either
> for iterating over all system RAM or testing for memory validity) are
> arm64, s390x, and powerpc - due to CONFIG_ARCH_KEEP_MEMBLOCK. Without
> CONFIG_ARCH_KEEP_MEMBLOCK, we can simply stop messing with memblocks.

OK, makes sense to me.

> For s390x, it seems to be fairly easy to avoid CONFIG_ARCH_KEEP_MEMBLOCK.
> arm64 could rework most code (esp., pfn_valid(), valid_phys_addr_range()
> and kexec_file_load()) to not require memblocks for hotplugged
> memory. E.g., as hotplugged memory has no holes and can be identified
> using !early_section(), arm64's variant of pfn_valid() could be reworked
> fairly easily to not require memblocks for hotadded memory. powerpc might
> be more involed.

I haven't checked these architectures but is the information really
useful for this patch?

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

with a minor nit

> -	/*
> -	 * Add new range to memblock so that when hotadd_new_pgdat() is called
> -	 * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
> -	 * this new range and calculate total pages correctly.  The range will
> -	 * be removed at hot-remove time.
> -	 */
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK

	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)

would be slightly nicer. This should work for all the ifedefs in this
patch.

>  	memblock_add_node(start, size, nid);
> +#endif
>  
>  	ret = __try_online_node(nid, false);
>  	if (ret < 0)
> @@ -1075,7 +1071,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	/* rollback pgdat allocation and others */
>  	if (new_node)
>  		rollback_node_hotadd(nid);
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
>  	memblock_remove(start, size);
> +#endif
>  	mem_hotplug_done();
>  	return ret;
>  }
> @@ -1751,8 +1749,11 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	mem_hotplug_begin();
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +
> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
>  	memblock_free(start, size);
>  	memblock_remove(start, size);
> +#endif
>  	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
  2020-04-21 12:39   ` Michal Hocko
@ 2020-04-21 12:41     ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-21 12:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta, Mike Rapoport, Anshuman Khandual

>> For s390x, it seems to be fairly easy to avoid CONFIG_ARCH_KEEP_MEMBLOCK.
>> arm64 could rework most code (esp., pfn_valid(), valid_phys_addr_range()
>> and kexec_file_load()) to not require memblocks for hotplugged
>> memory. E.g., as hotplugged memory has no holes and can be identified
>> using !early_section(), arm64's variant of pfn_valid() could be reworked
>> fairly easily to not require memblocks for hotadded memory. powerpc might
>> be more involed.
> 
> I haven't checked these architectures but is the information really
> useful for this patch?

It might of interest for Arm64 people (cced below). I sent out s390x
patches to change that.

I can drop that part.

> 
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> with a minor nit
> 
>> -	/*
>> -	 * Add new range to memblock so that when hotadd_new_pgdat() is called
>> -	 * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
>> -	 * this new range and calculate total pages correctly.  The range will
>> -	 * be removed at hot-remove time.
>> -	 */
>> +#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
> 
> 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)
> 
> would be slightly nicer. This should work for all the ifedefs in this
> patch.

Yeah, will change if it compiles.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-21 12:35     ` David Hildenbrand
@ 2020-04-21 12:52       ` Michal Hocko
  2020-04-21 13:06         ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-21 12:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
> On 21.04.20 14:30, Michal Hocko wrote:
> > Sorry for the late reply
> > 
> > On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
> >> A hotadded node/pgdat will span no pages at all, until memory is moved to
> >> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> >> when onlining memory blocks. We don't have to initialize the
> >> node_start_pfn to the memory we are adding.
> > 
> > You are right that the node is empty at this phase but that is already
> > reflected by zero present pages (hmm, I do not see spanned pages to be
> > set 0 though). What I am missing here is why this is an improvement. The
> > new node is already visible here and I do not see why we hide the
> > information we already know.
> 
> "information we already know" - no, not before we online the memory.

Is this really the case? All add_memory_resource users operate on a
physical memory range.

> Before onlining, it's just setting node_start_pfn to *some value* to be
> overwritten in move_pfn_range_to_zone()->resize_pgdat_range().

Yes the value is overwritten but I am not sure this is actually correct
thing to do. I cannot remember why I've chosen to do that. It doesn't
really seem unlikely to online node in a higher physical address.

Btw. one thing that I have in my notes, I was never able to actually
test the no numa node case. Because I have always been testing with node
being allocated during the boot. Do you have any way to trigger this
path?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-21 12:52       ` Michal Hocko
@ 2020-04-21 13:06         ` David Hildenbrand
  2020-04-22  8:21           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-21 13:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On 21.04.20 14:52, Michal Hocko wrote:
> On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
>> On 21.04.20 14:30, Michal Hocko wrote:
>>> Sorry for the late reply
>>>
>>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
>>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
>>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
>>>> when onlining memory blocks. We don't have to initialize the
>>>> node_start_pfn to the memory we are adding.
>>>
>>> You are right that the node is empty at this phase but that is already
>>> reflected by zero present pages (hmm, I do not see spanned pages to be
>>> set 0 though). What I am missing here is why this is an improvement. The
>>> new node is already visible here and I do not see why we hide the
>>> information we already know.
>>
>> "information we already know" - no, not before we online the memory.
> 
> Is this really the case? All add_memory_resource users operate on a
> physical memory range.

Having the first add_memory() to magically set node_start_pfn of a hotplugged
node isn't dangerous, I think we agree on that. It's just completely
unnecessary here and at least left me confused why this is needed at all-
because the node start/end pfn is only really touched when
onlining/offlining memory (when resizing the zone and the pgdat).

> 
>> Before onlining, it's just setting node_start_pfn to *some value* to be
>> overwritten in move_pfn_range_to_zone()->resize_pgdat_range().
> 
> Yes the value is overwritten but I am not sure this is actually correct
> thing to do. I cannot remember why I've chosen to do that. It doesn't
> really seem unlikely to online node in a higher physical address.
> 

Well, we decided to glue the node span to onlining/offlining of memory.
So, the value really has no meaning without any of that memory being
online/the node span being 0.

> Btw. one thing that I have in my notes, I was never able to actually
> test the no numa node case. Because I have always been testing with node
> being allocated during the boot. Do you have any way to trigger this
> path?

Sure, here is my test case

#! /bin/bash
sudo qemu-system-x86_64 \
    --enable-kvm \
    -m 4G,maxmem=20G,slots=2 \
    -smp sockets=2,cores=2 \
    -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,mem=0G \
    -kernel /home/dhildenb/git/linux/arch/x86_64/boot/bzImage \
    -append "console=ttyS0 rd.shell rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 page_owner=on" \
    -initrd /boot/initramfs-5.4.7-200.fc31.x86_64.img \
    -machine pc \
    -nographic \
    -nodefaults \
    -chardev stdio,id=serial \
    -device isa-serial,chardev=serial \
    -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
    -mon chardev=monitor,mode=readline \
    -device virtio-balloon \
    -object memory-backend-ram,id=mem0,size=512M \
    -object memory-backend-ram,id=mem1,size=512M \
    -device pc-dimm,id=dimm0,memdev=mem0,node=1 \
    -device pc-dimm,id=dimm1,memdev=mem1,node=1

Instead of coldplugging the DIMMs to node 1, you could also hotplug them later
(let me know if you need information on how to do that). I use this test to
verify that the node is properly onlined/offlined once I unplug/replug the two
DIMMs (e.g., after onlining/offlining the memory blocks).

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-21 13:06         ` David Hildenbrand
@ 2020-04-22  8:21           ` Michal Hocko
  2020-04-22  8:32             ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-04-22  8:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On Tue 21-04-20 15:06:20, David Hildenbrand wrote:
> On 21.04.20 14:52, Michal Hocko wrote:
> > On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
> >> On 21.04.20 14:30, Michal Hocko wrote:
> >>> Sorry for the late reply
> >>>
> >>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
> >>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
> >>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> >>>> when onlining memory blocks. We don't have to initialize the
> >>>> node_start_pfn to the memory we are adding.
> >>>
> >>> You are right that the node is empty at this phase but that is already
> >>> reflected by zero present pages (hmm, I do not see spanned pages to be
> >>> set 0 though). What I am missing here is why this is an improvement. The
> >>> new node is already visible here and I do not see why we hide the
> >>> information we already know.
> >>
> >> "information we already know" - no, not before we online the memory.
> > 
> > Is this really the case? All add_memory_resource users operate on a
> > physical memory range.
> 
> Having the first add_memory() to magically set node_start_pfn of a hotplugged
> node isn't dangerous, I think we agree on that. It's just completely
> unnecessary here and at least left me confused why this is needed at all-
> because the node start/end pfn is only really touched when
> onlining/offlining memory (when resizing the zone and the pgdat).

I do not see any specific problem. It just feels odd to
ignore the start pfn when we have that information. I am little bit
worried that this might kick back. E.g. say we start using the memmaps
from the hotplugged memory then the initial part of the node will never
get online and we would have memmaps outside of the node span. I do not
see an immediate problem except for the feeling this is odd.

That being said I will shut up now and leave it alone.

[...]
> > Btw. one thing that I have in my notes, I was never able to actually
> > test the no numa node case. Because I have always been testing with node
> > being allocated during the boot. Do you have any way to trigger this
> > path?
> 
> Sure, here is my test case
> 
> #! /bin/bash
> sudo qemu-system-x86_64 \
>     --enable-kvm \
>     -m 4G,maxmem=20G,slots=2 \
>     -smp sockets=2,cores=2 \
>     -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,mem=0G \

I have been using a similar command line
NUMA_ONE_MEMORY_LESS_NODE="-numa node,mem=2G -numa node,mem=0G"
which gets appended to the qemu cmdline. I have always thought that this
would allocate pgdat for node 1 though. I have checked that again now
and dmesg doesn't point to node 1 anywhere.

Thanks!

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-22  8:21           ` Michal Hocko
@ 2020-04-22  8:32             ` David Hildenbrand
  2020-04-22 10:00               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-22  8:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On 22.04.20 10:21, Michal Hocko wrote:
> On Tue 21-04-20 15:06:20, David Hildenbrand wrote:
>> On 21.04.20 14:52, Michal Hocko wrote:
>>> On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
>>>> On 21.04.20 14:30, Michal Hocko wrote:
>>>>> Sorry for the late reply
>>>>>
>>>>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
>>>>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
>>>>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
>>>>>> when onlining memory blocks. We don't have to initialize the
>>>>>> node_start_pfn to the memory we are adding.
>>>>>
>>>>> You are right that the node is empty at this phase but that is already
>>>>> reflected by zero present pages (hmm, I do not see spanned pages to be
>>>>> set 0 though). What I am missing here is why this is an improvement. The
>>>>> new node is already visible here and I do not see why we hide the
>>>>> information we already know.
>>>>
>>>> "information we already know" - no, not before we online the memory.
>>>
>>> Is this really the case? All add_memory_resource users operate on a
>>> physical memory range.
>>
>> Having the first add_memory() to magically set node_start_pfn of a hotplugged
>> node isn't dangerous, I think we agree on that. It's just completely
>> unnecessary here and at least left me confused why this is needed at all-
>> because the node start/end pfn is only really touched when
>> onlining/offlining memory (when resizing the zone and the pgdat).
> 
> I do not see any specific problem. It just feels odd to
> ignore the start pfn when we have that information. I am little bit
> worried that this might kick back. E.g. say we start using the memmaps
> from the hotplugged memory then the initial part of the node will never> get online and we would have memmaps outside of the node span. I do not

That's a general issue, which I pointed out as response to Oscars last
series. This needs more thought and reworks, especially how
node_start_pfn/node_spanned_pages are glued to memory onlining/offlining
today.

> see an immediate problem except for the feeling this is odd.

I think it's inconsistent. E.g., start with memory-less/cpu-less node
and don't online memory from the kernel immediately.

Hotplug CPU. PGDAT initialized with node_start_pfn=0. Hotplug memory.
-> node_start_pfn=0 until memory is actually onlined.

Hotplug memory. PGDAT initialized with node_start_pfn=$VALUE. Hotplug CPU.
-> node_start_pfn=$VALUE

Hotplug memory. PGDAT initialized with node_start_pfn=$VALUE. Hotplug
CPU. Hotunplug memory.
-> node_start_pfn=$VALUE, although there is no memory anymore.

Hotplug memory 1. PGDAT initialized with node_start_pfn=$VALUE. Hotplug
memory 2. Hotunplug memory 2.
-> node_start_pfn=$VALUE1 instead of $VALUE2.


Again, because node_start_pfn has absolutely no meaning until memory is
actually onlined - today.

> 
> That being said I will shut up now and leave it alone.

Is that a nack?

Thanks for having a look!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn
  2020-04-22  8:32             ` David Hildenbrand
@ 2020-04-22 10:00               ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2020-04-22 10:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Baoquan He,
	Oscar Salvador, Pankaj Gupta

On Wed 22-04-20 10:32:32, David Hildenbrand wrote:
> On 22.04.20 10:21, Michal Hocko wrote:
> > On Tue 21-04-20 15:06:20, David Hildenbrand wrote:
> >> On 21.04.20 14:52, Michal Hocko wrote:
> >>> On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
> >>>> On 21.04.20 14:30, Michal Hocko wrote:
> >>>>> Sorry for the late reply
> >>>>>
> >>>>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
> >>>>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
> >>>>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> >>>>>> when onlining memory blocks. We don't have to initialize the
> >>>>>> node_start_pfn to the memory we are adding.
> >>>>>
> >>>>> You are right that the node is empty at this phase but that is already
> >>>>> reflected by zero present pages (hmm, I do not see spanned pages to be
> >>>>> set 0 though). What I am missing here is why this is an improvement. The
> >>>>> new node is already visible here and I do not see why we hide the
> >>>>> information we already know.
> >>>>
> >>>> "information we already know" - no, not before we online the memory.
> >>>
> >>> Is this really the case? All add_memory_resource users operate on a
> >>> physical memory range.
> >>
> >> Having the first add_memory() to magically set node_start_pfn of a hotplugged
> >> node isn't dangerous, I think we agree on that. It's just completely
> >> unnecessary here and at least left me confused why this is needed at all-
> >> because the node start/end pfn is only really touched when
> >> onlining/offlining memory (when resizing the zone and the pgdat).
> > 
> > I do not see any specific problem. It just feels odd to
> > ignore the start pfn when we have that information. I am little bit
> > worried that this might kick back. E.g. say we start using the memmaps
> > from the hotplugged memory then the initial part of the node will never> get online and we would have memmaps outside of the node span. I do not
> 
> That's a general issue, which I pointed out as response to Oscars last
> series. This needs more thought and reworks, especially how
> node_start_pfn/node_spanned_pages are glued to memory onlining/offlining
> today.
> 
> > see an immediate problem except for the feeling this is odd.
> 
> I think it's inconsistent. E.g., start with memory-less/cpu-less node
> and don't online memory from the kernel immediately.
> 
> Hotplug CPU. PGDAT initialized with node_start_pfn=0. Hotplug memory.
> -> node_start_pfn=0 until memory is actually onlined.
> 
> Hotplug memory. PGDAT initialized with node_start_pfn=$VALUE. Hotplug CPU.
> -> node_start_pfn=$VALUE
> 
> Hotplug memory. PGDAT initialized with node_start_pfn=$VALUE. Hotplug
> CPU. Hotunplug memory.
> -> node_start_pfn=$VALUE, although there is no memory anymore.
> 
> Hotplug memory 1. PGDAT initialized with node_start_pfn=$VALUE. Hotplug
> memory 2. Hotunplug memory 2.
> -> node_start_pfn=$VALUE1 instead of $VALUE2.
> 
> 
> Again, because node_start_pfn has absolutely no meaning until memory is
> actually onlined - today.
> 
> > 
> > That being said I will shut up now and leave it alone.
> 
> Is that a nack?

No it's not. Nor I am going to ack this but I will not stand in the
way. I would just urge to have as many assumptions you are making and as
much information in the changelog as possible.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-22 10:01 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-16 10:47 [PATCH RFC 0/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
2020-04-16 10:47 ` [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn David Hildenbrand
2020-04-16 14:11   ` Pankaj Gupta
2020-04-21 12:30   ` Michal Hocko
2020-04-21 12:35     ` David Hildenbrand
2020-04-21 12:52       ` Michal Hocko
2020-04-21 13:06         ` David Hildenbrand
2020-04-22  8:21           ` Michal Hocko
2020-04-22  8:32             ` David Hildenbrand
2020-04-22 10:00               ` Michal Hocko
2020-04-16 10:47 ` [PATCH RFC 2/2] mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK David Hildenbrand
2020-04-16 17:09   ` Mike Rapoport
2020-04-21 12:39   ` Michal Hocko
2020-04-21 12:41     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).