All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	qiuxishi@huawei.com, Kani Toshimitsu <toshi.kani@hpe.com>,
	slaoub@gmail.com, Joonsoo Kim <js1304@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Zhang Zhen <zhenzhang.zhang@huawei.com>,
	David Rientjes <rientjes@google.com>,
	Daniel Kiper <daniel.kiper@oracle.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Dan Williams <dan.j.williams@gmail.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 0/6] mm: make movable onlining suck less
Date: Thu, 6 Apr 2017 13:07:47 +0200	[thread overview]
Message-ID: <20170406110747.GJ5497@dhcp22.suse.cz> (raw)
In-Reply-To: <20170405210214.GX6035@dhcp22.suse.cz>

On Wed 05-04-17 23:02:14, Michal Hocko wrote:
[...]
> OK, I was staring into the code and I guess I finally understand what is
> going on here. Looking at arch_add_memory->...->register_mem_sect_under_node
> was just misleading. I am still not 100% sure why but we try to do the
> same thing later from register_one_node->link_mem_sections for nodes
> which were offline. I should have noticed this path before. And here
> is the difference from the previous code. We are past arch_add_memory
> and that path used to do __add_zone which among other things will also
> resize node boundaries. I am not doing that anymore because I postpone
> that to the onlining phase. Jeez this code is so convoluted my head
> spins.
> 
> I am not really sure how to fix this. I suspect register_mem_sect_under_node
> should just ignore the online state of the node. But I wouldn't
> be all that surprised if this had some subtle reason as well. An
> alternative would be to actually move register_mem_sect_under_node out
> of register_new_memory and move it up the call stack, most probably to
> add_memory_resource. We have the range and can map it to the memblock
> and so will not rely on the node range. I will sleep over it and
> hopefully come up with something tomorrow.

OK, so this is the most sensible way I was able to come up with. I
didn't get to test it yet but from the above analysis it should work.
---
>From 6c99a3284ea70262e3f25cbe71826a57aeaa7ffd Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 6 Apr 2017 11:59:37 +0200
Subject: [PATCH] mm, memory_hotplug: split up register_one_node

Memory hotplug (add_memory_resource) has to reinitialize node
infrastructure if the node is offline (one which went through the
complete add_memory(); remove_memory() cycle). That involves node
registration to the kobj infrastructure (register_node), the proper
association with cpus (register_cpu_under_node) and finally creation of
node<->memblock symlinks (link_mem_sections).

The last part requires to know node_start_pfn and node_spanned_pages
which we currently have but a leter patch will postpone this
initialization to the onlining phase which happens later. In fact
we do not need to rely on the early initialization even now because
we know which range is currently hot added.

Split register_one_node into core which does all the common work for
the boot time NUMA initialization and the hotplug (__register_one_node).
register_one_node keeps the full initialization while hotplug calls
__register_one_node and manually calls link_mem_sections for the proper
range.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/node.c  | 51 ++++++++++++++++++++-------------------------------
 include/linux/node.h | 35 ++++++++++++++++++++++++++++++++++-
 mm/memory_hotplug.c  | 17 ++++++++++++++++-
 3 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 06294d69779b..dff5b53f7905 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -461,10 +461,9 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 	return 0;
 }
 
-static int link_mem_sections(int nid)
+int link_mem_sections(int nid, unsigned long start_pfn, unsigned long nr_pages)
 {
-	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
-	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+	unsigned long end_pfn = start_pfn + nr_pages;
 	unsigned long pfn;
 	struct memory_block *mem_blk = NULL;
 	int err = 0;
@@ -552,10 +551,7 @@ static int node_memory_callback(struct notifier_block *self,
 	return NOTIFY_OK;
 }
 #endif	/* CONFIG_HUGETLBFS */
-#else	/* !CONFIG_MEMORY_HOTPLUG_SPARSE */
-
-static int link_mem_sections(int nid) { return 0; }
-#endif	/* CONFIG_MEMORY_HOTPLUG_SPARSE */
+#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
 
 #if !defined(CONFIG_MEMORY_HOTPLUG_SPARSE) || \
     !defined(CONFIG_HUGETLBFS)
@@ -569,39 +565,32 @@ static void init_node_hugetlb_work(int nid) { }
 
 #endif
 
-int register_one_node(int nid)
+int __register_one_node(int nid)
 {
-	int error = 0;
+	int p_node = parent_node(nid);
+	struct node *parent = NULL;
+	int error;
 	int cpu;
 
-	if (node_online(nid)) {
-		int p_node = parent_node(nid);
-		struct node *parent = NULL;
-
-		if (p_node != nid)
-			parent = node_devices[p_node];
-
-		node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL);
-		if (!node_devices[nid])
-			return -ENOMEM;
-
-		error = register_node(node_devices[nid], nid, parent);
+	if (p_node != nid)
+		parent = node_devices[p_node];
 
-		/* link cpu under this node */
-		for_each_present_cpu(cpu) {
-			if (cpu_to_node(cpu) == nid)
-				register_cpu_under_node(cpu, nid);
-		}
+	node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL);
+	if (!node_devices[nid])
+		return -ENOMEM;
 
-		/* link memory sections under this node */
-		error = link_mem_sections(nid);
+	error = register_node(node_devices[nid], nid, parent);
 
-		/* initialize work queue for memory hot plug */
-		init_node_hugetlb_work(nid);
+	/* link cpu under this node */
+	for_each_present_cpu(cpu) {
+		if (cpu_to_node(cpu) == nid)
+			register_cpu_under_node(cpu, nid);
 	}
 
-	return error;
+	/* initialize work queue for memory hot plug */
+	init_node_hugetlb_work(nid);
 
+	return error;
 }
 
 void unregister_one_node(int nid)
diff --git a/include/linux/node.h b/include/linux/node.h
index 2115ad5d6f19..2baa640d0b92 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -30,9 +30,38 @@ struct memory_block;
 extern struct node *node_devices[];
 typedef  void (*node_registration_func_t)(struct node *);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+extern int link_mem_sections(int nid, unsigned long start_pfn, unsigned long nr_pages);
+#else
+static int link_mem_sections(int nid, unsigned long start_pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+#endif
+
 extern void unregister_node(struct node *node);
 #ifdef CONFIG_NUMA
-extern int register_one_node(int nid);
+/* Core of the node registration - only memory hotplug should use this */
+extern int __register_one_node(int nid);
+
+/* Registers an online node */
+static inline int register_one_node(int nid)
+{
+	int error = 0;
+
+	if (node_online(nid)) {
+		struct pglist_data *pgdat = NODE_DATA(nid);
+
+		error = __register_one_node(nid);
+		if (error)
+			return error;
+		/* link memory sections under this node */
+		error = link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages);
+	}
+
+	return error;
+}
+
 extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
@@ -46,6 +75,10 @@ extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
 					 node_registration_func_t unregister);
 #endif
 #else
+static inline int __register_one_node(int nid)
+{
+	return 0;
+}
 static inline int register_one_node(int nid)
 {
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c2b018c808b7..2c731bdfa845 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1220,7 +1220,22 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	node_set_online(nid);
 
 	if (new_node) {
-		ret = register_one_node(nid);
+		unsigned long start_pfn = start >> PAGE_SHIFT;
+		unsigned long nr_pages = size >> PAGE_SHIFT;
+
+		ret = __register_one_node(nid);
+		if (ret)
+			goto register_fail;
+
+		/*
+		 * link memory sections under this node. This is already
+		 * done when creatig memory section in register_new_memory
+		 * but that depends to have the node registered so offline
+		 * nodes have to go through register_node.
+		 * TODO clean up this mess.
+		 */
+		ret = link_mem_sections(nid, start_pfn, nr_pages);
+register_fail:
 		/*
 		 * If sysfs file of new node can't create, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	qiuxishi@huawei.com, Kani Toshimitsu <toshi.kani@hpe.com>,
	slaoub@gmail.com, Joonsoo Kim <js1304@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Zhang Zhen <zhenzhang.zhang@huawei.com>,
	David Rientjes <rientjes@google.com>,
	Daniel Kiper <daniel.kiper@oracle.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Dan Williams <dan.j.williams@gmail.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 0/6] mm: make movable onlining suck less
Date: Thu, 6 Apr 2017 13:07:47 +0200	[thread overview]
Message-ID: <20170406110747.GJ5497@dhcp22.suse.cz> (raw)
In-Reply-To: <20170405210214.GX6035@dhcp22.suse.cz>

On Wed 05-04-17 23:02:14, Michal Hocko wrote:
[...]
> OK, I was staring into the code and I guess I finally understand what is
> going on here. Looking at arch_add_memory->...->register_mem_sect_under_node
> was just misleading. I am still not 100% sure why but we try to do the
> same thing later from register_one_node->link_mem_sections for nodes
> which were offline. I should have noticed this path before. And here
> is the difference from the previous code. We are past arch_add_memory
> and that path used to do __add_zone which among other things will also
> resize node boundaries. I am not doing that anymore because I postpone
> that to the onlining phase. Jeez this code is so convoluted my head
> spins.
> 
> I am not really sure how to fix this. I suspect register_mem_sect_under_node
> should just ignore the online state of the node. But I wouldn't
> be all that surprised if this had some subtle reason as well. An
> alternative would be to actually move register_mem_sect_under_node out
> of register_new_memory and move it up the call stack, most probably to
> add_memory_resource. We have the range and can map it to the memblock
> and so will not rely on the node range. I will sleep over it and
> hopefully come up with something tomorrow.

OK, so this is the most sensible way I was able to come up with. I
didn't get to test it yet but from the above analysis it should work.
---

  reply	other threads:[~2017-04-06 11:08 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-30 11:54 [PATCH 0/6] mm: make movable onlining suck less Michal Hocko
2017-03-30 11:54 ` Michal Hocko
2017-03-30 11:54 ` [PATCH 1/6] mm: get rid of zone_is_initialized Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-31  3:39   ` Hillf Danton
2017-03-31  3:39     ` Hillf Danton
2017-03-31  6:43     ` Michal Hocko
2017-03-31  6:43       ` Michal Hocko
2017-03-31  6:48       ` Michal Hocko
2017-03-31  6:48         ` Michal Hocko
2017-03-31  7:39   ` [PATCH v1 " Michal Hocko
2017-03-31  7:39     ` Michal Hocko
2017-04-05  8:14     ` Michal Hocko
2017-04-05  8:14       ` Michal Hocko
2017-04-05  9:06       ` Igor Mammedov
2017-04-05  9:06         ` Igor Mammedov
2017-04-05  9:23         ` Michal Hocko
2017-04-05  9:23           ` Michal Hocko
2017-03-30 11:54 ` [PATCH 2/6] mm, tile: drop arch_{add,remove}_memory Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-30 15:41   ` Chris Metcalf
2017-03-30 15:41     ` Chris Metcalf
2017-03-30 11:54 ` [PATCH 3/6] mm: remove return value from init_currently_empty_zone Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-31  3:49   ` Hillf Danton
2017-03-31  3:49     ` Hillf Danton
2017-03-31  6:49     ` Michal Hocko
2017-03-31  6:49       ` Michal Hocko
2017-03-31  7:06       ` Hillf Danton
2017-03-31  7:06         ` Hillf Danton
2017-03-31  7:18         ` Michal Hocko
2017-03-31  7:18           ` Michal Hocko
2017-03-31  7:43   ` Michal Hocko
2017-03-31  7:43     ` Michal Hocko
2017-04-03 21:22   ` Reza Arbab
2017-04-03 21:22     ` Reza Arbab
2017-04-04  7:30     ` Michal Hocko
2017-04-04  7:30       ` Michal Hocko
2017-03-30 11:54 ` [PATCH 4/6] mm, memory_hotplug: use node instead of zone in can_online_high_movable Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-30 11:54 ` [PATCH 5/6] mm, memory_hotplug: do not associate hotadded memory to zones until online Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-31  6:18   ` Hillf Danton
2017-03-31  6:18     ` Hillf Danton
2017-03-31  6:50     ` Michal Hocko
2017-03-31  6:50       ` Michal Hocko
2017-04-04 12:21   ` Tobias Regnery
2017-04-04 12:21     ` Tobias Regnery
2017-04-04 12:45     ` Michal Hocko
2017-04-04 12:45       ` Michal Hocko
2017-04-06  8:14   ` Michal Hocko
2017-04-06  8:14     ` Michal Hocko
2017-04-06 12:46   ` Michal Hocko
2017-04-06 12:46     ` Michal Hocko
2017-03-30 11:54 ` [PATCH 6/6] mm, memory_hotplug: remove unused cruft after memory hotplug rework Michal Hocko
2017-03-30 11:54   ` Michal Hocko
2017-03-31  7:46   ` Michal Hocko
2017-03-31  7:46     ` Michal Hocko
2017-03-31 19:19 ` [PATCH 0/6] mm: make movable onlining suck less Heiko Carstens
2017-03-31 19:19   ` Heiko Carstens
2017-04-03  7:34   ` Michal Hocko
2017-04-03  7:34     ` Michal Hocko
2017-04-03 11:55 ` Michal Hocko
2017-04-03 11:55   ` Michal Hocko
2017-04-03 12:20   ` Igor Mammedov
2017-04-03 12:20     ` Igor Mammedov
2017-04-03 19:58   ` Reza Arbab
2017-04-03 19:58     ` Reza Arbab
2017-04-03 20:23     ` Michal Hocko
2017-04-03 20:23       ` Michal Hocko
2017-04-03 20:42       ` Reza Arbab
2017-04-03 20:42         ` Reza Arbab
2017-04-04  7:23         ` Michal Hocko
2017-04-04  7:23           ` Michal Hocko
2017-04-04  7:34           ` Michal Hocko
2017-04-04  7:34             ` Michal Hocko
2017-04-04  8:23             ` Michal Hocko
2017-04-04  8:23               ` Michal Hocko
2017-04-04 15:59               ` Reza Arbab
2017-04-04 15:59                 ` Reza Arbab
2017-04-04 16:02               ` Reza Arbab
2017-04-04 16:02                 ` Reza Arbab
2017-04-04 16:44                 ` Michal Hocko
2017-04-04 16:44                   ` Michal Hocko
2017-04-04 18:30                   ` Reza Arbab
2017-04-04 18:30                     ` Reza Arbab
2017-04-04 19:41                     ` Michal Hocko
2017-04-04 19:41                       ` Michal Hocko
2017-04-04 21:43                       ` Reza Arbab
2017-04-04 21:43                         ` Reza Arbab
2017-04-05  6:42                         ` Michal Hocko
2017-04-05  6:42                           ` Michal Hocko
2017-04-05  9:24                           ` Michal Hocko
2017-04-05  9:24                             ` Michal Hocko
2017-04-05 14:53                             ` Reza Arbab
2017-04-05 14:53                               ` Reza Arbab
2017-04-05 15:42                               ` Michal Hocko
2017-04-05 15:42                                 ` Michal Hocko
2017-04-05 17:32                                 ` Reza Arbab
2017-04-05 17:32                                   ` Reza Arbab
2017-04-05 18:15                                   ` Michal Hocko
2017-04-05 18:15                                     ` Michal Hocko
2017-04-05 19:39                                     ` Michal Hocko
2017-04-05 19:39                                       ` Michal Hocko
2017-04-05 21:02                                     ` Michal Hocko
2017-04-05 21:02                                       ` Michal Hocko
2017-04-06 11:07                                       ` Michal Hocko [this message]
2017-04-06 11:07                                         ` Michal Hocko
2017-04-05 15:48                           ` Reza Arbab
2017-04-05 15:48                             ` Reza Arbab
2017-04-05 16:34                             ` Michal Hocko
2017-04-05 16:34                               ` Michal Hocko
2017-04-05 20:55                               ` Reza Arbab
2017-04-05 20:55                                 ` Reza Arbab
2017-04-06  9:25                               ` Michal Hocko
2017-04-06  9:25                                 ` Michal Hocko
2017-04-05 13:52                         ` Michal Hocko
2017-04-05 13:52                           ` Michal Hocko
2017-04-05 15:23                           ` Reza Arbab
2017-04-05 15:23                             ` Reza Arbab
2017-04-05  6:36                       ` Michal Hocko
2017-04-05  6:36                         ` Michal Hocko
2017-04-06 13:08 ` Michal Hocko
2017-04-06 13:08   ` Michal Hocko
2017-04-06 15:24   ` Reza Arbab
2017-04-06 15:24     ` Reza Arbab
2017-04-06 15:41     ` Michal Hocko
2017-04-06 15:41       ` Michal Hocko
2017-04-06 15:46       ` Reza Arbab
2017-04-06 15:46         ` Reza Arbab
2017-04-06 16:21         ` Michal Hocko
2017-04-06 16:21           ` Michal Hocko
2017-04-06 16:24           ` Mel Gorman
2017-04-06 16:24             ` Mel Gorman
2017-04-06 16:55           ` Mel Gorman
2017-04-06 16:55             ` Mel Gorman
2017-04-06 17:12             ` Michal Hocko
2017-04-06 17:12               ` Michal Hocko
2017-04-06 17:46               ` Mel Gorman
2017-04-06 17:46                 ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170406110747.GJ5497@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arbab@linux.vnet.ibm.com \
    --cc=cmetcalf@mellanox.com \
    --cc=dan.j.williams@gmail.com \
    --cc=daniel.kiper@oracle.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=imammedo@redhat.com \
    --cc=js1304@gmail.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=qiuxishi@huawei.com \
    --cc=rientjes@google.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=slaoub@gmail.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=toshi.kani@hpe.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=yasu.isimatu@gmail.com \
    --cc=zhenzhang.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.