From: Dou Liyang <douly.fnst@cn.fujitsu.com>
To: cl@linux.com, tj@kernel.org, mika.j.penttila@gmail.com,
	mingo@redhat.com, akpm@linux-foundation.org, rjw@rjwysocki.net,
	hpa@zytor.com, yasu.isimatu@gmail.com,
	isimatu.yasuaki@jp.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
	izumi.taku@jp.fujitsu.com, gongzhaogang@inspur.com,
	len.brown@intel.com, lenb@kernel.org, tglx@linutronix.de,
	chen.tang@easystack.cn, rafael@kernel.org
Cc: x86@kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Tang Chen <tangchen@cn.fujitsu.com>,
	Zhu Guihua <zhugh.fnst@cn.fujitsu.com>,
	Dou Liyang <douly.fnst@cn.fujitsu.com>
Subject: [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
Date: Tue, 19 Jul 2016 15:28:02 +0800
Message-ID: <1468913288-16605-2-git-send-email-douly.fnst@cn.fujitsu.com>
In-Reply-To: <1468913288-16605-1-git-send-email-douly.fnst@cn.fujitsu.com>

From: Tang Chen <tangchen@cn.fujitsu.com>

Currently, x86 does not support memory-less nodes. A node without
memory is not onlined, and the cpus on it are mapped to other online
nodes with memory in init_cpu_to_node(). The reason for doing this is
to ensure that each cpu is mapped to a node with memory, so that local
memory can be allocated for that cpu.

But we don't have to do it this way.

In this series of patches, we are going to construct the cpu <-> node
mapping for all possible cpus at boot time, which is a 1-1 mapping.
This means each cpu will be mapped to the node it belongs to, and the
mapping will never change. If a node has only cpus but no memory, the
cpus on it will be mapped to a memory-less node, and that memory-less
node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them
at boot time. Zonelists are then built for these nodes. As a result,
when cpus on these memory-less nodes try to allocate memory from the
local node, the allocation will automatically fall back to the proper
zones in the zonelists.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
+	unsigned long zones_size[MAX_NR_ZONES] = {0};
+	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-	for_each_online_node(n) {
-		val = node_distance(node, n);
+	/* Allocate and initialize node data. Memory-less node is now online.*/
+	alloc_node_data(nid);
+	free_area_init_node(nid, zones_size, 0, zholes_size);
 
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	return best_node;
+	/*
+	 * All zonelists will be built later in start_kernel() after per cpu
+	 * areas are initialized.
+	 */
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
 		if (node == NUMA_NO_NODE)
 			continue;
+
 		if (!node_online(node))
-			node = find_near_online_node(node);
+			init_memory_less_node(node);
+
 		numa_set_node(cpu, node);
 	}
 }
-- 
2.5.5
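
The difference between the old and new behaviour described above can be illustrated with a small user-space sketch. This is not kernel code and not the kernel's zonelist algorithm: the three-node topology, the distance table, and the names NR_NODES, NO_NODE, has_memory, nearest_node_with_memory() and build_fallback_order() are all hypothetical, chosen only for illustration. The first function mirrors the loop of the removed find_near_online_node(), assuming that "online" meant "has memory". The second shows, in simplified form, the idea of a per-node fallback order that lets a cpu stay mapped to its own memory-less node.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 3	/* hypothetical topology size */
#define NO_NODE  (-1)

/* Hypothetical node distance table; 10 means "local". */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 15 },
	{ 30, 15, 10 },
};

/* Assumption for this sketch: node 1 has cpus but no memory. */
static const bool has_memory[NR_NODES] = { true, false, true };

/*
 * Old behaviour (sketch): remap the cpu to the nearest node that has
 * memory. Same strict "<" comparison as the removed kernel loop, so
 * ties go to the lowest node id.
 */
static int nearest_node_with_memory(int nid)
{
	int best = NO_NODE;
	int min_val = INT_MAX;

	assert(nid >= 0 && nid < NR_NODES);
	for (int n = 0; n < NR_NODES; n++) {
		if (!has_memory[n])
			continue;
		if (distance[nid][n] < min_val) {
			min_val = distance[nid][n];
			best = n;
		}
	}
	return best;
}

/*
 * New behaviour (simplified sketch): the cpu keeps its own node id, and
 * allocations walk a fallback list of nodes with memory, ordered by
 * increasing distance from nid. Returns the number of entries written.
 */
static int build_fallback_order(int nid, int order[NR_NODES])
{
	bool used[NR_NODES] = { false };
	int count = 0;

	assert(nid >= 0 && nid < NR_NODES);
	for (;;) {
		int best = NO_NODE;
		int min_val = INT_MAX;

		/* Pick the closest node with memory not yet in the list. */
		for (int n = 0; n < NR_NODES; n++) {
			if (!has_memory[n] || used[n])
				continue;
			if (distance[nid][n] < min_val) {
				min_val = distance[nid][n];
				best = n;
			}
		}
		if (best == NO_NODE)
			break;
		used[best] = true;
		order[count++] = best;
	}
	return count;
}
```

With this table, nearest_node_with_memory(1) selects node 2, which is where the old code would have remapped the cpus of node 1. build_fallback_order(1, order) yields node 2 followed by node 0, so the cpus can remain on node 1 while their allocations are still served from the closest memory first.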
Thread overview: 33+ messages
2016-07-19  7:28 [PATCH v8 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
2016-07-19  7:28 ` [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang [this message]
2016-07-19 18:50 ` Tejun Heo
2016-07-20  1:52 ` Dou Liyang
2016-07-20  2:28 ` Dou Liyang
2016-07-19  7:28 ` [PATCH v8 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time Dou Liyang
2016-07-19  7:28 ` [PATCH v8 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
2016-07-19  7:28 ` [PATCH v8 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
2016-07-19  7:28 ` [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
2016-07-19 20:06 ` Rafael J. Wysocki
2016-07-20  1:25 ` Dou Liyang
2016-07-19  7:28 ` [PATCH v8 6/7] Provide the mechanism to validate processors in the ACPI tables Dou Liyang
2016-07-19  7:28 ` [PATCH v8 7/7] Provide the interface to validate the proc_id which they give Dou Liyang
2016-07-19 18:53 ` Tejun Heo
2016-07-20  0:55 ` Dou Liyang