linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE.
@ 2013-05-24  9:29 Tang Chen
  2013-05-24  9:29 ` [PATCH v3 01/13] x86: get pg_data_t's memory from other node Tang Chen
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

In a memory hotplug situation, the hotpluggable memory should be
arranged in ZONE_MOVABLE, because memory in ZONE_NORMAL may be
used by the kernel, and Linux cannot migrate pages used by the kernel.

So we need a way to specify hotpluggable memory as movable. It
should be as easy as possible.

According to the ACPI 5.0 spec, the SRAT table has a memory affinity
structure, and the structure has a Hot Pluggable Field.
See "5.2.16.2 Memory Affinity Structure".

If we use this information, firmware can tell the kernel which memory
is hotpluggable. For example, if the Hot Pluggable Field is enabled,
the kernel sets the memory as movable memory.

To achieve this goal, we need to do the following:
1. Prevent memblock from allocating hotpluggable memory for the kernel.
   This is done by reserving hotpluggable memory in memblock with the
   following steps:
   1) Parse SRAT early enough so that memblock knows which memory
      is hotpluggable.
   2) Add a "flags" member to memblock so that it is able to tell
      which memory is hotpluggable when freeing it to buddy.

2. Free hotpluggable memory to buddy system when memory initialization
   is done.

3. Arrange hotpluggable memory in ZONE_MOVABLE.
   (This will cause NUMA performance to decrease.)

4. Provide a user interface to enable/disable this functionality.
   (This is useful for those who don't use memory hotplug and who don't
    want to lose their NUMA performance.)


This patch-set does the following:
patch1:        Fix a little problem.
patch2:        Have Hot-Pluggable Field in SRAT printed when parsing SRAT.
patch4,5:      Introduce hotpluggable field to numa_meminfo.
patch6,7:      Introduce flags to memblock, and keep the public APIs prototype
               unmodified.
patch8,9:      Reserve node-life-cycle memory as MEMBLK_LOCAL_NODE with memblock.
patch10,11:    Reserve hotpluggable memory as MEMBLK_HOTPLUGGABLE with memblock,
               and free it to buddy when memory initialization is done.
patch3,12,13:  Improve "movablecore" boot option to support "movablecore=acpi".


Change log v2 -> v3:
1. As Chen Gong <gong.chen@linux.intel.com> noticed, memblock_alloc_try_nid()
   will call panic() if it fails to allocate memory, so remove the return
   value check in setup_node_data() in patch1.
2. Did not move find_usable_zone_for_movable() forward
   to initialize movable_zone. Fixed in patch12.
3. Did not transform reserved->regions[i].base to its PFN 
   in find_zone_movable_pfns_for_nodes(). Fixed in patch12.

Change log v1 -> v2:
1. Fix a bug in patch10: forgot to update start and end value.
2. Add new patch8: make alloc_low_pages be able to call
   memory_add_physaddr_to_nid().


This patch-set is based on Yinghai's
"x86, ACPI, numa: Parse numa info early" patch-set.
Please refer to:
v1: https://lkml.org/lkml/2013/3/7/642
v2: https://lkml.org/lkml/2013/3/10/47
v3: https://lkml.org/lkml/2013/4/4/639
v4: https://lkml.org/lkml/2013/4/11/829

And Yinghai's patch-set did the following things:
1) Parse SRAT early enough.
2) Allocate pagetable pages in the local node.

Tang Chen (12):
  acpi: Print Hot-Pluggable Field in SRAT.
  page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using
    SRAT.
  x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct
    numa_meminfo.
  x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup
    numa_meminfo.
  memblock, numa: Introduce flag into memblock.
  x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  x86, numa: Move memory_add_physaddr_to_nid() to CONFIG_NUMA.
  x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve
    node-life-cycle data.
  x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark
    and reserve hotpluggable memory.
  x86, memblock, mem-hotplug: Free hotpluggable memory reserved by
    memblock.
  x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher
    priority.
  doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot
    option.

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |    8 ++
 arch/x86/include/asm/numa.h         |    3 +-
 arch/x86/kernel/apic/numaq_32.c     |    2 +-
 arch/x86/mm/amdtopology.c           |    3 +-
 arch/x86/mm/init.c                  |   16 +++-
 arch/x86/mm/numa.c                  |   67 +++++++++++++++----
 arch/x86/mm/numa_internal.h         |    1 +
 arch/x86/mm/srat.c                  |   11 ++-
 include/linux/memblock.h            |   16 +++++
 include/linux/memory_hotplug.h      |    3 +
 mm/memblock.c                       |  127 ++++++++++++++++++++++++++++++----
 mm/nobootmem.c                      |    3 +
 mm/page_alloc.c                     |   44 ++++++++++++-
 13 files changed, 262 insertions(+), 42 deletions(-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 01/13] x86: get pg_data_t's memory from other node
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 02/13] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system can create a movable node, in which all memory of the
node is allocated as ZONE_MOVABLE, setup_node_data() cannot
allocate memory for the node's pg_data_t.
So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
to retry on other nodes when the first allocation fails.

As noticed by Chen Gong <gong.chen@linux.intel.com>, memblock_alloc_try_nid()
will call panic() if it fails to allocate memory. So we don't need to
check the return value.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/mm/numa.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 11acdf6..af18b18 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -214,12 +214,7 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data.  Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
-	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in node %d\n",
-		       nd_size, nid);
-		return;
-	}
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 	nd = __va(nd_pa);
 
 	/* report and initialize */
-- 
1.7.1



* [PATCH v3 02/13] acpi: Print Hot-Pluggable Field in SRAT.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
  2013-05-24  9:29 ` [PATCH v3 01/13] x86: get pg_data_t's memory from other node Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 03/13] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

The Hot-Pluggable field in SRAT indicates whether the memory could be
hotplugged while the system is running. Printing it as well when
parsing SRAT will help users know which memory is hotpluggable.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/srat.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 443f9ef..5055fa7 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -146,6 +146,7 @@ int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
 	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	       (unsigned long long) start, (unsigned long long) end - 1,
+	       hotpluggable ? "Hot Pluggable" : "");
 
 	return 0;
 out_err_bad_srat:
-- 
1.7.1



* [PATCH v3 03/13] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
  2013-05-24  9:29 ` [PATCH v3 01/13] x86: get pg_data_t's memory from other node Tang Chen
  2013-05-24  9:29 ` [PATCH v3 02/13] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 04/13] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo Tang Chen
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

The Hot-Pluggable Field in SRAT specifies which memory ranges are hotpluggable.
We will arrange hotpluggable memory as ZONE_MOVABLE for users who want to use
memory hotplug functionality. But this will decrease NUMA performance
because the kernel cannot use ZONE_MOVABLE.

So we improve movablecore boot option to allow those who want to use memory
hotplug functionality to enable using SRAT info to arrange movable memory.

Users can specify "movablecore=acpi" in kernel commandline to enable this
functionality.

For those who don't use memory hotplug or who don't want to lose their NUMA
performance, just don't specify anything. The kernel will work as before.

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 include/linux/memory_hotplug.h |    3 +++
 mm/page_alloc.c                |   13 +++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index b6a3be7..18fe2a3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+/* Enable/disable SRAT in movablecore boot option */
+extern bool movablecore_enable_srat;
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f368db4..b9ea143 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,8 @@ static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
 
+bool __initdata movablecore_enable_srat = false;
+
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
 EXPORT_SYMBOL(movable_zone);
@@ -5025,6 +5027,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	}
 }
 
+static void __init cmdline_movablecore_srat(char *p)
+{
+	if (p && !strcmp(p, "acpi"))
+		movablecore_enable_srat = true;
+}
+
 static int __init cmdline_parse_core(char *p, unsigned long *core)
 {
 	unsigned long long coremem;
@@ -5055,6 +5063,11 @@ static int __init cmdline_parse_kernelcore(char *p)
  */
 static int __init cmdline_parse_movablecore(char *p)
 {
+	cmdline_movablecore_srat(p);
+
+	if (movablecore_enable_srat)
+		return 0;
+
 	return cmdline_parse_core(p, &required_movablecore);
 }
 
-- 
1.7.1



* [PATCH v3 04/13] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (2 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 03/13] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 05/13] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo Tang Chen
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

Since Yinghai has implemented "Allocate pagetable pages in local node", for a
node with hotpluggable memory, we have to allocate pagetable pages first, and
then reserve the rest as hotpluggable memory in memblock.

But the kernel parses SRAT first, and then initializes the memory mapping. So
we have to remember which memory ranges are hotpluggable for future use.

When parsing SRAT, we add each memory range to numa_meminfo. So we can store
the hotpluggable info in numa_meminfo.

This patch introduces a "bool hotpluggable" member into struct
numa_meminfo.

And modifies the following APIs' prototypes to support it:
   - numa_add_memblk()
   - numa_add_memblk_to()

And the following callers:
   - numaq_register_node()
   - dummy_numa_init()
   - amd_numa_init()
   - acpi_numa_memory_affinity_init() in x86

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/include/asm/numa.h     |    3 ++-
 arch/x86/kernel/apic/numaq_32.c |    2 +-
 arch/x86/mm/amdtopology.c       |    3 ++-
 arch/x86/mm/numa.c              |   10 +++++++---
 arch/x86/mm/numa_internal.h     |    1 +
 arch/x86/mm/srat.c              |    2 +-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 1b99ee5..73096b2 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -31,7 +31,8 @@ extern int numa_off;
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
 
-extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+extern int __init numa_add_memblk(int nodeid, u64 start, u64 end,
+				  bool hotpluggable);
 extern void __init numa_set_distance(int from, int to, int distance);
 
 static inline void set_apicid_to_node(int apicid, s16 node)
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index d661ee9..7a9c542 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -82,7 +82,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
 	int ret;
 
 	node_set(node, numa_nodes_parsed);
-	ret = numa_add_memblk(node, start, end);
+	ret = numa_add_memblk(node, start, end, false);
 	BUG_ON(ret < 0);
 }
 
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..d521471 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -167,7 +167,8 @@ int __init amd_numa_init(void)
 			nodeid, base, limit);
 
 		prevbase = base;
-		numa_add_memblk(nodeid, base, limit);
+		/* Do not support memory hotplug for AMD cpu. */
+		numa_add_memblk(nodeid, base, limit, false);
 		node_set(nodeid, numa_nodes_parsed);
 	}
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index af18b18..892729b 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -134,6 +134,7 @@ void __init setup_node_to_cpumask_map(void)
 }
 
 static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     bool hotpluggable,
 				     struct numa_meminfo *mi)
 {
 	/* ignore zero length blks */
@@ -155,6 +156,7 @@ static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
 	mi->blk[mi->nr_blks].start = start;
 	mi->blk[mi->nr_blks].end = end;
 	mi->blk[mi->nr_blks].nid = nid;
+	mi->blk[mi->nr_blks].hotpluggable = hotpluggable;
 	mi->nr_blks++;
 	return 0;
 }
@@ -179,15 +181,17 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
  * @nid: NUMA node ID of the new memblk
  * @start: Start address of the new memblk
  * @end: End address of the new memblk
+ * @hotpluggable: True if memblk is hotpluggable
  *
  * Add a new memblk to the default numa_meminfo.
  *
  * RETURNS:
  * 0 on success, -errno on failure.
  */
-int __init numa_add_memblk(int nid, u64 start, u64 end)
+int __init numa_add_memblk(int nid, u64 start, u64 end,
+			   bool hotpluggable)
 {
-	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
+	return numa_add_memblk_to(nid, start, end, hotpluggable, &numa_meminfo);
 }
 
 /* Initialize NODE_DATA for a node on the local memory */
@@ -627,7 +631,7 @@ static int __init dummy_numa_init(void)
 	       0LLU, PFN_PHYS(max_pfn) - 1);
 
 	node_set(0, numa_nodes_parsed);
-	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn), false);
 
 	return 0;
 }
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index bb2fbcc..1ce4e6b 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -8,6 +8,7 @@ struct numa_memblk {
 	u64			start;
 	u64			end;
 	int			nid;
+	bool			hotpluggable;
 };
 
 struct numa_meminfo {
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 5055fa7..f7f6fd4 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -171,7 +171,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	}
 
-	if (numa_add_memblk(node, start, end) < 0)
+	if (numa_add_memblk(node, start, end, hotpluggable) < 0)
 		goto out_err_bad_srat;
 
 	node_set(node, numa_nodes_parsed);
-- 
1.7.1



* [PATCH v3 05/13] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (3 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 04/13] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 06/13] memblock, numa: Introduce flag into memblock Tang Chen
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

Since we have introduced hotplug info into struct numa_meminfo, we need
to consider it when cleanup numa_meminfo.

The original logic in numa_cleanup_meminfo() is:
Merge blocks on the same node, holes between which don't overlap with
memory on other nodes.

This patch modifies numa_cleanup_meminfo() logic like this:
Merge blocks with the same hotpluggable type on the same node, holes
between which don't overlap with memory on other nodes.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 892729b..fec5ff8 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -292,18 +292,22 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 			}
 
 			/*
-			 * Join together blocks on the same node, holes
-			 * between which don't overlap with memory on other
-			 * nodes.
+			 * Join together blocks on the same node, with the same
+			 * hotpluggable flags, holes between which don't overlap
+			 * with memory on other nodes.
 			 */
 			if (bi->nid != bj->nid)
 				continue;
+			if (bi->hotpluggable != bj->hotpluggable)
+				continue;
+
 			start = min(bi->start, bj->start);
 			end = max(bi->end, bj->end);
 			for (k = 0; k < mi->nr_blks; k++) {
 				struct numa_memblk *bk = &mi->blk[k];
 
-				if (bi->nid == bk->nid)
+				if (bi->nid == bk->nid &&
+				    bi->hotpluggable == bk->hotpluggable)
 					continue;
 				if (start < bk->end && end > bk->start)
 					break;
@@ -323,6 +327,7 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 	for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
 		mi->blk[i].start = mi->blk[i].end = 0;
 		mi->blk[i].nid = NUMA_NO_NODE;
+		mi->blk[i].hotpluggable = false;
 	}
 
 	return 0;
-- 
1.7.1



* [PATCH v3 06/13] memblock, numa: Introduce flag into memblock.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (4 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 05/13] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
       [not found]   ` <20130603013034.GA31743@hacker.(null)>
  2013-05-24  9:29 ` [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in Tang Chen
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

There is no flag in memblock to describe what type the memory is.
Sometimes we may use memblock to reserve some memory for special usage.
For example, as Yinghai did in his patch-set, allocate pagetables on the
local node before all the memory on the node is mapped.
Please refer to Yinghai's patch:
v1: https://lkml.org/lkml/2013/3/7/642
v2: https://lkml.org/lkml/2013/3/10/47
v3: https://lkml.org/lkml/2013/4/4/639
v4: https://lkml.org/lkml/2013/4/11/829

In a hotplug environment, doing so could cause problems when we hot-remove
memory. Pagetable pages are kernel memory, which we cannot
migrate. But we can put them in the local node because their life cycle is
the same as the node's. So we need to free them all before hot-removing the memory.

Actually, any data whose life cycle is the same as a node's, such as pagetable
pages, vmemmap pages, and page_cgroup pages, could be put on the local node.
They can be freed when we hot-remove a whole node.

In order to do so, we need to mark out these special pages in memblock.
In this patch, we introduce a new "flags" member into memblock_region:
   struct memblock_region {
           phys_addr_t base;
           phys_addr_t size;
           unsigned long flags;
   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
           int nid;
   #endif
   };

This patch does the following things:
1) Add a "flags" member to memblock_region, and MEMBLK_FLAGS_DEFAULT for common usage.
2) Modify the following APIs' prototype:
	memblock_add_region()
	memblock_insert_region()
3) Add memblock_reserve_region() to support reserve memory with flags, and keep
   memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototype unmodified.

The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.

Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Suggested-by: Liu Jiang <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 include/linux/memblock.h |    8 ++++++
 mm/memblock.c            |   56 +++++++++++++++++++++++++++++++++------------
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f388203..c63a66e 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,9 +19,17 @@
 
 #define INIT_MEMBLOCK_REGIONS	128
 
+#define MEMBLK_FLAGS_DEFAULT	0
+
+/* Definition of memblock flags. */
+enum memblock_flags {
+	__NR_MEMBLK_FLAGS,	/* number of flags */
+};
+
 struct memblock_region {
 	phys_addr_t base;
 	phys_addr_t size;
+	unsigned long flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int nid;
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index 16eda3d..63924ae 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -157,6 +157,7 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
 		type->cnt = 1;
 		type->regions[0].base = 0;
 		type->regions[0].size = 0;
+		type->regions[0].flags = 0;
 		memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
 	}
 }
@@ -307,7 +308,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
 
 		if (this->base + this->size != next->base ||
 		    memblock_get_region_node(this) !=
-		    memblock_get_region_node(next)) {
+		    memblock_get_region_node(next) ||
+		    this->flags != next->flags) {
 			BUG_ON(this->base + this->size > next->base);
 			i++;
 			continue;
@@ -327,13 +329,15 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
  * @base:	base address of the new region
  * @size:	size of the new region
  * @nid:	node id of the new region
+ * @flags:	flags of the new region
  *
  * Insert new memblock region [@base,@base+@size) into @type at @idx.
  * @type must already have extra room to accomodate the new region.
  */
 static void __init_memblock memblock_insert_region(struct memblock_type *type,
 						   int idx, phys_addr_t base,
-						   phys_addr_t size, int nid)
+						   phys_addr_t size,
+						   int nid, unsigned long flags)
 {
 	struct memblock_region *rgn = &type->regions[idx];
 
@@ -341,6 +345,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
 	rgn->base = base;
 	rgn->size = size;
+	rgn->flags = flags;
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
@@ -352,6 +357,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * @base: base address of the new region
  * @size: size of the new region
  * @nid: nid of the new region
+ * @flags: flags of the new region
  *
  * Add new memblock region [@base,@base+@size) into @type.  The new region
  * is allowed to overlap with existing ones - overlaps don't affect already
@@ -362,7 +368,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
  * 0 on success, -errno on failure.
  */
 static int __init_memblock memblock_add_region(struct memblock_type *type,
-				phys_addr_t base, phys_addr_t size, int nid)
+				phys_addr_t base, phys_addr_t size,
+				int nid, unsigned long flags)
 {
 	bool insert = false;
 	phys_addr_t obase = base;
@@ -377,6 +384,7 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		WARN_ON(type->cnt != 1 || type->total_size);
 		type->regions[0].base = base;
 		type->regions[0].size = size;
+		type->regions[0].flags = flags;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
 		return 0;
@@ -407,7 +415,8 @@ repeat:
 			nr_new++;
 			if (insert)
 				memblock_insert_region(type, i++, base,
-						       rbase - base, nid);
+						       rbase - base, nid,
+						       flags);
 		}
 		/* area below @rend is dealt with, forget about it */
 		base = min(rend, end);
@@ -417,7 +426,8 @@ repeat:
 	if (base < end) {
 		nr_new++;
 		if (insert)
-			memblock_insert_region(type, i, base, end - base, nid);
+			memblock_insert_region(type, i, base, end - base,
+					       nid, flags);
 	}
 
 	/*
@@ -439,12 +449,14 @@ repeat:
 int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
 				       int nid)
 {
-	return memblock_add_region(&memblock.memory, base, size, nid);
+	return memblock_add_region(&memblock.memory, base, size,
+				   nid, MEMBLK_FLAGS_DEFAULT);
 }
 
 int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
+	return memblock_add_region(&memblock.memory, base, size,
+				   MAX_NUMNODES, MEMBLK_FLAGS_DEFAULT);
 }
 
 /**
@@ -499,7 +511,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= base - rbase;
 			type->total_size -= base - rbase;
 			memblock_insert_region(type, i, rbase, base - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else if (rend > end) {
 			/*
 			 * @rgn intersects from above.  Split and redo the
@@ -509,7 +522,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
 			rgn->size -= end - rbase;
 			type->total_size -= end - rbase;
 			memblock_insert_region(type, i--, rbase, end - rbase,
-					       memblock_get_region_node(rgn));
+					       memblock_get_region_node(rgn),
+					       rgn->flags);
 		} else {
 			/* @rgn is fully contained, record it */
 			if (!*end_rgn)
@@ -551,16 +565,25 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 	return __memblock_remove(&memblock.reserved, base, size);
 }
 
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+static int __init_memblock memblock_reserve_region(phys_addr_t base,
+						   phys_addr_t size,
+						   int nid,
+						   unsigned long flags)
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
-	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
+	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] with flags %#016lx %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
-		     (void *)_RET_IP_);
+		     flags, (void *)_RET_IP_);
+
+	return memblock_add_region(_rgn, base, size, nid, flags);
+}
 
-	return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_reserve_region(base, size, MAX_NUMNODES,
+				       MEMBLK_FLAGS_DEFAULT);
 }
 
 /**
@@ -982,6 +1005,7 @@ void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
 {
 	unsigned long long base, size;
+	unsigned long flags;
 	int i;
 
 	pr_info(" %s.cnt  = 0x%lx\n", name, type->cnt);
@@ -992,13 +1016,15 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
 
 		base = rgn->base;
 		size = rgn->size;
+		flags = rgn->flags;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		if (memblock_get_region_node(rgn) != MAX_NUMNODES)
 			snprintf(nid_buf, sizeof(nid_buf), " on node %d",
 				 memblock_get_region_node(rgn));
 #endif
-		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s\n",
-			name, i, base, base + size - 1, size, nid_buf);
+		pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s "
+			"flags: %#lx\n",
+			name, i, base, base + size - 1, size, nid_buf, flags);
 	}
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (5 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 06/13] memblock, numa: Introduce flag into memblock Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-31 16:24   ` Vasilis Liaskovitis
  2013-05-24  9:29 ` [PATCH v3 08/13] x86, numa: Move memory_add_physaddr_to_nid() to CONFIG_NUMA Tang Chen
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

If all the memory ranges in SRAT are hotpluggable, we should not
arrange them all in ZONE_MOVABLE. Otherwise the kernel won't have
enough memory to boot.

This patch introduces a global nodemask, memblock_kernel_nodemask, to mark
all the nodes the kernel resides in. No matter whether they are
hotpluggable, we arrange them as un-hotpluggable.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c       |    6 ++++++
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   20 ++++++++++++++++++++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fec5ff8..8357c75 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -654,6 +654,12 @@ static bool srat_used __initdata;
  */
 static void __init early_x86_numa_init(void)
 {
+	/*
+	 * Need to find out which nodes the kernel resides in, and arrange
+	 * them as un-hotpluggable when parsing SRAT.
+	 */
+	memblock_mark_kernel_nodes();
+
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
 		if (!numa_init(numaq_numa_init))
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index c63a66e..5064eed 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -66,6 +66,7 @@ int memblock_remove(phys_addr_t base, phys_addr_t size);
 int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
+void memblock_mark_kernel_nodes(void);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 63924ae..1b93a5d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -35,6 +35,9 @@ struct memblock memblock __initdata_memblock = {
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
+/* Mark which nodes the kernel resides in. */
+static nodemask_t memblock_kernel_nodemask __initdata_memblock;
+
 int memblock_debug __initdata_memblock;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;
@@ -787,6 +790,23 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 	memblock_merge_regions(type);
 	return 0;
 }
+
+void __init_memblock memblock_mark_kernel_nodes(void)
+{
+	int i, nid;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++)
+		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
+			nid = memblock_get_region_node(&reserved->regions[i]);
+			node_set(nid, memblock_kernel_nodemask);
+		}
+}
+#else
+void __init_memblock memblock_mark_kernel_nodes(void)
+{
+	node_set(0, memblock_kernel_nodemask);
+}
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 08/13] x86, numa: Move memory_add_physaddr_to_nid() to CONFIG_NUMA.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (6 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 09/13] x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve node-life-cycle data Tang Chen
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

memory_add_physaddr_to_nid() is declared in include/linux/memory_hotplug.h,
protected by CONFIG_NUMA. And in x86, the definitions are protected by
CONFIG_MEMORY_HOTPLUG.

memory_add_physaddr_to_nid() uses numa_meminfo to find the nid of a physical
address. It has nothing to do with memory hotplug, and it can also be used
by alloc_low_pages() to obtain the nid of the allocated memory.

So in x86, protect it with CONFIG_NUMA as well.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 8357c75..b28baf3 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -955,7 +955,7 @@ EXPORT_SYMBOL(cpumask_of_node);
 
 #endif	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#ifdef CONFIG_NUMA
 int memory_add_physaddr_to_nid(u64 start)
 {
 	struct numa_meminfo *mi = &numa_meminfo;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 09/13] x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve node-life-cycle data.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (7 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 08/13] x86, numa: Move memory_add_physaddr_to_nid() to CONFIG_NUMA Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory Tang Chen
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

Node-life-cycle data (data whose life cycle is bound to that of a node)
allocated by memblock should be marked so that we can skip it when freeing
usable memory to the buddy system.

This patch introduces a flag, MEMBLK_LOCAL_NODE, for memblock to reserve
node-life-cycle data. For now, this covers only kernel direct mapping
pagetable pages, based on Yinghai's patch.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/init.c       |   16 ++++++++++++----
 include/linux/memblock.h |    2 ++
 mm/memblock.c            |    7 +++++++
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8d0007a..002d487 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -62,14 +62,22 @@ __ref void *alloc_low_pages(unsigned int num)
 					low_min_pfn_mapped << PAGE_SHIFT,
 					low_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
-		} else
+			if (!ret)
+				panic("alloc_low_page: can not alloc memory");
+
+			memblock_reserve(ret, PAGE_SIZE * num);
+		} else {
 			ret = memblock_find_in_range(
 					local_min_pfn_mapped << PAGE_SHIFT,
 					local_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
-		if (!ret)
-			panic("alloc_low_page: can not alloc memory");
-		memblock_reserve(ret, PAGE_SIZE * num);
+			if (!ret)
+				panic("alloc_low_page: can not alloc memory");
+
+			memblock_reserve_local_node(ret, PAGE_SIZE * num,
+					memory_add_physaddr_to_nid(ret));
+		}
+
 		pfn = ret >> PAGE_SHIFT;
 	} else {
 		pfn = pgt_buf_end;
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5064eed..3b2d1c4 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -23,6 +23,7 @@
 
 /* Definition of memblock flags. */
 enum memblock_flags {
+	MEMBLK_LOCAL_NODE,	/* node-life-cycle data */
 	__NR_MEMBLK_FLAGS,	/* number of flags */
 };
 
@@ -65,6 +66,7 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
 int memblock_remove(phys_addr_t base, phys_addr_t size);
 int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
+int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
 void memblock_trim_memory(phys_addr_t align);
 void memblock_mark_kernel_nodes(void);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 1b93a5d..edde4c2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -589,6 +589,13 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 				       MEMBLK_FLAGS_DEFAULT);
 }
 
+int __init_memblock memblock_reserve_local_node(phys_addr_t base,
+					phys_addr_t size, int nid)
+{
+	unsigned long flags = 1 << MEMBLK_LOCAL_NODE;
+	return memblock_reserve_region(base, size, nid, flags);
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (8 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 09/13] x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve node-life-cycle data Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-31 16:15   ` Vasilis Liaskovitis
  2013-05-24  9:29 ` [PATCH v3 11/13] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock Tang Chen
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

We mark out movable memory ranges and reserve them with the MEMBLK_HOTPLUGGABLE
flag in memblock.reserved. This should be done after the memory mapping is
initialized because the kernel now supports allocating pagetable pages on the
local node, and those are kernel pages.

The reserved hotpluggable memory will be freed to the buddy system when memory
initialization is done.

And also, ensure all the nodes which the kernel resides in are un-hotpluggable.

This idea is from Wen Congyang <wency@cn.fujitsu.com> and Jiang Liu <jiang.liu@huawei.com>.

Suggested-by: Jiang Liu <jiang.liu@huawei.com>
Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 arch/x86/mm/numa.c       |   29 +++++++++++++++++++++++++++++
 include/linux/memblock.h |    3 +++
 mm/memblock.c            |   19 +++++++++++++++++++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b28baf3..73f9ade 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -727,6 +727,33 @@ static void __init early_x86_numa_init_mapping(void)
 }
 #endif
 
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+static void __init early_mem_hotplug_init(void)
+{
+	int i, nid;
+	phys_addr_t start, end;
+
+	if (!movablecore_enable_srat)
+		return;
+
+	for (i = 0; i < numa_meminfo.nr_blks; i++) {
+		nid = numa_meminfo.blk[i].nid;
+		start = numa_meminfo.blk[i].start;
+		end = numa_meminfo.blk[i].end;
+
+		if (!numa_meminfo.blk[i].hotpluggable ||
+		    memblock_is_kernel_node(nid))
+			continue;
+
+		memblock_reserve_hotpluggable(start, end - start, nid);
+	}
+}
+#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+static inline void early_mem_hotplug_init(void)
+{
+}
+#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
 void __init early_initmem_init(void)
 {
 	early_x86_numa_init();
@@ -736,6 +763,8 @@ void __init early_initmem_init(void)
 	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
 
+	early_mem_hotplug_init();
+
 	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
 }
 
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3b2d1c4..0f01930 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -24,6 +24,7 @@
 /* Definition of memblock flags. */
 enum memblock_flags {
 	MEMBLK_LOCAL_NODE,	/* node-life-cycle data */
+	MEMBLK_HOTPLUGGABLE,	/* hotpluggable region */
 	__NR_MEMBLK_FLAGS,	/* number of flags */
 };
 
@@ -67,8 +68,10 @@ int memblock_remove(phys_addr_t base, phys_addr_t size);
 int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
+int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
 void memblock_trim_memory(phys_addr_t align);
 void memblock_mark_kernel_nodes(void);
+bool memblock_is_kernel_node(int nid);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index edde4c2..0c55588 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -596,6 +596,13 @@ int __init_memblock memblock_reserve_local_node(phys_addr_t base,
 	return memblock_reserve_region(base, size, nid, flags);
 }
 
+int __init_memblock memblock_reserve_hotpluggable(phys_addr_t base,
+					phys_addr_t size, int nid)
+{
+	unsigned long flags = 1 << MEMBLK_HOTPLUGGABLE;
+	return memblock_reserve_region(base, size, nid, flags);
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
@@ -809,11 +816,23 @@ void __init_memblock memblock_mark_kernel_nodes(void)
 			node_set(nid, memblock_kernel_nodemask);
 		}
 }
+
+bool __init_memblock memblock_is_kernel_node(int nid)
+{
+	if (node_isset(nid, memblock_kernel_nodemask))
+		return true;
+	return false;
+}
 #else
 void __init_memblock memblock_mark_kernel_nodes(void)
 {
 	node_set(0, memblock_kernel_nodemask);
 }
+
+bool __init_memblock memblock_is_kernel_node(int nid)
+{
+	return true;
+}
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 11/13] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (9 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  2013-05-24  9:29 ` [PATCH v3 12/13] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority Tang Chen
  2013-05-24  9:29 ` [PATCH v3 13/13] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option Tang Chen
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

We reserved hotpluggable memory in memblock. When memory initialization is
done, we have to free it to the buddy system.

This patch frees memory reserved by memblock with the MEMBLK_HOTPLUGGABLE flag.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   20 ++++++++++++++++++++
 mm/nobootmem.c           |    3 +++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0f01930..08c761d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -69,6 +69,7 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
+void memblock_free_hotpluggable(void);
 void memblock_trim_memory(phys_addr_t align);
 void memblock_mark_kernel_nodes(void);
 bool memblock_is_kernel_node(int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 0c55588..54de398 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -568,6 +568,26 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 	return __memblock_remove(&memblock.reserved, base, size);
 }
 
+static void __init_memblock memblock_free_flags(unsigned long flags)
+{
+	int i;
+	struct memblock_type *reserved = &memblock.reserved;
+
+	for (i = 0; i < reserved->cnt; i++) {
+		if (reserved->regions[i].flags == flags)
+			memblock_remove_region(reserved, i);
+	}
+}
+
+void __init_memblock memblock_free_hotpluggable()
+{
+	unsigned long flags = 1 << MEMBLK_HOTPLUGGABLE;
+
+	memblock_dbg("memblock: free all hotpluggable memory\n");
+
+	memblock_free_flags(flags);
+}
+
 static int __init_memblock memblock_reserve_region(phys_addr_t base,
 						   phys_addr_t size,
 						   int nid,
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 5e07d36..cd85604 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -165,6 +165,9 @@ unsigned long __init free_all_bootmem(void)
 	for_each_online_pgdat(pgdat)
 		reset_node_lowmem_managed_pages(pgdat);
 
+	/* Hotpluggable memory reserved by memblock should also be freed. */
+	memblock_free_hotpluggable();
+
 	/*
 	 * We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
 	 *  because in some case like Node0 doesn't have RAM installed
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 12/13] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (10 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 11/13] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
       [not found]   ` <20130603025924.GB7441@hacker.(null)>
  2013-05-24  9:29 ` [PATCH v3 13/13] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option Tang Chen
  12 siblings, 1 reply; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

Arranging hotpluggable memory in ZONE_MOVABLE will decrease NUMA performance
because the kernel cannot use movable memory.

Users who don't use memory hotplug and don't want to lose their NUMA
performance need a way to disable this functionality.

So, if users specify "movablecore=acpi" on the kernel command line, the kernel
will use the SRAT to arrange ZONE_MOVABLE, and this takes higher priority than
the original movablecore and kernelcore boot options.

For those who don't want this, just specify nothing.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 +++++
 mm/page_alloc.c          |   31 +++++++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 08c761d..5528e8f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -69,6 +69,7 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
 int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
+bool memblock_is_hotpluggable(struct memblock_region *region);
 void memblock_free_hotpluggable(void);
 void memblock_trim_memory(phys_addr_t align);
 void memblock_mark_kernel_nodes(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index 54de398..8b9a13c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -623,6 +623,11 @@ int __init_memblock memblock_reserve_hotpluggable(phys_addr_t base,
 	return memblock_reserve_region(base, size, nid, flags);
 }
 
+bool __init_memblock memblock_is_hotpluggable(struct memblock_region *region)
+{
+	return region->flags & (1 << MEMBLK_HOTPLUGGABLE);
+}
+
 /**
  * __next_free_mem_range - next function for for_each_free_mem_range()
  * @idx: pointer to u64 loop variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9ea143..557b21b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4793,9 +4793,37 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
+	struct memblock_type *reserved = &memblock.reserved;
 
 	/*
-	 * If movablecore was specified, calculate what size of
+	 * Need to find movable_zone earlier in case movablecore=acpi is
+	 * specified.
+	 */
+	find_usable_zone_for_movable();
+
+	/*
+	 * If movablecore=acpi was specified, then zone_movable_pfn[] has been
+	 * initialized, and no more work needs to be done.
+	 * NOTE: In this case, we ignore kernelcore option.
+	 */
+	if (movablecore_enable_srat) {
+		for (i = 0; i < reserved->cnt; i++) {
+			if (!memblock_is_hotpluggable(&reserved->regions[i]))
+				continue;
+
+			nid = reserved->regions[i].nid;
+
+			usable_startpfn = PFN_DOWN(reserved->regions[i].base);
+			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+				min(usable_startpfn, zone_movable_pfn[nid]) :
+				usable_startpfn;
+		}
+
+		goto out;
+	}
+
+	/*
+	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
 	 * any allocation type is evenly spread. If both kernelcore
 	 * and movablecore are specified, then the value of kernelcore
@@ -4821,7 +4849,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 13/13] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option.
  2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
                   ` (11 preceding siblings ...)
  2013-05-24  9:29 ` [PATCH v3 12/13] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority Tang Chen
@ 2013-05-24  9:29 ` Tang Chen
  12 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-05-24  9:29 UTC (permalink / raw)
  To: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit
  Cc: x86, linux-doc, linux-kernel, linux-mm

Since we modified the movablecore boot option to support "movablecore=acpi",
this patch adds documentation for it.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..a1c515b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1649,6 +1649,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablecore=acpi	[KNL,X86] This parameter will enable the
+			kernel to arrange ZONE_MOVABLE with the help of the
+			Hot-Pluggable field in the SRAT. All the hotpluggable
+			memory will be arranged in ZONE_MOVABLE.
+			NOTE: Any node which the kernel resides in will
+			      always be un-hotpluggable so that the kernel
+			      will always have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory.
  2013-05-24  9:29 ` [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory Tang Chen
@ 2013-05-31 16:15   ` Vasilis Liaskovitis
  0 siblings, 0 replies; 21+ messages in thread
From: Vasilis Liaskovitis @ 2013-05-31 16:15 UTC (permalink / raw)
  To: Tang Chen
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	lwoodman, riel, jweiner, prarit, x86, linux-doc, linux-kernel,
	linux-mm

On Fri, May 24, 2013 at 05:29:19PM +0800, Tang Chen wrote:
> We mark out movable memory ranges and reserve them with MEMBLK_HOTPLUGGABLE flag in
> memblock.reserved. This should be done after the memory mapping is initialized
> because the kernel now supports allocate pagetable pages on local node, which
> are kernel pages.
> 
> The reserved hotpluggable will be freed to buddy when memory initialization
> is done.
> 
> And also, ensure all the nodes which the kernel resides in are un-hotpluggable.
> 
> This idea is from Wen Congyang <wency@cn.fujitsu.com> and Jiang Liu <jiang.liu@huawei.com>.
> 
> Suggested-by: Jiang Liu <jiang.liu@huawei.com>
> Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  arch/x86/mm/numa.c       |   29 +++++++++++++++++++++++++++++
>  include/linux/memblock.h |    3 +++
>  mm/memblock.c            |   19 +++++++++++++++++++
>  3 files changed, 51 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index b28baf3..73f9ade 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -727,6 +727,33 @@ static void __init early_x86_numa_init_mapping(void)
>  }
>  #endif
>  
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +static void __init early_mem_hotplug_init()
> +{
> +	int i, nid;
> +	phys_addr_t start, end;
> +
> +	if (!movablecore_enable_srat)
> +		return;
> +
> +	for (i = 0; i < numa_meminfo.nr_blks; i++) {
> +		nid = numa_meminfo.blk[i].nid;
> +		start = numa_meminfo.blk[i].start;
> +		end = numa_meminfo.blk[i].end;
> +
> +		if (!numa_meminfo.blk[i].hotpluggable ||
> +		    memblock_is_kernel_node(nid))
> +			continue;

In my v2 testing, I had a seabios bug: *all* memory was marked as hotpluggable
and the first if condition clause above always returned true.
I have a fixed seabios version that only sets hotplug bit to 1 for extra dimms
(see my v2 reply on how to use it with qemu):
https://github.com/vliaskov/seabios/commits/memhp-v4

I think there is another problem with mark_kernel_nodes though, see my comment
for 7/13.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  2013-05-24  9:29 ` [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in Tang Chen
@ 2013-05-31 16:24   ` Vasilis Liaskovitis
  2013-06-03  7:35     ` Tang Chen
  0 siblings, 1 reply; 21+ messages in thread
From: Vasilis Liaskovitis @ 2013-05-31 16:24 UTC (permalink / raw)
  To: Tang Chen
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	lwoodman, riel, jweiner, prarit, x86, linux-doc, linux-kernel,
	linux-mm

Hi,

On Fri, May 24, 2013 at 05:29:16PM +0800, Tang Chen wrote:
> If all the memory ranges in SRAT are hotpluggable, we should not
> arrange them all in ZONE_MOVABLE. Otherwise the kernel won't have
> enough memory to boot.
> 
> This patch introduce a global variable kernel_nodemask to mark
> all the nodes the kernel resides in. And no matter if they are
> hotpluggable, we arrange them as un-hotpluggable.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  arch/x86/mm/numa.c       |    6 ++++++
>  include/linux/memblock.h |    1 +
>  mm/memblock.c            |   20 ++++++++++++++++++++
>  3 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index fec5ff8..8357c75 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -654,6 +654,12 @@ static bool srat_used __initdata;
>   */
>  static void __init early_x86_numa_init(void)
>  {
> +	/*
> +	 * Need to find out which nodes the kernel resides in, and arrange
> +	 * them as un-hotpluggable when parsing SRAT.
> +	 */
> +	memblock_mark_kernel_nodes();
> +
>  	if (!numa_off) {
>  #ifdef CONFIG_X86_NUMAQ
>  		if (!numa_init(numaq_numa_init))
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index c63a66e..5064eed 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -66,6 +66,7 @@ int memblock_remove(phys_addr_t base, phys_addr_t size);
>  int memblock_free(phys_addr_t base, phys_addr_t size);
>  int memblock_reserve(phys_addr_t base, phys_addr_t size);
>  void memblock_trim_memory(phys_addr_t align);
> +void memblock_mark_kernel_nodes(void);
>  
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 63924ae..1b93a5d 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -35,6 +35,9 @@ struct memblock memblock __initdata_memblock = {
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> +/* Mark which nodes the kernel resides in. */
> +static nodemask_t memblock_kernel_nodemask __initdata_memblock;
> +
>  int memblock_debug __initdata_memblock;
>  static int memblock_can_resize __initdata_memblock;
>  static int memblock_memory_in_slab __initdata_memblock = 0;
> @@ -787,6 +790,23 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>  	memblock_merge_regions(type);
>  	return 0;
>  }
> +
> +void __init_memblock memblock_mark_kernel_nodes()
> +{
> +	int i, nid;
> +	struct memblock_type *reserved = &memblock.reserved;
> +
> +	for (i = 0; i < reserved->cnt; i++)
> +		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
> +			nid = memblock_get_region_node(&reserved->regions[i]);
> +			node_set(nid, memblock_kernel_nodemask);
> +		}
> +}

I think there is a problem here because memblock_set_region_node is sometimes
called with nid == MAX_NUMNODES. This means the correct node is not properly
masked in the memblock_kernel_nodemask bitmap.
E.g. in a VM test, memblock_mark_kernel_nodes with extra pr_warn calls iterates
over the following memblocks (ranges below are memblks base-(base+size)):

[    0.000000] memblock_mark_kernel_nodes nid=64 0x00000000000000-0x00000000010000
[    0.000000] memblock_mark_kernel_nodes nid=64 0x00000000098000-0x00000000100000
[    0.000000] memblock_mark_kernel_nodes nid=64 0x00000001000000-0x00000001a5a000
[    0.000000] memblock_mark_kernel_nodes nid=64 0x00000037000000-0x000000377f8000

where MAX_NUMNODES is 64 because CONFIG_NODES_SHIFT=6.
The ranges above belong to node 0, but the node's bit is never marked.

With a buggy BIOS that marks all memory as hotpluggable, this results in a
panic, because both checks against the hotpluggable bit and memblock_kernel_bitmask
(in early_mem_hotplug_init) fail, the numa regions have all been merged together,
and memblock_reserve_hotpluggable is called for all memory.

With a correct BIOS (some part of initial memory is not hotpluggable) the kernel
can boot since the hotpluggable bit check works ok, but extra DIMMs on node 0
will still be allowed to be in ZONE_MOVABLE.

Actually this behaviour (being able to have MOVABLE memory on nodes with kernel
reserved memblocks) sort of matches the policy I requested in v2 :). But I
suspect that is not your intent, i.e. you want memblock_kernel_nodemask_bitmap to
prevent movable reservations for the whole node where the kernel has reserved
memblocks.

Is there a way to get accurate nid information for memblocks at early boot? I
suspect pfn_to_nid doesn't work yet at this stage (I got a panic when I
attempted it, IIRC).

I used the hack below but it depends on CONFIG_NUMA, hopefully there is a
cleaner general way:

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index cfd8c2f..af8ad2a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -133,6 +133,19 @@ void __init setup_node_to_cpumask_map(void)
 	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
 }
 
+int __init numa_find_range_nid(u64 start, u64 size)
+{
+	unsigned int i;
+	struct numa_meminfo *mi = &numa_meminfo;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		if (start >= mi->blk[i].start && start + size - 1 <= mi->blk[i].end)
+			return mi->blk[i].nid;
+	}
+	return -1;
+}
+EXPORT_SYMBOL(numa_find_range_nid);
+
 static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
 				     bool hotpluggable,
 				     struct numa_meminfo *mi)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77a71fb..194b7c7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1600,6 +1600,9 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 #endif
 
+#ifdef CONFIG_NUMA
+int __init numa_find_range_nid(u64 start, u64 size);
+#endif
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
diff --git a/mm/memblock.c b/mm/memblock.c
index a6b7845..284aced 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -834,15 +834,26 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 
 void __init_memblock memblock_mark_kernel_nodes()
 {
-	int i, nid;
+	int i, nid, tmpnid;
 	struct memblock_type *reserved = &memblock.reserved;
 
 	for (i = 0; i < reserved->cnt; i++)
 		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
 			nid = memblock_get_region_node(&reserved->regions[i]);
+		if (nid == MAX_NUMNODES) {
+			tmpnid = numa_find_range_nid(reserved->regions[i].base,
+				reserved->regions[i].size);
+			if (tmpnid >= 0)
+				nid = tmpnid;
+		}

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e862311..84d6e64 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -667,11 +667,7 @@ static bool srat_used __initdata;
  */
 static void __init early_x86_numa_init(void)
 {
-	/*
-	 * Need to find out which nodes the kernel resides in, and arrange
-	 * them as un-hotpluggable when parsing SRAT.
-	 */
-	memblock_mark_kernel_nodes();
 
 	if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
@@ -779,6 +775,12 @@ void __init early_initmem_init(void)
 	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
 
+	/*
+	 * Need to find out which nodes the kernel resides in, and arrange
+	 * them as un-hotpluggable when parsing SRAT.
+	 */
+
+	memblock_mark_kernel_nodes();
 	early_mem_hotplug_init();
 
 	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
-- 



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 06/13] memblock, numa: Introduce flag into memblock.
       [not found]   ` <20130603013034.GA31743@hacker.(null)>
@ 2013-06-03  1:59     ` Tang Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-06-03  1:59 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit, x86,
	linux-doc, linux-kernel, linux-mm

Hi Li,

On 06/03/2013 09:30 AM, Wanpeng Li wrote:
> On Fri, May 24, 2013 at 05:29:15PM +0800, Tang Chen wrote:
>> There is no flag in memblock to discribe what type the memory is.
>
> s/discribe/describe

OK.
......
>>
>> +#define MEMBLK_FLAGS_DEFAULT	0
>> +
>
> MEMBLK_FLAGS_DEFAULT is one of the memblock flags, it should also include in
> memblock_flags emum.
>

Hmm, here I think I can change all the flags in the enum into macros.
The macros seem easier to use.

Thanks. :)



* Re: [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  2013-05-31 16:24   ` Vasilis Liaskovitis
@ 2013-06-03  7:35     ` Tang Chen
  2013-06-03 13:18       ` Vasilis Liaskovitis
  0 siblings, 1 reply; 21+ messages in thread
From: Tang Chen @ 2013-06-03  7:35 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	lwoodman, riel, jweiner, prarit, x86, linux-doc, linux-kernel,
	linux-mm

Hi Vasilis,

On 06/01/2013 12:24 AM, Vasilis Liaskovitis wrote:
......
>> +void __init_memblock memblock_mark_kernel_nodes()
>> +{
>> +	int i, nid;
>> +	struct memblock_type *reserved = &memblock.reserved;
>> +
>> +	for (i = 0; i < reserved->cnt; i++)
>> +		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
>> +			nid = memblock_get_region_node(&reserved->regions[i]);
>> +			node_set(nid, memblock_kernel_nodemask);
>> +		}
>> +}
>
> I think there is a problem here because memblock_set_region_node is sometimes
> called with nid == MAX_NUMNODES. This means the correct node is not properly
> masked in the memblock_kernel_nodemask bitmap.
> E.g. in a VM test, memblock_mark_kernel_nodes with extra pr_warn calls iterates
> over the following memblocks (ranges below are memblks base-(base+size)):
>
> [    0.000000] memblock_mark_kernel_nodes nid=64 0x00000000000000-0x00000000010000
> [    0.000000] memblock_mark_kernel_nodes nid=64 0x00000000098000-0x00000000100000
> [    0.000000] memblock_mark_kernel_nodes nid=64 0x00000001000000-0x00000001a5a000
> [    0.000000] memblock_mark_kernel_nodes nid=64 0x00000037000000-0x000000377f8000
>
> where MAX_NUMNODES is 64 because CONFIG_NODES_SHIFT=6.
> The ranges above belong to node 0, but the node's bit is never marked.
>
> With a buggy bios that marks all memory as hotpluggable, this results in a
> panic, because both checks against hotpluggable bit and memblock_kernel_bitmask
> (in early_mem_hotplug_init) fail, the numa regions have all been merged together
> and memblock_reserve_hotpluggable is called for all memory.
>
> With a correct BIOS (some part of initial memory is not hotpluggable) the kernel
> can boot since the hotpluggable bit check works ok, but extra DIMMs on node 0
> will still be allowed to be in ZONE_MOVABLE.
>

OK, I see the problem. But would you please give me a call trace that can show
how this could happen? I think the memory block info should be the same as
numa_meminfo. Can we fix the caller to make it set the nid correctly?

> Actually this behaviour (being able to have MOVABLE memory on nodes with kernel
> reserved memblocks) sort of matches the policy I requested in v2 :). But i
> suspect that is not your intent i.e. you want memblock_kernel_nodemask_bitmap to
> prevent movable reservations for the whole node where kernel has reserved
> memblocks.

I intended to set the whole node which the kernel resides in as un-hotpluggable.

>
> Is there a way to get accurate nid information for memblocks at early boot? I
> suspect pfn_to_nid doesn't work yet at this stage (i got a panic when I
> attempted iirc)

At such an early time, I think we can only get the nid from numa_meminfo. So as I
said above, I'd like to fix this problem by making memblock carry the correct nid.

And I read the patch below. I think if we get the nid from numa_meminfo, then we
don't need to call memblock_get_region_node().

Thanks. :)

>
> I used the hack below but it depends on CONFIG_NUMA, hopefully there is a
> cleaner general way:
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index cfd8c2f..af8ad2a 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -133,6 +133,19 @@ void __init setup_node_to_cpumask_map(void)
>   	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>   }
>
> +int __init numa_find_range_nid(u64 start, u64 size)
> +{
> +	unsigned int i;
> +	struct numa_meminfo *mi = &numa_meminfo;
> +
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		if (start >= mi->blk[i].start && start + size - 1 <= mi->blk[i].end)
> +			return mi->blk[i].nid;
> +	}
> +	return -1;
> +}
> +EXPORT_SYMBOL(numa_find_range_nid);
> +
>   static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
>   				     bool hotpluggable,
>   				     struct numa_meminfo *mi)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77a71fb..194b7c7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1600,6 +1600,9 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
>   			unsigned long start, unsigned long end);
>   #endif
>
> +#ifdef CONFIG_NUMA
> +int __init numa_find_range_nid(u64 start, u64 size);
> +#endif
>   struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
>   int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
>   			unsigned long pfn, unsigned long size, pgprot_t);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index a6b7845..284aced 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -834,15 +834,26 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>
>   void __init_memblock memblock_mark_kernel_nodes()
>   {
> -	int i, nid;
> +	int i, nid, tmpnid;
>   	struct memblock_type *reserved = &memblock.reserved;
>
>   	for (i = 0; i < reserved->cnt; i++)
>   		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
>   			nid = memblock_get_region_node(&reserved->regions[i]);
> +		if (nid == MAX_NUMNODES) {
> +			tmpnid = numa_find_range_nid(reserved->regions[i].base,
> +				reserved->regions[i].size);
> +			if (tmpnid >= 0)
> +				nid = tmpnid;
> +		}
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index e862311..84d6e64 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -667,11 +667,7 @@ static bool srat_used __initdata;
>    */
>   static void __init early_x86_numa_init(void)
>   {
> -	/*
> -	 * Need to find out which nodes the kernel resides in, and arrange
> -	 * them as un-hotpluggable when parsing SRAT.
> -	 */
> -	memblock_mark_kernel_nodes();
>
>   	if (!numa_off) {
>   #ifdef CONFIG_X86_NUMAQ
> @@ -779,6 +775,12 @@ void __init early_initmem_init(void)
>   	load_cr3(swapper_pg_dir);
>   	__flush_tlb_all();
>
> +	/*
> +	 * Need to find out which nodes the kernel resides in, and arrange
> +	 * them as un-hotpluggable when parsing SRAT.
> +	 */
> +
> +	memblock_mark_kernel_nodes();
>   	early_mem_hotplug_init();
>
>   	early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);


* Re: [PATCH v3 12/13] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority.
       [not found]   ` <20130603025924.GB7441@hacker.(null)>
@ 2013-06-03  7:37     ` Tang Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-06-03  7:37 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	vasilis.liaskovitis, lwoodman, riel, jweiner, prarit, x86,
	linux-doc, linux-kernel, linux-mm

On 06/03/2013 10:59 AM, Wanpeng Li wrote:
> On Fri, May 24, 2013 at 05:29:21PM +0800, Tang Chen wrote:
>> Arranging hotpluggable memory as ZONE_MOVABLE will decrease NUMA performance
>> because the kernel cannot use movable memory.
>>
>> Users who don't use memory hotplug and don't want to lose their NUMA
>> performance need a way to disable this functionality.
>>
>> So, if users specify "movablecore=acpi" on the kernel command line, the kernel
>> will use SRAT to arrange ZONE_MOVABLE, and it has higher priority than the
>> original movablecore and kernelcore boot options.
>>
>> For those who don't want this, just specify nothing.
>>
>
> Reviewed-by: Wanpeng Li<liwanp@linux.vnet.ibm.com>

Thank you very much for reviewing these patches. :)



* Re: [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  2013-06-03  7:35     ` Tang Chen
@ 2013-06-03 13:18       ` Vasilis Liaskovitis
  2013-06-06  9:42         ` Tang Chen
  0 siblings, 1 reply; 21+ messages in thread
From: Vasilis Liaskovitis @ 2013-06-03 13:18 UTC (permalink / raw)
  To: Tang Chen
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	lwoodman, riel, jweiner, prarit, x86, linux-doc, linux-kernel,
	linux-mm

Hi Tang,

On Mon, Jun 03, 2013 at 03:35:53PM +0800, Tang Chen wrote:
> Hi Vasilis,
>
[...]
> >The ranges above belong to node 0, but the node's bit is never marked.
> >
> >With a buggy bios that marks all memory as hotpluggable, this results in a
> >panic, because both checks against hotpluggable bit and memblock_kernel_bitmask
> >(in early_mem_hotplug_init) fail, the numa regions have all been merged together
> >and memblock_reserve_hotpluggable is called for all memory.
> >
> >With a correct BIOS (some part of initial memory is not hotpluggable) the kernel
> >can boot since the hotpluggable bit check works ok, but extra DIMMs on node 0
> >will still be allowed to be in ZONE_MOVABLE.
> >
> 
> OK, I see the problem. But would you please give me a call trace that can show
> how this could happen? I think the memory block info should be the same as
> numa_meminfo. Can we fix the caller to make it set the nid correctly?

memblock_reserve() calls memblock_add_region with nid == MAX_NUMNODES. So
all calls of memblock_reserve() in arch/x86/kernel/setup.c will cause memblock
additions with this non-specific node id, I think.

Call sites I have seen in practice in my tests are trim_low_memory_range,
early_reserve_initrd, reserve_brk, all from setup_arch.

The MAX_NUMNODES case also happens when setup_arch adds memblocks for e820 map
entries:

setup_arch
  memblock_x86_fill
    memblock_add <--(calls memblock_add_region with nid == MAX_NUMNODES)

The problem is that these functions are called before NUMA/SRAT discovery in
early_initmem_init, so we don't have the numa_meminfo yet when these memblocks
are added/reserved. If the calls can be re-ordered, that would work; otherwise we
should update the memblock nid fields after numa_meminfo has been set up.

> 
> >Actually this behaviour (being able to have MOVABLE memory on nodes with kernel
> >reserved memblocks) sort of matches the policy I requested in v2 :). But i
> >suspect that is not your intent i.e. you want memblock_kernel_nodemask_bitmap to
> >prevent movable reservations for the whole node where kernel has reserved
> >memblocks.
> 
> I intended to set the whole node which the kernel resides in as
> un-hotpluggable.
> 
> >
> >Is there a way to get accurate nid information for memblocks at early boot? I
> >suspect pfn_to_nid doesn't work yet at this stage (i got a panic when I
> >attempted iirc)
> 
> At such an early time, I think we can only get the nid from numa_meminfo. So as
> I said above, I'd like to fix this problem by making memblock carry the correct
> nid.
> 
> And I read the patch below. I think if we get the nid from numa_meminfo, then
> we don't need to call memblock_get_region_node().
> 

OK. If we update the memblock nid fields from numa_meminfo,
memblock_get_region_node will always return the correct node id.

thanks,

- Vasilis


* Re: [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
  2013-06-03 13:18       ` Vasilis Liaskovitis
@ 2013-06-06  9:42         ` Tang Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Tang Chen @ 2013-06-06  9:42 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: mingo, hpa, akpm, yinghai, jiang.liu, wency, laijs,
	isimatu.yasuaki, tj, mgorman, minchan, mina86, gong.chen,
	lwoodman, riel, jweiner, prarit, x86, linux-doc, linux-kernel,
	linux-mm

Hi Vasilis,

On 06/03/2013 09:18 PM, Vasilis Liaskovitis wrote:
......
>>
>> At such an early time, I think we can only get the nid from numa_meminfo. So
>> as I said above, I'd like to fix this problem by making memblock carry the
>> correct nid.
>>
>> And I read the patch below. I think if we get the nid from numa_meminfo, then
>> we don't need to call memblock_get_region_node().
>>
>
> ok. If we update the memblock nid fields from numa_meminfo,
> memblock_get_region_node will always return the correct node id.
>

I have fixed this problem in this way, and I'll send the new patches next week.

Thanks. :)




Thread overview: 21+ messages
2013-05-24  9:29 [PATCH v3 00/13] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
2013-05-24  9:29 ` [PATCH v3 01/13] x86: get pg_data_t's memory from other node Tang Chen
2013-05-24  9:29 ` [PATCH v3 02/13] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-05-24  9:29 ` [PATCH v3 03/13] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
2013-05-24  9:29 ` [PATCH v3 04/13] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo Tang Chen
2013-05-24  9:29 ` [PATCH v3 05/13] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo Tang Chen
2013-05-24  9:29 ` [PATCH v3 06/13] memblock, numa: Introduce flag into memblock Tang Chen
     [not found]   ` <20130603013034.GA31743@hacker.(null)>
2013-06-03  1:59     ` Tang Chen
2013-05-24  9:29 ` [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in Tang Chen
2013-05-31 16:24   ` Vasilis Liaskovitis
2013-06-03  7:35     ` Tang Chen
2013-06-03 13:18       ` Vasilis Liaskovitis
2013-06-06  9:42         ` Tang Chen
2013-05-24  9:29 ` [PATCH v3 08/13] x86, numa: Move memory_add_physaddr_to_nid() to CONFIG_NUMA Tang Chen
2013-05-24  9:29 ` [PATCH v3 09/13] x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve node-life-cycle data Tang Chen
2013-05-24  9:29 ` [PATCH v3 10/13] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory Tang Chen
2013-05-31 16:15   ` Vasilis Liaskovitis
2013-05-24  9:29 ` [PATCH v3 11/13] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock Tang Chen
2013-05-24  9:29 ` [PATCH v3 12/13] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority Tang Chen
     [not found]   ` <20130603025924.GB7441@hacker.(null)>
2013-06-03  7:37     ` Tang Chen
2013-05-24  9:29 ` [PATCH v3 13/13] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option Tang Chen
