All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Support SRAT for movablemem_map boot option.
@ 2013-01-25  9:42 ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

Here, we do two things:
1) patch1 ~ patch2: Prevent memblock from allocating memory in memory
                    to be set as ZONE_MOVABLE.
2) patch3:          Provide movablemem_map=acpi option for users who
                    don't want to specify physical address in kernel
                    commandline. It will use SRAT info, and set all
                    the hotpluggable memory as ZONE_MOVABLE.

After applying these 3 patches, movablemem_map boot option will work like this:

        /*
         * For movablemem_map=acpi:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * hotpluggable:           n       y         y           n
         * movablemem_map:              |_____| |_________|
         *
         *
         * For movablemem_map=nn[KMG]@ss[KMG]:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * user specified:                |__|                 |___|
         * movablemem_map:                |___| |_________|    |______| ......
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         *
         * NOTE: In the second case, SRAT info will be ingored.
         */

NOTE: Using this boot option could cause NUMA performance down. For users who
      don't want to lose NUMA performance, just do not use it now.
      We will improve it all along.

For more info of movablemem_map, please refer to:
      https://lkml.org/lkml/2013/1/14/87


Tang Chen (3):
  acpi, memory-hotplug: Parse SRAT before memblock is ready.
  acpi, memory-hotplug: Extend movablemem_map ranges to the end of
    node.
  acpi, memory-hotplug: Support getting hotplug info from SRAT.

 Documentation/kernel-parameters.txt |   23 ++++++++--
 arch/x86/kernel/setup.c             |   13 ++++--
 arch/x86/mm/numa.c                  |    2 +-
 arch/x86/mm/srat.c                  |   81 +++++++++++++++++++++++++++++++++-
 drivers/acpi/numa.c                 |   23 ++++++----
 include/linux/acpi.h                |    1 +
 include/linux/mm.h                  |    6 +++
 mm/page_alloc.c                     |   56 +++++++++++++++++++++++-
 8 files changed, 179 insertions(+), 26 deletions(-)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/3] Support SRAT for movablemem_map boot option.
@ 2013-01-25  9:42 ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

Here, we do two things:
1) patch1 ~ patch2: Prevent memblock from allocating memory in memory
                    to be set as ZONE_MOVABLE.
2) patch3:          Provide movablemem_map=acpi option for users who
                    don't want to specify physical address in kernel
                    commandline. It will use SRAT info, and set all
                    the hotpluggable memory as ZONE_MOVABLE.

After applying these 3 patches, movablemem_map boot option will work like this:

        /*
         * For movablemem_map=acpi:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * hotpluggable:           n       y         y           n
         * movablemem_map:              |_____| |_________|
         *
         *
         * For movablemem_map=nn[KMG]@ss[KMG]:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * user specified:                |__|                 |___|
         * movablemem_map:                |___| |_________|    |______| ......
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         *
         * NOTE: In the second case, SRAT info will be ingored.
         */

NOTE: Using this boot option could cause NUMA performance down. For users who
      don't want to lose NUMA performance, just do not use it now.
      We will improve it all along.

For more info of movablemem_map, please refer to:
      https://lkml.org/lkml/2013/1/14/87


Tang Chen (3):
  acpi, memory-hotplug: Parse SRAT before memblock is ready.
  acpi, memory-hotplug: Extend movablemem_map ranges to the end of
    node.
  acpi, memory-hotplug: Support getting hotplug info from SRAT.

 Documentation/kernel-parameters.txt |   23 ++++++++--
 arch/x86/kernel/setup.c             |   13 ++++--
 arch/x86/mm/numa.c                  |    2 +-
 arch/x86/mm/srat.c                  |   81 +++++++++++++++++++++++++++++++++-
 drivers/acpi/numa.c                 |   23 ++++++----
 include/linux/acpi.h                |    1 +
 include/linux/mm.h                  |    6 +++
 mm/page_alloc.c                     |   56 +++++++++++++++++++++++-
 8 files changed, 179 insertions(+), 26 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/3] acpi, memory-hotplug: Parse SRAT before memblock is ready.
  2013-01-25  9:42 ` Tang Chen
@ 2013-01-25  9:42   ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

On linux, the pages used by kernel could not be migrated. As a result,
if a memory range is used by kernel, it cannot be hot-removed. So if
we want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is specify a memory range by
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel. And before we parse SRAT, and know the
node memory ranges, memblock is working. And it may allocate memory
in ranges to be set as ZONE_MOVABLE. This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first. And it is early enough.

The first call of memblock_find_in_range_node() is in:
setup_arch()
 |-->setup_real_mode()

so, this patch add a function early_parse_srat() to parse SRAT, and call
it before setup_real_mode() is called.

NOTE:
1) Do not clear numa_nodes_parsed in numa_init() because SRAT was parsed earlier.
2) I don't know why using count of memory affinities parsed from SRAT as a return
   value in original acpi_numa_init(). So I add a static variable srat_mem_cnt to
   remember this count and use it as the return value of the new acpi_numa_init()

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |   13 +++++++++----
 arch/x86/mm/numa.c      |    2 +-
 drivers/acpi/numa.c     |   23 +++++++++++++----------
 include/linux/acpi.h    |    1 +
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 23ddd55..3787f14 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -903,6 +903,15 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
+	/*
+	 * In the memory hotplug case, the kernel need info from SRAT to
+	 * determine which memory is hotpluggable before allocating memory
+	 * using memblock.
+	 */
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	early_parse_srat();
+
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
@@ -965,10 +974,6 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse the ACPI tables for possible boot-time SMP configuration.
 	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
 	initmem_init();
 	memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index db939b6..6bdee58 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -570,7 +570,7 @@ static int __init numa_init(int (*init_func)(void))
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
-	nodes_clear(numa_nodes_parsed);
+	/* Do not clear numa_nodes_parsed because SRAT was parsed earlier. */
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index cb31298..2d255b1 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -280,10 +280,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
-{
-	int cnt = 0;
+static int srat_mem_cnt;
 
+void __init early_parse_srat(void)
+{
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
 	 * SRAT cpu entries could have different order with that in MADT.
@@ -293,21 +293,24 @@ int __init acpi_numa_init(void)
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, 0);
+				      acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, 0);
-		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
-					    acpi_parse_memory_affinity,
-					    NR_NODE_MEMBLKS);
+				      acpi_parse_processor_affinity, 0);
+		srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+						     acpi_parse_memory_affinity,
+						     NR_NODE_MEMBLKS);
 	}
+}
 
+int __init acpi_numa_init(void)
+{
 	/* SLIT: System Locality Information Table */
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
 	acpi_numa_arch_fixup();
 
-	if (cnt < 0)
-		return cnt;
+	if (srat_mem_cnt < 0)
+		return srat_mem_cnt;
 	else if (!parsed_numa_memblks)
 		return -ENOENT;
 	return 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 3994d77..9a8b9c6 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -93,6 +93,7 @@ int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
+void __init early_parse_srat(void);
 
 int acpi_table_init (void);
 int acpi_table_parse (char *id, acpi_table_handler handler);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 1/3] acpi, memory-hotplug: Parse SRAT before memblock is ready.
@ 2013-01-25  9:42   ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

On linux, the pages used by kernel could not be migrated. As a result,
if a memory range is used by kernel, it cannot be hot-removed. So if
we want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is specify a memory range by
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel. And before we parse SRAT, and know the
node memory ranges, memblock is working. And it may allocate memory
in ranges to be set as ZONE_MOVABLE. This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first. And it is early enough.

The first call of memblock_find_in_range_node() is in:
setup_arch()
 |-->setup_real_mode()

so, this patch add a function early_parse_srat() to parse SRAT, and call
it before setup_real_mode() is called.

NOTE:
1) Do not clear numa_nodes_parsed in numa_init() because SRAT was parsed earlier.
2) I don't know why using count of memory affinities parsed from SRAT as a return
   value in original acpi_numa_init(). So I add a static variable srat_mem_cnt to
   remember this count and use it as the return value of the new acpi_numa_init()

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |   13 +++++++++----
 arch/x86/mm/numa.c      |    2 +-
 drivers/acpi/numa.c     |   23 +++++++++++++----------
 include/linux/acpi.h    |    1 +
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 23ddd55..3787f14 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -903,6 +903,15 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
+	/*
+	 * In the memory hotplug case, the kernel need info from SRAT to
+	 * determine which memory is hotpluggable before allocating memory
+	 * using memblock.
+	 */
+	acpi_boot_table_init();
+	early_acpi_boot_init();
+	early_parse_srat();
+
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
@@ -965,10 +974,6 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse the ACPI tables for possible boot-time SMP configuration.
 	 */
-	acpi_boot_table_init();
-
-	early_acpi_boot_init();
-
 	initmem_init();
 	memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index db939b6..6bdee58 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -570,7 +570,7 @@ static int __init numa_init(int (*init_func)(void))
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
-	nodes_clear(numa_nodes_parsed);
+	/* Do not clear numa_nodes_parsed because SRAT was parsed earlier. */
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index cb31298..2d255b1 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -280,10 +280,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
-{
-	int cnt = 0;
+static int srat_mem_cnt;
 
+void __init early_parse_srat(void)
+{
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
 	 * SRAT cpu entries could have different order with that in MADT.
@@ -293,21 +293,24 @@ int __init acpi_numa_init(void)
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, 0);
+				      acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, 0);
-		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
-					    acpi_parse_memory_affinity,
-					    NR_NODE_MEMBLKS);
+				      acpi_parse_processor_affinity, 0);
+		srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+						     acpi_parse_memory_affinity,
+						     NR_NODE_MEMBLKS);
 	}
+}
 
+int __init acpi_numa_init(void)
+{
 	/* SLIT: System Locality Information Table */
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
 	acpi_numa_arch_fixup();
 
-	if (cnt < 0)
-		return cnt;
+	if (srat_mem_cnt < 0)
+		return srat_mem_cnt;
 	else if (!parsed_numa_memblks)
 		return -ENOENT;
 	return 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 3994d77..9a8b9c6 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -93,6 +93,7 @@ int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
+void __init early_parse_srat(void);
 
 int acpi_table_init (void);
 int acpi_table_parse (char *id, acpi_table_handler handler);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
  2013-01-25  9:42 ` Tang Chen
@ 2013-01-25  9:42   ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

When implementing movablemem_map boot option, we introduced an array
movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE.

Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
the whole node memory range, we need to extend it to the node end so that
we can use it to prevent memblock from allocating memory in the ranges
user didn't specify.

We now implement movablemem_map boot option like this:
        /*
         * For movablemem_map=nn[KMG]@ss[KMG]:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * user specified:                |__|                 |___|
         * movablemem_map:                |___| |_________|    |______| ......
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         *
         * NOTE: In this case, SRAT info will be ingored.
         */

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/srat.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/mm.h |    5 ++++
 mm/page_alloc.c    |   34 +++++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 4ddf497..f841d0e 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
 static inline int save_add_info(void) {return 0;}
 #endif
 
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+extern struct movablemem_map movablemem_map;
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
 /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
 int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -157,8 +162,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		return -1;
 
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		return -1;
+
 	start = ma->base_address;
 	end = start + ma->length;
 	pxm = ma->proximity_domain;
@@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
 	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	       (unsigned long long) start, (unsigned long long) end - 1,
+	       hotpluggable ? "Hot Pluggable": "");
+
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+	int overlap;
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(end);
+
+	/*
+	 * For movablecore_map=nn[KMG]@ss[KMG]:
+	 *
+	 * SRAT:		|_____| |_____| |_________| |_________| ......
+	 * node id:		   0       1         1           2
+	 * user specified:	          |__|                 |___|
+	 * movablemem_map:		  |___| |_________|    |______| ......
+	 *
+	 * Using movablemem_map, we can prevent memblock from allocating memory
+	 * on ZONE_MOVABLE at boot time.
+	 */
+	overlap = movablemem_map_overlap(start_pfn, end_pfn);
+	if (overlap >= 0) {
+		/*
+		 * If part of this range is in movablemem_map, we need to
+		 * add the range after it to extend the range to the end
+		 * of the node, because from the min address specified to
+		 * the end of the node will be ZONE_MOVABLE.
+		 */
+		start_pfn = max(start_pfn,
+			    movablemem_map.map[overlap].start_pfn);
+		insert_movablemem_map(start_pfn, end_pfn);
+
+		/*
+		 * Set the nodemask, so that if the address range on one node
+		 * is not continuse, we can add the subsequent ranges on the
+		 * same node into movablemem_map.
+		 */
+		node_set(node, movablemem_map.numa_nodes_hotplug);
+	} else {
+		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
+			/*
+			 * Insert the range if we already have movable ranges
+			 * on the same node.
+			 */
+			insert_movablemem_map(start_pfn, end_pfn);
+	}
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
 	return 0;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7cef651..e88077a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1368,8 +1368,13 @@ struct movablemem_entry {
 struct movablemem_map {
 	int nr_map;
 	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
+	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
 };
 
+extern void __init insert_movablemem_map(unsigned long start_pfn,
+					 unsigned long end_pfn);
+extern int __init movablemem_map_overlap(unsigned long start_pfn,
+					 unsigned long end_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3978797..bdbce73 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5168,6 +5168,36 @@ early_param("kernelcore", cmdline_parse_kernelcore);
 early_param("movablecore", cmdline_parse_movablecore);
 
 /**
+ * movablemem_map_overlap() - Check if a range overlaps movablemem_map.map[].
+ * @start_pfn:	start pfn of the range to be checked
+ * @end_pfn: 	end pfn of the range to be checked (exclusive)
+ *
+ * This function checks if a given memory range [start_pfn, end_pfn) overlaps
+ * the movablemem_map.map[] array.
+ *
+ * Return: index of the first overlapped element in movablemem_map.map[]
+ *         or -1 if they don't overlap each other.
+ */
+int __init movablemem_map_overlap(unsigned long start_pfn,
+				   unsigned long end_pfn)
+{
+	int overlap;
+
+	if (!movablemem_map.nr_map)
+		return -1;
+
+	for (overlap = 0; overlap < movablemem_map.nr_map; overlap++)
+		if (start_pfn < movablemem_map.map[overlap].end_pfn)
+			break;
+
+	if (overlap == movablemem_map.nr_map ||
+	    end_pfn <= movablemem_map.map[overlap].start_pfn)
+		return -1;
+
+	return overlap;
+}
+
+/**
  * insert_movablemem_map() - Insert a memory range in to movablemem_map.map.
  * @start_pfn:	start pfn of the range
  * @end_pfn:	end pfn of the range
@@ -5175,8 +5205,8 @@ early_param("movablecore", cmdline_parse_movablecore);
  * This function will also merge the overlapped ranges, and sort the array
  * by start_pfn in monotonic increasing order.
  */
-static void __init insert_movablemem_map(unsigned long start_pfn,
-					  unsigned long end_pfn)
+void __init insert_movablemem_map(unsigned long start_pfn,
+				  unsigned long end_pfn)
 {
 	int pos, overlap;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
@ 2013-01-25  9:42   ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

When implementing movablemem_map boot option, we introduced an array
movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE.

Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
the whole node memory range, we need to extend it to the node end so that
we can use it to prevent memblock from allocating memory in the ranges
user didn't specify.

We now implement movablemem_map boot option like this:
        /*
         * For movablemem_map=nn[KMG]@ss[KMG]:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * user specified:                |__|                 |___|
         * movablemem_map:                |___| |_________|    |______| ......
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         *
         * NOTE: In this case, SRAT info will be ingored.
         */

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/srat.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/mm.h |    5 ++++
 mm/page_alloc.c    |   34 +++++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 4ddf497..f841d0e 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
 static inline int save_add_info(void) {return 0;}
 #endif
 
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+extern struct movablemem_map movablemem_map;
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
 /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
 int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
+	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -157,8 +162,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		return -1;
 
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
 		return -1;
+
 	start = ma->base_address;
 	end = start + ma->length;
 	pxm = ma->proximity_domain;
@@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
 	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1);
+	       (unsigned long long) start, (unsigned long long) end - 1,
+	       hotpluggable ? "Hot Pluggable": "");
+
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+	int overlap;
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(end);
+
+	/*
+	 * For movablecore_map=nn[KMG]@ss[KMG]:
+	 *
+	 * SRAT:		|_____| |_____| |_________| |_________| ......
+	 * node id:		   0       1         1           2
+	 * user specified:	          |__|                 |___|
+	 * movablemem_map:		  |___| |_________|    |______| ......
+	 *
+	 * Using movablemem_map, we can prevent memblock from allocating memory
+	 * on ZONE_MOVABLE at boot time.
+	 */
+	overlap = movablemem_map_overlap(start_pfn, end_pfn);
+	if (overlap >= 0) {
+		/*
+		 * If part of this range is in movablemem_map, we need to
+		 * add the range after it to extend the range to the end
+		 * of the node, because from the min address specified to
+		 * the end of the node will be ZONE_MOVABLE.
+		 */
+		start_pfn = max(start_pfn,
+			    movablemem_map.map[overlap].start_pfn);
+		insert_movablemem_map(start_pfn, end_pfn);
+
+		/*
+		 * Set the nodemask, so that if the address range on one node
+		 * is not continuse, we can add the subsequent ranges on the
+		 * same node into movablemem_map.
+		 */
+		node_set(node, movablemem_map.numa_nodes_hotplug);
+	} else {
+		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
+			/*
+			 * Insert the range if we already have movable ranges
+			 * on the same node.
+			 */
+			insert_movablemem_map(start_pfn, end_pfn);
+	}
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
 	return 0;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7cef651..e88077a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1368,8 +1368,13 @@ struct movablemem_entry {
 struct movablemem_map {
 	int nr_map;
 	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
+	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
 };
 
+extern void __init insert_movablemem_map(unsigned long start_pfn,
+					 unsigned long end_pfn);
+extern int __init movablemem_map_overlap(unsigned long start_pfn,
+					 unsigned long end_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3978797..bdbce73 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5168,6 +5168,36 @@ early_param("kernelcore", cmdline_parse_kernelcore);
 early_param("movablecore", cmdline_parse_movablecore);
 
 /**
+ * movablemem_map_overlap() - Check if a range overlaps movablemem_map.map[].
+ * @start_pfn:	start pfn of the range to be checked
+ * @end_pfn: 	end pfn of the range to be checked (exclusive)
+ *
+ * This function checks if a given memory range [start_pfn, end_pfn) overlaps
+ * the movablemem_map.map[] array.
+ *
+ * Return: index of the first overlapped element in movablemem_map.map[]
+ *         or -1 if they don't overlap each other.
+ */
+int __init movablemem_map_overlap(unsigned long start_pfn,
+				   unsigned long end_pfn)
+{
+	int overlap;
+
+	if (!movablemem_map.nr_map)
+		return -1;
+
+	for (overlap = 0; overlap < movablemem_map.nr_map; overlap++)
+		if (start_pfn < movablemem_map.map[overlap].end_pfn)
+			break;
+
+	if (overlap == movablemem_map.nr_map ||
+	    end_pfn <= movablemem_map.map[overlap].start_pfn)
+		return -1;
+
+	return overlap;
+}
+
+/**
  * insert_movablemem_map() - Insert a memory range in to movablemem_map.map.
  * @start_pfn:	start pfn of the range
  * @end_pfn:	end pfn of the range
@@ -5175,8 +5205,8 @@ early_param("movablecore", cmdline_parse_movablecore);
  * This function will also merge the overlapped ranges, and sort the array
  * by start_pfn in monotonic increasing order.
  */
-static void __init insert_movablemem_map(unsigned long start_pfn,
-					  unsigned long end_pfn)
+void __init insert_movablemem_map(unsigned long start_pfn,
+				  unsigned long end_pfn)
 {
 	int pos, overlap;
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-25  9:42 ` Tang Chen
@ 2013-01-25  9:42   ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

We now provide an option for users who don't want to specify physical
memory address in kernel commandline.

        /*
         * For movablemem_map=acpi:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * hotpluggable:           n       y         y           n
         * movablemem_map:              |_____| |_________|
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         */

So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

NOTE: Using this way will cause NUMA performance down because the whole node
      will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
      If users don't want to lose NUMA performance, just don't use it.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   23 ++++++++++++++++++-----
 arch/x86/mm/srat.c                  |   22 +++++++++++++++++++++-
 include/linux/mm.h                  |    1 +
 mm/page_alloc.c                     |   22 +++++++++++++++++++++-
 4 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7770611..3d9dc9d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1637,15 +1637,28 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablemem_map=acpi
+			[KNL,X86,IA-64,PPC] This parameter is similar to
+			memmap except it specifies the memory map of
+			ZONE_MOVABLE.
+			This option inform the kernel to use Hot Pluggable bit
+			in flags from SRAT from ACPI BIOS to determine which
+			memory devices could be hotplugged. The corresponding
+			memory ranges will be set as ZONE_MOVABLE.
+
 	movablemem_map=nn[KMG]@ss[KMG]
 			[KNL,X86,IA-64,PPC] This parameter is similar to
 			memmap except it specifies the memory map of
 			ZONE_MOVABLE.
-			If more areas are all within one node, then from
-			lowest ss to the end of the node will be ZONE_MOVABLE.
-			If an area covers two or more nodes, the area from
-			ss to the end of the 1st node will be ZONE_MOVABLE,
-			and all the rest nodes will only have ZONE_MOVABLE.
+			If user specifies memory ranges, the info in SRAT will
+			be ingored. And it works like the following:
+			- If more ranges are all within one node, then from
+			  lowest ss to the end of the node will be ZONE_MOVABLE.
+			- If a range is within a node, then from ss to the end
+			  of the node will be ZONE_MOVABLE.
+			- If a range covers two or more nodes, then from ss to
+			  the end of the 1st node will be ZONE_MOVABLE, and all
+			  the rest nodes will only have ZONE_MOVABLE.
 			If memmap is specified at the same time, the
 			movablemem_map will be limited within the memmap
 			areas. If kernelcore or movablecore is also specified,
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index f841d0e..94d6e72 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -198,7 +198,23 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	end_pfn = PFN_UP(end);
 
 	/*
-	 * For movablecore_map=nn[KMG]@ss[KMG]:
+	 * For movablemem_map=acpi:
+	 *
+	 * SRAT:		|_____| |_____| |_________| |_________| ......
+	 * node id:                0       1         1           2
+	 * hotpluggable:	   n       y         y           n
+	 * movablemem_map:	        |_____| |_________|
+	 *
+	 * Using movablemem_map, we can prevent memblock from allocating memory
+	 * on ZONE_MOVABLE at boot time.
+	 */
+	if (hotpluggable && movablemem_map.acpi) {
+		insert_movablemem_map(start_pfn, end_pfn);
+		goto out;
+	}
+
+	/*
+	 * For movablemem_map=nn[KMG]@ss[KMG]:
 	 *
 	 * SRAT:		|_____| |_____| |_________| |_________| ......
 	 * node id:		   0       1         1           2
@@ -207,6 +223,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	 *
 	 * Using movablemem_map, we can prevent memblock from allocating memory
 	 * on ZONE_MOVABLE at boot time.
+	 *
+	 * NOTE: In this case, SRAT info will be ingored.
 	 */
 	overlap = movablemem_map_overlap(start_pfn, end_pfn);
 	if (overlap >= 0) {
@@ -234,6 +252,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 			 */
 			insert_movablemem_map(start_pfn, end_pfn);
 	}
+
+out:
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e88077a..3eda45f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1366,6 +1366,7 @@ struct movablemem_entry {
 };
 
 struct movablemem_map {
+	bool acpi;	/* true if using SRAT info */
 	int nr_map;
 	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
 	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdbce73..843c00d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -202,7 +202,10 @@ static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 /* Movable memory ranges, will also be used by memblock subsystem. */
-struct movablemem_map movablemem_map;
+struct movablemem_map movablemem_map = {
+	.acpi = false,
+	.nr_map = 0,
+};
 
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
@@ -5306,6 +5309,23 @@ static int __init cmdline_parse_movablemem_map(char *p)
 	if (!p)
 		goto err;
 
+	if (!strncmp(p, "acpi", max(4, strlen(p))))
+		movablemem_map.acpi = true;
+
+	/*
+	 * If user decide to use info from BIOS, all the other user specified
+	 * ranges will be ingored.
+	 */
+	if (movablemem_map.acpi) {
+		if (movablemem_map.nr_map) {
+			memset(movablemem_map.map, 0,
+				sizeof(struct movablemem_entry)
+				* movablemem_map.nr_map);
+			movablemem_map.nr_map = 0;
+		}
+		return 0;
+	}
+
 	oldp = p;
 	mem_size = memparse(p, &p);
 	if (p == oldp)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-25  9:42   ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-25  9:42 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

We now provide an option for users who don't want to specify physical
memory address in kernel commandline.

        /*
         * For movablemem_map=acpi:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * hotpluggable:           n       y         y           n
         * movablemem_map:              |_____| |_________|
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         */

So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

NOTE: Using this way will cause NUMA performance down because the whole node
      will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
      If users don't want to lose NUMA performance, just don't use it.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   23 ++++++++++++++++++-----
 arch/x86/mm/srat.c                  |   22 +++++++++++++++++++++-
 include/linux/mm.h                  |    1 +
 mm/page_alloc.c                     |   22 +++++++++++++++++++++-
 4 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7770611..3d9dc9d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1637,15 +1637,28 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablemem_map=acpi
+			[KNL,X86,IA-64,PPC] This parameter is similar to
+			memmap except it specifies the memory map of
+			ZONE_MOVABLE.
+			This option inform the kernel to use Hot Pluggable bit
+			in flags from SRAT from ACPI BIOS to determine which
+			memory devices could be hotplugged. The corresponding
+			memory ranges will be set as ZONE_MOVABLE.
+
 	movablemem_map=nn[KMG]@ss[KMG]
 			[KNL,X86,IA-64,PPC] This parameter is similar to
 			memmap except it specifies the memory map of
 			ZONE_MOVABLE.
-			If more areas are all within one node, then from
-			lowest ss to the end of the node will be ZONE_MOVABLE.
-			If an area covers two or more nodes, the area from
-			ss to the end of the 1st node will be ZONE_MOVABLE,
-			and all the rest nodes will only have ZONE_MOVABLE.
+			If user specifies memory ranges, the info in SRAT will
+			be ingored. And it works like the following:
+			- If more ranges are all within one node, then from
+			  lowest ss to the end of the node will be ZONE_MOVABLE.
+			- If a range is within a node, then from ss to the end
+			  of the node will be ZONE_MOVABLE.
+			- If a range covers two or more nodes, then from ss to
+			  the end of the 1st node will be ZONE_MOVABLE, and all
+			  the rest nodes will only have ZONE_MOVABLE.
 			If memmap is specified at the same time, the
 			movablemem_map will be limited within the memmap
 			areas. If kernelcore or movablecore is also specified,
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index f841d0e..94d6e72 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -198,7 +198,23 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	end_pfn = PFN_UP(end);
 
 	/*
-	 * For movablecore_map=nn[KMG]@ss[KMG]:
+	 * For movablemem_map=acpi:
+	 *
+	 * SRAT:		|_____| |_____| |_________| |_________| ......
+	 * node id:                0       1         1           2
+	 * hotpluggable:	   n       y         y           n
+	 * movablemem_map:	        |_____| |_________|
+	 *
+	 * Using movablemem_map, we can prevent memblock from allocating memory
+	 * on ZONE_MOVABLE at boot time.
+	 */
+	if (hotpluggable && movablemem_map.acpi) {
+		insert_movablemem_map(start_pfn, end_pfn);
+		goto out;
+	}
+
+	/*
+	 * For movablemem_map=nn[KMG]@ss[KMG]:
 	 *
 	 * SRAT:		|_____| |_____| |_________| |_________| ......
 	 * node id:		   0       1         1           2
@@ -207,6 +223,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	 *
 	 * Using movablemem_map, we can prevent memblock from allocating memory
 	 * on ZONE_MOVABLE at boot time.
+	 *
+	 * NOTE: In this case, SRAT info will be ingored.
 	 */
 	overlap = movablemem_map_overlap(start_pfn, end_pfn);
 	if (overlap >= 0) {
@@ -234,6 +252,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 			 */
 			insert_movablemem_map(start_pfn, end_pfn);
 	}
+
+out:
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e88077a..3eda45f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1366,6 +1366,7 @@ struct movablemem_entry {
 };
 
 struct movablemem_map {
+	bool acpi;	/* true if using SRAT info */
 	int nr_map;
 	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
 	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdbce73..843c00d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -202,7 +202,10 @@ static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 /* Movable memory ranges, will also be used by memblock subsystem. */
-struct movablemem_map movablemem_map;
+struct movablemem_map movablemem_map = {
+	.acpi = false,
+	.nr_map = 0,
+};
 
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
@@ -5306,6 +5309,23 @@ static int __init cmdline_parse_movablemem_map(char *p)
 	if (!p)
 		goto err;
 
+	if (!strncmp(p, "acpi", max(4, strlen(p))))
+		movablemem_map.acpi = true;
+
+	/*
+	 * If user decide to use info from BIOS, all the other user specified
+	 * ranges will be ingored.
+	 */
+	if (movablemem_map.acpi) {
+		if (movablemem_map.nr_map) {
+			memset(movablemem_map.map, 0,
+				sizeof(struct movablemem_entry)
+				* movablemem_map.nr_map);
+			movablemem_map.nr_map = 0;
+		}
+		return 0;
+	}
+
 	oldp = p;
 	mem_size = memparse(p, &p);
 	if (p == oldp)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
  2013-01-25  9:42   ` Tang Chen
@ 2013-01-26  0:36     ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  0:36 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:08 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> When implementing movablemem_map boot option, we introduced an array
> movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE.
> 
> Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
> the whole node memory range, we need to extend it to the node end so that
> we can use it to prevent memblock from allocating memory in the ranges
> user didn't specify.
> 
> We now implement movablemem_map boot option like this:
>         /*
>          * For movablemem_map=nn[KMG]@ss[KMG]:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * user specified:                |__|                 |___|
>          * movablemem_map:                |___| |_________|    |______| ......
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          *
>          * NOTE: In this case, SRAT info will be ingored.
>          */
> 

The patch generates a bunch of rejects, partly due to linux-next
changes but I think I fixed everything up OK.

> index 4ddf497..f841d0e 100644
> --- a/arch/x86/mm/srat.c
> +++ b/arch/x86/mm/srat.c
> @@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
>  static inline int save_add_info(void) {return 0;}
>  #endif
>  
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +extern struct movablemem_map movablemem_map;
> +#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

Well.

a) we shouldn't put extern declarations in C files - put them in
   headers so we can be assured that all compilation units agree on the
   type.

b) the ifdefs are unneeded - a unused extern declaration is OK (as
   long as the type itself is always defined!)

c) movablemem_map is already declared in memblock.h.

So I zapped the above three lines.

> @@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  
>  	node_set(node, numa_nodes_parsed);
>  
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
>  	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1);
> +	       (unsigned long long) start, (unsigned long long) end - 1,
> +	       hotpluggable ? "Hot Pluggable": "");
> +
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +	int overlap;
> +	unsigned long start_pfn, end_pfn;

no, we don't put declarations of locals in the middle of C statements
like this:

arch/x86/mm/srat.c: In function 'acpi_numa_memory_affinity_init':
arch/x86/mm/srat.c:185: warning: ISO C90 forbids mixed declarations and code

Did your compiler not emit this warning?

I fixed this by moving the code into a new function
"handle_movablemem".  Feel free to suggest a more appropriate name!


From: Andrew Morton <akpm@linux-foundation.org>
Subject: acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix

clean up code, fix build warning

Cc: "Brown, Len" <len.brown@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/srat.c |   93 ++++++++++++++++++++++---------------------
 1 file changed, 49 insertions(+), 44 deletions(-)

diff -puN arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix arch/x86/mm/srat.c
--- a/arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
+++ a/arch/x86/mm/srat.c
@@ -142,50 +142,8 @@ static inline int save_add_info(void) {r
 #endif
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-extern struct movablemem_map movablemem_map;
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-
-/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
-int __init
-acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+static void __init handle_movablemem(int node, u64 start, u64 end)
 {
-	u64 start, end;
-	u32 hotpluggable;
-	int node, pxm;
-
-	if (srat_disabled())
-		goto out_err;
-	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
-		goto out_err_bad_srat;
-	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
-		goto out_err;
-	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
-	if (hotpluggable && !save_add_info())
-		goto out_err;
-
-	start = ma->base_address;
-	end = start + ma->length;
-	pxm = ma->proximity_domain;
-	if (acpi_srat_revision <= 1)
-		pxm &= 0xff;
-
-	node = setup_node(pxm);
-	if (node < 0) {
-		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
-		goto out_err_bad_srat;
-	}
-
-	if (numa_add_memblk(node, start, end) < 0)
-		goto out_err_bad_srat;
-
-	node_set(node, numa_nodes_parsed);
-
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
-	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1,
-	       hotpluggable ? "Hot Pluggable": "");
-
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int overlap;
 	unsigned long start_pfn, end_pfn;
 
@@ -229,7 +187,54 @@ acpi_numa_memory_affinity_init(struct ac
 			 */
 			insert_movablemem_map(start_pfn, end_pfn);
 	}
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+}
+#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+static inline void handle_movablemem(int node, u64 start, u64 end)
+{
+}
+#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
+int __init
+acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+{
+	u64 start, end;
+	u32 hotpluggable;
+	int node, pxm;
+
+	if (srat_disabled())
+		goto out_err;
+	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
+		goto out_err_bad_srat;
+	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
+		goto out_err;
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
+		goto out_err;
+
+	start = ma->base_address;
+	end = start + ma->length;
+	pxm = ma->proximity_domain;
+	if (acpi_srat_revision <= 1)
+		pxm &= 0xff;
+
+	node = setup_node(pxm);
+	if (node < 0) {
+		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
+		goto out_err_bad_srat;
+	}
+
+	if (numa_add_memblk(node, start, end) < 0)
+		goto out_err_bad_srat;
+
+	node_set(node, numa_nodes_parsed);
+
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
+	       node, pxm,
+	       (unsigned long long) start, (unsigned long long) end - 1,
+	       hotpluggable ? "Hot Pluggable": "");
+
+	handle_movablemem(node, start, end);
 
 	return 0;
 out_err_bad_srat:
diff -puN include/linux/mm.h~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix include/linux/mm.h
diff -puN mm/page_alloc.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix mm/page_alloc.c
_


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
@ 2013-01-26  0:36     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  0:36 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:08 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> When implementing movablemem_map boot option, we introduced an array
> movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE.
> 
> Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
> the whole node memory range, we need to extend it to the node end so that
> we can use it to prevent memblock from allocating memory in the ranges
> user didn't specify.
> 
> We now implement movablemem_map boot option like this:
>         /*
>          * For movablemem_map=nn[KMG]@ss[KMG]:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * user specified:                |__|                 |___|
>          * movablemem_map:                |___| |_________|    |______| ......
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          *
>          * NOTE: In this case, SRAT info will be ingored.
>          */
> 

The patch generates a bunch of rejects, partly due to linux-next
changes but I think I fixed everything up OK.

> index 4ddf497..f841d0e 100644
> --- a/arch/x86/mm/srat.c
> +++ b/arch/x86/mm/srat.c
> @@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
>  static inline int save_add_info(void) {return 0;}
>  #endif
>  
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +extern struct movablemem_map movablemem_map;
> +#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

Well.

a) we shouldn't put extern declarations in C files - put them in
   headers so we can be assured that all compilation units agree on the
   type.

b) the ifdefs are unneeded - a unused extern declaration is OK (as
   long as the type itself is always defined!)

c) movablemem_map is already declared in memblock.h.

So I zapped the above three lines.

> @@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  
>  	node_set(node, numa_nodes_parsed);
>  
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
>  	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1);
> +	       (unsigned long long) start, (unsigned long long) end - 1,
> +	       hotpluggable ? "Hot Pluggable": "");
> +
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +	int overlap;
> +	unsigned long start_pfn, end_pfn;

no, we don't put declarations of locals in the middle of C statements
like this:

arch/x86/mm/srat.c: In function 'acpi_numa_memory_affinity_init':
arch/x86/mm/srat.c:185: warning: ISO C90 forbids mixed declarations and code

Did your compiler not emit this warning?

I fixed this by moving the code into a new function
"handle_movablemem".  Feel free to suggest a more appropriate name!


From: Andrew Morton <akpm@linux-foundation.org>
Subject: acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix

clean up code, fix build warning

Cc: "Brown, Len" <len.brown@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/srat.c |   93 ++++++++++++++++++++++---------------------
 1 file changed, 49 insertions(+), 44 deletions(-)

diff -puN arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix arch/x86/mm/srat.c
--- a/arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
+++ a/arch/x86/mm/srat.c
@@ -142,50 +142,8 @@ static inline int save_add_info(void) {r
 #endif
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-extern struct movablemem_map movablemem_map;
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-
-/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
-int __init
-acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+static void __init handle_movablemem(int node, u64 start, u64 end)
 {
-	u64 start, end;
-	u32 hotpluggable;
-	int node, pxm;
-
-	if (srat_disabled())
-		goto out_err;
-	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
-		goto out_err_bad_srat;
-	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
-		goto out_err;
-	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
-	if (hotpluggable && !save_add_info())
-		goto out_err;
-
-	start = ma->base_address;
-	end = start + ma->length;
-	pxm = ma->proximity_domain;
-	if (acpi_srat_revision <= 1)
-		pxm &= 0xff;
-
-	node = setup_node(pxm);
-	if (node < 0) {
-		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
-		goto out_err_bad_srat;
-	}
-
-	if (numa_add_memblk(node, start, end) < 0)
-		goto out_err_bad_srat;
-
-	node_set(node, numa_nodes_parsed);
-
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
-	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1,
-	       hotpluggable ? "Hot Pluggable": "");
-
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int overlap;
 	unsigned long start_pfn, end_pfn;
 
@@ -229,7 +187,54 @@ acpi_numa_memory_affinity_init(struct ac
 			 */
 			insert_movablemem_map(start_pfn, end_pfn);
 	}
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+}
+#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+static inline void handle_movablemem(int node, u64 start, u64 end)
+{
+}
+#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
+int __init
+acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+{
+	u64 start, end;
+	u32 hotpluggable;
+	int node, pxm;
+
+	if (srat_disabled())
+		goto out_err;
+	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
+		goto out_err_bad_srat;
+	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
+		goto out_err;
+	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+	if (hotpluggable && !save_add_info())
+		goto out_err;
+
+	start = ma->base_address;
+	end = start + ma->length;
+	pxm = ma->proximity_domain;
+	if (acpi_srat_revision <= 1)
+		pxm &= 0xff;
+
+	node = setup_node(pxm);
+	if (node < 0) {
+		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
+		goto out_err_bad_srat;
+	}
+
+	if (numa_add_memblk(node, start, end) < 0)
+		goto out_err_bad_srat;
+
+	node_set(node, numa_nodes_parsed);
+
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
+	       node, pxm,
+	       (unsigned long long) start, (unsigned long long) end - 1,
+	       hotpluggable ? "Hot Pluggable": "");
+
+	handle_movablemem(node, start, end);
 
 	return 0;
 out_err_bad_srat:
diff -puN include/linux/mm.h~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix include/linux/mm.h
diff -puN mm/page_alloc.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix mm/page_alloc.c
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-25  9:42   ` Tang Chen
@ 2013-01-26  0:40     ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  0:40 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> We now provide an option for users who don't want to specify physical
> memory address in kernel commandline.
> 
>         /*
>          * For movablemem_map=acpi:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * hotpluggable:           n       y         y           n
>          * movablemem_map:              |_____| |_________|
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          */
> 
> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

Well, as a reult of my previous hackery, arch/x86/mm/srat.c now looks
rather different.  Please check it carefully and runtime test this code
when it appears in linux-next?


/*
 * ACPI 3.0 based NUMA setup
 * Copyright 2004 Andi Kleen, SuSE Labs.
 *
 * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
 *
 * Called from acpi_numa_init while reading the SRAT and SLIT tables.
 * Assumes all memory regions belonging to a single proximity domain
 * are in one chunk. Holes between them will be included in the node.
 */

#include <linux/kernel.h>
#include <linux/acpi.h>
#include <linux/mmzone.h>
#include <linux/bitmap.h>
#include <linux/module.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <asm/proto.h>
#include <asm/numa.h>
#include <asm/e820.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>

int acpi_numa __initdata;

static __init int setup_node(int pxm)
{
	return acpi_map_pxm_to_node(pxm);
}

static __init void bad_srat(void)
{
	printk(KERN_ERR "SRAT: SRAT not used.\n");
	acpi_numa = -1;
}

static __init inline int srat_disabled(void)
{
	return acpi_numa < 0;
}

/* Callback for SLIT parsing */
void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
{
	int i, j;

	for (i = 0; i < slit->locality_count; i++)
		for (j = 0; j < slit->locality_count; j++)
			numa_set_distance(pxm_to_node(i), pxm_to_node(j),
				slit->entry[slit->locality_count * i + j]);
}

/* Callback for Proximity Domain -> x2APIC mapping */
void __init
acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
{
	int pxm, node;
	int apic_id;

	if (srat_disabled())
		return;
	if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) {
		bad_srat();
		return;
	}
	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
		return;
	pxm = pa->proximity_domain;
	apic_id = pa->apic_id;
	if (!apic->apic_id_valid(apic_id)) {
		printk(KERN_INFO "SRAT: PXM %u -> X2APIC 0x%04x ignored\n",
			 pxm, apic_id);
		return;
	}
	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
		bad_srat();
		return;
	}

	if (apic_id >= MAX_LOCAL_APIC) {
		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
		return;
	}
	set_apicid_to_node(apic_id, node);
	node_set(node, numa_nodes_parsed);
	acpi_numa = 1;
	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
	       pxm, apic_id, node);
}

/* Callback for Proximity Domain -> LAPIC mapping */
void __init
acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
{
	int pxm, node;
	int apic_id;

	if (srat_disabled())
		return;
	if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
		bad_srat();
		return;
	}
	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
		return;
	pxm = pa->proximity_domain_lo;
	if (acpi_srat_revision >= 2)
		pxm |= *((unsigned int*)pa->proximity_domain_hi) << 8;
	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
		bad_srat();
		return;
	}

	if (get_uv_system_type() >= UV_X2APIC)
		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
	else
		apic_id = pa->apic_id;

	if (apic_id >= MAX_LOCAL_APIC) {
		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
		return;
	}

	set_apicid_to_node(apic_id, node);
	node_set(node, numa_nodes_parsed);
	acpi_numa = 1;
	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
	       pxm, apic_id, node);
}

#ifdef CONFIG_MEMORY_HOTPLUG
static inline int save_add_info(void) {return 1;}
#else
static inline int save_add_info(void) {return 0;}
#endif

#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static void __init
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
	int overlap;
	unsigned long start_pfn, end_pfn;

	start_pfn = PFN_DOWN(start);
	end_pfn = PFN_UP(end);

	/*
	 * For movablemem_map=acpi:
	 *
	 * SRAT:		|_____| |_____| |_________| |_________| ......
	 * node id:                0       1         1           2
	 * hotpluggable:	   n       y         y           n
	 * movablemem_map:	        |_____| |_________|
	 *
	 * Using movablemem_map, we can prevent memblock from allocating memory
	 * on ZONE_MOVABLE at boot time.
	 */
	if (hotpluggable && movablemem_map.acpi) {
		insert_movablemem_map(start_pfn, end_pfn);
		goto out;
	}

	/*
	 * For movablemem_map=nn[KMG]@ss[KMG]:
	 *
	 * SRAT:		|_____| |_____| |_________| |_________| ......
	 * node id:		   0       1         1           2
	 * user specified:	          |__|                 |___|
	 * movablemem_map:		  |___| |_________|    |______| ......
	 *
	 * Using movablemem_map, we can prevent memblock from allocating memory
	 * on ZONE_MOVABLE at boot time.
	 *
	 * NOTE: In this case, SRAT info will be ingored.
	 */
	overlap = movablemem_map_overlap(start_pfn, end_pfn);
	if (overlap >= 0) {
		/*
		 * If part of this range is in movablemem_map, we need to
		 * add the range after it to extend the range to the end
		 * of the node, because from the min address specified to
		 * the end of the node will be ZONE_MOVABLE.
		 */
		start_pfn = max(start_pfn,
			    movablemem_map.map[overlap].start_pfn);
		insert_movablemem_map(start_pfn, end_pfn);

		/*
		 * Set the nodemask, so that if the address range on one node
		 * is not continuse, we can add the subsequent ranges on the
		 * same node into movablemem_map.
		 */
		node_set(node, movablemem_map.numa_nodes_hotplug);
	} else {
		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
			/*
			 * Insert the range if we already have movable ranges
			 * on the same node.
			 */
			insert_movablemem_map(start_pfn, end_pfn);
	}
out:
	return;
}
#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline void
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
}
#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
	u64 start, end;
	u32 hotpluggable;
	int node, pxm;

	if (srat_disabled())
		goto out_err;
	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
		goto out_err_bad_srat;
	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
		goto out_err;
	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
	if (hotpluggable && !save_add_info())
		goto out_err;

	start = ma->base_address;
	end = start + ma->length;
	pxm = ma->proximity_domain;
	if (acpi_srat_revision <= 1)
		pxm &= 0xff;

	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
		goto out_err_bad_srat;
	}

	if (numa_add_memblk(node, start, end) < 0)
		goto out_err_bad_srat;

	node_set(node, numa_nodes_parsed);

	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
	       node, pxm,
	       (unsigned long long) start, (unsigned long long) end - 1,
	       hotpluggable ? "Hot Pluggable": "");

	handle_movablemem(node, start, end, hotpluggable);

	return 0;
out_err_bad_srat:
	bad_srat();
out_err:
	return -1;
}

void __init acpi_numa_arch_fixup(void) {}

int __init x86_acpi_numa_init(void)
{
	int ret;

	ret = acpi_numa_init();
	if (ret < 0)
		return ret;
	return srat_disabled() ? -EINVAL : 0;
}


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-26  0:40     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  0:40 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> We now provide an option for users who don't want to specify physical
> memory address in kernel commandline.
> 
>         /*
>          * For movablemem_map=acpi:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * hotpluggable:           n       y         y           n
>          * movablemem_map:              |_____| |_________|
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          */
> 
> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

Well, as a reult of my previous hackery, arch/x86/mm/srat.c now looks
rather different.  Please check it carefully and runtime test this code
when it appears in linux-next?


/*
 * ACPI 3.0 based NUMA setup
 * Copyright 2004 Andi Kleen, SuSE Labs.
 *
 * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
 *
 * Called from acpi_numa_init while reading the SRAT and SLIT tables.
 * Assumes all memory regions belonging to a single proximity domain
 * are in one chunk. Holes between them will be included in the node.
 */

#include <linux/kernel.h>
#include <linux/acpi.h>
#include <linux/mmzone.h>
#include <linux/bitmap.h>
#include <linux/module.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <asm/proto.h>
#include <asm/numa.h>
#include <asm/e820.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>

int acpi_numa __initdata;

static __init int setup_node(int pxm)
{
	return acpi_map_pxm_to_node(pxm);
}

static __init void bad_srat(void)
{
	printk(KERN_ERR "SRAT: SRAT not used.\n");
	acpi_numa = -1;
}

static __init inline int srat_disabled(void)
{
	return acpi_numa < 0;
}

/* Callback for SLIT parsing */
void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
{
	int i, j;

	for (i = 0; i < slit->locality_count; i++)
		for (j = 0; j < slit->locality_count; j++)
			numa_set_distance(pxm_to_node(i), pxm_to_node(j),
				slit->entry[slit->locality_count * i + j]);
}

/* Callback for Proximity Domain -> x2APIC mapping */
void __init
acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
{
	int pxm, node;
	int apic_id;

	if (srat_disabled())
		return;
	if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) {
		bad_srat();
		return;
	}
	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
		return;
	pxm = pa->proximity_domain;
	apic_id = pa->apic_id;
	if (!apic->apic_id_valid(apic_id)) {
		printk(KERN_INFO "SRAT: PXM %u -> X2APIC 0x%04x ignored\n",
			 pxm, apic_id);
		return;
	}
	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
		bad_srat();
		return;
	}

	if (apic_id >= MAX_LOCAL_APIC) {
		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
		return;
	}
	set_apicid_to_node(apic_id, node);
	node_set(node, numa_nodes_parsed);
	acpi_numa = 1;
	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
	       pxm, apic_id, node);
}

/* Callback for Proximity Domain -> LAPIC mapping */
void __init
acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
{
	int pxm, node;
	int apic_id;

	if (srat_disabled())
		return;
	if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
		bad_srat();
		return;
	}
	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
		return;
	pxm = pa->proximity_domain_lo;
	if (acpi_srat_revision >= 2)
		pxm |= *((unsigned int*)pa->proximity_domain_hi) << 8;
	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
		bad_srat();
		return;
	}

	if (get_uv_system_type() >= UV_X2APIC)
		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
	else
		apic_id = pa->apic_id;

	if (apic_id >= MAX_LOCAL_APIC) {
		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
		return;
	}

	set_apicid_to_node(apic_id, node);
	node_set(node, numa_nodes_parsed);
	acpi_numa = 1;
	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
	       pxm, apic_id, node);
}

#ifdef CONFIG_MEMORY_HOTPLUG
static inline int save_add_info(void) {return 1;}
#else
static inline int save_add_info(void) {return 0;}
#endif

#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static void __init
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
	int overlap;
	unsigned long start_pfn, end_pfn;

	start_pfn = PFN_DOWN(start);
	end_pfn = PFN_UP(end);

	/*
	 * For movablemem_map=acpi:
	 *
	 * SRAT:		|_____| |_____| |_________| |_________| ......
	 * node id:                0       1         1           2
	 * hotpluggable:	   n       y         y           n
	 * movablemem_map:	        |_____| |_________|
	 *
	 * Using movablemem_map, we can prevent memblock from allocating memory
	 * on ZONE_MOVABLE at boot time.
	 */
	if (hotpluggable && movablemem_map.acpi) {
		insert_movablemem_map(start_pfn, end_pfn);
		goto out;
	}

	/*
	 * For movablemem_map=nn[KMG]@ss[KMG]:
	 *
	 * SRAT:		|_____| |_____| |_________| |_________| ......
	 * node id:		   0       1         1           2
	 * user specified:	          |__|                 |___|
	 * movablemem_map:		  |___| |_________|    |______| ......
	 *
	 * Using movablemem_map, we can prevent memblock from allocating memory
	 * on ZONE_MOVABLE at boot time.
	 *
	 * NOTE: In this case, SRAT info will be ingored.
	 */
	overlap = movablemem_map_overlap(start_pfn, end_pfn);
	if (overlap >= 0) {
		/*
		 * If part of this range is in movablemem_map, we need to
		 * add the range after it to extend the range to the end
		 * of the node, because from the min address specified to
		 * the end of the node will be ZONE_MOVABLE.
		 */
		start_pfn = max(start_pfn,
			    movablemem_map.map[overlap].start_pfn);
		insert_movablemem_map(start_pfn, end_pfn);

		/*
		 * Set the nodemask, so that if the address range on one node
		 * is not continuse, we can add the subsequent ranges on the
		 * same node into movablemem_map.
		 */
		node_set(node, movablemem_map.numa_nodes_hotplug);
	} else {
		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
			/*
			 * Insert the range if we already have movable ranges
			 * on the same node.
			 */
			insert_movablemem_map(start_pfn, end_pfn);
	}
out:
	return;
}
#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline void
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
}
#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
	u64 start, end;
	u32 hotpluggable;
	int node, pxm;

	if (srat_disabled())
		goto out_err;
	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
		goto out_err_bad_srat;
	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
		goto out_err;
	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
	if (hotpluggable && !save_add_info())
		goto out_err;

	start = ma->base_address;
	end = start + ma->length;
	pxm = ma->proximity_domain;
	if (acpi_srat_revision <= 1)
		pxm &= 0xff;

	node = setup_node(pxm);
	if (node < 0) {
		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
		goto out_err_bad_srat;
	}

	if (numa_add_memblk(node, start, end) < 0)
		goto out_err_bad_srat;

	node_set(node, numa_nodes_parsed);

	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
	       node, pxm,
	       (unsigned long long) start, (unsigned long long) end - 1,
	       hotpluggable ? "Hot Pluggable": "");

	handle_movablemem(node, start, end, hotpluggable);

	return 0;
out_err_bad_srat:
	bad_srat();
out_err:
	return -1;
}

void __init acpi_numa_arch_fixup(void) {}

int __init x86_acpi_numa_init(void)
{
	int ret;

	ret = acpi_numa_init();
	if (ret < 0)
		return ret;
	return srat_disabled() ? -EINVAL : 0;
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-25  9:42   ` Tang Chen
@ 2013-01-26  1:12     ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  1:12 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> NOTE: Using this way will cause NUMA performance down because the whole node
>       will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>       If users don't want to lose NUMA performance, just don't use it.

I agree with this, but it means that nobody will test any of your new code.

To get improved testing coverage, can you think of any temporary
testing-only patch which will cause testers to exercise the
memory-hotplug changes?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-26  1:12     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-01-26  1:12 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> NOTE: Using this way will cause NUMA performance down because the whole node
>       will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>       If users don't want to lose NUMA performance, just don't use it.

I agree with this, but it means that nobody will test any of your new code.

To get improved testing coverage, can you think of any temporary
testing-only patch which will cause testers to exercise the
memory-hotplug changes?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-26  1:12     ` Andrew Morton
@ 2013-01-26  1:29       ` H. Peter Anvin
  -1 siblings, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2013-01-26  1:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tang Chen, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 01/25/2013 05:12 PM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen <tangchen@cn.fujitsu.com> wrote:
> 
>> NOTE: Using this way will cause NUMA performance down because the whole node
>>       will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>       If users don't want to lose NUMA performance, just don't use it.
> 
> I agree with this, but it means that nobody will test any of your new code.
> 
> To get improved testing coverage, can you think of any temporary
> testing-only patch which will cause testers to exercise the
> memory-hotplug changes?
> 

There is another problem: if ALL the nodes in the system support
hotpluggable memory, what happens?

	-hpa


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-26  1:29       ` H. Peter Anvin
  0 siblings, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2013-01-26  1:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tang Chen, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 01/25/2013 05:12 PM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen <tangchen@cn.fujitsu.com> wrote:
> 
>> NOTE: Using this way will cause NUMA performance down because the whole node
>>       will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>       If users don't want to lose NUMA performance, just don't use it.
> 
> I agree with this, but it means that nobody will test any of your new code.
> 
> To get improved testing coverage, can you think of any temporary
> testing-only patch which will cause testers to exercise the
> memory-hotplug changes?
> 

There is another problem: if ALL the nodes in the system support
hotpluggable memory, what happens?

	-hpa

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
  2013-01-26  0:36     ` Andrew Morton
@ 2013-01-28  1:53       ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  1:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

Hi Andrew,

On 01/26/2013 08:36 AM, Andrew Morton wrote:
>
> The patch generates a bunch of rejects, partly due to linux-next
> changes but I think I fixed everything up OK.

Thank you for your fixing. :)

>
>> index 4ddf497..f841d0e 100644
>> --- a/arch/x86/mm/srat.c
>> +++ b/arch/x86/mm/srat.c
>> @@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
>>   static inline int save_add_info(void) {return 0;}
>>   #endif
>>
>> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> +extern struct movablemem_map movablemem_map;
>> +#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> Well.
>
> a) we shouldn't put extern declarations in C files - put them in
>     headers so we can be assured that all compilation units agree on the
>     type.

OK, got it. :)

>
> b) the ifdefs are unneeded - a unused extern declaration is OK (as
>     long as the type itself is always defined!)

I think struct movablemem_map is defined in mm.h, protected by 
CONFIG_HAVE_MEMBLOCK_NODE_MAP.
So the declaration needs to be protected by CONFIG_HAVE_MEMBLOCK_NODE_MAP.

>
> c) movablemem_map is already declared in memblock.h.

Hum, yes, it is defined in mm.h, included by memblock.h :)

>
> So I zapped the above three lines.
>
>> @@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>>
>>   	node_set(node, numa_nodes_parsed);
>>
>> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
>> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
>>   	       node, pxm,
>> -	       (unsigned long long) start, (unsigned long long) end - 1);
>> +	       (unsigned long long) start, (unsigned long long) end - 1,
>> +	       hotpluggable ? "Hot Pluggable": "");
>> +
>> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> +	int overlap;
>> +	unsigned long start_pfn, end_pfn;
>
> no, we don't put declarations of locals in the middle of C statements
> like this:
>
> arch/x86/mm/srat.c: In function 'acpi_numa_memory_affinity_init':
> arch/x86/mm/srat.c:185: warning: ISO C90 forbids mixed declarations and code
>
> Did your compiler not emit this warning?

Sorry, I think it did, but I missed it.
THanks for fixing that. :)

>
> I fixed this by moving the code into a new function
> "handle_movablemem".  Feel free to suggest a more appropriate name!
>
>
> From: Andrew Morton<akpm@linux-foundation.org>
> Subject: acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
>
> clean up code, fix build warning
>
> Cc: "Brown, Len"<len.brown@intel.com>
> Cc: "H. Peter Anvin"<hpa@zytor.com>
> Cc: Ingo Molnar<mingo@elte.hu>
> Cc: Jiang Liu<jiang.liu@huawei.com>
> Cc: Jianguo Wu<wujianguo@huawei.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Kamezawa Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Lai Jiangshan<laijs@cn.fujitsu.com>
> Cc: Len Brown<lenb@kernel.org>
> Cc: Tang Chen<tangchen@cn.fujitsu.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Wu Jianguo<wujianguo@huawei.com>
> Cc: Yasuaki Ishimatsu<isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
> ---
>
>   arch/x86/mm/srat.c |   93 ++++++++++++++++++++++---------------------
>   1 file changed, 49 insertions(+), 44 deletions(-)
>
> diff -puN arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix arch/x86/mm/srat.c
> --- a/arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
> +++ a/arch/x86/mm/srat.c
> @@ -142,50 +142,8 @@ static inline int save_add_info(void) {r
>   #endif
>
>   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> -extern struct movablemem_map movablemem_map;
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> -
> -/* Callback for parsing of the Proximity Domain<->  Memory Area mappings */
> -int __init
> -acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> +static void __init handle_movablemem(int node, u64 start, u64 end)
>   {
> -	u64 start, end;
> -	u32 hotpluggable;
> -	int node, pxm;
> -
> -	if (srat_disabled())
> -		goto out_err;
> -	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
> -		goto out_err_bad_srat;
> -	if ((ma->flags&  ACPI_SRAT_MEM_ENABLED) == 0)
> -		goto out_err;
> -	hotpluggable = ma->flags&  ACPI_SRAT_MEM_HOT_PLUGGABLE;
> -	if (hotpluggable&&  !save_add_info())
> -		goto out_err;
> -
> -	start = ma->base_address;
> -	end = start + ma->length;
> -	pxm = ma->proximity_domain;
> -	if (acpi_srat_revision<= 1)
> -		pxm&= 0xff;
> -
> -	node = setup_node(pxm);
> -	if (node<  0) {
> -		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
> -		goto out_err_bad_srat;
> -	}
> -
> -	if (numa_add_memblk(node, start, end)<  0)
> -		goto out_err_bad_srat;
> -
> -	node_set(node, numa_nodes_parsed);
> -
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
> -	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1,
> -	       hotpluggable ? "Hot Pluggable": "");
> -
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>   	int overlap;
>   	unsigned long start_pfn, end_pfn;
>
> @@ -229,7 +187,54 @@ acpi_numa_memory_affinity_init(struct ac
>   			 */
>   			insert_movablemem_map(start_pfn, end_pfn);
>   	}
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +}
> +#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +static inline void handle_movablemem(int node, u64 start, u64 end)
> +{
> +}
> +#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +
> +/* Callback for parsing of the Proximity Domain<->  Memory Area mappings */
> +int __init
> +acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> +{
> +	u64 start, end;
> +	u32 hotpluggable;
> +	int node, pxm;
> +
> +	if (srat_disabled())
> +		goto out_err;
> +	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
> +		goto out_err_bad_srat;
> +	if ((ma->flags&  ACPI_SRAT_MEM_ENABLED) == 0)
> +		goto out_err;
> +	hotpluggable = ma->flags&  ACPI_SRAT_MEM_HOT_PLUGGABLE;
> +	if (hotpluggable&&  !save_add_info())
> +		goto out_err;
> +
> +	start = ma->base_address;
> +	end = start + ma->length;
> +	pxm = ma->proximity_domain;
> +	if (acpi_srat_revision<= 1)
> +		pxm&= 0xff;
> +
> +	node = setup_node(pxm);
> +	if (node<  0) {
> +		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
> +		goto out_err_bad_srat;
> +	}
> +
> +	if (numa_add_memblk(node, start, end)<  0)
> +		goto out_err_bad_srat;
> +
> +	node_set(node, numa_nodes_parsed);
> +
> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
> +	       node, pxm,
> +	       (unsigned long long) start, (unsigned long long) end - 1,
> +	       hotpluggable ? "Hot Pluggable": "");
> +
> +	handle_movablemem(node, start, end);
>
>   	return 0;
>   out_err_bad_srat:
> diff -puN include/linux/mm.h~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix include/linux/mm.h
> diff -puN mm/page_alloc.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix mm/page_alloc.c
> _
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
@ 2013-01-28  1:53       ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  1:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

Hi Andrew,

On 01/26/2013 08:36 AM, Andrew Morton wrote:
>
> The patch generates a bunch of rejects, partly due to linux-next
> changes but I think I fixed everything up OK.

Thank you for your fixing. :)

>
>> index 4ddf497..f841d0e 100644
>> --- a/arch/x86/mm/srat.c
>> +++ b/arch/x86/mm/srat.c
>> @@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
>>   static inline int save_add_info(void) {return 0;}
>>   #endif
>>
>> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> +extern struct movablemem_map movablemem_map;
>> +#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> Well.
>
> a) we shouldn't put extern declarations in C files - put them in
>     headers so we can be assured that all compilation units agree on the
>     type.

OK, got it. :)

>
> b) the ifdefs are unneeded - a unused extern declaration is OK (as
>     long as the type itself is always defined!)

I think struct movablemem_map is defined in mm.h, protected by 
CONFIG_HAVE_MEMBLOCK_NODE_MAP.
So the declaration needs to be protected by CONFIG_HAVE_MEMBLOCK_NODE_MAP.

>
> c) movablemem_map is already declared in memblock.h.

Hum, yes, it is defined in mm.h, included by memblock.h :)

>
> So I zapped the above three lines.
>
>> @@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>>
>>   	node_set(node, numa_nodes_parsed);
>>
>> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
>> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
>>   	       node, pxm,
>> -	       (unsigned long long) start, (unsigned long long) end - 1);
>> +	       (unsigned long long) start, (unsigned long long) end - 1,
>> +	       hotpluggable ? "Hot Pluggable": "");
>> +
>> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> +	int overlap;
>> +	unsigned long start_pfn, end_pfn;
>
> no, we don't put declarations of locals in the middle of C statements
> like this:
>
> arch/x86/mm/srat.c: In function 'acpi_numa_memory_affinity_init':
> arch/x86/mm/srat.c:185: warning: ISO C90 forbids mixed declarations and code
>
> Did your compiler not emit this warning?

Sorry, I think it did, but I missed it.
THanks for fixing that. :)

>
> I fixed this by moving the code into a new function
> "handle_movablemem".  Feel free to suggest a more appropriate name!
>
>
> From: Andrew Morton<akpm@linux-foundation.org>
> Subject: acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
>
> clean up code, fix build warning
>
> Cc: "Brown, Len"<len.brown@intel.com>
> Cc: "H. Peter Anvin"<hpa@zytor.com>
> Cc: Ingo Molnar<mingo@elte.hu>
> Cc: Jiang Liu<jiang.liu@huawei.com>
> Cc: Jianguo Wu<wujianguo@huawei.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Kamezawa Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Lai Jiangshan<laijs@cn.fujitsu.com>
> Cc: Len Brown<lenb@kernel.org>
> Cc: Tang Chen<tangchen@cn.fujitsu.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Wu Jianguo<wujianguo@huawei.com>
> Cc: Yasuaki Ishimatsu<isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
> ---
>
>   arch/x86/mm/srat.c |   93 ++++++++++++++++++++++---------------------
>   1 file changed, 49 insertions(+), 44 deletions(-)
>
> diff -puN arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix arch/x86/mm/srat.c
> --- a/arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix
> +++ a/arch/x86/mm/srat.c
> @@ -142,50 +142,8 @@ static inline int save_add_info(void) {r
>   #endif
>
>   #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> -extern struct movablemem_map movablemem_map;
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> -
> -/* Callback for parsing of the Proximity Domain<->  Memory Area mappings */
> -int __init
> -acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> +static void __init handle_movablemem(int node, u64 start, u64 end)
>   {
> -	u64 start, end;
> -	u32 hotpluggable;
> -	int node, pxm;
> -
> -	if (srat_disabled())
> -		goto out_err;
> -	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
> -		goto out_err_bad_srat;
> -	if ((ma->flags&  ACPI_SRAT_MEM_ENABLED) == 0)
> -		goto out_err;
> -	hotpluggable = ma->flags&  ACPI_SRAT_MEM_HOT_PLUGGABLE;
> -	if (hotpluggable&&  !save_add_info())
> -		goto out_err;
> -
> -	start = ma->base_address;
> -	end = start + ma->length;
> -	pxm = ma->proximity_domain;
> -	if (acpi_srat_revision<= 1)
> -		pxm&= 0xff;
> -
> -	node = setup_node(pxm);
> -	if (node<  0) {
> -		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
> -		goto out_err_bad_srat;
> -	}
> -
> -	if (numa_add_memblk(node, start, end)<  0)
> -		goto out_err_bad_srat;
> -
> -	node_set(node, numa_nodes_parsed);
> -
> -	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
> -	       node, pxm,
> -	       (unsigned long long) start, (unsigned long long) end - 1,
> -	       hotpluggable ? "Hot Pluggable": "");
> -
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>   	int overlap;
>   	unsigned long start_pfn, end_pfn;
>
> @@ -229,7 +187,54 @@ acpi_numa_memory_affinity_init(struct ac
>   			 */
>   			insert_movablemem_map(start_pfn, end_pfn);
>   	}
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +}
> +#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +static inline void handle_movablemem(int node, u64 start, u64 end)
> +{
> +}
> +#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> +
> +/* Callback for parsing of the Proximity Domain<->  Memory Area mappings */
> +int __init
> +acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> +{
> +	u64 start, end;
> +	u32 hotpluggable;
> +	int node, pxm;
> +
> +	if (srat_disabled())
> +		goto out_err;
> +	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity))
> +		goto out_err_bad_srat;
> +	if ((ma->flags&  ACPI_SRAT_MEM_ENABLED) == 0)
> +		goto out_err;
> +	hotpluggable = ma->flags&  ACPI_SRAT_MEM_HOT_PLUGGABLE;
> +	if (hotpluggable&&  !save_add_info())
> +		goto out_err;
> +
> +	start = ma->base_address;
> +	end = start + ma->length;
> +	pxm = ma->proximity_domain;
> +	if (acpi_srat_revision<= 1)
> +		pxm&= 0xff;
> +
> +	node = setup_node(pxm);
> +	if (node<  0) {
> +		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
> +		goto out_err_bad_srat;
> +	}
> +
> +	if (numa_add_memblk(node, start, end)<  0)
> +		goto out_err_bad_srat;
> +
> +	node_set(node, numa_nodes_parsed);
> +
> +	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
> +	       node, pxm,
> +	       (unsigned long long) start, (unsigned long long) end - 1,
> +	       hotpluggable ? "Hot Pluggable": "");
> +
> +	handle_movablemem(node, start, end);
>
>   	return 0;
>   out_err_bad_srat:
> diff -puN include/linux/mm.h~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix include/linux/mm.h
> diff -puN mm/page_alloc.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix mm/page_alloc.c
> _
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-26  1:29       ` H. Peter Anvin
@ 2013-01-28  2:07         ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  2:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	tony.luck, glommer, linux-kernel, linux-mm

On 01/26/2013 09:29 AM, H. Peter Anvin wrote:
> On 01/25/2013 05:12 PM, Andrew Morton wrote:
>> On Fri, 25 Jan 2013 17:42:09 +0800
>> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>>
>>> NOTE: Using this way will cause NUMA performance down because the whole node
>>>        will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>>        If users don't want to lose NUMA performance, just don't use it.
>>
>> I agree with this, but it means that nobody will test any of your new code.
>>
>> To get improved testing coverage, can you think of any temporary
>> testing-only patch which will cause testers to exercise the
>> memory-hotplug changes?
>>
>
> There is another problem: if ALL the nodes in the system support
> hotpluggable memory, what happens?
>

Hi HPA,

I think I missed this case. If all the memory is hotpluggable, and user 
specified
movablemem_map=acpi, all the memory could be set as movable, and the 
kernel will
fail to start.

I will post a patch to fix it. How about always keep node0 unhotpluggable ?

Thanks. :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-28  2:07         ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  2:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	tony.luck, glommer, linux-kernel, linux-mm

On 01/26/2013 09:29 AM, H. Peter Anvin wrote:
> On 01/25/2013 05:12 PM, Andrew Morton wrote:
>> On Fri, 25 Jan 2013 17:42:09 +0800
>> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>>
>>> NOTE: Using this way will cause NUMA performance down because the whole node
>>>        will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>>        If users don't want to lose NUMA performance, just don't use it.
>>
>> I agree with this, but it means that nobody will test any of your new code.
>>
>> To get improved testing coverage, can you think of any temporary
>> testing-only patch which will cause testers to exercise the
>> memory-hotplug changes?
>>
>
> There is another problem: if ALL the nodes in the system support
> hotpluggable memory, what happens?
>

Hi HPA,

I think I missed this case. If all the memory is hotpluggable, and user 
specified
movablemem_map=acpi, all the memory could be set as movable, and the 
kernel will
fail to start.

I will post a patch to fix it. How about always keep node0 unhotpluggable ?

Thanks. :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-26  1:12     ` Andrew Morton
@ 2013-01-28  9:15       ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  9:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 01/26/2013 09:12 AM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>> NOTE: Using this way will cause NUMA performance down because the whole node
>>        will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>        If users don't want to lose NUMA performance, just don't use it.
>
> I agree with this, but it means that nobody will test any of your new code.
>
> To get improved testing coverage, can you think of any temporary
> testing-only patch which will cause testers to exercise the
> memory-hotplug changes?

Hi Andrew,

OK,I‘ll think about it and post the testing patch. But I think a shell 
script
may be easier because the boot option is specified by user and the code 
only runs
when system is booting. So we need to reboot the system all the time.

Thanks. :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-28  9:15       ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-01-28  9:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 01/26/2013 09:12 AM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>> NOTE: Using this way will cause NUMA performance down because the whole node
>>        will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
>>        If users don't want to lose NUMA performance, just don't use it.
>
> I agree with this, but it means that nobody will test any of your new code.
>
> To get improved testing coverage, can you think of any temporary
> testing-only patch which will cause testers to exercise the
> memory-hotplug changes?

Hi Andrew,

OK,I‘ll think about it and post the testing patch. But I think a shell 
script
may be easier because the boot option is specified by user and the code 
only runs
when system is booting. So we need to reboot the system all the time.

Thanks. :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-28  2:07         ` Tang Chen
@ 2013-01-28 17:45           ` Luck, Tony
  -1 siblings, 0 replies; 35+ messages in thread
From: Luck, Tony @ 2013-01-28 17:45 UTC (permalink / raw)
  To: Tang Chen, H. Peter Anvin
  Cc: Andrew Morton, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

> I will post a patch to fix it. How about always keep node0 unhotpluggable ?

Node 0 (or more specifically the node that contains memory <4GB) will be
full of BIOS reserved holes in the memory map. It probably isn't removable
even if Linux thinks it is.  Someday we might have a smart BIOS that can
relocate itself to another node - but for now making node0 unhotpluggable
looks to be a plausible interim move.

Ultimately we'd like to be able to remove any node (just not all of them at
the same time ... just like we can now offline any cpu - but not all of them
together).

-Tony

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-01-28 17:45           ` Luck, Tony
  0 siblings, 0 replies; 35+ messages in thread
From: Luck, Tony @ 2013-01-28 17:45 UTC (permalink / raw)
  To: Tang Chen, H. Peter Anvin
  Cc: Andrew Morton, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

> I will post a patch to fix it. How about always keep node0 unhotpluggable ?

Node 0 (or more specifically the node that contains memory <4GB) will be
full of BIOS reserved holes in the memory map. It probably isn't removable
even if Linux thinks it is.  Someday we might have a smart BIOS that can
relocate itself to another node - but for now making node0 unhotpluggable
looks to be a plausible interim move.

Ultimately we'd like to be able to remove any node (just not all of them at
the same time ... just like we can now offline any cpu - but not all of them
together).

-Tony

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-28 17:45           ` Luck, Tony
  (?)
@ 2013-01-29  6:43           ` Tang Chen
  2013-01-29 18:38             ` Luck, Tony
  -1 siblings, 1 reply; 35+ messages in thread
From: Tang Chen @ 2013-01-29  6:43 UTC (permalink / raw)
  To: Luck, Tony
  Cc: H. Peter Anvin, Andrew Morton, jiang.liu, wujianguo, wency,
	laijs, linfeng, yinghai, isimatu.yasuaki, rob, kosaki.motohiro,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, glommer, linux-kernel, linux-mm

On 01/29/2013 01:45 AM, Luck, Tony wrote:
>> I will post a patch to fix it. How about always keep node0 unhotpluggable ?
>
> Node 0 (or more specifically the node that contains memory<4GB) will be
> full of BIOS reserved holes in the memory map.

Hi Tony,

One thing I'm not sure, is memory<4GB always on node 0 ?
On my box, it is on node 0.

But since node id is 1-1 mapped to PXM in SRAT, if SRAT entries are not 
ordered by
physical address, memory<4GB may not on node 0. I think this is 
something related
to firmware. i didn't find anything about the order problem in ACPI 
specification.

So, do we just check if the node id != 0, or we need to check if we have 
reserved
enough for kernel, such as 4GB ?

Thanks. :)

>It probably isn't removable
> even if Linux thinks it is.  Someday we might have a smart BIOS that can
> relocate itself to another node - but for now making node0 unhotpluggable
> looks to be a plausible interim move.
>
> Ultimately we'd like to be able to remove any node (just not all of them at
> the same time ... just like we can now offline any cpu - but not all of them
> together).
>
> -Tony
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-29  6:43           ` Tang Chen
@ 2013-01-29 18:38             ` Luck, Tony
  2013-01-29 18:40               ` H. Peter Anvin
  0 siblings, 1 reply; 35+ messages in thread
From: Luck, Tony @ 2013-01-29 18:38 UTC (permalink / raw)
  To: Tang Chen
  Cc: H. Peter Anvin, Andrew Morton, jiang.liu, wujianguo, wency,
	laijs, linfeng, yinghai, isimatu.yasuaki, rob, kosaki.motohiro,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, glommer, linux-kernel, linux-mm

>> Node 0 (or more specifically the node that contains memory<4GB) will be
>> full of BIOS reserved holes in the memory map.

> One thing I'm not sure, is memory<4GB always on node 0 ?
> On my box, it is on node 0.

I think in practice the <4GB memory will be on node 0 ... but it all depends
on how Linux decides to number the nodes ... which in turn depends on the
order of entries in various BIOS tables.  So it is theoretically possible that
we'd end up with some system on which the low memory is on some other
node. But it might require stranger than usual BIOS.

Summary: coding "node == 0" is almost 100% certain to be right - except
on some pathological systems.  So code for node==0 and if we ever see
a pathological machine - we can either point and laugh at the BIOS people
that set that up - or possibly fix our code.

-Tony

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-29 18:38             ` Luck, Tony
@ 2013-01-29 18:40               ` H. Peter Anvin
  0 siblings, 0 replies; 35+ messages in thread
From: H. Peter Anvin @ 2013-01-29 18:40 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Tang Chen, Andrew Morton, jiang.liu, wujianguo, wency, laijs,
	linfeng, yinghai, isimatu.yasuaki, rob, kosaki.motohiro,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, glommer, linux-kernel, linux-mm

On 01/29/2013 10:38 AM, Luck, Tony wrote:
>>> Node 0 (or more specifically the node that contains memory<4GB) will be
>>> full of BIOS reserved holes in the memory map.
> 
>> One thing I'm not sure, is memory<4GB always on node 0 ?
>> On my box, it is on node 0.
> 
> I think in practice the <4GB memory will be on node 0 ... but it all depends
> on how Linux decides to number the nodes ... which in turn depends on the
> order of entries in various BIOS tables.  So it is theoretically possible that
> we'd end up with some system on which the low memory is on some other
> node. But it might require stranger than usual BIOS.
> 
> Summary: coding "node == 0" is almost 100% certain to be right - except
> on some pathological systems.  So code for node==0 and if we ever see
> a pathological machine - we can either point and laugh at the BIOS people
> that set that up - or possibly fix our code.
> 

We also probably need to weld down the memory that the kernel static
areas occupy and where we have allocated memory during boot time.  In
particular, memory that you want to be movable MUST NOT have boot-time
kernel allocations, and it becomes critical to enforce that, lest you
pull memory that you thought was safe and crash the system.

	-hpa


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-01-25  9:42   ` Tang Chen
@ 2013-02-04 23:26     ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-02-04 23:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> We now provide an option for users who don't want to specify physical
> memory address in kernel commandline.
> 
>         /*
>          * For movablemem_map=acpi:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * hotpluggable:           n       y         y           n
>          * movablemem_map:              |_____| |_________|
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          */
> 
> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.
> 
> ...
>  
> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
> +		movablemem_map.acpi = true;

Generates a warning:

mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast

due to max(int, size_t).

This is easily fixed, but the code looks rather pointless.  If the
incoming string is supposed to be exactly "acpi" then use strcmp().  If
the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).

IOW, the max is unneeded?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-02-04 23:26     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-02-04 23:26 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Fri, 25 Jan 2013 17:42:09 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> We now provide an option for users who don't want to specify physical
> memory address in kernel commandline.
> 
>         /*
>          * For movablemem_map=acpi:
>          *
>          * SRAT:                |_____| |_____| |_________| |_________| ......
>          * node id:                0       1         1           2
>          * hotpluggable:           n       y         y           n
>          * movablemem_map:              |_____| |_________|
>          *
>          * Using movablemem_map, we can prevent memblock from allocating memory
>          * on ZONE_MOVABLE at boot time.
>          */
> 
> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.
> 
> ...
>  
> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
> +		movablemem_map.acpi = true;

Generates a warning:

mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast

due to max(int, size_t).

This is easily fixed, but the code looks rather pointless.  If the
incoming string is supposed to be exactly "acpi" then use strcmp().  If
the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).

IOW, the max is unneeded?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-02-04 23:26     ` Andrew Morton
@ 2013-02-06  2:20       ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-02-06  2:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 02/05/2013 07:26 AM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>> We now provide an option for users who don't want to specify physical
>> memory address in kernel commandline.
>>
>>          /*
>>           * For movablemem_map=acpi:
>>           *
>>           * SRAT:                |_____| |_____| |_________| |_________| ......
>>           * node id:                0       1         1           2
>>           * hotpluggable:           n       y         y           n
>>           * movablemem_map:              |_____| |_________|
>>           *
>>           * Using movablemem_map, we can prevent memblock from allocating memory
>>           * on ZONE_MOVABLE at boot time.
>>           */
>>
>> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
>> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.
>>
>> ...
>>
>> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
>> +		movablemem_map.acpi = true;
>
> Generates a warning:
>
> mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
> mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
>
> due to max(int, size_t).
>
> This is easily fixed, but the code looks rather pointless.  If the
> incoming string is supposed to be exactly "acpi" then use strcmp().  If
> the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
>
> IOW, the max is unneeded?

Hi Andrew,

I think I made another mistake here. I meant to use min(4, strlen(p)) in 
case p is
something like 'aaa' whose length is less then 4. But I mistook it with 
max().

But after I dig into strcmp() in the kernel, I think it is OK to use 
strcmp().
min() or max() is not needed.

Thanks. :)

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-02-06  2:20       ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-02-06  2:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 02/05/2013 07:26 AM, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:42:09 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>> We now provide an option for users who don't want to specify physical
>> memory address in kernel commandline.
>>
>>          /*
>>           * For movablemem_map=acpi:
>>           *
>>           * SRAT:                |_____| |_____| |_________| |_________| ......
>>           * node id:                0       1         1           2
>>           * hotpluggable:           n       y         y           n
>>           * movablemem_map:              |_____| |_________|
>>           *
>>           * Using movablemem_map, we can prevent memblock from allocating memory
>>           * on ZONE_MOVABLE at boot time.
>>           */
>>
>> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
>> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.
>>
>> ...
>>
>> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
>> +		movablemem_map.acpi = true;
>
> Generates a warning:
>
> mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
> mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
>
> due to max(int, size_t).
>
> This is easily fixed, but the code looks rather pointless.  If the
> incoming string is supposed to be exactly "acpi" then use strcmp().  If
> the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
>
> IOW, the max is unneeded?

Hi Andrew,

I think I made another mistake here. I meant to use min(4, strlen(p)) in 
case p is
something like 'aaa' whose length is less then 4. But I mistook it with 
max().

But after I dig into strcmp() in the kernel, I think it is OK to use 
strcmp().
min() or max() is not needed.

Thanks. :)

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-02-06  2:20       ` Tang Chen
@ 2013-02-06 21:54         ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-02-06 21:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Wed, 06 Feb 2013 10:20:57 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> >>
> >> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
> >> +		movablemem_map.acpi = true;
> >
> > Generates a warning:
> >
> > mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
> > mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
> >
> > due to max(int, size_t).
> >
> > This is easily fixed, but the code looks rather pointless.  If the
> > incoming string is supposed to be exactly "acpi" then use strcmp().  If
> > the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
> >
> > IOW, the max is unneeded?
> 
> Hi Andrew,
> 
> I think I made another mistake here. I meant to use min(4, strlen(p)) in 
> case p is
> something like 'aaa' whose length is less then 4. But I mistook it with 
> max().
> 
> But after I dig into strcmp() in the kernel, I think it is OK to use 
> strcmp().
> min() or max() is not needed.

OK, I did that.

But the code still looks a bit more complex than we need.  Could we do

static int __init cmdline_parse_movablemem_map(char *p)
{
	char *oldp;
	u64 start_at, mem_size;

	if (!p)
		goto err;

	/*
	 * If user decide to use info from BIOS, all the other user specified
	 * ranges will be ingored.
	 */
	if (!strcmp(p, "acpi")) {
		movablemem_map.acpi = true;
		if (movablemem_map.nr_map) {
			memset(movablemem_map.map, 0,
				sizeof(struct movablemem_entry)
				* movablemem_map.nr_map);
			movablemem_map.nr_map = 0;
		}
		return 0;
	}


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-02-06 21:54         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2013-02-06 21:54 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Wed, 06 Feb 2013 10:20:57 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> >>
> >> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
> >> +		movablemem_map.acpi = true;
> >
> > Generates a warning:
> >
> > mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
> > mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
> >
> > due to max(int, size_t).
> >
> > This is easily fixed, but the code looks rather pointless.  If the
> > incoming string is supposed to be exactly "acpi" then use strcmp().  If
> > the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
> >
> > IOW, the max is unneeded?
> 
> Hi Andrew,
> 
> I think I made another mistake here. I meant to use min(4, strlen(p)) in 
> case p is
> something like 'aaa' whose length is less then 4. But I mistook it with 
> max().
> 
> But after I dig into strcmp() in the kernel, I think it is OK to use 
> strcmp().
> min() or max() is not needed.

OK, I did that.

But the code still looks a bit more complex than we need.  Could we do

static int __init cmdline_parse_movablemem_map(char *p)
{
	char *oldp;
	u64 start_at, mem_size;

	if (!p)
		goto err;

	/*
	 * If user decide to use info from BIOS, all the other user specified
	 * ranges will be ingored.
	 */
	if (!strcmp(p, "acpi")) {
		movablemem_map.acpi = true;
		if (movablemem_map.nr_map) {
			memset(movablemem_map.map, 0,
				sizeof(struct movablemem_entry)
				* movablemem_map.nr_map);
			movablemem_map.nr_map = 0;
		}
		return 0;
	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
  2013-02-06 21:54         ` Andrew Morton
@ 2013-02-07  6:22           ` Tang Chen
  -1 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-02-07  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 02/07/2013 05:54 AM, Andrew Morton wrote:
> On Wed, 06 Feb 2013 10:20:57 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>>>>
>>>> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
>>>> +		movablemem_map.acpi = true;
>>>
>>> Generates a warning:
>>>
>>> mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
>>> mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
>>>
>>> due to max(int, size_t).
>>>
>>> This is easily fixed, but the code looks rather pointless.  If the
>>> incoming string is supposed to be exactly "acpi" then use strcmp().  If
>>> the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
>>>
>>> IOW, the max is unneeded?
>>
>> Hi Andrew,
>>
>> I think I made another mistake here. I meant to use min(4, strlen(p)) in
>> case p is
>> something like 'aaa' whose length is less then 4. But I mistook it with
>> max().
>>
>> But after I dig into strcmp() in the kernel, I think it is OK to use
>> strcmp().
>> min() or max() is not needed.
>
> OK, I did that.
>
> But the code still looks a bit more complex than we need.  Could we do
>
> static int __init cmdline_parse_movablemem_map(char *p)
> {
> 	char *oldp;
> 	u64 start_at, mem_size;
>
> 	if (!p)
> 		goto err;
>
> 	/*
> 	 * If user decide to use info from BIOS, all the other user specified
> 	 * ranges will be ingored.
> 	 */
> 	if (!strcmp(p, "acpi")) {
> 		movablemem_map.acpi = true;
> 		if (movablemem_map.nr_map) {
> 			memset(movablemem_map.map, 0,
> 				sizeof(struct movablemem_entry)
> 				* movablemem_map.nr_map);
> 			movablemem_map.nr_map = 0;
> 		}
> 		return 0;
> 	}
>
>
No, I don't think so.

If user specified like this:

1) movablemem_map=aaa@bbb ---------- will be added into array
2) movablemem_map=acpi    ---------- will empty the array
3) movablemem_map=ccc@ddd ---------- will be added into array again (wrong!)

So, we need to code like this:

+	if (!strncmp(p, "acpi", max(4, strlen(p))))
+		movablemem_map.acpi = true;

In this way, 3) movablemem_map=ccc@ddd will not go into this if segment.

+
+	/*
+	 * If user decide to use info from BIOS, all the other user specified
+	 * ranges will be ingored.
+	 */
+	if (movablemem_map.acpi) {
+		if (movablemem_map.nr_map) {
+			memset(movablemem_map.map, 0,
+				sizeof(struct movablemem_entry)
+				* movablemem_map.nr_map);
+			movablemem_map.nr_map = 0;
+		}
+		return 0;
+	}

But it will go into this if segment, and will not add the range into array.

Thanks. :)





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
@ 2013-02-07  6:22           ` Tang Chen
  0 siblings, 0 replies; 35+ messages in thread
From: Tang Chen @ 2013-02-07  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 02/07/2013 05:54 AM, Andrew Morton wrote:
> On Wed, 06 Feb 2013 10:20:57 +0800
> Tang Chen<tangchen@cn.fujitsu.com>  wrote:
>
>>>>
>>>> +	if (!strncmp(p, "acpi", max(4, strlen(p))))
>>>> +		movablemem_map.acpi = true;
>>>
>>> Generates a warning:
>>>
>>> mm/page_alloc.c: In function 'cmdline_parse_movablemem_map':
>>> mm/page_alloc.c:5312: warning: comparison of distinct pointer types lacks a cast
>>>
>>> due to max(int, size_t).
>>>
>>> This is easily fixed, but the code looks rather pointless.  If the
>>> incoming string is supposed to be exactly "acpi" then use strcmp().  If
>>> the incoming string must start with "acpi" then use strncmp(p, "acpi", 4).
>>>
>>> IOW, the max is unneeded?
>>
>> Hi Andrew,
>>
>> I think I made another mistake here. I meant to use min(4, strlen(p)) in
>> case p is
>> something like 'aaa' whose length is less then 4. But I mistook it with
>> max().
>>
>> But after I dig into strcmp() in the kernel, I think it is OK to use
>> strcmp().
>> min() or max() is not needed.
>
> OK, I did that.
>
> But the code still looks a bit more complex than we need.  Could we do
>
> static int __init cmdline_parse_movablemem_map(char *p)
> {
> 	char *oldp;
> 	u64 start_at, mem_size;
>
> 	if (!p)
> 		goto err;
>
> 	/*
> 	 * If user decide to use info from BIOS, all the other user specified
> 	 * ranges will be ingored.
> 	 */
> 	if (!strcmp(p, "acpi")) {
> 		movablemem_map.acpi = true;
> 		if (movablemem_map.nr_map) {
> 			memset(movablemem_map.map, 0,
> 				sizeof(struct movablemem_entry)
> 				* movablemem_map.nr_map);
> 			movablemem_map.nr_map = 0;
> 		}
> 		return 0;
> 	}
>
>
No, I don't think so.

If user specified like this:

1) movablemem_map=aaa@bbb ---------- will be added into array
2) movablemem_map=acpi    ---------- will empty the array
3) movablemem_map=ccc@ddd ---------- will be added into array again (wrong!)

So, we need to code like this:

+	if (!strncmp(p, "acpi", max(4, strlen(p))))
+		movablemem_map.acpi = true;

In this way, 3) movablemem_map=ccc@ddd will not go into this if segment.

+
+	/*
+	 * If user decide to use info from BIOS, all the other user specified
+	 * ranges will be ingored.
+	 */
+	if (movablemem_map.acpi) {
+		if (movablemem_map.nr_map) {
+			memset(movablemem_map.map, 0,
+				sizeof(struct movablemem_entry)
+				* movablemem_map.nr_map);
+			movablemem_map.nr_map = 0;
+		}
+		return 0;
+	}

But it will go into this if segment, and will not add the range into array.

Thanks. :)




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2013-02-07  6:23 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-25  9:42 [PATCH 0/3] Support SRAT for movablemem_map boot option Tang Chen
2013-01-25  9:42 ` Tang Chen
2013-01-25  9:42 ` [PATCH 1/3] acpi, memory-hotplug: Parse SRAT before memblock is ready Tang Chen
2013-01-25  9:42   ` Tang Chen
2013-01-25  9:42 ` [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node Tang Chen
2013-01-25  9:42   ` Tang Chen
2013-01-26  0:36   ` Andrew Morton
2013-01-26  0:36     ` Andrew Morton
2013-01-28  1:53     ` Tang Chen
2013-01-28  1:53       ` Tang Chen
2013-01-25  9:42 ` [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT Tang Chen
2013-01-25  9:42   ` Tang Chen
2013-01-26  0:40   ` Andrew Morton
2013-01-26  0:40     ` Andrew Morton
2013-01-26  1:12   ` Andrew Morton
2013-01-26  1:12     ` Andrew Morton
2013-01-26  1:29     ` H. Peter Anvin
2013-01-26  1:29       ` H. Peter Anvin
2013-01-28  2:07       ` Tang Chen
2013-01-28  2:07         ` Tang Chen
2013-01-28 17:45         ` Luck, Tony
2013-01-28 17:45           ` Luck, Tony
2013-01-29  6:43           ` Tang Chen
2013-01-29 18:38             ` Luck, Tony
2013-01-29 18:40               ` H. Peter Anvin
2013-01-28  9:15     ` Tang Chen
2013-01-28  9:15       ` Tang Chen
2013-02-04 23:26   ` Andrew Morton
2013-02-04 23:26     ` Andrew Morton
2013-02-06  2:20     ` Tang Chen
2013-02-06  2:20       ` Tang Chen
2013-02-06 21:54       ` Andrew Morton
2013-02-06 21:54         ` Andrew Morton
2013-02-07  6:22         ` Tang Chen
2013-02-07  6:22           ` Tang Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.