From: Tang Chen <tangchen@cn.fujitsu.com>
To: tglx@linutronix.de, mingo@elte.hu, hpa@zytor.com,
	akpm@linux-foundation.org, tj@kernel.org, trenn@suse.de,
	yinghai@kernel.org, jiang.liu@huawei.com, wency@cn.fujitsu.com,
	laijs@cn.fujitsu.com, isimatu.yasuaki@jp.fujitsu.com,
	izumi.taku@jp.fujitsu.com, mgorman@suse.de, minchan@kernel.org,
	mina86@mina86.com, gong.chen@linux.intel.com,
	vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
	riel@redhat.com, jweiner@redhat.com, prarit@redhat.com,
	zhangyanfei@cn.fujitsu.com, yanghy@cn.fujitsu.com
Cc: x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-acpi@vger.kernel.org
Subject: [PATCH 17/21] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT.
Date: Fri, 19 Jul 2013 15:59:30 +0800
Message-ID: <1374220774-29974-18-git-send-email-tangchen@cn.fujitsu.com>
In-Reply-To: <1374220774-29974-1-git-send-email-tangchen@cn.fujitsu.com>

The Hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As mentioned earlier in this series, hotpluggable memory that is used by
the kernel cannot be hot-removed. So memory hotplug users may want to put
all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set up a node as a movable node, i.e. a
node that has ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot allocate memory for its own use from ZONE_MOVABLE,
so it cannot use memory on movable nodes. This degrades NUMA performance,
which users who do not need memory hotplug may not want to accept.

So we need a way to let users enable or disable this functionality.
This patch extends the movablecore boot option so that users can choose
whether hotpluggable memory is reserved and set up as ZONE_MOVABLE.

Users can specify "movablecore=acpi" on the kernel command line to enable
this functionality. Those who don't use memory hotplug, or who don't want
to lose NUMA performance, need not specify anything; the kernel will work
as before.

Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
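Note: the option handling described above can be tried outside the kernel
with a small userspace sketch. This is only an illustration under stated
assumptions, not the code in the patch: parse_core() and parse_movablecore()
are hypothetical stand-ins for cmdline_parse_core() and
cmdline_parse_movablecore(), the size is kept in bytes rather than converted
to pages, and the suffix handling only approximates what memparse() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel variables touched by this patch. */
bool movablecore_enable_srat;
unsigned long long required_movablecore;	/* bytes here, pages in the kernel */

/*
 * Hypothetical stand-in for cmdline_parse_core(): parse a number with an
 * optional K/M/G suffix. Returns 0 on success, -1 if no value was given.
 */
int parse_core(const char *p, unsigned long long *core)
{
	char *end;
	unsigned long long val;

	assert(core != NULL);
	if (!p)
		return -1;

	val = strtoull(p, &end, 0);
	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		break;
	}

	*core = val;
	return 0;
}

/*
 * Mirrors the control flow added to cmdline_parse_movablecore():
 * "acpi" sets the flag, and once the flag is set no size is parsed.
 */
int parse_movablecore(const char *p)
{
	if (p && !strcmp(p, "acpi"))
		movablecore_enable_srat = true;

	if (movablecore_enable_srat)
		return 0;

	return parse_core(p, &required_movablecore);
}
```

With this sketch, parse_movablecore("acpi") only sets the flag and leaves
required_movablecore untouched, while parse_movablecore("4G") on a fresh
state stores a size and leaves the flag clear. Because the flag is checked
after the string compare, a size passed after "acpi" is silently ignored.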
 arch/x86/kernel/setup.c        |    8 +++++++-
 include/linux/memory_hotplug.h |    3 +++
 mm/page_alloc.c                |   13 +++++++++++++
 3 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9717760..9d08a03 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1083,8 +1083,14 @@ void __init setup_arch(char **cmdline_p)
 	 * Linux kernel cannot migrate kernel pages, as a result, memory used
 	 * by the kernel cannot be hot-removed. Reserve hotpluggable memory to
 	 * prevent memblock from allocating hotpluggable memory for the kernel.
+	 *
+	 * If all the memory in a node is hotpluggable, then the kernel won't
+	 * be able to use memory on that node, hurting NUMA performance.
+	 * So by default, we don't reserve any hotpluggable memory. Users may
+	 * use the "movablecore=acpi" boot option to enable this functionality.
 	 */
-	reserve_hotpluggable_memory();
+	if (movablecore_enable_srat)
+		reserve_hotpluggable_memory();
 #endif
 
 	/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 681b97f..9f26e29 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
 	ONLINE_MOVABLE,
 };
 
+/* Enable/disable SRAT in movablecore boot option */
+extern bool movablecore_enable_srat;
+
 /*
  * pgdat resizing functions
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3edb62..6271c36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,8 @@ static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
 
+bool __initdata movablecore_enable_srat;
+
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
 EXPORT_SYMBOL(movable_zone);
@@ -5112,6 +5114,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	}
 }
 
+static void __init cmdline_movablecore_srat(char *p)
+{
+	if (p && !strcmp(p, "acpi"))
+		movablecore_enable_srat = true;
+}
+
 static int __init cmdline_parse_core(char *p, unsigned long *core)
 {
 	unsigned long long coremem;
@@ -5142,6 +5150,11 @@ static int __init cmdline_parse_kernelcore(char *p)
  */
 static int __init cmdline_parse_movablecore(char *p)
 {
+	cmdline_movablecore_srat(p);
+
+	if (movablecore_enable_srat)
+		return 0;
+
 	return cmdline_parse_core(p, &required_movablecore);
 }
 
-- 
1.7.1




