* [PATCH v3 0/5] Add movablecore_map boot option
@ 2012-12-11  2:33 Tang Chen
  2012-12-11  2:33 ` [PATCH v3 1/5] x86: get pg_data_t's memory from other node Tang Chen
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

[What we are doing]
This patchset provides a boot option for users to specify the ZONE_MOVABLE
memory map for each node in the system.

movablecore_map=nn[KMG]@ss[KMG]

This option makes sure the memory range from ss to ss+nn is movable memory.


[Why we do this]
If we hot remove memory, that memory cannot contain kernel memory,
because Linux cannot migrate kernel memory currently. Therefore,
we have to guarantee that the hot-removed memory contains only
movable memory.

Linux has two boot options, kernelcore= and movablecore=, for
creating movable memory. These boot options can specify the amount
of memory to use as kernel or movable memory. Using them, we can
create ZONE_MOVABLE which has only movable memory.
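
For example, booting with kernelcore=4G asks the kernel to spread 4G of
kernel-usable memory evenly across the nodes and to turn the remainder
into ZONE_MOVABLE; movablecore=nn[KMG] is the mirror image, specifying
the movable amount instead.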

But they do not fulfill a requirement of memory hot remove, because
even with these boot options, movable memory is distributed evenly
across the nodes. So when we want to hot remove memory whose range
is, say, 0x80000000-0xc0000000, we have no way to specify that
memory as movable memory.

So we propose a new feature which specifies a memory range to use as
movable memory.


[Ways to do this]
There may be 2 ways to specify movable memory.
 1. use firmware information
 2. use boot option

1. use firmware information
  According to the ACPI 5.0 spec, the SRAT table has a memory affinity
  structure, and the structure has a Hot Pluggable Field. See "5.2.16.2
  Memory Affinity Structure". If we use this information, we might be
  able to specify movable memory via firmware. For example, if the Hot
  Pluggable Field is enabled, Linux sets the memory as movable memory
  (see the sketch below).

2. use boot option
  This is our proposal. A new boot option can specify the memory range
  to use as movable memory.
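
As a sketch of what the firmware-driven check in way 1 could look like
(hedged: this fragment is not part of this patchset; the struct and
flag names are the ACPICA ones from include/acpi/actbl1.h, and
srat_parse_mem() is a made-up function for illustration):

/* Illustrative SRAT memory-affinity handler, not from this patchset. */
static int __init srat_parse_mem(struct acpi_srat_mem_affinity *ma)
{
	/* Firmware marked this range hot pluggable, so this is where
	 * it would be treated as movable memory. */
	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
		pr_info("movable candidate: [mem %#010llx-%#010llx]\n",
			(unsigned long long)ma->base_address,
			(unsigned long long)(ma->base_address +
					     ma->length - 1));
	return 0;
}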


[How we do this]
We chose the second way, because with the first way, users cannot
easily change the memory range to use as movable memory. We think
that if we create movable memory, a performance regression may occur
due to NUMA. In that case, the user can easily turn the feature off
if we prepare a boot option. And with a boot option, the user can
easily select which memory to use as movable memory.


[How to use]
Specify the following boot option:
movablecore_map=nn[KMG]@ss[KMG]

That means the physical address range from ss to ss+nn will be allocated
as ZONE_MOVABLE.
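
For example (the addresses here are illustrative), booting with:

movablecore_map=2G@4G

asks that the 2G of physical memory starting at 4G be movable; per
point 1) below, the memory from 4G up to the end of the node containing
it will then be put into ZONE_MOVABLE.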

And the following points should be considered.

1) If the range is contained within a single node, then from ss to the end
   of the node will be ZONE_MOVABLE.
2) If the range covers two or more nodes, then from ss to the end of the
   first node will be ZONE_MOVABLE, and all the remaining covered nodes
   will only have ZONE_MOVABLE.
3) If no range is in the node, then the node will have no ZONE_MOVABLE
   unless kernelcore or movablecore is specified.
4) This option could be specified at most MAX_NUMNODES times.
5) If kernelcore or movablecore is also specified, movablecore_map will
   take higher priority.
6) This option does not conflict with the memmap option.


Change log:

v2 -> v3:
1) Use memblock_alloc_try_nid() instead of memblock_alloc_nid() so that the
   allocation can fall back to other nodes if a whole node is ZONE_MOVABLE.
2) Add DMA and DMA32 address checks to make sure ZONE_MOVABLE won't use
   these addresses. Suggested by Wu Jianguo <wujianguo@huawei.com>
3) Add a lowmem address check: when the system has highmem, make sure
   ZONE_MOVABLE won't use lowmem. Suggested by Liu Jiang <jiang.liu@huawei.com>
4) Fix misuse of pfns in movablecore_map.map[] as physical addresses.

Tang Chen (4):
  page_alloc: add movable_memmap kernel parameter
  page_alloc: Introduce zone_movable_limit[] to keep movable limit for
    nodes
  page_alloc: Make movablecore_map has higher priority
  page_alloc: Bootmem limit with movablecore_map

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |   17 +++
 arch/x86/mm/numa.c                  |    5 +-
 include/linux/memblock.h            |    1 +
 include/linux/mm.h                  |   11 ++
 mm/memblock.c                       |   18 +++-
 mm/page_alloc.c                     |  238 ++++++++++++++++++++++++++++++++++-
 6 files changed, 282 insertions(+), 8 deletions(-)



* [PATCH v3 1/5] x86: get pg_data_t's memory from other node
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
@ 2012-12-11  2:33 ` Tang Chen
  2012-12-11  2:33 ` [PATCH v3 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system can create a movable node, in which all memory of the
node is allocated as ZONE_MOVABLE, setup_node_data() cannot
allocate memory for the node's pg_data_t from that node.
So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
to fall back to other nodes when the allocation on the given node fails.
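
The difference is only the fallback. Below is a minimal user-space
sketch of the intended semantics (an illustrative analogue, not the
real mm/memblock.c code; node_free[], alloc_nid() and alloc_try_nid()
are made up for this example):

#include <stdio.h>

#define NR_NODES 2

/* Pretend per-node free space; node 0 is entirely ZONE_MOVABLE. */
static unsigned long node_free[NR_NODES] = { 0, 1024 };

/* Analogue of memblock_alloc_nid(): only tries the given node. */
static unsigned long alloc_nid(unsigned long size, int nid)
{
	if (node_free[nid] < size)
		return 0;		/* allocation fails */
	node_free[nid] -= size;
	return 0x1000;			/* stand-in physical address */
}

/* Analogue of memblock_alloc_try_nid(): prefer @nid, then retry on
 * the remaining nodes before giving up. */
static unsigned long alloc_try_nid(unsigned long size, int nid)
{
	unsigned long pa = alloc_nid(size, nid);
	int i;

	for (i = 0; !pa && i < NR_NODES; i++)
		if (i != nid)
			pa = alloc_nid(size, i);
	return pa;
}

int main(void)
{
	printf("alloc_nid:     %#lx\n", alloc_nid(64, 0));	/* 0, fails */
	printf("alloc_try_nid: %#lx\n", alloc_try_nid(64, 0));	/* falls back */
	return 0;
}

With this change, a node whose memory is entirely ZONE_MOVABLE still
gets its pg_data_t, just placed on another node.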

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/mm/numa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be..db939b6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -222,10 +222,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 		nd_pa = __pa(nd);
 		remapped = true;
 	} else {
-		nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+		nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 		if (!nd_pa) {
-			pr_err("Cannot find %zu bytes in node %d\n",
-			       nd_size, nid);
+			pr_err("Cannot find %zu bytes in any node\n", nd_size);
 			return;
 		}
 		nd = __va(nd_pa);
-- 
1.7.1



* [PATCH v3 2/5] page_alloc: add movable_memmap kernel parameter
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
  2012-12-11  2:33 ` [PATCH v3 1/5] x86: get pg_data_t's memory from other node Tang Chen
@ 2012-12-11  2:33 ` Tang Chen
  2012-12-11  2:33 ` [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

This patch adds functions to parse the movablecore_map boot option. Since
the option could be specified more than once, all the ranges will be stored
in the global movablecore_map.map array.

The array is kept sorted in monotonically increasing order of
start_pfn, and overlapping ranges are merged.
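
The insert-and-merge technique can be exercised stand-alone. Here is a
hedged user-space rendition of the same idea (insert_range() mirrors
insert_movablecore_map() in the patch below, but is an illustration,
not the kernel code):

#include <stdio.h>
#include <string.h>

#define MAP_MAX 32

struct entry { unsigned long start, end; };	/* pfn range [start, end) */
static struct entry map[MAP_MAX];
static int nr_map;

static void insert_range(unsigned long start, unsigned long end)
{
	int pos, overlap;

	/* pos: first entry overlapping the new range, or insert point. */
	for (pos = 0; pos < nr_map; pos++)
		if (start <= map[pos].end)
			break;

	/* No overlap: shift the tail and insert, keeping the sort order. */
	if (pos == nr_map || end < map[pos].start) {
		memmove(&map[pos + 1], &map[pos],
			sizeof(map[0]) * (nr_map - pos));
		map[pos].start = start;
		map[pos].end = end;
		nr_map++;
		return;
	}

	/* overlap: last entry overlapping the new range. */
	for (overlap = pos + 1; overlap < nr_map; overlap++)
		if (end < map[overlap].start)
			break;
	overlap--;

	/* Merge entries [pos, overlap] with the new range, close the gap. */
	if (map[pos].start > start)
		map[pos].start = start;
	if (map[overlap].end > end)
		end = map[overlap].end;
	map[pos].end = end;
	memmove(&map[pos + 1], &map[overlap + 1],
		sizeof(map[0]) * (nr_map - overlap - 1));
	nr_map -= overlap - pos;
}

int main(void)
{
	int i;

	insert_range(10, 20);
	insert_range(30, 40);
	insert_range(15, 35);	/* bridges both -> single [10, 40) */
	for (i = 0; i < nr_map; i++)
		printf("[%lu, %lu)\n", map[i].start, map[i].end);
	return 0;
}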

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   17 +++++
 include/linux/mm.h                  |   11 +++
 mm/page_alloc.c                     |  126 +++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9776f06..785f878 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1620,6 +1620,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablecore_map=nn[KMG]@ss[KMG]
+			[KNL,X86,IA-64,PPC] This parameter is similar to
+			memmap except it specifies the memory map of
+			ZONE_MOVABLE.
+			If several areas are all within one node, then from
+			the lowest ss to the end of the node will be ZONE_MOVABLE.
+			If an area covers two or more nodes, the area from
+			ss to the end of the 1st node will be ZONE_MOVABLE,
+			and all the rest of the nodes will only have ZONE_MOVABLE.
+			If memmap is specified at the same time, the
+			movablecore_map will be limited within the memmap
+			areas. If kernelcore or movablecore is also specified,
+			movablecore_map will have higher priority to be
+			satisfied. So the administrator should be careful that
+			the total size of movablecore_map areas is not too large.
+			Otherwise the kernel won't have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e..29622c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1328,6 +1328,17 @@ extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
 
+#define MOVABLECORE_MAP_MAX MAX_NUMNODES
+struct movablecore_entry {
+	unsigned long start_pfn;    /* start pfn of memory segment */
+	unsigned long end_pfn;      /* end pfn of memory segment */
+};
+
+struct movablecore_map {
+	int nr_map;
+	struct movablecore_entry map[MOVABLECORE_MAP_MAX];
+};
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8f2c87..1c91d16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -198,6 +198,9 @@ static unsigned long __meminitdata nr_all_pages;
 static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+/* Movable memory ranges, will also be used by memblock subsystem. */
+struct movablecore_map movablecore_map;
+
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
@@ -5003,6 +5006,129 @@ static int __init cmdline_parse_movablecore(char *p)
 early_param("kernelcore", cmdline_parse_kernelcore);
 early_param("movablecore", cmdline_parse_movablecore);
 
+/**
+ * insert_movablecore_map - Insert a memory range into movablecore_map.map.
+ * @start_pfn: start pfn of the range
+ * @end_pfn: end pfn of the range
+ *
+ * This function will also merge overlapping ranges, and sort the array
+ * by start_pfn in monotonically increasing order.
+ */
+static void __init insert_movablecore_map(unsigned long start_pfn,
+					  unsigned long end_pfn)
+{
+	int pos, overlap;
+
+	/*
+	 * pos will be at the 1st overlapped range, or the position
+	 * where the element should be inserted.
+	 */
+	for (pos = 0; pos < movablecore_map.nr_map; pos++)
+		if (start_pfn <= movablecore_map.map[pos].end_pfn)
+			break;
+
+	/* If there is no overlapped range, just insert the element. */
+	if (pos == movablecore_map.nr_map ||
+	    end_pfn < movablecore_map.map[pos].start_pfn) {
+		/*
+		 * If pos is not the end of array, we need to move all
+		 * the rest elements backward.
+		 */
+		if (pos < movablecore_map.nr_map)
+			memmove(&movablecore_map.map[pos+1],
+				&movablecore_map.map[pos],
+				sizeof(struct movablecore_entry) *
+				(movablecore_map.nr_map - pos));
+		movablecore_map.map[pos].start_pfn = start_pfn;
+		movablecore_map.map[pos].end_pfn = end_pfn;
+		movablecore_map.nr_map++;
+		return;
+	}
+
+	/* overlap will be at the last overlapped range */
+	for (overlap = pos + 1; overlap < movablecore_map.nr_map; overlap++)
+		if (end_pfn < movablecore_map.map[overlap].start_pfn)
+			break;
+
+	/*
+	 * If there are more ranges overlapped, we need to merge them,
+	 * and move the rest elements forward.
+	 */
+	overlap--;
+	movablecore_map.map[pos].start_pfn = min(start_pfn,
+					movablecore_map.map[pos].start_pfn);
+	movablecore_map.map[pos].end_pfn = max(end_pfn,
+					movablecore_map.map[overlap].end_pfn);
+
+	if (pos != overlap && overlap + 1 != movablecore_map.nr_map)
+		memmove(&movablecore_map.map[pos+1],
+			&movablecore_map.map[overlap+1],
+			sizeof(struct movablecore_entry) *
+			(movablecore_map.nr_map - overlap - 1));
+
+	movablecore_map.nr_map -= overlap - pos;
+}
+
+/**
+ * movablecore_map_add_region - Add a memory range into movablecore_map.
+ * @start: physical start address of range
+ * @size: size of the range
+ *
+ * This function transforms the physical addresses into pfns, and then adds
+ * the range into movablecore_map by calling insert_movablecore_map().
+ */
+static void __init movablecore_map_add_region(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+
+	/* In case size == 0 or start + size overflows */
+	if (start + size <= start)
+		return;
+
+	if (movablecore_map.nr_map >= ARRAY_SIZE(movablecore_map.map)) {
+		pr_err("movable_memory_map: too many entries;"
+			" ignoring [mem %#010llx-%#010llx]\n",
+			(unsigned long long) start,
+			(unsigned long long) (start + size - 1));
+		return;
+	}
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(start + size);
+	insert_movablecore_map(start_pfn, end_pfn);
+}
+
+/*
+ * movablecore_map=nn[KMG]@ss[KMG] sets the region of memory to be used as
+ * movable memory.
+ */
+static int __init cmdline_parse_movablecore_map(char *p)
+{
+	char *oldp;
+	u64 start_at, mem_size;
+
+	if (!p)
+		goto err;
+
+	oldp = p;
+	mem_size = memparse(p, &p);
+	if (p == oldp)
+		goto err;
+
+	if (*p == '@') {
+		oldp = ++p;
+		start_at = memparse(p, &p);
+		if (p == oldp || *p != '\0')
+			goto err;
+
+		movablecore_map_add_region(start_at, mem_size);
+		return 0;
+	}
+err:
+	return -EINVAL;
+}
+early_param("movablecore_map", cmdline_parse_movablecore_map);
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
-- 
1.7.1



* [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
  2012-12-11  2:33 ` [PATCH v3 1/5] x86: get pg_data_t's memory from other node Tang Chen
  2012-12-11  2:33 ` [PATCH v3 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
@ 2012-12-11  2:33 ` Tang Chen
  2012-12-11  3:07   ` Jianguo Wu
  2012-12-11  4:55   ` [PATCH v3 3/5][RESEND] " Tang Chen
  2012-12-11  2:33 ` [PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

This patch introduces a new array, zone_movable_limit[], to store the
ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
The function sanitize_zone_movable_limit() will find out to which
node each range in movablecore_map.map[] belongs, and calculate the
low boundary of ZONE_MOVABLE for each node.
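
As a concrete illustration (all numbers are made up): if node 0 spans
pfns [0x0, 0x80000), node 1 spans [0x80000, 0x100000), and the user
asked for a movable range of [0x90000, 0xa0000), then
zone_movable_limit[0] stays 0 (no limit) while zone_movable_limit[1]
becomes 0x90000. A minimal stand-alone sketch of just that selection
step, leaving out the DMA/DMA32/highmem skipping (movable_limit() is
an illustrative helper, not kernel code):

#include <stdio.h>

struct range { unsigned long start, end; };	/* [start, end) */

/* Lower ZONE_MOVABLE boundary for one node range against a movable
 * map sorted by start pfn; 0 means no limit for the node. */
static unsigned long movable_limit(struct range node,
				   const struct range *map, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (node.end <= map[i].start)
			break;		/* map entry is past the node */
		if (node.start >= map[i].end)
			continue;	/* node is past this map entry */
		/* First overlap: the higher of the two starts. */
		return node.start > map[i].start ?
				node.start : map[i].start;
	}
	return 0;
}

int main(void)
{
	struct range map[1] = { { 0x90000, 0xa0000 } };
	struct range node0 = { 0x00000, 0x80000 };
	struct range node1 = { 0x80000, 0x100000 };

	printf("node 0: %#lx\n", movable_limit(node0, map, 1)); /* 0 */
	printf("node 1: %#lx\n", movable_limit(node1, map, 1)); /* 0x90000 */
	return 0;
}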

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c91d16..4853619 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
 }
 
+/**
+ * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
+ *
+ * zone_movable_limit is initialized as 0. This function will try to get
+ * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
+ * assign them to zone_movable_limit.
+ * zone_movable_limit[nid] == 0 means no limit for the node.
+ *
+ * Note: Each range is represented as [start_pfn, end_pfn)
+ */
+static void __meminit sanitize_zone_movable_limit(void)
+{
+	int map_pos = 0, i, nid;
+	unsigned long start_pfn, end_pfn;
+
+	if (!movablecore_map.nr_map)
+		return;
+
+	/* Iterate all ranges from minimum to maximum */
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		/*
+		 * If we have found the lowest pfn of ZONE_MOVABLE of the node
+		 * specified by the user, just go on to check the next range.
+		 */
+		if (zone_movable_limit[nid])
+			continue;
+
+#ifdef CONFIG_ZONE_DMA
+		/* Skip DMA memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
+#endif
+
+#ifdef CONFIG_ZONE_DMA32
+		/* Skip DMA32 memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
+#endif
+
+#ifdef CONFIG_HIGHMEM
+		/* Skip lowmem if ZONE_MOVABLE is highmem. */
+		if (zone_movable_is_highmem() &&
+		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
+			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
+#endif
+
+		if (start_pfn >= end_pfn)
+			continue;
+
+		while (map_pos < movablecore_map.nr_map) {
+			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
+				break;
+
+			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
+				map_pos++;
+				continue;
+			}
+
+			/*
+			 * The start_pfn of ZONE_MOVABLE is either the minimum
+			 * pfn specified by movablecore_map, or 0, which means
+			 * the node has no ZONE_MOVABLE.
+			 */
+			zone_movable_limit[nid] = max(start_pfn,
+					movablecore_map.map[map_pos].start_pfn);
+
+			break;
+		}
+	}
+}
+
 #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
@@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return zholes_size[zone_type];
 }
 
+static void __meminit sanitize_zone_movable_limit(void)
+{
+}
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
 	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
+	sanitize_zone_movable_limit();
 	find_zone_movable_pfns_for_nodes();
 
 	/* Print out the zone ranges */
-- 
1.7.1



* [PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
                   ` (2 preceding siblings ...)
  2012-12-11  2:33 ` [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
@ 2012-12-11  2:33 ` Tang Chen
  2012-12-11  4:56   ` [PATCH v3 4/5][RESEND] " Tang Chen
  2012-12-11  2:33 ` [PATCH v3 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
  2012-12-11 11:33 ` [PATCH v3 0/5] Add movablecore_map boot option Simon Jeons
  5 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

If kernelcore or movablecore is specified at the same time
as movablecore_map, movablecore_map will take higher
priority.
This patch makes find_zone_movable_pfns_for_nodes()
calculate zone_movable_pfn[] with the limit from
zone_movable_limit[].
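
For example (numbers are illustrative): if the user passes kernelcore=2G
and a movablecore_map range that puts zone_movable_limit[1] at pfn P on
node 1, then kernelcore pages on node 1 are only taken from below P. If
node 1 runs out of room below P, zone_movable_pfn[1] is pinned to P
rather than being pushed into the user-requested movable range, and the
unsatisfied kernelcore amount spills over to the other nodes via the
restart loop.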

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   35 +++++++++++++++++++++++++++++++----
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4853619..e7b6db5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4839,12 +4839,25 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
-	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
-	if (!required_kernelcore)
+	/*
+	 * No matter whether kernelcore/movablecore was limited or not,
+	 * movable_zone should always be set to a usable zone index.
+	 */
+	find_usable_zone_for_movable();
+
+	/*
+	 * If neither kernelcore/movablecore nor movablecore_map is specified,
+	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
+	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
+	 */
+	if (!required_kernelcore) {
+		if (movablecore_map.nr_map)
+			memcpy(zone_movable_pfn, zone_movable_limit,
+				sizeof(zone_movable_pfn));
 		goto out;
+	}
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4872,10 +4885,24 @@ restart:
 		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
 			unsigned long size_pages;
 
+			/*
+			 * Find more memory for kernelcore in
+			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
+			 */
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
 			if (start_pfn >= end_pfn)
 				continue;
 
+			if (zone_movable_limit[nid]) {
+				end_pfn = min(end_pfn, zone_movable_limit[nid]);
+				/* No range left for kernelcore in this node */
+				if (start_pfn >= end_pfn) {
+					zone_movable_pfn[nid] =
+							zone_movable_limit[nid];
+					break;
+				}
+			}
+
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
 				unsigned long kernel_pages;
@@ -4935,12 +4962,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_HIGH_MEMORY] = saved_node_state;
 }
-- 
1.7.1



* [PATCH v3 5/5] page_alloc: Bootmem limit with movablecore_map
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
                   ` (3 preceding siblings ...)
  2012-12-11  2:33 ` [PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
@ 2012-12-11  2:33 ` Tang Chen
  2012-12-11 11:33 ` [PATCH v3 0/5] Add movablecore_map boot option Simon Jeons
  5 siblings, 0 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  2:33 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

This patch makes sure bootmem will not allocate memory from areas that
may be ZONE_MOVABLE. The map info comes from the movablecore_map boot option.
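
For example (addresses are illustrative): if memblock is searching
top-down for a 16M block inside [4G, 8G) and the user marked [6G, 7G)
as movable, a candidate that would land inside [6G, 7G) is refused;
this_end is clamped down to 6G and the search restarts, so the
allocation ends up just below the movable range instead of inside it.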

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..6e25597 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
+extern struct movablecore_map movablecore_map;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 6259055..197c3be 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -101,6 +101,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 {
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
+	int curr = movablecore_map.nr_map - 1;
 
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
@@ -114,13 +115,28 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 		this_start = clamp(this_start, start, end);
 		this_end = clamp(this_end, start, end);
 
-		if (this_end < size)
+restart:
+		if (this_end <= this_start || this_end < size)
 			continue;
 
+		for (; curr >= 0; curr--) {
+			if ((movablecore_map.map[curr].start_pfn << PAGE_SHIFT)
+			    < this_end)
+				break;
+		}
+
 		cand = round_down(this_end - size, align);
+		if (curr >= 0 &&
+		    cand < movablecore_map.map[curr].end_pfn << PAGE_SHIFT) {
+			this_end = movablecore_map.map[curr].start_pfn
+				   << PAGE_SHIFT;
+			goto restart;
+		}
+
 		if (cand >= this_start)
 			return cand;
 	}
+
 	return 0;
 }
 
-- 
1.7.1



* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  2:33 ` [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
@ 2012-12-11  3:07   ` Jianguo Wu
  2012-12-11  3:32     ` Tang Chen
  2012-12-11 12:24     ` Simon Jeons
  2012-12-11  4:55   ` [PATCH v3 3/5][RESEND] " Tang Chen
  1 sibling, 2 replies; 28+ messages in thread
From: Jianguo Wu @ 2012-12-11  3:07 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 2012/12/11 10:33, Tang Chen wrote:

> This patch introduces a new array, zone_movable_limit[], to store the
> ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
> The function sanitize_zone_movable_limit() will find out to which
> node each range in movablecore_map.map[] belongs, and calculate the
> low boundary of ZONE_MOVABLE for each node.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
> ---
>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 77 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1c91d16..4853619 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>  static unsigned long __initdata required_kernelcore;
>  static unsigned long __initdata required_movablecore;
>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>  
>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>  int movable_zone;
> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>  }
>  
> +/**
> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
> + *
> + * zone_movable_limit is initialized as 0. This function will try to get
> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
> + * assigne them to zone_movable_limit.
> + * zone_movable_limit[nid] == 0 means no limit for the node.
> + *
> + * Note: Each range is represented as [start_pfn, end_pfn)
> + */
> +static void __meminit sanitize_zone_movable_limit(void)
> +{
> +	int map_pos = 0, i, nid;
> +	unsigned long start_pfn, end_pfn;
> +
> +	if (!movablecore_map.nr_map)
> +		return;
> +
> +	/* Iterate all ranges from minimum to maximum */
> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> +		/*
> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
> +		 * specified by user, just go on to check next range.
> +		 */
> +		if (zone_movable_limit[nid])
> +			continue;
> +
> +#ifdef CONFIG_ZONE_DMA
> +		/* Skip DMA memory. */
> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
> +#endif
> +
> +#ifdef CONFIG_ZONE_DMA32
> +		/* Skip DMA32 memory. */
> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
> +#endif
> +
> +#ifdef CONFIG_HIGHMEM
> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
> +		if (zone_movable_is_highmem() &&

Hi Tang,

I think zone_movable_is_highmem() does not work correctly here.
	sanitize_zone_movable_limit
		zone_movable_is_highmem      <--using movable_zone here
	find_zone_movable_pfns_for_nodes
		find_usable_zone_for_movable <--movable_zone is specified here

I think Jiang Liu's patch works fine for highmem, please refer to:
http://marc.info/?l=linux-mm&m=135476085816087&w=2

Thanks,
Jianguo Wu

> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> +#endif
> +
> +		if (start_pfn >= end_pfn)
> +			continue;
> +
> +		while (map_pos < movablecore_map.nr_map) {
> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
> +				break;
> +
> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
> +				map_pos++;
> +				continue;
> +			}
> +
> +			/*
> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
> +			 * pfn specified by movablecore_map, or 0, which means
> +			 * the node has no ZONE_MOVABLE.
> +			 */
> +			zone_movable_limit[nid] = max(start_pfn,
> +					movablecore_map.map[map_pos].start_pfn);
> +
> +			break;
> +		}
> +	}
> +}
> +
>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>  					unsigned long zone_type,
> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>  	return zholes_size[zone_type];
>  }
>  
> +static void __meminit sanitize_zone_movable_limit(void)
> +{
> +}
> +
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>  
>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>  
>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
> +	sanitize_zone_movable_limit();
>  	find_zone_movable_pfns_for_nodes();
>  
>  	/* Print out the zone ranges */





* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  3:07   ` Jianguo Wu
@ 2012-12-11  3:32     ` Tang Chen
  2012-12-11 11:28       ` Simon Jeons
  2012-12-11 12:24     ` Simon Jeons
  1 sibling, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-11  3:32 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 12/11/2012 11:07 AM, Jianguo Wu wrote:
> On 2012/12/11 10:33, Tang Chen wrote:
>
>> This patch introduces a new array, zone_movable_limit[], to store the
>> ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
>> The function sanitize_zone_movable_limit() will find out to which
>> node each range in movablecore_map.map[] belongs, and calculate the
>> low boundary of ZONE_MOVABLE for each node.
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> Signed-off-by: Jiang Liu<jiang.liu@huawei.com>
>> Reviewed-by: Wen Congyang<wency@cn.fujitsu.com>
>> Reviewed-by: Lai Jiangshan<laijs@cn.fujitsu.com>
>> Tested-by: Lin Feng<linfeng@cn.fujitsu.com>
>> ---
>>   mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 files changed, 77 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 1c91d16..4853619 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>   static unsigned long __initdata required_kernelcore;
>>   static unsigned long __initdata required_movablecore;
>>   static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>>
>>   /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>   int movable_zone;
>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>   	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>   }
>>
>> +/**
>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
>> + *
>> + * zone_movable_limit is initialized as 0. This function will try to get
>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
>> + * assign them to zone_movable_limit.
>> + * zone_movable_limit[nid] == 0 means no limit for the node.
>> + *
>> + * Note: Each range is represented as [start_pfn, end_pfn)
>> + */
>> +static void __meminit sanitize_zone_movable_limit(void)
>> +{
>> +	int map_pos = 0, i, nid;
>> +	unsigned long start_pfn, end_pfn;
>> +
>> +	if (!movablecore_map.nr_map)
>> +		return;
>> +
>> +	/* Iterate all ranges from minimum to maximum */
>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>> +		/*
>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
>> +		 * specified by user, just go on to check next range.
>> +		 */
>> +		if (zone_movable_limit[nid])
>> +			continue;
>> +
>> +#ifdef CONFIG_ZONE_DMA
>> +		/* Skip DMA memory. */
>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
>> +#endif
>> +
>> +#ifdef CONFIG_ZONE_DMA32
>> +		/* Skip DMA32 memory. */
>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
>> +#endif
>> +
>> +#ifdef CONFIG_HIGHMEM
>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
>> +		if (zone_movable_is_highmem() &&
>
> Hi Tang,
>
> I think zone_movable_is_highmem() is not work correctly here.
> 	sanitize_zone_movable_limit
> 		zone_movable_is_highmem      <--using movable_zone here
> 	find_zone_movable_pfns_for_nodes
> 		find_usable_zone_for_movable <--movable_zone is specified here
>
> I think Jiang Liu's patch works fine for highmem, please refer to:
> http://marc.info/?l=linux-mm&m=135476085816087&w=2

Hi Wu,

Yes, I forgot about the movable_zone thing. Thanks for reminding me. :)

But as for Liu's patch you just mentioned, I didn't use it because I
don't think we should skip kernelcore when movablecore_map is specified.
If these 2 options do not conflict, we should satisfy them both. :)

Of course, I also think Liu's suggestion is wonderful. But I think we
need more discussion on it. :)

I'll fix it soon.
Thanks. :)

>
> Thanks,
> Jianguo Wu
>
>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
>> +#endif
>> +
>> +		if (start_pfn >= end_pfn)
>> +			continue;
>> +
>> +		while (map_pos < movablecore_map.nr_map) {
>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
>> +				break;
>> +
>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
>> +				map_pos++;
>> +				continue;
>> +			}
>> +
>> +			/*
>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
>> +			 * pfn specified by movablecore_map, or 0, which means
>> +			 * the node has no ZONE_MOVABLE.
>> +			 */
>> +			zone_movable_limit[nid] = max(start_pfn,
>> +					movablecore_map.map[map_pos].start_pfn);
>> +
>> +			break;
>> +		}
>> +	}
>> +}
>> +
>>   #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>   static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>>   					unsigned long zone_type,
>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>>   	return zholes_size[zone_type];
>>   }
>>
>> +static void __meminit sanitize_zone_movable_limit(void)
>> +{
>> +}
>> +
>>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>>   static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>>
>>   	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>>   	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
>> +	sanitize_zone_movable_limit();
>>   	find_zone_movable_pfns_for_nodes();
>>
>>   	/* Print out the zone ranges */
>
>
>
>



* [PATCH v3 3/5][RESEND] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  2:33 ` [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
  2012-12-11  3:07   ` Jianguo Wu
@ 2012-12-11  4:55   ` Tang Chen
  1 sibling, 0 replies; 28+ messages in thread
From: Tang Chen @ 2012-12-11  4:55 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

This patch introduces a new array, zone_movable_limit[], to store the
ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
The function sanitize_zone_movable_limit() will find out to which
node each range in movablecore_map.map[] belongs, and calculate the
low boundary of ZONE_MOVABLE for each node.

change log:
Call find_usable_zone_for_movable() earlier to initialize movable_zone
so that sanitize_zone_movable_limit() can use it.

Reported-by: Wu Jianguo <wujianguo@huawei.com>


Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Liu Jiang <jiang.liu@huawei.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c91d16..52c368e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
 }
 
+/**
+ * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
+ *
+ * zone_movable_limit is initialized as 0. This function will try to get
+ * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
+ * assign them to zone_movable_limit.
+ * zone_movable_limit[nid] == 0 means no limit for the node.
+ *
+ * Note: Each range is represented as [start_pfn, end_pfn)
+ */
+static void __meminit sanitize_zone_movable_limit(void)
+{
+	int map_pos = 0, i, nid;
+	unsigned long start_pfn, end_pfn;
+
+	if (!movablecore_map.nr_map)
+		return;
+
+	/* Iterate all ranges from minimum to maximum */
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		/*
+		 * If we have found the lowest pfn of ZONE_MOVABLE of the node
+		 * specified by the user, just go on to check the next range.
+		 */
+		if (zone_movable_limit[nid])
+			continue;
+
+#ifdef CONFIG_ZONE_DMA
+		/* Skip DMA memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
+#endif
+
+#ifdef CONFIG_ZONE_DMA32
+		/* Skip DMA32 memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
+#endif
+
+#ifdef CONFIG_HIGHMEM
+		/* Skip lowmem if ZONE_MOVABLE is highmem. */
+		if (zone_movable_is_highmem() &&
+		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
+			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
+#endif
+
+		if (start_pfn >= end_pfn)
+			continue;
+
+		while (map_pos < movablecore_map.nr_map) {
+			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
+				break;
+
+			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
+				map_pos++;
+				continue;
+			}
+
+			/*
+			 * The start_pfn of ZONE_MOVABLE is either the minimum
+			 * pfn specified by movablecore_map, or 0, which means
+			 * the node has no ZONE_MOVABLE.
+			 */
+			zone_movable_limit[nid] = max(start_pfn,
+					movablecore_map.map[map_pos].start_pfn);
+
+			break;
+		}
+	}
+}
+
 #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
@@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return zholes_size[zone_type];
 }
 
+static void __meminit sanitize_zone_movable_limit(void)
+{
+}
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@@ -4768,7 +4844,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4923,6 +4998,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
 	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
+	find_usable_zone_for_movable();
+	sanitize_zone_movable_limit();
 	find_zone_movable_pfns_for_nodes();
 
 	/* Print out the zone ranges */
-- 
1.7.1



* [PATCH v3 4/5][RESEND] page_alloc: Make movablecore_map has higher priority
  2012-12-11  2:33 ` [PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
@ 2012-12-11  4:56   ` Tang Chen
  2012-12-12  1:33     ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-11  4:56 UTC (permalink / raw)
  To: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer
  Cc: linux-kernel, linux-mm, linux-doc

If kernelcore or movablecore is specified at the same time
as movablecore_map, movablecore_map will take higher
priority.
This patch makes find_zone_movable_pfns_for_nodes()
calculate zone_movable_pfn[] with the limit from
zone_movable_limit[].

change log:
Move find_usable_zone_for_movable() to free_area_init_nodes()
so that sanitize_zone_movable_limit() in patch 3 can use the
initialized movable_zone.

Reported-by: Wu Jianguo <wujianguo@huawei.com>

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 52c368e..00fa67d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4839,9 +4839,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
-	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
-	if (!required_kernelcore)
+	/*
+	 * If neither kernelcore/movablecore nor movablecore_map is specified,
+	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
+	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
+	 */
+	if (!required_kernelcore) {
+		if (movablecore_map.nr_map)
+			memcpy(zone_movable_pfn, zone_movable_limit,
+				sizeof(zone_movable_pfn));
 		goto out;
+	}
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
@@ -4871,10 +4879,24 @@ restart:
 		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
 			unsigned long size_pages;
 
+			/*
+			 * Find more memory for kernelcore in
+			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
+			 */
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
 			if (start_pfn >= end_pfn)
 				continue;
 
+			if (zone_movable_limit[nid]) {
+				end_pfn = min(end_pfn, zone_movable_limit[nid]);
+				/* No range left for kernelcore in this node */
+				if (start_pfn >= end_pfn) {
+					zone_movable_pfn[nid] =
+							zone_movable_limit[nid];
+					break;
+				}
+			}
+
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
 				unsigned long kernel_pages;
@@ -4934,12 +4956,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_HIGH_MEMORY] = saved_node_state;
 }
-- 
1.7.1



* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  3:32     ` Tang Chen
@ 2012-12-11 11:28       ` Simon Jeons
  2012-12-12  0:49         ` Jiang Liu
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11 11:28 UTC (permalink / raw)
  To: Tang Chen
  Cc: Jianguo Wu, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Tue, 2012-12-11 at 11:32 +0800, Tang Chen wrote:
> On 12/11/2012 11:07 AM, Jianguo Wu wrote:
> > On 2012/12/11 10:33, Tang Chen wrote:
> >
> >> This patch introduces a new array, zone_movable_limit[], to store the
> >> ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
> >> The function sanitize_zone_movable_limit() will find out to which
> >> node each range in movablecore_map.map[] belongs, and calculate the
> >> low boundary of ZONE_MOVABLE for each node.

What's the difference between zone_movable_limit[nid] and
zone_movable_pfn[nid]?

> >>
> >> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
> >> Signed-off-by: Jiang Liu<jiang.liu@huawei.com>
> >> Reviewed-by: Wen Congyang<wency@cn.fujitsu.com>
> >> Reviewed-by: Lai Jiangshan<laijs@cn.fujitsu.com>
> >> Tested-by: Lin Feng<linfeng@cn.fujitsu.com>
> >> ---
> >>   mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 files changed, 77 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 1c91d16..4853619 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >>   static unsigned long __initdata required_kernelcore;
> >>   static unsigned long __initdata required_movablecore;
> >>   static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> >> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
> >>
> >>   /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >>   int movable_zone;
> >> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>   	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> >>   }
> >>
> >> +/**
> >> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
> >> + *
> >> + * zone_movable_limit is initialized as 0. This function will try to get
> >> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
> >> + * assign them to zone_movable_limit.
> >> + * zone_movable_limit[nid] == 0 means no limit for the node.
> >> + *
> >> + * Note: Each range is represented as [start_pfn, end_pfn)
> >> + */
> >> +static void __meminit sanitize_zone_movable_limit(void)
> >> +{
> >> +	int map_pos = 0, i, nid;
> >> +	unsigned long start_pfn, end_pfn;
> >> +
> >> +	if (!movablecore_map.nr_map)
> >> +		return;
> >> +
> >> +	/* Iterate all ranges from minimum to maximum */
> >> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> >> +		/*
> >> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
> >> +		 * specified by user, just go on to check next range.
> >> +		 */
> >> +		if (zone_movable_limit[nid])
> >> +			continue;
> >> +
> >> +#ifdef CONFIG_ZONE_DMA
> >> +		/* Skip DMA memory. */
> >> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
> >> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
> >> +#endif
> >> +
> >> +#ifdef CONFIG_ZONE_DMA32
> >> +		/* Skip DMA32 memory. */
> >> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
> >> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
> >> +#endif
> >> +
> >> +#ifdef CONFIG_HIGHMEM
> >> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
> >> +		if (zone_movable_is_highmem() &&
> >
> > Hi Tang,
> >
> > I think zone_movable_is_highmem() is not work correctly here.
> > 	sanitize_zone_movable_limit
> > 		zone_movable_is_highmem      <--using movable_zone here
> > 	find_zone_movable_pfns_for_nodes
> > 		find_usable_zone_for_movable <--movable_zone is specified here
> >
> > I think Jiang Liu's patch works fine for highmem, please refer to:
> > http://marc.info/?l=linux-mm&m=135476085816087&w=2
> 
> Hi Wu,
> 
> Yes, I forgot movable_zone think. Thanks for reminding me. :)
> 
> But Liu's patch you just mentioned, I didn't use it because I
> don't think we should skip kernelcore when movablecore_map is specified.
> If these 2 options are not conflict, we should satisfy them both. :)
> 
> Of course, I also think Liu's suggestion is wonderful. But I think we
> need more discussion on it. :)
> 
> I'll fix it soon.
> Thanks. :)
> 
> >
> > Thanks,
> > Jianguo Wu
> >
> >> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
> >> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> >> +#endif
> >> +
> >> +		if (start_pfn >= end_pfn)
> >> +			continue;
> >> +
> >> +		while (map_pos < movablecore_map.nr_map) {
> >> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
> >> +				break;
> >> +
> >> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
> >> +				map_pos++;
> >> +				continue;
> >> +			}
> >> +
> >> +			/*
> >> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
> >> +			 * pfn specified by movablecore_map, or 0, which means
> >> +			 * the node has no ZONE_MOVABLE.
> >> +			 */
> >> +			zone_movable_limit[nid] = max(start_pfn,
> >> +					movablecore_map.map[map_pos].start_pfn);
> >> +
> >> +			break;
> >> +		}
> >> +	}
> >> +}
> >> +
> >>   #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>   static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
> >>   					unsigned long zone_type,
> >> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>   	return zholes_size[zone_type];
> >>   }
> >>
> >> +static void __meminit sanitize_zone_movable_limit(void)
> >> +{
> >> +}
> >> +
> >>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>
> >>   static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> >> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >>
> >>   	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
> >>   	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
> >> +	sanitize_zone_movable_limit();
> >>   	find_zone_movable_pfns_for_nodes();
> >>
> >>   	/* Print out the zone ranges */
> >
> >
> >
> >
> 




* Re: [PATCH v3 0/5] Add movablecore_map boot option
  2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
                   ` (4 preceding siblings ...)
  2012-12-11  2:33 ` [PATCH v3 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
@ 2012-12-11 11:33 ` Simon Jeons
  5 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-11 11:33 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Tue, 2012-12-11 at 10:33 +0800, Tang Chen wrote:
> [What we are doing]
> This patchset provides a boot option for users to specify the ZONE_MOVABLE
> memory map for each node in the system.
> 
> movablecore_map=nn[KMG]@ss[KMG]
> 
> This option makes sure the memory range from ss to ss+nn is movable memory.
> 
> 
> [Why we do this]
> If we hot remove memory, that memory cannot contain kernel memory,
> because Linux cannot migrate kernel memory currently. Therefore,
> we have to guarantee that the hot-removed memory contains only
> movable memory.
> 
> Linux has two boot options, kernelcore= and movablecore=, for
> creating movable memory. These boot options can specify the amount
> of memory to use as kernel or movable memory. Using them, we can
> create ZONE_MOVABLE which has only movable memory.
> 
> But they do not fulfill a requirement of memory hot remove, because
> even with these boot options, movable memory is distributed evenly
> across the nodes. So when we want to hot remove memory whose range
> is, say, 0x80000000-0xc0000000, we have no way to specify that
> memory as movable memory.
> 
> So we propose a new feature which specifies a memory range to use as
> movable memory.
> 
> 
> [Ways to do this]
> There may be 2 ways to specify movable memory.
>  1. use firmware information
>  2. use boot option
> 
> 1. use firmware information
>   According to the ACPI 5.0 spec, the SRAT table has a memory affinity
>   structure, and the structure has a Hot Pluggable Field. See "5.2.16.2
>   Memory Affinity Structure". If we use this information, we might be
>   able to specify movable memory via firmware. For example, if the Hot
>   Pluggable Field is enabled, Linux sets the memory as movable memory.
> 
> 2. use boot option
>   This is our proposal. A new boot option can specify the memory range
>   to use as movable memory.
> 
> 
> [How we do this]
> We chose the second way, because with the first way, users cannot
> easily change the memory range to use as movable memory. We think
> that if we create movable memory, a performance regression may occur
> due to NUMA. In that case, the user can easily turn the feature off
> if we prepare a boot option. And with a boot option, the user can
> easily select which memory to use as movable memory.
> 
> 
> [How to use]
> Specify the following boot option:
> movablecore_map=nn[KMG]@ss[KMG]
> 
> That means the physical address range from ss to ss+nn will be allocated
> as ZONE_MOVABLE.
> 
> And the following points should be considered.
> 
> 1) If the range falls within a single node, then from ss to the end of
>    that node will be ZONE_MOVABLE.
> 2) If the range covers two or more nodes, then from ss to the end of its
>    node will be ZONE_MOVABLE, and all the other covered nodes will have
>    only ZONE_MOVABLE.

Could you explain which part of your code implements points 1 and 2?

> 3) If no range is in the node, then the node will have no ZONE_MOVABLE
>    unless kernelcore or movablecore is specified.
> 4) This option could be specified at most MAX_NUMNODES times.
> 5) If kernelcore or movablecore is also specified, movablecore_map will
>    be satisfied with higher priority.
> 6) This option does not conflict with the memmap option.
> 
> 
> Change log:
> 
> v2 -> v3:
> 1) Use memblock_alloc_try_nid() instead of memblock_alloc_nid() to allocate
>    memory twice if a whole node is ZONE_MOVABLE.
> 2) Add DMA and DMA32 address checks to make sure ZONE_MOVABLE won't use these
>    addresses. Suggested by Wu Jianguo <wujianguo@huawei.com>
> 3) Add a lowmem address check: when the system has highmem, make sure ZONE_MOVABLE
>    won't use lowmem. Suggested by Liu Jiang <jiang.liu@huawei.com>
> 4) Fix misuse of pfns in movablecore_map.map[] as physical addresses.
> 
> Tang Chen (4):
>   page_alloc: add movable_memmap kernel parameter
>   page_alloc: Introduce zone_movable_limit[] to keep movable limit for
>     nodes
>   page_alloc: Make movablecore_map has higher priority
>   page_alloc: Bootmem limit with movablecore_map
> 
> Yasuaki Ishimatsu (1):
>   x86: get pg_data_t's memory from other node
> 
>  Documentation/kernel-parameters.txt |   17 +++
>  arch/x86/mm/numa.c                  |    5 +-
>  include/linux/memblock.h            |    1 +
>  include/linux/mm.h                  |   11 ++
>  mm/memblock.c                       |   18 +++-
>  mm/page_alloc.c                     |  238 ++++++++++++++++++++++++++++++++++-
>  6 files changed, 282 insertions(+), 8 deletions(-)
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11  3:07   ` Jianguo Wu
  2012-12-11  3:32     ` Tang Chen
@ 2012-12-11 12:24     ` Simon Jeons
  2012-12-11 12:41       ` Jianguo Wu
  1 sibling, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11 12:24 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Tang Chen, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
> On 2012/12/11 10:33, Tang Chen wrote:
> 
> > This patch introduces a new array zone_movable_limit[] to store the
> > ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
> > The function sanitize_zone_movable_limit() will find out to which
> > node each range in movablecore_map.map[] belongs, and will calculate
> > the low boundary of ZONE_MOVABLE for each node.
> > 
> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> > Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> > Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
> > ---
> >  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 77 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 1c91d16..4853619 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >  static unsigned long __initdata required_kernelcore;
> >  static unsigned long __initdata required_movablecore;
> >  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> > +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
> >  
> >  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >  int movable_zone;
> > @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
> >  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> >  }
> >  
> > +/**
> > + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
> > + *
> > + * zone_movable_limit is initialized as 0. This function will try to get
> > + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
> > + * assign them to zone_movable_limit.
> > + * zone_movable_limit[nid] == 0 means no limit for the node.
> > + *
> > + * Note: Each range is represented as [start_pfn, end_pfn)
> > + */
> > +static void __meminit sanitize_zone_movable_limit(void)
> > +{
> > +	int map_pos = 0, i, nid;
> > +	unsigned long start_pfn, end_pfn;
> > +
> > +	if (!movablecore_map.nr_map)
> > +		return;
> > +
> > +	/* Iterate all ranges from minimum to maximum */
> > +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> > +		/*
> > +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
> > +		 * specified by user, just go on to check next range.
> > +		 */
> > +		if (zone_movable_limit[nid])
> > +			continue;
> > +
> > +#ifdef CONFIG_ZONE_DMA
> > +		/* Skip DMA memory. */
> > +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
> > +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
> > +#endif
> > +
> > +#ifdef CONFIG_ZONE_DMA32
> > +		/* Skip DMA32 memory. */
> > +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
> > +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
> > +#endif
> > +
> > +#ifdef CONFIG_HIGHMEM
> > +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
> > +		if (zone_movable_is_highmem() &&
> 
> Hi Tang,
> 
> I think zone_movable_is_highmem() does not work correctly here.
> 	sanitize_zone_movable_limit
> 		zone_movable_is_highmem      <--using movable_zone here
> 	find_zone_movable_pfns_for_nodes
> 		find_usable_zone_for_movable <--movable_zone is specified here
> 

Hi Jianguo and Chen,

- What's the meaning of zone_movable_is_highmem()? Does it mean all zone
highmem pages are zone movable pages, or ....
- dmesg 

> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
> [    0.000000] Movable zone start for each node
> [    0.000000]   Node 0: 0x97800000

Why does the start of zone movable fall in the range of zone highmem, if
all the pages of zone movable are from zone highmem? If the answer is
yes, are zone movable and zone highmem of equal status or not?

> I think Jiang Liu's patch works fine for highmem, please refer to:
> http://marc.info/?l=linux-mm&m=135476085816087&w=2
> 
> Thanks,
> Jianguo Wu
> 
> > +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
> > +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> > +#endif
> > +
> > +		if (start_pfn >= end_pfn)
> > +			continue;
> > +
> > +		while (map_pos < movablecore_map.nr_map) {
> > +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
> > +				break;
> > +
> > +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
> > +				map_pos++;
> > +				continue;
> > +			}
> > +
> > +			/*
> > +			 * The start_pfn of ZONE_MOVABLE is either the minimum
> > +			 * pfn specified by movablecore_map, or 0, which means
> > +			 * the node has no ZONE_MOVABLE.
> > +			 */
> > +			zone_movable_limit[nid] = max(start_pfn,
> > +					movablecore_map.map[map_pos].start_pfn);
> > +
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> >  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
> >  					unsigned long zone_type,
> > @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
> >  	return zholes_size[zone_type];
> >  }
> >  
> > +static void __meminit sanitize_zone_movable_limit(void)
> > +{
> > +}
> > +
> >  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >  
> >  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> > @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >  
> >  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
> >  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
> > +	sanitize_zone_movable_limit();
> >  	find_zone_movable_pfns_for_nodes();
> >  
> >  	/* Print out the zone ranges */
> 
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11 12:24     ` Simon Jeons
@ 2012-12-11 12:41       ` Jianguo Wu
  2012-12-11 13:20         ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Jianguo Wu @ 2012-12-11 12:41 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Tang Chen, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 2012/12/11 20:24, Simon Jeons wrote:

> On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
>> On 2012/12/11 10:33, Tang Chen wrote:
>>
>>> This patch introduces a new array zone_movable_limit[] to store the
>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
>>> The function sanitize_zone_movable_limit() will find out to which
>>> node each range in movablecore_map.map[] belongs, and will calculate
>>> the low boundary of ZONE_MOVABLE for each node.
>>>
>>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
>>> ---
>>>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 77 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 1c91d16..4853619 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>>  static unsigned long __initdata required_kernelcore;
>>>  static unsigned long __initdata required_movablecore;
>>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>>>  
>>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>>  int movable_zone;
>>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>>  }
>>>  
>>> +/**
>>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
>>> + *
>>> + * zone_movable_limit is initialized as 0. This function will try to get
>>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
>>> + * assign them to zone_movable_limit.
>>> + * zone_movable_limit[nid] == 0 means no limit for the node.
>>> + *
>>> + * Note: Each range is represented as [start_pfn, end_pfn)
>>> + */
>>> +static void __meminit sanitize_zone_movable_limit(void)
>>> +{
>>> +	int map_pos = 0, i, nid;
>>> +	unsigned long start_pfn, end_pfn;
>>> +
>>> +	if (!movablecore_map.nr_map)
>>> +		return;
>>> +
>>> +	/* Iterate all ranges from minimum to maximum */
>>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>>> +		/*
>>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
>>> +		 * specified by user, just go on to check next range.
>>> +		 */
>>> +		if (zone_movable_limit[nid])
>>> +			continue;
>>> +
>>> +#ifdef CONFIG_ZONE_DMA
>>> +		/* Skip DMA memory. */
>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
>>> +#endif
>>> +
>>> +#ifdef CONFIG_ZONE_DMA32
>>> +		/* Skip DMA32 memory. */
>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
>>> +#endif
>>> +
>>> +#ifdef CONFIG_HIGHMEM
>>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
>>> +		if (zone_movable_is_highmem() &&
>>
>> Hi Tang,
>>
>> I think zone_movable_is_highmem() does not work correctly here.
>> 	sanitize_zone_movable_limit
>> 		zone_movable_is_highmem      <--using movable_zone here
>> 	find_zone_movable_pfns_for_nodes
>> 		find_usable_zone_for_movable <--movable_zone is specified here
>>
> 
> Hi Jianguo and Chen,
> 
> - What's the meaning of zone_movable_is_highmem()? Does it mean all zone
> highmem pages are zone movable pages, or ....

Hi Simon,

zone_movable_is_highmem() indicates whether the pages in ZONE_MOVABLE are
taken from highmem.
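
For reference, a sketch of what that check boils down to (from memory,
so it may not match the tree exactly; movable_zone is the zone index
chosen by find_usable_zone_for_movable()):

static inline int zone_movable_is_highmem(void)
{
#ifdef CONFIG_HIGHMEM
	/* true when ZONE_MOVABLE pages are carved out of highmem */
	return movable_zone == ZONE_HIGHMEM;
#else
	return 0;
#endif
}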

> - dmesg 
> 
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000]   Node 0: 0x97800000
> 
> Why does the start of zone movable fall in the range of zone highmem, if
> all the pages of zone movable are from zone highmem? If the answer is
> yes, are zone movable and zone highmem of equal status or not?

The pages of zone_movable can be taken from zone_highmem or zone_normal:
if we have highmem, then zone_movable will be taken from zone_highmem;
otherwise zone_movable will be taken from zone_normal.

you can refer to find_usable_zone_for_movable().
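
Roughly, it walks the zones from highest to lowest and picks the first
non-empty one below ZONE_MOVABLE as the donor zone (again a sketch from
memory, not necessarily the exact code in the tree):

static void __init find_usable_zone_for_movable(void)
{
	int zone_index;

	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
		if (zone_index == ZONE_MOVABLE)
			continue;

		/* the first non-empty zone donates its pages to ZONE_MOVABLE */
		if (arch_zone_highest_possible_pfn[zone_index] >
				arch_zone_lowest_possible_pfn[zone_index])
			break;
	}

	movable_zone = zone_index;
}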

Thanks,
Jianguo Wu

> 
>> I think Jiang Liu's patch works fine for highmem, please refer to:
>> http://marc.info/?l=linux-mm&m=135476085816087&w=2
>>
>> Thanks,
>> Jianguo Wu
>>
>>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
>>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
>>> +#endif
>>> +
>>> +		if (start_pfn >= end_pfn)
>>> +			continue;
>>> +
>>> +		while (map_pos < movablecore_map.nr_map) {
>>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
>>> +				break;
>>> +
>>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
>>> +				map_pos++;
>>> +				continue;
>>> +			}
>>> +
>>> +			/*
>>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
>>> +			 * pfn specified by movablecore_map, or 0, which means
>>> +			 * the node has no ZONE_MOVABLE.
>>> +			 */
>>> +			zone_movable_limit[nid] = max(start_pfn,
>>> +					movablecore_map.map[map_pos].start_pfn);
>>> +
>>> +			break;
>>> +		}
>>> +	}
>>> +}
>>> +
>>>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>>>  					unsigned long zone_type,
>>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>  	return zholes_size[zone_type];
>>>  }
>>>  
>>> +static void __meminit sanitize_zone_movable_limit(void)
>>> +{
>>> +}
>>> +
>>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>  
>>>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>>>  
>>>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>>>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
>>> +	sanitize_zone_movable_limit();
>>>  	find_zone_movable_pfns_for_nodes();
>>>  
>>>  	/* Print out the zone ranges */
>>
>>
>>
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11 12:41       ` Jianguo Wu
@ 2012-12-11 13:20         ` Simon Jeons
  2012-12-12  1:57           ` Jianguo Wu
  2012-12-12  1:58           ` Lin Feng
  0 siblings, 2 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-11 13:20 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Tang Chen, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Tue, 2012-12-11 at 20:41 +0800, Jianguo Wu wrote:
> On 2012/12/11 20:24, Simon Jeons wrote:
> 
> > On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
> >> On 2012/12/11 10:33, Tang Chen wrote:
> >>
> >>> This patch introduces a new array zone_movable_limit[] to store the
> >>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
> >>> The function sanitize_zone_movable_limit() will find out to which
> >>> node each range in movablecore_map.map[] belongs, and will calculate
> >>> the low boundary of ZONE_MOVABLE for each node.
> >>>
> >>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> >>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> >>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> >>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
> >>> ---
> >>>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 files changed, 77 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index 1c91d16..4853619 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >>>  static unsigned long __initdata required_kernelcore;
> >>>  static unsigned long __initdata required_movablecore;
> >>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> >>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
> >>>  
> >>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >>>  int movable_zone;
> >>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> >>>  }
> >>>  
> >>> +/**
> >>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
> >>> + *
> >>> + * zone_movable_limit is initialized as 0. This function will try to get
> >>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
> >>> + * assign them to zone_movable_limit.
> >>> + * zone_movable_limit[nid] == 0 means no limit for the node.
> >>> + *
> >>> + * Note: Each range is represented as [start_pfn, end_pfn)
> >>> + */
> >>> +static void __meminit sanitize_zone_movable_limit(void)
> >>> +{
> >>> +	int map_pos = 0, i, nid;
> >>> +	unsigned long start_pfn, end_pfn;
> >>> +
> >>> +	if (!movablecore_map.nr_map)
> >>> +		return;
> >>> +
> >>> +	/* Iterate all ranges from minimum to maximum */
> >>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> >>> +		/*
> >>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
> >>> +		 * specified by user, just go on to check next range.
> >>> +		 */
> >>> +		if (zone_movable_limit[nid])
> >>> +			continue;
> >>> +
> >>> +#ifdef CONFIG_ZONE_DMA
> >>> +		/* Skip DMA memory. */
> >>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
> >>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
> >>> +#endif
> >>> +
> >>> +#ifdef CONFIG_ZONE_DMA32
> >>> +		/* Skip DMA32 memory. */
> >>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
> >>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
> >>> +#endif
> >>> +
> >>> +#ifdef CONFIG_HIGHMEM
> >>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
> >>> +		if (zone_movable_is_highmem() &&
> >>
> >> Hi Tang,
> >>
> >> I think zone_movable_is_highmem() does not work correctly here.
> >> 	sanitize_zone_movable_limit
> >> 		zone_movable_is_highmem      <--using movable_zone here
> >> 	find_zone_movable_pfns_for_nodes
> >> 		find_usable_zone_for_movable <--movable_zone is specified here
> >>
> > 
> > Hi Jianguo and Chen,
> > 
> > - What's the meaning of zone_movable_is_highmem()? Does it mean all zone
> > highmem pages are zone movable pages, or ....
> 
> Hi Simon,
> 
> zone_movable_is_highmem() indicates whether the pages in ZONE_MOVABLE are
> taken from highmem.
> 
> > - dmesg 
> > 
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> >> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
> >> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000]   Node 0: 0x97800000
> > 
> > Why does the start of zone movable fall in the range of zone highmem, if
> > all the pages of zone movable are from zone highmem? If the answer is
> > yes, are zone movable and zone highmem of equal status or not?
> 
> The pages of zone_movable can be taken from zone_highmem or zone_normal:
> if we have highmem, then zone_movable will be taken from zone_highmem;
> otherwise zone_movable will be taken from zone_normal.
> 
> you can refer to find_usable_zone_for_movable().

Hi Jianguo,

I have 8G memory and movablecore=5G, but dmesg looks strange. What
happened?

> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
> [    0.000000] Movable zone start for each node
> [    0.000000]   Node 0: 0xb7000000
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
> [    0.000000] On node 0 totalpages: 748095
> [    0.000000]   DMA zone: 32 pages used for memmap
> [    0.000000]   DMA zone: 0 pages reserved
> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
> [    0.000000]   Normal zone: 1736 pages used for memmap
> [    0.000000]   Normal zone: 219958 pages, LIFO batch:31
> [    0.000000]   HighMem zone: 4083 pages used for memmap
> [    0.000000]   HighMem zone: 517569 pages, LIFO batch:31
> [    0.000000]   Movable zone: 768 pages, LIFO batch:0

> 
> Thanks,
> Jianguo Wu
> 
> > 
> >> I think Jiang Liu's patch works fine for highmem, please refer to:
> >> http://marc.info/?l=linux-mm&m=135476085816087&w=2
> >>
> >> Thanks,
> >> Jianguo Wu
> >>
> >>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
> >>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> >>> +#endif
> >>> +
> >>> +		if (start_pfn >= end_pfn)
> >>> +			continue;
> >>> +
> >>> +		while (map_pos < movablecore_map.nr_map) {
> >>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
> >>> +				break;
> >>> +
> >>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
> >>> +				map_pos++;
> >>> +				continue;
> >>> +			}
> >>> +
> >>> +			/*
> >>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
> >>> +			 * pfn specified by movablecore_map, or 0, which means
> >>> +			 * the node has no ZONE_MOVABLE.
> >>> +			 */
> >>> +			zone_movable_limit[nid] = max(start_pfn,
> >>> +					movablecore_map.map[map_pos].start_pfn);
> >>> +
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +}
> >>> +
> >>>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
> >>>  					unsigned long zone_type,
> >>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>>  	return zholes_size[zone_type];
> >>>  }
> >>>  
> >>> +static void __meminit sanitize_zone_movable_limit(void)
> >>> +{
> >>> +}
> >>> +
> >>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>>  
> >>>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> >>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >>>  
> >>>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
> >>>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
> >>> +	sanitize_zone_movable_limit();
> >>>  	find_zone_movable_pfns_for_nodes();
> >>>  
> >>>  	/* Print out the zone ranges */
> >>
> >>
> >>
> > 
> 
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11 11:28       ` Simon Jeons
@ 2012-12-12  0:49         ` Jiang Liu
  2012-12-12  9:09           ` Tang Chen
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang Liu @ 2012-12-12  0:49 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Tang Chen, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 2012-12-11 19:28, Simon Jeons wrote:
> On Tue, 2012-12-11 at 11:32 +0800, Tang Chen wrote:
>> On 12/11/2012 11:07 AM, Jianguo Wu wrote:
>>> On 2012/12/11 10:33, Tang Chen wrote:
>>>
>>>> This patch introduces a new array zone_movable_limit[] to store the
>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
>>>> The function sanitize_zone_movable_limit() will find out to which
>>>> node each range in movablecore_map.map[] belongs, and will calculate
>>>> the low boundary of ZONE_MOVABLE for each node.
> 
> What's the difference between zone_movable_limit[nid] and
> zone_movable_pfn[nid]?
zone_movable_limit[] is temporary storage for zone_movable_pfn[].
It's used to handle the special case where the user specifies both
movablecore_map and movablecore/kernelcore on the kernel command line.
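
To illustrate the interaction (this is only a userspace sketch with
made-up PFN values, not the patch's code): kernelcore is only allowed
to claim pages below zone_movable_limit[nid], and the surviving value
ends up in zone_movable_pfn[nid].

#include <stdio.h>

int main(void)
{
	/* lowest ZONE_MOVABLE pfn requested via movablecore_map (invented) */
	unsigned long movable_limit = 0x200000;
	/* node range being scanned while satisfying kernelcore (invented) */
	unsigned long start_pfn = 0x100000, end_pfn = 0x400000;

	/* kernelcore may only be satisfied below the movable limit */
	if (movable_limit && end_pfn > movable_limit)
		end_pfn = movable_limit;

	printf("kernelcore window: [0x%lx, 0x%lx), ZONE_MOVABLE from 0x%lx\n",
	       start_pfn, end_pfn, movable_limit);
	return 0;
}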

> 
>>>>
>>>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
>>>> ---
>>>>   mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   1 files changed, 77 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 1c91d16..4853619 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>>>   static unsigned long __initdata required_kernelcore;
>>>>   static unsigned long __initdata required_movablecore;
>>>>   static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>>>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>>>>
>>>>   /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>>>   int movable_zone;
>>>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>   	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>>>   }
>>>>
>>>> +/**
>>>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
>>>> + *
>>>> + * zone_movable_limit is initialized as 0. This function will try to get
>>>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
>>>> + * assign them to zone_movable_limit.
>>>> + * zone_movable_limit[nid] == 0 means no limit for the node.
>>>> + *
>>>> + * Note: Each range is represented as [start_pfn, end_pfn)
>>>> + */
>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>> +{
>>>> +	int map_pos = 0, i, nid;
>>>> +	unsigned long start_pfn, end_pfn;
>>>> +
>>>> +	if (!movablecore_map.nr_map)
>>>> +		return;
>>>> +
>>>> +	/* Iterate all ranges from minimum to maximum */
>>>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>>>> +		/*
>>>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
>>>> +		 * specified by user, just go on to check next range.
>>>> +		 */
>>>> +		if (zone_movable_limit[nid])
>>>> +			continue;
>>>> +
>>>> +#ifdef CONFIG_ZONE_DMA
>>>> +		/* Skip DMA memory. */
>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
>>>> +#endif
>>>> +
>>>> +#ifdef CONFIG_ZONE_DMA32
>>>> +		/* Skip DMA32 memory. */
>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
>>>> +#endif
>>>> +
>>>> +#ifdef CONFIG_HIGHMEM
>>>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
>>>> +		if (zone_movable_is_highmem() &&
>>>
>>> Hi Tang,
>>>
>>> I think zone_movable_is_highmem() does not work correctly here.
>>> 	sanitize_zone_movable_limit
>>> 		zone_movable_is_highmem      <--using movable_zone here
>>> 	find_zone_movable_pfns_for_nodes
>>> 		find_usable_zone_for_movable <--movable_zone is specified here
>>>
>>> I think Jiang Liu's patch works fine for highmem, please refer to:
>>> http://marc.info/?l=linux-mm&m=135476085816087&w=2
>>
>> Hi Wu,
>>
>> Yes, I forgot the movable_zone thing. Thanks for reminding me. :)
>>
>> But I didn't use Liu's patch you just mentioned, because I don't
>> think we should skip kernelcore when movablecore_map is specified.
>> If these 2 options don't conflict, we should satisfy them both. :)
>>
>> Of course, I also think Liu's suggestion is wonderful. But I think we
>> need more discussion on it. :)
>>
>> I'll fix it soon.
>> Thanks. :)
>>
>>>
>>> Thanks,
>>> Jianguo Wu
>>>
>>>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
>>>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
>>>> +#endif
>>>> +
>>>> +		if (start_pfn >= end_pfn)
>>>> +			continue;
>>>> +
>>>> +		while (map_pos < movablecore_map.nr_map) {
>>>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
>>>> +				break;
>>>> +
>>>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
>>>> +				map_pos++;
>>>> +				continue;
>>>> +			}
>>>> +
>>>> +			/*
>>>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
>>>> +			 * pfn specified by movablecore_map, or 0, which means
>>>> +			 * the node has no ZONE_MOVABLE.
>>>> +			 */
>>>> +			zone_movable_limit[nid] = max(start_pfn,
>>>> +					movablecore_map.map[map_pos].start_pfn);
>>>> +
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +}
>>>> +
>>>>   #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>   static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>>>>   					unsigned long zone_type,
>>>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>   	return zholes_size[zone_type];
>>>>   }
>>>>
>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>> +{
>>>> +}
>>>> +
>>>>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>
>>>>   static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>>>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>>>>
>>>>   	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>>>>   	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
>>>> +	sanitize_zone_movable_limit();
>>>>   	find_zone_movable_pfns_for_nodes();
>>>>
>>>>   	/* Print out the zone ranges */
>>>
>>>
>>>
>>>
>>
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/5][RESEND] page_alloc: Make movablecore_map has higher priority
  2012-12-11  4:56   ` [PATCH v3 4/5][RESEND] " Tang Chen
@ 2012-12-12  1:33     ` Simon Jeons
  2012-12-12  9:34       ` Tang Chen
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-12  1:33 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Tue, 2012-12-11 at 12:56 +0800, Tang Chen wrote:
> If kernelcore or movablecore is specified at the same time
> with movablecore_map, movablecore_map will have higher
> priority to be satisfied.
> This patch will make find_zone_movable_pfns_for_nodes()
> calculate zone_movable_pfn[] with the limit from
> zone_movable_limit[].
> 
> change log:
> Move find_usable_zone_for_movable() to free_area_init_nodes()
> so that sanitize_zone_movable_limit() in patch 3 could use
> initialized movable_zone.
> 
> Reported-by: Wu Jianguo <wujianguo@huawei.com>
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
> ---
>  mm/page_alloc.c |   28 +++++++++++++++++++++++++---
>  1 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 52c368e..00fa67d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4839,9 +4839,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>  		required_kernelcore = max(required_kernelcore, corepages);
>  	}
>  
> -	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
> -	if (!required_kernelcore)
> +	/*
> +	 * If neither kernelcore/movablecore nor movablecore_map is specified,
> +	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
> +	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
> +	 */
> +	if (!required_kernelcore) {
> +		if (movablecore_map.nr_map)
> +			memcpy(zone_movable_pfn, zone_movable_limit,
> +				sizeof(zone_movable_pfn));
>  		goto out;
> +	}
>  
>  	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>  	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> @@ -4871,10 +4879,24 @@ restart:
>  		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>  			unsigned long size_pages;
>  
> +			/*
> +			 * Find more memory for kernelcore in
> +			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
> +			 */
>  			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
>  			if (start_pfn >= end_pfn)
>  				continue;
>  

Hi Chen,

> +			if (zone_movable_limit[nid]) {
> +				end_pfn = min(end_pfn, zone_movable_limit[nid]);
> +				/* No range left for kernelcore in this node */
> +				if (start_pfn >= end_pfn) {
> +					zone_movable_pfn[nid] =
> +							zone_movable_limit[nid];
> +					break;
> +				}
> +			}
> +

Could you explain this part of the code? It is hard to understand.

>  			/* Account for what is only usable for kernelcore */
>  			if (start_pfn < usable_startpfn) {
>  				unsigned long kernel_pages;
> @@ -4934,12 +4956,12 @@ restart:
>  	if (usable_nodes && required_kernelcore > usable_nodes)
>  		goto restart;
>  
> +out:
>  	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
>  	for (nid = 0; nid < MAX_NUMNODES; nid++)
>  		zone_movable_pfn[nid] =
>  			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
>  
> -out:
>  	/* restore the node_state */
>  	node_states[N_HIGH_MEMORY] = saved_node_state;
>  }



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11 13:20         ` Simon Jeons
@ 2012-12-12  1:57           ` Jianguo Wu
  2012-12-12  2:03             ` Simon Jeons
  2012-12-12  1:58           ` Lin Feng
  1 sibling, 1 reply; 28+ messages in thread
From: Jianguo Wu @ 2012-12-12  1:57 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Tang Chen, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 2012/12/11 21:20, Simon Jeons wrote:

> On Tue, 2012-12-11 at 20:41 +0800, Jianguo Wu wrote:
>> On 2012/12/11 20:24, Simon Jeons wrote:
>>
>>> On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
>>>> On 2012/12/11 10:33, Tang Chen wrote:
>>>>
>>>>> This patch introduces a new array zone_movable_limit[] to store the
>>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
>>>>> The function sanitize_zone_movable_limit() will find out to which
>>>>> node each range in movablecore_map.map[] belongs, and will calculate
>>>>> the low boundary of ZONE_MOVABLE for each node.
>>>>>
>>>>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
>>>>> ---
>>>>>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 files changed, 77 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 1c91d16..4853619 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>>>>  static unsigned long __initdata required_kernelcore;
>>>>>  static unsigned long __initdata required_movablecore;
>>>>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>>>>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>>>>>  
>>>>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>>>>  int movable_zone;
>>>>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>>>>  }
>>>>>  
>>>>> +/**
>>>>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
>>>>> + *
>>>>> + * zone_movable_limit is initialized as 0. This function will try to get
>>>>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
>>>>> + * assign them to zone_movable_limit.
>>>>> + * zone_movable_limit[nid] == 0 means no limit for the node.
>>>>> + *
>>>>> + * Note: Each range is represented as [start_pfn, end_pfn)
>>>>> + */
>>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>>> +{
>>>>> +	int map_pos = 0, i, nid;
>>>>> +	unsigned long start_pfn, end_pfn;
>>>>> +
>>>>> +	if (!movablecore_map.nr_map)
>>>>> +		return;
>>>>> +
>>>>> +	/* Iterate all ranges from minimum to maximum */
>>>>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>>>>> +		/*
>>>>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
>>>>> +		 * specified by user, just go on to check next range.
>>>>> +		 */
>>>>> +		if (zone_movable_limit[nid])
>>>>> +			continue;
>>>>> +
>>>>> +#ifdef CONFIG_ZONE_DMA
>>>>> +		/* Skip DMA memory. */
>>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
>>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
>>>>> +#endif
>>>>> +
>>>>> +#ifdef CONFIG_ZONE_DMA32
>>>>> +		/* Skip DMA32 memory. */
>>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
>>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
>>>>> +#endif
>>>>> +
>>>>> +#ifdef CONFIG_HIGHMEM
>>>>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
>>>>> +		if (zone_movable_is_highmem() &&
>>>>
>>>> Hi Tang,
>>>>
>>>> I think zone_movable_is_highmem() does not work correctly here.
>>>> 	sanitize_zone_movable_limit
>>>> 		zone_movable_is_highmem      <--using movable_zone here
>>>> 	find_zone_movable_pfns_for_nodes
>>>> 		find_usable_zone_for_movable <--movable_zone is specified here
>>>>
>>>
>>> Hi Jianguo and Chen,
>>>
>>> - What's the meaning of zone_movable_is_highmem()? Does it mean all zone
>>> highmem pages are zone movable pages, or ....
>>
>> Hi Simon,
>>
>> zone_movable_is_highmem() indicates whether the pages in ZONE_MOVABLE are
>> taken from highmem.
>>
>>> - dmesg 
>>>
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>>>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
>>>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000]   Node 0: 0x97800000
>>>
>>> Why does the start of zone movable fall in the range of zone highmem, if
>>> all the pages of zone movable are from zone highmem? If the answer is
>>> yes, are zone movable and zone highmem of equal status or not?
>>
>> The pages of zone_movable can be taken from zone_highmem or zone_normal:
>> if we have highmem, then zone_movable will be taken from zone_highmem;
>> otherwise zone_movable will be taken from zone_normal.
>>
>> you can refer to find_usable_zone_for_movable().
> 
> Hi Jianguo,
> 
> I have 8G memory and movablecore=5G, but dmesg looks strange. What
> happened?
> 

Hi Simon,

I think you used a 32-bit kernel and didn't enable CONFIG_X86_PAE, right?
If so, it can only address memory below 4G.
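
(A guess at the relevant .config bits for such a setup; the option names
below are the standard x86 ones, but please check your own config:)

CONFIG_X86_32=y
# CONFIG_X86_PAE is not set
CONFIG_HIGHMEM4G=y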

Thanks,
Jianguo Wu

>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000]   Node 0: 0xb7000000
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
>> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
>> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
>> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
>> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
>> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
>> [    0.000000] On node 0 totalpages: 748095
>> [    0.000000]   DMA zone: 32 pages used for memmap
>> [    0.000000]   DMA zone: 0 pages reserved
>> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
>> [    0.000000]   Normal zone: 1736 pages used for memmap
>> [    0.000000]   Normal zone: 219958 pages, LIFO batch:31
>> [    0.000000]   HighMem zone: 4083 pages used for memmap
>> [    0.000000]   HighMem zone: 517569 pages, LIFO batch:31
>> [    0.000000]   Movable zone: 768 pages, LIFO batch:0
> 
>>
>> Thanks,
>> Jianguo Wu
>>
>>>
>>>> I think Jiang Liu's patch works fine for highmem, please refer to:
>>>> http://marc.info/?l=linux-mm&m=135476085816087&w=2
>>>>
>>>> Thanks,
>>>> Jianguo Wu
>>>>
>>>>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
>>>>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
>>>>> +#endif
>>>>> +
>>>>> +		if (start_pfn >= end_pfn)
>>>>> +			continue;
>>>>> +
>>>>> +		while (map_pos < movablecore_map.nr_map) {
>>>>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
>>>>> +				break;
>>>>> +
>>>>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
>>>>> +				map_pos++;
>>>>> +				continue;
>>>>> +			}
>>>>> +
>>>>> +			/*
>>>>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
>>>>> +			 * pfn specified by movablecore_map, or 0, which means
>>>>> +			 * the node has no ZONE_MOVABLE.
>>>>> +			 */
>>>>> +			zone_movable_limit[nid] = max(start_pfn,
>>>>> +					movablecore_map.map[map_pos].start_pfn);
>>>>> +
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> +}
>>>>> +
>>>>>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>>>>>  					unsigned long zone_type,
>>>>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>>  	return zholes_size[zone_type];
>>>>>  }
>>>>>  
>>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>>> +{
>>>>> +}
>>>>> +
>>>>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>>  
>>>>>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>>>>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>>>>>  
>>>>>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>>>>>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
>>>>> +	sanitize_zone_movable_limit();
>>>>>  	find_zone_movable_pfns_for_nodes();
>>>>>  
>>>>>  	/* Print out the zone ranges */
>>>>
>>>>
>>>>
>>>
>>
>>
>>
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-11 13:20         ` Simon Jeons
  2012-12-12  1:57           ` Jianguo Wu
@ 2012-12-12  1:58           ` Lin Feng
  1 sibling, 0 replies; 28+ messages in thread
From: Lin Feng @ 2012-12-12  1:58 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Jianguo Wu, Tang Chen, jiang.liu, hpa, akpm, wency, laijs,
	yinghai, isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm, linux-doc



On 12/11/2012 09:20 PM, Simon Jeons wrote:
> On Tue, 2012-12-11 at 20:41 +0800, Jianguo Wu wrote:
>> On 2012/12/11 20:24, Simon Jeons wrote:
>>
>>> On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
>>>> On 2012/12/11 10:33, Tang Chen wrote:
>>>>
>>>>> This patch introduces a new array zone_movable_limit[] to store the
>>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
>>>>> The function sanitize_zone_movable_limit() will find out to which
>>>>> node each range in movablecore_map.map[] belongs, and will calculate
>>>>> the low boundary of ZONE_MOVABLE for each node.
>>>>>
>>>>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
>>>>> ---
>>>>>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 files changed, 77 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 1c91d16..4853619 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>>>>  static unsigned long __initdata required_kernelcore;
>>>>>  static unsigned long __initdata required_movablecore;
>>>>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>>>>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
>>>>>  
>>>>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>>>>  int movable_zone;
>>>>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>>>>  }
>>>>>  
>>>>> +/**
>>>>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
>>>>> + *
>>>>> + * zone_movable_limit is initialized as 0. This function will try to get
>>>>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
>>>>> + * assign them to zone_movable_limit.
>>>>> + * zone_movable_limit[nid] == 0 means no limit for the node.
>>>>> + *
>>>>> + * Note: Each range is represented as [start_pfn, end_pfn)
>>>>> + */
>>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>>> +{
>>>>> +	int map_pos = 0, i, nid;
>>>>> +	unsigned long start_pfn, end_pfn;
>>>>> +
>>>>> +	if (!movablecore_map.nr_map)
>>>>> +		return;
>>>>> +
>>>>> +	/* Iterate all ranges from minimum to maximum */
>>>>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>>>>> +		/*
>>>>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
>>>>> +		 * specified by user, just go on to check next range.
>>>>> +		 */
>>>>> +		if (zone_movable_limit[nid])
>>>>> +			continue;
>>>>> +
>>>>> +#ifdef CONFIG_ZONE_DMA
>>>>> +		/* Skip DMA memory. */
>>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
>>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
>>>>> +#endif
>>>>> +
>>>>> +#ifdef CONFIG_ZONE_DMA32
>>>>> +		/* Skip DMA32 memory. */
>>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
>>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
>>>>> +#endif
>>>>> +
>>>>> +#ifdef CONFIG_HIGHMEM
>>>>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
>>>>> +		if (zone_movable_is_highmem() &&
>>>>
>>>> Hi Tang,
>>>>
>>>> I think zone_movable_is_highmem() does not work correctly here.
>>>> 	sanitize_zone_movable_limit
>>>> 		zone_movable_is_highmem      <--using movable_zone here
>>>> 	find_zone_movable_pfns_for_nodes
>>>> 		find_usable_zone_for_movable <--movable_zone is specified here
>>>>
>>>
>>> Hi Jianguo and Chen,
>>>
>>> - What's the meaning of zone_movable_is_highmem()? Does it mean all zone
>>> highmem pages are zone movable pages, or ....
>>
>> Hi Simon,
>>
>> zone_movable_is_highmem() indicates whether the pages in ZONE_MOVABLE are
>> taken from highmem.
>>
>>> - dmesg 
>>>
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>>>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
>>>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000]   Node 0: 0x97800000
>>>
>>> Why does the start of zone movable fall in the range of zone highmem, if
>>> all the pages of zone movable are from zone highmem? If the answer is
>>> yes, are zone movable and zone highmem of equal status or not?
>>
>> The pages of zone_movable can be taken from zone_highmem or zone_normal:
>> if we have highmem, then zone_movable will be taken from zone_highmem;
>> otherwise zone_movable will be taken from zone_normal.
>>
>> you can refer to find_usable_zone_for_movable().
> 
> Hi Jianguo,
> 
> I have 8G memory and movablecore=5G, but dmesg looks strange. What
> happened?
Hi Simon,

Are there any other memory-related boot parameters besides 'movablecore=5G'?

thanks,
linfeng
> 
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000]   Node 0: 0xb7000000
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
>> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
>> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
>> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
>> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
>> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
>> [    0.000000] On node 0 totalpages: 748095
>> [    0.000000]   DMA zone: 32 pages used for memmap
>> [    0.000000]   DMA zone: 0 pages reserved
>> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
>> [    0.000000]   Normal zone: 1736 pages used for memmap
>> [    0.000000]   Normal zone: 219958 pages, LIFO batch:31
>> [    0.000000]   HighMem zone: 4083 pages used for memmap
>> [    0.000000]   HighMem zone: 517569 pages, LIFO batch:31
>> [    0.000000]   Movable zone: 768 pages, LIFO batch:0
> 
>>
>> Thanks,
>> Jianguo Wu
>>
>>>
>>>> I think Jiang Liu's patch works fine for highmem, please refer to:
>>>> http://marc.info/?l=linux-mm&m=135476085816087&w=2
>>>>
>>>> Thanks,
>>>> Jianguo Wu
>>>>
>>>>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
>>>>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
>>>>> +#endif
>>>>> +
>>>>> +		if (start_pfn >= end_pfn)
>>>>> +			continue;
>>>>> +
>>>>> +		while (map_pos < movablecore_map.nr_map) {
>>>>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
>>>>> +				break;
>>>>> +
>>>>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
>>>>> +				map_pos++;
>>>>> +				continue;
>>>>> +			}
>>>>> +
>>>>> +			/*
>>>>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
>>>>> +			 * pfn specified by movablecore_map, or 0, which means
>>>>> +			 * the node has no ZONE_MOVABLE.
>>>>> +			 */
>>>>> +			zone_movable_limit[nid] = max(start_pfn,
>>>>> +					movablecore_map.map[map_pos].start_pfn);
>>>>> +
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> +}
>>>>> +
>>>>>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>>>>>  					unsigned long zone_type,
>>>>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
>>>>>  	return zholes_size[zone_type];
>>>>>  }
>>>>>  
>>>>> +static void __meminit sanitize_zone_movable_limit(void)
>>>>> +{
>>>>> +}
>>>>> +
>>>>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>>>  
>>>>>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>>>>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>>>>>  
>>>>>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
>>>>>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
>>>>> +	sanitize_zone_movable_limit();
>>>>>  	find_zone_movable_pfns_for_nodes();
>>>>>  
>>>>>  	/* Print out the zone ranges */
>>>>
>>>>
>>>>
>>>
>>
>>
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-12  1:57           ` Jianguo Wu
@ 2012-12-12  2:03             ` Simon Jeons
  0 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-12  2:03 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Tang Chen, jiang.liu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Wed, 2012-12-12 at 09:57 +0800, Jianguo Wu wrote:
> On 2012/12/11 21:20, Simon Jeons wrote:
> 
> > On Tue, 2012-12-11 at 20:41 +0800, Jianguo Wu wrote:
> >> On 2012/12/11 20:24, Simon Jeons wrote:
> >>
> >>> On Tue, 2012-12-11 at 11:07 +0800, Jianguo Wu wrote:
> >>>> On 2012/12/11 10:33, Tang Chen wrote:
> >>>>
> >>>>> This patch introduces a new array zone_movable_limit[] to store the
> >>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
> >>>>> The function sanitize_zone_movable_limit() will find out to which
> >>>>> node each range in movablecore_map.map[] belongs, and will calculate
> >>>>> the low boundary of ZONE_MOVABLE for each node.
> >>>>>
> >>>>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> >>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> >>>>> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
> >>>>> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>>>> Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
> >>>>> ---
> >>>>>  mm/page_alloc.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>  1 files changed, 77 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index 1c91d16..4853619 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >>>>>  static unsigned long __initdata required_kernelcore;
> >>>>>  static unsigned long __initdata required_movablecore;
> >>>>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> >>>>> +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
> >>>>>  
> >>>>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >>>>>  int movable_zone;
> >>>>> @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>>>>  	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
> >>>>>  }
> >>>>>  
> >>>>> +/**
> >>>>> + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
> >>>>> + *
> >>>>> + * zone_movable_limit is initialized as 0. This function will try to get
> >>>>> + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
> >>>>> + * assign them to zone_movable_limit.
> >>>>> + * zone_movable_limit[nid] == 0 means no limit for the node.
> >>>>> + *
> >>>>> + * Note: Each range is represented as [start_pfn, end_pfn)
> >>>>> + */
> >>>>> +static void __meminit sanitize_zone_movable_limit(void)
> >>>>> +{
> >>>>> +	int map_pos = 0, i, nid;
> >>>>> +	unsigned long start_pfn, end_pfn;
> >>>>> +
> >>>>> +	if (!movablecore_map.nr_map)
> >>>>> +		return;
> >>>>> +
> >>>>> +	/* Iterate all ranges from minimum to maximum */
> >>>>> +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> >>>>> +		/*
> >>>>> +		 * If we have found lowest pfn of ZONE_MOVABLE of the node
> >>>>> +		 * specified by user, just go on to check next range.
> >>>>> +		 */
> >>>>> +		if (zone_movable_limit[nid])
> >>>>> +			continue;
> >>>>> +
> >>>>> +#ifdef CONFIG_ZONE_DMA
> >>>>> +		/* Skip DMA memory. */
> >>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
> >>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
> >>>>> +#endif
> >>>>> +
> >>>>> +#ifdef CONFIG_ZONE_DMA32
> >>>>> +		/* Skip DMA32 memory. */
> >>>>> +		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
> >>>>> +			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
> >>>>> +#endif
> >>>>> +
> >>>>> +#ifdef CONFIG_HIGHMEM
> >>>>> +		/* Skip lowmem if ZONE_MOVABLE is highmem. */
> >>>>> +		if (zone_movable_is_highmem() &&
> >>>>
> >>>> Hi Tang,
> >>>>
> >>>> I think zone_movable_is_highmem() does not work correctly here.
> >>>> 	sanitize_zone_movable_limit
> >>>> 		zone_movable_is_highmem      <--using movable_zone here
> >>>> 	find_zone_movable_pfns_for_nodes
> >>>> 		find_usable_zone_for_movable <--movable_zone is specified here
> >>>>
> >>>
> >>> Hi Jianguo and Chen,
> >>>
> >>> - What's the meaning of zone_movable_is_highmem()? Does it mean all
> >>> ZONE_HIGHMEM pages are ZONE_MOVABLE pages, or ...
> >>
> >> Hi Simon,
> >>
> >> zone_movable_is_highmem() tells whether the pages in ZONE_MOVABLE are
> >> taken from highmem.
> >>
> >>> - dmesg 
> >>>
> >>>> [    0.000000] Zone ranges:
> >>>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> >>>> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
> >>>> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
> >>>> [    0.000000] Movable zone start for each node
> >>>> [    0.000000]   Node 0: 0x97800000
> >>>
> >>> Why is the start of ZONE_MOVABLE within the range of ZONE_HIGHMEM, if
> >>> all the pages of ZONE_MOVABLE come from ZONE_HIGHMEM? And if the
> >>> answer is yes, are ZONE_MOVABLE and ZONE_HIGHMEM of equal status?
> >>
> >> The pages of ZONE_MOVABLE are taken from either ZONE_HIGHMEM or
> >> ZONE_NORMAL: if we have highmem, ZONE_MOVABLE is taken from
> >> ZONE_HIGHMEM, otherwise it is taken from ZONE_NORMAL.
> >>
> >> you can refer to find_usable_zone_for_movable().
> > 
> > Hi Jianguo,
> > 
> > I have 8G memory and movablecore=5G, but the dmesg looks strange. What
> > happened?
> > 
> 
> Hi Simon,
> 
> I think you used a 32-bit kernel without CONFIG_X86_PAE enabled, right?
> So it can only address memory below 4G.

Thanks for your response. I enabled PAE on an x86 32-bit kernel, with 8G
memory and movablecore=6.5G:

[    0.000000] 8304MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
[    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
[    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
[    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
[    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
[    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
[    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
[    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
[    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
[    0.000000] On node 0 totalpages: 2051391
[    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3949 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220466 pages, LIFO batch:31
[    0.000000]   HighMem zone: 16609 pages used for memmap
[    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31

Why did ZONE_MOVABLE disappear?

> 
> Thanks,
> Jianguo Wu
> 
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> >> [    0.000000]   Normal   [mem 0x01000000-0x373fdfff]
> >> [    0.000000]   HighMem  [mem 0x373fe000-0xb6cfffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000]   Node 0: 0xb7000000
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
> >> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
> >> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
> >> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
> >> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
> >> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
> >> [    0.000000] On node 0 totalpages: 748095
> >> [    0.000000]   DMA zone: 32 pages used for memmap
> >> [    0.000000]   DMA zone: 0 pages reserved
> >> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
> >> [    0.000000]   Normal zone: 1736 pages used for memmap
> >> [    0.000000]   Normal zone: 219958 pages, LIFO batch:31
> >> [    0.000000]   HighMem zone: 4083 pages used for memmap
> >> [    0.000000]   HighMem zone: 517569 pages, LIFO batch:31
> >> [    0.000000]   Movable zone: 768 pages, LIFO batch:0
> > 
> >>
> >> Thanks,
> >> Jianguo Wu
> >>
> >>>
> >>>> I think Jiang Liu's patch works fine for highmem, please refer to:
> >>>> http://marc.info/?l=linux-mm&m=135476085816087&w=2
> >>>>
> >>>> Thanks,
> >>>> Jianguo Wu
> >>>>
> >>>>> +		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
> >>>>> +			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> >>>>> +#endif
> >>>>> +
> >>>>> +		if (start_pfn >= end_pfn)
> >>>>> +			continue;
> >>>>> +
> >>>>> +		while (map_pos < movablecore_map.nr_map) {
> >>>>> +			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
> >>>>> +				break;
> >>>>> +
> >>>>> +			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
> >>>>> +				map_pos++;
> >>>>> +				continue;
> >>>>> +			}
> >>>>> +
> >>>>> +			/*
> >>>>> +			 * The start_pfn of ZONE_MOVABLE is either the minimum
> >>>>> +			 * pfn specified by movablecore_map, or 0, which means
> >>>>> +			 * the node has no ZONE_MOVABLE.
> >>>>> +			 */
> >>>>> +			zone_movable_limit[nid] = max(start_pfn,
> >>>>> +					movablecore_map.map[map_pos].start_pfn);
> >>>>> +
> >>>>> +			break;
> >>>>> +		}
> >>>>> +	}
> >>>>> +}
> >>>>> +
> >>>>>  #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>>>>  static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
> >>>>>  					unsigned long zone_type,
> >>>>> @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
> >>>>>  	return zholes_size[zone_type];
> >>>>>  }
> >>>>>  
> >>>>> +static void __meminit sanitize_zone_movable_limit(void)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> >>>>>  
> >>>>>  static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> >>>>> @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >>>>>  
> >>>>>  	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
> >>>>>  	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
> >>>>> +	sanitize_zone_movable_limit();
> >>>>>  	find_zone_movable_pfns_for_nodes();
> >>>>>  
> >>>>>  	/* Print out the zone ranges */
^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-12  0:49         ` Jiang Liu
@ 2012-12-12  9:09           ` Tang Chen
  2012-12-12  9:29             ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-12  9:09 UTC (permalink / raw)
  To: Jiang Liu, Simon Jeons
  Cc: Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 12/12/2012 08:49 AM, Jiang Liu wrote:
>>>>> This patch introduces a new array zone_movable_limit[] to store the
>>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
>>>>> The function sanitize_zone_movable_limit() will find out to which
>>>>> node the ranges in movablecore_map.map[] belong, and calculate the
>>>>> low boundary of ZONE_MOVABLE for each node.
>>
>> What's the difference between zone_movable_limit[nid] and
>> zone_movable_pfn[nid]?
> zone_movable_limit[] is temporary storage for zone_movable_pfn[].
> It's used to handle a special case where the user specifies both
> movablecore_map and movablecore/kernelcore on the kernel command line.
>
Hi Simon, Liu,

Sorry for the late reply, and thanks for your discussion. :)

As Liu said, zone_movable_limit[] is a temporary array for calculation.

If the user specified the movablecore_map option, zone_movable_limit[]
holds the lowest pfn of ZONE_MOVABLE as limited by that option. It is
constant and won't change.

Please refer to find_zone_movable_pfns_for_nodes() in patch 4, and you
will see that zone_movable_pfn[] changes each time the kernel area
increases.

So when the kernel area increases on node i, zone_movable_pfn[i] will
increase. And once zone_movable_pfn[i] > zone_movable_limit[i], we should
stop allocating kernel memory on node i. Here, I give movablecore_map
higher priority than kernelcore/movablecore.

I also tried to use zone_movable_pfn[] itself to store the limits, but
when calculating the kernel area I still had to keep the limits in
temporary variables, which made the code ugly. So I added a new array.

Thanks. :)
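
To make the interplay concrete, here is a minimal user-space sketch of
the idea (illustration only -- the helper name, node count and pfn
values are invented here; this is not the kernel code itself):

#include <stdio.h>

#define MAX_NUMNODES 2

/* grows while kernelcore pages are accounted to a node */
static unsigned long zone_movable_pfn[MAX_NUMNODES];
/* fixed lower bound of ZONE_MOVABLE from movablecore_map; 0 = no limit */
static unsigned long zone_movable_limit[MAX_NUMNODES];

/* Account 'pages' of kernelcore on node nid; movablecore_map wins. */
static void account_kernelcore(int nid, unsigned long pages)
{
        zone_movable_pfn[nid] += pages;
        if (zone_movable_limit[nid] &&
            zone_movable_pfn[nid] > zone_movable_limit[nid])
                zone_movable_pfn[nid] = zone_movable_limit[nid];
}

int main(void)
{
        zone_movable_limit[0] = 0x80000;        /* from movablecore_map */
        account_kernelcore(0, 0x60000);
        account_kernelcore(0, 0x60000);         /* would cross the limit */
        printf("node 0: ZONE_MOVABLE starts at pfn 0x%lx\n",
               zone_movable_pfn[0]);            /* clamped to 0x80000 */
        return 0;
}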

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-12  9:09           ` Tang Chen
@ 2012-12-12  9:29             ` Simon Jeons
  2012-12-12 10:32               ` Tang Chen
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-12  9:29 UTC (permalink / raw)
  To: Tang Chen
  Cc: Jiang Liu, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Wed, 2012-12-12 at 17:09 +0800, Tang Chen wrote:
> On 12/12/2012 08:49 AM, Jiang Liu wrote:
> >>>>> This patch introduces a new array zone_movable_limit[] to store the
> >>>>> ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
> >>>>> The function sanitize_zone_movable_limit() will find out to which
> >>>>> node the ranges in movablecore_map.map[] belong, and calculate the
> >>>>> low boundary of ZONE_MOVABLE for each node.
> >>
> >> What's the difference between zone_movable_limit[nid] and
> >> zone_movable_pfn[nid]?
> > zone_movable_limit[] is temporary storage for zone_movable_pfn[].
> > It's used to handle a special case where the user specifies both movablecore_map
> > and movablecore/kernelcore on the kernel command line.
> >
> Hi Simon, Liu,
> 
> Sorry for the late reply, and thanks for your discussion. :)
> 
> As Liu said, zone_movable_limit[] is a temporary array for calculation.
> 
> If the user specified the movablecore_map option, zone_movable_limit[]
> holds the lowest pfn of ZONE_MOVABLE as limited by that option. It is
> constant and won't change.
> 
> Please refer to find_zone_movable_pfns_for_nodes() in patch 4, and you
> will see that zone_movable_pfn[] changes each time the kernel area
> increases.
> 
> So when the kernel area increases on node i, zone_movable_pfn[i] will
> increase. And once zone_movable_pfn[i] > zone_movable_limit[i], we
> should stop allocating kernel memory on node i. Here, I give
> movablecore_map higher priority than kernelcore/movablecore.
> 
> I also tried to use zone_movable_pfn[] itself to store the limits, but
> when calculating the kernel area I still had to keep the limits in
> temporary variables, which made the code ugly. So I added a new array.
> 
> Thanks. :)

Thanks for your clarification.

I enabled PAE on an x86 32-bit kernel, with 8G memory and movablecore=6.5G:
> 
> [    0.000000] 8304MB HIGHMEM available.
> [    0.000000] 885MB LOWMEM available.
> [    0.000000]   mapped low ram: 0 - 375fe000
> [    0.000000]   low ram: 0 - 375fe000
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> [    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
> [    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
> [    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
> [    0.000000] On node 0 totalpages: 2051391
> [    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
> [    0.000000]   DMA zone: 32 pages used for memmap
> [    0.000000]   DMA zone: 0 pages reserved
> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
> [    0.000000]   Normal zone: 1740 pages used for memmap
> [    0.000000]   Normal zone: 220466 pages, LIFO batch:31
> [    0.000000]   HighMem zone: 16609 pages used for memmap
> [    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31

Why did ZONE_MOVABLE disappear?




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/5][RESEND] page_alloc: Make movablecore_map has higher priority
  2012-12-12  1:33     ` Simon Jeons
@ 2012-12-12  9:34       ` Tang Chen
  2012-12-13  1:56         ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-12  9:34 UTC (permalink / raw)
  To: Simon Jeons
  Cc: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

Hi Simon,

Thanks for reviewing. This logic is aimed at making movablecore_map
coexist with kernelcore/movablecore.

Please see below. :)

On 12/12/2012 09:33 AM, Simon Jeons wrote:
>> @@ -4839,9 +4839,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>>   		required_kernelcore = max(required_kernelcore, corepages);
>>   	}
>>
>> -	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
>> -	if (!required_kernelcore)
>> +	/*
>> +	 * If neither kernelcore/movablecore nor movablecore_map is specified,
>> +	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
>> +	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
>> +	 */
>> +	if (!required_kernelcore) {
>> +		if (movablecore_map.nr_map)
>> +			memcpy(zone_movable_pfn, zone_movable_limit,
>> +				sizeof(zone_movable_pfn));

If the user didn't specify the kernelcore option, then zone_movable_pfn[]
simply becomes a copy of zone_movable_limit[], and we skip the logic.

>>   		goto out;
>> +	}
>>
>>   	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>>   	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>> @@ -4871,10 +4879,24 @@ restart:
>>   		for_each_mem_pfn_range(i, nid,&start_pfn,&end_pfn, NULL) {
>>   			unsigned long size_pages;
>>
>> +			/*
>> +			 * Find more memory for kernelcore in
>> +			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
>> +			 */
>>   			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
>>   			if (start_pfn>= end_pfn)
>>   				continue;
>>
>
> Hi Chen,
>
>> +			if (zone_movable_limit[nid]) {

If the user didn't give any limit for ZONE_MOVABLE on node i, we can
skip the logic too.

>> +				end_pfn = min(end_pfn, zone_movable_limit[nid]);

In order to reuse the original kernelcore/movablecore logic, we keep
end_pfn <= zone_movable_limit[nid]. We divide [start_pfn, end_pfn) into
two parts:
[start_pfn, zone_movable_limit[nid])
and
[zone_movable_limit[nid], end_pfn).

We just remove the second part, and go on to the original logic.

>> +				/* No range left for kernelcore in this node */
>> +				if (start_pfn>= end_pfn) {

Since we re-evaluated end_pfn, we should stop once we have crossed the
limit.

>> +					zone_movable_pfn[nid] =
>> +							zone_movable_limit[nid];

Here, we have hit the real limit. That means the lowest pfn of
ZONE_MOVABLE is either zone_movable_limit[nid], or the value the original
logic calculates, which is below zone_movable_limit[nid].

>> +					break;

Then we break and go on to the next node.

>> +				}
>> +			}
>> +
>
> Could you explain this part of the code? It's hard to understand.
>
>>   			/* Account for what is only usable for kernelcore */
>>   			if (start_pfn<  usable_startpfn) {
>>   				unsigned long kernel_pages;
>> @@ -4934,12 +4956,12 @@ restart:
>>   	if (usable_nodes&&  required_kernelcore>  usable_nodes)
>>   		goto restart;
>>
>> +out:
>>   	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
>>   	for (nid = 0; nid<  MAX_NUMNODES; nid++)
>>   		zone_movable_pfn[nid] =
>>   			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
>>
>> -out:
>>   	/* restore the node_state */
>>   	node_states[N_HIGH_MEMORY] = saved_node_state;
>>   }
>
>
>
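
For readers following the thread, the clamping described above can be
exercised in isolation. Below is a minimal stand-alone sketch (the
function name and pfn values are invented for illustration; this is not
the patch itself) of how a candidate kernelcore range is cut off at
zone_movable_limit[nid]:

#include <stdio.h>

/*
 * Split [start_pfn, end_pfn) at 'limit' (0 means no limit) and return
 * how many pages may still be used for kernelcore. The part at and
 * above 'limit' is left for ZONE_MOVABLE.
 */
static unsigned long kernelcore_part(unsigned long start_pfn,
                                     unsigned long end_pfn,
                                     unsigned long limit)
{
        if (limit && end_pfn > limit)
                end_pfn = limit;                /* drop [limit, end_pfn) */
        return end_pfn > start_pfn ? end_pfn - start_pfn : 0;
}

int main(void)
{
        /* range [0x40000, 0xc0000) with the limit at 0x80000 */
        printf("pages usable for kernelcore: 0x%lx\n",
               kernelcore_part(0x40000, 0xc0000, 0x80000)); /* 0x40000 */
        return 0;
}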


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-12  9:29             ` Simon Jeons
@ 2012-12-12 10:32               ` Tang Chen
  2012-12-13  0:28                 ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-12 10:32 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Jiang Liu, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

Hi Simon,

On 12/12/2012 05:29 PM, Simon Jeons wrote:
>
> Thanks for your clarification.
>
> I enabled PAE on an x86 32-bit kernel, with 8G memory and movablecore=6.5G:

Could you please provide more info?

Such as the whole kernel command line. And did this happen after
you applied these patches? What is the output without these
patches?

Thanks. :)

>>
>> [    0.000000] 8304MB HIGHMEM available.
>> [    0.000000] 885MB LOWMEM available.
>> [    0.000000]   mapped low ram: 0 - 375fe000
>> [    0.000000]   low ram: 0 - 375fe000
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>> [    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
>> [    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
>> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
>> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
>> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
>> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
>> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
>> [    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
>> [    0.000000] On node 0 totalpages: 2051391
>> [    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
>> [    0.000000]   DMA zone: 32 pages used for memmap
>> [    0.000000]   DMA zone: 0 pages reserved
>> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
>> [    0.000000]   Normal zone: 1740 pages used for memmap
>> [    0.000000]   Normal zone: 220466 pages, LIFO batch:31
>> [    0.000000]   HighMem zone: 16609 pages used for memmap
>> [    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31
>
> Why did ZONE_MOVABLE disappear?
>
>
>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-12 10:32               ` Tang Chen
@ 2012-12-13  0:28                 ` Simon Jeons
  2012-12-13  1:48                   ` Tang Chen
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-13  0:28 UTC (permalink / raw)
  To: Tang Chen
  Cc: Jiang Liu, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Wed, 2012-12-12 at 18:32 +0800, Tang Chen wrote:
> Hi Simon,
> 
> On 12/12/2012 05:29 PM, Simon Jeons wrote:
> >
> > Thanks for your clarification.
> >
> > I enabled PAE on an x86 32-bit kernel, with 8G memory and movablecore=6.5G:
> 
> Could you please provide more info?
> 
> Such as the whole kernel command line. And did this happen after
> you applied these patches? What is the output without these
> patches?

This result is without the patches. I didn't add anything else to the
kernel command line, just movablecore=6.5G, but the output, as you can
see, is strange. So what happened?

> 
> Thanks. :)
> 
> >>
> >> [    0.000000] 8304MB HIGHMEM available.
> >> [    0.000000] 885MB LOWMEM available.
> >> [    0.000000]   mapped low ram: 0 - 375fe000
> >> [    0.000000]   low ram: 0 - 375fe000
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> >> [    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
> >> [    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
> >> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
> >> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
> >> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
> >> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
> >> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
> >> [    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
> >> [    0.000000] On node 0 totalpages: 2051391
> >> [    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
> >> [    0.000000]   DMA zone: 32 pages used for memmap
> >> [    0.000000]   DMA zone: 0 pages reserved
> >> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
> >> [    0.000000]   Normal zone: 1740 pages used for memmap
> >> [    0.000000]   Normal zone: 220466 pages, LIFO batch:31
> >> [    0.000000]   HighMem zone: 16609 pages used for memmap
> >> [    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31
> >
> > Why did ZONE_MOVABLE disappear?
> >
> >
> >
> >
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-13  0:28                 ` Simon Jeons
@ 2012-12-13  1:48                   ` Tang Chen
  2012-12-13  3:09                     ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Tang Chen @ 2012-12-13  1:48 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Jiang Liu, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On 12/13/2012 08:28 AM, Simon Jeons wrote:
> On Wed, 2012-12-12 at 18:32 +0800, Tang Chen wrote:
>> Hi Simon,
>>
>> On 12/12/2012 05:29 PM, Simon Jeons wrote:
>>>
>>> Thanks for your clarification.
>>>
>>> I enabled PAE on an x86 32-bit kernel, with 8G memory and movablecore=6.5G:
>>
>> Could you please provide more info?
>>
>> Such as the whole kernel command line. And did this happen after
>> you applied these patches? What is the output without these
>> patches?
>
> This result is without the patches. I didn't add anything else to the
> kernel command line, just movablecore=6.5G, but the output, as you can
> see, is strange. So what happened?

Hi Simon,

For now, I'm not quite sure what happened. Could you please provide the
output without the movablecore=6.5G option?

Judging from your output, your totalpages=2051391, which is about 8G
(2051391 pages * 4KB/page is roughly 7.8G). But the memory ranges mapped
for your node 0 obviously don't add up to that.

When we have high memory, ZONE_MOVABLE is taken from ZONE_HIGHMEM. And
the first line is also strange: 8304MB HIGHMEM plus 885MB LOWMEM is
9189MB, more than the 8G you have installed.

So I think we need more info to find out the problem. :)

Thanks. :)
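
As a side note for anyone following along: the zone that ZONE_MOVABLE is
carved out of is chosen by find_usable_zone_for_movable(), which roughly
picks the highest populated zone below ZONE_MOVABLE. A minimal
stand-alone sketch of that idea (the function name, zone values and page
counts below are made up for illustration, not taken from the kernel):

#include <stdio.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, NR_ZONES };

static unsigned long present_pages[NR_ZONES];

/* Pick the highest populated zone below ZONE_MOVABLE. */
static int pick_movable_source(void)
{
        int z;

        for (z = ZONE_MOVABLE - 1; z >= 0; z--)
                if (present_pages[z])
                        return z;
        return ZONE_NORMAL;     /* fallback; should not happen */
}

int main(void)
{
        present_pages[ZONE_NORMAL] = 220466;
        present_pages[ZONE_HIGHMEM] = 1808595;  /* highmem is populated */

        /* With highmem populated, ZONE_MOVABLE comes from ZONE_HIGHMEM. */
        printf("movable pages come from zone %d (ZONE_HIGHMEM = %d)\n",
               pick_movable_source(), ZONE_HIGHMEM);
        return 0;
}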

>
>>
>> Thanks. :)
>>
>>>>
>>>> [    0.000000] 8304MB HIGHMEM available.
>>>> [    0.000000] 885MB LOWMEM available.
>>>> [    0.000000]   mapped low ram: 0 - 375fe000
>>>> [    0.000000]   low ram: 0 - 375fe000
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
>>>> [    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
>>>> [    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000] Early memory node ranges
>>>> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
>>>> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
>>>> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
>>>> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
>>>> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
>>>> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
>>>> [    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
>>>> [    0.000000] On node 0 totalpages: 2051391
>>>> [    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
>>>> [    0.000000]   DMA zone: 32 pages used for memmap
>>>> [    0.000000]   DMA zone: 0 pages reserved
>>>> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
>>>> [    0.000000]   Normal zone: 1740 pages used for memmap
>>>> [    0.000000]   Normal zone: 220466 pages, LIFO batch:31
>>>> [    0.000000]   HighMem zone: 16609 pages used for memmap
>>>> [    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31
>>>
>>> Why did ZONE_MOVABLE disappear?
>>>
>>>
>>>
>>>
>>
>
>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/5][RESEND] page_alloc: Make movablecore_map has higher priority
  2012-12-12  9:34       ` Tang Chen
@ 2012-12-13  1:56         ` Simon Jeons
  0 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-13  1:56 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Wed, 2012-12-12 at 17:34 +0800, Tang Chen wrote:
> Hi Simon,
> 
> Thanks for reviewing. This logic is aimed at making movablecore_map
> coexist with kernelcore/movablecore.
> 
> Please see below. :)

Hi Chen,

Thanks for your detailed explanation. The logic looks reasonable to me.
But how do you guarantee the following points from the changelog of your
patchset?
1) If the range is involved in a single node, then from ss to the end of
the node will be ZONE_MOVABLE.
2) If the range covers two or more nodes, then from ss to the end of the
node will be ZONE_MOVABLE, and all the other nodes will only have
ZONE_MOVABLE.

> 
> On 12/12/2012 09:33 AM, Simon Jeons wrote:
> >> @@ -4839,9 +4839,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> >>   		required_kernelcore = max(required_kernelcore, corepages);
> >>   	}
> >>
> >> -	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
> >> -	if (!required_kernelcore)
> >> +	/*
> >> +	 * If neither kernelcore/movablecore nor movablecore_map is specified,
> >> +	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
> >> +	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
> >> +	 */
> >> +	if (!required_kernelcore) {
> >> +		if (movablecore_map.nr_map)
> >> +			memcpy(zone_movable_pfn, zone_movable_limit,
> >> +				sizeof(zone_movable_pfn));
> 
> If the user didn't specify the kernelcore option, then zone_movable_pfn[]
> simply becomes a copy of zone_movable_limit[], and we skip the logic.
> 
> >>   		goto out;
> >> +	}
> >>
> >>   	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
> >>   	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> >> @@ -4871,10 +4879,24 @@ restart:
> >>   		for_each_mem_pfn_range(i, nid,&start_pfn,&end_pfn, NULL) {
> >>   			unsigned long size_pages;
> >>
> >> +			/*
> >> +			 * Find more memory for kernelcore in
> >> +			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
> >> +			 */
> >>   			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
> >>   			if (start_pfn>= end_pfn)
> >>   				continue;
> >>
> >
> > Hi Chen,
> >
> >> +			if (zone_movable_limit[nid]) {
> 
> If the user didn't give any limit for ZONE_MOVABLE on node i, we can
> skip the logic too.
> 
> >> +				end_pfn = min(end_pfn, zone_movable_limit[nid]);
> 
> In order to reuse the original kernelcore/movablecore logic, we keep
> end_pfn <= zone_movable_limit[nid]. We divide [start_pfn, end_pfn) into
> two parts:
> [start_pfn, zone_movable_limit[nid])
> and
> [zone_movable_limit[nid], end_pfn).
> 
> We just remove the second part, and go on to the original logic.
> 
> >> +				/* No range left for kernelcore in this node */
> >> +				if (start_pfn>= end_pfn) {
> 
> Since we re-evaluated end_pfn, we should stop once we have crossed the
> limit.
> 
> >> +					zone_movable_pfn[nid] =
> >> +							zone_movable_limit[nid];
> 
> Here, we have hit the real limit. That means the lowest pfn of
> ZONE_MOVABLE is either zone_movable_limit[nid], or the value the original
> logic calculates, which is below zone_movable_limit[nid].
> 
> >> +					break;
> 
> Then we break and go on to the next node.
> 
> >> +				}
> >> +			}
> >> +
> >
> > Could you explain this part of the code? It's hard to understand.
> >
> >>   			/* Account for what is only usable for kernelcore */
> >>   			if (start_pfn<  usable_startpfn) {
> >>   				unsigned long kernel_pages;
> >> @@ -4934,12 +4956,12 @@ restart:
> >>   	if (usable_nodes&&  required_kernelcore>  usable_nodes)
> >>   		goto restart;
> >>
> >> +out:
> >>   	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
> >>   	for (nid = 0; nid<  MAX_NUMNODES; nid++)
> >>   		zone_movable_pfn[nid] =
> >>   			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
> >>
> >> -out:
> >>   	/* restore the node_state */
> >>   	node_states[N_HIGH_MEMORY] = saved_node_state;
> >>   }
> >
> >
> >
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2012-12-13  1:48                   ` Tang Chen
@ 2012-12-13  3:09                     ` Simon Jeons
  0 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-13  3:09 UTC (permalink / raw)
  To: Tang Chen
  Cc: Jiang Liu, Jianguo Wu, hpa, akpm, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, rusty, lliubbo, jaegeuk.hanse, tony.luck, glommer,
	linux-kernel, linux-mm, linux-doc

On Thu, 2012-12-13 at 09:48 +0800, Tang Chen wrote:
> On 12/13/2012 08:28 AM, Simon Jeons wrote:
> > On Wed, 2012-12-12 at 18:32 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> On 12/12/2012 05:29 PM, Simon Jeons wrote:
> >>>
> >>> Thanks for your clarification.
> >>>
> >>> I enabled PAE on an x86 32-bit kernel, with 8G memory and movablecore=6.5G:
> >>
> >> Could you please provide more info?
> >>
> >> Such as the whole kernel command line. And did this happen after
> >> you applied these patches? What is the output without these
> >> patches?
> >
> > This result is without the patches. I didn't add anything else to the
> > kernel command line, just movablecore=6.5G, but the output, as you can
> > see, is strange. So what happened?
> 
> Hi Simon,
> 
> For now, I'm not quite sure what happened. Could you please provide the
> output without the movablecore=6.5G option?
> 
> Judging from your output, your totalpages=2051391, which is about 8G. But
> the memory ranges mapped for your node 0 obviously don't add up to that.
> 
> When we have high memory, ZONE_MOVABLE is taken from ZONE_HIGHMEM. So the
> first line, 8304MB HIGHMEM available, is also strange.
> 
> So I think we need more info to find out the problem. :)
> 

[    0.000000] 8304MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
[    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
[    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
[    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
[    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
[    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
[    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
[    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
[    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
[    0.000000] On node 0 totalpages: 2051391
[    0.000000] free_area_init_node: node 0, pgdat c0c2cc00, node_mem_map f19c2200
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3949 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220466 pages, LIFO batch:31
[    0.000000]   HighMem zone: 16609 pages used for memmap
[    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31


menuentry 'Fedora (3.7.0+)' --class fedora --class gnu-linux --class gnu
--class os $menuentry_id_option
'gnulinux-simple-7ed9528d-006f-4d9e-93d9-f68b0967ca99' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1
--hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'
eba9dfce-d7f1-4b5c-9199-f2abf80e5dc6
        else
          search --no-floppy --fs-uuid --set=root
eba9dfce-d7f1-4b5c-9199-f2abf80e5dc6
        fi
        echo 'Loading Fedora (3.7.0+)'
        linux   /vmlinuz-3.7.0+ root=/dev/mapper/vg_kernel-lv_root ro
rd.md=0 rd.dm=0 rd.lvm.lv=vg_kernel/lv_root SYSFONT=True  KEYTABLE=us
rd.luks=0 LANG=en_US.UTF-8 rd.lvm.lv=vg_kernel/lv_swap rhgb quiet
        echo 'Loading initial ramdisk ...'
        initrd /initramfs-3.7.0+.img
}


> Thanks. :)
> 
> >
> >>
> >> Thanks. :)
> >>
> >>>>
> >>>> [    0.000000] 8304MB HIGHMEM available.
> >>>> [    0.000000] 885MB LOWMEM available.
> >>>> [    0.000000]   mapped low ram: 0 - 375fe000
> >>>> [    0.000000]   low ram: 0 - 375fe000
> >>>> [    0.000000] Zone ranges:
> >>>> [    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
> >>>> [    0.000000]   Normal   [mem 0x01000000-0x375fdfff]
> >>>> [    0.000000]   HighMem  [mem 0x375fe000-0x3e5fffff]
> >>>> [    0.000000] Movable zone start for each node
> >>>> [    0.000000] Early memory node ranges
> >>>> [    0.000000]   node   0: [mem 0x00010000-0x0009cfff]
> >>>> [    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
> >>>> [    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
> >>>> [    0.000000]   node   0: [mem 0x40200000-0xb69cbfff]
> >>>> [    0.000000]   node   0: [mem 0xb6a46000-0xb6a47fff]
> >>>> [    0.000000]   node   0: [mem 0xb6b1c000-0xb6cfffff]
> >>>> [    0.000000]   node   0: [mem 0x00000000-0x3e5fffff]
> >>>> [    0.000000] On node 0 totalpages: 2051391
> >>>> [    0.000000] free_area_init_node: node 0, pgdat c0c26a80, node_mem_map f19de200
> >>>> [    0.000000]   DMA zone: 32 pages used for memmap
> >>>> [    0.000000]   DMA zone: 0 pages reserved
> >>>> [    0.000000]   DMA zone: 3949 pages, LIFO batch:0
> >>>> [    0.000000]   Normal zone: 1740 pages used for memmap
> >>>> [    0.000000]   Normal zone: 220466 pages, LIFO batch:31
> >>>> [    0.000000]   HighMem zone: 16609 pages used for memmap
> >>>> [    0.000000]   HighMem zone: 1808595 pages, LIFO batch:31
> >>>
> >>> Why did ZONE_MOVABLE disappear?
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-12-13  3:09 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-11  2:33 [PATCH v3 0/5] Add movablecore_map boot option Tang Chen
2012-12-11  2:33 ` [PATCH v3 1/5] x86: get pg_data_t's memory from other node Tang Chen
2012-12-11  2:33 ` [PATCH v3 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
2012-12-11  2:33 ` [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
2012-12-11  3:07   ` Jianguo Wu
2012-12-11  3:32     ` Tang Chen
2012-12-11 11:28       ` Simon Jeons
2012-12-12  0:49         ` Jiang Liu
2012-12-12  9:09           ` Tang Chen
2012-12-12  9:29             ` Simon Jeons
2012-12-12 10:32               ` Tang Chen
2012-12-13  0:28                 ` Simon Jeons
2012-12-13  1:48                   ` Tang Chen
2012-12-13  3:09                     ` Simon Jeons
2012-12-11 12:24     ` Simon Jeons
2012-12-11 12:41       ` Jianguo Wu
2012-12-11 13:20         ` Simon Jeons
2012-12-12  1:57           ` Jianguo Wu
2012-12-12  2:03             ` Simon Jeons
2012-12-12  1:58           ` Lin Feng
2012-12-11  4:55   ` [PATCH v3 3/5][RESEND] " Tang Chen
2012-12-11  2:33 ` [PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
2012-12-11  4:56   ` [PATCH v3 4/5][RESEND] " Tang Chen
2012-12-12  1:33     ` Simon Jeons
2012-12-12  9:34       ` Tang Chen
2012-12-13  1:56         ` Simon Jeons
2012-12-11  2:33 ` [PATCH v3 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
2012-12-11 11:33 ` [PATCH v3 0/5] Add movablecore_map boot option Simon Jeons

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).