linux-kernel.vger.kernel.org archive mirror
* [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug
@ 2012-09-10  8:58 Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 01/26] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
                   ` (26 more replies)
  0 siblings, 27 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

	A) Introduction:

This patchset adds MOVABLE-dedicated nodes and the online_movable hotplug
action to memory management.

They are useful for anti-fragmentation (hugepages, high-order allocations...)
and for hot-removal of memory (virtualization, power saving, moving memory
between systems to make better use of it).

This patchset is based on 650470d1da17c20bf9700f9446775a01cbda52c3 of the newest tip tree.

	B) User Interface:

When users (administrators of big systems) need to configure some nodes/memory as MOVABLE:
	1 Use kernelcore_max_addr=XX at boot
	2 Use the online_movable hotplug action at runtime
We may introduce some more convenient interfaces, such as a
	movable_node=NODE_LIST boot option.
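
As a rough illustration of interface 2, the sketch below writes a hotplug
action to the sysfs state file of one memory block. It assumes that the new
action string is accepted by /sys/devices/system/memory/memoryN/state, the
file that already takes "online" and "offline". The helper names and the
block number used in the note below are hypothetical, not part of this
patchset.

```c
#include <stdio.h>

/*
 * Build the sysfs path of a memory block's state file.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int memory_block_state_path(unsigned int block, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/memory/memory%u/state", block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a hotplug action string (for example "online_movable") to the
 * given state file. Returns 0 on success, -1 on any failure.
 */
int write_memory_state(const char *path, const char *action)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;
	ret = (fputs(action, f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

A caller would build the path for, say, block 32 and then pass
"online_movable" as the action. This needs root and an offline block, and
the write fails if the kernel rejects the request.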

	C) Patches

Patch1-3      Fix problems in the current code (all related to hotplug)
Patch4        Clean up node_state_attr
Patch5        Introduce N_MEMORY
Patch6-18     Use N_MEMORY instead of N_HIGH_MEMORY.
              The patches are separated by subsystem;
              *these conversions were (and must be) checked carefully*.
              Patch18 also changes the node_states initialization
Patch19       Add config to allow MOVABLE-dedicated nodes
Patch20-24    Add kernelcore_max_addr
Patch25,26    Add online_movable and online_kernel


	D) Changes
Changes V4-V3:
	Rebase.
	online_movable/online_kernel can create a zone from empty
	or empty a zone.

Changes V3-V2:
	Proper nodemask management

Changes V2-V1:

The original V1 patchset of MOVABLE-dedicated node is here:
http://comments.gmane.org/gmane.linux.kernel.mm/78122

The new V2 adds N_MEMORY and the notion of a "MOVABLE-dedicated node",
and fixes some related problems.

The original V1 patchset of "add online_movable" is here:
https://lkml.org/lkml/2012/7/4/145

The new V2 discards the MIGRATE_HOTREMOVE approach and uses a more
straightforward implementation (only 1 patch).

Lai Jiangshan (22):
  page_alloc.c: don't subtract unrelated memmap from zone's present
    pages
  memory_hotplug: fix missing nodemask management
  slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  node: cleanup node_state_attr
  node_states: introduce N_MEMORY
  cpuset: use N_MEMORY instead N_HIGH_MEMORY
  procfs: use N_MEMORY instead N_HIGH_MEMORY
  memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  oom: use N_MEMORY instead N_HIGH_MEMORY
  mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
  mempolicy: use N_MEMORY instead N_HIGH_MEMORY
  hugetlb: use N_MEMORY instead N_HIGH_MEMORY
  vmstat: use N_MEMORY instead N_HIGH_MEMORY
  kthread: use N_MEMORY instead N_HIGH_MEMORY
  init: use N_MEMORY instead N_HIGH_MEMORY
  vmscan: use N_MEMORY instead N_HIGH_MEMORY
  page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
    initialization
  hotplug: update nodemasks management
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  page_alloc: add kernelcore_max_addr
  mm, memory-hotplug: add online_movable and online_kernel
  memory_hotplug: handle empty zone when online_movable/online_kernel

Yasuaki Ishimatsu (4):
  x86: get pg_data_t's memory from other node
  x86: use memblock_set_current_limit() to set memblock.current_limit
  memblock: limit memory address from memblock
  memblock: compare current_limit with end variable at
    memblock_find_in_range_node()

 Documentation/cgroups/cpusets.txt   |    2 +-
 Documentation/kernel-parameters.txt |    9 ++
 Documentation/memory-hotplug.txt    |   24 +++-
 arch/x86/kernel/setup.c             |    4 +-
 arch/x86/mm/init_64.c               |    4 +-
 arch/x86/mm/numa.c                  |    8 +-
 drivers/base/memory.c               |   19 ++-
 drivers/base/node.c                 |   28 +++--
 fs/proc/kcore.c                     |    2 +-
 fs/proc/task_mmu.c                  |    4 +-
 include/linux/cpuset.h              |    2 +-
 include/linux/memblock.h            |    1 +
 include/linux/memory.h              |    2 +
 include/linux/memory_hotplug.h      |   13 ++-
 include/linux/nodemask.h            |    5 +
 init/main.c                         |    2 +-
 kernel/cpuset.c                     |   32 ++--
 kernel/kthread.c                    |    2 +-
 mm/Kconfig                          |    8 +
 mm/hugetlb.c                        |   24 ++--
 mm/memblock.c                       |   10 +-
 mm/memcontrol.c                     |   18 ++--
 mm/memory_hotplug.c                 |  271 ++++++++++++++++++++++++++++++++---
 mm/mempolicy.c                      |   12 +-
 mm/migrate.c                        |    2 +-
 mm/oom_kill.c                       |    2 +-
 mm/page_alloc.c                     |   96 ++++++++-----
 mm/page_cgroup.c                    |    2 +-
 mm/slub.c                           |    4 +-
 mm/vmscan.c                         |    4 +-
 mm/vmstat.c                         |    4 +-
 31 files changed, 476 insertions(+), 144 deletions(-)



* [V4 PATCH 01/26] page_alloc.c: don't subtract unrelated memmap from zone's present pages
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management Lai Jiangshan
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

A)======
Currently, the memory page map (the struct page array) is not defined in struct zone.
It is defined in one of several ways:

FLATMEM: global memmap, can be allocated from any zone <= ZONE_NORMAL
CONFIG_DISCONTIGMEM: node-specific memmap, can be allocated from any
		     zone <= ZONE_NORMAL within that node.
CONFIG_SPARSEMEM: memory-section-specific memmap, can be allocated from any zone;
		  with CONFIG_SPARSEMEM_VMEMMAP it is not even physically contiguous.

So the memmap is not directly related to the zone, and its memory can be
allocated outside the zone. It is therefore wrong to subtract the memmap's size
from the zone's present pages.

B)======
When the system has large holes, the present-pages count after subtraction becomes
very small or negative, which makes memory management work badly in the zone or
makes the zone unusable even though the real number of present pages is large.

C)======
The subtracted present-pages count also causes a problem on memory hot-removal:
zone->present_pages may underflow and become huge (it is an unsigned long).

D)======
The memory page map is large, long-living, unreclaimable memory, so it is good to
subtract it for proper watermarks.
A new, proper approach is needed to do that, and it should also handle
other long-living unreclaimable memory.

The current approach of blindly subtracting from present pages is wrong; remove it.
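
The arithmetic behind B) and C) can be sketched in userspace as follows.
This is only a model: the 4 KiB page size and the 64-byte struct page are
assumptions chosen for illustration, and the MODEL_* macros and both helper
functions are hypothetical names rather than kernel symbols.

```c
#define MODEL_PAGE_SHIFT	12UL	/* assumed: 4 KiB pages */
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_ALIGN(x) \
	(((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))
#define MODEL_STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */

/*
 * Pages of memmap needed to describe a zone that spans `spanned` pages,
 * computed the same way as the code removed by this patch. Note that the
 * input is the spanned size, holes included.
 */
unsigned long memmap_pages(unsigned long spanned)
{
	return MODEL_PAGE_ALIGN(spanned * MODEL_STRUCT_PAGE_SIZE)
		>> MODEL_PAGE_SHIFT;
}

/*
 * Hot-removal subtracts the really offlined pages from an unsigned
 * counter. If the counter was already reduced by the memmap size, the
 * result wraps around instead of reaching zero.
 */
unsigned long present_after_offline(unsigned long present,
				    unsigned long offlined)
{
	return present - offlined;
}
```

Under these assumptions a zone spanning 2^20 pages needs 16384 memmap
pages, which already exceeds the real size of a zone that has only 10000
present pages because of holes. Likewise, a 32768-page range that was
recorded as 32768 - 512 present pages wraps to a huge value once all 32768
pages go offline.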

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/page_alloc.c |   20 +-------------------
 1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c66fb87..9e3c8b2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4401,30 +4401,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, realsize, memmap_pages;
+		unsigned long size, realsize;
 
 		size = zone_spanned_pages_in_node(nid, j, zones_size);
 		realsize = size - zone_absent_pages_in_node(nid, j,
 								zholes_size);
 
-		/*
-		 * Adjust realsize so that it accounts for how much memory
-		 * is used by this zone for memmap. This affects the watermark
-		 * and per-cpu initialisations
-		 */
-		memmap_pages =
-			PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
-		if (realsize >= memmap_pages) {
-			realsize -= memmap_pages;
-			if (memmap_pages)
-				printk(KERN_DEBUG
-				       "  %s zone: %lu pages used for memmap\n",
-				       zone_names[j], memmap_pages);
-		} else
-			printk(KERN_WARNING
-				"  %s zone: %lu pages exceeds realsize %lu\n",
-				zone_names[j], memmap_pages, realsize);
-
 		/* Account for reserved pages */
 		if (j == 0 && realsize > dma_reserve) {
 			realsize -= dma_reserve;
-- 
1.7.1



* [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 01/26] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-11  2:55   ` Wen Congyang
  2012-09-10  8:58 ` [V4 PATCH 03/26] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

Currently memory_hotplug only manages node_states[N_HIGH_MEMORY];
it forgets to manage node_states[N_NORMAL_MEMORY]. Fix it.

Add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
change during hotplug.

Also add @status_change_nid_normal to struct memory_notify, so that
the memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY]
changes.
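
The offline-side decision can be modelled in userspace roughly as below.
This is a simplified sketch of the logic in the diff that follows, for the
case where N_HIGH_MEMORY and N_NORMAL_MEMORY are distinct states. The
structs node_model and notify_model and the function check_offline() are
hypothetical stand-ins for pglist_data, memory_notify and
check_nodemasks_changes_offline().

```c
enum zone_type {
	ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, MAX_NR_ZONES
};

/* Hypothetical stand-in for the per-node present page counters. */
struct node_model {
	int nid;
	unsigned long present_pages[MAX_NR_ZONES];
};

/* Hypothetical stand-in for the two fields of struct memory_notify. */
struct notify_model {
	int status_change_nid_normal;
	int status_change_nid;
};

/*
 * Decide which node states will be cleared when nr_pages of zone
 * zt_off go offline. A field is set to the node id if the state will
 * change and to -1 otherwise.
 */
void check_offline(const struct node_model *node, enum zone_type zt_off,
		   unsigned long nr_pages, struct notify_model *arg)
{
	unsigned long present = 0;
	int zt;

	/* Pages backing N_NORMAL_MEMORY: zones up to ZONE_NORMAL. */
	for (zt = 0; zt <= ZONE_NORMAL; zt++)
		present += node->present_pages[zt];
	arg->status_change_nid_normal =
		(zt_off <= ZONE_NORMAL && nr_pages >= present) ?
		node->nid : -1;

	/* Pages backing N_HIGH_MEMORY: add the remaining zones. */
	for (; zt < MAX_NR_ZONES; zt++)
		present += node->present_pages[zt];
	arg->status_change_nid = (nr_pages >= present) ? node->nid : -1;
}
```

With a node that has 1024 pages in ZONE_NORMAL and 4096 in ZONE_MOVABLE,
offlining the 1024 normal pages reports a change only through
status_change_nid_normal, because the node still has memory afterwards.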

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |    5 ++-
 include/linux/memory.h           |    1 +
 mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
 struct memory_notify {
        unsigned long start_pfn;
        unsigned long nr_pages;
+       int status_change_nid_normal;
        int status_change_nid;
 }
 
 start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
 status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_changed_nid* >= 0, callback should create/discard structures for the
 node if necessary.
 
 --------------
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1ac7f6e..6b9202b 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
 struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
+	int status_change_nid_normal;
 	int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..8c3bcf6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -456,6 +456,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	return 0;
 }
 
+static void check_nodemasks_changes_online(unsigned long nr_pages,
+	struct zone *zone, struct memory_notify *arg)
+{
+	int nid = zone_to_nid(zone);
+	enum zone_type zone_last = ZONE_NORMAL;
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+		arg->status_change_nid_normal = nid;
+	else
+		arg->status_change_nid_normal = -1;
+
+	if (!node_state(nid, N_HIGH_MEMORY))
+		arg->status_change_nid = nid;
+	else
+		arg->status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+	if (arg->status_change_nid_normal >= 0)
+		node_set_state(node, N_NORMAL_MEMORY);
+
+	node_set_state(node, N_HIGH_MEMORY);
+}
+
 
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 {
@@ -467,13 +495,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	struct memory_notify arg;
 
 	lock_memory_hotplug();
+	/*
+	 * This doesn't need a lock to do pfn_to_page().
+	 * The section can't be removed here because of the
+	 * memory_block->state_mutex.
+	 */
+	zone = page_zone(pfn_to_page(pfn));
+
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
-	arg.status_change_nid = -1;
+	check_nodemasks_changes_online(nr_pages, zone, &arg);
 
 	nid = page_to_nid(pfn_to_page(pfn));
-	if (node_present_pages(nid) == 0)
-		arg.status_change_nid = nid;
 
 	ret = memory_notify(MEM_GOING_ONLINE, &arg);
 	ret = notifier_to_errno(ret);
@@ -483,12 +516,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 		return ret;
 	}
 	/*
-	 * This doesn't need a lock to do pfn_to_page().
-	 * The section can't be removed here because of the
-	 * memory_block->state_mutex.
-	 */
-	zone = page_zone(pfn_to_page(pfn));
-	/*
 	 * If this zone is not populated, then it is not in zonelist.
 	 * This means the page allocator ignores this zone.
 	 * So, zonelist must be updated after online.
@@ -513,7 +540,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	zone->present_pages += onlined_pages;
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 	if (onlined_pages) {
-		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
+		set_nodemasks(zone_to_nid(zone), &arg);
 		if (need_zonelists_rebuild)
 			build_all_zonelists(NULL, zone);
 		else
@@ -866,6 +893,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	return offlined;
 }
 
+static void check_nodemasks_changes_offline(unsigned long nr_pages,
+		struct zone *zone, struct memory_notify *arg)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long present_pages = 0;
+	enum zone_type zt, zone_last = ZONE_NORMAL;
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	for (zt = 0; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+		arg->status_change_nid_normal = zone_to_nid(zone);
+	else
+		arg->status_change_nid_normal = -1;
+
+	zone_last = ZONE_MOVABLE;
+	for (; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (nr_pages >= present_pages)
+		arg->status_change_nid = zone_to_nid(zone);
+	else
+		arg->status_change_nid = -1;
+}
+
+static void clear_nodemasks(int node, struct memory_notify *arg)
+{
+	if (arg->status_change_nid_normal >= 0)
+		node_clear_state(node, N_NORMAL_MEMORY);
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		return;
+
+	if (arg->status_change_nid >= 0)
+		node_clear_state(node, N_HIGH_MEMORY);
+}
+
 static int __ref offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn, unsigned long timeout)
 {
@@ -899,9 +964,7 @@ static int __ref offline_pages(unsigned long start_pfn,
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
-	arg.status_change_nid = -1;
-	if (nr_pages >= node_present_pages(node))
-		arg.status_change_nid = node;
+	check_nodemasks_changes_offline(nr_pages, zone, &arg);
 
 	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
 	ret = notifier_to_errno(ret);
@@ -969,10 +1032,9 @@ repeat:
 	if (!populated_zone(zone))
 		zone_pcp_reset(zone);
 
-	if (!node_present_pages(node)) {
-		node_clear_state(node, N_HIGH_MEMORY);
+	clear_nodemasks(node, &arg);
+	if (arg.status_change_nid >= 0)
 		kswapd_stop(node);
-	}
 
 	vm_total_pages = nr_free_pagecache_pages();
 	writeback_set_ratelimit();
-- 
1.7.1



* [V4 PATCH 03/26] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 01/26] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

SLUB only focuses on the nodes which have normal memory, so ignore the
hot-adding and hot-removing of other nodes.

So we only do something when marg->status_change_nid_normal >= 0.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/slub.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8f78e25..7a1d02c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3572,7 +3572,7 @@ static void slab_mem_offline_callback(void *arg)
 	struct memory_notify *marg = arg;
 	int offline_node;
 
-	offline_node = marg->status_change_nid;
+	offline_node = marg->status_change_nid_normal;
 
 	/*
 	 * If the node still has available memory. we need kmem_cache_node
@@ -3605,7 +3605,7 @@ static int slab_mem_going_online_callback(void *arg)
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
 	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid;
+	int nid = marg->status_change_nid_normal;
 	int ret = 0;
 
 	/*
-- 
1.7.1



* [V4 PATCH 04/26] node: cleanup node_state_attr
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (2 preceding siblings ...)
  2012-09-10  8:58 ` [V4 PATCH 03/26] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

Use [index] = init_value.
Use N_xxxxx instead of hardcoded indexes.

This makes it more readable and makes it easy to add a new state.
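
A minimal userspace sketch of the idiom, using C99 designated initializers:
each entry is placed at the index named by its state, so the order of the
lines in the initializer no longer matters. The enum, struct and macro
below are simplified, hypothetical stand-ins for the ones in
drivers/base/node.c.

```c
/* Simplified stand-in for enum node_states. */
enum state_model {
	N_POSSIBLE, N_ONLINE, N_NORMAL_MEMORY, N_CPU, NR_NODE_STATES
};

/* Simplified stand-in for struct node_attr. */
struct attr_model {
	const char *name;
	enum state_model state;
};

#define NODE_ATTR_MODEL(_name, _state)	{ #_name, _state }

/*
 * The entries are deliberately listed out of order: the [index]
 * designator, not the position in the list, decides where each lands.
 */
const struct attr_model node_state_attr_model[] = {
	[N_POSSIBLE]      = NODE_ATTR_MODEL(possible, N_POSSIBLE),
	[N_CPU]           = NODE_ATTR_MODEL(has_cpu, N_CPU),
	[N_ONLINE]        = NODE_ATTR_MODEL(online, N_ONLINE),
	[N_NORMAL_MEMORY] = NODE_ATTR_MODEL(has_normal_memory,
					    N_NORMAL_MEMORY),
};
```

Looking up node_state_attr_model[N_CPU] then yields the has_cpu entry
regardless of how many states are added before N_CPU in the enum.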

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 drivers/base/node.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..5d7731e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -614,23 +614,23 @@ static ssize_t show_node_state(struct device *dev,
 	{ __ATTR(name, 0444, show_node_state, NULL), state }
 
 static struct node_attr node_state_attr[] = {
-	_NODE_ATTR(possible, N_POSSIBLE),
-	_NODE_ATTR(online, N_ONLINE),
-	_NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
-	_NODE_ATTR(has_cpu, N_CPU),
+	[N_POSSIBLE] = _NODE_ATTR(possible, N_POSSIBLE),
+	[N_ONLINE] = _NODE_ATTR(online, N_ONLINE),
+	[N_NORMAL_MEMORY] = _NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
 #ifdef CONFIG_HIGHMEM
-	_NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
+	[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
 #endif
+	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 };
 
 static struct attribute *node_state_attrs[] = {
-	&node_state_attr[0].attr.attr,
-	&node_state_attr[1].attr.attr,
-	&node_state_attr[2].attr.attr,
-	&node_state_attr[3].attr.attr,
+	&node_state_attr[N_POSSIBLE].attr.attr,
+	&node_state_attr[N_ONLINE].attr.attr,
+	&node_state_attr[N_NORMAL_MEMORY].attr.attr,
 #ifdef CONFIG_HIGHMEM
-	&node_state_attr[4].attr.attr,
+	&node_state_attr[N_HIGH_MEMORY].attr.attr,
 #endif
+	&node_state_attr[N_CPU].attr.attr,
 	NULL
 };
 
-- 
1.7.1



* [V4 PATCH 05/26] node_states: introduce N_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (3 preceding siblings ...)
  2012-09-10  8:58 ` [V4 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

We have N_NORMAL_MEMORY to stand for the nodes that have normal memory with
zone_type <= ZONE_NORMAL.

And we have N_HIGH_MEMORY to stand for the nodes that have normal or high
memory.

But we don't have any word to stand for the nodes that have *any* memory.

And we have N_CPU but no N_MEMORY.

The current code reuses N_HIGH_MEMORY for this purpose because any node which
has memory must currently have high memory or normal memory.

A)	But this reuse is bad for *readability*, because the name
	N_HIGH_MEMORY just stands for high or normal:

A.example 1)
	mem_cgroup_nr_lru_pages():
		for_each_node_state(nid, N_HIGH_MEMORY)

	The reader will be confused (why does this function only count high or
	normal memory nodes? does it count ZONE_MOVABLE's lru pages?)
	until someone else tells them that N_HIGH_MEMORY is reused to stand for
	nodes that have any memory.

A.cont) If we introduce N_MEMORY, we can reduce this confusion
	AND make the code clearer:

A.example 2) mm/page_cgroup.c uses N_HIGH_MEMORY twice:

	One use is in page_cgroup_init(void):
		for_each_node_state(nid, N_HIGH_MEMORY) {

	It means that if the node has memory, we will allocate the page_cgroup map for
	the node. We should use N_MEMORY here instead to gain clarity.

	The second use is in alloc_page_cgroup():
		if (node_state(nid, N_HIGH_MEMORY))
			addr = vzalloc_node(size, nid);

	It means that the node has high or normal memory that can be allocated
	by the kernel. We should keep N_HIGH_MEMORY here, and it will be better
	if the "any memory" semantic of N_HIGH_MEMORY is removed.

B)	This reuse becomes outdated if we introduce MOVABLE-dedicated nodes.
	A MOVABLE-dedicated node should appear in neither
	node_states[N_HIGH_MEMORY] nor node_states[N_NORMAL_MEMORY],
	because a MOVABLE-dedicated node has no high or normal memory.

	On x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, so if a MOVABLE-dedicated node
	is in node_states[N_HIGH_MEMORY], it is also in
	node_states[N_NORMAL_MEMORY], which makes SLUB go wrong.

	SLUB uses
		for_each_node_state(nid, N_NORMAL_MEMORY)
	and creates a kmem_cache_node for the MOVABLE-dedicated node, which causes problems.

In a word, we need N_MEMORY. We just introduce it as an alias of
N_HIGH_MEMORY and fix all improper usages of N_HIGH_MEMORY in later patches.
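
What "alias" means at this stage can be sketched in userspace as follows.
The enum mirrors the layout for a kernel built without CONFIG_HIGHMEM. The
bitmask array and the two model_* helpers are hypothetical stand-ins for
node_states[], node_set_state() and node_state().

```c
/* Enum layout without CONFIG_HIGHMEM, including the new alias. */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
	N_MEMORY = N_HIGH_MEMORY,	/* alias: shares the same slot */
	N_CPU,
	NR_NODE_STATES
};

/* One bit per node id; stands in for the nodemask_t array. */
unsigned long node_states_bits[NR_NODE_STATES];

void model_node_set_state(int nid, enum node_states state)
{
	node_states_bits[state] |= 1UL << nid;
}

int model_node_state(int nid, enum node_states state)
{
	return (int)((node_states_bits[state] >> nid) & 1UL);
}
```

Because the alias does not consume a new enum value, N_CPU and
NR_NODE_STATES keep their values, and a node set in N_HIGH_MEMORY is
automatically visible through N_MEMORY. In this layout it is also visible
through N_NORMAL_MEMORY, which is the x86_64 situation described in B).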

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 include/linux/nodemask.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 7afc363..c6ebdc9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,6 +380,7 @@ enum node_states {
 #else
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
+	N_MEMORY = N_HIGH_MEMORY,
 	N_CPU,		/* The node has one or more cpus */
 	NR_NODE_STATES
 };
-- 
1.7.1



* [V4 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (4 preceding siblings ...)
  2012-09-10  8:58 ` [V4 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:58 ` [V4 PATCH 07/26] procfs: " Lai Jiangshan
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 Documentation/cgroups/cpusets.txt |    2 +-
 include/linux/cpuset.h            |    2 +-
 kernel/cpuset.c                   |   32 ++++++++++++++++----------------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..12e01d4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
 The cpus and mems files in the root (top_cpuset) cpuset are
 read-only.  The cpus file automatically tracks the value of
 cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
 nodes with memory--using the cpuset_track_online_nodes() hook.
 
 
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 838320f..8c8a60d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -144,7 +144,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 	return node_possible_map;
 }
 
-#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
+#define cpuset_current_mems_allowed (node_states[N_MEMORY])
 static inline void cpuset_init_current_mems_allowed(void) {}
 
 static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c715..2b133db 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -302,10 +302,10 @@ static void guarantee_online_cpus(const struct cpuset *cs,
  * are online, with memory.  If none are online with memory, walk
  * up the cpuset hierarchy until we find one that does have some
  * online mems.  If we get all the way to the top and still haven't
- * found any online mems, return node_states[N_HIGH_MEMORY].
+ * found any online mems, return node_states[N_MEMORY].
  *
  * One way or another, we guarantee to return some non-empty subset
- * of node_states[N_HIGH_MEMORY].
+ * of node_states[N_MEMORY].
  *
  * Call with callback_mutex held.
  */
@@ -313,14 +313,14 @@ static void guarantee_online_cpus(const struct cpuset *cs,
 static void guarantee_online_mems(const struct cpuset *cs, nodemask_t *pmask)
 {
 	while (cs && !nodes_intersects(cs->mems_allowed,
-					node_states[N_HIGH_MEMORY]))
+					node_states[N_MEMORY]))
 		cs = cs->parent;
 	if (cs)
 		nodes_and(*pmask, cs->mems_allowed,
-					node_states[N_HIGH_MEMORY]);
+					node_states[N_MEMORY]);
 	else
-		*pmask = node_states[N_HIGH_MEMORY];
-	BUG_ON(!nodes_intersects(*pmask, node_states[N_HIGH_MEMORY]));
+		*pmask = node_states[N_MEMORY];
+	BUG_ON(!nodes_intersects(*pmask, node_states[N_MEMORY]));
 }
 
 /*
@@ -1100,7 +1100,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 		return -ENOMEM;
 
 	/*
-	 * top_cpuset.mems_allowed tracks node_stats[N_HIGH_MEMORY];
+	 * top_cpuset.mems_allowed tracks node_stats[N_MEMORY];
 	 * it's read-only
 	 */
 	if (cs == &top_cpuset) {
@@ -1122,7 +1122,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 			goto done;
 
 		if (!nodes_subset(trialcs->mems_allowed,
-				node_states[N_HIGH_MEMORY])) {
+				node_states[N_MEMORY])) {
 			retval =  -EINVAL;
 			goto done;
 		}
@@ -2034,7 +2034,7 @@ static struct cpuset *cpuset_next(struct list_head *queue)
  * before dropping down to the next.  It always processes a node before
  * any of its children.
  *
- * In the case of memory hot-unplug, it will remove nodes from N_HIGH_MEMORY
+ * In the case of memory hot-unplug, it will remove nodes from N_MEMORY
  * if all present pages from a node are offlined.
  */
 static void
@@ -2073,7 +2073,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
 
 			/* Continue past cpusets with all mems online */
 			if (nodes_subset(cp->mems_allowed,
-					node_states[N_HIGH_MEMORY]))
+					node_states[N_MEMORY]))
 				continue;
 
 			oldmems = cp->mems_allowed;
@@ -2081,7 +2081,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
 			/* Remove offline mems from this cpuset. */
 			mutex_lock(&callback_mutex);
 			nodes_and(cp->mems_allowed, cp->mems_allowed,
-						node_states[N_HIGH_MEMORY]);
+						node_states[N_MEMORY]);
 			mutex_unlock(&callback_mutex);
 
 			/* Move tasks from the empty cpuset to a parent */
@@ -2134,8 +2134,8 @@ void cpuset_update_active_cpus(bool cpu_online)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
- * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
- * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
+ * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
+ * Call this routine anytime after node_states[N_MEMORY] changes.
  * See cpuset_update_active_cpus() for CPU hotplug handling.
  */
 static int cpuset_track_online_nodes(struct notifier_block *self,
@@ -2148,7 +2148,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
 	case MEM_ONLINE:
 		oldmems = top_cpuset.mems_allowed;
 		mutex_lock(&callback_mutex);
-		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+		top_cpuset.mems_allowed = node_states[N_MEMORY];
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
 		break;
@@ -2177,7 +2177,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
 void __init cpuset_init_smp(void)
 {
 	cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
-	top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+	top_cpuset.mems_allowed = node_states[N_MEMORY];
 
 	hotplug_memory_notifier(cpuset_track_online_nodes, 10);
 
@@ -2245,7 +2245,7 @@ void cpuset_init_current_mems_allowed(void)
  *
  * Description: Returns the nodemask_t mems_allowed of the cpuset
  * attached to the specified @tsk.  Guaranteed to return some non-empty
- * subset of node_states[N_HIGH_MEMORY], even if this means going outside the
+ * subset of node_states[N_MEMORY], even if this means going outside the
  * tasks cpuset.
  **/
 
-- 
1.7.1



* [V4 PATCH 07/26] procfs: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (5 preceding siblings ...)
  2012-09-10  8:58 ` [V4 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
@ 2012-09-10  8:58 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 08/26] memcontrol: " Lai Jiangshan
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:58 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
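Note: the difference between the two masks only matters once a node can
hold movable memory and nothing else. What follows is a minimal user-space
sketch of that case, not the kernel's nodemask code. Every model_* name is
hypothetical, and it assumes a configuration in which N_MEMORY is distinct
from N_HIGH_MEMORY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's nodemask implementation. */
enum model_node_state {
	MODEL_N_NORMAL_MEMORY,	/* node has regular memory */
	MODEL_N_HIGH_MEMORY,	/* node has regular or high memory */
	MODEL_N_MEMORY,		/* node has any memory, movable included */
	MODEL_NR_NODE_STATES
};

#define MODEL_MAX_NODES 8

/* One bit per node id for each state. */
static unsigned int model_node_states[MODEL_NR_NODE_STATES];

static void model_node_set_state(int nid, enum model_node_state state)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	model_node_states[state] |= 1u << nid;
}

static bool model_node_state(int nid, enum model_node_state state)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return (model_node_states[state] & (1u << nid)) != 0;
}

/* Count the nodes a for_each_node_state()-style walk would visit. */
static int model_count_nodes(enum model_node_state state)
{
	int nid, count = 0;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (model_node_state(nid, state))
			count++;
	return count;
}

/* Node 0 has normal memory; node 1 has only movable memory. */
static void model_setup(void)
{
	model_node_set_state(0, MODEL_N_NORMAL_MEMORY);
	model_node_set_state(0, MODEL_N_HIGH_MEMORY);
	model_node_set_state(0, MODEL_N_MEMORY);
	model_node_set_state(1, MODEL_N_MEMORY);
}
```

In this model a walk over MODEL_N_HIGH_MEMORY skips node 1, while a walk
over MODEL_N_MEMORY visits both nodes. That is the behaviour the loops
converted below rely on.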
 fs/proc/kcore.c    |    2 +-
 fs/proc/task_mmu.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 86c67ee..e96d4f1 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -249,7 +249,7 @@ static int kcore_update_ram(void)
 	/* Not inialized....update now */
 	/* find out "max pfn" */
 	end_pfn = 0;
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long node_end;
 		node_end  = NODE_DATA(nid)->node_start_pfn +
 			NODE_DATA(nid)->node_spanned_pages;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4540b8f..ed3d381 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1080,7 +1080,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 		return NULL;
 
 	nid = page_to_nid(page);
-	if (!node_isset(nid, node_states[N_HIGH_MEMORY]))
+	if (!node_isset(nid, node_states[N_MEMORY]))
 		return NULL;
 
 	return page;
@@ -1232,7 +1232,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	if (md->writeback)
 		seq_printf(m, " writeback=%lu", md->writeback);
 
-	for_each_node_state(n, N_HIGH_MEMORY)
+	for_each_node_state(n, N_MEMORY)
 		if (md->node[n])
 			seq_printf(m, " N%d=%lu", n, md->node[n]);
 out:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (6 preceding siblings ...)
  2012-09-10  8:58 ` [V4 PATCH 07/26] procfs: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 09/26] oom: " Lai Jiangshan
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memcontrol.c  |   18 +++++++++---------
 mm/page_cgroup.c |    2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 795e525..0d42f53 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -801,7 +801,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 	int nid;
 	u64 total = 0;
 
-	for_each_node_state(nid, N_HIGH_MEMORY)
+	for_each_node_state(nid, N_MEMORY)
 		total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
 	return total;
 }
@@ -1617,9 +1617,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
 		return;
 
 	/* make a nodemask where this memcg uses memory from */
-	memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+	memcg->scan_nodes = node_states[N_MEMORY];
 
-	for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+	for_each_node_mask(nid, node_states[N_MEMORY]) {
 
 		if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
 			node_clear(nid, memcg->scan_nodes);
@@ -1690,7 +1690,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
 	/*
 	 * Check rest of nodes.
 	 */
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		if (node_isset(nid, memcg->scan_nodes))
 			continue;
 		if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3765,7 +3765,7 @@ move_account:
 		drain_all_stock_sync(memcg);
 		ret = 0;
 		mem_cgroup_start_move(memcg);
-		for_each_node_state(node, N_HIGH_MEMORY) {
+		for_each_node_state(node, N_MEMORY) {
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
 				enum lru_list lru;
 				for_each_lru(lru) {
@@ -4093,7 +4093,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
 	seq_printf(m, "total=%lu", total_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
 	}
@@ -4101,7 +4101,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
 	seq_printf(m, "file=%lu", file_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				LRU_ALL_FILE);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4110,7 +4110,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
 	seq_printf(m, "anon=%lu", anon_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				LRU_ALL_ANON);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4119,7 +4119,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
 	seq_printf(m, "unevictable=%lu", unevictable_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				BIT(LRU_UNEVICTABLE));
 		seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..c1054ad 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
 	if (mem_cgroup_disabled())
 		return;
 
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
 
 		start_pfn = node_start_pfn(nid);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 09/26] oom: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (7 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 08/26] memcontrol: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 10/26] mm,migrate: " Lai Jiangshan
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
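Note: a rough sketch of why the mask choice can change the result of the
subset test in constrained_alloc(). This is a user-space model with
hypothetical model_* names and plain bitmasks standing in for nodemask_t.
It assumes a system where node 1 holds only movable memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int model_nodemask_t;	/* one bit per node id */

/* True if every node in @a is also in @b, like nodes_subset(a, b). */
static bool model_nodes_subset(model_nodemask_t a, model_nodemask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Same shape as the condition in the hunk below: the allocation counts
 * as mempolicy-constrained when a nodemask was passed and it does not
 * cover every node in @memory_nodes.
 */
static bool model_mempolicy_constrained(const model_nodemask_t *nodemask,
					model_nodemask_t memory_nodes)
{
	return nodemask != NULL &&
	       !model_nodes_subset(memory_nodes, *nodemask);
}
```

With a policy mask of node 0 only, the model reports "not constrained"
when the reference mask is { 0 } and "constrained" when it is { 0, 1 }.
The second answer is the expected one if node 1 really has memory that
the policy excludes.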
 mm/oom_kill.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1986008..5269e9d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
 	 * the page allocator means a mempolicy is in effect.  Cpuset policy
 	 * is enforced in get_page_from_freelist().
 	 */
-	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+	if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
 		*totalpages = total_swap_pages;
 		for_each_node_mask(nid, *nodemask)
 			*totalpages += node_spanned_pages(nid);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 10/26] mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (8 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 09/26] oom: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 11/26] mempolicy: " Lai Jiangshan
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/migrate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..d595e58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1201,7 +1201,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			if (node < 0 || node >= MAX_NUMNODES)
 				goto out_pm;
 
-			if (!node_state(node, N_HIGH_MEMORY))
+			if (!node_state(node, N_MEMORY))
 				goto out_pm;
 
 			err = -EACCES;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 11/26] mempolicy: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (9 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 10/26] mm,migrate: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 12/26] hugetlb: " Lai Jiangshan
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/mempolicy.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ada3be..54cf023 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
 	/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
 	if (pol == NULL)
 		return 0;
-	/* Check N_HIGH_MEMORY */
+	/* Check N_MEMORY */
 	nodes_and(nsc->mask1,
-		  cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+		  cpuset_current_mems_allowed, node_states[N_MEMORY]);
 
 	VM_BUG_ON(!nodes);
 	if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1363,7 +1363,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 		goto out_put;
 	}
 
-	if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+	if (!nodes_subset(*new, node_states[N_MEMORY])) {
 		err = -EINVAL;
 		goto out_put;
 	}
@@ -2320,7 +2320,7 @@ void __init numa_policy_init(void)
 	 * fall back to the largest node if they're all smaller.
 	 */
 	nodes_clear(interleave_nodes);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long total_pages = node_present_pages(nid);
 
 		/* Preserve the largest node */
@@ -2401,7 +2401,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
 		*nodelist++ = '\0';
 		if (nodelist_parse(nodelist, nodes))
 			goto out;
-		if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+		if (!nodes_subset(nodes, node_states[N_MEMORY]))
 			goto out;
 	} else
 		nodes_clear(nodes);
@@ -2435,7 +2435,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
 		 * Default to online nodes with memory if no nodelist
 		 */
 		if (!nodelist)
-			nodes = node_states[N_HIGH_MEMORY];
+			nodes = node_states[N_MEMORY];
 		break;
 	case MPOL_LOCAL:
 		/*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 12/26] hugetlb: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (10 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 11/26] mempolicy: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 13/26] vmstat: " Lai Jiangshan
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 drivers/base/node.c |    2 +-
 mm/hugetlb.c        |   24 ++++++++++++------------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5d7731e..4c3aa7c 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
 static inline bool hugetlb_register_node(struct node *node)
 {
 	if (__hugetlb_register_node &&
-			node_state(node->dev.id, N_HIGH_MEMORY)) {
+			node_state(node->dev.id, N_MEMORY)) {
 		__hugetlb_register_node(node);
 		return true;
 	}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc72712..a254dfb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1052,7 +1052,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	 * on-line nodes with memory and will handle the hstate accounting.
 	 */
 	while (nr_pages--) {
-		if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+		if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
 			break;
 	}
 }
@@ -1175,14 +1175,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 int __weak alloc_bootmem_huge_page(struct hstate *h)
 {
 	struct huge_bootmem_page *m;
-	int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+	int nr_nodes = nodes_weight(node_states[N_MEMORY]);
 
 	while (nr_nodes) {
 		void *addr;
 
 		addr = __alloc_bootmem_node_nopanic(
 				NODE_DATA(hstate_next_node_to_alloc(h,
-						&node_states[N_HIGH_MEMORY])),
+						&node_states[N_MEMORY])),
 				huge_page_size(h), huge_page_size(h), 0);
 
 		if (addr) {
@@ -1254,7 +1254,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 			if (!alloc_bootmem_huge_page(h))
 				break;
 		} else if (!alloc_fresh_huge_page(h,
-					 &node_states[N_HIGH_MEMORY]))
+					 &node_states[N_MEMORY]))
 			break;
 	}
 	h->max_huge_pages = i;
@@ -1522,7 +1522,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
 		if (!(obey_mempolicy &&
 				init_nodemask_of_mempolicy(nodes_allowed))) {
 			NODEMASK_FREE(nodes_allowed);
-			nodes_allowed = &node_states[N_HIGH_MEMORY];
+			nodes_allowed = &node_states[N_MEMORY];
 		}
 	} else if (nodes_allowed) {
 		/*
@@ -1532,11 +1532,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
 		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
 		init_nodemask_of_node(nodes_allowed, nid);
 	} else
-		nodes_allowed = &node_states[N_HIGH_MEMORY];
+		nodes_allowed = &node_states[N_MEMORY];
 
 	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
 
-	if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+	if (nodes_allowed != &node_states[N_MEMORY])
 		NODEMASK_FREE(nodes_allowed);
 
 	return len;
@@ -1839,7 +1839,7 @@ static void hugetlb_register_all_nodes(void)
 {
 	int nid;
 
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		struct node *node = &node_devices[nid];
 		if (node->dev.id == nid)
 			hugetlb_register_node(node);
@@ -1934,8 +1934,8 @@ void __init hugetlb_add_hstate(unsigned order)
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
 	INIT_LIST_HEAD(&h->hugepage_activelist);
-	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
-	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+	h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+	h->next_nid_to_free = first_node(node_states[N_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
 	/*
@@ -2030,11 +2030,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 		if (!(obey_mempolicy &&
 			       init_nodemask_of_mempolicy(nodes_allowed))) {
 			NODEMASK_FREE(nodes_allowed);
-			nodes_allowed = &node_states[N_HIGH_MEMORY];
+			nodes_allowed = &node_states[N_MEMORY];
 		}
 		h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
 
-		if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+		if (nodes_allowed != &node_states[N_MEMORY])
 			NODEMASK_FREE(nodes_allowed);
 	}
 out:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 13/26] vmstat: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (11 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 12/26] hugetlb: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 14/26] kthread: " Lai Jiangshan
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/vmstat.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..eeaf4e1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -918,7 +918,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
 	pg_data_t *pgdat = (pg_data_t *)arg;
 
 	/* check memoryless node */
-	if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+	if (!node_state(pgdat->node_id, N_MEMORY))
 		return 0;
 
 	seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1280,7 +1280,7 @@ static int unusable_show(struct seq_file *m, void *arg)
 	pg_data_t *pgdat = (pg_data_t *)arg;
 
 	/* check memoryless node */
-	if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+	if (!node_state(pgdat->node_id, N_MEMORY))
 		return 0;
 
 	walk_zones_in_node(m, pgdat, unusable_show_print);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 14/26] kthread: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (12 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 13/26] vmstat: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 15/26] init: " Lai Jiangshan
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/kthread.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 146a6fa..065a0a8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -427,7 +427,7 @@ int kthreadd(void *unused)
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
 	set_cpus_allowed_ptr(tsk, cpu_all_mask);
-	set_mems_allowed(node_states[N_HIGH_MEMORY]);
+	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 15/26] init: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (13 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 14/26] kthread: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 16/26] vmscan: " Lai Jiangshan
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 init/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/main.c b/init/main.c
index b286730..637e604 100644
--- a/init/main.c
+++ b/init/main.c
@@ -848,7 +848,7 @@ static int __init kernel_init(void * unused)
 	/*
 	 * init can allocate pages on any node
 	 */
-	set_mems_allowed(node_states[N_HIGH_MEMORY]);
+	set_mems_allowed(node_states[N_MEMORY]);
 	/*
 	 * init can run on any cpu.
 	 */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 16/26] vmscan: use N_MEMORY instead N_HIGH_MEMORY
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (14 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 15/26] init: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 mm/vmscan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8d01243..cb42747 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3071,7 +3071,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
 	int nid;
 
 	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
-		for_each_node_state(nid, N_HIGH_MEMORY) {
+		for_each_node_state(nid, N_MEMORY) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 			const struct cpumask *mask;
 
@@ -3126,7 +3126,7 @@ static int __init kswapd_init(void)
 	int nid;
 
 	swap_setup();
-	for_each_node_state(nid, N_HIGH_MEMORY)
+	for_each_node_state(nid, N_MEMORY)
  		kswapd_run(nid);
 	hotcpu_notifier(cpu_callback, 0);
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (15 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 16/26] vmscan: " Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes that have memory, so we should
use N_MEMORY instead.

Since N_MEMORY has been introduced, we also update the initialization of
node_states.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
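Note: the new check_for_memory() walks the zones below ZONE_MOVABLE and
stops at the first populated one. Here is a user-space sketch of the
resulting classification, assuming a configuration in which all three node
states are distinct. The model_* names and the flag values are
hypothetical, and the "any memory" flag stands in for the
node_present_pages test done by the caller.

```c
#include <assert.h>

/* Hypothetical zone layout for the model; real layouts depend on config. */
enum model_zone {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_HIGHMEM,
	MODEL_ZONE_MOVABLE,
	MODEL_MAX_NR_ZONES
};

#define MODEL_HAS_NORMAL	0x1u	/* would set N_NORMAL_MEMORY */
#define MODEL_HAS_HIGH		0x2u	/* would set N_HIGH_MEMORY */
#define MODEL_HAS_MEMORY	0x4u	/* would set N_MEMORY */

static unsigned int
model_classify_node(const unsigned long present_pages[MODEL_MAX_NR_ZONES])
{
	unsigned long total = 0;
	unsigned int flags = 0;
	int zt;

	assert(present_pages);

	/* Caller side: any present page at all marks the node. */
	for (zt = 0; zt < MODEL_MAX_NR_ZONES; zt++)
		total += present_pages[zt];
	if (total)
		flags |= MODEL_HAS_MEMORY;

	/* check_for_memory() side: only zones below the movable zone. */
	for (zt = 0; zt < MODEL_ZONE_MOVABLE; zt++) {
		if (present_pages[zt]) {
			flags |= MODEL_HAS_HIGH;
			if (zt <= MODEL_ZONE_NORMAL)
				flags |= MODEL_HAS_NORMAL;
			break;
		}
	}
	return flags;
}
```

A node whose only populated zone is the movable one gets
MODEL_HAS_MEMORY and nothing else in this model. A node whose first
populated zone is the highmem one gets MODEL_HAS_HIGH but not
MODEL_HAS_NORMAL.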
 arch/x86/mm/init_64.c |    4 +++-
 mm/page_alloc.c       |   40 ++++++++++++++++++++++------------------
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..005f00c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -625,7 +625,9 @@ void __init paging_init(void)
 	 *	 numa support is not compiled in, and later node_set_state
 	 *	 will not set it back.
 	 */
-	node_clear_state(0, N_NORMAL_MEMORY);
+	node_clear_state(0, N_MEMORY);
+	if (N_MEMORY != N_NORMAL_MEMORY)
+		node_clear_state(0, N_NORMAL_MEMORY);
 
 	zone_sizes_init();
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9e3c8b2..3bb04ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1675,7 +1675,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
  *
  * If the zonelist cache is present in the passed in zonelist, then
  * returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
  *
  * If the zonelist cache is not available for this zonelist, does
  * nothing and returns NULL.
@@ -1704,7 +1704,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)
 
 	allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
 					&cpuset_current_mems_allowed :
-					&node_states[N_HIGH_MEMORY];
+					&node_states[N_MEMORY];
 	return allowednodes;
 }
 
@@ -3132,7 +3132,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 		return node;
 	}
 
-	for_each_node_state(n, N_HIGH_MEMORY) {
+	for_each_node_state(n, N_MEMORY) {
 
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
@@ -3274,7 +3274,7 @@ static int default_zonelist_order(void)
  	 * local memory, NODE_ORDER may be suitable.
          */
 	average_size = total_size /
-				(nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+				(nodes_weight(node_states[N_MEMORY]) + 1);
 	for_each_online_node(nid) {
 		low_kmem_size = 0;
 		total_size = 0;
@@ -4619,7 +4619,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
 /*
  * early_calculate_totalpages()
  * Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
  */
 static unsigned long __init early_calculate_totalpages(void)
 {
@@ -4632,7 +4632,7 @@ static unsigned long __init early_calculate_totalpages(void)
 
 		totalpages += pages;
 		if (pages)
-			node_set_state(nid, N_HIGH_MEMORY);
+			node_set_state(nid, N_MEMORY);
 	}
   	return totalpages;
 }
@@ -4649,9 +4649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	unsigned long usable_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
-	nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
-	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
 
 	/*
 	 * If movablecore was specified, calculate what size of
@@ -4686,7 +4686,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
 	kernelcore_node = required_kernelcore / usable_nodes;
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
 
 		/*
@@ -4778,23 +4778,27 @@ restart:
 
 out:
 	/* restore the node_state */
-	node_states[N_HIGH_MEMORY] = saved_node_state;
+	node_states[N_MEMORY] = saved_node_state;
 }
 
-/* Any regular memory on that node ? */
-static void __init check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
-	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+	if (N_MEMORY == N_NORMAL_MEMORY)
+		return;
+
+	for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
 		struct zone *zone = &pgdat->node_zones[zone_type];
 		if (zone->present_pages) {
-			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+			node_set_state(nid, N_HIGH_MEMORY);
+			if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+			    zone_type <= ZONE_NORMAL)
+				node_set_state(nid, N_NORMAL_MEMORY);
 			break;
 		}
 	}
-#endif
 }
 
 /**
@@ -4877,8 +4881,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 		/* Any memory on that node */
 		if (pgdat->node_present_pages)
-			node_set_state(nid, N_HIGH_MEMORY);
-		check_for_regular_memory(pgdat);
+			node_set_state(nid, N_MEMORY);
+		check_for_memory(pgdat, nid);
 	}
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 18/26] hotplug: update nodemasks management
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (16 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

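Update the nodemask management in memory hotplug for N_MEMORY: track
N_HIGH_MEMORY and N_MEMORY transitions separately by adding
status_change_nid_high to struct memory_notify.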
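
As a rough illustration of the online path, the sketch below models how the
three status_change_nid* values could be derived. It is not the kernel code:
the zone enum, struct names and the helper changes_on_online() are
hypothetical, and it assumes a configuration where N_NORMAL_MEMORY,
N_HIGH_MEMORY and N_MEMORY are all distinct states.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's zone indices (illustrative only). */
enum sketch_zone { Z_DMA, Z_NORMAL, Z_HIGHMEM, Z_MOVABLE };

/* Hypothetical snapshot of the node_states bits a node already has. */
struct node_bits {
	bool normal;	/* N_NORMAL_MEMORY */
	bool high;	/* N_HIGH_MEMORY */
	bool memory;	/* N_MEMORY */
};

/* Mirrors the three status_change_nid* fields of struct memory_notify. */
struct notify {
	int nid_normal;
	int nid_high;
	int nid;
};

/*
 * Decide which nodemask bits would flip when a block in zone @z of node
 * @nid comes online. A value of -1 means "no change", as in the patch.
 * Assumption: all three node states are distinct in this configuration.
 */
static struct notify changes_on_online(int nid, enum sketch_zone z,
				       struct node_bits cur)
{
	struct notify n;

	n.nid_normal = (z <= Z_NORMAL && !cur.normal) ? nid : -1;
	n.nid_high = (z <= Z_HIGHMEM && !cur.high) ? nid : -1;
	n.nid = !cur.memory ? nid : -1;
	return n;
}
```

Onlining a ZONE_MOVABLE block on an empty node therefore only reports a
change for N_MEMORY, which is the case a movable-dedicated node needs.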

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |    5 +++-
 include/linux/memory.h           |    1 +
 mm/memory_hotplug.c              |   49 +++++++++++++++++++++++++++++++++----
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..70bc1c7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -378,6 +378,7 @@ struct memory_notify {
        unsigned long start_pfn;
        unsigned long nr_pages;
        int status_change_nid_normal;
+       int status_change_nid_high;
        int status_change_nid;
 }
 
@@ -385,7 +386,9 @@ start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
 status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 is (will be) set/clear, if this is -1, then nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
 If status_changed_nid* >= 0, callback should create/discard structures for the
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 6b9202b..8089e49 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -54,6 +54,7 @@ struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
 	int status_change_nid_normal;
+	int status_change_nid_high;
 	int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8c3bcf6..d2b0158 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -462,7 +462,7 @@ static void check_nodemasks_changes_online(unsigned long nr_pages,
 	int nid = zone_to_nid(zone);
 	enum zone_type zone_last = ZONE_NORMAL;
 
-	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+	if (N_MEMORY == N_NORMAL_MEMORY)
 		zone_last = ZONE_MOVABLE;
 
 	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
@@ -470,7 +470,20 @@ static void check_nodemasks_changes_online(unsigned long nr_pages,
 	else
 		arg->status_change_nid_normal = -1;
 
-	if (!node_state(nid, N_HIGH_MEMORY))
+#ifdef CONFIG_HIGHMEM
+	zone_last = ZONE_HIGHMEM;
+	if (N_MEMORY == N_HIGH_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	if (zone_idx(zone) <= zone_last && !node_state(nid, N_HIGH_MEMORY))
+		arg->status_change_nid_high = nid;
+	else
+		arg->status_change_nid_high = -1;
+#else
+	arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
+	if (!node_state(nid, N_MEMORY))
 		arg->status_change_nid = nid;
 	else
 		arg->status_change_nid = -1;
@@ -481,7 +494,10 @@ static void set_nodemasks(int node, struct memory_notify *arg)
 	if (arg->status_change_nid_normal >= 0)
 		node_set_state(node, N_NORMAL_MEMORY);
 
-	node_set_state(node, N_HIGH_MEMORY);
+	if (arg->status_change_nid_high >= 0)
+		node_set_state(node, N_HIGH_MEMORY);
+
+	node_set_state(node, N_MEMORY);
 }
 
 
@@ -900,7 +916,7 @@ static void check_nodemasks_changes_offline(unsigned long nr_pages,
 	unsigned long present_pages = 0;
 	enum zone_type zt, zone_last = ZONE_NORMAL;
 
-	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+	if (N_MEMORY == N_NORMAL_MEMORY)
 		zone_last = ZONE_MOVABLE;
 
 	for (zt = 0; zt <= zone_last; zt++)
@@ -910,6 +926,21 @@ static void check_nodemasks_changes_offline(unsigned long nr_pages,
 	else
 		arg->status_change_nid_normal = -1;
 
+#ifdef CONFIG_HIGHMEM
+	zone_last = ZONE_HIGHMEM;
+	if (N_MEMORY == N_HIGH_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	for (; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+		arg->status_change_nid_high = zone_to_nid(zone);
+	else
+		arg->status_change_nid_high = -1;
+#else
+	arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
 	zone_last = ZONE_MOVABLE;
 	for (; zt <= zone_last; zt++)
 		present_pages += pgdat->node_zones[zt].present_pages;
@@ -924,11 +955,17 @@ static void clear_nodemasks(int node, struct memory_notify *arg)
 	if (arg->status_change_nid_normal >= 0)
 		node_clear_state(node, N_NORMAL_MEMORY);
 
-	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+	if (N_MEMORY == N_NORMAL_MEMORY)
 		return;
 
-	if (arg->status_change_nid >= 0)
+	if (arg->status_change_nid_high >= 0)
 		node_clear_state(node, N_HIGH_MEMORY);
+
+	if (N_MEMORY == N_HIGH_MEMORY)
+		return;
+
+	if (arg->status_change_nid >= 0)
+		node_clear_state(node, N_MEMORY);
 }
 
 static int __ref offline_pages(unsigned long start_pfn,
-- 
1.7.1



* [V4 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (17 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 20/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

All the preparation is done, so we can now actually introduce N_MEMORY.
Add CONFIG_MOVABLE_NODE so that N_MEMORY can be used for movable-dedicated
nodes.
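
The effect of the option on the node state enum can be sketched as follows.
This is only a stand-alone model: the SKETCH_* macros and S_* enumerators are
hypothetical stand-ins for CONFIG_HIGHMEM, CONFIG_MOVABLE_NODE and the real
enum node_states, reduced to the entries that matter here.

```c
/* Stand-ins for the kernel configuration (assumed values for this sketch). */
#define SKETCH_HIGHMEM		0	/* models CONFIG_HIGHMEM=n */
#define SKETCH_MOVABLE_NODE	1	/* models CONFIG_MOVABLE_NODE=y */

enum sketch_node_states {
	S_POSSIBLE,
	S_ONLINE,
	S_NORMAL_MEMORY,
#if SKETCH_HIGHMEM
	S_HIGH_MEMORY,
#else
	S_HIGH_MEMORY = S_NORMAL_MEMORY,	/* alias, no extra nodemask */
#endif
#if SKETCH_MOVABLE_NODE
	S_MEMORY,				/* its own nodemask */
#else
	S_MEMORY = S_HIGH_MEMORY,		/* alias, no extra nodemask */
#endif
	S_CPU,
	S_NR_STATES
};
```

With the option disabled, S_MEMORY collapses onto S_HIGH_MEMORY, which is why
checks such as "N_MEMORY == N_HIGH_MEMORY" in the earlier patches are
compile-time constants.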

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 drivers/base/node.c      |    6 ++++++
 include/linux/nodemask.h |    4 ++++
 mm/Kconfig               |    8 ++++++++
 mm/page_alloc.c          |    3 +++
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 4c3aa7c..9cdd66f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -620,6 +620,9 @@ static struct node_attr node_state_attr[] = {
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
+#endif
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 };
 
@@ -630,6 +633,9 @@ static struct attribute *node_state_attrs[] = {
 #ifdef CONFIG_HIGHMEM
 	&node_state_attr[N_HIGH_MEMORY].attr.attr,
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	&node_state_attr[N_MEMORY].attr.attr,
+#endif
 	&node_state_attr[N_CPU].attr.attr,
 	NULL
 };
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index c6ebdc9..4e2cbfa 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,7 +380,11 @@ enum node_states {
 #else
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	N_MEMORY,		/* The node has memory(regular, high, movable) */
+#else
 	N_MEMORY = N_HIGH_MEMORY,
+#endif
 	N_CPU,		/* The node has one or more cpus */
 	NR_NODE_STATES
 };
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..8c14a2c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -143,6 +143,14 @@ config NO_BOOTMEM
 config MEMORY_ISOLATION
 	boolean
 
+config MOVABLE_NODE
+	boolean "Enable to assign a node which has only movable memory"
+	depends on HAVE_MEMBLOCK
+	depends on NO_BOOTMEM
+	depends on X86_64
+	depends on NUMA
+	default y
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bb04ed..621c666 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -90,6 +90,9 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = { { [0] = 1UL } },
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	[N_MEMORY] = { { [0] = 1UL } },
+#endif
 	[N_CPU] = { { [0] = 1UL } },
 #endif	/* NUMA */
 };
-- 
1.7.1



* [V4 PATCH 20/26] page_alloc: add kernelcore_max_addr
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (18 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 21/26] x86: get pg_data_t's memory from other node Lai Jiangshan
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

The current boot-option policy for setting up ZONE_MOVABLE (kernelcore=)
doesn't meet our requirements. We need something like a kernelcore_max_addr=XX
boot option to limit the upper address of kernelcore.

Memory above that address will be migratable (movable) and is easier to
offline (it is always ready to be offlined when the system doesn't require
so much memory).

This makes dynamic hot-add/remove of memory easier, makes better use of
memory, and helps THP.

kernelcore_max_addr=, kernelcore= and movablecore= can all be safely specified
at the same time (or any two of them).
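
The clamping idea can be sketched in user space as below. This is a
simplified model, not the code in find_zone_movable_pfns_for_nodes(): the
names pfn_range, addr_to_pfn() and kernelcore_end() are hypothetical, and a
4 KiB page size is assumed.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* A node's memory range in page frame numbers, end exclusive. */
struct pfn_range {
	unsigned long long start_pfn;
	unsigned long long end_pfn;
};

/* Convert the byte address given on the command line to a pfn. */
static unsigned long long addr_to_pfn(unsigned long long addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/*
 * Return the pfn at which kernelcore must stop inside @r when the limit is
 * @max_pfn. Everything from the returned pfn up to r.end_pfn is left for
 * ZONE_MOVABLE. If the whole range lies above the limit, the result is
 * r.start_pfn, i.e. the node becomes movable-only.
 */
static unsigned long long kernelcore_end(struct pfn_range r,
					 unsigned long long max_pfn)
{
	unsigned long long end = r.end_pfn < max_pfn ? r.end_pfn : max_pfn;

	return end > r.start_pfn ? end : r.start_pfn;
}
```

For example, with kernelcore_max_addr=4G a node spanning 2G..6G keeps
kernelcore below the 4G boundary, while a node that starts at 8G ends up
entirely movable.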

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |    9 +++++++++
 mm/page_alloc.c                     |   29 ++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7aef334..02a2ce9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1215,6 +1215,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			use the HighMem zone if it exists, and the Normal
 			zone if it does not.
 
+	kernelcore_max_addr=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+			has the same effect as the kernelcore parameter, except
+			that it specifies the upper physical address of the
+			memory range usable by the kernel for non-movable
+			allocations. If both kernelcore and kernelcore_max_addr
+			are specified, this request takes priority over
+			kernelcore's.
+			See the kernelcore parameter.
+
 	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
 			Format: <Controller#>[,poll interval]
 			The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 621c666..c1c5834 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -203,6 +203,7 @@ static unsigned long __meminitdata dma_reserve;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+static unsigned long __initdata required_kernelcore_max_pfn;
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
@@ -4650,6 +4651,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 {
 	int i, nid;
 	unsigned long usable_startpfn;
+	unsigned long kernelcore_max_pfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
 	nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -4678,6 +4680,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
+	if (required_kernelcore_max_pfn && !required_kernelcore)
+		required_kernelcore = totalpages;
+
 	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
 	if (!required_kernelcore)
 		goto out;
@@ -4686,6 +4691,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
+	if (required_kernelcore_max_pfn)
+		kernelcore_max_pfn = required_kernelcore_max_pfn;
+	else
+		kernelcore_max_pfn = ULONG_MAX >> PAGE_SHIFT;
+	kernelcore_max_pfn = max(kernelcore_max_pfn, usable_startpfn);
+
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
 	kernelcore_node = required_kernelcore / usable_nodes;
@@ -4712,8 +4723,12 @@ restart:
 			unsigned long size_pages;
 
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
-			if (start_pfn >= end_pfn)
+			end_pfn = min(kernelcore_max_pfn, end_pfn);
+			if (start_pfn >= end_pfn) {
+				if (!zone_movable_pfn[nid])
+					zone_movable_pfn[nid] = start_pfn;
 				continue;
+			}
 
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
@@ -4904,6 +4919,18 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
 	return 0;
 }
 
+#ifdef CONFIG_MOVABLE_NODE
+/*
+ * kernelcore_max_addr=addr sets the upper physical address of the memory
+ * range used for allocations that cannot be reclaimed or migrated.
+ */
+static int __init cmdline_parse_kernelcore_max_addr(char *p)
+{
+	return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+}
+early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
+#endif
+
 /*
  * kernelcore=size sets the amount of memory for use for allocations that
  * cannot be reclaimed or migrated.
-- 
1.7.1



* [V4 PATCH 21/26] x86: get pg_data_t's memory from other node
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (19 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 20/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 22/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system can create a movable node, in which all of the node's memory
is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for
the node's pg_data_t from that node.
So when memblock_alloc_nid() fails, setup_node_data() retries with
memblock_alloc().
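
The intended fallback can be modelled with a toy allocator, as sketched
below. None of these names exist in the kernel: free_bytes[],
alloc_on_node(), alloc_anywhere() and alloc_node_data() are hypothetical and
only illustrate the "try the local node first, then any node" order.

```c
#include <stddef.h>

#define SKETCH_NODES 2

/* Toy model: bytes of non-movable memory still free on each node. */
static size_t free_bytes[SKETCH_NODES] = { 4096, 0 };	/* node 1: movable-only */

/* Try to allocate on one node; return that node id, or -1 on failure. */
static int alloc_on_node(size_t size, int nid)
{
	if (free_bytes[nid] < size)
		return -1;
	free_bytes[nid] -= size;
	return nid;
}

/* Try every node in turn; return the node that succeeded, or -1. */
static int alloc_anywhere(size_t size)
{
	for (int nid = 0; nid < SKETCH_NODES; nid++)
		if (alloc_on_node(size, nid) >= 0)
			return nid;
	return -1;
}

/* Allocate the per-node data locally if possible, else from another node. */
static int alloc_node_data(size_t size, int nid)
{
	int got = alloc_on_node(size, nid);

	if (got < 0)
		got = alloc_anywhere(size);	/* retry only after a local failure */
	return got;
}
```

In this model the data for the movable-only node 1 ends up on node 0, which
is the situation the patch is meant to allow.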

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be..a86e315 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -223,9 +223,13 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 		remapped = true;
 	} else {
 		nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
-		if (!nd_pa) {
-			pr_err("Cannot find %zu bytes in node %d\n",
+		if (!nd_pa)
+			printk(KERN_WARNING "Cannot find %zu bytes in node %d\n",
 			       nd_size, nid);
+		nd_pa = nd_pa ?: memblock_alloc(nd_size, SMP_CACHE_BYTES);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in other node\n",
+			       nd_size);
 			return;
 		}
 		nd = __va(nd_pa);
-- 
1.7.1



* [V4 PATCH 22/26] x86: use memblock_set_current_limit() to set memblock.current_limit
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (20 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 21/26] x86: get pg_data_t's memory from other node Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 23/26] memblock: limit memory address from memblock Lai Jiangshan
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

memblock.current_limit is set directly even though
memblock_set_current_limit() is provided for that purpose.
Use the helper instead.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f4b9b80..bb9d9f8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -889,7 +889,7 @@ void __init setup_arch(char **cmdline_p)
 
 	cleanup_highmap();
 
-	memblock.current_limit = get_max_mapped();
+	memblock_set_current_limit(get_max_mapped());
 	memblock_x86_fill();
 
 	/*
@@ -925,7 +925,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
-	memblock.current_limit = get_max_mapped();
+	memblock_set_current_limit(get_max_mapped());
 	dma_contiguous_reserve(0);
 
 	/*
-- 
1.7.1



* [V4 PATCH 23/26] memblock: limit memory address from memblock
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (21 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 22/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 24/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Setting kernelcore_max_addr means that all memory above the address given by
the boot parameter is allocated as ZONE_MOVABLE. So memory allocated by
memblock should also be limited by the parameter.

This patch limits the memory that memblock hands out.
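
The rule applied when a limit is set can be written as a small pure function.
The sketch below is illustrative only: sketch_phys_t and effective_limit()
are hypothetical names, and a cap of 0 is taken to mean that no
kernelcore_max_addr was given.

```c
typedef unsigned long long sketch_phys_t;

/*
 * Return the limit that should actually be installed when the caller asks
 * for @requested and the boot parameter imposes @cap (0 = no cap).
 * The result never exceeds the cap once a cap is set.
 */
static sketch_phys_t effective_limit(sketch_phys_t requested,
				     sketch_phys_t cap)
{
	if (!cap || cap > requested)
		return requested;
	return cap;
}
```

So a later call that tries to raise the limit above the cap, for example
after more memory has been mapped, is silently clamped.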

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 ++++-
 mm/page_alloc.c          |    6 +++++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 19dc455..f2977ae 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
+extern phys_addr_t memblock_limit;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 82aa349..fbf5efc 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -932,7 +932,10 @@ int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t si
 
 void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 {
-	memblock.current_limit = limit;
+	if (!memblock_limit || (memblock_limit > limit))
+		memblock.current_limit = limit;
+	else
+		memblock.current_limit = memblock_limit;
 }
 
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1c5834..3878170 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,8 @@ static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
 
+phys_addr_t memblock_limit;
+
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
 EXPORT_SYMBOL(movable_zone);
@@ -4926,7 +4928,9 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
  */
 static int __init cmdline_parse_kernelcore_max_addr(char *p)
 {
-	return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+	cmdline_parse_core(p, &required_kernelcore_max_pfn);
+	memblock_limit = required_kernelcore_max_pfn << PAGE_SHIFT;
+	return 0;
 }
 early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
 #endif
-- 
1.7.1



* [V4 PATCH 24/26] memblock: compare current_limit with end variable at memblock_find_in_range_node()
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (22 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 23/26] memblock: limit memory address from memblock Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 25/26] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

memblock_find_in_range_node() does not compare memblock.current_limit
with the end argument. Thus, even if memblock.current_limit is smaller than
end, the function can return an address that is above
memblock.current_limit.

This patch adds the check to memblock_find_in_range_node().
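
A minimal model of the added check is shown below. It is a sketch, not the
memblock code: search_phys_t, SKETCH_ALLOC_ACCESSIBLE and clamp_search_end()
are hypothetical names, and the sentinel value 0 is an assumption made for
this example.

```c
typedef unsigned long long search_phys_t;

/* Sentinel meaning "search up to the current limit" (assumed value). */
#define SKETCH_ALLOC_ACCESSIBLE 0ULL

/*
 * Return the upper bound the search should really use: the caller's @end,
 * unless it is the sentinel or lies above @current_limit.
 */
static search_phys_t clamp_search_end(search_phys_t end,
				      search_phys_t current_limit)
{
	if (end == SKETCH_ALLOC_ACCESSIBLE || end > current_limit)
		end = current_limit;
	return end;
}
```

Before the patch only the sentinel case was handled, so an explicit end
above the limit passed through unchanged.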

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memblock.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index fbf5efc..f726b5e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,11 +99,12 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 					phys_addr_t align, int nid)
 {
 	phys_addr_t this_start, this_end, cand;
+	phys_addr_t current_limit = memblock.current_limit;
 	u64 i;
 
 	/* pump up @end */
-	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
-		end = memblock.current_limit;
+	if ((end == MEMBLOCK_ALLOC_ACCESSIBLE) || (end > current_limit))
+		end = current_limit;
 
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
-- 
1.7.1



* [V4 PATCH 25/26] mm, memory-hotplug: add online_movable and online_kernel
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (23 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 24/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-10  8:59 ` [V4 PATCH 26/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
  2012-09-11  0:40 ` [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Yasuaki Ishimatsu
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

When a memory block/memory section is onlined by "online_movable", the kernel
will not hold direct references to the pages of the memory block,
so we can remove that memory at any time when needed.

This makes dynamic hot-add/remove of memory easier, makes better use of
memory, and helps THP.

Current constraint: only a memory block which is adjacent to ZONE_MOVABLE
can be onlined from ZONE_NORMAL to ZONE_MOVABLE.

For the opposite onlining behavior, we also introduce "online_kernel" to
change a memory block of ZONE_MOVABLE to ZONE_NORMAL when onlining.
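
The adjacency constraint for "online_movable" can be illustrated with a
user-space model of two neighbouring zone spans. This is a sketch under
simplifying assumptions: zone_span, span_end() and move_range_right() are
hypothetical names, locking and the per-page zone id fixup are left out, and
the higher zone is assumed to be non-empty.

```c
/* A zone's span: first pfn and number of spanned pages. */
struct zone_span {
	unsigned long start_pfn;
	unsigned long spanned;
};

static unsigned long span_end(const struct zone_span *z)
{
	return z->start_pfn + z->spanned;
}

/*
 * Move [start_pfn, end_pfn) from the tail of @lo to the head of @hi.
 * Returns 0 on success, -1 if the range is not the rightmost part of @lo.
 */
static int move_range_right(struct zone_span *lo, struct zone_span *hi,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long hi_end = span_end(hi);

	if (lo->start_pfn > start_pfn)		/* range starts below @lo */
		return -1;
	if (span_end(lo) > end_pfn)		/* not the rightmost part */
		return -1;
	if (start_pfn >= span_end(lo))		/* no overlap with @lo */
		return -1;

	lo->spanned = start_pfn - lo->start_pfn;
	hi->start_pfn = start_pfn;
	hi->spanned = hi_end - start_pfn;
	return 0;
}
```

A block in the middle of the lower zone is rejected, which corresponds to the
"must be adjacent to ZONE_MOVABLE" limit stated above.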

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |   14 +++++-
 drivers/base/memory.c            |   19 +++++---
 include/linux/memory_hotplug.h   |   13 +++++-
 mm/memory_hotplug.c              |  101 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 137 insertions(+), 10 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 70bc1c7..8e5eacb 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
 		    in the memory block.
 'state'           : read-write
                     at read:  contains online/offline state of memory.
-                    at write: user can specify "online", "offline" command
+                    at write: user can specify "online_kernel",
+                    "online_movable", "online", "offline" command
                     which will be performed on al sections in the block.
 'phys_device'     : read-only: designed to show the name of physical memory
                     device.  This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's state file as:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
+This onlining will not change the ZONE type of the target memory section.
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
 After this, section memoryXXX's state will be 'online' and the amount of
 available memory will be increased.
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..1ad2f48 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
 {
 	unsigned long start_pfn, start_paddr;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -262,7 +262,7 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 			if (!pages_correctly_reserved(start_pfn, nr_pages))
 				return -EBUSY;
 
-			ret = online_pages(start_pfn, nr_pages);
+			ret = online_pages(start_pfn, nr_pages, online_type);
 			break;
 		case MEM_OFFLINE:
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
@@ -279,7 +279,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 }
 
 static int memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int online_type)
 {
 	int ret = 0;
 
@@ -293,7 +294,7 @@ static int memory_block_change_state(struct memory_block *mem,
 	if (to_state == MEM_OFFLINE)
 		mem->state = MEM_GOING_OFFLINE;
 
-	ret = memory_block_action(mem->start_section_nr, to_state);
+	ret = memory_block_action(mem->start_section_nr, to_state, online_type);
 
 	if (ret) {
 		mem->state = from_state_req;
@@ -325,10 +326,14 @@ store_mem_state(struct device *dev,
 
 	mem = container_of(dev, struct memory_block, dev);
 
-	if (!strncmp(buf, "online", min((int)count, 6)))
-		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+	if (!strncmp(buf, "online_kernel", min((int)count, 13)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KERNEL);
+	else if (!strncmp(buf, "online_movable", min((int)count, 14)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_MOVABLE);
+	else if (!strncmp(buf, "online", min((int)count, 6)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KEEP);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
-		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
 
 	if (ret)
 		return ret;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..047cd1d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -25,6 +25,13 @@ enum {
 	MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
 };
 
+/* Types for controlling the zone type of onlined memory */
+enum {
+	ONLINE_KEEP,
+	ONLINE_KERNEL,
+	ONLINE_MOVABLE,
+};
+
 /*
  * pgdat resizing functions
  */
@@ -45,6 +52,10 @@ void pgdat_resize_init(struct pglist_data *pgdat)
 }
 /*
  * Zone resizing functions
+ *
+ * Note: any attempt to resize a zone should have pgdat_resize_lock() and
+ * zone_span_writelock() both held. This ensures the size of a zone
+ * can't be changed while pgdat_resize_lock() is held.
  */
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
@@ -70,7 +81,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d2b0158..e691076 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -210,6 +210,89 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
+static void resize_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+
+	zone_span_writelock(zone);
+
+	zone->zone_start_pfn = start_pfn;
+	zone->spanned_pages = end_pfn - start_pfn;
+
+	zone_span_writeunlock(zone);
+}
+
+static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	enum zone_type zid = zone_idx(zone);
+	int nid = zone->zone_pgdat->node_id;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		set_page_links(pfn_to_page(pfn), zid, nid, pfn);
+}
+
+static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are higher than @z2 */
+	if (end_pfn > z2->zone_start_pfn + z2->spanned_pages)
+		goto out_fail;
+	/* the moved-out part must be at the leftmost of @z2 */
+	if (start_pfn > z2->zone_start_pfn)
+		goto out_fail;
+	/* must be included in or overlap @z2 */
+	if (end_pfn <= z2->zone_start_pfn)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, end_pfn);
+	resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z1, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
+static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are lower than @z1 */
+	if (z1->zone_start_pfn > start_pfn)
+		goto out_fail;
+	/* the moved-out part must be at the rightmost end of @z1 */
+	if (z1->zone_start_pfn + z1->spanned_pages >  end_pfn)
+		goto out_fail;
+	/* the range must be included in or overlap @z1 */
+	if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, start_pfn);
+	resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z2, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
 static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
 			    unsigned long end_pfn)
 {
@@ -501,7 +584,7 @@ static void set_nodemasks(int node, struct memory_notify *arg)
 }
 
 
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long onlined_pages = 0;
 	struct zone *zone;
@@ -518,6 +601,22 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	 */
 	zone = page_zone(pfn_to_page(pfn));
 
+	if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
+		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+	if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
+		if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+
+	/* The code above may have changed the zone of the pfn range */
+	zone = page_zone(pfn_to_page(pfn));
+
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
 	check_nodemasks_changes_online(nr_pages, zone, &arg);
-- 
1.7.1

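For readers following the checks in move_pfn_range_left() above, here is a
minimal userspace sketch of the same span arithmetic. It is an illustration
only, not kernel code: struct zone_span, span_end() and model_move_left() are
hypothetical names, the struct keeps just the two span fields, and locking and
fix_zone_id() are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct zone: only the span fields are modelled. */
struct zone_span {
	unsigned long start_pfn;     /* models zone->zone_start_pfn */
	unsigned long spanned_pages; /* models zone->spanned_pages */
};

static unsigned long span_end(const struct zone_span *z)
{
	return z->start_pfn + z->spanned_pages;
}

/*
 * Models move_pfn_range_left(): hand [start_pfn, end_pfn) from the left
 * edge of z2 to the right edge of z1. Returns true on success.
 */
static bool model_move_left(struct zone_span *z1, struct zone_span *z2,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long z2_end = span_end(z2);

	if (end_pfn > z2_end)           /* range runs past the end of z2 */
		return false;
	if (start_pfn > z2->start_pfn)  /* range is not at z2's left edge */
		return false;
	if (end_pfn <= z2->start_pfn)   /* range does not overlap z2 */
		return false;

	/* z1 now ends at end_pfn; z2 now starts at end_pfn. */
	z1->spanned_pages = end_pfn - z1->start_pfn;
	z2->start_pfn = end_pfn;
	z2->spanned_pages = z2_end - end_pfn;
	return true;
}
```

With z1 spanning [0x1000, 0x2000) and z2 spanning [0x2000, 0x4000), moving
[0x2000, 0x2800) succeeds and leaves z1 at [0x1000, 0x2800) and z2 at
[0x2800, 0x4000). A later request for [0x3000, 0x3800) is rejected because it
does not start at the left edge of z2.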

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 26/26] memory_hotplug: handle empty zone when online_movable/online_kernel
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (24 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 25/26] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
@ 2012-09-10  8:59 ` Lai Jiangshan
  2012-09-11  0:40 ` [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Yasuaki Ishimatsu
  26 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-10  8:59 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

Make online_movable/online_kernel able to empty a zone
or to move memory into an empty zone.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   51 +++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 45 insertions(+), 6 deletions(-)

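The empty-zone handling in this patch can be sketched in userspace as below.
This is an illustration under stated assumptions, not the kernel
implementation: struct zone_span, model_resize() and model_move_right() are
hypothetical names, and the init_currently_empty_zone() call, the locking and
fix_zone_id() are omitted. It shows only the convention that an empty zone is
stored as start_pfn = 0, spanned_pages = 0, and how an empty z2 takes its end
from end_pfn.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct zone: only the span fields are modelled. */
struct zone_span {
	unsigned long start_pfn;     /* models zone->zone_start_pfn */
	unsigned long spanned_pages; /* models zone->spanned_pages */
};

/* Models resize_zone() after this patch: an empty span is stored as 0/0. */
static void model_resize(struct zone_span *z, unsigned long start,
			 unsigned long end)
{
	if (end - start) {
		z->start_pfn = start;
		z->spanned_pages = end - start;
	} else {
		z->start_pfn = 0;
		z->spanned_pages = 0;
	}
}

/*
 * Models move_pfn_range_right(): hand [start_pfn, end_pfn) from the right
 * edge of z1 to the left edge of z2. Returns true on success.
 */
static bool model_move_right(struct zone_span *z1, struct zone_span *z2,
			     unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long z1_end = z1->start_pfn + z1->spanned_pages;
	unsigned long z2_end;

	if (z1->start_pfn > start_pfn)  /* range starts below z1 */
		return false;
	if (z1_end > end_pfn)           /* range is not at z1's right edge */
		return false;
	if (start_pfn >= z1_end)        /* range does not overlap z1 */
		return false;

	/* An empty z2 (start_pfn == 0) takes end_pfn as its end. */
	if (z2->start_pfn)
		z2_end = z2->start_pfn + z2->spanned_pages;
	else
		z2_end = end_pfn;

	model_resize(z1, z1->start_pfn, start_pfn);
	model_resize(z2, start_pfn, z2_end);
	return true;
}
```

Moving all of z1 = [0x1000, 0x2000) into an empty z2 leaves z1 stored as 0/0
and z2 spanning [0x1000, 0x2000), which covers both cases named in the
changelog: emptying a zone and populating an empty one.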
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e691076..1903850 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -216,8 +216,17 @@ static void resize_zone(struct zone *zone, unsigned long start_pfn,
 
 	zone_span_writelock(zone);
 
-	zone->zone_start_pfn = start_pfn;
-	zone->spanned_pages = end_pfn - start_pfn;
+	if (end_pfn - start_pfn) {
+		zone->zone_start_pfn = start_pfn;
+		zone->spanned_pages = end_pfn - start_pfn;
+	} else {
+		/*
+		 * keep it consistent with free_area_init_core():
+		 * if spanned_pages = 0, then keep start_pfn = 0
+		 */
+		zone->zone_start_pfn = 0;
+		zone->spanned_pages = 0;
+	}
 
 	zone_span_writeunlock(zone);
 }
@@ -233,10 +242,19 @@ static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
 		set_page_links(pfn_to_page(pfn), zid, nid, pfn);
 }
 
-static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_left(struct zone *z1, struct zone *z2,
 		unsigned long start_pfn, unsigned long end_pfn)
 {
+	int ret;
 	unsigned long flags;
+	unsigned long z1_start_pfn;
+
+	if (!z1->wait_table) {
+		ret = init_currently_empty_zone(z1, start_pfn,
+			end_pfn - start_pfn, MEMMAP_HOTPLUG);
+		if (ret)
+			return ret;
+	}
 
 	pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -250,7 +268,13 @@ static int move_pfn_range_left(struct zone *z1, struct zone *z2,
 	if (end_pfn <= z2->zone_start_pfn)
 		goto out_fail;
 
-	resize_zone(z1, z1->zone_start_pfn, end_pfn);
+	/* use start_pfn for z1's start_pfn if z1 is empty */
+	if (z1->zone_start_pfn)
+		z1_start_pfn = z1->zone_start_pfn;
+	else
+		z1_start_pfn = start_pfn;
+
+	resize_zone(z1, z1_start_pfn, end_pfn);
 	resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
 
 	pgdat_resize_unlock(z1->zone_pgdat, &flags);
@@ -263,10 +287,19 @@ out_fail:
 	return -1;
 }
 
-static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_right(struct zone *z1, struct zone *z2,
 		unsigned long start_pfn, unsigned long end_pfn)
 {
+	int ret;
 	unsigned long flags;
+	unsigned long z2_end_pfn;
+
+	if (!z2->wait_table) {
+		ret = init_currently_empty_zone(z2, start_pfn,
+			end_pfn - start_pfn, MEMMAP_HOTPLUG);
+		if (ret)
+			return ret;
+	}
 
 	pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -280,8 +313,14 @@ static int move_pfn_range_right(struct zone *z1, struct zone *z2,
 	if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
 		goto out_fail;
 
+	/* use end_pfn for z2's end_pfn if z2 is empty */
+	if (z2->zone_start_pfn)
+		z2_end_pfn = z2->zone_start_pfn + z2->spanned_pages;
+	else
+		z2_end_pfn = end_pfn;
+
 	resize_zone(z1, z1->zone_start_pfn, start_pfn);
-	resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+	resize_zone(z2, start_pfn, z2_end_pfn);
 
 	pgdat_resize_unlock(z1->zone_pgdat, &flags);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug
  2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
                   ` (25 preceding siblings ...)
  2012-09-10  8:59 ` [V4 PATCH 26/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
@ 2012-09-11  0:40 ` Yasuaki Ishimatsu
  2012-09-11  1:22   ` Lai Jiangshan
  2012-09-11  9:44   ` [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
  26 siblings, 2 replies; 35+ messages in thread
From: Yasuaki Ishimatsu @ 2012-09-11  0:40 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton

Hi Lai,

When I used memory_online to online a hot-added node's memory, the following
kernel messages were shown. Is this a known issue?

[  327.837408] ------------[ cut here ]------------
[  327.892556] kernel BUG at mm/page_alloc.c:553!
[  327.945621] invalid opcode: 0000 [#1] SMP 
[  327.994748] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[  328.560103] CPU 0 
[  328.582021] Pid: 2445, comm: bash Not tainted 3.6.0-rc5-removable-node+ #1 FUJITSU-SV PRIMEQUEST 1800E/SB
[  328.698524] RIP: 0010:[<ffffffff8116ffdc>]  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
[  328.802580] RSP: 0018:ffff8807875f9b88  EFLAGS: 00010002
[  328.866025] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
[  328.951273] RDX: 0000000000000002 RSI: ffffea007fe00000 RDI: ffff880764801680
[  329.036522] RBP: ffff8807875f9c38 R08: 0000000001ff8000 R09: ffff880764801740
[  329.121771] R10: 0000000001800000 R11: 0000000000000001 R12: 0000000000000002
[  329.207022] R13: ffffea007fe00000 R14: ffff880764801680 R15: ffffea007fe00020
[  329.292270] FS:  00007ff533e92700(0000) GS:ffff8807c1800000(0000) knlGS:0000000000000000
[  329.388942] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  329.457575] CR2: ffffffffff600400 CR3: 00000007b6826000 CR4: 00000000000007f0
[  329.542826] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  329.628075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  329.713326] Process bash (pid: 2445, threadinfo ffff8807875f8000, task ffff8807b72ccca0)
[  329.809993] Stack:
[  329.833984]  ffff881ff8000000 0000000000000000 0000000000000000 0000000000000000
[  329.922788]  ffffffff81c3f160 ffff8807875f9fd8 ffff8807648016e8 0000000100000002
[  330.011593]  ffff8807875f8000 ffff8807875f8000 0000000000000030 ffff8807c19d0e18
[  330.100400] Call Trace:
[  330.129588]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
[  330.201345]  [<ffffffff81170445>] __free_pages+0x35/0x50
[  330.264798]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
[  330.334478]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
[  330.405197]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
[  330.474880]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
[  330.549750]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
[  330.636050]  [<ffffffff8165464b>] online_pages+0x22b/0x390
[  330.701584]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
[  330.773347]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
[  330.844063]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
[  330.921013]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
[  330.985502]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
[  331.054150]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
[  331.124867]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
[  331.190392]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
[  331.259038]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
[  331.320411]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
[  331.380747]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
[  331.452501] Code: ff ff 0f 0b eb fe 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 48 c7 c7 80 3b c3 81 e8 ae 6c f6 ff 85 c0 0f 85 cb fd ff ff eb 8f <0f> 0b 66 90 eb fc 0f 0b eb fe 49 8b 47 e0 f6 c4 40 0f 1f 00 74 
[  331.685087] RIP  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
[  331.761102]  RSP <ffff8807875f9b88>
[  331.802749] ---[ end trace f3112128f3ab7e75 ]---
[  331.859455] BUG: sleeping function called from invalid context at mm/slub.c:930
[  331.946779] in_atomic(): 1, irqs_disabled(): 1, pid: 2445, name: bash
[  332.023723] INFO: lockdep is turned off.
[  332.070554] irq event stamp: 301462
[  332.112196] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
[  332.220397] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
[  332.326522] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
[  332.428493] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
[  332.528391] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
[  332.621944] Call Trace:
[  332.651131]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
[  332.717699]  [<ffffffff811bddbb>] __kmalloc+0x6b/0x220
[  332.779079]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
[  332.859144]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
[  332.937130]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
[  333.006812]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
[  333.087916]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
[  333.163827]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
[  333.230391]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
[  333.294883]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
[  333.357294]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
[  333.420746]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
[  333.481085]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
[  333.540386]  [<ffffffff8101a8eb>] die+0x5b/0x90
[  333.594489]  [<ffffffff81670c64>] do_trap+0xc4/0x170
[  333.653789]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
[  333.718278]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  333.792117]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  333.870098]  [<ffffffff81670420>] ? restore_args+0x30/0x30
[  333.935620]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
[  333.996991]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  334.070825]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
[  334.144660]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
[  334.216417]  [<ffffffff81170445>] __free_pages+0x35/0x50
[  334.279868]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
[  334.349549]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
[  334.420269]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
[  334.489957]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
[  334.564832]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
[  334.651125]  [<ffffffff8165464b>] online_pages+0x22b/0x390
[  334.716654]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
[  334.788411]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
[  334.859130]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
[  334.936078]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
[  335.000568]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
[  335.069213]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
[  335.139931]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
[  335.205459]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
[  335.274103]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
[  335.335477]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
[  335.395815]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
[  335.467571] BUG: scheduling while atomic: bash/2445/0x10000004
[  335.537243] INFO: lockdep is turned off.
[  335.584074] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[  336.149444] irq event stamp: 301462
[  336.191085] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
[  336.299285] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
[  336.405411] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
[  336.507381] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
[  336.607278] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
[  336.700831] Call Trace:
[  336.730023]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
[  336.795549]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
[  336.858999]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
[  336.924529]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
[  336.989017]  [<ffffffff811bddc0>] __kmalloc+0x70/0x220
[  337.050393]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
[  337.130456]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
[  337.208445]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
[  337.278125]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
[  337.359228]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
[  337.435138]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
[  337.501705]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
[  337.566194]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
[  337.628607]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
[  337.692052]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
[  337.752383]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
[  337.811683]  [<ffffffff8101a8eb>] die+0x5b/0x90
[  337.865790]  [<ffffffff81670c64>] do_trap+0xc4/0x170
[  337.925090]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
[  337.989579]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  338.063413]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  338.141400]  [<ffffffff81670420>] ? restore_args+0x30/0x30
[  338.206928]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
[  338.268305]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  338.342137]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
[  338.415973]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
[  338.487732]  [<ffffffff81170445>] __free_pages+0x35/0x50
[  338.551181]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
[  338.620863]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
[  338.691583]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
[  338.761264]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
[  338.836137]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
[  338.922431]  [<ffffffff8165464b>] online_pages+0x22b/0x390
[  338.987959]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
[  339.059718]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
[  339.130438]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
[  339.207385]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
[  339.271879]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
[  339.340520]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
[  339.411238]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
[  339.476766]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
[  339.545410]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
[  339.606784]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
[  339.667121]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b

[  339.739266] BUG: sleeping function called from invalid context at mm/slub.c:930
[  339.826795] in_atomic(): 1, irqs_disabled(): 0, pid: 2445, name: bash
[  339.903925] INFO: lockdep is turned off.
[  339.950938] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
[  340.044702] Call Trace:
[  340.073958]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
[  340.140704]  [<ffffffff811bd4fb>] kmem_cache_alloc_trace+0x4b/0x1d0
[  340.215759]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
[  340.293928]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
[  340.363794]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
[  340.445071]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
[  340.521163]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
[  340.587902]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
[  340.652571]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
[  340.715155]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
[  340.778739]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
[  340.839300]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
[  340.898737]  [<ffffffff8101a8eb>] die+0x5b/0x90
[  340.953042]  [<ffffffff81670c64>] do_trap+0xc4/0x170
[  341.012562]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
[  341.077211]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  341.151258]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  341.229428]  [<ffffffff81670420>] ? restore_args+0x30/0x30
[  341.295109]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
[  341.356736]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  341.430789]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
[  341.504832]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
[  341.576803]  [<ffffffff81170445>] __free_pages+0x35/0x50
[  341.640459]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
[  341.710291]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
[  341.781224]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
[  341.851126]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
[  341.926211]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
[  342.012703]  [<ffffffff8165464b>] online_pages+0x22b/0x390
[  342.078434]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
[  342.150325]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
[  342.221165]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
[  342.298323]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
[  342.362966]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
[  342.431796]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
[  342.502689]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
[  342.568387]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
[  342.637216]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
[  342.698761]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
[  342.759236]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
[  342.831184] BUG: scheduling while atomic: bash/2445/0x10000004
[  342.901189] INFO: lockdep is turned off.
[  342.948190] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[  343.519830] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
[  343.613552] Call Trace:
[  343.642773]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
[  343.708445]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
[  343.772084]  [<ffffffff814f5acf>] ? efivar_create_sysfs_entry+0x5f/0x1b0
[  343.852324]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
[  343.918004]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
[  343.982676]  [<ffffffff811bd500>] kmem_cache_alloc_trace+0x50/0x1d0
[  344.057721]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
[  344.135874]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
[  344.205752]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
[  344.287007]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
[  344.363106]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
[  344.429885]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
[  344.494584]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
[  344.557185]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
[  344.620847]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
[  344.681390]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
[  344.740918]  [<ffffffff8101a8eb>] die+0x5b/0x90
[  344.795241]  [<ffffffff81670c64>] do_trap+0xc4/0x170
[  344.854734]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
[  344.919416]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  344.993473]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  345.071682]  [<ffffffff81670420>] ? restore_args+0x30/0x30
[  345.137431]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
[  345.199000]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
[  345.273041]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
[  345.347082]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
[  345.419041]  [<ffffffff81170445>] __free_pages+0x35/0x50
[  345.482682]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
[  345.552584]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
[  345.623511]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
[  345.693391]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
[  345.768476]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
[  345.854961]  [<ffffffff8165464b>] online_pages+0x22b/0x390
[  345.920712]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
[  345.992687]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
[  346.063626]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
[  346.140786]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
[  346.205491]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
[  346.274370]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
[  346.345299]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
[  346.411046]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
[  346.479894]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
[  346.541493]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
[  346.602042]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
...

Thanks,
Yasuaki Ishimatsu

2012/09/10 17:58, Lai Jiangshan wrote:
> 	A) Introduction:
> 
> This patchset adds MOVABLE-dedicated node and online_movable for memory-management.
> 
> It is used for anti-fragmentation(hugepage, big-order allocation...),
> hot-removal-of-memory(virtualization, power-conserve, move memory between systems
> to make better utilities of memories).
> 
> This patchset is based on 650470d1da17c20bf9700f9446775a01cbda52c3 of newest tip tree.
> 
> 	B) User Interface:
> 
> When users(big system manager) need config some node/memory as MOVABLE:
> 	1 Use kernelcore_max_addr=XX when boot
> 	2 Use movable_online hotplug action when running
> We may introduce some more convenient interface, such as
> 	movable_node=NODE_LIST boot option.
> 
> 	C) Patches
> 
> Patch1-3      Fix problems of the current code.(all related with hotplug)
> Patch4        cleanup for node_state_attr
> Patch5        introduce N_MEMORY
> Patch6-18     use N_MEMORY instead N_HIGH_MEMORY.
>                The patches are separated by subsystem,
>                *these conversions were (and must be) checked carefully*.
>                Patch18 also changes the node_states initialization
> Patch19       Add config to allow MOVABLE-dedicated node
> Patch20-24    Add kernelcore_max_addr
> Patch25,26       Add online_movable and online_kernel
> 
> 
> 	D) changes
> change V4-v3
> 	rebase.
> 	online_movable/online_kernel can create a zone from empty
> 	or empty a zone
> 
> change V3-v2:
> 	Proper nodemask management
> 
> change V2-V1:
> 
> The original V1 patchset of MOVABLE-dedicated node is here:
> http://comments.gmane.org/gmane.linux.kernel.mm/78122
> 
> The new V2 adds N_MEMORY and a notion of "MOVABLE-dedicated node".
> And fix some related problems.
> 
> The original V1 patchset of "add online_movable" is here:
> https://lkml.org/lkml/2012/7/4/145
> 
> The new V2 discards the MIGRATE_HOTREMOVE approach and uses a more
> straightforward implementation (only 1 patch).
> Lai Jiangshan (22):
>    page_alloc.c: don't subtract unrelated memmap from zone's present
>      pages
>    memory_hotplug: fix missing nodemask management
>    slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>    node: cleanup node_state_attr
>    node_states: introduce N_MEMORY
>    cpuset: use N_MEMORY instead N_HIGH_MEMORY
>    procfs: use N_MEMORY instead N_HIGH_MEMORY
>    memcontrol: use N_MEMORY instead N_HIGH_MEMORY
>    oom: use N_MEMORY instead N_HIGH_MEMORY
>    mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
>    mempolicy: use N_MEMORY instead N_HIGH_MEMORY
>    hugetlb: use N_MEMORY instead N_HIGH_MEMORY
>    vmstat: use N_MEMORY instead N_HIGH_MEMORY
>    kthread: use N_MEMORY instead N_HIGH_MEMORY
>    init: use N_MEMORY instead N_HIGH_MEMORY
>    vmscan: use N_MEMORY instead N_HIGH_MEMORY
>    page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
>      initialization
>    hotplug: update nodemasks management
>    numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
>    page_alloc: add kernelcore_max_addr
>    mm, memory-hotplug: add online_movable and online_kernel
>    memory_hotplug: handle empty zone when online_movable/online_kernel
> 
> Yasuaki Ishimatsu (4):
>    x86: get pg_data_t's memory from other node
>    x86: use memblock_set_current_limit() to set memblock.current_limit
>    memblock: limit memory address from memblock
>    memblock: compare current_limit with end variable at
>      memblock_find_in_range_node()
> 
>   Documentation/cgroups/cpusets.txt   |    2 +-
>   Documentation/kernel-parameters.txt |    9 ++
>   Documentation/memory-hotplug.txt    |   24 +++-
>   arch/x86/kernel/setup.c             |    4 +-
>   arch/x86/mm/init_64.c               |    4 +-
>   arch/x86/mm/numa.c                  |    8 +-
>   drivers/base/memory.c               |   19 ++-
>   drivers/base/node.c                 |   28 +++--
>   fs/proc/kcore.c                     |    2 +-
>   fs/proc/task_mmu.c                  |    4 +-
>   include/linux/cpuset.h              |    2 +-
>   include/linux/memblock.h            |    1 +
>   include/linux/memory.h              |    2 +
>   include/linux/memory_hotplug.h      |   13 ++-
>   include/linux/nodemask.h            |    5 +
>   init/main.c                         |    2 +-
>   kernel/cpuset.c                     |   32 ++--
>   kernel/kthread.c                    |    2 +-
>   mm/Kconfig                          |    8 +
>   mm/hugetlb.c                        |   24 ++--
>   mm/memblock.c                       |   10 +-
>   mm/memcontrol.c                     |   18 ++--
>   mm/memory_hotplug.c                 |  271 ++++++++++++++++++++++++++++++++---
>   mm/mempolicy.c                      |   12 +-
>   mm/migrate.c                        |    2 +-
>   mm/oom_kill.c                       |    2 +-
>   mm/page_alloc.c                     |   96 ++++++++-----
>   mm/page_cgroup.c                    |    2 +-
>   mm/slub.c                           |    4 +-
>   mm/vmscan.c                         |    4 +-
>   mm/vmstat.c                         |    4 +-
>   31 files changed, 476 insertions(+), 144 deletions(-)
> 
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug
  2012-09-11  0:40 ` [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Yasuaki Ishimatsu
@ 2012-09-11  1:22   ` Lai Jiangshan
  2012-09-11  1:37     ` Yasuaki Ishimatsu
  2012-09-11  9:44   ` [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
  1 sibling, 1 reply; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-11  1:22 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton

On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> When I used memory_online to online a hot-added node's memory, the following
> kernel messages were shown. Is this a known issue?

Thank you for your report.

What operations did you perform?

Thanks.
Lai

> 
> [  327.837408] ------------[ cut here ]------------
> [  327.892556] kernel BUG at mm/page_alloc.c:553!
> [  327.945621] invalid opcode: 0000 [#1] SMP 
> [  327.994748] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
> [  328.560103] CPU 0 
> [  328.582021] Pid: 2445, comm: bash Not tainted 3.6.0-rc5-removable-node+ #1 FUJITSU-SV PRIMEQUEST 1800E/SB
> [  328.698524] RIP: 0010:[<ffffffff8116ffdc>]  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
> [  328.802580] RSP: 0018:ffff8807875f9b88  EFLAGS: 00010002
> [  328.866025] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
> [  328.951273] RDX: 0000000000000002 RSI: ffffea007fe00000 RDI: ffff880764801680
> [  329.036522] RBP: ffff8807875f9c38 R08: 0000000001ff8000 R09: ffff880764801740
> [  329.121771] R10: 0000000001800000 R11: 0000000000000001 R12: 0000000000000002
> [  329.207022] R13: ffffea007fe00000 R14: ffff880764801680 R15: ffffea007fe00020
> [  329.292270] FS:  00007ff533e92700(0000) GS:ffff8807c1800000(0000) knlGS:0000000000000000
> [  329.388942] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  329.457575] CR2: ffffffffff600400 CR3: 00000007b6826000 CR4: 00000000000007f0
> [  329.542826] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  329.628075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  329.713326] Process bash (pid: 2445, threadinfo ffff8807875f8000, task ffff8807b72ccca0)
> [  329.809993] Stack:
> [  329.833984]  ffff881ff8000000 0000000000000000 0000000000000000 0000000000000000
> [  329.922788]  ffffffff81c3f160 ffff8807875f9fd8 ffff8807648016e8 0000000100000002
> [  330.011593]  ffff8807875f8000 ffff8807875f8000 0000000000000030 ffff8807c19d0e18
> [  330.100400] Call Trace:
> [  330.129588]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
> [  330.201345]  [<ffffffff81170445>] __free_pages+0x35/0x50
> [  330.264798]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
> [  330.334478]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
> [  330.405197]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
> [  330.474880]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
> [  330.549750]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
> [  330.636050]  [<ffffffff8165464b>] online_pages+0x22b/0x390
> [  330.701584]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
> [  330.773347]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
> [  330.844063]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
> [  330.921013]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
> [  330.985502]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
> [  331.054150]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
> [  331.124867]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
> [  331.190392]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
> [  331.259038]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
> [  331.320411]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
> [  331.380747]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
> [  331.452501] Code: ff ff 0f 0b eb fe 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 48 c7 c7 80 3b c3 81 e8 ae 6c f6 ff 85 c0 0f 85 cb fd ff ff eb 8f <0f> 0b 66 90 eb fc 0f 0b eb fe 49 8b 47 e0 f6 c4 40 0f 1f 00 74 
> [  331.685087] RIP  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
> [  331.761102]  RSP <ffff8807875f9b88>
> [  331.802749] ---[ end trace f3112128f3ab7e75 ]---
> [  331.859455] BUG: sleeping function called from invalid context at mm/slub.c:930
> [  331.946779] in_atomic(): 1, irqs_disabled(): 1, pid: 2445, name: bash
> [  332.023723] INFO: lockdep is turned off.
> [  332.070554] irq event stamp: 301462
> [  332.112196] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
> [  332.220397] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
> [  332.326522] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
> [  332.428493] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
> [  332.528391] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
> [  332.621944] Call Trace:
> [  332.651131]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
> [  332.717699]  [<ffffffff811bddbb>] __kmalloc+0x6b/0x220
> [  332.779079]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
> [  332.859144]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
> [  332.937130]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
> [  333.006812]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
> [  333.087916]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
> [  333.163827]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
> [  333.230391]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
> [  333.294883]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
> [  333.357294]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
> [  333.420746]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
> [  333.481085]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
> [  333.540386]  [<ffffffff8101a8eb>] die+0x5b/0x90
> [  333.594489]  [<ffffffff81670c64>] do_trap+0xc4/0x170
> [  333.653789]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
> [  333.718278]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  333.792117]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  333.870098]  [<ffffffff81670420>] ? restore_args+0x30/0x30
> [  333.935620]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
> [  333.996991]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  334.070825]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
> [  334.144660]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
> [  334.216417]  [<ffffffff81170445>] __free_pages+0x35/0x50
> [  334.279868]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
> [  334.349549]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
> [  334.420269]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
> [  334.489957]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
> [  334.564832]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
> [  334.651125]  [<ffffffff8165464b>] online_pages+0x22b/0x390
> [  334.716654]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
> [  334.788411]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
> [  334.859130]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
> [  334.936078]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
> [  335.000568]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
> [  335.069213]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
> [  335.139931]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
> [  335.205459]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
> [  335.274103]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
> [  335.335477]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
> [  335.395815]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
> [  335.467571] BUG: scheduling while atomic: bash/2445/0x10000004
> [  335.537243] INFO: lockdep is turned off.
> [  335.584074] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
> [  336.149444] irq event stamp: 301462
> [  336.191085] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
> [  336.299285] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
> [  336.405411] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
> [  336.507381] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
> [  336.607278] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
> [  336.700831] Call Trace:
> [  336.730023]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
> [  336.795549]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
> [  336.858999]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
> [  336.924529]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
> [  336.989017]  [<ffffffff811bddc0>] __kmalloc+0x70/0x220
> [  337.050393]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
> [  337.130456]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
> [  337.208445]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
> [  337.278125]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
> [  337.359228]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
> [  337.435138]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
> [  337.501705]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
> [  337.566194]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
> [  337.628607]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
> [  337.692052]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
> [  337.752383]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
> [  337.811683]  [<ffffffff8101a8eb>] die+0x5b/0x90
> [  337.865790]  [<ffffffff81670c64>] do_trap+0xc4/0x170
> [  337.925090]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
> [  337.989579]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  338.063413]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  338.141400]  [<ffffffff81670420>] ? restore_args+0x30/0x30
> [  338.206928]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
> [  338.268305]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  338.342137]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
> [  338.415973]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
> [  338.487732]  [<ffffffff81170445>] __free_pages+0x35/0x50
> [  338.551181]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
> [  338.620863]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
> [  338.691583]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
> [  338.761264]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
> [  338.836137]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
> [  338.922431]  [<ffffffff8165464b>] online_pages+0x22b/0x390
> [  338.987959]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
> [  339.059718]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
> [  339.130438]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
> [  339.207385]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
> [  339.271879]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
> [  339.340520]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
> [  339.411238]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
> [  339.476766]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
> [  339.545410]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
> [  339.606784]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
> [  339.667121]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
> 
> [  339.739266] BUG: sleeping function called from invalid context at mm/slub.c:930
> [  339.826795] in_atomic(): 1, irqs_disabled(): 0, pid: 2445, name: bash
> [  339.903925] INFO: lockdep is turned off.
> [  339.950938] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
> [  340.044702] Call Trace:
> [  340.073958]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
> [  340.140704]  [<ffffffff811bd4fb>] kmem_cache_alloc_trace+0x4b/0x1d0
> [  340.215759]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
> [  340.293928]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
> [  340.363794]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
> [  340.445071]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
> [  340.521163]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
> [  340.587902]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
> [  340.652571]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
> [  340.715155]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
> [  340.778739]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
> [  340.839300]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
> [  340.898737]  [<ffffffff8101a8eb>] die+0x5b/0x90
> [  340.953042]  [<ffffffff81670c64>] do_trap+0xc4/0x170
> [  341.012562]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
> [  341.077211]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  341.151258]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  341.229428]  [<ffffffff81670420>] ? restore_args+0x30/0x30
> [  341.295109]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
> [  341.356736]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  341.430789]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
> [  341.504832]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
> [  341.576803]  [<ffffffff81170445>] __free_pages+0x35/0x50
> [  341.640459]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
> [  341.710291]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
> [  341.781224]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
> [  341.851126]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
> [  341.926211]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
> [  342.012703]  [<ffffffff8165464b>] online_pages+0x22b/0x390
> [  342.078434]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
> [  342.150325]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
> [  342.221165]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
> [  342.298323]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
> [  342.362966]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
> [  342.431796]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
> [  342.502689]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
> [  342.568387]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
> [  342.637216]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
> [  342.698761]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
> [  342.759236]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
> [  342.831184] BUG: scheduling while atomic: bash/2445/0x10000004
> [  342.901189] INFO: lockdep is turned off.
> [  342.948190] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
> [  343.519830] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
> [  343.613552] Call Trace:
> [  343.642773]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
> [  343.708445]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
> [  343.772084]  [<ffffffff814f5acf>] ? efivar_create_sysfs_entry+0x5f/0x1b0
> [  343.852324]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
> [  343.918004]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
> [  343.982676]  [<ffffffff811bd500>] kmem_cache_alloc_trace+0x50/0x1d0
> [  344.057721]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
> [  344.135874]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
> [  344.205752]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
> [  344.287007]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
> [  344.363106]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
> [  344.429885]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
> [  344.494584]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
> [  344.557185]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
> [  344.620847]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
> [  344.681390]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
> [  344.740918]  [<ffffffff8101a8eb>] die+0x5b/0x90
> [  344.795241]  [<ffffffff81670c64>] do_trap+0xc4/0x170
> [  344.854734]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
> [  344.919416]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  344.993473]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  345.071682]  [<ffffffff81670420>] ? restore_args+0x30/0x30
> [  345.137431]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
> [  345.199000]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
> [  345.273041]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
> [  345.347082]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
> [  345.419041]  [<ffffffff81170445>] __free_pages+0x35/0x50
> [  345.482682]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
> [  345.552584]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
> [  345.623511]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
> [  345.693391]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
> [  345.768476]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
> [  345.854961]  [<ffffffff8165464b>] online_pages+0x22b/0x390
> [  345.920712]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
> [  345.992687]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
> [  346.063626]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
> [  346.140786]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
> [  346.205491]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
> [  346.274370]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
> [  346.345299]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
> [  346.411046]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
> [  346.479894]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
> [  346.541493]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
> [  346.602042]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
> ...
> 
> Thanks,
> Yasuaki Ishimatsu
> 
> 2012/09/10 17:58, Lai Jiangshan wrote:
>> 	A) Introduction:
>>
>> This patchset adds MOVABLE-dedicated node and online_movable for memory-management.
>>
>> It is used for anti-fragmentation(hugepage, big-order allocation...),
>> hot-removal-of-memory(virtualization, power-conserve, move memory between systems
>> to make better utilities of memories).
>>
>> This patchset is based on 650470d1da17c20bf9700f9446775a01cbda52c3 of newest tip tree.
>>
>> 	B) User Interface:
>>
>> When users(big system manager) need config some node/memory as MOVABLE:
>> 	1 Use kernelcore_max_addr=XX when boot
>> 	2 Use movable_online hotplug action when running
>> We may introduce some more convenient interface, such as
>> 	movable_node=NODE_LIST boot option.
>>
>> 	C) Patches
>>
>> Patch1-3      Fix problems of the current code.(all related with hotplug)
>> Patch4        cleanup for node_state_attr
>> Patch5        introduce N_MEMORY
>> Patch6-18     use N_MEMORY instead N_HIGH_MEMORY.
>>                The patches are separated by subsystem,
>>                *these conversions was(must be) checked carefully*.
>>                Patch18 also changes the node_states initialization
>> Patch19       Add config to allow MOVABLE-dedicated node
>> Patch20-24    Add kernelcore_max_addr
>> Patch25,26       Add online_movable and online_kernel
>>
>>
>> 	D) changes
>> change V4-v3
>> 	rebase.
>> 	online_movable/online_kernel can create a zone from empty
>> 	or empyt a zone
>>
>> change V3-v2:
>> 	Proper nodemask management
>>
>> change V2-V1:
>>
>> The original V1 patchset of MOVABLE-dedicated node is here:
>> http://comments.gmane.org/gmane.linux.kernel.mm/78122
>>
>> The new V2 adds N_MEMORY and a notion of "MOVABLE-dedicated node".
>> And fix some related problems.
>>
>> The orignal V1 patchset of "add online_movable" is here:
>> https://lkml.org/lkml/2012/7/4/145
>>
>> The new V2 discards the MIGRATE_HOTREMOVE approach, and use a more straight
>> implementation(only 1 patch).
>> Lai Jiangshan (22):
>>    page_alloc.c: don't subtract unrelated memmap from zone's present
>>      pages
>>    memory_hotplug: fix missing nodemask management
>>    slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>    node: cleanup node_state_attr
>>    node_states: introduce N_MEMORY
>>    cpuset: use N_MEMORY instead N_HIGH_MEMORY
>>    procfs: use N_MEMORY instead N_HIGH_MEMORY
>>    memcontrol: use N_MEMORY instead N_HIGH_MEMORY
>>    oom: use N_MEMORY instead N_HIGH_MEMORY
>>    mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
>>    mempolicy: use N_MEMORY instead N_HIGH_MEMORY
>>    hugetlb: use N_MEMORY instead N_HIGH_MEMORY
>>    vmstat: use N_MEMORY instead N_HIGH_MEMORY
>>    kthread: use N_MEMORY instead N_HIGH_MEMORY
>>    init: use N_MEMORY instead N_HIGH_MEMORY
>>    vmscan: use N_MEMORY instead N_HIGH_MEMORY
>>    page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
>>      initialization
>>    hotplug: update nodemasks management
>>    numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
>>    page_alloc: add kernelcore_max_addr
>>    mm, memory-hotplug: add online_movable and online_kernel
>>    memory_hotplug: handle empty zone when online_movable/online_kernel
>>
>> Yasuaki Ishimatsu (4):
>>    x86: get pg_data_t's memory from other node
>>    x86: use memblock_set_current_limit() to set memblock.current_limit
>>    memblock: limit memory address from memblock
>>    memblock: compare current_limit with end variable at
>>      memblock_find_in_range_node()
>>
>>   Documentation/cgroups/cpusets.txt   |    2 +-
>>   Documentation/kernel-parameters.txt |    9 ++
>>   Documentation/memory-hotplug.txt    |   24 +++-
>>   arch/x86/kernel/setup.c             |    4 +-
>>   arch/x86/mm/init_64.c               |    4 +-
>>   arch/x86/mm/numa.c                  |    8 +-
>>   drivers/base/memory.c               |   19 ++-
>>   drivers/base/node.c                 |   28 +++--
>>   fs/proc/kcore.c                     |    2 +-
>>   fs/proc/task_mmu.c                  |    4 +-
>>   include/linux/cpuset.h              |    2 +-
>>   include/linux/memblock.h            |    1 +
>>   include/linux/memory.h              |    2 +
>>   include/linux/memory_hotplug.h      |   13 ++-
>>   include/linux/nodemask.h            |    5 +
>>   init/main.c                         |    2 +-
>>   kernel/cpuset.c                     |   32 ++--
>>   kernel/kthread.c                    |    2 +-
>>   mm/Kconfig                          |    8 +
>>   mm/hugetlb.c                        |   24 ++--
>>   mm/memblock.c                       |   10 +-
>>   mm/memcontrol.c                     |   18 ++--
>>   mm/memory_hotplug.c                 |  271 ++++++++++++++++++++++++++++++++---
>>   mm/mempolicy.c                      |   12 +-
>>   mm/migrate.c                        |    2 +-
>>   mm/oom_kill.c                       |    2 +-
>>   mm/page_alloc.c                     |   96 ++++++++-----
>>   mm/page_cgroup.c                    |    2 +-
>>   mm/slub.c                           |    4 +-
>>   mm/vmscan.c                         |    4 +-
>>   mm/vmstat.c                         |    4 +-
>>   31 files changed, 476 insertions(+), 144 deletions(-)
>>
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug
  2012-09-11  1:22   ` Lai Jiangshan
@ 2012-09-11  1:37     ` Yasuaki Ishimatsu
  2012-09-11  3:09       ` Lai Jiangshan
  0 siblings, 1 reply; 35+ messages in thread
From: Yasuaki Ishimatsu @ 2012-09-11  1:37 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton

Hi Lai,

2012/09/11 10:22, Lai Jiangshan wrote:
> On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
>> Hi Lai,
>>
>> Using memory_online to hot-added node's memory, the following kernel messages
>> were shown. Is this a known issue?
>
> Thank you for your report.
>
> What operations did you perform?

My operations are as follows:

1. Hot-add a new node with the container driver.
    On my system, the container driver hot-adds a new node that includes CPUs
    and memory.

2. Echo online_movable to the hot-added node's memory.
    When the container driver hot-adds the new node, my system creates the node2
    sysfs directory, which contains the memory768-memory1023 entries. So I echo
    "online_movable" to the memory1023/state file:
    # echo online_movable > memory1023/state
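
For anyone reproducing step 2, here is a minimal sketch that checks the block's
current state before writing to it. It assumes the memory block entries live
under /sys/devices/system/memory and that the kernel carries this patchset, so
that "online_movable" is an accepted value. The MEM_SYSFS variable and the
online_movable_block helper are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: online one memory block as movable, refusing to touch
# a block that is not currently offline.
# MEM_SYSFS is a hypothetical override so the helper can be dry-run
# against a scratch directory instead of the real sysfs tree.
MEM_SYSFS="${MEM_SYSFS:-/sys/devices/system/memory}"

online_movable_block() {
    block="$1"
    state_file="$MEM_SYSFS/$block/state"

    # The block must exist before we try to change its state.
    if [ ! -f "$state_file" ]; then
        echo "no such memory block: $block" >&2
        return 1
    fi

    # Only an offline block can be onlined.
    state=$(cat "$state_file")
    if [ "$state" != "offline" ]; then
        echo "$block is already $state" >&2
        return 1
    fi

    # Same write as the manual command in step 2.
    echo online_movable > "$state_file"
}
```

Usage (as root): online_movable_block memory1023. On a real system the state
file is expected to read back "online" afterwards rather than the string that
was written.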

Thanks,
Yasuaki Ishimatsu

>
> Thanks.
> Lai
>
>>
>> [  327.837408] ------------[ cut here ]------------
>> [  327.892556] kernel BUG at mm/page_alloc.c:553!
>> [  327.945621] invalid opcode: 0000 [#1] SMP
>> [  327.994748] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>> [  328.560103] CPU 0
>> [  328.582021] Pid: 2445, comm: bash Not tainted 3.6.0-rc5-removable-node+ #1 FUJITSU-SV PRIMEQUEST 1800E/SB
>> [  328.698524] RIP: 0010:[<ffffffff8116ffdc>]  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
>> [  328.802580] RSP: 0018:ffff8807875f9b88  EFLAGS: 00010002
>> [  328.866025] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
>> [  328.951273] RDX: 0000000000000002 RSI: ffffea007fe00000 RDI: ffff880764801680
>> [  329.036522] RBP: ffff8807875f9c38 R08: 0000000001ff8000 R09: ffff880764801740
>> [  329.121771] R10: 0000000001800000 R11: 0000000000000001 R12: 0000000000000002
>> [  329.207022] R13: ffffea007fe00000 R14: ffff880764801680 R15: ffffea007fe00020
>> [  329.292270] FS:  00007ff533e92700(0000) GS:ffff8807c1800000(0000) knlGS:0000000000000000
>> [  329.388942] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  329.457575] CR2: ffffffffff600400 CR3: 00000007b6826000 CR4: 00000000000007f0
>> [  329.542826] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  329.628075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  329.713326] Process bash (pid: 2445, threadinfo ffff8807875f8000, task ffff8807b72ccca0)
>> [  329.809993] Stack:
>> [  329.833984]  ffff881ff8000000 0000000000000000 0000000000000000 0000000000000000
>> [  329.922788]  ffffffff81c3f160 ffff8807875f9fd8 ffff8807648016e8 0000000100000002
>> [  330.011593]  ffff8807875f8000 ffff8807875f8000 0000000000000030 ffff8807c19d0e18
>> [  330.100400] Call Trace:
>> [  330.129588]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
>> [  330.201345]  [<ffffffff81170445>] __free_pages+0x35/0x50
>> [  330.264798]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
>> [  330.334478]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
>> [  330.405197]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
>> [  330.474880]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
>> [  330.549750]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
>> [  330.636050]  [<ffffffff8165464b>] online_pages+0x22b/0x390
>> [  330.701584]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
>> [  330.773347]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
>> [  330.844063]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
>> [  330.921013]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
>> [  330.985502]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
>> [  331.054150]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
>> [  331.124867]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
>> [  331.190392]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
>> [  331.259038]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
>> [  331.320411]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
>> [  331.380747]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
>> [  331.452501] Code: ff ff 0f 0b eb fe 0f 0b 0f 1f 80 00 00 00 00 eb f7 0f 0b eb fe 48 c7 c7 80 3b c3 81 e8 ae 6c f6 ff 85 c0 0f 85 cb fd ff ff eb 8f <0f> 0b 66 90 eb fc 0f 0b eb fe 49 8b 47 e0 f6 c4 40 0f 1f 00 74
>> [  331.685087] RIP  [<ffffffff8116ffdc>] free_pcppages_bulk+0x4ec/0x540
>> [  331.761102]  RSP <ffff8807875f9b88>
>> [  331.802749] ---[ end trace f3112128f3ab7e75 ]---
>> [  331.859455] BUG: sleeping function called from invalid context at mm/slub.c:930
>> [  331.946779] in_atomic(): 1, irqs_disabled(): 1, pid: 2445, name: bash
>> [  332.023723] INFO: lockdep is turned off.
>> [  332.070554] irq event stamp: 301462
>> [  332.112196] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
>> [  332.220397] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
>> [  332.326522] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
>> [  332.428493] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
>> [  332.528391] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
>> [  332.621944] Call Trace:
>> [  332.651131]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
>> [  332.717699]  [<ffffffff811bddbb>] __kmalloc+0x6b/0x220
>> [  332.779079]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
>> [  332.859144]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
>> [  332.937130]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
>> [  333.006812]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
>> [  333.087916]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
>> [  333.163827]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
>> [  333.230391]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
>> [  333.294883]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
>> [  333.357294]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
>> [  333.420746]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
>> [  333.481085]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
>> [  333.540386]  [<ffffffff8101a8eb>] die+0x5b/0x90
>> [  333.594489]  [<ffffffff81670c64>] do_trap+0xc4/0x170
>> [  333.653789]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
>> [  333.718278]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  333.792117]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> [  333.870098]  [<ffffffff81670420>] ? restore_args+0x30/0x30
>> [  333.935620]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
>> [  333.996991]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  334.070825]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
>> [  334.144660]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
>> [  334.216417]  [<ffffffff81170445>] __free_pages+0x35/0x50
>> [  334.279868]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
>> [  334.349549]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
>> [  334.420269]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
>> [  334.489957]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
>> [  334.564832]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
>> [  334.651125]  [<ffffffff8165464b>] online_pages+0x22b/0x390
>> [  334.716654]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
>> [  334.788411]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
>> [  334.859130]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
>> [  334.936078]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
>> [  335.000568]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
>> [  335.069213]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
>> [  335.139931]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
>> [  335.205459]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
>> [  335.274103]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
>> [  335.335477]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
>> [  335.395815]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
>> [  335.467571] BUG: scheduling while atomic: bash/2445/0x10000004
>> [  335.537243] INFO: lockdep is turned off.
>> [  335.584074] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>> [  336.149444] irq event stamp: 301462
>> [  336.191085] hardirqs last  enabled at (301461): [<ffffffff816700f0>] _raw_spin_unlock_irq+0x30/0x50
>> [  336.299285] hardirqs last disabled at (301462): [<ffffffff8166f5cf>] _raw_spin_lock_irq+0x1f/0x90
>> [  336.405411] softirqs last  enabled at (301450): [<ffffffff81076a9c>] __do_softirq+0x18c/0x3e0
>> [  336.507381] softirqs last disabled at (301445): [<ffffffff8167af3c>] call_softirq+0x1c/0x30
>> [  336.607278] Pid: 2445, comm: bash Tainted: G      D      3.6.0-rc5-removable-node+ #1
>> [  336.700831] Call Trace:
>> [  336.730023]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
>> [  336.795549]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
>> [  336.858999]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
>> [  336.924529]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
>> [  336.989017]  [<ffffffff811bddc0>] __kmalloc+0x70/0x220
>> [  337.050393]  [<ffffffff814f5ab1>] ? efivar_create_sysfs_entry+0x41/0x1b0
>> [  337.130456]  [<ffffffff814f5ab1>] efivar_create_sysfs_entry+0x41/0x1b0
>> [  337.208445]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
>> [  337.278125]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
>> [  337.359228]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
>> [  337.435138]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
>> [  337.501705]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
>> [  337.566194]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
>> [  337.628607]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
>> [  337.692052]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
>> [  337.752383]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
>> [  337.811683]  [<ffffffff8101a8eb>] die+0x5b/0x90
>> [  337.865790]  [<ffffffff81670c64>] do_trap+0xc4/0x170
>> [  337.925090]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
>> [  337.989579]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  338.063413]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> [  338.141400]  [<ffffffff81670420>] ? restore_args+0x30/0x30
>> [  338.206928]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
>> [  338.268305]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  338.342137]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
>> [  338.415973]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
>> [  338.487732]  [<ffffffff81170445>] __free_pages+0x35/0x50
>> [  338.551181]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
>> [  338.620863]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
>> [  338.691583]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
>> [  338.761264]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
>> [  338.836137]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
>> [  338.922431]  [<ffffffff8165464b>] online_pages+0x22b/0x390
>> [  338.987959]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
>> [  339.059718]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
>> [  339.130438]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
>> [  339.207385]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
>> [  339.271879]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
>> [  339.340520]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
>> [  339.411238]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
>> [  339.476766]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
>> [  339.545410]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
>> [  339.606784]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
>> [  339.667121]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
>>
>> [  339.739266] BUG: sleeping function called from invalid context at mm/slub.c:930
>> [  339.826795] in_atomic(): 1, irqs_disabled(): 0, pid: 2445, name: bash
>> [  339.903925] INFO: lockdep is turned off.
>> [  339.950938] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
>> [  340.044702] Call Trace:
>> [  340.073958]  [<ffffffff810a943a>] __might_sleep+0x18a/0x240
>> [  340.140704]  [<ffffffff811bd4fb>] kmem_cache_alloc_trace+0x4b/0x1d0
>> [  340.215759]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
>> [  340.293928]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
>> [  340.363794]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
>> [  340.445071]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
>> [  340.521163]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
>> [  340.587902]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
>> [  340.652571]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
>> [  340.715155]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
>> [  340.778739]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
>> [  340.839300]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
>> [  340.898737]  [<ffffffff8101a8eb>] die+0x5b/0x90
>> [  340.953042]  [<ffffffff81670c64>] do_trap+0xc4/0x170
>> [  341.012562]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
>> [  341.077211]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  341.151258]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> [  341.229428]  [<ffffffff81670420>] ? restore_args+0x30/0x30
>> [  341.295109]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
>> [  341.356736]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  341.430789]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
>> [  341.504832]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
>> [  341.576803]  [<ffffffff81170445>] __free_pages+0x35/0x50
>> [  341.640459]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
>> [  341.710291]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
>> [  341.781224]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
>> [  341.851126]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
>> [  341.926211]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
>> [  342.012703]  [<ffffffff8165464b>] online_pages+0x22b/0x390
>> [  342.078434]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
>> [  342.150325]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
>> [  342.221165]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
>> [  342.298323]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
>> [  342.362966]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
>> [  342.431796]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
>> [  342.502689]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
>> [  342.568387]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
>> [  342.637216]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
>> [  342.698761]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
>> [  342.759236]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
>> [  342.831184] BUG: scheduling while atomic: bash/2445/0x10000004
>> [  342.901189] INFO: lockdep is turned off.
>> [  342.948190] Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma i7core_edac edac_core sg e1000e igb dca sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>> [  343.519830] Pid: 2445, comm: bash Tainted: G      D W    3.6.0-rc5-removable-node+ #1
>> [  343.613552] Call Trace:
>> [  343.642773]  [<ffffffff810a7f0a>] __schedule_bug+0x6a/0x90
>> [  343.708445]  [<ffffffff8166e128>] __schedule+0x7d8/0x880
>> [  343.772084]  [<ffffffff814f5acf>] ? efivar_create_sysfs_entry+0x5f/0x1b0
>> [  343.852324]  [<ffffffff810acc6a>] __cond_resched+0x2a/0x40
>> [  343.918004]  [<ffffffff8166e260>] _cond_resched+0x30/0x40
>> [  343.982676]  [<ffffffff811bd500>] kmem_cache_alloc_trace+0x50/0x1d0
>> [  344.057721]  [<ffffffff814f5acf>] efivar_create_sysfs_entry+0x5f/0x1b0
>> [  344.135874]  [<ffffffff814f5f9b>] efi_pstore_write+0x37b/0x3a0
>> [  344.205752]  [<ffffffff81670187>] ? _raw_spin_unlock_irqrestore+0x77/0x80
>> [  344.287007]  [<ffffffff8106d022>] ? kmsg_dump_get_buffer+0x1e2/0x2c0
>> [  344.363106]  [<ffffffff812bf3d0>] ? pstore_dump+0x1b0/0x220
>> [  344.429885]  [<ffffffff812bf34f>] pstore_dump+0x12f/0x220
>> [  344.494584]  [<ffffffff8106faab>] kmsg_dump+0x11b/0x2a0
>> [  344.557185]  [<ffffffff8106f9b6>] ? kmsg_dump+0x26/0x2a0
>> [  344.620847]  [<ffffffff8106bb6d>] oops_exit+0x1d/0x20
>> [  344.681390]  [<ffffffff816712fe>] oops_end+0x7e/0xf0
>> [  344.740918]  [<ffffffff8101a8eb>] die+0x5b/0x90
>> [  344.795241]  [<ffffffff81670c64>] do_trap+0xc4/0x170
>> [  344.854734]  [<ffffffff810186f5>] do_invalid_op+0x95/0xb0
>> [  344.919416]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  344.993473]  [<ffffffff8134a56d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> [  345.071682]  [<ffffffff81670420>] ? restore_args+0x30/0x30
>> [  345.137431]  [<ffffffff8167acbb>] invalid_op+0x1b/0x20
>> [  345.199000]  [<ffffffff8116ffdc>] ? free_pcppages_bulk+0x4ec/0x540
>> [  345.273041]  [<ffffffff8116fc09>] ? free_pcppages_bulk+0x119/0x540
>> [  345.347082]  [<ffffffff811703c7>] free_hot_cold_page+0x187/0x1d0
>> [  345.419041]  [<ffffffff81170445>] __free_pages+0x35/0x50
>> [  345.482682]  [<ffffffff811c05ec>] __online_page_free+0x1c/0x20
>> [  345.552584]  [<ffffffff811c0616>] generic_online_page+0x26/0x30
>> [  345.623511]  [<ffffffff811c0271>] online_pages_range+0x61/0x90
>> [  345.693391]  [<ffffffff81078240>] walk_system_ram_range+0x140/0x150
>> [  345.768476]  [<ffffffff811c0210>] ? __online_page_increment_counters+0x20/0x20
>> [  345.854961]  [<ffffffff8165464b>] online_pages+0x22b/0x390
>> [  345.920712]  [<ffffffff8144d2ec>] memory_block_action+0xbc/0x1a0
>> [  345.992687]  [<ffffffff8166cbfa>] ? mutex_lock_nested+0x4a/0x60
>> [  346.063626]  [<ffffffff8144d453>] memory_block_change_state+0x83/0xf0
>> [  346.140786]  [<ffffffff8118ff9c>] ? might_fault+0x5c/0xb0
>> [  346.205491]  [<ffffffff8144d5f7>] store_mem_state+0x137/0x180
>> [  346.274370]  [<ffffffff8124a517>] ? sysfs_write_file+0x87/0x100
>> [  346.345299]  [<ffffffff814375f0>] dev_attr_store+0x20/0x30
>> [  346.411046]  [<ffffffff8124a533>] sysfs_write_file+0xa3/0x100
>> [  346.479894]  [<ffffffff811cc6d0>] vfs_write+0xd0/0x1a0
>> [  346.541493]  [<ffffffff811cc8a4>] sys_write+0x54/0xa0
>> [  346.602042]  [<ffffffff81679bd9>] system_call_fastpath+0x16/0x1b
>> ...
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> 2012/09/10 17:58, Lai Jiangshan wrote:
>>> 	A) Introduction:
>>>
>>> This patchset adds MOVABLE-dedicated node and online_movable for memory-management.
>>>
>>> It is used for anti-fragmentation(hugepage, big-order allocation...),
>>> hot-removal-of-memory(virtualization, power-conserve, move memory between systems
>>> to make better utilities of memories).
>>>
>>> This patchset is based on 650470d1da17c20bf9700f9446775a01cbda52c3 of newest tip tree.
>>>
>>> 	B) User Interface:
>>>
>>> When users(big system manager) need config some node/memory as MOVABLE:
>>> 	1 Use kernelcore_max_addr=XX when boot
>>> 	2 Use movable_online hotplug action when running
>>> We may introduce some more convenient interface, such as
>>> 	movable_node=NODE_LIST boot option.
>>>
>>> 	C) Patches
>>>
>>> Patch1-3      Fix problems of the current code.(all related with hotplug)
>>> Patch4        cleanup for node_state_attr
>>> Patch5        introduce N_MEMORY
>>> Patch6-18     use N_MEMORY instead N_HIGH_MEMORY.
>>>                 The patches are separated by subsystem,
>>>                 *these conversions was(must be) checked carefully*.
>>>                 Patch18 also changes the node_states initialization
>>> Patch19       Add config to allow MOVABLE-dedicated node
>>> Patch20-24    Add kernelcore_max_addr
>>> Patch25,26       Add online_movable and online_kernel
>>>
>>>
>>> 	D) changes
>>> change V4-v3
>>> 	rebase.
>>> 	online_movable/online_kernel can create a zone from empty
>>> 	or empyt a zone
>>>
>>> change V3-v2:
>>> 	Proper nodemask management
>>>
>>> change V2-V1:
>>>
>>> The original V1 patchset of MOVABLE-dedicated node is here:
>>> http://comments.gmane.org/gmane.linux.kernel.mm/78122
>>>
>>> The new V2 adds N_MEMORY and a notion of "MOVABLE-dedicated node".
>>> And fix some related problems.
>>>
>>> The orignal V1 patchset of "add online_movable" is here:
>>> https://lkml.org/lkml/2012/7/4/145
>>>
>>> The new V2 discards the MIGRATE_HOTREMOVE approach, and use a more straight
>>> implementation(only 1 patch).
>>> Lai Jiangshan (22):
>>>     page_alloc.c: don't subtract unrelated memmap from zone's present
>>>       pages
>>>     memory_hotplug: fix missing nodemask management
>>>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>>     node: cleanup node_state_attr
>>>     node_states: introduce N_MEMORY
>>>     cpuset: use N_MEMORY instead N_HIGH_MEMORY
>>>     procfs: use N_MEMORY instead N_HIGH_MEMORY
>>>     memcontrol: use N_MEMORY instead N_HIGH_MEMORY
>>>     oom: use N_MEMORY instead N_HIGH_MEMORY
>>>     mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
>>>     mempolicy: use N_MEMORY instead N_HIGH_MEMORY
>>>     hugetlb: use N_MEMORY instead N_HIGH_MEMORY
>>>     vmstat: use N_MEMORY instead N_HIGH_MEMORY
>>>     kthread: use N_MEMORY instead N_HIGH_MEMORY
>>>     init: use N_MEMORY instead N_HIGH_MEMORY
>>>     vmscan: use N_MEMORY instead N_HIGH_MEMORY
>>>     page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
>>>       initialization
>>>     hotplug: update nodemasks management
>>>     numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
>>>     page_alloc: add kernelcore_max_addr
>>>     mm, memory-hotplug: add online_movable and online_kernel
>>>     memory_hotplug: handle empty zone when online_movable/online_kernel
>>>
>>> Yasuaki Ishimatsu (4):
>>>     x86: get pg_data_t's memory from other node
>>>     x86: use memblock_set_current_limit() to set memblock.current_limit
>>>     memblock: limit memory address from memblock
>>>     memblock: compare current_limit with end variable at
>>>       memblock_find_in_range_node()
>>>
>>>    Documentation/cgroups/cpusets.txt   |    2 +-
>>>    Documentation/kernel-parameters.txt |    9 ++
>>>    Documentation/memory-hotplug.txt    |   24 +++-
>>>    arch/x86/kernel/setup.c             |    4 +-
>>>    arch/x86/mm/init_64.c               |    4 +-
>>>    arch/x86/mm/numa.c                  |    8 +-
>>>    drivers/base/memory.c               |   19 ++-
>>>    drivers/base/node.c                 |   28 +++--
>>>    fs/proc/kcore.c                     |    2 +-
>>>    fs/proc/task_mmu.c                  |    4 +-
>>>    include/linux/cpuset.h              |    2 +-
>>>    include/linux/memblock.h            |    1 +
>>>    include/linux/memory.h              |    2 +
>>>    include/linux/memory_hotplug.h      |   13 ++-
>>>    include/linux/nodemask.h            |    5 +
>>>    init/main.c                         |    2 +-
>>>    kernel/cpuset.c                     |   32 ++--
>>>    kernel/kthread.c                    |    2 +-
>>>    mm/Kconfig                          |    8 +
>>>    mm/hugetlb.c                        |   24 ++--
>>>    mm/memblock.c                       |   10 +-
>>>    mm/memcontrol.c                     |   18 ++--
>>>    mm/memory_hotplug.c                 |  271 ++++++++++++++++++++++++++++++++---
>>>    mm/mempolicy.c                      |   12 +-
>>>    mm/migrate.c                        |    2 +-
>>>    mm/oom_kill.c                       |    2 +-
>>>    mm/page_alloc.c                     |   96 ++++++++-----
>>>    mm/page_cgroup.c                    |    2 +-
>>>    mm/slub.c                           |    4 +-
>>>    mm/vmscan.c                         |    4 +-
>>>    mm/vmstat.c                         |    4 +-
>>>    31 files changed, 476 insertions(+), 144 deletions(-)
>>>
>>>
>>
>>
>>
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management
  2012-09-10  8:58 ` [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management Lai Jiangshan
@ 2012-09-11  2:55   ` Wen Congyang
  0 siblings, 0 replies; 35+ messages in thread
From: Wen Congyang @ 2012-09-11  2:55 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton

At 09/10/2012 04:58 PM, Lai Jiangshan Wrote:
> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
> it forgot to manage node_states[N_NORMAL_MEMORY]. fix it.
> 
> Add check_nodemasks_changes_online() and check_nodemasks_changes_offline
> to detect do node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotplug.
> 
> Also add @status_change_nid_normal to struct memory_notify, thus
> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
> are changed.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  Documentation/memory-hotplug.txt |    5 ++-
>  include/linux/memory.h           |    1 +
>  mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
>  3 files changed, 83 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
> index 6d0c251..6e6cbc7 100644
> --- a/Documentation/memory-hotplug.txt
> +++ b/Documentation/memory-hotplug.txt
> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
>  struct memory_notify {
>         unsigned long start_pfn;
>         unsigned long nr_pages;
> +       int status_change_nid_normal;
>         int status_change_nid;
>  }
>  
>  start_pfn is start_pfn of online/offline memory.
>  nr_pages is # of pages of online/offline memory.
> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>  set/clear. It means a new(memoryless) node gets new memory by online and a
>  node loses all memory. If this is -1, then nodemask status is not changed.
> -If status_changed_nid >= 0, callback should create/discard structures for the
> +If status_changed_nid* >= 0, callback should create/discard structures for the
>  node if necessary.
>  
>  --------------
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 1ac7f6e..6b9202b 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>  struct memory_notify {
>  	unsigned long start_pfn;
>  	unsigned long nr_pages;
> +	int status_change_nid_normal;
>  	int status_change_nid;
>  };
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3ad25f9..8c3bcf6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -456,6 +456,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	return 0;
>  }
>  
> +static void check_nodemasks_changes_online(unsigned long nr_pages,
> +	struct zone *zone, struct memory_notify *arg)
> +{
> +	int nid = zone_to_nid(zone);
> +	enum zone_type zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;
> +
> +	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
> +		arg->status_change_nid_normal = nid;
> +	else
> +		arg->status_change_nid_normal = -1;
> +
> +	if (!node_state(nid, N_HIGH_MEMORY))
> +		arg->status_change_nid = nid;
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void set_nodemasks(int node, struct memory_notify *arg)
> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_set_state(node, N_NORMAL_MEMORY);
> +
> +	node_set_state(node, N_HIGH_MEMORY);
> +}
> +
>  
>  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  {
> @@ -467,13 +495,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  	struct memory_notify arg;
>  
>  	lock_memory_hotplug();
> +	/*
> +	 * This doesn't need a lock to do pfn_to_page().
> +	 * The section can't be removed here because of the
> +	 * memory_block->state_mutex.
> +	 */

When we hot-remove memory, the section is removed without holding
memory_block->state_mutex. That is fixed by the following patch:
https://lkml.org/lkml/2012/9/5/162

Thanks
Wen Congyang

> +	zone = page_zone(pfn_to_page(pfn));
> +
>  	arg.start_pfn = pfn;
>  	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> +	check_nodemasks_changes_online(nr_pages, zone, &arg);
>  
>  	nid = page_to_nid(pfn_to_page(pfn));
> -	if (node_present_pages(nid) == 0)
> -		arg.status_change_nid = nid;
>  
>  	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>  	ret = notifier_to_errno(ret);
> @@ -483,12 +516,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  		return ret;
>  	}
>  	/*
> -	 * This doesn't need a lock to do pfn_to_page().
> -	 * The section can't be removed here because of the
> -	 * memory_block->state_mutex.
> -	 */
> -	zone = page_zone(pfn_to_page(pfn));
> -	/*
>  	 * If this zone is not populated, then it is not in zonelist.
>  	 * This means the page allocator ignores this zone.
>  	 * So, zonelist must be updated after online.
> @@ -513,7 +540,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  	zone->present_pages += onlined_pages;
>  	zone->zone_pgdat->node_present_pages += onlined_pages;
>  	if (onlined_pages) {
> -		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
> +		set_nodemasks(zone_to_nid(zone), &arg);
>  		if (need_zonelists_rebuild)
>  			build_all_zonelists(NULL, zone);
>  		else
> @@ -866,6 +893,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>  	return offlined;
>  }
>  
> +static void check_nodemasks_changes_offline(unsigned long nr_pages,
> +		struct zone *zone, struct memory_notify *arg)
> +{
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long present_pages = 0;
> +	enum zone_type zt, zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;
> +
> +	for (zt = 0; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
> +		arg->status_change_nid_normal = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid_normal = -1;
> +
> +	zone_last = ZONE_MOVABLE;
> +	for (; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (nr_pages >= present_pages)
> +		arg->status_change_nid = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void clear_nodemasks(int node, struct memory_notify *arg)
> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_clear_state(node, N_NORMAL_MEMORY);
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		return;
> +
> +	if (arg->status_change_nid >= 0)
> +		node_clear_state(node, N_HIGH_MEMORY);
> +}
> +
>  static int __ref offline_pages(unsigned long start_pfn,
>  		  unsigned long end_pfn, unsigned long timeout)
>  {
> @@ -899,9 +964,7 @@ static int __ref offline_pages(unsigned long start_pfn,
>  
>  	arg.start_pfn = start_pfn;
>  	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> -	if (nr_pages >= node_present_pages(node))
> -		arg.status_change_nid = node;
> +	check_nodemasks_changes_offline(nr_pages, zone, &arg);
>  
>  	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>  	ret = notifier_to_errno(ret);
> @@ -969,10 +1032,9 @@ repeat:
>  	if (!populated_zone(zone))
>  		zone_pcp_reset(zone);
>  
> -	if (!node_present_pages(node)) {
> -		node_clear_state(node, N_HIGH_MEMORY);
> +	clear_nodemasks(node, &arg);
> +	if (arg.status_change_nid >= 0)
>  		kswapd_stop(node);
> -	}
>  
>  	vm_total_pages = nr_free_pagecache_pages();
>  	writeback_set_ratelimit();
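
The offline-side accounting in check_nodemasks_changes_offline() quoted above can be
modelled in user space. The following is only a minimal sketch under stated assumptions:
the enum, the struct and the function name are hypothetical, the zone order assumes a
configuration with a separate HIGHMEM zone (so N_HIGH_MEMORY != N_NORMAL_MEMORY), and
the real code reads the counts from pgdat->node_zones[] rather than from a plain array.

```c
#include <assert.h>

/* Hypothetical zone order for a configuration that has a HIGHMEM zone. */
enum zone_model_type { ZM_DMA, ZM_NORMAL, ZM_HIGHMEM, ZM_MOVABLE, ZM_NR };

/* Hypothetical stand-in for the two status fields of struct memory_notify. */
struct notify_model {
	int status_change_nid_normal;	/* nid if N_NORMAL_MEMORY will be cleared, else -1 */
	int status_change_nid;		/* nid if the node loses all memory, else -1 */
};

/*
 * present[] holds the present pages of each zone of node nid.
 * zone_idx is the zone the offlined range belongs to.
 */
static void check_offline_model(int nid, const unsigned long present[ZM_NR],
				enum zone_model_type zone_idx,
				unsigned long nr_pages,
				struct notify_model *arg)
{
	unsigned long present_pages = 0;
	int zt;

	/* Pages in zones up to ZONE_NORMAL back the N_NORMAL_MEMORY state. */
	for (zt = 0; zt <= ZM_NORMAL; zt++)
		present_pages += present[zt];
	if (zone_idx <= ZM_NORMAL && nr_pages >= present_pages)
		arg->status_change_nid_normal = nid;
	else
		arg->status_change_nid_normal = -1;

	/* Continue the same running sum through ZONE_MOVABLE for the whole node. */
	for (; zt <= ZM_MOVABLE; zt++)
		present_pages += present[zt];
	if (nr_pages >= present_pages)
		arg->status_change_nid = nid;
	else
		arg->status_change_nid = -1;
}
```

With 100 pages in NORMAL and 200 in MOVABLE, offlining the 100 NORMAL pages sets only
status_change_nid_normal; status_change_nid stays -1 because the node still has memory.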


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug
  2012-09-11  1:37     ` Yasuaki Ishimatsu
@ 2012-09-11  3:09       ` Lai Jiangshan
  0 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-11  3:09 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton

On 09/11/2012 09:37 AM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> 2012/09/11 10:22, Lai Jiangshan wrote:
>> On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
>>> Hi Lai,
>>>
>>> Using memory_online to hot-added node's memory, the following kernel messages
>>> were shown. Is this a known issue?
>>
>> Thank you for your report.
>>
>> What operations did you have performed ?
> 
> My operations are as follows:
> 
> 1. Hot-add a new node by container driver.
>    In my system, container driver hot-addes a new nodes which includes CPUs and
>    memorys.
> 
> 2. echo online_movable to hot-added nodes's memory
>    When container driver hot-adds a new nodes, my system creates node2 sysfs.
>    And the sysfs has memory768-memory1023 sysfs. So I echo "online_movable"
>    to memory1023/state file.
>    # echo online_movable > memory1023/state
> 


I can't reproduce the bug, and my system is a little different from yours.
Could you show me your /proc/zoneinfo?

Also, could you apply the following patch? It will help me find out which constraint I have broken.

Thanks,
Lai

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3878170..68302ef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -549,6 +549,9 @@ static inline void __free_one_page(struct page *page,
 
 	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
 
+	VM_BUG_ON(page_outside_zone_boundaries(zone, page));
+	VM_BUG_ON(!pfn_valid_within(page_to_pfn(page)));
+	VM_BUG_ON(zone != page_zone(page));
 	VM_BUG_ON(page_idx & ((1 << order) - 1));
 	VM_BUG_ON(bad_range(zone, page));
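
What the first added assertion guards against can be sketched in user space. This is
only an illustration: struct zone_model and pfn_outside_zone_span() are hypothetical
names, and the kernel helper additionally re-reads the span under the zone span
seqlock, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	unsigned long zone_start_pfn;	/* first pfn spanned by the zone */
	unsigned long spanned_pages;	/* pages spanned, including holes */
};

/*
 * Returns true when pfn lies outside
 * [zone_start_pfn, zone_start_pfn + spanned_pages).
 */
static bool pfn_outside_zone_span(const struct zone_model *zone,
				  unsigned long pfn)
{
	if (pfn < zone->zone_start_pfn)
		return true;
	return pfn >= zone->zone_start_pfn + zone->spanned_pages;
}
```

A page freed into a zone whose span does not cover its pfn would trip such a check,
which narrows down whether the zone span or the page's zone link is inconsistent.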
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-11  0:40 ` [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Yasuaki Ishimatsu
  2012-09-11  1:22   ` Lai Jiangshan
@ 2012-09-11  9:44   ` Lai Jiangshan
  2012-09-11 10:18     ` Yasuaki Ishimatsu
  1 sibling, 1 reply; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-11  9:44 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton,
	'FNST-Wen Congyang'

On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> Using memory_online to hot-added node's memory, the following kernel messages
> were shown. Is this a known issue?

Fixed.

Subject: Don't modify the zone_start_pfn outside of zone_span_writelock()

The original __add_zone() and the new online_movable/online_kernel actions
may call init_currently_empty_zone(), which can sleep, to initialize the wait_table.

However, this function also modifies zone_start_pfn without holding a lock,
so we move that assignment out and ensure that zone_start_pfn is only modified
with zone_span_writelock() held or during boot.

Since zone_start_pfn is no longer modified by init_currently_empty_zone(),
grow_zone_span() needs to be updated to be aware of an empty zone.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
Tested-by: Wen Congyang <wency@cn.fujitsu.com>
---
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..c26a4ea 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -201,7 +201,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writelock(zone);
 
 	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	if (start_pfn < zone->zone_start_pfn)
+	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
 		zone->zone_start_pfn = start_pfn;
 
 	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 009ac28..637b4f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3991,8 +3991,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
 		return ret;
 	pgdat->nr_zones = zone_idx(zone) + 1;
 
-	zone->zone_start_pfn = zone_start_pfn;
-
 	mminit_dprintk(MMINIT_TRACE, "memmap_init",
 			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
 			pgdat->node_id,
@@ -4459,6 +4457,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		ret = init_currently_empty_zone(zone, zone_start_pfn,
 						size, MEMMAP_EARLY);
 		BUG_ON(ret);
+		zone->zone_start_pfn = zone_start_pfn;
 		memmap_init(size, nid, j, zone_start_pfn);
 		zone_start_pfn += size;
 	}
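
The effect of the grow_zone_span() hunk on an empty zone can be checked with a small
user-space model. This is a sketch under assumptions, not the kernel code: struct
zone_model, max_ul() and grow_zone_span_model() are hypothetical names, the locking is
omitted, and the second operand of the subtraction (cut off in the diff context) is
assumed to be the zone's start pfn.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Models grow_zone_span() after the patch, without zone_span_writelock().
 * A zone_start_pfn of 0 is treated as "zone is still empty".
 */
static void grow_zone_span_model(struct zone_model *zone,
				 unsigned long start_pfn,
				 unsigned long end_pfn)
{
	unsigned long old_zone_end_pfn;

	assert(start_pfn < end_pfn);

	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
		zone->zone_start_pfn = start_pfn;

	zone->spanned_pages = max_ul(old_zone_end_pfn, end_pfn) -
			      zone->zone_start_pfn;
}
```

Without the !zone->zone_start_pfn test, an empty zone would keep a start pfn of 0,
because an unsigned start_pfn is never below 0, and its span would run from pfn 0 to
end_pfn instead of covering only the newly added range.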

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-11  9:44   ` [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
@ 2012-09-11 10:18     ` Yasuaki Ishimatsu
  2012-09-12  1:38       ` Lai Jiangshan
  0 siblings, 1 reply; 35+ messages in thread
From: Yasuaki Ishimatsu @ 2012-09-11 10:18 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton,
	'FNST-Wen Congyang'

Hi Lai,

2012/09/11 18:44, Lai Jiangshan wrote:
> On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
>> Hi Lai,
>>
>> Using memory_online to hot-added node's memory, the following kernel messages
>> were shown. Is this a known issue?
>
> Fixed.
>
> Subject: Don't modify the zone_start_pfn outside of zone_span_writelock()
>
> Original __add_zone() and new online_movable/online_kernel
> maybe call sleep-able init_currently_empty_zone() to init wait_table,
>
> but this function also modifies the zone_start_pfn without lock.
> so we move this code out, and ensure the modification of zone_start_pfn is done
> with zone_span_writelock() held or booting.
>
> Since zone_start_pfn is not modified by init_currently_empty_zone()
> grow_zone_span() needs to be updated to be aware of empty zone.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Wen Congyang <wency@cn.fujitsu.com>

After applying the patch, the kernel messages disappeared. Thanks.
But I have a question. When I use online_movable, the following messages are shown.

[  608.314608] Built 3 zonelists in Node order, mobility grouping on.  Total pages: 7844412
[  608.411478] Policy zone: Normal

I think online_movable puts the memory into ZONE_MOVABLE.
So why is "Policy zone: Normal" shown? It should be "Policy zone: Movable".

Thanks,
Yasuaki Ishimatsu

> ---
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3ad25f9..c26a4ea 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -201,7 +201,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>   	zone_span_writelock(zone);
>
>   	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	if (start_pfn < zone->zone_start_pfn)
> +	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>   		zone->zone_start_pfn = start_pfn;
>
>   	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..637b4f8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3991,8 +3991,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>   		return ret;
>   	pgdat->nr_zones = zone_idx(zone) + 1;
>
> -	zone->zone_start_pfn = zone_start_pfn;
> -
>   	mminit_dprintk(MMINIT_TRACE, "memmap_init",
>   			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
>   			pgdat->node_id,
> @@ -4459,6 +4457,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>   		ret = init_currently_empty_zone(zone, zone_start_pfn,
>   						size, MEMMAP_EARLY);
>   		BUG_ON(ret);
> +		zone->zone_start_pfn = zone_start_pfn;
>   		memmap_init(size, nid, j, zone_start_pfn);
>   		zone_start_pfn += size;
>   	}
>
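
To make the empty-zone handling in the grow_zone_span() hunk above easier to follow, here is a minimal user-space sketch. It is not kernel code: struct zone_model, max_ul() and grow_zone_span_model() are hypothetical names, locking is omitted, and the final subtraction of zone_start_pfn is assumed from the truncated diff context. As in the patch, a zone_start_pfn of 0 is taken to mean "zone is still empty".

```c
/* Simplified stand-in for the kernel's struct zone (span fields only). */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Local helper; the kernel uses its own max() macro. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Models the patched span update without zone_span_writelock().
 * Assumption: zone_start_pfn == 0 marks a zone that has no span yet.
 */
void grow_zone_span_model(struct zone_model *zone,
			  unsigned long start_pfn,
			  unsigned long end_pfn)
{
	unsigned long old_zone_end_pfn;

	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;

	/* Without the !zone_start_pfn test an empty zone would keep start 0. */
	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
		zone->zone_start_pfn = start_pfn;

	zone->spanned_pages = max_ul(old_zone_end_pfn, end_pfn) -
			      zone->zone_start_pfn;
}
```

Starting from an all-zero zone_model, growing by the pfn range 0x1000..0x2000 sets the start to 0x1000 rather than leaving it at 0. A later call with a lower start_pfn extends the span downwards as before.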




* Re: [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-11 10:18     ` Yasuaki Ishimatsu
@ 2012-09-12  1:38       ` Lai Jiangshan
  0 siblings, 0 replies; 35+ messages in thread
From: Lai Jiangshan @ 2012-09-12  1:38 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton,
	'FNST-Wen Congyang'

On 09/11/2012 06:18 PM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> 2012/09/11 18:44, Lai Jiangshan wrote:
>> On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
>>> Hi Lai,
>>>
>>> When I used memory_online on a hot-added node's memory, the following kernel
>>> messages were shown. Is this a known issue?
>>
>> Fixed.
>>
>> Subject: Don't modify the zone_start_pfn outside of zone_span_writelock()
>>
>> The original __add_zone() and the new online_movable/online_kernel paths
>> may call the sleepable init_currently_empty_zone() to initialize wait_table,
>> but this function also modifies zone_start_pfn without holding a lock.
>> So we move this code out and ensure that zone_start_pfn is only modified
>> with zone_span_writelock() held or during boot.
>>
>> Since zone_start_pfn is no longer modified by init_currently_empty_zone(),
>> grow_zone_span() needs to be updated to be aware of an empty zone.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
> 
> After applying the patch, the kernel messages disappeared. Thanks.
> But I have a question: when I use online_movable, the following messages are shown.
> 
> [  608.314608] Built 3 zonelists in Node order, mobility grouping on.  Total pages: 7844412
> [  608.411478] Policy zone: Normal
> 
> I think the memory is assigned to ZONE_MOVABLE when online_movable is used.
> So why is "Policy zone: Normal" shown? Shouldn't it be "Policy zone: Movable"?
> 
>


I don't know the meaning of "Policy zone" here, but:

---------------------------------
/* Highest zone. An specific allocation for a zone below that is not
   policied. */
enum zone_type policy_zone = 0;


------------------------------------------------
extern enum zone_type policy_zone;

static inline void check_highest_zone(enum zone_type k)
{
	if (k > policy_zone && k != ZONE_MOVABLE)
		policy_zone = k;
}

----------------------

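So I think the output is correct: check_highest_zone() never sets policy_zone
to ZONE_MOVABLE, so it stays at the highest other zone, which is Normal here.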
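
As a rough illustration of that check, here is a small user-space model. The model_* names and the three-zone enum are hypothetical, and it assumes the check is run once for each populated zone; it is only a sketch of the quoted logic, not the kernel's zonelist code.

```c
/* Hypothetical, reduced zone list; the real enum zone_type has more entries. */
enum model_zone {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_MOVABLE,
	MODEL_NR_ZONES
};

static enum model_zone model_policy_zone = MODEL_ZONE_DMA;

/* Same condition as the quoted check_highest_zone(). */
void model_check_highest_zone(enum model_zone k)
{
	if (k > model_policy_zone && k != MODEL_ZONE_MOVABLE)
		model_policy_zone = k;
}

/*
 * Assumption: the check is applied to every populated zone in turn.
 * populated[k] is non-zero when zone k has memory.
 */
enum model_zone model_policy_zone_for(const int populated[MODEL_NR_ZONES])
{
	int k;

	model_policy_zone = MODEL_ZONE_DMA;
	for (k = 0; k < MODEL_NR_ZONES; k++)
		if (populated[k])
			model_check_highest_zone((enum model_zone)k);

	return model_policy_zone;
}
```

With all three model zones populated, the result is MODEL_ZONE_NORMAL, because the movable zone is skipped by the condition. That is consistent with "Policy zone: Normal" being printed even after memory has been onlined as movable.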

Thanks,
Lai

Thread overview: 35+ messages
2012-09-10  8:58 [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 01/26] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 02/26] memory_hotplug: fix missing nodemask management Lai Jiangshan
2012-09-11  2:55   ` Wen Congyang
2012-09-10  8:58 ` [V4 PATCH 03/26] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
2012-09-10  8:58 ` [V4 PATCH 07/26] procfs: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 08/26] memcontrol: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 09/26] oom: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 10/26] mm,migrate: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 11/26] mempolicy: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 12/26] hugetlb: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 13/26] vmstat: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 14/26] kthread: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 15/26] init: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 16/26] vmscan: " Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 20/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 21/26] x86: get pg_data_t's memory from other node Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 22/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 23/26] memblock: limit memory address from memblock Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 24/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 25/26] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
2012-09-10  8:59 ` [V4 PATCH 26/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
2012-09-11  0:40 ` [V4 PATCH 00/26] memory,numa: introduce MOVABLE-dedicated node and online_movable for hotplug Yasuaki Ishimatsu
2012-09-11  1:22   ` Lai Jiangshan
2012-09-11  1:37     ` Yasuaki Ishimatsu
2012-09-11  3:09       ` Lai Jiangshan
2012-09-11  9:44   ` [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
2012-09-11 10:18     ` Yasuaki Ishimatsu
2012-09-12  1:38       ` Lai Jiangshan
