linux-kernel.vger.kernel.org archive mirror
* [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node
@ 2012-10-29 15:07 Lai Jiangshan
  2012-10-29 15:07 ` [V5 PATCH 01/26] mm, memory-hotplug: dynamic configure movable memory and portion memory Lai Jiangshan
                   ` (26 more replies)
  0 siblings, 27 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:07 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

Movable memory is an important memory-management concept;
we need to consolidate it and make better use of it on our systems.

Movable memory is needed for
o	anti-fragmentation (hugepages, high-order allocations, ...)
o	logical hot-remove (virtualization, Memory Capacity on Demand)
o	physical hot-remove (power saving, hardware partitioning, hardware fault management)

All of these require configuring memory dynamically, which makes memory
use more efficient and safer. We also need physical hot-remove, so we
need movable nodes too.
(Although some systems support physical memory migration, we don't
require that all memory on a physical node is movable; a movable node is
still needed for the logical node if we want physical migration to be
transparent.)

We add the dynamic configuration commands "online_movable" and "online_kernel".
We also add the non-dynamic boot option kernelcore_max_addr.
We may add more dynamic/non-dynamic configuration options in the future.
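As an illustration, the new commands are plain strings written to a
memory block's sysfs "state" file. A minimal userspace helper might look
like the sketch below; it is hypothetical (not part of the patchset),
and the path is a parameter so nothing here hard-codes /sys.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "online_movable", "online_kernel",
 * "online" or "offline" to a memory block's sysfs state file.
 * Returns 0 on success, -1 on any I/O error. */
static int set_memory_block_state(const char *state_path, const char *cmd)
{
	FILE *f = fopen(state_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(cmd, f) >= 0;
	if (fclose(f) != 0 || !ok)
		return -1;
	return 0;
}
```

On a real system (as root) a call would look like
set_memory_block_state("/sys/devices/system/memory/memory32/state",
"online_movable"), where memory32 is just an example block name.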


The patchset is based on 3.7-rc3 with these three patches already applied:
	https://lkml.org/lkml/2012/10/24/151
	https://lkml.org/lkml/2012/10/26/150

You can also simply pull all the patches from:
	git pull https://github.com/laijs/linux.git hotplug-next



Issues:

mempolicy (MPOL_BIND) does not behave well when the nodemask contains
movable nodes only: kernel allocations will fail and the task can't
create new tasks or other kernel objects.

So we change the strategy/policy:
	when the bound nodemask has movable node(s) only, we apply the
	mempolicy only to userspace allocations, not to kernel
	allocations.
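A minimal sketch of that decision, with invented names (the real logic
lives in mm/mempolicy.c): if the bound mask has at least one node with
kernel-usable memory, MPOL_BIND is honoured for every allocation;
otherwise it is honoured only for userspace pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; names are invented, not the kernel's. */
static bool apply_bind_policy(bool mask_has_normal_memory_node,
			      bool is_userspace_alloc)
{
	if (mask_has_normal_memory_node)
		return true;		/* mask is usable for any allocation */
	return is_userspace_alloc;	/* movable-only: userspace pages only */
}
```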

CPUSET has the same problem, but the relevant code is spread through
page_alloc.c and we haven't fixed it yet. We can/will change the
allocation strategy to one of these 3 strategies:
	1) the same strategy as mempolicy
	2) change cpuset so the nodemask always has at least one normal node
	3) split the nodemask into nodemask_user and nodemask_kernel

Thoughts?



Patches:

Patch1-3:     add online_movable and online_kernel, but don't introduce
              movable node
Patch4:       cleanup for node_state_attr
Patch5:       introduce N_MEMORY
Patch6-17:    use N_MEMORY instead of N_HIGH_MEMORY.
              The patches are separated by subsystem;
              Patch17 also changes the node_states initialization
Patch18-20:   add the MOVABLE-dedicated node
Patch21-25:   add kernelcore_max_addr
Patch26:      mempolicy handles movable nodes




Changes:

change V5-V4:
	consolidate online_movable/online_kernel
	nodemask management

change V4-V3:
	rebase.
	online_movable/online_kernel can create a zone from empty
	or empty a zone

change V3-V2:
	Proper nodemask management

change V2-V1:

The original V1 patchset of MOVABLE-dedicated node is here:
http://comments.gmane.org/gmane.linux.kernel.mm/78122

The new V2 adds N_MEMORY and a notion of "MOVABLE-dedicated node",
and fixes some related problems.

The original V1 patchset of "add online_movable" is here:
https://lkml.org/lkml/2012/7/4/145

The new V2 discards the MIGRATE_HOTREMOVE approach and uses a more
straightforward implementation (only 1 patch).



Lai Jiangshan (22):
  mm, memory-hotplug: dynamic configure movable memory and portion
    memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  memory_hotplug: ensure every online node has NORMAL memory
  node: cleanup node_state_attr
  node_states: introduce N_MEMORY
  cpuset: use N_MEMORY instead N_HIGH_MEMORY
  procfs: use N_MEMORY instead N_HIGH_MEMORY
  memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  oom: use N_MEMORY instead N_HIGH_MEMORY
  mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
  mempolicy: use N_MEMORY instead N_HIGH_MEMORY
  hugetlb: use N_MEMORY instead N_HIGH_MEMORY
  vmstat: use N_MEMORY instead N_HIGH_MEMORY
  kthread: use N_MEMORY instead N_HIGH_MEMORY
  init: use N_MEMORY instead N_HIGH_MEMORY
  vmscan: use N_MEMORY instead N_HIGH_MEMORY
  page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
    initialization
  hotplug: update nodemasks management
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  memory_hotplug: allow online/offline memory to result movable node
  page_alloc: add kernelcore_max_addr
  mempolicy: fix is_valid_nodemask()

Yasuaki Ishimatsu (4):
  x86: get pg_data_t's memory from other node
  x86: use memblock_set_current_limit() to set memblock.current_limit
  memblock: limit memory address from memblock
  memblock: compare current_limit with end variable at
    memblock_find_in_range_node()

 Documentation/cgroups/cpusets.txt   |    2 +-
 Documentation/kernel-parameters.txt |    9 +
 Documentation/memory-hotplug.txt    |   19 ++-
 arch/x86/kernel/setup.c             |    4 +-
 arch/x86/mm/init_64.c               |    4 +-
 arch/x86/mm/numa.c                  |    8 +-
 drivers/base/memory.c               |   27 ++--
 drivers/base/node.c                 |   28 ++--
 fs/proc/kcore.c                     |    2 +-
 fs/proc/task_mmu.c                  |    4 +-
 include/linux/cpuset.h              |    2 +-
 include/linux/memblock.h            |    1 +
 include/linux/memory.h              |    1 +
 include/linux/memory_hotplug.h      |   13 ++-
 include/linux/nodemask.h            |    5 +
 init/main.c                         |    2 +-
 kernel/cpuset.c                     |   32 ++--
 kernel/kthread.c                    |    2 +-
 mm/Kconfig                          |    8 +
 mm/hugetlb.c                        |   24 ++--
 mm/memblock.c                       |   10 +-
 mm/memcontrol.c                     |   18 +-
 mm/memory_hotplug.c                 |  283 +++++++++++++++++++++++++++++++++--
 mm/mempolicy.c                      |   48 ++++---
 mm/migrate.c                        |    2 +-
 mm/oom_kill.c                       |    2 +-
 mm/page_alloc.c                     |   76 +++++++---
 mm/page_cgroup.c                    |    2 +-
 mm/vmscan.c                         |    4 +-
 mm/vmstat.c                         |    4 +-
 30 files changed, 508 insertions(+), 138 deletions(-)

-- 
1.7.4.4



* [V5 PATCH 01/26] mm, memory-hotplug: dynamic configure movable memory and portion memory
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
@ 2012-10-29 15:07 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 02/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:07 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan

Add online_movable and online_kernel for logical memory hotplug.
This is the dynamic version of "movablecore" & "kernelcore".

We have the same reasons to introduce it as "movablecore" & "kernelcore":
it has the same motivation, but works dynamically at runtime:

o	We can configure memory as kernelcore or movablecore after boot.

	When the userspace workload increases and we need more hugepages,
	we can use "online_movable" to add memory and allow the system to
	use more THP (transparent huge pages); and vice versa when the
	kernel workload increases.

	This also helps virtualization dynamically configure the host's
	and guests' memory, to save memory (reduce waste).

	Memory Capacity on Demand.

o	When a new node is physically added after boot, we need to use
	"online_movable" or "online_kernel" to configure/partition it
	as we expect when we logically online it.

	This configuration also helps physical memory migration.

o	All the benefits of the existing "movablecore" & "kernelcore".

o	Prepares for movable nodes, which are very important for power
	saving, hardware partitioning and highly available systems
	(hardware fault management).

	(Note: we don't introduce movable nodes here.)


Behavior:
When a memory block/section is onlined by "online_movable", the kernel
will not keep direct references to pages of the memory block,
so we can remove that memory at any time when needed.

When it is onlined by "online_kernel", the kernel can use it.
When it is onlined by "online", the zone type is not changed.

Current constraint:
Only a memory block which is adjacent to ZONE_MOVABLE
can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
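The adjacency rule can be sketched as a simplified userspace model of
the checks in the patch's move_pfn_range_left(): z1 lies immediately
left of z2, and only a pfn range flush with z2's left edge may move into
z1. The struct and function names here are illustrative, not the
kernel's.

```c
#include <assert.h>

struct span {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* end = start_pfn + spanned_pages */
};

/* Returns 0 and resizes both spans on success, -1 if the range is not
 * flush with z2's left edge or reaches past z2's right edge. */
static int move_range_left(struct span *z1, struct span *z2,
			   unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long z2_end = z2->start_pfn + z2->spanned_pages;

	if (end_pfn > z2_end)			/* can't move pfns above z2 */
		return -1;
	if (start_pfn > z2->start_pfn)		/* must begin at z2's left edge */
		return -1;
	if (end_pfn <= z2->start_pfn)		/* must actually overlap z2 */
		return -1;

	z1->spanned_pages = end_pfn - z1->start_pfn;	/* grow z1 rightwards */
	z2->start_pfn = end_pfn;			/* shrink z2 from the left */
	z2->spanned_pages = z2_end - end_pfn;
	return 0;
}
```

move_pfn_range_right() in the diff below is the mirror image of the same
idea.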


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |   14 +++++-
 drivers/base/memory.c            |   27 ++++++----
 include/linux/memory_hotplug.h   |   13 +++++-
 mm/memory_hotplug.c              |  101 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..c6f993d 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
 		    in the memory block.
 'state'           : read-write
                     at read:  contains online/offline state of memory.
-                    at write: user can specify "online", "offline" command
+                    at write: user can specify "online_kernel",
+                    "online_movable", "online", "offline" command
                     which will be performed on al sections in the block.
 'phys_device'     : read-only: designed to show the name of physical memory
                     device.  This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's state file as:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
+This onlining will not change the ZONE type of the target memory section,
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
 After this, section memoryXXX's state will be 'online' and the amount of
 available memory will be increased.
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..15a1dd7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
 {
 	unsigned long start_pfn;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -261,7 +261,7 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 			if (!pages_correctly_reserved(start_pfn, nr_pages))
 				return -EBUSY;
 
-			ret = online_pages(start_pfn, nr_pages);
+			ret = online_pages(start_pfn, nr_pages, online_type);
 			break;
 		case MEM_OFFLINE:
 			ret = offline_pages(start_pfn, nr_pages);
@@ -276,7 +276,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 }
 
 static int __memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int online_type)
 {
 	int ret = 0;
 
@@ -288,7 +289,7 @@ static int __memory_block_change_state(struct memory_block *mem,
 	if (to_state == MEM_OFFLINE)
 		mem->state = MEM_GOING_OFFLINE;
 
-	ret = memory_block_action(mem->start_section_nr, to_state);
+	ret = memory_block_action(mem->start_section_nr, to_state, online_type);
 
 	if (ret) {
 		mem->state = from_state_req;
@@ -311,12 +312,14 @@ out:
 }
 
 static int memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int online_type)
 {
 	int ret;
 
 	mutex_lock(&mem->state_mutex);
-	ret = __memory_block_change_state(mem, to_state, from_state_req);
+	ret = __memory_block_change_state(mem, to_state, from_state_req,
+					  online_type);
 	mutex_unlock(&mem->state_mutex);
 
 	return ret;
@@ -330,10 +333,14 @@ store_mem_state(struct device *dev,
 
 	mem = container_of(dev, struct memory_block, dev);
 
-	if (!strncmp(buf, "online", min((int)count, 6)))
-		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+	if (!strncmp(buf, "online_kernel", min((int)count, 13)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KERNEL);
+	else if (!strncmp(buf, "online_movable", min((int)count, 14)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_MOVABLE);
+	else if (!strncmp(buf, "online", min((int)count, 6)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KEEP);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
-		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
 
 	if (ret)
 		return ret;
@@ -669,7 +676,7 @@ int offline_memory_block(struct memory_block *mem)
 
 	mutex_lock(&mem->state_mutex);
 	if (mem->state != MEM_OFFLINE)
-		ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+		ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
 	mutex_unlock(&mem->state_mutex);
 
 	return ret;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 95573ec..4a45c4e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -26,6 +26,13 @@ enum {
 	MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
 };
 
+/* Types for control the zone type of onlined memory */
+enum {
+	ONLINE_KEEP,
+	ONLINE_KERNEL,
+	ONLINE_MOVABLE,
+};
+
 /*
  * pgdat resizing functions
  */
@@ -46,6 +53,10 @@ void pgdat_resize_init(struct pglist_data *pgdat)
 }
 /*
  * Zone resizing functions
+ *
+ * Note: any attempt to resize a zone should has pgdat_resize_lock()
+ * zone_span_writelock() both held. This ensure the size of a zone
+ * can't be changed while pgdat_resize_lock() held.
  */
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
@@ -71,7 +82,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a1920fb..6d3bec4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -221,6 +221,89 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
+static void resize_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+
+	zone_span_writelock(zone);
+
+	zone->zone_start_pfn = start_pfn;
+	zone->spanned_pages = end_pfn - start_pfn;
+
+	zone_span_writeunlock(zone);
+}
+
+static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	enum zone_type zid = zone_idx(zone);
+	int nid = zone->zone_pgdat->node_id;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		set_page_links(pfn_to_page(pfn), zid, nid, pfn);
+}
+
+static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are higher than @z2 */
+	if (end_pfn > z2->zone_start_pfn + z2->spanned_pages)
+		goto out_fail;
+	/* the move out part mast at the left most of @z2 */
+	if (start_pfn > z2->zone_start_pfn)
+		goto out_fail;
+	/* must included/overlap */
+	if (end_pfn <= z2->zone_start_pfn)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, end_pfn);
+	resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z1, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
+static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are lower than @z1 */
+	if (z1->zone_start_pfn > start_pfn)
+		goto out_fail;
+	/* the move out part mast at the right most of @z1 */
+	if (z1->zone_start_pfn + z1->spanned_pages >  end_pfn)
+		goto out_fail;
+	/* must included/overlap */
+	if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, start_pfn);
+	resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z2, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
 static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
 			    unsigned long end_pfn)
 {
@@ -515,7 +598,7 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 }
 
 
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long onlined_pages = 0;
 	struct zone *zone;
@@ -532,6 +615,22 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	 */
 	zone = page_zone(pfn_to_page(pfn));
 
+	if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
+		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+	if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
+		if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+
+	/* Previous code may changed the zone of the pfn range */
+	zone = page_zone(pfn_to_page(pfn));
+
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_online(nr_pages, zone, &arg);
-- 
1.7.4.4



* [V5 PATCH 02/26] memory_hotplug: handle empty zone when online_movable/online_kernel
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
  2012-10-29 15:07 ` [V5 PATCH 01/26] mm, memory-hotplug: dynamic configure movable memory and portion memory Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 03/26] memory_hotplug: ensure every online node has NORMAL memory Lai Jiangshan
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Wen Congyang,
	linux-mm

Make online_movable/online_kernel able to empty a zone,
or to move memory into an empty zone.
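The empty-zone convention this patch adds to resize_zone() can be
modeled as follows: a zone that ends up with no pages is stored as
(start_pfn, spanned_pages) = (0, 0), matching free_area_init_core().
The struct here is an illustrative stand-in, not the kernel's.

```c
#include <assert.h>

struct span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

static void resize_span(struct span *z, unsigned long start_pfn,
			unsigned long end_pfn)
{
	if (end_pfn - start_pfn) {
		z->start_pfn = start_pfn;
		z->spanned_pages = end_pfn - start_pfn;
	} else {
		/* keep emptied spans in the canonical (0, 0) form */
		z->start_pfn = 0;
		z->spanned_pages = 0;
	}
}
```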

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   51 +++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6d3bec4..bdcdaf6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -227,8 +227,17 @@ static void resize_zone(struct zone *zone, unsigned long start_pfn,
 
 	zone_span_writelock(zone);
 
-	zone->zone_start_pfn = start_pfn;
-	zone->spanned_pages = end_pfn - start_pfn;
+	if (end_pfn - start_pfn) {
+		zone->zone_start_pfn = start_pfn;
+		zone->spanned_pages = end_pfn - start_pfn;
+	} else {
+		/*
+		 * make it consist as free_area_init_core(),
+		 * if spanned_pages = 0, then keep start_pfn = 0
+		 */
+		zone->zone_start_pfn = 0;
+		zone->spanned_pages = 0;
+	}
 
 	zone_span_writeunlock(zone);
 }
@@ -244,10 +253,19 @@ static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
 		set_page_links(pfn_to_page(pfn), zid, nid, pfn);
 }
 
-static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_left(struct zone *z1, struct zone *z2,
 		unsigned long start_pfn, unsigned long end_pfn)
 {
+	int ret;
 	unsigned long flags;
+	unsigned long z1_start_pfn;
+
+	if (!z1->wait_table) {
+		ret = init_currently_empty_zone(z1, start_pfn,
+			end_pfn - start_pfn, MEMMAP_HOTPLUG);
+		if (ret)
+			return ret;
+	}
 
 	pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -261,7 +279,13 @@ static int move_pfn_range_left(struct zone *z1, struct zone *z2,
 	if (end_pfn <= z2->zone_start_pfn)
 		goto out_fail;
 
-	resize_zone(z1, z1->zone_start_pfn, end_pfn);
+	/* use start_pfn for z1's start_pfn if z1 is empty */
+	if (z1->spanned_pages)
+		z1_start_pfn = z1->zone_start_pfn;
+	else
+		z1_start_pfn = start_pfn;
+
+	resize_zone(z1, z1_start_pfn, end_pfn);
 	resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
 
 	pgdat_resize_unlock(z1->zone_pgdat, &flags);
@@ -274,10 +298,19 @@ out_fail:
 	return -1;
 }
 
-static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_right(struct zone *z1, struct zone *z2,
 		unsigned long start_pfn, unsigned long end_pfn)
 {
+	int ret;
 	unsigned long flags;
+	unsigned long z2_end_pfn;
+
+	if (!z2->wait_table) {
+		ret = init_currently_empty_zone(z2, start_pfn,
+			end_pfn - start_pfn, MEMMAP_HOTPLUG);
+		if (ret)
+			return ret;
+	}
 
 	pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -291,8 +324,14 @@ static int move_pfn_range_right(struct zone *z1, struct zone *z2,
 	if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
 		goto out_fail;
 
+	/* use end_pfn for z2's end_pfn if z2 is empty */
+	if (z2->spanned_pages)
+		z2_end_pfn = z2->zone_start_pfn + z2->spanned_pages;
+	else
+		z2_end_pfn = end_pfn;
+
 	resize_zone(z1, z1->zone_start_pfn, start_pfn);
-	resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+	resize_zone(z2, start_pfn, z2_end_pfn);
 
 	pgdat_resize_unlock(z1->zone_pgdat, &flags);
 
-- 
1.7.4.4



* [V5 PATCH 03/26] memory_hotplug: ensure every online node has NORMAL memory
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
  2012-10-29 15:07 ` [V5 PATCH 01/26] mm, memory-hotplug: dynamic configure movable memory and portion memory Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 02/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Wen Congyang,
	linux-mm

The old memory-hotplug code and the new online_movable may leave an
online node without any normal memory, but memory management behaves
badly on nodes which are online but have no normal memory.
Example: a task bound to such a node may fail all kernel allocations
and then be unable to create new tasks or other kernel objects.

So we disallow such non-normal-memory nodes here; we will enable them
when we are prepared.
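The offline-side guard in the diff below (can_offline_normal()) can be
modeled in userspace roughly as: taking nr_pages of <=ZONE_NORMAL memory
off a node is allowed only if some normal memory survives, or if the
node has no movable memory left either. The zone list and names here are
simplified and illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum { ZT_DMA, ZT_NORMAL, ZT_MOVABLE, ZT_MAX };	/* reduced zone list */

static bool may_offline_normal(const unsigned long present[ZT_MAX],
			       unsigned long nr_pages)
{
	unsigned long normal = present[ZT_DMA] + present[ZT_NORMAL];

	if (normal > nr_pages)		/* normal memory survives the offline */
		return true;
	/* the last normal memory may only go once all movable memory is gone */
	return present[ZT_MOVABLE] == 0;
}
```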


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bdcdaf6..9af9641 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -589,6 +589,12 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	return 0;
 }
 
+/* ensure every online node has NORMAL memory */
+static bool can_online_high_movable(struct zone *zone)
+{
+	return node_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+}
+
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
 	struct zone *zone, struct memory_notify *arg)
@@ -654,6 +660,12 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	 */
 	zone = page_zone(pfn_to_page(pfn));
 
+	if ((zone_idx(zone) > ZONE_NORMAL || online_type == ONLINE_MOVABLE) &&
+	    !can_online_high_movable(zone)) {
+		unlock_memory_hotplug();
+		return -1;
+	}
+
 	if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
 		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
 			unlock_memory_hotplug();
@@ -1058,6 +1070,30 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	return offlined;
 }
 
+/* ensure the node has NORMAL memory if it is still online */
+static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long present_pages = 0;
+	enum zone_type zt;
+
+	for (zt = 0; zt <= ZONE_NORMAL; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+
+	if (present_pages > nr_pages)
+		return true;
+
+	present_pages = 0;
+	for (; zt <= ZONE_MOVABLE; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+
+	/*
+	 * we can't offline the last normal memory until all
+	 * higher memory is offlined.
+	 */
+	return present_pages == 0;
+}
+
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
 		struct zone *zone, struct memory_notify *arg)
@@ -1145,6 +1181,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	node = zone_to_nid(zone);
 	nr_pages = end_pfn - start_pfn;
 
+	ret = -EINVAL;
+	if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
+		goto out;
+
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true);
 	if (ret)
-- 
1.7.4.4



* [V5 PATCH 04/26] node: cleanup node_state_attr
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (2 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 03/26] memory_hotplug: ensure every online node has NORMAL memory Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan,
	Greg Kroah-Hartman

Use "[index] = init_value" and use the N_xxx constants instead of
hardcoded indexes.

This makes the code more readable and makes it easier to add new states.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 drivers/base/node.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..5d7731e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -614,23 +614,23 @@ static ssize_t show_node_state(struct device *dev,
 	{ __ATTR(name, 0444, show_node_state, NULL), state }
 
 static struct node_attr node_state_attr[] = {
-	_NODE_ATTR(possible, N_POSSIBLE),
-	_NODE_ATTR(online, N_ONLINE),
-	_NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
-	_NODE_ATTR(has_cpu, N_CPU),
+	[N_POSSIBLE] = _NODE_ATTR(possible, N_POSSIBLE),
+	[N_ONLINE] = _NODE_ATTR(online, N_ONLINE),
+	[N_NORMAL_MEMORY] = _NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
 #ifdef CONFIG_HIGHMEM
-	_NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
+	[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
 #endif
+	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 };
 
 static struct attribute *node_state_attrs[] = {
-	&node_state_attr[0].attr.attr,
-	&node_state_attr[1].attr.attr,
-	&node_state_attr[2].attr.attr,
-	&node_state_attr[3].attr.attr,
+	&node_state_attr[N_POSSIBLE].attr.attr,
+	&node_state_attr[N_ONLINE].attr.attr,
+	&node_state_attr[N_NORMAL_MEMORY].attr.attr,
 #ifdef CONFIG_HIGHMEM
-	&node_state_attr[4].attr.attr,
+	&node_state_attr[N_HIGH_MEMORY].attr.attr,
 #endif
+	&node_state_attr[N_CPU].attr.attr,
 	NULL
 };
 
-- 
1.7.4.4



* [V5 PATCH 05/26] node_states: introduce N_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (3 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 20:46   ` David Rientjes
  2012-10-29 15:20 ` [V5 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Christoph Lameter,
	Hillf Danton

We have N_NORMAL_MEMORY, standing for nodes that have normal memory with
zone_type <= ZONE_NORMAL.

And we have N_HIGH_MEMORY, standing for nodes that have normal or high
memory.

But we don't have anything standing for nodes that have *any* memory.

And we have N_CPU, but no N_MEMORY.

Current code reuses N_HIGH_MEMORY for this purpose, because any node
which has memory must currently have high memory or normal memory.

A)	But this reuse is bad for *readability*, because the name
	N_HIGH_MEMORY only stands for high or normal memory:

A.example 1)
	mem_cgroup_nr_lru_pages():
		for_each_node_state(nid, N_HIGH_MEMORY)

	The reader will be confused (why does this function only count
	high- or normal-memory nodes? does it count ZONE_MOVABLE's LRU
	pages?) until someone tells them that N_HIGH_MEMORY is reused to
	stand for nodes that have any memory.

A.cont)	If we introduce N_MEMORY, we can reduce this confusion
	AND make the code clearer:

A.example 2) mm/page_cgroup.c uses N_HIGH_MEMORY twice:

	The first use is in page_cgroup_init(void):
		for_each_node_state(nid, N_HIGH_MEMORY) {

	It means: if the node has memory, we allocate a page_cgroup map
	for it. We should use N_MEMORY here to make the intent clearer.

	The second use is in alloc_page_cgroup():
		if (node_state(nid, N_HIGH_MEMORY))
			addr = vzalloc_node(size, nid);

	It means: the node has high or normal memory that the kernel can
	allocate from. We should keep N_HIGH_MEMORY here, and it will be
	better once the "any memory" semantics of N_HIGH_MEMORY is removed.

B)	This reuse becomes outdated once we introduce MOVABLE-dedicated
	nodes. A MOVABLE-dedicated node should appear in neither
	node_states[N_HIGH_MEMORY] nor node_states[N_NORMAL_MEMORY],
	because a MOVABLE-dedicated node has no high or normal memory.

	On x86_64, N_HIGH_MEMORY == N_NORMAL_MEMORY, so if a
	MOVABLE-dedicated node is in node_states[N_HIGH_MEMORY], it is
	also in node_states[N_NORMAL_MEMORY], which breaks SLUB.

	SLUB uses
		for_each_node_state(nid, N_NORMAL_MEMORY)
	and would create a kmem_cache_node for the MOVABLE-dedicated
	node, causing problems.

In one word, we need N_MEMORY. We introduce it here as an alias of
N_HIGH_MEMORY and fix all improper usages of N_HIGH_MEMORY in later patches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 include/linux/nodemask.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 7afc363..c6ebdc9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,6 +380,7 @@ enum node_states {
 #else
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
+	N_MEMORY = N_HIGH_MEMORY,
 	N_CPU,		/* The node has one or more cpus */
 	NR_NODE_STATES
 };
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (4 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 07/26] procfs: " Lai Jiangshan
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Paul Menage,
	Rob Landley, linux-doc

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 Documentation/cgroups/cpusets.txt |    2 +-
 include/linux/cpuset.h            |    2 +-
 kernel/cpuset.c                   |   32 ++++++++++++++++----------------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..12e01d4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
 The cpus and mems files in the root (top_cpuset) cpuset are
 read-only.  The cpus file automatically tracks the value of
 cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
 nodes with memory--using the cpuset_track_online_nodes() hook.
 
 
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 838320f..8c8a60d29 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -144,7 +144,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 	return node_possible_map;
 }
 
-#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
+#define cpuset_current_mems_allowed (node_states[N_MEMORY])
 static inline void cpuset_init_current_mems_allowed(void) {}
 
 static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c715..2b133db 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -302,10 +302,10 @@ static void guarantee_online_cpus(const struct cpuset *cs,
  * are online, with memory.  If none are online with memory, walk
  * up the cpuset hierarchy until we find one that does have some
  * online mems.  If we get all the way to the top and still haven't
- * found any online mems, return node_states[N_HIGH_MEMORY].
+ * found any online mems, return node_states[N_MEMORY].
  *
  * One way or another, we guarantee to return some non-empty subset
- * of node_states[N_HIGH_MEMORY].
+ * of node_states[N_MEMORY].
  *
  * Call with callback_mutex held.
  */
@@ -313,14 +313,14 @@ static void guarantee_online_cpus(const struct cpuset *cs,
 static void guarantee_online_mems(const struct cpuset *cs, nodemask_t *pmask)
 {
 	while (cs && !nodes_intersects(cs->mems_allowed,
-					node_states[N_HIGH_MEMORY]))
+					node_states[N_MEMORY]))
 		cs = cs->parent;
 	if (cs)
 		nodes_and(*pmask, cs->mems_allowed,
-					node_states[N_HIGH_MEMORY]);
+					node_states[N_MEMORY]);
 	else
-		*pmask = node_states[N_HIGH_MEMORY];
-	BUG_ON(!nodes_intersects(*pmask, node_states[N_HIGH_MEMORY]));
+		*pmask = node_states[N_MEMORY];
+	BUG_ON(!nodes_intersects(*pmask, node_states[N_MEMORY]));
 }
 
 /*
@@ -1100,7 +1100,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 		return -ENOMEM;
 
 	/*
-	 * top_cpuset.mems_allowed tracks node_stats[N_HIGH_MEMORY];
+	 * top_cpuset.mems_allowed tracks node_stats[N_MEMORY];
 	 * it's read-only
 	 */
 	if (cs == &top_cpuset) {
@@ -1122,7 +1122,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 			goto done;
 
 		if (!nodes_subset(trialcs->mems_allowed,
-				node_states[N_HIGH_MEMORY])) {
+				node_states[N_MEMORY])) {
 			retval =  -EINVAL;
 			goto done;
 		}
@@ -2034,7 +2034,7 @@ static struct cpuset *cpuset_next(struct list_head *queue)
  * before dropping down to the next.  It always processes a node before
  * any of its children.
  *
- * In the case of memory hot-unplug, it will remove nodes from N_HIGH_MEMORY
+ * In the case of memory hot-unplug, it will remove nodes from N_MEMORY
  * if all present pages from a node are offlined.
  */
 static void
@@ -2073,7 +2073,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
 
 			/* Continue past cpusets with all mems online */
 			if (nodes_subset(cp->mems_allowed,
-					node_states[N_HIGH_MEMORY]))
+					node_states[N_MEMORY]))
 				continue;
 
 			oldmems = cp->mems_allowed;
@@ -2081,7 +2081,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
 			/* Remove offline mems from this cpuset. */
 			mutex_lock(&callback_mutex);
 			nodes_and(cp->mems_allowed, cp->mems_allowed,
-						node_states[N_HIGH_MEMORY]);
+						node_states[N_MEMORY]);
 			mutex_unlock(&callback_mutex);
 
 			/* Move tasks from the empty cpuset to a parent */
@@ -2134,8 +2134,8 @@ void cpuset_update_active_cpus(bool cpu_online)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
- * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
- * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
+ * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
+ * Call this routine anytime after node_states[N_MEMORY] changes.
  * See cpuset_update_active_cpus() for CPU hotplug handling.
  */
 static int cpuset_track_online_nodes(struct notifier_block *self,
@@ -2148,7 +2148,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
 	case MEM_ONLINE:
 		oldmems = top_cpuset.mems_allowed;
 		mutex_lock(&callback_mutex);
-		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+		top_cpuset.mems_allowed = node_states[N_MEMORY];
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
 		break;
@@ -2177,7 +2177,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
 void __init cpuset_init_smp(void)
 {
 	cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
-	top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+	top_cpuset.mems_allowed = node_states[N_MEMORY];
 
 	hotplug_memory_notifier(cpuset_track_online_nodes, 10);
 
@@ -2245,7 +2245,7 @@ void cpuset_init_current_mems_allowed(void)
  *
  * Description: Returns the nodemask_t mems_allowed of the cpuset
  * attached to the specified @tsk.  Guaranteed to return some non-empty
- * subset of node_states[N_HIGH_MEMORY], even if this means going outside the
+ * subset of node_states[N_MEMORY], even if this means going outside the
  * tasks cpuset.
  **/
 
-- 
1.7.4.4



* [V5 PATCH 07/26] procfs: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (5 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:20 ` [V5 PATCH 08/26] memcontrol: " Lai Jiangshan
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Laura Vasilescu,
	Jiri Kosina, WANG Cong, Konstantin Khlebnikov, Naoya Horiguchi,
	Hugh Dickins

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 fs/proc/kcore.c    |    2 +-
 fs/proc/task_mmu.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 86c67ee..e96d4f1 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -249,7 +249,7 @@ static int kcore_update_ram(void)
 	/* Not inialized....update now */
 	/* find out "max pfn" */
 	end_pfn = 0;
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long node_end;
 		node_end  = NODE_DATA(nid)->node_start_pfn +
 			NODE_DATA(nid)->node_spanned_pages;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 90c63f9..2d89601 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1126,7 +1126,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 		return NULL;
 
 	nid = page_to_nid(page);
-	if (!node_isset(nid, node_states[N_HIGH_MEMORY]))
+	if (!node_isset(nid, node_states[N_MEMORY]))
 		return NULL;
 
 	return page;
@@ -1279,7 +1279,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	if (md->writeback)
 		seq_printf(m, " writeback=%lu", md->writeback);
 
-	for_each_node_state(n, N_HIGH_MEMORY)
+	for_each_node_state(n, N_MEMORY)
 		if (md->node[n])
 			seq_printf(m, " N%d=%lu", n, md->node[n]);
 out:
-- 
1.7.4.4



* [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (6 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 07/26] procfs: " Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 16:22   ` Michal Hocko
  2012-10-31 13:18   ` Michal Hocko
  2012-10-29 15:20 ` [V5 PATCH 09/26] oom: " Lai Jiangshan
                   ` (18 subsequent siblings)
  26 siblings, 2 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Johannes Weiner,
	Michal Hocko, Balbir Singh, Tejun Heo, Li Zefan, cgroups,
	linux-mm, containers

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memcontrol.c  |   18 +++++++++---------
 mm/page_cgroup.c |    2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7acf43b..1b69665 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -800,7 +800,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 	int nid;
 	u64 total = 0;
 
-	for_each_node_state(nid, N_HIGH_MEMORY)
+	for_each_node_state(nid, N_MEMORY)
 		total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
 	return total;
 }
@@ -1611,9 +1611,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
 		return;
 
 	/* make a nodemask where this memcg uses memory from */
-	memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+	memcg->scan_nodes = node_states[N_MEMORY];
 
-	for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+	for_each_node_mask(nid, node_states[N_MEMORY]) {
 
 		if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
 			node_clear(nid, memcg->scan_nodes);
@@ -1684,7 +1684,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
 	/*
 	 * Check rest of nodes.
 	 */
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		if (node_isset(nid, memcg->scan_nodes))
 			continue;
 		if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3759,7 +3759,7 @@ move_account:
 		drain_all_stock_sync(memcg);
 		ret = 0;
 		mem_cgroup_start_move(memcg);
-		for_each_node_state(node, N_HIGH_MEMORY) {
+		for_each_node_state(node, N_MEMORY) {
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
 				enum lru_list lru;
 				for_each_lru(lru) {
@@ -4087,7 +4087,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
 	seq_printf(m, "total=%lu", total_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
 	}
@@ -4095,7 +4095,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
 	seq_printf(m, "file=%lu", file_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				LRU_ALL_FILE);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4104,7 +4104,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
 	seq_printf(m, "anon=%lu", anon_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				LRU_ALL_ANON);
 		seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4113,7 +4113,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
 
 	unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
 	seq_printf(m, "unevictable=%lu", unevictable_nr);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
 				BIT(LRU_UNEVICTABLE));
 		seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..c1054ad 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
 	if (mem_cgroup_disabled())
 		return;
 
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
 
 		start_pfn = node_start_pfn(nid);
-- 
1.7.4.4



* [V5 PATCH 09/26] oom: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (7 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 08/26] memcontrol: " Lai Jiangshan
@ 2012-10-29 15:20 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 10/26] mm,migrate: " Lai Jiangshan
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:20 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Michal Hocko,
	KOSAKI Motohiro, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 mm/oom_kill.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 79e0f3e..aa2d89c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
 	 * the page allocator means a mempolicy is in effect.  Cpuset policy
 	 * is enforced in get_page_from_freelist().
 	 */
-	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+	if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
 		*totalpages = total_swap_pages;
 		for_each_node_mask(nid, *nodemask)
 			*totalpages += node_spanned_pages(nid);
-- 
1.7.4.4



* [V5 PATCH 10/26] mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (8 preceding siblings ...)
  2012-10-29 15:20 ` [V5 PATCH 09/26] oom: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 11/26] mempolicy: " Lai Jiangshan
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Michal Hocko,
	Hugh Dickins, Christoph Lameter, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/migrate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..d595e58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1201,7 +1201,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			if (node < 0 || node >= MAX_NUMNODES)
 				goto out_pm;
 
-			if (!node_state(node, N_HIGH_MEMORY))
+			if (!node_state(node, N_MEMORY))
 				goto out_pm;
 
 			err = -EACCES;
-- 
1.7.4.4



* [V5 PATCH 11/26] mempolicy: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (9 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 10/26] mm,migrate: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 12/26] hugetlb: " Lai Jiangshan
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, KOSAKI Motohiro,
	Christoph Lameter, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/mempolicy.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..d4a084c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
 	/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
 	if (pol == NULL)
 		return 0;
-	/* Check N_HIGH_MEMORY */
+	/* Check N_MEMORY */
 	nodes_and(nsc->mask1,
-		  cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+		  cpuset_current_mems_allowed, node_states[N_MEMORY]);
 
 	VM_BUG_ON(!nodes);
 	if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1388,7 +1388,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 		goto out_put;
 	}
 
-	if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+	if (!nodes_subset(*new, node_states[N_MEMORY])) {
 		err = -EINVAL;
 		goto out_put;
 	}
@@ -2361,7 +2361,7 @@ void __init numa_policy_init(void)
 	 * fall back to the largest node if they're all smaller.
 	 */
 	nodes_clear(interleave_nodes);
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long total_pages = node_present_pages(nid);
 
 		/* Preserve the largest node */
@@ -2442,7 +2442,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
 		*nodelist++ = '\0';
 		if (nodelist_parse(nodelist, nodes))
 			goto out;
-		if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+		if (!nodes_subset(nodes, node_states[N_MEMORY]))
 			goto out;
 	} else
 		nodes_clear(nodes);
@@ -2476,7 +2476,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
 		 * Default to online nodes with memory if no nodelist
 		 */
 		if (!nodelist)
-			nodes = node_states[N_HIGH_MEMORY];
+			nodes = node_states[N_MEMORY];
 		break;
 	case MPOL_LOCAL:
 		/*
-- 
1.7.4.4



* [V5 PATCH 12/26] hugetlb: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (10 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 11/26] mempolicy: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 13/26] vmstat: " Lai Jiangshan
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan,
	Greg Kroah-Hartman, Michal Hocko, Hillf Danton, Aneesh Kumar K.V,
	linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 drivers/base/node.c |    2 +-
 mm/hugetlb.c        |   24 ++++++++++++------------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5d7731e..4c3aa7c 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
 static inline bool hugetlb_register_node(struct node *node)
 {
 	if (__hugetlb_register_node &&
-			node_state(node->dev.id, N_HIGH_MEMORY)) {
+			node_state(node->dev.id, N_MEMORY)) {
 		__hugetlb_register_node(node);
 		return true;
 	}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59a0059..7720ade 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1057,7 +1057,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	 * on-line nodes with memory and will handle the hstate accounting.
 	 */
 	while (nr_pages--) {
-		if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+		if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
 			break;
 	}
 }
@@ -1180,14 +1180,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 int __weak alloc_bootmem_huge_page(struct hstate *h)
 {
 	struct huge_bootmem_page *m;
-	int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+	int nr_nodes = nodes_weight(node_states[N_MEMORY]);
 
 	while (nr_nodes) {
 		void *addr;
 
 		addr = __alloc_bootmem_node_nopanic(
 				NODE_DATA(hstate_next_node_to_alloc(h,
-						&node_states[N_HIGH_MEMORY])),
+						&node_states[N_MEMORY])),
 				huge_page_size(h), huge_page_size(h), 0);
 
 		if (addr) {
@@ -1259,7 +1259,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 			if (!alloc_bootmem_huge_page(h))
 				break;
 		} else if (!alloc_fresh_huge_page(h,
-					 &node_states[N_HIGH_MEMORY]))
+					 &node_states[N_MEMORY]))
 			break;
 	}
 	h->max_huge_pages = i;
@@ -1527,7 +1527,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
 		if (!(obey_mempolicy &&
 				init_nodemask_of_mempolicy(nodes_allowed))) {
 			NODEMASK_FREE(nodes_allowed);
-			nodes_allowed = &node_states[N_HIGH_MEMORY];
+			nodes_allowed = &node_states[N_MEMORY];
 		}
 	} else if (nodes_allowed) {
 		/*
@@ -1537,11 +1537,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
 		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
 		init_nodemask_of_node(nodes_allowed, nid);
 	} else
-		nodes_allowed = &node_states[N_HIGH_MEMORY];
+		nodes_allowed = &node_states[N_MEMORY];
 
 	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
 
-	if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+	if (nodes_allowed != &node_states[N_MEMORY])
 		NODEMASK_FREE(nodes_allowed);
 
 	return len;
@@ -1844,7 +1844,7 @@ static void hugetlb_register_all_nodes(void)
 {
 	int nid;
 
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		struct node *node = &node_devices[nid];
 		if (node->dev.id == nid)
 			hugetlb_register_node(node);
@@ -1939,8 +1939,8 @@ void __init hugetlb_add_hstate(unsigned order)
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
 	INIT_LIST_HEAD(&h->hugepage_activelist);
-	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
-	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+	h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+	h->next_nid_to_free = first_node(node_states[N_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
 	/*
@@ -2035,11 +2035,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 		if (!(obey_mempolicy &&
 			       init_nodemask_of_mempolicy(nodes_allowed))) {
 			NODEMASK_FREE(nodes_allowed);
-			nodes_allowed = &node_states[N_HIGH_MEMORY];
+			nodes_allowed = &node_states[N_MEMORY];
 		}
 		h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
 
-		if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+		if (nodes_allowed != &node_states[N_MEMORY])
 			NODEMASK_FREE(nodes_allowed);
 	}
 out:
-- 
1.7.4.4



* [V5 PATCH 13/26] vmstat: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (11 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 12/26] hugetlb: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 14/26] kthread: " Lai Jiangshan
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Mel Gorman,
	Christoph Lameter, Minchan Kim, Johannes Weiner, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/vmstat.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1b5cacd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -930,7 +930,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
 	pg_data_t *pgdat = (pg_data_t *)arg;
 
 	/* check memoryless node */
-	if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+	if (!node_state(pgdat->node_id, N_MEMORY))
 		return 0;
 
 	seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1292,7 +1292,7 @@ static int unusable_show(struct seq_file *m, void *arg)
 	pg_data_t *pgdat = (pg_data_t *)arg;
 
 	/* check memoryless node */
-	if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+	if (!node_state(pgdat->node_id, N_MEMORY))
 		return 0;
 
 	walk_zones_in_node(m, pgdat, unusable_show_print);
-- 
1.7.4.4



* [V5 PATCH 14/26] kthread: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (12 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 13/26] vmstat: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 15/26] init: " Lai Jiangshan
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Tejun Heo,
	Thomas Gleixner, Paul Gortmaker, Paul E. McKenney, Namhyung Kim

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/kthread.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 29fb60c..691dc2e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -428,7 +428,7 @@ int kthreadd(void *unused)
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
 	set_cpus_allowed_ptr(tsk, cpu_all_mask);
-	set_mems_allowed(node_states[N_HIGH_MEMORY]);
+	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
 
-- 
1.7.4.4



* [V5 PATCH 15/26] init: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (13 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 14/26] kthread: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 16/26] vmscan: " Lai Jiangshan
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Ingo Molnar,
	Jim Cromie, Al Viro, H. Peter Anvin

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we
should use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 init/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/main.c b/init/main.c
index 9cf77ab..9595968 100644
--- a/init/main.c
+++ b/init/main.c
@@ -855,7 +855,7 @@ static void __init kernel_init_freeable(void)
 	/*
 	 * init can allocate pages on any node
 	 */
-	set_mems_allowed(node_states[N_HIGH_MEMORY]);
+	set_mems_allowed(node_states[N_MEMORY]);
 	/*
 	 * init can run on any cpu.
 	 */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 16/26] vmscan: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (14 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 15/26] init: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Minchan Kim,
	Hugh Dickins, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we
should use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 mm/vmscan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..98a2e11 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3135,7 +3135,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
 	int nid;
 
 	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
-		for_each_node_state(nid, N_HIGH_MEMORY) {
+		for_each_node_state(nid, N_MEMORY) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 			const struct cpumask *mask;
 
@@ -3191,7 +3191,7 @@ static int __init kswapd_init(void)
 	int nid;
 
 	swap_setup();
-	for_each_node_state(nid, N_HIGH_MEMORY)
+	for_each_node_state(nid, N_MEMORY)
  		kswapd_run(nid);
 	hotcpu_notifier(cpu_callback, 0);
 	return 0;
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (15 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 16/26] vmscan: " Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Tejun Heo, Pekka Enberg,
	Minchan Kim, Michal Hocko, linux-mm

N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we
should use N_MEMORY instead.

Since N_MEMORY has been introduced, also update the initialization of
node_states.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/x86/mm/init_64.c |    4 +++-
 mm/page_alloc.c       |   40 ++++++++++++++++++++++------------------
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3baff25..2ead3c8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -630,7 +630,9 @@ void __init paging_init(void)
 	 *	 numa support is not compiled in, and later node_set_state
 	 *	 will not set it back.
 	 */
-	node_clear_state(0, N_NORMAL_MEMORY);
+	node_clear_state(0, N_MEMORY);
+	if (N_MEMORY != N_NORMAL_MEMORY)
+		node_clear_state(0, N_NORMAL_MEMORY);
 
 	zone_sizes_init();
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1ef9b0..b70c929 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1692,7 +1692,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
  *
  * If the zonelist cache is present in the passed in zonelist, then
  * returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
  *
  * If the zonelist cache is not available for this zonelist, does
  * nothing and returns NULL.
@@ -1721,7 +1721,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)
 
 	allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
 					&cpuset_current_mems_allowed :
-					&node_states[N_HIGH_MEMORY];
+					&node_states[N_MEMORY];
 	return allowednodes;
 }
 
@@ -3194,7 +3194,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 		return node;
 	}
 
-	for_each_node_state(n, N_HIGH_MEMORY) {
+	for_each_node_state(n, N_MEMORY) {
 
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
@@ -3336,7 +3336,7 @@ static int default_zonelist_order(void)
  	 * local memory, NODE_ORDER may be suitable.
          */
 	average_size = total_size /
-				(nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+				(nodes_weight(node_states[N_MEMORY]) + 1);
 	for_each_online_node(nid) {
 		low_kmem_size = 0;
 		total_size = 0;
@@ -4669,7 +4669,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
 /*
  * early_calculate_totalpages()
  * Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
  */
 static unsigned long __init early_calculate_totalpages(void)
 {
@@ -4682,7 +4682,7 @@ static unsigned long __init early_calculate_totalpages(void)
 
 		totalpages += pages;
 		if (pages)
-			node_set_state(nid, N_HIGH_MEMORY);
+			node_set_state(nid, N_MEMORY);
 	}
   	return totalpages;
 }
@@ -4699,9 +4699,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	unsigned long usable_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
-	nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+	nodemask_t saved_node_state = node_states[N_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
-	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
 
 	/*
 	 * If movablecore was specified, calculate what size of
@@ -4736,7 +4736,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
 	kernelcore_node = required_kernelcore / usable_nodes;
-	for_each_node_state(nid, N_HIGH_MEMORY) {
+	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
 
 		/*
@@ -4828,23 +4828,27 @@ restart:
 
 out:
 	/* restore the node_state */
-	node_states[N_HIGH_MEMORY] = saved_node_state;
+	node_states[N_MEMORY] = saved_node_state;
 }
 
-/* Any regular memory on that node ? */
-static void __init check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
-	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+	if (N_MEMORY == N_NORMAL_MEMORY)
+		return;
+
+	for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
 		struct zone *zone = &pgdat->node_zones[zone_type];
 		if (zone->present_pages) {
-			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+			node_set_state(nid, N_HIGH_MEMORY);
+			if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+			    zone_type <= ZONE_NORMAL)
+				node_set_state(nid, N_NORMAL_MEMORY);
 			break;
 		}
 	}
-#endif
 }
 
 /**
@@ -4927,8 +4931,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 		/* Any memory on that node */
 		if (pgdat->node_present_pages)
-			node_set_state(nid, N_HIGH_MEMORY);
-		check_for_regular_memory(pgdat);
+			node_set_state(nid, N_MEMORY);
+		check_for_memory(pgdat, nid);
 	}
 }
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 18/26] hotplug: update nodemasks management
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (16 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Rob Landley,
	Jianguo Wu, Kay Sievers, Wen Congyang, linux-doc, linux-mm

Update the nodemask management for N_MEMORY: add a status_change_nid_high
field to struct memory_notify, and set/clear N_HIGH_MEMORY and N_MEMORY
when memory is onlined/offlined.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |    5 ++-
 include/linux/memory.h           |    1 +
 mm/memory_hotplug.c              |   87 +++++++++++++++++++++++++++++++-------
 3 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index c6f993d..8e5eacb 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -390,6 +390,7 @@ struct memory_notify {
        unsigned long start_pfn;
        unsigned long nr_pages;
        int status_change_nid_normal;
+       int status_change_nid_high;
        int status_change_nid;
 }
 
@@ -397,7 +398,9 @@ start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
 status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 is (will be) set/clear, if this is -1, then nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
 If status_changed_nid* >= 0, callback should create/discard structures for the
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a09216d..45e93b4 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -54,6 +54,7 @@ struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
 	int status_change_nid_normal;
+	int status_change_nid_high;
 	int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9af9641..a55b547 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -603,13 +603,15 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 	enum zone_type zone_last = ZONE_NORMAL;
 
 	/*
-	 * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
-	 * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+	 * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_NORMAL,
+	 * set zone_last to ZONE_NORMAL.
 	 *
-	 * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
-	 * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+	 * If we don't have HIGHMEM nor movable node,
+	 * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+	 * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
 	 */
-	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+	if (N_MEMORY == N_NORMAL_MEMORY)
 		zone_last = ZONE_MOVABLE;
 
 	/*
@@ -623,12 +625,34 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 	else
 		arg->status_change_nid_normal = -1;
 
+#ifdef CONFIG_HIGHMEM
+	/*
+	 * If we have movable node, node_states[N_HIGH_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_HIGHMEM,
+	 * set zone_last to ZONE_HIGHMEM.
+	 *
+	 * If we don't have movable node, node_states[N_HIGH_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_MOVABLE,
+	 * set zone_last to ZONE_MOVABLE.
+	 */
+	zone_last = ZONE_HIGHMEM;
+	if (N_MEMORY == N_HIGH_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	if (zone_idx(zone) <= zone_last && !node_state(nid, N_HIGH_MEMORY))
+		arg->status_change_nid_high = nid;
+	else
+		arg->status_change_nid_high = -1;
+#else
+	arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
 	/*
 	 * if the node don't have memory befor online, we will need to
-	 * set the node to node_states[N_HIGH_MEMORY] after the memory
+	 * set the node to node_states[N_MEMORY] after the memory
 	 * is online.
 	 */
-	if (!node_state(nid, N_HIGH_MEMORY))
+	if (!node_state(nid, N_MEMORY))
 		arg->status_change_nid = nid;
 	else
 		arg->status_change_nid = -1;
@@ -639,7 +663,10 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 	if (arg->status_change_nid_normal >= 0)
 		node_set_state(node, N_NORMAL_MEMORY);
 
-	node_set_state(node, N_HIGH_MEMORY);
+	if (arg->status_change_nid_high >= 0)
+		node_set_state(node, N_HIGH_MEMORY);
+
+	node_set_state(node, N_MEMORY);
 }
 
 
@@ -1103,13 +1130,15 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 	enum zone_type zt, zone_last = ZONE_NORMAL;
 
 	/*
-	 * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
-	 * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+	 * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_NORMAL,
+	 * set zone_last to ZONE_NORMAL.
 	 *
-	 * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
-	 * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+	 * If we don't have HIGHMEM nor movable node,
+	 * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+	 * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
 	 */
-	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+	if (N_MEMORY == N_NORMAL_MEMORY)
 		zone_last = ZONE_MOVABLE;
 
 	/*
@@ -1126,6 +1155,30 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 	else
 		arg->status_change_nid_normal = -1;
 
+#ifdef CONFIG_HIGHMEM
+	/*
+	 * If we have movable node, node_states[N_HIGH_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_HIGHMEM,
+	 * set zone_last to ZONE_HIGHMEM.
+	 *
+	 * If we don't have movable node, node_states[N_HIGH_MEMORY]
+	 * contains nodes which have zones of 0...ZONE_MOVABLE,
+	 * set zone_last to ZONE_MOVABLE.
+	 */
+	zone_last = ZONE_HIGHMEM;
+	if (N_MEMORY == N_HIGH_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	for (; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+		arg->status_change_nid_high = zone_to_nid(zone);
+	else
+		arg->status_change_nid_high = -1;
+#else
+	arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
 	/*
 	 * node_states[N_HIGH_MEMORY] contains nodes which have 0...ZONE_MOVABLE
 	 */
@@ -1150,9 +1203,13 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 	if (arg->status_change_nid_normal >= 0)
 		node_clear_state(node, N_NORMAL_MEMORY);
 
-	if ((N_HIGH_MEMORY != N_NORMAL_MEMORY) &&
-	    (arg->status_change_nid >= 0))
+	if ((N_MEMORY != N_NORMAL_MEMORY) &&
+	    (arg->status_change_nid_high >= 0))
 		node_clear_state(node, N_HIGH_MEMORY);
+
+	if ((N_MEMORY != N_HIGH_MEMORY) &&
+	    (arg->status_change_nid >= 0))
+		node_clear_state(node, N_MEMORY);
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (17 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 20/26] memory_hotplug: allow online/offline memory to result movable node Lai Jiangshan
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan,
	Greg Kroah-Hartman, Christoph Lameter, Hillf Danton, Minchan Kim,
	Johannes Weiner, Dan Magenheimer, Mel Gorman, Michal Hocko,
	linux-mm

All the preparations are done; we can now actually introduce N_MEMORY.
Add CONFIG_MOVABLE_NODE so that N_MEMORY can be used for
movable-dedicated nodes.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 drivers/base/node.c      |    6 ++++++
 include/linux/nodemask.h |    4 ++++
 mm/Kconfig               |    8 ++++++++
 mm/page_alloc.c          |    3 +++
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 4c3aa7c..9cdd66f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -620,6 +620,9 @@ static struct node_attr node_state_attr[] = {
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
+#endif
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 };
 
@@ -630,6 +633,9 @@ static struct attribute *node_state_attrs[] = {
 #ifdef CONFIG_HIGHMEM
 	&node_state_attr[N_HIGH_MEMORY].attr.attr,
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	&node_state_attr[N_MEMORY].attr.attr,
+#endif
 	&node_state_attr[N_CPU].attr.attr,
 	NULL
 };
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index c6ebdc9..4e2cbfa 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,7 +380,11 @@ enum node_states {
 #else
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	N_MEMORY,		/* The node has memory(regular, high, movable) */
+#else
 	N_MEMORY = N_HIGH_MEMORY,
+#endif
 	N_CPU,		/* The node has one or more cpus */
 	NR_NODE_STATES
 };
diff --git a/mm/Kconfig b/mm/Kconfig
index a3f8ddd..957ebd5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -143,6 +143,14 @@ config NO_BOOTMEM
 config MEMORY_ISOLATION
 	boolean
 
+config MOVABLE_NODE
+	boolean "Enable to assign a node which has only movable memory"
+	depends on HAVE_MEMBLOCK
+	depends on NO_BOOTMEM
+	depends on X86_64
+	depends on NUMA
+	default y
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b70c929..a42337f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -90,6 +90,9 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = { { [0] = 1UL } },
 #endif
+#ifdef CONFIG_MOVABLE_NODE
+	[N_MEMORY] = { { [0] = 1UL } },
+#endif
 	[N_CPU] = { { [0] = 1UL } },
 #endif	/* NUMA */
 };
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 20/26] memory_hotplug: allow online/offline memory to result movable node
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (18 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 21/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Wen Congyang,
	linux-mm

Now the memory management can handle movable nodes and nodes which
don't have any normal memory, so we can dynamically configure and add
a movable node by:
	onlining ZONE_MOVABLE memory on a previously offline node
	offlining the last normal memory, which results in a
	non-normal-memory node

Movable nodes are very important for power saving, hardware
partitioning and highly available systems (hardware fault management).


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a55b547..756744c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -589,11 +589,19 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	return 0;
 }
 
+#ifdef CONFIG_MOVABLE_NODE
+/* when CONFIG_MOVABLE_NODE, we allow an online node to have no normal memory */
+static bool can_online_high_movable(struct zone *zone)
+{
+	return true;
+}
+#else /* #ifdef CONFIG_MOVABLE_NODE */
 /* ensure every online node has NORMAL memory */
 static bool can_online_high_movable(struct zone *zone)
 {
 	return node_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 }
+#endif /* #ifdef CONFIG_MOVABLE_NODE */
 
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
@@ -1097,6 +1105,13 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	return offlined;
 }
 
+#ifdef CONFIG_MOVABLE_NODE
+/* when CONFIG_MOVABLE_NODE, we allow an online node to have no normal memory */
+static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
+{
+	return true;
+}
+#else /* #ifdef CONFIG_MOVABLE_NODE */
 /* ensure the node has NORMAL memory if it is still online */
 static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
 {
@@ -1120,6 +1135,7 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
 	 */
 	return present_pages == 0;
 }
+#endif /* #ifdef CONFIG_MOVABLE_NODE */
 
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 21/26] page_alloc: add kernelcore_max_addr
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (19 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 20/26] memory_hotplug: allow online/offline memory to result movable node Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 22/26] x86: get pg_data_t's memory from other node Lai Jiangshan
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Rob Landley,
	Minchan Kim, Michal Hocko, linux-doc, linux-mm

The current ZONE_MOVABLE (kernelcore=) setting policy via boot option
doesn't meet our requirement. We need something like a
kernelcore_max_addr=XX boot option to limit the kernelcore upper
address.

Memory at higher addresses will be migratable (movable), so it is
easier to offline (always ready to be offlined when the system doesn't
require so much memory).

This makes things easy when we dynamically hot-add/remove memory,
makes better use of memory, and helps THP.

kernelcore_max_addr=, kernelcore= and movablecore= can all be safely
specified at the same time (or any two of them).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |    9 +++++++++
 mm/page_alloc.c                     |   29 ++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9776f06..2b72ffb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1223,6 +1223,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			use the HighMem zone if it exists, and the Normal
 			zone if it does not.
 
+	kernelcore_max_addr=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+			has the same effect as the kernelcore parameter,
+			except that it specifies the upper physical address
+			of the memory range usable by the kernel for
+			non-movable allocations.  If both kernelcore and
+			kernelcore_max_addr are specified, this request
+			takes priority over kernelcore's.
+			See the kernelcore parameter.
+
 	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
 			Format: <Controller#>[,poll interval]
 			The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a42337f..11df8b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -203,6 +203,7 @@ static unsigned long __meminitdata dma_reserve;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+static unsigned long __initdata required_kernelcore_max_pfn;
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
@@ -4700,6 +4701,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 {
 	int i, nid;
 	unsigned long usable_startpfn;
+	unsigned long kernelcore_max_pfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
 	nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -4728,6 +4730,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
+	if (required_kernelcore_max_pfn && !required_kernelcore)
+		required_kernelcore = totalpages;
+
 	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
 	if (!required_kernelcore)
 		goto out;
@@ -4736,6 +4741,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
+	if (required_kernelcore_max_pfn)
+		kernelcore_max_pfn = required_kernelcore_max_pfn;
+	else
+		kernelcore_max_pfn = ULONG_MAX >> PAGE_SHIFT;
+	kernelcore_max_pfn = max(kernelcore_max_pfn, usable_startpfn);
+
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
 	kernelcore_node = required_kernelcore / usable_nodes;
@@ -4762,8 +4773,12 @@ restart:
 			unsigned long size_pages;
 
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
-			if (start_pfn >= end_pfn)
+			end_pfn = min(kernelcore_max_pfn, end_pfn);
+			if (start_pfn >= end_pfn) {
+				if (!zone_movable_pfn[nid])
+					zone_movable_pfn[nid] = start_pfn;
 				continue;
+			}
 
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
@@ -4954,6 +4969,18 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
 	return 0;
 }
 
+#ifdef CONFIG_MOVABLE_NODE
+/*
+ * kernelcore_max_addr=addr sets the upper physical address of the memory
+ * range used for allocations that cannot be reclaimed or migrated.
+ */
+static int __init cmdline_parse_kernelcore_max_addr(char *p)
+{
+	return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+}
+early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
+#endif
+
 /*
  * kernelcore=size sets the amount of memory for use for allocations that
  * cannot be reclaimed or migrated.
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 22/26] x86: get pg_data_t's memory from other node
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (20 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 21/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 23/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Bjorn Helgaas

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system can create a movable node, on which all of the node's
memory is allocated as ZONE_MOVABLE, setup_node_data() cannot
allocate that node's pg_data_t from the node itself.
So when memblock_alloc_nid() fails, setup_node_data() falls back to
memblock_alloc().

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/x86/mm/numa.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be..a86e315 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -223,9 +223,14 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 		remapped = true;
 	} else {
 		nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
-		if (!nd_pa) {
-			pr_err("Cannot find %zu bytes in node %d\n",
+		if (!nd_pa)
+			printk(KERN_WARNING "Cannot find %zu bytes in node %d\n",
 			       nd_size, nid);
+		if (!nd_pa)
+			nd_pa = memblock_alloc(nd_size, SMP_CACHE_BYTES);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in other node\n",
+			       nd_size);
 			return;
 		}
 		nd = __va(nd_pa);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 23/26] x86: use memblock_set_current_limit() to set memblock.current_limit
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (21 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 22/26] x86: get pg_data_t's memory from other node Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 24/26] memblock: limit memory address from memblock Lai Jiangshan
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Matt Fleming, Attilio Rao

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

memblock.current_limit is set directly even though
memblock_set_current_limit() is provided for exactly this purpose.
Fix the two call sites in setup_arch() to use the accessor.
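
The value of funnelling all writes through one accessor (the toy names below
are invented, not the real memblock API) is that any later policy, such as
the hard cap a later patch in this series adds, only has to live in one
place:

```c
typedef unsigned long long phys_addr_t;

struct toy_memblock {
	phys_addr_t current_limit;
};

struct toy_memblock toy_memblock = { .current_limit = ~0ULL };

/* Single choke point: callers never write current_limit directly, so a
 * future clamp or sanity check goes here and nowhere else. */
void toy_set_current_limit(phys_addr_t limit)
{
	toy_memblock.current_limit = limit;
}
```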

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/x86/kernel/setup.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ca45696..ab3017a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -890,7 +890,7 @@ void __init setup_arch(char **cmdline_p)
 
 	cleanup_highmap();
 
-	memblock.current_limit = get_max_mapped();
+	memblock_set_current_limit(get_max_mapped());
 	memblock_x86_fill();
 
 	/*
@@ -940,7 +940,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
-	memblock.current_limit = get_max_mapped();
+	memblock_set_current_limit(get_max_mapped());
 	dma_contiguous_reserve(0);
 
 	/*
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 24/26] memblock: limit memory address from memblock
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (22 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 23/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 25/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Tejun Heo,
	Wanpeng Li, Jacob Shin, Ingo Molnar, Minchan Kim, Michal Hocko,
	linux-mm

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Setting kernelcore_max_pfn means all memory above that boot-parameter
boundary is allocated as ZONE_MOVABLE, so memory allocated by memblock
should also be limited by the parameter.

This patch limits the addresses that memblock can allocate from.
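
The clamping rule this patch adds to memblock_set_current_limit() can be
modelled standalone. A cap of 0 means "no cap set", mirroring the unset
memblock_limit global; the function name here is illustrative:

```c
typedef unsigned long long phys_addr_t;

/* Returns the effective limit: the requested one, unless a non-zero
 * hard cap is lower, in which case the cap wins. */
phys_addr_t effective_limit(phys_addr_t cap, phys_addr_t requested)
{
	if (!cap || cap > requested)
		return requested;
	return cap;
}
```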

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |    5 ++++-
 mm/page_alloc.c          |    6 +++++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..3e52911 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
+extern phys_addr_t memblock_limit;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 6259055..ee2e307 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -957,7 +957,10 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
 
 void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 {
-	memblock.current_limit = limit;
+	if (!memblock_limit || (memblock_limit > limit))
+		memblock.current_limit = limit;
+	else
+		memblock.current_limit = memblock_limit;
 }
 
 static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 11df8b5..f76b696 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,8 @@ static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
 
+phys_addr_t memblock_limit;
+
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
 EXPORT_SYMBOL(movable_zone);
@@ -4976,7 +4978,9 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
  */
 static int __init cmdline_parse_kernelcore_max_addr(char *p)
 {
-	return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+	cmdline_parse_core(p, &required_kernelcore_max_pfn);
+	memblock_limit = required_kernelcore_max_pfn << PAGE_SHIFT;
+	return 0;
 }
 early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
 #endif
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 25/26] memblock: compare current_limit with end variable at memblock_find_in_range_node()
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (23 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 24/26] memblock: limit memory address from memblock Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-29 15:21 ` [V5 PATCH 26/26] mempolicy: fix is_valid_nodemask() Lai Jiangshan
  2012-10-30  9:50 ` [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Yasuaki Ishimatsu
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, Tejun Heo,
	Ingo Molnar, Wanpeng Li, linux-mm

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

memblock_find_in_range_node() does not compare memblock.current_limit
with the end variable. Thus, even when memblock.current_limit is smaller
than end, the function can return a memory address above
memblock.current_limit.

This patch adds the check to memblock_find_in_range_node().
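
The corrected "pump up @end" step reduces to a clamp: an explicit end must
not exceed current_limit, and the MEMBLOCK_ALLOC_ACCESSIBLE sentinel (0 in
mainline at the time) means "use the limit itself". A standalone sketch,
with a toy sentinel constant:

```c
typedef unsigned long long phys_addr_t;

#define TOY_ALLOC_ACCESSIBLE 0ULL	/* mirrors MEMBLOCK_ALLOC_ACCESSIBLE */

/* Clamp the search end against the current limit, as the patch does at
 * the top of memblock_find_in_range_node(). */
phys_addr_t pump_end(phys_addr_t end, phys_addr_t current_limit)
{
	if (end == TOY_ALLOC_ACCESSIBLE || end > current_limit)
		end = current_limit;
	return end;
}
```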

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/memblock.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index ee2e307..50ab53c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -100,11 +100,12 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 					phys_addr_t align, int nid)
 {
 	phys_addr_t this_start, this_end, cand;
+	phys_addr_t current_limit = memblock.current_limit;
 	u64 i;
 
 	/* pump up @end */
-	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
-		end = memblock.current_limit;
+	if ((end == MEMBLOCK_ALLOC_ACCESSIBLE) || (end > current_limit))
+		end = current_limit;
 
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [V5 PATCH 26/26] mempolicy: fix is_valid_nodemask()
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (24 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 25/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
@ 2012-10-29 15:21 ` Lai Jiangshan
  2012-10-30  9:50 ` [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Yasuaki Ishimatsu
  26 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-10-29 15:21 UTC (permalink / raw)
  To: Mel Gorman, David Rientjes, LKML, x86 maintainers
  Cc: Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Yasuaki ISIMATU, Andrew Morton, Lai Jiangshan, KOSAKI Motohiro,
	Christoph Lameter, linux-mm

is_valid_nodemask() was introduced by commit 19770b32, but it does not
match its comment, because it does not check zones above policy_zone.

Also, commit b377fd established that if the highest zone is ZONE_MOVABLE,
memory policies should be applied to it as well, so ZONE_MOVABLE should be
a valid zone for policies. is_valid_nodemask() needs to be changed to
match.

Fix: check all zones, even those whose zone id is above policy_zone.
Use nodes_intersects() instead of open-coding the check.
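
The nodes_intersects() form of the check is just a bitwise AND over the
node bitmaps. A toy model, with a single word standing in for nodemask_t:

```c
/* One bit per node; n_memory models node_states[N_MEMORY], the set of
 * nodes that have any memory (including movable-only nodes). */
typedef unsigned long toy_nodemask_t;

/* A mask is valid for policies iff it contains at least one node that
 * actually has memory. */
int toy_is_valid_nodemask(toy_nodemask_t mask, toy_nodemask_t n_memory)
{
	return (mask & n_memory) != 0;
}
```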

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Wen Congyang <wency@cn.fujitsu.com>
---
 mm/mempolicy.c |   36 ++++++++++++++++++++++--------------
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d4a084c..ed7c249 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -140,19 +140,7 @@ static const struct mempolicy_operations {
 /* Check that the nodemask contains at least one populated zone */
 static int is_valid_nodemask(const nodemask_t *nodemask)
 {
-	int nd, k;
-
-	for_each_node_mask(nd, *nodemask) {
-		struct zone *z;
-
-		for (k = 0; k <= policy_zone; k++) {
-			z = &NODE_DATA(nd)->node_zones[k];
-			if (z->present_pages > 0)
-				return 1;
-		}
-	}
-
-	return 0;
+	return nodes_intersects(*nodemask, node_states[N_MEMORY]);
 }
 
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
@@ -1572,6 +1560,26 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 	return pol;
 }
 
+static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
+{
+	enum zone_type dynamic_policy_zone = policy_zone;
+
+	BUG_ON(dynamic_policy_zone == ZONE_MOVABLE);
+
+	/*
+	 * If policy->v.nodes has movable memory only,
+	 * we apply the policy only when gfp_zone(gfp) is ZONE_MOVABLE.
+	 *
+	 * policy->v.nodes is known to intersect node_states[N_MEMORY],
+	 * so if the following test fails, it implies
+	 * policy->v.nodes has movable memory only.
+	 */
+	if (!nodes_intersects(policy->v.nodes, node_states[N_HIGH_MEMORY]))
+		dynamic_policy_zone = ZONE_MOVABLE;
+
+	return zone >= dynamic_policy_zone;
+}
+
 /*
  * Return a nodemask representing a mempolicy for filtering nodes for
  * page allocation
@@ -1580,7 +1588,7 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
 	if (unlikely(policy->mode == MPOL_BIND) &&
-			gfp_zone(gfp) >= policy_zone &&
+			apply_policy_zone(policy, gfp_zone(gfp)) &&
 			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
 		return &policy->v.nodes;
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:20 ` [V5 PATCH 08/26] memcontrol: " Lai Jiangshan
@ 2012-10-29 16:22   ` Michal Hocko
  2012-10-29 20:40     ` David Rientjes
  2012-10-31 13:18   ` Michal Hocko
  1 sibling, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2012-10-29 16:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers

On Mon 29-10-12 23:20:58, Lai Jiangshan wrote:
> N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> N_MEMORY stands for the nodes that has any memory.

What is the difference of those two?

> The code here need to handle with the nodes which have memory, we should
> use N_MEMORY instead.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  mm/memcontrol.c  |   18 +++++++++---------
>  mm/page_cgroup.c |    2 +-
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7acf43b..1b69665 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -800,7 +800,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
>  	int nid;
>  	u64 total = 0;
>  
> -	for_each_node_state(nid, N_HIGH_MEMORY)
> +	for_each_node_state(nid, N_MEMORY)
>  		total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
>  	return total;
>  }
> @@ -1611,9 +1611,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
>  		return;
>  
>  	/* make a nodemask where this memcg uses memory from */
> -	memcg->scan_nodes = node_states[N_HIGH_MEMORY];
> +	memcg->scan_nodes = node_states[N_MEMORY];
>  
> -	for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
> +	for_each_node_mask(nid, node_states[N_MEMORY]) {
>  
>  		if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
>  			node_clear(nid, memcg->scan_nodes);
> @@ -1684,7 +1684,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
>  	/*
>  	 * Check rest of nodes.
>  	 */
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		if (node_isset(nid, memcg->scan_nodes))
>  			continue;
>  		if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
> @@ -3759,7 +3759,7 @@ move_account:
>  		drain_all_stock_sync(memcg);
>  		ret = 0;
>  		mem_cgroup_start_move(memcg);
> -		for_each_node_state(node, N_HIGH_MEMORY) {
> +		for_each_node_state(node, N_MEMORY) {
>  			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
>  				enum lru_list lru;
>  				for_each_lru(lru) {
> @@ -4087,7 +4087,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>  
>  	total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
>  	seq_printf(m, "total=%lu", total_nr);
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
>  		seq_printf(m, " N%d=%lu", nid, node_nr);
>  	}
> @@ -4095,7 +4095,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>  
>  	file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
>  	seq_printf(m, "file=%lu", file_nr);
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
>  				LRU_ALL_FILE);
>  		seq_printf(m, " N%d=%lu", nid, node_nr);
> @@ -4104,7 +4104,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>  
>  	anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
>  	seq_printf(m, "anon=%lu", anon_nr);
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
>  				LRU_ALL_ANON);
>  		seq_printf(m, " N%d=%lu", nid, node_nr);
> @@ -4113,7 +4113,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
>  
>  	unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
>  	seq_printf(m, "unevictable=%lu", unevictable_nr);
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
>  				BIT(LRU_UNEVICTABLE));
>  		seq_printf(m, " N%d=%lu", nid, node_nr);
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 5ddad0c..c1054ad 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		unsigned long start_pfn, end_pfn;
>  
>  		start_pfn = node_start_pfn(nid);
> -- 
> 1.7.4.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 16:22   ` Michal Hocko
@ 2012-10-29 20:40     ` David Rientjes
  2012-10-29 20:58       ` Michal Hocko
  0 siblings, 1 reply; 37+ messages in thread
From: David Rientjes @ 2012-10-29 20:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Lai Jiangshan, Mel Gorman, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers

On Mon, 29 Oct 2012, Michal Hocko wrote:

> > N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> > N_MEMORY stands for the nodes that has any memory.
> 
> What is the difference of those two?
> 

Patch 5 in the series introduces it to be equal to N_HIGH_MEMORY, so 
accepting this patch would be an implicit ack of the direction taken 
there.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 05/26] node_states: introduce N_MEMORY
  2012-10-29 15:20 ` [V5 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
@ 2012-10-29 20:46   ` David Rientjes
  2012-10-31  7:03     ` Wen Congyang
  0 siblings, 1 reply; 37+ messages in thread
From: David Rientjes @ 2012-10-29 20:46 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, LKML, x86 maintainers, Jiang Liu, Rusty Russell,
	Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU, Andrew Morton,
	Christoph Lameter, Hillf Danton

On Mon, 29 Oct 2012, Lai Jiangshan wrote:

> We have N_NORMAL_MEMORY for standing for the nodes that have normal memory with
> zone_type <= ZONE_NORMAL.
> 
> And we have N_HIGH_MEMORY for standing for the nodes that have normal or high
> memory.
> 

(In other words, all memory.)

> But we don't have any word to stand for the nodes that have *any* memory.
> 

It's N_HIGH_MEMORY, or at least it's supposed to be.  Is there a problem 
where the bit isn't getting set for a node with memory?

> A)	But this reusing is bad for *readability*. Because the name
> 	N_HIGH_MEMORY just stands for high or normal:
> 
> A.example 1)
> 	mem_cgroup_nr_lru_pages():
> 		for_each_node_state(nid, N_HIGH_MEMORY)
> 
> 	The user will be confused(why this function just counts for high or
> 	normal memory node? does it counts for ZONE_MOVABLE's lru pages?)
> 	until someone else tell them N_HIGH_MEMORY is reused to stand for
> 	nodes that have any memory.
> 
> A.cont) If we introduce N_MEMORY, we can reduce this confusing
> 	AND make the code more clearly:
> 
> A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:
> 
> 	One is in page_cgroup_init(void):
> 		for_each_node_state(nid, N_HIGH_MEMORY) {
> 
> 	It means if the node have memory, we will allocate page_cgroup map for
> 	the node. We should use N_MEMORY instead here to gaim more clearly.
> 
> 	The second using is in alloc_page_cgroup():
> 		if (node_state(nid, N_HIGH_MEMORY))
> 			addr = vzalloc_node(size, nid);
> 
> 	It means if the node has high or normal memory that can be allocated
> 	from kernel. We should keep N_HIGH_MEMORY here, and it will be better
> 	if the "any memory" semantic of N_HIGH_MEMORY is removed.
> 
> B)	This reusing is out-dated if we introduce MOVABLE-dedicated node.
> 	The MOVABLE-dedicated node should not appear in
> 	node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY],
> 	because MOVABLE-dedicated node has no high or normal memory.
> 
> 	In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node
> 	is in node_stats[N_HIGH_MEMORY], it is also means it is in
> 	node_stats[N_NORMAL_MEMORY], it causes SLUB wrong.
> 
> 	The slub uses
> 		for_each_node_state(nid, N_NORMAL_MEMORY)
> 	and creates kmem_cache_node for MOVABLE-dedicated node and cause problem.
> 
> In one word, we need a N_MEMORY. We just intrude it as an alias to
> N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late patches.
> 

If this is really that problematic (and it appears it's not given that 
there are many use cases of it and people tend to get it right), then why 
not simply rename N_HIGH_MEMORY instead of introducing yet another 
nodemask to the equation?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 20:40     ` David Rientjes
@ 2012-10-29 20:58       ` Michal Hocko
  2012-10-29 21:08         ` David Rientjes
  0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2012-10-29 20:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lai Jiangshan, Mel Gorman, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers

On Mon 29-10-12 13:40:39, David Rientjes wrote:
> On Mon, 29 Oct 2012, Michal Hocko wrote:
> 
> > > N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> > > N_MEMORY stands for the nodes that has any memory.
> > 
> > What is the difference of those two?
> > 
> 
> Patch 5 in the series 

Strange, I do not see that one at the mailing list.

> introduces it to be equal to N_HIGH_MEMORY, so 

So this is just a rename? If yes it would be much easier if it was
mentioned in the patch description.

> accepting this patch would be an implicit ack of the direction taken 
> there.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 20:58       ` Michal Hocko
@ 2012-10-29 21:08         ` David Rientjes
  2012-10-29 21:34           ` Michal Hocko
  0 siblings, 1 reply; 37+ messages in thread
From: David Rientjes @ 2012-10-29 21:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Lai Jiangshan, Mel Gorman, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers

On Mon, 29 Oct 2012, Michal Hocko wrote:

> > > > N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> > > > N_MEMORY stands for the nodes that has any memory.
> > > 
> > > What is the difference of those two?
> > > 
> > 
> > Patch 5 in the series 
> 
> Strange, I do not see that one at the mailing list.
> 

http://marc.info/?l=linux-kernel&m=135152595827692

> > introduces it to be equal to N_HIGH_MEMORY, so 
> 
> So this is just a rename? If yes it would be much esier if it was
> mentioned in the patch description.
> 

It's not even a rename, even though it should be; it's adding yet another 
node_states entry that is equal to N_HIGH_MEMORY, since that state already 
includes all memory.  It's just a matter of taste, but I think we should be 
renaming it instead of aliasing it (unless you actually want to make 
N_HIGH_MEMORY include only nodes with highmem, but nothing depends on 
that).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 21:08         ` David Rientjes
@ 2012-10-29 21:34           ` Michal Hocko
  0 siblings, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2012-10-29 21:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lai Jiangshan, Mel Gorman, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers

On Mon 29-10-12 14:08:05, David Rientjes wrote:
> On Mon, 29 Oct 2012, Michal Hocko wrote:
> 
> > > > > N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> > > > > N_MEMORY stands for the nodes that has any memory.
> > > > 
> > > > What is the difference of those two?
> > > > 
> > > 
> > > Patch 5 in the series 
> > 
> > Strange, I do not see that one at the mailing list.
> > 
> 
> http://marc.info/?l=linux-kernel&m=135152595827692

Thanks!

> > > introduces it to be equal to N_HIGH_MEMORY, so 
> > 
> > So this is just a rename? If yes it would be much esier if it was
> > mentioned in the patch description.
> > 
> 
> It's not even a rename even though it should be, it's adding yet another 
> node_states that is equal to N_HIGH_MEMORY since that state already 
> includes all memory.  

Which is really strange because I do not see any reason for yet another
alias if the follow up patches rename all of them (I didn't try to apply
the whole series to check that so I might be wrong here).

> It's just a matter of taste but I think we should be renaming it
> instead of aliasing it (unless you actually want to make N_HIGH_MEMORY
> only include nodes with highmem, but nothing depends on that).

Agreed, I've always considered N_HIGH_MEMORY misleading and confusing so
renaming it would really make a lot of sense to me.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node
  2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
                   ` (25 preceding siblings ...)
  2012-10-29 15:21 ` [V5 PATCH 26/26] mempolicy: fix is_valid_nodemask() Lai Jiangshan
@ 2012-10-30  9:50 ` Yasuaki Ishimatsu
  2012-10-31  9:30   ` Wen Congyang
  26 siblings, 1 reply; 37+ messages in thread
From: Yasuaki Ishimatsu @ 2012-10-30  9:50 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Andrew Morton

HI Lai,

The patch-set is huge, which makes us hesitant to review it. 
I think it combines multiple feature developments: 
  - Development of online_movable [PATCH 1 - 3]
  - Cleanup node_state_attr [PATCH 4]
  - Introduce N_MEMORY [PATCH 5 - 18]
  - Development of kernelcore_max_addr [PATCH 19 - 25]
  - Bug fix [PATCH 26]

Why don't you separate the patch-set into each feature development?
By separating the patch-set, many people can easily participate
in your development.

Thanks,
Yasuaki Ishimatsu

2012/10/30 0:07, Lai Jiangshan wrote:
> Movable memory is a very important concept of memory-management,
> we need to consolidate it and make use of it on systems.
> 
> Movable memory is needed for
> o	anti-fragmentation(hugepage, big-order allocation...)
> o	logic hot-remove(virtualization, Memory capacity on Demand)
> o	physic hot-remove(power-saving, hardware partitioning, hardware fault management)
> 
> All these require dynamic configuring the memory and making better utilities of memories
> and safer. We also need physic hot-remove, so we need movable node too.
> (Although some systems support physic-memory-migration, we don't require all
> memory on physic-node is movable, but movable node is still needed here
> for logic-node if we want to make physic-migration is transparent)
> 
> We add dynamic configuration commands "online_movalbe" and "online_kernel".
> We also add non-dynamic boot option kernelcore_max_addr.
> We may add some more dynamic/non-dynamic configuration in future.
> 
> 
> The patchset is based on 3.7-rc3 with these three patches already applied:
> 	https://lkml.org/lkml/2012/10/24/151
> 	https://lkml.org/lkml/2012/10/26/150
> 
> You can also simply pull all the patches from:
> 	git pull https://github.com/laijs/linux.git hotplug-next
> 
> 
> 
> Issues):
> 
> mempolicy(M_BIND) don't act well when the nodemask has movable nodes only,
> the kernel allocation will fail and the task can't create new task or other
> kernel objects.
> 
> So we change the strategy/policy
> 	when the bound nodemask has movable node(s) only, we only
> 	apply mempolicy for userspace allocation, don't apply it
> 	for kernel allocation.
> 
> CPUSET also has the same problem, but the code spread in page_alloc.c,
> and we doesn't fix it yet, we can/will change allocation strategy to one of
> these 3 strategies:
> 	1) the same strategy as mempolicy
> 	2) change cpuset, make nodemask always has at least a normal node
> 	3) split nodemask: nodemask_user and nodemask_kernel
> 
> Thoughts?
> 
> 
> 
> Patches):
> 
> patch1-3:     add online_movable and online_kernel, bot don't result movable node
> Patch4        cleanup for node_state_attr
> Patch5        introduce N_MEMORY
> Patch6-17     use N_MEMORY instead N_HIGH_MEMORY.
>                The patches are separated by subsystem,
>                Patch18 also changes the node_states initialization
> Patch18-20    Add  MOVABLE-dedicated node
> Patch21-25    Add kernelcore_max_addr
> patch26:      mempolicy handle movable node
> 
> 
> 
> 
> Changes):
> 
> change V5-V4:
> 	consolidate online_movable/online_kernel
> 	nodemask management
> 
> change V4-v3
> 	rebase.
> 	online_movable/online_kernel can create a zone from empty
> 	or empyt a zone
> 
> change V3-v2:
> 	Proper nodemask management
> 
> change V2-V1:
> 
> The original V1 patchset of MOVABLE-dedicated node is here:
> http://comments.gmane.org/gmane.linux.kernel.mm/78122
> 
> The new V2 adds N_MEMORY and a notion of "MOVABLE-dedicated node".
> And fix some related problems.
> 
> The orignal V1 patchset of "add online_movable" is here:
> https://lkml.org/lkml/2012/7/4/145
> 
> The new V2 discards the MIGRATE_HOTREMOVE approach, and use a more straight
> implementation(only 1 patch).
> 
> 
> 
> Lai Jiangshan (22):
>    mm, memory-hotplug: dynamic configure movable memory and portion
>      memory
>    memory_hotplug: handle empty zone when online_movable/online_kernel
>    memory_hotplug: ensure every online node has NORMAL memory
>    node: cleanup node_state_attr
>    node_states: introduce N_MEMORY
>    cpuset: use N_MEMORY instead N_HIGH_MEMORY
>    procfs: use N_MEMORY instead N_HIGH_MEMORY
>    memcontrol: use N_MEMORY instead N_HIGH_MEMORY
>    oom: use N_MEMORY instead N_HIGH_MEMORY
>    mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
>    mempolicy: use N_MEMORY instead N_HIGH_MEMORY
>    hugetlb: use N_MEMORY instead N_HIGH_MEMORY
>    vmstat: use N_MEMORY instead N_HIGH_MEMORY
>    kthread: use N_MEMORY instead N_HIGH_MEMORY
>    init: use N_MEMORY instead N_HIGH_MEMORY
>    vmscan: use N_MEMORY instead N_HIGH_MEMORY
>    page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
>      initialization
>    hotplug: update nodemasks management
>    numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
>    memory_hotplug: allow online/offline memory to result movable node
>    page_alloc: add kernelcore_max_addr
>    mempolicy: fix is_valid_nodemask()
> 
> Yasuaki Ishimatsu (4):
>    x86: get pg_data_t's memory from other node
>    x86: use memblock_set_current_limit() to set memblock.current_limit
>    memblock: limit memory address from memblock
>    memblock: compare current_limit with end variable at
>      memblock_find_in_range_node()
> 
>   Documentation/cgroups/cpusets.txt   |    2 +-
>   Documentation/kernel-parameters.txt |    9 +
>   Documentation/memory-hotplug.txt    |   19 ++-
>   arch/x86/kernel/setup.c             |    4 +-
>   arch/x86/mm/init_64.c               |    4 +-
>   arch/x86/mm/numa.c                  |    8 +-
>   drivers/base/memory.c               |   27 ++--
>   drivers/base/node.c                 |   28 ++--
>   fs/proc/kcore.c                     |    2 +-
>   fs/proc/task_mmu.c                  |    4 +-
>   include/linux/cpuset.h              |    2 +-
>   include/linux/memblock.h            |    1 +
>   include/linux/memory.h              |    1 +
>   include/linux/memory_hotplug.h      |   13 ++-
>   include/linux/nodemask.h            |    5 +
>   init/main.c                         |    2 +-
>   kernel/cpuset.c                     |   32 ++--
>   kernel/kthread.c                    |    2 +-
>   mm/Kconfig                          |    8 +
>   mm/hugetlb.c                        |   24 ++--
>   mm/memblock.c                       |   10 +-
>   mm/memcontrol.c                     |   18 +-
>   mm/memory_hotplug.c                 |  283 +++++++++++++++++++++++++++++++++--
>   mm/mempolicy.c                      |   48 ++++---
>   mm/migrate.c                        |    2 +-
>   mm/oom_kill.c                       |    2 +-
>   mm/page_alloc.c                     |   76 +++++++---
>   mm/page_cgroup.c                    |    2 +-
>   mm/vmscan.c                         |    4 +-
>   mm/vmstat.c                         |    4 +-
>   30 files changed, 508 insertions(+), 138 deletions(-)
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [V5 PATCH 05/26] node_states: introduce N_MEMORY
  2012-10-29 20:46   ` David Rientjes
@ 2012-10-31  7:03     ` Wen Congyang
  0 siblings, 0 replies; 37+ messages in thread
From: Wen Congyang @ 2012-10-31  7:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lai Jiangshan, Mel Gorman, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Christoph Lameter, Hillf Danton

At 10/30/2012 04:46 AM, David Rientjes Wrote:
> On Mon, 29 Oct 2012, Lai Jiangshan wrote:
> 
>> We have N_NORMAL_MEMORY, standing for the nodes that have normal memory
>> (zone_type <= ZONE_NORMAL).
>>
>> And we have N_HIGH_MEMORY, standing for the nodes that have normal or high
>> memory.
>>
> 
> (In other words, all memory.)
> 
>> But we don't have any word to stand for the nodes that have *any* memory.
>>
> 
> It's N_HIGH_MEMORY, or at least it's supposed to be.  Is there a problem 
> where the bit isn't getting set for a node with memory?
> 
>> A)	But this reuse is bad for *readability*, because the name
>> 	N_HIGH_MEMORY just stands for high or normal memory:
>>
>> A.example 1)
>> 	mem_cgroup_nr_lru_pages():
>> 		for_each_node_state(nid, N_HIGH_MEMORY)
>>
>> 	The user will be confused (why does this function count only high- or
>> 	normal-memory nodes? does it count ZONE_MOVABLE's LRU pages?) until
>> 	someone tells them that N_HIGH_MEMORY is reused to stand for nodes
>> 	that have any memory.
>>
>> A.cont) If we introduce N_MEMORY, we can reduce this confusion
>> 	AND make the code clearer:
>>
>> A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:
>>
>> 	One is in page_cgroup_init(void):
>> 		for_each_node_state(nid, N_HIGH_MEMORY) {
>>
>> 	It means that if the node has memory, we will allocate a page_cgroup
>> 	map for it. We should use N_MEMORY here instead, for clarity.
>>
>> 	The second use is in alloc_page_cgroup():
>> 		if (node_state(nid, N_HIGH_MEMORY))
>> 			addr = vzalloc_node(size, nid);
>>
>> 	It means that the node has high or normal memory the kernel can
>> 	allocate from. We should keep N_HIGH_MEMORY here, and it will be
>> 	better once the "any memory" semantic of N_HIGH_MEMORY is removed.
>>
>> B)	This reuse becomes outdated if we introduce a MOVABLE-dedicated node.
>> 	A MOVABLE-dedicated node should appear in neither
>> 	node_states[N_HIGH_MEMORY] nor node_states[N_NORMAL_MEMORY],
>> 	because a MOVABLE-dedicated node has no high or normal memory.
>>
>> 	On x86_64, N_HIGH_MEMORY == N_NORMAL_MEMORY, so if a MOVABLE-dedicated
>> 	node is in node_states[N_HIGH_MEMORY], it is also in
>> 	node_states[N_NORMAL_MEMORY], which breaks SLUB.
>>
>> 	SLUB uses
>> 		for_each_node_state(nid, N_NORMAL_MEMORY)
>> 	and would create a kmem_cache_node for the MOVABLE-dedicated node,
>> 	causing problems.
>>
>> In short, we need an N_MEMORY. We first introduce it as an alias of
>> N_HIGH_MEMORY and fix all improper usages of N_HIGH_MEMORY in later patches.
>>
> 
> If this is really that problematic (and it appears it's not given that 
> there are many use cases of it and people tend to get it right), then why 
> not simply rename N_HIGH_MEMORY instead of introducing yet another 
> nodemask to the equation?

The reason is that we need nodes that contain only movable memory. This
feature is very important for node hotplug. So we will add a new nodemask
for movable memory: N_MEMORY includes movable memory, but N_HIGH_MEMORY
does not.

Thanks
Wen Congyang

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node
  2012-10-30  9:50 ` [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Yasuaki Ishimatsu
@ 2012-10-31  9:30   ` Wen Congyang
  0 siblings, 0 replies; 37+ messages in thread
From: Wen Congyang @ 2012-10-31  9:30 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Lai Jiangshan, Mel Gorman, David Rientjes, LKML, x86 maintainers,
	Jiang Liu, Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki,
	Andrew Morton

At 10/30/2012 05:50 PM, Yasuaki Ishimatsu Wrote:
> Hi Lai,
> 
> The patch set is huge, so we hesitate to read it.
> I think it contains multiple feature developments:
>   - Development of online_movable [PATCH 1 - 3]
>   - Cleanup node_state_attr [PATCH 4]
>   - Introduce N_MEMORY [PATCH 5 - 18]
>   - Development of kernelcore_max_addr [PATCH 19 - 25]
>   - Bug fix [PATCH 26]

I have split it into 6 patch sets:

part1: patch 1-3
    http://marc.info/?l=linux-kernel&m=135166176108186&w=2

part2: patch 4
    http://marc.info/?l=linux-kernel&m=135166705909544&w=2

part3: patch 5-18
    http://marc.info/?l=linux-kernel&m=135167050510527&w=2

part4: patch 19-20
    http://marc.info/?l=linux-kernel&m=135167344211401&w=2

part5: patch 21-25
    http://marc.info/?l=linux-kernel&m=135167497312063&w=2

part6: patch 26
    http://marc.info/?l=linux-kernel&m=135167512612132&w=2

> 
> Why don't you separate the patch set by feature?
> By splitting it up, many people can more easily participate
> in the development.
> 
> Thanks,
> Yasuaki Ishimatsu
> 
> 2012/10/30 0:07, Lai Jiangshan wrote:
>> Movable memory is a very important memory-management concept;
>> we need to consolidate it and make use of it on our systems.
>>
>> Movable memory is needed for
>> o	anti-fragmentation (hugepages, high-order allocations, ...)
>> o	logical hot-remove (virtualization, memory capacity on demand)
>> o	physical hot-remove (power saving, hardware partitioning, hardware fault management)
>>
>> All of these require configuring memory dynamically, and they make memory
>> use more efficient and safer. We also need physical hot-remove, so we need
>> movable nodes too. (Although some systems support physical memory
>> migration, we don't require that all memory on a physical node be movable;
>> a movable node is still needed for the logical node if we want physical
>> migration to be transparent.)
>>
>> We add the dynamic configuration commands "online_movable" and "online_kernel".
>> We also add the non-dynamic boot option kernelcore_max_addr.
>> We may add more dynamic/non-dynamic configuration options in the future.
>>
>>
>> The patchset is based on 3.7-rc3 with these three patches already applied:
>> 	https://lkml.org/lkml/2012/10/24/151
>> 	https://lkml.org/lkml/2012/10/26/150
>>
>> You can also simply pull all the patches from:
>> 	git pull https://github.com/laijs/linux.git hotplug-next
>>
>>
>>
>> Issues:
>>
>> mempolicy (MPOL_BIND) does not behave well when the nodemask contains
>> movable nodes only: kernel allocations will fail, and the task cannot
>> create new tasks or other kernel objects.
>>
>> So we change the strategy/policy:
>> 	when the bound nodemask has movable node(s) only, we apply the
>> 	mempolicy to userspace allocations only and do not apply it
>> 	to kernel allocations.
>>
>> CPUSET has the same problem, but the code is spread across page_alloc.c
>> and we haven't fixed it yet; we can/will change the allocation strategy
>> to one of these 3 strategies:
>> 	1) the same strategy as mempolicy
>> 	2) change cpuset so the nodemask always has at least one normal node
>> 	3) split the nodemask: nodemask_user and nodemask_kernel
>>
>> Thoughts?
>>
>>
>>
>> Patches:
>>
>> Patch1-3:     add online_movable and online_kernel, but these don't yet
>>               result in a movable node
>> Patch4:       cleanup for node_state_attr
>> Patch5:       introduce N_MEMORY
>> Patch6-17:    use N_MEMORY instead of N_HIGH_MEMORY.
>>               The patches are separated by subsystem;
>>               Patch17 also changes the node_states initialization
>> Patch18-20:   add the MOVABLE-dedicated node
>> Patch21-25:   add kernelcore_max_addr
>> Patch26:      make mempolicy handle movable nodes
>>
>>
>>
>>
>> Changes:
>>
>> change V5-V4:
>> 	consolidate online_movable/online_kernel
>> 	nodemask management
>>
>> change V4-V3:
>> 	rebase.
>> 	online_movable/online_kernel can create a zone from empty
>> 	or empty a zone
>>
>> change V3-V2:
>> 	Proper nodemask management
>>
>> change V2-V1:
>>
>> The original V1 patchset of MOVABLE-dedicated node is here:
>> http://comments.gmane.org/gmane.linux.kernel.mm/78122
>>
>> The new V2 adds N_MEMORY and the notion of a "MOVABLE-dedicated node",
>> and fixes some related problems.
>>
>> The original V1 patchset of "add online_movable" is here:
>> https://lkml.org/lkml/2012/7/4/145
>>
>> The new V2 discards the MIGRATE_HOTREMOVE approach and uses a more
>> straightforward implementation (only 1 patch).
>>
>>
>>
>> Lai Jiangshan (22):
>>    mm, memory-hotplug: dynamic configure movable memory and portion
>>      memory
>>    memory_hotplug: handle empty zone when online_movable/online_kernel
>>    memory_hotplug: ensure every online node has NORMAL memory
>>    node: cleanup node_state_attr
>>    node_states: introduce N_MEMORY
>>    cpuset: use N_MEMORY instead N_HIGH_MEMORY
>>    procfs: use N_MEMORY instead N_HIGH_MEMORY
>>    memcontrol: use N_MEMORY instead N_HIGH_MEMORY
>>    oom: use N_MEMORY instead N_HIGH_MEMORY
>>    mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
>>    mempolicy: use N_MEMORY instead N_HIGH_MEMORY
>>    hugetlb: use N_MEMORY instead N_HIGH_MEMORY
>>    vmstat: use N_MEMORY instead N_HIGH_MEMORY
>>    kthread: use N_MEMORY instead N_HIGH_MEMORY
>>    init: use N_MEMORY instead N_HIGH_MEMORY
>>    vmscan: use N_MEMORY instead N_HIGH_MEMORY
>>    page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
>>      initialization
>>    hotplug: update nodemasks management
>>    numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
>>    memory_hotplug: allow online/offline memory to result movable node
>>    page_alloc: add kernelcore_max_addr
>>    mempolicy: fix is_valid_nodemask()
>>
>> Yasuaki Ishimatsu (4):
>>    x86: get pg_data_t's memory from other node
>>    x86: use memblock_set_current_limit() to set memblock.current_limit
>>    memblock: limit memory address from memblock
>>    memblock: compare current_limit with end variable at
>>      memblock_find_in_range_node()
>>
>>   Documentation/cgroups/cpusets.txt   |    2 +-
>>   Documentation/kernel-parameters.txt |    9 +
>>   Documentation/memory-hotplug.txt    |   19 ++-
>>   arch/x86/kernel/setup.c             |    4 +-
>>   arch/x86/mm/init_64.c               |    4 +-
>>   arch/x86/mm/numa.c                  |    8 +-
>>   drivers/base/memory.c               |   27 ++--
>>   drivers/base/node.c                 |   28 ++--
>>   fs/proc/kcore.c                     |    2 +-
>>   fs/proc/task_mmu.c                  |    4 +-
>>   include/linux/cpuset.h              |    2 +-
>>   include/linux/memblock.h            |    1 +
>>   include/linux/memory.h              |    1 +
>>   include/linux/memory_hotplug.h      |   13 ++-
>>   include/linux/nodemask.h            |    5 +
>>   init/main.c                         |    2 +-
>>   kernel/cpuset.c                     |   32 ++--
>>   kernel/kthread.c                    |    2 +-
>>   mm/Kconfig                          |    8 +
>>   mm/hugetlb.c                        |   24 ++--
>>   mm/memblock.c                       |   10 +-
>>   mm/memcontrol.c                     |   18 +-
>>   mm/memory_hotplug.c                 |  283 +++++++++++++++++++++++++++++++++--
>>   mm/mempolicy.c                      |   48 ++++---
>>   mm/migrate.c                        |    2 +-
>>   mm/oom_kill.c                       |    2 +-
>>   mm/page_alloc.c                     |   76 +++++++---
>>   mm/page_cgroup.c                    |    2 +-
>>   mm/vmscan.c                         |    4 +-
>>   mm/vmstat.c                         |    4 +-
>>   30 files changed, 508 insertions(+), 138 deletions(-)
>>
> 
> 



* Re: [V5 PATCH 08/26] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
  2012-10-29 15:20 ` [V5 PATCH 08/26] memcontrol: " Lai Jiangshan
  2012-10-29 16:22   ` Michal Hocko
@ 2012-10-31 13:18   ` Michal Hocko
  1 sibling, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2012-10-31 13:18 UTC (permalink / raw)
  To: Lai Jiangshan, Wen Congyang
  Cc: Mel Gorman, David Rientjes, LKML, x86 maintainers, Jiang Liu,
	Rusty Russell, Yinghai Lu, KAMEZAWA Hiroyuki, Yasuaki ISIMATU,
	Andrew Morton, Johannes Weiner, Balbir Singh, Tejun Heo,
	Li Zefan, cgroups, linux-mm, containers, Christoph Lameter,
	Hillf Danton

On Wed 31-10-12 15:03:36, Wen Congyang wrote:
> At 10/30/2012 04:46 AM, David Rientjes Wrote:
> > On Mon, 29 Oct 2012, Lai Jiangshan wrote:
[...]
> >> In short, we need an N_MEMORY. We first introduce it as an alias of
> >> N_HIGH_MEMORY and fix all improper usages of N_HIGH_MEMORY in later patches.
> >>
> > 
> > If this is really that problematic (and it appears it's not given that 
> > there are many use cases of it and people tend to get it right), then why 
> > not simply rename N_HIGH_MEMORY instead of introducing yet another 
> > nodemask to the equation?
> 
> The reason is that we need nodes that contain only movable memory. This
> feature is very important for node hotplug. So we will add a new nodemask
> for movable memory: N_MEMORY includes movable memory, but N_HIGH_MEMORY
> does not.

OK, so N_MOVABLE_MEMORY (or whatever you will call it) requires that all
allocations on such a node be migratable?
How do you want to achieve that with the page_cgroup descriptors? (see
below)

On Mon 29-10-12 23:20:58, Lai Jiangshan wrote:
[...]
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 5ddad0c..c1054ad 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	for_each_node_state(nid, N_HIGH_MEMORY) {
> +	for_each_node_state(nid, N_MEMORY) {
>  		unsigned long start_pfn, end_pfn;
>  
>  		start_pfn = node_start_pfn(nid);

This will later call init_section_page_cgroup(pfn, nid), which allocates
page_cgroup descriptors that are not movable. Or is there code in
your patchset that handles this?
-- 
Michal Hocko
SUSE Labs


end of thread, other threads:[~2012-10-31 13:18 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-29 15:07 [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Lai Jiangshan
2012-10-29 15:07 ` [V5 PATCH 01/26] mm, memory-hotplug: dynamic configure movable memory and portion memory Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 02/26] memory_hotplug: handle empty zone when online_movable/online_kernel Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 03/26] memory_hotplug: ensure every online node has NORMAL memory Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 04/26] node: cleanup node_state_attr Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 05/26] node_states: introduce N_MEMORY Lai Jiangshan
2012-10-29 20:46   ` David Rientjes
2012-10-31  7:03     ` Wen Congyang
2012-10-29 15:20 ` [V5 PATCH 06/26] cpuset: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 07/26] procfs: " Lai Jiangshan
2012-10-29 15:20 ` [V5 PATCH 08/26] memcontrol: " Lai Jiangshan
2012-10-29 16:22   ` Michal Hocko
2012-10-29 20:40     ` David Rientjes
2012-10-29 20:58       ` Michal Hocko
2012-10-29 21:08         ` David Rientjes
2012-10-29 21:34           ` Michal Hocko
2012-10-31 13:18   ` Michal Hocko
2012-10-29 15:20 ` [V5 PATCH 09/26] oom: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 10/26] mm,migrate: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 11/26] mempolicy: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 12/26] hugetlb: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 13/26] vmstat: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 14/26] kthread: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 15/26] init: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 16/26] vmscan: " Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 17/26] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 18/26] hotplug: update nodemasks management Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 19/26] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 20/26] memory_hotplug: allow online/offline memory to result movable node Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 21/26] page_alloc: add kernelcore_max_addr Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 22/26] x86: get pg_data_t's memory from other node Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 23/26] x86: use memblock_set_current_limit() to set memblock.current_limit Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 24/26] memblock: limit memory address from memblock Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 25/26] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
2012-10-29 15:21 ` [V5 PATCH 26/26] mempolicy: fix is_valid_nodemask() Lai Jiangshan
2012-10-30  9:50 ` [V5 PATCH 00/26] mm, memory-hotplug: dynamic configure movable memory and introduce movable node Yasuaki Ishimatsu
2012-10-31  9:30   ` Wen Congyang
