linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] memory_hotplug: fix memory hotplug bug
@ 2012-09-27  6:47 Lai Jiangshan
  2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-27  6:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

We found 3 bug while we test and develop memory hotplug.

PATCH1~2: the old code does not handle node_states[N_NORMAL_MEMORY] correctly,
it corrupts the memory.

PATCH3: move the modification of zone_start_pfn into corresponding lock.

CC: Rob Landley <rob@landley.net>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jiang Liu <jiang.liu@huawei.com>
CC: Jianguo Wu <wujianguo@huawei.com>
CC: Kay Sievers <kay.sievers@vrfy.org>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Xishi Qiu <qiuxishi@huawei.com>
CC: Mel Gorman <mgorman@suse.de>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org

Lai Jiangshan (3):
  memory_hotplug: fix missing nodemask management
  slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  memory,hotplug: Don't modify the zone_start_pfn outside of
    zone_span_writelock()

 Documentation/memory-hotplug.txt |    5 ++-
 include/linux/memory.h           |    1 +
 mm/memory_hotplug.c              |   96 +++++++++++++++++++++++++++++++-------
 mm/page_alloc.c                  |    3 +-
 mm/slub.c                        |    4 +-
 5 files changed, 87 insertions(+), 22 deletions(-)

-- 
1.7.4.4


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
  2012-09-27  6:47 [PATCH 0/3] memory_hotplug: fix memory hotplug bug Lai Jiangshan
@ 2012-09-27  6:47 ` Lai Jiangshan
  2012-09-27 14:32   ` Ni zhan Chen
  2012-09-27 22:03   ` KOSAKI Motohiro
  2012-09-27  6:47 ` [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-27  6:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
node_states[N_NORMAL_MEMORY] becomes stale.

We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/memory-hotplug.txt |    5 ++-
 include/linux/memory.h           |    1 +
 mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
 struct memory_notify {
        unsigned long start_pfn;
        unsigned long nr_pages;
+       int status_change_nid_normal;
        int status_change_nid;
 }
 
 start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
 status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_changed_nid* >= 0, callback should create/discard structures for the
 node if necessary.
 
 --------------
diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
 struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
+	int status_change_nid_normal;
 	int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a5b90d..b62d429b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	return 0;
 }
 
+static void check_nodemasks_changes_online(unsigned long nr_pages,
+	struct zone *zone, struct memory_notify *arg)
+{
+	int nid = zone_to_nid(zone);
+	enum zone_type zone_last = ZONE_NORMAL;
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+		arg->status_change_nid_normal = nid;
+	else
+		arg->status_change_nid_normal = -1;
+
+	if (!node_state(nid, N_HIGH_MEMORY))
+		arg->status_change_nid = nid;
+	else
+		arg->status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+	if (arg->status_change_nid_normal >= 0)
+		node_set_state(node, N_NORMAL_MEMORY);
+
+	node_set_state(node, N_HIGH_MEMORY);
+}
+
 
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 {
@@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	struct memory_notify arg;
 
 	lock_memory_hotplug();
+	/*
+	 * This doesn't need a lock to do pfn_to_page().
+	 * The section can't be removed here because of the
+	 * memory_block->state_mutex.
+	 */
+	zone = page_zone(pfn_to_page(pfn));
+
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
-	arg.status_change_nid = -1;
+	check_nodemasks_changes_online(nr_pages, zone, &arg);
 
 	nid = page_to_nid(pfn_to_page(pfn));
-	if (node_present_pages(nid) == 0)
-		arg.status_change_nid = nid;
 
 	ret = memory_notify(MEM_GOING_ONLINE, &arg);
 	ret = notifier_to_errno(ret);
@@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 		return ret;
 	}
 	/*
-	 * This doesn't need a lock to do pfn_to_page().
-	 * The section can't be removed here because of the
-	 * memory_block->state_mutex.
-	 */
-	zone = page_zone(pfn_to_page(pfn));
-	/*
 	 * If this zone is not populated, then it is not in zonelist.
 	 * This means the page allocator ignores this zone.
 	 * So, zonelist must be updated after online.
@@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	zone->present_pages += onlined_pages;
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 	if (onlined_pages) {
-		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
+		set_nodemasks(zone_to_nid(zone), &arg);
 		if (need_zonelists_rebuild)
 			build_all_zonelists(NULL, zone);
 		else
@@ -870,6 +897,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	return offlined;
 }
 
+static void check_nodemasks_changes_offline(unsigned long nr_pages,
+		struct zone *zone, struct memory_notify *arg)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long present_pages = 0;
+	enum zone_type zt, zone_last = ZONE_NORMAL;
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		zone_last = ZONE_MOVABLE;
+
+	for (zt = 0; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+		arg->status_change_nid_normal = zone_to_nid(zone);
+	else
+		arg->status_change_nid_normal = -1;
+
+	zone_last = ZONE_MOVABLE;
+	for (; zt <= zone_last; zt++)
+		present_pages += pgdat->node_zones[zt].present_pages;
+	if (nr_pages >= present_pages)
+		arg->status_change_nid = zone_to_nid(zone);
+	else
+		arg->status_change_nid = -1;
+}
+
+static void clear_nodemasks(int node, struct memory_notify *arg)
+{
+	if (arg->status_change_nid_normal >= 0)
+		node_clear_state(node, N_NORMAL_MEMORY);
+
+	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+		return;
+
+	if (arg->status_change_nid >= 0)
+		node_clear_state(node, N_HIGH_MEMORY);
+}
+
 static int __ref offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn, unsigned long timeout)
 {
@@ -903,9 +968,7 @@ static int __ref offline_pages(unsigned long start_pfn,
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
-	arg.status_change_nid = -1;
-	if (nr_pages >= node_present_pages(node))
-		arg.status_change_nid = node;
+	check_nodemasks_changes_offline(nr_pages, zone, &arg);
 
 	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
 	ret = notifier_to_errno(ret);
@@ -973,10 +1036,9 @@ repeat:
 	if (!populated_zone(zone))
 		zone_pcp_reset(zone);
 
-	if (!node_present_pages(node)) {
-		node_clear_state(node, N_HIGH_MEMORY);
+	clear_nodemasks(node, &arg);
+	if (arg.status_change_nid >= 0)
 		kswapd_stop(node);
-	}
 
 	vm_total_pages = nr_free_pagecache_pages();
 	writeback_set_ratelimit();
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-27  6:47 [PATCH 0/3] memory_hotplug: fix memory hotplug bug Lai Jiangshan
  2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
@ 2012-09-27  6:47 ` Lai Jiangshan
  2012-09-27 22:04   ` KOSAKI Motohiro
  2012-09-27  6:47 ` [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
  2012-09-28  0:39 ` [PATCH 0/3] memory_hotplug: fix memory hotplug bug Ni zhan Chen
  3 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-27  6:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Christoph Lameter, Pekka Enberg, Matt Mackall, linux-mm

SLUB only fucus on the nodes which has normal memory, so ignore the other
node's hot-adding and hot-removing.

Aka: if some memroy of a node(which has no onlined memory) is online,
but this new memory onlined is not normal memory(HIGH memory example),
we should not allocate kmem_cache_node for SLUB.

And if the last normal memory is offlined, but the node still has memroy,
we should remove kmem_cache_node for that node.(current code delay it when
all of the memory is offlined)

so we only do something when marg->status_change_nid_normal > 0.
marg->status_change_nid is not suitable here.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 mm/slub.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2fdd96f..2d78639 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3577,7 +3577,7 @@ static void slab_mem_offline_callback(void *arg)
 	struct memory_notify *marg = arg;
 	int offline_node;
 
-	offline_node = marg->status_change_nid;
+	offline_node = marg->status_change_nid_normal;
 
 	/*
 	 * If the node still has available memory. we need kmem_cache_node
@@ -3610,7 +3610,7 @@ static int slab_mem_going_online_callback(void *arg)
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
 	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid;
+	int nid = marg->status_change_nid_normal;
 	int ret = 0;
 
 	/*
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-27  6:47 [PATCH 0/3] memory_hotplug: fix memory hotplug bug Lai Jiangshan
  2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
  2012-09-27  6:47 ` [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
@ 2012-09-27  6:47 ` Lai Jiangshan
  2012-09-27 13:19   ` Ni zhan Chen
  2012-09-27 22:30   ` KOSAKI Motohiro
  2012-09-28  0:39 ` [PATCH 0/3] memory_hotplug: fix memory hotplug bug Ni zhan Chen
  3 siblings, 2 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-27  6:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm

The __add_zone() maybe call sleep-able init_currently_empty_zone()
to init wait_table,

But this function also modifies the zone_start_pfn without any lock.
It is bugy.

So we move this modification out, and we ensure the modification
of zone_start_pfn is only done with zone_span_writelock() held or in booting.

Since zone_start_pfn is not modified by init_currently_empty_zone()
grow_zone_span() needs to check zone_start_pfn before update it.

CC: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
Tested-by: Wen Congyang <wency@cn.fujitsu.com>
---
 mm/memory_hotplug.c |    2 +-
 mm/page_alloc.c     |    3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b62d429b..790561f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writelock(zone);
 
 	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	if (start_pfn < zone->zone_start_pfn)
+	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
 		zone->zone_start_pfn = start_pfn;
 
 	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c13ea75..2545013 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
 		return ret;
 	pgdat->nr_zones = zone_idx(zone) + 1;
 
-	zone->zone_start_pfn = zone_start_pfn;
-
 	mminit_dprintk(MMINIT_TRACE, "memmap_init",
 			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
 			pgdat->node_id,
@@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		ret = init_currently_empty_zone(zone, zone_start_pfn,
 						size, MEMMAP_EARLY);
 		BUG_ON(ret);
+		zone->zone_start_pfn = zone_start_pfn;
 		memmap_init(size, nid, j, zone_start_pfn);
 		zone_start_pfn += size;
 	}
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-27  6:47 ` [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
@ 2012-09-27 13:19   ` Ni zhan Chen
  2012-09-28  7:29     ` Lai Jiangshan
  2012-09-27 22:30   ` KOSAKI Motohiro
  1 sibling, 1 reply; 19+ messages in thread
From: Ni zhan Chen @ 2012-09-27 13:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm

On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
> The __add_zone() maybe call sleep-able init_currently_empty_zone()
> to init wait_table,
>
> But this function also modifies the zone_start_pfn without any lock.
> It is bugy.
>
> So we move this modification out, and we ensure the modification
> of zone_start_pfn is only done with zone_span_writelock() held or in booting.
>
> Since zone_start_pfn is not modified by init_currently_empty_zone()
> grow_zone_span() needs to check zone_start_pfn before update it.
>
> CC: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>   mm/memory_hotplug.c |    2 +-
>   mm/page_alloc.c     |    3 +--
>   2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b62d429b..790561f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>   	zone_span_writelock(zone);
>   
>   	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	if (start_pfn < zone->zone_start_pfn)
> +	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>   		zone->zone_start_pfn = start_pfn;
>   
>   	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c13ea75..2545013 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>   		return ret;
>   	pgdat->nr_zones = zone_idx(zone) + 1;
>   
> -	zone->zone_start_pfn = zone_start_pfn;
> -

then how can mminit_dprintk print zone->zone_start_pfn ? always print 0 
make no sense.

>   	mminit_dprintk(MMINIT_TRACE, "memmap_init",
>   			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
>   			pgdat->node_id,
> @@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>   		ret = init_currently_empty_zone(zone, zone_start_pfn,
>   						size, MEMMAP_EARLY);
>   		BUG_ON(ret);
> +		zone->zone_start_pfn = zone_start_pfn;
>   		memmap_init(size, nid, j, zone_start_pfn);
>   		zone_start_pfn += size;
>   	}


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
  2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
@ 2012-09-27 14:32   ` Ni zhan Chen
  2012-09-28  7:32     ` Lai Jiangshan
  2012-09-27 22:03   ` KOSAKI Motohiro
  1 sibling, 1 reply; 19+ messages in thread
From: Ni zhan Chen @ 2012-09-27 14:32 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
> node_states[N_NORMAL_MEMORY] becomes stale.
>
> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotpluging.
>
> Also add @status_change_nid_normal to struct memory_notify, thus
> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
> are changed.

I still don't understand why need care N_NORMAL_MEMORY here, could you 
explain
in details?

>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>   Documentation/memory-hotplug.txt |    5 ++-
>   include/linux/memory.h           |    1 +
>   mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
>   3 files changed, 83 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
> index 6d0c251..6e6cbc7 100644
> --- a/Documentation/memory-hotplug.txt
> +++ b/Documentation/memory-hotplug.txt
> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
>   struct memory_notify {
>          unsigned long start_pfn;
>          unsigned long nr_pages;
> +       int status_change_nid_normal;
>          int status_change_nid;
>   }
>   
>   start_pfn is start_pfn of online/offline memory.
>   nr_pages is # of pages of online/offline memory.
> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>   status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>   set/clear. It means a new(memoryless) node gets new memory by online and a
>   node loses all memory. If this is -1, then nodemask status is not changed.
> -If status_changed_nid >= 0, callback should create/discard structures for the
> +If status_changed_nid* >= 0, callback should create/discard structures for the
>   node if necessary.
>   
>   --------------
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index ff9a9f8..a09216d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>   struct memory_notify {
>   	unsigned long start_pfn;
>   	unsigned long nr_pages;
> +	int status_change_nid_normal;
>   	int status_change_nid;
>   };
>   
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6a5b90d..b62d429b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>   	return 0;
>   }
>   
> +static void check_nodemasks_changes_online(unsigned long nr_pages,
> +	struct zone *zone, struct memory_notify *arg)
> +{
> +	int nid = zone_to_nid(zone);
> +	enum zone_type zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;
> +
> +	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
> +		arg->status_change_nid_normal = nid;
> +	else
> +		arg->status_change_nid_normal = -1;
> +
> +	if (!node_state(nid, N_HIGH_MEMORY))
> +		arg->status_change_nid = nid;
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void set_nodemasks(int node, struct memory_notify *arg)
> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_set_state(node, N_NORMAL_MEMORY);
> +
> +	node_set_state(node, N_HIGH_MEMORY);
> +}
> +
>   
>   int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>   {
> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>   	struct memory_notify arg;
>   
>   	lock_memory_hotplug();
> +	/*
> +	 * This doesn't need a lock to do pfn_to_page().
> +	 * The section can't be removed here because of the
> +	 * memory_block->state_mutex.
> +	 */
> +	zone = page_zone(pfn_to_page(pfn));
> +
>   	arg.start_pfn = pfn;
>   	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> +	check_nodemasks_changes_online(nr_pages, zone, &arg);
>   
>   	nid = page_to_nid(pfn_to_page(pfn));
> -	if (node_present_pages(nid) == 0)
> -		arg.status_change_nid = nid;
>   
>   	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>   	ret = notifier_to_errno(ret);
> @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>   		return ret;
>   	}
>   	/*
> -	 * This doesn't need a lock to do pfn_to_page().
> -	 * The section can't be removed here because of the
> -	 * memory_block->state_mutex.
> -	 */
> -	zone = page_zone(pfn_to_page(pfn));
> -	/*
>   	 * If this zone is not populated, then it is not in zonelist.
>   	 * This means the page allocator ignores this zone.
>   	 * So, zonelist must be updated after online.
> @@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>   	zone->present_pages += onlined_pages;
>   	zone->zone_pgdat->node_present_pages += onlined_pages;
>   	if (onlined_pages) {
> -		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
> +		set_nodemasks(zone_to_nid(zone), &arg);
>   		if (need_zonelists_rebuild)
>   			build_all_zonelists(NULL, zone);
>   		else
> @@ -870,6 +897,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>   	return offlined;
>   }
>   
> +static void check_nodemasks_changes_offline(unsigned long nr_pages,
> +		struct zone *zone, struct memory_notify *arg)
> +{
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long present_pages = 0;
> +	enum zone_type zt, zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;
> +
> +	for (zt = 0; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
> +		arg->status_change_nid_normal = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid_normal = -1;
> +
> +	zone_last = ZONE_MOVABLE;
> +	for (; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (nr_pages >= present_pages)
> +		arg->status_change_nid = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void clear_nodemasks(int node, struct memory_notify *arg)
> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_clear_state(node, N_NORMAL_MEMORY);
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		return;
> +
> +	if (arg->status_change_nid >= 0)
> +		node_clear_state(node, N_HIGH_MEMORY);
> +}
> +
>   static int __ref offline_pages(unsigned long start_pfn,
>   		  unsigned long end_pfn, unsigned long timeout)
>   {
> @@ -903,9 +968,7 @@ static int __ref offline_pages(unsigned long start_pfn,
>   
>   	arg.start_pfn = start_pfn;
>   	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> -	if (nr_pages >= node_present_pages(node))
> -		arg.status_change_nid = node;
> +	check_nodemasks_changes_offline(nr_pages, zone, &arg);
>   
>   	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>   	ret = notifier_to_errno(ret);
> @@ -973,10 +1036,9 @@ repeat:
>   	if (!populated_zone(zone))
>   		zone_pcp_reset(zone);
>   
> -	if (!node_present_pages(node)) {
> -		node_clear_state(node, N_HIGH_MEMORY);
> +	clear_nodemasks(node, &arg);
> +	if (arg.status_change_nid >= 0)
>   		kswapd_stop(node);
> -	}
>   
>   	vm_total_pages = nr_free_pagecache_pages();
>   	writeback_set_ratelimit();


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
  2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
  2012-09-27 14:32   ` Ni zhan Chen
@ 2012-09-27 22:03   ` KOSAKI Motohiro
  2012-10-26  1:39     ` Lai Jiangshan
  1 sibling, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2012-09-27 22:03 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm, kosaki.motohiro

(9/27/12 2:47 AM), Lai Jiangshan wrote:
> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
> node_states[N_NORMAL_MEMORY] becomes stale.

What's mean 'stale'? I guess

: Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY] and
: then it will be invalid if the platform has highmem. Luckily, almost memory 
: hotplug aware platform don't have highmem, but are not all.

right?
I supporse this patch only meaningful on ARM platform practically.



> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotpluging.


> Also add @status_change_nid_normal to struct memory_notify, thus
> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
> are changed.

status_change_nid_normal is very ugly to me. When status_change_nid and 
status_change_nid_normal has positive value, they are always the same.
nid and flags value are more natual to me.



> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  Documentation/memory-hotplug.txt |    5 ++-
>  include/linux/memory.h           |    1 +
>  mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
>  3 files changed, 83 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
> index 6d0c251..6e6cbc7 100644
> --- a/Documentation/memory-hotplug.txt
> +++ b/Documentation/memory-hotplug.txt
> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
>  struct memory_notify {
>         unsigned long start_pfn;
>         unsigned long nr_pages;
> +       int status_change_nid_normal;
>         int status_change_nid;
>  }
>  
>  start_pfn is start_pfn of online/offline memory.
>  nr_pages is # of pages of online/offline memory.
> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>  set/clear. It means a new(memoryless) node gets new memory by online and a
>  node loses all memory. If this is -1, then nodemask status is not changed.
> -If status_changed_nid >= 0, callback should create/discard structures for the
> +If status_changed_nid* >= 0, callback should create/discard structures for the
>  node if necessary.
>  
>  --------------
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index ff9a9f8..a09216d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>  struct memory_notify {
>  	unsigned long start_pfn;
>  	unsigned long nr_pages;
> +	int status_change_nid_normal;
>  	int status_change_nid;
>  };
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6a5b90d..b62d429b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	return 0;
>  }
>  
> +static void check_nodemasks_changes_online(unsigned long nr_pages,
> +	struct zone *zone, struct memory_notify *arg)
> +{
> +	int nid = zone_to_nid(zone);
> +	enum zone_type zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;

This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.


> +
> +	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
> +		arg->status_change_nid_normal = nid;
> +	else
> +		arg->status_change_nid_normal = -1;

Wrong. The onlined node may only have high mem zone. IOW, think fake numa case etc.


> +
> +	if (!node_state(nid, N_HIGH_MEMORY))
> +		arg->status_change_nid = nid;
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void set_nodemasks(int node, struct memory_notify *arg)

Too ugly. just remove this and use node_set_state() directly.

> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_set_state(node, N_NORMAL_MEMORY);
> +
> +	node_set_state(node, N_HIGH_MEMORY);
> +}
> +
>  
>  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  {
> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  	struct memory_notify arg;
>  
>  	lock_memory_hotplug();
> +	/*
> +	 * This doesn't need a lock to do pfn_to_page().
> +	 * The section can't be removed here because of the
> +	 * memory_block->state_mutex.
> +	 */

Please explain the intention of this comment. We think lock_memory_hotplug() close
a race against memory offline directly.


> +	zone = page_zone(pfn_to_page(pfn));
> +
>  	arg.start_pfn = pfn;
>  	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> +	check_nodemasks_changes_online(nr_pages, zone, &arg);
>  
>  	nid = page_to_nid(pfn_to_page(pfn));
> -	if (node_present_pages(nid) == 0)
> -		arg.status_change_nid = nid;
>  
>  	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>  	ret = notifier_to_errno(ret);
> @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  		return ret;
>  	}
>  	/*
> -	 * This doesn't need a lock to do pfn_to_page().
> -	 * The section can't be removed here because of the
> -	 * memory_block->state_mutex.
> -	 */
> -	zone = page_zone(pfn_to_page(pfn));
> -	/*
>  	 * If this zone is not populated, then it is not in zonelist.
>  	 * This means the page allocator ignores this zone.
>  	 * So, zonelist must be updated after online.
> @@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  	zone->present_pages += onlined_pages;
>  	zone->zone_pgdat->node_present_pages += onlined_pages;
>  	if (onlined_pages) {
> -		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
> +		set_nodemasks(zone_to_nid(zone), &arg);
>  		if (need_zonelists_rebuild)
>  			build_all_zonelists(NULL, zone);
>  		else
> @@ -870,6 +897,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>  	return offlined;
>  }
>  
> +static void check_nodemasks_changes_offline(unsigned long nr_pages,
> +		struct zone *zone, struct memory_notify *arg)
> +{

This should remove ugly memory_notify argument and should be fold 
check_nodemasks_changes_online().


> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long present_pages = 0;
> +	enum zone_type zt, zone_last = ZONE_NORMAL;
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		zone_last = ZONE_MOVABLE;
> +
> +	for (zt = 0; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
> +		arg->status_change_nid_normal = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid_normal = -1;

Wrong. Think, a zone has both normal and highmem zone and when admin 
only remove highmem area. you should check pfn too.


> +
> +	zone_last = ZONE_MOVABLE;
> +	for (; zt <= zone_last; zt++)
> +		present_pages += pgdat->node_zones[zt].present_pages;
> +	if (nr_pages >= present_pages)
> +		arg->status_change_nid = zone_to_nid(zone);
> +	else
> +		arg->status_change_nid = -1;
> +}
> +
> +static void clear_nodemasks(int node, struct memory_notify *arg)
> +{
> +	if (arg->status_change_nid_normal >= 0)
> +		node_clear_state(node, N_NORMAL_MEMORY);
> +
> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> +		return;

You should use #ifdef. otherwise we face unreachable statement warning.

> +
> +	if (arg->status_change_nid >= 0)
> +		node_clear_state(node, N_HIGH_MEMORY);
> +}
> +
>  static int __ref offline_pages(unsigned long start_pfn,
>  		  unsigned long end_pfn, unsigned long timeout)
>  {
> @@ -903,9 +968,7 @@ static int __ref offline_pages(unsigned long start_pfn,
>  
>  	arg.start_pfn = start_pfn;
>  	arg.nr_pages = nr_pages;
> -	arg.status_change_nid = -1;
> -	if (nr_pages >= node_present_pages(node))
> -		arg.status_change_nid = node;
> +	check_nodemasks_changes_offline(nr_pages, zone, &arg);
>  
>  	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>  	ret = notifier_to_errno(ret);
> @@ -973,10 +1036,9 @@ repeat:
>  	if (!populated_zone(zone))
>  		zone_pcp_reset(zone);
>  
> -	if (!node_present_pages(node)) {
> -		node_clear_state(node, N_HIGH_MEMORY);
> +	clear_nodemasks(node, &arg);
> +	if (arg.status_change_nid >= 0)
>  		kswapd_stop(node);
> -	}
>  
>  	vm_total_pages = nr_free_pagecache_pages();
>  	writeback_set_ratelimit();
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-27  6:47 ` [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
@ 2012-09-27 22:04   ` KOSAKI Motohiro
  2012-09-27 22:35     ` Christoph
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2012-09-27 22:04 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, Matt Mackall,
	linux-mm, kosaki.motohiro

(9/27/12 2:47 AM), Lai Jiangshan wrote:
> SLUB only fucus on the nodes which has normal memory, so ignore the other
> node's hot-adding and hot-removing.
> 
> Aka: if some memroy of a node(which has no onlined memory) is online,
> but this new memory onlined is not normal memory(HIGH memory example),
> we should not allocate kmem_cache_node for SLUB.
> 
> And if the last normal memory is offlined, but the node still has memroy,
> we should remove kmem_cache_node for that node.(current code delay it when
> all of the memory is offlined)
> 
> so we only do something when marg->status_change_nid_normal > 0.
> marg->status_change_nid is not suitable here.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  mm/slub.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2fdd96f..2d78639 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3577,7 +3577,7 @@ static void slab_mem_offline_callback(void *arg)
>  	struct memory_notify *marg = arg;
>  	int offline_node;
>  
> -	offline_node = marg->status_change_nid;
> +	offline_node = marg->status_change_nid_normal;
>  
>  	/*
>  	 * If the node still has available memory. we need kmem_cache_node
> @@ -3610,7 +3610,7 @@ static int slab_mem_going_online_callback(void *arg)
>  	struct kmem_cache_node *n;
>  	struct kmem_cache *s;
>  	struct memory_notify *marg = arg;
> -	int nid = marg->status_change_nid;
> +	int nid = marg->status_change_nid_normal;
>  	int ret = 0;

Looks reasonable. I think slab need similar fix too.





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-27  6:47 ` [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
  2012-09-27 13:19   ` Ni zhan Chen
@ 2012-09-27 22:30   ` KOSAKI Motohiro
  2012-09-28  7:39     ` Lai Jiangshan
  1 sibling, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2012-09-27 22:30 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm, kosaki.motohiro

(9/27/12 2:47 AM), Lai Jiangshan wrote:
> The __add_zone() maybe call sleep-able init_currently_empty_zone()
> to init wait_table,

This doesn't explain why sleepable is critical important. I think sleepable
is jsut unrelated. The fact is only: to write zone->zone_start_pfn require
zone_span_writelock, but init_currently_empty_zone() doesn't take it.


> 
> But this function also modifies the zone_start_pfn without any lock.
> It is bugy.

buggy?


> So we move this modification out, and we ensure the modification
> of zone_start_pfn is only done with zone_span_writelock() held or in booting.
> 
> Since zone_start_pfn is not modified by init_currently_empty_zone()
> grow_zone_span() needs to check zone_start_pfn before update it.
> 
> CC: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  mm/memory_hotplug.c |    2 +-
>  mm/page_alloc.c     |    3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b62d429b..790561f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>  	zone_span_writelock(zone);
>  
>  	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	if (start_pfn < zone->zone_start_pfn)
> +	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>  		zone->zone_start_pfn = start_pfn;

Wrong. zone->zone_start_pfn==0 may be valid pfn. You shouldn't assume it is uninitialized
value.


>  
>  	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c13ea75..2545013 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>  		return ret;
>  	pgdat->nr_zones = zone_idx(zone) + 1;
>  
> -	zone->zone_start_pfn = zone_start_pfn;
> -
>  	mminit_dprintk(MMINIT_TRACE, "memmap_init",
>  			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
>  			pgdat->node_id,
> @@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  		ret = init_currently_empty_zone(zone, zone_start_pfn,
>  						size, MEMMAP_EARLY);
>  		BUG_ON(ret);
> +		zone->zone_start_pfn = zone_start_pfn;
>  		memmap_init(size, nid, j, zone_start_pfn);
>  		zone_start_pfn += size;
>  	}
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-27 22:04   ` KOSAKI Motohiro
@ 2012-09-27 22:35     ` Christoph
  2012-09-28  7:19       ` Lai Jiangshan
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph @ 2012-09-27 22:35 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lai Jiangshan, linux-kernel, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-mm, kosaki.motohiro

While you are at it: Could you move the code into slab_common.c so that there is only one version to maintain?

On Sep 27, 2012, at 17:04, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:

> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>> SLUB only fucus on the nodes which has normal memory, so ignore the other
>> node's hot-adding and hot-removing.
>> 
>> Aka: if some memroy of a node(which has no onlined memory) is online,
>> but this new memory onlined is not normal memory(HIGH memory example),
>> we should not allocate kmem_cache_node for SLUB.
>> 
>> And if the last normal memory is offlined, but the node still has memroy,
>> we should remove kmem_cache_node for that node.(current code delay it when
>> all of the memory is offlined)
>> 
>> so we only do something when marg->status_change_nid_normal > 0.
>> marg->status_change_nid is not suitable here.
>> 
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>> mm/slub.c |    4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 2fdd96f..2d78639 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3577,7 +3577,7 @@ static void slab_mem_offline_callback(void *arg)
>>    struct memory_notify *marg = arg;
>>    int offline_node;
>> 
>> -    offline_node = marg->status_change_nid;
>> +    offline_node = marg->status_change_nid_normal;
>> 
>>    /*
>>     * If the node still has available memory. we need kmem_cache_node
>> @@ -3610,7 +3610,7 @@ static int slab_mem_going_online_callback(void *arg)
>>    struct kmem_cache_node *n;
>>    struct kmem_cache *s;
>>    struct memory_notify *marg = arg;
>> -    int nid = marg->status_change_nid;
>> +    int nid = marg->status_change_nid_normal;
>>    int ret = 0;
> 
> Looks reasonable. I think slab need similar fix too.
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] memory_hotplug: fix memory hotplug bug
  2012-09-27  6:47 [PATCH 0/3] memory_hotplug: fix memory hotplug bug Lai Jiangshan
                   ` (2 preceding siblings ...)
  2012-09-27  6:47 ` [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
@ 2012-09-28  0:39 ` Ni zhan Chen
  3 siblings, 0 replies; 19+ messages in thread
From: Ni zhan Chen @ 2012-09-28  0:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
> We found 3 bug while we test and develop memory hotplug.
>
> PATCH1~2: the old code does not handle node_states[N_NORMAL_MEMORY] correctly,
> it corrupts the memory.
>
> PATCH3: move the modification of zone_start_pfn into corresponding lock.

please fully test them before send out. thanks.

>
> CC: Rob Landley <rob@landley.net>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Jiang Liu <jiang.liu@huawei.com>
> CC: Jianguo Wu <wujianguo@huawei.com>
> CC: Kay Sievers <kay.sievers@vrfy.org>
> CC: Greg Kroah-Hartman <gregkh@suse.de>
> CC: Xishi Qiu <qiuxishi@huawei.com>
> CC: Mel Gorman <mgorman@suse.de>
> CC: linux-doc@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
>
> Lai Jiangshan (3):
>    memory_hotplug: fix missing nodemask management
>    slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>    memory,hotplug: Don't modify the zone_start_pfn outside of
>      zone_span_writelock()
>
>   Documentation/memory-hotplug.txt |    5 ++-
>   include/linux/memory.h           |    1 +
>   mm/memory_hotplug.c              |   96 +++++++++++++++++++++++++++++++-------
>   mm/page_alloc.c                  |    3 +-
>   mm/slub.c                        |    4 +-
>   5 files changed, 87 insertions(+), 22 deletions(-)
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-27 22:35     ` Christoph
@ 2012-09-28  7:19       ` Lai Jiangshan
  2012-09-28 22:26         ` KOSAKI Motohiro
  0 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-28  7:19 UTC (permalink / raw)
  To: Christoph
  Cc: KOSAKI Motohiro, linux-kernel, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-mm

HI, Christoph, KOSAKI

SLAB always allocates kmem_list3 for all nodes(N_HIGH_MEMORY), also node bug/bad things happens.
SLUB always requires kmem_cache_node on the correct node, so these fix is needed.

SLAB uses for_each_online_node() to travel nodes and do maintain,
and it tolerates kmem_list3 on alien nodes.
SLUB uses for_each_node_state(node, N_NORMAL_MEMORY) to travel nodes and do maintain,
and it does not tolerate kmem_cache_node on alien nodes.

Maybe we need to change SLAB future and let it use
for_each_node_state(node, N_NORMAL_MEMORY), But I don't want to change SLAB
until I find something bad in SLAB.

Thanks,
Lai

On 09/28/2012 06:35 AM, Christoph wrote:
> While you are at it: Could you move the code into slab_common.c so that there is only one version to maintain?
> 
> On Sep 27, 2012, at 17:04, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> 
>> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>>> SLUB only fucus on the nodes which has normal memory, so ignore the other
>>> node's hot-adding and hot-removing.
>>>
>>> Aka: if some memroy of a node(which has no onlined memory) is online,
>>> but this new memory onlined is not normal memory(HIGH memory example),
>>> we should not allocate kmem_cache_node for SLUB.
>>>
>>> And if the last normal memory is offlined, but the node still has memroy,
>>> we should remove kmem_cache_node for that node.(current code delay it when
>>> all of the memory is offlined)
>>>
>>> so we only do something when marg->status_change_nid_normal > 0.
>>> marg->status_change_nid is not suitable here.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>>> mm/slub.c |    4 ++--
>>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 2fdd96f..2d78639 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -3577,7 +3577,7 @@ static void slab_mem_offline_callback(void *arg)
>>>    struct memory_notify *marg = arg;
>>>    int offline_node;
>>>
>>> -    offline_node = marg->status_change_nid;
>>> +    offline_node = marg->status_change_nid_normal;
>>>
>>>    /*
>>>     * If the node still has available memory. we need kmem_cache_node
>>> @@ -3610,7 +3610,7 @@ static int slab_mem_going_online_callback(void *arg)
>>>    struct kmem_cache_node *n;
>>>    struct kmem_cache *s;
>>>    struct memory_notify *marg = arg;
>>> -    int nid = marg->status_change_nid;
>>> +    int nid = marg->status_change_nid_normal;
>>>    int ret = 0;
>>
>> Looks reasonable. I think slab need similar fix too.
>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-27 13:19   ` Ni zhan Chen
@ 2012-09-28  7:29     ` Lai Jiangshan
  2012-09-28  8:04       ` Ni zhan Chen
  0 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-28  7:29 UTC (permalink / raw)
  To: Ni zhan Chen
  Cc: linux-kernel, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm

Hi, Chen,

On 09/27/2012 09:19 PM, Ni zhan Chen wrote:
> On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
>> The __add_zone() maybe call sleep-able init_currently_empty_zone()
>> to init wait_table,
>>
>> But this function also modifies the zone_start_pfn without any lock.
>> It is bugy.
>>
>> So we move this modification out, and we ensure the modification
>> of zone_start_pfn is only done with zone_span_writelock() held or in booting.
>>
>> Since zone_start_pfn is not modified by init_currently_empty_zone()
>> grow_zone_span() needs to check zone_start_pfn before update it.
>>
>> CC: Mel Gorman <mel@csn.ul.ie>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>   mm/memory_hotplug.c |    2 +-
>>   mm/page_alloc.c     |    3 +--
>>   2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index b62d429b..790561f 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>>       zone_span_writelock(zone);
>>         old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>> -    if (start_pfn < zone->zone_start_pfn)
>> +    if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>>           zone->zone_start_pfn = start_pfn;
>>         zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c13ea75..2545013 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>>           return ret;
>>       pgdat->nr_zones = zone_idx(zone) + 1;
>>   -    zone->zone_start_pfn = zone_start_pfn;
>> -
> 
> then how can mminit_dprintk print zone->zone_start_pfn ? always print 0 make no sense.


The full code here:

	mminit_dprintk(MMINIT_TRACE, "memmap_init",
			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
			pgdat->node_id,
			(unsigned long)zone_idx(zone),
			zone_start_pfn, (zone_start_pfn + size));


It doesn't always print 0, it still behaves as I expected.
Could you elaborate?

Thanks,
Lai 


> 
>>       mminit_dprintk(MMINIT_TRACE, "memmap_init",
>>               "Initialising map node %d zone %lu pfns %lu -> %lu\n",
>>               pgdat->node_id,
>> @@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>           ret = init_currently_empty_zone(zone, zone_start_pfn,
>>                           size, MEMMAP_EARLY);
>>           BUG_ON(ret);
>> +        zone->zone_start_pfn = zone_start_pfn;
>>           memmap_init(size, nid, j, zone_start_pfn);
>>           zone_start_pfn += size;
>>       }
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
  2012-09-27 14:32   ` Ni zhan Chen
@ 2012-09-28  7:32     ` Lai Jiangshan
  0 siblings, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-28  7:32 UTC (permalink / raw)
  To: Ni zhan Chen
  Cc: linux-kernel, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

On 09/27/2012 10:32 PM, Ni zhan Chen wrote:
> On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
>> node_states[N_NORMAL_MEMORY] becomes stale.
>>
>> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
>>
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed.
> 
> I still don't understand why need care N_NORMAL_MEMORY here, could you explain
> in details?

Hi, Chen

In short node_states[N_NORMAL_MEMORY] will become wrong in some situation.
many memory management code access to this node_states[N_NORMAL_MEMORY].

I will add more detail in the changelog in next round.

Thanks,
Lai

> 
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>   Documentation/memory-hotplug.txt |    5 ++-
>>   include/linux/memory.h           |    1 +
>>   mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
>>   3 files changed, 83 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
>> index 6d0c251..6e6cbc7 100644
>> --- a/Documentation/memory-hotplug.txt
>> +++ b/Documentation/memory-hotplug.txt
>> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
>>   struct memory_notify {
>>          unsigned long start_pfn;
>>          unsigned long nr_pages;
>> +       int status_change_nid_normal;
>>          int status_change_nid;
>>   }
>>     start_pfn is start_pfn of online/offline memory.
>>   nr_pages is # of pages of online/offline memory.
>> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
>> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>>   status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>>   set/clear. It means a new(memoryless) node gets new memory by online and a
>>   node loses all memory. If this is -1, then nodemask status is not changed.
>> -If status_changed_nid >= 0, callback should create/discard structures for the
>> +If status_changed_nid* >= 0, callback should create/discard structures for the
>>   node if necessary.
>>     --------------
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index ff9a9f8..a09216d 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>>   struct memory_notify {
>>       unsigned long start_pfn;
>>       unsigned long nr_pages;
>> +    int status_change_nid_normal;
>>       int status_change_nid;
>>   };
>>   diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 6a5b90d..b62d429b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>       return 0;
>>   }
>>   +static void check_nodemasks_changes_online(unsigned long nr_pages,
>> +    struct zone *zone, struct memory_notify *arg)
>> +{
>> +    int nid = zone_to_nid(zone);
>> +    enum zone_type zone_last = ZONE_NORMAL;
>> +
>> +    if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +        zone_last = ZONE_MOVABLE;
>> +
>> +    if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
>> +        arg->status_change_nid_normal = nid;
>> +    else
>> +        arg->status_change_nid_normal = -1;
>> +
>> +    if (!node_state(nid, N_HIGH_MEMORY))
>> +        arg->status_change_nid = nid;
>> +    else
>> +        arg->status_change_nid = -1;
>> +}
>> +
>> +static void set_nodemasks(int node, struct memory_notify *arg)
>> +{
>> +    if (arg->status_change_nid_normal >= 0)
>> +        node_set_state(node, N_NORMAL_MEMORY);
>> +
>> +    node_set_state(node, N_HIGH_MEMORY);
>> +}
>> +
>>     int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>   {
>> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>       struct memory_notify arg;
>>         lock_memory_hotplug();
>> +    /*
>> +     * This doesn't need a lock to do pfn_to_page().
>> +     * The section can't be removed here because of the
>> +     * memory_block->state_mutex.
>> +     */
>> +    zone = page_zone(pfn_to_page(pfn));
>> +
>>       arg.start_pfn = pfn;
>>       arg.nr_pages = nr_pages;
>> -    arg.status_change_nid = -1;
>> +    check_nodemasks_changes_online(nr_pages, zone, &arg);
>>         nid = page_to_nid(pfn_to_page(pfn));
>> -    if (node_present_pages(nid) == 0)
>> -        arg.status_change_nid = nid;
>>         ret = memory_notify(MEM_GOING_ONLINE, &arg);
>>       ret = notifier_to_errno(ret);
>> @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>           return ret;
>>       }
>>       /*
>> -     * This doesn't need a lock to do pfn_to_page().
>> -     * The section can't be removed here because of the
>> -     * memory_block->state_mutex.
>> -     */
>> -    zone = page_zone(pfn_to_page(pfn));
>> -    /*
>>        * If this zone is not populated, then it is not in zonelist.
>>        * This means the page allocator ignores this zone.
>>        * So, zonelist must be updated after online.
>> @@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>       zone->present_pages += onlined_pages;
>>       zone->zone_pgdat->node_present_pages += onlined_pages;
>>       if (onlined_pages) {
>> -        node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
>> +        set_nodemasks(zone_to_nid(zone), &arg);
>>           if (need_zonelists_rebuild)
>>               build_all_zonelists(NULL, zone);
>>           else
>> @@ -870,6 +897,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>>       return offlined;
>>   }
>>   +static void check_nodemasks_changes_offline(unsigned long nr_pages,
>> +        struct zone *zone, struct memory_notify *arg)
>> +{
>> +    struct pglist_data *pgdat = zone->zone_pgdat;
>> +    unsigned long present_pages = 0;
>> +    enum zone_type zt, zone_last = ZONE_NORMAL;
>> +
>> +    if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +        zone_last = ZONE_MOVABLE;
>> +
>> +    for (zt = 0; zt <= zone_last; zt++)
>> +        present_pages += pgdat->node_zones[zt].present_pages;
>> +    if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
>> +        arg->status_change_nid_normal = zone_to_nid(zone);
>> +    else
>> +        arg->status_change_nid_normal = -1;
>> +
>> +    zone_last = ZONE_MOVABLE;
>> +    for (; zt <= zone_last; zt++)
>> +        present_pages += pgdat->node_zones[zt].present_pages;
>> +    if (nr_pages >= present_pages)
>> +        arg->status_change_nid = zone_to_nid(zone);
>> +    else
>> +        arg->status_change_nid = -1;
>> +}
>> +
>> +static void clear_nodemasks(int node, struct memory_notify *arg)
>> +{
>> +    if (arg->status_change_nid_normal >= 0)
>> +        node_clear_state(node, N_NORMAL_MEMORY);
>> +
>> +    if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +        return;
>> +
>> +    if (arg->status_change_nid >= 0)
>> +        node_clear_state(node, N_HIGH_MEMORY);
>> +}
>> +
>>   static int __ref offline_pages(unsigned long start_pfn,
>>             unsigned long end_pfn, unsigned long timeout)
>>   {
>> @@ -903,9 +968,7 @@ static int __ref offline_pages(unsigned long start_pfn,
>>         arg.start_pfn = start_pfn;
>>       arg.nr_pages = nr_pages;
>> -    arg.status_change_nid = -1;
>> -    if (nr_pages >= node_present_pages(node))
>> -        arg.status_change_nid = node;
>> +    check_nodemasks_changes_offline(nr_pages, zone, &arg);
>>         ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>>       ret = notifier_to_errno(ret);
>> @@ -973,10 +1036,9 @@ repeat:
>>       if (!populated_zone(zone))
>>           zone_pcp_reset(zone);
>>   -    if (!node_present_pages(node)) {
>> -        node_clear_state(node, N_HIGH_MEMORY);
>> +    clear_nodemasks(node, &arg);
>> +    if (arg.status_change_nid >= 0)
>>           kswapd_stop(node);
>> -    }
>>         vm_total_pages = nr_free_pagecache_pages();
>>       writeback_set_ratelimit();
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-27 22:30   ` KOSAKI Motohiro
@ 2012-09-28  7:39     ` Lai Jiangshan
  0 siblings, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-09-28  7:39 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm

Hi, KOSAKI

On 09/28/2012 06:30 AM, KOSAKI Motohiro wrote:
> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>> The __add_zone() maybe call sleep-able init_currently_empty_zone()
>> to init wait_table,
> 
> This doesn't explain why sleepable is critical important. I think sleepable
> is jsut unrelated. The fact is only: to write zone->zone_start_pfn require
> zone_span_writelock, but init_currently_empty_zone() doesn't take it.

You are right, sleepable is not critical important, but the lock is critical.

I am Sorry that I added "sleep-able" and misled guys.

Actually I want to say:

1) to write zone->zone_start_pfn require zone_span_writelock
2) init_currently_empty_zone() is sleepable, so we can't use zone_span_writelock()
   protect the whole init_currently_empty_zone().
3) so we have to move the modification code out of init_currently_empty_zone()
   as this patch does.

> 
> 
>>
>> But this function also modifies the zone_start_pfn without any lock.
>> It is bugy.
> 
> buggy?
> 
> 
>> So we move this modification out, and we ensure the modification
>> of zone_start_pfn is only done with zone_span_writelock() held or in booting.
>>
>> Since zone_start_pfn is not modified by init_currently_empty_zone()
>> grow_zone_span() needs to check zone_start_pfn before update it.
>>
>> CC: Mel Gorman <mel@csn.ul.ie>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>  mm/memory_hotplug.c |    2 +-
>>  mm/page_alloc.c     |    3 +--
>>  2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index b62d429b..790561f 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>>  	zone_span_writelock(zone);
>>  
>>  	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>> -	if (start_pfn < zone->zone_start_pfn)
>> +	if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>>  		zone->zone_start_pfn = start_pfn;
> 
> Wrong. zone->zone_start_pfn==0 may be valid pfn. You shouldn't assume it is uninitialized
> value.

Good catch, I will use zone->spanned_pages instead.


Thanks,
Lai

> 
> 
>>  
>>  	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c13ea75..2545013 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>>  		return ret;
>>  	pgdat->nr_zones = zone_idx(zone) + 1;
>>  
>> -	zone->zone_start_pfn = zone_start_pfn;
>> -
>>  	mminit_dprintk(MMINIT_TRACE, "memmap_init",
>>  			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
>>  			pgdat->node_id,
>> @@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>  		ret = init_currently_empty_zone(zone, zone_start_pfn,
>>  						size, MEMMAP_EARLY);
>>  		BUG_ON(ret);
>> +		zone->zone_start_pfn = zone_start_pfn;
>>  		memmap_init(size, nid, j, zone_start_pfn);
>>  		zone_start_pfn += size;
>>  	}
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()
  2012-09-28  7:29     ` Lai Jiangshan
@ 2012-09-28  8:04       ` Ni zhan Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Ni zhan Chen @ 2012-09-28  8:04 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Mel Gorman, Andrew Morton, Jiang Liu, Xishi Qiu,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
	linux-mm

On 09/28/2012 03:29 PM, Lai Jiangshan wrote:
> Hi, Chen,
>
> On 09/27/2012 09:19 PM, Ni zhan Chen wrote:
>> On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
>>> The __add_zone() maybe call sleep-able init_currently_empty_zone()
>>> to init wait_table,
>>>
>>> But this function also modifies the zone_start_pfn without any lock.
>>> It is bugy.
>>>
>>> So we move this modification out, and we ensure the modification
>>> of zone_start_pfn is only done with zone_span_writelock() held or in booting.
>>>
>>> Since zone_start_pfn is not modified by init_currently_empty_zone()
>>> grow_zone_span() needs to check zone_start_pfn before update it.
>>>
>>> CC: Mel Gorman <mel@csn.ul.ie>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Reported-by: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
>>> Tested-by: Wen Congyang <wency@cn.fujitsu.com>
>>> ---
>>>    mm/memory_hotplug.c |    2 +-
>>>    mm/page_alloc.c     |    3 +--
>>>    2 files changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index b62d429b..790561f 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -205,7 +205,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
>>>        zone_span_writelock(zone);
>>>          old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>>> -    if (start_pfn < zone->zone_start_pfn)
>>> +    if (!zone->zone_start_pfn || start_pfn < zone->zone_start_pfn)
>>>            zone->zone_start_pfn = start_pfn;
>>>          zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index c13ea75..2545013 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3997,8 +3997,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
>>>            return ret;
>>>        pgdat->nr_zones = zone_idx(zone) + 1;
>>>    -    zone->zone_start_pfn = zone_start_pfn;
>>> -
>> then how can mminit_dprintk print zone->zone_start_pfn ? always print 0 make no sense.
>
> The full code here:
>
> 	mminit_dprintk(MMINIT_TRACE, "memmap_init",
> 			"Initialising map node %d zone %lu pfns %lu -> %lu\n",
> 			pgdat->node_id,
> 			(unsigned long)zone_idx(zone),
> 			zone_start_pfn, (zone_start_pfn + size));
>
>
> It doesn't always print 0, it still behaves as I expected.
> Could you elaborate?

Yeah, you are right. I mean mminit_dprintk is called after 
zone->zone_start_pfn initialized to show initialising state, but after 
this patch applied zone->zone_start_pfn will not be initialized before 
this print point.

>
> Thanks,
> Lai
>
>
>>>        mminit_dprintk(MMINIT_TRACE, "memmap_init",
>>>                "Initialising map node %d zone %lu pfns %lu -> %lu\n",
>>>                pgdat->node_id,
>>> @@ -4465,6 +4463,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>>>            ret = init_currently_empty_zone(zone, zone_start_pfn,
>>>                            size, MEMMAP_EARLY);
>>>            BUG_ON(ret);
>>> +        zone->zone_start_pfn = zone_start_pfn;
>>>            memmap_init(size, nid, j, zone_start_pfn);
>>>            zone_start_pfn += size;
>>>        }
>>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-28  7:19       ` Lai Jiangshan
@ 2012-09-28 22:26         ` KOSAKI Motohiro
  2012-10-24  7:06           ` Lai Jiangshan
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2012-09-28 22:26 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Christoph, linux-kernel, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-mm

On Fri, Sep 28, 2012 at 3:19 AM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> HI, Christoph, KOSAKI
>
> SLAB always allocates kmem_list3 for all nodes(N_HIGH_MEMORY), also node bug/bad things happens.
> SLUB always requires kmem_cache_node on the correct node, so these fix is needed.
>
> SLAB uses for_each_online_node() to travel nodes and do maintain,
> and it tolerates kmem_list3 on alien nodes.
> SLUB uses for_each_node_state(node, N_NORMAL_MEMORY) to travel nodes and do maintain,
> and it does not tolerate kmem_cache_node on alien nodes.
>
> Maybe we need to change SLAB future and let it use
> for_each_node_state(node, N_NORMAL_MEMORY), But I don't want to change SLAB
> until I find something bad in SLAB.

SLAB can't use highmem. then traverse zones which don't have normal
memory is silly IMHO.
If this is not bug, current slub behavior is also not bug. Is there
any difference?

If I understand correctly, current code may waste some additional
memory on corner case. but it doesn't make memory leak both when slab
and slub.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
  2012-09-28 22:26         ` KOSAKI Motohiro
@ 2012-10-24  7:06           ` Lai Jiangshan
  0 siblings, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-10-24  7:06 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph, linux-kernel, Christoph Lameter, Pekka Enberg,
	Matt Mackall, linux-mm

On 09/29/2012 06:26 AM, KOSAKI Motohiro wrote:
> On Fri, Sep 28, 2012 at 3:19 AM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>> HI, Christoph, KOSAKI
>>
>> SLAB always allocates kmem_list3 for all nodes(N_HIGH_MEMORY), also node bug/bad things happens.
>> SLUB always requires kmem_cache_node on the correct node, so these fix is needed.
>>
>> SLAB uses for_each_online_node() to travel nodes and do maintain,
>> and it tolerates kmem_list3 on alien nodes.
>> SLUB uses for_each_node_state(node, N_NORMAL_MEMORY) to travel nodes and do maintain,
>> and it does not tolerate kmem_cache_node on alien nodes.
>>
>> Maybe we need to change SLAB future and let it use
>> for_each_node_state(node, N_NORMAL_MEMORY), But I don't want to change SLAB
>> until I find something bad in SLAB.
> 
> SLAB can't use highmem. then traverse zones which don't have normal
> memory is silly IMHO.

SLAB tolerates dummy kmem_list3 on alien nodes.

> If this is not bug, current slub behavior is also not bug. Is there
> any difference?

SLUB can't tolerates dummy kmem_cache_node on alien nodes, otherwise
n->nr_slabs will be corrupted when we online a node which don't have normal memory,
and trigger a WARN_ON(). And it will trigger BUG_ON() when we remove the node.

Since SLUB always use for_each_node_state(node, N_NORMAL_MEMORY), we should make
all the other code in slub.c be compatible with it. otherwise we will break the
design of SLUB.

Since SLAB always use for_each_online_node(), it means it accept some silly behavior
in the design, we don't need to change it before we decide to remove the whole
silly things at together. there is not waring and buggy in SLAB in this view.

> 
> If I understand correctly, current code may waste some additional
> memory on corner case. but it doesn't make memory leak both when slab
> and slub.
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
  2012-09-27 22:03   ` KOSAKI Motohiro
@ 2012-10-26  1:39     ` Lai Jiangshan
  0 siblings, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2012-10-26  1:39 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, Rob Landley, Andrew Morton, Jiang Liu, Jianguo Wu,
	Kay Sievers, Greg Kroah-Hartman, Xishi Qiu, Mel Gorman,
	linux-doc, linux-mm

Hi, KOSAKI 

On 09/28/2012 06:03 AM, KOSAKI Motohiro wrote:
> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
>> node_states[N_NORMAL_MEMORY] becomes stale.
> 
> What's mean 'stale'? I guess
> 
> : Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY]


Right.

> and
> : then it will be invalid if the platform has highmem. Luckily, almost memory 
> : hotplug aware platform don't have highmem, but are not all.
> 
> right?

Some platforms(32 bit) support logic-memory-hotplug.
Some platforms have movable memory.
They are all considered.

> I supporse this patch only meaningful on ARM platform practically.
> 

any platform whic supports memory-hotplug.

> 
> 
>> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
> 
> 
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed.
> 
> status_change_nid_normal is very ugly to me. When status_change_nid and 
> status_change_nid_normal has positive value, they are always the same.
> nid and flags value are more natual to me.

If we use flags, the semantic of "status_change_nid" is changed and we need to
change more current code, and we will add complicated to the memory hotplug
callbacks.

like this:

-		node = arg->status_change_nid;
+		if (arg->status_change_flags & (1UL << N_HIGH_MEMORY))
+			node = arg->status_change_nid;
+		else
+			node = -1;

> 
> 
> 
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>  Documentation/memory-hotplug.txt |    5 ++-
>>  include/linux/memory.h           |    1 +
>>  mm/memory_hotplug.c              |   94 +++++++++++++++++++++++++++++++------
>>  3 files changed, 83 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
>> index 6d0c251..6e6cbc7 100644
>> --- a/Documentation/memory-hotplug.txt
>> +++ b/Documentation/memory-hotplug.txt
>> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
>>  struct memory_notify {
>>         unsigned long start_pfn;
>>         unsigned long nr_pages;
>> +       int status_change_nid_normal;
>>         int status_change_nid;
>>  }
>>  
>>  start_pfn is start_pfn of online/offline memory.
>>  nr_pages is # of pages of online/offline memory.
>> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
>> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>>  set/clear. It means a new(memoryless) node gets new memory by online and a
>>  node loses all memory. If this is -1, then nodemask status is not changed.
>> -If status_changed_nid >= 0, callback should create/discard structures for the
>> +If status_changed_nid* >= 0, callback should create/discard structures for the
>>  node if necessary.
>>  
>>  --------------
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index ff9a9f8..a09216d 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>>  struct memory_notify {
>>  	unsigned long start_pfn;
>>  	unsigned long nr_pages;
>> +	int status_change_nid_normal;
>>  	int status_change_nid;
>>  };
>>  
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 6a5b90d..b62d429b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>  	return 0;
>>  }
>>  
>> +static void check_nodemasks_changes_online(unsigned long nr_pages,
>> +	struct zone *zone, struct memory_notify *arg)
>> +{
>> +	int nid = zone_to_nid(zone);
>> +	enum zone_type zone_last = ZONE_NORMAL;
>> +
>> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +		zone_last = ZONE_MOVABLE;
> 
> This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.

If we don't have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_MOVABLE

if we have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_NORMAL

> 
> 
>> +
>> +	if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
>> +		arg->status_change_nid_normal = nid;
>> +	else
>> +		arg->status_change_nid_normal = -1;
> 
> Wrong. The onlined node may only have high mem zone. IOW, think fake numa case etc.

"zone_idx(zone) <= zone_last" checks this case. the result is "else" branch.

> 
> 
>> +
>> +	if (!node_state(nid, N_HIGH_MEMORY))
>> +		arg->status_change_nid = nid;
>> +	else
>> +		arg->status_change_nid = -1;
>> +}
>> +
>> +static void set_nodemasks(int node, struct memory_notify *arg)
> 
> Too ugly. just remove this and use node_set_state() directly.
> 
>> +{
>> +	if (arg->status_change_nid_normal >= 0)
>> +		node_set_state(node, N_NORMAL_MEMORY);
>> +
>> +	node_set_state(node, N_HIGH_MEMORY);
>> +}
>> +
>>  
>>  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>  {
>> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>  	struct memory_notify arg;
>>  
>>  	lock_memory_hotplug();
>> +	/*
>> +	 * This doesn't need a lock to do pfn_to_page().
>> +	 * The section can't be removed here because of the
>> +	 * memory_block->state_mutex.
>> +	 */
> 
> Please explain the intention of this comment. We think lock_memory_hotplug() close
> a race against memory offline directly.

I move old code up. the context does not changed.

> 
> 
>> +	zone = page_zone(pfn_to_page(pfn));
>> +
>>  	arg.start_pfn = pfn;
>>  	arg.nr_pages = nr_pages;
>> -	arg.status_change_nid = -1;
>> +	check_nodemasks_changes_online(nr_pages, zone, &arg);
>>  
>>  	nid = page_to_nid(pfn_to_page(pfn));
>> -	if (node_present_pages(nid) == 0)
>> -		arg.status_change_nid = nid;
>>  
>>  	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>>  	ret = notifier_to_errno(ret);
>> @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>  		return ret;
>>  	}
>>  	/*
>> -	 * This doesn't need a lock to do pfn_to_page().
>> -	 * The section can't be removed here because of the
>> -	 * memory_block->state_mutex.
>> -	 */
>> -	zone = page_zone(pfn_to_page(pfn));
>> -	/*
>>  	 * If this zone is not populated, then it is not in zonelist.
>>  	 * This means the page allocator ignores this zone.
>>  	 * So, zonelist must be updated after online.
>> @@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>  	zone->present_pages += onlined_pages;
>>  	zone->zone_pgdat->node_present_pages += onlined_pages;
>>  	if (onlined_pages) {
>> -		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
>> +		set_nodemasks(zone_to_nid(zone), &arg);
>>  		if (need_zonelists_rebuild)
>>  			build_all_zonelists(NULL, zone);
>>  		else
>> @@ -870,6 +897,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>>  	return offlined;
>>  }
>>  
>> +static void check_nodemasks_changes_offline(unsigned long nr_pages,
>> +		struct zone *zone, struct memory_notify *arg)
>> +{
> 
> This should remove ugly memory_notify argument and should be fold 
> check_nodemasks_changes_online().
> 
> 
>> +	struct pglist_data *pgdat = zone->zone_pgdat;
>> +	unsigned long present_pages = 0;
>> +	enum zone_type zt, zone_last = ZONE_NORMAL;
>> +
>> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +		zone_last = ZONE_MOVABLE;
>> +
>> +	for (zt = 0; zt <= zone_last; zt++)
>> +		present_pages += pgdat->node_zones[zt].present_pages;
>> +	if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
>> +		arg->status_change_nid_normal = zone_to_nid(zone);
>> +	else
>> +		arg->status_change_nid_normal = -1;
> 
> Wrong. Think, a zone has both normal and highmem zone and when admin 
> only remove highmem area. you should check pfn too.

I checked the pfn via "zone_idx(zone) <= zone_last".

"""zone = page_zone(pfn_to_page(pfn))"""

> 
> 
>> +
>> +	zone_last = ZONE_MOVABLE;
>> +	for (; zt <= zone_last; zt++)
>> +		present_pages += pgdat->node_zones[zt].present_pages;
>> +	if (nr_pages >= present_pages)
>> +		arg->status_change_nid = zone_to_nid(zone);
>> +	else
>> +		arg->status_change_nid = -1;
>> +}
>> +
>> +static void clear_nodemasks(int node, struct memory_notify *arg)
>> +{
>> +	if (arg->status_change_nid_normal >= 0)
>> +		node_clear_state(node, N_NORMAL_MEMORY);
>> +
>> +	if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +		return;
> 
> You should use #ifdef. otherwise we face unreachable statement warning.

fixed.

> 
>> +
>> +	if (arg->status_change_nid >= 0)
>> +		node_clear_state(node, N_HIGH_MEMORY);
>> +}
>> +
>>  static int __ref offline_pages(unsigned long start_pfn,
>>  		  unsigned long end_pfn, unsigned long timeout)
>>  {
>> @@ -903,9 +968,7 @@ static int __ref offline_pages(unsigned long start_pfn,
>>  
>>  	arg.start_pfn = start_pfn;
>>  	arg.nr_pages = nr_pages;
>> -	arg.status_change_nid = -1;
>> -	if (nr_pages >= node_present_pages(node))
>> -		arg.status_change_nid = node;
>> +	check_nodemasks_changes_offline(nr_pages, zone, &arg);
>>  
>>  	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>>  	ret = notifier_to_errno(ret);
>> @@ -973,10 +1036,9 @@ repeat:
>>  	if (!populated_zone(zone))
>>  		zone_pcp_reset(zone);
>>  
>> -	if (!node_present_pages(node)) {
>> -		node_clear_state(node, N_HIGH_MEMORY);
>> +	clear_nodemasks(node, &arg);
>> +	if (arg.status_change_nid >= 0)
>>  		kswapd_stop(node);
>> -	}
>>  
>>  	vm_total_pages = nr_free_pagecache_pages();
>>  	writeback_set_ratelimit();
>>

Thank you very much for the review.
And sorry for so late respond.

Thank,
Lai


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-10-26  1:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-27  6:47 [PATCH 0/3] memory_hotplug: fix memory hotplug bug Lai Jiangshan
2012-09-27  6:47 ` [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY] Lai Jiangshan
2012-09-27 14:32   ` Ni zhan Chen
2012-09-28  7:32     ` Lai Jiangshan
2012-09-27 22:03   ` KOSAKI Motohiro
2012-10-26  1:39     ` Lai Jiangshan
2012-09-27  6:47 ` [PATCH 2/3] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
2012-09-27 22:04   ` KOSAKI Motohiro
2012-09-27 22:35     ` Christoph
2012-09-28  7:19       ` Lai Jiangshan
2012-09-28 22:26         ` KOSAKI Motohiro
2012-10-24  7:06           ` Lai Jiangshan
2012-09-27  6:47 ` [PATCH 3/3] memory_hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock() Lai Jiangshan
2012-09-27 13:19   ` Ni zhan Chen
2012-09-28  7:29     ` Lai Jiangshan
2012-09-28  8:04       ` Ni zhan Chen
2012-09-27 22:30   ` KOSAKI Motohiro
2012-09-28  7:39     ` Lai Jiangshan
2012-09-28  0:39 ` [PATCH 0/3] memory_hotplug: fix memory hotplug bug Ni zhan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).