* [PATCH] [0/4] Update slab memory hotplug series
@ 2010-02-11 20:53 ` Andi Kleen
  0 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-11 20:53 UTC (permalink / raw)
  To: penberg, linux-kernel, linux-mm, haicheng.li, rientjes


Should address all earlier comments (except for the funny cpuset
case, which I chose to declare a "don't do that").

Also this time hopefully without missing patches.

There are still some other issues with memory hotadd, but that's the 
current slab set.

The patches are against 2.6.32, but apply to mainline I believe.

-Andi

^ permalink raw reply	[flat|nested] 170+ messages in thread

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

* [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-11 20:53 ` Andi Kleen
@ 2010-02-11 20:54   ` Andi Kleen
From: Andi Kleen @ 2010-02-11 20:54 UTC (permalink / raw)
  To: penberg, linux-kernel, linux-mm, haicheng.li, rientjes


When fallback_alloc() runs the node of the CPU might not be initialized yet.
Handle this case by allocating in another node.

v2: Try to allocate from all nodes (David Rientjes)

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 mm/slab.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -3188,7 +3188,24 @@ retry:
 		if (local_flags & __GFP_WAIT)
 			local_irq_enable();
 		kmem_flagcheck(cache, flags);
-		obj = kmem_getpages(cache, local_flags, numa_node_id());
+
+		/*
+		 * Node not set up yet? Try one that the cache has been set up
+		 * for.
+		 */
+		nid = numa_node_id();
+		if (cache->nodelists[nid] == NULL) {
+			for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+				nid = zone_to_nid(zone);
+				if (cache->nodelists[nid]) {
+					obj = kmem_getpages(cache, local_flags, nid);
+					if (obj)
+						break;
+				}
+			}
+		} else
+			obj = kmem_getpages(cache, local_flags, nid);
+
 		if (local_flags & __GFP_WAIT)
 			local_irq_disable();
 		if (obj) {



* [PATCH] [2/4] SLAB: Separate node initialization into separate function
  2010-02-11 20:53 ` Andi Kleen
@ 2010-02-11 20:54   ` Andi Kleen
From: Andi Kleen @ 2010-02-11 20:54 UTC (permalink / raw)
  To: penberg, linux-kernel, linux-mm, haicheng.li, rientjes


No functional changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 mm/slab.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -1158,19 +1158,9 @@ free_array_cache:
 	}
 }
 
-static int __cpuinit cpuup_prepare(long cpu)
+static int slab_node_prepare(int node)
 {
 	struct kmem_cache *cachep;
-	struct kmem_list3 *l3 = NULL;
-	int node = cpu_to_node(cpu);
-	const int memsize = sizeof(struct kmem_list3);
-
-	/*
-	 * We need to do this right in the beginning since
-	 * alloc_arraycache's are going to use this list.
-	 * kmalloc_node allows us to add the slab to the right
-	 * kmem_list3 and not this cpu's kmem_list3
-	 */
 
 	list_for_each_entry(cachep, &cache_chain, next) {
 		/*
@@ -1179,9 +1169,10 @@ static int __cpuinit cpuup_prepare(long
 		 * node has not already allocated this
 		 */
 		if (!cachep->nodelists[node]) {
-			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
+			struct kmem_list3 *l3;
+			l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node);
 			if (!l3)
-				goto bad;
+				return -1;
 			kmem_list3_init(l3);
 			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
 			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -1200,6 +1191,23 @@ static int __cpuinit cpuup_prepare(long
 			cachep->batchcount + cachep->num;
 		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
 	}
+	return 0;
+}
+
+static int __cpuinit cpuup_prepare(long cpu)
+{
+	struct kmem_cache *cachep;
+	struct kmem_list3 *l3 = NULL;
+	int node = cpu_to_node(cpu);
+
+	/*
+	 * We need to do this right in the beginning since
+	 * alloc_arraycache's are going to use this list.
+	 * kmalloc_node allows us to add the slab to the right
+	 * kmem_list3 and not this cpu's kmem_list3
+	 */
+	if (slab_node_prepare(node) < 0)
+		goto bad;
 
 	/*
 	 * Now we can go ahead with allocating the shared arrays and



* [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-11 20:53 ` Andi Kleen
@ 2010-02-11 20:54   ` Andi Kleen
From: Andi Kleen @ 2010-02-11 20:54 UTC (permalink / raw)
  To: penberg, linux-kernel, linux-mm, haicheng.li, rientjes


So kmalloc_node() works even if no CPU is up yet on the new node.

v2: Take cache chain mutex

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 mm/slab.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -115,6 +115,7 @@
 #include	<linux/reciprocal_div.h>
 #include	<linux/debugobjects.h>
 #include	<linux/kmemcheck.h>
+#include	<linux/memory.h>
 
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
@@ -1554,6 +1555,23 @@ void __init kmem_cache_init(void)
 	g_cpucache_up = EARLY;
 }
 
+static int slab_memory_callback(struct notifier_block *self,
+				unsigned long action, void *arg)
+{
+	struct memory_notify *mn = (struct memory_notify *)arg;
+
+	/*
+	 * When a node goes online allocate l3s early.	 This way
+	 * kmalloc_node() works for it.
+	 */
+	if (action == MEM_ONLINE && mn->status_change_nid >= 0) {
+		mutex_lock(&cache_chain_mutex);
+		slab_node_prepare(mn->status_change_nid);
+		mutex_unlock(&cache_chain_mutex);
+	}
+	return NOTIFY_OK;
+}
+
 void __init kmem_cache_init_late(void)
 {
 	struct kmem_cache *cachep;
@@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
 	 */
 	register_cpu_notifier(&cpucache_notifier);
 
+	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+
 	/*
 	 * The reap timers are started later, with a module init call: That part
 	 * of the kernel is not yet operational.



* [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-11 20:53 ` Andi Kleen
@ 2010-02-11 20:54   ` Andi Kleen
From: Andi Kleen @ 2010-02-11 20:54 UTC (permalink / raw)
  To: penberg, linux-kernel, linux-mm, haicheng.li, rientjes


cache_reap can run before the node is set up and then reference a NULL
l3 list. Check for this explicitly and just continue. The node
will eventually be set up.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 mm/slab.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -4093,6 +4093,9 @@ static void cache_reap(struct work_struc
 		 * we can do some work if the lock was obtained.
 		 */
 		l3 = searchp->nodelists[node];
+		/* Node not yet set up */
+		if (!l3)
+			break;
 
 		reap_alien(searchp, l3);
 



* Re: [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-11 21:41     ` David Rientjes
From: David Rientjes @ 2010-02-11 21:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li

On Thu, 11 Feb 2010, Andi Kleen wrote:

> When fallback_alloc() runs the node of the CPU might not be initialized yet.
> Handle this case by allocating in another node.
> 
> v2: Try to allocate from all nodes (David Rientjes)
> 

You don't need to specifically address the cpuset restriction in
fallback_alloc() since kmem_getpages() will return NULL whenever a zone is
tried from an unallowed node; I just thought it was a faster optimization
considering you (i) would operate over a nodemask and not the entire
zonelist, (ii) it would avoid the zone_to_nid() for all zones since you
already did a zonelist iteration in this function, and (iii) it wouldn't
needlessly call kmem_getpages() for unallowed nodes.

> Signed-off-by: Andi Kleen <ak@linux.intel.com>

That said, I don't want to see this fix go unmerged since you already 
declined to make that optimization once:

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH] [2/4] SLAB: Separate node initialization into separate function
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-11 21:44     ` David Rientjes
From: David Rientjes @ 2010-02-11 21:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li

On Thu, 11 Feb 2010, Andi Kleen wrote:

> Index: linux-2.6.32-memhotadd/mm/slab.c
> ===================================================================
> --- linux-2.6.32-memhotadd.orig/mm/slab.c
> +++ linux-2.6.32-memhotadd/mm/slab.c
> @@ -1158,19 +1158,9 @@ free_array_cache:
>  	}
>  }
>  
> -static int __cpuinit cpuup_prepare(long cpu)
> +static int slab_node_prepare(int node)

I still think this deserves a comment saying that slab_node_prepare()
requires cache_chain_mutex; I'm not sure people interested in node hotadd
would be concerned with whether the implementation needs to iterate slab
caches or not.

Otherwise:

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-11 21:45     ` David Rientjes
From: David Rientjes @ 2010-02-11 21:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li

On Thu, 11 Feb 2010, Andi Kleen wrote:

> Index: linux-2.6.32-memhotadd/mm/slab.c
> ===================================================================
> --- linux-2.6.32-memhotadd.orig/mm/slab.c
> +++ linux-2.6.32-memhotadd/mm/slab.c
> @@ -115,6 +115,7 @@
>  #include	<linux/reciprocal_div.h>
>  #include	<linux/debugobjects.h>
>  #include	<linux/kmemcheck.h>
> +#include	<linux/memory.h>
>  
>  #include	<asm/cacheflush.h>
>  #include	<asm/tlbflush.h>
> @@ -1554,6 +1555,23 @@ void __init kmem_cache_init(void)
>  	g_cpucache_up = EARLY;
>  }
>  
> +static int slab_memory_callback(struct notifier_block *self,
> +				unsigned long action, void *arg)
> +{
> +	struct memory_notify *mn = (struct memory_notify *)arg;
> +
> +	/*
> +	 * When a node goes online allocate l3s early.	 This way
> +	 * kmalloc_node() works for it.
> +	 */
> +	if (action == MEM_ONLINE && mn->status_change_nid >= 0) {
> +		mutex_lock(&cache_chain_mutex);
> +		slab_node_prepare(mn->status_change_nid);
> +		mutex_unlock(&cache_chain_mutex);
> +	}
> +	return NOTIFY_OK;
> +}
> +
>  void __init kmem_cache_init_late(void)
>  {
>  	struct kmem_cache *cachep;
> @@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
>  	 */
>  	register_cpu_notifier(&cpucache_notifier);
>  
> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +

Only needed for CONFIG_NUMA, but there are no side effects for UMA kernels
since status_change_nid will always be -1.

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-11 21:45     ` David Rientjes
From: David Rientjes @ 2010-02-11 21:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li

On Thu, 11 Feb 2010, Andi Kleen wrote:

> 
> cache_reap can run before the node is set up and then reference a NULL
> l3 list. Check for this explicitly and just continue. The node
> will eventually be set up.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-11 21:41     ` David Rientjes
@ 2010-02-11 21:55       ` Andi Kleen
From: Andi Kleen @ 2010-02-11 21:55 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li

On Thu, Feb 11, 2010 at 01:41:53PM -0800, David Rientjes wrote:
> On Thu, 11 Feb 2010, Andi Kleen wrote:
> 
> > When fallback_alloc() runs the node of the CPU might not be initialized yet.
> > Handle this case by allocating in another node.
> > 
> > v2: Try to allocate from all nodes (David Rientjes)
> > 
> 
> You don't need to specifically address the cpuset restriction in 
> fallback_alloc() since kmem_getpages() will return NULL whenever a zone is 
> tried from an unallowed node, I just thought it was a faster optimization 
> considering you (i) would operate over a nodemask and not the entire 
> zonelist, (ii) it would avoid the zone_to_nid() for all zones since you 
> already did a zonelist iteration in this function, and (iii) it wouldn't 
> needlessly call kmem_getpages() for unallowed nodes.

Thanks for the review again.

I don't really care about performance at all for this; it's just for
a few allocations during the memory hotadd path.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.



* Re: [PATCH] [0/4] Update slab memory hotplug series
  2010-02-11 20:53 ` Andi Kleen
@ 2010-02-13 10:24   ` Pekka Enberg
From: Pekka Enberg @ 2010-02-13 10:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, linux-mm, haicheng.li, rientjes

Andi Kleen wrote:
> Should address all earlier comments (except for the funny cpuset
> case which I chose to declare a don't do that)
> 
> Also this time hopefully without missing patches.
> 
> There are still some other issues with memory hotadd, but that's the 
> current slab set.
> 
> The patches are against 2.6.32, but apply to mainline I believe.

The series has been applied and will appear in the next version of 
linux-next.



* Re: [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-15  6:04     ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15  6:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Thu, Feb 11, 2010 at 09:54:00PM +0100, Andi Kleen wrote:
> 
> When fallback_alloc() runs the node of the CPU might not be initialized yet.
> Handle this case by allocating in another node.
> 
> v2: Try to allocate from all nodes (David Rientjes)
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
>  mm/slab.c |   19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.32-memhotadd/mm/slab.c
> ===================================================================
> --- linux-2.6.32-memhotadd.orig/mm/slab.c
> +++ linux-2.6.32-memhotadd/mm/slab.c
> @@ -3188,7 +3188,24 @@ retry:
>  		if (local_flags & __GFP_WAIT)
>  			local_irq_enable();
>  		kmem_flagcheck(cache, flags);
> -		obj = kmem_getpages(cache, local_flags, numa_node_id());
> +
> +		/*
> +		 * Node not set up yet? Try one that the cache has been set up
> +		 * for.
> +		 */
> +		nid = numa_node_id();
> +		if (cache->nodelists[nid] == NULL) {
> +			for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> +				nid = zone_to_nid(zone);
> +				if (cache->nodelists[nid]) {
> +					obj = kmem_getpages(cache, local_flags, nid);
> +					if (obj)
> +						break;
> +				}
> +			}
> +		} else
> +			obj = kmem_getpages(cache, local_flags, nid);
> +
>  		if (local_flags & __GFP_WAIT)
>  			local_irq_disable();
>  		if (obj) {

This is a better way to go anyway because it really is a proper
"fallback" alloc. I think that possibly used to work (ie. kmem_getpages
would be able to pass -1 for the node there) but got broken along the
line.

Although it's not such a hot path to begin with, care to put a branch
annotation there?

Acked-by: Nick Piggin <npiggin@suse.de>


* Re: [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-11 21:45     ` David Rientjes
@ 2010-02-15  6:06       ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15  6:06 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li

On Thu, Feb 11, 2010 at 01:45:16PM -0800, David Rientjes wrote:
> On Thu, 11 Feb 2010, Andi Kleen wrote:
> 
> > Index: linux-2.6.32-memhotadd/mm/slab.c
> > ===================================================================
> > --- linux-2.6.32-memhotadd.orig/mm/slab.c
> > +++ linux-2.6.32-memhotadd/mm/slab.c
> > @@ -115,6 +115,7 @@
> >  #include	<linux/reciprocal_div.h>
> >  #include	<linux/debugobjects.h>
> >  #include	<linux/kmemcheck.h>
> > +#include	<linux/memory.h>
> >  
> >  #include	<asm/cacheflush.h>
> >  #include	<asm/tlbflush.h>
> > @@ -1554,6 +1555,23 @@ void __init kmem_cache_init(void)
> >  	g_cpucache_up = EARLY;
> >  }
> >  
> > +static int slab_memory_callback(struct notifier_block *self,
> > +				unsigned long action, void *arg)
> > +{
> > +	struct memory_notify *mn = (struct memory_notify *)arg;
> > +
> > +	/*
> > +	 * When a node goes online allocate l3s early.	 This way
> > +	 * kmalloc_node() works for it.
> > +	 */
> > +	if (action == MEM_ONLINE && mn->status_change_nid >= 0) {
> > +		mutex_lock(&cache_chain_mutex);
> > +		slab_node_prepare(mn->status_change_nid);
> > +		mutex_unlock(&cache_chain_mutex);
> > +	}
> > +	return NOTIFY_OK;
> > +}
> > +
> >  void __init kmem_cache_init_late(void)
> >  {
> >  	struct kmem_cache *cachep;
> > @@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
> >  	 */
> >  	register_cpu_notifier(&cpucache_notifier);
> >  
> > +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> > +
> 
> Only needed for CONFIG_NUMA, but there's no side-effects for UMA kernels 
> since status_change_nid will always be -1.

Compiler doesn't know that, though.

> 
> Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-11 20:54   ` Andi Kleen
@ 2010-02-15  6:15     ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15  6:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> 
> cache_reap can run before the node is set up and then reference a NULL
> l3 list. Check for this explicitly and just continue. The node
> will eventually be set up.

How, may I ask? cpuup_prepare in the hotplug notifier should always
run before start_cpu_timer.

> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
>  mm/slab.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6.32-memhotadd/mm/slab.c
> ===================================================================
> --- linux-2.6.32-memhotadd.orig/mm/slab.c
> +++ linux-2.6.32-memhotadd/mm/slab.c
> @@ -4093,6 +4093,9 @@ static void cache_reap(struct work_struc
>  		 * we can do some work if the lock was obtained.
>  		 */
>  		l3 = searchp->nodelists[node];
> +		/* Node not yet set up */
> +		if (!l3)
> +			break;
>  
>  		reap_alien(searchp, l3);
>  
> 

* Re: [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-15  6:04     ` Nick Piggin
@ 2010-02-15 10:07       ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-15 10:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

> This is a better way to go anyway because it really is a proper
> "fallback" alloc. I think that possibly used to work (ie. kmem_getpages
> would be able to pass -1 for the node there) but got broken along the
> line.

Thanks for the review.

I should add there's still one open problem: in some cases 
the oom killer kicks in on hotadd. Still working on that one.

In general hotadd was mighty bitrotted :/

> 
> Although it's not such a hot path to begin with, care to put a branch
> annotation there?

pointer == NULL is already default unlikely in gcc

/* Pointers are usually not NULL.  */
DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (85), 0)
DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (85), 0)

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2
  2010-02-15 10:07       ` Andi Kleen
@ 2010-02-15 10:22         ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15 10:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, Feb 15, 2010 at 11:07:12AM +0100, Andi Kleen wrote:
> > This is a better way to go anyway because it really is a proper
> > "fallback" alloc. I think that possibly used to work (ie. kmem_getpages
> > would be able to pass -1 for the node there) but got broken along the
> > line.
> 
> Thanks for the review.
> 
> I should add there's still one open problem: in some cases 
> the oom killer kicks in on hotadd. Still working on that one.
> 
> In general hotadd was mighty bitrotted :/

Yes, that doesn't surprise me. I'm sure you can handle it, but send
some traces if you have problems.

 
> > Although it's not such a hot path to begin with, care to put a branch
> > annotation there?
> 
> pointer == NULL is already default unlikely in gcc
> 
> /* Pointers are usually not NULL.  */
> DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (85), 0)
> DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (85), 0)

Well I still prefer to annotate it. I think builtin expect is 99%.



* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15  6:15     ` Nick Piggin
@ 2010-02-15 10:32       ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-15 10:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > 
> > cache_reap can run before the node is set up and then reference a NULL
> > l3 list. Check for this explicitly and just continue. The node
> > will eventually be set up.
> 
> How, may I ask? cpuup_prepare in the hotplug notifier should always
> run before start_cpu_timer.

I'm not fully sure, but I have the oops to prove it :)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 10:32       ` Andi Kleen
@ 2010-02-15 10:41         ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15 10:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, Feb 15, 2010 at 11:32:50AM +0100, Andi Kleen wrote:
> On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> > On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > > 
> > > cache_reap can run before the node is set up and then reference a NULL
> > > l3 list. Check for this explicitly and just continue. The node
> > > will eventually be set up.
> > 
> > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > run before start_cpu_timer.
> 
> I'm not fully sure, but I have the oops to prove it :)

Hmm, it would be nice to work out why it's happening. If it's completely
reproducible then could I send you a debug patch to test?



* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 10:41         ` Nick Piggin
@ 2010-02-15 10:52           ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-15 10:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, Feb 15, 2010 at 09:41:35PM +1100, Nick Piggin wrote:
> On Mon, Feb 15, 2010 at 11:32:50AM +0100, Andi Kleen wrote:
> > On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> > > On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > > > 
> > > > cache_reap can run before the node is set up and then reference a NULL
> > > > l3 list. Check for this explicitly and just continue. The node
> > > > will eventually be set up.
> > > 
> > > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > > run before start_cpu_timer.
> > 
> > I'm not fully sure, but I have the oops to prove it :)
> 
> Hmm, it would be nice to work out why it's happening. If it's completely
> reproducible then could I send you a debug patch to test?

Looking at it again I suspect it happened this way:

cpuup_prepare fails (e.g. kmalloc_node returns NULL). The later
patches might have cured that. Nothing stops the timer from
starting in this case anyway.

So given that, this patch might not be needed, but it's
safer to have anyway.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 10:52           ` Andi Kleen
@ 2010-02-15 11:01             ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-15 11:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, Feb 15, 2010 at 11:52:53AM +0100, Andi Kleen wrote:
> On Mon, Feb 15, 2010 at 09:41:35PM +1100, Nick Piggin wrote:
> > On Mon, Feb 15, 2010 at 11:32:50AM +0100, Andi Kleen wrote:
> > > On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> > > > On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > > > > 
> > > > > cache_reap can run before the node is set up and then reference a NULL
> > > > > l3 list. Check for this explicitly and just continue. The node
> > > > > will eventually be set up.
> > > > 
> > > > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > > > run before start_cpu_timer.
> > > 
> > > I'm not fully sure, but I have the oops to prove it :)
> > 
> > Hmm, it would be nice to work out why it's happening. If it's completely
> > reproducible then could I send you a debug patch to test?
> 
> Looking at it again I suspect it happened this way:
> 
> cpuup_prepare fails (e.g. kmalloc_node returns NULL). The later
> patches might have cured that. Nothing stops the timer from
> starting in this case anyway.

Hmm, but it should, because if cpuup_prepare fails then the
CPU_ONLINE notifiers should never be called, I think.

 
> So given that, this patch might not be needed, but it's
> safer to have anyway.

I'm just worried there is still an underlying problem here.



* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 11:01             ` Nick Piggin
@ 2010-02-15 15:30               ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-15 15:30 UTC (permalink / raw)
  To: Nick Piggin; +Cc: penberg, linux-kernel, linux-mm, haicheng.li, rientjes

Nick Piggin <npiggin@suse.de> writes:
>
> Hmm, but it should, because if cpuup_prepare fails then the
> CPU_ONLINE notifiers should never be called, I think.

That's true.
  
-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-15  6:06       ` Nick Piggin
@ 2010-02-15 21:47         ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-02-15 21:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andi Kleen, Pekka Enberg, linux-kernel, linux-mm, haicheng.li

On Mon, 15 Feb 2010, Nick Piggin wrote:

> > > @@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
> > >  	 */
> > >  	register_cpu_notifier(&cpucache_notifier);
> > >  
> > > +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> > > +
> > 
> > Only needed for CONFIG_NUMA, but there's no side-effects for UMA kernels 
> > since status_change_nid will always be -1.
> 
> Compiler doesn't know that, though.
> 

Right, setting up a memory hotplug callback for UMA kernels here isn't 
necessary although slab_node_prepare() would have to be defined 
unconditionally.  I made this suggestion in my review of the patchset's 
initial version but it was left unchanged, so I'd rather see it included 
than otherwise stall out.  This could always be enclosed in
#ifdef CONFIG_NUMA later, just as the callback in slub is.


* Re: [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-15 21:47         ` David Rientjes
@ 2010-02-16 14:04           ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-02-16 14:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andi Kleen, Pekka Enberg, linux-kernel, linux-mm, haicheng.li

On Mon, Feb 15, 2010 at 01:47:29PM -0800, David Rientjes wrote:
> On Mon, 15 Feb 2010, Nick Piggin wrote:
> 
> > > > @@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
> > > >  	 */
> > > >  	register_cpu_notifier(&cpucache_notifier);
> > > >  
> > > > +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> > > > +
> > > 
> > > Only needed for CONFIG_NUMA, but there's no side-effects for UMA kernels 
> > > since status_change_nid will always be -1.
> > 
> > Compiler doesn't know that, though.
> > 
> 
> Right, setting up a memory hotplug callback for UMA kernels here isn't 
> necessary although slab_node_prepare() would have to be defined 
> unconditionally.  I made this suggestion in my review of the patchset's 
> initial version but it was left unchanged, so I'd rather see it included 
> than otherwise stall out.  This could always be enclosed in
> #ifdef CONFIG_NUMA later just like the callback in slub does.

It's not such a big burden to annotate critical core code with such
things. Otherwise someone else ends up eventually doing it.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2
  2010-02-16 14:04           ` Nick Piggin
@ 2010-02-16 20:45             ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-16 20:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Rientjes, Andi Kleen, linux-kernel, linux-mm, haicheng.li

Nick Piggin wrote:
> On Mon, Feb 15, 2010 at 01:47:29PM -0800, David Rientjes wrote:
>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>
>>>>> @@ -1577,6 +1595,8 @@ void __init kmem_cache_init_late(void)
>>>>>  	 */
>>>>>  	register_cpu_notifier(&cpucache_notifier);
>>>>>  
>>>>> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
>>>>> +
>>>> Only needed for CONFIG_NUMA, but there's no side-effects for UMA kernels 
>>>> since status_change_nid will always be -1.
>>> Compiler doesn't know that, though.
>>>
>> Right, setting up a memory hotplug callback for UMA kernels here isn't 
>> necessary although slab_node_prepare() would have to be defined 
>> unconditionally.  I made this suggestion in my review of the patchset's 
>> initial version but it was left unchanged, so I'd rather see it included 
>> than otherwise stall out.  This could always be enclosed in
>> #ifdef CONFIG_NUMA later just like the callback in slub does.
> 
> It's not such a big burden to annotate critical core code with such
> things. Otherwise someone else ends up eventually doing it.

Yes, please.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 10:32       ` Andi Kleen
@ 2010-02-19 18:22         ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-19 18:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Nick Piggin, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, 15 Feb 2010, Andi Kleen wrote:

> > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > run before start_cpu_timer.
>
> I'm not fully sure, but I have the oops to prove it :)

I still suspect that this has something to do with Pekka's changing the
boot order for allocator bootstrap. Can we clarify why these problems
exist before we try band aid?


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-15 11:01             ` Nick Piggin
@ 2010-02-19 18:22               ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-19 18:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andi Kleen, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Mon, 15 Feb 2010, Nick Piggin wrote:

> I'm just worried there is still an underlying problem here.

So am I. What caused the breakage that requires this patchset?


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-19 18:22               ` Christoph Lameter
@ 2010-02-20  9:01                 ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-20  9:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andi Kleen, penberg, linux-kernel, linux-mm,
	haicheng.li, rientjes

On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
> On Mon, 15 Feb 2010, Nick Piggin wrote:
> 
> > I'm just worried there is still an underlying problem here.
> 
> So am I. What caused the breakage that requires this patchset?

Memory hotadd with a new node being onlined.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-20  9:01                 ` Andi Kleen
@ 2010-02-22 10:53                   ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-22 10:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, rientjes

Andi Kleen kirjoitti:
> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>
>>> I'm just worried there is still an underlying problem here.
>> So am I. What caused the breakage that requires this patchset?
> 
> Memory hotadd with a new node being onlined.

So can you post the oops, please? Right now I am looking at zapping the 
series from slab.git due to NAKs from both Christoph and Nick.

			Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-19 18:22         ` Christoph Lameter
@ 2010-02-22 10:57           ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-22 10:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Nick Piggin, linux-kernel, linux-mm, haicheng.li, rientjes

Christoph Lameter kirjoitti:
> On Mon, 15 Feb 2010, Andi Kleen wrote:
> 
>>> How, may I ask? cpuup_prepare in the hotplug notifier should always
>>> run before start_cpu_timer.
>> I'm not fully sure, but I have the oops to prove it :)
> 
> I still suspect that this has something to do with Pekka's changing the
> boot order for allocator bootstrap. Can we clarify why these problems
> exist before we try band aid?

I don't see how my changes broke things, but maybe I'm not looking hard 
enough. Cache reaping is still set up from cpucache_init(), which is an 
initcall that is not affected by my changes AFAICT, and from 
cpuup_callback(), which should also not be affected.

				Pekka
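
The race patch 4/4 guards against can be sketched like this. This is a hedged approximation, not the real mm/slab.c: the periodic cache_reap() timer can fire on a CPU whose node is still being hot-added, so it has to tolerate a missing per-node l3 list instead of dereferencing NULL. Structure names and layout here are illustrative stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hedged sketch of the idea behind patch 4/4 (names and layout are
 * approximations): skip reaping for a node whose l3 list has not yet
 * been set up by the hotplug path, and retry on the next timer tick. */

#define MAX_NODES 4

struct kmem_list3 { int free_objects; };
struct kmem_cache { struct kmem_list3 *nodelists[MAX_NODES]; };

/* Returns 1 if reaping ran, 0 if it was skipped because the node's
 * l3 list does not exist yet. */
static int cache_reap_node(struct kmem_cache *cachep, int node)
{
    struct kmem_list3 *l3 = cachep->nodelists[node];

    if (l3 == NULL)
        return 0;          /* node still coming online */

    l3->free_objects = 0;  /* stand-in for the real reaping work */
    return 1;
}
```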

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-22 10:53                   ` Pekka Enberg
@ 2010-02-22 14:31                     ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-22 14:31 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Christoph Lameter, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, rientjes

On Mon, Feb 22, 2010 at 12:53:27PM +0200, Pekka Enberg wrote:
> Andi Kleen kirjoitti:
>> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
>>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>>
>>>> I'm just worried there is still an underlying problem here.
>>> So am I. What caused the breakage that requires this patchset?
>>
>> Memory hotadd with a new node being onlined.
>
> So can you post the oops, please? Right now I am looking at zapping the 

I can't post the oops from a pre-release system.

> series from slab.git due to NAKs from both Christoph and Nick.

Huh? They just complained about the patch, not the whole series.
I don't understand how that could prompt you to drop the whole series.

As far as I know nobody said the patch is wrong so far, just
that they wanted to have more analysis.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-22 14:31                     ` Andi Kleen
@ 2010-02-22 16:11                       ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-22 16:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, rientjes

Andi Kleen wrote:
> On Mon, Feb 22, 2010 at 12:53:27PM +0200, Pekka Enberg wrote:
>> Andi Kleen kirjoitti:
>>> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
>>>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>>>
>>>>> I'm just worried there is still an underlying problem here.
>>>> So am I. What caused the breakage that requires this patchset?
>>> Memory hotadd with a new node being onlined.
>> So can you post the oops, please? Right now I am looking at zapping the 
> 
> I can't post the oops from a pre-release system.
> 
>> series from slab.git due to NAKs from both Christoph and Nick.
> 
> Huh? They just complained about the patch, not the whole series.
> I don't understand how that could prompt you to drop the whole series.

Yeah, I meant the non-ACK'd patches. Sorry for the confusion.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-22 16:11                       ` Pekka Enberg
@ 2010-02-22 20:20                         ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-22 20:20 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Christoph Lameter, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, rientjes

On Mon, Feb 22, 2010 at 06:11:03PM +0200, Pekka Enberg wrote:
> Andi Kleen wrote:
>> On Mon, Feb 22, 2010 at 12:53:27PM +0200, Pekka Enberg wrote:
>>> Andi Kleen kirjoitti:
>>>> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
>>>>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>>>>
>>>>>> I'm just worried there is still an underlying problem here.
>>>>> So am I. What caused the breakage that requires this patchset?
>>>> Memory hotadd with a new node being onlined.
>>> So can you post the oops, please? Right now I am looking at zapping the 
>>
>> I can't post the oops from a pre-release system.
>>
>>> series from slab.git due to NAKs from both Christoph and Nick.
>>
>> Huh? They just complained about the patch, not the whole series.
>> I don't understand how that could prompt you to drop the whole series.
>
> Yeah, I meant the non-ACK'd patches. Sorry for the confusion.

Ok it's fine for me to drop that patch for now. I'll try to reproduce
that oops and if I can't then it might be just not needed.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-20  9:01                 ` Andi Kleen
@ 2010-02-24 15:49                   ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-24 15:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Nick Piggin, penberg, linux-kernel, linux-mm, haicheng.li, rientjes

On Sat, 20 Feb 2010, Andi Kleen wrote:

> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
> > On Mon, 15 Feb 2010, Nick Piggin wrote:
> >
> > > I'm just worried there is still an underlying problem here.
> >
> > So am I. What caused the breakage that requires this patchset?
>
> Memory hotadd with a new node being onlined.

That used to work fine.


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-24 15:49                   ` Christoph Lameter
@ 2010-02-25  7:26                     ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-25  7:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Nick Piggin, linux-kernel, linux-mm, haicheng.li, rientjes

Christoph Lameter wrote:
> On Sat, 20 Feb 2010, Andi Kleen wrote:
> 
>> On Fri, Feb 19, 2010 at 12:22:58PM -0600, Christoph Lameter wrote:
>>> On Mon, 15 Feb 2010, Nick Piggin wrote:
>>>
>>>> I'm just worried there is still an underlying problem here.
>>> So am I. What caused the breakage that requires this patchset?
>> Memory hotadd with a new node being onlined.
> 
> That used to work fine.

OK, can we get this issue resolved? The merge window is open and 
Christoph seems to be unhappy with the whole patch queue. I'd hate this 
bug fix to miss .34...

			Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25  7:26                     ` Pekka Enberg
@ 2010-02-25  8:01                       ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-02-25  8:01 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li

On Thu, 25 Feb 2010, Pekka Enberg wrote:

> > > > > I'm just worried there is still an underlying problem here.
> > > > So am I. What caused the breakage that requires this patchset?
> > > Memory hotadd with a new node being onlined.
> > 
> > That used to work fine.
> 
> OK, can we get this issue resolved? The merge window is open and Christoph
> seems to be unhappy with the whole patch queue. I'd hate this bug fix to miss
> .34...
> 

I don't see how memory hotadd with a new node being onlined could have 
worked fine before since slab lacked any memory hotplug notifier until 
Andi just added it.

That said, I think the first and fourth patch in this series may be 
unnecessary if slab's notifier were to call slab_node_prepare() on 
MEM_GOING_ONLINE instead of MEM_ONLINE.  Otherwise, kswapd is already 
running, the zonelists for the new pgdat have been initialized, and the 
bit has been set in node_states[N_HIGH_MEMORY] without allocated 
cachep->nodelists[node] memory.
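
The ordering suggestion above can be sketched as follows. This is a hypothetical illustration (the action names follow the kernel's memory hotplug notifier, but the code is not real mm/slab.c): prepare the per-node lists on MEM_GOING_ONLINE, before kswapd starts and before the node's bit appears in node_states[N_HIGH_MEMORY], rather than on MEM_ONLINE when allocations may already target the node.

```c
#include <assert.h>

/* Hypothetical sketch: run slab_node_prepare() from the notifier on
 * MEM_GOING_ONLINE instead of MEM_ONLINE.  All symbols are stand-ins. */

enum { MEM_GOING_ONLINE, MEM_ONLINE, MEM_CANCEL_ONLINE };

#define MAX_NODES 4
static int node_prepared[MAX_NODES];   /* 1 = per-node lists allocated */

static int slab_node_prepare(int nid)
{
    node_prepared[nid] = 1;            /* allocate cachep->nodelists[nid] */
    return 0;
}

static int slab_memory_callback(unsigned long action, int nid)
{
    switch (action) {
    case MEM_GOING_ONLINE:
        /* node is not yet visible to allocators at this point */
        return slab_node_prepare(nid);
    case MEM_CANCEL_ONLINE:
        node_prepared[nid] = 0;        /* roll back if onlining fails */
        return 0;
    default:                           /* MEM_ONLINE: nothing left to do */
        return 0;
    }
}
```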

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25  8:01                       ` David Rientjes
@ 2010-02-25 18:30                         ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-25 18:30 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Thu, 25 Feb 2010, David Rientjes wrote:

> I don't see how memory hotadd with a new node being onlined could have
> worked fine before since slab lacked any memory hotplug notifier until
> Andi just added it.

AFAICR The cpu notifier took on that role in the past.

If what you say is true then memory hotplug has never worked before.
Kamesan?


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25  7:26                     ` Pekka Enberg
@ 2010-02-25 18:34                       ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-25 18:34 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Nick Piggin, linux-kernel, linux-mm, haicheng.li,
	rientjes, KAMEZAWA Hiroyuki

On Thu, 25 Feb 2010, Pekka Enberg wrote:

> OK, can we get this issue resolved? The merge window is open and Christoph
> seems to be unhappy with the whole patch queue. I'd hate this bug fix to miss
> .34...

Merge window? These are bugs that have to be fixed independently from a
merge window. The question is if this is the right approach or if there is
other stuff still lurking because we are not yet seeing the full picture.

Can we get some of the hotplug authors involved in the discussion?




^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:34                       ` Christoph Lameter
@ 2010-02-25 18:46                         ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-25 18:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Nick Piggin, linux-kernel, linux-mm, haicheng.li,
	rientjes, KAMEZAWA Hiroyuki

Hi Christoph,

Christoph Lameter wrote:
>> OK, can we get this issue resolved? The merge window is open and Christoph
>> seems to be unhappy with the whole patch queue. I'd hate this bug fix to miss
>> .34...
> 
> Merge window? These are bugs that have to be fixed independently from a
> merge window. The question is if this is the right approach or if there is
> other stuff still lurking because we are not yet seeing the full picture.

The first set of patches from Andi are almost one month old. If this 
issue progresses as swiftly as it has to this day, I foresee a rocky 
road for any of them getting merged to .34 through slab.git, that's all.

			Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:46                         ` Pekka Enberg
@ 2010-02-25 19:19                           ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-25 19:19 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Nick Piggin, linux-kernel, linux-mm, haicheng.li,
	rientjes, KAMEZAWA Hiroyuki

On Thu, 25 Feb 2010, Pekka Enberg wrote:

> The first set of patches from Andi are almost one month old. If this issue
> progresses as swiftly as it has to this day, I foresee a rocky road for any of
> them getting merged to .34 through slab.git, that's all.

Onlining and offlining memory is not that frequently used.


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:30                         ` Christoph Lameter
@ 2010-02-25 21:45                           ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-02-25 21:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Thu, 25 Feb 2010, Christoph Lameter wrote:

> > I don't see how memory hotadd with a new node being onlined could have
> > worked fine before since slab lacked any memory hotplug notifier until
> > Andi just added it.
> 
> AFAICR The cpu notifier took on that role in the past.
> 

The cpu notifier isn't involved if the firmware notifies the kernel that a 
new ACPI memory device has been added or you write a start address to 
/sys/devices/system/memory/probe.  Hot-added memory devices can include 
ACPI_SRAT_MEM_HOT_PLUGGABLE entries in the SRAT for x86 that assign them 
non-online node ids (although all such entries get their bits set in 
node_possible_map at boot), so a new pgdat may be allocated for the node's 
registered range.

Slab isn't concerned about that until the memory is onlined by doing 
echo online > /sys/devices/system/memory/memoryX/state for the new memory 
section.  This is where all the new pages are onlined, kswapd is started 
on the new node, and the zonelists are built.  It's also where the new 
node gets set in N_HIGH_MEMORY and, thus, it's possible to call 
kmalloc_node() in generic kernel code.  All that is done under 
MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and 
fourth patch in this series may not be necessary if we prevent setting the 
bit in the nodemask or building the zonelists until the slab nodelists are 
ready.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 21:45                           ` David Rientjes
@ 2010-02-25 22:31                             ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-25 22:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Thu, 25 Feb 2010, David Rientjes wrote:

> On Thu, 25 Feb 2010, Christoph Lameter wrote:
>
> > > I don't see how memory hotadd with a new node being onlined could have
> > > worked fine before since slab lacked any memory hotplug notifier until
> > > Andi just added it.
> >
> > AFAICR The cpu notifier took on that role in the past.
> >
>
> The cpu notifier isn't involved if the firmware notifies the kernel that a
> new ACPI memory device has been added or you write a start address to
> /sys/devices/system/memory/probe.  Hot-added memory devices can include
> ACPI_SRAT_MEM_HOT_PLUGGABLE entries in the SRAT for x86 that assign them
> non-online node ids (although all such entries get their bits set in
> node_possible_map at boot), so a new pgdat may be allocated for the node's
> registered range.

Yes, Andi's work makes it explicit, but there is already code in the cpu
notifier (see cpuup_prepare) that seems to have been intended to
initialize the node structures. I wonder why the hotplug people never
addressed that issue? Kame?


      list_for_each_entry(cachep, &cache_chain, next) {
                /*
                 * Set up the size64 kmemlist for cpu before we can
                 * begin anything. Make sure some other cpu on this
                 * node has not already allocated this
                 */
                if (!cachep->nodelists[node]) {
                        l3 = kmalloc_node(memsize, GFP_KERNEL, node);
                        if (!l3)
                                goto bad;
                        kmem_list3_init(l3);
                        l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
                            ((unsigned long)cachep) % REAPTIMEOUT_LIST3;

                        /*
                         * The l3s don't come and go as CPUs come and
                         * go.  cache_chain_mutex is sufficient
                         * protection here.
                         */
                        cachep->nodelists[node] = l3;
                }

                spin_lock_irq(&cachep->nodelists[node]->list_lock);
                cachep->nodelists[node]->free_limit =
                        (1 + nr_cpus_node(node)) *
                        cachep->batchcount + cachep->num;
                spin_unlock_irq(&cachep->nodelists[node]->list_lock);
        }


> kmalloc_node() in generic kernel code.  All that is done under
> MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and
> fourth patch in this series may not be necessary if we prevent setting the
> bit in the nodemask or building the zonelists until the slab nodelists are
> ready.

That sounds good.



^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:30                         ` Christoph Lameter
@ 2010-02-26  1:09                           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 170+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-26  1:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Pekka Enberg, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Thu, 25 Feb 2010 12:30:26 -0600 (CST)
Christoph Lameter <cl@linux-foundation.org> wrote:

> On Thu, 25 Feb 2010, David Rientjes wrote:
> 
> > I don't see how memory hotadd with a new node being onlined could have
> > worked fine before since slab lacked any memory hotplug notifier until
> > Andi just added it.
> 
> AFAICR The cpu notifier took on that role in the past.
> 
> If what you say is true then memory hotplug has never worked before.
> Kamesan?
> 
In this code,

 int node = numa_node_id();

node is got by its CPU.

At node hotplug, following order should be kept.
	Add:   memory -> cpu
	Remove: cpu -> memory

cpus must be onlined after memory. At least, we online cpus only after
memory. That way we (in our heavy testing on RHEL5) never see this kind
of race.


I'm sorry if my answer misses your point.

Thanks,
-Kame
 


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 22:31                             ` Christoph Lameter
@ 2010-02-26 10:45                               ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-26 10:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

Christoph Lameter kirjoitti:
>> kmalloc_node() in generic kernel code.  All that is done under
>> MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and
>> fourth patch in this series may not be necessary if we prevent setting the
>> bit in the nodemask or building the zonelists until the slab nodelists are
>> ready.
> 
> That sounds good.

Andi?

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:30                         ` Christoph Lameter
@ 2010-02-26 11:41                           ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 11:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Pekka Enberg, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Thu, Feb 25, 2010 at 12:30:26PM -0600, Christoph Lameter wrote:
> On Thu, 25 Feb 2010, David Rientjes wrote:
> 
> > I don't see how memory hotadd with a new node being onlined could have
> > worked fine before since slab lacked any memory hotplug notifier until
> > Andi just added it.
> 
> AFAICR The cpu notifier took on that role in the past.

The problem is that slab already allocates inside the notifier,
before some of the needed state has been set up.

> If what you say is true then memory hotplug has never worked before.
> Kamesan?

Memory hotplug with node add never quite worked on x86 before,
for various reasons not related to slab.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 10:45                               ` Pekka Enberg
@ 2010-02-26 11:43                                 ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 11:43 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, David Rientjes, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 12:45:02PM +0200, Pekka Enberg wrote:
> Christoph Lameter kirjoitti:
>>> kmalloc_node() in generic kernel code.  All that is done under
>>> MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and
>>> fourth patch in this series may not be necessary if we prevent setting the
>>> bit in the nodemask or building the zonelists until the slab nodelists are
>>> ready.
>>
>> That sounds good.
>
> Andi?

Well if Christoph wants to submit a better patch that is tested and solves
the problems he can do that.

if he doesn't then I think my patch kit which has been tested
is the best alternative currently.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 11:43                                 ` Andi Kleen
@ 2010-02-26 12:35                                   ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-02-26 12:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, David Rientjes, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 1:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Fri, Feb 26, 2010 at 12:45:02PM +0200, Pekka Enberg wrote:
>> Christoph Lameter kirjoitti:
>>>> kmalloc_node() in generic kernel code.  All that is done under
>>>> MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and
>>>> fourth patch in this series may not be necessary if we prevent setting the
>>>> bit in the nodemask or building the zonelists until the slab nodelists are
>>>> ready.
>>>
>>> That sounds good.
>>
>> Andi?
>
> Well if Christoph wants to submit a better patch that is tested and solves
> the problems he can do that.

Sure.

> if he doesn't then I think my patch kit which has been tested
> is the best alternative currently.

So do you expect me to merge your patches over his objections?

                         Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 12:35                                   ` Pekka Enberg
@ 2010-02-26 14:08                                     ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 14:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Christoph Lameter, David Rientjes, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 02:35:24PM +0200, Pekka Enberg wrote:
> On Fri, Feb 26, 2010 at 1:43 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > On Fri, Feb 26, 2010 at 12:45:02PM +0200, Pekka Enberg wrote:
> >> Christoph Lameter wrote:
> >>>> kmalloc_node() in generic kernel code.  All that is done under
> >>>> MEM_GOING_ONLINE and not MEM_ONLINE, which is why I suggest the first and
> >>>> fourth patch in this series may not be necessary if we prevent setting the
> >>>> bit in the nodemask or building the zonelists until the slab nodelists are
> >>>> ready.
> >>>
> >>> That sounds good.
> >>
> >> Andi?
> >
> > Well if Christoph wants to submit a better patch that is tested and solves
> > the problems he can do that.
> 
> Sure.
> 
> > if he doesn't then I think my patch kit which has been tested
> > is the best alternative currently.
> 
> So do you expect me to merge your patches over his objections?

Let's put it like this: I'm sure there are a myriad of different
ways in all the possible design spaces to change slab to
make memory hotadd work.

Unless someone gives me a strong reason (e.g. code as submitted
doesn't work or is really unclean) I'm not very motivated to try them
all (also given that slab.c is really legacy code that will
hopefully go away at some point).  

Also there are still other bugs to fix in memory hotadd and I'm focussing
my efforts on that.

I don't think the patches I submitted are particularly intrusive or 
unclean or broken.

As far as I can see Christoph's proposal was just another way
to do this, but it wasn't clear to me it was better enough
in any way to spend significant time on it.

So yes I would prefer if you merged them as submitted just
to fix the bugs. If someone else comes up with a better way
to do this and submits patches they could still change
to that later.

As for the timer race patch: I cannot make a strong
argument right now that it's needed, on the other hand
a bit of defensive programming also doesn't hurt. But 
if that one is not in I won't cry.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 11:41                           ` Andi Kleen
@ 2010-02-26 15:04                             ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-26 15:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Rientjes, Pekka Enberg, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, 26 Feb 2010, Andi Kleen wrote:

> Memory hotplug with node add never quite worked on x86 before,
> for various reasons not related to slab.

Ok but why did things break in such a big way?


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 15:04                             ` Christoph Lameter
@ 2010-02-26 15:05                               ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-26 15:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Rientjes, Pekka Enberg, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki


I mean why the core changes if this is an x86 issue?


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 15:04                             ` Christoph Lameter
@ 2010-02-26 15:57                               ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 15:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, David Rientjes, Pekka Enberg, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 09:04:56AM -0600, Christoph Lameter wrote:
> On Fri, 26 Feb 2010, Andi Kleen wrote:
> 
> > Memory hotplug with node add never quite worked on x86 before,
> > for various reasons not related to slab.
> 
> Ok but why did things break in such a big way?

1) numa memory hotadd never worked
2) the rest just bitrotted because nobody tested it.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 15:05                               ` Christoph Lameter
@ 2010-02-26 15:59                                 ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 15:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, David Rientjes, Pekka Enberg, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 09:05:48AM -0600, Christoph Lameter wrote:
> 
> I mean why the core changes if this is an x86 issue?

The slab bugs are in no way related to x86, other than x86 supporting
memory hotadd & numa.

I only wrote "on x86" because I wasn't sure about the status on the other
platforms.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 15:57                               ` Andi Kleen
@ 2010-02-26 17:24                                 ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-02-26 17:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Rientjes, Pekka Enberg, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, 26 Feb 2010, Andi Kleen wrote:

> > > Memory hotplug with node add never quite worked on x86 before,
> > > for various reasons not related to slab.
> >
> > Ok but why did things break in such a big way?
>
> 1) numa memory hotadd never worked

Well, Kame-san indicated that this worked if a cpu came online.

> 2) the rest just bitrotted because nobody tested it.

Yep. David: Can you revise the relevant portions of the patchset and
repost it?


^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 17:24                                 ` Christoph Lameter
@ 2010-02-26 17:31                                   ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-02-26 17:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, David Rientjes, Pekka Enberg, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, Feb 26, 2010 at 11:24:50AM -0600, Christoph Lameter wrote:
> On Fri, 26 Feb 2010, Andi Kleen wrote:
> 
> > > > Memory hotplug with node add never quite worked on x86 before,
> > > > for various reasons not related to slab.
> > >
> > > Ok but why did things break in such a big way?
> >
> > 1) numa memory hotadd never worked
> 
> Well Kamesan indicated that this worked if a cpu became online.

I mean in the general case. There were tons of problems all over.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 17:24                                 ` Christoph Lameter
@ 2010-02-27  0:01                                   ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-02-27  0:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Pekka Enberg, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Fri, 26 Feb 2010, Christoph Lameter wrote:

> > 1) numa memory hotadd never worked
> 
> Well Kamesan indicated that this worked if a cpu became online.
> 

That may be true, but it doesn't address hotpluggable 
ACPI_SRAT_MEM_HOT_PLUGGABLE entries for CONFIG_MEMORY_HOTPLUG_SPARSE where 
no cpus are being onlined or writing to /sys/devices/system/memory/probe 
for CONFIG_ARCH_MEMORY_PROBE.

> > 2) the rest just bitrotted because nobody tested it.
> 
> Yep. David: Can you revise the relevant portions of the patchset and
> repost it?
> 

Ok.

^ permalink raw reply	[flat|nested] 170+ messages in thread


* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-26 17:31                                   ` Andi Kleen
@ 2010-03-01  1:59                                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 170+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-01  1:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, David Rientjes, Pekka Enberg, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Fri, 26 Feb 2010 18:31:15 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> On Fri, Feb 26, 2010 at 11:24:50AM -0600, Christoph Lameter wrote:
> > On Fri, 26 Feb 2010, Andi Kleen wrote:
> > 
> > > > > Memory hotplug with node add never quite worked on x86 before,
> > > > > for various reasons not related to slab.
> > > >
> > > > Ok but why did things break in such a big way?
> > >
> > > 1) numa memory hotadd never worked
> > 
> > Well Kamesan indicated that this worked if a cpu became online.
> 
> I mean in the general case. There were tons of problems all over.
> 
Then, it's a cpu hotplug matter, not memory hotplug.
The cpu hotplug callback should prepare

	l3 = searchp->nodelists[node];
	BUG_ON(!l3);

before the node is onlined, rather than taking care of races.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 170+ messages in thread


* [patch] slab: add memory hotplug support
  2010-02-27  0:01                                   ` David Rientjes
@ 2010-03-01 10:24                                     ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-01 10:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, Nick Piggin, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Slab lacks any memory hotplug support for nodes that are hotplugged
without cpus being hotplugged.  This is possible at least on x86
CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
node.  It can also be done manually by writing the start address to
/sys/devices/system/memory/probe for kernels that have
CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
then onlining the new memory region.

When a node is hotadded, a nodelist for that node is allocated and 
initialized for each slab cache.  If this isn't completed due to a lack
of memory, the hotadd is aborted: we have a reasonable expectation that
kmalloc_node(nid) will work for all caches if nid is online and memory is
available.  

Since nodelists must be allocated and initialized prior to the new node's
memory actually being online, the struct kmem_list3 is allocated off-node
due to kmalloc_node()'s fallback.

When an entire node is offlined (or an online is aborted), these
nodelists are subsequently drained and freed.  If objects still exist
either on the partial or full lists for those nodes, the offline is
aborted.  This scenario will not occur for an aborted online, however,
since objects can never be allocated from those nodelists until the
online has completed.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/slab.c |  202 +++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 170 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -115,6 +115,7 @@
 #include	<linux/reciprocal_div.h>
 #include	<linux/debugobjects.h>
 #include	<linux/kmemcheck.h>
+#include	<linux/memory.h>
 
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
@@ -1105,6 +1106,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 }
 #endif
 
+/*
+ * Allocates and initializes nodelists for a node on each slab cache, used for
+ * either memory or cpu hotplug.  If memory is being hot-added, the kmem_list3
+ * will be allocated off-node since memory is not yet online for the new node.
+ * When hotplugging memory or a cpu, existing nodelists are not replaced if
+ * already in use.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int init_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	struct kmem_list3 *l3;
+	const int memsize = sizeof(struct kmem_list3);
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		/*
+		 * Set up the size64 kmemlist for cpu before we can
+		 * begin anything. Make sure some other cpu on this
+		 * node has not already allocated this
+		 */
+		if (!cachep->nodelists[node]) {
+			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
+			if (!l3)
+				return -ENOMEM;
+			kmem_list3_init(l3);
+			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
+			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+
+			/*
+			 * The l3s don't come and go as CPUs come and
+			 * go.  cache_chain_mutex is sufficient
+			 * protection here.
+			 */
+			cachep->nodelists[node] = l3;
+		}
+
+		spin_lock_irq(&cachep->nodelists[node]->list_lock);
+		cachep->nodelists[node]->free_limit =
+			(1 + nr_cpus_node(node)) *
+			cachep->batchcount + cachep->num;
+		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
+	}
+	return 0;
+}
+
 static void __cpuinit cpuup_canceled(long cpu)
 {
 	struct kmem_cache *cachep;
@@ -1175,7 +1222,7 @@ static int __cpuinit cpuup_prepare(long cpu)
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
 	int node = cpu_to_node(cpu);
-	const int memsize = sizeof(struct kmem_list3);
+	int err;
 
 	/*
 	 * We need to do this right in the beginning since
@@ -1183,35 +1230,9 @@ static int __cpuinit cpuup_prepare(long cpu)
 	 * kmalloc_node allows us to add the slab to the right
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
-
-	list_for_each_entry(cachep, &cache_chain, next) {
-		/*
-		 * Set up the size64 kmemlist for cpu before we can
-		 * begin anything. Make sure some other cpu on this
-		 * node has not already allocated this
-		 */
-		if (!cachep->nodelists[node]) {
-			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
-			if (!l3)
-				goto bad;
-			kmem_list3_init(l3);
-			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-			/*
-			 * The l3s don't come and go as CPUs come and
-			 * go.  cache_chain_mutex is sufficient
-			 * protection here.
-			 */
-			cachep->nodelists[node] = l3;
-		}
-
-		spin_lock_irq(&cachep->nodelists[node]->list_lock);
-		cachep->nodelists[node]->free_limit =
-			(1 + nr_cpus_node(node)) *
-			cachep->batchcount + cachep->num;
-		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
-	}
+	err = init_cache_nodelists_node(node);
+	if (err < 0)
+		goto bad;
 
 	/*
 	 * Now we can go ahead with allocating the shared arrays and
@@ -1334,11 +1355,120 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
 	&cpuup_callback, NULL, 0
 };
 
+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+/*
+ * Drains and frees nodelists for a node on each slab cache, used for memory
+ * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
+ * hot-remove so that the node is not removed.  When used because memory
+ * hot-add is canceled, the only result is the freed kmem_list3.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int __meminit free_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	int ret = 0;
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		struct array_cache *shared;
+		struct array_cache **alien;
+		struct kmem_list3 *l3;
+
+		l3 = cachep->nodelists[node];
+		if (!l3)
+			continue;
+
+		spin_lock_irq(&l3->list_lock);
+		shared = l3->shared;
+		if (shared) {
+			free_block(cachep, shared->entry, shared->avail, node);
+			l3->shared = NULL;
+		}
+		alien = l3->alien;
+		l3->alien = NULL;
+		spin_unlock_irq(&l3->list_lock);
+
+		if (alien) {
+			drain_alien_cache(cachep, alien);
+			free_alien_cache(alien);
+		}
+		kfree(shared);
+
+		drain_freelist(cachep, l3, l3->free_objects);
+		if (!list_empty(&l3->slabs_full) ||
+					!list_empty(&l3->slabs_partial)) {
+			/*
+			 * Continue to iterate through each slab cache to free
+			 * as many nodelists as possible even though the
+			 * offline will be canceled.
+			 */
+			ret = -EBUSY;
+			continue;
+		}
+		kfree(l3);
+		cachep->nodelists[node] = NULL;
+	}
+	return ret;
+}
+
+/*
+ * Onlines nid either as the result of memory hot-add or canceled hot-remove.
+ */
+static int __meminit slab_node_online(int nid)
+{
+	int ret;
+	mutex_lock(&cache_chain_mutex);
+	ret = init_cache_nodelists_node(nid);
+	mutex_unlock(&cache_chain_mutex);
+	return ret;
+}
+
+/*
+ * Offlines nid either as the result of memory hot-remove or canceled hot-add.
+ */
+static int __meminit slab_node_offline(int nid)
+{
+	int ret;
+	mutex_lock(&cache_chain_mutex);
+	ret = free_cache_nodelists_node(nid);
+	mutex_unlock(&cache_chain_mutex);
+	return ret;
+}
+
+static int __meminit slab_memory_callback(struct notifier_block *self,
+					unsigned long action, void *arg)
+{
+	struct memory_notify *mnb = arg;
+	int ret = 0;
+	int nid;
+
+	nid = mnb->status_change_nid;
+	if (nid < 0)
+		goto out;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+	case MEM_CANCEL_OFFLINE:
+		ret = slab_node_online(nid);
+		break;
+	case MEM_GOING_OFFLINE:
+	case MEM_CANCEL_ONLINE:
+		ret = slab_node_offline(nid);
+		break;
+	case MEM_ONLINE:
+	case MEM_OFFLINE:
+		break;
+	}
+out:
+	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
+}
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
 /*
  * swap the static kmem_list3 with kmalloced memory
  */
-static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
-			int nodeid)
+static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
+				int nodeid)
 {
 	struct kmem_list3 *ptr;
 
@@ -1583,6 +1713,14 @@ void __init kmem_cache_init_late(void)
 	 */
 	register_cpu_notifier(&cpucache_notifier);
 
+#ifdef CONFIG_NUMA
+	/*
+	 * Register a memory hotplug callback that initializes and frees
+	 * nodelists.
+	 */
+	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+#endif
+
 	/*
 	 * The reap timers are started later, with a module init call: That part
 	 * of the kernel is not yet operational.

^ permalink raw reply	[flat|nested] 170+ messages in thread

+}
+
 static void __cpuinit cpuup_canceled(long cpu)
 {
 	struct kmem_cache *cachep;
@@ -1175,7 +1222,7 @@ static int __cpuinit cpuup_prepare(long cpu)
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
 	int node = cpu_to_node(cpu);
-	const int memsize = sizeof(struct kmem_list3);
+	int err;
 
 	/*
 	 * We need to do this right in the beginning since
@@ -1183,35 +1230,9 @@ static int __cpuinit cpuup_prepare(long cpu)
 	 * kmalloc_node allows us to add the slab to the right
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
-
-	list_for_each_entry(cachep, &cache_chain, next) {
-		/*
-		 * Set up the size64 kmemlist for cpu before we can
-		 * begin anything. Make sure some other cpu on this
-		 * node has not already allocated this
-		 */
-		if (!cachep->nodelists[node]) {
-			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
-			if (!l3)
-				goto bad;
-			kmem_list3_init(l3);
-			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-			/*
-			 * The l3s don't come and go as CPUs come and
-			 * go.  cache_chain_mutex is sufficient
-			 * protection here.
-			 */
-			cachep->nodelists[node] = l3;
-		}
-
-		spin_lock_irq(&cachep->nodelists[node]->list_lock);
-		cachep->nodelists[node]->free_limit =
-			(1 + nr_cpus_node(node)) *
-			cachep->batchcount + cachep->num;
-		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
-	}
+	err = init_cache_nodelists_node(node);
+	if (err < 0)
+		goto bad;
 
 	/*
 	 * Now we can go ahead with allocating the shared arrays and
@@ -1334,11 +1355,120 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
 	&cpuup_callback, NULL, 0
 };
 
+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+/*
+ * Drains and frees nodelists for a node on each slab cache, used for memory
+ * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
+ * hot-remove so that the node is not removed.  When used because memory
+ * hot-add is canceled, the only result is the freed kmem_list3.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int __meminit free_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	int ret = 0;
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		struct array_cache *shared;
+		struct array_cache **alien;
+		struct kmem_list3 *l3;
+
+		l3 = cachep->nodelists[node];
+		if (!l3)
+			continue;
+
+		spin_lock_irq(&l3->list_lock);
+		shared = l3->shared;
+		if (shared) {
+			free_block(cachep, shared->entry, shared->avail, node);
+			l3->shared = NULL;
+		}
+		alien = l3->alien;
+		l3->alien = NULL;
+		spin_unlock_irq(&l3->list_lock);
+
+		if (alien) {
+			drain_alien_cache(cachep, alien);
+			free_alien_cache(alien);
+		}
+		kfree(shared);
+
+		drain_freelist(cachep, l3, l3->free_objects);
+		if (!list_empty(&l3->slabs_full) ||
+					!list_empty(&l3->slabs_partial)) {
+			/*
+			 * Continue to iterate through each slab cache to free
+			 * as many nodelists as possible even though the
+			 * offline will be canceled.
+			 */
+			ret = -EBUSY;
+			continue;
+		}
+		kfree(l3);
+		cachep->nodelists[node] = NULL;
+	}
+	return ret;
+}
+
+/*
+ * Onlines nid either as the result of memory hot-add or canceled hot-remove.
+ */
+static int __meminit slab_node_online(int nid)
+{
+	int ret;
+	mutex_lock(&cache_chain_mutex);
+	ret = init_cache_nodelists_node(nid);
+	mutex_unlock(&cache_chain_mutex);
+	return ret;
+}
+
+/*
+ * Offlines nid either as the result of memory hot-remove or canceled hot-add.
+ */
+static int __meminit slab_node_offline(int nid)
+{
+	int ret;
+	mutex_lock(&cache_chain_mutex);
+	ret = free_cache_nodelists_node(nid);
+	mutex_unlock(&cache_chain_mutex);
+	return ret;
+}
+
+static int __meminit slab_memory_callback(struct notifier_block *self,
+					unsigned long action, void *arg)
+{
+	struct memory_notify *mnb = arg;
+	int ret = 0;
+	int nid;
+
+	nid = mnb->status_change_nid;
+	if (nid < 0)
+		goto out;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+	case MEM_CANCEL_OFFLINE:
+		ret = slab_node_online(nid);
+		break;
+	case MEM_GOING_OFFLINE:
+	case MEM_CANCEL_ONLINE:
+		ret = slab_node_offline(nid);
+		break;
+	case MEM_ONLINE:
+	case MEM_OFFLINE:
+		break;
+	}
+out:
+	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
+}
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
 /*
  * swap the static kmem_list3 with kmalloced memory
  */
-static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
-			int nodeid)
+static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
+				int nodeid)
 {
 	struct kmem_list3 *ptr;
 
@@ -1583,6 +1713,14 @@ void __init kmem_cache_init_late(void)
 	 */
 	register_cpu_notifier(&cpucache_notifier);
 
+#ifdef CONFIG_NUMA
+	/*
+	 * Register a memory hotplug callback that initializes and frees
+	 * nodelists.
+	 */
+	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+#endif
+
 	/*
 	 * The reap timers are started later, with a module init call: That part
 	 * of the kernel is not yet operational.


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-03-01  1:59                                     ` KAMEZAWA Hiroyuki
@ 2010-03-01 10:27                                       ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-01 10:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andi Kleen, Christoph Lameter, Pekka Enberg, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Mon, 1 Mar 2010, KAMEZAWA Hiroyuki wrote:

> > > Well Kamesan indicated that this worked if a cpu became online.
> > 
> > I mean in the general case. There were tons of problems all over.
> > 
> Then, it's a cpu hotplug matter, not memory hotplug.
> cpu hotplug callback should prepare
> 
> 
> 	l3 = searchp->nodelists[node];
> 	BUG_ON(!l3);
> 
> before onlined. Rather than taking care of races.
> 

I can only speak for x86 and not the abundance of memory hotplug support 
that exists for powerpc, but cpu hotplug doesn't do _anything_ when a 
memory region that has a corresponding ACPI_SRAT_MEM_HOT_PLUGGABLE entry 
in the SRAT is hotadded and requires a new nodeid.  That can be triggered 
via the acpi layer with plug and play or explicitly from the command line 
via CONFIG_ARCH_MEMORY_PROBE.

Relying on cpu hotplug to set up nodelists in such a circumstance simply 
won't work.  You need memory hotplug support such as in my patch.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-01 10:24                                     ` David Rientjes
@ 2010-03-02  5:53                                       ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-03-02  5:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andi Kleen, Nick Piggin, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

David Rientjes wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
> 
> When a node is hotadded, a nodelist for that node is allocated and 
> initialized for each slab cache.  If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.  
> 
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
> 
> When an entire node is offlined (or an online is aborted), these
> nodelists are subsequently drained and freed.  If objects still exist
> either on the partial or full lists for those nodes, the offline is
> aborted.  This scenario will not occur for an aborted online, however,
> since objects can never be allocated from those nodelists until the
> online has completed.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Andi, does this fix the oops you were seeing?

			Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-01 10:24                                     ` David Rientjes
@ 2010-03-02 12:53                                       ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-03-02 12:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, Nick Piggin, Christoph Lameter,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Mon, Mar 01, 2010 at 02:24:43AM -0800, David Rientjes wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.

The patch looks far more complicated than my simple fix.

Is more complicated now better?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap
  2010-02-25 18:46                         ` Pekka Enberg
@ 2010-03-02 12:55                           ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-03-02 12:55 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, rientjes, KAMEZAWA Hiroyuki

> The first set of patches from Andi are almost one month old. If this issue 
> progresses as swiftly as it has to this day, I foresee a rocky road for any 

Yes it seems to be a bike shedding area for some reason (which color
should we paint it today?)

> of them getting merged to .34 through slab.git, that's all.

IMHO they are all bug fixes and there is no excuse for not merging them ASAP,
independent of any merge windows.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02 12:53                                       ` Andi Kleen
@ 2010-03-02 15:04                                         ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-03-02 15:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Rientjes, Nick Piggin, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Hi Andi,

On Tue, Mar 2, 2010 at 2:53 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Mon, Mar 01, 2010 at 02:24:43AM -0800, David Rientjes wrote:
>> Slab lacks any memory hotplug support for nodes that are hotplugged
>> without cpus being hotplugged.  This is possible at least on x86
>> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
>> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
>> node.  It can also be done manually by writing the start address to
>> /sys/devices/system/memory/probe for kernels that have
>> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
>> then onlining the new memory region.
>
> The patch looks far more complicated than my simple fix.

I wouldn't exactly call the fallback_alloc() games "simple".

> Is more complicated now better?

Heh, heh. You can't post the oops, you don't want to rework your
patches as per review comments, and now you complain about David's
patch without one bit of technical content. I'm sorry but I must
conclude that someone is playing a prank on me because there's no way
a seasoned kernel hacker such as yourself could possibly think that
this is the way to get patches merged.

But anyway, if you have real technical concerns over the patch, please
make them known; otherwise I'd much appreciate a Tested-by tag from
you for David's patch.

Thanks,

                        Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02  5:53                                       ` Pekka Enberg
@ 2010-03-02 20:20                                         ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-03-02 20:20 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki


Not sure how this would sync with slab use during node bootstrap and
shutdown. Kame-san?

Otherwise

Acked-by: Christoph Lameter <cl@linux-foundation.org>



^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02 20:20                                         ` Christoph Lameter
@ 2010-03-02 21:03                                           ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-02 21:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, Nick Piggin, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Tue, 2 Mar 2010, Christoph Lameter wrote:

> 
> Not sure how this would sync with slab use during node bootstrap and
> shutdown. Kame-san?
> 

All the nodelist allocation and initialization is done during 
MEM_GOING_ONLINE, so there should be no use of them until that 
notification cycle is done and it has graduated to MEM_ONLINE: if there 
are, there are even bigger problems because zonelists haven't even been 
built for that pgdat yet.  I can only speculate, but since Andi's 
patchset did all this during MEM_ONLINE, where the bit is already set in 
node_states[N_HIGH_MEMORY] and is passable to kmalloc_node(), this is 
probably why additional hacks had to be added elsewhere.

Other than that, concurrent kmem_cache_create() is protected by 
cache_chain_mutex.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02 12:53                                       ` Andi Kleen
@ 2010-03-02 21:17                                         ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-02 21:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pekka Enberg, Nick Piggin, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Tue, 2 Mar 2010, Andi Kleen wrote:

> The patch looks far more complicated than my simple fix.
> 
> Is more complicated now better?
> 

If you still believe these are "fixes," then perhaps you don't fully 
understand the issue: slab completely lacked memory hotplug support when a 
node is onlined or offlined without a corresponding cpu being hotadded or 
hotremoved.  It's as simple as that.

To be fair, my patch may appear more complex because it implements full 
memory hotplug support so that the nodelists are properly drained and 
freed when the same memory regions you onlined for memory hot-add are now 
offlined.  Notice, also, how it touches no other slab code as implementing 
new support for something shouldn't.  There is no need for additional 
hacks to be added in other slab code if you properly allocate and 
initialize the nodelists for the memory being added before it is available 
for use by the kernel.

If you'd test my patch out on your setup, that would be very helpful.  I 
can address any additional issues that you may uncover if you post the 
oops while doing either memory online or offline.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02 20:20                                         ` Christoph Lameter
@ 2010-03-03  1:28                                           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 170+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-03  1:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Tue, 2 Mar 2010 14:20:06 -0600 (CST)
Christoph Lameter <cl@linux-foundation.org> wrote:

> 
> Not sure how this would sync with slab use during node bootstrap and
> shutdown. Kame-san?
> 
> Otherwise
> 
> Acked-by: Christoph Lameter <cl@linux-foundation.org>
> 

What does this patch fix? Maybe I'm missing something...

At node hot-add

 * pgdat is allocated from another node (because we have no memory for "nid")
 * memmap for the first section (and possibly others) will be allocated from
   other nodes.
 * Once a section for the node is onlined, any memory can be allocated locally.

   (Allocating memory from the local node would require some new
    implementation, like the bootmem allocator; we didn't do that.)

 Before this patch, slab's control layer was allocated by the cpu hotplug
 callback.  So, as long as this order is kept,
    memory online -> cpu online
 slab's control layer is allocated from the local node.

 When node hot-add is done in this order
    cpu online -> memory online
 kmalloc_node() will allocate memory from another node via fallback.

 After this patch, slab's control layer is allocated by the memory hotplug
 callback.  Then, in either order, slab's control layer will be allocated via
 the fallback routine.

If this patch is an alternative fix for this logic of Andi's
==
Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -4093,6 +4093,9 @@ static void cache_reap(struct work_struc
 		 * we can do some work if the lock was obtained.
 		 */
 		l3 = searchp->nodelists[node];
+		/* Node not yet set up */
+		if (!l3)
+			break;
==
I'm not sure this really happens.

cache_reap() is for checking the local node. The caller is set up by
CPU_ONLINE. searchp->nodelists[] is filled by CPU_PREPARE.

Then, the cpu for the node should be onlined (and it's done under a proper mutex).

I'm sorry if I'm missing something important, but how can anyone cause this?

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-03  1:28                                           ` KAMEZAWA Hiroyuki
@ 2010-03-03  2:39                                             ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-03  2:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Christoph Lameter, Pekka Enberg, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:

> At node hot-add
> 
>  * pgdat is allocated from another node (because we have no memory for "nid")
>  * memmap for the first section (and possibly others) will be allocated from
>    other nodes.
>  * Once a section for the node is onlined, any memory can be allocated locally.
> 

Correct, and the struct kmem_list3 is also allocated from other nodes with 
my patch.

>    (Allocating memory from the local node would require some new implementation,
>     like the bootmem allocator; we didn't do that.)
> 
>  Before this patch, slab's control layer is allocated by cpuhotplug.
>  So, at least keeping this order,
>     memory online -> cpu online
>  slab's control layer is allocated from local node.
> 
>  When node-hotadd is done in this order
>     cpu online -> memory online
>  kmalloc_node() will allocate memory from other node via fallback.
> 
>  After this patch, slab's control layer is allocated by memory hotplug.
>  Then, in any order, slab's control will be allocated via fallback routine.
> 

Again, this addresses memory hotplug that requires a new node to be 
onlined without corresponding cpus being onlined.  On 
x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined 
either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.  
On other architectures such as powerpc, this is done in different ways.

All of this is spelled out in the changelog for the patch.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-03  2:39                                             ` David Rientjes
@ 2010-03-03  2:51                                               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 170+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-03-03  2:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Pekka Enberg, Andi Kleen, Nick Piggin,
	linux-kernel, linux-mm, haicheng.li

On Tue, 2 Mar 2010 18:39:20 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:
> 
> > At node hot-add
> > 
> >  * pgdat is allocated from another node (because we have no memory for "nid")
> >  * memmap for the first section (and possibly others) will be allocated from
> >    other nodes.
> >  * Once a section for the node is onlined, any memory can be allocated locally.
> > 
> 
> Correct, and the struct kmem_list3 is also allocated from other nodes with 
> my patch.
> 
> >    (Allocating memory from the local node would require some new implementation,
> >     like the bootmem allocator; we didn't do that.)
> > 
> >  Before this patch, slab's control layer is allocated by cpuhotplug.
> >  So, at least keeping this order,
> >     memory online -> cpu online
> >  slab's control layer is allocated from local node.
> > 
> >  When node-hotadd is done in this order
> >     cpu online -> memory online
> >  kmalloc_node() will allocate memory from other node via fallback.
> > 
> >  After this patch, slab's control layer is allocated by memory hotplug.
> >  Then, in any order, slab's control will be allocated via fallback routine.
> > 
> 
> Again, this addresses memory hotplug that requires a new node to be 
> onlined without corresponding cpus being onlined.  On 
> x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined 
> either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.  
> On other architectures such as powerpc, this is done in different ways.
> 
> All of this is spelled out in the changelog for the patch.
> 
Ah, OK: for a cpu-less node and kmalloc_node() against that node.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-02 15:04                                         ` Pekka Enberg
@ 2010-03-03 14:34                                           ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-03-03 14:34 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, David Rientjes, Nick Piggin, Christoph Lameter,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

> > The patch looks far more complicated than my simple fix.
> 
> I wouldn't exactly call the fallback_alloc() games "simple".

I have to disagree on that.  It was the simplest fix I could
come up with, and the least intrusive to legacy code like slab.

> > Is more complicated now better?
> 
> Heh, heh. You can't post the oops, you don't want to rework your

The missing oops was about the timer race, not about this one.

> patches as per review comments, and now you complain about David's
> patch without one bit of technical content. I'm sorry but I must

Well, sorry, I'm just a bit frustrated about the glacial progress on what
should be relatively straightforward fixes.

IMHO something like my patch should have gone into .33 and any more
complicated reworks like this into .34.

> But anyway, if you have real technical concerns over the patch, please
> make them known; otherwise I'd much appreciate a Tested-by tag from
> you for David's patch.

If it works it would be ok for me. The main concern would be to actually
get it fixed.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-03 14:34                                           ` Andi Kleen
@ 2010-03-03 15:46                                             ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-03-03 15:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pekka Enberg, David Rientjes, Nick Piggin, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Wed, 3 Mar 2010, Andi Kleen wrote:

> > But anyway, if you have real technical concerns over the patch, please
> > make them known; otherwise I'd much appreciate a Tested-by tag from
> > you for David's patch.
>
> If it works it would be ok for me. The main concern would be to actually
> get it fixed.

You do not have a testcase? This is a result of code review?


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-01 10:24                                     ` David Rientjes
@ 2010-03-05  6:20                                       ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-03-05  6:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Mon, Mar 01, 2010 at 02:24:43AM -0800, David Rientjes wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
> 
> When a node is hotadded, a nodelist for that node is allocated and 
> initialized for each slab cache.  If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.  
> 
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
> 
> When an entire node is offlined (or an online is aborted), these
> nodelists are subsequently drained and freed.  If objects still exist
> either on the partial or full lists for those nodes, the offline is
> aborted.  This scenario will not occur for an aborted online, however,
> since objects can never be allocated from those nodelists until the
> online has completed.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

This looks OK to me in general. Couple of questions though:

> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> +/*
> + * Drains and frees nodelists for a node on each slab cache, used for memory
> + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> + * hot-remove so that the node is not removed.  When used because memory
> + * hot-add is canceled, the only result is the freed kmem_list3.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int __meminit free_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	int ret = 0;
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		struct array_cache *shared;
> +		struct array_cache **alien;
> +		struct kmem_list3 *l3;
> +
> +		l3 = cachep->nodelists[node];
> +		if (!l3)
> +			continue;
> +
> +		spin_lock_irq(&l3->list_lock);
> +		shared = l3->shared;
> +		if (shared) {
> +			free_block(cachep, shared->entry, shared->avail, node);
> +			l3->shared = NULL;
> +		}
> +		alien = l3->alien;
> +		l3->alien = NULL;
> +		spin_unlock_irq(&l3->list_lock);
> +
> +		if (alien) {
> +			drain_alien_cache(cachep, alien);
> +			free_alien_cache(alien);
> +		}
> +		kfree(shared);
> +
> +		drain_freelist(cachep, l3, l3->free_objects);
> +		if (!list_empty(&l3->slabs_full) ||
> +					!list_empty(&l3->slabs_partial)) {
> +			/*
> +			 * Continue to iterate through each slab cache to free
> +			 * as many nodelists as possible even though the
> +			 * offline will be canceled.
> +			 */
> +			ret = -EBUSY;
> +			continue;
> +		}
> +		kfree(l3);
> +		cachep->nodelists[node] = NULL;

What's stopping races of other CPUs trying to access l3 and array
caches while they're being freed?

> +	}
> +	return ret;
> +}
> +
> +/*
> + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> + */
> +static int __meminit slab_node_online(int nid)
> +{
> +	int ret;
> +	mutex_lock(&cache_chain_mutex);
> +	ret = init_cache_nodelists_node(nid);
> +	mutex_unlock(&cache_chain_mutex);
> +	return ret;
> +}
> +
> +/*
> + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> + */
> +static int __meminit slab_node_offline(int nid)
> +{
> +	int ret;
> +	mutex_lock(&cache_chain_mutex);
> +	ret = free_cache_nodelists_node(nid);
> +	mutex_unlock(&cache_chain_mutex);
> +	return ret;
> +}
> +
> +static int __meminit slab_memory_callback(struct notifier_block *self,
> +					unsigned long action, void *arg)
> +{
> +	struct memory_notify *mnb = arg;
> +	int ret = 0;
> +	int nid;
> +
> +	nid = mnb->status_change_nid;
> +	if (nid < 0)
> +		goto out;
> +
> +	switch (action) {
> +	case MEM_GOING_ONLINE:
> +	case MEM_CANCEL_OFFLINE:
> +		ret = slab_node_online(nid);
> +		break;

This would explode if CANCEL_OFFLINE fails. Call it theoretical and
put a panic() in here and I don't mind. Otherwise you get corruption
somewhere in the slab code.


> +	case MEM_GOING_OFFLINE:
> +	case MEM_CANCEL_ONLINE:
> +		ret = slab_node_offline(nid);
> +		break;
> +	case MEM_ONLINE:
> +	case MEM_OFFLINE:
> +		break;
> +	}
> +out:
> +	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
> +}
> +#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>   * swap the static kmem_list3 with kmalloced memory
>   */
> -static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> -			int nodeid)
> +static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> +				int nodeid)
>  {
>  	struct kmem_list3 *ptr;
>  
> @@ -1583,6 +1713,14 @@ void __init kmem_cache_init_late(void)
>  	 */
>  	register_cpu_notifier(&cpucache_notifier);
>  
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Register a memory hotplug callback that initializes and frees
> +	 * nodelists.
> +	 */
> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +#endif
> +
>  	/*
>  	 * The reap timers are started later, with a module init call: That part
>  	 * of the kernel is not yet operational.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05  6:20                                       ` Nick Piggin
@ 2010-03-05 12:47                                         ` Anca Emanuel
  -1 siblings, 0 replies; 170+ messages in thread
From: Anca Emanuel @ 2010-03-05 12:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Rientjes, Pekka Enberg, Andi Kleen, Christoph Lameter,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Dumb question: is it possible to hot-remove the (bad) memory and add
a good one?
Where is the detection code for the bad module?

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05 12:47                                         ` Anca Emanuel
@ 2010-03-05 13:58                                           ` Anca Emanuel
  -1 siblings, 0 replies; 170+ messages in thread
From: Anca Emanuel @ 2010-03-05 13:58 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Rientjes, Pekka Enberg, Andi Kleen, Christoph Lameter,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

You can contact Samuel Demeulemeester for help: memtest@memtest.org

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05 12:47                                         ` Anca Emanuel
@ 2010-03-05 14:11                                           ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-03-05 14:11 UTC (permalink / raw)
  To: Anca Emanuel
  Cc: Nick Piggin, David Rientjes, Pekka Enberg, Andi Kleen,
	linux-kernel, linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, 5 Mar 2010, Anca Emanuel wrote:

> Dumb question: it is possible to hot remove the (bad) memory ? And add
> an good one ?

Under certain conditions this is possible. If the bad memory was modified
then you have a condition that requires termination of all processes that
are using the memory. If it's the kernel then you need to reboot.

If the memory contains a page from disk then the memory can be moved
elsewhere.

If you can clean up a whole range like that then it's possible to replace
the memory.
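For reference, cleaning up and replacing a whole range this way maps onto
the sysfs memory-hotplug interface. A minimal sketch — the block number
is invented, sizes are platform-specific, and the paths assume
CONFIG_MEMORY_HOTPLUG and root:

```shell
# Offline one memory block so it could be physically replaced, then
# bring it back online. "memory32" is a made-up block number; list
# /sys/devices/system/memory/ to find the real ones on a given box.
cat /sys/devices/system/memory/block_size_bytes     # bytes per block
echo offline > /sys/devices/system/memory/memory32/state
cat /sys/devices/system/memory/memory32/state
echo online  > /sys/devices/system/memory/memory32/state
```

The offline only succeeds if every page in the block is free or movable;
a kernel or otherwise unmovable allocation anywhere in the block makes it
fail with -EBUSY.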




^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05 12:47                                         ` Anca Emanuel
@ 2010-03-08  2:58                                           ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-03-08  2:58 UTC (permalink / raw)
  To: Anca Emanuel
  Cc: Nick Piggin, David Rientjes, Pekka Enberg, Andi Kleen,
	Christoph Lameter, linux-kernel, linux-mm, haicheng.li,
	KAMEZAWA Hiroyuki

On Fri, Mar 05, 2010 at 02:47:04PM +0200, Anca Emanuel wrote:
> Dumb question: it is possible to hot remove the (bad) memory ? And add
> an good one ?

Not the complete DIMM, but a specific page containing a stuck
bit or similar can be removed since 2.6.33, yes.

In theory you could add new memory replacing that memory if your
hardware and your kernel supports that, but typically that's
not worth it for a few K.
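The single-page removal available since 2.6.33 is reachable from
userspace. A hedged sketch — the physical address is invented, and the
sysfs attribute requires CONFIG_MEMORY_FAILURE and root:

```shell
# Soft-offline one page by physical address: the kernel migrates its
# contents if possible and marks the page as no longer usable.
# 0x2f54d000 is only an example address, e.g. one reported by mcelog.
echo 0x2f54d000 > /sys/devices/system/memory/soft_offline_page
grep HardwareCorrupted /proc/meminfo    # offlined pages accumulate here
```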

> Where is the detection code for the bad module ?

Part of the code is in the kernel, part in mcelog.
It only works with ECC memory and supported systems ATM (currently
Nehalem-class Intel Xeon systems).

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05 14:11                                           ` Christoph Lameter
@ 2010-03-08  3:06                                             ` Andi Kleen
  -1 siblings, 0 replies; 170+ messages in thread
From: Andi Kleen @ 2010-03-08  3:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Anca Emanuel, Nick Piggin, David Rientjes, Pekka Enberg,
	Andi Kleen, linux-kernel, linux-mm, haicheng.li,
	KAMEZAWA Hiroyuki

> Under certain conditions this is possible. If the bad memory was modified
> then you have a condition that requires termination of all processes that
> are using the memory. If its the kernel then you need to reboot.
> 
> If the memory contains a page from disk then the memory can be moved
> elsewhere.
> 
> If you can clean up a whole range like that then its possible to replace
> the memory.

Typically that's not possible because of the way DIMMs are interleaved --
the to-be-freed areas would be very large, and with a specific size
there are always kernel or unmovable user areas in the way.

In general, on Linux hot DIMM replacement only works if the underlying
platform does it transparently (e.g. supports memory RAID and chipkill)
and you have enough redundant memory for it.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-05  6:20                                       ` Nick Piggin
@ 2010-03-08 23:19                                         ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-08 23:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Fri, 5 Mar 2010, Nick Piggin wrote:

> > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > +/*
> > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> > + * hot-remove so that the node is not removed.  When used because memory
> > + * hot-add is canceled, the only result is the freed kmem_list3.
> > + *
> > + * Must hold cache_chain_mutex.
> > + */
> > +static int __meminit free_cache_nodelists_node(int node)
> > +{
> > +	struct kmem_cache *cachep;
> > +	int ret = 0;
> > +
> > +	list_for_each_entry(cachep, &cache_chain, next) {
> > +		struct array_cache *shared;
> > +		struct array_cache **alien;
> > +		struct kmem_list3 *l3;
> > +
> > +		l3 = cachep->nodelists[node];
> > +		if (!l3)
> > +			continue;
> > +
> > +		spin_lock_irq(&l3->list_lock);
> > +		shared = l3->shared;
> > +		if (shared) {
> > +			free_block(cachep, shared->entry, shared->avail, node);
> > +			l3->shared = NULL;
> > +		}
> > +		alien = l3->alien;
> > +		l3->alien = NULL;
> > +		spin_unlock_irq(&l3->list_lock);
> > +
> > +		if (alien) {
> > +			drain_alien_cache(cachep, alien);
> > +			free_alien_cache(alien);
> > +		}
> > +		kfree(shared);
> > +
> > +		drain_freelist(cachep, l3, l3->free_objects);
> > +		if (!list_empty(&l3->slabs_full) ||
> > +					!list_empty(&l3->slabs_partial)) {
> > +			/*
> > +			 * Continue to iterate through each slab cache to free
> > +			 * as many nodelists as possible even though the
> > +			 * offline will be canceled.
> > +			 */
> > +			ret = -EBUSY;
> > +			continue;
> > +		}
> > +		kfree(l3);
> > +		cachep->nodelists[node] = NULL;
> 
> What's stopping races of other CPUs trying to access l3 and array
> caches while they're being freed?
> 

numa_node_id() will not return an offlined nodeid and cache_alloc_node() 
already does a fallback to other onlined nodes in case a nodeid is passed 
to kmalloc_node() that does not have a nodelist.  l3->shared and l3->alien 
cannot be accessed without l3->list_lock (drain, cache_alloc_refill, 
cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).

> > +	}
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> > + */
> > +static int __meminit slab_node_online(int nid)
> > +{
> > +	int ret;
> > +	mutex_lock(&cache_chain_mutex);
> > +	ret = init_cache_nodelists_node(nid);
> > +	mutex_unlock(&cache_chain_mutex);
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> > + */
> > +static int __meminit slab_node_offline(int nid)
> > +{
> > +	int ret;
> > +	mutex_lock(&cache_chain_mutex);
> > +	ret = free_cache_nodelists_node(nid);
> > +	mutex_unlock(&cache_chain_mutex);
> > +	return ret;
> > +}
> > +
> > +static int __meminit slab_memory_callback(struct notifier_block *self,
> > +					unsigned long action, void *arg)
> > +{
> > +	struct memory_notify *mnb = arg;
> > +	int ret = 0;
> > +	int nid;
> > +
> > +	nid = mnb->status_change_nid;
> > +	if (nid < 0)
> > +		goto out;
> > +
> > +	switch (action) {
> > +	case MEM_GOING_ONLINE:
> > +	case MEM_CANCEL_OFFLINE:
> > +		ret = slab_node_online(nid);
> > +		break;
> 
> This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> put a panic() in here and I don't mind. Otherwise you get corruption
> somewhere in the slab code.
> 

MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3 couldn't be 
allocated anywhere on the system and if that happens then the node simply 
couldn't be allocated from (numa_node_id() would never return it as the 
cpu's node, so it's possible to fallback in this scenario).

Instead of doing this all at MEM_GOING_OFFLINE, we could delay freeing of 
the array caches and the nodelist until MEM_OFFLINE.  We're guaranteed 
that all pages are freed at that point so there are no existing objects 
that we need to track and then if the offline fails from a different 
callback it would be possible to reset the l3->nodelists[node] pointers 
since they haven't been freed yet.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-08 23:19                                         ` David Rientjes
@ 2010-03-09 13:46                                           ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-03-09 13:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> On Fri, 5 Mar 2010, Nick Piggin wrote:
> 
> > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > +/*
> > > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > > + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> > > + * hot-remove so that the node is not removed.  When used because memory
> > > + * hot-add is canceled, the only result is the freed kmem_list3.
> > > + *
> > > + * Must hold cache_chain_mutex.
> > > + */
> > > +static int __meminit free_cache_nodelists_node(int node)
> > > +{
> > > +	struct kmem_cache *cachep;
> > > +	int ret = 0;
> > > +
> > > +	list_for_each_entry(cachep, &cache_chain, next) {
> > > +		struct array_cache *shared;
> > > +		struct array_cache **alien;
> > > +		struct kmem_list3 *l3;
> > > +
> > > +		l3 = cachep->nodelists[node];
> > > +		if (!l3)
> > > +			continue;
> > > +
> > > +		spin_lock_irq(&l3->list_lock);
> > > +		shared = l3->shared;
> > > +		if (shared) {
> > > +			free_block(cachep, shared->entry, shared->avail, node);
> > > +			l3->shared = NULL;
> > > +		}
> > > +		alien = l3->alien;
> > > +		l3->alien = NULL;
> > > +		spin_unlock_irq(&l3->list_lock);
> > > +
> > > +		if (alien) {
> > > +			drain_alien_cache(cachep, alien);
> > > +			free_alien_cache(alien);
> > > +		}
> > > +		kfree(shared);
> > > +
> > > +		drain_freelist(cachep, l3, l3->free_objects);
> > > +		if (!list_empty(&l3->slabs_full) ||
> > > +					!list_empty(&l3->slabs_partial)) {
> > > +			/*
> > > +			 * Continue to iterate through each slab cache to free
> > > +			 * as many nodelists as possible even though the
> > > +			 * offline will be canceled.
> > > +			 */
> > > +			ret = -EBUSY;
> > > +			continue;
> > > +		}
> > > +		kfree(l3);
> > > +		cachep->nodelists[node] = NULL;
> > 
> > What's stopping races of other CPUs trying to access l3 and array
> > caches while they're being freed?
> > 
> 
> numa_node_id() will not return an offlined nodeid and cache_alloc_node() 
> already does a fallback to other onlined nodes in case a nodeid is passed 
> to kmalloc_node() that does not have a nodelist.  l3->shared and l3->alien 
> cannot be accessed without l3->list_lock (drain, cache_alloc_refill, 
> cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).

Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
while it is being accessed by another CPU and concurrently being freed
on this one? 


> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> > > + */
> > > +static int __meminit slab_node_online(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = init_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> > > + */
> > > +static int __meminit slab_node_offline(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = free_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static int __meminit slab_memory_callback(struct notifier_block *self,
> > > +					unsigned long action, void *arg)
> > > +{
> > > +	struct memory_notify *mnb = arg;
> > > +	int ret = 0;
> > > +	int nid;
> > > +
> > > +	nid = mnb->status_change_nid;
> > > +	if (nid < 0)
> > > +		goto out;
> > > +
> > > +	switch (action) {
> > > +	case MEM_GOING_ONLINE:
> > > +	case MEM_CANCEL_OFFLINE:
> > > +		ret = slab_node_online(nid);
> > > +		break;
> > 
> > This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> > put a panic() in here and I don't mind. Otherwise you get corruption
> > somewhere in the slab code.
> > 
> 
> MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3 couldn't be 
> allocated anywhere on the system and if that happens then the node simply 
> couldn't be allocated from (numa_node_id() would never return it as the 
> cpu's node, so it's possible to fallback in this scenario).

Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
the problem.


> Instead of doing this all at MEM_GOING_OFFLINE, we could delay freeing of 
> the array caches and the nodelist until MEM_OFFLINE.  We're guaranteed 
> that all pages are freed at that point so there are no existing objects 
> that we need to track and then if the offline fails from a different 
> callback it would be possible to reset the l3->nodelists[node] pointers 
> since they haven't been freed yet.


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-09 13:46                                           ` Nick Piggin
@ 2010-03-22 17:28                                             ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-03-22 17:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Rientjes, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Nick Piggin wrote:
> On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
>> On Fri, 5 Mar 2010, Nick Piggin wrote:
>>
>>>> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>>>> +/*
>>>> + * Drains and frees nodelists for a node on each slab cache, used for memory
>>>> + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
>>>> + * hot-remove so that the node is not removed.  When used because memory
>>>> + * hot-add is canceled, the only result is the freed kmem_list3.
>>>> + *
>>>> + * Must hold cache_chain_mutex.
>>>> + */
>>>> +static int __meminit free_cache_nodelists_node(int node)
>>>> +{
>>>> +	struct kmem_cache *cachep;
>>>> +	int ret = 0;
>>>> +
>>>> +	list_for_each_entry(cachep, &cache_chain, next) {
>>>> +		struct array_cache *shared;
>>>> +		struct array_cache **alien;
>>>> +		struct kmem_list3 *l3;
>>>> +
>>>> +		l3 = cachep->nodelists[node];
>>>> +		if (!l3)
>>>> +			continue;
>>>> +
>>>> +		spin_lock_irq(&l3->list_lock);
>>>> +		shared = l3->shared;
>>>> +		if (shared) {
>>>> +			free_block(cachep, shared->entry, shared->avail, node);
>>>> +			l3->shared = NULL;
>>>> +		}
>>>> +		alien = l3->alien;
>>>> +		l3->alien = NULL;
>>>> +		spin_unlock_irq(&l3->list_lock);
>>>> +
>>>> +		if (alien) {
>>>> +			drain_alien_cache(cachep, alien);
>>>> +			free_alien_cache(alien);
>>>> +		}
>>>> +		kfree(shared);
>>>> +
>>>> +		drain_freelist(cachep, l3, l3->free_objects);
>>>> +		if (!list_empty(&l3->slabs_full) ||
>>>> +					!list_empty(&l3->slabs_partial)) {
>>>> +			/*
>>>> +			 * Continue to iterate through each slab cache to free
>>>> +			 * as many nodelists as possible even though the
>>>> +			 * offline will be canceled.
>>>> +			 */
>>>> +			ret = -EBUSY;
>>>> +			continue;
>>>> +		}
>>>> +		kfree(l3);
>>>> +		cachep->nodelists[node] = NULL;
>>> What's stopping races of other CPUs trying to access l3 and array
>>> caches while they're being freed?
>>>
>> numa_node_id() will not return an offlined nodeid and cache_alloc_node() 
>> already does a fallback to other onlined nodes in case a nodeid is passed 
>> to kmalloc_node() that does not have a nodelist.  l3->shared and l3->alien 
>> cannot be accessed without l3->list_lock (drain, cache_alloc_refill, 
>> cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).
> 
> Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
> while it is being accessed by another CPU and concurrently being freed
> on this one? 
> 
> 
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
>>>> + */
>>>> +static int __meminit slab_node_online(int nid)
>>>> +{
>>>> +	int ret;
>>>> +	mutex_lock(&cache_chain_mutex);
>>>> +	ret = init_cache_nodelists_node(nid);
>>>> +	mutex_unlock(&cache_chain_mutex);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
>>>> + */
>>>> +static int __meminit slab_node_offline(int nid)
>>>> +{
>>>> +	int ret;
>>>> +	mutex_lock(&cache_chain_mutex);
>>>> +	ret = free_cache_nodelists_node(nid);
>>>> +	mutex_unlock(&cache_chain_mutex);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int __meminit slab_memory_callback(struct notifier_block *self,
>>>> +					unsigned long action, void *arg)
>>>> +{
>>>> +	struct memory_notify *mnb = arg;
>>>> +	int ret = 0;
>>>> +	int nid;
>>>> +
>>>> +	nid = mnb->status_change_nid;
>>>> +	if (nid < 0)
>>>> +		goto out;
>>>> +
>>>> +	switch (action) {
>>>> +	case MEM_GOING_ONLINE:
>>>> +	case MEM_CANCEL_OFFLINE:
>>>> +		ret = slab_node_online(nid);
>>>> +		break;
>>> This would explode if CANCEL_OFFLINE fails. Call it theoretical and
>>> put a panic() in here and I don't mind. Otherwise you get corruption
>>> somewhere in the slab code.
>>>
>> MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3 couldn't be 
>> allocated anywhere on the system and if that happens then the node simply 
>> couldn't be allocated from (numa_node_id() would never return it as the 
>> cpu's node, so it's possible to fallback in this scenario).
> 
> Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
> the problem.

So I was thinking of pushing this towards Linus but I didn't see anyone 
respond to Nick's concerns. I'm not that familiar with all this hotplug 
stuff so can someone also make Nick happy so we can move forward?

			Pekka

^ permalink raw reply	[flat|nested] 170+ messages in thread
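The use-after-free window Nick raises above can be made concrete with a small userspace model: a reader that has already loaded `cachep->nodelists[node]` keeps a stale pointer while the hot-remove path frees the nodelist and NULLs the slot. This is only an illustrative sketch with a deterministic interleaving; all types and names below are hypothetical stand-ins for the slab internals, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct l3_model {
	bool freed;			/* stands in for the memory being kfree()d */
};

struct cache_model {
	struct l3_model *nodelists[1];
};

/* CPU A: check-then-use, as cache_alloc_node() effectively does. */
static struct l3_model *reader_load(struct cache_model *c, int node)
{
	return c->nodelists[node];	/* may be preempted right after this */
}

/* CPU B: the hot-remove path, modeled on free_cache_nodelists_node(). */
static void freer(struct cache_model *c, int node)
{
	struct l3_model *l3 = c->nodelists[node];

	l3->freed = true;		/* kfree(l3) */
	c->nodelists[node] = NULL;	/* too late for CPU A's copy */
}

/* One deterministic interleaving of the race window. */
static bool race_leaves_stale_pointer(void)
{
	struct l3_model l3 = { .freed = false };
	struct cache_model c = { .nodelists = { &l3 } };

	struct l3_model *stale = reader_load(&c, 0);	/* CPU A loads l3 */

	freer(&c, 0);			/* CPU B frees it concurrently */
	/* CPU A would now dereference freed memory: */
	return stale != NULL && stale->freed;
}
```

The model shows why NULLing `cachep->nodelists[node]` does not help a CPU that loaded the pointer before the free, which is exactly the concern that leads to the alternative approach later in the thread.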

* Re: [patch] slab: add memory hotplug support
  2010-03-22 17:28                                             ` Pekka Enberg
@ 2010-03-22 21:12                                               ` Nick Piggin
  -1 siblings, 0 replies; 170+ messages in thread
From: Nick Piggin @ 2010-03-22 21:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Mon, Mar 22, 2010 at 07:28:54PM +0200, Pekka Enberg wrote:
> Nick Piggin wrote:
> >On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> >>On Fri, 5 Mar 2010, Nick Piggin wrote:
> >>
> >>>>+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> >>>>+/*
> >>>>+ * Drains and frees nodelists for a node on each slab cache, used for memory
> >>>>+ * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> >>>>+ * hot-remove so that the node is not removed.  When used because memory
> >>>>+ * hot-add is canceled, the only result is the freed kmem_list3.
> >>>>+ *
> >>>>+ * Must hold cache_chain_mutex.
> >>>>+ */
> >>>>+static int __meminit free_cache_nodelists_node(int node)
> >>>>+{
> >>>>+	struct kmem_cache *cachep;
> >>>>+	int ret = 0;
> >>>>+
> >>>>+	list_for_each_entry(cachep, &cache_chain, next) {
> >>>>+		struct array_cache *shared;
> >>>>+		struct array_cache **alien;
> >>>>+		struct kmem_list3 *l3;
> >>>>+
> >>>>+		l3 = cachep->nodelists[node];
> >>>>+		if (!l3)
> >>>>+			continue;
> >>>>+
> >>>>+		spin_lock_irq(&l3->list_lock);
> >>>>+		shared = l3->shared;
> >>>>+		if (shared) {
> >>>>+			free_block(cachep, shared->entry, shared->avail, node);
> >>>>+			l3->shared = NULL;
> >>>>+		}
> >>>>+		alien = l3->alien;
> >>>>+		l3->alien = NULL;
> >>>>+		spin_unlock_irq(&l3->list_lock);
> >>>>+
> >>>>+		if (alien) {
> >>>>+			drain_alien_cache(cachep, alien);
> >>>>+			free_alien_cache(alien);
> >>>>+		}
> >>>>+		kfree(shared);
> >>>>+
> >>>>+		drain_freelist(cachep, l3, l3->free_objects);
> >>>>+		if (!list_empty(&l3->slabs_full) ||
> >>>>+					!list_empty(&l3->slabs_partial)) {
> >>>>+			/*
> >>>>+			 * Continue to iterate through each slab cache to free
> >>>>+			 * as many nodelists as possible even though the
> >>>>+			 * offline will be canceled.
> >>>>+			 */
> >>>>+			ret = -EBUSY;
> >>>>+			continue;
> >>>>+		}
> >>>>+		kfree(l3);
> >>>>+		cachep->nodelists[node] = NULL;
> >>>What's stopping races of other CPUs trying to access l3 and array
> >>>caches while they're being freed?
> >>>
> >>numa_node_id() will not return an offlined nodeid and
> >>cache_alloc_node() already does a fallback to other onlined
> >>nodes in case a nodeid is passed to kmalloc_node() that does not
> >>have a nodelist.  l3->shared and l3->alien cannot be accessed
> >>without l3->list_lock (drain, cache_alloc_refill,
> >>cache_flusharray) or cache_chain_mutex (kmem_cache_destroy,
> >>cache_reap).
> >
> >Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
> >while it is being accessed by another CPU and concurrently being freed
> >on this one?
> >
> >
> >>>>+	}
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+/*
> >>>>+ * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> >>>>+ */
> >>>>+static int __meminit slab_node_online(int nid)
> >>>>+{
> >>>>+	int ret;
> >>>>+	mutex_lock(&cache_chain_mutex);
> >>>>+	ret = init_cache_nodelists_node(nid);
> >>>>+	mutex_unlock(&cache_chain_mutex);
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+/*
> >>>>+ * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> >>>>+ */
> >>>>+static int __meminit slab_node_offline(int nid)
> >>>>+{
> >>>>+	int ret;
> >>>>+	mutex_lock(&cache_chain_mutex);
> >>>>+	ret = free_cache_nodelists_node(nid);
> >>>>+	mutex_unlock(&cache_chain_mutex);
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+static int __meminit slab_memory_callback(struct notifier_block *self,
> >>>>+					unsigned long action, void *arg)
> >>>>+{
> >>>>+	struct memory_notify *mnb = arg;
> >>>>+	int ret = 0;
> >>>>+	int nid;
> >>>>+
> >>>>+	nid = mnb->status_change_nid;
> >>>>+	if (nid < 0)
> >>>>+		goto out;
> >>>>+
> >>>>+	switch (action) {
> >>>>+	case MEM_GOING_ONLINE:
> >>>>+	case MEM_CANCEL_OFFLINE:
> >>>>+		ret = slab_node_online(nid);
> >>>>+		break;
> >>>This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> >>>put a panic() in here and I don't mind. Otherwise you get corruption
> >>>somewhere in the slab code.
> >>>
> >>MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3
> >>couldn't be allocated anywhere on the system and if that happens
> >>then the node simply couldn't be allocated from (numa_node_id()
> >>would never return it as the cpu's node, so it's possible to
> >>fallback in this scenario).
> >
> >Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
> >the problem.
> 
> So I was thinking of pushing this towards Linus but I didn't see
> anyone respond to Nick's concerns. I'm not that familiar with all
> this hotplug stuff so can someone make also Nick happy so we can
> move forward?

I don't mind about the memory failure cases (just add a panic
there that should never really happen anyway, just to document
that a part is still missing).

I am more worried about the races. Maybe I just missed how they
are protected against.


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch] slab: add memory hotplug support
  2010-03-09 13:46                                           ` Nick Piggin
@ 2010-03-28  2:13                                             ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-28  2:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Wed, 10 Mar 2010, Nick Piggin wrote:

> On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> > On Fri, 5 Mar 2010, Nick Piggin wrote:
> > 
> > > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > > +/*
> > > > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > > > + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> > > > + * hot-remove so that the node is not removed.  When used because memory
> > > > + * hot-add is canceled, the only result is the freed kmem_list3.
> > > > + *
> > > > + * Must hold cache_chain_mutex.
> > > > + */
> > > > +static int __meminit free_cache_nodelists_node(int node)
> > > > +{
> > > > +	struct kmem_cache *cachep;
> > > > +	int ret = 0;
> > > > +
> > > > +	list_for_each_entry(cachep, &cache_chain, next) {
> > > > +		struct array_cache *shared;
> > > > +		struct array_cache **alien;
> > > > +		struct kmem_list3 *l3;
> > > > +
> > > > +		l3 = cachep->nodelists[node];
> > > > +		if (!l3)
> > > > +			continue;
> > > > +
> > > > +		spin_lock_irq(&l3->list_lock);
> > > > +		shared = l3->shared;
> > > > +		if (shared) {
> > > > +			free_block(cachep, shared->entry, shared->avail, node);
> > > > +			l3->shared = NULL;
> > > > +		}
> > > > +		alien = l3->alien;
> > > > +		l3->alien = NULL;
> > > > +		spin_unlock_irq(&l3->list_lock);
> > > > +
> > > > +		if (alien) {
> > > > +			drain_alien_cache(cachep, alien);
> > > > +			free_alien_cache(alien);
> > > > +		}
> > > > +		kfree(shared);
> > > > +
> > > > +		drain_freelist(cachep, l3, l3->free_objects);
> > > > +		if (!list_empty(&l3->slabs_full) ||
> > > > +					!list_empty(&l3->slabs_partial)) {
> > > > +			/*
> > > > +			 * Continue to iterate through each slab cache to free
> > > > +			 * as many nodelists as possible even though the
> > > > +			 * offline will be canceled.
> > > > +			 */
> > > > +			ret = -EBUSY;
> > > > +			continue;
> > > > +		}
> > > > +		kfree(l3);
> > > > +		cachep->nodelists[node] = NULL;
> > > 
> > > What's stopping races of other CPUs trying to access l3 and array
> > > caches while they're being freed?
> > > 
> > 
> > numa_node_id() will not return an offlined nodeid and cache_alloc_node() 
> > already does a fallback to other onlined nodes in case a nodeid is passed 
> > to kmalloc_node() that does not have a nodelist.  l3->shared and l3->alien 
> > cannot be accessed without l3->list_lock (drain, cache_alloc_refill, 
> > cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).
> 
> Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
> while it is being accessed by another CPU and concurrently being freed
> on this one? 
> 

You're right, we can't free cachep->nodelists[node] for any node that is 
being hot-removed to avoid a race in cache_alloc_node().  I thought we had 
protection for this under cache_chain_mutex for most dereferences and 
could disregard cache_alloc_refill() because numa_node_id() would never 
return a node being removed under memory hotplug, that would be the 
responsibility of cpu hotplug instead (offline the cpu first, then ensure 
numa_node_id() can't return a node under hot-remove).

Thanks for pointing that out, it's definitely broken here.

As an alternative, I think we should do something like this on 
MEM_GOING_OFFLINE:

	int ret = 0;

	mutex_lock(&cache_chain_mutex);
	list_for_each_entry(cachep, &cache_chain, next) {
		struct kmem_list3 *l3;

		l3 = cachep->nodelists[node];
		if (!l3)
			continue;
		drain_freelist(cachep, l3, l3->free_objects);

		ret = !list_empty(&l3->slabs_full) ||
		      !list_empty(&l3->slabs_partial);
		if (ret)
			break;
	}
	mutex_unlock(&cache_chain_mutex);
	return ret ? NOTIFY_BAD : NOTIFY_OK;

to preempt hot-remove of a node where there are slabs on the partial or 
free list that can't be freed.
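The intended behavior of that check — drain each cache's free list, then refuse the offline as soon as any cache still holds full or partial slabs on the node — can be modeled in userspace. Everything below is a simplified stand-in (plain counters instead of `struct kmem_list3` and real list heads, a trivial `NOTIFY_BAD` value), not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NOTIFY_OK	0
#define NOTIFY_BAD	1	/* simplified; the kernel encodes this differently */

struct l3_model {
	int free_objects;	/* emptied by the modeled drain_freelist() */
	int full_slabs;		/* nonzero => list_empty(&slabs_full) is false */
	int partial_slabs;	/* nonzero => list_empty(&slabs_partial) is false */
};

/* Modeled MEM_GOING_OFFLINE handler over one node's per-cache nodelists. */
static int going_offline(struct l3_model **caches, size_t n)
{
	bool busy = false;

	for (size_t i = 0; i < n; i++) {
		struct l3_model *l3 = caches[i];

		if (!l3)		/* cache has no nodelist for this node */
			continue;
		l3->free_objects = 0;	/* drain_freelist(cachep, l3, ...) */
		if (l3->full_slabs || l3->partial_slabs) {
			busy = true;	/* live objects remain on the node */
			break;
		}
	}
	return busy ? NOTIFY_BAD : NOTIFY_OK;
}
```

A node whose caches drain completely is allowed to proceed; one cache with a remaining partial or full slab is enough to veto the offline early.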

Then, for MEM_OFFLINE, we leave cachep->nodelists[node] to be valid in 
case there are cache_alloc_node() racers or the node ever comes back 
online; subsequent callers to kmalloc_node() for the offlined node would 
actually return objects from fallback_alloc() since kmem_getpages() would 
fail for a node without present pages.

If slab is allocated after the drain_freelist() above, we'll never 
actually get MEM_OFFLINE since all pages can't be isolated for memory 
hot-remove, thus, the node will never be offlined.  kmem_getpages() can't 
allocate isolated pages, so this race must happen after drain_freelist() 
and prior to the pageblock being isolated.

So the MEM_GOING_OFFLINE check above is really more of a convenience to 
short-circuit the hot-remove if we know we can't free all slab on that 
node to avoid all the subsequent work that would happen only to run into 
isolation failure later.

We don't need to do anything for MEM_CANCEL_OFFLINE since the only effect 
of MEM_GOING_OFFLINE is to drain the freelist.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [patch v2] slab: add memory hotplug support
  2010-03-28  2:13                                             ` David Rientjes
@ 2010-03-28  2:40                                               ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-28  2:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Slab lacks any memory hotplug support for nodes that are hotplugged
without cpus being hotplugged.  This is possible at least on x86
CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
node.  It can also be done manually by writing the start address to
/sys/devices/system/memory/probe for kernels that have
CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
then onlining the new memory region.

When a node is hotadded, a nodelist for that node is allocated and 
initialized for each slab cache.  If this isn't completed due to a lack
of memory, the hotadd is aborted: we have a reasonable expectation that
kmalloc_node(nid) will work for all caches if nid is online and memory is
available.  

Since nodelists must be allocated and initialized prior to the new node's
memory actually being online, the struct kmem_list3 is allocated off-node
due to kmalloc_node()'s fallback.

When an entire node would be offlined, its nodelists are subsequently
drained.  If slab objects still exist and cannot be freed, the offline is
aborted.  It is possible that objects will be allocated between this
drain and page isolation, however, so the offline may still fail.
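The overall notifier flow this changelog describes can be sketched as a small userspace state machine: allocate nodelists up front on hot-add (aborting on allocation failure), drain on hot-remove (aborting if objects remain), and do nothing for the other transitions. The action names mirror the kernel's memory notifier constants, but the handlers and return codes here are trivial stubs, not the patch's actual code; per the discussion, the cancel paths need no work and are left as no-ops.

```c
#include <assert.h>

enum mem_action {		/* mirrors the kernel's memory notifier actions */
	MEM_GOING_ONLINE,
	MEM_ONLINE,
	MEM_GOING_OFFLINE,
	MEM_OFFLINE,
	MEM_CANCEL_ONLINE,
	MEM_CANCEL_OFFLINE,
};

#define NOTIFY_OK	0
#define NOTIFY_BAD	1	/* simplified return codes */

static int nodelist_allocated[2];	/* per-node: has a modeled kmem_list3? */

static int init_node(int nid)	{ nodelist_allocated[nid] = 1; return 0; }
static int drain_node(int nid)	{ (void)nid; return 0; /* assume drainable */ }

/* Sketch of the callback's dispatch: which actions do work, which are no-ops. */
static int callback_sim(enum mem_action action, int nid)
{
	int ret = 0;

	if (nid < 0)			/* status_change_nid: no node change */
		return NOTIFY_OK;

	switch (action) {
	case MEM_GOING_ONLINE:		/* hot-add: allocate nodelists up front */
		ret = init_node(nid);
		break;
	case MEM_GOING_OFFLINE:		/* hot-remove: drain, may refuse */
		ret = drain_node(nid);
		break;
	default:			/* ONLINE, OFFLINE, cancels: nothing */
		break;
	}
	return ret ? NOTIFY_BAD : NOTIFY_OK;
}
```

Returning NOTIFY_BAD from MEM_GOING_ONLINE or MEM_GOING_OFFLINE is what aborts the hotplug operation, which is how both failure paths in the changelog are realized.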

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/slab.c |  157 ++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 125 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -115,6 +115,7 @@
 #include	<linux/reciprocal_div.h>
 #include	<linux/debugobjects.h>
 #include	<linux/kmemcheck.h>
+#include	<linux/memory.h>
 
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
@@ -1102,6 +1103,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 }
 #endif
 
+/*
+ * Allocates and initializes nodelists for a node on each slab cache, used for
+ * either memory or cpu hotplug.  If memory is being hot-added, the kmem_list3
+ * will be allocated off-node since memory is not yet online for the new node.
+ * When hotplugging memory or a cpu, existing nodelists are not replaced if
+ * already in use.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int init_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	struct kmem_list3 *l3;
+	const int memsize = sizeof(struct kmem_list3);
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		/*
+		 * Set up the size64 kmemlist for cpu before we can
+		 * begin anything. Make sure some other cpu on this
+		 * node has not already allocated this
+		 */
+		if (!cachep->nodelists[node]) {
+			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
+			if (!l3)
+				return -ENOMEM;
+			kmem_list3_init(l3);
+			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
+			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+
+			/*
+			 * The l3s don't come and go as CPUs come and
+			 * go.  cache_chain_mutex is sufficient
+			 * protection here.
+			 */
+			cachep->nodelists[node] = l3;
+		}
+
+		spin_lock_irq(&cachep->nodelists[node]->list_lock);
+		cachep->nodelists[node]->free_limit =
+			(1 + nr_cpus_node(node)) *
+			cachep->batchcount + cachep->num;
+		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
+	}
+	return 0;
+}
+
 static void __cpuinit cpuup_canceled(long cpu)
 {
 	struct kmem_cache *cachep;
@@ -1172,7 +1219,7 @@ static int __cpuinit cpuup_prepare(long cpu)
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
 	int node = cpu_to_node(cpu);
-	const int memsize = sizeof(struct kmem_list3);
+	int err;
 
 	/*
 	 * We need to do this right in the beginning since
@@ -1180,35 +1227,9 @@ static int __cpuinit cpuup_prepare(long cpu)
 	 * kmalloc_node allows us to add the slab to the right
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
-
-	list_for_each_entry(cachep, &cache_chain, next) {
-		/*
-		 * Set up the size64 kmemlist for cpu before we can
-		 * begin anything. Make sure some other cpu on this
-		 * node has not already allocated this
-		 */
-		if (!cachep->nodelists[node]) {
-			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
-			if (!l3)
-				goto bad;
-			kmem_list3_init(l3);
-			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-			/*
-			 * The l3s don't come and go as CPUs come and
-			 * go.  cache_chain_mutex is sufficient
-			 * protection here.
-			 */
-			cachep->nodelists[node] = l3;
-		}
-
-		spin_lock_irq(&cachep->nodelists[node]->list_lock);
-		cachep->nodelists[node]->free_limit =
-			(1 + nr_cpus_node(node)) *
-			cachep->batchcount + cachep->num;
-		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
-	}
+	err = init_cache_nodelists_node(node);
+	if (err < 0)
+		goto bad;
 
 	/*
 	 * Now we can go ahead with allocating the shared arrays and
@@ -1331,11 +1352,75 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
 	&cpuup_callback, NULL, 0
 };
 
+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+/*
+ * Drains freelist for a node on each slab cache, used for memory hot-remove.
+ * Returns -EBUSY if all objects cannot be drained so that the node is not
+ * removed.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int __meminit drain_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	int ret = 0;
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		struct kmem_list3 *l3;
+
+		l3 = cachep->nodelists[node];
+		if (!l3)
+			continue;
+
+		drain_freelist(cachep, l3, l3->free_objects);
+
+		if (!list_empty(&l3->slabs_full) ||
+		    !list_empty(&l3->slabs_partial)) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+	return ret;
+}
+
+static int __meminit slab_memory_callback(struct notifier_block *self,
+					unsigned long action, void *arg)
+{
+	struct memory_notify *mnb = arg;
+	int ret = 0;
+	int nid;
+
+	nid = mnb->status_change_nid;
+	if (nid < 0)
+		goto out;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		mutex_lock(&cache_chain_mutex);
+		ret = init_cache_nodelists_node(nid);
+		mutex_unlock(&cache_chain_mutex);
+		break;
+	case MEM_GOING_OFFLINE:
+		mutex_lock(&cache_chain_mutex);
+		ret = drain_cache_nodelists_node(nid);
+		mutex_unlock(&cache_chain_mutex);
+		break;
+	case MEM_ONLINE:
+	case MEM_OFFLINE:
+	case MEM_CANCEL_ONLINE:
+	case MEM_CANCEL_OFFLINE:
+		break;
+	}
+out:
+	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
+}
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
 /*
  * swap the static kmem_list3 with kmalloced memory
  */
-static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
-			int nodeid)
+static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
+				int nodeid)
 {
 	struct kmem_list3 *ptr;
 
@@ -1580,6 +1665,14 @@ void __init kmem_cache_init_late(void)
 	 */
 	register_cpu_notifier(&cpucache_notifier);
 
+#ifdef CONFIG_NUMA
+	/*
+	 * Register a memory hotplug callback that initializes and frees
+	 * nodelists.
+	 */
+	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+#endif
+
 	/*
 	 * The reap timers are started later, with a module init call: That part
 	 * of the kernel is not yet operational.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* [patch v2] slab: add memory hotplug support
@ 2010-03-28  2:40                                               ` David Rientjes
  0 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-03-28  2:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

Slab lacks any memory hotplug support for nodes that are hotplugged
without cpus being hotplugged.  This is possible at least on x86
CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
node.  It can also be done manually by writing the start address to
/sys/devices/system/memory/probe for kernels that have
CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
then onlining the new memory region.
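For reference, the manual test procedure above (probe, then online) can be sketched as a small script; the physical address, memory block number, and variable names here are hypothetical placeholders, not values taken from this patch, and the probe file only exists with CONFIG_ARCH_MEMORY_PROBE:

```shell
#!/bin/sh
# Hypothetical sketch of the manual hot-add procedure described above.
# ADDR and BLOCK are placeholders; real values depend on the machine's
# memory map, and writing these files requires root.
PROBE=${PROBE:-/sys/devices/system/memory/probe}
ADDR=${ADDR:-0x100000000}
BLOCK=${BLOCK:-/sys/devices/system/memory/memory32/state}

hotadd_and_online() {
	if [ ! -w "$PROBE" ]; then
		echo "probe interface not available"
		return 1
	fi
	# Create a new memory section at the given physical address...
	echo "$ADDR" > "$PROBE" &&
	# ...then online the resulting memory block, which triggers the
	# MEM_GOING_ONLINE/MEM_ONLINE notifications for the new node.
	echo online > "$BLOCK"
}

# hotadd_and_online   # uncomment to run on a machine with hotpluggable memory
```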

When a node is hotadded, a nodelist for that node is allocated and 
initialized for each slab cache.  If this isn't completed due to a lack
of memory, the hotadd is aborted: we have a reasonable expectation that
kmalloc_node(nid) will work for all caches if nid is online and memory is
available.  

Since nodelists must be allocated and initialized prior to the new node's
memory actually being online, the struct kmem_list3 is allocated off-node
due to kmalloc_node()'s fallback.

When an entire node would be offlined, its nodelists are subsequently
drained.  If slab objects still exist and cannot be freed, the offline is
aborted.  Objects may still be allocated between this drain and page
isolation, however, so the offline can still fail.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/slab.c |  157 ++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 125 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -115,6 +115,7 @@
 #include	<linux/reciprocal_div.h>
 #include	<linux/debugobjects.h>
 #include	<linux/kmemcheck.h>
+#include	<linux/memory.h>
 
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
@@ -1102,6 +1103,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 }
 #endif
 
+/*
+ * Allocates and initializes nodelists for a node on each slab cache, used for
+ * either memory or cpu hotplug.  If memory is being hot-added, the kmem_list3
+ * will be allocated off-node since memory is not yet online for the new node.
+ * When hotplugging memory or a cpu, existing nodelists are not replaced if
+ * already in use.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int init_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	struct kmem_list3 *l3;
+	const int memsize = sizeof(struct kmem_list3);
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		/*
+		 * Set up the size64 kmemlist for cpu before we can
+		 * begin anything. Make sure some other cpu on this
+		 * node has not already allocated this
+		 */
+		if (!cachep->nodelists[node]) {
+			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
+			if (!l3)
+				return -ENOMEM;
+			kmem_list3_init(l3);
+			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
+			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+
+			/*
+			 * The l3s don't come and go as CPUs come and
+			 * go.  cache_chain_mutex is sufficient
+			 * protection here.
+			 */
+			cachep->nodelists[node] = l3;
+		}
+
+		spin_lock_irq(&cachep->nodelists[node]->list_lock);
+		cachep->nodelists[node]->free_limit =
+			(1 + nr_cpus_node(node)) *
+			cachep->batchcount + cachep->num;
+		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
+	}
+	return 0;
+}
+
 static void __cpuinit cpuup_canceled(long cpu)
 {
 	struct kmem_cache *cachep;
@@ -1172,7 +1219,7 @@ static int __cpuinit cpuup_prepare(long cpu)
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
 	int node = cpu_to_node(cpu);
-	const int memsize = sizeof(struct kmem_list3);
+	int err;
 
 	/*
 	 * We need to do this right in the beginning since
@@ -1180,35 +1227,9 @@ static int __cpuinit cpuup_prepare(long cpu)
 	 * kmalloc_node allows us to add the slab to the right
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
-
-	list_for_each_entry(cachep, &cache_chain, next) {
-		/*
-		 * Set up the size64 kmemlist for cpu before we can
-		 * begin anything. Make sure some other cpu on this
-		 * node has not already allocated this
-		 */
-		if (!cachep->nodelists[node]) {
-			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
-			if (!l3)
-				goto bad;
-			kmem_list3_init(l3);
-			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
-			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-
-			/*
-			 * The l3s don't come and go as CPUs come and
-			 * go.  cache_chain_mutex is sufficient
-			 * protection here.
-			 */
-			cachep->nodelists[node] = l3;
-		}
-
-		spin_lock_irq(&cachep->nodelists[node]->list_lock);
-		cachep->nodelists[node]->free_limit =
-			(1 + nr_cpus_node(node)) *
-			cachep->batchcount + cachep->num;
-		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
-	}
+	err = init_cache_nodelists_node(node);
+	if (err < 0)
+		goto bad;
 
 	/*
 	 * Now we can go ahead with allocating the shared arrays and
@@ -1331,11 +1352,75 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
 	&cpuup_callback, NULL, 0
 };
 
+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+/*
+ * Drains freelist for a node on each slab cache, used for memory hot-remove.
+ * Returns -EBUSY if all objects cannot be drained so that the node is not
+ * removed.
+ *
+ * Must hold cache_chain_mutex.
+ */
+static int __meminit drain_cache_nodelists_node(int node)
+{
+	struct kmem_cache *cachep;
+	int ret = 0;
+
+	list_for_each_entry(cachep, &cache_chain, next) {
+		struct kmem_list3 *l3;
+
+		l3 = cachep->nodelists[node];
+		if (!l3)
+			continue;
+
+		drain_freelist(cachep, l3, l3->free_objects);
+
+		if (!list_empty(&l3->slabs_full) ||
+		    !list_empty(&l3->slabs_partial)) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+	return ret;
+}
+
+static int __meminit slab_memory_callback(struct notifier_block *self,
+					unsigned long action, void *arg)
+{
+	struct memory_notify *mnb = arg;
+	int ret = 0;
+	int nid;
+
+	nid = mnb->status_change_nid;
+	if (nid < 0)
+		goto out;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		mutex_lock(&cache_chain_mutex);
+		ret = init_cache_nodelists_node(nid);
+		mutex_unlock(&cache_chain_mutex);
+		break;
+	case MEM_GOING_OFFLINE:
+		mutex_lock(&cache_chain_mutex);
+		ret = drain_cache_nodelists_node(nid);
+		mutex_unlock(&cache_chain_mutex);
+		break;
+	case MEM_ONLINE:
+	case MEM_OFFLINE:
+	case MEM_CANCEL_ONLINE:
+	case MEM_CANCEL_OFFLINE:
+		break;
+	}
+out:
+	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
+}
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
 /*
  * swap the static kmem_list3 with kmalloced memory
  */
-static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
-			int nodeid)
+static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
+				int nodeid)
 {
 	struct kmem_list3 *ptr;
 
@@ -1580,6 +1665,14 @@ void __init kmem_cache_init_late(void)
 	 */
 	register_cpu_notifier(&cpucache_notifier);
 
+#ifdef CONFIG_NUMA
+	/*
+	 * Register a memory hotplug callback that initializes and frees
+	 * nodelists.
+	 */
+	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+#endif
+
 	/*
 	 * The reap timers are started later, with a module init call: That part
 	 * of the kernel is not yet operational.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch v2] slab: add memory hotplug support
  2010-03-28  2:40                                               ` David Rientjes
@ 2010-03-30  9:01                                                 ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-03-30  9:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: Nick Piggin, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

On Sun, Mar 28, 2010 at 5:40 AM, David Rientjes <rientjes@google.com> wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
>
> When a node is hotadded, a nodelist for that node is allocated and
> initialized for each slab cache.  If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.
>
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
>
> When an entire node would be offlined, its nodelists are subsequently
> drained.  If slab objects still exist and cannot be freed, the offline is
> aborted.  Objects may still be allocated between this drain and page
> isolation, however, so the offline can still fail.
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Nick, Christoph, let's make a deal: you ACK, I merge. How does that
sound to you?

> ---
>  mm/slab.c |  157 ++++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 files changed, 125 insertions(+), 32 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -115,6 +115,7 @@
>  #include       <linux/reciprocal_div.h>
>  #include       <linux/debugobjects.h>
>  #include       <linux/kmemcheck.h>
> +#include       <linux/memory.h>
>
>  #include       <asm/cacheflush.h>
>  #include       <asm/tlbflush.h>
> @@ -1102,6 +1103,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
>  }
>  #endif
>
> +/*
> + * Allocates and initializes nodelists for a node on each slab cache, used for
> + * either memory or cpu hotplug.  If memory is being hot-added, the kmem_list3
> + * will be allocated off-node since memory is not yet online for the new node.
> + * When hotplugging memory or a cpu, existing nodelists are not replaced if
> + * already in use.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int init_cache_nodelists_node(int node)
> +{
> +       struct kmem_cache *cachep;
> +       struct kmem_list3 *l3;
> +       const int memsize = sizeof(struct kmem_list3);
> +
> +       list_for_each_entry(cachep, &cache_chain, next) {
> +               /*
> +                * Set up the size64 kmemlist for cpu before we can
> +                * begin anything. Make sure some other cpu on this
> +                * node has not already allocated this
> +                */
> +               if (!cachep->nodelists[node]) {
> +                       l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> +                       if (!l3)
> +                               return -ENOMEM;
> +                       kmem_list3_init(l3);
> +                       l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> +                           ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> +
> +                       /*
> +                        * The l3s don't come and go as CPUs come and
> +                        * go.  cache_chain_mutex is sufficient
> +                        * protection here.
> +                        */
> +                       cachep->nodelists[node] = l3;
> +               }
> +
> +               spin_lock_irq(&cachep->nodelists[node]->list_lock);
> +               cachep->nodelists[node]->free_limit =
> +                       (1 + nr_cpus_node(node)) *
> +                       cachep->batchcount + cachep->num;
> +               spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> +       }
> +       return 0;
> +}
> +
>  static void __cpuinit cpuup_canceled(long cpu)
>  {
>        struct kmem_cache *cachep;
> @@ -1172,7 +1219,7 @@ static int __cpuinit cpuup_prepare(long cpu)
>        struct kmem_cache *cachep;
>        struct kmem_list3 *l3 = NULL;
>        int node = cpu_to_node(cpu);
> -       const int memsize = sizeof(struct kmem_list3);
> +       int err;
>
>        /*
>         * We need to do this right in the beginning since
> @@ -1180,35 +1227,9 @@ static int __cpuinit cpuup_prepare(long cpu)
>         * kmalloc_node allows us to add the slab to the right
>         * kmem_list3 and not this cpu's kmem_list3
>         */
> -
> -       list_for_each_entry(cachep, &cache_chain, next) {
> -               /*
> -                * Set up the size64 kmemlist for cpu before we can
> -                * begin anything. Make sure some other cpu on this
> -                * node has not already allocated this
> -                */
> -               if (!cachep->nodelists[node]) {
> -                       l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> -                       if (!l3)
> -                               goto bad;
> -                       kmem_list3_init(l3);
> -                       l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> -                           ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> -
> -                       /*
> -                        * The l3s don't come and go as CPUs come and
> -                        * go.  cache_chain_mutex is sufficient
> -                        * protection here.
> -                        */
> -                       cachep->nodelists[node] = l3;
> -               }
> -
> -               spin_lock_irq(&cachep->nodelists[node]->list_lock);
> -               cachep->nodelists[node]->free_limit =
> -                       (1 + nr_cpus_node(node)) *
> -                       cachep->batchcount + cachep->num;
> -               spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> -       }
> +       err = init_cache_nodelists_node(node);
> +       if (err < 0)
> +               goto bad;
>
>        /*
>         * Now we can go ahead with allocating the shared arrays and
> @@ -1331,11 +1352,75 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
>        &cpuup_callback, NULL, 0
>  };
>
> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> +/*
> + * Drains freelist for a node on each slab cache, used for memory hot-remove.
> + * Returns -EBUSY if all objects cannot be drained so that the node is not
> + * removed.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int __meminit drain_cache_nodelists_node(int node)
> +{
> +       struct kmem_cache *cachep;
> +       int ret = 0;
> +
> +       list_for_each_entry(cachep, &cache_chain, next) {
> +               struct kmem_list3 *l3;
> +
> +               l3 = cachep->nodelists[node];
> +               if (!l3)
> +                       continue;
> +
> +               drain_freelist(cachep, l3, l3->free_objects);
> +
> +               if (!list_empty(&l3->slabs_full) ||
> +                   !list_empty(&l3->slabs_partial)) {
> +                       ret = -EBUSY;
> +                       break;
> +               }
> +       }
> +       return ret;
> +}
> +
> +static int __meminit slab_memory_callback(struct notifier_block *self,
> +                                       unsigned long action, void *arg)
> +{
> +       struct memory_notify *mnb = arg;
> +       int ret = 0;
> +       int nid;
> +
> +       nid = mnb->status_change_nid;
> +       if (nid < 0)
> +               goto out;
> +
> +       switch (action) {
> +       case MEM_GOING_ONLINE:
> +               mutex_lock(&cache_chain_mutex);
> +               ret = init_cache_nodelists_node(nid);
> +               mutex_unlock(&cache_chain_mutex);
> +               break;
> +       case MEM_GOING_OFFLINE:
> +               mutex_lock(&cache_chain_mutex);
> +               ret = drain_cache_nodelists_node(nid);
> +               mutex_unlock(&cache_chain_mutex);
> +               break;
> +       case MEM_ONLINE:
> +       case MEM_OFFLINE:
> +       case MEM_CANCEL_ONLINE:
> +       case MEM_CANCEL_OFFLINE:
> +               break;
> +       }
> +out:
> +       return ret ? notifier_from_errno(ret) : NOTIFY_OK;
> +}
> +#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>  * swap the static kmem_list3 with kmalloced memory
>  */
> -static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> -                       int nodeid)
> +static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> +                               int nodeid)
>  {
>        struct kmem_list3 *ptr;
>
> @@ -1580,6 +1665,14 @@ void __init kmem_cache_init_late(void)
>         */
>        register_cpu_notifier(&cpucache_notifier);
>
> +#ifdef CONFIG_NUMA
> +       /*
> +        * Register a memory hotplug callback that initializes and frees
> +        * nodelists.
> +        */
> +       hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +#endif
> +
>        /*
>         * The reap timers are started later, with a module init call: That part
>         * of the kernel is not yet operational.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch v2] slab: add memory hotplug support
  2010-03-30  9:01                                                 ` Pekka Enberg
@ 2010-03-30 16:43                                                   ` Christoph Lameter
  -1 siblings, 0 replies; 170+ messages in thread
From: Christoph Lameter @ 2010-03-30 16:43 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Nick Piggin, Andi Kleen, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Tue, 30 Mar 2010, Pekka Enberg wrote:

> Nick, Christoph, let's make a deal: you ACK, I merge. How does that
> sound to you?

I looked through the patch before and slabwise this seems to be ok, but I am
still not very sure how this interacts with the node and cpu bootstrap.
You can have the ack with this caveat.

Acked-by: Christoph Lameter <cl@linux-foundation.org>


^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch v2] slab: add memory hotplug support
  2010-03-30 16:43                                                   ` Christoph Lameter
@ 2010-04-04 20:45                                                     ` David Rientjes
  -1 siblings, 0 replies; 170+ messages in thread
From: David Rientjes @ 2010-04-04 20:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Nick Piggin, Andi Kleen, linux-kernel, linux-mm,
	haicheng.li, KAMEZAWA Hiroyuki

On Tue, 30 Mar 2010, Christoph Lameter wrote:

> > Nick, Christoph, let's make a deal: you ACK, I merge. How does that
> > sound to you?
> 
> I looked through the patch before and slabwise this seems to be ok, but I am
> still not very sure how this interacts with the node and cpu bootstrap.
> You can have the ack with this caveat.
> 
> Acked-by: Christoph Lameter <cl@linux-foundation.org>
> 

Thanks.

I tested this for node hotplug by setting ACPI_SRAT_MEM_HOT_PLUGGABLE 
regions and then setting up a new memory section with 
/sys/devices/system/memory/probe.  I onlined the new memory section, which 
mapped to an offline node, and verified that the new nodelists were 
initialized correctly.  This is done before the MEM_ONLINE notifier and 
the bit being set in node_states[N_HIGH_MEMORY].  So, for node hot-add, it 
works.

MEM_GOING_OFFLINE is more interesting, but there's nothing harmful about 
draining the freelist and reporting whether there are existing full or 
partial slabs back to the memory hotplug layer to preempt a hot-remove 
since those slabs cannot be freed.  I don't consider that to be a risky 
change.

As far as the interactions between memory and cpu hotplug, they are really 
different things with many of the same implications for the slab layer.  
Both have the possibility of bringing new nodes online or offline and they 
must be dealt with accordingly.  We lack support for offlining an entire 
node at a time since we must hotplug first by adding a new memory section, 
so these notifiers won't be called simultaneously.  Even if they were, 
draining the freelist and checking if a nodelist needs to be initialized 
is not going to be harmful since both notifiers have the same checks for 
existing nodelists (which is not only necessary if we _did_ have 
simultaneous cpu and memory hot-add, but also if a node transitioned from 
online to offline and back to online).

I hope this patch is merged because it obviously fixed a problem on my box 
where a memory section could be added, a node onlined, and then no slab 
metadata being initialized for that memory.

^ permalink raw reply	[flat|nested] 170+ messages in thread

* Re: [patch v2] slab: add memory hotplug support
  2010-03-28  2:40                                               ` David Rientjes
@ 2010-04-07 16:29                                                 ` Pekka Enberg
  -1 siblings, 0 replies; 170+ messages in thread
From: Pekka Enberg @ 2010-04-07 16:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Nick Piggin, Andi Kleen, Christoph Lameter, linux-kernel,
	linux-mm, haicheng.li, KAMEZAWA Hiroyuki

David Rientjes wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
> 
> When a node is hotadded, a nodelist for that node is allocated and 
> initialized for each slab cache.  If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.  
> 
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
> 
> When an entire node would be offlined, its nodelists are subsequently
> drained.  If slab objects still exist and cannot be freed, the offline is
> aborted.  It is possible that objects will be allocated between this
> drain and page isolation, so it's still possible that the offline will
> still fail, however.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

I queued this up for 2.6.35. Thanks!

^ permalink raw reply	[flat|nested] 170+ messages in thread

end of thread, other threads:[~2010-04-07 16:30 UTC | newest]

Thread overview: 170+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-11 20:53 [PATCH] [0/4] Update slab memory hotplug series Andi Kleen
2010-02-11 20:53 ` Andi Kleen
2010-02-11 20:54 ` [PATCH] [1/4] SLAB: Handle node-not-up case in fallback_alloc() v2 Andi Kleen
2010-02-11 20:54   ` Andi Kleen
2010-02-11 21:41   ` David Rientjes
2010-02-11 21:41     ` David Rientjes
2010-02-11 21:55     ` Andi Kleen
2010-02-11 21:55       ` Andi Kleen
2010-02-15  6:04   ` Nick Piggin
2010-02-15  6:04     ` Nick Piggin
2010-02-15 10:07     ` Andi Kleen
2010-02-15 10:07       ` Andi Kleen
2010-02-15 10:22       ` Nick Piggin
2010-02-15 10:22         ` Nick Piggin
2010-02-11 20:54 ` [PATCH] [2/4] SLAB: Separate node initialization into separate function Andi Kleen
2010-02-11 20:54   ` Andi Kleen
2010-02-11 21:44   ` David Rientjes
2010-02-11 21:44     ` David Rientjes
2010-02-11 20:54 ` [PATCH] [3/4] SLAB: Set up the l3 lists for the memory of freshly added memory v2 Andi Kleen
2010-02-11 20:54   ` Andi Kleen
2010-02-11 21:45   ` David Rientjes
2010-02-11 21:45     ` David Rientjes
2010-02-15  6:06     ` Nick Piggin
2010-02-15  6:06       ` Nick Piggin
2010-02-15 21:47       ` David Rientjes
2010-02-15 21:47         ` David Rientjes
2010-02-16 14:04         ` Nick Piggin
2010-02-16 14:04           ` Nick Piggin
2010-02-16 20:45           ` Pekka Enberg
2010-02-16 20:45             ` Pekka Enberg
2010-02-11 20:54 ` [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap Andi Kleen
2010-02-11 20:54   ` Andi Kleen
2010-02-11 21:45   ` David Rientjes
2010-02-11 21:45     ` David Rientjes
2010-02-15  6:15   ` Nick Piggin
2010-02-15  6:15     ` Nick Piggin
2010-02-15 10:32     ` Andi Kleen
2010-02-15 10:32       ` Andi Kleen
2010-02-15 10:41       ` Nick Piggin
2010-02-15 10:41         ` Nick Piggin
2010-02-15 10:52         ` Andi Kleen
2010-02-15 10:52           ` Andi Kleen
2010-02-15 11:01           ` Nick Piggin
2010-02-15 11:01             ` Nick Piggin
2010-02-15 15:30             ` Andi Kleen
2010-02-15 15:30               ` Andi Kleen
2010-02-19 18:22             ` Christoph Lameter
2010-02-19 18:22               ` Christoph Lameter
2010-02-20  9:01               ` Andi Kleen
2010-02-20  9:01                 ` Andi Kleen
2010-02-22 10:53                 ` Pekka Enberg
2010-02-22 10:53                   ` Pekka Enberg
2010-02-22 14:31                   ` Andi Kleen
2010-02-22 14:31                     ` Andi Kleen
2010-02-22 16:11                     ` Pekka Enberg
2010-02-22 16:11                       ` Pekka Enberg
2010-02-22 20:20                       ` Andi Kleen
2010-02-22 20:20                         ` Andi Kleen
2010-02-24 15:49                 ` Christoph Lameter
2010-02-24 15:49                   ` Christoph Lameter
2010-02-25  7:26                   ` Pekka Enberg
2010-02-25  7:26                     ` Pekka Enberg
2010-02-25  8:01                     ` David Rientjes
2010-02-25  8:01                       ` David Rientjes
2010-02-25 18:30                       ` Christoph Lameter
2010-02-25 18:30                         ` Christoph Lameter
2010-02-25 21:45                         ` David Rientjes
2010-02-25 21:45                           ` David Rientjes
2010-02-25 22:31                           ` Christoph Lameter
2010-02-25 22:31                             ` Christoph Lameter
2010-02-26 10:45                             ` Pekka Enberg
2010-02-26 10:45                               ` Pekka Enberg
2010-02-26 11:43                               ` Andi Kleen
2010-02-26 11:43                                 ` Andi Kleen
2010-02-26 12:35                                 ` Pekka Enberg
2010-02-26 12:35                                   ` Pekka Enberg
2010-02-26 14:08                                   ` Andi Kleen
2010-02-26 14:08                                     ` Andi Kleen
2010-02-26  1:09                         ` KAMEZAWA Hiroyuki
2010-02-26  1:09                           ` KAMEZAWA Hiroyuki
2010-02-26 11:41                         ` Andi Kleen
2010-02-26 11:41                           ` Andi Kleen
2010-02-26 15:04                           ` Christoph Lameter
2010-02-26 15:04                             ` Christoph Lameter
2010-02-26 15:05                             ` Christoph Lameter
2010-02-26 15:05                               ` Christoph Lameter
2010-02-26 15:59                               ` Andi Kleen
2010-02-26 15:59                                 ` Andi Kleen
2010-02-26 15:57                             ` Andi Kleen
2010-02-26 15:57                               ` Andi Kleen
2010-02-26 17:24                               ` Christoph Lameter
2010-02-26 17:24                                 ` Christoph Lameter
2010-02-26 17:31                                 ` Andi Kleen
2010-02-26 17:31                                   ` Andi Kleen
2010-03-01  1:59                                   ` KAMEZAWA Hiroyuki
2010-03-01  1:59                                     ` KAMEZAWA Hiroyuki
2010-03-01 10:27                                     ` David Rientjes
2010-03-01 10:27                                       ` David Rientjes
2010-02-27  0:01                                 ` David Rientjes
2010-02-27  0:01                                   ` David Rientjes
2010-03-01 10:24                                   ` [patch] slab: add memory hotplug support David Rientjes
2010-03-01 10:24                                     ` David Rientjes
2010-03-02  5:53                                     ` Pekka Enberg
2010-03-02  5:53                                       ` Pekka Enberg
2010-03-02 20:20                                       ` Christoph Lameter
2010-03-02 20:20                                         ` Christoph Lameter
2010-03-02 21:03                                         ` David Rientjes
2010-03-02 21:03                                           ` David Rientjes
2010-03-03  1:28                                         ` KAMEZAWA Hiroyuki
2010-03-03  1:28                                           ` KAMEZAWA Hiroyuki
2010-03-03  2:39                                           ` David Rientjes
2010-03-03  2:39                                             ` David Rientjes
2010-03-03  2:51                                             ` KAMEZAWA Hiroyuki
2010-03-03  2:51                                               ` KAMEZAWA Hiroyuki
2010-03-02 12:53                                     ` Andi Kleen
2010-03-02 12:53                                       ` Andi Kleen
2010-03-02 15:04                                       ` Pekka Enberg
2010-03-02 15:04                                         ` Pekka Enberg
2010-03-03 14:34                                         ` Andi Kleen
2010-03-03 14:34                                           ` Andi Kleen
2010-03-03 15:46                                           ` Christoph Lameter
2010-03-03 15:46                                             ` Christoph Lameter
2010-03-02 21:17                                       ` David Rientjes
2010-03-02 21:17                                         ` David Rientjes
2010-03-05  6:20                                     ` Nick Piggin
2010-03-05  6:20                                       ` Nick Piggin
2010-03-05 12:47                                       ` Anca Emanuel
2010-03-05 12:47                                         ` Anca Emanuel
2010-03-05 13:58                                         ` Anca Emanuel
2010-03-05 13:58                                           ` Anca Emanuel
2010-03-05 14:11                                         ` Christoph Lameter
2010-03-05 14:11                                           ` Christoph Lameter
2010-03-08  3:06                                           ` Andi Kleen
2010-03-08  3:06                                             ` Andi Kleen
2010-03-08  2:58                                         ` Andi Kleen
2010-03-08  2:58                                           ` Andi Kleen
2010-03-08 23:19                                       ` David Rientjes
2010-03-08 23:19                                         ` David Rientjes
2010-03-09 13:46                                         ` Nick Piggin
2010-03-09 13:46                                           ` Nick Piggin
2010-03-22 17:28                                           ` Pekka Enberg
2010-03-22 17:28                                             ` Pekka Enberg
2010-03-22 21:12                                             ` Nick Piggin
2010-03-22 21:12                                               ` Nick Piggin
2010-03-28  2:13                                           ` David Rientjes
2010-03-28  2:13                                             ` David Rientjes
2010-03-28  2:40                                             ` [patch v2] " David Rientjes
2010-03-28  2:40                                               ` David Rientjes
2010-03-30  9:01                                               ` Pekka Enberg
2010-03-30  9:01                                                 ` Pekka Enberg
2010-03-30 16:43                                                 ` Christoph Lameter
2010-03-30 16:43                                                   ` Christoph Lameter
2010-04-04 20:45                                                   ` David Rientjes
2010-04-04 20:45                                                     ` David Rientjes
2010-04-07 16:29                                               ` Pekka Enberg
2010-04-07 16:29                                                 ` Pekka Enberg
2010-02-25 18:34                     ` [PATCH] [4/4] SLAB: Fix node add timer race in cache_reap Christoph Lameter
2010-02-25 18:34                       ` Christoph Lameter
2010-02-25 18:46                       ` Pekka Enberg
2010-02-25 18:46                         ` Pekka Enberg
2010-02-25 19:19                         ` Christoph Lameter
2010-02-25 19:19                           ` Christoph Lameter
2010-03-02 12:55                         ` Andi Kleen
2010-03-02 12:55                           ` Andi Kleen
2010-02-19 18:22       ` Christoph Lameter
2010-02-19 18:22         ` Christoph Lameter
2010-02-22 10:57         ` Pekka Enberg
2010-02-22 10:57           ` Pekka Enberg
2010-02-13 10:24 ` [PATCH] [0/4] Update slab memory hotplug series Pekka Enberg
2010-02-13 10:24   ` Pekka Enberg
