* [PATCH 0/3] Improve slab consumption with memoryless nodes
@ 2014-09-09 19:01 ` Nishanth Aravamudan
  0 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-09 19:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was
consumed by slabs and was marked unreclaimable. He tracked it down to
slab deactivations in the SLUB core when we allocate remotely, which
lead to consistently poor slab efficiency whenever memoryless nodes are
present.
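
For context, the pre-series slow path in mm/slub.c looks roughly like
the sketch below (simplified, not the exact code): on a memoryless
node, node_match() can never succeed, so every such allocation tears
down the current cpu slab.

	if (unlikely(!node_match(page, node))) {
		/* Always taken when 'node' has no memory: no page can
		 * ever have come from a memoryless node. */
		stat(s, ALLOC_NODE_MISMATCH);
		deactivate_slab(s, page, c->freelist);	/* discard partly-used slab */
		c->page = NULL;
		c->freelist = NULL;
		goto new_slab;				/* ...and grab a fresh one */
	}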

After much discussion, Joonsoo provided a few patches that help
significantly. They don't resolve the problem altogether:

 - memory hotplug still needs testing; that is, when a memoryless node
   gains memory, we want to do the right thing
 - there are reasons other than memoryless nodes for going off-node,
   e.g., fully exhausted local nodes

Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.

The series consists of:

[1/3] topology: add support for node_to_mem_node() to determine the fallback node
[2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node

 - Joonsoo's patches to cache the nearest node with memory for each
   NUMA node

[3/3] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")

 - At Tejun's request, keep the knowledge of memoryless-node fallback
   in the allocator core.

 include/linux/topology.h | 17 +++++++++++++++++
 kernel/kthread.c         |  2 +-
 mm/page_alloc.c          |  1 +
 mm/slub.c                | 24 ++++++++++++++++++------
 4 files changed, 37 insertions(+), 7 deletions(-)

* [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
  2014-09-09 19:01 ` Nishanth Aravamudan
@ 2014-09-09 19:03   ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-09 19:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

We need to determine the fallback node in the SLUB allocator if the
allocation target node is a memoryless node. Without it, SLUB wrongly
selects a node which has no memory and cannot use a partial slab,
because of the node mismatch. The newly introduced function
node_to_mem_node(X) returns the node with memory that is nearest to X:
if X is a memoryless node, it returns the nearest node with memory; if
X is a normal node, it returns X itself.

We will use this function in the following patch to determine the
fallback node.
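
As a minimal illustration of the intended semantics (hypothetical node
IDs, not part of the patch): on a topology where node 0 has CPUs but no
memory and node 1 is its nearest node with memory, we would expect:

	node_to_mem_node(0);	/* == 1: memoryless node falls back to node 1 */
	node_to_mem_node(1);	/* == 1: a node with memory returns itself */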

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

---
v2 -> v3 (Nishanth):
  Fix declaration and definition of _node_numa_mem_.
  s/node_numa_mem/node_to_mem_node/ as suggested by David Rientjes.

Andrew, I decided to leave the definition of the variables as-is. I
followed up with Christoph but didn't hear back on what he actually
wanted. I think the naming can be changed in a future patch if it's
urgent.

diff --git a/include/linux/topology.h b/include/linux/topology.h
index dda6ee521e74..909b6e43b694 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -119,11 +119,20 @@ static inline int numa_node_id(void)
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
+extern int _node_numa_mem_[MAX_NUMNODES];
 
 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
 	this_cpu_write(_numa_mem_, node);
+	_node_numa_mem_[numa_node_id()] = node;
+}
+#endif
+
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return _node_numa_mem_[node];
 }
 #endif
 
@@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu)
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
 	per_cpu(_numa_mem_, cpu) = node;
+	_node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif
 
@@ -159,6 +169,13 @@ static inline int numa_mem_id(void)
 }
 #endif
 
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return node;
+}
+#endif
+
 #ifndef cpu_to_mem
 static inline int cpu_to_mem(int cpu)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18cee0d4c8a2..0883c42936d4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
  */
 DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
+int _node_numa_mem_[MAX_NUMNODES];
 #endif
 
 /*

* [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node
  2014-09-09 19:03   ` Nishanth Aravamudan
@ 2014-09-09 19:05     ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-09 19:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Update the SLUB code to search for partial slabs on the nearest node
with memory in the presence of memoryless nodes. Additionally, do not
count an ALLOC_NODE_MISMATCH (and do not deactivate the slab) when an
allocation targeted at a memoryless node goes off-node.
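
The search-node choice made in get_partial() below, as a standalone
sketch (simplified, not the exact kernel code):

	/* Simplified sketch of the searchnode selection in get_partial(). */
	static int choose_searchnode(int node)
	{
		if (node == NUMA_NO_NODE)
			return numa_mem_id();		/* no preference: local node with memory */
		if (!node_present_pages(node))
			return node_to_mem_node(node);	/* memoryless: nearest node with memory */
		return node;				/* normal node: honor the request */
	}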

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

---
v1 -> v2 (Nishanth):
  Add commit message
  Clean-up conditions in get_partial()

diff --git a/mm/slub.c b/mm/slub.c
index 3e8afcc07a76..497fdfed2f01 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1699,7 +1699,12 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 		struct kmem_cache_cpu *c)
 {
 	void *object;
-	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
+	int searchnode = node;
+
+	if (node == NUMA_NO_NODE)
+		searchnode = numa_mem_id();
+	else if (!node_present_pages(node))
+		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2280,11 +2285,18 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+		int searchnode = node;
+
+		if (node != NUMA_NO_NODE && !node_present_pages(node))
+			searchnode = node_to_mem_node(node);
+
+		if (unlikely(!node_match(page, searchnode))) {
+			stat(s, ALLOC_NODE_MISMATCH);
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}
 
 	/*

* [PATCH 3/3] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")
  2014-09-09 19:05     ` Nishanth Aravamudan
@ 2014-09-09 19:06       ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-09 19:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

After discussions with Tejun, we don't want to spread the use of
cpu_to_mem() (and thus knowledge of allocators/NUMA topology details)
into callers, but would rather ensure the callees correctly handle
memoryless nodes. With the previous patches ("topology: add support for
node_to_mem_node() to determine the fallback node" and "slub: fallback
to node_to_mem_node() node if allocating on memoryless node") adding and
using node_to_mem_node(), we can safely undo part of the change to the
kthread logic from 81c98869faa5.
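
As a short illustration of the layering point (hypothetical CPU and
node numbers): if CPU 4 sits on memoryless node 2 whose nearest node
with memory is node 3, then

	cpu_to_node(4);	/* == 2: the CPU's home node, even though memoryless */
	cpu_to_mem(4);	/* == 3: the nearest node with memory */

With this series, kthread_create_on_cpu() can pass the plain
cpu_to_node() result and rely on SLUB to perform the memoryless-node
fallback internally.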

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index ef483220e855..10e489c448fe 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -369,7 +369,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 {
 	struct task_struct *p;
 
-	p = kthread_create_on_node(threadfn, data, cpu_to_mem(cpu), namefmt,
+	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
 				   cpu);
 	if (IS_ERR(p))
 		return p;

* Re: [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
  2014-09-09 19:03   ` Nishanth Aravamudan
@ 2014-09-10  0:11     ` Andrew Morton
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2014-09-10  0:11 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On Tue, 9 Sep 2014 12:03:27 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> We need to determine the fallback node in the SLUB allocator if the
> allocation target node is a memoryless node. Without it, SLUB wrongly
> selects a node which has no memory and cannot use a partial slab,
> because of the node mismatch. The newly introduced function
> node_to_mem_node(X) returns the node with memory that is nearest to X:
> if X is a memoryless node, it returns the nearest node with memory; if
> X is a normal node, it returns X itself.
> 
> We will use this function in the following patch to determine the
> fallback node.
> 
> ...
>
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
>   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().

This comment could be updated.

>   */
>  DECLARE_PER_CPU(int, _numa_mem_);
> +extern int _node_numa_mem_[MAX_NUMNODES];
>  
>  #ifndef set_numa_mem
>  static inline void set_numa_mem(int node)
>  {
>  	this_cpu_write(_numa_mem_, node);
> +	_node_numa_mem_[numa_node_id()] = node;
> +}
> +#endif
> +
> +#ifndef node_to_mem_node
> +static inline int node_to_mem_node(int node)
> +{
> +	return _node_numa_mem_[node];
>  }

A wee bit of documentation wouldn't hurt.

How does node_to_mem_node(numa_node_id()) differ from numa_mem_id()? 
If I'm reading things correctly, they should both always return the
same thing.  If so, do we need both?

Will node_to_mem_node() ever actually be called with a node !=
numa_node_id()?


>  #endif
>  
> @@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu)
>  static inline void set_cpu_numa_mem(int cpu, int node)
>  {
>  	per_cpu(_numa_mem_, cpu) = node;
> +	_node_numa_mem_[cpu_to_node(cpu)] = node;
>  }
>  #endif
>  
> @@ -159,6 +169,13 @@ static inline int numa_mem_id(void)
>  }
>  #endif
>  
> +#ifndef node_to_mem_node
> +static inline int node_to_mem_node(int node)
> +{
> +	return node;
> +}
> +#endif
> +
>  #ifndef cpu_to_mem
>  static inline int cpu_to_mem(int cpu)
>  {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 18cee0d4c8a2..0883c42936d4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
>   */
>  DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
>  EXPORT_PER_CPU_SYMBOL(_numa_mem_);
> +int _node_numa_mem_[MAX_NUMNODES];

How does this get updated as CPUs, memory and nodes are hot-added and
removed?


>  #endif
>  

* Re: [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node
  2014-09-09 19:05     ` Nishanth Aravamudan
@ 2014-09-10  0:11       ` Andrew Morton
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2014-09-10  0:11 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On Tue, 9 Sep 2014 12:05:14 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Update the SLUB code to search for partial slabs on the nearest node
> with memory in the presence of memoryless nodes. Additionally, do not
> count an ALLOC_NODE_MISMATCH (and do not deactivate the slab) when an
> allocation targeted at a memoryless node goes off-node.
> 
> ...
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1699,7 +1699,12 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  		struct kmem_cache_cpu *c)
>  {
>  	void *object;
> -	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> +	int searchnode = node;
> +
> +	if (node == NUMA_NO_NODE)
> +		searchnode = numa_mem_id();
> +	else if (!node_present_pages(node))
> +		searchnode = node_to_mem_node(node);

I expect a call to node_to_mem_node() will always be preceded by a test
of node_present_pages().  Perhaps node_to_mem_node() should just do the
node_present_pages() call itself?
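
A sketch of that alternative (hypothetical; not what the series
implements):

	static inline int node_to_mem_node(int node)
	{
		if (node_present_pages(node))
			return node;			/* node has memory: no fallback needed */
		return _node_numa_mem_[node];		/* memoryless: nearest node with memory */
	}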


* Re: [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
  2014-09-10  0:11     ` Andrew Morton
@ 2014-09-10  0:47       ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-10  0:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On 09.09.2014 [17:11:15 -0700], Andrew Morton wrote:
> On Tue, 9 Sep 2014 12:03:27 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > We need to determine the fallback node in the SLUB allocator if the
> > allocation target node is a memoryless node. Without it, SLUB wrongly
> > selects a node which has no memory and cannot use a partial slab,
> > because of the node mismatch. The newly introduced function
> > node_to_mem_node(X) returns the node with memory that is nearest to X:
> > if X is a memoryless node, it returns the nearest node with memory; if
> > X is a normal node, it returns X itself.
> > 
> > We will use this function in the following patch to determine the
> > fallback node.
> > 
> > ...
> >
> > --- a/include/linux/topology.h
> > +++ b/include/linux/topology.h
> > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> 
> This comment could be updated.

Will do, do you prefer a follow-on patch or one that replaces this one?

> >   */
> >  DECLARE_PER_CPU(int, _numa_mem_);
> > +extern int _node_numa_mem_[MAX_NUMNODES];
> >  
> >  #ifndef set_numa_mem
> >  static inline void set_numa_mem(int node)
> >  {
> >  	this_cpu_write(_numa_mem_, node);
> > +	_node_numa_mem_[numa_node_id()] = node;
> > +}
> > +#endif
> > +
> > +#ifndef node_to_mem_node
> > +static inline int node_to_mem_node(int node)
> > +{
> > +	return _node_numa_mem_[node];
> >  }
> 
> A wee bit of documentation wouldn't hurt.
> 
> How does node_to_mem_node(numa_node_id()) differ from numa_mem_id()? 
> If I'm reading things correctly, they should both always return the
> same thing.  If so, do we need both?

That seems correct to me. The nearest memory node of this cpu's NUMA
node (node_to_mem_node(numa_node_id())) is always equal to this cpu's
nearest memory node (numa_mem_id()).
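
Stated as a hypothetical self-check (not in the patch), the claimed
invariant would be:

	/* Both accessors should always agree for the local node. */
	WARN_ON(node_to_mem_node(numa_node_id()) != numa_mem_id());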

> Will node_to_mem_node() ever actually be called with a node !=
> numa_node_id()?

Well, it's a layering problem. The eventual callers of
node_to_mem_node() only have the requested NUMA node (if any) available.
I think because get_partial() and __slab_alloc() allow allocations for
any node, and that's where we see the slab deactivation issues, we need
to support this in the API.

In practice, the node parameter is probably often numa_node_id(), but
we can't be sure of that in these call paths, afaict.
 
> >  #endif
> >  
> > @@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu)
> >  static inline void set_cpu_numa_mem(int cpu, int node)
> >  {
> >  	per_cpu(_numa_mem_, cpu) = node;
> > +	_node_numa_mem_[cpu_to_node(cpu)] = node;
> >  }
> >  #endif
> >  
> > @@ -159,6 +169,13 @@ static inline int numa_mem_id(void)
> >  }
> >  #endif
> >  
> > +#ifndef node_to_mem_node
> > +static inline int node_to_mem_node(int node)
> > +{
> > +	return node;
> > +}
> > +#endif
> > +
> >  #ifndef cpu_to_mem
> >  static inline int cpu_to_mem(int cpu)
> >  {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 18cee0d4c8a2..0883c42936d4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
> >   */
> >  DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
> >  EXPORT_PER_CPU_SYMBOL(_numa_mem_);
> > +int _node_numa_mem_[MAX_NUMNODES];
> 
> How does this get updated as CPUs, memory and nodes are hot-added and
> removed?

As CPUs are added, the architecture code in the CPU bringup will update
the NUMA topology. Memory and node hotplug are still open issues; I
mentioned the former in the cover letter. I should have mentioned it in
this commit message as well.

I do notice that Lee's commit message from 7aac78988551 ("numa:
introduce numa_mem_id()- effective local memory node id") says:

"Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change
of numa_zonelist_order, or when node or memory hot-plug requires
zonelist rebuild.  Archs that support memoryless nodes will need to
initialize 'numa_mem' for secondary cpus as they're brought on-line."

And since we update the _node_numa_mem_ value on set_cpu_numa_mem()
calls, which were already needed for numa_mem_id(), we might be covered.
Testing these cases (hotplug) is next in my plans.

Thanks,
Nish

* Re: [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node
  2014-09-10  0:11       ` Andrew Morton
@ 2014-09-10  0:55         ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-10  0:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On 09.09.2014 [17:11:25 -0700], Andrew Morton wrote:
> On Tue, 9 Sep 2014 12:05:14 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > Update the SLUB code to search for partial slabs on the nearest node
> > with memory in the presence of memoryless nodes. Additionally, do not
> > count an ALLOC_NODE_MISMATCH (and do not deactivate the slab) when an
> > allocation targeted at a memoryless node goes off-node.
> > 
> > ...
> >
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1699,7 +1699,12 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >  		struct kmem_cache_cpu *c)
> >  {
> >  	void *object;
> > -	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> > +	int searchnode = node;
> > +
> > +	if (node == NUMA_NO_NODE)
> > +		searchnode = numa_mem_id();
> > +	else if (!node_present_pages(node))
> > +		searchnode = node_to_mem_node(node);
> 
> I expect a call to node_to_mem_node() will always be preceded by a test
> of node_present_pages().  Perhaps node_to_mem_node() should just do the
> node_present_pages() call itself?

Really, we don't need that test here. We could always use the result of
node_to_mem_node() in the else. If memoryless nodes are not supported
(off in .config), node_to_mem_node() trivially returns its argument. If
they are supported, it returns the correct value for all nodes.
 
It's just an optimization (premature?) since we can avoid worrying (in
this path) about memoryless nodes if the node in question has memory.

And, in fact, in __slab_alloc(), we could do the following:

...
	int searchnode = node;

	if (node != NUMA_NO_NODE)
		searchnode = node_to_mem_node(node);

	if (node != searchnode &&
		unlikely(!node_match(page, searchnode))) {

...

which would minimize the impact to non-memoryless node NUMA configs.

Does that seem better to you? I can add comments to this patch as well.

Thanks,
Nish

* Re: [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
  2014-09-10  0:47       ` Nishanth Aravamudan
@ 2014-09-10 19:06         ` Andrew Morton
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2014-09-10 19:06 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On Tue, 9 Sep 2014 17:47:23 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> On 09.09.2014 [17:11:15 -0700], Andrew Morton wrote:
> > On Tue, 9 Sep 2014 12:03:27 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> > 
> > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > 
> > > We need to determine the fallback node in the SLUB allocator if the
> > > allocation target node is a memoryless node. Without it, SLUB wrongly
> > > selects a node which has no memory and cannot use a partial slab,
> > > because of the node mismatch. The newly introduced function
> > > node_to_mem_node(X) returns the node with memory that is nearest to X:
> > > if X is a memoryless node, it returns the nearest node with memory; if
> > > X is a normal node, it returns X itself.
> > > 
> > > We will use this function in the following patch to determine the
> > > fallback node.
> > > 
> > > ...
> > >
> > > --- a/include/linux/topology.h
> > > +++ b/include/linux/topology.h
> > > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> > >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> > 
> > This comment could be updated.
> 
> Will do, do you prefer a follow-on patch or one that replaces this one?

Either is OK for me.  I always turn replacement patches into
incrementals so I (and others) can see what changed.

> > >   */
> > >  DECLARE_PER_CPU(int, _numa_mem_);
> > > +extern int _node_numa_mem_[MAX_NUMNODES];
> > >  
> > >  #ifndef set_numa_mem
> > >  static inline void set_numa_mem(int node)
> > >  {
> > >  	this_cpu_write(_numa_mem_, node);
> > > +	_node_numa_mem_[numa_node_id()] = node;
> > > +}
> > > +#endif
> > > +
> > > +#ifndef node_to_mem_node
> > > +static inline int node_to_mem_node(int node)
> > > +{
> > > +	return _node_numa_mem_[node];
> > >  }
> > 
> > A wee bit of documentation wouldn't hurt.

This?

> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
> > >   */
> > >  DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
> > >  EXPORT_PER_CPU_SYMBOL(_numa_mem_);
> > > +int _node_numa_mem_[MAX_NUMNODES];
> > 
> > How does this get updated as CPUs, memory and nodes are hot-added and
> > removed?
> 
> As CPUs are added, the architecture code in the CPU bringup will update
> the NUMA topology. Memory and node hotplug are still open issues, I
> mentioned the former in the cover letter. I should have mentioned it in
> this commit message as well.

Please define "open issue".  The computer will crash and catch fire?
If not that, then what?

> I do notice that Lee's commit message from 7aac78988551 ("numa:
> introduce numa_mem_id()- effective local memory node id"):
> 
> "Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
> This will initialize the boot cpu at boot time, and all cpus on change
> of numa_zonelist_order, or when node or memory hot-plug requires
> zonelist rebuild.  Archs that support memoryless nodes will need to
> initialize 'numa_mem' for secondary cpus as they're brought on-line."
> 
> And since we update the _node_numa_mem_ value on set_cpu_numa_mem()
> calls, which were already needed for numa_mem_id(), we might be covered.
> Testing these cases (hotplug) is next in my plans.

* Re: [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
  2014-09-10 19:06         ` Andrew Morton
@ 2014-09-10 22:49           ` Nishanth Aravamudan
  -1 siblings, 0 replies; 20+ messages in thread
From: Nishanth Aravamudan @ 2014-09-10 22:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, David Rientjes, Han Pingtian, Pekka Enberg,
	Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman,
	Anton Blanchard, Matt Mackall, Christoph Lameter, Wanpeng Li,
	Tejun Heo, Linux Memory Management List, linuxppc-dev

On 10.09.2014 [12:06:16 -0700], Andrew Morton wrote:
> On Tue, 9 Sep 2014 17:47:23 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > On 09.09.2014 [17:11:15 -0700], Andrew Morton wrote:
> > > On Tue, 9 Sep 2014 12:03:27 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> > > 
> > > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > 
> > > > We need to determine the fallback node in the SLUB allocator if the
> > > > allocation target node is a memoryless node. Without it, SLUB wrongly
> > > > selects a node which has no memory and cannot use a partial slab,
> > > > because of the node mismatch. The newly introduced function
> > > > node_to_mem_node(X) returns the node with memory that is nearest to X:
> > > > if X is a memoryless node, it returns the nearest node with memory; if
> > > > X is a normal node, it returns X itself.
> > > > 
> > > > We will use this function in the following patch to determine the
> > > > fallback node.
> > > > 
> > > > ...
> > > >
> > > > --- a/include/linux/topology.h
> > > > +++ b/include/linux/topology.h
> > > > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> > > >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> > > 
> > > This comment could be updated.
> > 
> > Will do, do you prefer a follow-on patch or one that replaces this one?
> 
> Either is OK for me.  I always turn replacement patches into
> incrementals so I (and others) can see what changed.

Ok, I'll probably send you just an incremental then myself.

> > > >   */
> > > >  DECLARE_PER_CPU(int, _numa_mem_);
> > > > +extern int _node_numa_mem_[MAX_NUMNODES];
> > > >  
> > > >  #ifndef set_numa_mem
> > > >  static inline void set_numa_mem(int node)
> > > >  {
> > > >  	this_cpu_write(_numa_mem_, node);
> > > > +	_node_numa_mem_[numa_node_id()] = node;
> > > > +}
> > > > +#endif
> > > > +
> > > > +#ifndef node_to_mem_node
> > > > +static inline int node_to_mem_node(int node)
> > > > +{
> > > > +	return _node_numa_mem_[node];
> > > >  }
> > > 
> > > A wee bit of documentation wouldn't hurt.
> 
> This?

Yep, I'll make sure it gets added.

> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
> > > >   */
> > > >  DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
> > > >  EXPORT_PER_CPU_SYMBOL(_numa_mem_);
> > > > +int _node_numa_mem_[MAX_NUMNODES];
> > > 
> > > How does this get updated as CPUs, memory and nodes are hot-added and
> > > removed?
> > 
> > As CPUs are added, the architecture code in the CPU bringup will update
> > the NUMA topology. Memory and node hotplug are still open issues, I
> > mentioned the former in the cover letter. I should have mentioned it in
> > this commit message as well.
> 
> Please define "open issue".  The computer will crash and catch fire?
> If not that, then what?

Umm, let's call it "undefined" (untested?), which is no different from
where we are today with memoryless nodes, afaict. I think going from
memoryless to memory-full probably works, but the other direction may
not (and may not be possible in the common case).

-Nish

Thread overview: 10 messages

2014-09-09 19:01 [PATCH 0/3] Improve slab consumption with memoryless nodes Nishanth Aravamudan
2014-09-09 19:03 ` [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
2014-09-09 19:05   ` [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node Nishanth Aravamudan
2014-09-09 19:06     ` [PATCH 3/3] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations") Nishanth Aravamudan
2014-09-10  0:11     ` [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node Andrew Morton
2014-09-10  0:55       ` Nishanth Aravamudan
2014-09-10  0:11   ` [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node Andrew Morton
2014-09-10  0:47     ` Nishanth Aravamudan
2014-09-10 19:06       ` Andrew Morton
2014-09-10 22:49         ` Nishanth Aravamudan