linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH 0/4] Improve slab consumption with memoryless nodes
@ 2014-08-14  0:13 Nishanth Aravamudan
  2014-08-14  0:14 ` [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14  0:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was
consumed by slabs and was marked unreclaimable. He tracked it down to
slab deactivations in the SLUB core when we allocate remotely, leading
to consistently poor efficiency whenever memoryless nodes are present.

After much discussion, Joonsoo provided a few patches that help
significantly. They don't resolve the problem altogether:

 - memory hotplug still needs testing; that is, when a memoryless node
   gains memory, we want to do the right thing
 - there are other reasons for going off-node than memoryless nodes,
   e.g., fully exhausted local nodes

Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.

The series consists of:

[1/4] topology: add support for node_to_mem_node() to determine the fallback node
[2/4] slub: fallback to node_to_mem_node() node if allocating on memoryless node

 - Joonsoo's patches to cache the nearest node with memory for each
   NUMA node

[3/4] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")

 - At Tejun's request, keep the knowledge of memoryless-node fallback in
   the allocator core.

[4/4] powerpc: reorder per-cpu NUMA information's initialization

 - Fix what appears to be a bug in when the NUMA topology information
   is stored in the powerpc initialization code.

 arch/powerpc/kernel/smp.c | 12 ++++++------
 arch/powerpc/mm/numa.c    | 13 ++++++++++---
 include/linux/topology.h  | 17 +++++++++++++++++
 kernel/kthread.c          |  2 +-
 mm/page_alloc.c           |  1 +
 mm/slub.c                 | 24 ++++++++++++++++++------
 6 files changed, 53 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node
  2014-08-14  0:13 [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
@ 2014-08-14  0:14 ` Nishanth Aravamudan
  2014-08-14 14:35   ` Christoph Lameter
  2014-08-14  0:15 ` [RFC PATCH 2/4] slub: fallback to node_to_mem_node() node if allocating on memoryless node Nishanth Aravamudan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14  0:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

We need to determine the fallback node in the SLUB allocator when the
allocation target node is a memoryless node. Without it, SLUB wrongly
selects a node that has no memory and cannot use a partial slab there
because of the node mismatch. The newly introduced function,
node_to_mem_node(X), returns a node Y with memory that is nearest to X:
if X is a memoryless node, it returns the nearest node with memory; if
X is a normal node, it returns X itself.

We will use this function in the following patch to determine the
fallback node.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: linuxppc-dev@lists.ozlabs.org

---
v2 -> v3 (Nishanth):
  Fix declaration and definition of _node_numa_mem_.
  s/node_numa_mem/node_to_mem_node/ as suggested by David Rientjes.

diff --git a/include/linux/topology.h b/include/linux/topology.h
index dda6ee521e74..909b6e43b694 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -119,11 +119,20 @@ static inline int numa_node_id(void)
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
+extern int _node_numa_mem_[MAX_NUMNODES];
 
 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
 	this_cpu_write(_numa_mem_, node);
+	_node_numa_mem_[numa_node_id()] = node;
+}
+#endif
+
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return _node_numa_mem_[node];
 }
 #endif
 
@@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu)
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
 	per_cpu(_numa_mem_, cpu) = node;
+	_node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif
 
@@ -159,6 +169,13 @@ static inline int numa_mem_id(void)
 }
 #endif
 
+#ifndef node_to_mem_node
+static inline int node_to_mem_node(int node)
+{
+	return node;
+}
+#endif
+
 #ifndef cpu_to_mem
 static inline int cpu_to_mem(int cpu)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18cee0d4c8a2..0883c42936d4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
  */
 DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
+int _node_numa_mem_[MAX_NUMNODES];
 #endif
 
 /*


* [RFC PATCH 2/4] slub: fallback to node_to_mem_node() node if allocating on memoryless node
  2014-08-14  0:13 [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
  2014-08-14  0:14 ` [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
@ 2014-08-14  0:15 ` Nishanth Aravamudan
  2014-08-14  0:16 ` [RFC PATCH 3/4] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations") Nishanth Aravamudan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14  0:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Update the SLUB code to search for partial slabs on the nearest node
with memory in the presence of memoryless nodes. Additionally, do not
consider it to be an ALLOC_NODE_MISMATCH (and deactivate the slab) when
a memoryless-node specified allocation goes off-node.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
    
---
v1 -> v2 (Nishanth):
  Add commit message
  Clean-up conditions in get_partial()

diff --git a/mm/slub.c b/mm/slub.c
index 3e8afcc07a76..497fdfed2f01 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1699,7 +1699,12 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 		struct kmem_cache_cpu *c)
 {
 	void *object;
-	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
+	int searchnode = node;
+
+	if (node == NUMA_NO_NODE)
+		searchnode = numa_mem_id();
+	else if (!node_present_pages(node))
+		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2280,11 +2285,18 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+		int searchnode = node;
+
+		if (node != NUMA_NO_NODE && !node_present_pages(node))
+			searchnode = node_to_mem_node(node);
+
+		if (unlikely(!node_match(page, searchnode))) {
+			stat(s, ALLOC_NODE_MISMATCH);
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}
 
 	/*


* [RFC PATCH 3/4] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")
  2014-08-14  0:13 [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
  2014-08-14  0:14 ` [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
  2014-08-14  0:15 ` [RFC PATCH 2/4] slub: fallback to node_to_mem_node() node if allocating on memoryless node Nishanth Aravamudan
@ 2014-08-14  0:16 ` Nishanth Aravamudan
  2014-08-14  0:17 ` [RFC PATCH 4/4] powerpc: reorder per-cpu NUMA information's initialization Nishanth Aravamudan
  2014-08-22  1:10 ` [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
  4 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14  0:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

After discussions with Tejun, we don't want to spread the use of
cpu_to_mem() (and thus knowledge of allocators/NUMA topology details)
into callers, but would rather ensure the callees correctly handle
memoryless nodes. With the previous patches ("topology: add support for
node_to_mem_node() to determine the fallback node" and "slub: fallback
to node_to_mem_node() node if allocating on memoryless node") adding and
using node_to_mem_node(), we can safely undo part of the change to the
kthread logic from 81c98869faa5 ("kthread: ensure locality of
task_struct allocations").
    
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index ef483220e855..10e489c448fe 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -369,7 +369,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 {
 	struct task_struct *p;
 
-	p = kthread_create_on_node(threadfn, data, cpu_to_mem(cpu), namefmt,
+	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
 				   cpu);
 	if (IS_ERR(p))
 		return p;


* [RFC PATCH 4/4] powerpc: reorder per-cpu NUMA information's initialization
  2014-08-14  0:13 [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
                   ` (2 preceding siblings ...)
  2014-08-14  0:16 ` [RFC PATCH 3/4] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations") Nishanth Aravamudan
@ 2014-08-14  0:17 ` Nishanth Aravamudan
  2014-08-22  1:10 ` [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
  4 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

There is currently an issue where NUMA information is used on powerpc
(and possibly ia64) before it has been read from the device tree, which
leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

On NUMA powerpc, a non-boot CPU's cpu_to_node()/cpu_to_mem() is only
accurate after start_secondary() runs (as on ia64), which is invoked
via smp_init().

Commit 6ee0578b4daae ("workqueue: mark init_workqueues() as
early_initcall()") made init_workqueues() be invoked via
do_pre_smp_initcalls(), which is obviously before the secondary
processors are online.

Additionally, the following commits changed init_workqueues() to use
cpu_to_node to determine the node to use for kthread_create_on_node:

bce903809ab3f ("workqueue: add wq_numa_tbl_len and
wq_numa_possible_cpumask[]")
f3f90ad469342 ("workqueue: determine NUMA node of workers accourding to
the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on
Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
a high number of slab deactivations
(http://www.spinics.net/lists/linux-mm/msg67489.html).

While testing memoryless nodes on PowerKVM guests with a fix to the
workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
guest topology:

    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 2
    node 0 size: 0 MB
    node 0 free: 0 MB
    node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
    node 1 size: 16336 MB
    node 1 free: 15329 MB
    node distances:
    node   0   1
      0:  10  40
      1:  40  10

the slab consumption decreases from:

    Slab:             932416 kB
    SUnreclaim:       902336 kB

to

    Slab:             395264 kB
    SUnreclaim:       359424 kB

And we see a corresponding increase in the slab efficiency from:

    slab                                   mem     objs    slabs
                                          used   active   active
    ------------------------------------------------------------
    kmalloc-16384                       337 MB   11.28%  100.00%
    task_struct                         288 MB    9.93%  100.00%

to:

    slab                                   mem     objs    slabs
                                          used   active   active
    ------------------------------------------------------------
    kmalloc-16384                        37 MB  100.00%  100.00%
    task_struct                          31 MB  100.00%  100.00%

Powerpc did not support memoryless nodes until recently (64bb80d87f01
"powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261194d
"powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption in these kinds of environments.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

---
Ben & others, one area I'm still unsure of is whether calling the NUMA
callback for all CPUs is desired. I don't know how else to get the NUMA
topology into the array easily, but I didn't test in an environment with
hotpluggable CPUs, so I'm not sure whether it will lead to errors there.
(Are there device-tree entries for the topology of CPUs that will be
plugged in later? I assume not, actually, so maybe we should keep the
logic in start_secondary() so that CPUs hotplugged later get the right
topology data.)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 1007fb802e6b..1fc8984f272e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -376,6 +376,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 					GFP_KERNEL, cpu_to_node(cpu));
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
 					GFP_KERNEL, cpu_to_node(cpu));
+		/*
+		 * numa_node_id() works after this.
+		 */
+		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
+		set_cpu_numa_mem(cpu,
+				 local_memory_node(numa_cpu_lookup_table[cpu]));
 	}
 
 	cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
@@ -723,12 +729,6 @@ void start_secondary(void *unused)
 	}
 	traverse_core_siblings(cpu, true);
 
-	/*
-	 * numa_node_id() works after this.
-	 */
-	set_numa_node(numa_cpu_lookup_table[cpu]);
-	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-
 	smp_wmb();
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d3e9a78eaed3..32341e16b8ce 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1049,7 +1049,7 @@ static void __init mark_reserved_regions_for_nid(int nid)
 
 void __init do_init_bootmem(void)
 {
-	int nid;
+	int nid, cpu;
 
 	min_low_pfn = 0;
 	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
@@ -1122,8 +1122,15 @@ void __init do_init_bootmem(void)
 
 	reset_numa_cpu_lookup_table();
 	register_cpu_notifier(&ppc64_numa_nb);
-	cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
-			  (void *)(unsigned long)boot_cpuid);
+	/*
+	 * We need the numa_cpu_lookup_table to be accurate for all
+	 * CPUs, even before we online them, so that we can use
+	 * cpu_to_{node,mem} early in boot, cf. smp_prepare_cpus().
+	 */
+	for_each_possible_cpu(cpu) {
+		cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
 +				  (void *)(unsigned long)cpu);
+	}
 }
 
 void __init paging_init(void)


* Re: [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node
  2014-08-14  0:14 ` [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
@ 2014-08-14 14:35   ` Christoph Lameter
  2014-08-14 20:06     ` Nishanth Aravamudan
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2014-08-14 14:35 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Han Pingtian, Matt Mackall, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, David Rientjes,
	Tejun Heo, Andrew Morton, linuxppc-dev, Joonsoo Kim, Wanpeng Li,
	Anton Blanchard

On Wed, 13 Aug 2014, Nishanth Aravamudan wrote:

> +++ b/include/linux/topology.h
> @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
>   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
>   */
>  DECLARE_PER_CPU(int, _numa_mem_);
> +extern int _node_numa_mem_[MAX_NUMNODES];

Why are these variables starting with an _ ?
Maybe _numa_mem was defined that way because it is typically not defined.
We don't do this in other situations.


* Re: [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node
  2014-08-14 14:35   ` Christoph Lameter
@ 2014-08-14 20:06     ` Nishanth Aravamudan
  2014-08-22 21:52       ` Nishanth Aravamudan
  0 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-14 20:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Han Pingtian, Matt Mackall, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, David Rientjes,
	Tejun Heo, Andrew Morton, linuxppc-dev, Joonsoo Kim, Wanpeng Li,
	Anton Blanchard

On 14.08.2014 [09:35:37 -0500], Christoph Lameter wrote:
> On Wed, 13 Aug 2014, Nishanth Aravamudan wrote:
> 
> > +++ b/include/linux/topology.h
> > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> >   */
> >  DECLARE_PER_CPU(int, _numa_mem_);
> > +extern int _node_numa_mem_[MAX_NUMNODES];
> 
> Why are these variables starting with an _ ?
> Maybe _numa_mem was defined that way because it is typically not defined.
> We don't do this in other situations.

That's how it was in Joonsoo's patch and I was trying to minimize the
changes from his version (beyond making it compile). I can of course
update it to not have a prefixing _ if that's preferred.

Thanks,
Nish


* Re: [RFC PATCH 0/4] Improve slab consumption with memoryless nodes
  2014-08-14  0:13 [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
                   ` (3 preceding siblings ...)
  2014-08-14  0:17 ` [RFC PATCH 4/4] powerpc: reorder per-cpu NUMA information's initialization Nishanth Aravamudan
@ 2014-08-22  1:10 ` Nishanth Aravamudan
  2014-08-22 20:32   ` Andrew Morton
  4 siblings, 1 reply; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-22  1:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

On 13.08.2014 [17:13:01 -0700], Nishanth Aravamudan wrote:
> Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
> on ppc LPARs with memoryless nodes, a large amount of memory was
> consumed by slabs and was marked unreclaimable. He tracked it down to
> slab deactivations in the SLUB core when we allocate remotely, leading
> to consistently poor efficiency whenever memoryless nodes are present.
> 
> After much discussion, Joonsoo provided a few patches that help
> significantly. They don't resolve the problem altogether:
> 
>  - memory hotplug still needs testing; that is, when a memoryless node
>    gains memory, we want to do the right thing
>  - there are other reasons for going off-node than memoryless nodes,
>    e.g., fully exhausted local nodes
> 
> Neither case is resolved with this series, but I don't think that should
> block their acceptance, as they can be explored/resolved with follow-on
> patches.
> 
> The series consists of:
> 
> [1/4] topology: add support for node_to_mem_node() to determine the fallback node
> [2/4] slub: fallback to node_to_mem_node() node if allocating on memoryless node
> 
>  - Joonsoo's patches to cache the nearest node with memory for each
>    NUMA node
> 
> [3/4] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")
> 
>  - At Tejun's request, keep the knowledge of memoryless-node fallback in
>    the allocator core.
> 
> [4/4] powerpc: reorder per-cpu NUMA information's initialization
> 
>  - Fix what appears to be a bug in when the NUMA topology information
>    is stored in the powerpc initialization code.

Andrew & others,

I know kernel summit is going on, so I'll be patient, but was just
curious if anyone had any further comments other than Christoph's on the
naming.

Thanks,
Nish

> 
>  arch/powerpc/kernel/smp.c | 12 ++++++------
>  arch/powerpc/mm/numa.c    | 13 ++++++++++---
>  include/linux/topology.h  | 17 +++++++++++++++++
>  kernel/kthread.c          |  2 +-
>  mm/page_alloc.c           |  1 +
>  mm/slub.c                 | 24 ++++++++++++++++++------
>  6 files changed, 53 insertions(+), 16 deletions(-)


* Re: [RFC PATCH 0/4] Improve slab consumption with memoryless nodes
  2014-08-22  1:10 ` [RFC PATCH 0/4] Improve slab consumption with memoryless nodes Nishanth Aravamudan
@ 2014-08-22 20:32   ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2014-08-22 20:32 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Han Pingtian, Matt Mackall, David Rientjes, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, Tejun Heo,
	Joonsoo Kim, linuxppc-dev, Christoph Lameter, Wanpeng Li,
	Anton Blanchard

On Thu, 21 Aug 2014 18:10:11 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> I know kernel summit is going on, so I'll be patient, but was just
> curious if anyone had any further comments other than Christoph's on the
> naming.

Nope.  Please make a decision on the naming, refresh, retest and resend
and I'll get the patches queued up for review and testing.


* Re: [RFC PATCH v3 1/4] topology: add support for node_to_mem_node() to determine the fallback node
  2014-08-14 20:06     ` Nishanth Aravamudan
@ 2014-08-22 21:52       ` Nishanth Aravamudan
  0 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-08-22 21:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Han Pingtian, Matt Mackall, Pekka Enberg,
	Linux Memory Management List, Paul Mackerras, David Rientjes,
	Tejun Heo, Andrew Morton, linuxppc-dev, Joonsoo Kim, Wanpeng Li,
	Anton Blanchard

Hi Christoph,

On 14.08.2014 [13:06:56 -0700], Nishanth Aravamudan wrote:
> On 14.08.2014 [09:35:37 -0500], Christoph Lameter wrote:
> > On Wed, 13 Aug 2014, Nishanth Aravamudan wrote:
> > 
> > > +++ b/include/linux/topology.h
> > > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> > >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> > >   */
> > >  DECLARE_PER_CPU(int, _numa_mem_);
> > > +extern int _node_numa_mem_[MAX_NUMNODES];
> > 
> > Why are these variables starting with an _ ?
> > Maybe _numa_mem was defined that way because it is typically not defined.
> > We don't do this in other situations.
> 
> That's how it was in Joonsoo's patch and I was trying to minimize the
> changes from his version (beyond making it compile). I can of course
> update it to not have a prefixing _ if that's preferred.

Upon reflection, did you mean all of these variables? Would you rather I
submitted a follow-on patch that removed the prefix _? Note that
_node_numa_mem_ is also not defined if !MEMORYLESS_NODES.

-Nish


end of thread, other threads:[~2014-08-22 21:52 UTC | newest]

