linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map
@ 2015-03-05 18:05 Nishanth Aravamudan
  2015-03-05 21:16 ` David Rientjes
  2015-03-05 22:13 ` [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map Tejun Heo
  0 siblings, 2 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2015-03-05 18:05 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Raghavendra K T, Paul Mackerras, Anton Blanchard, David Rientjes,
	Tejun Heo, linuxppc-dev

Raghu noticed an issue with excessive memory allocation on power with a
simple cgroup test, specifically in mem_cgroup_css_alloc ->
for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing
up the kmalloc-2048 slab (on the order of 200MB for 400 cgroup
directories).

The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes
possible), which defines node_possible_map, which in turn defines the
iteration of for_each_node.
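
As a rough check of the numbers above, the arithmetic can be sketched in
plain userspace C. This is only an illustration: percgroup_slab_bytes()
is a hypothetical helper, not a kernel function, and it assumes each
cgroup takes exactly one kmalloc-2048 object per node that for_each_node
visits.

```c
#include <stddef.h>

/* Values taken from the message above. */
#define NODES_SHIFT      8                      /* powerpc setting cited above */
#define MAX_NUMNODES     (1UL << NODES_SHIFT)   /* 256 possible node IDs */
#define SLAB_OBJECT_SIZE 2048UL                 /* kmalloc-2048 object size */

/*
 * Hypothetical helper (not a kernel API): bytes consumed if every
 * cgroup allocates one slab object for each node that is iterated.
 */
static unsigned long percgroup_slab_bytes(unsigned long nr_cgroups,
					  unsigned long nr_nodes_iterated)
{
	return nr_cgroups * nr_nodes_iterated * SLAB_OBJECT_SIZE;
}
```

Under that assumption, percgroup_slab_bytes(400, MAX_NUMNODES) comes to
209715200 bytes (200 MiB), which matches the reported figure, while the
same 400 cgroups on a hypothetical two-node system would need 1638400
bytes.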

In practice, we never see a system with 256 NUMA nodes, and in fact, we
do not support node hotplug on power in the first place, so the nodes
that are online when we come up are the nodes that will be present for
the lifetime of this kernel. So let's, at least, drop the NUMA possible
map down to the online map at runtime. This is similar to what x86 does
in its initialization routines.

One could alternatively AND node_possible_map with node_online_map
(nodes_and()), but I think the cost of ANDing the two will always be
higher than zeroing the map and setting a few bits in practice.
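
To illustrate that the two approaches end in the same state, here is a
userspace sketch. Everything prefixed sim_ is a hypothetical stand-in
for the kernel's nodemask helpers, and the sketch assumes the online map
is a subset of the possible map. It says nothing about the relative
cost, only about the result.

```c
#include <stdint.h>
#include <string.h>

#define SIM_MAX_NODES 256
#define SIM_WORDS     (SIM_MAX_NODES / 64)

/* Hypothetical userspace model of a node bitmap. */
struct sim_nodemask {
	uint64_t bits[SIM_WORDS];
};

static void sim_nodes_clear(struct sim_nodemask *m)
{
	memset(m, 0, sizeof(*m));
}

static void sim_nodes_setall(struct sim_nodemask *m)
{
	memset(m, 0xff, sizeof(*m));
}

static void sim_node_set(int nid, struct sim_nodemask *m)
{
	m->bits[nid / 64] |= (uint64_t)1 << (nid % 64);
}

static int sim_node_isset(int nid, const struct sim_nodemask *m)
{
	return (int)((m->bits[nid / 64] >> (nid % 64)) & 1);
}

static int sim_nodes_equal(const struct sim_nodemask *a,
			   const struct sim_nodemask *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Approach taken in the patch: zero the map, then set each online node. */
static void sim_possible_clear_and_set(struct sim_nodemask *possible,
				       const struct sim_nodemask *online)
{
	int nid;

	sim_nodes_clear(possible);
	for (nid = 0; nid < SIM_MAX_NODES; nid++)
		if (sim_node_isset(nid, online))
			sim_node_set(nid, possible);
}

/* Alternative: AND the possible map with the online map, word by word. */
static void sim_possible_and(struct sim_nodemask *possible,
			     const struct sim_nodemask *online)
{
	int i;

	for (i = 0; i < SIM_WORDS; i++)
		possible->bits[i] &= online->bits[i];
}
```

Starting from a fully set possible map and an online map containing,
say, nodes 0, 1 and 16, both functions leave exactly those three bits
set.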

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

---
While looking at this, I noticed that nr_node_ids seems to be a
misnomer. It is not the number of nodes but the highest possible node ID
plus one: with sparse NUMA nodes, you might have only two NUMA nodes
possible, but to make certain loops work, nr_node_ids will be, e.g., 17.
Should it be renamed?
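
The distinction in the note above can be shown with a small userspace
sketch. The sim_ functions are hypothetical, and the "highest set ID
plus one" rule is my reading of how nr_node_ids is derived, not code
from the tree.

```c
#include <stdbool.h>

#define SIM_MAX_NODES 256

/* Hypothetical: count how many node IDs are marked possible. */
static int sim_num_possible_nodes(const bool possible[SIM_MAX_NODES])
{
	int nid, count = 0;

	for (nid = 0; nid < SIM_MAX_NODES; nid++)
		if (possible[nid])
			count++;
	return count;
}

/*
 * Hypothetical: model nr_node_ids as the highest possible node ID plus
 * one, so that "for (nid = 0; nid < nr_node_ids; nid++)" covers every
 * possible node even when the IDs are sparse. Returns 0 if no node is
 * marked possible.
 */
static int sim_nr_node_ids(const bool possible[SIM_MAX_NODES])
{
	int nid, highest = -1;

	for (nid = 0; nid < SIM_MAX_NODES; nid++)
		if (possible[nid])
			highest = nid;
	return highest + 1;
}
```

With only nodes 0 and 16 marked possible, the count is 2 while the
modelled nr_node_ids is 17.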

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0257a7d659ef..24de29b3651b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -958,9 +958,17 @@ void __init initmem_init(void)
 
 	memblock_dump_all();
 
+	/*
+	 * zero out the possible nodes after we parse the device-tree,
+	 * so that we lower the maximum NUMA node ID to what is actually
+	 * present.
+	 */
+	nodes_clear(node_possible_map);
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
 
+		node_set(nid, node_possible_map);
 		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 		setup_node_data(nid, start_pfn, end_pfn);
 		sparse_memory_present_with_active_regions(nid);



Thread overview: 18+ messages
2015-03-05 18:05 [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map Nishanth Aravamudan
2015-03-05 21:16 ` David Rientjes
2015-03-05 21:48   ` Michael Ellerman
2015-03-05 21:58     ` David Rientjes
2015-03-05 22:08       ` Tejun Heo
2015-03-05 22:18         ` Tejun Heo
2015-03-05 23:21         ` Nishanth Aravamudan
2015-03-05 23:24           ` Tejun Heo
2015-03-05 23:20       ` Nishanth Aravamudan
2015-03-05 23:17     ` Nishanth Aravamudan
2015-03-05 23:15   ` Nishanth Aravamudan
2015-03-05 23:29     ` David Rientjes
2015-03-06  5:27       ` [PATCH v2] powerpc/numa: set node_possible_map to only node_online_map during boot Nishanth Aravamudan
2015-03-06 11:29         ` Raghavendra K T
2015-03-09 23:55         ` Michael Ellerman
2015-03-10 23:50           ` [PATCH v3] " Nishanth Aravamudan
2015-03-05 22:13 ` [RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map Tejun Heo
2015-03-05 23:27   ` Nishanth Aravamudan
