linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH V2 0/3] pseries/nodes: Fix issues with memoryless nodes
@ 2017-10-18 20:08 Michael Bringmann
  2017-10-18 20:08 ` [PATCH V2 1/3] pseries/nodes: Ensure enough nodes avail for operations Michael Bringmann
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Michael Bringmann @ 2017-10-18 20:08 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Michael Bringmann, Nathan Fontenot, Michael Ellerman, John Allen,
	Tyrel Datwyler, Thomas Falcon

pseries/nodes: Ensure enough nodes avail for operations

pseries/findnodes: Find nodes with memory when booting memoryless nodes

pseries/initnodes: Ensure nodes initialized for hotplug

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Michael Bringmann (3):
  pseries/nodes: Ensure enough nodes avail for operations
  pseries/findnodes: Find nodes with memory when booting memoryless nodes
  pseries/initnodes: Ensure nodes initialized for hotplug


* [PATCH V2 1/3] pseries/nodes: Ensure enough nodes avail for operations
  2017-10-18 20:08 [PATCH V2 0/3] pseries/nodes: Fix issues with memoryless nodes Michael Bringmann
@ 2017-10-18 20:08 ` Michael Bringmann
  2017-10-18 20:09 ` [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes Michael Bringmann
  2017-10-18 20:09 ` [PATCH V2 3/3] pseries/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Bringmann @ 2017-10-18 20:08 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot,
	Tyrel Datwyler, Thomas Falcon


pseries/nodes: On pseries systems that allow hot-add of CPU or
memory resources, new resources may be inserted into nodes that
were not used for those resources at boot.  The kernel requires
every node in use to be defined and initialized.  This patch ensures
that enough nodes are defined to meet configuration requirements
both at boot and after boot.

This patch reads the entry at index min_common_depth of the device
tree property "ibm,max-associativity-domains" (the number of
allocable resources at the lowest domain level) and uses it as the
maximum number of nodes to set up as possible in the system.  Node
ids below that value are marked possible again after the instruction,

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ec098b3..f885ab7 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,36 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init find_possible_nodes(void)
+{
+	struct device_node *rtas;
+	u32 numnodes, i;
+
+	if (min_common_depth <= 0)
+		return;
+
+	rtas = of_find_node_by_path("/rtas");
+	if (!rtas)
+		return;
+
+	if (of_property_read_u32_index(rtas,
+				"ibm,max-associativity-domains",
+				min_common_depth, &numnodes))
+		goto out;
+
+	pr_info("numa: Nodes = %u (mcd = %d)\n", numnodes,
+		min_common_depth);
+
+	for (i = 0; i < numnodes; i++) {
+		if (!node_possible(i))
+			node_set(i, node_possible_map);
+	}
+
+out:
+	if (rtas)
+		of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
 	int nid, cpu;
@@ -905,12 +935,15 @@ void __init initmem_init(void)
 	memblock_dump_all();
 
 	/*
-	 * Reduce the possible NUMA nodes to the online NUMA nodes,
-	 * since we do not support node hotplug. This ensures that  we
-	 * lower the maximum NUMA node ID to what is actually present.
+	 * Modify the set of possible NUMA nodes to reflect information
+	 * available about the set of online nodes, and the set of nodes
+	 * that we expect to make use of for this platform's affinity
+	 * calculations.
 	 */
 	nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+	find_possible_nodes();
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
 


* [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes
  2017-10-18 20:08 [PATCH V2 0/3] pseries/nodes: Fix issues with memoryless nodes Michael Bringmann
  2017-10-18 20:08 ` [PATCH V2 1/3] pseries/nodes: Ensure enough nodes avail for operations Michael Bringmann
@ 2017-10-18 20:09 ` Michael Bringmann
  2017-10-19  8:56   ` Michael Ellerman
  2017-10-18 20:09 ` [PATCH V2 3/3] pseries/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
  2 siblings, 1 reply; 6+ messages in thread
From: Michael Bringmann @ 2017-10-18 20:09 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot,
	Tyrel Datwyler, Thomas Falcon


pseries/findnodes: On pseries systems which allow 'hot-add' of
resources, we may boot configurations that have CPUs, but no memory
associated to a node by the affinity calculations.  Previously, the
software took a shortcut to collapse initialization and references
to such memoryless nodes with other nodes that did have memory
associated with them at boot.  This patch is based on fixes that
allow the proper initialization and distinguishment of memoryless
and memory-plus nodes after NUMA initialization.  It extends the
use of the 'node_to_mem_node()' API from 'topology.h' to modules
that are allocating node-specific memory at boot, and allows such
references to find available memory in another node.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 block/blk-mq-cpumap.c |    3 ++-
 mm/page_alloc.c       |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc..a27a31f 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -73,7 +73,8 @@ int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
 
 	for_each_possible_cpu(i) {
 		if (index == mq_map[i])
-			return local_memory_node(cpu_to_node(i));
+			return local_memory_node(
+					node_to_mem_node(cpu_to_node(i)));
 	}
 
 	return NUMA_NO_NODE;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c..e7aaa2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4188,6 +4188,7 @@ struct page *
 
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
+	preferred_nid = node_to_mem_node(preferred_nid);
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
 		return NULL;
 


* [PATCH V2 3/3] pseries/initnodes: Ensure nodes initialized for hotplug
  2017-10-18 20:08 [PATCH V2 0/3] pseries/nodes: Fix issues with memoryless nodes Michael Bringmann
  2017-10-18 20:08 ` [PATCH V2 1/3] pseries/nodes: Ensure enough nodes avail for operations Michael Bringmann
  2017-10-18 20:09 ` [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes Michael Bringmann
@ 2017-10-18 20:09 ` Michael Bringmann
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Bringmann @ 2017-10-18 20:09 UTC
  To: linuxppc-dev, linux-kernel
  Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot,
	Tyrel Datwyler, Thomas Falcon

pseries/initnodes: On pseries systems that allow hot-add of CPUs,
new resources may be inserted into nodes that were not used for
memory resources at boot.  Many different configurations of PowerPC
resources may need to be supported, depending on the environment.
This patch fixes some problems encountered at runtime with
configurations that support memoryless nodes, or that hot-add
resources after boot.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f885ab7..2be6363 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	nid = of_node_to_nid_single(cpu);
 
 out_present:
-	if (nid < 0 || !node_online(nid))
+	if (nid < 0 || !node_possible(nid))
 		nid = first_online_node;
 
 	map_cpu_to_node(lcpu, nid);
@@ -1311,6 +1311,25 @@ static long vphn_get_associativity(unsigned long cpu,
 	return rc;
 }
 
+static int verify_node_preparation(int nid)
+{
+	/*
+	 * Need to allocate/initialize NODE_DATA from a node with
+	 * memory (see memblock_alloc_try_nid).  Code executed after
+	 * boot (like local_memory_node) often does not know enough
+	 * to recover fully for memoryless nodes.
+	 */
+	if (NODE_DATA(nid) == NULL)
+		setup_node_data(nid, 0, 0);
+
+	if (NODE_DATA(nid)->node_spanned_pages == 0) {
+		if (try_online_node(nid))
+			return first_online_node;
+	}
+
+	return nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1334,7 +1353,7 @@ static int update_cpu_topology(void *data)
 		unmap_cpu_from_node(cpu);
 		map_cpu_to_node(cpu, new_nid);
 		set_cpu_numa_node(cpu, new_nid);
-		set_cpu_numa_mem(cpu, local_memory_node(new_nid));
+		set_cpu_numa_mem(cpu, local_memory_node(node_to_mem_node(new_nid)));
 		vdso_getcpu_init();
 	}
 
@@ -1419,9 +1438,11 @@ int numa_update_cpu_topology(bool cpus_locked)
 		/* Use associativity from first thread for all siblings */
 		vphn_get_associativity(cpu, associativity);
 		new_nid = associativity_to_nid(associativity);
-		if (new_nid < 0 || !node_online(new_nid))
+		if (new_nid < 0 || !node_possible(new_nid))
 			new_nid = first_online_node;
 
+		new_nid = verify_node_preparation(new_nid);
+
 		if (new_nid == numa_cpu_lookup_table[cpu]) {
 			cpumask_andnot(&cpu_associativity_changes_mask,
 					&cpu_associativity_changes_mask,


* Re: [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes
  2017-10-18 20:09 ` [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes Michael Bringmann
@ 2017-10-19  8:56   ` Michael Ellerman
  2017-11-15 17:38     ` Michael Bringmann
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2017-10-19  8:56 UTC
  To: Michael Bringmann, linuxppc-dev, linux-kernel
  Cc: Michael Bringmann, John Allen, Nathan Fontenot, Tyrel Datwyler,
	Thomas Falcon

Hi Michael,

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> pseries/findnodes: On pseries systems which allow 'hot-add' of

This isn't a powerpc or pseries patch, so the subject/prefix is wrong.

Also because you're changing generic code you need to provide an
explanation that makes sense in general, across all architectures, not
just in terms of what the pseries platform does.

> resources, we may boot configurations that have CPUs, but no memory
> associated to a node by the affinity calculations.

This is called a "memory-less node" and is understood by the generic
code.

> Previously, the
> software took a shortcut to collapse initialization and references

What software? What shortcut?

> to such memoryless nodes with other nodes that did have memory
> associated with them at boot.  This patch is based on fixes that

What fixes?

> allow the proper initialization and distinguishment of memoryless
> and memory-plus nodes after NUMA initialization.

What exactly is improper about the current code?

> It extends the 
> use of the 'node_to_mem_node()' API from 'topology.h' to modules

The term "modules" has a specific meaning in Linux which is not correct
here. We would just say "in two functions" or "in two files".

> that are allocating node-specific memory at boot, and allows such
> references to find available memory in another node.


> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc..a27a31f 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -73,7 +73,8 @@ int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
>  
>  	for_each_possible_cpu(i) {
>  		if (index == mq_map[i])
> -			return local_memory_node(cpu_to_node(i));
> +			return local_memory_node(
> +					node_to_mem_node(cpu_to_node(i)));

What is this trying to do?

local_memory_node() is supposed to return a "local" node for nodes with
no memory.

And in fact the comment says:

  * Used for initializing percpu 'numa_mem'

Which is what we do:

	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));

And is what's returned by node_to_mem_node():

  static inline void set_numa_mem(int node)
  {
  	this_cpu_write(_numa_mem_, node);
  	_node_numa_mem_[numa_node_id()] = node;
  }
  
  static inline int node_to_mem_node(int node)
  {
  	return _node_numa_mem_[node];
  }

So your change effectively ends up doing:

	return local_memory_node(local_memory_node(cpu_to_node(i)));

Which doesn't look right.


cheers


* Re: [PATCH V2 2/3] pseries/findnodes: Find nodes with memory for memoryless nodes
  2017-10-19  8:56   ` Michael Ellerman
@ 2017-11-15 17:38     ` Michael Bringmann
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Bringmann @ 2017-11-15 17:38 UTC
  To: Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: John Allen, Nathan Fontenot, Tyrel Datwyler, Thomas Falcon

Hello:
    Sorry for the out-of-date description.  This entire patch has been
removed from subsequent patch sets.  All changes to correct powerpc
memoryless nodes will be confined to powerpc-specific code.
    Regards,
Michael

On 10/19/2017 03:56 AM, Michael Ellerman wrote:
> Hi Michael,
> 
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> pseries/findnodes: On pseries systems which allow 'hot-add' of
> 
> This isn't a powerpc or pseries patch, so the subject/prefix is wrong.
> 
> Also because you're changing generic code you need to provide an
> explanation that makes sense in general, across all architectures, not
> just in terms of what the pseries platform does.
> 
>> resources, we may boot configurations that have CPUs, but no memory
>> associated to a node by the affinity calculations.
> 
> This is called a "memory-less node" and is understood by the generic
> code.
> 
>> Previously, the
>> software took a shortcut to collapse initialization and references
> 
> What software? What shortcut?
> 
>> to such memoryless nodes with other nodes that did have memory
>> associated with them at boot.  This patch is based on fixes that
> 
> What fixes?
> 
>> allow the proper initialization and distinguishment of memoryless
>> and memory-plus nodes after NUMA initialization.
> 
> What exactly is improper about the current code?
> 
>> It extends the 
>> use of the 'node_to_mem_node()' API from 'topology.h' to modules
> 
> The term "modules" has a specific meaning in Linux which is not correct
> here. We would just say "in two functions" or "in two files".
> 
>> that are allocating node-specific memory at boot, and allows such
>> references to find available memory in another node.
> 
> 
>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>> index 9f8cffc..a27a31f 100644
>> --- a/block/blk-mq-cpumap.c
>> +++ b/block/blk-mq-cpumap.c
>> @@ -73,7 +73,8 @@ int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
>>  
>>  	for_each_possible_cpu(i) {
>>  		if (index == mq_map[i])
>> -			return local_memory_node(cpu_to_node(i));
>> +			return local_memory_node(
>> +					node_to_mem_node(cpu_to_node(i)));
> 
> What is this trying to do?
> 
> local_memory_node() is supposed to return a "local" node for nodes with
> no memory.
> 
> And in fact the comment says:
> 
>   * Used for initializing percpu 'numa_mem'
> 
> Which is what we do:
> 
> 	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> 
> And is what's returned by node_to_mem_node():
> 
>   static inline void set_numa_mem(int node)
>   {
>   	this_cpu_write(_numa_mem_, node);
>   	_node_numa_mem_[numa_node_id()] = node;
>   }
>   
>   static inline int node_to_mem_node(int node)
>   {
>   	return _node_numa_mem_[node];
>   }
> 
> So your change effectively ends up doing:
> 
> 	return local_memory_node(local_memory_node(cpu_to_node(i)));
> 
> Which doesn't look right.
> 
> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com
