linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes
@ 2017-09-18 18:28 Michael Bringmann
  2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
  2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Michael Bringmann, Nathan Fontenot, Michael Ellerman, John Allen

powerpc/nodes: Ensure enough nodes avail for operations.
On systems like PowerPC which allow 'hot-add' of CPU or memory
resources, it may occur that the new resources are to be inserted
into nodes that were not used for these resources at bootup.  In
the kernel, any node that is used must be defined and initialized
at boot.  This patch extracts the value of the lowest domain level
(number of allocable resources) from the "rtas" device tree property
"ibm,current-associativity-domains" or the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.

powerpc/hotplug: Fix CPU-only node bringup bug
On systems like PowerPC which allow 'hot-add' of CPU, it may occur
that the new resources are to be inserted into nodes that were not
used for memory resources at bootup.  Many different configurations
of PowerPC resources may need to be supported depending upon the
environment.  This patch fixes some problems encountered at runtime
with configurations that support memory-less nodes, but which allow
CPUs to be added at and after boot.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Michael Bringmann (2):
  powerpc/vphn: Ensure enough nodes avail for operations
  powerpc/hotplug: Fix CPU-only node bringup bug

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann
@ 2017-09-18 18:28 ` Michael Bringmann
  2017-10-16 12:33   ` Michael Ellerman
  2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot

powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
or memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized
at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the "rtas" device tree property
"ibm,current-associativity-domains" or the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ec098b3..b385cd0 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init node_associativity_setup(void)
+{
+	struct device_node *rtas;
+
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		const __be32 *prop;
+		u32 len, entries, numnodes, i;
+
+		prop = of_get_property(rtas,
+					"ibm,current-associativity-domains", &len);
+		if (!prop || len < sizeof(unsigned int)) {
+			prop = of_get_property(rtas,
+					"ibm,max-associativity-domains", &len);
+			goto endit;
+		}
+
+		entries = of_read_number(prop++, 1);
+
+		if (len < (entries * sizeof(unsigned int)))
+			goto endit;
+
+		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
+			entries = min_common_depth;
+		else
+			entries -= 1;
+
+		numnodes = of_read_number(&prop[entries], 1);
+
+		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
+			min_common_depth);
+
+		for (i = 0; i < numnodes; i++) {
+			if (!node_possible(i)) {
+				setup_node_data(i, 0, 0);
+				node_set(i, node_possible_map);
+			}
+		}
+	}
+
+endit:
+	if (rtas)
+		of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
 	int nid, cpu;
@@ -911,6 +956,8 @@ void __init initmem_init(void)
 	 */
 	nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+	node_associativity_setup();
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann
  2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
@ 2017-09-18 18:28 ` Michael Bringmann
  2017-10-16 12:54   ` Michael Ellerman
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot

powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
it may occur that the new resources are to be inserted into nodes
that were not used for memory resources at bootup.  Many different
configurations of PowerPC resources may need to be supported depending
upon the environment.  This patch fixes some problems encountered at
runtime with configurations that support memory-less nodes, but which
allow CPUs to be added at and after boot.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b385cd0..e811dd1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	nid = of_node_to_nid_single(cpu);
 
 out_present:
-	if (nid < 0 || !node_online(nid))
+	if (nid < 0 || !node_possible(nid))
 		nid = first_online_node;
 
 	map_cpu_to_node(lcpu, nid);
@@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
 	return rc;
 }
 
+static int verify_node_preparation(int nid)
+{
+	if ((NODE_DATA(nid) == NULL) ||
+	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
+		if (try_online_node(nid))
+			return first_online_node;
+	}
+
+	return nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
 		/* Use associativity from first thread for all siblings */
 		vphn_get_associativity(cpu, associativity);
 		new_nid = associativity_to_nid(associativity);
-		if (new_nid < 0 || !node_online(new_nid))
+		if (new_nid < 0 || !node_possible(new_nid))
 			new_nid = first_online_node;
 
+		new_nid = verify_node_preparation(new_nid);
+
 		if (new_nid == numa_cpu_lookup_table[cpu]) {
 			cpumask_andnot(&cpu_associativity_changes_mask,
 					&cpu_associativity_changes_mask,

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
@ 2017-10-16 12:33   ` Michael Ellerman
  2017-10-17 16:14     ` Michael Bringmann
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2017-10-16 12:33 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev, linux-kernel
  Cc: Michael Bringmann, John Allen, Nathan Fontenot

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:

> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU

This is a powerpc-only patch, so saying "systems like PowerPC" is
confusing. What you should be saying is "On pseries systems".

> or memory resources, it may occur that the new resources are to be
> inserted into nodes that were not used for these resources at bootup.
> In the kernel, any node that is used must be defined and initialized
> at boot.
>
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the "rtas" device tree property
> "ibm,current-associativity-domains" or the device tree property

What is current associativity domains? I've not heard of it, where is it
documented, and what does it mean.

Why would use the "current" set vs the "max"? I thought the whole point
was to discover the maximum possible set of nodes that could be
hotplugged.

> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to setup as possibly available in the system.  This new setting will
> override the instruction,
>
>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>
> If the property is not present at boot, no operation will be performed
> to define or enable additional nodes.
>
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index ec098b3..b385cd0 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>  }
>  
> +static void __init node_associativity_setup(void)

This should really be called "find_possible_nodes()" or something more
descriptive.

> +{
> +	struct device_node *rtas;
> +
> +	rtas = of_find_node_by_path("/rtas");
> +	if (rtas) {

If you just short-circuit that return the whole function body can be
deintented, making it significantly more readable.

ie:
+	rtas = of_find_node_by_path("/rtas");
+	if (!rtas)
+		return;

> +		const __be32 *prop;
> +		u32 len, entries, numnodes, i;
> +
> +		prop = of_get_property(rtas,
> +					"ibm,current-associativity-domains", &len);

Please don't use of_get_property() in new code, we have much better
accessors these days, which do better error checking and handle the
endian conversions for you.

In this case you'd use eg:

	u32 entries;
	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

> +		if (!prop || len < sizeof(unsigned int)) {
> +			prop = of_get_property(rtas,
> +					"ibm,max-associativity-domains", &len);
> +			goto endit;
> +		}
> +
> +		entries = of_read_number(prop++, 1);
> +
> +		if (len < (entries * sizeof(unsigned int)))
> +			goto endit;
> +
> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
> +			entries = min_common_depth;
> +		else
> +			entries -= 1;
			^
                        You can't just guess that will be the right entry.

If min_common_depth is < 0 the function should have just returned
immediately at the top.

If min_common_depth is outside the range of the property that's a buggy
device tree, you should print a warning and return.

> +		numnodes = of_read_number(&prop[entries], 1);

	u32 num_nodes;
	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
> +
> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
> +			min_common_depth);
> +
> +		for (i = 0; i < numnodes; i++) {
> +			if (!node_possible(i)) {
> +				setup_node_data(i, 0, 0);

Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
it will not be allocated node local, which sucks.

> +				node_set(i, node_possible_map);
> +			}
> +		}
> +	}
> +
> +endit:

"out" would be the normal name.

> +	if (rtas)
> +		of_node_put(rtas);
> +}
> +
>  void __init initmem_init(void)
>  {
>  	int nid, cpu;
> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>  	 */

You need to update the comment above here which is contradicted by the
new function you're adding.

>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>  
> +	node_associativity_setup();
> +
>  	for_each_online_node(nid) {
>  		unsigned long start_pfn, end_pfn;
>  

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann
@ 2017-10-16 12:54   ` Michael Ellerman
  2017-10-17 15:08     ` Michael Bringmann
  2017-11-15 18:28     ` Michael Bringmann
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Ellerman @ 2017-10-16 12:54 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev, linux-kernel
  Cc: Michael Bringmann, John Allen, Nathan Fontenot

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:

> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
> it may occur that the new resources are to be inserted into nodes
> that were not used for memory resources at bootup.  Many different
> configurations of PowerPC resources may need to be supported depending
> upon the environment.

Give me some detail please?!

> This patch fixes some problems encountered at

What problems?

> runtime with configurations that support memory-less nodes, but which
> allow CPUs to be added at and after boot.

How does it fix those problems?

> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b385cd0..e811dd1 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>  	return rc;
>  }
>  
> +static int verify_node_preparation(int nid)
> +{

I would not expect a function called "verify" ...

> +	if ((NODE_DATA(nid) == NULL) ||
> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
> +		if (try_online_node(nid))

.. to do something like online a node.

> +			return first_online_node;
> +	}
> +
> +	return nid;
> +}
> +
>  /*
>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>   * characteristics change. This function doesn't perform any locking and is
> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>  		/* Use associativity from first thread for all siblings */
>  		vphn_get_associativity(cpu, associativity);
>  		new_nid = associativity_to_nid(associativity);
> -		if (new_nid < 0 || !node_online(new_nid))
> +		if (new_nid < 0 || !node_possible(new_nid))
>  			new_nid = first_online_node;
>  
> +		new_nid = verify_node_preparation(new_nid);

You're being called part-way through CPU hotplug here, are we sure it's
safe to go and do memory hotplug from there? What's the locking
situation?

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-10-16 12:54   ` Michael Ellerman
@ 2017-10-17 15:08     ` Michael Bringmann
  2017-11-15 18:28     ` Michael Bringmann
  1 sibling, 0 replies; 13+ messages in thread
From: Michael Bringmann @ 2017-10-17 15:08 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-kernel; +Cc: John Allen, Nathan Fontenot



On 10/16/2017 07:54 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>> it may occur that the new resources are to be inserted into nodes
>> that were not used for memory resources at bootup.  Many different
>> configurations of PowerPC resources may need to be supported depending
>> upon the environment.
> 
> Give me some detail please?!

Configurations that demonstrated problems included 'memoryless' nodes
that possessed only CPUs at boot, and nodes that contained neither CPUs
nor memory at boot.  The calculations in the kernel resulted in a different
node use layout on many SAP HANA configured systems.

> 
>> This patch fixes some problems encountered at
> 
> What problems?

The previous implementation collapsed all node assignments after affinity
calculations to use only those nodes that had memory at boot.  This
resulted in calculation and configuration differences between the FSP
code and the Linux kernel.

> 
>> runtime with configurations that support memory-less nodes, but which
>> allow CPUs to be added at and after boot.
> 
> How does it fix those problems?

The change involves completing the initialization of nodes that were not
used at boot, but which were selected by VPHN affinity calculations during
subsequent hotplug operations.

Michael

> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index b385cd0..e811dd1 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>  	return rc;
>>  }
>>  
>> +static int verify_node_preparation(int nid)
>> +{
> 
> I would not expect a function called "verify" ...
> 
>> +	if ((NODE_DATA(nid) == NULL) ||
>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>> +		if (try_online_node(nid))
> 
> .. to do something like online a node.
> 
>> +			return first_online_node;
>> +	}
>> +
>> +	return nid;
>> +}
>> +
>>  /*
>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>   * characteristics change. This function doesn't perform any locking and is
>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>  		/* Use associativity from first thread for all siblings */
>>  		vphn_get_associativity(cpu, associativity);
>>  		new_nid = associativity_to_nid(associativity);
>> -		if (new_nid < 0 || !node_online(new_nid))
>> +		if (new_nid < 0 || !node_possible(new_nid))
>>  			new_nid = first_online_node;
>>  
>> +		new_nid = verify_node_preparation(new_nid);
> 
> You're being called part-way through CPU hotplug here, are we sure it's
> safe to go and do memory hotplug from there? What's the locking
> situation?
> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-10-16 12:33   ` Michael Ellerman
@ 2017-10-17 16:14     ` Michael Bringmann
  2017-10-17 17:02       ` Nathan Fontenot
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Bringmann @ 2017-10-17 16:14 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: John Allen, Nathan Fontenot, Michael Bringmann from Kernel Team

See below.

On 10/16/2017 07:33 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
> 
> This is a powerpc-only patch, so saying "systems like PowerPC" is
> confusing. What you should be saying is "On pseries systems".
> 
>> or memory resources, it may occur that the new resources are to be
>> inserted into nodes that were not used for these resources at bootup.
>> In the kernel, any node that is used must be defined and initialized
>> at boot.
>>
>> This patch extracts the value of the lowest domain level (number of
>> allocable resources) from the "rtas" device tree property
>> "ibm,current-associativity-domains" or the device tree property
> 
> What is current associativity domains? I've not heard of it, where is it
> documented, and what does it mean.
> 
> Why would use the "current" set vs the "max"? I thought the whole point
> was to discover the maximum possible set of nodes that could be
> hotplugged.
> 
>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>> to setup as possibly available in the system.  This new setting will
>> override the instruction,
>>
>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>
>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>
>> If the property is not present at boot, no operation will be performed
>> to define or enable additional nodes.
>>
>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index ec098b3..b385cd0 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>  }
>>  
>> +static void __init node_associativity_setup(void)
> 
> This should really be called "find_possible_nodes()" or something more
> descriptive.

Okay.
> 
>> +{
>> +	struct device_node *rtas;
>> +
>> +	rtas = of_find_node_by_path("/rtas");
>> +	if (rtas) {
> 
> If you just short-circuit that return the whole function body can be
> deintented, making it significantly more readable.
> 
> ie:
> +	rtas = of_find_node_by_path("/rtas");
> +	if (!rtas)
> +		return;

Okay.

> 
>> +		const __be32 *prop;
>> +		u32 len, entries, numnodes, i;
>> +
>> +		prop = of_get_property(rtas,
>> +					"ibm,current-associativity-domains", &len);
> 
> Please don't use of_get_property() in new code, we have much better
> accessors these days, which do better error checking and handle the
> endian conversions for you.
> 
> In this case you'd use eg:
> 
> 	u32 entries;
> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

The property 'ibm,current-associativity-domains' has the same format as the property
'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
however, expects it to be an integer singleton value.  Instead, it needs:

> 
>> +		if (!prop || len < sizeof(unsigned int)) {
>> +			prop = of_get_property(rtas,
>> +					"ibm,max-associativity-domains", &len);
                        if (!prop || len < sizeof(unsigned int))
>> +				goto endit;
>> +		}
>> +
>> +		entries = of_read_number(prop++, 1);
>> +
>> +		if (len < (entries * sizeof(unsigned int)))
>> +			goto endit;
>> +
>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>> +			entries = min_common_depth;
>> +		else
>> +			entries -= 1;
> 			^
>                         You can't just guess that will be the right entry.
> 
> If min_common_depth is < 0 the function should have just returned
> immediately at the top.

Okay.

> 
> If min_common_depth is outside the range of the property that's a buggy
> device tree, you should print a warning and return.
> 
>> +		numnodes = of_read_number(&prop[entries], 1);
> 
> 	u32 num_nodes;
> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>> +
>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>> +			min_common_depth);
>> +
>> +		for (i = 0; i < numnodes; i++) {
>> +			if (!node_possible(i)) {
>> +				setup_node_data(i, 0, 0);
> 
> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
> it will not be allocated node local, which sucks.

Okay.

> 
>> +				node_set(i, node_possible_map);
>> +			}
>> +		}
>> +	}
>> +
>> +endit:
> 
> "out" would be the normal name.

Okay.

> 
>> +	if (rtas)
>> +		of_node_put(rtas);
>> +}
>> +
>>  void __init initmem_init(void)
>>  {
>>  	int nid, cpu;
>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>  	 */
> 
> You need to update the comment above here which is contradicted by the
> new function you're adding.

Okay.

> 
>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>  
>> +	node_associativity_setup();
>> +
>>  	for_each_online_node(nid) {
>>  		unsigned long start_pfn, end_pfn;
>>  
> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-10-17 16:14     ` Michael Bringmann
@ 2017-10-17 17:02       ` Nathan Fontenot
  2017-10-17 17:22         ` Michael Bringmann
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Fontenot @ 2017-10-17 17:02 UTC (permalink / raw)
  To: Michael Bringmann, Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: John Allen, Michael Bringmann from Kernel Team

On 10/17/2017 11:14 AM, Michael Bringmann wrote:
> See below.
> 
> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>
>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>
>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>> confusing. What you should be saying is "On pseries systems".
>>
>>> or memory resources, it may occur that the new resources are to be
>>> inserted into nodes that were not used for these resources at bootup.
>>> In the kernel, any node that is used must be defined and initialized
>>> at boot.
>>>
>>> This patch extracts the value of the lowest domain level (number of
>>> allocable resources) from the "rtas" device tree property
>>> "ibm,current-associativity-domains" or the device tree property
>>
>> What is current associativity domains? I've not heard of it, where is it
>> documented, and what does it mean.
>>
>> Why would use the "current" set vs the "max"? I thought the whole point
>> was to discover the maximum possible set of nodes that could be
>> hotplugged.
>>
>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>> to setup as possibly available in the system.  This new setting will
>>> override the instruction,
>>>
>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>
>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>
>>> If the property is not present at boot, no operation will be performed
>>> to define or enable additional nodes.
>>>
>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 47 insertions(+)
>>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index ec098b3..b385cd0 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>  }
>>>  
>>> +static void __init node_associativity_setup(void)
>>
>> This should really be called "find_possible_nodes()" or something more
>> descriptive.
> 
> Okay.
>>
>>> +{
>>> +	struct device_node *rtas;
>>> +
>>> +	rtas = of_find_node_by_path("/rtas");
>>> +	if (rtas) {
>>
>> If you just short-circuit that return the whole function body can be
>> deintented, making it significantly more readable.
>>
>> ie:
>> +	rtas = of_find_node_by_path("/rtas");
>> +	if (!rtas)
>> +		return;
> 
> Okay.
> 
>>
>>> +		const __be32 *prop;
>>> +		u32 len, entries, numnodes, i;
>>> +
>>> +		prop = of_get_property(rtas,
>>> +					"ibm,current-associativity-domains", &len);
>>
>> Please don't use of_get_property() in new code, we have much better
>> accessors these days, which do better error checking and handle the
>> endian conversions for you.
>>
>> In this case you'd use eg:
>>
>> 	u32 entries;
>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
> 
> The property 'ibm,current-associativity-domains' has the same format as the property
> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
> however, expects it to be an integer singleton value.  Instead, it needs:

I think for this case where the property is an array of values you could use
of_property_count_elems_of_size() to get the number of elements in the array
and then use of_property_read_u32_array() to read the array.

-Nathan

> 
>>
>>> +		if (!prop || len < sizeof(unsigned int)) {
>>> +			prop = of_get_property(rtas,
>>> +					"ibm,max-associativity-domains", &len);
>                         if (!prop || len < sizeof(unsigned int))
>>> +				goto endit;
>>> +		}
>>> +
>>> +		entries = of_read_number(prop++, 1);
>>> +
>>> +		if (len < (entries * sizeof(unsigned int)))
>>> +			goto endit;
>>> +
>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>> +			entries = min_common_depth;
>>> +		else
>>> +			entries -= 1;
>> 			^
>>                         You can't just guess that will be the right entry.
>>
>> If min_common_depth is < 0 the function should have just returned
>> immediately at the top.
> 
> Okay.
> 
>>
>> If min_common_depth is outside the range of the property that's a buggy
>> device tree, you should print a warning and return.
>>
>>> +		numnodes = of_read_number(&prop[entries], 1);
>>
>> 	u32 num_nodes;
>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>> +
>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>> +			min_common_depth);
>>> +
>>> +		for (i = 0; i < numnodes; i++) {
>>> +			if (!node_possible(i)) {
>>> +				setup_node_data(i, 0, 0);
>>
>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>> it will not be allocated node local, which sucks.
> 
> Okay.
> 
>>
>>> +				node_set(i, node_possible_map);
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +endit:
>>
>> "out" would be the normal name.
> 
> Okay.
> 
>>
>>> +	if (rtas)
>>> +		of_node_put(rtas);
>>> +}
>>> +
>>>  void __init initmem_init(void)
>>>  {
>>>  	int nid, cpu;
>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>  	 */
>>
>> You need to update the comment above here which is contradicted by the
>> new function you're adding.
> 
> Okay.
> 
>>
>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>  
>>> +	node_associativity_setup();
>>> +
>>>  	for_each_online_node(nid) {
>>>  		unsigned long start_pfn, end_pfn;
>>>  
>>
>> cheers
>>
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-10-17 17:02       ` Nathan Fontenot
@ 2017-10-17 17:22         ` Michael Bringmann
  2017-10-17 17:41           ` Nathan Fontenot
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Bringmann @ 2017-10-17 17:22 UTC (permalink / raw)
  To: Nathan Fontenot, Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: John Allen, Michael Bringmann from Kernel Team



On 10/17/2017 12:02 PM, Nathan Fontenot wrote:
> On 10/17/2017 11:14 AM, Michael Bringmann wrote:
>> See below.
>>
>> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>
>>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>>
>>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>>> confusing. What you should be saying is "On pseries systems".
>>>
>>>> or memory resources, it may occur that the new resources are to be
>>>> inserted into nodes that were not used for these resources at bootup.
>>>> In the kernel, any node that is used must be defined and initialized
>>>> at boot.
>>>>
>>>> This patch extracts the value of the lowest domain level (number of
>>>> allocable resources) from the "rtas" device tree property
>>>> "ibm,current-associativity-domains" or the device tree property
>>>
>>> What is current associativity domains? I've not heard of it, where is it
>>> documented, and what does it mean.
>>>
>>> Why would use the "current" set vs the "max"? I thought the whole point
>>> was to discover the maximum possible set of nodes that could be
>>> hotplugged.
>>>
>>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>>> to setup as possibly available in the system.  This new setting will
>>>> override the instruction,
>>>>
>>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>
>>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>>
>>>> If the property is not present at boot, no operation will be performed
>>>> to define or enable additional nodes.
>>>>
>>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 47 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>> index ec098b3..b385cd0 100644
>>>> --- a/arch/powerpc/mm/numa.c
>>>> +++ b/arch/powerpc/mm/numa.c
>>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>>  }
>>>>  
>>>> +static void __init node_associativity_setup(void)
>>>
>>> This should really be called "find_possible_nodes()" or something more
>>> descriptive.
>>
>> Okay.
>>>
>>>> +{
>>>> +	struct device_node *rtas;
>>>> +
>>>> +	rtas = of_find_node_by_path("/rtas");
>>>> +	if (rtas) {
>>>
>>> If you just short-circuit that return the whole function body can be
>>> deintented, making it significantly more readable.
>>>
>>> ie:
>>> +	rtas = of_find_node_by_path("/rtas");
>>> +	if (!rtas)
>>> +		return;
>>
>> Okay.
>>
>>>
>>>> +		const __be32 *prop;
>>>> +		u32 len, entries, numnodes, i;
>>>> +
>>>> +		prop = of_get_property(rtas,
>>>> +					"ibm,current-associativity-domains", &len);
>>>
>>> Please don't use of_get_property() in new code, we have much better
>>> accessors these days, which do better error checking and handle the
>>> endian conversions for you.
>>>
>>> In this case you'd use eg:
>>>
>>> 	u32 entries;
>>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
>>
>> The property 'ibm,current-associativity-domains' has the same format as the property
>> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
>> however, expects it to be an integer singleton value.  Instead, it needs:
> 
> I think for this case where the property is an array of values you could use
> of_property_count_elems_of_size() to get the number of elements in the array
> and then use of_property_read_u32_array() to read the array.
> 
> -Nathan

We only need one value from the array which is why I am using,

>>>> +		numnodes = of_read_number(&prop[min_common_depth], 1);

With this implementation I do not need to allocate memory for
an array, nor execute code to read all elements of the array.

Michael

> 
>>
>>>
>>>> +		if (!prop || len < sizeof(unsigned int)) {
>>>> +			prop = of_get_property(rtas,
>>>> +					"ibm,max-associativity-domains", &len);
>>                         if (!prop || len < sizeof(unsigned int))
>>>> +				goto endit;
>>>> +		}
>>>> +
>>>> +		entries = of_read_number(prop++, 1);
>>>> +
>>>> +		if (len < (entries * sizeof(unsigned int)))
>>>> +			goto endit;
>>>> +
>>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>>> +			entries = min_common_depth;
>>>> +		else
>>>> +			entries -= 1;
>>> 			^
>>>                         You can't just guess that will be the right entry.
>>>
>>> If min_common_depth is < 0 the function should have just returned
>>> immediately at the top.
>>
>> Okay.
>>
>>>
>>> If min_common_depth is outside the range of the property that's a buggy
>>> device tree, you should print a warning and return.
>>>
>>>> +		numnodes = of_read_number(&prop[entries], 1);
>>>
>>> 	u32 num_nodes;
>>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>>> +
>>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>>> +			min_common_depth);
>>>> +
>>>> +		for (i = 0; i < numnodes; i++) {
>>>> +			if (!node_possible(i)) {
>>>> +				setup_node_data(i, 0, 0);
>>>
>>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>>> it will not be allocated node local, which sucks.
>>
>> Okay.
>>
>>>
>>>> +				node_set(i, node_possible_map);
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +
>>>> +endit:
>>>
>>> "out" would be the normal name.
>>
>> Okay.
>>
>>>
>>>> +	if (rtas)
>>>> +		of_node_put(rtas);
>>>> +}
>>>> +
>>>>  void __init initmem_init(void)
>>>>  {
>>>>  	int nid, cpu;
>>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>>  	 */
>>>
>>> You need to update the comment above here which is contradicted by the
>>> new function you're adding.
>>
>> Okay.
>>
>>>
>>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>  
>>>> +	node_associativity_setup();
>>>> +
>>>>  	for_each_online_node(nid) {
>>>>  		unsigned long start_pfn, end_pfn;
>>>>  
>>>
>>> cheers
>>>
>>>
>>
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
  2017-10-17 17:22         ` Michael Bringmann
@ 2017-10-17 17:41           ` Nathan Fontenot
  0 siblings, 0 replies; 13+ messages in thread
From: Nathan Fontenot @ 2017-10-17 17:41 UTC (permalink / raw)
  To: Michael Bringmann, Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: John Allen, Michael Bringmann from Kernel Team



On 10/17/2017 12:22 PM, Michael Bringmann wrote:
> 
> 
> On 10/17/2017 12:02 PM, Nathan Fontenot wrote:
>> On 10/17/2017 11:14 AM, Michael Bringmann wrote:
>>> See below.
>>>
>>> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>>
>>>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>>>
>>>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>>>> confusing. What you should be saying is "On pseries systems".
>>>>
>>>>> or memory resources, it may occur that the new resources are to be
>>>>> inserted into nodes that were not used for these resources at bootup.
>>>>> In the kernel, any node that is used must be defined and initialized
>>>>> at boot.
>>>>>
>>>>> This patch extracts the value of the lowest domain level (number of
>>>>> allocable resources) from the "rtas" device tree property
>>>>> "ibm,current-associativity-domains" or the device tree property
>>>>
>>>> What is current associativity domains? I've not heard of it, where is it
>>>> documented, and what does it mean.
>>>>
>>>> Why would use the "current" set vs the "max"? I thought the whole point
>>>> was to discover the maximum possible set of nodes that could be
>>>> hotplugged.
>>>>
>>>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>>>> to setup as possibly available in the system.  This new setting will
>>>>> override the instruction,
>>>>>
>>>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>>
>>>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>>>
>>>>> If the property is not present at boot, no operation will be performed
>>>>> to define or enable additional nodes.
>>>>>
>>>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>>>> ---
>>>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 47 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>>> index ec098b3..b385cd0 100644
>>>>> --- a/arch/powerpc/mm/numa.c
>>>>> +++ b/arch/powerpc/mm/numa.c
>>>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>>>  }
>>>>>  
>>>>> +static void __init node_associativity_setup(void)
>>>>
>>>> This should really be called "find_possible_nodes()" or something more
>>>> descriptive.
>>>
>>> Okay.
>>>>
>>>>> +{
>>>>> +	struct device_node *rtas;
>>>>> +
>>>>> +	rtas = of_find_node_by_path("/rtas");
>>>>> +	if (rtas) {
>>>>
>>>> If you just short-circuit that return the whole function body can be
>>>> deintented, making it significantly more readable.
>>>>
>>>> ie:
>>>> +	rtas = of_find_node_by_path("/rtas");
>>>> +	if (!rtas)
>>>> +		return;
>>>
>>> Okay.
>>>
>>>>
>>>>> +		const __be32 *prop;
>>>>> +		u32 len, entries, numnodes, i;
>>>>> +
>>>>> +		prop = of_get_property(rtas,
>>>>> +					"ibm,current-associativity-domains", &len);
>>>>
>>>> Please don't use of_get_property() in new code, we have much better
>>>> accessors these days, which do better error checking and handle the
>>>> endian conversions for you.
>>>>
>>>> In this case you'd use eg:
>>>>
>>>> 	u32 entries;
>>>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
>>>
>>> The property 'ibm,current-associativity-domains' has the same format as the property
>>> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
>>> however, expects it to be an integer singleton value.  Instead, it needs:
>>
>> I think for this case where the property is an array of values you could use
>> of_property_count_elems_of_size() to get the number of elements in the array
>> and then use of_property_read_u32_array() to read the array.
>>
>> -Nathan
> 
> We only need one value from the array which is why I am using,
> 
>>>>> +		numnodes = of_read_number(&prop[min_common_depth], 1);
> 
> With this implementation I do not need to allocate memory for
> an array, nor execute code to read all elements of the array.
>
> Michael

OK, I didn't see that you just needed a single value from the array.

In this case you could do

	of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
				   min_common_depth, &numnodes);

-Nathan

> 
>>
>>>
>>>>
>>>>> +		if (!prop || len < sizeof(unsigned int)) {
>>>>> +			prop = of_get_property(rtas,
>>>>> +					"ibm,max-associativity-domains", &len);
>>>                         if (!prop || len < sizeof(unsigned int))
>>>>> +				goto endit;
>>>>> +		}
>>>>> +
>>>>> +		entries = of_read_number(prop++, 1);
>>>>> +
>>>>> +		if (len < (entries * sizeof(unsigned int)))
>>>>> +			goto endit;
>>>>> +
>>>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>>>> +			entries = min_common_depth;
>>>>> +		else
>>>>> +			entries -= 1;
>>>> 			^
>>>>                         You can't just guess that will be the right entry.
>>>>
>>>> If min_common_depth is < 0 the function should have just returned
>>>> immediately at the top.
>>>
>>> Okay.
>>>
>>>>
>>>> If min_common_depth is outside the range of the property that's a buggy
>>>> device tree, you should print a warning and return.
>>>>
>>>>> +		numnodes = of_read_number(&prop[entries], 1);
>>>>
>>>> 	u32 num_nodes;
>>>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>>>> +
>>>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>>>> +			min_common_depth);
>>>>> +
>>>>> +		for (i = 0; i < numnodes; i++) {
>>>>> +			if (!node_possible(i)) {
>>>>> +				setup_node_data(i, 0, 0);
>>>>
>>>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>>>> it will not be allocated node local, which sucks.
>>>
>>> Okay.
>>>
>>>>
>>>>> +				node_set(i, node_possible_map);
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +endit:
>>>>
>>>> "out" would be the normal name.
>>>
>>> Okay.
>>>
>>>>
>>>>> +	if (rtas)
>>>>> +		of_node_put(rtas);
>>>>> +}
>>>>> +
>>>>>  void __init initmem_init(void)
>>>>>  {
>>>>>  	int nid, cpu;
>>>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>>>  	 */
>>>>
>>>> You need to update the comment above here which is contradicted by the
>>>> new function you're adding.
>>>
>>> Okay.
>>>
>>>>
>>>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>>  
>>>>> +	node_associativity_setup();
>>>>> +
>>>>>  	for_each_online_node(nid) {
>>>>>  		unsigned long start_pfn, end_pfn;
>>>>>  
>>>>
>>>> cheers
>>>>
>>>>
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-10-16 12:54   ` Michael Ellerman
  2017-10-17 15:08     ` Michael Bringmann
@ 2017-11-15 18:28     ` Michael Bringmann
  2017-11-16 16:57       ` Nathan Fontenot
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Bringmann @ 2017-11-15 18:28 UTC (permalink / raw)
  To: linuxppc-dev

Hello:
    See below.

On 10/16/2017 07:54 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>> it may occur that the new resources are to be inserted into nodes
>> that were not used for memory resources at bootup.  Many different
>> configurations of PowerPC resources may need to be supported depending
>> upon the environment.
> 
> Give me some detail please?!

The most important characteristics that I have observed are:

* Dedicated vs. shared resources.  Shared resources require information
  such as the VPHN hcall for CPU assignment to nodes.
* memoryless nodes at boot.  Nodes need to be defined as 'possible' at
  boot for operation with other code modules.  Previously, the powerpc
  code would limit the set of possible/online nodes to those which have
  memory assigned at boot.  Subsequent add/remove of CPUs or memory would
  only work with this subset of possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction on
  nodes, nodes that had CPUs but no memory were being collapsed into other
  nodes that did have memory at boot.  In practice this meant that the
  node assignment presented by the runtime kernel differed from the affinity
  and associativity attirbutes presented by the device tree or VPHN hcalls.
  Nodes that might be known to the pHyp were not 'possible' in the runtime
  kernel because they did not have memory at boot.

> 
>> This patch fixes some problems encountered at
> 
> What problems?

This patch set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in them
  are allowed to be 'possible' and 'online'.  Memory allocations for those
  nodes are taken from another node that does have memory until and if memory
  is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still be
  referenced subsequently by affinity or associativity attributes, are kept
  in the list of 'possible' nodes for powerpc.  Hot-add of memory or CPUs
  to the system can reference these nodes and bring them online instead of
  redirecting the resources to the set of nodes known to have memory at boot.

> 
>> runtime with configurations that support memory-less nodes, but which
>> allow CPUs to be added at and after boot.
> 
> How does it fix those problems?

This problem was fixed in a couple of ways.  First, the code now checks
whether the node to which a CPU is mapped by 'numa_update_cpu_topology' /
'arch_update_cpu_topology' has been initialized and has memory available.
If either test is false, a call is made to 'try_online_node()' to finish
the data structure initialization.  Only if we are unable to initialize
the node at this point will the CPU node assignment be collapsed into an
existing node.  After initialization by 'try_online_node()', calls to
'local_memory_node' no longer crash for these memoryless nodes.

> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index b385cd0..e811dd1 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>  	return rc;
>>  }
>>  
>> +static int verify_node_preparation(int nid)
>> +{
> 
> I would not expect a function called "verify" ...
> 
>> +	if ((NODE_DATA(nid) == NULL) ||
>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>> +		if (try_online_node(nid))
> 
> .. to do something like online a node.

We have changed the function name to 'find_cpu_nid'.

> 
>> +			return first_online_node;
>> +	}
>> +
>> +	return nid;
>> +}
>> +
>>  /*
>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>   * characteristics change. This function doesn't perform any locking and is
>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>  		/* Use associativity from first thread for all siblings */
>>  		vphn_get_associativity(cpu, associativity);
>>  		new_nid = associativity_to_nid(associativity);
>> -		if (new_nid < 0 || !node_online(new_nid))
>> +		if (new_nid < 0 || !node_possible(new_nid))
>>  			new_nid = first_online_node;
>>  
>> +		new_nid = verify_node_preparation(new_nid);
> 
> You're being called part-way through CPU hotplug here, are we sure it's
> safe to go and do memory hotplug from there? What's the locking
> situation?

We are not doing memory hotplug.  We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU hotplug
operation.  CPU hotplug operations are protected by a range of APIs including
cpu_maps_update_begin/cpu_maps_update_done, cpus_read/write_lock / cpus_read/write_unlock,
device locks, and more.  Memory hotplug operations, including try_online_node,
are protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more.
In the case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no overlap.
Using HMC hot-add/hot-remove operations, I have been able to add and remove
CPUs to any possible node without failures.  HMC operations involve a degree
self-serialization, though.

> 
> cheers
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-11-15 18:28     ` Michael Bringmann
@ 2017-11-16 16:57       ` Nathan Fontenot
  2017-11-16 17:36         ` Michael Bringmann
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Fontenot @ 2017-11-16 16:57 UTC (permalink / raw)
  To: linuxppc-dev



On 11/15/2017 12:28 PM, Michael Bringmann wrote:
> Hello:
>     See below.
> 
> On 10/16/2017 07:54 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>
>>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>>> it may occur that the new resources are to be inserted into nodes
>>> that were not used for memory resources at bootup.  Many different
>>> configurations of PowerPC resources may need to be supported depending
>>> upon the environment.
>>
>> Give me some detail please?!
> 
> The most important characteristics that I have observed are:
> 
> * Dedicated vs. shared resources.  Shared resources require information
>   such as the VPHN hcall for CPU assignment to nodes.
> * memoryless nodes at boot.  Nodes need to be defined as 'possible' at
>   boot for operation with other code modules.  Previously, the powerpc
>   code would limit the set of possible/online nodes to those which have
>   memory assigned at boot.  Subsequent add/remove of CPUs or memory would
>   only work with this subset of possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction on
>   nodes, nodes that had CPUs but no memory were being collapsed into other
>   nodes that did have memory at boot.  In practice this meant that the
>   node assignment presented by the runtime kernel differed from the affinity
>   and associativity attirbutes presented by the device tree or VPHN hcalls.
>   Nodes that might be known to the pHyp were not 'possible' in the runtime
>   kernel because they did not have memory at boot.
> 
>>
>>> This patch fixes some problems encountered at
>>
>> What problems?
> 
> This patch set fixes a couple of problems.
> 
> * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them
>   are allowed to be 'possible' and 'online'.  Memory allocations for those
>   nodes are taken from another node that does have memory until and if memory
>   is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still be
>   referenced subsequently by affinity or associativity attributes, are kept
>   in the list of 'possible' nodes for powerpc.  Hot-add of memory or CPUs
>   to the system can reference these nodes and bring them online instead of
>   redirecting the resources to the set of nodes known to have memory at boot.
> 
>>
>>> runtime with configurations that support memory-less nodes, but which
>>> allow CPUs to be added at and after boot.
>>
>> How does it fix those problems?
> 
> This problem was fixed in a couple of ways.  First, the code now checks
> whether the node to which a CPU is mapped by 'numa_update_cpu_topology' /
> 'arch_update_cpu_topology' has been initialized and has memory available.
> If either test is false, a call is made to 'try_online_node()' to finish
> the data structure initialization.  Only if we are unable to initialize
> the node at this point will the CPU node assignment be collapsed into an
> existing node.  After initialization by 'try_online_node()', calls to
> 'local_memory_node' no longer crash for these memoryless nodes.
> 
>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index b385cd0..e811dd1 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>>  	return rc;
>>>  }
>>>  
>>> +static int verify_node_preparation(int nid)
>>> +{
>>
>> I would not expect a function called "verify" ...
>>
>>> +	if ((NODE_DATA(nid) == NULL) ||
>>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>> +		if (try_online_node(nid))
>>
>> .. to do something like online a node.
> 
> We have changed the function name to 'find_cpu_nid'.

Ok, but I would still not expect 'find_cpu_nid' to online the node.

> 
>>
>>> +			return first_online_node;
>>> +	}
>>> +
>>> +	return nid;
>>> +}
>>> +
>>>  /*
>>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>>   * characteristics change. This function doesn't perform any locking and is
>>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>>  		/* Use associativity from first thread for all siblings */
>>>  		vphn_get_associativity(cpu, associativity);
>>>  		new_nid = associativity_to_nid(associativity);
>>> -		if (new_nid < 0 || !node_online(new_nid))
>>> +		if (new_nid < 0 || !node_possible(new_nid))
>>>  			new_nid = first_online_node;
>>>  
>>> +		new_nid = verify_node_preparation(new_nid);
>>
>> You're being called part-way through CPU hotplug here, are we sure it's
>> safe to go and do memory hotplug from there? What's the locking
>> situation?
> 
> We are not doing memory hotplug.  We are initializing a node that may be used
> by CPUs or memory before it can be referenced as invalid by a CPU hotplug
> operation.  CPU hotplug operations are protected by a range of APIs including
> cpu_maps_update_begin/cpu_maps_update_done, cpus_read/write_lock / cpus_read/write_unlock,
> device locks, and more.  Memory hotplug operations, including try_online_node,
> are protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more.
> In the case of CPUs being hot-added to a previously memoryless node, the
> try_online_node operation occurs wholly within the CPU locks with no overlap.
> Using HMC hot-add/hot-remove operations, I have been able to add and remove
> CPUs to any possible node without failures.  HMC operations involve a degree
> self-serialization, though.

For both memory and cpu DLPAR operations we hold the device hotplug lock.

-Nathan
 
> 
>>
>> cheers
>>
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
  2017-11-16 16:57       ` Nathan Fontenot
@ 2017-11-16 17:36         ` Michael Bringmann
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Bringmann @ 2017-11-16 17:36 UTC (permalink / raw)
  To: linuxppc-dev

>>>
>>>> +	if ((NODE_DATA(nid) == NULL) ||
>>>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>>> +		if (try_online_node(nid))
>>>
>>> .. to do something like online a node.
>>
>> We have changed the function name to 'find_cpu_nid'.
> 
> Ok, but I would still not expect 'find_cpu_nid' to online the node.
> 

We would have to talk to the developer that created try_online_node()
which fully initializes the node and all of the related data structures.
A few of the APIs are external, and 'numa.c' knows how to allocate the
base 'pgdat' structure, but everything else that the kernel depends
upon for a node is handled in mm/page_alloc.c and mm/hotplug_memory.c.
I was trying to avoid piecemeal changes to that code -- avoid any changes
if it comes to it.

Even if it was not expected to put the node online, it is convenient,
as otherwise the patches to 'numa.c' would have to put the node online
-- that is expected for a CPU that is online.

Regards,

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-11-16 17:37 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann
2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
2017-10-16 12:33   ` Michael Ellerman
2017-10-17 16:14     ` Michael Bringmann
2017-10-17 17:02       ` Nathan Fontenot
2017-10-17 17:22         ` Michael Bringmann
2017-10-17 17:41           ` Nathan Fontenot
2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann
2017-10-16 12:54   ` Michael Ellerman
2017-10-17 15:08     ` Michael Bringmann
2017-11-15 18:28     ` Michael Bringmann
2017-11-16 16:57       ` Nathan Fontenot
2017-11-16 17:36         ` Michael Bringmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).