linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/numa: Restrict possible nodes based on platform
@ 2020-07-06  6:40 Srikar Dronamraju
  2020-07-06 20:58 ` Tyrel Datwyler
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-06  6:40 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, linuxppc-dev, Srikar Dronamraju, Bharata B Rao

As per PAPR, there are 2 device tree property
ibm,max-associativity-domains (which defines the maximum number of
domains that the firmware i.e PowerVM can support) and
ibm,current-associativity-domains (which defines the maximum number of
domains that the platform can support). Value of
ibm,max-associativity-domains property is always greater than or equal
to ibm,current-associativity-domains property.

Powerpc currently uses ibm,max-associativity-domains  property while
setting the possible number of nodes. This is currently set at 32.
However the possible number of nodes for a platform may be significantly
less. Hence set the possible number of nodes based on
ibm,current-associativity-domains property.

$ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
/proc/device-tree/rtas/ibm,current-associativity-domains
		 00000005 00000001 00000002 00000002 00000002 00000010
/proc/device-tree/rtas/ibm,max-associativity-domains
		 00000005 00000001 00000008 00000020 00000020 00000100

$ cat /sys/devices/system/node/possible ##Before patch
0-31

$ cat /sys/devices/system/node/possible ##After patch
0-1

Note the maximum nodes this platform can support is only 2 but the
possible nodes is set to 32.

This is important because lot of kernel and user space code allocate
structures for all possible nodes leading to a lot of memory that is
allocated but not used.

I ran a simple experiment to create and destroy 100 memory cgroups on
boot on a 8 node machine (Power8 Alpine).

Before patch
free -k at boot
              total        used        free      shared  buff/cache   available
Mem:      523498176     4106816   518820608       22272      570752   516606720
Swap:       4194240           0     4194240

free -k after creating 100 memory cgroups
              total        used        free      shared  buff/cache   available
Mem:      523498176     4628416   518246464       22336      623296   516058688
Swap:       4194240           0     4194240

free -k after destroying 100 memory cgroups
              total        used        free      shared  buff/cache   available
Mem:      523498176     4697408   518173760       22400      627008   515987904
Swap:       4194240           0     4194240

After patch
free -k at boot
              total        used        free      shared  buff/cache   available
Mem:      523498176     3969472   518933888       22272      594816   516731776
Swap:       4194240           0     4194240

free -k after creating 100 memory cgroups
              total        used        free      shared  buff/cache   available
Mem:      523498176     4181888   518676096       22208      640192   516496448
Swap:       4194240           0     4194240

free -k after destroying 100 memory cgroups
              total        used        free      shared  buff/cache   available
Mem:      523498176     4232320   518619904       22272      645952   516443264
Swap:       4194240           0     4194240

Observations:
Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.

Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..3d55cef1a2dc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
 		return;
 
 	if (of_property_read_u32_index(rtas,
-				"ibm,max-associativity-domains",
+				"ibm,current-associativity-domains",
 				min_common_depth, &numnodes))
 		goto out;
 
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-06  6:40 [PATCH] powerpc/numa: Restrict possible nodes based on platform Srikar Dronamraju
@ 2020-07-06 20:58 ` Tyrel Datwyler
  2020-07-07  0:44   ` Nathan Lynch
  2020-07-07  2:50   ` Srikar Dronamraju
  2020-07-06 23:19 ` Nathan Lynch
  2020-07-07  5:02 ` Michael Ellerman
  2 siblings, 2 replies; 13+ messages in thread
From: Tyrel Datwyler @ 2020-07-06 20:58 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Nathan Lynch, linuxppc-dev, Bharata B Rao

On 7/5/20 11:40 PM, Srikar Dronamraju wrote:
> As per PAPR, there are 2 device tree property
> ibm,max-associativity-domains (which defines the maximum number of
> domains that the firmware i.e PowerVM can support) and
> ibm,current-associativity-domains (which defines the maximum number of
> domains that the platform can support). Value of
> ibm,max-associativity-domains property is always greater than or equal
> to ibm,current-associativity-domains property.
> 
> Powerpc currently uses ibm,max-associativity-domains  property while
> setting the possible number of nodes. This is currently set at 32.
> However the possible number of nodes for a platform may be significantly
> less. Hence set the possible number of nodes based on
> ibm,current-associativity-domains property.
> 
> $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> /proc/device-tree/rtas/ibm,current-associativity-domains
> 		 00000005 00000001 00000002 00000002 00000002 00000010
> /proc/device-tree/rtas/ibm,max-associativity-domains
> 		 00000005 00000001 00000008 00000020 00000020 00000100
> 
> $ cat /sys/devices/system/node/possible ##Before patch
> 0-31
> 
> $ cat /sys/devices/system/node/possible ##After patch
> 0-1
> 
> Note the maximum nodes this platform can support is only 2 but the
> possible nodes is set to 32.
> 
> This is important because lot of kernel and user space code allocate
> structures for all possible nodes leading to a lot of memory that is
> allocated but not used.
> 
> I ran a simple experiment to create and destroy 100 memory cgroups on
> boot on a 8 node machine (Power8 Alpine).
> 
> Before patch
> free -k at boot
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4106816   518820608       22272      570752   516606720
> Swap:       4194240           0     4194240
> 
> free -k after creating 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4628416   518246464       22336      623296   516058688
> Swap:       4194240           0     4194240
> 
> free -k after destroying 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4697408   518173760       22400      627008   515987904
> Swap:       4194240           0     4194240
> 
> After patch
> free -k at boot
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     3969472   518933888       22272      594816   516731776
> Swap:       4194240           0     4194240
> 
> free -k after creating 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4181888   518676096       22208      640192   516496448
> Swap:       4194240           0     4194240
> 
> free -k after destroying 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4232320   518619904       22272      645952   516443264
> Swap:       4194240           0     4194240
> 
> Observations:
> Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
> Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.
> 
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 9fcf2d195830..3d55cef1a2dc 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>  		return;
> 
>  	if (of_property_read_u32_index(rtas,
> -				"ibm,max-associativity-domains",
> +				"ibm,current-associativity-domains",

I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
firmware. You may need check that it exists and fall back to
ibm,max-associativity-domains in the event it doesn't

-Tyrel

>  				min_common_depth, &numnodes))
>  		goto out;
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-06  6:40 [PATCH] powerpc/numa: Restrict possible nodes based on platform Srikar Dronamraju
  2020-07-06 20:58 ` Tyrel Datwyler
@ 2020-07-06 23:19 ` Nathan Lynch
  2020-07-07  5:02 ` Michael Ellerman
  2 siblings, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2020-07-06 23:19 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: linuxppc-dev, Bharata B Rao

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 9fcf2d195830..3d55cef1a2dc 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>  		return;
>  
>  	if (of_property_read_u32_index(rtas,
> -				"ibm,max-associativity-domains",
> +				"ibm,current-associativity-domains",
>  				min_common_depth, &numnodes))

Looks good if ibm,current-associativity-domains[min_common_depth]
actually denotes the range of possible values, i.e. a value of 2 implies
node numbers 0 and 1. PAPR+ says it's the "number of unique values",
which isn't how I would specify the property if it's supposed to express
a range. But it's probably OK... 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-06 20:58 ` Tyrel Datwyler
@ 2020-07-07  0:44   ` Nathan Lynch
  2020-07-07  2:53     ` Srikar Dronamraju
  2020-07-07  2:50   ` Srikar Dronamraju
  1 sibling, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2020-07-07  0:44 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, Srikar Dronamraju, Bharata B Rao

Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>>  		return;
>> 
>>  	if (of_property_read_u32_index(rtas,
>> -				"ibm,max-associativity-domains",
>> +				"ibm,current-associativity-domains",
>
> I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
> firmware. You may need check that it exists and fall back to
> ibm,max-associativity-domains in the event it doesn't

Yes. Looks like it's a PowerVM-specific property.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-06 20:58 ` Tyrel Datwyler
  2020-07-07  0:44   ` Nathan Lynch
@ 2020-07-07  2:50   ` Srikar Dronamraju
  1 sibling, 0 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-07  2:50 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: Nathan Lynch, linuxppc-dev, Bharata B Rao

* Tyrel Datwyler <tyreld@linux.ibm.com> [2020-07-06 13:58:42]:

> On 7/5/20 11:40 PM, Srikar Dronamraju wrote:
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 9fcf2d195830..3d55cef1a2dc 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
> >  		return;
> > 
> >  	if (of_property_read_u32_index(rtas,
> > -				"ibm,max-associativity-domains",
> > +				"ibm,current-associativity-domains",
> 
> I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
> firmware. You may need check that it exists and fall back to
> ibm,max-associativity-domains in the event it doesn't
> 
> -Tyrel
> 

Oh, 
thanks Tyrel thats a good observation.
Will fallback on max-associativity.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-07  0:44   ` Nathan Lynch
@ 2020-07-07  2:53     ` Srikar Dronamraju
  0 siblings, 0 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-07  2:53 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Tyrel Datwyler, linuxppc-dev, Bharata B Rao

* Nathan Lynch <nathanl@linux.ibm.com> [2020-07-06 19:44:31]:

> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> >> --- a/arch/powerpc/mm/numa.c
> >> +++ b/arch/powerpc/mm/numa.c
> >> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
> >>  		return;
> >> 
> >>  	if (of_property_read_u32_index(rtas,
> >> -				"ibm,max-associativity-domains",
> >> +				"ibm,current-associativity-domains",
> >
> > I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
> > firmware. You may need check that it exists and fall back to
> > ibm,max-associativity-domains in the event it doesn't
> 
> Yes. Looks like it's a PowerVM-specific property.

Right, this is just a PowerVM specific property and doesn't affect PowerNV.
On PowerNV thought we have sparse nodes, the max possible nodes is set
correctly.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-06  6:40 [PATCH] powerpc/numa: Restrict possible nodes based on platform Srikar Dronamraju
  2020-07-06 20:58 ` Tyrel Datwyler
  2020-07-06 23:19 ` Nathan Lynch
@ 2020-07-07  5:02 ` Michael Ellerman
  2020-07-07  8:42   ` Srikar Dronamraju
  2 siblings, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2020-07-07  5:02 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, linuxppc-dev, Srikar Dronamraju, Bharata B Rao

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> As per PAPR, there are 2 device tree property
> ibm,max-associativity-domains (which defines the maximum number of
> domains that the firmware i.e PowerVM can support) and
> ibm,current-associativity-domains (which defines the maximum number of
> domains that the platform can support). Value of
> ibm,max-associativity-domains property is always greater than or equal
> to ibm,current-associativity-domains property.

Where is it documented?

It's definitely not in LoPAPR.

> Powerpc currently uses ibm,max-associativity-domains  property while
> setting the possible number of nodes. This is currently set at 32.
> However the possible number of nodes for a platform may be significantly
> less. Hence set the possible number of nodes based on
> ibm,current-associativity-domains property.
>
> $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> /proc/device-tree/rtas/ibm,current-associativity-domains
> 		 00000005 00000001 00000002 00000002 00000002 00000010
> /proc/device-tree/rtas/ibm,max-associativity-domains
> 		 00000005 00000001 00000008 00000020 00000020 00000100
>
> $ cat /sys/devices/system/node/possible ##Before patch
> 0-31
>
> $ cat /sys/devices/system/node/possible ##After patch
> 0-1
>
> Note the maximum nodes this platform can support is only 2 but the
> possible nodes is set to 32.

But what about LPM to a system with more nodes?

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-07  5:02 ` Michael Ellerman
@ 2020-07-07  8:42   ` Srikar Dronamraju
  2020-07-10 17:41     ` Nathan Lynch
  0 siblings, 1 reply; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-07  8:42 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, linuxppc-dev, Bharata B Rao

* Michael Ellerman <mpe@ellerman.id.au> [2020-07-07 15:02:17]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> > As per PAPR, there are 2 device tree property
> > ibm,max-associativity-domains (which defines the maximum number of
> > domains that the firmware i.e PowerVM can support) and
> > ibm,current-associativity-domains (which defines the maximum number of
> > domains that the platform can support). Value of
> > ibm,max-associativity-domains property is always greater than or equal
> > to ibm,current-associativity-domains property.
> 
> Where is it documented?
> 
> It's definitely not in LoPAPR.
> 

https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
Page number 833.

which says 
ibm,current-associativity-domains”
	property name to define the current number of associativity
	domains for this platform.
	prop-encoded-array: An associativity list such that all values are
	the number of unique values that the current platform supports
	in that location. The associativity list consisting of a number of
	entries integer (N) encoded as with encode-int followed by N
	integers encoded as with encode-int each representing current
	number of unique associativity domains the platform supports at
	that level.

> > Powerpc currently uses ibm,max-associativity-domains  property while
> > setting the possible number of nodes. This is currently set at 32.
> > However the possible number of nodes for a platform may be significantly
> > less. Hence set the possible number of nodes based on
> > ibm,current-associativity-domains property.
> >
> > $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> > /proc/device-tree/rtas/ibm,current-associativity-domains
> > 		 00000005 00000001 00000002 00000002 00000002 00000010
> > /proc/device-tree/rtas/ibm,max-associativity-domains
> > 		 00000005 00000001 00000008 00000020 00000020 00000100
> >
> > $ cat /sys/devices/system/node/possible ##Before patch
> > 0-31
> >
> > $ cat /sys/devices/system/node/possible ##After patch
> > 0-1
> >
> > Note the maximum nodes this platform can support is only 2 but the
> > possible nodes is set to 32.
> 
> But what about LPM to a system with more nodes?
> 

I have very less info on LPM, so I checked with Nathan Lynch before posting
and as per Nathan in the current design of LPM, Linux wouldn't use the new
node numbers. 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
  2020-07-07  8:42   ` Srikar Dronamraju
@ 2020-07-10 17:41     ` Nathan Lynch
  2020-07-15 12:05       ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Srikar Dronamraju
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2020-07-10 17:41 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: linuxppc-dev, Bharata B Rao

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Michael Ellerman <mpe@ellerman.id.au> [2020-07-07 15:02:17]:
>
>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> > $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
>> > /proc/device-tree/rtas/ibm,current-associativity-domains
>> > 		 00000005 00000001 00000002 00000002 00000002 00000010
>> > /proc/device-tree/rtas/ibm,max-associativity-domains
>> > 		 00000005 00000001 00000008 00000020 00000020 00000100
>> >
>> > $ cat /sys/devices/system/node/possible ##Before patch
>> > 0-31
>> >
>> > $ cat /sys/devices/system/node/possible ##After patch
>> > 0-1
>> >
>> > Note the maximum nodes this platform can support is only 2 but the
>> > possible nodes is set to 32.
>> 
>> But what about LPM to a system with more nodes?
>> 
>
> I have very less info on LPM, so I checked with Nathan Lynch before posting
> and as per Nathan in the current design of LPM, Linux wouldn't use the new
> node numbers.

(I see a v2 has been posted already but I needed a little time to check
with our hypervisor people on this point.)

It's less of a design and more of a least-bad option in the absence of a
more flexible NUMA architecture in Linux.

For now, the "rule" with LPM and NUMA has to be that Linux uses the NUMA
information from the device tree that it was booted with, and it must
disregard ibm,associativity and similar information after LPM or certain
other platform events. Historically there has been code that tried to
honor changes in NUMA information but it caused much worse problems than
degraded performance. That code has been disabled by default since last
year and is now subject to removal:

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=182897

Most NUMA-aware code happens to follow that rule because the device tree
associativity information tends to get cached on first access in Linux's
own data structures. It all feels a little fragile to me though,
especially since we can DLPAR add processors and memory after LPM with
"new" associativity properties which don't relate to the logical
topology Linux has already built. However, on currently available hw, as
long as we're using ibm,max-associativity-domains to limit the set of
possible nodes, I believe such resources will always receive valid (but
possibly suboptimal) NUMA assignments. That's because as of this
writing ibm,max-associativity-domains has the same contents on all
currently available PowerVM systems.

Now if we change to using ibm,current-associativity-domains, which we
*can* expect to differ between differently configured systems, post-LPM
DLPAR additions can yield resources with node assignments that
fall outside of the possible range, especially when we've migrated from
a smaller system to a larger one.

Is the current code robust against that possibility? I don't think so:
it looks like of_node_to_nid_single(), of_drconf_to_nid_single() and
possibly more code need to guard against this in order to prevent
NODE_DATA() null dereferences and the like. Probably those functions
should be made to clamp the nid assignment at num_possible_nodes()
instead of MAX_NUMNODES, which strikes me as more correct regardless of
your patch.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes
  2020-07-10 17:41     ` Nathan Lynch
@ 2020-07-15 12:05       ` Srikar Dronamraju
  2020-07-15 12:05         ` [PATCH 2/2] powerpc/numa: Remove a redundant variable Srikar Dronamraju
  2020-07-22  3:14         ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Nathan Lynch
  0 siblings, 2 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-15 12:05 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Tyrel Datwyler, linuxppc-dev, Srikar Dronamraju,
	Michael Ellerman

MAX_NUMNODES is a theoretical maximum number of nodes thats is supported
by the kernel. Device tree properties exposes the number of possible
nodes on the current platform. The kernel would detected this and would
use it for most of its resource allocations.  If the platform now
increases the nodes to over what was already exposed, then it may lead
to inconsistencies. Hence limit it to the already exposed nodes.

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8ec7ff05ae47..a2c5fe0d0cad 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -221,7 +221,7 @@ static void initialize_distance_lookup_table(int nid,
 	}
 }
 
-/* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
+/* Returns nid in the range [0..num_possible_nodes()-1], or -1 if no useful numa
  * info is found.
  */
 static int associativity_to_nid(const __be32 *associativity)
@@ -235,7 +235,7 @@ static int associativity_to_nid(const __be32 *associativity)
 		nid = of_read_number(&associativity[min_common_depth], 1);
 
 	/* POWER4 LPAR uses 0xffff as invalid node */
-	if (nid == 0xffff || nid >= MAX_NUMNODES)
+	if (nid == 0xffff || nid >= num_possible_nodes())
 		nid = NUMA_NO_NODE;
 
 	if (nid > 0 &&
@@ -448,7 +448,7 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 		index = lmb->aa_index * aa.array_sz + min_common_depth - 1;
 		nid = of_read_number(&aa.arrays[index], 1);
 
-		if (nid == 0xffff || nid >= MAX_NUMNODES)
+		if (nid == 0xffff || nid >= num_possible_nodes())
 			nid = default_nid;
 
 		if (nid > 0) {
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] powerpc/numa: Remove a redundant variable
  2020-07-15 12:05       ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Srikar Dronamraju
@ 2020-07-15 12:05         ` Srikar Dronamraju
  2020-07-22  3:28           ` Nathan Lynch
  2020-07-22  3:14         ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Nathan Lynch
  1 sibling, 1 reply; 13+ messages in thread
From: Srikar Dronamraju @ 2020-07-15 12:05 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Tyrel Datwyler, linuxppc-dev, Srikar Dronamraju,
	Michael Ellerman

In of_drconf_to_nid_single() default_nid always refers to NUMA_NO_NODE.
Hence replace it with NUMA_NO_NODE.

No functional changes.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a2c5fe0d0cad..b066d89c2975 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -432,16 +432,15 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
 static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 {
 	struct assoc_arrays aa = { .arrays = NULL };
-	int default_nid = NUMA_NO_NODE;
-	int nid = default_nid;
+	int nid = NUMA_NO_NODE;
 	int rc, index;
 
 	if ((min_common_depth < 0) || !numa_enabled)
-		return default_nid;
+		return NUMA_NO_NODE;
 
 	rc = of_get_assoc_arrays(&aa);
 	if (rc)
-		return default_nid;
+		return NUMA_NO_NODE;
 
 	if (min_common_depth <= aa.array_sz &&
 	    !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) {
@@ -449,7 +448,7 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 		nid = of_read_number(&aa.arrays[index], 1);
 
 		if (nid == 0xffff || nid >= num_possible_nodes())
-			nid = default_nid;
+			nid = NUMA_NO_NODE;
 
 		if (nid > 0) {
 			index = lmb->aa_index * aa.array_sz;
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes
  2020-07-15 12:05       ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Srikar Dronamraju
  2020-07-15 12:05         ` [PATCH 2/2] powerpc/numa: Remove a redundant variable Srikar Dronamraju
@ 2020-07-22  3:14         ` Nathan Lynch
  1 sibling, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2020-07-22  3:14 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Tyrel Datwyler, linuxppc-dev, Srikar Dronamraju, Michael Ellerman

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> MAX_NUMNODES is a theoretical maximum number of nodes thats is supported
> by the kernel. Device tree properties exposes the number of possible
> nodes on the current platform. The kernel would detected this and would
> use it for most of its resource allocations.  If the platform now
> increases the nodes to over what was already exposed, then it may lead
> to inconsistencies. Hence limit it to the already exposed nodes.
>
> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8ec7ff05ae47..a2c5fe0d0cad 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -221,7 +221,7 @@ static void initialize_distance_lookup_table(int nid,
>  	}
>  }
>  
> -/* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
> +/* Returns nid in the range [0..num_possible_nodes()-1], or -1 if no useful numa
>   * info is found.
>   */
>  static int associativity_to_nid(const __be32 *associativity)
> @@ -235,7 +235,7 @@ static int associativity_to_nid(const __be32 *associativity)
>  		nid = of_read_number(&associativity[min_common_depth], 1);
>  
>  	/* POWER4 LPAR uses 0xffff as invalid node */
> -	if (nid == 0xffff || nid >= MAX_NUMNODES)
> +	if (nid == 0xffff || nid >= num_possible_nodes())
>  		nid = NUMA_NO_NODE;
>  
>  	if (nid > 0 &&
> @@ -448,7 +448,7 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>  		index = lmb->aa_index * aa.array_sz + min_common_depth - 1;
>  		nid = of_read_number(&aa.arrays[index], 1);
>  
> -		if (nid == 0xffff || nid >= MAX_NUMNODES)
> +		if (nid == 0xffff || nid >= num_possible_nodes())
>  			nid = default_nid;
>  
>  		if (nid > 0) {

Yes, looks fine to me.

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/numa: Remove a redundant variable
  2020-07-15 12:05         ` [PATCH 2/2] powerpc/numa: Remove a redundant variable Srikar Dronamraju
@ 2020-07-22  3:28           ` Nathan Lynch
  0 siblings, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2020-07-22  3:28 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Tyrel Datwyler, linuxppc-dev

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> In of_drconf_to_nid_single() default_nid always refers to NUMA_NO_NODE.
> Hence replace it with NUMA_NO_NODE.
>
> No functional changes.
>

...

> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index a2c5fe0d0cad..b066d89c2975 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -432,16 +432,15 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
>  static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>  {
>  	struct assoc_arrays aa = { .arrays = NULL };
> -	int default_nid = NUMA_NO_NODE;
> -	int nid = default_nid;
> +	int nid = NUMA_NO_NODE;
>  	int rc, index;
>  
>  	if ((min_common_depth < 0) || !numa_enabled)
> -		return default_nid;
> +		return NUMA_NO_NODE;
>  
>  	rc = of_get_assoc_arrays(&aa);
>  	if (rc)
> -		return default_nid;
> +		return NUMA_NO_NODE;
>  
>  	if (min_common_depth <= aa.array_sz &&
>  	    !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) {
> @@ -449,7 +448,7 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>  		nid = of_read_number(&aa.arrays[index], 1);
>  
>  		if (nid == 0xffff || nid >= num_possible_nodes())
> -			nid = default_nid;
> +			nid = NUMA_NO_NODE;
>  
>  		if (nid > 0) {
>  			index = lmb->aa_index * aa.array_sz;

Doesn't seem like an improvement to me, sorry. Admittedly it's a small
point, but a local variable named "default_nid" communicates that it's a
default or fallback value. That information is lost with this change.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-07-22  3:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06  6:40 [PATCH] powerpc/numa: Restrict possible nodes based on platform Srikar Dronamraju
2020-07-06 20:58 ` Tyrel Datwyler
2020-07-07  0:44   ` Nathan Lynch
2020-07-07  2:53     ` Srikar Dronamraju
2020-07-07  2:50   ` Srikar Dronamraju
2020-07-06 23:19 ` Nathan Lynch
2020-07-07  5:02 ` Michael Ellerman
2020-07-07  8:42   ` Srikar Dronamraju
2020-07-10 17:41     ` Nathan Lynch
2020-07-15 12:05       ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Srikar Dronamraju
2020-07-15 12:05         ` [PATCH 2/2] powerpc/numa: Remove a redundant variable Srikar Dronamraju
2020-07-22  3:28           ` Nathan Lynch
2020-07-22  3:14         ` [PATCH 1/2] powerpc/numa: Limit possible nodes to within num_possible_nodes Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).