linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] powerpc/numa: Return the first online node instead of 0
@ 2022-06-23 12:54 Aneesh Kumar K.V
  2022-06-23 12:54 ` [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node Aneesh Kumar K.V
  2022-06-24  8:39 ` [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Srikar Dronamraju
  0 siblings, 2 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2022-06-23 12:54 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Srikar Dronamraju

If early cpu to node mapping finds an invalid node id, return
the first online node instead of node 0.

With commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0")
the kernel marks node 0 offline in certain scenarios.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/topology.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..704088b1d53c 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -60,7 +60,7 @@ static inline int early_cpu_to_node(int cpu)
 	 * Fall back to node 0 if nid is unset (it should be, except bugs).
 	 * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
 	 */
-	return (nid < 0) ? 0 : nid;
+	return (nid < 0) ? first_online_node : nid;
 }
 
 int of_drconf_to_nid_single(struct drmem_lmb *lmb);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node
  2022-06-23 12:54 [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Aneesh Kumar K.V
@ 2022-06-23 12:54 ` Aneesh Kumar K.V
  2022-06-24  8:50   ` Srikar Dronamraju
  2022-06-24  8:39 ` [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Srikar Dronamraju
  1 sibling, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2022-06-23 12:54 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Srikar Dronamraju

While building the cpu_to_node map make sure we always use the online node
to build the mapping table. In general this should not be an issue
because the kernel use similar lookup mechanism (vphn_get_nid()) to mark
nodes online (setup_node_data()). Hence NUMA nodes we find during
lookup in numa_setup_cpu() will always be found online.

To keep logic simpler/correct, make sure that if the hypervisor
or device tree returned a not online node, don't use that to build
the map table. Instead, use the first_online_node.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0801b2ce9b7d..f387b9eb9dc9 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	of_node_put(cpu);
 
 out_present:
-	if (nid < 0 || !node_possible(nid))
+	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
 
 	/*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] powerpc/numa: Return the first online node instead of 0
  2022-06-23 12:54 [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Aneesh Kumar K.V
  2022-06-23 12:54 ` [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node Aneesh Kumar K.V
@ 2022-06-24  8:39 ` Srikar Dronamraju
  2022-06-27 14:05   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2022-06-24  8:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev

* Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:41]:

> If early cpu to node mapping finds an invalid node id, return
> the first online node instead of node 0.
> 
> With commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0")
> the kernel marks node 0 offline in certain scenarios.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/topology.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 8a4d4f4d9749..704088b1d53c 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -60,7 +60,7 @@ static inline int early_cpu_to_node(int cpu)
>  	 * Fall back to node 0 if nid is unset (it should be, except bugs).
>  	 * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
>  	 */
> -	return (nid < 0) ? 0 : nid;
> +	return (nid < 0) ? first_online_node : nid;

Looks good but just two queries.

1. Is there a possibility of early_cpu_to_node() being called before any
node is online?

2. first_online_node is actually not a variable, it returns the lowest
online node. Right? If lets a early_cpu_to_node() for the same CPU across a
node online/offline may end up giving two different nids. Right?


>  }
> 
>  int of_drconf_to_nid_single(struct drmem_lmb *lmb);
> -- 
> 2.36.1
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node
  2022-06-23 12:54 ` [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node Aneesh Kumar K.V
@ 2022-06-24  8:50   ` Srikar Dronamraju
  2022-06-27  5:27     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2022-06-24  8:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev

* Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:

> While building the cpu_to_node map make sure we always use the online node
> to build the mapping table. In general this should not be an issue
> because the kernel use similar lookup mechanism (vphn_get_nid()) to mark
> nodes online (setup_node_data()). Hence NUMA nodes we find during
> lookup in numa_setup_cpu() will always be found online.
> 
> To keep logic simpler/correct, make sure that if the hypervisor
> or device tree returned a not online node, don't use that to build
> the map table. Instead, use the first_online_node.

Why should the returned nid be already online. Are we facing any problem
with the current code?

Since general idea is to keep cpu_to_node constant, By assigining it the
first_online_node, we may be ending up assigning a wrong node, resulting a
performance penalty later on. i.e CPU may actually belong to node 4 but we
assigned it to node 1. Also the coregroup id is derived from associavity
array. If there is a mismatch between the coregroup id and nid,
scheduler will crib.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 0801b2ce9b7d..f387b9eb9dc9 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>  	of_node_put(cpu);
> 
>  out_present:
> -	if (nid < 0 || !node_possible(nid))
> +	if (nid < 0 || !node_online(nid))
>  		nid = first_online_node;
> 
>  	/*
> -- 
> 2.36.1
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node
  2022-06-24  8:50   ` Srikar Dronamraju
@ 2022-06-27  5:27     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2022-06-27  5:27 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: linuxppc-dev

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:42]:
>
>> While building the cpu_to_node map make sure we always use the online node
>> to build the mapping table. In general this should not be an issue
>> because the kernel use similar lookup mechanism (vphn_get_nid()) to mark
>> nodes online (setup_node_data()). Hence NUMA nodes we find during
>> lookup in numa_setup_cpu() will always be found online.
>> 
>> To keep logic simpler/correct, make sure that if the hypervisor
>> or device tree returned a not online node, don't use that to build
>> the map table. Instead, use the first_online_node.
>
> Why should the returned nid be already online. Are we facing any problem
> with the current code?

There is no issue with the current upstream code. The problem was
detected in a distro kernel where we had partial backport. We didn't
initialize NUMA node details correctly. So only a subset of actual
NUMA nodes got marked online. This resulted in a crash at completely
independent part of the kernel because the kernel try to derefer NODE_DATA()
which was not initialized. 

>
> Since general idea is to keep cpu_to_node constant, By assigining it the
> first_online_node, we may be ending up assigning a wrong node, resulting a
> performance penalty later on. i.e CPU may actually belong to node 4 but we
> assigned it to node 1. Also the coregroup id is derived from associavity
> array. If there is a mismatch between the coregroup id and nid,
> scheduler will crib.

The changed code will never be executed unless something broke between
the NUMA node init (setup_node_data() and numa_setup_cpu()). The reason
the code change was made was to keep it consistent with the fact that we
only initialize NODE_DATA for online NUMA nodes and hence don't return
a possible NUMA node value in numa_setup_cpu(). 

>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 0801b2ce9b7d..f387b9eb9dc9 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  	of_node_put(cpu);
>> 
>>  out_present:
>> -	if (nid < 0 || !node_possible(nid))
>> +	if (nid < 0 || !node_online(nid))
>>  		nid = first_online_node;
>> 
>>  	/*
>> -- 
>> 2.36.1
>> 
>
> -- 
> Thanks and Regards
> Srikar Dronamraju

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] powerpc/numa: Return the first online node instead of 0
  2022-06-24  8:39 ` [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Srikar Dronamraju
@ 2022-06-27 14:05   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2022-06-27 14:05 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: linuxppc-dev

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2022-06-23 18:24:41]:
>
>> If early cpu to node mapping finds an invalid node id, return
>> the first online node instead of node 0.
>> 
>> With commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0")
>> the kernel marks node 0 offline in certain scenarios.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/topology.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
>> index 8a4d4f4d9749..704088b1d53c 100644
>> --- a/arch/powerpc/include/asm/topology.h
>> +++ b/arch/powerpc/include/asm/topology.h
>> @@ -60,7 +60,7 @@ static inline int early_cpu_to_node(int cpu)
>>  	 * Fall back to node 0 if nid is unset (it should be, except bugs).
>>  	 * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
>>  	 */
>> -	return (nid < 0) ? 0 : nid;
>> +	return (nid < 0) ? first_online_node : nid;
>
> Looks good but just two queries.
>
> 1. Is there a possibility of early_cpu_to_node() being called before any
> node is online?

The kernel operate with node 0 online most of the early boot and mark it offline
in mem_topology_setup() just before parse_numa_properties(). So we
should find some nodes set online.

>
> 2. first_online_node is actually not a variable, it returns the lowest
> online node. Right? If lets a early_cpu_to_node() for the same CPU across a
> node online/offline may end up giving two different nids. Right?
>

The change is specific to a case where we find uninitialized
numa_cpu_lookup_table. ie, the firmware didn't specify the mapping for
the cpu. I do agree that for such cpus the node mapping can change
because of the above. I am not sure whether this can cause any issue in
practice. But returning node 0 which can be marked offline can result
in crashes? 

>
>>  }
>> 
>>  int of_drconf_to_nid_single(struct drmem_lmb *lmb);
>> -- 
>> 2.36.1
>> 
>
> -- 
> Thanks and Regards
> Srikar Dronamraju

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-27 14:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23 12:54 [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Aneesh Kumar K.V
2022-06-23 12:54 ` [PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node Aneesh Kumar K.V
2022-06-24  8:50   ` Srikar Dronamraju
2022-06-27  5:27     ` Aneesh Kumar K.V
2022-06-24  8:39 ` [PATCH 1/2] powerpc/numa: Return the first online node instead of 0 Srikar Dronamraju
2022-06-27 14:05   ` Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).