linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
@ 2019-05-16  2:37 Tyrel Datwyler
  2019-05-16  2:37 ` [PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger Tyrel Datwyler
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Tyrel Datwyler @ 2019-05-16  2:37 UTC (permalink / raw)
  To: mpe; +Cc: nathanl, Tyrel Datwyler, Tyrel Datwyler, mingming.cao, linuxppc-dev

The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
the cpus device_node so that we can get at the ibm,my-drc-index
property. The only user of cpu readd is an OF notifier call back. This
call back already has a reference to the device_node and therefore can
retrieve the drc_index from the device_node.

This patch simplifies dlpar_cpu_readd() to take a drc_index directly and
does away with an uneccsary device_node lookup.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h          |  2 +-
 arch/powerpc/mm/numa.c                       |  6 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +---------
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f85e2b01c3df..c906d9ec9013 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -133,7 +133,7 @@ static inline void shared_proc_topology_init(void) {}
 #define topology_core_cpumask(cpu)	(per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)		(cpu_to_core_id(cpu))
 
-int dlpar_cpu_readd(int cpu);
+int dlpar_cpu_readd(u32 drc_index);
 #endif
 #endif
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 57e64273cb33..40c0b6da12c2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1479,9 +1479,9 @@ static int dt_update_callback(struct notifier_block *nb,
 	case OF_RECONFIG_UPDATE_PROPERTY:
 		if (of_node_is_type(update->dn, "cpu") &&
 		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
-			u32 core_id;
-			of_property_read_u32(update->dn, "reg", &core_id);
-			rc = dlpar_cpu_readd(core_id);
+			u32 drc_index;
+			of_property_read_u32(update->dn, "ibm,my-drc-index", &drc_index);
+			rc = dlpar_cpu_readd(drc_index);
 			rc = NOTIFY_OK;
 		}
 		break;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..2dfa9416ce54 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,18 +802,10 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
 	return rc;
 }
 
-int dlpar_cpu_readd(int cpu)
+int dlpar_cpu_readd(u32 drc_index)
 {
-	struct device_node *dn;
-	struct device *dev;
-	u32 drc_index;
 	int rc;
 
-	dev = get_cpu_device(cpu);
-	dn = dev->of_node;
-
-	rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
-
 	rc = dlpar_cpu_remove_by_index(drc_index);
 	if (!rc)
 		rc = dlpar_cpu_add(drc_index);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger
  2019-05-16  2:37 [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Tyrel Datwyler
@ 2019-05-16  2:37 ` Tyrel Datwyler
  2019-05-16  2:37 ` [PATCH 3/3] powerpc/pseries: Don't update cpu topology after PRRN event Tyrel Datwyler
  2019-05-16 19:17 ` [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Nathan Lynch
  2 siblings, 0 replies; 8+ messages in thread
From: Tyrel Datwyler @ 2019-05-16  2:37 UTC (permalink / raw)
  To: mpe; +Cc: nathanl, Tyrel Datwyler, Tyrel Datwyler, mingming.cao, linuxppc-dev

Memory affintiy updates as currently implemented have proved unstable.

This patch comments out the PRRN hook for the time being while we
investigate how to either stablize the current implementation or find a
better approach.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 88925f8ca8a0..660a2dbc43d7 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,13 +242,15 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 
 static void prrn_update_node(__be32 phandle)
 {
+	/* PRRN Memory Updates have proved unstable. Disable for the time being.
+	 *
 	struct pseries_hp_errorlog hp_elog;
 	struct device_node *dn;
 
-	/*
+	 *
 	 * If a node is found from a the given phandle, the phandle does not
 	 * represent the drc index of an LMB and we can ignore.
-	 */
+	 *
 	dn = of_find_node_by_phandle(be32_to_cpu(phandle));
 	if (dn) {
 		of_node_put(dn);
@@ -261,6 +263,7 @@ static void prrn_update_node(__be32 phandle)
 	hp_elog._drc_u.drc_index = phandle;
 
 	handle_dlpar_errorlog(&hp_elog);
+	*/
 }
 
 int pseries_devicetree_update(s32 scope)
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] powerpc/pseries: Don't update cpu topology after PRRN event
  2019-05-16  2:37 [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Tyrel Datwyler
  2019-05-16  2:37 ` [PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger Tyrel Datwyler
@ 2019-05-16  2:37 ` Tyrel Datwyler
  2019-05-16 19:17 ` [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Nathan Lynch
  2 siblings, 0 replies; 8+ messages in thread
From: Tyrel Datwyler @ 2019-05-16  2:37 UTC (permalink / raw)
  To: mpe; +Cc: nathanl, Tyrel Datwyler, Tyrel Datwyler, mingming.cao, linuxppc-dev

When we receive a PRRN event through the event-scan interface we call
pseries_devicetree_update() to update the affinty properties in our
device tree via RTAS. Following this our implemenation attempts to both
frob the existing kernel cpu numa affinities of the live system with the
new device tree properties while also performing a full cpu hotplug
readd of the affected cpus in resposne to the a OF property notifier
triggerd by the device tree update.

This patch does away with the topology update call to frob the
associativity since the DLPAR readd will put the cpu in the proper
numa node when its added back.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/rtasd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 8a1746d755c9..d3aa3a056d8e 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -285,7 +285,6 @@ static void handle_prrn_event(s32 scope)
 	 * the RTAS event.
 	 */
 	pseries_devicetree_update(-scope);
-	numa_update_cpu_topology(false);
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
  2019-05-16  2:37 [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Tyrel Datwyler
  2019-05-16  2:37 ` [PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger Tyrel Datwyler
  2019-05-16  2:37 ` [PATCH 3/3] powerpc/pseries: Don't update cpu topology after PRRN event Tyrel Datwyler
@ 2019-05-16 19:17 ` Nathan Lynch
  2019-05-17 22:58   ` Tyrel Datwyler
  2 siblings, 1 reply; 8+ messages in thread
From: Nathan Lynch @ 2019-05-16 19:17 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: mingming.cao, linuxppc-dev

Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
> the cpus device_node so that we can get at the ibm,my-drc-index
> property. The only user of cpu readd is an OF notifier call back. This
> call back already has a reference to the device_node and therefore can
> retrieve the drc_index from the device_node.

dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
runtime without destabilizing the system. It doesn't accomplish that and
it should just be removed (and I'm working on that).


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
  2019-05-16 19:17 ` [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Nathan Lynch
@ 2019-05-17 22:58   ` Tyrel Datwyler
  2019-05-20 15:01     ` Nathan Lynch
  0 siblings, 1 reply; 8+ messages in thread
From: Tyrel Datwyler @ 2019-05-17 22:58 UTC (permalink / raw)
  To: Nathan Lynch, Tyrel Datwyler; +Cc: mingming.cao, linuxppc-dev

On 05/16/2019 12:17 PM, Nathan Lynch wrote:
> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>> the cpus device_node so that we can get at the ibm,my-drc-index
>> property. The only user of cpu readd is an OF notifier call back. This
>> call back already has a reference to the device_node and therefore can
>> retrieve the drc_index from the device_node.
> 
> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
> runtime without destabilizing the system. It doesn't accomplish that and
> it should just be removed (and I'm working on that).
> 

I will politely disagree. We've done exactly this from userspace for years. My
experience still suggests that memory affinity is the problem area, and that the
work to push this all into the kernel originally was poorly tested.

-Tyrel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
  2019-05-17 22:58   ` Tyrel Datwyler
@ 2019-05-20 15:01     ` Nathan Lynch
  2019-06-03  0:11       ` Tyrel Datwyler
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Lynch @ 2019-05-20 15:01 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: mingming.cao, linuxppc-dev

Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:

> On 05/16/2019 12:17 PM, Nathan Lynch wrote:
>> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>>> the cpus device_node so that we can get at the ibm,my-drc-index
>>> property. The only user of cpu readd is an OF notifier call back. This
>>> call back already has a reference to the device_node and therefore can
>>> retrieve the drc_index from the device_node.
>> 
>> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
>> runtime without destabilizing the system. It doesn't accomplish that and
>> it should just be removed (and I'm working on that).
>> 
>
> I will politely disagree. We've done exactly this from userspace for
> years. My experience still suggests that memory affinity is the
> problem area, and that the work to push this all into the kernel
> originally was poorly tested.

Kernel implementation details aside, how do you change the cpu-node
relationship at runtime without breaking NUMA-aware applications? Is
this not a fundamental issue to address before adding code like this?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
  2019-05-20 15:01     ` Nathan Lynch
@ 2019-06-03  0:11       ` Tyrel Datwyler
  2019-06-04 17:21         ` Nathan Lynch
  0 siblings, 1 reply; 8+ messages in thread
From: Tyrel Datwyler @ 2019-06-03  0:11 UTC (permalink / raw)
  To: Nathan Lynch, Tyrel Datwyler; +Cc: mingming.cao, linuxppc-dev

On 05/20/2019 08:01 AM, Nathan Lynch wrote:
> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> 
>> On 05/16/2019 12:17 PM, Nathan Lynch wrote:
>>> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>>>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>>>> the cpus device_node so that we can get at the ibm,my-drc-index
>>>> property. The only user of cpu readd is an OF notifier call back. This
>>>> call back already has a reference to the device_node and therefore can
>>>> retrieve the drc_index from the device_node.
>>>
>>> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
>>> runtime without destabilizing the system. It doesn't accomplish that and
>>> it should just be removed (and I'm working on that).
>>>
>>
>> I will politely disagree. We've done exactly this from userspace for
>> years. My experience still suggests that memory affinity is the
>> problem area, and that the work to push this all into the kernel
>> originally was poorly tested.
> 
> Kernel implementation details aside, how do you change the cpu-node
> relationship at runtime without breaking NUMA-aware applications? Is
> this not a fundamental issue to address before adding code like this?
> 

If that is the concern then hotplug in general already breaks them. Take for
example the removal of a faulty processor and then adding a new processor back.
It is quite possible that the new processor is in a different NUMA node. Keep in
mind that in this scenario the new processor and threads gets the same logical
cpu ids as the faulty processor we just removed.

Now we have to ask the question who is right and who is wrong. In this case the
kernel data structures reflect the correct NUMA topology. However, did the NUMA
aware application or libnuma make an assumption that specific sets of logical
cpu ids are always in the same NUMA node?

-Tyrel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index
  2019-06-03  0:11       ` Tyrel Datwyler
@ 2019-06-04 17:21         ` Nathan Lynch
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Lynch @ 2019-06-04 17:21 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: mingming.cao, linuxppc-dev

Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> On 05/20/2019 08:01 AM, Nathan Lynch wrote:
>> Kernel implementation details aside, how do you change the cpu-node
>> relationship at runtime without breaking NUMA-aware applications? Is
>> this not a fundamental issue to address before adding code like this?
>> 
>
> If that is the concern then hotplug in general already breaks
> them. Take for example the removal of a faulty processor and then
> adding a new processor back.  It is quite possible that the new
> processor is in a different NUMA node. Keep in mind that in this
> scenario the new processor and threads gets the same logical cpu ids
> as the faulty processor we just removed.

Yes, the problem is re-use of a logical CPU id with a node id that
differs from the one it was initially assigned, and there are several
ways to get into that situation on this platform. We probably need to be
more careful in how we allocate a spot in the CPU maps for a newly-added
processor. I believe the algorithm is simple first-fit right now, and it
doesn't take into account prior NUMA relationships.


> Now we have to ask the question who is right and who is wrong. In this
> case the kernel data structures reflect the correct NUMA
> topology. However, did the NUMA aware application or libnuma make an
> assumption that specific sets of logical cpu ids are always in the
> same NUMA node?

Yes, and that assumption is widespread because people tend to develop on
an architecture where this kind of stuff doesn't happen (at least not
yet).

And I don't really agree that the current behavior reflects what is
actually going on. When Linux running in a PowerVM LPAR receives a
notification to change the NUMA properties of a processor at runtime,
it's because the platform has changed the physical characteristics of
the partition. I.e. you're now using a different physical processor,
with different relationships to the other resources in the system. Even
if it didn't destabilize the kernel (by changing the result of
cpu_to_node() when various subsystems assume it will be static),
continuing to use the logical CPU ids on the new processor obscures what
has actually happened. And we have developers that have told us that
this behavior - changing the logical cpu<->node relationship at runtime
- is something their existing NUMA-aware applications cannot handle.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-06-04 17:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16  2:37 [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Tyrel Datwyler
2019-05-16  2:37 ` [PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger Tyrel Datwyler
2019-05-16  2:37 ` [PATCH 3/3] powerpc/pseries: Don't update cpu topology after PRRN event Tyrel Datwyler
2019-05-16 19:17 ` [PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index Nathan Lynch
2019-05-17 22:58   ` Tyrel Datwyler
2019-05-20 15:01     ` Nathan Lynch
2019-06-03  0:11       ` Tyrel Datwyler
2019-06-04 17:21         ` Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).