linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
@ 2018-10-29 18:43 Nathan Fontenot
  2019-01-28 15:41 ` Michael Bringmann
  2019-02-08 13:02 ` Michael Ellerman
  0 siblings, 2 replies; 6+ messages in thread
From: Nathan Fontenot @ 2018-10-29 18:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ldufour

On pseries systems, performing a partition migration can result in
altering the nodes a CPU is assigned to on the destination system. For
example, pre-migration on the source system CPUs are in nodes 1 and 3,
post-migration on the destination system CPUs are in nodes 2 and 3.

Handling the node change for a CPU can cause corruption in the slab
cache if we hit a timing where a CPU's node is changed while cache_reap()
is invoked. The corruption occurs because the slab cache code appears
to rely on the CPU and slab cache pages being on the same node.

The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
does not prevent us from hitting this scenario.

Changing the device tree property update notification handler that
recognizes an affinity change for a CPU to do a full DLPAR remove and
add of the CPU instead of dynamically changing its node resolves this
issue.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h          |    2 ++
 arch/powerpc/mm/numa.c                       |    9 +--------
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   19 +++++++++++++++++++
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index a4a718dbfec6..f85e2b01c3df 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -132,6 +132,8 @@ static inline void shared_proc_topology_init(void) {}
 #define topology_sibling_cpumask(cpu)	(per_cpu(cpu_sibling_map, cpu))
 #define topology_core_cpumask(cpu)	(per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)		(cpu_to_core_id(cpu))
+
+int dlpar_cpu_readd(int cpu);
 #endif
 #endif
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 693ae1c1acba..bb6a7b56bef7 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1461,13 +1461,6 @@ static void reset_topology_timer(void)
 
 #ifdef CONFIG_SMP
 
-static void stage_topology_update(int core_id)
-{
-	cpumask_or(&cpu_associativity_changes_mask,
-		&cpu_associativity_changes_mask, cpu_sibling_mask(core_id));
-	reset_topology_timer();
-}
-
 static int dt_update_callback(struct notifier_block *nb,
 				unsigned long action, void *data)
 {
@@ -1480,7 +1473,7 @@ static int dt_update_callback(struct notifier_block *nb,
 		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
 			u32 core_id;
 			of_property_read_u32(update->dn, "reg", &core_id);
-			stage_topology_update(core_id);
+			rc = dlpar_cpu_readd(core_id);
 			rc = NOTIFY_OK;
 		}
 		break;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 2f8e62163602..97feb6e79f1a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,6 +802,25 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
 	return rc;
 }
 
+int dlpar_cpu_readd(int cpu)
+{
+	struct device_node *dn;
+	struct device *dev;
+	u32 drc_index;
+	int rc;
+
+	dev = get_cpu_device(cpu);
+	dn = dev->of_node;
+
+	rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
+
+	rc = dlpar_cpu_remove_by_index(drc_index);
+	if (!rc)
+		rc = dlpar_cpu_add(drc_index);
+
+	return rc;
+}
+
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
 	u32 count, drc_index;

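Note on error handling in dlpar_cpu_readd() as posted above: the return
value of of_property_read_u32() is overwritten by the next assignment, so a
missing "ibm,my-drc-index" property would go unnoticed and drc_index would
be used uninitialized. get_cpu_device() can also return NULL, and dev is
dereferenced without a check. The caller in dt_update_callback() likewise
overwrites rc with NOTIFY_OK. What follows is a minimal user-space sketch,
not kernel code, of the same remove-then-add sequence with every step
checked. struct fake_cpu, the fake_* helpers and readd_checked() are
hypothetical stand-ins invented for illustration; they only model the
control flow and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU device and its device tree node. */
struct fake_cpu {
	int has_drc_index;	/* models presence of "ibm,my-drc-index" */
	uint32_t drc_index;
	int present;		/* 1 while the CPU counts as added */
	int fail_remove;	/* force the remove step to fail */
};

/* Stand-in for of_property_read_u32(): fails if the property is absent. */
static int fake_read_drc_index(const struct fake_cpu *cpu, uint32_t *out)
{
	if (!cpu->has_drc_index)
		return -EINVAL;
	*out = cpu->drc_index;
	return 0;
}

/* Stand-in for dlpar_cpu_remove_by_index(). */
static int fake_remove_by_index(struct fake_cpu *cpu, uint32_t drc_index)
{
	assert(drc_index == cpu->drc_index);
	if (cpu->fail_remove)
		return -EBUSY;
	cpu->present = 0;
	return 0;
}

/* Stand-in for dlpar_cpu_add(). */
static int fake_add(struct fake_cpu *cpu, uint32_t drc_index)
{
	assert(drc_index == cpu->drc_index);
	cpu->present = 1;
	return 0;
}

/*
 * Same remove-then-add order as dlpar_cpu_readd(), but each return value
 * is checked before the next step runs, and a missing device is rejected.
 */
static int readd_checked(struct fake_cpu *cpu)
{
	uint32_t drc_index;
	int rc;

	if (!cpu)
		return -ENODEV;

	rc = fake_read_drc_index(cpu, &drc_index);
	if (rc)
		return rc;

	rc = fake_remove_by_index(cpu, drc_index);
	if (rc)
		return rc;

	return fake_add(cpu, drc_index);
}
```

In this sketch a stand-in CPU without a DRC index makes readd_checked()
return -EINVAL before anything is removed, and a failed remove skips the
add, so the CPU is never left half re-added because of an ignored error.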


* Re: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  2018-10-29 18:43 [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration Nathan Fontenot
@ 2019-01-28 15:41 ` Michael Bringmann
  2019-01-29  9:37   ` Michael Ellerman
  2019-02-08 13:02 ` Michael Ellerman
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Bringmann @ 2019-01-28 15:41 UTC (permalink / raw)
  To: Nathan Fontenot, linuxppc-dev; +Cc: ldufour

On 10/29/18 1:43 PM, Nathan Fontenot wrote:
> On pseries systems, performing a partition migration can result in
> altering the nodes a CPU is assigned to on the destination system. For
> example, pre-migration on the source system CPUs are in nodes 1 and 3,
> post-migration on the destination system CPUs are in nodes 2 and 3.
> 
> Handling the node change for a CPU can cause corruption in the slab
> cache if we hit a timing where a CPU's node is changed while cache_reap()
> is invoked. The corruption occurs because the slab cache code appears
> to rely on the CPU and slab cache pages being on the same node.
> 
> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
> does not prevent us from hitting this scenario.
> 
> Changing the device tree property update notification handler that
> recognizes an affinity change for a CPU to do a full DLPAR remove and
> add of the CPU instead of dynamically changing its node resolves this
> issue.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

> ---
>  arch/powerpc/include/asm/topology.h          |    2 ++
>  arch/powerpc/mm/numa.c                       |    9 +--------
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |   19 +++++++++++++++++++
>  3 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index a4a718dbfec6..f85e2b01c3df 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -132,6 +132,8 @@ static inline void shared_proc_topology_init(void) {}
>  #define topology_sibling_cpumask(cpu)	(per_cpu(cpu_sibling_map, cpu))
>  #define topology_core_cpumask(cpu)	(per_cpu(cpu_core_map, cpu))
>  #define topology_core_id(cpu)		(cpu_to_core_id(cpu))
> +
> +int dlpar_cpu_readd(int cpu);
>  #endif
>  #endif
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 693ae1c1acba..bb6a7b56bef7 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1461,13 +1461,6 @@ static void reset_topology_timer(void)
> 
>  #ifdef CONFIG_SMP
> 
> -static void stage_topology_update(int core_id)
> -{
> -	cpumask_or(&cpu_associativity_changes_mask,
> -		&cpu_associativity_changes_mask, cpu_sibling_mask(core_id));
> -	reset_topology_timer();
> -}
> -
>  static int dt_update_callback(struct notifier_block *nb,
>  				unsigned long action, void *data)
>  {
> @@ -1480,7 +1473,7 @@ static int dt_update_callback(struct notifier_block *nb,
>  		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
>  			u32 core_id;
>  			of_property_read_u32(update->dn, "reg", &core_id);
> -			stage_topology_update(core_id);
> +			rc = dlpar_cpu_readd(core_id);
>  			rc = NOTIFY_OK;
>  		}
>  		break;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 2f8e62163602..97feb6e79f1a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -802,6 +802,25 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
>  	return rc;
>  }
> 
> +int dlpar_cpu_readd(int cpu)
> +{
> +	struct device_node *dn;
> +	struct device *dev;
> +	u32 drc_index;
> +	int rc;
> +
> +	dev = get_cpu_device(cpu);
> +	dn = dev->of_node;
> +
> +	rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
> +
> +	rc = dlpar_cpu_remove_by_index(drc_index);
> +	if (!rc)
> +		rc = dlpar_cpu_add(drc_index);
> +
> +	return rc;
> +}
> +
>  int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>  {
>  	u32 count, drc_index;
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com



* Re: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  2019-01-28 15:41 ` Michael Bringmann
@ 2019-01-29  9:37   ` Michael Ellerman
  2019-01-29 16:12     ` Michael Bringmann
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2019-01-29  9:37 UTC (permalink / raw)
  To: Michael Bringmann, Nathan Fontenot, linuxppc-dev; +Cc: ldufour

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:

> On 10/29/18 1:43 PM, Nathan Fontenot wrote:
>> On pseries systems, performing a partition migration can result in
>> altering the nodes a CPU is assigned to on the destination system. For
>> example, pre-migration on the source system CPUs are in nodes 1 and 3,
>> post-migration on the destination system CPUs are in nodes 2 and 3.
>> 
>> Handling the node change for a CPU can cause corruption in the slab
>> cache if we hit a timing where a CPU's node is changed while cache_reap()
>> is invoked. The corruption occurs because the slab cache code appears
>> to rely on the CPU and slab cache pages being on the same node.
>> 
>> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
>> does not prevent us from hitting this scenario.
>> 
>> Changing the device tree property update notification handler that
>> recognizes an affinity change for a CPU to do a full DLPAR remove and
>> add of the CPU instead of dynamically changing its node resolves this
>> issue.
>> 
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

Are you sure that's what you meant? ie. you wrote some of the patch?

What I'd like is to get a Tested-by from you.

cheers


* Re: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  2019-01-29  9:37   ` Michael Ellerman
@ 2019-01-29 16:12     ` Michael Bringmann
  2019-01-30 12:28       ` Michael Ellerman
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Bringmann @ 2019-01-29 16:12 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Fontenot, linuxppc-dev; +Cc: ldufour



On 1/29/19 3:37 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> On 10/29/18 1:43 PM, Nathan Fontenot wrote:
>>> On pseries systems, performing a partition migration can result in
>>> altering the nodes a CPU is assigned to on the destination system. For
>>> example, pre-migration on the source system CPUs are in nodes 1 and 3,
>>> post-migration on the destination system CPUs are in nodes 2 and 3.
>>>
>>> Handling the node change for a CPU can cause corruption in the slab
>>> cache if we hit a timing where a CPU's node is changed while cache_reap()
>>> is invoked. The corruption occurs because the slab cache code appears
>>> to rely on the CPU and slab cache pages being on the same node.
>>>
>>> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
>>> does not prevent us from hitting this scenario.
>>>
>>> Changing the device tree property update notification handler that
>>> recognizes an affinity change for a CPU to do a full DLPAR remove and
>>> add of the CPU instead of dynamically changing its node resolves this
>>> issue.
>>>
>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

> 
> Are you sure that's what you meant? ie. you wrote some of the patch?
> 
> What I'd like is to get a Tested-by from you.
> 
> cheers
> 
> 



* Re: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  2019-01-29 16:12     ` Michael Bringmann
@ 2019-01-30 12:28       ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2019-01-30 12:28 UTC (permalink / raw)
  To: Michael Bringmann, Nathan Fontenot, linuxppc-dev; +Cc: ldufour

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> On 1/29/19 3:37 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> 
>>> On 10/29/18 1:43 PM, Nathan Fontenot wrote:
>>>> On pseries systems, performing a partition migration can result in
>>>> altering the nodes a CPU is assigned to on the destination system. For
>>>> example, pre-migration on the source system CPUs are in nodes 1 and 3,
>>>> post-migration on the destination system CPUs are in nodes 2 and 3.
>>>>
>>>> Handling the node change for a CPU can cause corruption in the slab
>>>> cache if we hit a timing where a CPU's node is changed while cache_reap()
>>>> is invoked. The corruption occurs because the slab cache code appears
>>>> to rely on the CPU and slab cache pages being on the same node.
>>>>
>>>> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
>>>> does not prevent us from hitting this scenario.
>>>>
>>>> Changing the device tree property update notification handler that
>>>> recognizes an affinity change for a CPU to do a full DLPAR remove and
>>>> add of the CPU instead of dynamically changing its node resolves this
>>>> issue.
>>>>
>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
>
> Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

Thanks.

cheers


* Re: powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  2018-10-29 18:43 [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration Nathan Fontenot
  2019-01-28 15:41 ` Michael Bringmann
@ 2019-02-08 13:02 ` Michael Ellerman
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2019-02-08 13:02 UTC (permalink / raw)
  To: Nathan Fontenot, linuxppc-dev; +Cc: ldufour

On Mon, 2018-10-29 at 18:43:36 UTC, Nathan Fontenot wrote:
> On pseries systems, performing a partition migration can result in
> altering the nodes a CPU is assigned to on the destination system. For
> example, pre-migration on the source system CPUs are in nodes 1 and 3,
> post-migration on the destination system CPUs are in nodes 2 and 3.
> 
> Handling the node change for a CPU can cause corruption in the slab
> cache if we hit a timing where a CPU's node is changed while cache_reap()
> is invoked. The corruption occurs because the slab cache code appears
> to rely on the CPU and slab cache pages being on the same node.
> 
> The current dynamic updating of a CPU's node done in arch/powerpc/mm/numa.c
> does not prevent us from hitting this scenario.
> 
> Changing the device tree property update notification handler that
> recognizes an affinity change for a CPU to do a full DLPAR remove and
> add of the CPU instead of dynamically changing its node resolves this
> issue.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
> Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/81b61324922c67f73813d8a9c175f3c1

cheers
