linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] more migration vs CPU hotplug fixes etc
@ 2019-08-02 19:29 Nathan Lynch
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-02 19:29 UTC (permalink / raw)
  To: linuxppc-dev

Despite recent fixes, userspace-initiated CPU hotplug still can
destructively race with the migration code's CPU state
manipulations. And parts of the LPM implementation have potentially
long-running code, especially on larger systems, that ties up the CPU
causing RCU stalls etc.

Changes since v1:
- Correct description of cpu hotplug vs LPM race.
- Add fix for long-running code in pseries_devicetree_update and friends

Nathan Lynch (3):
  powerpc/rtas: use device model APIs and serialization during LPM
  powerpc/rtas: allow rescheduling while changing cpu states
  powerpc/pseries/mobility: use cond_resched when updating device tree

 arch/powerpc/kernel/rtas.c                | 13 ++++++++++---
 arch/powerpc/platforms/pseries/mobility.c |  9 +++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
  2019-08-02 19:29 [PATCH v2 0/3] more migration vs CPU hotplug fixes etc Nathan Lynch
@ 2019-08-02 19:29 ` Nathan Lynch
  2019-08-05 23:05   ` Tyrel Datwyler
                     ` (2 more replies)
  2019-08-02 19:29 ` [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states Nathan Lynch
  2019-08-02 19:29 ` [PATCH v2 3/3] powerpc/pseries/mobility: use cond_resched when updating device tree Nathan Lynch
  2 siblings, 3 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-02 19:29 UTC (permalink / raw)
  To: linuxppc-dev

The LPAR migration implementation and userspace-initiated cpu hotplug
can interleave their executions like so:

1. Set cpu 7 offline via sysfs.

2. Begin a partition migration, whose implementation requires the OS
   to ensure all present cpus are online; cpu 7 is onlined:

     rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up

   This sets cpu 7 online in all respects except for the cpu's
   corresponding struct device; dev->offline remains true.

3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
   already online and returns success. The driver core (device_online)
   sets dev->offline = false.

4. The migration completes and restores cpu 7 to offline state:

     rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down

This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the cpu device
is manually set offline and then online again via sysfs.

Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 5faf0a64c92b..05824eb4323b 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
 		return 0;
 
 	for_each_cpu(cpu, cpus) {
+		struct device *dev = get_cpu_device(cpu);
+
 		switch (state) {
 		case DOWN:
-			cpuret = cpu_down(cpu);
+			cpuret = device_offline(dev);
 			break;
 		case UP:
-			cpuret = cpu_up(cpu);
+			cpuret = device_online(dev);
 			break;
 		}
-		if (cpuret) {
+		if (cpuret < 0) {
 			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
 					__func__,
 					((state == UP) ? "up" : "down"),
@@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
 	data.token = rtas_token("ibm,suspend-me");
 	data.complete = &done;
 
+	lock_device_hotplug();
+
 	/* All present CPUs must be online */
 	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
 	cpuret = rtas_online_cpus_mask(offline_mask);
@@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
 				__func__);
 
 out:
+	unlock_device_hotplug();
 	free_cpumask_var(offline_mask);
 	return atomic_read(&data.error);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states
  2019-08-02 19:29 [PATCH v2 0/3] more migration vs CPU hotplug fixes etc Nathan Lynch
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
@ 2019-08-02 19:29 ` Nathan Lynch
  2019-08-13 17:17   ` Gautham R Shenoy
  2019-08-02 19:29 ` [PATCH v2 3/3] powerpc/pseries/mobility: use cond_resched when updating device tree Nathan Lynch
  2 siblings, 1 reply; 10+ messages in thread
From: Nathan Lynch @ 2019-08-02 19:29 UTC (permalink / raw)
  To: linuxppc-dev

rtas_cpu_state_change_mask() potentially operates on scores of cpus,
so explicitly allow rescheduling in the loop body.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 05824eb4323b..b7ca2fde68a9 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -16,6 +16,7 @@
 #include <linux/capability.h>
 #include <linux/delay.h>
 #include <linux/cpu.h>
+#include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/completion.h>
 #include <linux/cpumask.h>
@@ -898,6 +899,7 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
 				cpumask_clear_cpu(cpu, cpus);
 			}
 		}
+		cond_resched();
 	}
 
 	return ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] powerpc/pseries/mobility: use cond_resched when updating device tree
  2019-08-02 19:29 [PATCH v2 0/3] more migration vs CPU hotplug fixes etc Nathan Lynch
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
  2019-08-02 19:29 ` [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states Nathan Lynch
@ 2019-08-02 19:29 ` Nathan Lynch
  2 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-02 19:29 UTC (permalink / raw)
  To: linuxppc-dev

After a partition migration, pseries_devicetree_update() processes
changes to the device tree communicated from the platform to
Linux. This is a relatively heavyweight operation, with multiple
device tree searches, memory allocations, and conversations with
partition firmware.

There's a few levels of nested loops which are bounded only by
decisions made by the platform, outside of Linux's control, and indeed
we have seen RCU stalls on large systems while executing this call
graph. Use cond_resched() in these loops so that the cpu is yielded
when needed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index fe812bebdf5e..b571285f6c14 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
 #include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
+#include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/stat.h>
 #include <linux/completion.h>
@@ -207,7 +208,11 @@ static int update_dt_node(__be32 phandle, s32 scope)
 
 				prop_data += vd;
 			}
+
+			cond_resched();
 		}
+
+		cond_resched();
 	} while (rtas_rc == 1);
 
 	of_node_put(dn);
@@ -310,8 +315,12 @@ int pseries_devicetree_update(s32 scope)
 					add_dt_node(phandle, drc_index);
 					break;
 				}
+
+				cond_resched();
 			}
 		}
+
+		cond_resched();
 	} while (rc == 1);
 
 	kfree(rtas_buf);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
@ 2019-08-05 23:05   ` Tyrel Datwyler
  2019-08-12 16:55     ` Nathan Lynch
  2019-08-13 17:20   ` Gautham R Shenoy
  2019-08-22 13:08   ` Michael Ellerman
  2 siblings, 1 reply; 10+ messages in thread
From: Tyrel Datwyler @ 2019-08-05 23:05 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev

On 8/2/19 12:29 PM, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
> 
> 1. Set cpu 7 offline via sysfs.
> 
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
> 
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
> 
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
> 
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
> 
> 4. The migration completes and restores cpu 7 to offline state:
> 
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
> 
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
> 
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
> 
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/kernel/rtas.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>  		return 0;
> 
>  	for_each_cpu(cpu, cpus) {
> +		struct device *dev = get_cpu_device(cpu);
> +
>  		switch (state) {
>  		case DOWN:
> -			cpuret = cpu_down(cpu);
> +			cpuret = device_offline(dev);
>  			break;
>  		case UP:
> -			cpuret = cpu_up(cpu);
> +			cpuret = device_online(dev);

Not that I have a problem with using the core device api, but as an FYI we had
discussed in the past introducing one shot functions in kernel/cpu.c for doing
our take down, bring up where cpu_update_maps() can be held for the whole
process. The thought was maybe it would be useful generically being able to
online/offline a bulk subset.

>  			break;
>  		}
> -		if (cpuret) {
> +		if (cpuret < 0) {
>  			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
>  					__func__,
>  					((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
>  	data.token = rtas_token("ibm,suspend-me");
>  	data.complete = &done;
> 
> +	lock_device_hotplug();
> +

Does taking the device hotplug lock suffice to prevent races with sysfs attempts
to hotplug (on/off) cpus? And if so we can strip out the code that checks the
mask to see if we raced, correct?

-Tyrel

>  	/* All present CPUs must be online */
>  	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
>  	cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
>  				__func__);
> 
>  out:
> +	unlock_device_hotplug();
>  	free_cpumask_var(offline_mask);
>  	return atomic_read(&data.error);
>  }
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
  2019-08-05 23:05   ` Tyrel Datwyler
@ 2019-08-12 16:55     ` Nathan Lynch
  0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-12 16:55 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: ego, linuxppc-dev

Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> On 8/2/19 12:29 PM, Nathan Lynch wrote:
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 5faf0a64c92b..05824eb4323b 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>>  		return 0;
>> 
>>  	for_each_cpu(cpu, cpus) {
>> +		struct device *dev = get_cpu_device(cpu);
>> +
>>  		switch (state) {
>>  		case DOWN:
>> -			cpuret = cpu_down(cpu);
>> +			cpuret = device_offline(dev);
>>  			break;
>>  		case UP:
>> -			cpuret = cpu_up(cpu);
>> +			cpuret = device_online(dev);
>
> Not that I have a problem with using the core device api, but as an FYI we had
> discussed in the past introducing one shot functions in kernel/cpu.c for doing
> our take down, bring up where cpu_update_maps() can be held for the whole
> process. The thought was maybe it would be useful generically being able to
> online/offline a bulk subset.

Thanks, I've discussed something along those lines with Gautham and it
may be more desirable in the longer term than having the arch code
perform the locking.

However, I think it would be even better to avoid bringing up all the
offline CPUs only to shut them down again on the destination. Ideally we
could nudge offline threads into H_JOIN without doing all the work
associated with cpuhp callbacks and generating hotplug event noise. I'm
concerned about the overhead incurred by the current design on large
LPAR configurations.

Regardless, I'd expect that this fix shouldn't have to wait for the
implementation of either of those ideas.


>>  			break;
>>  		}
>> -		if (cpuret) {
>> +		if (cpuret < 0) {
>>  			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
>>  					__func__,
>>  					((state == UP) ? "up" : "down"),
>> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
>>  	data.token = rtas_token("ibm,suspend-me");
>>  	data.complete = &done;
>> 
>> +	lock_device_hotplug();
>> +
>
> Does taking the device hotplug lock suffice to prevent races with sysfs attempts
> to hotplug (on/off) cpus?

Yes.

> And if so we can strip out the code that checks the mask to see if we
> raced, correct?

I hope so, but I'm not sure, and it's harmless to leave in for
now. There could be other code which (like LPM) initiates cpu
online/offline outside of sysfs.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states
  2019-08-02 19:29 ` [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states Nathan Lynch
@ 2019-08-13 17:17   ` Gautham R Shenoy
  2019-08-13 18:14     ` Nathan Lynch
  0 siblings, 1 reply; 10+ messages in thread
From: Gautham R Shenoy @ 2019-08-13 17:17 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: ego, linuxppc-dev

On Sat, Aug 3, 2019 at 1:03 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> rtas_cpu_state_change_mask() potentially operates on scores of cpus,
> so explicitly allow rescheduling in the loop body.
>

Are we seeing softlockups/rcu stalls while running this ?

> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/rtas.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 05824eb4323b..b7ca2fde68a9 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -16,6 +16,7 @@
>  #include <linux/capability.h>
>  #include <linux/delay.h>
>  #include <linux/cpu.h>
> +#include <linux/sched.h>
>  #include <linux/smp.h>
>  #include <linux/completion.h>
>  #include <linux/cpumask.h>
> @@ -898,6 +899,7 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>                                 cpumask_clear_cpu(cpu, cpus);
>                         }
>                 }
> +               cond_resched();
>         }
>
>         return ret;
> --
> 2.20.1
>


-- 
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
  2019-08-05 23:05   ` Tyrel Datwyler
@ 2019-08-13 17:20   ` Gautham R Shenoy
  2019-08-22 13:08   ` Michael Ellerman
  2 siblings, 0 replies; 10+ messages in thread
From: Gautham R Shenoy @ 2019-08-13 17:20 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: ego, linuxppc-dev

Hello Nathan,

On Sat, Aug 3, 2019 at 1:06 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
>
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Looks good to me. This locking scheme makes the code consistent with
dlpar_cpu() which also uses the  high-level device APIs.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/rtas.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>                 return 0;
>
>         for_each_cpu(cpu, cpus) {
> +               struct device *dev = get_cpu_device(cpu);
> +
>                 switch (state) {
>                 case DOWN:
> -                       cpuret = cpu_down(cpu);
> +                       cpuret = device_offline(dev);
>                         break;
>                 case UP:
> -                       cpuret = cpu_up(cpu);
> +                       cpuret = device_online(dev);
>                         break;
>                 }
> -               if (cpuret) {
> +               if (cpuret < 0) {
>                         pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
>                                         __func__,
>                                         ((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
>         data.token = rtas_token("ibm,suspend-me");
>         data.complete = &done;
>
> +       lock_device_hotplug();
> +
>         /* All present CPUs must be online */
>         cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
>         cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
>                                 __func__);
>
>  out:
> +       unlock_device_hotplug();
>         free_cpumask_var(offline_mask);
>         return atomic_read(&data.error);
>  }
> --
> 2.20.1
>


-- 
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states
  2019-08-13 17:17   ` Gautham R Shenoy
@ 2019-08-13 18:14     ` Nathan Lynch
  0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-13 18:14 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: ego, linuxppc-dev

Gautham R Shenoy <ego.lkml@gmail.com> writes:

> On Sat, Aug 3, 2019 at 1:03 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>>
>> rtas_cpu_state_change_mask() potentially operates on scores of cpus,
>> so explicitly allow rescheduling in the loop body.
>>
>
> Are we seeing softlockups/rcu stalls while running this ?

I have not seen a report yet, but since the loop is bound only by the
number of processors in the LPAR I suspect it's only a matter of time.

> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
  2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
  2019-08-05 23:05   ` Tyrel Datwyler
  2019-08-13 17:20   ` Gautham R Shenoy
@ 2019-08-22 13:08   ` Michael Ellerman
  2 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2019-08-22 13:08 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev

On Fri, 2019-08-02 at 19:29:24 UTC, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
> 
> 1. Set cpu 7 offline via sysfs.
> 
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
> 
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
> 
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
> 
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
> 
> 4. The migration completes and restores cpu 7 to offline state:
> 
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
> 
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
> 
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
> 
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a6717c01ddc259f6f73364779df058e2c67309f8

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-22 13:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-02 19:29 [PATCH v2 0/3] more migration vs CPU hotplug fixes etc Nathan Lynch
2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
2019-08-05 23:05   ` Tyrel Datwyler
2019-08-12 16:55     ` Nathan Lynch
2019-08-13 17:20   ` Gautham R Shenoy
2019-08-22 13:08   ` Michael Ellerman
2019-08-02 19:29 ` [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states Nathan Lynch
2019-08-13 17:17   ` Gautham R Shenoy
2019-08-13 18:14     ` Nathan Lynch
2019-08-02 19:29 ` [PATCH v2 3/3] powerpc/pseries/mobility: use cond_resched when updating device tree Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).