* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
@ 2019-08-05 23:05 ` Tyrel Datwyler
2019-08-12 16:55 ` Nathan Lynch
2019-08-13 17:20 ` Gautham R Shenoy
2019-08-22 13:08 ` Michael Ellerman
2 siblings, 1 reply; 10+ messages in thread
From: Tyrel Datwyler @ 2019-08-05 23:05 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev
On 8/2/19 12:29 PM, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
> to ensure all present cpus are online; cpu 7 is onlined:
>
> rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
> This sets cpu 7 online in all respects except for the cpu's
> corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
> already online and returns success. The driver core (device_online)
> sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
> rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
> arch/powerpc/kernel/rtas.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> return 0;
>
> for_each_cpu(cpu, cpus) {
> + struct device *dev = get_cpu_device(cpu);
> +
> switch (state) {
> case DOWN:
> - cpuret = cpu_down(cpu);
> + cpuret = device_offline(dev);
> break;
> case UP:
> - cpuret = cpu_up(cpu);
> + cpuret = device_online(dev);
Not that I have a problem with using the core device api, but as an FYI we had
discussed in the past introducing one shot functions in kernel/cpu.c for doing
our take down, bring up where cpu_update_maps() can be held for the whole
process. The thought was maybe it would be useful generically being able to
online/offline a bulk subset.
> break;
> }
> - if (cpuret) {
> + if (cpuret < 0) {
> pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> __func__,
> ((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
> data.token = rtas_token("ibm,suspend-me");
> data.complete = &done;
>
> + lock_device_hotplug();
> +
Does taking the device hotplug lock suffice to prevent races with sysfs attempts
to hotplug (on/off) cpus? And if so we can strip out the code that checks the
mask to see if we raced, correct?
-Tyrel
> /* All present CPUs must be online */
> cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
> __func__);
>
> out:
> + unlock_device_hotplug();
> free_cpumask_var(offline_mask);
> return atomic_read(&data.error);
> }
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
2019-08-05 23:05 ` Tyrel Datwyler
@ 2019-08-12 16:55 ` Nathan Lynch
0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-12 16:55 UTC (permalink / raw)
To: Tyrel Datwyler; +Cc: ego, linuxppc-dev
Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> On 8/2/19 12:29 PM, Nathan Lynch wrote:
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 5faf0a64c92b..05824eb4323b 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>> return 0;
>>
>> for_each_cpu(cpu, cpus) {
>> + struct device *dev = get_cpu_device(cpu);
>> +
>> switch (state) {
>> case DOWN:
>> - cpuret = cpu_down(cpu);
>> + cpuret = device_offline(dev);
>> break;
>> case UP:
>> - cpuret = cpu_up(cpu);
>> + cpuret = device_online(dev);
>
> Not that I have a problem with using the core device api, but as an FYI we had
> discussed in the past introducing one shot functions in kernel/cpu.c for doing
> our take down, bring up where cpu_update_maps() can be held for the whole
> process. The thought was maybe it would be useful generically being able to
> online/offline a bulk subset.
Thanks, I've discussed something along those lines with Gautham and it
may be more desirable in the longer term than having the arch code
perform the locking.
However, I think it would be even better to avoid bringing up all the
offline CPUs only to shut them down again on the destination. Ideally we
could nudge offline threads into H_JOIN without doing all the work
associated with cpuhp callbacks and generating hotplug event noise. I'm
concerned about the overhead incurred by the current design on large
LPAR configurations.
Regardless, I'd expect that this fix shouldn't have to wait for the
implementation of either of those ideas.
>> break;
>> }
>> - if (cpuret) {
>> + if (cpuret < 0) {
>> pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
>> __func__,
>> ((state == UP) ? "up" : "down"),
>> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
>> data.token = rtas_token("ibm,suspend-me");
>> data.complete = &done;
>>
>> + lock_device_hotplug();
>> +
>
> Does taking the device hotplug lock suffice to prevent races with sysfs attempts
> to hotplug (on/off) cpus?
Yes.
> And if so we can strip out the code that checks the mask to see if we
> raced, correct?
I hope so, but I'm not sure, and it's harmless to leave in for
now. There could be other code which (like LPM) initiates cpu
online/offline outside of sysfs.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
2019-08-05 23:05 ` Tyrel Datwyler
@ 2019-08-13 17:20 ` Gautham R Shenoy
2019-08-22 13:08 ` Michael Ellerman
2 siblings, 0 replies; 10+ messages in thread
From: Gautham R Shenoy @ 2019-08-13 17:20 UTC (permalink / raw)
To: Nathan Lynch; +Cc: ego, linuxppc-dev
Hello Nathan,
On Sat, Aug 3, 2019 at 1:06 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
> to ensure all present cpus are online; cpu 7 is onlined:
>
> rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
> This sets cpu 7 online in all respects except for the cpu's
> corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
> already online and returns success. The driver core (device_online)
> sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
> rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Looks good to me. This locking scheme makes the code consistent with
dlpar_cpu() which also uses the high-level device APIs.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/rtas.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> return 0;
>
> for_each_cpu(cpu, cpus) {
> + struct device *dev = get_cpu_device(cpu);
> +
> switch (state) {
> case DOWN:
> - cpuret = cpu_down(cpu);
> + cpuret = device_offline(dev);
> break;
> case UP:
> - cpuret = cpu_up(cpu);
> + cpuret = device_online(dev);
> break;
> }
> - if (cpuret) {
> + if (cpuret < 0) {
> pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> __func__,
> ((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
> data.token = rtas_token("ibm,suspend-me");
> data.complete = &done;
>
> + lock_device_hotplug();
> +
> /* All present CPUs must be online */
> cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
> __func__);
>
> out:
> + unlock_device_hotplug();
> free_cpumask_var(offline_mask);
> return atomic_read(&data.error);
> }
> --
> 2.20.1
>
--
Thanks and Regards
gautham.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM
2019-08-02 19:29 ` [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM Nathan Lynch
2019-08-05 23:05 ` Tyrel Datwyler
2019-08-13 17:20 ` Gautham R Shenoy
@ 2019-08-22 13:08 ` Michael Ellerman
2 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2019-08-22 13:08 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev
On Fri, 2019-08-02 at 19:29:24 UTC, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
> to ensure all present cpus are online; cpu 7 is onlined:
>
> rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
> This sets cpu 7 online in all respects except for the cpu's
> corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
> already online and returns success. The driver core (device_online)
> sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
> rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Series applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/a6717c01ddc259f6f73364779df058e2c67309f8
cheers
^ permalink raw reply [flat|nested] 10+ messages in thread