* [PATCH v2 0/2] safely roll back failed DLPAR cpu add
@ 2019-10-16 18:36 Nathan Lynch
2019-10-16 18:36 ` [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu Nathan Lynch
2019-10-16 18:36 ` [PATCH v2 2/2] powerpc/pseries: safely roll back failed DLPAR cpu add Nathan Lynch
0 siblings, 2 replies; 4+ messages in thread
From: Nathan Lynch @ 2019-10-16 18:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld
This is a respin of https://patchwork.ozlabs.org/patch/1164930/.
Changes since v1:
* Address checkpatch warnings in dlpar_offline_cpu before moving it.
* Add Fixes: tags.
Nathan Lynch (2):
powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu
powerpc/pseries: safely roll back failed DLPAR cpu add
arch/powerpc/platforms/pseries/hotplug-cpu.c | 117 ++++++++++---------
1 file changed, 59 insertions(+), 58 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu
2019-10-16 18:36 [PATCH v2 0/2] safely roll back failed DLPAR cpu add Nathan Lynch
@ 2019-10-16 18:36 ` Nathan Lynch
2019-11-14 9:07 ` Michael Ellerman
2019-10-16 18:36 ` [PATCH v2 2/2] powerpc/pseries: safely roll back failed DLPAR cpu add Nathan Lynch
1 sibling, 1 reply; 4+ messages in thread
From: Nathan Lynch @ 2019-10-16 18:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld
Remove some stray blank lines, convert a printk to pr_warn, and
address a line length violation.
One functional change: use WARN_ON instead of BUG_ON in case H_PROD of
a ceded thread yields an unexpected result from the platform. We can
expect this code path to get uninterruptibly stuck in __cpu_die() if
this happens, but that's more desirable than crashing.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: b6db63d1a7f0 ("pseries/pseries: Add code to online/offline CPUs of a DLPAR node")
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index bbda646b63b5..f3f54ef645e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -539,7 +539,6 @@ static int dlpar_offline_cpu(struct device_node *dn)
goto out;
cpu_maps_update_begin();
break;
-
}
/*
@@ -547,19 +546,19 @@ static int dlpar_offline_cpu(struct device_node *dn)
* Upgrade it's state to CPU_STATE_OFFLINE.
*/
set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
- BUG_ON(plpar_hcall_norets(H_PROD, thread)
- != H_SUCCESS);
+ WARN_ON(plpar_hcall_norets(H_PROD, thread) != H_SUCCESS);
__cpu_die(cpu);
break;
}
- if (cpu == num_possible_cpus())
- printk(KERN_WARNING "Could not find cpu to offline with physical id 0x%x\n", thread);
+ if (cpu == num_possible_cpus()) {
+ pr_warn("Could not find cpu to offline with physical id 0x%x\n",
+ thread);
+ }
}
cpu_maps_update_done();
out:
return rc;
-
}
static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
--
2.20.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] powerpc/pseries: safely roll back failed DLPAR cpu add
2019-10-16 18:36 [PATCH v2 0/2] safely roll back failed DLPAR cpu add Nathan Lynch
2019-10-16 18:36 ` [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu Nathan Lynch
@ 2019-10-16 18:36 ` Nathan Lynch
1 sibling, 0 replies; 4+ messages in thread
From: Nathan Lynch @ 2019-10-16 18:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld
dlpar_online_cpu() attempts to online all threads of a core that has
been added to an LPAR. If onlining a non-primary thread
fails (e.g. due to an allocation failure), the core is left with at
least one thread online. dlpar_cpu_add() attempts to roll back the
whole operation, releasing the core back to the platform. However,
since some threads of the core being removed are still online, the
BUG_ON(cpu_online(cpu)) in pseries_remove_processor() strikes:
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 3 PID: 8587 Comm: drmgr Not tainted 5.3.0-rc2-00190-g9b123d1ea237-dirty #46
NIP: c0000000000eeb2c LR: c0000000000eeac4 CTR: c0000000000ee9e0
REGS: c0000001f745b6c0 TRAP: 0700 Not tainted (5.3.0-rc2-00190-g9b123d1ea237-dirty)
MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 44002448 XER: 00000000
CFAR: c00000000195d718 IRQMASK: 0
GPR00: c0000000000eeac4 c0000001f745b950 c0000000032f6200 0000000000000008
GPR04: 0000000000000008 c000000003349c78 0000000000000040 00000000000001ff
GPR08: 0000000000000008 0000000000000000 0000000000000001 0007ffffffffffff
GPR12: 0000000084002844 c00000001ecacb80 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000008
GPR24: c000000003349ee0 c00000000334a2e4 c0000000fca4d7a8 c000000001d20048
GPR28: 0000000000000001 ffffffffffffffff ffffffffffffffff c0000000fca4d7c4
NIP [c0000000000eeb2c] pseries_smp_notifier+0x14c/0x2e0
LR [c0000000000eeac4] pseries_smp_notifier+0xe4/0x2e0
Call Trace:
[c0000001f745b950] [c0000000000eeac4] pseries_smp_notifier+0xe4/0x2e0 (unreliable)
[c0000001f745ba10] [c0000000001ac774] notifier_call_chain+0xb4/0x190
[c0000001f745bab0] [c0000000001ad62c] blocking_notifier_call_chain+0x7c/0xb0
[c0000001f745baf0] [c00000000167bda0] of_detach_node+0xc0/0x110
[c0000001f745bb50] [c0000000000e7ae4] dlpar_detach_node+0x64/0xa0
[c0000001f745bb80] [c0000000000edefc] dlpar_cpu_add+0x31c/0x360
[c0000001f745bc10] [c0000000000ee980] dlpar_cpu_probe+0x50/0xb0
[c0000001f745bc50] [c00000000002cf70] arch_cpu_probe+0x40/0x70
[c0000001f745bc70] [c000000000ccd808] cpu_probe_store+0x48/0x80
[c0000001f745bcb0] [c000000000cbcef8] dev_attr_store+0x38/0x60
[c0000001f745bcd0] [c00000000059c980] sysfs_kf_write+0x70/0xb0
[c0000001f745bd10] [c00000000059afb8] kernfs_fop_write+0xf8/0x280
[c0000001f745bd60] [c0000000004b437c] __vfs_write+0x3c/0x70
[c0000001f745bd80] [c0000000004b8710] vfs_write+0xd0/0x220
[c0000001f745bdd0] [c0000000004b8acc] ksys_write+0x7c/0x140
[c0000001f745be20] [c00000000000bbd8] system_call+0x5c/0x68
Move dlpar_offline_cpu() up in the file so that dlpar_online_cpu() can
use it to re-offline any threads that have been onlined when an error
is encountered.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: e666ae0b10aa ("powerpc/pseries: Update CPU hotplug error recovery")
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 116 ++++++++++---------
1 file changed, 59 insertions(+), 57 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index f3f54ef645e1..8ab24bd7f89c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -338,6 +338,62 @@ static void pseries_remove_processor(struct device_node *np)
cpu_maps_update_done();
}
+static int dlpar_offline_cpu(struct device_node *dn)
+{
+ int rc = 0;
+ unsigned int cpu;
+ int len, nthreads, i;
+ const __be32 *intserv;
+ u32 thread;
+
+ intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+ if (!intserv)
+ return -EINVAL;
+
+ nthreads = len / sizeof(u32);
+
+ cpu_maps_update_begin();
+ for (i = 0; i < nthreads; i++) {
+ thread = be32_to_cpu(intserv[i]);
+ for_each_present_cpu(cpu) {
+ if (get_hard_smp_processor_id(cpu) != thread)
+ continue;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
+ break;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
+ set_preferred_offline_state(cpu,
+ CPU_STATE_OFFLINE);
+ cpu_maps_update_done();
+ timed_topology_update(1);
+ rc = device_offline(get_cpu_device(cpu));
+ if (rc)
+ goto out;
+ cpu_maps_update_begin();
+ break;
+ }
+
+ /*
+ * The cpu is in CPU_STATE_INACTIVE.
+ * Upgrade it's state to CPU_STATE_OFFLINE.
+ */
+ set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
+ WARN_ON(plpar_hcall_norets(H_PROD, thread) != H_SUCCESS);
+ __cpu_die(cpu);
+ break;
+ }
+ if (cpu == num_possible_cpus()) {
+ pr_warn("Could not find cpu to offline with physical id 0x%x\n",
+ thread);
+ }
+ }
+ cpu_maps_update_done();
+
+out:
+ return rc;
+}
+
static int dlpar_online_cpu(struct device_node *dn)
{
int rc = 0;
@@ -364,8 +420,10 @@ static int dlpar_online_cpu(struct device_node *dn)
timed_topology_update(1);
find_and_online_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
- if (rc)
+ if (rc) {
+ dlpar_offline_cpu(dn);
goto out;
+ }
cpu_maps_update_begin();
break;
@@ -505,62 +563,6 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
return rc;
}
-static int dlpar_offline_cpu(struct device_node *dn)
-{
- int rc = 0;
- unsigned int cpu;
- int len, nthreads, i;
- const __be32 *intserv;
- u32 thread;
-
- intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
- if (!intserv)
- return -EINVAL;
-
- nthreads = len / sizeof(u32);
-
- cpu_maps_update_begin();
- for (i = 0; i < nthreads; i++) {
- thread = be32_to_cpu(intserv[i]);
- for_each_present_cpu(cpu) {
- if (get_hard_smp_processor_id(cpu) != thread)
- continue;
-
- if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
- break;
-
- if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
- set_preferred_offline_state(cpu,
- CPU_STATE_OFFLINE);
- cpu_maps_update_done();
- timed_topology_update(1);
- rc = device_offline(get_cpu_device(cpu));
- if (rc)
- goto out;
- cpu_maps_update_begin();
- break;
- }
-
- /*
- * The cpu is in CPU_STATE_INACTIVE.
- * Upgrade it's state to CPU_STATE_OFFLINE.
- */
- set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
- WARN_ON(plpar_hcall_norets(H_PROD, thread) != H_SUCCESS);
- __cpu_die(cpu);
- break;
- }
- if (cpu == num_possible_cpus()) {
- pr_warn("Could not find cpu to offline with physical id 0x%x\n",
- thread);
- }
- }
- cpu_maps_update_done();
-
-out:
- return rc;
-}
-
static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
{
int rc;
--
2.20.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu
2019-10-16 18:36 ` [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu Nathan Lynch
@ 2019-11-14 9:07 ` Michael Ellerman
0 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2019-11-14 9:07 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev; +Cc: tyreld
On Wed, 2019-10-16 at 18:36:10 UTC, Nathan Lynch wrote:
> Remove some stray blank lines, convert a printk to pr_warn, and
> address a line length violation.
>
> One functional change: use WARN_ON instead of BUG_ON in case H_PROD of
> a ceded thread yields an unexpected result from the platform. We can
> expect this code path to get uninterruptibly stuck in __cpu_die() if
> this happens, but that's more desirable than crashing.
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: b6db63d1a7f0 ("pseries/pseries: Add code to online/offline CPUs of a DLPAR node")
Series applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/3366ebe9e19ba3c672a00a6fb86f0ac8636ee989
cheers
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-11-14 9:37 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-16 18:36 [PATCH v2 0/2] safely roll back failed DLPAR cpu add Nathan Lynch
2019-10-16 18:36 ` [PATCH v2 1/2] powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu Nathan Lynch
2019-11-14 9:07 ` Michael Ellerman
2019-10-16 18:36 ` [PATCH v2 2/2] powerpc/pseries: safely roll back failed DLPAR cpu add Nathan Lynch
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).