[PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline

* [PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline
@ 2019-06-24 23:48 Juliet Kim
  2019-06-25 17:29 ` Nathan Lynch
  0 siblings, 1 reply; 5+ messages in thread
From: Juliet Kim @ 2019-06-24 23:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mmc, mwb, nathanl

From 1bd2bf7146876e099eafb292668f9a12dc22f526 Mon Sep 17 00:00:00 2001
From: Juliet Kim <julietk@linux.vnet.ibm.com>
Date: Mon, 24 Jun 2019 17:17:46 -0400
Subject: [PATCH 1/1] powerpc/rtas: Fix hang in race against concurrent cpu
 offline
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The commit
(“powerpc/rtas: Fix a potential race between CPU-Offline & Migration)
attempted to fix a hang in Live Partition Mobility(LPM) by abandoning
the LPM attempt if a race between LPM and concurrent CPU offline was
detected.

However, that fix failed to notify Hypervisor that the LPM attempted
had been abandoned which results in a system hang.

Fix this by sending a signal PHYP to cancel the migration, so that PHYP
can stop waiting, and clean up the migration.

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
---
This is an alternate solution to the one Nathan proposed in:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192414.html
---
 arch/powerpc/include/asm/hvcall.h | 7 +++++++
 arch/powerpc/kernel/rtas.c        | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 463c63a..29ca285 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -261,6 +261,7 @@
 #define H_ADD_CONN		0x284
 #define H_DEL_CONN		0x288
 #define H_JOIN			0x298
+#define H_VASI_SIGNAL           0x2A0
 #define H_VASI_STATE            0x2A4
 #define H_VIOCTL		0x2A8
 #define H_ENABLE_CRQ		0x2B0
@@ -348,6 +349,12 @@
 #define H_SIGNAL_SYS_RESET_ALL_OTHERS		-2
 /* >= 0 values are CPU number */
 
+/* Values for argument to H_VASI_SIGNAL */
+#define H_SIGNAL_CANCEL_MIG     0x01
+
+/* Values for 2nd argument to H_VASI_SIGNAL */
+#define H_CPU_OFFLINE_DETECTED  0x0000000006000004
+
 /* H_GET_CPU_CHARACTERISTICS return values */
 #define H_CPU_CHAR_SPEC_BAR_ORI31	(1ull << 63) // IBM bit 0
 #define H_CPU_CHAR_BCCTRL_SERIALISED	(1ull << 62) // IBM bit 1
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index b824f4c..f9002b7 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -981,6 +981,14 @@ int rtas_ibm_suspend_me(u64 handle)
 
 	/* Check if we raced with a CPU-Offline Operation */
 	if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
+
+		/* We uses CANCEL, not ABORT to gracefully cancel migration */
+		rc = plpar_hcall_norets(H_VASI_SIGNAL, handle,
+			H_SIGNAL_CANCEL_MIG, H_CPU_OFFLINE_DETECTED);
+
+		if (rc != H_SUCCESS)
+			pr_err("%s: vasi_signal failed %ld\n", __func__, rc);
+
 		pr_err("%s: Raced against a concurrent CPU-Offline\n",
 		       __func__);
 		atomic_set(&data.error, -EBUSY);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread