linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/29] partition suspend updates
@ 2020-10-30  1:17 Nathan Lynch
  2020-10-30  1:17 ` [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards Nathan Lynch
                   ` (28 more replies)
  0 siblings, 29 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

This series aims to improve the pseries-specific partition migration
and hibernation implementation, part of which has been living in
kernel/rtas.c. Most of that code is eliminated or moved to
platforms/pseries, and the following major functional changes are
made:

- Use stop_machine() instead of on_each_cpu() to avoid deadlock in the
  join/suspend sequence.

- Retry the join/suspend sequence on errors that are likely to be
  transient. This is a mitigation for the fact that drivers currently
  have no way to prepare for an impending partition suspension,
  sometimes resulting in a virtual adapter being in a state which
  causes the platform to fail the suspend call.

- Request cancellation of the migration via H_VASI_SIGNAL if Linux is
  going to error out of the suspend attempt. This allows the
  management console and other entities to promptly clean up their
  operations instead of relying on long timeouts to fail the
  migration.

- Little-endian users of ibm,suspend-me, ibm,update-nodes and
  ibm,update-properties via sys_rtas are blocked when
  CONFIG_PPC_RTAS_FILTERS is enabled.

- Legacy user space code (drmgr) historically has driven the migration
  process by using sys_rtas to separately call ibm,suspend-me,
  ibm,activate-firmware, and ibm,update-nodes/properties, in that
  order. With these changes, when sys_rtas() dispatches
  ibm,suspend-me, the kernel performs the device tree update and
  firmware activation before returning. This is more reliable, and
  drmgr does not seem bothered by it.

- If the H_VASI_STATE hcall is absent, the implementation proceeds
  with the suspend instead of erroring out. This allows us to exercise
  these code paths in QEMU.

Nathan Lynch (29):
  powerpc/rtas: move rtas_call_reentrant() out of pseries guards
  powerpc/rtas: prevent suspend-related sys_rtas use on LE
  powerpc/rtas: complete ibm,suspend-me status codes
  powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
  powerpc/rtas: add rtas_ibm_suspend_me()
  powerpc/rtas: add rtas_activate_firmware()
  powerpc/hvcall: add token and codes for H_VASI_SIGNAL
  powerpc/pseries/mobility: don't error on absence of ibm,update-nodes
  powerpc/pseries/mobility: add missing break to default case
  powerpc/pseries/mobility: error message improvements
  powerpc/pseries/mobility: use rtas_activate_firmware() on resume
  powerpc/pseries/mobility: extract VASI session polling logic
  powerpc/pseries/mobility: use stop_machine for join/suspend
  powerpc/pseries/mobility: signal suspend cancellation to platform
  powerpc/pseries/mobility: retry partition suspend after error
  powerpc/rtas: dispatch partition migration requests to pseries
  powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()
  powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend
    ops
  powerpc/pseries/hibernation: pass stream id via function arguments
  powerpc/pseries/hibernation: remove pseries_suspend_cpu()
  powerpc/machdep: remove suspend_disable_cpu()
  powerpc/rtas: remove rtas_suspend_cpu()
  powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()
  powerpc/rtas: remove unused rtas_suspend_last_cpu()
  powerpc/pseries/hibernation: remove redundant cacheinfo update
  powerpc/pseries/hibernation: perform post-suspend fixups later
  powerpc/pseries/hibernation: remove prepare_late() callback
  powerpc/rtas: remove unused rtas_suspend_me_data
  powerpc/pseries/mobility: refactor node lookup during DT update

 arch/powerpc/include/asm/hvcall.h         |   9 +
 arch/powerpc/include/asm/machdep.h        |   1 -
 arch/powerpc/include/asm/rtas-types.h     |   8 -
 arch/powerpc/include/asm/rtas.h           |  13 +-
 arch/powerpc/kernel/rtas.c                | 245 ++++++---------
 arch/powerpc/platforms/pseries/mobility.c | 361 ++++++++++++++++++----
 arch/powerpc/platforms/pseries/suspend.c  |  79 +----
 7 files changed, 420 insertions(+), 296 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-12-04 20:37   ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE Nathan Lynch
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

rtas_call_reentrant() isn't platform-dependent; move it out of
CONFIG_PPC_PSERIES-guarded code.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 954f41676f69..b40fc892138b 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -897,6 +897,12 @@ int rtas_ibm_suspend_me(u64 handle)
 
 	return atomic_read(&data.error);
 }
+#else /* CONFIG_PPC_PSERIES */
+int rtas_ibm_suspend_me(u64 handle)
+{
+	return -ENOSYS;
+}
+#endif
 
 /**
  * rtas_call_reentrant() - Used for reentrant rtas calls
@@ -948,13 +954,6 @@ int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
 	return ret;
 }
 
-#else /* CONFIG_PPC_PSERIES */
-int rtas_ibm_suspend_me(u64 handle)
-{
-	return -ENOSYS;
-}
-#endif
-
 /**
  * Find a specific pseries error log in an RTAS extended event log.
  * @log: RTAS error/event log
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
  2020-10-30  1:17 ` [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  3:45   ` Andrew Donnellan
  2020-10-30  1:17 ` [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes Nathan Lynch
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.

drmgr is the only known (or plausible) user of these ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index b40fc892138b..132b2ae39009 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1049,9 +1049,11 @@ static struct rtas_filter rtas_filters[] __ro_after_init = {
 	{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
 	{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
 	{ "set-time-of-day", -1, -1, -1, -1, -1 },
+#ifdef CONFIG_CPU_BIG_ENDIAN
 	{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
 	{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
 	{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
+#endif
 	{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
  2020-10-30  1:17 ` [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards Nathan Lynch
  2020-10-30  1:17 ` [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-12-04 12:52   ` Michael Ellerman
  2020-10-30  1:17 ` [PATCH 04/29] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe Nathan Lynch
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

We don't completely account for the possible return codes for
ibm,suspend-me. Add definitions for these.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 55f9a154c95d..f060181a0d32 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -23,11 +23,16 @@
 #define RTAS_RMOBUF_MAX (64 * 1024)
 
 /* RTAS return status codes */
-#define RTAS_NOT_SUSPENDABLE	-9004
 #define RTAS_BUSY		-2    /* RTAS Busy */
 #define RTAS_EXTENDED_DELAY_MIN	9900
 #define RTAS_EXTENDED_DELAY_MAX	9905
 
+/* statuses specific to ibm,suspend-me */
+#define RTAS_SUSPEND_ABORTED     9000 /* Suspension aborted */
+#define RTAS_NOT_SUSPENDABLE    -9004 /* Partition not suspendable */
+#define RTAS_THREADS_ACTIVE     -9005 /* Multiple processor threads active */
+#define RTAS_OUTSTANDING_COPROC -9006 /* Outstanding coprocessor operations */
+
 /*
  * In general to call RTAS use rtas_token("string") to lookup
  * an RTAS token for the given string (e.g. "event-scan").
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 04/29] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (2 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 05/29] powerpc/rtas: add rtas_ibm_suspend_me() Nathan Lynch
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

The pseries partition suspend sequence requires that all active CPUs
call H_JOIN, which suspends all but one of them with interrupts
disabled. The "chosen" CPU is then to call ibm,suspend-me to complete
the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to
use H_PROD to wake the joined CPUs.

Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to
implement partition migration, is susceptible to deadlock with other
users of on_each_cpu() and with users of stop_machine APIs. The
callback passed to on_each_cpu() is not allowed to synchronize with
other CPUs in the way it is used here.

Complicating the fix is the fact that rtas_ibm_suspend_me() also
occupies the function name that should be used to provide a more
conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me()
to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that
it should not gain users.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h           | 2 +-
 arch/powerpc/kernel/rtas.c                | 6 +++---
 arch/powerpc/platforms/pseries/mobility.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index f060181a0d32..8436ed01567b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -257,7 +257,7 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
 extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
-extern int rtas_ibm_suspend_me(u64 handle);
+int rtas_ibm_suspend_me_unsafe(u64 handle);
 
 struct rtc_time;
 extern time64_t rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 132b2ae39009..33adefa84a42 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -843,7 +843,7 @@ static void rtas_percpu_suspend_me(void *info)
 	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
 }
 
-int rtas_ibm_suspend_me(u64 handle)
+int rtas_ibm_suspend_me_unsafe(u64 handle)
 {
 	long state;
 	long rc;
@@ -898,7 +898,7 @@ int rtas_ibm_suspend_me(u64 handle)
 	return atomic_read(&data.error);
 }
 #else /* CONFIG_PPC_PSERIES */
-int rtas_ibm_suspend_me(u64 handle)
+int rtas_ibm_suspend_me_unsafe(u64 handle)
 {
 	return -ENOSYS;
 }
@@ -1184,7 +1184,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 		int rc = 0;
 		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
 		              | be32_to_cpu(args.args[1]);
-		rc = rtas_ibm_suspend_me(handle);
+		rc = rtas_ibm_suspend_me_unsafe(handle);
 		if (rc == -EAGAIN)
 			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
 		else if (rc == -EIO)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index d6f4162478a5..b6de65cbfcd9 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -370,7 +370,7 @@ static ssize_t migration_store(struct class *class,
 		return rc;
 
 	do {
-		rc = rtas_ibm_suspend_me(streamid);
+		rc = rtas_ibm_suspend_me_unsafe(streamid);
 		if (rc == -EAGAIN)
 			ssleep(1);
 	} while (rc == -EAGAIN);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 05/29] powerpc/rtas: add rtas_ibm_suspend_me()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (3 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 04/29] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 06/29] powerpc/rtas: add rtas_activate_firmware() Nathan Lynch
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Now that the name is available, provide a simple wrapper for
ibm,suspend-me which returns both a Linux errno and optionally the
actual RTAS status to the caller.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  1 +
 arch/powerpc/kernel/rtas.c      | 57 +++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 8436ed01567b..b43165fc6c2a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -258,6 +258,7 @@ extern void rtas_progress(char *s, unsigned short hex);
 extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me_unsafe(u64 handle);
+int rtas_ibm_suspend_me(int *fw_status);
 
 struct rtc_time;
 extern time64_t rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 33adefa84a42..70c570269d7b 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -684,6 +684,63 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value)
 	return rc;
 }
 
+/**
+ * rtas_ibm_suspend_me() - Call ibm,suspend-me to suspend the LPAR.
+ *
+ * @fw_status: RTAS call status will be placed here if not NULL.
+ *
+ * rtas_ibm_suspend_me() should be called only on a CPU which has
+ * received H_CONTINUE from the H_JOIN hcall. All other active CPUs
+ * should be waiting to return from H_JOIN.
+ *
+ * rtas_ibm_suspend_me() may suspend execution of the OS
+ * indefinitely. Callers should take appropriate measures upon return, such as
+ * resetting watchdog facilities.
+ *
+ * Callers may choose to retry this call if @fw_status is
+ * %RTAS_THREADS_ACTIVE.
+ *
+ * Return:
+ * 0          - The partition has resumed from suspend, possibly after
+ *              migration to a different host.
+ * -ECANCELED - The operation was aborted.
+ * -EAGAIN    - There were other CPUs not in H_JOIN at the time of the call.
+ * -EBUSY     - Some other condition prevented the suspend from succeeding.
+ * -EIO       - Hardware/platform error.
+ */
+int rtas_ibm_suspend_me(int *fw_status)
+{
+	int fwrc;
+	int ret;
+
+	fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL);
+
+	switch (fwrc) {
+	case 0:
+		ret = 0;
+		break;
+	case RTAS_SUSPEND_ABORTED:
+		ret = -ECANCELED;
+		break;
+	case RTAS_THREADS_ACTIVE:
+		ret = -EAGAIN;
+		break;
+	case RTAS_NOT_SUSPENDABLE:
+	case RTAS_OUTSTANDING_COPROC:
+		ret = -EBUSY;
+		break;
+	case -1:
+	default:
+		ret = -EIO;
+		break;
+	}
+
+	if (fw_status)
+		*fw_status = fwrc;
+
+	return ret;
+}
+
 void __noreturn rtas_restart(char *cmd)
 {
 	if (rtas_flash_term_hook)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 06/29] powerpc/rtas: add rtas_activate_firmware()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (4 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 05/29] powerpc/rtas: add rtas_ibm_suspend_me() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 07/29] powerpc/hvcall: add token and codes for H_VASI_SIGNAL Nathan Lynch
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Provide a documented wrapper function for the ibm,activate-firmware
service, which must be called after a partition migration or
hibernation.

If the function is absent or the call fails, the OS will continue to
run normally with the current firmware, so there is no need to perform
any recovery. Just log it and continue.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  1 +
 arch/powerpc/kernel/rtas.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b43165fc6c2a..fdefe6a974eb 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -247,6 +247,7 @@ extern void __noreturn rtas_restart(char *cmd);
 extern void rtas_power_off(void);
 extern void __noreturn rtas_halt(void);
 extern void rtas_os_term(char *str);
+void rtas_activate_firmware(void);
 extern int rtas_get_sensor(int sensor, int index, int *state);
 extern int rtas_get_sensor_fast(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 70c570269d7b..58bbd69a233f 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -961,6 +961,36 @@ int rtas_ibm_suspend_me_unsafe(u64 handle)
 }
 #endif
 
+/**
+ * rtas_activate_firmware() - Activate a new version of firmware.
+ *
+ * Activate a new version of partition firmware. The OS must call this
+ * after resuming from a partition hibernation or migration in order
+ * to maintain the ability to perform live firmware updates. It's not
+ * catastrophic for this method to be absent or to fail; just log the
+ * condition in that case.
+ *
+ * Context: This function may sleep.
+ */
+void rtas_activate_firmware(void)
+{
+	int token;
+	int fwrc;
+
+	token = rtas_token("ibm,activate-firmware");
+	if (token == RTAS_UNKNOWN_SERVICE) {
+		pr_notice("ibm,activate-firmware method unavailable\n");
+		return;
+	}
+
+	do {
+		fwrc = rtas_call(token, 0, 1, NULL);
+	} while (rtas_busy_delay(fwrc));
+
+	if (fwrc)
+		pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
+}
+
 /**
  * rtas_call_reentrant() - Used for reentrant rtas calls
  * @token:	Token for desired reentrant RTAS call
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 07/29] powerpc/hvcall: add token and codes for H_VASI_SIGNAL
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (5 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 06/29] powerpc/rtas: add rtas_activate_firmware() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 08/29] powerpc/pseries/mobility: don't error on absence of ibm, update-nodes Nathan Lynch
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

H_VASI_SIGNAL can be used by a partition to request cancellation of
its migration. To be used in future changes.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index c1fbccb04390..c98f5141e3fc 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -155,6 +155,14 @@
 #define H_VASI_RESUMED          5
 #define H_VASI_COMPLETED        6
 
+/* VASI signal codes. Only the Cancel code is valid for H_VASI_SIGNAL. */
+#define H_VASI_SIGNAL_CANCEL    1
+#define H_VASI_SIGNAL_ABORT     2
+#define H_VASI_SIGNAL_SUSPEND   3
+#define H_VASI_SIGNAL_COMPLETE  4
+#define H_VASI_SIGNAL_ENABLE    5
+#define H_VASI_SIGNAL_FAILOVER  6
+
 /* Each control block has to be on a 4K boundary */
 #define H_CB_ALIGNMENT          4096
 
@@ -261,6 +269,7 @@
 #define H_ADD_CONN		0x284
 #define H_DEL_CONN		0x288
 #define H_JOIN			0x298
+#define H_VASI_SIGNAL           0x2A0
 #define H_VASI_STATE            0x2A4
 #define H_VIOCTL		0x2A8
 #define H_ENABLE_CRQ		0x2B0
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 08/29] powerpc/pseries/mobility: don't error on absence of ibm, update-nodes
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (6 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 07/29] powerpc/hvcall: add token and codes for H_VASI_SIGNAL Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 09/29] powerpc/pseries/mobility: add missing break to default case Nathan Lynch
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Treat the absence of the ibm,update-nodes function as benign instead
of reporting an error. If the platform does not provide that facility,
it's not a problem for Linux.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index b6de65cbfcd9..5bcb6e5cc0f2 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -261,7 +261,7 @@ int pseries_devicetree_update(s32 scope)
 
 	update_nodes_token = rtas_token("ibm,update-nodes");
 	if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
+		return 0;
 
 	rtas_buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
 	if (!rtas_buf)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 09/29] powerpc/pseries/mobility: add missing break to default case
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (7 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 08/29] powerpc/pseries/mobility: don't error on absence of ibm, update-nodes Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 10/29] powerpc/pseries/mobility: error message improvements Nathan Lynch
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

update_dt_node() has a switch statement where the default case lacks a
break statement.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 5bcb6e5cc0f2..d85799b5464a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -213,6 +213,7 @@ static int update_dt_node(__be32 phandle, s32 scope)
 				}
 
 				prop_data += vd;
+				break;
 			}
 
 			cond_resched();
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 10/29] powerpc/pseries/mobility: error message improvements
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (8 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 09/29] powerpc/pseries/mobility: add missing break to default case Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 11/29] powerpc/pseries/mobility: use rtas_activate_firmware() on resume Nathan Lynch
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

- Convert printk(KERN_ERR) to pr_err().
- Include errno in property update failure message.
- Remove reference to "Post-mobility" from device tree update message:
  with pr_err() it will have a "mobility:" prefix.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index d85799b5464a..ef8f5641e700 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -208,8 +208,8 @@ static int update_dt_node(__be32 phandle, s32 scope)
 				rc = update_dt_property(dn, &prop, prop_name,
 							vd, prop_data);
 				if (rc) {
-					printk(KERN_ERR "Could not update %s"
-					       " property\n", prop_name);
+					pr_err("updating %s property failed: %d\n",
+					       prop_name, rc);
 				}
 
 				prop_data += vd;
@@ -343,8 +343,7 @@ void post_mobility_fixup(void)
 
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
-		printk(KERN_ERR "Post-mobility device tree update "
-			"failed: %d\n", rc);
+		pr_err("device tree update failed: %d\n", rc);
 
 	cacheinfo_rebuild();
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 11/29] powerpc/pseries/mobility: use rtas_activate_firmware() on resume
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (9 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 10/29] powerpc/pseries/mobility: error message improvements Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic Nathan Lynch
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

It's incorrect to abort post-suspend processing if
ibm,activate-firmware isn't available. Use rtas_activate_firmware(),
which logs this condition appropriately and allows us to proceed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index ef8f5641e700..dc6abf164db7 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -312,21 +312,8 @@ int pseries_devicetree_update(s32 scope)
 void post_mobility_fixup(void)
 {
 	int rc;
-	int activate_fw_token;
 
-	activate_fw_token = rtas_token("ibm,activate-firmware");
-	if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
-		printk(KERN_ERR "Could not make post-mobility "
-		       "activate-fw call.\n");
-		return;
-	}
-
-	do {
-		rc = rtas_call(activate_fw_token, 0, 1, NULL);
-	} while (rtas_busy_delay(rc));
-
-	if (rc)
-		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
+	rtas_activate_firmware();
 
 	/*
 	 * We don't want CPUs to go online/offline while the device
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (10 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 11/29] powerpc/pseries/mobility: use rtas_activate_firmware() on resume Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-12-04 12:51   ` Michael Ellerman
  2020-10-30  1:17 ` [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend Nathan Lynch
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to
the caller until the specified VASI suspend session state makes the
transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest
of separating concerns to prepare for a new implementation of the
join/suspend sequence, extract VASI session polling logic into a
couple of local functions. Waiting for the session state to reach
H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures
that we will never get an EAGAIN result necessitating a retry. No
user-visible change in behavior is intended.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 76 +++++++++++++++++++++--
 1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index dc6abf164db7..1b8ae221b98a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -345,6 +345,73 @@ void post_mobility_fixup(void)
 	return;
 }
 
+static int poll_vasi_state(u64 handle, unsigned long *res)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long hvrc;
+	int ret;
+
+	hvrc = plpar_hcall(H_VASI_STATE, retbuf, handle);
+	switch (hvrc) {
+	case H_SUCCESS:
+		ret = 0;
+		*res = retbuf[0];
+		break;
+	case H_PARAMETER:
+		ret = -EINVAL;
+		break;
+	case H_FUNCTION:
+		ret = -EOPNOTSUPP;
+		break;
+	case H_HARDWARE:
+	default:
+		pr_err("unexpected H_VASI_STATE result %ld\n", hvrc);
+		ret = -EIO;
+		break;
+	}
+	return ret;
+}
+
+static int wait_for_vasi_session_suspending(u64 handle)
+{
+	unsigned long state;
+	bool keep_polling;
+	int ret;
+
+	/*
+	 * Wait for transition from H_VASI_ENABLED to
+	 * H_VASI_SUSPENDING. Treat anything else as an error.
+	 */
+	do {
+		keep_polling = false;
+		ret = poll_vasi_state(handle, &state);
+		if (ret != 0)
+			break;
+
+		switch (state) {
+		case H_VASI_SUSPENDING:
+			break;
+		case H_VASI_ENABLED:
+			keep_polling = true;
+			ssleep(1);
+			break;
+		default:
+			pr_err("unexpected H_VASI_STATE result %lu\n", state);
+			ret = -EIO;
+			break;
+		}
+	} while (keep_polling);
+
+	/*
+	 * Proceed even if H_VASI_STATE is unavailable. If H_JOIN or
+	 * ibm,suspend-me are also unimplemented, we'll recover then.
+	 */
+	if (ret == -EOPNOTSUPP)
+		ret = 0;
+
+	return ret;
+}
+
 static ssize_t migration_store(struct class *class,
 			       struct class_attribute *attr, const char *buf,
 			       size_t count)
@@ -356,12 +423,11 @@ static ssize_t migration_store(struct class *class,
 	if (rc)
 		return rc;
 
-	do {
-		rc = rtas_ibm_suspend_me_unsafe(streamid);
-		if (rc == -EAGAIN)
-			ssleep(1);
-	} while (rc == -EAGAIN);
+	rc = wait_for_vasi_session_suspending(streamid);
+	if (rc)
+		return rc;
 
+	rc = rtas_ibm_suspend_me_unsafe(streamid);
 	if (rc)
 		return rc;
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (11 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-12-04 12:52   ` Michael Ellerman
  2020-10-30  1:17 ` [PATCH 14/29] powerpc/pseries/mobility: signal suspend cancellation to platform Nathan Lynch
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

The partition suspend sequence as specified in the platform
architecture requires that all active processor threads call
H_JOIN, which:

- suspends the calling thread until it is the target of
  an H_PROD; or
- immediately returns H_CONTINUE, if the calling thread is the last to
  call H_JOIN. This thread is expected to call ibm,suspend-me to
  completely suspend the partition.

Upon returning from ibm,suspend-me the calling thread must wake all
others using H_PROD.

rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this
protocol, but because of its synchronizing nature this is susceptible
to deadlock versus users of stop_machine() or other callers of
on_each_cpu().

Not only is stop_machine() intended for use cases like this, it
handles error propagation and allows us to keep the data shared
between CPUs minimal: a single atomic counter which ensures exactly
one CPU will wake the others from their joined states.

Switch the migration code to use stop_machine() and a less complex
local implementation of the H_JOIN/ibm,suspend-me logic, which
carries additional benefits:

- more informative error reporting, appropriately ratelimited
- resets the lockup detector / watchdog on resume to prevent lockup
  warnings when the OS has been suspended for a time exceeding the
  threshold.

Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 132 ++++++++++++++++++++--
 1 file changed, 125 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 1b8ae221b98a..44ca7d4e143d 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -12,9 +12,11 @@
 #include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
+#include <linux/nmi.h>
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/stat.h>
+#include <linux/stop_machine.h>
 #include <linux/completion.h>
 #include <linux/device.h>
 #include <linux/delay.h>
@@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
 	return ret;
 }
 
+static void prod_single(unsigned int target_cpu)
+{
+	long hvrc;
+	int hwid;
+
+	hwid = get_hard_smp_processor_id(target_cpu);
+	hvrc = plpar_hcall_norets(H_PROD, hwid);
+	if (hvrc == H_SUCCESS)
+		return;
+	pr_err_ratelimited("H_PROD of CPU %u (hwid %d) error: %ld\n",
+			   target_cpu, hwid, hvrc);
+}
+
+static void prod_others(void)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (cpu != smp_processor_id())
+			prod_single(cpu);
+	}
+}
+
+static u16 clamp_slb_size(void)
+{
+	u16 prev = mmu_slb_size;
+
+	slb_set_size(SLB_MIN_SIZE);
+
+	return prev;
+}
+
+static int do_suspend(void)
+{
+	u16 saved_slb_size;
+	int status;
+	int ret;
+
+	pr_info("calling ibm,suspend-me on CPU %i\n", smp_processor_id());
+
+	/*
+	 * The destination processor model may have fewer SLB entries
+	 * than the source. We reduce mmu_slb_size to a safe minimum
+	 * before suspending in order to minimize the possibility of
+	 * programming non-existent entries on the destination. If
+	 * suspend fails, we restore it before returning. On success
+	 * the OF reconfig path will update it from the new device
+	 * tree after resuming on the destination.
+	 */
+	saved_slb_size = clamp_slb_size();
+
+	ret = rtas_ibm_suspend_me(&status);
+	if (ret != 0) {
+		pr_err("ibm,suspend-me error: %d\n", status);
+		slb_set_size(saved_slb_size);
+	}
+
+	return ret;
+}
+
+static int do_join(void *arg)
+{
+	atomic_t *counter = arg;
+	long hvrc;
+	int ret;
+
+	/* Must ensure MSR.EE off for H_JOIN. */
+	hard_irq_disable();
+	hvrc = plpar_hcall_norets(H_JOIN);
+
+	switch (hvrc) {
+	case H_CONTINUE:
+		/*
+		 * All other CPUs are offline or in H_JOIN. This CPU
+		 * attempts the suspend.
+		 */
+		ret = do_suspend();
+		break;
+	case H_SUCCESS:
+		/*
+		 * The suspend is complete and this cpu has received a
+		 * prod.
+		 */
+		ret = 0;
+		break;
+	case H_BAD_MODE:
+	case H_HARDWARE:
+	default:
+		ret = -EIO;
+		pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
+				   hvrc, smp_processor_id());
+		break;
+	}
+
+	if (atomic_inc_return(counter) == 1) {
+		pr_info("CPU %u waking all threads\n", smp_processor_id());
+		prod_others();
+	}
+	/*
+	 * Execution may have been suspended for several seconds, so
+	 * reset the watchdog.
+	 */
+	touch_nmi_watchdog();
+	return ret;
+}
+
+static int pseries_migrate_partition(u64 handle)
+{
+	atomic_t counter = ATOMIC_INIT(0);
+	int ret;
+
+	ret = wait_for_vasi_session_suspending(handle);
+	if (ret)
+		goto out;
+
+	ret = stop_machine(do_join, &counter, cpu_online_mask);
+	if (ret == 0)
+		post_mobility_fixup();
+out:
+	return ret;
+}
+
 static ssize_t migration_store(struct class *class,
 			       struct class_attribute *attr, const char *buf,
 			       size_t count)
@@ -423,16 +547,10 @@ static ssize_t migration_store(struct class *class,
 	if (rc)
 		return rc;
 
-	rc = wait_for_vasi_session_suspending(streamid);
+	rc = pseries_migrate_partition(streamid);
 	if (rc)
 		return rc;
 
-	rc = rtas_ibm_suspend_me_unsafe(streamid);
-	if (rc)
-		return rc;
-
-	post_mobility_fixup();
-
 	return count;
 }
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 14/29] powerpc/pseries/mobility: signal suspend cancellation to platform
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (12 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 15/29] powerpc/pseries/mobility: retry partition suspend after error Nathan Lynch
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

If we're returning an error to user space, use H_VASI_SIGNAL to send a
cancellation request to the platform. This isn't strictly required but
it communicates that Linux will not attempt to complete the suspend,
which allows the various entities involved to promptly end the
operation in progress.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 31 +++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 44ca7d4e143d..0f592246f345 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -520,6 +520,35 @@ static int do_join(void *arg)
 	return ret;
 }
 
+/*
+ * Abort reason code byte 0. We use only the 'Migrating partition' value.
+ */
+enum vasi_aborting_entity {
+	ORCHESTRATOR        = 1,
+	VSP_SOURCE          = 2,
+	PARTITION_FIRMWARE  = 3,
+	PLATFORM_FIRMWARE   = 4,
+	VSP_TARGET          = 5,
+	MIGRATING_PARTITION = 6,
+};
+
+static void pseries_cancel_migration(u64 handle, int err)
+{
+	u32 reason_code;
+	u32 detail;
+	u8 entity;
+	long hvrc;
+
+	entity = MIGRATING_PARTITION;
+	detail = abs(err) & 0xffffff;
+	reason_code = (entity << 24) | detail;
+
+	hvrc = plpar_hcall_norets(H_VASI_SIGNAL, handle,
+				  H_VASI_SIGNAL_CANCEL, reason_code);
+	if (hvrc)
+		pr_err("H_VASI_SIGNAL error: %ld\n", hvrc);
+}
+
 static int pseries_migrate_partition(u64 handle)
 {
 	atomic_t counter = ATOMIC_INIT(0);
@@ -532,6 +561,8 @@ static int pseries_migrate_partition(u64 handle)
 	ret = stop_machine(do_join, &counter, cpu_online_mask);
 	if (ret == 0)
 		post_mobility_fixup();
+	else
+		pseries_cancel_migration(handle, ret);
 out:
 	return ret;
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 15/29] powerpc/pseries/mobility: retry partition suspend after error
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (13 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 14/29] powerpc/pseries/mobility: signal suspend cancellation to platform Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries Nathan Lynch
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

This is a mitigation for the relatively rare occurrence where a
virtual IOA can be in a transient state that prevents the
suspend/migration from succeeding, resulting in an error from
ibm,suspend-me.

If the join/suspend sequence returns an error, it is acceptable to
retry as long as the VASI suspend session state is still
"Suspending" (i.e. the platform is still waiting for the OS to
suspend).

Retry a few times on suspend failure while this condition holds,
progressively increasing the delay between attempts. We don't want to
retry indefinitey because firmware emits an error log event on each
unsuccessful attempt.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 59 ++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 0f592246f345..e459cdb8286f 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -549,16 +549,71 @@ static void pseries_cancel_migration(u64 handle, int err)
 		pr_err("H_VASI_SIGNAL error: %ld\n", hvrc);
 }
 
+static int pseries_suspend(u64 handle)
+{
+	const unsigned int max_attempts = 5;
+	unsigned int retry_interval_ms = 1;
+	unsigned int attempt = 1;
+	int ret;
+
+	while (true) {
+		atomic_t counter = ATOMIC_INIT(0);
+		unsigned long vasi_state;
+		int vasi_err;
+
+		ret = stop_machine(do_join, &counter, cpu_online_mask);
+		if (ret == 0)
+			break;
+		/*
+		 * Encountered an error. If the VASI stream is still
+		 * in Suspending state, it's likely a transient
+		 * condition related to some device in the partition
+		 * and we can retry in the hope that the cause has
+		 * cleared after some delay.
+		 *
+		 * A better design would allow drivers etc to prepare
+		 * for the suspend and avoid conditions which prevent
+		 * the suspend from succeeding. For now, we have this
+		 * mitigation.
+		 */
+		pr_notice("Partition suspend attempt %u of %u error: %d\n",
+			  attempt, max_attempts, ret);
+
+		if (attempt == max_attempts)
+			break;
+
+		vasi_err = poll_vasi_state(handle, &vasi_state);
+		if (vasi_err == 0) {
+			if (vasi_state != H_VASI_SUSPENDING) {
+				pr_notice("VASI state %lu after failed suspend\n",
+					  vasi_state);
+				break;
+			}
+		} else if (vasi_err != -EOPNOTSUPP) {
+			pr_err("VASI state poll error: %d", vasi_err);
+			break;
+		}
+
+		pr_notice("Will retry partition suspend after %u ms\n",
+			  retry_interval_ms);
+
+		msleep(retry_interval_ms);
+		retry_interval_ms *= 10;
+		attempt++;
+	}
+
+	return ret;
+}
+
 static int pseries_migrate_partition(u64 handle)
 {
-	atomic_t counter = ATOMIC_INIT(0);
 	int ret;
 
 	ret = wait_for_vasi_session_suspending(handle);
 	if (ret)
 		goto out;
 
-	ret = stop_machine(do_join, &counter, cpu_online_mask);
+	ret = pseries_suspend(handle);
 	if (ret == 0)
 		post_mobility_fixup();
 	else
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (14 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 15/29] powerpc/pseries/mobility: retry partition suspend after error Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-12-04 12:52   ` Michael Ellerman
  2020-10-30  1:17 ` [PATCH 17/29] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe() Nathan Lynch
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

sys_rtas() cannot call ibm,suspend-me directly in the same way it
handles other inputs. Instead it must dispatch the request to code
that can first perform the H_JOIN sequence before any call to
ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
amount of platform-specific code to implement this.

Since a different, more robust implementation of the suspend sequence
is now in the pseries platform code, we want to dispatch the request
there while minimizing additional dependence on pseries.

Use a weak function that only pseries overrides. Note that invoking
ibm,suspend-me via the RTAS syscall is all but deprecated; this change
preserves ABI compatibility for old programs while providing to them
the benefit of the new partition suspend implementation. This is a
behavior change in that the kernel performs the device tree update and
firmware activation before returning, but experimentation indicates
this is tolerated fine by legacy user space.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h           | 1 +
 arch/powerpc/kernel/rtas.c                | 8 +++++++-
 arch/powerpc/platforms/pseries/mobility.c | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index fdefe6a974eb..be0fc2536673 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -260,6 +260,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me_unsafe(u64 handle);
 int rtas_ibm_suspend_me(int *fw_status);
+int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 
 struct rtc_time;
 extern time64_t rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 58bbd69a233f..52fb394f15d6 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1221,6 +1221,12 @@ static bool block_rtas_call(int token, int nargs,
 
 #endif /* CONFIG_PPC_RTAS_FILTER */
 
+/* Only pseries should need to override this. */
+int __weak rtas_syscall_dispatch_ibm_suspend_me(u64 handle)
+{
+	return -EINVAL;
+}
+
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
@@ -1271,7 +1277,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 		int rc = 0;
 		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
 		              | be32_to_cpu(args.args[1]);
-		rc = rtas_ibm_suspend_me_unsafe(handle);
+		rc = rtas_syscall_dispatch_ibm_suspend_me(handle);
 		if (rc == -EAGAIN)
 			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
 		else if (rc == -EIO)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e459cdb8286f..d6417c9db201 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -622,6 +622,11 @@ static int pseries_migrate_partition(u64 handle)
 	return ret;
 }
 
+int rtas_syscall_dispatch_ibm_suspend_me(u64 handle)
+{
+	return pseries_migrate_partition(handle);
+}
+
 static ssize_t migration_store(struct class *class,
 			       struct class_attribute *attr, const char *buf,
 			       size_t count)
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/29] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (15 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 18/29] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops Nathan Lynch
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

rtas_ibm_suspend_me_unsafe() is now unused; remove it and
rtas_percpu_suspend_me() which becomes unused as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 64 ---------------------------------
 2 files changed, 65 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index be0fc2536673..1e695e553a36 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -258,7 +258,6 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
 extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
-int rtas_ibm_suspend_me_unsafe(u64 handle);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 52fb394f15d6..6fde38b488f7 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -895,70 +895,6 @@ int rtas_suspend_cpu(struct rtas_suspend_me_data *data)
 	return __rtas_suspend_cpu(data, 0);
 }
 
-static void rtas_percpu_suspend_me(void *info)
-{
-	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
-}
-
-int rtas_ibm_suspend_me_unsafe(u64 handle)
-{
-	long state;
-	long rc;
-	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
-	struct rtas_suspend_me_data data;
-	DECLARE_COMPLETION_ONSTACK(done);
-
-	if (!rtas_service_present("ibm,suspend-me"))
-		return -ENOSYS;
-
-	/* Make sure the state is valid */
-	rc = plpar_hcall(H_VASI_STATE, retbuf, handle);
-
-	state = retbuf[0];
-
-	if (rc) {
-		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
-		return rc;
-	} else if (state == H_VASI_ENABLED) {
-		return -EAGAIN;
-	} else if (state != H_VASI_SUSPENDING) {
-		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
-		       state);
-		return -EIO;
-	}
-
-	atomic_set(&data.working, 0);
-	atomic_set(&data.done, 0);
-	atomic_set(&data.error, 0);
-	data.token = rtas_token("ibm,suspend-me");
-	data.complete = &done;
-
-	lock_device_hotplug();
-
-	cpu_hotplug_disable();
-
-	/* Call function on all CPUs.  One of us will make the
-	 * rtas call
-	 */
-	on_each_cpu(rtas_percpu_suspend_me, &data, 0);
-
-	wait_for_completion(&done);
-
-	if (atomic_read(&data.error) != 0)
-		printk(KERN_ERR "Error doing global join\n");
-
-
-	cpu_hotplug_enable();
-
-	unlock_device_hotplug();
-
-	return atomic_read(&data.error);
-}
-#else /* CONFIG_PPC_PSERIES */
-int rtas_ibm_suspend_me_unsafe(u64 handle)
-{
-	return -ENOSYS;
-}
 #endif
 
 /**
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/29] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (16 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 17/29] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 19/29] powerpc/pseries/hibernation: pass stream id via function arguments Nathan Lynch
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

There are three ways pseries_suspend_begin() can be reached:

1. When "mem" is written to /sys/power/state:

kobj_attr_store()
-> state_store()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This never works because there is no way to supply a valid stream id
using this interface, and H_VASI_STATE is called with a stream id of
zero. So this call path is useless at best.

2. When a stream id is written to /sys/devices/system/power/hibernate.
pseries_suspend_begin() is polled directly from store_hibernate()
until the stream is in the "Suspending" state (i.e. the platform is
ready for the OS to suspend execution):

dev_attr_store()
-> store_hibernate()
  -> pseries_suspend_begin()

3. When a stream id is written to /sys/devices/system/power/hibernate
(continued). After #2, pseries_suspend_begin() is called once again
from the pm core:

dev_attr_store()
-> store_hibernate()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This is redundant because the VASI suspend state is already known to
be Suspending.

The begin() callback of platform_suspend_ops is optional, so we can
simply remove that assignment with no loss of function.

Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 81e0ac58d620..3eaa9d59dc7a 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -187,7 +187,6 @@ static struct bus_type suspend_subsys = {
 
 static const struct platform_suspend_ops pseries_suspend_ops = {
 	.valid		= suspend_valid_only_mem,
-	.begin		= pseries_suspend_begin,
 	.prepare_late	= pseries_prepare_late,
 	.enter		= pseries_suspend_enter,
 };
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 19/29] powerpc/pseries/hibernation: pass stream id via function arguments
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (17 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 18/29] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 20/29] powerpc/pseries/hibernation: remove pseries_suspend_cpu() Nathan Lynch
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

There is no need for the stream id to be a file-global variable; pass
it from hibernate_store() to pseries_suspend_begin() for the
H_VASI_STATE call.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 3eaa9d59dc7a..232621f33510 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -15,7 +15,6 @@
 #include <asm/topology.h>
 #include "../../kernel/cacheinfo.h"
 
-static u64 stream_id;
 static struct device suspend_dev;
 static DECLARE_COMPLETION(suspend_work);
 static struct rtas_suspend_me_data suspend_data;
@@ -29,7 +28,7 @@ static atomic_t suspending;
  * Return value:
  * 	0 on success / other on failure
  **/
-static int pseries_suspend_begin(suspend_state_t state)
+static int pseries_suspend_begin(u64 stream_id)
 {
 	long vasi_state, rc;
 	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
@@ -132,6 +131,7 @@ static ssize_t store_hibernate(struct device *dev,
 			       struct device_attribute *attr,
 			       const char *buf, size_t count)
 {
+	u64 stream_id;
 	int rc;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -140,7 +140,7 @@ static ssize_t store_hibernate(struct device *dev,
 	stream_id = simple_strtoul(buf, NULL, 16);
 
 	do {
-		rc = pseries_suspend_begin(PM_SUSPEND_MEM);
+		rc = pseries_suspend_begin(stream_id);
 		if (rc == -EAGAIN)
 			ssleep(1);
 	} while (rc == -EAGAIN);
@@ -148,8 +148,6 @@ static ssize_t store_hibernate(struct device *dev,
 	if (!rc)
 		rc = pm_suspend(PM_SUSPEND_MEM);
 
-	stream_id = 0;
-
 	if (!rc)
 		rc = count;
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 20/29] powerpc/pseries/hibernation: remove pseries_suspend_cpu()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (18 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 19/29] powerpc/pseries/hibernation: pass stream id via function arguments Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 21/29] powerpc/machdep: remove suspend_disable_cpu() Nathan Lynch
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Since commit 48f6e7f6d948 ("powerpc/pseries: remove cede offline state
for CPUs"), ppc_md.suspend_disable_cpu() is no longer used and all
CPUs (save one) are placed into true offline state as opposed to
H_JOIN. So pseries_suspend_cpu() is effectively unused; remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 232621f33510..3315d698d5ab 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -48,20 +48,6 @@ static int pseries_suspend_begin(u64 stream_id)
 		       vasi_state);
 		return -EIO;
 	}
-
-	return 0;
-}
-
-/**
- * pseries_suspend_cpu - Suspend a single CPU
- *
- * Makes the H_JOIN call to suspend the CPU
- *
- **/
-static int pseries_suspend_cpu(void)
-{
-	if (atomic_read(&suspending))
-		return rtas_suspend_cpu(&suspend_data);
 	return 0;
 }
 
@@ -235,7 +221,6 @@ static int __init pseries_suspend_init(void)
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
-	ppc_md.suspend_disable_cpu = pseries_suspend_cpu;
 	ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
 	suspend_set_ops(&pseries_suspend_ops);
 	return 0;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 21/29] powerpc/machdep: remove suspend_disable_cpu()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (19 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 20/29] powerpc/pseries/hibernation: remove pseries_suspend_cpu() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 22/29] powerpc/rtas: remove rtas_suspend_cpu() Nathan Lynch
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

There are no users left of the suspend_disable_cpu() callback, remove
it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/machdep.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 475687f24f4a..cf6ebbc16cb4 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -207,7 +207,6 @@ struct machdep_calls {
 	void (*suspend_disable_irqs)(void);
 	void (*suspend_enable_irqs)(void);
 #endif
-	int (*suspend_disable_cpu)(void);
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	ssize_t (*cpu_probe)(const char *, size_t);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 22/29] powerpc/rtas: remove rtas_suspend_cpu()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (20 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 21/29] powerpc/machdep: remove suspend_disable_cpu() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:17 ` [PATCH 23/29] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me() Nathan Lynch
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

rtas_suspend_cpu() no longer has users; remove it and
__rtas_suspend_cpu() which now becomes unused as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 52 ---------------------------------
 2 files changed, 53 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 1e695e553a36..86f5d07969e4 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -256,7 +256,6 @@ extern bool rtas_indicator_present(int token, int *maxindex);
 extern int rtas_set_indicator(int indicator, int index, int new_value);
 extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
-extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 6fde38b488f7..4445219e92ce 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -843,58 +843,6 @@ int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data)
 	return __rtas_suspend_last_cpu(data, 0);
 }
 
-static int __rtas_suspend_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
-{
-	long rc = H_SUCCESS;
-	unsigned long msr_save;
-	int cpu;
-
-	atomic_inc(&data->working);
-
-	/* really need to ensure MSR.EE is off for H_JOIN */
-	msr_save = mfmsr();
-	mtmsr(msr_save & ~(MSR_EE));
-
-	while (rc == H_SUCCESS && !atomic_read(&data->done) && !atomic_read(&data->error))
-		rc = plpar_hcall_norets(H_JOIN);
-
-	mtmsr(msr_save);
-
-	if (rc == H_SUCCESS) {
-		/* This cpu was prodded and the suspend is complete. */
-		goto out;
-	} else if (rc == H_CONTINUE) {
-		/* All other cpus are in H_JOIN, this cpu does
-		 * the suspend.
-		 */
-		return __rtas_suspend_last_cpu(data, wake_when_done);
-	} else {
-		printk(KERN_ERR "H_JOIN on cpu %i failed with rc = %ld\n",
-		       smp_processor_id(), rc);
-		atomic_set(&data->error, rc);
-	}
-
-	if (wake_when_done) {
-		atomic_set(&data->done, 1);
-
-		/* This cpu did the suspend or got an error; in either case,
-		 * we need to prod all other other cpus out of join state.
-		 * Extra prods are harmless.
-		 */
-		for_each_online_cpu(cpu)
-			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
-	}
-out:
-	if (atomic_dec_return(&data->working) == 0)
-		complete(data->complete);
-	return rc;
-}
-
-int rtas_suspend_cpu(struct rtas_suspend_me_data *data)
-{
-	return __rtas_suspend_cpu(data, 0);
-}
-
 #endif
 
 /**
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 23/29] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (21 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 22/29] powerpc/rtas: remove rtas_suspend_cpu() Nathan Lynch
@ 2020-10-30  1:17 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 24/29] powerpc/rtas: remove unused rtas_suspend_last_cpu() Nathan Lynch
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

rtas_suspend_last_cpu() and related code perform a lot of work that
isn't relevant to the hibernation workflow. All other CPUs are offline
when called so there is no need to place them in H_JOIN or prod them
on resume, nor is there need for retries or operations on shared
state.

Call the rtas_ibm_suspend_me() wrapper function directly from
pseries_suspend_enter() instead of using rtas_suspend_last_cpu().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 3315d698d5ab..703728cb95ec 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -76,11 +76,7 @@ static void pseries_suspend_enable_irqs(void)
  **/
 static int pseries_suspend_enter(suspend_state_t state)
 {
-	int rc = rtas_suspend_last_cpu(&suspend_data);
-
-	atomic_set(&suspending, 0);
-	atomic_set(&suspend_data.done, 1);
-	return rc;
+	return rtas_ibm_suspend_me(NULL);
 }
 
 /**
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 24/29] powerpc/rtas: remove unused rtas_suspend_last_cpu()
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (22 preceding siblings ...)
  2020-10-30  1:17 ` [PATCH 23/29] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me() Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 25/29] powerpc/pseries/hibernation: remove redundant cacheinfo update Nathan Lynch
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

rtas_suspend_last_cpu() is now unused, remove it and
__rtas_suspend_last_cpu() which also becomes unused.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 45 ---------------------------------
 2 files changed, 46 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 86f5d07969e4..d405b4bd659b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -256,7 +256,6 @@ extern bool rtas_indicator_present(int token, int *maxindex);
 extern int rtas_set_indicator(int indicator, int index, int new_value);
 extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
-extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 4445219e92ce..98ec9ffaa3b3 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -799,51 +799,6 @@ void rtas_os_term(char *str)
 }
 
 static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
-#ifdef CONFIG_PPC_PSERIES
-static int __rtas_suspend_last_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
-{
-	u16 slb_size = mmu_slb_size;
-	int rc = H_MULTI_THREADS_ACTIVE;
-	int cpu;
-
-	slb_set_size(SLB_MIN_SIZE);
-	printk(KERN_DEBUG "calling ibm,suspend-me on cpu %i\n", smp_processor_id());
-
-	while (rc == H_MULTI_THREADS_ACTIVE && !atomic_read(&data->done) &&
-	       !atomic_read(&data->error))
-		rc = rtas_call(data->token, 0, 1, NULL);
-
-	if (rc || atomic_read(&data->error)) {
-		printk(KERN_DEBUG "ibm,suspend-me returned %d\n", rc);
-		slb_set_size(slb_size);
-	}
-
-	if (atomic_read(&data->error))
-		rc = atomic_read(&data->error);
-
-	atomic_set(&data->error, rc);
-	pSeries_coalesce_init();
-
-	if (wake_when_done) {
-		atomic_set(&data->done, 1);
-
-		for_each_online_cpu(cpu)
-			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
-	}
-
-	if (atomic_dec_return(&data->working) == 0)
-		complete(data->complete);
-
-	return rc;
-}
-
-int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data)
-{
-	atomic_inc(&data->working);
-	return __rtas_suspend_last_cpu(data, 0);
-}
-
-#endif
 
 /**
  * rtas_activate_firmware() - Activate a new version of firmware.
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 25/29] powerpc/pseries/hibernation: remove redundant cacheinfo update
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (23 preceding siblings ...)
  2020-10-30  1:18 ` [PATCH 24/29] powerpc/rtas: remove unused rtas_suspend_last_cpu() Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 26/29] powerpc/pseries/hibernation: perform post-suspend fixups later Nathan Lynch
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Partitions with cache nodes in the device tree can encounter the
following warning on resume:

CPU 0 already accounted in PowerPC,POWER9@0(Data)
WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820

These calls to cacheinfo_cpu_offline/online have been redundant since
commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo
hierarchy post-migration").

Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 703728cb95ec..6a94cc0deb88 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -13,7 +13,6 @@
 #include <asm/mmu.h>
 #include <asm/rtas.h>
 #include <asm/topology.h>
-#include "../../kernel/cacheinfo.h"
 
 static struct device suspend_dev;
 static DECLARE_COMPLETION(suspend_work);
@@ -63,9 +62,7 @@ static void pseries_suspend_enable_irqs(void)
 	 * Update configuration which can be modified based on device tree
 	 * changes during resume.
 	 */
-	cacheinfo_cpu_offline(smp_processor_id());
 	post_mobility_fixup();
-	cacheinfo_cpu_online(smp_processor_id());
 }
 
 /**
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 26/29] powerpc/pseries/hibernation: perform post-suspend fixups later
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (24 preceding siblings ...)
  2020-10-30  1:18 ` [PATCH 25/29] powerpc/pseries/hibernation: remove redundant cacheinfo update Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 27/29] powerpc/pseries/hibernation: remove prepare_late() callback Nathan Lynch
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

The pseries hibernate code calls post_mobility_fixup() which is sort
of a dumping ground of fixups that need to run after resuming from
suspend regardless of whether suspend was a hibernation or a
migration. Calling post_mobility_fixup() from
pseries_suspend_enable_irqs() runs this code early in resume with
devices suspended and only one CPU up, while the much more commonly
used migration case runs these fixups in a more typical process
context.

Call post_mobility_fixup() after the suspend core returns a success
status to the hibernate sysfs store method and remove
pseries_suspend_enable_irqs().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 6a94cc0deb88..589a91730db8 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -50,21 +50,6 @@ static int pseries_suspend_begin(u64 stream_id)
 	return 0;
 }
 
-/**
- * pseries_suspend_enable_irqs
- *
- * Post suspend configuration updates
- *
- **/
-static void pseries_suspend_enable_irqs(void)
-{
-	/*
-	 * Update configuration which can be modified based on device tree
-	 * changes during resume.
-	 */
-	post_mobility_fixup();
-}
-
 /**
  * pseries_suspend_enter - Final phase of hibernation
  *
@@ -127,8 +112,11 @@ static ssize_t store_hibernate(struct device *dev,
 	if (!rc)
 		rc = pm_suspend(PM_SUSPEND_MEM);
 
-	if (!rc)
+	if (!rc) {
 		rc = count;
+		post_mobility_fixup();
+	}
+
 
 	return rc;
 }
@@ -214,7 +202,6 @@ static int __init pseries_suspend_init(void)
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
-	ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
 	suspend_set_ops(&pseries_suspend_ops);
 	return 0;
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 27/29] powerpc/pseries/hibernation: remove prepare_late() callback
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (25 preceding siblings ...)
  2020-10-30  1:18 ` [PATCH 26/29] powerpc/pseries/hibernation: perform post-suspend fixups later Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 28/29] powerpc/rtas: remove unused rtas_suspend_me_data Nathan Lynch
  2020-10-30  1:18 ` [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update Nathan Lynch
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

The pseries hibernate code no longer calls into the original
join/suspend code in kernel/rtas.c, so pseries_prepare_late() and
related code don't accomplish anything now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 25 ------------------------
 1 file changed, 25 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 589a91730db8..1b902cbf85c5 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -15,9 +15,6 @@
 #include <asm/topology.h>
 
 static struct device suspend_dev;
-static DECLARE_COMPLETION(suspend_work);
-static struct rtas_suspend_me_data suspend_data;
-static atomic_t suspending;
 
 /**
  * pseries_suspend_begin - First phase of hibernation
@@ -61,23 +58,6 @@ static int pseries_suspend_enter(suspend_state_t state)
 	return rtas_ibm_suspend_me(NULL);
 }
 
-/**
- * pseries_prepare_late - Prepare to suspend all other CPUs
- *
- * Return value:
- * 	0 on success / other on failure
- **/
-static int pseries_prepare_late(void)
-{
-	atomic_set(&suspending, 1);
-	atomic_set(&suspend_data.working, 0);
-	atomic_set(&suspend_data.done, 0);
-	atomic_set(&suspend_data.error, 0);
-	suspend_data.complete = &suspend_work;
-	reinit_completion(&suspend_work);
-	return 0;
-}
-
 /**
  * store_hibernate - Initiate partition hibernation
  * @dev:		subsys root device
@@ -152,7 +132,6 @@ static struct bus_type suspend_subsys = {
 
 static const struct platform_suspend_ops pseries_suspend_ops = {
 	.valid		= suspend_valid_only_mem,
-	.prepare_late	= pseries_prepare_late,
 	.enter		= pseries_suspend_enter,
 };
 
@@ -195,10 +174,6 @@ static int __init pseries_suspend_init(void)
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return 0;
 
-	suspend_data.token = rtas_token("ibm,suspend-me");
-	if (suspend_data.token == RTAS_UNKNOWN_SERVICE)
-		return 0;
-
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 28/29] powerpc/rtas: remove unused rtas_suspend_me_data
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (26 preceding siblings ...)
  2020-10-30  1:18 ` [PATCH 27/29] powerpc/pseries/hibernation: remove prepare_late() callback Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-10-30  1:18 ` [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update Nathan Lynch
  28 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

All code which used this type has been removed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas-types.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas-types.h b/arch/powerpc/include/asm/rtas-types.h
index aa420561bc10..8df6235d64d1 100644
--- a/arch/powerpc/include/asm/rtas-types.h
+++ b/arch/powerpc/include/asm/rtas-types.h
@@ -23,14 +23,6 @@ struct rtas_t {
 	struct device_node *dev;	/* virtual address pointer */
 };
 
-struct rtas_suspend_me_data {
-	atomic_t working; /* number of cpus accessing this struct */
-	atomic_t done;
-	int token; /* ibm,suspend-me */
-	atomic_t error;
-	struct completion *complete; /* wait on this until working == 0 */
-};
-
 struct rtas_error_log {
 	/* Byte 0 */
 	u8		byte0;			/* Architectural version */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update
  2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
                   ` (27 preceding siblings ...)
  2020-10-30  1:18 ` [PATCH 28/29] powerpc/rtas: remove unused rtas_suspend_me_data Nathan Lynch
@ 2020-10-30  1:18 ` Nathan Lynch
  2020-11-20 16:09   ` Nathan Lynch
  28 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30  1:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

In pseries_devicetree_update(), with each call to ibm,update-nodes the
partition firmware communicates the node to be deleted or updated by
placing its phandle in the work buffer. Each of delete_dt_node(),
update_dt_node(), and add_dt_node() have duplicate lookups using the
phandle value and corresponding refcount management.

Move the lookup and of_node_put() into pseries_devicetree_update(),
and emit a warning on any failed lookups.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 45 ++++++++---------------
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index d6417c9db201..5521b63898aa 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -61,18 +61,10 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
 	return rc;
 }
 
-static int delete_dt_node(__be32 phandle)
+static int delete_dt_node(struct device_node *dn)
 {
-	struct device_node *dn;
-
-	dn = of_find_node_by_phandle(be32_to_cpu(phandle));
-	if (!dn)
-		return -ENOENT;
-
 	pr_debug("removing node %pOFfp\n", dn);
-
 	dlpar_detach_node(dn);
-	of_node_put(dn);
 	return 0;
 }
 
@@ -137,10 +129,9 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
 	return 0;
 }
 
-static int update_dt_node(__be32 phandle, s32 scope)
+static int update_dt_node(struct device_node *dn, s32 scope)
 {
 	struct update_props_workarea *upwa;
-	struct device_node *dn;
 	struct property *prop = NULL;
 	int i, rc, rtas_rc;
 	char *prop_data;
@@ -157,14 +148,8 @@ static int update_dt_node(__be32 phandle, s32 scope)
 	if (!rtas_buf)
 		return -ENOMEM;
 
-	dn = of_find_node_by_phandle(be32_to_cpu(phandle));
-	if (!dn) {
-		kfree(rtas_buf);
-		return -ENOENT;
-	}
-
 	upwa = (struct update_props_workarea *)&rtas_buf[0];
-	upwa->phandle = phandle;
+	upwa->phandle = cpu_to_be32(dn->phandle);
 
 	do {
 		rtas_rc = mobility_rtas_call(update_properties_token, rtas_buf,
@@ -224,21 +209,15 @@ static int update_dt_node(__be32 phandle, s32 scope)
 		cond_resched();
 	} while (rtas_rc == 1);
 
-	of_node_put(dn);
 	kfree(rtas_buf);
 	return 0;
 }
 
-static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
+static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
 {
 	struct device_node *dn;
-	struct device_node *parent_dn;
 	int rc;
 
-	parent_dn = of_find_node_by_phandle(be32_to_cpu(parent_phandle));
-	if (!parent_dn)
-		return -ENOENT;
-
 	dn = dlpar_configure_connector(drc_index, parent_dn);
 	if (!dn) {
 		of_node_put(parent_dn);
@@ -251,7 +230,6 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 
 	pr_debug("added node %pOFfp\n", dn);
 
-	of_node_put(parent_dn);
 	return rc;
 }
 
@@ -284,22 +262,31 @@ int pseries_devicetree_update(s32 scope)
 			data++;
 
 			for (i = 0; i < node_count; i++) {
+				struct device_node *np;
 				__be32 phandle = *data++;
 				__be32 drc_index;
 
+				np = of_find_node_by_phandle(be32_to_cpu(phandle));
+				if (!np) {
+					pr_warn("Failed lookup: phandle 0x%x for action 0x%x\n",
+						be32_to_cpu(phandle), action);
+					continue;
+				}
+
 				switch (action) {
 				case DELETE_DT_NODE:
-					delete_dt_node(phandle);
+					delete_dt_node(np);
 					break;
 				case UPDATE_DT_NODE:
-					update_dt_node(phandle, scope);
+					update_dt_node(np, scope);
 					break;
 				case ADD_DT_NODE:
 					drc_index = *data++;
-					add_dt_node(phandle, drc_index);
+					add_dt_node(np, drc_index);
 					break;
 				}
 
+				of_node_put(np);
 				cond_resched();
 			}
 		}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
  2020-10-30  1:17 ` [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE Nathan Lynch
@ 2020-10-30  3:45   ` Andrew Donnellan
  2020-10-30 12:10     ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Donnellan @ 2020-10-30  3:45 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt

On 30/10/20 12:17 pm, Nathan Lynch wrote:
> While drmgr has had work in some areas to make its RTAS syscall
> interactions endian-neutral, its code for performing partition
> migration via the syscall has never worked on LE. While it is able to
> complete ibm,suspend-me successfully, it crashes when attempting the
> subsequent ibm,update-nodes call.
> 
> drmgr is the only known (or plausible) user of these ibm,suspend-me,
> ibm,update-nodes, and ibm,update-properties, so allow them only in
> big-endian configurations.

And there's a zero chance that drmgr will ever be fixed on LE?

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
  2020-10-30  3:45   ` Andrew Donnellan
@ 2020-10-30 12:10     ` Nathan Lynch
  2020-11-06 14:59       ` Andrew Donnellan
  0 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-10-30 12:10 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt

Andrew Donnellan <ajd@linux.ibm.com> writes:
> On 30/10/20 12:17 pm, Nathan Lynch wrote:
>> While drmgr has had work in some areas to make its RTAS syscall
>> interactions endian-neutral, its code for performing partition
>> migration via the syscall has never worked on LE. While it is able to
>> complete ibm,suspend-me successfully, it crashes when attempting the
>> subsequent ibm,update-nodes call.
>> 
>> drmgr is the only known (or plausible) user of these ibm,suspend-me,
>> ibm,update-nodes, and ibm,update-properties, so allow them only in
>> big-endian configurations.
>
> And there's a zero chance that drmgr will ever be fixed on LE?

It's always used the sysfs interface on LE, and the only way to provoke
it to attempt the syscalls is by doing something like this before
running the migration:

# echo 0 > /tmp/fake_api_version
# mount -o bind,ro /tmp/fake_api_version /sys/kernel/mobility/api_version

So I'm not aware of any circumstance that would actually motivate
someone to make it work on LE. I'd say it's more likely that drmgr will
drop its support for using the syscall altogether.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
  2020-10-30 12:10     ` Nathan Lynch
@ 2020-11-06 14:59       ` Andrew Donnellan
  2020-11-06 16:44         ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Donnellan @ 2020-11-06 14:59 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt

On 30/10/20 11:10 pm, Nathan Lynch wrote:
>> And there's a zero chance that drmgr will ever be fixed on LE?
> 
> It's always used the sysfs interface on LE, and the only way to provoke
> it to attempt the syscalls is by doing something like this before
> running the migration:
> 
> # echo 0 > /tmp/fake_api_version
> # mount -o bind,ro /tmp/fake_api_version /sys/kernel/mobility/api_version
> 
> So I'm not aware of any circumstance that would actually motivate
> someone to make it work on LE. I'd say it's more likely that drmgr will
> drop its support for using the syscall altogether.
> 

Okay. librtas should handle the error fine, so I don't have any huge 
objection to this, though perhaps a documentation patch to librtas to 
mention that these calls are BE-only would be in order.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
  2020-11-06 14:59       ` Andrew Donnellan
@ 2020-11-06 16:44         ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-11-06 16:44 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: tyreld, brking, mmc, cforno12, drt

Andrew Donnellan <ajd@linux.ibm.com> writes:
> On 30/10/20 11:10 pm, Nathan Lynch wrote:
>>> And there's a zero chance that drmgr will ever be fixed on LE?
>> 
>> It's always used the sysfs interface on LE, and the only way to provoke
>> it to attempt the syscalls is by doing something like this before
>> running the migration:
>> 
>> # echo 0 > /tmp/fake_api_version
>> # mount -o bind,ro /tmp/fake_api_version /sys/kernel/mobility/api_version
>> 
>> So I'm not aware of any circumstance that would actually motivate
>> someone to make it work on LE. I'd say it's more likely that drmgr will
>> drop its support for using the syscall altogether.
>> 
>
> Okay. librtas should handle the error fine, so I don't have any huge 
> objection to this, though perhaps a documentation patch to librtas to 
> mention that these calls are BE-only would be in order.

Good suggestion, I've drafted a documentation change as well as an
attempt to mark the appropriate APIs deprecated for LE here:

https://github.com/nathanlynch/librtas/tree/topic/document-suspend-filter

which I'll send to Tyrel if/when the Linux change is staged.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update
  2020-10-30  1:18 ` [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update Nathan Lynch
@ 2020-11-20 16:09   ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-11-20 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:
> In pseries_devicetree_update(), with each call to ibm,update-nodes the
> partition firmware communicates the node to be deleted or updated by
> placing its phandle in the work buffer. Each of delete_dt_node(),
> update_dt_node(), and add_dt_node() have duplicate lookups using the
> phandle value and corresponding refcount management.

...

I noticed that this introduces a reference count imbalance in an error
path:

> -static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
> +static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
>  {
>  	struct device_node *dn;
> -	struct device_node *parent_dn;
>  	int rc;
>  
> -	parent_dn = of_find_node_by_phandle(be32_to_cpu(parent_phandle));
> -	if (!parent_dn)
> -		return -ENOENT;
> -
>  	dn = dlpar_configure_connector(drc_index, parent_dn);
>  	if (!dn) {
>  		of_node_put(parent_dn);

here:           ^^^

> @@ -251,7 +230,6 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>  
>  	pr_debug("added node %pOFfp\n", dn);
>  
> -	of_node_put(parent_dn);
>  	return rc;
>  }

The change correctly removes the of_node_put() from the happy path but
the put in the error branch should have been removed too.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic
  2020-10-30  1:17 ` [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic Nathan Lynch
@ 2020-12-04 12:51   ` Michael Ellerman
  2020-12-04 14:46     ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2020-12-04 12:51 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:
> The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to
> the caller until the specified VASI suspend session state makes the
> transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest
> of separating concerns to prepare for a new implementation of the
> join/suspend sequence, extract VASI session polling logic into a
> couple of local functions. Waiting for the session state to reach
> H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures
> that we will never get an EAGAIN result necessitating a retry. No
> user-visible change in behavior is intended.
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/mobility.c | 76 +++++++++++++++++++++--
>  1 file changed, 71 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index dc6abf164db7..1b8ae221b98a 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -345,6 +345,73 @@ void post_mobility_fixup(void)
...

> +
> +static int wait_for_vasi_session_suspending(u64 handle)
> +{
> +	unsigned long state;
> +	bool keep_polling;
> +	int ret;
> +
> +	/*
> +	 * Wait for transition from H_VASI_ENABLED to
> +	 * H_VASI_SUSPENDING. Treat anything else as an error.
> +	 */
> +	do {
> +		keep_polling = false;
> +		ret = poll_vasi_state(handle, &state);
> +		if (ret != 0)
> +			break;
> +
> +		switch (state) {
> +		case H_VASI_SUSPENDING:
> +			break;
> +		case H_VASI_ENABLED:
> +			keep_polling = true;
> +			ssleep(1);
> +			break;
> +		default:
> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
> +			ret = -EIO;
> +			break;
> +		}
> +	} while (keep_polling);

This seems like it could be simpler?

eg:

	while (true) {
		ret = poll_vasi_state(handle, &state);

		if (ret != 0 || state == H_VASI_SUSPENDING)
			break;
		else if (state == H_VASI_ENABLED)
			ssleep(1);
		else {
			pr_err("unexpected H_VASI_STATE result %lu\n", state);
			ret = -EIO;
			break;
		}
	}


Or did I miss something?

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes
  2020-10-30  1:17 ` [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes Nathan Lynch
@ 2020-12-04 12:52   ` Michael Ellerman
  2020-12-04 14:40     ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2020-12-04 12:52 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:
> We don't completely account for the possible return codes for
> ibm,suspend-me. Add definitions for these.
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 55f9a154c95d..f060181a0d32 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -23,11 +23,16 @@
>  #define RTAS_RMOBUF_MAX (64 * 1024)
>  
>  /* RTAS return status codes */
> -#define RTAS_NOT_SUSPENDABLE	-9004
>  #define RTAS_BUSY		-2    /* RTAS Busy */
>  #define RTAS_EXTENDED_DELAY_MIN	9900
>  #define RTAS_EXTENDED_DELAY_MAX	9905
>  
> +/* statuses specific to ibm,suspend-me */
> +#define RTAS_SUSPEND_ABORTED     9000 /* Suspension aborted */

This made me ... pause.

But it really is positive 9000, I checked PAPR.

cheers


> +#define RTAS_NOT_SUSPENDABLE    -9004 /* Partition not suspendable */
> +#define RTAS_THREADS_ACTIVE     -9005 /* Multiple processor threads active */
> +#define RTAS_OUTSTANDING_COPROC -9006 /* Outstanding coprocessor operations */
> +
>  /*
>   * In general to call RTAS use rtas_token("string") to lookup
>   * an RTAS token for the given string (e.g. "event-scan").
> -- 
> 2.25.4

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
  2020-10-30  1:17 ` [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend Nathan Lynch
@ 2020-12-04 12:52   ` Michael Ellerman
  2020-12-04 16:01     ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2020-12-04 12:52 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Hi Nathan,

Just a couple of nits ...

Nathan Lynch <nathanl@linux.ibm.com> writes:
> The partition suspend sequence as specified in the platform
> architecture requires that all active processor threads call
> H_JOIN, which:
...
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 1b8ae221b98a..44ca7d4e143d 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
...

> +
> +static int do_join(void *arg)
> +{
> +	atomic_t *counter = arg;
> +	long hvrc;
> +	int ret;
> +
> +	/* Must ensure MSR.EE off for H_JOIN. */
> +	hard_irq_disable();

Didn't stop_machine() already do that for us?

In the state machine in multi_cpu_stop().

> +	hvrc = plpar_hcall_norets(H_JOIN);
> +
> +	switch (hvrc) {
> +	case H_CONTINUE:
> +		/*
> +		 * All other CPUs are offline or in H_JOIN. This CPU
> +		 * attempts the suspend.
> +		 */
> +		ret = do_suspend();
> +		break;
> +	case H_SUCCESS:
> +		/*
> +		 * The suspend is complete and this cpu has received a
> +		 * prod.
> +		 */
> +		ret = 0;
> +		break;
> +	case H_BAD_MODE:
> +	case H_HARDWARE:
> +	default:
> +		ret = -EIO;
> +		pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
> +				   hvrc, smp_processor_id());
> +		break;
> +	}
> +
> +	if (atomic_inc_return(counter) == 1) {
> +		pr_info("CPU %u waking all threads\n", smp_processor_id());
> +		prod_others();
> +	}

Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
couldn't we just have that CPU do the prodding of others?

> +	/*
> +	 * Execution may have been suspended for several seconds, so
> +	 * reset the watchdog.
> +	 */
> +	touch_nmi_watchdog();
> +	return ret;
> +}
> +
> +static int pseries_migrate_partition(u64 handle)
> +{
> +	atomic_t counter = ATOMIC_INIT(0);
> +	int ret;
> +
> +	ret = wait_for_vasi_session_suspending(handle);
> +	if (ret)
> +		goto out;

Direct return would be clearer IMHO.

> +
> +	ret = stop_machine(do_join, &counter, cpu_online_mask);
> +	if (ret == 0)
> +		post_mobility_fixup();
> +out:
> +	return ret;
> +}
> +

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries
  2020-10-30  1:17 ` [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries Nathan Lynch
@ 2020-12-04 12:52   ` Michael Ellerman
  2020-12-04 16:04     ` Nathan Lynch
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2020-12-04 12:52 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:
> sys_rtas() cannot call ibm,suspend-me directly in the same way it
> handles other inputs. Instead it must dispatch the request to code
> that can first perform the H_JOIN sequence before any call to
> ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
> amount of platform-specific code to implement this.
>
> Since a different, more robust implementation of the suspend sequence
> is now in the pseries platform code, we want to dispatch the request
> there while minimizing additional dependence on pseries.
>
> Use a weak function that only pseries overrides.

Over the years weak functions have caused their fair share of problems.

There are cases where they are the cleanest option, but for intra-arch
code like this I think and ifdef is much simpler.


> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index fdefe6a974eb..be0fc2536673 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -260,6 +260,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
>  int rtas_ibm_suspend_me_unsafe(u64 handle);
>  int rtas_ibm_suspend_me(int *fw_status);
> +int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);

ie. we'd just do:

#ifdef CONFIG_PPC_PSERIES
int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
#else
int rtas_syscall_dispatch_ibm_suspend_me(u64 handle)
{
	return -EINVAL;
}
#endif


cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes
  2020-12-04 12:52   ` Michael Ellerman
@ 2020-12-04 14:40     ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-12-04 14:40 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> We don't completely account for the possible return codes for
>> ibm,suspend-me. Add definitions for these.
>>
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/rtas.h | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 55f9a154c95d..f060181a0d32 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -23,11 +23,16 @@
>>  #define RTAS_RMOBUF_MAX (64 * 1024)
>>  
>>  /* RTAS return status codes */
>> -#define RTAS_NOT_SUSPENDABLE	-9004
>>  #define RTAS_BUSY		-2    /* RTAS Busy */
>>  #define RTAS_EXTENDED_DELAY_MIN	9900
>>  #define RTAS_EXTENDED_DELAY_MAX	9905
>>  
>> +/* statuses specific to ibm,suspend-me */
>> +#define RTAS_SUSPEND_ABORTED     9000 /* Suspension aborted */
>
> This made me ... pause.
>
> But it really is positive 9000, I checked PAPR.

Yes, 9000 falls within the "vendor-specific success codes" range in the
RTAS status value table. I guess aborting a suspend is
operator-initiated and it's not considered an error condition from that
point of view.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic
  2020-12-04 12:51   ` Michael Ellerman
@ 2020-12-04 14:46     ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-12-04 14:46 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to
>> the caller until the specified VASI suspend session state makes the
>> transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest
>> of separating concerns to prepare for a new implementation of the
>> join/suspend sequence, extract VASI session polling logic into a
>> couple of local functions. Waiting for the session state to reach
>> H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures
>> that we will never get an EAGAIN result necessitating a retry. No
>> user-visible change in behavior is intended.
>>
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 76 +++++++++++++++++++++--
>>  1 file changed, 71 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index dc6abf164db7..1b8ae221b98a 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -345,6 +345,73 @@ void post_mobility_fixup(void)
> ...
>
>> +
>> +static int wait_for_vasi_session_suspending(u64 handle)
>> +{
>> +	unsigned long state;
>> +	bool keep_polling;
>> +	int ret;
>> +
>> +	/*
>> +	 * Wait for transition from H_VASI_ENABLED to
>> +	 * H_VASI_SUSPENDING. Treat anything else as an error.
>> +	 */
>> +	do {
>> +		keep_polling = false;
>> +		ret = poll_vasi_state(handle, &state);
>> +		if (ret != 0)
>> +			break;
>> +
>> +		switch (state) {
>> +		case H_VASI_SUSPENDING:
>> +			break;
>> +		case H_VASI_ENABLED:
>> +			keep_polling = true;
>> +			ssleep(1);
>> +			break;
>> +		default:
>> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
>> +			ret = -EIO;
>> +			break;
>> +		}
>> +	} while (keep_polling);
>
> This seems like it could be simpler?
>
> eg:
>
> 	while (true) {
> 		ret = poll_vasi_state(handle, &state);
>
> 		if (ret != 0 || state == H_VASI_SUSPENDING)
> 			break;
> 		else if (state == H_VASI_ENABLED)
> 			ssleep(1);
> 		else {
> 			pr_err("unexpected H_VASI_STATE result %lu\n", state);
> 			ret = -EIO;
> 			break;
> 		}
> 	}
>
>
> Or did I miss something?

No I think you're right, thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
  2020-12-04 12:52   ` Michael Ellerman
@ 2020-12-04 16:01     ` Nathan Lynch
  2020-12-05 11:03       ` Michael Ellerman
  0 siblings, 1 reply; 45+ messages in thread
From: Nathan Lynch @ 2020-12-04 16:01 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Hi Michael,

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> The partition suspend sequence as specified in the platform
>> architecture requires that all active processor threads call
>> H_JOIN, which:
> ...
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 1b8ae221b98a..44ca7d4e143d 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
> ...
>
>> +
>> +static int do_join(void *arg)
>> +{
>> +	atomic_t *counter = arg;
>> +	long hvrc;
>> +	int ret;
>> +
>> +	/* Must ensure MSR.EE off for H_JOIN. */
>> +	hard_irq_disable();
>
> Didn't stop_machine() already do that for us?
>
> In the state machine in multi_cpu_stop().

Yes, but I didn't want to rely on something that seems like an
implementation detail of stop_machine(). I assumed it's benign and in
keeping with hard_irq_disable()'s intended semantics to make multiple
calls to it within a critical section.

I don't feel strongly about this though. If I remove this
hard_irq_disable() I'll modify the comment to indicate that this
function relies on stop_machine() to satisfy this condition for H_JOIN.


>> +	hvrc = plpar_hcall_norets(H_JOIN);
>> +
>> +	switch (hvrc) {
>> +	case H_CONTINUE:
>> +		/*
>> +		 * All other CPUs are offline or in H_JOIN. This CPU
>> +		 * attempts the suspend.
>> +		 */
>> +		ret = do_suspend();
>> +		break;
>> +	case H_SUCCESS:
>> +		/*
>> +		 * The suspend is complete and this cpu has received a
>> +		 * prod.
>> +		 */
>> +		ret = 0;
>> +		break;
>> +	case H_BAD_MODE:
>> +	case H_HARDWARE:
>> +	default:
>> +		ret = -EIO;
>> +		pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
>> +				   hvrc, smp_processor_id());
>> +		break;
>> +	}
>> +
>> +	if (atomic_inc_return(counter) == 1) {
>> +		pr_info("CPU %u waking all threads\n", smp_processor_id());
>> +		prod_others();
>> +	}
>
> Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
> couldn't we just have that CPU do the prodding of others?

CPUs may exit H_JOIN due to system reset interrupt at any time, and
H_JOIN may return H_HARDWARE to a caller after other CPUs have entered
the join state successfully. In these cases the counter ensures exactly
one thread performs the prod sequence.

>
>> +	/*
>> +	 * Execution may have been suspended for several seconds, so
>> +	 * reset the watchdog.
>> +	 */
>> +	touch_nmi_watchdog();
>> +	return ret;
>> +}
>> +
>> +static int pseries_migrate_partition(u64 handle)
>> +{
>> +	atomic_t counter = ATOMIC_INIT(0);
>> +	int ret;
>> +
>> +	ret = wait_for_vasi_session_suspending(handle);
>> +	if (ret)
>> +		goto out;
>
> Direct return would be clearer IMHO.

OK, I can change this.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries
  2020-12-04 12:52   ` Michael Ellerman
@ 2020-12-04 16:04     ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-12-04 16:04 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> sys_rtas() cannot call ibm,suspend-me directly in the same way it
>> handles other inputs. Instead it must dispatch the request to code
>> that can first perform the H_JOIN sequence before any call to
>> ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
>> amount of platform-specific code to implement this.
>>
>> Since a different, more robust implementation of the suspend sequence
>> is now in the pseries platform code, we want to dispatch the request
>> there while minimizing additional dependence on pseries.
>>
>> Use a weak function that only pseries overrides.
>
> Over the years weak functions have caused their fair share of problems.
>
> There are cases where they are the cleanest option, but for intra-arch
> code like this I think and ifdef is much simpler.

Fair enough, I wasn't all that confident about this anyway.


>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index fdefe6a974eb..be0fc2536673 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -260,6 +260,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
>>  int rtas_ibm_suspend_me_unsafe(u64 handle);
>>  int rtas_ibm_suspend_me(int *fw_status);
>> +int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
>
> ie. we'd just do:
>
> #ifdef CONFIG_PPC_PSERIES
> int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
> #else
> int rtas_syscall_dispatch_ibm_suspend_me(u64 handle)
> {
> 	return -EINVAL;
> }
> #endif

Yep will do.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards
  2020-10-30  1:17 ` [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards Nathan Lynch
@ 2020-12-04 20:37   ` Nathan Lynch
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Lynch @ 2020-12-04 20:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:

> rtas_call_reentrant() isn't platform-dependent; move it out of
> CONFIG_PPC_PSERIES-guarded code.

As reported by the 0-day CI, this is mistaken, and it breaks non-pseries
builds:

>> arch/powerpc/kernel/rtas.c:938:21: error: no member named 'rtas_args_reentrant' in 'struct paca_struct'
           args = local_paca->rtas_args_reentrant;
                  ~~~~~~~~~~  ^
   1 error generated.

https://lore.kernel.org/linuxppc-dev/202012050432.SFbbjWMw-lkp@intel.com/

I'll drop this and reroll the series.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
  2020-12-04 16:01     ` Nathan Lynch
@ 2020-12-05 11:03       ` Michael Ellerman
  0 siblings, 0 replies; 45+ messages in thread
From: Michael Ellerman @ 2020-12-05 11:03 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Hi Michael,
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> The partition suspend sequence as specified in the platform
>>> architecture requires that all active processor threads call
>>> H_JOIN, which:
>> ...
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 1b8ae221b98a..44ca7d4e143d 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
>> ...
>>
>>> +
>>> +static int do_join(void *arg)
>>> +{
>>> +	atomic_t *counter = arg;
>>> +	long hvrc;
>>> +	int ret;
>>> +
>>> +	/* Must ensure MSR.EE off for H_JOIN. */
>>> +	hard_irq_disable();
>>
>> Didn't stop_machine() already do that for us?
>>
>> In the state machine in multi_cpu_stop().
>
> Yes, but I didn't want to rely on something that seems like an
> implementation detail of stop_machine(). I assumed it's benign and in
> keeping with hard_irq_disable()'s intended semantics to make multiple
> calls to it within a critical section.

OK. I think it's part of the contract of stop_machine() these days, but
you're right hard_irq_disable() can be called multiple times, so we may
as well leave it there as insurance/documentation.

>>> +	hvrc = plpar_hcall_norets(H_JOIN);
>>> +
>>> +	switch (hvrc) {
>>> +	case H_CONTINUE:
>>> +		/*
>>> +		 * All other CPUs are offline or in H_JOIN. This CPU
>>> +		 * attempts the suspend.
>>> +		 */
>>> +		ret = do_suspend();
>>> +		break;
>>> +	case H_SUCCESS:
>>> +		/*
>>> +		 * The suspend is complete and this cpu has received a
>>> +		 * prod.
>>> +		 */
>>> +		ret = 0;
>>> +		break;
>>> +	case H_BAD_MODE:
>>> +	case H_HARDWARE:
>>> +	default:
>>> +		ret = -EIO;
>>> +		pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
>>> +				   hvrc, smp_processor_id());
>>> +		break;
>>> +	}
>>> +
>>> +	if (atomic_inc_return(counter) == 1) {
>>> +		pr_info("CPU %u waking all threads\n", smp_processor_id());
>>> +		prod_others();
>>> +	}
>>
>> Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
>> couldn't we just have that CPU do the prodding of others?
>
> CPUs may exit H_JOIN due to system reset interrupt at any time, and
> H_JOIN may return H_HARDWARE to a caller after other CPUs have entered
> the join state successfully. In these cases the counter ensures exactly
> one thread performs the prod sequence.

OK.

>>> +	/*
>>> +	 * Execution may have been suspended for several seconds, so
>>> +	 * reset the watchdog.
>>> +	 */
>>> +	touch_nmi_watchdog();
>>> +	return ret;
>>> +}
>>> +
>>> +static int pseries_migrate_partition(u64 handle)
>>> +{
>>> +	atomic_t counter = ATOMIC_INIT(0);
>>> +	int ret;
>>> +
>>> +	ret = wait_for_vasi_session_suspending(handle);
>>> +	if (ret)
>>> +		goto out;
>>
>> Direct return would be clearer IMHO.
>
> OK, I can change this.

Thanks.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2020-12-05 11:04 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-30  1:17 [PATCH 00/29] partition suspend updates Nathan Lynch
2020-10-30  1:17 ` [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards Nathan Lynch
2020-12-04 20:37   ` Nathan Lynch
2020-10-30  1:17 ` [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE Nathan Lynch
2020-10-30  3:45   ` Andrew Donnellan
2020-10-30 12:10     ` Nathan Lynch
2020-11-06 14:59       ` Andrew Donnellan
2020-11-06 16:44         ` Nathan Lynch
2020-10-30  1:17 ` [PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes Nathan Lynch
2020-12-04 12:52   ` Michael Ellerman
2020-12-04 14:40     ` Nathan Lynch
2020-10-30  1:17 ` [PATCH 04/29] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe Nathan Lynch
2020-10-30  1:17 ` [PATCH 05/29] powerpc/rtas: add rtas_ibm_suspend_me() Nathan Lynch
2020-10-30  1:17 ` [PATCH 06/29] powerpc/rtas: add rtas_activate_firmware() Nathan Lynch
2020-10-30  1:17 ` [PATCH 07/29] powerpc/hvcall: add token and codes for H_VASI_SIGNAL Nathan Lynch
2020-10-30  1:17 ` [PATCH 08/29] powerpc/pseries/mobility: don't error on absence of ibm, update-nodes Nathan Lynch
2020-10-30  1:17 ` [PATCH 09/29] powerpc/pseries/mobility: add missing break to default case Nathan Lynch
2020-10-30  1:17 ` [PATCH 10/29] powerpc/pseries/mobility: error message improvements Nathan Lynch
2020-10-30  1:17 ` [PATCH 11/29] powerpc/pseries/mobility: use rtas_activate_firmware() on resume Nathan Lynch
2020-10-30  1:17 ` [PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic Nathan Lynch
2020-12-04 12:51   ` Michael Ellerman
2020-12-04 14:46     ` Nathan Lynch
2020-10-30  1:17 ` [PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend Nathan Lynch
2020-12-04 12:52   ` Michael Ellerman
2020-12-04 16:01     ` Nathan Lynch
2020-12-05 11:03       ` Michael Ellerman
2020-10-30  1:17 ` [PATCH 14/29] powerpc/pseries/mobility: signal suspend cancellation to platform Nathan Lynch
2020-10-30  1:17 ` [PATCH 15/29] powerpc/pseries/mobility: retry partition suspend after error Nathan Lynch
2020-10-30  1:17 ` [PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries Nathan Lynch
2020-12-04 12:52   ` Michael Ellerman
2020-12-04 16:04     ` Nathan Lynch
2020-10-30  1:17 ` [PATCH 17/29] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe() Nathan Lynch
2020-10-30  1:17 ` [PATCH 18/29] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops Nathan Lynch
2020-10-30  1:17 ` [PATCH 19/29] powerpc/pseries/hibernation: pass stream id via function arguments Nathan Lynch
2020-10-30  1:17 ` [PATCH 20/29] powerpc/pseries/hibernation: remove pseries_suspend_cpu() Nathan Lynch
2020-10-30  1:17 ` [PATCH 21/29] powerpc/machdep: remove suspend_disable_cpu() Nathan Lynch
2020-10-30  1:17 ` [PATCH 22/29] powerpc/rtas: remove rtas_suspend_cpu() Nathan Lynch
2020-10-30  1:17 ` [PATCH 23/29] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me() Nathan Lynch
2020-10-30  1:18 ` [PATCH 24/29] powerpc/rtas: remove unused rtas_suspend_last_cpu() Nathan Lynch
2020-10-30  1:18 ` [PATCH 25/29] powerpc/pseries/hibernation: remove redundant cacheinfo update Nathan Lynch
2020-10-30  1:18 ` [PATCH 26/29] powerpc/pseries/hibernation: perform post-suspend fixups later Nathan Lynch
2020-10-30  1:18 ` [PATCH 27/29] powerpc/pseries/hibernation: remove prepare_late() callback Nathan Lynch
2020-10-30  1:18 ` [PATCH 28/29] powerpc/rtas: remove unused rtas_suspend_me_data Nathan Lynch
2020-10-30  1:18 ` [PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update Nathan Lynch
2020-11-20 16:09   ` Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).