From: Nathan Lynch <nathanl@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: tyreld@linux.ibm.com, ajd@linux.ibm.com, mmc@linux.vnet.ibm.com,
	cforno12@linux.vnet.ibm.com, drt@linux.vnet.ibm.com,
	brking@linux.ibm.com
Subject: [PATCH v2 14/28] powerpc/pseries/mobility: retry partition suspend after error
Date: Mon,  7 Dec 2020 15:51:46 -0600
Message-ID: <20201207215200.1785968-15-nathanl@linux.ibm.com>
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>

This is a mitigation for the relatively rare occurrence where a
virtual IOA can be in a transient state that prevents the
suspend/migration from succeeding, resulting in an error from
ibm,suspend-me.

If the join/suspend sequence returns an error, it is acceptable to
retry as long as the VASI suspend session state is still
"Suspending" (i.e. the platform is still waiting for the OS to
suspend).

Retry a few times on suspend failure while this condition holds,
progressively increasing the delay between attempts. We don't want to
retry indefinitely because firmware emits an error log event on each
unsuccessful attempt.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 59 ++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index f234a7ed87aa..fe7e35cdc9d5 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -542,16 +542,71 @@ static void pseries_cancel_migration(u64 handle, int err)
 		pr_err("H_VASI_SIGNAL error: %ld\n", hvrc);
 }
 
+static int pseries_suspend(u64 handle)
+{
+	const unsigned int max_attempts = 5;
+	unsigned int retry_interval_ms = 1;
+	unsigned int attempt = 1;
+	int ret;
+
+	while (true) {
+		atomic_t counter = ATOMIC_INIT(0);
+		unsigned long vasi_state;
+		int vasi_err;
+
+		ret = stop_machine(do_join, &counter, cpu_online_mask);
+		if (ret == 0)
+			break;
+		/*
+		 * Encountered an error. If the VASI stream is still
+		 * in Suspending state, it's likely a transient
+		 * condition related to some device in the partition
+		 * and we can retry in the hope that the cause has
+		 * cleared after some delay.
+		 *
+		 * A better design would allow drivers etc to prepare
+		 * for the suspend and avoid conditions which prevent
+		 * the suspend from succeeding. For now, we have this
+		 * mitigation.
+		 */
+		pr_notice("Partition suspend attempt %u of %u error: %d\n",
+			  attempt, max_attempts, ret);
+
+		if (attempt == max_attempts)
+			break;
+
+		vasi_err = poll_vasi_state(handle, &vasi_state);
+		if (vasi_err == 0) {
+			if (vasi_state != H_VASI_SUSPENDING) {
+				pr_notice("VASI state %lu after failed suspend\n",
+					  vasi_state);
+				break;
+			}
+		} else if (vasi_err != -EOPNOTSUPP) {
+			pr_err("VASI state poll error: %d\n", vasi_err);
+			break;
+		}
+
+		pr_notice("Will retry partition suspend after %u ms\n",
+			  retry_interval_ms);
+
+		msleep(retry_interval_ms);
+		retry_interval_ms *= 10;
+		attempt++;
+	}
+
+	return ret;
+}
+
 static int pseries_migrate_partition(u64 handle)
 {
-	atomic_t counter = ATOMIC_INIT(0);
 	int ret;
 
 	ret = wait_for_vasi_session_suspending(handle);
 	if (ret)
 		return ret;
 
-	ret = stop_machine(do_join, &counter, cpu_online_mask);
+	ret = pseries_suspend(handle);
 	if (ret == 0)
 		post_mobility_fixup();
 	else
-- 
2.28.0
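
The retry policy described in the commit message can be explored outside the kernel with a small userspace sketch. This is not the patch's code: `suspend_with_retry()`, the callback types, and `struct retry_result` are hypothetical names introduced for illustration, and the sketch only accumulates the delays instead of sleeping. It assumes the same policy as the patch: up to a maximum number of attempts, a delay that starts at 1 ms and grows tenfold, and an early exit once the session is no longer in the "Suspending" state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the join/suspend step: 0 on success, negative on error. */
typedef int (*suspend_fn)(void *ctx);
/* Hypothetical stand-in for the VASI state check: true while still "Suspending". */
typedef bool (*still_suspending_fn)(void *ctx);

struct retry_result {
	int ret;                     /* result of the last attempt */
	unsigned int attempts;       /* number of attempts made */
	unsigned int total_delay_ms; /* sum of the delays a real caller would sleep */
};

static struct retry_result suspend_with_retry(suspend_fn try_suspend,
					      still_suspending_fn still_suspending,
					      void *ctx,
					      unsigned int max_attempts)
{
	struct retry_result res = { .ret = 0, .attempts = 0, .total_delay_ms = 0 };
	unsigned int delay_ms = 1;

	assert(max_attempts > 0);

	for (;;) {
		res.attempts++;
		res.ret = try_suspend(ctx);
		if (res.ret == 0)
			break;			/* suspend succeeded */
		if (res.attempts == max_attempts)
			break;			/* out of attempts, report the last error */
		if (!still_suspending(ctx))
			break;			/* platform is no longer waiting for us */

		res.total_delay_ms += delay_ms;	/* a real caller would sleep here */
		delay_ms *= 10;			/* 1, 10, 100, 1000 ms */
	}

	return res;
}

/* Example callback: fails until the counter behind ctx reaches zero. */
static int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -5;
	}
	return 0;
}

/* Example callback: never succeeds. */
static int always_fail(void *ctx)
{
	(void)ctx;
	return -5;
}

/* Example callback: the session always remains in "Suspending". */
static bool always_suspending(void *ctx)
{
	(void)ctx;
	return true;
}
```

With five attempts and a tenfold backoff starting at 1 ms, the worst case waits 1 + 10 + 100 + 1000 = 1111 ms in total before giving up, which bounds both the added latency and the number of firmware error log events.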


Thread overview: 30+ messages
2020-12-07 21:51 [PATCH v2 00/28] partition suspend updates Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 01/28] powerpc/rtas: prevent suspend-related sys_rtas use on LE Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 02/28] powerpc/rtas: complete ibm,suspend-me status codes Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 03/28] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 04/28] powerpc/rtas: add rtas_ibm_suspend_me() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 05/28] powerpc/rtas: add rtas_activate_firmware() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 06/28] powerpc/hvcall: add token and codes for H_VASI_SIGNAL Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 07/28] powerpc/pseries/mobility: don't error on absence of ibm,update-nodes Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 08/28] powerpc/pseries/mobility: add missing break to default case Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 09/28] powerpc/pseries/mobility: error message improvements Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 10/28] powerpc/pseries/mobility: use rtas_activate_firmware() on resume Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 11/28] powerpc/pseries/mobility: extract VASI session polling logic Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 12/28] powerpc/pseries/mobility: use stop_machine for join/suspend Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 13/28] powerpc/pseries/mobility: signal suspend cancellation to platform Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 14/28] powerpc/pseries/mobility: retry partition suspend after error Nathan Lynch [this message]
2020-12-07 21:51 ` [PATCH v2 15/28] powerpc/rtas: dispatch partition migration requests to pseries Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 16/28] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 17/28] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 18/28] powerpc/pseries/hibernation: pass stream id via function arguments Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 19/28] powerpc/pseries/hibernation: remove pseries_suspend_cpu() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 20/28] powerpc/machdep: remove suspend_disable_cpu() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 21/28] powerpc/rtas: remove rtas_suspend_cpu() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 22/28] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 23/28] powerpc/rtas: remove unused rtas_suspend_last_cpu() Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 24/28] powerpc/pseries/hibernation: remove redundant cacheinfo update Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 25/28] powerpc/pseries/hibernation: perform post-suspend fixups later Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 26/28] powerpc/pseries/hibernation: remove prepare_late() callback Nathan Lynch
2020-12-07 21:51 ` [PATCH v2 27/28] powerpc/rtas: remove unused rtas_suspend_me_data Nathan Lynch
2020-12-07 21:52 ` [PATCH v2 28/28] powerpc/pseries/mobility: refactor node lookup during DT update Nathan Lynch
2020-12-15 10:49 ` [PATCH v2 00/28] partition suspend updates Michael Ellerman
