linuxppc-dev.lists.ozlabs.org archive mirror
From: Nicholas Piggin <npiggin@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	Mahesh Salgaonkar <mahesh@linux.ibm.com>
Subject: [PATCH 2/8] powerpc/64s/powernv: Allow KVM to handle guest machine check details
Date: Sat, 28 Nov 2020 17:07:22 +1000
Message-ID: <20201128070728.825934-3-npiggin@gmail.com>
In-Reply-To: <20201128070728.825934-1-npiggin@gmail.com>

KVM has its own strategies for machine check recovery. If an MCE hits
in a guest, have the low-level handler only decode and save the MCE
without attempting any recovery, so KVM can deal with it.

While a guest is running, the host does not own the SLBs, so it does
not need to report the SLB state (in case of a multi-hit, for example)
or know about the virtual memory map of the guest.
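
The behaviour described in the two paragraphs above can be sketched in
user-space C. This is only an illustrative model, not the kernel code:
every sketch_* name, struct and counter below is hypothetical, and the
real handlers work on the paca, SRR1/DSISR tables and mce_flush().
The model shows the two gates: recovery is attempted only outside
guest context, the event is always recorded, and SLB contents are
dumped only for host events.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error classes; stand-ins for the MCE_ERROR_TYPE_* values. */
enum sketch_error_type {
	SKETCH_ERR_SLB,
	SKETCH_ERR_ERAT,
	SKETCH_ERR_TLB,
	SKETCH_ERR_UE,
};

/* Hypothetical per-CPU state; in_guest models kvm_hstate.in_guest. */
struct sketch_cpu {
	bool in_guest;
	int flushes;	/* recovery attempts made by the low-level handler */
	int slb_dumps;	/* times SLB contents were reported */
};

/* Hypothetical decoded event, as saved for later processing. */
struct sketch_event {
	enum sketch_error_type type;
	bool in_guest;
	bool recorded;
};

static bool sketch_mce_in_guest(const struct sketch_cpu *cpu)
{
	return cpu->in_guest;
}

/*
 * Always decode and record the event. Attempt recovery (modelled here
 * as a counted flush) only when the MCE did not hit in guest context.
 * Returns 1 if the low-level handler recovered, 0 otherwise.
 */
static int sketch_handle_mce(struct sketch_cpu *cpu,
			     enum sketch_error_type type,
			     struct sketch_event *evt)
{
	int handled = 0;

	if (!sketch_mce_in_guest(cpu)) {
		switch (type) {
		case SKETCH_ERR_SLB:
		case SKETCH_ERR_ERAT:
		case SKETCH_ERR_TLB:
			cpu->flushes++;
			handled = 1;
			break;
		default:
			break;
		}
	}

	evt->type = type;
	evt->in_guest = cpu->in_guest;
	evt->recorded = true;
	return handled;
}

/* Report SLB contents only for SLB errors that hit in the host. */
static void sketch_print_event(struct sketch_cpu *cpu,
			       const struct sketch_event *evt)
{
	if (evt->type == SKETCH_ERR_SLB && !evt->in_guest)
		cpu->slb_dumps++;
}
```

In this model a guest SLB error comes back unhandled with no flush and
no SLB dump, but the event is still recorded so that a higher layer
(KVM, in the real code) can decide what to do with it.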

UE handling and memory poisoning of guest pages in the host are
possibly not completely robust at the moment, but they too need to go
via KVM (possibly via the guest and back out to the host via hcall)
rather than being handled at a low level in the host handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/mce.c       |  2 +-
 arch/powerpc/kernel/mce_power.c | 96 ++++++++++++++++++---------------
 2 files changed, 55 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 63702c0badb9..8afe8d37b983 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -577,7 +577,7 @@ void machine_check_print_event_info(struct machine_check_event *evt,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	/* Display faulty slb contents for SLB errors. */
-	if (evt->error_type == MCE_ERROR_TYPE_SLB)
+	if (evt->error_type == MCE_ERROR_TYPE_SLB && !in_guest)
 		slb_dump_contents(local_paca->mce_faulty_slbs);
 #endif
 }
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b7e173754a2e..1372ce3f7bdd 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -62,6 +62,20 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 	return pfn;
 }
 
+static bool mce_in_guest(void)
+{
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
+	/*
+	 * If machine check is hit when in guest context or low level KVM
+	 * code, avoid looking up any translations or making any attempts
+	 * to recover, just record the event and pass to KVM.
+	 */
+	if (get_paca()->kvm_hstate.in_guest)
+		return true;
+#endif
+	return false;
+}
+
 /* flush SLBs and reload */
 #ifdef CONFIG_PPC_BOOK3S_64
 void flush_and_reload_slb(void)
@@ -69,14 +83,6 @@ void flush_and_reload_slb(void)
 	/* Invalidate all SLBs */
 	slb_flush_all_realmode();
 
-#ifdef CONFIG_KVM_BOOK3S_HANDLER
-	/*
-	 * If machine check is hit when in guest or in transition, we will
-	 * only flush the SLBs and continue.
-	 */
-	if (get_paca()->kvm_hstate.in_guest)
-		return;
-#endif
 	if (early_radix_enabled())
 		return;
 
@@ -490,19 +496,21 @@ static int mce_handle_ierror(struct pt_regs *regs,
 		if ((srr1 & table[i].srr1_mask) != table[i].srr1_value)
 			continue;
 
-		/* attempt to correct the error */
-		switch (table[i].error_type) {
-		case MCE_ERROR_TYPE_SLB:
-			if (local_paca->in_mce == 1)
-				slb_save_contents(local_paca->mce_faulty_slbs);
-			handled = mce_flush(MCE_FLUSH_SLB);
-			break;
-		case MCE_ERROR_TYPE_ERAT:
-			handled = mce_flush(MCE_FLUSH_ERAT);
-			break;
-		case MCE_ERROR_TYPE_TLB:
-			handled = mce_flush(MCE_FLUSH_TLB);
-			break;
+		if (!mce_in_guest()) {
+			/* attempt to correct the error */
+			switch (table[i].error_type) {
+			case MCE_ERROR_TYPE_SLB:
+				if (local_paca->in_mce == 1)
+					slb_save_contents(local_paca->mce_faulty_slbs);
+				handled = mce_flush(MCE_FLUSH_SLB);
+				break;
+			case MCE_ERROR_TYPE_ERAT:
+				handled = mce_flush(MCE_FLUSH_ERAT);
+				break;
+			case MCE_ERROR_TYPE_TLB:
+				handled = mce_flush(MCE_FLUSH_TLB);
+				break;
+			}
 		}
 
 		/* now fill in mce_error_info */
@@ -534,7 +542,7 @@ static int mce_handle_ierror(struct pt_regs *regs,
 		mce_err->sync_error = table[i].sync_error;
 		mce_err->severity = table[i].severity;
 		mce_err->initiator = table[i].initiator;
-		if (table[i].nip_valid) {
+		if (table[i].nip_valid && !mce_in_guest()) {
 			*addr = regs->nip;
 			if (mce_err->sync_error &&
 				table[i].error_type == MCE_ERROR_TYPE_UE) {
@@ -577,22 +585,24 @@ static int mce_handle_derror(struct pt_regs *regs,
 		if (!(dsisr & table[i].dsisr_value))
 			continue;
 
-		/* attempt to correct the error */
-		switch (table[i].error_type) {
-		case MCE_ERROR_TYPE_SLB:
-			if (local_paca->in_mce == 1)
-				slb_save_contents(local_paca->mce_faulty_slbs);
-			if (mce_flush(MCE_FLUSH_SLB))
-				handled = 1;
-			break;
-		case MCE_ERROR_TYPE_ERAT:
-			if (mce_flush(MCE_FLUSH_ERAT))
-				handled = 1;
-			break;
-		case MCE_ERROR_TYPE_TLB:
-			if (mce_flush(MCE_FLUSH_TLB))
-				handled = 1;
-			break;
+		if (!mce_in_guest()) {
+			/* attempt to correct the error */
+			switch (table[i].error_type) {
+			case MCE_ERROR_TYPE_SLB:
+				if (local_paca->in_mce == 1)
+					slb_save_contents(local_paca->mce_faulty_slbs);
+				if (mce_flush(MCE_FLUSH_SLB))
+					handled = 1;
+				break;
+			case MCE_ERROR_TYPE_ERAT:
+				if (mce_flush(MCE_FLUSH_ERAT))
+					handled = 1;
+				break;
+			case MCE_ERROR_TYPE_TLB:
+				if (mce_flush(MCE_FLUSH_TLB))
+					handled = 1;
+				break;
+			}
 		}
 
 		/*
@@ -634,7 +644,7 @@ static int mce_handle_derror(struct pt_regs *regs,
 		mce_err->initiator = table[i].initiator;
 		if (table[i].dar_valid)
 			*addr = regs->dar;
-		else if (mce_err->sync_error &&
+		else if (mce_err->sync_error && !mce_in_guest() &&
 				table[i].error_type == MCE_ERROR_TYPE_UE) {
 			/*
 			 * We do a maximum of 4 nested MCE calls, see
@@ -662,7 +672,8 @@ static int mce_handle_derror(struct pt_regs *regs,
 static long mce_handle_ue_error(struct pt_regs *regs,
 				struct mce_error_info *mce_err)
 {
-	long handled = 0;
+	if (mce_in_guest())
+		return 0;
 
 	mce_common_process_ue(regs, mce_err);
 	if (mce_err->ignore_event)
@@ -677,9 +688,10 @@ static long mce_handle_ue_error(struct pt_regs *regs,
 
 	if (ppc_md.mce_check_early_recovery) {
 		if (ppc_md.mce_check_early_recovery(regs))
-			handled = 1;
+			return 1;
 	}
-	return handled;
+
+	return 0;
 }
 
 static long mce_handle_error(struct pt_regs *regs,
-- 
2.23.0


Thread overview: 14+ messages
2020-11-28  7:07 [PATCH 0/8] powerpc/64s: fix and improve machine check handling Nicholas Piggin
2020-11-28  7:07 ` [PATCH 1/8] powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE Nicholas Piggin
2020-11-30  3:55   ` Mahesh J Salgaonkar
2020-11-28  7:07 ` Nicholas Piggin [this message]
2020-11-28  7:07 ` [PATCH 3/8] KVM: PPC: Book3S HV: Don't attempt to recover machine checks for FWNMI enabled guests Nicholas Piggin
2020-11-28  7:07 ` [PATCH 4/8] KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests Nicholas Piggin
2020-12-02 12:58   ` Michael Ellerman
2020-11-28  7:07 ` [PATCH 5/8] powerpc/64s/powernv: ratelimit harmless HMI error printing Nicholas Piggin
2020-12-02 13:00   ` Michael Ellerman
2020-11-28  7:07 ` [PATCH 6/8] powerpc/64s/pseries: Add ERAT specific machine check handler Nicholas Piggin
2020-11-28  7:07 ` [PATCH 7/8] powerpc/64s: Remove "Host" from MCE logging Nicholas Piggin
2020-11-28  7:07 ` [PATCH 8/8] powerpc/64s: tidy machine check SLB logging Nicholas Piggin
2020-12-04 11:59 ` [PATCH 0/8] powerpc/64s: fix and improve machine check handling Michael Ellerman
2020-12-10 11:30 ` Michael Ellerman
