All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups
@ 2022-05-12 22:27 Sean Christopherson
  2022-05-12 22:27 ` [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Sean Christopherson @ 2022-05-12 22:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Jue Wang

Fix bugs in set_msr_mce() where KVM returns '-1' instead of '1' when
rejecting a WRMSR, which results in -EPERM getting returned to userspace.

Patches 2 and 3 are cleanups to make {g,s}et_msr_mce a bit less wierd
(especially with respect to Jue's CMCI series) and a bit less magical.

Sean Christopherson (3):
  KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)
  KVM: x86: Use explicit case-statements for MCx banks in
    {g,s}et_msr_mce()
  KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs

 arch/x86/kvm/x86.c | 84 +++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 34 deletions(-)


base-commit: 2764011106d0436cb44702cfb0981339d68c3509
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)
  2022-05-12 22:27 [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups Sean Christopherson
@ 2022-05-12 22:27 ` Sean Christopherson
  2022-05-12 22:43   ` Jim Mattson
  2022-05-12 22:27 ` [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() Sean Christopherson
  2022-05-12 22:27 ` [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs Sean Christopherson
  2 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2022-05-12 22:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Jue Wang

Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.

Fixes: 890ca9aefa78 ("KVM: Add MCE support")
Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ee8c91fa762..bc6db58975dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3233,13 +3233,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			 */
 			if ((offset & 0x3) == 0 &&
 			    data != 0 && (data | (1 << 10)) != ~(u64)0)
-				return -1;
+				return 1;
 
 			/* MCi_STATUS */
 			if (!msr_info->host_initiated &&
 			    (offset & 0x3) == 1 && data != 0) {
 				if (!can_set_mci_status(vcpu))
-					return -1;
+					return 1;
 			}
 
 			vcpu->arch.mce_banks[offset] = data;
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce()
  2022-05-12 22:27 [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups Sean Christopherson
  2022-05-12 22:27 ` [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Sean Christopherson
@ 2022-05-12 22:27 ` Sean Christopherson
  2022-05-12 23:23   ` Jim Mattson
  2022-05-12 22:27 ` [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs Sean Christopherson
  2 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2022-05-12 22:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Jue Wang

Use an explicit case statement to grab the full range of MCx bank MSRs
in {g,s}et_msr_mce(), and manually check only the "end" (the number of
banks configured by userspace may be less than the max).  The "default"
trick works, but is a bit odd now, and will be quite odd if/when support
for accessing MCx_CTL2 MSRs is added, which has near identical logic.

Hoist "offset" to function scope so as to avoid curly braces for the case
statement, and because MCx_CTL2 support will need the same variables.

Opportunstically clean up the comment about allowing bit 10 to be cleared
from bank 4.

No functional change intended.

Cc: Jue Wang <juew@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc6db58975dc..7e685ea9882b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3206,6 +3206,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	unsigned bank_num = mcg_cap & 0xff;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_MCG_STATUS:
@@ -3219,32 +3220,33 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.mcg_ctl = data;
 		break;
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
+
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+
+		/*
+		 * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+		 * values are architecturally undefined.  But, some Linux
+		 * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+		 * issue on AMD K8s, allow bit 10 to be clear when setting all
+		 * other bits in order to avoid an uncaught #GP in the guest.
+		 */
+		if ((offset & 0x3) == 0 &&
+		    data != 0 && (data | (1 << 10)) != ~(u64)0)
+			return 1;
+
+		/* MCi_STATUS */
+		if (!msr_info->host_initiated && (offset & 0x3) == 1 &&
+		    data != 0 && !can_set_mci_status(vcpu))
+			return 1;
+
+		vcpu->arch.mce_banks[offset] = data;
+		break;
 	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
-
-			/* only 0 or all 1s can be written to IA32_MCi_CTL
-			 * some Linux kernels though clear bit 10 in bank 4 to
-			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
-			 * this to avoid an uncatched #GP in the guest
-			 */
-			if ((offset & 0x3) == 0 &&
-			    data != 0 && (data | (1 << 10)) != ~(u64)0)
-				return 1;
-
-			/* MCi_STATUS */
-			if (!msr_info->host_initiated &&
-			    (offset & 0x3) == 1 && data != 0) {
-				if (!can_set_mci_status(vcpu))
-					return 1;
-			}
-
-			vcpu->arch.mce_banks[offset] = data;
-			break;
-		}
 		return 1;
 	}
 	return 0;
@@ -3789,6 +3791,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	u64 data;
 	u64 mcg_cap = vcpu->arch.mcg_cap;
 	unsigned bank_num = mcg_cap & 0xff;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_P5_MC_ADDR:
@@ -3806,16 +3809,16 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	case MSR_IA32_MCG_STATUS:
 		data = vcpu->arch.mcg_status;
 		break;
-	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
 
-			data = vcpu->arch.mce_banks[offset];
-			break;
-		}
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+		data = vcpu->arch.mce_banks[offset];
+		break;
+	default:
 		return 1;
 	}
 	*pdata = data;
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs
  2022-05-12 22:27 [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups Sean Christopherson
  2022-05-12 22:27 ` [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Sean Christopherson
  2022-05-12 22:27 ` [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() Sean Christopherson
@ 2022-05-12 22:27 ` Sean Christopherson
  2022-05-12 23:32   ` Jim Mattson
  2 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2022-05-12 22:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Jue Wang

Add helpers to identify CTL (control) and STATUS MCi MSR types instead of
open coding the checks using the offset.  Using the offset is perfectly
safe, but unintuitive, as understanding what the code does requires
knowing that the offset calcuation will not affect the lower three bits.

Opportunistically comment the STATUS logic to save readers a trip to
Intel's SDM or AMD's APM to understand the "data != 0" check.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7e685ea9882b..b61c5def0bfc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3188,6 +3188,16 @@ static void kvmclock_sync_fn(struct work_struct *work)
 					KVMCLOCK_SYNC_PERIOD);
 }
 
+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+	return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+	return (msr & 3) == 1;
+}
+
 /*
  * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
  */
@@ -3225,9 +3235,6 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr > last_msr)
 			return 1;
 
-		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
-					    last_msr + 1 - MSR_IA32_MC0_CTL);
-
 		/*
 		 * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
 		 * values are architecturally undefined.  But, some Linux
@@ -3235,15 +3242,21 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * issue on AMD K8s, allow bit 10 to be clear when setting all
 		 * other bits in order to avoid an uncaught #GP in the guest.
 		 */
-		if ((offset & 0x3) == 0 &&
+		if (is_mci_control_msr(msr) &&
 		    data != 0 && (data | (1 << 10)) != ~(u64)0)
 			return 1;
 
-		/* MCi_STATUS */
-		if (!msr_info->host_initiated && (offset & 0x3) == 1 &&
+		/*
+		 * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+		 * AMD-based CPUs allow non-zero values, but if and only if
+		 * HWCR[McStatusWrEn] is set.
+		 */
+		if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
 		    data != 0 && !can_set_mci_status(vcpu))
 			return 1;
 
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
 		vcpu->arch.mce_banks[offset] = data;
 		break;
 	default:
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)
  2022-05-12 22:27 ` [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Sean Christopherson
@ 2022-05-12 22:43   ` Jim Mattson
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Mattson @ 2022-05-12 22:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm,
	linux-kernel, Jue Wang

On Thu, May 12, 2022 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
> MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
> is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
> the intent is to inject a #GP, not exit to userspace due to an unhandled
> emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
> and effecitvely kills the guest.
>
> Fixes: 890ca9aefa78 ("KVM: Add MCE support")
> Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce()
  2022-05-12 22:27 ` [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() Sean Christopherson
@ 2022-05-12 23:23   ` Jim Mattson
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Mattson @ 2022-05-12 23:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm,
	linux-kernel, Jue Wang

On Thu, May 12, 2022 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Use an explicit case statement to grab the full range of MCx bank MSRs
> in {g,s}et_msr_mce(), and manually check only the "end" (the number of
> banks configured by userspace may be less than the max).  The "default"
> trick works, but is a bit odd now, and will be quite odd if/when support
> for accessing MCx_CTL2 MSRs is added, which has near identical logic.
>
> Hoist "offset" to function scope so as to avoid curly braces for the case
> statement, and because MCx_CTL2 support will need the same variables.
>
> Opportunstically clean up the comment about allowing bit 10 to be cleared
> from bank 4.
>
> No functional change intended.
>
> Cc: Jue Wang <juew@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs
  2022-05-12 22:27 ` [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs Sean Christopherson
@ 2022-05-12 23:32   ` Jim Mattson
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Mattson @ 2022-05-12 23:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm,
	linux-kernel, Jue Wang

On Thu, May 12, 2022 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Add helpers to identify CTL (control) and STATUS MCi MSR types instead of
> open coding the checks using the offset.  Using the offset is perfectly
> safe, but unintuitive, as understanding what the code does requires
> knowing that the offset calcuation will not affect the lower three bits.
>
> Opportunistically comment the STATUS logic to save readers a trip to
> Intel's SDM or AMD's APM to understand the "data != 0" check.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-05-12 23:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 22:27 [PATCH 0/3] KVM: x86: WRMSR MCi_CTL/STATUS bug fix and cleanups Sean Christopherson
2022-05-12 22:27 ` [PATCH 1/3] KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Sean Christopherson
2022-05-12 22:43   ` Jim Mattson
2022-05-12 22:27 ` [PATCH 2/3] KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() Sean Christopherson
2022-05-12 23:23   ` Jim Mattson
2022-05-12 22:27 ` [PATCH 3/3] KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs Sean Christopherson
2022-05-12 23:32   ` Jim Mattson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.