All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation
@ 2022-06-10 17:11 Jue Wang
  2022-06-10 17:11 ` [PATCH v5 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch series implement emulation for Corrected Machine Check
Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
error injection.

UCNA errors signaled via CMCI allow a guest to be notified as soon as
uncorrectable memory errors get detected by some background threads,
e.g., threads that migrate guest memory across hosts or threads that
constantly scan system memory for errors [1].

Upon receiving UCNAs, guest kernel isolates the poisoned pages which may
still be free, preventing future accesses that may cause fatal MCEs.

1. https://lore.kernel.org/linux-mm/8eceffc0-01e8-2a55-6eb9-b26faa9e3caf@intel.com/t/


Patch 1-3 clean up KVM APIC LVT logic in preparation to adding APIC_LVTCMCI.

Patch 4 adds APIC_LVTCMCI emulation.

Patch 5 updates mce_banks to use array allocation api.

Patch 6 adds emulation for MSR_IA32_MCx_CTL2 registers that provide per
bank control of CMCI signaling.

Patch 7 enables MCG_CMCI_P and handles injected UCNA errors.

Patch 8 adds a KVM self test that validates UCNA injection and CMCI
emulation.

v5 changes
- Incorporate feedback from David Matlack <dmatlack@google.com>
- Rewrite the change log to be more concise and accurate.
- Removes several duplicated checks in UCNA injection code.
- Add test cases that validate CMCI emulation to self test.

v4 changes
- Incorporate feedback from David Matlack <dmatlack@google.com>
- Rewrite the change logs to be more descriptive.
- Add a KVM self test.

v3 changes
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split clean up to KVM APIC LVT logic to 3 patches.
- Put the clean up of mce_array allocation in a separate patch.
- Base the MCi_CTL2 register emulation on Sean's clean up and fix
series [2]
- Fix bugs around MCi_CTL2 register offset validation and the free of
mci_ctl2_banks array.
- Rewrite the change log with more details in architectural information
about CMCI, UCNA and MCG_CMCI_P.
- Fix various comments and wrapping style.

2. https://lore.kernel.org/lkml/20220512222716.4112548-1-seanjc@google.com/T/

v2 chanegs
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split the single patch into 4:
  1). clean up KVM APIC LVT logic
  2). add CMCI emulation to lapic
  3). add emulation of MSR_IA32_MCx_CTL2
  4). enable MCG_CMCI_P and handle injected UCNAs
- Fix various style issues.

Jue Wang (8):
  KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
  KVM: x86: Add APIC_LVTx() macro.
  KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to
    lapic.
  KVM: x86: Use kcalloc to allocate the mce_banks array.
  KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  KVM: x86: Enable CMCI capability by default and handle injected UCNA
    errors
  KVM: selftests: Add a self test for CMCI and UCNA emulations.

 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/kvm/lapic.c                          |  66 ++--
 arch/x86/kvm/lapic.h                          |  16 +-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            | 178 ++++++---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   1 +
 .../selftests/kvm/include/x86_64/mce.h        |  25 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/ucna_injection_test.c          | 347 ++++++++++++++++++
 12 files changed, 573 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c

-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

Refactor APIC_VERSION so that the maximum number of LVT entries is
inserted at runtime rather than compile time. This will be used in a
subsequent commit to expose the LVT CMCI Register to VMs that support
Corrected Machine Check error counting/signaling
(IA32_MCG_CAP.MCG_CMCI_P=1).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94..a5caa77e279f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -54,7 +54,7 @@
 #define PRIo64 "o"
 
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION			0x14UL
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
 #define MAX_APIC_VECTOR			256
@@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION;
+	u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
 
 	if (!lapic_in_kernel(vcpu))
 		return;
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
  2022-06-10 17:11 ` [PATCH v5 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch defines a lapic_lvt_entry enum used as explicit indices to
the apic_lvt_mask array. In later patches a LVT_CMCI will be added to
implement the Corrected Machine Check Interrupt signaling.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 19 ++++++++++---------
 arch/x86/kvm/lapic.h | 12 +++++++++++-
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a5caa77e279f..73f5cd248a63 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
+	u32 v = APIC_VERSION | ((KVM_APIC_MAX_NR_LVT_ENTRIES - 1) << 16);
 
 	if (!lapic_in_kernel(vcpu))
 		return;
@@ -419,12 +419,13 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	kvm_lapic_set_reg(apic, APIC_LVR, v);
 }
 
-static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
-	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
-	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
-	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
-	LINT_MASK, LINT_MASK,	/* LVT0-1 */
-	LVT_MASK		/* LVTERR */
+static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
+	[LVT_TIMER] = LVT_MASK,      /* timer mode mask added at runtime */
+	[LVT_THERMAL_MONITOR] = LVT_MASK | APIC_MODE_MASK,
+	[LVT_PERFORMANCE_COUNTER] = LVT_MASK | APIC_MODE_MASK,
+	[LVT_LINT0] = LINT_MASK,
+	[LVT_LINT1] = LINT_MASK,
+	[LVT_ERROR] = LVT_MASK
 };
 
 static int find_highest_vector(void *bitmap)
@@ -2084,7 +2085,7 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 			u32 lvt_val;
 
-			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
+			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
 				lvt_val = kvm_lapic_get_reg(apic,
 						       APIC_LVTT + 0x10 * i);
 				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
@@ -2383,7 +2384,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
+	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
 		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4e4f8a22754f..4990793c2034 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -10,7 +10,6 @@
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
-#define KVM_APIC_LVT_NUM	6
 
 #define APIC_SHORT_MASK			0xc0000
 #define APIC_DEST_NOSHORT		0x0
@@ -29,6 +28,17 @@ enum lapic_mode {
 	LAPIC_MODE_X2APIC = MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE,
 };
 
+enum lapic_lvt_entry {
+	LVT_TIMER,
+	LVT_THERMAL_MONITOR,
+	LVT_PERFORMANCE_COUNTER,
+	LVT_LINT0,
+	LVT_LINT1,
+	LVT_ERROR,
+
+	KVM_APIC_MAX_NR_LVT_ENTRIES,
+};
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 3/8] KVM: x86: Add APIC_LVTx() macro.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
  2022-06-10 17:11 ` [PATCH v5 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
  2022-06-10 17:11 ` [PATCH v5 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

An APIC_LVTx macro is introduced to calcualte the APIC_LVTx register
offset based on the index in the lapic_lvt_entry enum. Later patches
will extend the APIC_LVTx macro to support the APIC_LVTCMCI register
in order to implement Corrected Machine Check Interrupt signaling.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 7 +++----
 arch/x86/kvm/lapic.h | 2 ++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 73f5cd248a63..db12d2ef1aef 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2086,9 +2086,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			u32 lvt_val;
 
 			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
-				lvt_val = kvm_lapic_get_reg(apic,
-						       APIC_LVTT + 0x10 * i);
-				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
+				lvt_val = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+				kvm_lapic_set_reg(apic, APIC_LVTx(i),
 					     lvt_val | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
@@ -2385,7 +2384,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
-		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+		kvm_lapic_set_reg(apic, APIC_LVTx(i), APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
 	    kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4990793c2034..2d197ed0b8ce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -39,6 +39,8 @@ enum lapic_lvt_entry {
 	KVM_APIC_MAX_NR_LVT_ENTRIES,
 };
 
+#define APIC_LVTx(x) (APIC_LVTT + 0x10 * (x))
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (2 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-07-01  2:07   ` Xiaoyao Li
  2022-06-10 17:11 ` [PATCH v5 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch calculates the number of lvt entries as part of
KVM_X86_MCE_SETUP conditioned on the presence of MCG_CMCI_P bit in
MCG_CAP and stores result in kvm_lapic. It translats from APIC_LVTx
register to index in lapic_lvt_entry enum. It extends the APIC_LVTx
macro as well as other lapic write/reset handling etc to support
Corrected Machine Check Interrupt.

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 48 ++++++++++++++++++++++++++++++--------------
 arch/x86/kvm/lapic.h |  4 +++-
 arch/x86/kvm/x86.c   |  2 ++
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index db12d2ef1aef..8537b66cc646 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -27,6 +27,7 @@
 #include <linux/math64.h>
 #include <linux/slab.h>
 #include <asm/processor.h>
+#include <asm/mce.h>
 #include <asm/msr.h>
 #include <asm/page.h>
 #include <asm/current.h>
@@ -398,14 +399,21 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
 }
 
+static inline bool kvm_lapic_lvt_supported(struct kvm_lapic *apic, int lvt_index)
+{
+	return apic->nr_lvt_entries > lvt_index;
+}
+
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION | ((KVM_APIC_MAX_NR_LVT_ENTRIES - 1) << 16);
+	u32 v = 0;
 
 	if (!lapic_in_kernel(vcpu))
 		return;
 
+	v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
+
 	/*
 	 * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
 	 * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
@@ -425,7 +433,8 @@ static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
 	[LVT_PERFORMANCE_COUNTER] = LVT_MASK | APIC_MODE_MASK,
 	[LVT_LINT0] = LINT_MASK,
 	[LVT_LINT1] = LINT_MASK,
-	[LVT_ERROR] = LVT_MASK
+	[LVT_ERROR] = LVT_MASK,
+	[LVT_CMCI] = LVT_MASK | APIC_MODE_MASK
 };
 
 static int find_highest_vector(void *bitmap)
@@ -1445,6 +1454,9 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 		APIC_REG_MASK(APIC_TMCCT) |
 		APIC_REG_MASK(APIC_TDCR);
 
+	if (kvm_lapic_lvt_supported(apic, LVT_CMCI))
+		valid_reg_mask |= APIC_REG_MASK(APIC_LVTCMCI);
+
 	/*
 	 * ARBPRI and ICR2 are not valid in x2APIC mode.  WARN if KVM reads ICR
 	 * in x2APIC mode as it's an 8-byte register in x2APIC and needs to be
@@ -2039,6 +2051,15 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
 	}
 }
 
+static int get_lvt_index(u32 reg) {
+	if (reg == APIC_LVTCMCI)
+		return LVT_CMCI;
+	if (reg < APIC_LVTT || reg > APIC_LVTERR)
+		return -1;
+	return array_index_nospec(
+			(reg - APIC_LVTT) >> 4, KVM_APIC_MAX_NR_LVT_ENTRIES);
+}
+
 static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 {
 	int ret = 0;
@@ -2083,12 +2104,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_set_spiv(apic, val & mask);
 		if (!(val & APIC_SPIV_APIC_ENABLED)) {
 			int i;
-			u32 lvt_val;
 
-			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
-				lvt_val = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+			for (i = 0; i < apic->nr_lvt_entries; i++) {
 				kvm_lapic_set_reg(apic, APIC_LVTx(i),
-					     lvt_val | APIC_LVT_MASKED);
+					kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
 			atomic_set(&apic->lapic_timer.pending, 0);
@@ -2117,16 +2136,15 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	case APIC_LVTTHMR:
 	case APIC_LVTPC:
 	case APIC_LVT1:
-	case APIC_LVTERR: {
-		/* TODO: Check vector */
-		size_t size;
-		u32 index;
-
+	case APIC_LVTERR:
+	case APIC_LVTCMCI: {
+		u32 index = get_lvt_index(reg);
+		if (!kvm_lapic_lvt_supported(apic, index)) {
+			ret = 1;
+			break;
+		}
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
-		size = ARRAY_SIZE(apic_lvt_mask);
-		index = array_index_nospec(
-				(reg - APIC_LVTT) >> 4, size);
 		val &= apic_lvt_mask[index];
 		kvm_lapic_set_reg(apic, reg, val);
 		break;
@@ -2383,7 +2401,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
+	for (i = 0; i < apic->nr_lvt_entries; i++)
 		kvm_lapic_set_reg(apic, APIC_LVTx(i), APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2d197ed0b8ce..0fcafc76a3ce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -35,11 +35,12 @@ enum lapic_lvt_entry {
 	LVT_LINT0,
 	LVT_LINT1,
 	LVT_ERROR,
+	LVT_CMCI,
 
 	KVM_APIC_MAX_NR_LVT_ENTRIES,
 };
 
-#define APIC_LVTx(x) (APIC_LVTT + 0x10 * (x))
+#define APIC_LVTx(x) ((x) == LVT_CMCI ? APIC_LVTCMCI : APIC_LVTT + 0x10 * (x))
 
 struct kvm_timer {
 	struct hrtimer timer;
@@ -78,6 +79,7 @@ struct kvm_lapic {
 	struct gfn_to_hva_cache vapic_cache;
 	unsigned long pending_events;
 	unsigned int sipi_vector;
+	int nr_lvt_entries;
 };
 
 struct dest_map;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4790f0d7d40b..a08693808729 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4772,6 +4772,8 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	/* Init IA32_MCi_CTL to all 1s */
 	for (bank = 0; bank < bank_num; bank++)
 		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
+	vcpu->arch.apic->nr_lvt_entries =
+		KVM_APIC_MAX_NR_LVT_ENTRIES - !(mcg_cap & MCG_CMCI_P);
 
 	static_call(kvm_x86_setup_mce)(vcpu);
 out:
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (3 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch updates the allocation of mce_banks with the array allocation
API (kcalloc) as a precedent for the later mci_ctl2_banks to implement
per-bank control of Corrected Machine Check Interrupt (CMCI).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a08693808729..71d0e2a7b417 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11226,7 +11226,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto fail_free_lapic;
 	vcpu->arch.pio_data = page_address(page);
 
-	vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
+	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
 				       GFP_KERNEL_ACCOUNT);
 	if (!vcpu->arch.mce_banks)
 		goto fail_free_pio_data;
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (4 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch adds the emulation of IA32_MCi_CTL2 registers to KVM. A
separate mci_ctl2_banks array is used to keep the existing mce_banks
register layout intact.

In Machine Check Architecture, in addition to MCG_CMCI_P, bit 30 of
the per-bank register IA32_MCi_CTL2 controls whether Corrected Machine
Check error reporting is enabled.

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/x86.c              | 130 ++++++++++++++++++++++----------
 2 files changed, 92 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4ff36610af6a..178b7e01bf8f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -806,6 +806,7 @@ struct kvm_vcpu_arch {
 	u64 mcg_ctl;
 	u64 mcg_ext_ctl;
 	u64 *mce_banks;
+	u64 *mci_ctl2_banks;
 
 	/* Cache MMIO info */
 	u64 mmio_gva;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71d0e2a7b417..b18994907322 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3174,6 +3174,16 @@ static void kvmclock_sync_fn(struct work_struct *work)
 					KVMCLOCK_SYNC_PERIOD);
 }
 
+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+	return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+	return (msr & 3) == 1;
+}
+
 /*
  * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
  */
@@ -3192,6 +3202,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	unsigned bank_num = mcg_cap & 0xff;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_MCG_STATUS:
@@ -3205,32 +3216,50 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.mcg_ctl = data;
 		break;
-	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
-
-			/* only 0 or all 1s can be written to IA32_MCi_CTL
-			 * some Linux kernels though clear bit 10 in bank 4 to
-			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
-			 * this to avoid an uncatched #GP in the guest
-			 */
-			if ((offset & 0x3) == 0 &&
-			    data != 0 && (data | (1 << 10)) != ~(u64)0)
-				return -1;
-
-			/* MCi_STATUS */
-			if (!msr_info->host_initiated &&
-			    (offset & 0x3) == 1 && data != 0) {
-				if (!can_set_mci_status(vcpu))
-					return -1;
-			}
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
 
-			vcpu->arch.mce_banks[offset] = data;
-			break;
-		}
+		if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
+			return 1;
+		/* An attempt to write a 1 to a reserved bit raises #GP */
+		if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
+			return 1;
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+					    last_msr + 1 - MSR_IA32_MC0_CTL2);
+		vcpu->arch.mci_ctl2_banks[offset] = data;
+		break;
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
+
+		/*
+		 * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+		 * values are architecturally undefined.  But, some Linux
+		 * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+		 * issue on AMD K8s, allow bit 10 to be clear when setting all
+		 * other bits in order to avoid an uncaught #GP in the guest.
+		 */
+		if (is_mci_control_msr(msr) &&
+		    data != 0 && (data | (1 << 10)) != ~(u64)0)
+			return 1;
+
+		/*
+		 * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+		 * AMD-based CPUs allow non-zero values, but if and only if
+		 * HWCR[McStatusWrEn] is set.
+		 */
+		if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
+		    data != 0 && !can_set_mci_status(vcpu))
+			return 1;
+
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+		vcpu->arch.mce_banks[offset] = data;
+		break;
+	default:
 		return 1;
 	}
 	return 0;
@@ -3514,7 +3543,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		}
 		break;
-	case 0x200 ... 0x2ff:
+	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
+	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
 		return kvm_mtrr_set_msr(vcpu, msr, data);
 	case MSR_IA32_APICBASE:
 		return kvm_set_apic_base(vcpu, msr_info);
@@ -3671,6 +3701,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
 		return set_msr_mce(vcpu, msr_info);
 
 	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
@@ -3775,6 +3806,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	u64 data;
 	u64 mcg_cap = vcpu->arch.mcg_cap;
 	unsigned bank_num = mcg_cap & 0xff;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_P5_MC_ADDR:
@@ -3792,16 +3824,27 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	case MSR_IA32_MCG_STATUS:
 		data = vcpu->arch.mcg_status;
 		break;
-	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
 
-			data = vcpu->arch.mce_banks[offset];
-			break;
-		}
+		if (!(mcg_cap & MCG_CMCI_P) && !host)
+			return 1;
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+					    last_msr + 1 - MSR_IA32_MC0_CTL2);
+		data = vcpu->arch.mci_ctl2_banks[offset];
+		break;
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
+
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+		data = vcpu->arch.mce_banks[offset];
+		break;
+	default:
 		return 1;
 	}
 	*pdata = data;
@@ -3898,7 +3941,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	}
 	case MSR_MTRRcap:
-	case 0x200 ... 0x2ff:
+	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
+	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
 		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
 	case 0xcd: /* fsb frequency */
 		msr_info->data = 3;
@@ -4014,6 +4058,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
 		return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
 				   msr_info->host_initiated);
 	case MSR_IA32_XSS:
@@ -4769,9 +4814,12 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	/* Init IA32_MCG_CTL to all 1s */
 	if (mcg_cap & MCG_CTL_P)
 		vcpu->arch.mcg_ctl = ~(u64)0;
-	/* Init IA32_MCi_CTL to all 1s */
-	for (bank = 0; bank < bank_num; bank++)
+	/* Init IA32_MCi_CTL to all 1s, IA32_MCi_CTL2 to all 0s */
+	for (bank = 0; bank < bank_num; bank++) {
 		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
+		if (mcg_cap & MCG_CMCI_P)
+			vcpu->arch.mci_ctl2_banks[bank] = 0;
+	}
 	vcpu->arch.apic->nr_lvt_entries =
 		KVM_APIC_MAX_NR_LVT_ENTRIES - !(mcg_cap & MCG_CMCI_P);
 
@@ -11228,7 +11276,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
 				       GFP_KERNEL_ACCOUNT);
-	if (!vcpu->arch.mce_banks)
+	vcpu->arch.mci_ctl2_banks = kcalloc(KVM_MAX_MCE_BANKS, sizeof(u64),
+					    GFP_KERNEL_ACCOUNT);
+	if (!vcpu->arch.mce_banks || !vcpu->arch.mci_ctl2_banks)
 		goto fail_free_pio_data;
 	vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
 
@@ -11281,6 +11331,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
 fail_free_mce_banks:
 	kfree(vcpu->arch.mce_banks);
+	kfree(vcpu->arch.mci_ctl2_banks);
 fail_free_pio_data:
 	free_page((unsigned long)vcpu->arch.pio_data);
 fail_free_lapic:
@@ -11325,6 +11376,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_hv_vcpu_uninit(vcpu);
 	kvm_pmu_destroy(vcpu);
 	kfree(vcpu->arch.mce_banks);
+	kfree(vcpu->arch.mci_ctl2_banks);
 	kvm_free_lapic(vcpu);
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	kvm_mmu_destroy(vcpu);
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (5 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-10 17:11 ` [PATCH v5 8/8] KVM: selftests: Add a self test for CMCI and UCNA emulations Jue Wang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch enables MCG_CMCI_P by default in kvm_mce_cap_supported. It
reuses ioctl KVM_X86_SET_MCE to implement injection of UnCorrectable
No Action required (UCNA) errors, signaled via Corrected Machine
Check Interrupt (CMCI).

Neither of the CMCI and UCNA emulations depends on hardware.

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/vmx/vmx.c |  1 +
 arch/x86/kvm/x86.c     | 44 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 610355b9ccce..1aed964ee4ee 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8037,6 +8037,7 @@ static __init int hardware_setup(void)
 	}
 
 	kvm_mce_cap_supported |= MCG_LMCE_P;
+	kvm_mce_cap_supported |= MCG_CMCI_P;
 
 	if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
 		return -EINVAL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b18994907322..4322a1365f74 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4828,6 +4828,43 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+/*
+ * Validate this is an UCNA error by checking the MCG_STATUS and MCi_STATUS
+ * registers that none of the bits for Machine Check Exceptions are set and
+ * both the VAL (valid) and UC (uncorrectable) bits are set.
+ * UCNA - UnCorrectable No Action required
+ * SRAR - Software Recoverable Action Required
+ * MCI_STATUS_PCC - Processor Context Corrupted
+ * MCI_STATUS_S - Signaled as a Machine Check Exception
+ * MCI_STATUS_AR - This MCE is "software recoverable action required"
+ */
+static bool is_ucna(struct kvm_x86_mce *mce)
+{
+	return	!mce->mcg_status &&
+		!(mce->status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR)) &&
+		(mce->status & MCI_STATUS_VAL) &&
+		(mce->status & MCI_STATUS_UC);
+}
+
+static int kvm_vcpu_x86_set_ucna(struct kvm_vcpu *vcpu, struct kvm_x86_mce *mce, u64* banks)
+{
+	u64 mcg_cap = vcpu->arch.mcg_cap;
+
+	banks[1] = mce->status;
+	banks[2] = mce->addr;
+	banks[3] = mce->misc;
+	vcpu->arch.mcg_status = mce->mcg_status;
+
+	if (!(mcg_cap & MCG_CMCI_P) ||
+	    !(vcpu->arch.mci_ctl2_banks[mce->bank] & MCI_CTL2_CMCI_EN))
+		return 0;
+
+	if (lapic_in_kernel(vcpu))
+		kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTCMCI);
+
+	return 0;
+}
+
 static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 				      struct kvm_x86_mce *mce)
 {
@@ -4837,6 +4874,12 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 
 	if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
 		return -EINVAL;
+
+	banks += array_index_nospec(4 * mce->bank, 4 * bank_num);
+
+	if (is_ucna(mce))
+		return kvm_vcpu_x86_set_ucna(vcpu, mce, banks);
+
 	/*
 	 * if IA32_MCG_CTL is not all 1s, the uncorrected error
 	 * reporting is disabled
@@ -4844,7 +4887,6 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 	if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
 	    vcpu->arch.mcg_ctl != ~(u64)0)
 		return 0;
-	banks += 4 * mce->bank;
 	/*
 	 * if IA32_MCi_CTL is not all 1s, the uncorrected error
 	 * reporting is disabled for the bank
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 8/8] KVM: selftests: Add a self test for CMCI and UCNA emulations.
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (6 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
@ 2022-06-10 17:11 ` Jue Wang
  2022-06-23 17:25 ` [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Paolo Bonzini
  2022-06-30 13:48 ` Xiaoyao Li
  9 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-10 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch add a self test that verifies user space can inject
UnCorrectable No Action required (UCNA) memory errors to the guest.
It also verifies that incorrectly configured MSRs for Corrected
Machine Check Interrupt (CMCI) emulation will result in #GP.

Signed-off-by: Jue Wang <juew@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   1 +
 .../selftests/kvm/include/x86_64/mce.h        |  25 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/ucna_injection_test.c          | 347 ++++++++++++++++++
 7 files changed, 377 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0b0e4402bba6..a8eb9d3f082c 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -37,6 +37,7 @@
 /x86_64/tsc_scaling_sync
 /x86_64/sync_regs_test
 /x86_64/tsc_msrs_test
+/x86_64/ucna_injection_test
 /x86_64/userspace_io_test
 /x86_64/userspace_msr_exit_test
 /x86_64/vmx_apic_access_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 681b173aa87c..8d9858b54b98 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
 TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
+TEST_GEN_PROGS_x86_64 += x86_64/ucna_injection_test
 TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test
 TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index ac88557dcc9a..bed316fdecd5 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -35,6 +35,7 @@
 #define		APIC_SPIV_APIC_ENABLED		(1 << 8)
 #define APIC_IRR	0x200
 #define	APIC_ICR	0x300
+#define	APIC_LVTCMCI	0x2f0
 #define		APIC_DEST_SELF		0x40000
 #define		APIC_DEST_ALLINC	0x80000
 #define		APIC_DEST_ALLBUT	0xC0000
diff --git a/tools/testing/selftests/kvm/include/x86_64/mce.h b/tools/testing/selftests/kvm/include/x86_64/mce.h
new file mode 100644
index 000000000000..6119321f3f5d
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/x86_64/mce.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * tools/testing/selftests/kvm/include/x86_64/mce.h
+ *
+ * Copyright (C) 2022, Google LLC.
+ */
+
+#ifndef SELFTEST_KVM_MCE_H
+#define SELFTEST_KVM_MCE_H
+
+#define MCG_CTL_P		BIT_ULL(8)   /* MCG_CTL register available */
+#define MCG_SER_P		BIT_ULL(24)  /* MCA recovery/new status bits */
+#define MCG_LMCE_P		BIT_ULL(27)  /* Local machine check supported */
+#define MCG_CMCI_P		BIT_ULL(10)  /* CMCI supported */
+#define KVM_MAX_MCE_BANKS 32
+#define MCG_CAP_BANKS_MASK 0xff       /* Bit 0-7 of the MCG_CAP register are #banks */
+#define MCI_STATUS_VAL (1ULL << 63)   /* valid error */
+#define MCI_STATUS_UC (1ULL << 61)    /* uncorrected error */
+#define MCI_STATUS_EN (1ULL << 60)    /* error enabled */
+#define MCI_STATUS_MISCV (1ULL << 59) /* misc error reg. valid */
+#define MCI_STATUS_ADDRV (1ULL << 58) /* addr reg. valid */
+#define MCM_ADDR_PHYS 2    /* physical address */
+#define MCI_CTL2_CMCI_EN		BIT_ULL(30)
+
+#endif /* SELFTEST_KVM_MCE_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index d0d51adec76e..fa316c3b7725 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -481,6 +481,7 @@ struct kvm_cpuid2 *kvm_get_supported_hv_cpuid(void);
 void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 void vm_xsave_req_perm(int bit);
+void vcpu_setup(struct kvm_vm *vm, int vcpuid);
 
 enum x86_page_size {
 	X86_PAGE_SIZE_4K = 0,
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 33ea5e9955d9..bb1ef665cd10 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -580,7 +580,7 @@ static void kvm_setup_tss_64bit(struct kvm_vm *vm, struct kvm_segment *segp,
 	kvm_seg_fill_gdt_64bit(vm, segp);
 }
 
-static void vcpu_setup(struct kvm_vm *vm, int vcpuid)
+void vcpu_setup(struct kvm_vm *vm, int vcpuid)
 {
 	struct kvm_sregs sregs;
 
diff --git a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
new file mode 100644
index 000000000000..3b550bfd7df1
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ucna_injection_test
+ *
+ * Copyright (C) 2022, Google LLC.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Test that user space can inject UnCorrectable No Action required (UCNA)
+ * memory errors to the guest.
+ *
+ * The test starts one vCPU with the MCG_CMCI_P enabled. It verifies that
+ * proper UCNA errors can be injected to a vCPU with MCG_CMCI_P and
+ * corresponding per-bank control register (MCI_CTL2) bit enabled.
+ * The test also checks that the UCNA errors get recorded in the
+ * Machine Check bank registers no matter the error signal interrupts get
+ * delivered into the guest or not.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+#include <string.h>
+#include <time.h>
+
+#include "kvm_util_base.h"
+#include "kvm_util.h"
+#include "mce.h"
+#include "processor.h"
+#include "test_util.h"
+#include "apic.h"
+
+#define UCNA_VCPU_ID 0
+#define CMCI_DISABLED_VCPU_ID 1
+#define CMCI_ENABLED_VCPU_ID 2
+
+#define SYNC_FIRST_UCNA 9
+#define SYNC_SECOND_UCNA 10
+#define SYNC_GP 11
+#define FIRST_UCNA_ADDR 0xdeadbeef
+#define SECOND_UCNA_ADDR 0xcafeb0ba
+
+/*
+ * Vector for the CMCI interrupt.
+ * Value is arbitrary. Any value in 0x20-0xFF should work:
+ * https://wiki.osdev.org/Interrupt_Vector_Table
+ */
+#define CMCI_VECTOR  0xa9
+
+#define UCNA_BANK  0x7	// IMC0 bank
+
+#define MCI_CTL2_RESERVED_BIT BIT_ULL(29)
+/*
+ * Record states about the injected UCNA.
+ * The variables started with the 'i_' prefixes are recorded in interrupt
+ * handler. Variables without the 'i_' prefixes are recorded in guest main
+ * execution thread.
+ */
+static volatile uint64_t i_ucna_rcvd;
+static volatile uint64_t i_ucna_addr;
+static volatile uint64_t ucna_addr;
+static volatile uint64_t ucna_addr2;
+
+struct thread_params {
+	struct kvm_vm *vm;
+	uint32_t vcpu_id;
+	uint64_t *p_i_ucna_rcvd;
+	uint64_t *p_i_ucna_addr;
+	uint64_t *p_ucna_addr;
+	uint64_t *p_ucna_addr2;
+};
+
+static void verify_apic_base_addr(void)
+{
+	uint64_t msr = rdmsr(MSR_IA32_APICBASE);
+	uint64_t base = GET_APIC_BASE(msr);
+
+	GUEST_ASSERT(base == APIC_DEFAULT_GPA);
+}
+
+static void ucna_injection_guest_code(void)
+{
+	uint64_t ctl2;
+	verify_apic_base_addr();
+	xapic_enable();
+
+	/* Sets up the interrupt vector and enables per-bank CMCI sigaling. */
+	xapic_write_reg(APIC_LVTCMCI, CMCI_VECTOR | APIC_DM_FIXED);
+	ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_CMCI_EN);
+
+	/* Enables interrupt in guest. */
+	asm volatile("sti");
+
+	/* Let user space inject the first UCNA */
+	GUEST_SYNC(SYNC_FIRST_UCNA);
+
+	ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+
+	/* Disables the per-bank CMCI signaling. */
+	ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 & ~MCI_CTL2_CMCI_EN);
+
+	/* Let the user space inject the second UCNA */
+	GUEST_SYNC(SYNC_SECOND_UCNA);
+
+	ucna_addr2 = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+	GUEST_DONE();
+}
+
+static void cmci_disabled_guest_code(void)
+{
+	uint64_t ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_CMCI_EN);
+
+	GUEST_DONE();
+}
+
+static void cmci_enabled_guest_code(void)
+{
+	uint64_t ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_RESERVED_BIT);
+
+	GUEST_DONE();
+}
+
+static void guest_cmci_handler(struct ex_regs *regs)
+{
+	i_ucna_rcvd++;
+	i_ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+	xapic_write_reg(APIC_EOI, 0);
+}
+
+static void guest_gp_handler(struct ex_regs *regs)
+{
+	GUEST_SYNC(SYNC_GP);
+}
+
+static void run_vcpu_expect_gp(struct kvm_vm *vm, uint32_t vcpu_id)
+{
+	unsigned int exit_reason;
+	struct ucall uc;
+
+	fprintf(stderr, "vCPU thread running vCPU %u\n", vcpu_id);
+	vcpu_run(vm, vcpu_id);
+
+	exit_reason = vcpu_state(vm, vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	TEST_ASSERT(get_ucall(vm, vcpu_id, &uc) == UCALL_SYNC,
+		    "Expect UCALL_SYNC\n");
+	TEST_ASSERT(uc.args[1] == SYNC_GP, "#GP is expected.");
+	fprintf(stderr, "vCPU %u received GP in guest.\n", vcpu_id);
+}
+
+static void inject_ucna(struct kvm_vm *vm, uint32_t vcpu_id, uint64_t addr) {
+	/*
+	 * A UCNA error is indicated with VAL=1, UC=1, PCC=0, S=0 and AR=0 in
+	 * the IA32_MCi_STATUS register.
+	 * MSCOD=1 (BIT[16] - MscodDataRdErr).
+	 * MCACOD=0x0090 (Memory controller error format, channel 0)
+	 */
+	uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
+			  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | 0x10090;
+	struct kvm_x86_mce mce = {};
+	mce.status = status;
+	mce.mcg_status = 0;
+	/*
+	 * MCM_ADDR_PHYS indicates the reported address is a physical address.
+	 * Lowest 6 bits is the recoverable address LSB, i.e., the injected MCE
+	 * is at 4KB granularity.
+	 */
+	mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
+	mce.addr = addr;
+	mce.bank = UCNA_BANK;
+
+	TEST_ASSERT(_vcpu_ioctl(vm, vcpu_id, KVM_X86_SET_MCE, &mce) != -1,
+		    "Inject UCNA");
+}
+
+static void *run_ucna_injection(void *arg)
+{
+	struct thread_params *params = (struct thread_params *)arg;
+	struct ucall uc;
+	int old;
+	int r;
+	unsigned int exit_reason;
+
+	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	TEST_ASSERT(r == 0,
+		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+		    params->vcpu_id, r);
+
+	fprintf(stderr, "vCPU thread running vCPU %u\n", params->vcpu_id);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
+		    "Expect UCALL_SYNC\n");
+	TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
+
+	fprintf(stderr, "Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
+
+	inject_ucna(params->vm, params->vcpu_id, FIRST_UCNA_ADDR);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
+		    "Expect UCALL_SYNC\n");
+	TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
+
+	fprintf(stderr, "Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
+
+	inject_ucna(params->vm, params->vcpu_id, SECOND_UCNA_ADDR);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
+		TEST_ASSERT(false,
+			    "vCPU %u exited with error: %s.\n",
+			    params->vcpu_id, (const char *)uc.args[0]);
+	}
+
+	return NULL;
+}
+
+static void test_ucna_injection(struct thread_params *params)
+{
+	params->p_i_ucna_rcvd = (uint64_t *)addr_gva2hva(params->vm, (uint64_t)&i_ucna_rcvd);
+	params->p_i_ucna_addr = (uint64_t *)addr_gva2hva(params->vm, (uint64_t)&i_ucna_addr);
+	params->p_ucna_addr = (uint64_t *)addr_gva2hva(params->vm, (uint64_t)&ucna_addr);
+	params->p_ucna_addr2 = (uint64_t *)addr_gva2hva(params->vm, (uint64_t)&ucna_addr2);
+	params->vcpu_id = UCNA_VCPU_ID;
+
+	run_ucna_injection(params);
+
+	TEST_ASSERT(*params->p_i_ucna_rcvd == 1, "Only first UCNA get signaled.");
+	TEST_ASSERT(*params->p_i_ucna_addr == FIRST_UCNA_ADDR,
+		    "Only first UCNA reported addr get recorded via interrupt.");
+	TEST_ASSERT(*params->p_ucna_addr == FIRST_UCNA_ADDR,
+		    "First injected UCNAs should get exposed via registers.");
+	TEST_ASSERT(*params->p_ucna_addr2 == SECOND_UCNA_ADDR,
+		    "Second injected UCNAs should get exposed via registers.");
+
+	fprintf(stderr,
+		"Test successful.\n"
+		"UCNA CMCI interrupts received: %ld\n"
+		"Last UCNA address received via CMCI: %lx\n"
+		"First UCNA address in vCPU thread: %lx\n"
+		"Second UCNA address in vCPU thread: %lx\n",
+		*params->p_i_ucna_rcvd, *params->p_i_ucna_addr,
+		*params->p_ucna_addr, *params->p_ucna_addr2);
+}
+
+static void setup_mce_cap(struct kvm_vm *vm, uint32_t vcpuid, bool enable_cmci_p)
+{
+	uint64_t supported_mcg_caps = 0;
+	uint64_t mcg_caps = MCG_CTL_P | MCG_SER_P | MCG_LMCE_P | KVM_MAX_MCE_BANKS;
+	if (enable_cmci_p)
+		mcg_caps |= MCG_CMCI_P;
+
+	TEST_ASSERT(_kvm_ioctl(vm, KVM_X86_GET_MCE_CAP_SUPPORTED,
+			       &supported_mcg_caps) != -1,
+		    "KVM_GET_MCE_CAP_SUPPORTED");
+	fprintf(stderr, "KVM supported MCG_CAP: %#lx\n", supported_mcg_caps);
+
+	if (!(supported_mcg_caps & MCG_CMCI_P)) {
+		print_skip("MCG_CMCI_P is not supported, skip test.");
+		exit(KSFT_SKIP);
+	}
+
+	mcg_caps &= supported_mcg_caps | MCG_CAP_BANKS_MASK;
+	fprintf(stderr, "Set MCG_CAP: %#lx on cpu: %d\n", mcg_caps, vcpuid);
+	TEST_ASSERT(_vcpu_ioctl(vm, vcpuid, KVM_X86_SETUP_MCE, &mcg_caps) != -1,
+		    "KVM_X86_SETUP_MCE");
+}
+
+static void create_vcpu_with_mce_cap(struct kvm_vm *vm, uint32_t vcpuid,
+		bool enable_cmci_p, void *guest_code)
+{
+	struct kvm_mp_state mp_state;
+	struct kvm_regs regs;
+	vm_vaddr_t stack_vaddr;
+	stack_vaddr = vm_vaddr_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
+				     DEFAULT_GUEST_STACK_VADDR_MIN);
+
+	vm_vcpu_add(vm, vcpuid);
+	setup_mce_cap(vm, vcpuid, enable_cmci_p);
+
+	vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
+	vcpu_setup(vm, vcpuid);
+
+	/* Setup guest general purpose registers */
+	vcpu_regs_get(vm, vcpuid, &regs);
+	regs.rflags = regs.rflags | 0x2;
+	regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
+	regs.rip = (unsigned long) guest_code;
+	vcpu_regs_set(vm, vcpuid, &regs);
+
+	/* Setup the MP state */
+	mp_state.mp_state = 0;
+	vcpu_set_mp_state(vm, vcpuid, &mp_state);
+}
+
+int main(int argc, char *argv[])
+{
+	struct thread_params params;
+	struct kvm_vm *vm;
+
+	kvm_check_cap(KVM_CAP_MCE);
+
+	vm = vm_create_without_vcpus(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES);
+
+	create_vcpu_with_mce_cap(vm, UCNA_VCPU_ID, true, ucna_injection_guest_code);
+	create_vcpu_with_mce_cap(vm, CMCI_DISABLED_VCPU_ID, false,
+				 cmci_disabled_guest_code);
+	create_vcpu_with_mce_cap(vm, CMCI_ENABLED_VCPU_ID, true,
+				 cmci_enabled_guest_code);
+
+	params.vm = vm;
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vm, UCNA_VCPU_ID);
+	vcpu_init_descriptor_tables(vm, CMCI_DISABLED_VCPU_ID);
+	vcpu_init_descriptor_tables(vm, CMCI_ENABLED_VCPU_ID);
+	vm_install_exception_handler(vm, CMCI_VECTOR, guest_cmci_handler);
+	vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
+
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	test_ucna_injection(&params);
+	run_vcpu_expect_gp(vm, CMCI_DISABLED_VCPU_ID);
+	run_vcpu_expect_gp(vm, CMCI_ENABLED_VCPU_ID);
+
+	kvm_vm_free(vm);
+}
-- 
2.36.1.255.ge46751e96f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (7 preceding siblings ...)
  2022-06-10 17:11 ` [PATCH v5 8/8] KVM: selftests: Add a self test for CMCI and UCNA emulations Jue Wang
@ 2022-06-23 17:25 ` Paolo Bonzini
  2022-06-30 13:48 ` Xiaoyao Li
  9 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2022-06-23 17:25 UTC (permalink / raw)
  To: Jue Wang, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan

On 6/10/22 19:11, Jue Wang wrote:
> This patch series implement emulation for Corrected Machine Check
> Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> error injection.
> 
> UCNA errors signaled via CMCI allow a guest to be notified as soon as
> uncorrectable memory errors get detected by some background threads,
> e.g., threads that migrate guest memory across hosts or threads that
> constantly scan system memory for errors [1].
> 
> Upon receiving UCNAs, guest kernel isolates the poisoned pages which may
> still be free, preventing future accesses that may cause fatal MCEs.
> 
> 1. https://lore.kernel.org/linux-mm/8eceffc0-01e8-2a55-6eb9-b26faa9e3caf@intel.com/t/
> 
> 
> Patch 1-3 clean up KVM APIC LVT logic in preparation to adding APIC_LVTCMCI.
> 
> Patch 4 adds APIC_LVTCMCI emulation.
> 
> Patch 5 updates mce_banks to use array allocation api.
> 
> Patch 6 adds emulation for MSR_IA32_MCx_CTL2 registers that provide per
> bank control of CMCI signaling.
> 
> Patch 7 enables MCG_CMCI_P and handles injected UCNA errors.
> 
> Patch 8 adds a KVM self test that validates UCNA injection and CMCI
> emulation.
> 
> v5 changes
> - Incorporate feedback from David Matlack <dmatlack@google.com>
> - Rewrite the change log to be more concise and accurate.
> - Removes several duplicated checks in UCNA injection code.
> - Add test cases that validate CMCI emulation to self test.
> 
> v4 changes
> - Incorporate feedback from David Matlack <dmatlack@google.com>
> - Rewrite the change logs to be more descriptive.
> - Add a KVM self test.
> 
> v3 changes
> - Incorporate feedback from Sean Christopherson <seanjc@google.com>
> - Split clean up to KVM APIC LVT logic to 3 patches.
> - Put the clean up of mce_array allocation in a separate patch.
> - Base the MCi_CTL2 register emulation on Sean's clean up and fix
> series [2]
> - Fix bugs around MCi_CTL2 register offset validation and the free of
> mci_ctl2_banks array.
> - Rewrite the change log with more details in architectural information
> about CMCI, UCNA and MCG_CMCI_P.
> - Fix various comments and wrapping style.
> 
> 2. https://lore.kernel.org/lkml/20220512222716.4112548-1-seanjc@google.com/T/
> 
> v2 chanegs
> - Incorporate feedback from Sean Christopherson <seanjc@google.com>
> - Split the single patch into 4:
>    1). clean up KVM APIC LVT logic
>    2). add CMCI emulation to lapic
>    3). add emulation of MSR_IA32_MCx_CTL2
>    4). enable MCG_CMCI_P and handle injected UCNAs
> - Fix various style issues.
> 
> Jue Wang (8):
>    KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
>    KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
>    KVM: x86: Add APIC_LVTx() macro.
>    KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to
>      lapic.
>    KVM: x86: Use kcalloc to allocate the mce_banks array.
>    KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
>    KVM: x86: Enable CMCI capability by default and handle injected UCNA
>      errors
>    KVM: selftests: Add a self test for CMCI and UCNA emulations.
> 
>   arch/x86/include/asm/kvm_host.h               |   1 +
>   arch/x86/kvm/lapic.c                          |  66 ++--
>   arch/x86/kvm/lapic.h                          |  16 +-
>   arch/x86/kvm/vmx/vmx.c                        |   1 +
>   arch/x86/kvm/x86.c                            | 178 ++++++---
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/include/x86_64/apic.h       |   1 +
>   .../selftests/kvm/include/x86_64/mce.h        |  25 ++
>   .../selftests/kvm/include/x86_64/processor.h  |   1 +
>   .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
>   .../kvm/x86_64/ucna_injection_test.c          | 347 ++++++++++++++++++
>   12 files changed, 573 insertions(+), 67 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
>   create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> 

Queued, thanks.  The test of course required some changes to adapt to 
the new API.

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation
  2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (8 preceding siblings ...)
  2022-06-23 17:25 ` [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Paolo Bonzini
@ 2022-06-30 13:48 ` Xiaoyao Li
  2022-06-30 14:05   ` Jue Wang
  9 siblings, 1 reply; 15+ messages in thread
From: Xiaoyao Li @ 2022-06-30 13:48 UTC (permalink / raw)
  To: Jue Wang, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan

On 6/11/2022 1:11 AM, Jue Wang wrote:
> This patch series implement emulation for Corrected Machine Check
> Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> error injection.
> 

It seems the main purpose of this series is to allow UCNA error 
injection and notify it to guest via CMCI.

But it doesn't emulate CMCI fully. E.g., guest's error threshold of 
MCi_CTL2 doesn't work. It's still controlled by host's value.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation
  2022-06-30 13:48 ` Xiaoyao Li
@ 2022-06-30 14:05   ` Jue Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-06-30 14:05 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack, Tony Luck, kvm,
	Greg Thelen, Jiaqi Yan

On Thu, Jun 30, 2022 at 6:48 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 6/11/2022 1:11 AM, Jue Wang wrote:
> > This patch series implement emulation for Corrected Machine Check
> > Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
> > error injection.
> >
>
> It seems the main purpose of this series is to allow UCNA error
> injection and notify it to guest via CMCI.
>
> But it doesn't emulate CMCI fully. E.g., guest's error threshold of
> MCi_CTL2 doesn't work. It's still controlled by host's value.
>
Both of the above points are correct.

In fact, this series does not enable injecting corrected errors into a
KVM guest at all, for:

1. It is not necessary since corrected errors are transparent to
software executions (host / guest).
2. Corrected errors can be fully managed on the host side (monitor, re-act).
3. Prevent guest initiated row-hammer attack based on the injected
corrected errors feedback loop.

Thanks,
-Jue

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-06-10 17:11 ` [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
@ 2022-07-01  2:07   ` Xiaoyao Li
  2022-07-01  4:40     ` Jue Wang
  2022-07-01 16:53     ` Jue Wang
  0 siblings, 2 replies; 15+ messages in thread
From: Xiaoyao Li @ 2022-07-01  2:07 UTC (permalink / raw)
  To: Jue Wang, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan

On 6/11/2022 1:11 AM, Jue Wang wrote:
...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4790f0d7d40b..a08693808729 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4772,6 +4772,8 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
>   	/* Init IA32_MCi_CTL to all 1s */
>   	for (bank = 0; bank < bank_num; bank++)
>   		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> +	vcpu->arch.apic->nr_lvt_entries =
> +		KVM_APIC_MAX_NR_LVT_ENTRIES - !(mcg_cap & MCG_CMCI_P);

vcpu->arch.apic->nr_lvt_entries needs to be initialized as 
KVM_APIC_MAX_NR_LVT_ENTREIS - 1 when creating lapic.

What if userspace doesn't call KVM_X86_SETUP_MCE at all?

>   
>   	static_call(kvm_x86_setup_mce)(vcpu);
>   out:


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-07-01  2:07   ` Xiaoyao Li
@ 2022-07-01  4:40     ` Jue Wang
  2022-07-01 16:53     ` Jue Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-07-01  4:40 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	David Matlack, Tony Luck, kvm, Greg Thelen, Jiaqi Yan

On Thu, Jun 30, 2022 at 7:07 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 6/11/2022 1:11 AM, Jue Wang wrote:
> ...
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4790f0d7d40b..a08693808729 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4772,6 +4772,8 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
> >       /* Init IA32_MCi_CTL to all 1s */
> >       for (bank = 0; bank < bank_num; bank++)
> >               vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> > +     vcpu->arch.apic->nr_lvt_entries =
> > +             KVM_APIC_MAX_NR_LVT_ENTRIES - !(mcg_cap & MCG_CMCI_P);
>
> vcpu->arch.apic->nr_lvt_entries needs to be initialized as
> KVM_APIC_MAX_NR_LVT_ENTREIS - 1 when creating lapic.
>
> What if userspace doesn't call KVM_X86_SETUP_MCE at all?
Good catch.

Paolo / Sean, should another patch be sent to fix this or do you
recommend some other means to address the nr_lvt_entries
initialization issue that Xiaoyao pointed out?

Thanks a lot,
-Jue
>
> >
> >       static_call(kvm_x86_setup_mce)(vcpu);
> >   out:
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-07-01  2:07   ` Xiaoyao Li
  2022-07-01  4:40     ` Jue Wang
@ 2022-07-01 16:53     ` Jue Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Jue Wang @ 2022-07-01 16:53 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack, Tony Luck, kvm,
	Greg Thelen, Jiaqi Yan

I have sent out two patches to address this bug and an #GP reported by
syzkaller.

https://lore.kernel.org/kvm/20220701165045.4074471-1-juew@google.com/

Thanks again for the review and reporting of this issue!

Best regards,
-Jue

On Thu, Jun 30, 2022 at 7:07 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 6/11/2022 1:11 AM, Jue Wang wrote:
> ...
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4790f0d7d40b..a08693808729 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4772,6 +4772,8 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
> >       /* Init IA32_MCi_CTL to all 1s */
> >       for (bank = 0; bank < bank_num; bank++)
> >               vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> > +     vcpu->arch.apic->nr_lvt_entries =
> > +             KVM_APIC_MAX_NR_LVT_ENTRIES - !(mcg_cap & MCG_CMCI_P);
>
> vcpu->arch.apic->nr_lvt_entries needs to be initialized as
> KVM_APIC_MAX_NR_LVT_ENTREIS - 1 when creating lapic.
>
> What if userspace doesn't call KVM_X86_SETUP_MCE at all?
>
> >
> >       static_call(kvm_x86_setup_mce)(vcpu);
> >   out:
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-07-01 16:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 17:11 [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
2022-06-10 17:11 ` [PATCH v5 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
2022-06-10 17:11 ` [PATCH v5 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
2022-06-10 17:11 ` [PATCH v5 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
2022-06-10 17:11 ` [PATCH v5 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
2022-07-01  2:07   ` Xiaoyao Li
2022-07-01  4:40     ` Jue Wang
2022-07-01 16:53     ` Jue Wang
2022-06-10 17:11 ` [PATCH v5 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
2022-06-10 17:11 ` [PATCH v5 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
2022-06-10 17:11 ` [PATCH v5 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
2022-06-10 17:11 ` [PATCH v5 8/8] KVM: selftests: Add a self test for CMCI and UCNA emulations Jue Wang
2022-06-23 17:25 ` [PATCH v5 0/8] KVM: x86: Add CMCI and UCNA emulation Paolo Bonzini
2022-06-30 13:48 ` Xiaoyao Li
2022-06-30 14:05   ` Jue Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.