All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation
@ 2022-05-20 17:36 Jue Wang
  2022-05-20 17:36 ` [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch series implement emulation for Corrected Machine Check
Interrupt (CMCI) signaling and UnCorrectable No Action required (UCNA)
error injection.

UCNA errors signaled via CMCI allow a guest to be notified as soon as
uncorrectable memory errors get detected by some background threads,
e.g., threads that migrate guest memory across hosts or threads that
constantly scan system memory for errors [1].

Upon receiving UCNAs, guest kernel isolates the poisoned pages which may
still be free, preventing future accesses that may cause fatal MCEs.

1. https://lore.kernel.org/linux-mm/8eceffc0-01e8-2a55-6eb9-b26faa9e3caf@intel.com/t/


Patch 1-3 clean up KVM APIC LVT logic in preparation to adding APIC_LVTCMCI.

Patch 4 adds CMCI emulation to lapic.

Patch 5 updates the allocation of mce_banks to use array allocation api.

Patch 6 adds emulation for MSR_IA32_MCx_CTL2 registers that provide per
bank control of CMCI signaling.

Patch 7 enables MCG_CMCI_P by default and handles injected UCNA errors.

Patch 8 adds a KVM self test that exercises UCNA injection.

v4 changes
- Incorporate feedback from David Matlack <dmatlack@google.com>
- Rewrite the change logs to be more descriptive.
- Add a KVM self test.

v3 changes
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split clean up to KVM APIC LVT logic to 3 patches.
- Put the clean up of mce_array allocation in a separate patch.
- Base the MCi_CTL2 register emulation on Sean's clean up and fix
series [2]
- Fix bugs around MCi_CTL2 register offset validation and the free of
mci_ctl2_banks array.
- Rewrite the change log with more details in architectural information
about CMCI, UCNA and MCG_CMCI_P.
- Fix various comments and wrapping style.

2. https://lore.kernel.org/lkml/20220512222716.4112548-1-seanjc@google.com/T/


v2 chanegs
- Incorporate feedback from Sean Christopherson <seanjc@google.com>
- Split the single patch into 4:
  1). clean up KVM APIC LVT logic
  2). add CMCI emulation to lapic
  3). add emulation of MSR_IA32_MCx_CTL2
  4). enable MCG_CMCI_P and handle injected UCNAs
- Fix various style issues.

Jue Wang (8):
  KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
  KVM: x86: Add APIC_LVTx() macro.
  KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to
    lapic.
  KVM: x86: Use kcalloc to allocate the mce_banks array.
  KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  KVM: x86: Enable CMCI capability by default and handle injected UCNA
    errors
  KVM: x86: Add a self test for UCNA injection.

 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/kvm/lapic.c                          |  58 +++-
 arch/x86/kvm/lapic.h                          |  15 +-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            | 182 ++++++++---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   1 +
 .../selftests/kvm/include/x86_64/mce.h        |  25 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/ucna_injection_test.c          | 287 ++++++++++++++++++
 12 files changed, 517 insertions(+), 58 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c

-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-06-03 18:58   ` David Matlack
  2022-05-20 17:36 ` [PATCH v4 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

To implement Corrected Machine Check Interrupt (CMCI) as another
LVT vector, the APIC LVT logic needs to be able to handle an additional
LVT vector conditioned on whether MCG_CMCI_P is enabled on the vCPU,
this is because CMCI signaling can only be enabled when the CPU's
MCG_CMCI_P bit is set (Intel SDM, section 15.3.1.1).

This patch factors out the dependency on KVM_APIC_LVT_NUM from the
APIC_VERSION macro. In later patches, KVM_APIC_LVT_NUM will be replaced
with a helper kvm_apic_get_nr_lvt_entries that reports different LVT
number conditioned on whether MCG_CMCI_P is enabled on the vCPU.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94..a5caa77e279f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -54,7 +54,7 @@
 #define PRIo64 "o"
 
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION			(0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION			0x14UL
 #define LAPIC_MMIO_LENGTH		(1 << 12)
 /* followed define is not in apicdef.h */
 #define MAX_APIC_VECTOR			256
@@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION;
+	u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
 
 	if (!lapic_in_kernel(vcpu))
 		return;
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
  2022-05-20 17:36 ` [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-05-20 17:36 ` [PATCH v4 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

To implement Corrected Machine Check Interrupt (CMCI) as an additional
LVT vector, the apic_lvt_mask array needs to handle LVT_CMCI
transparently when LVT_CMCI is added.

This patch defines a lapic_lvt_entry enum and use its elements as
explicit indices to the apic_lvt_mask array, naturally extensible to
support an additional LVT_CMCI enum hence apic_lvt_mask element in the
future.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 19 ++++++++++---------
 arch/x86/kvm/lapic.h | 12 +++++++++++-
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a5caa77e279f..73f5cd248a63 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
+	u32 v = APIC_VERSION | ((KVM_APIC_MAX_NR_LVT_ENTRIES - 1) << 16);
 
 	if (!lapic_in_kernel(vcpu))
 		return;
@@ -419,12 +419,13 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	kvm_lapic_set_reg(apic, APIC_LVR, v);
 }
 
-static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
-	LVT_MASK ,      /* part LVTT mask, timer mode mask added at runtime */
-	LVT_MASK | APIC_MODE_MASK,	/* LVTTHMR */
-	LVT_MASK | APIC_MODE_MASK,	/* LVTPC */
-	LINT_MASK, LINT_MASK,	/* LVT0-1 */
-	LVT_MASK		/* LVTERR */
+static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
+	[LVT_TIMER] = LVT_MASK,      /* timer mode mask added at runtime */
+	[LVT_THERMAL_MONITOR] = LVT_MASK | APIC_MODE_MASK,
+	[LVT_PERFORMANCE_COUNTER] = LVT_MASK | APIC_MODE_MASK,
+	[LVT_LINT0] = LINT_MASK,
+	[LVT_LINT1] = LINT_MASK,
+	[LVT_ERROR] = LVT_MASK
 };
 
 static int find_highest_vector(void *bitmap)
@@ -2084,7 +2085,7 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 			u32 lvt_val;
 
-			for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
+			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
 				lvt_val = kvm_lapic_get_reg(apic,
 						       APIC_LVTT + 0x10 * i);
 				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
@@ -2383,7 +2384,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
+	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
 		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4e4f8a22754f..4990793c2034 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -10,7 +10,6 @@
 
 #define KVM_APIC_INIT		0
 #define KVM_APIC_SIPI		1
-#define KVM_APIC_LVT_NUM	6
 
 #define APIC_SHORT_MASK			0xc0000
 #define APIC_DEST_NOSHORT		0x0
@@ -29,6 +28,17 @@ enum lapic_mode {
 	LAPIC_MODE_X2APIC = MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE,
 };
 
+enum lapic_lvt_entry {
+	LVT_TIMER,
+	LVT_THERMAL_MONITOR,
+	LVT_PERFORMANCE_COUNTER,
+	LVT_LINT0,
+	LVT_LINT1,
+	LVT_ERROR,
+
+	KVM_APIC_MAX_NR_LVT_ENTRIES,
+};
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/8] KVM: x86: Add APIC_LVTx() macro.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
  2022-05-20 17:36 ` [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
  2022-05-20 17:36 ` [PATCH v4 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-05-20 17:36 ` [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

To implement Corrected Machine Check Interrupt (CMCI) as an additional
LVT vector, the code needs to be able to calculate the APIC_LVTx register
offset based on the register indices in the lapic_lvt_entry enum which
will be used in all places looping through supported APIC_LVTx
registers.

APIC_LVTx macro is introduced for this purpose.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 7 +++----
 arch/x86/kvm/lapic.h | 2 ++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 73f5cd248a63..db12d2ef1aef 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2086,9 +2086,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			u32 lvt_val;
 
 			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
-				lvt_val = kvm_lapic_get_reg(apic,
-						       APIC_LVTT + 0x10 * i);
-				kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
+				lvt_val = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+				kvm_lapic_set_reg(apic, APIC_LVTx(i),
 					     lvt_val | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
@@ -2385,7 +2384,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
-		kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+		kvm_lapic_set_reg(apic, APIC_LVTx(i), APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
 	    kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4990793c2034..2d197ed0b8ce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -39,6 +39,8 @@ enum lapic_lvt_entry {
 	KVM_APIC_MAX_NR_LVT_ENTRIES,
 };
 
+#define APIC_LVTx(x) (APIC_LVTT + 0x10 * (x))
+
 struct kvm_timer {
 	struct hrtimer timer;
 	s64 period; 				/* unit: ns */
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (2 preceding siblings ...)
  2022-05-20 17:36 ` [PATCH v4 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-06-03 20:26   ` David Matlack
  2022-05-20 17:36 ` [PATCH v4 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch adds the handling of APIC_LVTCMCI, conditioned on whether the
vCPU has set MCG_CMCI_P in MCG_CAP register.

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/lapic.c | 40 +++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/lapic.h |  3 ++-
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index db12d2ef1aef..e2186a7c0eed 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -27,6 +27,7 @@
 #include <linux/math64.h>
 #include <linux/slab.h>
 #include <asm/processor.h>
+#include <asm/mce.h>
 #include <asm/msr.h>
 #include <asm/page.h>
 #include <asm/current.h>
@@ -398,14 +399,26 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
 	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
 }
 
+static inline bool kvm_is_cmci_supported(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mcg_cap & MCG_CMCI_P;
+}
+
+static inline int kvm_apic_get_nr_lvt_entries(struct kvm_lapic *apic)
+{
+	return KVM_APIC_MAX_NR_LVT_ENTRIES - !kvm_is_cmci_supported(apic->vcpu);
+}
+
 void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 v = APIC_VERSION | ((KVM_APIC_MAX_NR_LVT_ENTRIES - 1) << 16);
+	u32 v = 0;
 
 	if (!lapic_in_kernel(vcpu))
 		return;
 
+	v = APIC_VERSION | ((kvm_apic_get_nr_lvt_entries(apic) - 1) << 16);
+
 	/*
 	 * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
 	 * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
@@ -425,7 +438,8 @@ static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
 	[LVT_PERFORMANCE_COUNTER] = LVT_MASK | APIC_MODE_MASK,
 	[LVT_LINT0] = LINT_MASK,
 	[LVT_LINT1] = LINT_MASK,
-	[LVT_ERROR] = LVT_MASK
+	[LVT_ERROR] = LVT_MASK,
+	[LVT_CMCI] = LVT_MASK | APIC_MODE_MASK
 };
 
 static int find_highest_vector(void *bitmap)
@@ -1445,6 +1459,9 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 		APIC_REG_MASK(APIC_TMCCT) |
 		APIC_REG_MASK(APIC_TDCR);
 
+	if (kvm_is_cmci_supported(apic->vcpu))
+		valid_reg_mask |= APIC_REG_MASK(APIC_LVTCMCI);
+
 	/*
 	 * ARBPRI and ICR2 are not valid in x2APIC mode.  WARN if KVM reads ICR
 	 * in x2APIC mode as it's an 8-byte register in x2APIC and needs to be
@@ -2083,12 +2100,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_set_spiv(apic, val & mask);
 		if (!(val & APIC_SPIV_APIC_ENABLED)) {
 			int i;
-			u32 lvt_val;
 
-			for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
-				lvt_val = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+			for (i = 0; i < kvm_apic_get_nr_lvt_entries(apic); i++) {
 				kvm_lapic_set_reg(apic, APIC_LVTx(i),
-					     lvt_val | APIC_LVT_MASKED);
+					kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
 			atomic_set(&apic->lapic_timer.pending, 0);
@@ -2140,6 +2155,17 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		apic_update_lvtt(apic);
 		break;
 
+	case APIC_LVTCMCI:
+		if (!kvm_is_cmci_supported(apic->vcpu)) {
+			ret = 1;
+			break;
+		}
+		if (!kvm_apic_sw_enabled(apic))
+			val |= APIC_LVT_MASKED;
+		val &= apic_lvt_mask[LVT_CMCI];
+		kvm_lapic_set_reg(apic, APIC_LVTCMCI, val);
+		break;
+
 	case APIC_TMICT:
 		if (apic_lvtt_tscdeadline(apic))
 			break;
@@ -2383,7 +2409,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
-	for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
+	for (i = 0; i < kvm_apic_get_nr_lvt_entries(apic); i++)
 		kvm_lapic_set_reg(apic, APIC_LVTx(i), APIC_LVT_MASKED);
 	apic_update_lvtt(apic);
 	if (kvm_vcpu_is_reset_bsp(vcpu) &&
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2d197ed0b8ce..16298bcb2abf 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -35,11 +35,12 @@ enum lapic_lvt_entry {
 	LVT_LINT0,
 	LVT_LINT1,
 	LVT_ERROR,
+	LVT_CMCI,
 
 	KVM_APIC_MAX_NR_LVT_ENTRIES,
 };
 
-#define APIC_LVTx(x) (APIC_LVTT + 0x10 * (x))
+#define APIC_LVTx(x) ((x) == LVT_CMCI ? APIC_LVTCMCI : APIC_LVTT + 0x10 * (x))
 
 struct kvm_timer {
 	struct hrtimer timer;
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (3 preceding siblings ...)
  2022-05-20 17:36 ` [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-05-20 17:36 ` [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

Corrected Machine Check Interrupt (CMCI) can be configured via the per
Machine Check bank registers: IA32_MCI_CTL2. To emulate IA32_MCI_CTL2
registers, it's necessary to introduce another array mci_ctl2_banks
in analogy to the mce_banks array under struct kvm_vcpu_arch.

This patch updates the allocation of mce_banks with the array allocation
API (kcalloc) as a precedent for the later mci_ctl2_banks.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4790f0d7d40b..0e839077ce52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11224,7 +11224,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto fail_free_lapic;
 	vcpu->arch.pio_data = page_address(page);
 
-	vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
+	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
 				       GFP_KERNEL_ACCOUNT);
 	if (!vcpu->arch.mce_banks)
 		goto fail_free_pio_data;
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (4 preceding siblings ...)
  2022-05-20 17:36 ` [PATCH v4 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-06-03 20:41   ` David Matlack
  2022-05-20 17:36 ` [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
  2022-05-20 17:36 ` [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection Jue Wang
  7 siblings, 1 reply; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

Corrected Machine Check Interrupt (CMCI) can be configured via the per
Machine Check bank registers: IA32_MCi_CTL2. This patch adds the
emulation of IA32_MCi_CTL2 registers to KVM. A separate mci_ctl2_banks
array is used to keep the existing mce_banks register layout intact.

In Machine Check Architecture (MCA), MCG_CMCI_P (bit 10 of MCG_CAP) is
the corrected MC error counting/signaling extension present flag. When
this bit is set, it does not imply CMCI reported corrected error or UCNA
error is supported across all MCA banks. Software should check on a bank
by bank basis (i.e. if bit 30 in each IA32_MCi_CTL2 register is set).

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/x86.c              | 130 ++++++++++++++++++++++----------
 2 files changed, 92 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4ff36610af6a..178b7e01bf8f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -806,6 +806,7 @@ struct kvm_vcpu_arch {
 	u64 mcg_ctl;
 	u64 mcg_ext_ctl;
 	u64 *mce_banks;
+	u64 *mci_ctl2_banks;
 
 	/* Cache MMIO info */
 	u64 mmio_gva;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0e839077ce52..f8ab592f519b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3174,6 +3174,16 @@ static void kvmclock_sync_fn(struct work_struct *work)
 					KVMCLOCK_SYNC_PERIOD);
 }
 
+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+	return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+	return (msr & 3) == 1;
+}
+
 /*
  * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
  */
@@ -3192,6 +3202,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	unsigned bank_num = mcg_cap & 0xff;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_MCG_STATUS:
@@ -3205,32 +3216,50 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.mcg_ctl = data;
 		break;
-	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
-
-			/* only 0 or all 1s can be written to IA32_MCi_CTL
-			 * some Linux kernels though clear bit 10 in bank 4 to
-			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
-			 * this to avoid an uncatched #GP in the guest
-			 */
-			if ((offset & 0x3) == 0 &&
-			    data != 0 && (data | (1 << 10)) != ~(u64)0)
-				return -1;
-
-			/* MCi_STATUS */
-			if (!msr_info->host_initiated &&
-			    (offset & 0x3) == 1 && data != 0) {
-				if (!can_set_mci_status(vcpu))
-					return -1;
-			}
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
 
-			vcpu->arch.mce_banks[offset] = data;
-			break;
-		}
+		if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
+			return 1;
+		/* An attempt to write a 1 to a reserved bit raises #GP */
+		if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
+			return 1;
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+					    last_msr + 1 - MSR_IA32_MC0_CTL2);
+		vcpu->arch.mci_ctl2_banks[offset] = data;
+		break;
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
+
+		/*
+		 * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+		 * values are architecturally undefined.  But, some Linux
+		 * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+		 * issue on AMD K8s, allow bit 10 to be clear when setting all
+		 * other bits in order to avoid an uncaught #GP in the guest.
+		 */
+		if (is_mci_control_msr(msr) &&
+		    data != 0 && (data | (1 << 10)) != ~(u64)0)
+			return 1;
+
+		/*
+		 * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+		 * AMD-based CPUs allow non-zero values, but if and only if
+		 * HWCR[McStatusWrEn] is set.
+		 */
+		if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
+		    data != 0 && !can_set_mci_status(vcpu))
+			return 1;
+
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+		vcpu->arch.mce_banks[offset] = data;
+		break;
+	default:
 		return 1;
 	}
 	return 0;
@@ -3514,7 +3543,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		}
 		break;
-	case 0x200 ... 0x2ff:
+	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
+	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
 		return kvm_mtrr_set_msr(vcpu, msr, data);
 	case MSR_IA32_APICBASE:
 		return kvm_set_apic_base(vcpu, msr_info);
@@ -3671,6 +3701,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
 		return set_msr_mce(vcpu, msr_info);
 
 	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
@@ -3775,6 +3806,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	u64 data;
 	u64 mcg_cap = vcpu->arch.mcg_cap;
 	unsigned bank_num = mcg_cap & 0xff;
+	u32 offset, last_msr;
 
 	switch (msr) {
 	case MSR_IA32_P5_MC_ADDR:
@@ -3792,16 +3824,27 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 	case MSR_IA32_MCG_STATUS:
 		data = vcpu->arch.mcg_status;
 		break;
-	default:
-		if (msr >= MSR_IA32_MC0_CTL &&
-		    msr < MSR_IA32_MCx_CTL(bank_num)) {
-			u32 offset = array_index_nospec(
-				msr - MSR_IA32_MC0_CTL,
-				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
 
-			data = vcpu->arch.mce_banks[offset];
-			break;
-		}
+		if (!(mcg_cap & MCG_CMCI_P) && !host)
+			return 1;
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+					    last_msr + 1 - MSR_IA32_MC0_CTL2);
+		data = vcpu->arch.mci_ctl2_banks[offset];
+		break;
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+		if (msr > last_msr)
+			return 1;
+
+		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+					    last_msr + 1 - MSR_IA32_MC0_CTL);
+		data = vcpu->arch.mce_banks[offset];
+		break;
+	default:
 		return 1;
 	}
 	*pdata = data;
@@ -3898,7 +3941,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	}
 	case MSR_MTRRcap:
-	case 0x200 ... 0x2ff:
+	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
+	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
 		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
 	case 0xcd: /* fsb frequency */
 		msr_info->data = 3;
@@ -4014,6 +4058,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
 		return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
 				   msr_info->host_initiated);
 	case MSR_IA32_XSS:
@@ -4769,9 +4814,12 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	/* Init IA32_MCG_CTL to all 1s */
 	if (mcg_cap & MCG_CTL_P)
 		vcpu->arch.mcg_ctl = ~(u64)0;
-	/* Init IA32_MCi_CTL to all 1s */
-	for (bank = 0; bank < bank_num; bank++)
+	/* Init IA32_MCi_CTL to all 1s, IA32_MCi_CTL2 to all 0s */
+	for (bank = 0; bank < bank_num; bank++) {
 		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
+		if (mcg_cap & MCG_CMCI_P)
+			vcpu->arch.mci_ctl2_banks[bank] = 0;
+	}
 
 	static_call(kvm_x86_setup_mce)(vcpu);
 out:
@@ -11226,7 +11274,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
 				       GFP_KERNEL_ACCOUNT);
-	if (!vcpu->arch.mce_banks)
+	vcpu->arch.mci_ctl2_banks = kcalloc(KVM_MAX_MCE_BANKS, sizeof(u64),
+					    GFP_KERNEL_ACCOUNT);
+	if (!vcpu->arch.mce_banks || !vcpu->arch.mci_ctl2_banks)
 		goto fail_free_pio_data;
 	vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
 
@@ -11279,6 +11329,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
 fail_free_mce_banks:
 	kfree(vcpu->arch.mce_banks);
+	kfree(vcpu->arch.mci_ctl2_banks);
 fail_free_pio_data:
 	free_page((unsigned long)vcpu->arch.pio_data);
 fail_free_lapic:
@@ -11323,6 +11374,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_hv_vcpu_uninit(vcpu);
 	kvm_pmu_destroy(vcpu);
 	kfree(vcpu->arch.mce_banks);
+	kfree(vcpu->arch.mci_ctl2_banks);
 	kvm_free_lapic(vcpu);
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	kvm_mmu_destroy(vcpu);
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (5 preceding siblings ...)
  2022-05-20 17:36 ` [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-06-03 20:54   ` David Matlack
  2022-05-20 17:36 ` [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection Jue Wang
  7 siblings, 1 reply; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

Make KVM support the CMCI capability by default by adding MCG_CMCI_P to
kvm_mce_cap_supported. A vCPU can request for this capability via
KVM_X86_SETUP_MCE. Uncorrectable Error No Action required (UCNA) injection
reuses the MCE injection ioctl KVM_X86_SET_MCE.

Neither of the CMCI and UCNA emulations depends on hardware.

Signed-off-by: Jue Wang <juew@google.com>
---
 arch/x86/kvm/vmx/vmx.c |  1 +
 arch/x86/kvm/x86.c     | 50 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 610355b9ccce..1aed964ee4ee 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8037,6 +8037,7 @@ static __init int hardware_setup(void)
 	}
 
 	kvm_mce_cap_supported |= MCG_LMCE_P;
+	kvm_mce_cap_supported |= MCG_CMCI_P;
 
 	if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
 		return -EINVAL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f8ab592f519b..d0b1bb6e5e4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4826,6 +4826,52 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+/*
+ * Validate this is an UCNA error by checking the MCG_STATUS and MCi_STATUS
+ * registers that none of the bits for Machine Check Exceptions are set and
+ * both the VAL (valid) and UC (uncorrectable) bits are set.
+ * UCNA - UnCorrectable No Action required
+ * SRAR - Software Recoverable Action Required
+ * MCI_STATUS_PCC - Processor Context Corrupted
+ * MCI_STATUS_S - Signaled as a Machine Check Exception
+ * MCI_STATUS_AR - This MCE is "software recoverable action required"
+ */
+static bool is_ucna(struct kvm_x86_mce *mce)
+{
+	return	!mce->mcg_status &&
+		!(mce->status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR)) &&
+		(mce->status & MCI_STATUS_VAL) &&
+		(mce->status & MCI_STATUS_UC);
+}
+
+static int kvm_vcpu_x86_set_ucna(struct kvm_vcpu *vcpu, struct kvm_x86_mce *mce)
+{
+	u64 mcg_cap = vcpu->arch.mcg_cap;
+	unsigned int bank_num = mcg_cap & 0xff;
+	u64 *banks = vcpu->arch.mce_banks;
+
+	if (mce->bank >= bank_num)
+		return -EINVAL;
+
+	if (!is_ucna(mce))
+		return -EINVAL;
+
+	banks += 4 * mce->bank;
+	banks[1] = mce->status;
+	banks[2] = mce->addr;
+	banks[3] = mce->misc;
+	vcpu->arch.mcg_status = mce->mcg_status;
+
+	if (!(mcg_cap & MCG_CMCI_P) ||
+	    !(vcpu->arch.mci_ctl2_banks[mce->bank] & MCI_CTL2_CMCI_EN))
+		return 0;
+
+	if (lapic_in_kernel(vcpu))
+		kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTCMCI);
+
+	return 0;
+}
+
 static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 				      struct kvm_x86_mce *mce)
 {
@@ -4835,6 +4881,10 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 
 	if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
 		return -EINVAL;
+
+	if (is_ucna(mce))
+		return kvm_vcpu_x86_set_ucna(vcpu, mce);
+
 	/*
 	 * if IA32_MCG_CTL is not all 1s, the uncorrected error
 	 * reporting is disabled
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection.
  2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
                   ` (6 preceding siblings ...)
  2022-05-20 17:36 ` [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
@ 2022-05-20 17:36 ` Jue Wang
  2022-05-20 21:08   ` David Matlack
  7 siblings, 1 reply; 16+ messages in thread
From: Jue Wang @ 2022-05-20 17:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Matlack
  Cc: Tony Luck, kvm, Greg Thelen, Jiaqi Yan, Jue Wang

This patch add a self test that verifies user space can inject
UnCorrectable No Action required (UCNA) memory errors to the guest.

Signed-off-by: Jue Wang <juew@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   1 +
 .../selftests/kvm/include/x86_64/mce.h        |  25 ++
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/ucna_injection_test.c          | 287 ++++++++++++++++++
 7 files changed, 317 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0b0e4402bba6..a8eb9d3f082c 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -37,6 +37,7 @@
 /x86_64/tsc_scaling_sync
 /x86_64/sync_regs_test
 /x86_64/tsc_msrs_test
+/x86_64/ucna_injection_test
 /x86_64/userspace_io_test
 /x86_64/userspace_msr_exit_test
 /x86_64/vmx_apic_access_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 681b173aa87c..8d9858b54b98 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
 TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
+TEST_GEN_PROGS_x86_64 += x86_64/ucna_injection_test
 TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test
 TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index ac88557dcc9a..bed316fdecd5 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -35,6 +35,7 @@
 #define		APIC_SPIV_APIC_ENABLED		(1 << 8)
 #define APIC_IRR	0x200
 #define	APIC_ICR	0x300
+#define	APIC_LVTCMCI	0x2f0
 #define		APIC_DEST_SELF		0x40000
 #define		APIC_DEST_ALLINC	0x80000
 #define		APIC_DEST_ALLBUT	0xC0000
diff --git a/tools/testing/selftests/kvm/include/x86_64/mce.h b/tools/testing/selftests/kvm/include/x86_64/mce.h
new file mode 100644
index 000000000000..6119321f3f5d
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/x86_64/mce.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * tools/testing/selftests/kvm/include/x86_64/mce.h
+ *
+ * Copyright (C) 2022, Google LLC.
+ */
+
+#ifndef SELFTEST_KVM_MCE_H
+#define SELFTEST_KVM_MCE_H
+
+#define MCG_CTL_P		BIT_ULL(8)   /* MCG_CTL register available */
+#define MCG_SER_P		BIT_ULL(24)  /* MCA recovery/new status bits */
+#define MCG_LMCE_P		BIT_ULL(27)  /* Local machine check supported */
+#define MCG_CMCI_P		BIT_ULL(10)  /* CMCI supported */
+#define KVM_MAX_MCE_BANKS 32
+#define MCG_CAP_BANKS_MASK 0xff       /* Bit 0-7 of the MCG_CAP register are #banks */
+#define MCI_STATUS_VAL (1ULL << 63)   /* valid error */
+#define MCI_STATUS_UC (1ULL << 61)    /* uncorrected error */
+#define MCI_STATUS_EN (1ULL << 60)    /* error enabled */
+#define MCI_STATUS_MISCV (1ULL << 59) /* misc error reg. valid */
+#define MCI_STATUS_ADDRV (1ULL << 58) /* addr reg. valid */
+#define MCM_ADDR_PHYS 2    /* physical address */
+#define MCI_CTL2_CMCI_EN		BIT_ULL(30)
+
+#endif /* SELFTEST_KVM_MCE_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index d0d51adec76e..fa316c3b7725 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -481,6 +481,7 @@ struct kvm_cpuid2 *kvm_get_supported_hv_cpuid(void);
 void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
 void vm_xsave_req_perm(int bit);
+void vcpu_setup(struct kvm_vm *vm, int vcpuid);
 
 enum x86_page_size {
 	X86_PAGE_SIZE_4K = 0,
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 33ea5e9955d9..bb1ef665cd10 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -580,7 +580,7 @@ static void kvm_setup_tss_64bit(struct kvm_vm *vm, struct kvm_segment *segp,
 	kvm_seg_fill_gdt_64bit(vm, segp);
 }
 
-static void vcpu_setup(struct kvm_vm *vm, int vcpuid)
+void vcpu_setup(struct kvm_vm *vm, int vcpuid)
 {
 	struct kvm_sregs sregs;
 
diff --git a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
new file mode 100644
index 000000000000..104a9f116b23
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ucna_injection_test
+ *
+ * Copyright (C) 2022, Google LLC.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Test that user space can inject UnCorrectable No Action required (UCNA)
+ * memory errors to the guest.
+ *
+ * The test starts one vCPU with the MCG_CMCI_P enabled. It verifies that
+ * proper UCNA errors can be injected to a vCPU with MCG_CMCI_P and
+ * corresponding per-bank control register (MCI_CTL2) bit enabled.
+ * The test also checks that the UCNA errors get recorded in the
+ * Machine Check bank registers no matter the error signal interrupts get
+ * delivered into the guest or not.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+#include <string.h>
+#include <time.h>
+
+#include "kvm_util_base.h"
+#include "kvm_util.h"
+#include "mce.h"
+#include "processor.h"
+#include "test_util.h"
+#include "apic.h"
+
+#define UCNA_VCPU_ID 0
+#define SYNC_FIRST_UCNA 9
+#define SYNC_SECOND_UCNA 10
+#define FIRST_UCNA_ADDR 0xdeadbeef
+#define SECOND_UCNA_ADDR 0xcafeb0ba
+
+/*
+ * Vector for the CMCI interrupt.
+ * Value is arbitrary. Any value in 0x20-0xFF should work:
+ * https://wiki.osdev.org/Interrupt_Vector_Table
+ */
+#define CMCI_VECTOR  0xa9
+
+#define UCNA_BANK  0x7	// IMC0 bank
+
+/*
+ * Record states about the injected UCNA.
+ * The variables started with the 'i_' prefixes are recorded in interrupt
+ * handler. Variables without the 'i_' prefixes are recorded in guest main
+ * execution thread.
+ */
+static volatile uint64_t i_ucna_rcvd;
+static volatile uint64_t i_ucna_addr;
+static volatile uint64_t ucna_addr;
+static volatile uint64_t ucna_addr2;
+
+struct thread_params {
+	struct kvm_vm *vm;
+	uint32_t vcpu_id;
+};
+
+static void verify_apic_base_addr(void)
+{
+	uint64_t msr = rdmsr(MSR_IA32_APICBASE);
+	uint64_t base = GET_APIC_BASE(msr);
+
+	GUEST_ASSERT(base == APIC_DEFAULT_GPA);
+}
+
+static void guest_code(void)
+{
+	uint64_t ctl2;
+	verify_apic_base_addr();
+	xapic_enable();
+
+	/* Sets up the interrupt vector and enables per-bank CMCI sigaling. */
+	xapic_write_reg(APIC_LVTCMCI, CMCI_VECTOR | APIC_DM_FIXED);
+	ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_CMCI_EN);
+
+	/* Enables interrupt in guest. */
+	asm volatile("sti");
+
+	/* Let user space inject the first UCNA */
+	GUEST_SYNC(SYNC_FIRST_UCNA);
+
+	ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+
+	/* Disables the per-bank CMCI signaling. */
+	ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
+	wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 & ~MCI_CTL2_CMCI_EN);
+
+	/* Let the user space inject the second UCNA */
+	GUEST_SYNC(SYNC_SECOND_UCNA);
+
+	ucna_addr2 = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+	GUEST_DONE();
+}
+
+static void guest_cmci_handler(struct ex_regs *regs)
+{
+	i_ucna_rcvd++;
+	i_ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
+	xapic_write_reg(APIC_EOI, 0);
+}
+
+static void inject_ucna(struct kvm_vm *vm, uint32_t vcpu_id, uint64_t addr) {
+	/*
+	 * A UCNA error is indicated with VAL=1, UC=1, PCC=0, S=0 and AR=0 in
+	 * the IA32_MCi_STATUS register.
+	 * MSCOD=1 (BIT[16] - MscodDataRdErr).
+	 * MCACOD=0x0090 (Memory controller error format, channel 0)
+	 */
+	uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
+		          MCI_STATUS_MISCV | MCI_STATUS_ADDRV | 0x10090;
+	struct kvm_x86_mce mce = {};
+	mce.status = status;
+	mce.mcg_status = 0;
+	/*
+	 * MCM_ADDR_PHYS indicates the reported address is a physical address.
+	 * Lowest 6 bits is the recoverable address LSB, i.e., the injected MCE
+	 * is at 4KB granularity.
+	 */
+	mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
+	mce.addr = addr;
+	mce.bank = UCNA_BANK;
+
+	TEST_ASSERT(_vcpu_ioctl(vm, vcpu_id, KVM_X86_SET_MCE, &mce) != -1,
+			"Inject UCNA");
+}
+
+static void *vcpu_thread(void *arg)
+{
+	struct thread_params *params = (struct thread_params *)arg;
+	struct ucall uc;
+	int old;
+	int r;
+	unsigned int exit_reason;
+
+	r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+	TEST_ASSERT(r == 0,
+		    "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+		    params->vcpu_id, r);
+
+	fprintf(stderr, "vCPU thread running vCPU %u\n", params->vcpu_id);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
+		    "Expect UCALL_SYNC\n");
+	TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
+
+	fprintf(stderr, "Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
+
+	inject_ucna(params->vm, params->vcpu_id, FIRST_UCNA_ADDR);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
+		    "Expect UCALL_SYNC\n");
+	TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
+
+	fprintf(stderr, "Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
+
+	inject_ucna(params->vm, params->vcpu_id, SECOND_UCNA_ADDR);
+	vcpu_run(params->vm, params->vcpu_id);
+
+	exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
+	TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+		    "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+		    params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
+	if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
+		TEST_ASSERT(false,
+			    "vCPU %u exited with error: %s.\n",
+			    params->vcpu_id, (const char *)uc.args[0]);
+	}
+
+	return NULL;
+}
+
+static void setup_mce_cap(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	uint64_t supported_mcg_caps = 0;
+	uint64_t mcg_caps = MCG_CMCI_P | MCG_CTL_P | MCG_SER_P | MCG_LMCE_P
+		| KVM_MAX_MCE_BANKS;
+
+	TEST_ASSERT(_kvm_ioctl(vm, KVM_X86_GET_MCE_CAP_SUPPORTED,
+			       &supported_mcg_caps) != -1,
+		    "KVM_GET_MCE_CAP_SUPPORTED");
+	fprintf(stderr, "KVM supported MCG_CAP: %#lx\n", supported_mcg_caps);
+
+	TEST_ASSERT(supported_mcg_caps & MCG_CMCI_P, "MCG_CMCI_P is not supported");
+
+	mcg_caps &= supported_mcg_caps | MCG_CAP_BANKS_MASK;
+	TEST_ASSERT(_vcpu_ioctl(vm, vcpuid, KVM_X86_SETUP_MCE, &mcg_caps) != -1,
+		    "KVM_X86_SETUP_MCE");
+}
+
+static void create_vcpu_with_mce_cap(struct kvm_vm *vm, uint32_t vcpuid, void *guest_code)
+{
+	struct kvm_mp_state mp_state;
+	struct kvm_regs regs;
+	vm_vaddr_t stack_vaddr;
+	stack_vaddr = vm_vaddr_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
+				     DEFAULT_GUEST_STACK_VADDR_MIN);
+
+	vm_vcpu_add(vm, vcpuid);
+	setup_mce_cap(vm, vcpuid);
+
+	vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
+	vcpu_setup(vm, vcpuid);
+
+	/* Setup guest general purpose registers */
+	vcpu_regs_get(vm, vcpuid, &regs);
+	regs.rflags = regs.rflags | 0x2;
+	regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
+	regs.rip = (unsigned long) guest_code;
+	vcpu_regs_set(vm, vcpuid, &regs);
+
+	/* Setup the MP state */
+	mp_state.mp_state = 0;
+	vcpu_set_mp_state(vm, vcpuid, &mp_state);
+}
+
+int main(int argc, char *argv[])
+{
+	int r;
+	void *retval;
+	int run_secs = 3;
+	pthread_t threads[2];
+	struct thread_params params[2];
+	struct kvm_vm *vm;
+	uint64_t *p_i_ucna_rcvd, *p_i_ucna_addr;
+	uint64_t *p_ucna_addr, *p_ucna_addr2;
+
+	kvm_check_cap(KVM_CAP_MCE);
+
+	vm = vm_create_without_vcpus(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES);
+
+	create_vcpu_with_mce_cap(vm, UCNA_VCPU_ID, guest_code);
+
+	params[0].vm = vm;
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vm, UCNA_VCPU_ID);
+	vm_install_exception_handler(vm, CMCI_VECTOR, guest_cmci_handler);
+
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	p_i_ucna_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_rcvd);
+	p_i_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_addr);
+	p_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr);
+	p_ucna_addr2 = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr2);
+
+	/* Start vCPU thread and wait for it to execute first HLT. */
+	params[0].vcpu_id = UCNA_VCPU_ID;
+	r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
+	TEST_ASSERT(r == 0,
+		    "pthread_create vcpu thread failed errno=%d", errno);
+	fprintf(stderr, "vCPU thread started\n");
+
+	sleep(run_secs);
+
+	r = pthread_join(threads[0], &retval);
+	TEST_ASSERT(r == 0,
+		    "pthread_join on vcpu_id=%d failed with errno=%d",
+		    UCNA_VCPU_ID, r);
+
+	fprintf(stderr,
+		"Test successful after running for %d seconds.\n"
+		"UCNA CMCI interrupts received: %ld\n"
+		"Last UCNA address received via CMCI: %lx\n"
+		"First UCNA address in vCPU thread: %lx\n"
+		"Second UCNA address in vCPU thread: %lx\n",
+		run_secs, *p_i_ucna_rcvd, *p_i_ucna_addr, *p_ucna_addr, *p_ucna_addr2);
+
+	kvm_vm_free(vm);
+}
-- 
2.36.1.124.g0e6072fb45-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection.
  2022-05-20 17:36 ` [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection Jue Wang
@ 2022-05-20 21:08   ` David Matlack
  2022-05-20 22:16     ` Jue Wang
  0 siblings, 1 reply; 16+ messages in thread
From: David Matlack @ 2022-05-20 21:08 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, May 20, 2022 at 10:37 AM Jue Wang <juew@google.com> wrote:
>
> This patch add a self test that verifies user space can inject
> UnCorrectable No Action required (UCNA) memory errors to the guest.
>
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/apic.h       |   1 +
>  .../selftests/kvm/include/x86_64/mce.h        |  25 ++
>  .../selftests/kvm/include/x86_64/processor.h  |   1 +
>  .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
>  .../kvm/x86_64/ucna_injection_test.c          | 287 ++++++++++++++++++
>  7 files changed, 317 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
>  create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
>
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 0b0e4402bba6..a8eb9d3f082c 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -37,6 +37,7 @@
>  /x86_64/tsc_scaling_sync
>  /x86_64/sync_regs_test
>  /x86_64/tsc_msrs_test
> +/x86_64/ucna_injection_test
>  /x86_64/userspace_io_test
>  /x86_64/userspace_msr_exit_test
>  /x86_64/vmx_apic_access_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 681b173aa87c..8d9858b54b98 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
>  TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
>  TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
>  TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
> +TEST_GEN_PROGS_x86_64 += x86_64/ucna_injection_test
>  TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test
>  TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
> diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
> index ac88557dcc9a..bed316fdecd5 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/apic.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
> @@ -35,6 +35,7 @@
>  #define                APIC_SPIV_APIC_ENABLED          (1 << 8)
>  #define APIC_IRR       0x200
>  #define        APIC_ICR        0x300
> +#define        APIC_LVTCMCI    0x2f0
>  #define                APIC_DEST_SELF          0x40000
>  #define                APIC_DEST_ALLINC        0x80000
>  #define                APIC_DEST_ALLBUT        0xC0000
> diff --git a/tools/testing/selftests/kvm/include/x86_64/mce.h b/tools/testing/selftests/kvm/include/x86_64/mce.h
> new file mode 100644
> index 000000000000..6119321f3f5d
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/include/x86_64/mce.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * tools/testing/selftests/kvm/include/x86_64/mce.h
> + *
> + * Copyright (C) 2022, Google LLC.
> + */
> +
> +#ifndef SELFTEST_KVM_MCE_H
> +#define SELFTEST_KVM_MCE_H
> +
> +#define MCG_CTL_P              BIT_ULL(8)   /* MCG_CTL register available */
> +#define MCG_SER_P              BIT_ULL(24)  /* MCA recovery/new status bits */
> +#define MCG_LMCE_P             BIT_ULL(27)  /* Local machine check supported */
> +#define MCG_CMCI_P             BIT_ULL(10)  /* CMCI supported */
> +#define KVM_MAX_MCE_BANKS 32
> +#define MCG_CAP_BANKS_MASK 0xff       /* Bit 0-7 of the MCG_CAP register are #banks */
> +#define MCI_STATUS_VAL (1ULL << 63)   /* valid error */
> +#define MCI_STATUS_UC (1ULL << 61)    /* uncorrected error */
> +#define MCI_STATUS_EN (1ULL << 60)    /* error enabled */
> +#define MCI_STATUS_MISCV (1ULL << 59) /* misc error reg. valid */
> +#define MCI_STATUS_ADDRV (1ULL << 58) /* addr reg. valid */
> +#define MCM_ADDR_PHYS 2    /* physical address */
> +#define MCI_CTL2_CMCI_EN               BIT_ULL(30)
> +
> +#endif /* SELFTEST_KVM_MCE_H */
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index d0d51adec76e..fa316c3b7725 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -481,6 +481,7 @@ struct kvm_cpuid2 *kvm_get_supported_hv_cpuid(void);
>  void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
>  void vm_xsave_req_perm(int bit);
> +void vcpu_setup(struct kvm_vm *vm, int vcpuid);
>
>  enum x86_page_size {
>         X86_PAGE_SIZE_4K = 0,
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 33ea5e9955d9..bb1ef665cd10 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -580,7 +580,7 @@ static void kvm_setup_tss_64bit(struct kvm_vm *vm, struct kvm_segment *segp,
>         kvm_seg_fill_gdt_64bit(vm, segp);
>  }
>
> -static void vcpu_setup(struct kvm_vm *vm, int vcpuid)
> +void vcpu_setup(struct kvm_vm *vm, int vcpuid)
>  {
>         struct kvm_sregs sregs;
>
> diff --git a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> new file mode 100644
> index 000000000000..104a9f116b23
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> @@ -0,0 +1,287 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ucna_injection_test
> + *
> + * Copyright (C) 2022, Google LLC.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + *
> + * Test that user space can inject UnCorrectable No Action required (UCNA)
> + * memory errors to the guest.
> + *
> + * The test starts one vCPU with the MCG_CMCI_P enabled. It verifies that
> + * proper UCNA errors can be injected to a vCPU with MCG_CMCI_P and
> + * corresponding per-bank control register (MCI_CTL2) bit enabled.
> + * The test also checks that the UCNA errors get recorded in the
> + * Machine Check bank registers no matter the error signal interrupts get
> + * delivered into the guest or not.
> + *
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <pthread.h>
> +#include <inttypes.h>
> +#include <string.h>
> +#include <time.h>
> +
> +#include "kvm_util_base.h"
> +#include "kvm_util.h"
> +#include "mce.h"
> +#include "processor.h"
> +#include "test_util.h"
> +#include "apic.h"
> +
> +#define UCNA_VCPU_ID 0
> +#define SYNC_FIRST_UCNA 9
> +#define SYNC_SECOND_UCNA 10
> +#define FIRST_UCNA_ADDR 0xdeadbeef
> +#define SECOND_UCNA_ADDR 0xcafeb0ba
> +
> +/*
> + * Vector for the CMCI interrupt.
> + * Value is arbitrary. Any value in 0x20-0xFF should work:
> + * https://wiki.osdev.org/Interrupt_Vector_Table
> + */
> +#define CMCI_VECTOR  0xa9
> +
> +#define UCNA_BANK  0x7 // IMC0 bank
> +
> +/*
> + * Record states about the injected UCNA.
> + * The variables started with the 'i_' prefixes are recorded in interrupt
> + * handler. Variables without the 'i_' prefixes are recorded in guest main
> + * execution thread.
> + */
> +static volatile uint64_t i_ucna_rcvd;
> +static volatile uint64_t i_ucna_addr;
> +static volatile uint64_t ucna_addr;
> +static volatile uint64_t ucna_addr2;
> +
> +struct thread_params {
> +       struct kvm_vm *vm;
> +       uint32_t vcpu_id;
> +};
> +
> +static void verify_apic_base_addr(void)
> +{
> +       uint64_t msr = rdmsr(MSR_IA32_APICBASE);
> +       uint64_t base = GET_APIC_BASE(msr);
> +
> +       GUEST_ASSERT(base == APIC_DEFAULT_GPA);
> +}
> +
> +static void guest_code(void)
> +{
> +       uint64_t ctl2;
> +       verify_apic_base_addr();
> +       xapic_enable();
> +
> +       /* Sets up the interrupt vector and enables per-bank CMCI sigaling. */
> +       xapic_write_reg(APIC_LVTCMCI, CMCI_VECTOR | APIC_DM_FIXED);
> +       ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
> +       wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_CMCI_EN);
> +
> +       /* Enables interrupt in guest. */
> +       asm volatile("sti");
> +
> +       /* Let user space inject the first UCNA */
> +       GUEST_SYNC(SYNC_FIRST_UCNA);
> +
> +       ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> +
> +       /* Disables the per-bank CMCI signaling. */
> +       ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
> +       wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 & ~MCI_CTL2_CMCI_EN);
> +
> +       /* Let the user space inject the second UCNA */
> +       GUEST_SYNC(SYNC_SECOND_UCNA);
> +
> +       ucna_addr2 = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> +       GUEST_DONE();
> +}
> +
> +static void guest_cmci_handler(struct ex_regs *regs)
> +{
> +       i_ucna_rcvd++;
> +       i_ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> +       xapic_write_reg(APIC_EOI, 0);
> +}
> +
> +static void inject_ucna(struct kvm_vm *vm, uint32_t vcpu_id, uint64_t addr) {
> +       /*
> +        * A UCNA error is indicated with VAL=1, UC=1, PCC=0, S=0 and AR=0 in
> +        * the IA32_MCi_STATUS register.
> +        * MSCOD=1 (BIT[16] - MscodDataRdErr).
> +        * MCACOD=0x0090 (Memory controller error format, channel 0)
> +        */
> +       uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
> +                         MCI_STATUS_MISCV | MCI_STATUS_ADDRV | 0x10090;
> +       struct kvm_x86_mce mce = {};
> +       mce.status = status;
> +       mce.mcg_status = 0;
> +       /*
> +        * MCM_ADDR_PHYS indicates the reported address is a physical address.
> +        * Lowest 6 bits is the recoverable address LSB, i.e., the injected MCE
> +        * is at 4KB granularity.
> +        */
> +       mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
> +       mce.addr = addr;
> +       mce.bank = UCNA_BANK;
> +
> +       TEST_ASSERT(_vcpu_ioctl(vm, vcpu_id, KVM_X86_SET_MCE, &mce) != -1,
> +                       "Inject UCNA");
> +}
> +
> +static void *vcpu_thread(void *arg)
> +{
> +       struct thread_params *params = (struct thread_params *)arg;
> +       struct ucall uc;
> +       int old;
> +       int r;
> +       unsigned int exit_reason;
> +
> +       r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
> +       TEST_ASSERT(r == 0,
> +                   "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
> +                   params->vcpu_id, r);
> +
> +       fprintf(stderr, "vCPU thread running vCPU %u\n", params->vcpu_id);
> +       vcpu_run(params->vm, params->vcpu_id);
> +
> +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> +       TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
> +                   "Expect UCALL_SYNC\n");
> +       TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
> +
> +       fprintf(stderr, "Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
> +
> +       inject_ucna(params->vm, params->vcpu_id, FIRST_UCNA_ADDR);
> +       vcpu_run(params->vm, params->vcpu_id);
> +
> +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> +       TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
> +                   "Expect UCALL_SYNC\n");
> +       TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
> +
> +       fprintf(stderr, "Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
> +
> +       inject_ucna(params->vm, params->vcpu_id, SECOND_UCNA_ADDR);
> +       vcpu_run(params->vm, params->vcpu_id);
> +
> +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> +       if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
> +               TEST_ASSERT(false,
> +                           "vCPU %u exited with error: %s.\n",
> +                           params->vcpu_id, (const char *)uc.args[0]);
> +       }
> +
> +       return NULL;
> +}
> +
> +static void setup_mce_cap(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> +       uint64_t supported_mcg_caps = 0;
> +       uint64_t mcg_caps = MCG_CMCI_P | MCG_CTL_P | MCG_SER_P | MCG_LMCE_P
> +               | KVM_MAX_MCE_BANKS;
> +
> +       TEST_ASSERT(_kvm_ioctl(vm, KVM_X86_GET_MCE_CAP_SUPPORTED,
> +                              &supported_mcg_caps) != -1,
> +                   "KVM_GET_MCE_CAP_SUPPORTED");
> +       fprintf(stderr, "KVM supported MCG_CAP: %#lx\n", supported_mcg_caps);
> +
> +       TEST_ASSERT(supported_mcg_caps & MCG_CMCI_P, "MCG_CMCI_P is not supported");

Instead of failing the test on older kernels, mark it as skipped.

if (!(supported_mcg_caps & MCG_CMCI_P)) {
        print_skip(" MCG_CMCI_P is not supported");
        exit(KSFT_SKIP);
}

> +
> +       mcg_caps &= supported_mcg_caps | MCG_CAP_BANKS_MASK;
> +       TEST_ASSERT(_vcpu_ioctl(vm, vcpuid, KVM_X86_SETUP_MCE, &mcg_caps) != -1,
> +                   "KVM_X86_SETUP_MCE");
> +}
> +
> +static void create_vcpu_with_mce_cap(struct kvm_vm *vm, uint32_t vcpuid, void *guest_code)
> +{
> +       struct kvm_mp_state mp_state;
> +       struct kvm_regs regs;
> +       vm_vaddr_t stack_vaddr;
> +       stack_vaddr = vm_vaddr_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
> +                                    DEFAULT_GUEST_STACK_VADDR_MIN);
> +
> +       vm_vcpu_add(vm, vcpuid);
> +       setup_mce_cap(vm, vcpuid);

Is some ordering requirement between KVM_X86_SETUP_MCE and other vCPU
ioctls? I'm wondering why this function has to duplicate so much of
the vCPU setup code.
> +
> +       vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
> +       vcpu_setup(vm, vcpuid);
> +
> +       /* Setup guest general purpose registers */
> +       vcpu_regs_get(vm, vcpuid, &regs);
> +       regs.rflags = regs.rflags | 0x2;
> +       regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
> +       regs.rip = (unsigned long) guest_code;
> +       vcpu_regs_set(vm, vcpuid, &regs);
> +
> +       /* Setup the MP state */
> +       mp_state.mp_state = 0;
> +       vcpu_set_mp_state(vm, vcpuid, &mp_state);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +       int r;
> +       void *retval;
> +       int run_secs = 3;
> +       pthread_t threads[2];
> +       struct thread_params params[2];
> +       struct kvm_vm *vm;
> +       uint64_t *p_i_ucna_rcvd, *p_i_ucna_addr;
> +       uint64_t *p_ucna_addr, *p_ucna_addr2;
> +
> +       kvm_check_cap(KVM_CAP_MCE);
> +
> +       vm = vm_create_without_vcpus(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES);
> +
> +       create_vcpu_with_mce_cap(vm, UCNA_VCPU_ID, guest_code);
> +
> +       params[0].vm = vm;
> +
> +       vm_init_descriptor_tables(vm);
> +       vcpu_init_descriptor_tables(vm, UCNA_VCPU_ID);
> +       vm_install_exception_handler(vm, CMCI_VECTOR, guest_cmci_handler);
> +
> +       virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> +       p_i_ucna_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_rcvd);
> +       p_i_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_addr);
> +       p_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr);
> +       p_ucna_addr2 = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr2);
> +
> +       /* Start vCPU thread and wait for it to execute first HLT. */

Stale comment from xapic_ipi_test.c

> +       params[0].vcpu_id = UCNA_VCPU_ID;
> +       r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
> +       TEST_ASSERT(r == 0,
> +                   "pthread_create vcpu thread failed errno=%d", errno);
> +       fprintf(stderr, "vCPU thread started\n");
> +
> +       sleep(run_secs);
> +
> +       r = pthread_join(threads[0], &retval);
> +       TEST_ASSERT(r == 0,
> +                   "pthread_join on vcpu_id=%d failed with errno=%d",
> +                   UCNA_VCPU_ID, r);

Creating a here is unnecessary, e.g. you could just replace all this
code with a call to vcpu_thread().

> +
> +       fprintf(stderr,
> +               "Test successful after running for %d seconds.\n"
> +               "UCNA CMCI interrupts received: %ld\n"
> +               "Last UCNA address received via CMCI: %lx\n"
> +               "First UCNA address in vCPU thread: %lx\n"
> +               "Second UCNA address in vCPU thread: %lx\n",
> +               run_secs, *p_i_ucna_rcvd, *p_i_ucna_addr, *p_ucna_addr, *p_ucna_addr2);

Instead of printing out all these fields, can you make assertions
about them? e.g. Assert that the expected number of interrupts were
received, etc.

> +
> +       kvm_vm_free(vm);
> +}
> --
> 2.36.1.124.g0e6072fb45-goog
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection.
  2022-05-20 21:08   ` David Matlack
@ 2022-05-20 22:16     ` Jue Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jue Wang @ 2022-05-20 22:16 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

Thanks, David.


On Fri, May 20, 2022 at 2:08 PM David Matlack <dmatlack@google.com> wrote:
>
> On Fri, May 20, 2022 at 10:37 AM Jue Wang <juew@google.com> wrote:
> >
> > This patch add a self test that verifies user space can inject
> > UnCorrectable No Action required (UCNA) memory errors to the guest.
> >
> > Signed-off-by: Jue Wang <juew@google.com>
> > ---
> >  tools/testing/selftests/kvm/.gitignore        |   1 +
> >  tools/testing/selftests/kvm/Makefile          |   1 +
> >  .../selftests/kvm/include/x86_64/apic.h       |   1 +
> >  .../selftests/kvm/include/x86_64/mce.h        |  25 ++
> >  .../selftests/kvm/include/x86_64/processor.h  |   1 +
> >  .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
> >  .../kvm/x86_64/ucna_injection_test.c          | 287 ++++++++++++++++++
> >  7 files changed, 317 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/kvm/include/x86_64/mce.h
> >  create mode 100644 tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index 0b0e4402bba6..a8eb9d3f082c 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -37,6 +37,7 @@
> >  /x86_64/tsc_scaling_sync
> >  /x86_64/sync_regs_test
> >  /x86_64/tsc_msrs_test
> > +/x86_64/ucna_injection_test
> >  /x86_64/userspace_io_test
> >  /x86_64/userspace_msr_exit_test
> >  /x86_64/vmx_apic_access_test
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index 681b173aa87c..8d9858b54b98 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
> > +TEST_GEN_PROGS_x86_64 += x86_64/ucna_injection_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
> >  TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
> > index ac88557dcc9a..bed316fdecd5 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/apic.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
> > @@ -35,6 +35,7 @@
> >  #define                APIC_SPIV_APIC_ENABLED          (1 << 8)
> >  #define APIC_IRR       0x200
> >  #define        APIC_ICR        0x300
> > +#define        APIC_LVTCMCI    0x2f0
> >  #define                APIC_DEST_SELF          0x40000
> >  #define                APIC_DEST_ALLINC        0x80000
> >  #define                APIC_DEST_ALLBUT        0xC0000
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/mce.h b/tools/testing/selftests/kvm/include/x86_64/mce.h
> > new file mode 100644
> > index 000000000000..6119321f3f5d
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/include/x86_64/mce.h
> > @@ -0,0 +1,25 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * tools/testing/selftests/kvm/include/x86_64/mce.h
> > + *
> > + * Copyright (C) 2022, Google LLC.
> > + */
> > +
> > +#ifndef SELFTEST_KVM_MCE_H
> > +#define SELFTEST_KVM_MCE_H
> > +
> > +#define MCG_CTL_P              BIT_ULL(8)   /* MCG_CTL register available */
> > +#define MCG_SER_P              BIT_ULL(24)  /* MCA recovery/new status bits */
> > +#define MCG_LMCE_P             BIT_ULL(27)  /* Local machine check supported */
> > +#define MCG_CMCI_P             BIT_ULL(10)  /* CMCI supported */
> > +#define KVM_MAX_MCE_BANKS 32
> > +#define MCG_CAP_BANKS_MASK 0xff       /* Bit 0-7 of the MCG_CAP register are #banks */
> > +#define MCI_STATUS_VAL (1ULL << 63)   /* valid error */
> > +#define MCI_STATUS_UC (1ULL << 61)    /* uncorrected error */
> > +#define MCI_STATUS_EN (1ULL << 60)    /* error enabled */
> > +#define MCI_STATUS_MISCV (1ULL << 59) /* misc error reg. valid */
> > +#define MCI_STATUS_ADDRV (1ULL << 58) /* addr reg. valid */
> > +#define MCM_ADDR_PHYS 2    /* physical address */
> > +#define MCI_CTL2_CMCI_EN               BIT_ULL(30)
> > +
> > +#endif /* SELFTEST_KVM_MCE_H */
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > index d0d51adec76e..fa316c3b7725 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > @@ -481,6 +481,7 @@ struct kvm_cpuid2 *kvm_get_supported_hv_cpuid(void);
> >  void vcpu_set_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
> >  struct kvm_cpuid2 *vcpu_get_supported_hv_cpuid(struct kvm_vm *vm, uint32_t vcpuid);
> >  void vm_xsave_req_perm(int bit);
> > +void vcpu_setup(struct kvm_vm *vm, int vcpuid);
> >
> >  enum x86_page_size {
> >         X86_PAGE_SIZE_4K = 0,
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index 33ea5e9955d9..bb1ef665cd10 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -580,7 +580,7 @@ static void kvm_setup_tss_64bit(struct kvm_vm *vm, struct kvm_segment *segp,
> >         kvm_seg_fill_gdt_64bit(vm, segp);
> >  }
> >
> > -static void vcpu_setup(struct kvm_vm *vm, int vcpuid)
> > +void vcpu_setup(struct kvm_vm *vm, int vcpuid)
> >  {
> >         struct kvm_sregs sregs;
> >
> > diff --git a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> > new file mode 100644
> > index 000000000000..104a9f116b23
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
> > @@ -0,0 +1,287 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * ucna_injection_test
> > + *
> > + * Copyright (C) 2022, Google LLC.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + *
> > + * Test that user space can inject UnCorrectable No Action required (UCNA)
> > + * memory errors to the guest.
> > + *
> > + * The test starts one vCPU with the MCG_CMCI_P enabled. It verifies that
> > + * proper UCNA errors can be injected to a vCPU with MCG_CMCI_P and
> > + * corresponding per-bank control register (MCI_CTL2) bit enabled.
> > + * The test also checks that the UCNA errors get recorded in the
> > + * Machine Check bank registers no matter the error signal interrupts get
> > + * delivered into the guest or not.
> > + *
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <pthread.h>
> > +#include <inttypes.h>
> > +#include <string.h>
> > +#include <time.h>
> > +
> > +#include "kvm_util_base.h"
> > +#include "kvm_util.h"
> > +#include "mce.h"
> > +#include "processor.h"
> > +#include "test_util.h"
> > +#include "apic.h"
> > +
> > +#define UCNA_VCPU_ID 0
> > +#define SYNC_FIRST_UCNA 9
> > +#define SYNC_SECOND_UCNA 10
> > +#define FIRST_UCNA_ADDR 0xdeadbeef
> > +#define SECOND_UCNA_ADDR 0xcafeb0ba
> > +
> > +/*
> > + * Vector for the CMCI interrupt.
> > + * Value is arbitrary. Any value in 0x20-0xFF should work:
> > + * https://wiki.osdev.org/Interrupt_Vector_Table
> > + */
> > +#define CMCI_VECTOR  0xa9
> > +
> > +#define UCNA_BANK  0x7 // IMC0 bank
> > +
> > +/*
> > + * Record states about the injected UCNA.
> > + * The variables started with the 'i_' prefixes are recorded in interrupt
> > + * handler. Variables without the 'i_' prefixes are recorded in guest main
> > + * execution thread.
> > + */
> > +static volatile uint64_t i_ucna_rcvd;
> > +static volatile uint64_t i_ucna_addr;
> > +static volatile uint64_t ucna_addr;
> > +static volatile uint64_t ucna_addr2;
> > +
> > +struct thread_params {
> > +       struct kvm_vm *vm;
> > +       uint32_t vcpu_id;
> > +};
> > +
> > +static void verify_apic_base_addr(void)
> > +{
> > +       uint64_t msr = rdmsr(MSR_IA32_APICBASE);
> > +       uint64_t base = GET_APIC_BASE(msr);
> > +
> > +       GUEST_ASSERT(base == APIC_DEFAULT_GPA);
> > +}
> > +
> > +static void guest_code(void)
> > +{
> > +       uint64_t ctl2;
> > +       verify_apic_base_addr();
> > +       xapic_enable();
> > +
> > +       /* Sets up the interrupt vector and enables per-bank CMCI sigaling. */
> > +       xapic_write_reg(APIC_LVTCMCI, CMCI_VECTOR | APIC_DM_FIXED);
> > +       ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
> > +       wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 | MCI_CTL2_CMCI_EN);
> > +
> > +       /* Enables interrupt in guest. */
> > +       asm volatile("sti");
> > +
> > +       /* Let user space inject the first UCNA */
> > +       GUEST_SYNC(SYNC_FIRST_UCNA);
> > +
> > +       ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> > +
> > +       /* Disables the per-bank CMCI signaling. */
> > +       ctl2 = rdmsr(MSR_IA32_MCx_CTL2(UCNA_BANK));
> > +       wrmsr(MSR_IA32_MCx_CTL2(UCNA_BANK), ctl2 & ~MCI_CTL2_CMCI_EN);
> > +
> > +       /* Let the user space inject the second UCNA */
> > +       GUEST_SYNC(SYNC_SECOND_UCNA);
> > +
> > +       ucna_addr2 = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> > +       GUEST_DONE();
> > +}
> > +
> > +static void guest_cmci_handler(struct ex_regs *regs)
> > +{
> > +       i_ucna_rcvd++;
> > +       i_ucna_addr = rdmsr(MSR_IA32_MCx_ADDR(UCNA_BANK));
> > +       xapic_write_reg(APIC_EOI, 0);
> > +}
> > +
> > +static void inject_ucna(struct kvm_vm *vm, uint32_t vcpu_id, uint64_t addr) {
> > +       /*
> > +        * A UCNA error is indicated with VAL=1, UC=1, PCC=0, S=0 and AR=0 in
> > +        * the IA32_MCi_STATUS register.
> > +        * MSCOD=1 (BIT[16] - MscodDataRdErr).
> > +        * MCACOD=0x0090 (Memory controller error format, channel 0)
> > +        */
> > +       uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
> > +                         MCI_STATUS_MISCV | MCI_STATUS_ADDRV | 0x10090;
> > +       struct kvm_x86_mce mce = {};
> > +       mce.status = status;
> > +       mce.mcg_status = 0;
> > +       /*
> > +        * MCM_ADDR_PHYS indicates the reported address is a physical address.
> > +        * Lowest 6 bits is the recoverable address LSB, i.e., the injected MCE
> > +        * is at 4KB granularity.
> > +        */
> > +       mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
> > +       mce.addr = addr;
> > +       mce.bank = UCNA_BANK;
> > +
> > +       TEST_ASSERT(_vcpu_ioctl(vm, vcpu_id, KVM_X86_SET_MCE, &mce) != -1,
> > +                       "Inject UCNA");
> > +}
> > +
> > +static void *vcpu_thread(void *arg)
> > +{
> > +       struct thread_params *params = (struct thread_params *)arg;
> > +       struct ucall uc;
> > +       int old;
> > +       int r;
> > +       unsigned int exit_reason;
> > +
> > +       r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
> > +       TEST_ASSERT(r == 0,
> > +                   "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
> > +                   params->vcpu_id, r);
> > +
> > +       fprintf(stderr, "vCPU thread running vCPU %u\n", params->vcpu_id);
> > +       vcpu_run(params->vm, params->vcpu_id);
> > +
> > +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> > +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> > +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> > +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> > +       TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
> > +                   "Expect UCALL_SYNC\n");
> > +       TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
> > +
> > +       fprintf(stderr, "Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
> > +
> > +       inject_ucna(params->vm, params->vcpu_id, FIRST_UCNA_ADDR);
> > +       vcpu_run(params->vm, params->vcpu_id);
> > +
> > +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> > +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> > +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> > +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> > +       TEST_ASSERT(get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_SYNC,
> > +                   "Expect UCALL_SYNC\n");
> > +       TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
> > +
> > +       fprintf(stderr, "Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
> > +
> > +       inject_ucna(params->vm, params->vcpu_id, SECOND_UCNA_ADDR);
> > +       vcpu_run(params->vm, params->vcpu_id);
> > +
> > +       exit_reason = vcpu_state(params->vm, params->vcpu_id)->exit_reason;
> > +       TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> > +                   "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
> > +                   params->vcpu_id, exit_reason, exit_reason_str(exit_reason));
> > +       if (get_ucall(params->vm, params->vcpu_id, &uc) == UCALL_ABORT) {
> > +               TEST_ASSERT(false,
> > +                           "vCPU %u exited with error: %s.\n",
> > +                           params->vcpu_id, (const char *)uc.args[0]);
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static void setup_mce_cap(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +       uint64_t supported_mcg_caps = 0;
> > +       uint64_t mcg_caps = MCG_CMCI_P | MCG_CTL_P | MCG_SER_P | MCG_LMCE_P
> > +               | KVM_MAX_MCE_BANKS;
> > +
> > +       TEST_ASSERT(_kvm_ioctl(vm, KVM_X86_GET_MCE_CAP_SUPPORTED,
> > +                              &supported_mcg_caps) != -1,
> > +                   "KVM_GET_MCE_CAP_SUPPORTED");
> > +       fprintf(stderr, "KVM supported MCG_CAP: %#lx\n", supported_mcg_caps);
> > +
> > +       TEST_ASSERT(supported_mcg_caps & MCG_CMCI_P, "MCG_CMCI_P is not supported");
>
> Instead of failing the test on older kernels, mark it as skipped.
>
> if (!(supported_mcg_caps & MCG_CMCI_P)) {
>         print_skip(" MCG_CMCI_P is not supported");
>         exit(KSFT_SKIP);
> }
>
Thanks, updated.
> > +
> > +       mcg_caps &= supported_mcg_caps | MCG_CAP_BANKS_MASK;
> > +       TEST_ASSERT(_vcpu_ioctl(vm, vcpuid, KVM_X86_SETUP_MCE, &mcg_caps) != -1,
> > +                   "KVM_X86_SETUP_MCE");
> > +}
> > +
> > +static void create_vcpu_with_mce_cap(struct kvm_vm *vm, uint32_t vcpuid, void *guest_code)
> > +{
> > +       struct kvm_mp_state mp_state;
> > +       struct kvm_regs regs;
> > +       vm_vaddr_t stack_vaddr;
> > +       stack_vaddr = vm_vaddr_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
> > +                                    DEFAULT_GUEST_STACK_VADDR_MIN);
> > +
> > +       vm_vcpu_add(vm, vcpuid);
> > +       setup_mce_cap(vm, vcpuid);
>
> Is some ordering requirement between KVM_X86_SETUP_MCE and other vCPU
> ioctls? I'm wondering why this function has to duplicate so much of
> the vCPU setup code.

Yes, KVM_X86_SETUP_MCE sets up the per vCPU mcg_cap and other MCE
related registers.

This has to be done before kvm_lapic_reset or
kvm_vcpu_after_set_cpuid because otherwise the kvm_apic_set_version
call will effectively set a value in APIC_LVR register that the number
of supported APIC_LVT* vectors is 5 instead of 6 (including CMCI).

In practice, this is how we would configure the per-vCPU
KVM_X86_SETUP_MCE in some other user space VMM impl (and the way it
has been done).
> > +
> > +       vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
> > +       vcpu_setup(vm, vcpuid);
> > +
> > +       /* Setup guest general purpose registers */
> > +       vcpu_regs_get(vm, vcpuid, &regs);
> > +       regs.rflags = regs.rflags | 0x2;
> > +       regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
> > +       regs.rip = (unsigned long) guest_code;
> > +       vcpu_regs_set(vm, vcpuid, &regs);
> > +
> > +       /* Setup the MP state */
> > +       mp_state.mp_state = 0;
> > +       vcpu_set_mp_state(vm, vcpuid, &mp_state);
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +       int r;
> > +       void *retval;
> > +       int run_secs = 3;
> > +       pthread_t threads[2];
> > +       struct thread_params params[2];
> > +       struct kvm_vm *vm;
> > +       uint64_t *p_i_ucna_rcvd, *p_i_ucna_addr;
> > +       uint64_t *p_ucna_addr, *p_ucna_addr2;
> > +
> > +       kvm_check_cap(KVM_CAP_MCE);
> > +
> > +       vm = vm_create_without_vcpus(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES);
> > +
> > +       create_vcpu_with_mce_cap(vm, UCNA_VCPU_ID, guest_code);
> > +
> > +       params[0].vm = vm;
> > +
> > +       vm_init_descriptor_tables(vm);
> > +       vcpu_init_descriptor_tables(vm, UCNA_VCPU_ID);
> > +       vm_install_exception_handler(vm, CMCI_VECTOR, guest_cmci_handler);
> > +
> > +       virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> > +
> > +       p_i_ucna_rcvd = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_rcvd);
> > +       p_i_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&i_ucna_addr);
> > +       p_ucna_addr = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr);
> > +       p_ucna_addr2 = (uint64_t *)addr_gva2hva(vm, (uint64_t)&ucna_addr2);
> > +
> > +       /* Start vCPU thread and wait for it to execute first HLT. */
>
> Stale comment from xapic_ipi_test.c
>
> > +       params[0].vcpu_id = UCNA_VCPU_ID;
> > +       r = pthread_create(&threads[0], NULL, vcpu_thread, &params[0]);
> > +       TEST_ASSERT(r == 0,
> > +                   "pthread_create vcpu thread failed errno=%d", errno);
> > +       fprintf(stderr, "vCPU thread started\n");
> > +
> > +       sleep(run_secs);
> > +
> > +       r = pthread_join(threads[0], &retval);
> > +       TEST_ASSERT(r == 0,
> > +                   "pthread_join on vcpu_id=%d failed with errno=%d",
> > +                   UCNA_VCPU_ID, r);
>
> Creating a here is unnecessary, e.g. you could just replace all this
> code with a call to vcpu_thread().

Good point, updated.
>
> > +
> > +       fprintf(stderr,
> > +               "Test successful after running for %d seconds.\n"
> > +               "UCNA CMCI interrupts received: %ld\n"
> > +               "Last UCNA address received via CMCI: %lx\n"
> > +               "First UCNA address in vCPU thread: %lx\n"
> > +               "Second UCNA address in vCPU thread: %lx\n",
> > +               run_secs, *p_i_ucna_rcvd, *p_i_ucna_addr, *p_ucna_addr, *p_ucna_addr2);
>
> Instead of printing out all these fields, can you make assertions
> about them? e.g. Assert that the expected number of interrupts were
> received, etc.

Updated.
>
> > +
> > +       kvm_vm_free(vm);
> > +}
> > --
> > 2.36.1.124.g0e6072fb45-goog
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  2022-05-20 17:36 ` [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
@ 2022-06-03 18:58   ` David Matlack
  2022-06-03 20:28     ` David Matlack
  0 siblings, 1 reply; 16+ messages in thread
From: David Matlack @ 2022-06-03 18:58 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, May 20, 2022 at 10:36 AM Jue Wang <juew@google.com> wrote:
>
> To implement Corrected Machine Check Interrupt (CMCI) as another
> LVT vector, the APIC LVT logic needs to be able to handle an additional
> LVT vector conditioned on whether MCG_CMCI_P is enabled on the vCPU,
> this is because CMCI signaling can only be enabled when the CPU's
> MCG_CMCI_P bit is set (Intel SDM, section 15.3.1.1).
>
> This patch factors out the dependency on KVM_APIC_LVT_NUM from the
> APIC_VERSION macro. In later patches, KVM_APIC_LVT_NUM will be replaced
> with a helper kvm_apic_get_nr_lvt_entries that reports different LVT
> number conditioned on whether MCG_CMCI_P is enabled on the vCPU.

Prefer to state what the patch does first, then explain why. Also
please to use more precise language, especially when referring to
architectural concepts. For example, I don't believe there is any such
thing as an "LVT vector".

Putting that together, how about something like this:

Refactor APIC_VERSION so that the maximum number of LVT entries is
inserted at runtime rather than compile time. This will be used in a
subsequent commit to expose the LVT CMCI Register to VMs that support
Corrected Machine Check error counting/signaling
(IA32_MCG_CAP.MCG_CMCI_P=1).

>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/kvm/lapic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 66b0eb0bda94..a5caa77e279f 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -54,7 +54,7 @@
>  #define PRIo64 "o"
>
>  /* 14 is the version for Xeon and Pentium 8.4.8*/
> -#define APIC_VERSION                   (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
> +#define APIC_VERSION                   0x14UL
>  #define LAPIC_MMIO_LENGTH              (1 << 12)
>  /* followed define is not in apicdef.h */
>  #define MAX_APIC_VECTOR                        256
> @@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
>  void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>  {
>         struct kvm_lapic *apic = vcpu->arch.apic;
> -       u32 v = APIC_VERSION;
> +       u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
>
>         if (!lapic_in_kernel(vcpu))
>                 return;
> --
> 2.36.1.124.g0e6072fb45-goog
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.
  2022-05-20 17:36 ` [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
@ 2022-06-03 20:26   ` David Matlack
  0 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2022-06-03 20:26 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, May 20, 2022 at 10:36 AM Jue Wang <juew@google.com> wrote:
>
> This patch adds the handling of APIC_LVTCMCI, conditioned on whether the
> vCPU has set MCG_CMCI_P in MCG_CAP register.
>
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/kvm/lapic.c | 40 +++++++++++++++++++++++++++++++++-------
>  arch/x86/kvm/lapic.h |  3 ++-
>  2 files changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index db12d2ef1aef..e2186a7c0eed 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -27,6 +27,7 @@
>  #include <linux/math64.h>
>  #include <linux/slab.h>
>  #include <asm/processor.h>
> +#include <asm/mce.h>
>  #include <asm/msr.h>
>  #include <asm/page.h>
>  #include <asm/current.h>
> @@ -398,14 +399,26 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
>         return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
>  }
>
> +static inline bool kvm_is_cmci_supported(struct kvm_vcpu *vcpu)
> +{
> +       return vcpu->arch.mcg_cap & MCG_CMCI_P;
> +}
> +
> +static inline int kvm_apic_get_nr_lvt_entries(struct kvm_lapic *apic)
> +{
> +       return KVM_APIC_MAX_NR_LVT_ENTRIES - !kvm_is_cmci_supported(apic->vcpu);
> +}

Suggesting computing the number of LVT entries once as part of
KVM_X86_SETUP_MCE and storing it in struct kvm_lapic (e.g.
apic->nr_lvt_entries).

I would also suggest replacing kvm_is_cmci_supported() with
kvm_lapic_lvt_cmci_supported(), which checks if the local APIC
supports the LVT CMCI register, rather than looking at mcg_cap. I
think that will result in more readable code because it more directly
checks that the local APIC supports the LVT entry we care about.

> +
>  void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>  {
>         struct kvm_lapic *apic = vcpu->arch.apic;
> -       u32 v = APIC_VERSION | ((KVM_APIC_MAX_NR_LVT_ENTRIES - 1) << 16);
> +       u32 v = 0;
>
>         if (!lapic_in_kernel(vcpu))
>                 return;
>
> +       v = APIC_VERSION | ((kvm_apic_get_nr_lvt_entries(apic) - 1) << 16);
> +
>         /*
>          * KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
>          * which doesn't have EOI register; Some buggy OSes (e.g. Windows with
> @@ -425,7 +438,8 @@ static const unsigned int apic_lvt_mask[KVM_APIC_MAX_NR_LVT_ENTRIES] = {
>         [LVT_PERFORMANCE_COUNTER] = LVT_MASK | APIC_MODE_MASK,
>         [LVT_LINT0] = LINT_MASK,
>         [LVT_LINT1] = LINT_MASK,
> -       [LVT_ERROR] = LVT_MASK
> +       [LVT_ERROR] = LVT_MASK,
> +       [LVT_CMCI] = LVT_MASK | APIC_MODE_MASK
>  };
>
>  static int find_highest_vector(void *bitmap)
> @@ -1445,6 +1459,9 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
>                 APIC_REG_MASK(APIC_TMCCT) |
>                 APIC_REG_MASK(APIC_TDCR);
>
> +       if (kvm_is_cmci_supported(apic->vcpu))
> +               valid_reg_mask |= APIC_REG_MASK(APIC_LVTCMCI);
> +
>         /*
>          * ARBPRI and ICR2 are not valid in x2APIC mode.  WARN if KVM reads ICR
>          * in x2APIC mode as it's an 8-byte register in x2APIC and needs to be
> @@ -2083,12 +2100,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>                 apic_set_spiv(apic, val & mask);
>                 if (!(val & APIC_SPIV_APIC_ENABLED)) {
>                         int i;
> -                       u32 lvt_val;
>
> -                       for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++) {
> -                               lvt_val = kvm_lapic_get_reg(apic, APIC_LVTx(i));
> +                       for (i = 0; i < kvm_apic_get_nr_lvt_entries(apic); i++) {
>                                 kvm_lapic_set_reg(apic, APIC_LVTx(i),
> -                                            lvt_val | APIC_LVT_MASKED);
> +                                       kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
>                         }
>                         apic_update_lvtt(apic);
>                         atomic_set(&apic->lapic_timer.pending, 0);
> @@ -2140,6 +2155,17 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>                 apic_update_lvtt(apic);
>                 break;
>
> +       case APIC_LVTCMCI:
> +               if (!kvm_is_cmci_supported(apic->vcpu)) {
> +                       ret = 1;
> +                       break;
> +               }
> +               if (!kvm_apic_sw_enabled(apic))
> +                       val |= APIC_LVT_MASKED;
> +               val &= apic_lvt_mask[LVT_CMCI];
> +               kvm_lapic_set_reg(apic, APIC_LVTCMCI, val);

This should be folded into the handling of the other LVT registers.
The code is basically the same. Then you can also drop the
kvm_is_cmci_supported() and replace it with a more generic check that
checks if the LVT entry is supported.

> +               break;
> +
>         case APIC_TMICT:
>                 if (apic_lvtt_tscdeadline(apic))
>                         break;
> @@ -2383,7 +2409,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>                 kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
>         kvm_apic_set_version(apic->vcpu);
>
> -       for (i = 0; i < KVM_APIC_MAX_NR_LVT_ENTRIES; i++)
> +       for (i = 0; i < kvm_apic_get_nr_lvt_entries(apic); i++)
>                 kvm_lapic_set_reg(apic, APIC_LVTx(i), APIC_LVT_MASKED);
>         apic_update_lvtt(apic);
>         if (kvm_vcpu_is_reset_bsp(vcpu) &&
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 2d197ed0b8ce..16298bcb2abf 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -35,11 +35,12 @@ enum lapic_lvt_entry {
>         LVT_LINT0,
>         LVT_LINT1,
>         LVT_ERROR,
> +       LVT_CMCI,
>
>         KVM_APIC_MAX_NR_LVT_ENTRIES,
>  };
>
> -#define APIC_LVTx(x) (APIC_LVTT + 0x10 * (x))
> +#define APIC_LVTx(x) ((x) == LVT_CMCI ? APIC_LVTCMCI : APIC_LVTT + 0x10 * (x))
>
>  struct kvm_timer {
>         struct hrtimer timer;
> --
> 2.36.1.124.g0e6072fb45-goog
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.
  2022-06-03 18:58   ` David Matlack
@ 2022-06-03 20:28     ` David Matlack
  0 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2022-06-03 20:28 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, Jun 3, 2022 at 11:58 AM David Matlack <dmatlack@google.com> wrote:
>
> On Fri, May 20, 2022 at 10:36 AM Jue Wang <juew@google.com> wrote:
> >
> > To implement Corrected Machine Check Interrupt (CMCI) as another
> > LVT vector, the APIC LVT logic needs to be able to handle an additional
> > LVT vector conditioned on whether MCG_CMCI_P is enabled on the vCPU,
> > this is because CMCI signaling can only be enabled when the CPU's
> > MCG_CMCI_P bit is set (Intel SDM, section 15.3.1.1).
> >
> > This patch factors out the dependency on KVM_APIC_LVT_NUM from the
> > APIC_VERSION macro. In later patches, KVM_APIC_LVT_NUM will be replaced
> > with a helper kvm_apic_get_nr_lvt_entries that reports different LVT
> > number conditioned on whether MCG_CMCI_P is enabled on the vCPU.
>
> Prefer to state what the patch does first, then explain why. Also
> please to use more precise language, especially when referring to
> architectural concepts. For example, I don't believe there is any such
> thing as an "LVT vector".

BTW, these suggestions apply to the entire series.

>
> Putting that together, how about something like this:
>
> Refactor APIC_VERSION so that the maximum number of LVT entries is
> inserted at runtime rather than compile time. This will be used in a
> subsequent commit to expose the LVT CMCI Register to VMs that support
> Corrected Machine Check error counting/signaling
> (IA32_MCG_CAP.MCG_CMCI_P=1).
>
> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Jue Wang <juew@google.com>
> > ---
> >  arch/x86/kvm/lapic.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 66b0eb0bda94..a5caa77e279f 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -54,7 +54,7 @@
> >  #define PRIo64 "o"
> >
> >  /* 14 is the version for Xeon and Pentium 8.4.8*/
> > -#define APIC_VERSION                   (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
> > +#define APIC_VERSION                   0x14UL
> >  #define LAPIC_MMIO_LENGTH              (1 << 12)
> >  /* followed define is not in apicdef.h */
> >  #define MAX_APIC_VECTOR                        256
> > @@ -401,7 +401,7 @@ static inline int apic_lvt_nmi_mode(u32 lvt_val)
> >  void kvm_apic_set_version(struct kvm_vcpu *vcpu)
> >  {
> >         struct kvm_lapic *apic = vcpu->arch.apic;
> > -       u32 v = APIC_VERSION;
> > +       u32 v = APIC_VERSION | ((KVM_APIC_LVT_NUM - 1) << 16);
> >
> >         if (!lapic_in_kernel(vcpu))
> >                 return;
> > --
> > 2.36.1.124.g0e6072fb45-goog
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
  2022-05-20 17:36 ` [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
@ 2022-06-03 20:41   ` David Matlack
  0 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2022-06-03 20:41 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, May 20, 2022 at 10:36 AM Jue Wang <juew@google.com> wrote:
>
> Corrected Machine Check Interrupt (CMCI) can be configured via the per
> Machine Check bank registers: IA32_MCi_CTL2. This patch adds the
> emulation of IA32_MCi_CTL2 registers to KVM. A separate mci_ctl2_banks
> array is used to keep the existing mce_banks register layout intact.
>
> In Machine Check Architecture (MCA), MCG_CMCI_P (bit 10 of MCG_CAP) is
> the corrected MC error counting/signaling extension present flag. When
> this bit is set, it does not imply CMCI reported corrected error or UCNA
> error is supported across all MCA banks. Software should check on a bank
> by bank basis (i.e. if bit 30 in each IA32_MCi_CTL2 register is set).
>
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   1 +
>  arch/x86/kvm/x86.c              | 130 ++++++++++++++++++++++----------
>  2 files changed, 92 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4ff36610af6a..178b7e01bf8f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -806,6 +806,7 @@ struct kvm_vcpu_arch {
>         u64 mcg_ctl;
>         u64 mcg_ext_ctl;
>         u64 *mce_banks;
> +       u64 *mci_ctl2_banks;
>
>         /* Cache MMIO info */
>         u64 mmio_gva;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0e839077ce52..f8ab592f519b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3174,6 +3174,16 @@ static void kvmclock_sync_fn(struct work_struct *work)
>                                         KVMCLOCK_SYNC_PERIOD);
>  }
>
> +/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
> +static bool is_mci_control_msr(u32 msr)
> +{
> +       return (msr & 3) == 0;
> +}
> +static bool is_mci_status_msr(u32 msr)
> +{
> +       return (msr & 3) == 1;
> +}
> +
>  /*
>   * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
>   */
> @@ -3192,6 +3202,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         unsigned bank_num = mcg_cap & 0xff;
>         u32 msr = msr_info->index;
>         u64 data = msr_info->data;
> +       u32 offset, last_msr;
>
>         switch (msr) {
>         case MSR_IA32_MCG_STATUS:
> @@ -3205,32 +3216,50 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                         return 1;
>                 vcpu->arch.mcg_ctl = data;
>                 break;
> -       default:
> -               if (msr >= MSR_IA32_MC0_CTL &&
> -                   msr < MSR_IA32_MCx_CTL(bank_num)) {
> -                       u32 offset = array_index_nospec(
> -                               msr - MSR_IA32_MC0_CTL,
> -                               MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
> -
> -                       /* only 0 or all 1s can be written to IA32_MCi_CTL
> -                        * some Linux kernels though clear bit 10 in bank 4 to
> -                        * workaround a BIOS/GART TBL issue on AMD K8s, ignore
> -                        * this to avoid an uncatched #GP in the guest
> -                        */
> -                       if ((offset & 0x3) == 0 &&
> -                           data != 0 && (data | (1 << 10)) != ~(u64)0)
> -                               return -1;
> -
> -                       /* MCi_STATUS */
> -                       if (!msr_info->host_initiated &&
> -                           (offset & 0x3) == 1 && data != 0) {
> -                               if (!can_set_mci_status(vcpu))
> -                                       return -1;
> -                       }
> +       case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
> +               last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
> +               if (msr > last_msr)
> +                       return 1;
>
> -                       vcpu->arch.mce_banks[offset] = data;
> -                       break;
> -               }
> +               if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
> +                       return 1;
> +               /* An attempt to write a 1 to a reserved bit raises #GP */
> +               if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
> +                       return 1;
> +               offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
> +                                           last_msr + 1 - MSR_IA32_MC0_CTL2);
> +               vcpu->arch.mci_ctl2_banks[offset] = data;

There's a lot of emulation in this commit that would be great to have
test coverage for. e.g. Testing that writing to MSR_IA32_MC0_CTL2 when
mcg_cap.MCG_CMCI_P=0 results in a #GP, writing to reserved bits, etc.

> +               break;
> +       case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> +               last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
> +               if (msr > last_msr)
> +                       return 1;
> +
> +               /*
> +                * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
> +                * values are architecturally undefined.  But, some Linux
> +                * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
> +                * issue on AMD K8s, allow bit 10 to be clear when setting all
> +                * other bits in order to avoid an uncaught #GP in the guest.
> +                */
> +               if (is_mci_control_msr(msr) &&
> +                   data != 0 && (data | (1 << 10)) != ~(u64)0)
> +                       return 1;
> +
> +               /*
> +                * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
> +                * AMD-based CPUs allow non-zero values, but if and only if
> +                * HWCR[McStatusWrEn] is set.
> +                */
> +               if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
> +                   data != 0 && !can_set_mci_status(vcpu))
> +                       return 1;
> +
> +               offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
> +                                           last_msr + 1 - MSR_IA32_MC0_CTL);
> +               vcpu->arch.mce_banks[offset] = data;
> +               break;
> +       default:
>                 return 1;
>         }
>         return 0;
> @@ -3514,7 +3543,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                         return 1;
>                 }
>                 break;
> -       case 0x200 ... 0x2ff:
> +       case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
> +       case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
>                 return kvm_mtrr_set_msr(vcpu, msr, data);
>         case MSR_IA32_APICBASE:
>                 return kvm_set_apic_base(vcpu, msr_info);
> @@ -3671,6 +3701,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_MCG_CTL:
>         case MSR_IA32_MCG_STATUS:
>         case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> +       case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
>                 return set_msr_mce(vcpu, msr_info);
>
>         case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
> @@ -3775,6 +3806,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
>         u64 data;
>         u64 mcg_cap = vcpu->arch.mcg_cap;
>         unsigned bank_num = mcg_cap & 0xff;
> +       u32 offset, last_msr;
>
>         switch (msr) {
>         case MSR_IA32_P5_MC_ADDR:
> @@ -3792,16 +3824,27 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
>         case MSR_IA32_MCG_STATUS:
>                 data = vcpu->arch.mcg_status;
>                 break;
> -       default:
> -               if (msr >= MSR_IA32_MC0_CTL &&
> -                   msr < MSR_IA32_MCx_CTL(bank_num)) {
> -                       u32 offset = array_index_nospec(
> -                               msr - MSR_IA32_MC0_CTL,
> -                               MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
> +       case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
> +               last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
> +               if (msr > last_msr)
> +                       return 1;
>
> -                       data = vcpu->arch.mce_banks[offset];
> -                       break;
> -               }
> +               if (!(mcg_cap & MCG_CMCI_P) && !host)
> +                       return 1;
> +               offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
> +                                           last_msr + 1 - MSR_IA32_MC0_CTL2);
> +               data = vcpu->arch.mci_ctl2_banks[offset];
> +               break;
> +       case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> +               last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
> +               if (msr > last_msr)
> +                       return 1;
> +
> +               offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
> +                                           last_msr + 1 - MSR_IA32_MC0_CTL);
> +               data = vcpu->arch.mce_banks[offset];
> +               break;
> +       default:
>                 return 1;
>         }
>         *pdata = data;
> @@ -3898,7 +3941,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 break;
>         }
>         case MSR_MTRRcap:
> -       case 0x200 ... 0x2ff:
> +       case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
> +       case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
>                 return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
>         case 0xcd: /* fsb frequency */
>                 msr_info->data = 3;
> @@ -4014,6 +4058,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_MCG_CTL:
>         case MSR_IA32_MCG_STATUS:
>         case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> +       case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
>                 return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
>                                    msr_info->host_initiated);
>         case MSR_IA32_XSS:
> @@ -4769,9 +4814,12 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
>         /* Init IA32_MCG_CTL to all 1s */
>         if (mcg_cap & MCG_CTL_P)
>                 vcpu->arch.mcg_ctl = ~(u64)0;
> -       /* Init IA32_MCi_CTL to all 1s */
> -       for (bank = 0; bank < bank_num; bank++)
> +       /* Init IA32_MCi_CTL to all 1s, IA32_MCi_CTL2 to all 0s */
> +       for (bank = 0; bank < bank_num; bank++) {
>                 vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> +               if (mcg_cap & MCG_CMCI_P)
> +                       vcpu->arch.mci_ctl2_banks[bank] = 0;
> +       }
>
>         static_call(kvm_x86_setup_mce)(vcpu);
>  out:
> @@ -11226,7 +11274,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>
>         vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
>                                        GFP_KERNEL_ACCOUNT);
> -       if (!vcpu->arch.mce_banks)
> +       vcpu->arch.mci_ctl2_banks = kcalloc(KVM_MAX_MCE_BANKS, sizeof(u64),
> +                                           GFP_KERNEL_ACCOUNT);
> +       if (!vcpu->arch.mce_banks || !vcpu->arch.mci_ctl2_banks)
>                 goto fail_free_pio_data;
>         vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
>
> @@ -11279,6 +11329,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>         free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
>  fail_free_mce_banks:
>         kfree(vcpu->arch.mce_banks);
> +       kfree(vcpu->arch.mci_ctl2_banks);
>  fail_free_pio_data:
>         free_page((unsigned long)vcpu->arch.pio_data);
>  fail_free_lapic:
> @@ -11323,6 +11374,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>         kvm_hv_vcpu_uninit(vcpu);
>         kvm_pmu_destroy(vcpu);
>         kfree(vcpu->arch.mce_banks);
> +       kfree(vcpu->arch.mci_ctl2_banks);
>         kvm_free_lapic(vcpu);
>         idx = srcu_read_lock(&vcpu->kvm->srcu);
>         kvm_mmu_destroy(vcpu);
> --
> 2.36.1.124.g0e6072fb45-goog
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors
  2022-05-20 17:36 ` [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
@ 2022-06-03 20:54   ` David Matlack
  0 siblings, 0 replies; 16+ messages in thread
From: David Matlack @ 2022-06-03 20:54 UTC (permalink / raw)
  To: Jue Wang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Tony Luck, kvm list, Greg Thelen,
	Jiaqi Yan

On Fri, May 20, 2022 at 10:37 AM Jue Wang <juew@google.com> wrote:
>
> Make KVM support the CMCI capability by default by adding MCG_CMCI_P to
> kvm_mce_cap_supported. A vCPU can request for this capability via
> KVM_X86_SETUP_MCE. Uncorrectable Error No Action required (UCNA) injection
> reuses the MCE injection ioctl KVM_X86_SET_MCE.
>
> Neither of the CMCI and UCNA emulations depends on hardware.
>
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |  1 +
>  arch/x86/kvm/x86.c     | 50 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 610355b9ccce..1aed964ee4ee 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8037,6 +8037,7 @@ static __init int hardware_setup(void)
>         }
>
>         kvm_mce_cap_supported |= MCG_LMCE_P;
> +       kvm_mce_cap_supported |= MCG_CMCI_P;
>
>         if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
>                 return -EINVAL;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f8ab592f519b..d0b1bb6e5e4a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4826,6 +4826,52 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
>         return r;
>  }
>
> +/*
> + * Validate this is an UCNA error by checking the MCG_STATUS and MCi_STATUS
> + * registers that none of the bits for Machine Check Exceptions are set and
> + * both the VAL (valid) and UC (uncorrectable) bits are set.
> + * UCNA - UnCorrectable No Action required
> + * SRAR - Software Recoverable Action Required
> + * MCI_STATUS_PCC - Processor Context Corrupted
> + * MCI_STATUS_S - Signaled as a Machine Check Exception
> + * MCI_STATUS_AR - This MCE is "software recoverable action required"
> + */
> +static bool is_ucna(struct kvm_x86_mce *mce)
> +{
> +       return  !mce->mcg_status &&
> +               !(mce->status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR)) &&
> +               (mce->status & MCI_STATUS_VAL) &&
> +               (mce->status & MCI_STATUS_UC);
> +}
> +
> +static int kvm_vcpu_x86_set_ucna(struct kvm_vcpu *vcpu, struct kvm_x86_mce *mce)
> +{
> +       u64 mcg_cap = vcpu->arch.mcg_cap;
> +       unsigned int bank_num = mcg_cap & 0xff;
> +       u64 *banks = vcpu->arch.mce_banks;
> +
> +       if (mce->bank >= bank_num)
> +               return -EINVAL;

Drop this check. The caller already checks it.

> +
> +       if (!is_ucna(mce))
> +               return -EINVAL;

Drop this check. The only caller of this function already checks is_ucna().

> +
> +       banks += 4 * mce->bank;

The caller also computes banks. Perhaps just pass that in rather that
re-calculating it here?

Also, calculating banks should probably use array_index_nospec() since
the index is untrusted (coming from userspace).

> +       banks[1] = mce->status;
> +       banks[2] = mce->addr;
> +       banks[3] = mce->misc;
> +       vcpu->arch.mcg_status = mce->mcg_status;
> +
> +       if (!(mcg_cap & MCG_CMCI_P) ||
> +           !(vcpu->arch.mci_ctl2_banks[mce->bank] & MCI_CTL2_CMCI_EN))
> +               return 0;
> +
> +       if (lapic_in_kernel(vcpu))
> +               kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTCMCI);
> +
> +       return 0;
> +}
> +
>  static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>                                       struct kvm_x86_mce *mce)
>  {
> @@ -4835,6 +4881,10 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>
>         if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
>                 return -EINVAL;
> +
> +       if (is_ucna(mce))
> +               return kvm_vcpu_x86_set_ucna(vcpu, mce);
> +
>         /*
>          * if IA32_MCG_CTL is not all 1s, the uncorrected error
>          * reporting is disabled
> --
> 2.36.1.124.g0e6072fb45-goog
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-06-03 20:55 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20 17:36 [PATCH v4 0/8] KVM: x86: Add CMCI and UCNA emulation Jue Wang
2022-05-20 17:36 ` [PATCH v4 1/8] KVM: x86: Make APIC_VERSION capture only the magic 0x14UL Jue Wang
2022-06-03 18:58   ` David Matlack
2022-06-03 20:28     ` David Matlack
2022-05-20 17:36 ` [PATCH v4 2/8] KVM: x86: Fill apic_lvt_mask with enums / explicit entries Jue Wang
2022-05-20 17:36 ` [PATCH v4 3/8] KVM: x86: Add APIC_LVTx() macro Jue Wang
2022-05-20 17:36 ` [PATCH v4 4/8] KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic Jue Wang
2022-06-03 20:26   ` David Matlack
2022-05-20 17:36 ` [PATCH v4 5/8] KVM: x86: Use kcalloc to allocate the mce_banks array Jue Wang
2022-05-20 17:36 ` [PATCH v4 6/8] KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs Jue Wang
2022-06-03 20:41   ` David Matlack
2022-05-20 17:36 ` [PATCH v4 7/8] KVM: x86: Enable CMCI capability by default and handle injected UCNA errors Jue Wang
2022-06-03 20:54   ` David Matlack
2022-05-20 17:36 ` [RFC v4 8/8] KVM: selftests: Add a self test for UCNA injection Jue Wang
2022-05-20 21:08   ` David Matlack
2022-05-20 22:16     ` Jue Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.