* [PATCH v2 00/25] Enable FRED with KVM VMX
@ 2024-02-07 17:26 Xin Li
  2024-02-07 17:26 ` [PATCH v2 01/25] KVM: VMX: Cleanup VMX basic information defines and usages Xin Li
                   ` (26 more replies)
  0 siblings, 27 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

This patch set enables the Intel flexible return and event delivery
(FRED) architecture with KVM VMX to allow guests to utilize FRED.

The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:

1) Improve overall performance and response time by replacing event
   delivery through the interrupt descriptor table (IDT event
   delivery) and event return by the IRET instruction with lower
   latency transitions.

2) Improve software robustness by ensuring that event delivery
   establishes the full supervisor context and that event return
   establishes the full user context.

The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is also used to deliver events incident to ring 0. One FRED return
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.

The Intel VMX architecture is extended to run FRED guests. The major
changes are:

1) New VMCS fields for FRED context management, which include two new
   event data VMCS fields, eight new guest FRED context VMCS fields and
   eight new host FRED context VMCS fields.

2) VMX nested-exception support for proper virtualization of the stack
   levels introduced by the FRED architecture.

The latest FRED specification can be found with most search engines
using this search pattern:

  site:intel.com FRED (flexible return and event delivery) specification

Since the native FRED patches are now committed in the "x86/fred" branch
of the tip tree, and v1 received a good amount of review comments, v2 is
sent out on top of that branch for further review from the community:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fred

Patches 1-2 are cleanups of the VMX basic and misc MSR defines and their
usages, which were sent out earlier in preparation for the FRED changes:
https://lore.kernel.org/kvm/20240206182032.1596-1-xin3.li@intel.com/T/#u

Patches 3-15 add FRED support to VMX.
Patches 16-21 add FRED support to nested VMX.
Patch 22 exposes FRED and its baseline features to KVM guests.
Patches 23-25 add FRED selftests.

There is also a counterpart qemu patch set for FRED at:
https://lore.kernel.org/qemu-devel/20231109072012.8078-1-xin3.li@intel.com/T/,
which works with this patch set to allow KVM to run FRED guests.


Changes since v1:
* Always load the secondary VM exit controls (Sean Christopherson).
* Remove FRED VM entry/exit controls consistency checks in
  setup_vmcs_config() (Sean Christopherson).
* Clear FRED VM entry/exit controls if FRED is not enumerated (Chao Gao).
* Use guest_can_use() to track FRED enumeration in a vcpu (Chao Gao).
* Enable interception of FRED MSRs if FRED is no longer enumerated in
  CPUID (Chao Gao).
* Move guest FRED states init into __vmx_vcpu_reset() (Chao Gao).
* Don't use guest_cpuid_has() in vmx_prepare_switch_to_{host,guest}(),
  which are called from IRQ-disabled context (Chao Gao).
* Reset msr_guest_fred_rsp0 in __vmx_vcpu_reset() (Chao Gao).
* Fail host-initiated FRED MSR accesses if KVM cannot virtualize FRED
  (Chao Gao).
* Handle the case where FRED MSRs are valid but KVM cannot virtualize
  FRED (Chao Gao).
* Add sanity checks when writing to FRED MSRs.
* Explain why it is ok to only check CR4.FRED in kvm_is_fred_enabled()
  (Chao Gao).
* Document that event data should be equal to CR2/DR6/IA32_XFD_ERR
  instead of using WARN_ON() (Chao Gao).
* Zero event data if a #NM was not caused by extended feature disable
  (Chao Gao).
* Set the nested flag when there is an original interrupt (Chao Gao).
* Dump guest FRED states only if guest has FRED enabled (Nikolay Borisov).
* Add a prerequisite to SHADOW_FIELD_R[OW] macros
* Remove hyperv TLFS related changes (Jeremi Piotrowski).
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() to decouple
  KVM's capability to virtualize a feature and host's enabling of a
  feature (Chao Gao).


Xin Li (25):
  KVM: VMX: Cleanup VMX basic information defines and usages
  KVM: VMX: Cleanup VMX misc information defines and usages
  KVM: VMX: Add support for the secondary VM exit controls
  KVM: x86: Mark CR4.FRED as not reserved
  KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
  KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
  KVM: VMX: Set intercept for FRED MSRs
  KVM: VMX: Initialize VMCS FRED fields
  KVM: VMX: Switch FRED RSP0 between host and guest
  KVM: VMX: Add support for FRED context save/restore
  KVM: x86: Add kvm_is_fred_enabled()
  KVM: VMX: Handle FRED event data
  KVM: VMX: Handle VMX nested exception for FRED
  KVM: VMX: Disable FRED if FRED consistency checks fail
  KVM: VMX: Dump FRED context in dump_vmcs()
  KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup
  KVM: nVMX: Add support for the secondary VM exit controls
  KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros
  KVM: nVMX: Add FRED VMCS fields
  KVM: nVMX: Add support for VMX FRED controls
  KVM: nVMX: Add VMCS FRED states checking
  KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests
  KVM: selftests: Run debug_regs test with FRED enabled
  KVM: selftests: Add a new VM guest mode to run user level code
  KVM: selftests: Add fred exception tests

 Documentation/virt/kvm/x86/nested-vmx.rst     |  19 +
 arch/x86/include/asm/kvm_host.h               |   8 +-
 arch/x86/include/asm/msr-index.h              |  15 +-
 arch/x86/include/asm/vmx.h                    |  59 ++-
 arch/x86/kvm/cpuid.c                          |   4 +-
 arch/x86/kvm/governed_features.h              |   1 +
 arch/x86/kvm/kvm_cache_regs.h                 |  17 +
 arch/x86/kvm/svm/svm.c                        |   4 +-
 arch/x86/kvm/vmx/capabilities.h               |  30 +-
 arch/x86/kvm/vmx/nested.c                     | 329 ++++++++++++---
 arch/x86/kvm/vmx/nested.h                     |   2 +-
 arch/x86/kvm/vmx/vmcs.h                       |   1 +
 arch/x86/kvm/vmx/vmcs12.c                     |  19 +
 arch/x86/kvm/vmx/vmcs12.h                     |  38 ++
 arch/x86/kvm/vmx/vmcs_shadow_fields.h         |  80 ++--
 arch/x86/kvm/vmx/vmx.c                        | 385 +++++++++++++++---
 arch/x86/kvm/vmx/vmx.h                        |  15 +-
 arch/x86/kvm/x86.c                            | 103 ++++-
 arch/x86/kvm/x86.h                            |   5 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |  36 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   5 +-
 .../selftests/kvm/lib/x86_64/processor.c      |  15 +-
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   4 +-
 .../testing/selftests/kvm/x86_64/debug_regs.c |  50 ++-
 .../testing/selftests/kvm/x86_64/fred_test.c  | 297 ++++++++++++++
 27 files changed, 1320 insertions(+), 223 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c


base-commit: e13841907b8fda0ae0ce1ec03684665f578416a8
-- 
2.43.0



* [PATCH v2 01/25] KVM: VMX: Cleanup VMX basic information defines and usages
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 02/25] KVM: VMX: Cleanup VMX misc " Xin Li
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Define VMX basic information fields with BIT_ULL()/GENMASK_ULL(), and
replace hardcoded VMX basic numbers with these field macros.

Save the full/raw value of MSR_IA32_VMX_BASIC in the global vmcs_config
as type u64 to get rid of the hi/lo crud, and then use VMX_BASIC helpers
to extract info as needed.
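For illustration, here is a minimal userspace sketch of what extracting
fields from the saved raw u64 looks like. It assumes simplified local
re-definitions of BIT_ULL()/GENMASK_ULL() (not the kernel headers), takes
the field positions from the helpers in this patch (VMCS size in bits
44:32, memory type in bits 53:50) plus the architectural revision ID in
bits 30:0, and uses a made-up sample_basic value, not one read from real
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's BIT_ULL()/GENMASK_ULL(). */
#define BIT_ULL(n)        (1ULL << (n))
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define MEM_TYPE_WB                    0x6ULL
#define VMX_BASIC_32BIT_PHYS_ADDR_ONLY BIT_ULL(48)
#define VMX_BASIC_INOUT                BIT_ULL(54)
#define VMX_BASIC_TRUE_CTLS            BIT_ULL(55)

/* Bits 30:0: VMCS revision identifier. */
static inline uint32_t vmx_basic_vmcs_revision_id(uint64_t vmx_basic)
{
	return vmx_basic & GENMASK_ULL(30, 0);
}

/* Bits 44:32: size in bytes of the VMCS region. */
static inline uint32_t vmx_basic_vmcs_size(uint64_t vmx_basic)
{
	return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
}

/* Bits 53:50: memory type used for VMCS accesses. */
static inline uint32_t vmx_basic_vmcs_mem_type(uint64_t vmx_basic)
{
	return (vmx_basic & GENMASK_ULL(53, 50)) >> 50;
}

/*
 * Hypothetical raw MSR_IA32_VMX_BASIC value (not from real hardware):
 * revision 0x4, 4096-byte VMCS, WB memory type, INOUT and TRUE_CTLS set,
 * bit 48 (32-bit physical addresses only) clear.
 */
static const uint64_t sample_basic = 0x00D8100000000004ULL;
```

With a single u64 there is no hi/lo split to keep track of: every field is
read through one mask-and-shift helper.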

VMX_EPTP_MT_{WB,UC} values 0x6 and 0x0 are generic x86 memory type
values, so there is no need to prefix them with VMX_EPTP_.
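As a sketch of why the plain x86 encodings work here, assume the EPTP
layout used by construct_eptp() and nested_vmx_check_eptp() (memory type
in bits 2:0, page-walk length in bits 5:3, A/D enable in bit 6): the
memory type is OR-ed into, and masked out of, the EPTP unchanged.
sketch_construct_eptp() and sketch_eptp_mem_type() are hypothetical,
simplified helpers, and VMX_EPTP_PWL_4 (0x18) is assumed from the existing
vmx.h rather than shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Generic x86 memory-type encodings, as introduced by this patch. */
#define MEM_TYPE_WB            0x6ULL
#define MEM_TYPE_UC            0x0ULL

/* EPTP fields (assumed to match arch/x86/include/asm/vmx.h). */
#define VMX_EPTP_MT_MASK       0x7ULL
#define VMX_EPTP_PWL_4         0x18ULL
#define VMX_EPTP_PWL_5         0x20ULL
#define VMX_EPTP_AD_ENABLE_BIT (1ULL << 6)

/*
 * Hypothetical simplified model of construct_eptp(): the memory type sits
 * in bits 2:0, so MEM_TYPE_WB is used as-is. root_hpa is assumed to be
 * 4 KiB aligned, so it does not overlap the low control bits.
 */
static inline uint64_t sketch_construct_eptp(uint64_t root_hpa, int root_level,
					     int ad_enabled)
{
	uint64_t eptp = MEM_TYPE_WB;

	eptp |= (root_level == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;
	if (ad_enabled)
		eptp |= VMX_EPTP_AD_ENABLE_BIT;
	return eptp | root_hpa;
}

/* Extract the memory type the way nested_vmx_check_eptp() switches on it. */
static inline uint64_t sketch_eptp_mem_type(uint64_t eptp)
{
	return eptp & VMX_EPTP_MT_MASK;
}
```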

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
---

Changes since v4:
* Do not split VMX_BASIC bit definitions across multiple files (Kai
  Huang).
* Add some words to the changelog to justify the changes around the
  memory type macros (Kai Huang).
* Remove a leftover ';' (Kai Huang).

Changes since v3:
* Remove vmx_basic_vmcs_basic_cap() (Kai Huang).
* Add two macros, VMX_BASIC_VMCS12_SIZE and VMX_BASIC_MEM_TYPE_WB, to
  avoid keeping their two bit-shift macros (Kai Huang).

Changes since v2:
* Simply save the full/raw value of MSR_IA32_VMX_BASIC in the global
  vmcs_config, and then use the helpers to extract info from it as
  needed (Sean Christopherson).
* Move all VMX_MISC related changes to the second patch (Kai Huang).
* Commonize memory type definitions used in the VMX files, as memory
  types are architectural.

Changes since v1:
* Don't add field shift macros unless they are really needed; an extra
  layer of indirection makes the code harder to read (Sean Christopherson).
* Add a static_assert() to ensure that VMX_BASIC_FEATURES_MASK doesn't
  overlap with VMX_BASIC_RESERVED_BITS (Sean Christopherson).
* Read MSR_IA32_VMX_BASIC into a u64 rather than two u32s (Sean
  Christopherson).
* Add 2 new functions for extracting fields from VMX basic (Sean
  Christopherson).
* Drop the tools header update (Sean Christopherson).
* Move VMX basic field macros to arch/x86/include/asm/vmx.h.
---
 arch/x86/include/asm/msr-index.h |  9 ---------
 arch/x86/include/asm/vmx.h       | 18 ++++++++++++++++--
 arch/x86/kvm/vmx/capabilities.h  |  6 ++----
 arch/x86/kvm/vmx/nested.c        | 31 ++++++++++++++++++++-----------
 arch/x86/kvm/vmx/vmx.c           | 24 ++++++++++--------------
 5 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1f9dc9bd13eb..e8af4cf01e89 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1113,15 +1113,6 @@
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 #define MSR_IA32_VMX_PROCBASED_CTLS3	0x00000492
 
-/* VMX_BASIC bits and bitmasks */
-#define VMX_BASIC_VMCS_SIZE_SHIFT	32
-#define VMX_BASIC_TRUE_CTLS		(1ULL << 55)
-#define VMX_BASIC_64		0x0001000000000000LLU
-#define VMX_BASIC_MEM_TYPE_SHIFT	50
-#define VMX_BASIC_MEM_TYPE_MASK	0x003c000000000000LLU
-#define VMX_BASIC_MEM_TYPE_WB	6LLU
-#define VMX_BASIC_INOUT		0x0040000000000000LLU
-
 /* Resctrl MSRs: */
 /* - Intel: */
 #define MSR_IA32_L3_QOS_CFG		0xc81
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4dba17363008..353538b79ce5 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -121,6 +121,17 @@
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
+/* x86 memory types, explicitly used in VMX only */
+#define MEM_TYPE_WB				0x6ULL
+#define MEM_TYPE_UC				0x0ULL
+
+/* VMX_BASIC bits */
+#define VMX_BASIC_32BIT_PHYS_ADDR_ONLY		BIT_ULL(48)
+#define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
+#define VMX_BASIC_INOUT				BIT_ULL(54)
+#define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
+
+
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 #define VMX_MISC_ACTIVITY_HLT			0x00000040
@@ -144,6 +155,11 @@ static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
 	return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
 }
 
+static inline u32 vmx_basic_vmcs_mem_type(u64 vmx_basic)
+{
+	return (vmx_basic & GENMASK_ULL(53, 50)) >> 50;
+}
+
 static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
 {
 	return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
@@ -506,8 +522,6 @@ enum vmcs_field {
 #define VMX_EPTP_PWL_5				0x20ull
 #define VMX_EPTP_AD_ENABLE_BIT			(1ull << 6)
 #define VMX_EPTP_MT_MASK			0x7ull
-#define VMX_EPTP_MT_WB				0x6ull
-#define VMX_EPTP_MT_UC				0x0ull
 #define VMX_EPT_READABLE_MASK			0x1ull
 #define VMX_EPT_WRITABLE_MASK			0x2ull
 #define VMX_EPT_EXECUTABLE_MASK			0x4ull
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..86ce8bb96bed 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -54,9 +54,7 @@ struct nested_vmx_msrs {
 };
 
 struct vmcs_config {
-	int size;
-	u32 basic_cap;
-	u32 revision_id;
+	u64 basic;
 	u32 pin_based_exec_ctrl;
 	u32 cpu_based_exec_ctrl;
 	u32 cpu_based_2nd_exec_ctrl;
@@ -76,7 +74,7 @@ extern struct vmx_capability vmx_capability __ro_after_init;
 
 static inline bool cpu_has_vmx_basic_inout(void)
 {
-	return	(((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
+	return	vmcs_config.basic & VMX_BASIC_INOUT;
 }
 
 static inline bool cpu_has_virtual_nmis(void)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6329a306856b..14d0167825dd 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1226,23 +1226,29 @@ static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
 	return (superset | subset) == superset;
 }
 
+#define VMX_BASIC_FEATURES_MASK			\
+	(VMX_BASIC_DUAL_MONITOR_TREATMENT |	\
+	 VMX_BASIC_INOUT |			\
+	 VMX_BASIC_TRUE_CTLS)
+
+#define VMX_BASIC_RESERVED_BITS			\
+	(GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) | BIT_ULL(31))
+
 static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
-	const u64 feature_and_reserved =
-		/* feature (except bit 48; see below) */
-		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
-		/* reserved */
-		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
 	u64 vmx_basic = vmcs_config.nested.basic;
 
-	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
+	static_assert(!(VMX_BASIC_FEATURES_MASK & VMX_BASIC_RESERVED_BITS));
+
+	if (!is_bitwise_subset(vmx_basic, data,
+			       VMX_BASIC_FEATURES_MASK | VMX_BASIC_RESERVED_BITS))
 		return -EINVAL;
 
 	/*
 	 * KVM does not emulate a version of VMX that constrains physical
 	 * addresses of VMX structures (e.g. VMCS) to 32-bits.
 	 */
-	if (data & BIT_ULL(48))
+	if (data & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EINVAL;
 
 	if (vmx_basic_vmcs_revision_id(vmx_basic) !=
@@ -2726,11 +2732,11 @@ static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
 
 	/* Check for memory type validity */
 	switch (new_eptp & VMX_EPTP_MT_MASK) {
-	case VMX_EPTP_MT_UC:
+	case MEM_TYPE_UC:
 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_UC_BIT)))
 			return false;
 		break;
-	case VMX_EPTP_MT_WB:
+	case MEM_TYPE_WB:
 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_WB_BIT)))
 			return false;
 		break;
@@ -6994,6 +7000,9 @@ static void nested_vmx_setup_misc_data(struct vmcs_config *vmcs_conf,
 	msrs->misc_high = 0;
 }
 
+#define VMX_BASIC_VMCS12_SIZE	((u64)VMCS12_SIZE << 32)
+#define VMX_BASIC_MEM_TYPE_WB	(MEM_TYPE_WB << 50)
+
 static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 {
 	/*
@@ -7005,8 +7014,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 	msrs->basic =
 		VMCS12_REVISION |
 		VMX_BASIC_TRUE_CTLS |
-		((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
-		(VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+		VMX_BASIC_VMCS12_SIZE |
+		VMX_BASIC_MEM_TYPE_WB;
 
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cce92f701dee..a16b3de01e3f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2564,13 +2564,13 @@ static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
 static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			     struct vmx_capability *vmx_cap)
 {
-	u32 vmx_msr_low, vmx_msr_high;
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
 	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
+	u64 basic_msr;
 	u64 misc_msr;
 	int i;
 
@@ -2689,29 +2689,25 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		_vmexit_control &= ~x_ctrl;
 	}
 
-	rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
+	rdmsrl(MSR_IA32_VMX_BASIC, basic_msr);
 
 	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
-	if ((vmx_msr_high & 0x1fff) > PAGE_SIZE)
+	if (vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE)
 		return -EIO;
 
 #ifdef CONFIG_X86_64
 	/* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
-	if (vmx_msr_high & (1u<<16))
+	if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EIO;
 #endif
 
 	/* Require Write-Back (WB) memory type for VMCS accesses. */
-	if (((vmx_msr_high >> 18) & 15) != 6)
+	if (vmx_basic_vmcs_mem_type(basic_msr) != MEM_TYPE_WB)
 		return -EIO;
 
 	rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
 
-	vmcs_conf->size = vmx_msr_high & 0x1fff;
-	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
-
-	vmcs_conf->revision_id = vmx_msr_low;
-
+	vmcs_conf->basic = basic_msr;
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
 	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
@@ -2861,13 +2857,13 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);
-	memset(vmcs, 0, vmcs_config.size);
+	memset(vmcs, 0, vmx_basic_vmcs_size(vmcs_config.basic));
 
 	/* KVM supports Enlightened VMCS v1 only */
 	if (kvm_is_using_evmcs())
 		vmcs->hdr.revision_id = KVM_EVMCS_VERSION;
 	else
-		vmcs->hdr.revision_id = vmcs_config.revision_id;
+		vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
 
 	if (shadow)
 		vmcs->hdr.shadow_vmcs = 1;
@@ -2960,7 +2956,7 @@ static __init int alloc_kvm_area(void)
 		 * physical CPU.
 		 */
 		if (kvm_is_using_evmcs())
-			vmcs->hdr.revision_id = vmcs_config.revision_id;
+			vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
 
 		per_cpu(vmxarea, cpu) = vmcs;
 	}
@@ -3362,7 +3358,7 @@ static int vmx_get_max_ept_level(void)
 
 u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 {
-	u64 eptp = VMX_EPTP_MT_WB;
+	u64 eptp = MEM_TYPE_WB;
 
 	eptp |= (root_level == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;
 
-- 
2.43.0



* [PATCH v2 02/25] KVM: VMX: Cleanup VMX misc information defines and usages
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
  2024-02-07 17:26 ` [PATCH v2 01/25] KVM: VMX: Cleanup VMX basic information defines and usages Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls Xin Li
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Define VMX misc information fields with BIT_ULL()/GENMASK_ULL(), and
move the VMX misc field macros to vmx.h if they are used in multiple
files, or to the file where they are used if they are used only once.
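As a rough userspace sketch of how the relocated masks are consumed, the
snippet below models the check in vmx_restore_vmx_misc(): within the
feature and reserved bits, a userspace-supplied value may only set bits
that KVM's supported value also sets, and a static assertion guards
against the two masks overlapping. BIT_ULL()/GENMASK_ULL() are simplified
local stand-ins, is_bitwise_subset() is modeled on the nested.c helper,
and misc_restore_ok() is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define BIT_ULL(n)        (1ULL << (n))
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define VMX_MISC_SAVE_EFER_LMA            BIT_ULL(5)
#define VMX_MISC_ACTIVITY_STATE_BITMAP    GENMASK_ULL(8, 6)
#define VMX_MISC_INTEL_PT                 BIT_ULL(14)
#define VMX_MISC_RDMSR_IN_SMM             BIT_ULL(15)
#define VMX_MISC_VMXOFF_BLOCK_SMI         BIT_ULL(28)
#define VMX_MISC_VMWRITE_SHADOW_RO_FIELDS BIT_ULL(29)
#define VMX_MISC_ZERO_LEN_INS             BIT_ULL(30)

#define VMX_MISC_FEATURES_MASK					\
	(VMX_MISC_SAVE_EFER_LMA | VMX_MISC_ACTIVITY_STATE_BITMAP |	\
	 VMX_MISC_INTEL_PT | VMX_MISC_RDMSR_IN_SMM |			\
	 VMX_MISC_VMXOFF_BLOCK_SMI | VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |	\
	 VMX_MISC_ZERO_LEN_INS)
#define VMX_MISC_RESERVED_BITS (BIT_ULL(31) | GENMASK_ULL(13, 9))

/* No bit may be both a feature bit and a reserved bit. */
_Static_assert(!(VMX_MISC_FEATURES_MASK & VMX_MISC_RESERVED_BITS),
	       "feature and reserved masks overlap");

/* Bits 4:0: preemption-timer rate relative to the TSC. */
static inline int vmx_misc_preemption_timer_rate(uint64_t vmx_misc)
{
	return vmx_misc & GENMASK_ULL(4, 0);
}

/* Within @mask, every bit set in @subset must also be set in @superset. */
static inline bool is_bitwise_subset(uint64_t superset, uint64_t subset,
				     uint64_t mask)
{
	superset &= mask;
	subset &= mask;
	return (superset | subset) == superset;
}

/* Hypothetical wrapper: does @data only claim bits KVM supports? */
static inline bool misc_restore_ok(uint64_t supported, uint64_t data)
{
	return is_bitwise_subset(supported, data,
				 VMX_MISC_FEATURES_MASK | VMX_MISC_RESERVED_BITS);
}
```

Bits outside both masks (such as the timer rate in bits 4:0) are ignored
by this check; in KVM they are validated separately.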

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/msr-index.h |  5 -----
 arch/x86/include/asm/vmx.h       | 12 +++++------
 arch/x86/kvm/vmx/capabilities.h  |  4 ++--
 arch/x86/kvm/vmx/nested.c        | 34 ++++++++++++++++++++++++--------
 arch/x86/kvm/vmx/nested.h        |  2 +-
 arch/x86/kvm/vmx/vmx.c           |  8 +++-----
 6 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8af4cf01e89..4fa2b3dd743e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1129,11 +1129,6 @@
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
-/* MSR_IA32_VMX_MISC bits */
-#define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
-#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
-#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
-
 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
 #define MSR_VM_IGNNE                    0xc0010115
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 353538b79ce5..76518e21c54d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -132,12 +132,10 @@
 #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
 
 
-#define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
-#define VMX_MISC_SAVE_EFER_LMA			0x00000020
-#define VMX_MISC_ACTIVITY_HLT			0x00000040
-#define VMX_MISC_ACTIVITY_WAIT_SIPI		0x00000100
-#define VMX_MISC_ZERO_LEN_INS			0x40000000
-#define VMX_MISC_MSR_LIST_MULTIPLIER		512
+/* VMX_MISC bits and bitmasks */
+#define VMX_MISC_INTEL_PT			BIT_ULL(14)
+#define VMX_MISC_VMWRITE_SHADOW_RO_FIELDS	BIT_ULL(29)
+#define VMX_MISC_ZERO_LEN_INS			BIT_ULL(30)
 
 /* VMFUNC functions */
 #define VMFUNC_CONTROL_BIT(x)	BIT((VMX_FEATURE_##x & 0x1f) - 28)
@@ -162,7 +160,7 @@ static inline u32 vmx_basic_vmcs_mem_type(u64 vmx_basic)
 
 static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
 {
-	return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+	return vmx_misc & GENMASK_ULL(4, 0);
 }
 
 static inline int vmx_misc_cr3_count(u64 vmx_misc)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 86ce8bb96bed..cb6588238f46 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -223,7 +223,7 @@ static inline bool cpu_has_vmx_vmfunc(void)
 static inline bool cpu_has_vmx_shadow_vmcs(void)
 {
 	/* check if the cpu supports writing r/o exit information fields */
-	if (!(vmcs_config.misc & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
+	if (!(vmcs_config.misc & VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
 		return false;
 
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -365,7 +365,7 @@ static inline bool cpu_has_vmx_invvpid_global(void)
 
 static inline bool cpu_has_vmx_intel_pt(void)
 {
-	return (vmcs_config.misc & MSR_IA32_VMX_MISC_INTEL_PT) &&
+	return (vmcs_config.misc & VMX_MISC_INTEL_PT) &&
 		(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_PT_USE_GPA) &&
 		(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_RTIT_CTL);
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 14d0167825dd..8a5fda04e2de 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -917,6 +917,8 @@ static int nested_vmx_store_msr_check(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+#define VMX_MISC_MSR_LIST_MULTIPLIER	512
+
 static u32 nested_vmx_max_atomic_switch_msrs(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1315,18 +1317,34 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	return 0;
 }
 
+#define VMX_MISC_SAVE_EFER_LMA		BIT_ULL(5)
+#define VMX_MISC_ACTIVITY_STATE_BITMAP	GENMASK_ULL(8, 6)
+#define VMX_MISC_ACTIVITY_HLT		BIT_ULL(6)
+#define VMX_MISC_ACTIVITY_WAIT_SIPI	BIT_ULL(8)
+#define VMX_MISC_RDMSR_IN_SMM		BIT_ULL(15)
+#define VMX_MISC_VMXOFF_BLOCK_SMI	BIT_ULL(28)
+
+#define VMX_MISC_FEATURES_MASK			\
+	(VMX_MISC_SAVE_EFER_LMA |		\
+	 VMX_MISC_ACTIVITY_STATE_BITMAP |	\
+	 VMX_MISC_INTEL_PT |			\
+	 VMX_MISC_RDMSR_IN_SMM |		\
+	 VMX_MISC_VMXOFF_BLOCK_SMI |		\
+	 VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |	\
+	 VMX_MISC_ZERO_LEN_INS)
+
+#define VMX_MISC_RESERVED_BITS			\
+	(BIT_ULL(31) | GENMASK_ULL(13, 9))
+
 static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
 {
-	const u64 feature_and_reserved_bits =
-		/* feature */
-		BIT_ULL(5) | GENMASK_ULL(8, 6) | BIT_ULL(14) | BIT_ULL(15) |
-		BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
-		/* reserved */
-		GENMASK_ULL(13, 9) | BIT_ULL(31);
 	u64 vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
 				       vmcs_config.nested.misc_high);
 
-	if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
+	static_assert(!(VMX_MISC_FEATURES_MASK & VMX_MISC_RESERVED_BITS));
+
+	if (!is_bitwise_subset(vmx_misc, data,
+			       VMX_MISC_FEATURES_MASK | VMX_MISC_RESERVED_BITS))
 		return -EINVAL;
 
 	if ((vmx->nested.msrs.pinbased_ctls_high &
@@ -6993,7 +7011,7 @@ static void nested_vmx_setup_misc_data(struct vmcs_config *vmcs_conf,
 {
 	msrs->misc_low = (u32)vmcs_conf->misc & VMX_MISC_SAVE_EFER_LMA;
 	msrs->misc_low |=
-		MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
+		VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
 		VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
 		VMX_MISC_ACTIVITY_HLT |
 		VMX_MISC_ACTIVITY_WAIT_SIPI;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index cce4e2aa30fb..0782fe599757 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -109,7 +109,7 @@ static inline unsigned nested_cpu_vmx_misc_cr3_count(struct kvm_vcpu *vcpu)
 static inline bool nested_cpu_has_vmwrite_any_field(struct kvm_vcpu *vcpu)
 {
 	return to_vmx(vcpu)->nested.msrs.misc_low &
-		MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS;
+		VMX_MISC_VMWRITE_SHADOW_RO_FIELDS;
 }
 
 static inline bool nested_cpu_has_zero_length_injection(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a16b3de01e3f..581967d20659 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2571,7 +2571,6 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
 	u64 basic_msr;
-	u64 misc_msr;
 	int i;
 
 	/*
@@ -2705,8 +2704,6 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	if (vmx_basic_vmcs_mem_type(basic_msr) != MEM_TYPE_WB)
 		return -EIO;
 
-	rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
-
 	vmcs_conf->basic = basic_msr;
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
@@ -2714,7 +2711,8 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
 	vmcs_conf->vmexit_ctrl         = _vmexit_control;
 	vmcs_conf->vmentry_ctrl        = _vmentry_control;
-	vmcs_conf->misc	= misc_msr;
+
+	rdmsrl(MSR_IA32_VMX_MISC, vmcs_conf->misc);
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if (enlightened_vmcs)
@@ -8603,7 +8601,7 @@ static __init int hardware_setup(void)
 		u64 use_timer_freq = 5000ULL * 1000 * 1000;
 
 		cpu_preemption_timer_multi =
-			vmcs_config.misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+			vmx_misc_preemption_timer_rate(vmcs_config.misc);
 
 		if (tsc_khz)
 			use_timer_freq = (u64)tsc_khz * 1000;
-- 
2.43.0



* [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
  2024-02-07 17:26 ` [PATCH v2 01/25] KVM: VMX: Cleanup VMX basic information defines and usages Xin Li
  2024-02-07 17:26 ` [PATCH v2 02/25] KVM: VMX: Cleanup VMX misc " Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 10:21   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved Xin Li
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Enable the secondary VM exit controls in preparation for FRED enabling.

The secondary VM exit controls are not activated for now; they will be
switched on when a VMX feature that needs them is enabled.
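A minimal sketch of the control adjustment, assuming, as
adjust_vmx_controls64() does, that MSR_IA32_VMX_EXIT_CTLS2 is a 64-bit
"allowed-1" MSR whose set bits are the controls that may be 1. The MSR
value is passed in as a parameter instead of being read with rdmsrl(),
and both helper names are hypothetical. Because this patch defines
KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS as 0, the result is 0 no
matter what the MSR reports.

```c
#include <assert.h>
#include <stdint.h>

#define VM_EXIT_ACTIVATE_SECONDARY_CONTROLS 0x80000000U

/*
 * Hypothetical model of adjust_vmx_controls64(): keep only the optional
 * controls that the allowed-1 MSR value permits.
 */
static inline uint64_t sketch_adjust_controls64(uint64_t ctl_opt,
						uint64_t allowed1)
{
	return ctl_opt & allowed1;
}

/*
 * Consult the secondary exit controls only when the primary exit controls
 * include "activate secondary controls" (bit 31).
 */
static inline uint64_t sketch_secondary_exit_ctls(uint32_t vmexit_ctl,
						  uint64_t ctl_opt,
						  uint64_t exit_ctls2_msr)
{
	if (!(vmexit_ctl & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS))
		return 0;
	return sketch_adjust_controls64(ctl_opt, exit_ctls2_msr);
}
```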

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Always load the secondary VM exit controls (Sean Christopherson).
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/include/asm/vmx.h       |  3 +++
 arch/x86/kvm/vmx/capabilities.h  |  9 ++++++++-
 arch/x86/kvm/vmx/vmcs.h          |  1 +
 arch/x86/kvm/vmx/vmx.c           | 17 ++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h           |  7 ++++++-
 6 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4fa2b3dd743e..ab9ec10a3fff 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1112,6 +1112,7 @@
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 #define MSR_IA32_VMX_PROCBASED_CTLS3	0x00000492
+#define MSR_IA32_VMX_EXIT_CTLS2		0x00000493
 
 /* Resctrl MSRs: */
 /* - Intel: */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 76518e21c54d..272af2004111 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -105,6 +105,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_ACTIVATE_SECONDARY_CONTROLS	0x80000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -250,6 +251,8 @@ enum vmcs_field {
 	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
 	PID_POINTER_TABLE		= 0x00002042,
 	PID_POINTER_TABLE_HIGH		= 0x00002043,
+	SECONDARY_VM_EXIT_CONTROLS	= 0x00002044,
+	SECONDARY_VM_EXIT_CONTROLS_HIGH	= 0x00002045,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index cb6588238f46..e8f3ad0f79ee 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -59,8 +59,9 @@ struct vmcs_config {
 	u32 cpu_based_exec_ctrl;
 	u32 cpu_based_2nd_exec_ctrl;
 	u64 cpu_based_3rd_exec_ctrl;
-	u32 vmexit_ctrl;
 	u32 vmentry_ctrl;
+	u32 vmexit_ctrl;
+	u64 secondary_vmexit_ctrl;
 	u64 misc;
 	struct nested_vmx_msrs nested;
 };
@@ -136,6 +137,12 @@ static inline bool cpu_has_tertiary_exec_ctrls(void)
 		CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
 }
 
+static inline bool cpu_has_secondary_vmexit_ctrls(void)
+{
+	return vmcs_config.vmexit_ctrl &
+		VM_EXIT_ACTIVATE_SECONDARY_CONTROLS;
+}
+
 static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 7c1996b433e2..7d45a6504200 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -47,6 +47,7 @@ struct vmcs_host_state {
 struct vmcs_controls_shadow {
 	u32 vm_entry;
 	u32 vm_exit;
+	u64 secondary_vm_exit;
 	u32 pin;
 	u32 exec;
 	u32 secondary_exec;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 581967d20659..4023474ea002 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2569,6 +2569,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	u32 _cpu_based_2nd_exec_control = 0;
 	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
+	u64 _secondary_vmexit_control = 0;
 	u32 _vmentry_control = 0;
 	u64 basic_msr;
 	int i;
@@ -2688,6 +2689,11 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		_vmexit_control &= ~x_ctrl;
 	}
 
+	if (_vmexit_control & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS)
+		_secondary_vmexit_control =
+			adjust_vmx_controls64(KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS,
+					      MSR_IA32_VMX_EXIT_CTLS2);
+
 	rdmsrl(MSR_IA32_VMX_BASIC, basic_msr);
 
 	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -2709,8 +2715,9 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
 	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
 	vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
-	vmcs_conf->vmexit_ctrl         = _vmexit_control;
 	vmcs_conf->vmentry_ctrl        = _vmentry_control;
+	vmcs_conf->vmexit_ctrl         = _vmexit_control;
+	vmcs_conf->secondary_vmexit_ctrl   = _secondary_vmexit_control;
 
 	rdmsrl(MSR_IA32_VMX_MISC, vmcs_conf->misc);
 
@@ -4421,6 +4428,11 @@ static u32 vmx_vmexit_ctrl(void)
 		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
 }
 
+static u64 vmx_secondary_vmexit_ctrl(void)
+{
+	return vmcs_config.secondary_vmexit_ctrl;
+}
+
 static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4766,6 +4778,9 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 
 	vm_exit_controls_set(vmx, vmx_vmexit_ctrl());
 
+	if (cpu_has_secondary_vmexit_ctrls())
+		secondary_vm_exit_controls_set(vmx, vmx_secondary_vmexit_ctrl());
+
 	/* 22.2.1, 20.8.1 */
 	vm_entry_controls_set(vmx, vmx_vmentry_ctrl());
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e3b0985bb74a..f470eeb2a5c8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -506,7 +506,11 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_ACTIVATE_SECONDARY_CONTROLS)
+
+#define KVM_REQUIRED_VMX_SECONDARY_VM_EXIT_CONTROLS (0)
+#define KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS (0)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
@@ -610,6 +614,7 @@ static __always_inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##b
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
+BUILD_CONTROLS_SHADOW(secondary_vm_exit, SECONDARY_VM_EXIT_CONTROLS, 64)
 BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32)
 BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32)
 BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32)
-- 
2.43.0
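
The 64-bit secondary VM-exit controls are modelled below. Unlike the 32-bit primary controls, a 64-bit capability MSR such as MSR_IA32_VMX_EXIT_CTLS2 is assumed here to report only allowed-1 bits. The optional controls are therefore just intersected with what the CPU permits, and they only take effect when VM_EXIT_ACTIVATE_SECONDARY_CONTROLS is set in the primary VM-exit controls. This user-space sketch takes the MSR value as a parameter instead of reading it with rdmsrl(). The sketch_* names are hypothetical and are not KVM functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value as defined in arch/x86/include/asm/vmx.h. */
#define VM_EXIT_ACTIVATE_SECONDARY_CONTROLS 0x80000000u

/*
 * Hypothetical model of adjusting a 64-bit VMX control field: a bit KVM
 * would like to use is kept only if the CPU reports it as allowed-1.
 */
static uint64_t sketch_adjust_controls64(uint64_t ctl_opt, uint64_t allowed1)
{
	return ctl_opt & allowed1;
}

/*
 * The secondary VM-exit controls are only consulted when the primary
 * VM-exit controls activate them; otherwise they are treated as 0.
 */
static uint64_t sketch_secondary_exit_ctls(uint32_t primary_exit_ctls,
					   uint64_t ctl_opt, uint64_t allowed1)
{
	if (!(primary_exit_ctls & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS))
		return 0;
	return sketch_adjust_controls64(ctl_opt, allowed1);
}
```

With the optional mask still empty, as it is at this point in the series, the result is 0 regardless of what the CPU supports.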



* [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (2 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 10:22   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config Xin Li
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

The CR4.FRED bit, i.e., CR4[32], is no longer reserved when FRED is
enumerated to the guest; otherwise it remains reserved.
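
The reserved-bit rule works like this: start from the CR4 bits that are never valid, add every feature-dependent bit the guest does not enumerate, and reject any CR4 value that sets one of them. The sketch below is a simplified model, not KVM's __cr4_reserved_bits() macro. It assumes X86_CR4_FRED is bit 32 as stated above; the base reserved mask, the feature struct and the sketch_* names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR4_FRED (1ull << 32)	/* CR4.FRED, i.e. CR4[32] */

/* Hypothetical guest CPUID enumeration, reduced to the bit of interest. */
struct sketch_features {
	bool fred;
};

static uint64_t sketch_cr4_reserved_bits(const struct sketch_features *f,
					 uint64_t base_reserved)
{
	uint64_t reserved = base_reserved;

	/* CR4.FRED is reserved unless the guest enumerates FRED. */
	if (!f->fred)
		reserved |= SKETCH_CR4_FRED;
	return reserved;
}

/* A CR4 write that sets any reserved bit is rejected (#GP). */
static bool sketch_cr4_valid(uint64_t cr4, const struct sketch_features *f,
			     uint64_t base_reserved)
{
	return !(cr4 & sketch_cr4_reserved_bits(f, base_reserved));
}
```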

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/x86.h              | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b5b2d0fde579..0d88873eba63 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -134,7 +134,7 @@
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
 			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
-			  | X86_CR4_LAM_SUP))
+			  | X86_CR4_LAM_SUP | X86_CR4_FRED))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..9a52016ebf5a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -532,6 +532,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_PCIDE;       \
 	if (!__cpu_has(__c, X86_FEATURE_LAM))           \
 		__reserved_bits |= X86_CR4_LAM_SUP;     \
+	if (!__cpu_has(__c, X86_FEATURE_FRED))          \
+		__reserved_bits |= X86_CR4_FRED;        \
 	__reserved_bits;                                \
 })
 
-- 
2.43.0



* [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (3 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 10:24   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID Xin Li
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Set up the global vmcs_config for FRED:
1) Add VM_ENTRY_LOAD_IA32_FRED to KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS so
   that a FRED CPU loads the guest FRED MSRs from the VMCS on VM entry.
2) Add SECONDARY_VM_EXIT_SAVE_IA32_FRED to
   KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS so that a FRED CPU saves
   the guest FRED MSRs to the VMCS on VM exit.
3) Add SECONDARY_VM_EXIT_LOAD_IA32_FRED to
   KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS so that a FRED CPU loads
   the host FRED MSRs from the VMCS on VM exit.
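
Listing these controls as optional rather than required means a CPU without FRED silently drops them instead of failing VMX setup. The sketch below models how a 32-bit VMX capability MSR is usually interpreted: a bit set in the low word must be 1, and a bit clear in the high word must be 0. It follows the shape of KVM's adjust_vmx_controls(), but it takes the MSR value as a parameter and uses a hypothetical sketch_* name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit value as added to arch/x86/include/asm/vmx.h by this patch. */
#define VM_ENTRY_LOAD_IA32_FRED 0x00800000u

static int sketch_adjust_controls(uint32_t ctl_min, uint32_t ctl_opt,
				  uint64_t cap_msr, uint32_t *result)
{
	uint32_t must_be_one = (uint32_t)cap_msr;	  /* allowed-0 settings */
	uint32_t may_be_one = (uint32_t)(cap_msr >> 32); /* allowed-1 settings */
	uint32_t ctl = ctl_min | ctl_opt;

	ctl &= may_be_one;	/* drop bits the CPU cannot set */
	ctl |= must_be_one;	/* force bits the CPU requires */

	/* A required control the CPU cannot enable is a hard error. */
	if (ctl_min & ~ctl)
		return -EIO;

	*result = ctl;
	return 0;
}
```

For example, with a required bit 0x4 and an optional VM_ENTRY_LOAD_IA32_FRED, a CPU whose allowed-1 word is 0x4 yields 0x4, and one that also allows bit 23 yields 0x800004.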

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Change since v1:
* Remove FRED VM entry/exit controls consistency checks in
  setup_vmcs_config() (Sean Christopherson).
---
 arch/x86/include/asm/vmx.h | 3 +++
 arch/x86/kvm/vmx/vmx.h     | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272af2004111..cb14f7e315f5 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,8 @@
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
 #define VM_EXIT_ACTIVATE_SECONDARY_CONTROLS	0x80000000
+#define SECONDARY_VM_EXIT_SAVE_IA32_FRED	0x00000001
+#define SECONDARY_VM_EXIT_LOAD_IA32_FRED	0x00000002
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -119,6 +121,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_IA32_FRED			0x00800000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f470eeb2a5c8..3ad52437f426 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -484,7 +484,8 @@ static inline u8 vmx_get_rvi(void)
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_IA32_FRED)
 
 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
@@ -510,7 +511,9 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_ACTIVATE_SECONDARY_CONTROLS)
 
 #define KVM_REQUIRED_VMX_SECONDARY_VM_EXIT_CONTROLS (0)
-#define KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS (0)
+#define KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS			\
+	     (SECONDARY_VM_EXIT_SAVE_IA32_FRED |			\
+	      SECONDARY_VM_EXIT_LOAD_IA32_FRED)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
-- 
2.43.0



* [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (4 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 11:02   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs Xin Li
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Clear the FRED VM entry/exit controls when initializing a vCPU, and set
them only if FRED is enumerated to the guest once its CPUID is set.

The FRED VM entry/exit controls need to be set to establish context
sufficient to support FRED event delivery immediately after VM entry
and exit.  However, there is no need to save/load the FRED MSRs for a
non-FRED guest, which is not supposed to access them.
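
The two-phase flow can be modelled like this: vCPU creation copies the global controls with the FRED bits masked off, and the after-set-CPUID hook then sets or clears those bits from the guest's enumeration. The struct below is a hypothetical stand-in for the per-vCPU VMCS control shadows. The guest_has_fred parameter stands in for guest_can_use(), which in KVM is only true when KVM itself supports FRED; the sketch assumes that check has already been done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ENTRY_LOAD_IA32_FRED          0x00800000u
#define SECONDARY_VM_EXIT_SAVE_IA32_FRED 0x00000001ull
#define SECONDARY_VM_EXIT_LOAD_IA32_FRED 0x00000002ull
#define SKETCH_FRED_EXIT_BITS (SECONDARY_VM_EXIT_SAVE_IA32_FRED | \
			       SECONDARY_VM_EXIT_LOAD_IA32_FRED)

/* Hypothetical stand-in for the per-vCPU VMCS control shadows. */
struct sketch_ctls {
	uint32_t vm_entry;
	uint64_t secondary_vm_exit;
};

/* vCPU creation: start from the global config with the FRED bits cleared. */
static void sketch_init_ctls(struct sketch_ctls *c, uint32_t cfg_entry,
			     uint64_t cfg_exit2)
{
	c->vm_entry = cfg_entry & ~VM_ENTRY_LOAD_IA32_FRED;
	c->secondary_vm_exit = cfg_exit2 & ~SKETCH_FRED_EXIT_BITS;
}

/* After the guest CPUID is set: save/load FRED MSRs only for FRED guests. */
static void sketch_after_set_cpuid(struct sketch_ctls *c, bool guest_has_fred)
{
	if (guest_has_fred) {
		c->vm_entry |= VM_ENTRY_LOAD_IA32_FRED;
		c->secondary_vm_exit |= SKETCH_FRED_EXIT_BITS;
	} else {
		/* CPUID may be set again without FRED, so clear explicitly. */
		c->vm_entry &= ~VM_ENTRY_LOAD_IA32_FRED;
		c->secondary_vm_exit &= ~SKETCH_FRED_EXIT_BITS;
	}
}
```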

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() (Chao Gao).
* Clear FRED VM entry/exit controls if FRED is not enumerated (Chao Gao).
* Use guest_can_use() to trace FRED enumeration in a vcpu (Chao Gao).
---
 arch/x86/kvm/governed_features.h |  1 +
 arch/x86/kvm/vmx/vmx.c           | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h
index ad463b1ed4e4..507ca73e52e9 100644
--- a/arch/x86/kvm/governed_features.h
+++ b/arch/x86/kvm/governed_features.h
@@ -17,6 +17,7 @@ KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD)
 KVM_GOVERNED_X86_FEATURE(VGIF)
 KVM_GOVERNED_X86_FEATURE(VNMI)
 KVM_GOVERNED_X86_FEATURE(LAM)
+KVM_GOVERNED_X86_FEATURE(FRED)
 
 #undef KVM_GOVERNED_X86_FEATURE
 #undef KVM_GOVERNED_FEATURE
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4023474ea002..34b6676f60d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4402,6 +4402,9 @@ static u32 vmx_vmentry_ctrl(void)
 	if (cpu_has_perf_global_ctrl_bug())
 		vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 
+	/* Whether to load guest FRED MSRs is deferred until after set CPUID */
+	vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_FRED;
+
 	return vmentry_ctrl;
 }
 
@@ -4430,7 +4433,13 @@ static u32 vmx_vmexit_ctrl(void)
 
 static u64 vmx_secondary_vmexit_ctrl(void)
 {
-	return vmcs_config.secondary_vmexit_ctrl;
+	u64 secondary_vmexit_ctrl = vmcs_config.secondary_vmexit_ctrl;
+
+	/* Whether to save/load FRED MSRs is deferred until after set CPUID */
+	secondary_vmexit_ctrl &= ~(SECONDARY_VM_EXIT_SAVE_IA32_FRED |
+				   SECONDARY_VM_EXIT_LOAD_IA32_FRED);
+
+	return secondary_vmexit_ctrl;
 }
 
 static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
@@ -7762,10 +7771,31 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_FRED);
+
+	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
+		vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
+		secondary_vm_exit_controls_setbit(vmx,
+						  SECONDARY_VM_EXIT_SAVE_IA32_FRED |
+						  SECONDARY_VM_EXIT_LOAD_IA32_FRED);
+	} else {
+		vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
+		secondary_vm_exit_controls_clearbit(vmx,
+						    SECONDARY_VM_EXIT_SAVE_IA32_FRED |
+						    SECONDARY_VM_EXIT_LOAD_IA32_FRED);
+	}
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	vmx_vcpu_config_fred_after_set_cpuid(vcpu);
+
 	/*
 	 * XSAVES is effectively enabled if and only if XSAVE is also exposed
 	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
-- 
2.43.0



* [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (5 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 13:35   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields Xin Li
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add the FRED MSRs to the list of valid passthrough MSRs, and set
interception of the FRED MSRs based on whether FRED is enumerated to
the guest.
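
MSR_IA32_FRED_RSP0 through MSR_IA32_FRED_CONFIG form a contiguous range of nine MSRs, so the policy of passing them through to FRED guests and intercepting them for everyone else can be modelled as one loop over that range. The MSR index values below are assumed to match arch/x86/include/asm/msr-index.h. The intercept array and the sketch_* helpers are hypothetical and do not model the real VMX MSR bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR indices assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_FRED_RSP0   0x1cc
#define MSR_IA32_FRED_CONFIG 0x1d4

#define SKETCH_NR_FRED_MSRS (MSR_IA32_FRED_CONFIG - MSR_IA32_FRED_RSP0 + 1)

/* Hypothetical per-vCPU intercept state for the FRED MSR range only. */
struct sketch_msr_intercepts {
	bool intercept[SKETCH_NR_FRED_MSRS];
};

static bool sketch_is_fred_msr(uint32_t msr)
{
	return msr >= MSR_IA32_FRED_RSP0 && msr <= MSR_IA32_FRED_CONFIG;
}

/* Pass the FRED MSRs through to FRED guests; intercept them otherwise. */
static void sketch_update_fred_intercepts(struct sketch_msr_intercepts *s,
					  bool fred_enumerated)
{
	for (int i = 0; i < SKETCH_NR_FRED_MSRS; i++)
		s->intercept[i] = !fred_enumerated;
}

static bool sketch_msr_intercepted(const struct sketch_msr_intercepts *s,
				   uint32_t msr)
{
	/* MSRs outside the FRED range are out of scope: report intercepted. */
	if (!sketch_is_fred_msr(msr))
		return true;
	return s->intercept[msr - MSR_IA32_FRED_RSP0];
}
```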

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Change since v1:
* Enable FRED MSRs intercept if FRED is no longer enumerated in CPUID
  (Chao Gao).
---
 arch/x86/kvm/vmx/vmx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 34b6676f60d8..d58ed2d3d379 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -693,6 +693,9 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
+		/* FRED MSRs should be passthrough to FRED guests only */
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7774,10 +7777,12 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool fred_enumerated;
 
 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_FRED);
+	fred_enumerated = guest_can_use(vcpu, X86_FEATURE_FRED);
 
-	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
+	if (fred_enumerated) {
 		vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
 		secondary_vm_exit_controls_setbit(vmx,
 						  SECONDARY_VM_EXIT_SAVE_IA32_FRED |
@@ -7788,6 +7793,16 @@ static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
 						    SECONDARY_VM_EXIT_SAVE_IA32_FRED |
 						    SECONDARY_VM_EXIT_LOAD_IA32_FRED);
 	}
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP0, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP1, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP2, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP3, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_STKLVLS, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP1, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP2, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP3, MSR_TYPE_RW, !fred_enumerated);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_CONFIG, MSR_TYPE_RW, !fred_enumerated);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
-- 
2.43.0



* [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (6 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 14:01   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest Xin Li
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Initialize the host VMCS FRED fields with the values of the host FRED
MSRs, and the guest VMCS FRED fields to 0.

FRED CPU state is managed in 9 new FRED MSRs, as well as a few existing
CPU registers and MSRs, e.g., CR4.FRED.  To support FRED context
management, new VMCS fields corresponding to most of the FRED CPU state
MSRs are added to both the host-state and guest-state areas of the
VMCS.

Specifically, no VMCS fields are added for the FRED RSP0 and SSP0 MSRs,
because these two MSRs are used only during event delivery from ring 3.
KVM, running in ring 0, can therefore run safely with the guest FRED
RSP0 and SSP0 loaded, and loading the host FRED RSP0 and SSP0 can be
deferred until just before returning to user level.
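
The new field encodings (0x281a, 0x281c, ... and 0x2c08, 0x2c0a, ...) step by 2 because they follow the VMCS field-encoding layout described in the Intel SDM. Bit 0 selects full or high-half access, bits 9:1 are the index, bits 11:10 are the field type (control, exit information, guest state, host state), and bits 14:13 are the width (16-bit, 64-bit, 32-bit, natural). An odd encoding would be the high 32 bits of the preceding 64-bit field. A small decoder, using hypothetical sketch_* names:

```c
#include <assert.h>
#include <stdint.h>

/* Two of the encodings added to enum vmcs_field by this patch. */
#define GUEST_IA32_FRED_CONFIG 0x0000281a
#define HOST_IA32_FRED_CONFIG  0x00002c08

/* Field-encoding layout per the Intel SDM VMCS encoding appendix. */
enum { SKETCH_W16 = 0, SKETCH_W64 = 1, SKETCH_W32 = 2, SKETCH_WNATURAL = 3 };
enum { SKETCH_T_CONTROL = 0, SKETCH_T_EXIT_INFO = 1,
       SKETCH_T_GUEST = 2, SKETCH_T_HOST = 3 };

static unsigned sketch_field_access_high(uint32_t enc) { return enc & 1; }
static unsigned sketch_field_index(uint32_t enc) { return (enc >> 1) & 0x1ff; }
static unsigned sketch_field_type(uint32_t enc) { return (enc >> 10) & 3; }
static unsigned sketch_field_width(uint32_t enc) { return (enc >> 13) & 3; }
```

Decoding GUEST_IA32_FRED_CONFIG gives a 64-bit guest-state field with index 13. HOST_IA32_FRED_CONFIG is a 64-bit host-state field with index 4.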

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() to decouple
  KVM's capability to virtualize a feature and host's enabling of a
  feature (Chao Gao).
* Move guest FRED states init into __vmx_vcpu_reset() (Chao Gao).
---
 arch/x86/include/asm/vmx.h | 16 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cb14f7e315f5..4889754415b5 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -280,12 +280,28 @@ enum vmcs_field {
 	GUEST_BNDCFGS_HIGH              = 0x00002813,
 	GUEST_IA32_RTIT_CTL		= 0x00002814,
 	GUEST_IA32_RTIT_CTL_HIGH	= 0x00002815,
+	GUEST_IA32_FRED_CONFIG		= 0x0000281a,
+	GUEST_IA32_FRED_RSP1		= 0x0000281c,
+	GUEST_IA32_FRED_RSP2		= 0x0000281e,
+	GUEST_IA32_FRED_RSP3		= 0x00002820,
+	GUEST_IA32_FRED_STKLVLS		= 0x00002822,
+	GUEST_IA32_FRED_SSP1		= 0x00002824,
+	GUEST_IA32_FRED_SSP2		= 0x00002826,
+	GUEST_IA32_FRED_SSP3		= 0x00002828,
 	HOST_IA32_PAT			= 0x00002c00,
 	HOST_IA32_PAT_HIGH		= 0x00002c01,
 	HOST_IA32_EFER			= 0x00002c02,
 	HOST_IA32_EFER_HIGH		= 0x00002c03,
 	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
 	HOST_IA32_PERF_GLOBAL_CTRL_HIGH	= 0x00002c05,
+	HOST_IA32_FRED_CONFIG		= 0x00002c08,
+	HOST_IA32_FRED_RSP1		= 0x00002c0a,
+	HOST_IA32_FRED_RSP2		= 0x00002c0c,
+	HOST_IA32_FRED_RSP3		= 0x00002c0e,
+	HOST_IA32_FRED_STKLVLS		= 0x00002c10,
+	HOST_IA32_FRED_SSP1		= 0x00002c12,
+	HOST_IA32_FRED_SSP2		= 0x00002c14,
+	HOST_IA32_FRED_SSP3		= 0x00002c16,
 	PIN_BASED_VM_EXEC_CONTROL       = 0x00004000,
 	CPU_BASED_VM_EXEC_CONTROL       = 0x00004002,
 	EXCEPTION_BITMAP                = 0x00004004,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d58ed2d3d379..b7b772183ee4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1470,6 +1470,18 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 				    (unsigned long)(cpu_entry_stack(cpu) + 1));
 		}
 
+#ifdef CONFIG_X86_64
+		/* Per-CPU FRED MSRs */
+		if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+			vmcs_write64(HOST_IA32_FRED_RSP1, read_msr(MSR_IA32_FRED_RSP1));
+			vmcs_write64(HOST_IA32_FRED_RSP2, read_msr(MSR_IA32_FRED_RSP2));
+			vmcs_write64(HOST_IA32_FRED_RSP3, read_msr(MSR_IA32_FRED_RSP3));
+			vmcs_write64(HOST_IA32_FRED_SSP1, read_msr(MSR_IA32_FRED_SSP1));
+			vmcs_write64(HOST_IA32_FRED_SSP2, read_msr(MSR_IA32_FRED_SSP2));
+			vmcs_write64(HOST_IA32_FRED_SSP3, read_msr(MSR_IA32_FRED_SSP3));
+		}
+#endif
+
 		vmx->loaded_vmcs->cpu = cpu;
 	}
 }
@@ -4321,6 +4333,15 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 	 */
 	vmcs_write16(HOST_DS_SELECTOR, 0);
 	vmcs_write16(HOST_ES_SELECTOR, 0);
+
+	/*
+	 * FRED MSRs are per-cpu, however FRED CONFIG and STKLVLS MSRs
+	 * are the same on all CPUs, thus they are initialized here.
+	 */
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		vmcs_write64(HOST_IA32_FRED_CONFIG, read_msr(MSR_IA32_FRED_CONFIG));
+		vmcs_write64(HOST_IA32_FRED_STKLVLS, read_msr(MSR_IA32_FRED_STKLVLS));
+	}
 #else
 	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
 	vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
@@ -4865,6 +4886,19 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	 */
 	vmx->pi_desc.nv = POSTED_INTR_VECTOR;
 	vmx->pi_desc.sn = 1;
+
+#ifdef CONFIG_X86_64
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		vmcs_write64(GUEST_IA32_FRED_CONFIG, 0);
+		vmcs_write64(GUEST_IA32_FRED_RSP1, 0);
+		vmcs_write64(GUEST_IA32_FRED_RSP2, 0);
+		vmcs_write64(GUEST_IA32_FRED_RSP3, 0);
+		vmcs_write64(GUEST_IA32_FRED_STKLVLS, 0);
+		vmcs_write64(GUEST_IA32_FRED_SSP1, 0);
+		vmcs_write64(GUEST_IA32_FRED_SSP2, 0);
+		vmcs_write64(GUEST_IA32_FRED_SSP3, 0);
+	}
+#endif
 }
 
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
-- 
2.43.0



* [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (7 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-19 14:23   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore Xin Li
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Switch MSR_IA32_FRED_RSP0 between host and guest in
vmx_prepare_switch_to_{host,guest}().

MSR_IA32_FRED_RSP0 is used only during event delivery from ring 3, so
KVM, running in ring 0, can run safely with the guest FRED RSP0 loaded;
there is no need to switch between the host and guest FRED RSP0 on
every VM entry and exit.

Instead, KVM should switch to the host FRED RSP0 before returning to
user level, and switch to the guest FRED RSP0 before entering guest
mode.
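
This lazy switch can be modelled with a simulated MSR. The host value, the top of the task stack, is cached once. The guest value is written to the MSR when switching to guest state and read back when switching to host state, because a FRED guest may have written the passed-through MSR in the meantime. Everything here is a hypothetical user-space model. The guest_state_loaded check is folded into the functions for brevity, and the real vmx_prepare_switch_to_{host,guest}() also switch other state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated MSR_IA32_FRED_RSP0 of the current CPU (hypothetical). */
static uint64_t sketch_hw_fred_rsp0;

struct sketch_vcpu {
	uint64_t host_rsp0;	/* cached once: top of the task stack */
	uint64_t guest_rsp0;	/* guest value while host state is loaded */
	bool guest_state_loaded;
};

/* Models the RSP0 part of vmx_prepare_switch_to_guest(). */
static void sketch_switch_to_guest(struct sketch_vcpu *v)
{
	if (v->guest_state_loaded)
		return;
	if (v->host_rsp0 == 0)
		v->host_rsp0 = sketch_hw_fred_rsp0;
	sketch_hw_fred_rsp0 = v->guest_rsp0;
	v->guest_state_loaded = true;
}

/* Models the RSP0 part of vmx_prepare_switch_to_host(). */
static void sketch_switch_to_host(struct sketch_vcpu *v)
{
	if (!v->guest_state_loaded)
		return;
	v->guest_rsp0 = sketch_hw_fred_rsp0;	/* guest may have changed it */
	sketch_hw_fred_rsp0 = v->host_rsp0;
	v->guest_state_loaded = false;
}
```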

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Don't use guest_cpuid_has() in vmx_prepare_switch_to_{host,guest}(),
  which are called from IRQ-disabled context (Chao Gao).
* Reset msr_guest_fred_rsp0 in __vmx_vcpu_reset() (Chao Gao).
---
 arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b7b772183ee4..264378c3b784 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1337,6 +1337,16 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	}
 
 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+
+	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
+		/*
+		 * MSR_IA32_FRED_RSP0 is top of task stack, which never changes.
+		 * Thus it should be initialized only once.
+		 */
+		if (unlikely(vmx->msr_host_fred_rsp0 == 0))
+			vmx->msr_host_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_guest_fred_rsp0);
+	}
 #else
 	savesegment(fs, fs_sel);
 	savesegment(gs, gs_sel);
@@ -1381,6 +1391,11 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 	invalidate_tss_limit();
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+
+	if (guest_can_use(&vmx->vcpu, X86_FEATURE_FRED)) {
+		vmx->msr_guest_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_host_fred_rsp0);
+	}
 #endif
 	load_fixmap_gdt(raw_smp_processor_id());
 	vmx->guest_state_loaded = false;
@@ -4889,6 +4904,8 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
 #ifdef CONFIG_X86_64
 	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		vmx->msr_guest_fred_rsp0 = 0;
+
 		vmcs_write64(GUEST_IA32_FRED_CONFIG, 0);
 		vmcs_write64(GUEST_IA32_FRED_RSP1, 0);
 		vmcs_write64(GUEST_IA32_FRED_RSP2, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 3ad52437f426..176ad39be406 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -278,6 +278,8 @@ struct vcpu_vmx {
 #ifdef CONFIG_X86_64
 	u64		      msr_host_kernel_gs_base;
 	u64		      msr_guest_kernel_gs_base;
+	u64		      msr_host_fred_rsp0;
+	u64		      msr_guest_fred_rsp0;
 #endif
 
 	u64		      spec_ctrl;
-- 
2.43.0



* [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (8 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-29  6:31   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled() Xin Li
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Handle host-initiated FRED MSR access requests so that FRED context can
be set and retrieved from user level.

During VM save/restore and live migration, FRED context needs to be
saved and restored, which requires the FRED MSRs to be accessed from a
user-level application, e.g., QEMU.

Note, handling of MSR_IA32_FRED_SSP0, i.e., MSR_IA32_PL0_SSP, is not
added here; it is done in the KVM CET patch set.
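
The write-side checks added to __kvm_set_msr() are modelled below. Every FRED MSR in the range except STKLVLS must hold a canonical value, RSP1-RSP3 must be 64-byte aligned (bits 5:0 clear), and SSP1-SSP3 must be 8-byte aligned (bits 2:0 clear). The MSR index values are assumed to match arch/x86/include/asm/msr-index.h. The canonical check takes the linear-address width as a parameter (48, or 57 with LA57) and relies on arithmetic right shift of signed values, as GCC and Clang implement it. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR indices assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_FRED_RSP0    0x1cc
#define MSR_IA32_FRED_RSP1    0x1cd
#define MSR_IA32_FRED_RSP3    0x1cf
#define MSR_IA32_FRED_STKLVLS 0x1d0
#define MSR_IA32_FRED_SSP1    0x1d1
#define MSR_IA32_FRED_SSP3    0x1d3

/* Canonical iff bits 63..(va_bits - 1) all equal bit (va_bits - 1). */
static bool sketch_is_canonical(uint64_t addr, unsigned va_bits)
{
	unsigned shift = 64 - va_bits;

	return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
}

static bool sketch_fred_msr_write_ok(uint32_t index, uint64_t data,
				     unsigned va_bits)
{
	/* STKLVLS holds stack levels, not an address. */
	if (index != MSR_IA32_FRED_STKLVLS && !sketch_is_canonical(data, va_bits))
		return false;
	/* RSPx: stack pointers must be 64-byte aligned. */
	if (index >= MSR_IA32_FRED_RSP0 && index <= MSR_IA32_FRED_RSP3 &&
	    (data & 0x3f))
		return false;
	/* SSPx: shadow-stack pointers must be 8-byte aligned. */
	if (index >= MSR_IA32_FRED_SSP1 && index <= MSR_IA32_FRED_SSP3 &&
	    (data & 0x7))
		return false;
	return true;
}
```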

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() (Chao Gao).
* Fail host requested FRED MSRs access if KVM cannot virtualize FRED
  (Chao Gao).
* Handle the case FRED MSRs are valid but KVM cannot virtualize FRED
  (Chao Gao).
* Add sanity checks when writing to FRED MSRs.
---
 arch/x86/kvm/vmx/vmx.c | 72 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     | 47 +++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 264378c3b784..ee61d2c25cb0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1420,6 +1420,24 @@ static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
 	preempt_enable();
 	vmx->msr_guest_kernel_gs_base = data;
 }
+
+static u64 vmx_read_guest_fred_rsp0(struct vcpu_vmx *vmx)
+{
+	preempt_disable();
+	if (vmx->guest_state_loaded)
+		vmx->msr_guest_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
+	preempt_enable();
+	return vmx->msr_guest_fred_rsp0;
+}
+
+static void vmx_write_guest_fred_rsp0(struct vcpu_vmx *vmx, u64 data)
+{
+	preempt_disable();
+	if (vmx->guest_state_loaded)
+		wrmsrl(MSR_IA32_FRED_RSP0, data);
+	preempt_enable();
+	vmx->msr_guest_fred_rsp0 = data;
+}
 #endif
 
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
@@ -2019,6 +2037,33 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KERNEL_GS_BASE:
 		msr_info->data = vmx_read_guest_kernel_gs_base(vmx);
 		break;
+	case MSR_IA32_FRED_RSP0:
+		msr_info->data = vmx_read_guest_fred_rsp0(vmx);
+		break;
+	case MSR_IA32_FRED_RSP1:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP1);
+		break;
+	case MSR_IA32_FRED_RSP2:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP2);
+		break;
+	case MSR_IA32_FRED_RSP3:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP3);
+		break;
+	case MSR_IA32_FRED_STKLVLS:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_STKLVLS);
+		break;
+	case MSR_IA32_FRED_SSP1:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP1);
+		break;
+	case MSR_IA32_FRED_SSP2:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP2);
+		break;
+	case MSR_IA32_FRED_SSP3:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP3);
+		break;
+	case MSR_IA32_FRED_CONFIG:
+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_CONFIG);
+		break;
 #endif
 	case MSR_EFER:
 		return kvm_get_msr_common(vcpu, msr_info);
@@ -2226,6 +2271,33 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx_update_exception_bitmap(vcpu);
 		}
 		break;
+	case MSR_IA32_FRED_RSP0:
+		vmx_write_guest_fred_rsp0(vmx, data);
+		break;
+	case MSR_IA32_FRED_RSP1:
+		vmcs_write64(GUEST_IA32_FRED_RSP1, data);
+		break;
+	case MSR_IA32_FRED_RSP2:
+		vmcs_write64(GUEST_IA32_FRED_RSP2, data);
+		break;
+	case MSR_IA32_FRED_RSP3:
+		vmcs_write64(GUEST_IA32_FRED_RSP3, data);
+		break;
+	case MSR_IA32_FRED_STKLVLS:
+		vmcs_write64(GUEST_IA32_FRED_STKLVLS, data);
+		break;
+	case MSR_IA32_FRED_SSP1:
+		vmcs_write64(GUEST_IA32_FRED_SSP1, data);
+		break;
+	case MSR_IA32_FRED_SSP2:
+		vmcs_write64(GUEST_IA32_FRED_SSP2, data);
+		break;
+	case MSR_IA32_FRED_SSP3:
+		vmcs_write64(GUEST_IA32_FRED_SSP3, data);
+		break;
+	case MSR_IA32_FRED_CONFIG:
+		vmcs_write64(GUEST_IA32_FRED_CONFIG, data);
+		break;
 #endif
 	case MSR_IA32_SYSENTER_CS:
 		if (is_guest_mode(vcpu))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 363b1c080205..4e8d60f248e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1451,6 +1451,9 @@ static const u32 msrs_to_save_base[] = {
 	MSR_STAR,
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
+	MSR_IA32_FRED_RSP0, MSR_IA32_FRED_RSP1, MSR_IA32_FRED_RSP2,
+	MSR_IA32_FRED_RSP3, MSR_IA32_FRED_STKLVLS, MSR_IA32_FRED_SSP1,
+	MSR_IA32_FRED_SSP2, MSR_IA32_FRED_SSP3, MSR_IA32_FRED_CONFIG,
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
@@ -1892,6 +1895,30 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 			return 1;
 
 		data = (u32)data;
+		break;
+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
+		if (index != MSR_IA32_FRED_STKLVLS && is_noncanonical_address(data, vcpu))
+			return 1;
+		if ((index >= MSR_IA32_FRED_RSP0 && index <= MSR_IA32_FRED_RSP3) &&
+		    (data & GENMASK_ULL(5, 0)))
+			return 1;
+		if ((index >= MSR_IA32_FRED_SSP1 && index <= MSR_IA32_FRED_SSP3) &&
+		    (data & GENMASK_ULL(2, 0)))
+			return 1;
+
+		if (host_initiated) {
+			if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
+				return 1;
+		} else {
+			/*
+			 * Inject #GP upon FRED MSRs accesses from a non-FRED guest,
+			 * which also ensures no malicious guest can write to FRED
+			 * MSRs to corrupt host FRED MSRs.
+			 */
+			if (!guest_can_use(vcpu, X86_FEATURE_FRED))
+				return 1;
+		}
+
 		break;
 	}
 
@@ -1936,6 +1963,22 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
+		if (host_initiated) {
+			if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
+				return 1;
+		} else {
+			/*
+			 * Inject #GP upon FRED MSRs accesses from a non-FRED guest,
+			 * which also ensures no malicious guest can write to FRED
+			 * MSRs to corrupt host FRED MSRs.
+			 */
+			if (!guest_can_use(vcpu, X86_FEATURE_FRED))
+				return 1;
+		}
+
+		break;
+
 	}
 
 	msr.index = index;
@@ -7364,6 +7407,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
+		if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.43.0



* [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled()
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (9 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-29  8:24   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 12/25] KVM: VMX: Handle FRED event data Xin Li
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add kvm_is_fred_enabled() to check whether FRED is enabled on a vCPU.

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Change since v1:
* Explain why it is ok to only check CR4.FRED (Chao Gao).
---
 arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 75eae9c4998a..1d431c703fdf 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -187,6 +187,23 @@ static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
 	return !!kvm_read_cr4_bits(vcpu, cr4_bit);
 }
 
+/*
+ * It's enough to check just CR4.FRED (X86_CR4_FRED) to tell if
+ * a vCPU is running with FRED enabled, because:
+ * 1) CR4.FRED can be set to 1 only _after_ IA32_EFER.LMA = 1.
+ * 2) To leave IA-32e mode, CR4.FRED must be cleared first.
+ *
+ * See FRED spec 6.0, Section 4.2 "Enabling in CR4", for details.
+ */
+static __always_inline bool kvm_is_fred_enabled(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+	return kvm_is_cr4_bit_set(vcpu, X86_CR4_FRED);
+#else
+	return false;
+#endif
+}
+
 static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
-- 
2.43.0



* [PATCH v2 12/25] KVM: VMX: Handle FRED event data
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (10 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled() Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-30  3:14   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED Xin Li
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Set the injected-event data when injecting a #PF, a #DB, or a #NM
caused by extended feature disable using FRED event delivery, and
save the original-event data so that it can be reused as injected-event
data.

Unlike IDT event delivery, which uses an extra CPU register as part of
the event context (e.g., %cr2 for #PF), FRED saves a complete event
context in its stack frame; e.g., FRED saves the faulting linear
address of a #PF into the event data field of its stack frame.

Thus a new VMX control field called injected-event data is added
to provide the event data that will be pushed onto a FRED stack
frame for VM entries that inject an event using FRED event delivery.
In addition, a new VM exit information field called original-event
data is added to store the event data that would have been saved
into a FRED stack frame for VM exits that occur during FRED event
delivery. After such a VM exit has been handled, the data in the
original-event data VMCS field must be copied into the
injected-event data VMCS field so that the original event is
reinjected with the same event data.

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Change since v1:
* Document event data should be equal to CR2/DR6/IA32_XFD_ERR instead
  of using WARN_ON() (Chao Gao).
* Zero event data if a #NM was not caused by extended feature disable
  (Chao Gao).
---
 arch/x86/include/asm/vmx.h |   4 ++
 arch/x86/kvm/vmx/vmx.c     | 109 ++++++++++++++++++++++++++++---------
 arch/x86/kvm/vmx/vmx.h     |   1 +
 arch/x86/kvm/x86.c         |  10 +++-
 4 files changed, 95 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4889754415b5..6b796c5c9c2b 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -256,8 +256,12 @@ enum vmcs_field {
 	PID_POINTER_TABLE_HIGH		= 0x00002043,
 	SECONDARY_VM_EXIT_CONTROLS	= 0x00002044,
 	SECONDARY_VM_EXIT_CONTROLS_HIGH	= 0x00002045,
+	INJECTED_EVENT_DATA		= 0x00002052,
+	INJECTED_EVENT_DATA_HIGH	= 0x00002053,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
+	ORIGINAL_EVENT_DATA		= 0x00002404,
+	ORIGINAL_EVENT_DATA_HIGH	= 0x00002405,
 	VMCS_LINK_POINTER               = 0x00002800,
 	VMCS_LINK_POINTER_HIGH          = 0x00002801,
 	GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ee61d2c25cb0..f622fb90a098 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1871,9 +1871,29 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
 			     vmx->vcpu.arch.event_exit_inst_len);
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
-	} else
+	} else {
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
+		if (kvm_is_fred_enabled(vcpu)) {
+			u64 event_data = 0;
+
+			if (is_debug(intr_info))
+				/*
+				 * Compared to DR6, FRED #DB event data saved on
+				 * the stack frame has bits 4 ~ 11 and 16 ~ 31
+				 * inverted, i.e.,
+				 *   fred_db_event_data = dr6 ^ 0xFFFF0FF0UL
+				 */
+				event_data = vcpu->arch.dr6 ^ DR6_RESERVED;
+			else if (is_page_fault(intr_info))
+				event_data = vcpu->arch.cr2;
+			else if (is_nm_fault(intr_info))
+				event_data = to_vmx(vcpu)->fred_xfd_event_data;
+
+			vmcs_write64(INJECTED_EVENT_DATA, event_data);
+		}
+	}
+
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
 
 	vmx_clear_hlt(vcpu);
@@ -7082,8 +7102,11 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 	 *
 	 * Queuing exception is done in vmx_handle_exit. See comment there.
 	 */
-	if (vcpu->arch.guest_fpu.fpstate->xfd)
+	if (vcpu->arch.guest_fpu.fpstate->xfd) {
 		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
+		to_vmx(vcpu)->fred_xfd_event_data = vcpu->arch.cr0 & X86_CR0_TS
+			? 0 : vcpu->arch.guest_fpu.xfd_err;
+	}
 }
 
 static void handle_exception_irqoff(struct vcpu_vmx *vmx)
@@ -7199,29 +7222,28 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 					      vmx->loaded_vmcs->entry_time));
 }
 
-static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
-				      u32 idt_vectoring_info,
-				      int instr_len_field,
-				      int error_code_field)
+static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu, bool vectoring)
 {
-	u8 vector;
-	int type;
-	bool idtv_info_valid;
-
-	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
+	u32 event_id = vectoring ? to_vmx(vcpu)->idt_vectoring_info
+				 : vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+	int instr_len_field = vectoring ? VM_EXIT_INSTRUCTION_LEN
+					: VM_ENTRY_INSTRUCTION_LEN;
+	int error_code_field = vectoring ? IDT_VECTORING_ERROR_CODE
+					 : VM_ENTRY_EXCEPTION_ERROR_CODE;
+	int event_data_field = vectoring ? ORIGINAL_EVENT_DATA
+					 : INJECTED_EVENT_DATA;
+	u8 vector = event_id & INTR_INFO_VECTOR_MASK;
+	int type = event_id & INTR_INFO_INTR_TYPE_MASK;
 
 	vcpu->arch.nmi_injected = false;
 	kvm_clear_exception_queue(vcpu);
 	kvm_clear_interrupt_queue(vcpu);
 
-	if (!idtv_info_valid)
+	if (!(event_id & INTR_INFO_VALID_MASK))
 		return;
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
-	vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
-	type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-
 	switch (type) {
 	case INTR_TYPE_NMI_INTR:
 		vcpu->arch.nmi_injected = true;
@@ -7236,10 +7258,31 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
 		fallthrough;
 	case INTR_TYPE_HARD_EXCEPTION:
-		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			u32 err = vmcs_read32(error_code_field);
-			kvm_requeue_exception_e(vcpu, vector, err);
-		} else
+		if (kvm_is_fred_enabled(vcpu)) {
+			/* Save event data for being used as injected-event data */
+			u64 event_data = vmcs_read64(event_data_field);
+
+			switch (vector) {
+			case DB_VECTOR:
+				/* %dr6 should be equal to (event_data ^ DR6_RESERVED) */
+				vcpu->arch.dr6 = event_data ^ DR6_RESERVED;
+				break;
+			case NM_VECTOR:
+				to_vmx(vcpu)->fred_xfd_event_data = event_data;
+				break;
+			case PF_VECTOR:
+				/* %cr2 should be equal to event_data */
+				vcpu->arch.cr2 = event_data;
+				break;
+			default:
+				WARN_ON(event_data != 0);
+				break;
+			}
+		}
+
+		if (event_id & INTR_INFO_DELIVER_CODE_MASK)
+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field));
+		else
 			kvm_requeue_exception(vcpu, vector);
 		break;
 	case INTR_TYPE_SOFT_INTR:
@@ -7255,18 +7298,12 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
-	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
-				  VM_EXIT_INSTRUCTION_LEN,
-				  IDT_VECTORING_ERROR_CODE);
+	__vmx_complete_interrupts(&vmx->vcpu, true);
 }
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
-	__vmx_complete_interrupts(vcpu,
-				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
-				  VM_ENTRY_INSTRUCTION_LEN,
-				  VM_ENTRY_EXCEPTION_ERROR_CODE);
-
+	__vmx_complete_interrupts(vcpu, false);
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
 }
 
@@ -7382,6 +7419,24 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vmx_disable_fb_clear(vmx);
 
+	/*
+	 * %cr2 needs to be saved after a VM exit and restored before a VM
+	 * entry in case a VM exit happens immediately after delivery of a
+	 * guest #PF but before guest reads %cr2.
+	 *
+	 * A FRED guest should read its #PF faulting linear address from
+	 * the event data field in its FRED stack frame instead of %cr2.
+	 * But the FRED 5.0 spec still requires a FRED CPU to update %cr2
+	 * in the normal way, thus %cr2 is still updated even for a FRED
+	 * guest.
+	 *
+	 * Note, an NMI could interrupt KVM:
+	 *   1) after VM exit but before CR2 is saved.
+	 *   2) after CR2 is restored but before VM entry.
+	 * And a #PF could happen during NMI handling, which overwrites %cr2.
+	 * Thus exc_nmi() should save %cr2 on entry and restore it before
+	 * leaving, to make sure %cr2 is not corrupted.
+	 */
 	if (vcpu->arch.cr2 != native_read_cr2())
 		native_write_cr2(vcpu->arch.cr2);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 176ad39be406..d5738c5a4814 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -266,6 +266,7 @@ struct vcpu_vmx {
 	u32                   exit_intr_info;
 	u32                   idt_vectoring_info;
 	ulong                 rflags;
+	u64                   fred_xfd_event_data;
 
 	/*
 	 * User return MSRs are always emulated when enabled in the guest, but
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e8d60f248e3..00c0062726ae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -680,8 +680,14 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.injected = true;
 			if (WARN_ON_ONCE(has_payload)) {
 				/*
-				 * A reinjected event has already
-				 * delivered its payload.
+				 * For a reinjected event, KVM delivers its
+				 * payload through:
+				 *   #PF: save %cr2 into arch.cr2 immediately
+				 *        after VM exits.
+				 *   #DB: save %dr6 into arch.dr6 later in
+				 *        sync_dirty_debug_regs().
+				 *
+				 * For FRED guest, see __vmx_complete_interrupts().
 				 */
 				has_payload = false;
 				payload = 0;
-- 
2.43.0



* [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (11 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 12/25] KVM: VMX: Handle FRED event data Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-30  7:34   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail Xin Li
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Set the VMX nested-exception bit in the VM-entry interruption-information
VMCS field when injecting a nested exception using FRED event delivery,
to ensure that:
  1) The nested exception is injected on the correct stack level.
  2) The nested bit defined in the FRED stack frame is set.

The event stack level used by FRED event delivery depends on whether the
event was a nested exception encountered during delivery of another event,
because a nested exception is "regarded" as happening in ring 0.  E.g.,
when #PF is configured to use stack level 1 in the IA32_FRED_STKLVLS MSR:
  - a nested #PF encountered in ring 3 is delivered on stack level 1.
  - a normal #PF encountered in ring 3 is delivered on stack level 0.

The VMX nested-exception support ensures the correct event stack level is
chosen when a VM entry injects a nested exception.

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Set the nested flag when there is an original interrupt (Chao Gao).
---
 arch/x86/include/asm/kvm_host.h |  6 +++--
 arch/x86/include/asm/vmx.h      |  5 ++--
 arch/x86/kvm/svm/svm.c          |  4 +--
 arch/x86/kvm/vmx/vmx.c          |  8 ++++--
 arch/x86/kvm/x86.c              | 46 ++++++++++++++++++++++++++-------
 arch/x86/kvm/x86.h              |  1 +
 6 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0d88873eba63..ef278ee0b6ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -736,6 +736,7 @@ struct kvm_queued_exception {
 	u32 error_code;
 	unsigned long payload;
 	bool has_payload;
+	bool nested;
 };
 
 struct kvm_vcpu_arch {
@@ -2060,8 +2061,9 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested);
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, bool nested);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 6b796c5c9c2b..68af74e48788 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -134,7 +134,7 @@
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
-
+#define VMX_BASIC_NESTED_EXCEPTION		BIT_ULL(58)
 
 /* VMX_MISC bits and bitmasks */
 #define VMX_MISC_INTEL_PT			BIT_ULL(14)
@@ -407,8 +407,9 @@ enum vmcs_field {
 #define INTR_INFO_INTR_TYPE_MASK        0x700           /* 10:8 */
 #define INTR_INFO_DELIVER_CODE_MASK     0x800           /* 11 */
 #define INTR_INFO_UNBLOCK_NMI		0x1000		/* 12 */
+#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000		/* 13 */
 #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
-#define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
+#define INTR_INFO_RESVD_BITS_MASK       0x7fffd000
 
 #define VECTORING_INFO_VECTOR_MASK           	INTR_INFO_VECTOR_MASK
 #define VECTORING_INFO_TYPE_MASK        	INTR_INFO_INTR_TYPE_MASK
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e90b429c84f1..c220b690a37c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4057,10 +4057,10 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 			u32 err = svm->vmcb->control.exit_int_info_err;
-			kvm_requeue_exception_e(vcpu, vector, err);
+			kvm_requeue_exception_e(vcpu, vector, err, false);
 
 		} else
-			kvm_requeue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector, false);
 		break;
 	case SVM_EXITINTINFO_TYPE_INTR:
 		kvm_queue_interrupt(vcpu, vector, false);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f622fb90a098..1f265d526daf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1891,6 +1891,8 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 				event_data = to_vmx(vcpu)->fred_xfd_event_data;
 
 			vmcs_write64(INJECTED_EVENT_DATA, event_data);
+
+			intr_info |= ex->nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;
 		}
 	}
 
@@ -7281,9 +7283,11 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu, bool vectoring)
 		}
 
 		if (event_id & INTR_INFO_DELIVER_CODE_MASK)
-			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field));
+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
+						event_id & INTR_INFO_NESTED_EXCEPTION_MASK);
 		else
-			kvm_requeue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector,
+					      event_id & INTR_INFO_NESTED_EXCEPTION_MASK);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00c0062726ae..725819262085 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -645,7 +645,8 @@ static void kvm_leave_nested(struct kvm_vcpu *vcpu)
 
 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		unsigned nr, bool has_error, u32 error_code,
-	        bool has_payload, unsigned long payload, bool reinject)
+	        bool has_payload, unsigned long payload,
+		bool reinject, bool nested)
 {
 	u32 prev_nr;
 	int class1, class2;
@@ -696,6 +697,13 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.pending = true;
 			vcpu->arch.exception.injected = false;
 		}
+
+		vcpu->arch.exception.nested = vcpu->arch.exception.nested ||
+					      (kvm_is_fred_enabled(vcpu) &&
+					       ((reinject && nested) ||
+					        vcpu->arch.nmi_injected ||
+					        vcpu->arch.interrupt.injected));
+
 		vcpu->arch.exception.has_error_code = has_error;
 		vcpu->arch.exception.vector = nr;
 		vcpu->arch.exception.error_code = error_code;
@@ -725,8 +733,28 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.injected = false;
 		vcpu->arch.exception.pending = false;
 
+		/*
+		 * A #DF is NOT a nested event per its definition, however per
+		 * FRED spec 5.0 Appendix B, its delivery determines the new
+		 * stack level as is done for events occurring when CPL = 0.
+		 */
+		vcpu->arch.exception.nested = false;
+
 		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
 	} else {
+		/*
+		 * FRED spec 5.0 Appendix B: delivery of a nested exception
+		 * determines the new stack level as is done for events
+		 * occurring when CPL = 0.
+		 *
+		 * IOW, FRED event delivery of an event encountered in ring 3
+		 * normally uses stack level 0 unconditionally.  However, if
+		 * the event is an exception nested on any earlier event,
+		 * delivery of the nested exception will consult the FRED MSR
+		 * IA32_FRED_STKLVLS to determine which stack level to use.
+		 */
+		vcpu->arch.exception.nested = kvm_is_fred_enabled(vcpu);
+
 		/* replace previous exception with a new one in a hope
 		   that instruction re-execution will regenerate lost
 		   exception */
@@ -736,20 +764,20 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 			   unsigned long payload)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
@@ -757,7 +785,7 @@ static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
 				    u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
-			       true, payload, false);
+			       true, payload, false, false);
 }
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
@@ -829,13 +857,13 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9a52016ebf5a..c1f1d5696080 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -108,6 +108,7 @@ static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.exception.pending = false;
 	vcpu->arch.exception.injected = false;
+	vcpu->arch.exception.nested = false;
 	vcpu->arch.exception_vmexit.pending = false;
 }
 
-- 
2.43.0



* [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (12 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-30  8:21   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs() Xin Li
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

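Refuse to virtualize FRED unless the CPU supports FRED and all of the
VMX features needed to virtualize it: VMX nested-exception support, the
secondary VM exit controls to save and load FRED state, and the VM entry
control to load FRED state.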

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kvm/vmx/capabilities.h | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index e8f3ad0f79ee..73bf6618c425 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -400,6 +400,16 @@ static inline bool vmx_pebs_supported(void)
 	return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
 }
 
+static inline bool cpu_has_vmx_fred(void)
+{
+	return boot_cpu_has(X86_FEATURE_FRED) &&
+		(vmcs_config.basic & VMX_BASIC_NESTED_EXCEPTION) &&
+		(vmcs_config.vmexit_ctrl & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS) &&
+		(vmcs_config.secondary_vmexit_ctrl & SECONDARY_VM_EXIT_SAVE_IA32_FRED) &&
+		(vmcs_config.secondary_vmexit_ctrl & SECONDARY_VM_EXIT_LOAD_IA32_FRED) &&
+		(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_FRED);
+}
+
 static inline bool cpu_has_notify_vmexit(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1f265d526daf..a484b9ac2400 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8113,6 +8113,8 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_check_and_set(X86_FEATURE_DS);
 		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64);
 	}
+	if (!cpu_has_vmx_fred())
+		kvm_cpu_cap_clear(X86_FEATURE_FRED);
 
 	if (!enable_pmu)
 		kvm_cpu_cap_clear(X86_FEATURE_PDCM);
-- 
2.43.0



* [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs()
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (13 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-04-30  9:09   ` Chao Gao
  2024-02-07 17:26 ` [PATCH v2 16/25] KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup Xin Li
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add FRED-related VMCS fields to dump_vmcs() so that it dumps the FRED context.

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Change since v1:
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() (Chao Gao).
* Dump guest FRED states only if guest has FRED enabled (Nikolay Borisov).
---
 arch/x86/kvm/vmx/vmx.c | 46 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a484b9ac2400..e3409607122d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6392,7 +6392,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmentry_ctl, vmexit_ctl;
 	u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
-	u64 tertiary_exec_control;
+	u64 tertiary_exec_control, secondary_vmexit_ctl;
 	unsigned long cr4;
 	int efer_slot;
 
@@ -6403,6 +6403,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 
 	vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS);
 	vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS);
+	secondary_vmexit_ctl = cpu_has_secondary_vmexit_ctrls() ?
+			       vmcs_read64(SECONDARY_VM_EXIT_CONTROLS) : 0;
 	cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
 	cr4 = vmcs_readl(GUEST_CR4);
@@ -6449,6 +6451,19 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	vmx_dump_sel("LDTR:", GUEST_LDTR_SELECTOR);
 	vmx_dump_dtsel("IDTR:", GUEST_IDTR_LIMIT);
 	vmx_dump_sel("TR:  ", GUEST_TR_SELECTOR);
+#ifdef CONFIG_X86_64
+	if (kvm_is_fred_enabled(vcpu)) {
+		pr_err("FRED guest: config=0x%016llx, stack levels=0x%016llx\n"
+		       "RSP0=0x%016lx, RSP1=0x%016llx\n"
+		       "RSP2=0x%016llx, RSP3=0x%016llx\n",
+		       vmcs_read64(GUEST_IA32_FRED_CONFIG),
+		       vmcs_read64(GUEST_IA32_FRED_STKLVLS),
+		       read_msr(MSR_IA32_FRED_RSP0),
+		       vmcs_read64(GUEST_IA32_FRED_RSP1),
+		       vmcs_read64(GUEST_IA32_FRED_RSP2),
+		       vmcs_read64(GUEST_IA32_FRED_RSP3));
+	}
+#endif
 	efer_slot = vmx_find_loadstore_msr_slot(&vmx->msr_autoload.guest, MSR_EFER);
 	if (vmentry_ctl & VM_ENTRY_LOAD_IA32_EFER)
 		pr_err("EFER= 0x%016llx\n", vmcs_read64(GUEST_IA32_EFER));
@@ -6496,6 +6511,19 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	       vmcs_readl(HOST_TR_BASE));
 	pr_err("GDTBase=%016lx IDTBase=%016lx\n",
 	       vmcs_readl(HOST_GDTR_BASE), vmcs_readl(HOST_IDTR_BASE));
+#ifdef CONFIG_X86_64
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		pr_err("FRED host: config=0x%016llx, stack levels=0x%016llx\n"
+		       "RSP0=0x%016llx, RSP1=0x%016llx\n"
+		       "RSP2=0x%016llx, RSP3=0x%016llx\n",
+		       vmcs_read64(HOST_IA32_FRED_CONFIG),
+		       vmcs_read64(HOST_IA32_FRED_STKLVLS),
+		       vmx->msr_host_fred_rsp0,
+		       vmcs_read64(HOST_IA32_FRED_RSP1),
+		       vmcs_read64(HOST_IA32_FRED_RSP2),
+		       vmcs_read64(HOST_IA32_FRED_RSP3));
+	}
+#endif
 	pr_err("CR0=%016lx CR3=%016lx CR4=%016lx\n",
 	       vmcs_readl(HOST_CR0), vmcs_readl(HOST_CR3),
 	       vmcs_readl(HOST_CR4));
@@ -6517,25 +6545,29 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	pr_err("*** Control State ***\n");
 	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
 	       cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control);
-	pr_err("PinBased=0x%08x EntryControls=%08x ExitControls=%08x\n",
-	       pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl);
+	pr_err("PinBased=0x%08x EntryControls=0x%08x\n",
+	       pin_based_exec_ctrl, vmentry_ctl);
+	pr_err("ExitControls=0x%08x SecondaryExitControls=0x%016llx\n",
+	       vmexit_ctl, secondary_vmexit_ctl);
 	pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
 	       vmcs_read32(EXCEPTION_BITMAP),
 	       vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK),
 	       vmcs_read32(PAGE_FAULT_ERROR_CODE_MATCH));
-	pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x\n",
+	pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x event data=%016llx\n",
 	       vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
 	       vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE),
-	       vmcs_read32(VM_ENTRY_INSTRUCTION_LEN));
+	       vmcs_read32(VM_ENTRY_INSTRUCTION_LEN),
+	       kvm_cpu_cap_has(X86_FEATURE_FRED) ? vmcs_read64(INJECTED_EVENT_DATA) : 0);
 	pr_err("VMExit: intr_info=%08x errcode=%08x ilen=%08x\n",
 	       vmcs_read32(VM_EXIT_INTR_INFO),
 	       vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
 	       vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
 	pr_err("        reason=%08x qualification=%016lx\n",
 	       vmcs_read32(VM_EXIT_REASON), vmcs_readl(EXIT_QUALIFICATION));
-	pr_err("IDTVectoring: info=%08x errcode=%08x\n",
+	pr_err("IDTVectoring: info=%08x errcode=%08x event data=%016llx\n",
 	       vmcs_read32(IDT_VECTORING_INFO_FIELD),
-	       vmcs_read32(IDT_VECTORING_ERROR_CODE));
+	       vmcs_read32(IDT_VECTORING_ERROR_CODE),
+	       kvm_cpu_cap_has(X86_FEATURE_FRED) ? vmcs_read64(ORIGINAL_EVENT_DATA) : 0);
 	pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET));
 	if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
 		pr_err("TSC Multiplier = 0x%016llx\n",
-- 
2.43.0



* [PATCH v2 16/25] KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (14 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs() Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 17/25] KVM: nVMX: Add support for the secondary VM exit controls Xin Li
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Set the VMX CPU capabilities before, rather than after, initializing
nested VMX, because the nested setup needs to check the VMX CPU
capabilities to set up the VMX basic MSR for nested guests.

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3409607122d..fc808d599493 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8912,6 +8912,8 @@ static __init int hardware_setup(void)
 
 	setup_default_sgx_lepubkeyhash();
 
+	vmx_set_cpu_caps();
+
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);
 
@@ -8920,8 +8922,6 @@ static __init int hardware_setup(void)
 			return r;
 	}
 
-	vmx_set_cpu_caps();
-
 	r = alloc_kvm_area();
 	if (r && nested)
 		nested_vmx_hardware_unsetup();
-- 
2.43.0



* [PATCH v2 17/25] KVM: nVMX: Add support for the secondary VM exit controls
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (15 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 16/25] KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 18/25] KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros Xin Li
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Enable the secondary VM exit controls to prepare for nested FRED.
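A minimal sketch of the gating added to nested_vmx_setup_exit_ctls():
the 64-bit secondary VM exit controls are read only when the primary
allowed-1 settings advertise "activate secondary controls". The bit
position (assumed to be bit 31 of the primary VM exit controls) and all
helper names are assumptions for illustration; the MSR read is mocked
by a hypothetical function rather than rdmsrl().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: "activate secondary controls" is bit 31 of the primary VM exit controls. */
#define SKETCH_EXIT_ACTIVATE_SECONDARY	(1u << 31)

/* Hypothetical stand-in for rdmsrl(MSR_IA32_VMX_EXIT_CTLS2, ...). */
static uint64_t sketch_read_exit_ctls2(void)
{
	return 0x3;	/* made-up allowed-1 settings */
}

/*
 * Secondary exit controls advertised to L1: left at zero unless the
 * primary allowed-1 settings include the activation bit.
 */
static uint64_t sketch_secondary_exit_ctls(uint32_t exit_ctls_high)
{
	uint64_t secondary = 0;

	if (exit_ctls_high & SKETCH_EXIT_ACTIVATE_SECONDARY)
		secondary = sketch_read_exit_ctls2();
	return secondary;
}
```

In the sketch, the MSR is never consulted when the activation bit is
absent, mirroring the "if (msrs->exit_ctls_high & ...)" check in the
patch.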

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 Documentation/virt/kvm/x86/nested-vmx.rst |  1 +
 arch/x86/kvm/vmx/capabilities.h           |  1 +
 arch/x86/kvm/vmx/nested.c                 | 15 ++++++++++++++-
 arch/x86/kvm/vmx/vmcs12.c                 |  1 +
 arch/x86/kvm/vmx/vmcs12.h                 |  2 ++
 arch/x86/kvm/x86.h                        |  2 +-
 6 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/nested-vmx.rst b/Documentation/virt/kvm/x86/nested-vmx.rst
index ac2095d41f02..e64ef231f310 100644
--- a/Documentation/virt/kvm/x86/nested-vmx.rst
+++ b/Documentation/virt/kvm/x86/nested-vmx.rst
@@ -217,6 +217,7 @@ struct shadow_vmcs is ever changed.
 		u16 host_fs_selector;
 		u16 host_gs_selector;
 		u16 host_tr_selector;
+		u64 secondary_vm_exit_controls;
 	};
 
 
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 73bf6618c425..b41c2cde811d 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -38,6 +38,7 @@ struct nested_vmx_msrs {
 	u32 pinbased_ctls_high;
 	u32 exit_ctls_low;
 	u32 exit_ctls_high;
+	u64 secondary_exit_ctls;
 	u32 entry_ctls_low;
 	u32 entry_ctls_high;
 	u32 misc_low;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8a5fda04e2de..1132e360ff13 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1431,6 +1431,7 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	case MSR_IA32_VMX_PINBASED_CTLS:
 	case MSR_IA32_VMX_PROCBASED_CTLS:
 	case MSR_IA32_VMX_EXIT_CTLS:
+	case MSR_IA32_VMX_EXIT_CTLS2:
 	case MSR_IA32_VMX_ENTRY_CTLS:
 		/*
 		 * The "non-true" VMX capability MSRs are generated from the
@@ -1509,6 +1510,9 @@ int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata)
 		if (msr_index == MSR_IA32_VMX_EXIT_CTLS)
 			*pdata |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
 		break;
+	case MSR_IA32_VMX_EXIT_CTLS2:
+		*pdata = msrs->secondary_exit_ctls;
+		break;
 	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
 	case MSR_IA32_VMX_ENTRY_CTLS:
 		*pdata = vmx_control_msr(
@@ -2443,6 +2447,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 		exec_control &= ~VM_EXIT_LOAD_IA32_EFER;
 	vm_exit_controls_set(vmx, exec_control);
 
+	if (exec_control & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS) {
+		exec_control = __secondary_vm_exit_controls_get(vmcs01);
+		secondary_vm_exit_controls_set(vmx, exec_control);
+	}
+
 	/*
 	 * Interrupt/Exception Fields
 	 */
@@ -6856,13 +6865,17 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_ACTIVATE_SECONDARY_CONTROLS;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
 		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
 		VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
 
+	/* secondary exit controls */
+	if (msrs->exit_ctls_high & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS)
+		rdmsrl(MSR_IA32_VMX_EXIT_CTLS2, msrs->secondary_exit_ctls);
+
 	/* We support free control of debug control saving. */
 	msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
 }
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..98457d7b2b23 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -73,6 +73,7 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(PAGE_FAULT_ERROR_CODE_MATCH, page_fault_error_code_match),
 	FIELD(CR3_TARGET_COUNT, cr3_target_count),
 	FIELD(VM_EXIT_CONTROLS, vm_exit_controls),
+	FIELD(SECONDARY_VM_EXIT_CONTROLS, secondary_vm_exit_controls),
 	FIELD(VM_EXIT_MSR_STORE_COUNT, vm_exit_msr_store_count),
 	FIELD(VM_EXIT_MSR_LOAD_COUNT, vm_exit_msr_load_count),
 	FIELD(VM_ENTRY_CONTROLS, vm_entry_controls),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..f50f897b9b5f 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -185,6 +185,7 @@ struct __packed vmcs12 {
 	u16 host_gs_selector;
 	u16 host_tr_selector;
 	u16 guest_pml_index;
+	u64 secondary_vm_exit_controls;
 };
 
 /*
@@ -358,6 +359,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_gs_selector, 992);
 	CHECK_OFFSET(host_tr_selector, 994);
 	CHECK_OFFSET(guest_pml_index, 996);
+	CHECK_OFFSET(secondary_vm_exit_controls, 998);
 }
 
 extern const unsigned short vmcs12_field_offsets[];
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c1f1d5696080..498bb6090b1e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -47,7 +47,7 @@ void kvm_spurious_fault(void);
  * associated feature that KVM supports for nested virtualization.
  */
 #define KVM_FIRST_EMULATED_VMX_MSR	MSR_IA32_VMX_BASIC
-#define KVM_LAST_EMULATED_VMX_MSR	MSR_IA32_VMX_VMFUNC
+#define KVM_LAST_EMULATED_VMX_MSR	MSR_IA32_VMX_EXIT_CTLS2
 
 #define KVM_DEFAULT_PLE_GAP		128
 #define KVM_VMX_DEFAULT_PLE_WINDOW	4096
-- 
2.43.0



* [PATCH v2 18/25] KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (16 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 17/25] KVM: nVMX: Add support for the secondary VM exit controls Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 19/25] KVM: nVMX: Add FRED VMCS fields Xin Li
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add a prerequisite condition to the SHADOW_FIELD_R[OW] macros so that
each referenced VMCS field is accessed only when its condition holds,
because a VMCS field may not exist on some CPUs.
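The technique is an X-macro whose third argument is a predicate: one
field list generates both the lookup switch (as in is_shadow_field_rw())
and the "skip this field" switch inside loops (as in
init_vmcs_shadow_fields()), where "continue" inside the case body
applies to the enclosing for loop. The standalone sketch below uses a
list macro instead of re-including vmcs_shadow_fields.h; the field IDs,
feature flags and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature probes standing in for cpu_has_vmx_pml() etc. */
static bool sketch_has_pml = false;
static bool sketch_has_fred = true;

/* Hypothetical field IDs; real code uses VMCS field encodings. */
enum sketch_field { F_PML_INDEX = 1, F_HOST_FS_SEL = 2, F_EVENT_DATA = 3, F_UNKNOWN = 99 };

/* X(field, prerequisite) */
#define SKETCH_SHADOW_FIELDS(X)			\
	X(F_PML_INDEX,   sketch_has_pml)	\
	X(F_HOST_FS_SEL, true)			\
	X(F_EVENT_DATA,  sketch_has_fred)

/* Like the patched is_shadow_field_rw(): a listed field returns its prerequisite. */
static bool sketch_is_shadow_field(int field)
{
	switch (field) {
#define X(enc, cond) case enc: return (cond);
	SKETCH_SHADOW_FIELDS(X)
#undef X
	default:
		return false;
	}
}

/* Like the init loop: fields whose prerequisite is false are skipped. */
static size_t sketch_count_usable(void)
{
	static const int all[] = {
#define X(enc, cond) enc,
		SKETCH_SHADOW_FIELDS(X)
#undef X
	};
	size_t n = 0;

	for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++) {
		switch (all[i]) {
#define X(enc, cond) case enc: if (!(cond)) continue; break;
		SKETCH_SHADOW_FIELDS(X)
#undef X
		default:
			break;
		}
		n++;	/* reached only when the prerequisite holds */
	}
	return n;
}
```

Keeping the predicate next to the field in a single list avoids the
hand-maintained special cases (GUEST_PML_INDEX,
VMX_PREEMPTION_TIMER_VALUE, GUEST_INTR_STATUS) that the patch removes.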

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kvm/vmx/nested.c             | 70 ++++++++++++++++++------
 arch/x86/kvm/vmx/vmcs_shadow_fields.h | 76 +++++++++++++--------------
 2 files changed, 91 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1132e360ff13..94da6a0a2f81 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -53,14 +53,14 @@ struct shadow_vmcs_field {
 	u16	offset;
 };
 static struct shadow_vmcs_field shadow_read_only_fields[] = {
-#define SHADOW_FIELD_RO(x, y) { x, offsetof(struct vmcs12, y) },
+#define SHADOW_FIELD_RO(x, y, c) { x, offsetof(struct vmcs12, y) },
 #include "vmcs_shadow_fields.h"
 };
 static int max_shadow_read_only_fields =
 	ARRAY_SIZE(shadow_read_only_fields);
 
 static struct shadow_vmcs_field shadow_read_write_fields[] = {
-#define SHADOW_FIELD_RW(x, y) { x, offsetof(struct vmcs12, y) },
+#define SHADOW_FIELD_RW(x, y, c) { x, offsetof(struct vmcs12, y) },
 #include "vmcs_shadow_fields.h"
 };
 static int max_shadow_read_write_fields =
@@ -83,6 +83,17 @@ static void init_vmcs_shadow_fields(void)
 			pr_err("Missing field from shadow_read_only_field %x\n",
 			       field + 1);
 
+		switch (field) {
+#define SHADOW_FIELD_RO(x, y, c)		\
+		case x:				\
+			if (!(c))		\
+				continue;	\
+			break;
+#include "vmcs_shadow_fields.h"
+		default:
+			break;
+		}
+
 		clear_bit(field, vmx_vmread_bitmap);
 		if (field & 1)
 #ifdef CONFIG_X86_64
@@ -114,18 +125,12 @@ static void init_vmcs_shadow_fields(void)
 		 * on bare metal.
 		 */
 		switch (field) {
-		case GUEST_PML_INDEX:
-			if (!cpu_has_vmx_pml())
-				continue;
-			break;
-		case VMX_PREEMPTION_TIMER_VALUE:
-			if (!cpu_has_vmx_preemption_timer())
-				continue;
-			break;
-		case GUEST_INTR_STATUS:
-			if (!cpu_has_vmx_apicv())
-				continue;
+#define SHADOW_FIELD_RW(x, y, c)		\
+		case x:				\
+			if (!(c))		\
+				continue;	\
 			break;
+#include "vmcs_shadow_fields.h"
 		default:
 			break;
 		}
@@ -1585,6 +1590,18 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 
 	for (i = 0; i < max_shadow_read_write_fields; i++) {
 		field = shadow_read_write_fields[i];
+
+		switch (field.encoding) {
+#define SHADOW_FIELD_RW(x, y, c)		\
+		case x:				\
+			if (!(c))		\
+				continue;	\
+			break;
+#include "vmcs_shadow_fields.h"
+		default:
+			break;
+		}
+
 		val = __vmcs_readl(field.encoding);
 		vmcs12_write_any(vmcs12, field.encoding, field.offset, val);
 	}
@@ -1619,6 +1636,23 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
 		for (i = 0; i < max_fields[q]; i++) {
 			field = fields[q][i];
+
+			switch (field.encoding) {
+#define SHADOW_FIELD_RO(x, y, c)			\
+			case x:				\
+				if (!(c))		\
+					continue;	\
+				break;
+#define SHADOW_FIELD_RW(x, y, c)			\
+			case x:				\
+				if (!(c))		\
+					continue;	\
+				break;
+#include "vmcs_shadow_fields.h"
+			default:
+				break;
+			}
+
 			val = vmcs12_read_any(vmcs12, field.encoding,
 					      field.offset);
 			__vmcs_writel(field.encoding, val);
@@ -5492,9 +5526,10 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 static bool is_shadow_field_rw(unsigned long field)
 {
 	switch (field) {
-#define SHADOW_FIELD_RW(x, y) case x:
+#define SHADOW_FIELD_RW(x, y, c)	\
+	case x:				\
+		return c;
 #include "vmcs_shadow_fields.h"
-		return true;
 	default:
 		break;
 	}
@@ -5504,9 +5539,10 @@ static bool is_shadow_field_rw(unsigned long field)
 static bool is_shadow_field_ro(unsigned long field)
 {
 	switch (field) {
-#define SHADOW_FIELD_RO(x, y) case x:
+#define SHADOW_FIELD_RO(x, y, c)	\
+	case x:				\
+		return c;
 #include "vmcs_shadow_fields.h"
-		return true;
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/vmcs_shadow_fields.h b/arch/x86/kvm/vmx/vmcs_shadow_fields.h
index cad128d1657b..7f48056fe351 100644
--- a/arch/x86/kvm/vmx/vmcs_shadow_fields.h
+++ b/arch/x86/kvm/vmx/vmcs_shadow_fields.h
@@ -3,10 +3,10 @@ BUILD_BUG_ON(1)
 #endif
 
 #ifndef SHADOW_FIELD_RO
-#define SHADOW_FIELD_RO(x, y)
+#define SHADOW_FIELD_RO(x, y, c)
 #endif
 #ifndef SHADOW_FIELD_RW
-#define SHADOW_FIELD_RW(x, y)
+#define SHADOW_FIELD_RW(x, y, c)
 #endif
 
 /*
@@ -32,48 +32,48 @@ BUILD_BUG_ON(1)
  */
 
 /* 16-bits */
-SHADOW_FIELD_RW(GUEST_INTR_STATUS, guest_intr_status)
-SHADOW_FIELD_RW(GUEST_PML_INDEX, guest_pml_index)
-SHADOW_FIELD_RW(HOST_FS_SELECTOR, host_fs_selector)
-SHADOW_FIELD_RW(HOST_GS_SELECTOR, host_gs_selector)
+SHADOW_FIELD_RW(GUEST_INTR_STATUS, guest_intr_status, cpu_has_vmx_apicv())
+SHADOW_FIELD_RW(GUEST_PML_INDEX, guest_pml_index, cpu_has_vmx_pml())
+SHADOW_FIELD_RW(HOST_FS_SELECTOR, host_fs_selector, true)
+SHADOW_FIELD_RW(HOST_GS_SELECTOR, host_gs_selector, true)
 
 /* 32-bits */
-SHADOW_FIELD_RO(VM_EXIT_REASON, vm_exit_reason)
-SHADOW_FIELD_RO(VM_EXIT_INTR_INFO, vm_exit_intr_info)
-SHADOW_FIELD_RO(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len)
-SHADOW_FIELD_RO(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field)
-SHADOW_FIELD_RO(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code)
-SHADOW_FIELD_RO(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code)
-SHADOW_FIELD_RO(GUEST_CS_AR_BYTES, guest_cs_ar_bytes)
-SHADOW_FIELD_RO(GUEST_SS_AR_BYTES, guest_ss_ar_bytes)
-SHADOW_FIELD_RW(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control)
-SHADOW_FIELD_RW(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control)
-SHADOW_FIELD_RW(EXCEPTION_BITMAP, exception_bitmap)
-SHADOW_FIELD_RW(VM_ENTRY_EXCEPTION_ERROR_CODE, vm_entry_exception_error_code)
-SHADOW_FIELD_RW(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field)
-SHADOW_FIELD_RW(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len)
-SHADOW_FIELD_RW(TPR_THRESHOLD, tpr_threshold)
-SHADOW_FIELD_RW(GUEST_INTERRUPTIBILITY_INFO, guest_interruptibility_info)
-SHADOW_FIELD_RW(VMX_PREEMPTION_TIMER_VALUE, vmx_preemption_timer_value)
+SHADOW_FIELD_RO(VM_EXIT_REASON, vm_exit_reason, true)
+SHADOW_FIELD_RO(VM_EXIT_INTR_INFO, vm_exit_intr_info, true)
+SHADOW_FIELD_RO(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len, true)
+SHADOW_FIELD_RO(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code, true)
+SHADOW_FIELD_RO(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field, true)
+SHADOW_FIELD_RO(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code, true)
+SHADOW_FIELD_RO(GUEST_CS_AR_BYTES, guest_cs_ar_bytes, true)
+SHADOW_FIELD_RO(GUEST_SS_AR_BYTES, guest_ss_ar_bytes, true)
+SHADOW_FIELD_RW(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control, true)
+SHADOW_FIELD_RW(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control, true)
+SHADOW_FIELD_RW(EXCEPTION_BITMAP, exception_bitmap, true)
+SHADOW_FIELD_RW(VM_ENTRY_EXCEPTION_ERROR_CODE, vm_entry_exception_error_code, true)
+SHADOW_FIELD_RW(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field, true)
+SHADOW_FIELD_RW(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len, true)
+SHADOW_FIELD_RW(TPR_THRESHOLD, tpr_threshold, true)
+SHADOW_FIELD_RW(GUEST_INTERRUPTIBILITY_INFO, guest_interruptibility_info, true)
+SHADOW_FIELD_RW(VMX_PREEMPTION_TIMER_VALUE, vmx_preemption_timer_value, cpu_has_vmx_preemption_timer())
 
 /* Natural width */
-SHADOW_FIELD_RO(EXIT_QUALIFICATION, exit_qualification)
-SHADOW_FIELD_RO(GUEST_LINEAR_ADDRESS, guest_linear_address)
-SHADOW_FIELD_RW(GUEST_RIP, guest_rip)
-SHADOW_FIELD_RW(GUEST_RSP, guest_rsp)
-SHADOW_FIELD_RW(GUEST_CR0, guest_cr0)
-SHADOW_FIELD_RW(GUEST_CR3, guest_cr3)
-SHADOW_FIELD_RW(GUEST_CR4, guest_cr4)
-SHADOW_FIELD_RW(GUEST_RFLAGS, guest_rflags)
-SHADOW_FIELD_RW(CR0_GUEST_HOST_MASK, cr0_guest_host_mask)
-SHADOW_FIELD_RW(CR0_READ_SHADOW, cr0_read_shadow)
-SHADOW_FIELD_RW(CR4_READ_SHADOW, cr4_read_shadow)
-SHADOW_FIELD_RW(HOST_FS_BASE, host_fs_base)
-SHADOW_FIELD_RW(HOST_GS_BASE, host_gs_base)
+SHADOW_FIELD_RO(EXIT_QUALIFICATION, exit_qualification, true)
+SHADOW_FIELD_RO(GUEST_LINEAR_ADDRESS, guest_linear_address, true)
+SHADOW_FIELD_RW(GUEST_RIP, guest_rip, true)
+SHADOW_FIELD_RW(GUEST_RSP, guest_rsp, true)
+SHADOW_FIELD_RW(GUEST_CR0, guest_cr0, true)
+SHADOW_FIELD_RW(GUEST_CR3, guest_cr3, true)
+SHADOW_FIELD_RW(GUEST_CR4, guest_cr4, true)
+SHADOW_FIELD_RW(GUEST_RFLAGS, guest_rflags, true)
+SHADOW_FIELD_RW(CR0_GUEST_HOST_MASK, cr0_guest_host_mask, true)
+SHADOW_FIELD_RW(CR0_READ_SHADOW, cr0_read_shadow, true)
+SHADOW_FIELD_RW(CR4_READ_SHADOW, cr4_read_shadow, true)
+SHADOW_FIELD_RW(HOST_FS_BASE, host_fs_base, true)
+SHADOW_FIELD_RW(HOST_GS_BASE, host_gs_base, true)
 
 /* 64-bit */
-SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS, guest_physical_address)
-SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS_HIGH, guest_physical_address)
+SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS, guest_physical_address, true)
+SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS_HIGH, guest_physical_address, true)
 
 #undef SHADOW_FIELD_RO
 #undef SHADOW_FIELD_RW
-- 
2.43.0



* [PATCH v2 19/25] KVM: nVMX: Add FRED VMCS fields
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (17 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 18/25] KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 20/25] KVM: nVMX: Add support for VMX FRED controls Xin Li
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add FRED VMCS fields to nested VMX context management.
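The new nested_vmx_get_event_data() helper picks the event data with a
fixed precedence: an explicit payload first, then CR2 for #PF, then
DR6 for #DB (with BT cleared and the active-low bits flipped so that
set bits mean "condition hit"), else zero. A standalone sketch of that
precedence follows; the constant values are assumed to match KVM's
DR6_BT/DR6_ACTIVE_LOW definitions, and the trimmed exception struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match KVM's definitions in asm/kvm_host.h. */
#define SKETCH_DR6_BT		(1ull << 15)
#define SKETCH_DR6_ACTIVE_LOW	0xffff0ff0ull

#define SKETCH_DB_VECTOR	1
#define SKETCH_PF_VECTOR	14

/* Hypothetical, trimmed-down stand-in for struct kvm_queued_exception. */
struct sketch_exception {
	uint8_t  vector;
	bool     has_payload;
	uint64_t payload;
};

static uint64_t sketch_event_data(const struct sketch_exception *ex,
				  uint64_t cr2, uint64_t dr6)
{
	if (ex->has_payload)
		return ex->payload;
	if (ex->vector == SKETCH_PF_VECTOR)
		return cr2;
	if (ex->vector == SKETCH_DB_VECTOR)
		/* Drop BT, then invert the active-low bits to positive polarity. */
		return (dr6 & ~SKETCH_DR6_BT) ^ SKETCH_DR6_ACTIVE_LOW;
	return 0;
}
```

With DR6 at its architectural idle value 0xffff0ff0 the #DB event data
is 0; setting BS (bit 14) yields 0x4000.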

TODO: change VMCS12_REVISION, since struct vmcs12 has changed.
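Because struct vmcs12 is __packed and the eighteen new u64 fields are
appended after secondary_vm_exit_controls (offset 998), each
CHECK_OFFSET() value is the previous one plus 8, ending at 1142 for
original_event_data. The sketch below reproduces only the tail of the
layout; the 996-byte placeholder prefix standing in for all earlier
fields, and the struct and helper names, are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) sketch_vmcs12_tail {
	uint8_t  prefix[996];			/* placeholder for earlier fields */
	uint16_t guest_pml_index;		/* 996 */
	uint64_t secondary_vm_exit_controls;	/* 998 */
	uint64_t guest_ia32_fred_config;	/* 1006 */
	uint64_t guest_ia32_fred_rsp1;
	uint64_t guest_ia32_fred_rsp2;
	uint64_t guest_ia32_fred_rsp3;
	uint64_t guest_ia32_fred_stklvls;
	uint64_t guest_ia32_fred_ssp1;
	uint64_t guest_ia32_fred_ssp2;
	uint64_t guest_ia32_fred_ssp3;
	uint64_t host_ia32_fred_config;		/* 1070 */
	uint64_t host_ia32_fred_rsp1;
	uint64_t host_ia32_fred_rsp2;
	uint64_t host_ia32_fred_rsp3;
	uint64_t host_ia32_fred_stklvls;
	uint64_t host_ia32_fred_ssp1;
	uint64_t host_ia32_fred_ssp2;
	uint64_t host_ia32_fred_ssp3;
	uint64_t injected_event_data;		/* 1134 */
	uint64_t original_event_data;		/* 1142 */
};

/* Offset of the n-th appended FRED-era u64 (n = 0 is guest_ia32_fred_config). */
static size_t sketch_fred_offset(unsigned int n)
{
	return 1006 + 8 * (size_t)n;
}

static_assert(offsetof(struct sketch_vmcs12_tail, secondary_vm_exit_controls) == 998,
	      "u64 directly follows the u16 in a packed struct");
static_assert(offsetof(struct sketch_vmcs12_tail, original_event_data) == 1142,
	      "last appended field");
```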

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---

Changes since v1:
* Remove hyperv TLFS related changes (Jeremi Piotrowski).
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() (Chao Gao).
---
 Documentation/virt/kvm/x86/nested-vmx.rst | 18 +++++
 arch/x86/kvm/vmx/nested.c                 | 91 +++++++++++++++++++----
 arch/x86/kvm/vmx/vmcs12.c                 | 18 +++++
 arch/x86/kvm/vmx/vmcs12.h                 | 36 +++++++++
 arch/x86/kvm/vmx/vmcs_shadow_fields.h     |  4 +
 5 files changed, 152 insertions(+), 15 deletions(-)

diff --git a/Documentation/virt/kvm/x86/nested-vmx.rst b/Documentation/virt/kvm/x86/nested-vmx.rst
index e64ef231f310..87fa9f3877ab 100644
--- a/Documentation/virt/kvm/x86/nested-vmx.rst
+++ b/Documentation/virt/kvm/x86/nested-vmx.rst
@@ -218,6 +218,24 @@ struct shadow_vmcs is ever changed.
 		u16 host_gs_selector;
 		u16 host_tr_selector;
 		u64 secondary_vm_exit_controls;
+		u64 guest_ia32_fred_config;
+		u64 guest_ia32_fred_rsp1;
+		u64 guest_ia32_fred_rsp2;
+		u64 guest_ia32_fred_rsp3;
+		u64 guest_ia32_fred_stklvls;
+		u64 guest_ia32_fred_ssp1;
+		u64 guest_ia32_fred_ssp2;
+		u64 guest_ia32_fred_ssp3;
+		u64 host_ia32_fred_config;
+		u64 host_ia32_fred_rsp1;
+		u64 host_ia32_fred_rsp2;
+		u64 host_ia32_fred_rsp3;
+		u64 host_ia32_fred_stklvls;
+		u64 host_ia32_fred_ssp1;
+		u64 host_ia32_fred_ssp2;
+		u64 host_ia32_fred_ssp3;
+		u64 injected_event_data;
+		u64 original_event_data;
 	};
 
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 94da6a0a2f81..f9c1fbeac302 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -686,6 +686,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_FRED_RSP0, MSR_TYPE_RW);
 #endif
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
@@ -2498,6 +2501,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 			     vmcs12->vm_entry_instruction_len);
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
 			     vmcs12->guest_interruptibility_info);
+		if (kvm_cpu_cap_has(X86_FEATURE_FRED))
+			vmcs_write64(INJECTED_EVENT_DATA, vmcs12->injected_event_data);
 		vmx->loaded_vmcs->nmi_known_unmasked =
 			!(vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_NMI);
 	} else {
@@ -2548,6 +2553,17 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 		vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
 		vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
 
+		if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+			vmcs_write64(GUEST_IA32_FRED_CONFIG, vmcs12->guest_ia32_fred_config);
+			vmcs_write64(GUEST_IA32_FRED_RSP1, vmcs12->guest_ia32_fred_rsp1);
+			vmcs_write64(GUEST_IA32_FRED_RSP2, vmcs12->guest_ia32_fred_rsp2);
+			vmcs_write64(GUEST_IA32_FRED_RSP3, vmcs12->guest_ia32_fred_rsp3);
+			vmcs_write64(GUEST_IA32_FRED_STKLVLS, vmcs12->guest_ia32_fred_stklvls);
+			vmcs_write64(GUEST_IA32_FRED_SSP1, vmcs12->guest_ia32_fred_ssp1);
+			vmcs_write64(GUEST_IA32_FRED_SSP2, vmcs12->guest_ia32_fred_ssp2);
+			vmcs_write64(GUEST_IA32_FRED_SSP3, vmcs12->guest_ia32_fred_ssp3);
+		}
+
 		vmx->segment_cache.bitmask = 0;
 	}
 
@@ -3835,6 +3851,22 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 			vcpu->arch.cr4_guest_owned_bits));
 }
 
+static inline unsigned long
+nested_vmx_get_event_data(struct kvm_vcpu *vcpu, bool for_ex_vmexit)
+{
+	struct kvm_queued_exception *ex = for_ex_vmexit ?
+		&vcpu->arch.exception_vmexit : &vcpu->arch.exception;
+
+	if (ex->has_payload)
+		return ex->payload;
+	else if (ex->vector == PF_VECTOR)
+		return vcpu->arch.cr2;
+	else if (ex->vector == DB_VECTOR)
+		return (vcpu->arch.dr6 & ~DR6_BT) ^ DR6_ACTIVE_LOW;
+	else
+		return 0;
+}
+
 static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 				      struct vmcs12 *vmcs12,
 				      u32 vm_exit_reason, u32 exit_intr_info)
@@ -3842,6 +3874,8 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	u32 idt_vectoring;
 	unsigned int nr;
 
+	vmcs12->original_event_data = 0;
+
 	/*
 	 * Per the SDM, VM-Exits due to double and triple faults are never
 	 * considered to occur during event delivery, even if the double/triple
@@ -3880,6 +3914,12 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 				vcpu->arch.exception.error_code;
 		}
 
+		idt_vectoring |= vcpu->arch.exception.nested ?
+				INTR_INFO_NESTED_EXCEPTION_MASK : 0;
+
+		vmcs12->original_event_data =
+			nested_vmx_get_event_data(vcpu, false);
+
 		vmcs12->idt_vectoring_info_field = idt_vectoring;
 	} else if (vcpu->arch.nmi_injected) {
 		vmcs12->idt_vectoring_info_field =
@@ -3970,19 +4010,7 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
 	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned long exit_qual;
-
-	if (ex->has_payload) {
-		exit_qual = ex->payload;
-	} else if (ex->vector == PF_VECTOR) {
-		exit_qual = vcpu->arch.cr2;
-	} else if (ex->vector == DB_VECTOR) {
-		exit_qual = vcpu->arch.dr6;
-		exit_qual &= ~DR6_BT;
-		exit_qual ^= DR6_ACTIVE_LOW;
-	} else {
-		exit_qual = 0;
-	}
+	unsigned long exit_qual = nested_vmx_get_event_data(vcpu, true);
 
 	/*
 	 * Unlike AMD's Paged Real Mode, which reports an error code on #PF
@@ -4003,10 +4031,12 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-	if (kvm_exception_is_soft(ex->vector))
+	if (kvm_exception_is_soft(ex->vector)) {
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
-	else
+	} else {
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
+		intr_info |= ex->nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;
+	}
 
 	if (!(vmcs12->idt_vectoring_info_field & VECTORING_INFO_VALID_MASK) &&
 	    vmx_get_nmi_mask(vcpu))
@@ -4352,6 +4382,14 @@ static bool is_vmcs12_ext_field(unsigned long field)
 	case GUEST_TR_BASE:
 	case GUEST_GDTR_BASE:
 	case GUEST_IDTR_BASE:
+	case GUEST_IA32_FRED_CONFIG:
+	case GUEST_IA32_FRED_RSP1:
+	case GUEST_IA32_FRED_RSP2:
+	case GUEST_IA32_FRED_RSP3:
+	case GUEST_IA32_FRED_STKLVLS:
+	case GUEST_IA32_FRED_SSP1:
+	case GUEST_IA32_FRED_SSP2:
+	case GUEST_IA32_FRED_SSP3:
 	case GUEST_PENDING_DBG_EXCEPTIONS:
 	case GUEST_BNDCFGS:
 		return true;
@@ -4401,6 +4439,18 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
 	vmcs12->guest_tr_base = vmcs_readl(GUEST_TR_BASE);
 	vmcs12->guest_gdtr_base = vmcs_readl(GUEST_GDTR_BASE);
 	vmcs12->guest_idtr_base = vmcs_readl(GUEST_IDTR_BASE);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		vmcs12->guest_ia32_fred_config = vmcs_read64(GUEST_IA32_FRED_CONFIG);
+		vmcs12->guest_ia32_fred_rsp1 = vmcs_read64(GUEST_IA32_FRED_RSP1);
+		vmcs12->guest_ia32_fred_rsp2 = vmcs_read64(GUEST_IA32_FRED_RSP2);
+		vmcs12->guest_ia32_fred_rsp3 = vmcs_read64(GUEST_IA32_FRED_RSP3);
+		vmcs12->guest_ia32_fred_stklvls = vmcs_read64(GUEST_IA32_FRED_STKLVLS);
+		vmcs12->guest_ia32_fred_ssp1 = vmcs_read64(GUEST_IA32_FRED_SSP1);
+		vmcs12->guest_ia32_fred_ssp2 = vmcs_read64(GUEST_IA32_FRED_SSP2);
+		vmcs12->guest_ia32_fred_ssp3 = vmcs_read64(GUEST_IA32_FRED_SSP3);
+	}
+
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
@@ -4625,6 +4675,17 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	vmcs_write32(GUEST_IDTR_LIMIT, 0xFFFF);
 	vmcs_write32(GUEST_GDTR_LIMIT, 0xFFFF);
 
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
+		vmcs_write64(GUEST_IA32_FRED_CONFIG, vmcs12->host_ia32_fred_config);
+		vmcs_write64(GUEST_IA32_FRED_RSP1, vmcs12->host_ia32_fred_rsp1);
+		vmcs_write64(GUEST_IA32_FRED_RSP2, vmcs12->host_ia32_fred_rsp2);
+		vmcs_write64(GUEST_IA32_FRED_RSP3, vmcs12->host_ia32_fred_rsp3);
+		vmcs_write64(GUEST_IA32_FRED_STKLVLS, vmcs12->host_ia32_fred_stklvls);
+		vmcs_write64(GUEST_IA32_FRED_SSP1, vmcs12->host_ia32_fred_ssp1);
+		vmcs_write64(GUEST_IA32_FRED_SSP2, vmcs12->host_ia32_fred_ssp2);
+		vmcs_write64(GUEST_IA32_FRED_SSP3, vmcs12->host_ia32_fred_ssp3);
+	}
+
 	/* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1.  */
 	if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
 		vmcs_write64(GUEST_BNDCFGS, 0);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 98457d7b2b23..59f17fdfad11 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -80,6 +80,7 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(VM_ENTRY_MSR_LOAD_COUNT, vm_entry_msr_load_count),
 	FIELD(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field),
 	FIELD(VM_ENTRY_EXCEPTION_ERROR_CODE, vm_entry_exception_error_code),
+	FIELD(INJECTED_EVENT_DATA, injected_event_data),
 	FIELD(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len),
 	FIELD(TPR_THRESHOLD, tpr_threshold),
 	FIELD(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control),
@@ -89,6 +90,7 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code),
 	FIELD(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field),
 	FIELD(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code),
+	FIELD(ORIGINAL_EVENT_DATA, original_event_data),
 	FIELD(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len),
 	FIELD(VMX_INSTRUCTION_INFO, vmx_instruction_info),
 	FIELD(GUEST_ES_LIMIT, guest_es_limit),
@@ -152,5 +154,21 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(GUEST_IA32_FRED_CONFIG, guest_ia32_fred_config),
+	FIELD(GUEST_IA32_FRED_RSP1, guest_ia32_fred_rsp1),
+	FIELD(GUEST_IA32_FRED_RSP2, guest_ia32_fred_rsp2),
+	FIELD(GUEST_IA32_FRED_RSP3, guest_ia32_fred_rsp3),
+	FIELD(GUEST_IA32_FRED_STKLVLS, guest_ia32_fred_stklvls),
+	FIELD(GUEST_IA32_FRED_SSP1, guest_ia32_fred_ssp1),
+	FIELD(GUEST_IA32_FRED_SSP2, guest_ia32_fred_ssp2),
+	FIELD(GUEST_IA32_FRED_SSP3, guest_ia32_fred_ssp3),
+	FIELD(HOST_IA32_FRED_CONFIG, host_ia32_fred_config),
+	FIELD(HOST_IA32_FRED_RSP1, host_ia32_fred_rsp1),
+	FIELD(HOST_IA32_FRED_RSP2, host_ia32_fred_rsp2),
+	FIELD(HOST_IA32_FRED_RSP3, host_ia32_fred_rsp3),
+	FIELD(HOST_IA32_FRED_STKLVLS, host_ia32_fred_stklvls),
+	FIELD(HOST_IA32_FRED_SSP1, host_ia32_fred_ssp1),
+	FIELD(HOST_IA32_FRED_SSP2, host_ia32_fred_ssp2),
+	FIELD(HOST_IA32_FRED_SSP3, host_ia32_fred_ssp3),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index f50f897b9b5f..edf7fcef8ccf 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -186,6 +186,24 @@ struct __packed vmcs12 {
 	u16 host_tr_selector;
 	u16 guest_pml_index;
 	u64 secondary_vm_exit_controls;
+	u64 guest_ia32_fred_config;
+	u64 guest_ia32_fred_rsp1;
+	u64 guest_ia32_fred_rsp2;
+	u64 guest_ia32_fred_rsp3;
+	u64 guest_ia32_fred_stklvls;
+	u64 guest_ia32_fred_ssp1;
+	u64 guest_ia32_fred_ssp2;
+	u64 guest_ia32_fred_ssp3;
+	u64 host_ia32_fred_config;
+	u64 host_ia32_fred_rsp1;
+	u64 host_ia32_fred_rsp2;
+	u64 host_ia32_fred_rsp3;
+	u64 host_ia32_fred_stklvls;
+	u64 host_ia32_fred_ssp1;
+	u64 host_ia32_fred_ssp2;
+	u64 host_ia32_fred_ssp3;
+	u64 injected_event_data;
+	u64 original_event_data;
 };
 
 /*
@@ -360,6 +378,24 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_tr_selector, 994);
 	CHECK_OFFSET(guest_pml_index, 996);
 	CHECK_OFFSET(secondary_vm_exit_controls, 998);
+	CHECK_OFFSET(guest_ia32_fred_config, 1006);
+	CHECK_OFFSET(guest_ia32_fred_rsp1, 1014);
+	CHECK_OFFSET(guest_ia32_fred_rsp2, 1022);
+	CHECK_OFFSET(guest_ia32_fred_rsp3, 1030);
+	CHECK_OFFSET(guest_ia32_fred_stklvls, 1038);
+	CHECK_OFFSET(guest_ia32_fred_ssp1, 1046);
+	CHECK_OFFSET(guest_ia32_fred_ssp2, 1054);
+	CHECK_OFFSET(guest_ia32_fred_ssp3, 1062);
+	CHECK_OFFSET(host_ia32_fred_config, 1070);
+	CHECK_OFFSET(host_ia32_fred_rsp1, 1078);
+	CHECK_OFFSET(host_ia32_fred_rsp2, 1086);
+	CHECK_OFFSET(host_ia32_fred_rsp3, 1094);
+	CHECK_OFFSET(host_ia32_fred_stklvls, 1102);
+	CHECK_OFFSET(host_ia32_fred_ssp1, 1110);
+	CHECK_OFFSET(host_ia32_fred_ssp2, 1118);
+	CHECK_OFFSET(host_ia32_fred_ssp3, 1126);
+	CHECK_OFFSET(injected_event_data, 1134);
+	CHECK_OFFSET(original_event_data, 1142);
 }
 
 extern const unsigned short vmcs12_field_offsets[];
diff --git a/arch/x86/kvm/vmx/vmcs_shadow_fields.h b/arch/x86/kvm/vmx/vmcs_shadow_fields.h
index 7f48056fe351..3885a3e0fbe8 100644
--- a/arch/x86/kvm/vmx/vmcs_shadow_fields.h
+++ b/arch/x86/kvm/vmx/vmcs_shadow_fields.h
@@ -74,6 +74,10 @@ SHADOW_FIELD_RW(HOST_GS_BASE, host_gs_base, true)
 /* 64-bit */
 SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS, guest_physical_address, true)
 SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS_HIGH, guest_physical_address, true)
+SHADOW_FIELD_RO(ORIGINAL_EVENT_DATA, original_event_data, kvm_cpu_cap_has(X86_FEATURE_FRED))
+SHADOW_FIELD_RO(ORIGINAL_EVENT_DATA_HIGH, original_event_data, kvm_cpu_cap_has(X86_FEATURE_FRED))
+SHADOW_FIELD_RW(INJECTED_EVENT_DATA, injected_event_data, kvm_cpu_cap_has(X86_FEATURE_FRED))
+SHADOW_FIELD_RW(INJECTED_EVENT_DATA_HIGH, injected_event_data, kvm_cpu_cap_has(X86_FEATURE_FRED))
 
 #undef SHADOW_FIELD_RO
 #undef SHADOW_FIELD_RW
-- 
2.43.0



* [PATCH v2 20/25] KVM: nVMX: Add support for VMX FRED controls
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (18 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 19/25] KVM: nVMX: Add FRED VMCS fields Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 21/25] KVM: nVMX: Add VMCS FRED states checking Xin Li
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add VMX FRED controls to nested VMX controls and set the VMX
nested-exception support bit (bit 58) in the nested IA32_VMX_BASIC MSR
when FRED is enabled.
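Moving bit 58 into VMX_BASIC_FEATURES_MASK requires carving it out of
VMX_BASIC_RESERVED_BITS, since a bit cannot be both a feature bit and a
reserved bit. The sketch below re-implements BIT_ULL()/GENMASK_ULL()
locally and checks that arithmetic; the feature bit positions for dual
monitor (49), INS/OUTS info (54) and true controls (55) are assumed
from the SDM, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local equivalents of the kernel's BIT_ULL() and GENMASK_ULL(). */
#define SKETCH_BIT_ULL(n)		(1ull << (n))
#define SKETCH_GENMASK_ULL(h, l)	((~0ull << (l)) & (~0ull >> (63 - (h))))

#define SKETCH_NESTED_EXCEPTION		SKETCH_BIT_ULL(58)

/* Assumed positions: dual monitor 49, INS/OUTS info 54, true controls 55. */
#define SKETCH_FEATURES_NEW						\
	(SKETCH_BIT_ULL(49) | SKETCH_BIT_ULL(54) | SKETCH_BIT_ULL(55) |	\
	 SKETCH_NESTED_EXCEPTION)

/* IA32_VMX_BASIC reserved-bit masks before and after the patch. */
#define SKETCH_RESERVED_OLD						\
	(SKETCH_GENMASK_ULL(63, 56) | SKETCH_GENMASK_ULL(47, 45) |	\
	 SKETCH_BIT_ULL(31))
#define SKETCH_RESERVED_NEW						\
	(SKETCH_GENMASK_ULL(63, 59) | SKETCH_GENMASK_ULL(57, 56) |	\
	 SKETCH_GENMASK_ULL(47, 45) | SKETCH_BIT_ULL(31))

/* True when no bit is claimed by both masks. */
static bool sketch_masks_disjoint(uint64_t features, uint64_t reserved)
{
	return (features & reserved) == 0;
}
```

The new reserved mask is exactly the old one with bit 58 cleared, which
is the only change needed for the two masks to stay disjoint.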

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 14 ++++++++++----
 arch/x86/kvm/vmx/vmx.c    |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f9c1fbeac302..04a9cdb0361f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1239,10 +1239,12 @@ static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
 #define VMX_BASIC_FEATURES_MASK			\
 	(VMX_BASIC_DUAL_MONITOR_TREATMENT |	\
 	 VMX_BASIC_INOUT |			\
-	 VMX_BASIC_TRUE_CTLS)
+	 VMX_BASIC_TRUE_CTLS |			\
+	 VMX_BASIC_NESTED_EXCEPTION)
 
-#define VMX_BASIC_RESERVED_BITS			\
-	(GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) | BIT_ULL(31))
+#define VMX_BASIC_RESERVED_BITS				\
+	(GENMASK_ULL(63, 59) | GENMASK_ULL(57, 56) |	\
+	 GENMASK_ULL(47, 45) | BIT_ULL(31))
 
 static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
@@ -6988,7 +6990,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_IA32_FRED;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
@@ -7147,6 +7150,9 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_FRED))
+		msrs->basic |= VMX_BASIC_NESTED_EXCEPTION;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc808d599493..1005b6a57d23 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7915,6 +7915,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 
 	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
 	cr4_fixed1_update(X86_CR4_LAM_SUP,    eax, feature_bit(LAM));
+	cr4_fixed1_update(X86_CR4_FRED,       eax, feature_bit(FRED));
 
 #undef cr4_fixed1_update
 }
-- 
2.43.0


* [PATCH v2 21/25] KVM: nVMX: Add VMCS FRED states checking
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (19 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 20/25] KVM: nVMX: Add support for VMX FRED controls Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 22/25] KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests Xin Li
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add checks for FRED-related VMCS fields.

As with real hardware, nested VMX performs checks on various VMCS fields,
including both control fields and guest/host state fields.  With the
introduction of VMX FRED, add checks for the FRED-related VMCS fields.
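
The checks below apply the same rules to the host and guest FRED MSR values. A standalone sketch of those rules follows. It assumes 4-level paging, so addresses are canonical when bits 63:47 are all equal. KVM's is_noncanonical_address() also accounts for LA57. struct fred_msrs, is_canonical_48() and fred_msrs_valid() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)        (1ULL << (n))
#define GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define PAGE_MASK         (~0xfffULL)	/* 4 KiB pages assumed */

/* Hypothetical container for the FRED MSR values being checked. */
struct fred_msrs {
	uint64_t config;
	uint64_t rsp[3];	/* IA32_FRED_RSP1..RSP3 */
	uint64_t ssp[3];	/* IA32_FRED_SSP1..SSP3 */
};

/* Assumes 48-bit linear addresses: bits 63:47 must all be equal. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63:47 */

	return top == 0 || top == 0x1ffff;
}

static bool fred_msrs_valid(const struct fred_msrs *m)
{
	/* Bit 11, bits 5:4 and bit 2 of IA32_FRED_CONFIG must be zero. */
	if (m->config & (BIT_ULL(11) | GENMASK_ULL(5, 4) | BIT_ULL(2)))
		return false;
	/* Only the page-aligned entry point (bits 63:12) must be canonical. */
	if (!is_canonical_48(m->config & PAGE_MASK))
		return false;

	for (int i = 0; i < 3; i++) {
		/* RSP1..3 must be 64-byte aligned, SSP1..3 8-byte aligned. */
		if ((m->rsp[i] & GENMASK_ULL(5, 0)) ||
		    (m->ssp[i] & GENMASK_ULL(2, 0)))
			return false;
		if (!is_canonical_48(m->rsp[i]) || !is_canonical_48(m->ssp[i]))
			return false;
	}
	return true;
}
```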

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 80 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 04a9cdb0361f..ef0bd46eb0ce 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2933,6 +2933,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 					  struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool fred_enabled = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) &&
+			    (vmcs12->guest_cr4 & X86_CR4_FRED);
 
 	if (CC(!vmx_control_verify(vmcs12->vm_entry_controls,
 				    vmx->nested.msrs.entry_ctls_low,
@@ -2951,6 +2953,7 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
 		bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
 		bool should_have_error_code;
+		bool has_nested_exception = vmx->nested.msrs.basic & VMX_BASIC_NESTED_EXCEPTION;
 		bool urg = nested_cpu_has2(vmcs12,
 					   SECONDARY_EXEC_UNRESTRICTED_GUEST);
 		bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
@@ -2964,7 +2967,9 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		/* VM-entry interruption-info field: vector */
 		if (CC(intr_type == INTR_TYPE_NMI_INTR && vector != NMI_VECTOR) ||
 		    CC(intr_type == INTR_TYPE_HARD_EXCEPTION && vector > 31) ||
-		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
+		    CC(intr_type == INTR_TYPE_OTHER_EVENT &&
+		       ((!fred_enabled && vector > 0) ||
+		        (fred_enabled && vector > 2))))
 			return -EINVAL;
 
 		/* VM-entry interruption-info field: deliver error code */
@@ -2983,6 +2988,15 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		if (CC(intr_info & INTR_INFO_RESVD_BITS_MASK))
 			return -EINVAL;
 
+		/*
+		 * When the CPU enumerates VMX nested-exception support, bit 13
+		 * (set to indicate a nested exception) of the intr info field
+		 * may have value 1. Otherwise bit 13 is reserved.
+		 */
+		if (CC(!has_nested_exception &&
+		       (intr_info & INTR_INFO_NESTED_EXCEPTION_MASK)))
+			return -EINVAL;
+
 		/* VM-entry instruction length */
 		switch (intr_type) {
 		case INTR_TYPE_SOFT_EXCEPTION:
@@ -2992,6 +3006,12 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 			    CC(vmcs12->vm_entry_instruction_len == 0 &&
 			    CC(!nested_cpu_has_zero_length_injection(vcpu))))
 				return -EINVAL;
+			break;
+		case INTR_TYPE_OTHER_EVENT:
+			if (fred_enabled && (vector == 1 || vector == 2))
+				if (CC(vmcs12->vm_entry_instruction_len > 15))
+					return -EINVAL;
+			break;
 		}
 	}
 
@@ -3054,9 +3074,30 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 	if (ia32e) {
 		if (CC(!(vmcs12->host_cr4 & X86_CR4_PAE)))
 			return -EINVAL;
+		if (vmcs12->vm_exit_controls & VM_EXIT_ACTIVATE_SECONDARY_CONTROLS &&
+		    vmcs12->secondary_vm_exit_controls & SECONDARY_VM_EXIT_LOAD_IA32_FRED) {
+			/* Bit 11, bits 5:4, and bit 2 of the IA32_FRED_CONFIG must be zero */
+			if (CC(vmcs12->host_ia32_fred_config &
+			       (BIT_ULL(11) | GENMASK_ULL(5, 4) | BIT_ULL(2))) ||
+			    CC(vmcs12->host_ia32_fred_rsp1 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->host_ia32_fred_rsp2 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->host_ia32_fred_rsp3 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->host_ia32_fred_ssp1 & GENMASK_ULL(2, 0)) ||
+			    CC(vmcs12->host_ia32_fred_ssp2 & GENMASK_ULL(2, 0)) ||
+			    CC(vmcs12->host_ia32_fred_ssp3 & GENMASK_ULL(2, 0)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_config & PAGE_MASK, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_rsp1, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_rsp2, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_rsp3, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_ssp1, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_ssp2, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->host_ia32_fred_ssp3, vcpu)))
+				return -EINVAL;
+		}
 	} else {
 		if (CC(vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) ||
 		    CC(vmcs12->host_cr4 & X86_CR4_PCIDE) ||
+		    CC(vmcs12->host_cr4 & X86_CR4_FRED) ||
 		    CC((vmcs12->host_rip) >> 32))
 			return -EINVAL;
 	}
@@ -3200,6 +3241,43 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	     CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
 		return -EINVAL;
 
+	if (ia32e) {
+		if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_FRED) {
+			/* Bit 11, bits 5:4, and bit 2 of the IA32_FRED_CONFIG must be zero */
+			if (CC(vmcs12->guest_ia32_fred_config &
+			       (BIT_ULL(11) | GENMASK_ULL(5, 4) | BIT_ULL(2))) ||
+			    CC(vmcs12->guest_ia32_fred_rsp1 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->guest_ia32_fred_rsp2 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->guest_ia32_fred_rsp3 & GENMASK_ULL(5, 0)) ||
+			    CC(vmcs12->guest_ia32_fred_ssp1 & GENMASK_ULL(2, 0)) ||
+			    CC(vmcs12->guest_ia32_fred_ssp2 & GENMASK_ULL(2, 0)) ||
+			    CC(vmcs12->guest_ia32_fred_ssp3 & GENMASK_ULL(2, 0)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_config & PAGE_MASK, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_rsp1, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_rsp2, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_rsp3, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_ssp1, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_ssp2, vcpu)) ||
+			    CC(is_noncanonical_address(vmcs12->guest_ia32_fred_ssp3, vcpu)))
+				return -EINVAL;
+		}
+		if (vmcs12->guest_cr4 & X86_CR4_FRED) {
+			unsigned int ss_dpl = VMX_AR_DPL(vmcs12->guest_ss_ar_bytes);
+			if (CC(ss_dpl == 1 || ss_dpl == 2))
+				return -EINVAL;
+			if (ss_dpl == 0 &&
+			    CC(!(vmcs12->guest_cs_ar_bytes & VMX_AR_L_MASK)))
+				return -EINVAL;
+			if (ss_dpl == 3 &&
+			    (CC(vmcs12->guest_rflags & X86_EFLAGS_IOPL) ||
+			     CC(vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_STI)))
+				return -EINVAL;
+		}
+	} else {
+		if (CC(vmcs12->guest_cr4 & X86_CR4_FRED))
+			return -EINVAL;
+	}
+
 	if (nested_check_guest_non_reg_state(vmcs12))
 		return -EINVAL;
 
-- 
2.43.0


* [PATCH v2 22/25] KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (20 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 21/25] KVM: nVMX: Add VMCS FRED states checking Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 23/25] KVM: selftests: Run debug_regs test with FRED enabled Xin Li
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Allow FRED/LKGS/WRMSRNS to be exposed to guests, so that a guest OS can see
these features when the guest is configured with FRED/LKGS/WRMSRNS in QEMU.

A QEMU patch is required to expose FRED/LKGS/WRMSRNS to KVM guests.
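
All three features are enumerated in CPUID.(EAX=7,ECX=1):EAX, at bits 17 (FRED), 18 (LKGS) and 19 (WRMSRNS), which matches the selftest definitions added later in this series. The sketch below shows one way guest or host userspace could check for them. It assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid_count(). struct fred_features and both functions are hypothetical names. Whether the bits are actually set inside a guest depends on the host CPU, KVM and the VMM configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=1):EAX bit positions, as used by the KVM selftests. */
#define CPUID_7_1_EAX_FRED    (1u << 17)
#define CPUID_7_1_EAX_LKGS    (1u << 18)
#define CPUID_7_1_EAX_WRMSRNS (1u << 19)

struct fred_features {
	bool fred, lkgs, wrmsrns;
};

/* Decode the raw EAX value of leaf 7, subleaf 1. */
static struct fred_features decode_cpuid_7_1_eax(uint32_t eax)
{
	struct fred_features f = {
		.fred    = (eax & CPUID_7_1_EAX_FRED) != 0,
		.lkgs    = (eax & CPUID_7_1_EAX_LKGS) != 0,
		.wrmsrns = (eax & CPUID_7_1_EAX_WRMSRNS) != 0,
	};
	return f;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query the CPU this code runs on (for example, inside a guest). */
bool query_fred_features(struct fred_features *out)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
		return false;
	*out = decode_cpuid_7_1_eax(eax);
	return true;
}
#endif
```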

Signed-off-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 arch/x86/kvm/cpuid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index adba49afb5fe..afc1316d78ad 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -676,8 +676,8 @@ void kvm_set_cpu_caps(void)
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
-		F(FZRM) | F(FSRS) | F(FSRC) |
-		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
+		F(FZRM) | F(FSRS) | F(FSRC) | F(FRED) | F(LKGS) |
+		F(WRMSRNS) | F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
-- 
2.43.0


* [PATCH v2 23/25] KVM: selftests: Run debug_regs test with FRED enabled
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (21 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 22/25] KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 24/25] KVM: selftests: Add a new VM guest mode to run user level code Xin Li
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Run the debug_regs test a second time, with FRED enabled, if FRED is
available.
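
The second pass depends on CR4.FRED. This is bit 32, so it can only be represented in a 64-bit value. The selftest's "1ul << 32" works because unsigned long is 64 bits on x86_64. In the sketch below, the helpers are hypothetical and not part of the selftests library. They model the host-side update of sregs.cr4 in main() and the guest-side "get_cr4() & X86_CR4_FRED" check. The sketch also shows that truncating CR4 to 32 bits would silently drop the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4.FRED is bit 32: it needs a 64-bit constant. */
#define X86_CR4_FRED (1ULL << 32)

/* Hypothetical: what main() does to sregs.cr4 before the second run. */
static uint64_t cr4_enable_fred(uint64_t cr4)
{
	return cr4 | X86_CR4_FRED;
}

/* Hypothetical: the guest_code() check that triggers the second round. */
static bool cr4_fred_enabled(uint64_t cr4)
{
	return (cr4 & X86_CR4_FRED) != 0;
}
```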

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 .../selftests/kvm/include/x86_64/processor.h  |  4 ++
 .../testing/selftests/kvm/x86_64/debug_regs.c | 50 ++++++++++++++-----
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a84863503fcb..bc5cd8628a20 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -48,6 +48,7 @@ extern bool host_cpu_is_amd;
 #define X86_CR4_SMEP		(1ul << 20)
 #define X86_CR4_SMAP		(1ul << 21)
 #define X86_CR4_PKE		(1ul << 22)
+#define X86_CR4_FRED		(1ul << 32)
 
 struct xstate_header {
 	u64				xstate_bv;
@@ -164,6 +165,9 @@ struct kvm_x86_cpu_feature {
 #define	X86_FEATURE_SPEC_CTRL		KVM_X86_CPU_FEATURE(0x7, 0, EDX, 26)
 #define	X86_FEATURE_ARCH_CAPABILITIES	KVM_X86_CPU_FEATURE(0x7, 0, EDX, 29)
 #define	X86_FEATURE_PKS			KVM_X86_CPU_FEATURE(0x7, 0, ECX, 31)
+#define	X86_FEATURE_FRED		KVM_X86_CPU_FEATURE(0x7, 1, EAX, 17)
+#define	X86_FEATURE_LKGS		KVM_X86_CPU_FEATURE(0x7, 1, EAX, 18)
+#define	X86_FEATURE_WRMSRNS		KVM_X86_CPU_FEATURE(0x7, 1, EAX, 19)
 #define	X86_FEATURE_XTILECFG		KVM_X86_CPU_FEATURE(0xD, 0, EAX, 17)
 #define	X86_FEATURE_XTILEDATA		KVM_X86_CPU_FEATURE(0xD, 0, EAX, 18)
 #define	X86_FEATURE_XSAVES		KVM_X86_CPU_FEATURE(0xD, 1, EAX, 3)
diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
index f6b295e0b2d2..69055e764f15 100644
--- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
@@ -20,7 +20,7 @@ uint32_t guest_value;
 
 extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
 
-static void guest_code(void)
+static void guest_test_code(void)
 {
 	/* Create a pending interrupt on current vCPU */
 	x2apic_enable();
@@ -61,6 +61,15 @@ static void guest_code(void)
 
 	/* DR6.BD test */
 	asm volatile("bd_start: mov %%dr0, %%rax" : : : "rax");
+}
+
+static void guest_code(void)
+{
+	guest_test_code();
+
+	if (get_cr4() & X86_CR4_FRED)
+		guest_test_code();
+
 	GUEST_DONE();
 }
 
@@ -75,19 +84,15 @@ static void vcpu_skip_insn(struct kvm_vcpu *vcpu, int insn_len)
 	vcpu_regs_set(vcpu, &regs);
 }
 
-int main(void)
+void run_test(struct kvm_vcpu *vcpu)
 {
 	struct kvm_guest_debug debug;
+	struct kvm_run *run = vcpu->run;
 	unsigned long long target_dr6, target_rip;
-	struct kvm_vcpu *vcpu;
-	struct kvm_run *run;
-	struct kvm_vm *vm;
-	struct ucall uc;
-	uint64_t cmd;
 	int i;
 	/* Instruction lengths starting at ss_start */
 	int ss_size[6] = {
-		1,		/* sti*/
+		1,		/* sti */
 		2,		/* xor */
 		2,		/* cpuid */
 		5,		/* mov */
@@ -95,11 +100,6 @@ int main(void)
 		1,		/* cli */
 	};
 
-	TEST_REQUIRE(kvm_has_cap(KVM_CAP_SET_GUEST_DEBUG));
-
-	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
-	run = vcpu->run;
-
 	/* Test software BPs - int3 */
 	memset(&debug, 0, sizeof(debug));
 	debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
@@ -202,6 +202,30 @@ int main(void)
 	/* Disable all debug controls, run to the end */
 	memset(&debug, 0, sizeof(debug));
 	vcpu_guest_debug_set(vcpu, &debug);
+}
+
+int main(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	uint64_t cmd;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_SET_GUEST_DEBUG));
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	run_test(vcpu);
+
+	if (kvm_cpu_has(X86_FEATURE_FRED)) {
+		struct kvm_sregs sregs;
+
+		vcpu_sregs_get(vcpu, &sregs);
+		sregs.cr4 |= X86_CR4_FRED;
+		vcpu_sregs_set(vcpu, &sregs);
+
+		run_test(vcpu);
+	}
 
 	vcpu_run(vcpu);
 	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
-- 
2.43.0


* [PATCH v2 24/25] KVM: selftests: Add a new VM guest mode to run user level code
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (22 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 23/25] KVM: selftests: Run debug_regs test with FRED enabled Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-02-07 17:26 ` [PATCH v2 25/25] KVM: selftests: Add fred exception tests Xin Li
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add a new VM guest mode, VM_MODE_PXXV48_4K_USER, that sets the user bit in
guest page table entries, thus allowing user level code to run in guests.
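
On x86, a ring-3 access succeeds only if the U/S bit is set in every paging-structure entry along the walk. This is why the patch ORs PTE_USER_MASK into both the upper-level entries (virt_create_upper_pte()) and the 4K leaf (__virt_pg_map()). The sketch below builds such an entry. The mask values follow the architectural bit positions. make_pte() is a hypothetical helper, not the library's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 paging-entry bits (same positions at every level). */
#define PTE_PRESENT_MASK   (1ULL << 0)
#define PTE_WRITABLE_MASK  (1ULL << 1)
#define PTE_USER_MASK      (1ULL << 2)
/* Physical address bits 51:12 of a 4 KiB mapping. */
#define PHYSICAL_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)

/*
 * Hypothetical helper: build a present, writable entry, and set U/S
 * when the VM mode allows user-level access.
 */
static uint64_t make_pte(uint64_t paddr, bool user)
{
	uint64_t pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK |
		       (paddr & PHYSICAL_PAGE_MASK);

	if (user)
		pte |= PTE_USER_MASK;
	return pte;
}
```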

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 .../testing/selftests/kvm/include/kvm_util_base.h |  1 +
 tools/testing/selftests/kvm/lib/kvm_util.c        |  5 ++++-
 .../testing/selftests/kvm/lib/x86_64/processor.c  | 15 ++++++++++-----
 tools/testing/selftests/kvm/lib/x86_64/vmx.c      |  4 ++--
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 9e5afc472c14..ea1a585ef6f4 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -187,6 +187,7 @@ enum vm_guest_mode {
 	VM_MODE_P36V48_16K,
 	VM_MODE_P36V48_64K,
 	VM_MODE_P36V47_16K,
+	VM_MODE_PXXV48_4K_USER,	/* For 48bits VA but ANY bits PA with USER bit set */
 	NUM_VM_MODES,
 };
 
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index e066d584c656..8b4761836b3e 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -163,6 +163,7 @@ const char *vm_guest_mode_string(uint32_t i)
 		[VM_MODE_P36V48_16K]	= "PA-bits:36,  VA-bits:48, 16K pages",
 		[VM_MODE_P36V48_64K]	= "PA-bits:36,  VA-bits:48, 64K pages",
 		[VM_MODE_P36V47_16K]	= "PA-bits:36,  VA-bits:47, 16K pages",
+		[VM_MODE_PXXV48_4K_USER]	= "PA-bits:ANY, VA-bits:48,  4K user pages",
 	};
 	_Static_assert(sizeof(strings)/sizeof(char *) == NUM_VM_MODES,
 		       "Missing new mode strings?");
@@ -189,6 +190,7 @@ const struct vm_guest_mode_params vm_guest_mode_params[] = {
 	[VM_MODE_P36V48_16K]	= { 36, 48,  0x4000, 14 },
 	[VM_MODE_P36V48_64K]	= { 36, 48, 0x10000, 16 },
 	[VM_MODE_P36V47_16K]	= { 36, 47,  0x4000, 14 },
+	[VM_MODE_PXXV48_4K_USER]	= {  0,  0,  0x1000, 12 },
 };
 _Static_assert(sizeof(vm_guest_mode_params)/sizeof(struct vm_guest_mode_params) == NUM_VM_MODES,
 	       "Missing new mode params?");
@@ -263,6 +265,7 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 		vm->pgtable_levels = 3;
 		break;
 	case VM_MODE_PXXV48_4K:
+	case VM_MODE_PXXV48_4K_USER:
 #ifdef __x86_64__
 		kvm_get_cpu_address_width(&vm->pa_bits, &vm->va_bits);
 		/*
@@ -278,7 +281,7 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 		vm->pgtable_levels = 4;
 		vm->va_bits = 48;
 #else
-		TEST_FAIL("VM_MODE_PXXV48_4K not supported on non-x86 platforms");
+		TEST_FAIL("VM_MODE_PXXV48_4K(_USER) not supported on non-x86 platforms");
 #endif
 		break;
 	case VM_MODE_P47V64_4K:
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index d8288374078e..a8e60641df53 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -124,8 +124,8 @@ bool kvm_is_tdp_enabled(void)
 
 void virt_arch_pgd_alloc(struct kvm_vm *vm)
 {
-	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
-		"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
+	TEST_ASSERT((vm->mode == VM_MODE_PXXV48_4K) || (vm->mode == VM_MODE_PXXV48_4K_USER),
+		"Attempt to use unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 
 	/* If needed, create page map l4 table. */
 	if (!vm->pgd_created) {
@@ -159,6 +159,8 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
 
 	if (!(*pte & PTE_PRESENT_MASK)) {
 		*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK;
+		if (vm->mode == VM_MODE_PXXV48_4K_USER)
+			*pte |= PTE_USER_MASK;
 		if (current_level == target_level)
 			*pte |= PTE_LARGE_MASK | (paddr & PHYSICAL_PAGE_MASK);
 		else
@@ -185,7 +187,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 	uint64_t *pml4e, *pdpe, *pde;
 	uint64_t *pte;
 
-	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K,
+	TEST_ASSERT((vm->mode == VM_MODE_PXXV48_4K) || (vm->mode == VM_MODE_PXXV48_4K_USER),
 		    "Unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 
 	TEST_ASSERT((vaddr % pg_size) == 0,
@@ -222,6 +224,8 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
 	TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
 		    "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
 	*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+	if (vm->mode == VM_MODE_PXXV48_4K_USER)
+		*pte |= PTE_USER_MASK;
 }
 
 void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@@ -268,8 +272,8 @@ uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
 	TEST_ASSERT(*level >= PG_LEVEL_NONE && *level < PG_LEVEL_NUM,
 		    "Invalid PG_LEVEL_* '%d'", *level);
 
-	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
-		"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
+	TEST_ASSERT((vm->mode == VM_MODE_PXXV48_4K) || (vm->mode == VM_MODE_PXXV48_4K_USER),
+		"Attempt to use unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 	TEST_ASSERT(sparsebit_is_set(vm->vpages_valid,
 		(vaddr >> vm->page_shift)),
 		"Invalid virtual address, vaddr: 0x%lx",
@@ -536,6 +540,7 @@ static void vcpu_setup(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 
 	switch (vm->mode) {
 	case VM_MODE_PXXV48_4K:
+	case VM_MODE_PXXV48_4K_USER:
 		sregs.cr0 = X86_CR0_PE | X86_CR0_NE | X86_CR0_PG;
 		sregs.cr4 |= X86_CR4_PAE | X86_CR4_OSFXSR;
 		sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 59d97531c9b1..65147de6f9c0 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -403,8 +403,8 @@ void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
 	struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
 	uint16_t index;
 
-	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
-		    "unknown or unsupported guest mode, mode: 0x%x", vm->mode);
+	TEST_ASSERT((vm->mode == VM_MODE_PXXV48_4K) || (vm->mode == VM_MODE_PXXV48_4K_USER),
+		    "Attempt to use unknown or unsupported guest mode, mode: 0x%x", vm->mode);
 
 	TEST_ASSERT((nested_paddr >> 48) == 0,
 		    "Nested physical address 0x%lx requires 5-level paging",
-- 
2.43.0


* [PATCH v2 25/25] KVM: selftests: Add fred exception tests
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (23 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 24/25] KVM: selftests: Add a new VM guest mode to run user level code Xin Li
@ 2024-02-07 17:26 ` Xin Li
  2024-03-29 20:18     ` Muhammad Usama Anjum
  2024-03-29 20:18     ` Muhammad Usama Anjum
  2024-03-27  8:08 ` [PATCH v2 00/25] Enable FRED with KVM VMX Kang, Shan
  2024-04-15 17:58 ` Li, Xin3
  26 siblings, 2 replies; 51+ messages in thread
From: Xin Li @ 2024-02-07 17:26 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, ravi.v.shankar, xin

Add tests for FRED event data and VMX nested-exception support.

FRED is designed to save a complete event context in its stack frame;
e.g., FRED saves the faulting linear address of a #PF into the 64-bit
event data field of the FRED stack frame.  As such, FRED VMX adds
event data handling to VMX transitions.
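
The position of the event data field follows from the struct fred_stack that this patch adds to processor.h. The test's entry stubs push 15 GPRs (PUSH_REGS pushes %rdi first and %r15 last) on top of the 64-byte frame that FRED event delivery pushes. The compile-time checks below encode that layout. fred_pf_address() is a hypothetical accessor.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Same member order as the struct fred_stack added to processor.h. */
struct fred_stack {
	/* Pushed by the entry stub; %r15 is pushed last, so it comes first. */
	u64 r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	u64 ax, cx, dx, si, di;
	/* Pushed by FRED event delivery: eight quadwords. */
	u64 error_code, ip, csx, flags, sp, ssx, event_data, reserved;
};

_Static_assert(offsetof(struct fred_stack, error_code) == 15 * 8,
	       "15 GPRs precede the hardware-pushed frame");
_Static_assert(sizeof(struct fred_stack) -
	       offsetof(struct fred_stack, error_code) == 64,
	       "the hardware-pushed FRED frame is 64 bytes");

/* For a #PF, the event data field carries the faulting linear address. */
static u64 fred_pf_address(const struct fred_stack *frame)
{
	return frame->event_data;
}
```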

In addition, FRED introduces event stack levels: an event handler is
dispatched onto a stack chosen from the current stack level and the
per-vector stack levels configured in the IA32_FRED_STKLVLS MSR.  VMX
nested-exception support ensures that the correct event stack level is
chosen when a VM entry injects a nested exception, which is treated as
having occurred in ring 0.
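
The stack level selection can be sketched as a simplified model. This is our reading of the FRED specification, not the full architectural rule; for example, double faults are not modeled. An event from ring 3 is delivered on stack level 0 (RSP0). A ring-0 event, including a nested exception injected by VM entry, uses the higher of the current stack level and the vector's 2-bit field in IA32_FRED_STKLVLS. FRED_STKLVL() matches the test's macro. The other names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two bits per vector (0-31), same encoding as the test's FRED_STKLVL(). */
#define FRED_STKLVL(v, l) ((uint64_t)(l) << (2 * (v)))
#define UD_VECTOR 6
#define PF_VECTOR 14

static unsigned int stklvl_for_vector(uint64_t stklvls, unsigned int vector)
{
	return (stklvls >> (2 * vector)) & 0x3;
}

/*
 * Simplified model: ring0_event is true for events occurring in ring 0,
 * including a nested exception injected by VM entry.
 */
static unsigned int fred_new_stack_level(unsigned int csl, uint64_t stklvls,
					 unsigned int vector, bool ring0_event)
{
	unsigned int lvl = stklvl_for_vector(stklvls, vector);

	/* Events from ring 3 are delivered on stack level 0 (RSP0). */
	if (!ring0_event)
		return 0;
	/* Ring-0 events never move to a lower stack level. */
	return lvl > csl ? lvl : csl;
}
```

In fred_test, the #UD, interrupt or NMI from ring 3 targets level 0, whose RSP0 is deliberately unmapped. The resulting nested #PF is therefore handled as a ring-0 event and lands on the level programmed with FRED_STKLVL(PF_VECTOR, n). The host side then asserts that level.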

To fully test the underlying FRED VMX code, this test should be run a
second time with EPT disabled, so that page faults are injected as
nested exceptions.
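
EPT is typically disabled by reloading kvm_intel with the module parameter ept=0. A test harness may want to report which configuration it ran under. The selftests library already has kvm_is_tdp_enabled() for this purpose; its definition appears in the processor.c context of the previous patch. The standalone sketch below reads the kvm_intel "ept" parameter from sysfs, which reports Y or N. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* kvm_intel exposes its "ept" module parameter here as "Y" or "N". */
#define EPT_PARAM_PATH "/sys/module/kvm_intel/parameters/ept"

/* Interpret boolean module-parameter text such as "Y\n" or "N\n". */
static bool parse_bool_param(const char *s)
{
	return s[0] == 'Y' || s[0] == 'y' || s[0] == '1';
}

/*
 * Returns 1 if EPT is enabled, 0 if it is disabled, and -1 if the
 * parameter cannot be read (e.g., kvm_intel is not loaded).
 */
int kvm_intel_ept_enabled(void)
{
	char buf[8] = { 0 };
	FILE *f = fopen(EPT_PARAM_PATH, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_bool_param(buf) ? 1 : 0;
}
```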

Originally-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |  32 ++
 .../testing/selftests/kvm/x86_64/fred_test.c  | 297 ++++++++++++++++++
 3 files changed, 330 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 492e937fab00..eaac13a605f2 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -67,6 +67,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
 TEST_GEN_PROGS_x86_64 += x86_64/exit_on_emulation_failure_test
 TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
 TEST_GEN_PROGS_x86_64 += x86_64/hwcr_msr_test
+TEST_GEN_PROGS_x86_64 += x86_64/fred_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_evmcs
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index bc5cd8628a20..ef7aaab790e0 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1275,4 +1275,36 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
 
+/*
+ * FRED related data structures and functions
+ */
+
+#define FRED_SSX_NMI		BIT_ULL(18)
+
+struct fred_stack {
+	u64 r15;
+	u64 r14;
+	u64 r13;
+	u64 r12;
+	u64 bp;
+	u64 bx;
+	u64 r11;
+	u64 r10;
+	u64 r9;
+	u64 r8;
+	u64 ax;
+	u64 cx;
+	u64 dx;
+	u64 si;
+	u64 di;
+	u64 error_code;
+	u64 ip;
+	u64 csx;
+	u64 flags;
+	u64 sp;
+	u64 ssx;
+	u64 event_data;
+	u64 reserved;
+};
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/x86_64/fred_test.c b/tools/testing/selftests/kvm/x86_64/fred_test.c
new file mode 100644
index 000000000000..412afa919568
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/fred_test.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * FRED nested exception tests
+ *
+ * Copyright (C) 2023, Intel, Inc.
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <asm/msr-index.h>
+
+#include "apic.h"
+#include "kvm_util.h"
+#include "test_util.h"
+#include "guest_modes.h"
+#include "processor.h"
+
+#define IRQ_VECTOR 0xAA
+
+#define FRED_STKLVL(v,l)		(_AT(unsigned long, l) << (2 * (v)))
+#define FRED_CONFIG_ENTRYPOINT(p)	_AT(unsigned long, (p))
+
+/* This address is already mapped in guest page table. */
+#define FRED_VALID_RSP			0x8000
+
+/*
+ * The following addresses are not yet mapped in both EPT and guest page
+ * tables at the beginning.  As a result, it causes an EPT violation VM
+ * exit with an original guest #PF to access any of them for the first
+ * time.
+ *
+ * Use these addresses as guest FRED RSP0 to generate nested #PFs to test
+ * if event data are properly virtualized.
+ */
+static unsigned long fred_invalid_rsp[4] = {
+	0x0,
+	0xf0000000,
+	0xe0000000,
+	0xd0000000,
+};
+
+extern char asm_user_nop[];
+extern char asm_user_ud[];
+extern char asm_done_fault[];
+
+extern void asm_test_fault(int test);
+
+/*
+ * user level code for triggering faults.
+ */
+asm(".pushsection .text\n"
+    ".align 4096\n"
+
+    ".type asm_user_nop, @function\n"
+    "asm_user_nop:\n"
+    "1: .byte 0x90\n"
+    "jmp 1b\n"
+
+    ".fill asm_user_ud - ., 1, 0xcc\n"
+
+    ".type asm_user_ud, @function\n"
+    ".org asm_user_nop + 16\n"
+    "asm_user_ud:\n"
+    /* Trigger a #UD */
+    "ud2\n"
+
+    ".align 4096, 0xcc\n"
+    ".popsection");
+
+/* Send current stack level and #PF address */
+#define GUEST_SYNC_CSL_FA(__stage, __pf_address)		\
+	GUEST_SYNC_ARGS(__stage, __pf_address, 0, 0, 0)
+
+void fred_entry_from_user(struct fred_stack *stack)
+{
+	u32 current_stack_level = rdmsr(MSR_IA32_FRED_CONFIG) & 0x3;
+
+	GUEST_SYNC_CSL_FA(current_stack_level, stack->event_data);
+
+	/* Do NOT go back to user level, continue the next test instead */
+	stack->ssx = 0x18;
+	stack->csx = 0x10;
+	stack->ip = (u64)&asm_done_fault;
+}
+
+void fred_entry_from_kernel(struct fred_stack *stack)
+{
+	/*
+	 * Keep NMI blocked to delay the delivery of the next NMI until
+	 * returning to user level.
+	 */
+	stack->ssx &= ~FRED_SSX_NMI;
+}
+
+#define PUSH_REGS	\
+	"push %rdi\n"	\
+	"push %rsi\n"	\
+	"push %rdx\n"	\
+	"push %rcx\n"	\
+	"push %rax\n"	\
+	"push %r8\n"	\
+	"push %r9\n"	\
+	"push %r10\n"	\
+	"push %r11\n"	\
+	"push %rbx\n"	\
+	"push %rbp\n"	\
+	"push %r12\n"	\
+	"push %r13\n"	\
+	"push %r14\n"	\
+	"push %r15\n"
+
+#define POP_REGS	\
+	"pop %r15\n"	\
+	"pop %r14\n"	\
+	"pop %r13\n"	\
+	"pop %r12\n"	\
+	"pop %rbp\n"	\
+	"pop %rbx\n"	\
+	"pop %r11\n"	\
+	"pop %r10\n"	\
+	"pop %r9\n"	\
+	"pop %r8\n"	\
+	"pop %rax\n"	\
+	"pop %rcx\n"	\
+	"pop %rdx\n"	\
+	"pop %rsi\n"	\
+	"pop %rdi\n"
+
+/*
+ * FRED entry points.
+ */
+asm(".pushsection .text\n"
+    ".type asm_fred_entrypoint_user, @function\n"
+    ".align 4096\n"
+    "asm_fred_entrypoint_user:\n"
+    "endbr64\n"
+    PUSH_REGS
+    "movq %rsp, %rdi\n"
+    "call fred_entry_from_user\n"
+    POP_REGS
+    /* Do NOT go back to user level, continue the next test instead */
+    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
+
+    ".fill asm_fred_entrypoint_kernel - ., 1, 0xcc\n"
+
+    ".type asm_fred_entrypoint_kernel, @function\n"
+    ".org asm_fred_entrypoint_user + 256\n"
+    "asm_fred_entrypoint_kernel:\n"
+    "endbr64\n"
+    PUSH_REGS
+    "movq %rsp, %rdi\n"
+    "call fred_entry_from_kernel\n"
+    POP_REGS
+    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
+    ".align 4096, 0xcc\n"
+    ".popsection");
+
+extern char asm_fred_entrypoint_user[];
+
+/*
+ * Prepare a FRED stack frame for ERETU to return to user level code,
+ * nop or ud2.
+ *
+ * Because FRED RSP0 is deliberately not mapped in the guest page table,
+ * delivering an interrupt, NMI or #UD from ring 3 causes a nested
+ * #PF, which is then delivered on FRED RSPx (x is 1, 2 or 3, as
+ * determined by the PF_VECTOR field of MSR_IA32_FRED_STKLVLS).
+ */
+asm(".pushsection .text\n"
+    ".type asm_test_fault, @function\n"
+    ".align 4096\n"
+    "asm_test_fault:\n"
+    "endbr64\n"
+    "push %rbp\n"
+    "mov %rsp, %rbp\n"
+    "and $(~0x3f), %rsp\n"
+    "push $0\n"
+    "push $0\n"
+    "mov $0x2b, %rax\n"
+    /* Unblock NMI */
+    "bts $18, %rax\n"
+    /* Set long mode bit */
+    "bts $57, %rax\n"
+    "push %rax\n"
+    /* No stack required for the FRED user level test code */
+    "push $0\n"
+    "pushf\n"
+    "pop %rax\n"
+    /* Allow external interrupts */
+    "bts $9, %rax\n"
+    "push %rax\n"
+    "mov $0x33, %rax\n"
+    "push %rax\n"
+    "cmp $0, %edi\n"
+    "jne 1f\n"
+    "lea asm_user_nop(%rip), %rax\n"
+    "jmp 2f\n"
+    "1: lea asm_user_ud(%rip), %rax\n"
+    "2: push %rax\n"
+    "push $0\n"
+    /* ERETU to user level code to allow event delivery immediately */
+    ".byte 0xf3,0x0f,0x01,0xca\n"
+    "asm_done_fault:\n"
+    "mov %rbp, %rsp\n"
+    "pop %rbp\n"
+    "ret\n"
+    ".align 4096, 0xcc\n"
+    ".popsection");
+
+/*
+ * To fully test the underlying FRED VMX code, this test should be run a
+ * second time with EPT disabled to inject page faults as nested exceptions.
+ */
+static void guest_code(void)
+{
+	wrmsr(MSR_IA32_FRED_CONFIG,
+	      FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
+
+	wrmsr(MSR_IA32_FRED_RSP1, FRED_VALID_RSP);
+	wrmsr(MSR_IA32_FRED_RSP2, FRED_VALID_RSP);
+	wrmsr(MSR_IA32_FRED_RSP3, FRED_VALID_RSP);
+
+	/* Enable FRED */
+	set_cr4(get_cr4() | X86_CR4_FRED);
+
+	x2apic_enable();
+
+	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 1));
+	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[1]);
+	/* 1: ud2 to generate #UD */
+	asm_test_fault(1);
+
+	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 2));
+	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[2]);
+	asm volatile("cli");
+	/* Create a pending interrupt on current vCPU */
+	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
+			 APIC_DM_FIXED | IRQ_VECTOR);
+	/* Return to ring 3 */
+	asm_test_fault(0);
+	x2apic_write_reg(APIC_EOI, 0);
+
+	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 3));
+	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[3]);
+	/*
+	 * The first NMI is just to have NMI blocked in ring 0, because
+	 * fred_entry_from_kernel() deliberately clears the NMI bit in
+	 * FRED stack frame.
+	 */
+	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
+			 APIC_DM_NMI | NMI_VECTOR);
+	/* The second NMI will be delivered after returning to ring 3 */
+	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
+			 APIC_DM_NMI | NMI_VECTOR);
+	/* Return to ring 3 */
+	asm_test_fault(0);
+
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	uint64_t expected_current_stack_level = 1;
+
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_FRED));
+
+	vm = __vm_create_with_vcpus(VM_SHAPE(VM_MODE_PXXV48_4K_USER), 1, 0,
+				    guest_code, &vcpu);
+
+	while (true) {
+		uint64_t r;
+
+		vcpu_run(vcpu);
+
+		r = get_ucall(vcpu, &uc);
+
+		if (r == UCALL_DONE)
+			break;
+
+		if (r == UCALL_SYNC) {
+			TEST_ASSERT((uc.args[1] == expected_current_stack_level) &&
+				    (uc.args[2] == fred_invalid_rsp[expected_current_stack_level] - 1),
+				    "Incorrect stack level %lx and #PF address %lx\n",
+				    uc.args[1], uc.args[2]);
+			expected_current_stack_level++;
+		}
+	}
+
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 00/25] Enable FRED with KVM VMX
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (24 preceding siblings ...)
  2024-02-07 17:26 ` [PATCH v2 25/25] KVM: selftests: Add fred exception tests Xin Li
@ 2024-03-27  8:08 ` Kang, Shan
  2024-04-15 17:58 ` Li, Xin3
  26 siblings, 0 replies; 51+ messages in thread
From: Kang, Shan @ 2024-03-27  8:08 UTC (permalink / raw)
  To: Li, Xin3, kvm, linux-doc, linux-kernel, linux-kselftest
  Cc: corbet, seanjc, x86, dave.hansen, vkuznets, bp, mingo, tglx, hpa,
	pbonzini, peterz, shuah, Shankar, Ravi V, xin

On Wed, 2024-02-07 at 09:26 -0800, Xin Li wrote:
> This patch set enables the Intel flexible return and event delivery
> (FRED) architecture with KVM VMX to allow guests to utilize FRED.
> 
We tested this FRED KVM patch set on a 7th-generation Intel(R) Core(TM) CPU and on
the Intel Simics® Simulator, using the following four configurations:

The first config is the baseline on bare metal.
The second config is the baseline on Intel Simics® Simulator.
The third config enables host FRED, but disables guest FRED.
The last config enables both host and guest FRED.

The following are the Kselftest results in KVM guests.

+---------------------------------------------+-------+-------+-------+-------+
|                  Config                     |  Pass |  Fail |  Skip |  Hang |
+---------------------------------------------+-------+-------+-------+-------+
|the 7th Intel(R) Core(TM) CPU                |       |       |       |       |
|  L0: 6.8.0-rc3+ w/ FRED native/KVM patch set|  1775 |  526  |  332  |   6   |
|  L1: 6.8.0-rc3+                             |       |       |       |       |
+---------------------------------------------+-------+-------+-------+-------+
|Intel Simics® Simulator w/o FRED model       |       |       |       |       |
|  L0: 6.8.0-rc3+ w/ FRED native/KVM patch set|  1770 |  526  |  331  |   12  |
|  L1: 6.8.0-rc3+ w/ FRED native/KVM patch set|       |       |       |       |
+---------------------------------------------+-------+-------+-------+-------+
|Intel Simics® Simulator w/ FRED model        |       |       |       |       |
|  L0: 6.8.0-rc3+ w/ FRED native/KVM patch set|  1770 |  526  |  331  |   12  |
|  L1: 6.8.0-rc3+ w/ FRED native/KVM patch set|       |       |       |       |
|         but FRED disabled                   |       |       |       |       |
+---------------------------------------------+-------+-------+-------+-------+
|Intel Simics® Simulator w/ FRED model        |       |       |       |       |
|  L0: 6.8.0-rc3+ w/ FRED native/KVM patch set|  1769 |  528  |  330  |   12  |
|  L1: 6.8.0-rc3+ w/ FRED native/KVM patch set|       |       |       |       |
+---------------------------------------------+-------+-------+-------+-------+

Overall, we don't see any major issues. One notable source of differences is
the perf tests; other variances come from the timer tests, due to the slowness
of the Intel Simics® Simulator.

The tests "x86:sysret_rip_64" and "x86:sigreturn_32" fail on the last config
because they are not valid tests under FRED; see:
https://lore.kernel.org/lkml/20230220030959.119222-1-ammarfaizi2@gnuweeb.org/T/#u
https://lore.kernel.org/lkml/20230706052231.2183-1-xin3.li@intel.com/

NB: Some tests pass on the Intel Simics® Simulator but not on bare metal. We can
share the data if needed.

We conducted local live migration tests with LKGS/FRED/WRMSRNS enabled and
disabled on the Intel Simics® Simulator, and saw no problems.

We also ran the KVM selftests in a nested FRED KVM guest: 48 out of 83 cases
passed, and the remaining cases failed due to the slowness of the Intel Simics®
Simulator.

A detailed report is available upon request.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 25/25] KVM: selftests: Add fred exception tests
@ 2024-03-29 20:18     ` Muhammad Usama Anjum
  0 siblings, 0 replies; 51+ messages in thread
From: Muhammad Usama Anjum @ 2024-03-29 20:18 UTC (permalink / raw)
  To: Xin Li, linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: Muhammad Usama Anjum, seanjc, pbonzini, corbet, tglx, mingo, bp,
	dave.hansen, x86, hpa, shuah, vkuznets, peterz, ravi.v.shankar,
	xin

On 2/7/24 10:26 PM, Xin Li wrote:
> Add tests for FRED event data and VMX nested-exception.
> 
> FRED is designed to save a complete event context in its stack frame;
> e.g., FRED saves the faulting linear address of a #PF into a 64-bit
> event data field defined in the FRED stack frame.  As such, FRED VMX adds
> event data handling during VMX transitions.
> 
> In addition, FRED introduces event stack levels to dispatch an event handler
> onto a stack based on the current stack level and the stack levels defined in
> the IA32_FRED_STKLVLS MSR for each exception vector.  VMX nested-exception
> support ensures that the correct event stack level is chosen when a VM entry
> injects a nested exception, which is regarded as having occurred in ring 0.
> 
> To fully test the underlying FRED VMX code, run this test a second time
> with EPT disabled, so that page faults are injected as nested exceptions.
> 
> Originally-by: Shan Kang <shan.kang@intel.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>
Thank you for the new test patch. We have been working to make the selftests
TAP conformant, which cannot be achieved if new tests don't use TAP from the
start. Please make your test TAP compliant.

> ---
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/processor.h  |  32 ++
>  .../testing/selftests/kvm/x86_64/fred_test.c  | 297 ++++++++++++++++++
Please add the generated binary to .gitignore.

>  3 files changed, 330 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 492e937fab00..eaac13a605f2 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -67,6 +67,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
>  TEST_GEN_PROGS_x86_64 += x86_64/exit_on_emulation_failure_test
>  TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hwcr_msr_test
> +TEST_GEN_PROGS_x86_64 += x86_64/fred_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_evmcs
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index bc5cd8628a20..ef7aaab790e0 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -1275,4 +1275,36 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
>  #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
>  
> +/*
> + * FRED related data structures and functions
> + */
> +
> +#define FRED_SSX_NMI		BIT_ULL(18)
> +
> +struct fred_stack {
> +	u64 r15;
> +	u64 r14;
> +	u64 r13;
> +	u64 r12;
> +	u64 bp;
> +	u64 bx;
> +	u64 r11;
> +	u64 r10;
> +	u64 r9;
> +	u64 r8;
> +	u64 ax;
> +	u64 cx;
> +	u64 dx;
> +	u64 si;
> +	u64 di;
> +	u64 error_code;
> +	u64 ip;
> +	u64 csx;
> +	u64 flags;
> +	u64 sp;
> +	u64 ssx;
> +	u64 event_data;
> +	u64 reserved;
> +};
> +
>  #endif /* SELFTEST_KVM_PROCESSOR_H */
> diff --git a/tools/testing/selftests/kvm/x86_64/fred_test.c b/tools/testing/selftests/kvm/x86_64/fred_test.c
> new file mode 100644
> index 000000000000..412afa919568
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/fred_test.c
> @@ -0,0 +1,297 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * FRED nested exception tests
> + *
> + * Copyright (C) 2023, Intel, Inc.
> + */
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <asm/msr-index.h>
> +
> +#include "apic.h"
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "guest_modes.h"
> +#include "processor.h"
> +
> +#define IRQ_VECTOR 0xAA
> +
> +#define FRED_STKLVL(v,l)		(_AT(unsigned long, l) << (2 * (v)))
> +#define FRED_CONFIG_ENTRYPOINT(p)	_AT(unsigned long, (p))
> +
> +/* This address is already mapped in guest page table. */
> +#define FRED_VALID_RSP			0x8000
> +
> +/*
> + * The following addresses are not yet mapped in both EPT and guest page
> + * tables at the beginning.  As a result, it causes an EPT violation VM
> + * exit with an original guest #PF to access any of them for the first
> + * time.
> + *
> + * Use these addresses as guest FRED RSP0 to generate nested #PFs to test
> + * if event data are properly virtualized.
> + */
> +static unsigned long fred_invalid_rsp[4] = {
> +	0x0,
> +	0xf0000000,
> +	0xe0000000,
> +	0xd0000000,
> +};
> +
> +extern char asm_user_nop[];
> +extern char asm_user_ud[];
> +extern char asm_done_fault[];
> +
> +extern void asm_test_fault(int test);
> +
> +/*
> + * user level code for triggering faults.
> + */
> +asm(".pushsection .text\n"
> +    ".align 4096\n"
> +
> +    ".type asm_user_nop, @function\n"
> +    "asm_user_nop:\n"
> +    "1: .byte 0x90\n"
> +    "jmp 1b\n"
> +
> +    ".fill asm_user_ud - ., 1, 0xcc\n"
> +
> +    ".type asm_user_ud, @function\n"
> +    ".org asm_user_nop + 16\n"
> +    "asm_user_ud:\n"
> +    /* Trigger a #UD */
> +    "ud2\n"
> +
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +/* Send current stack level and #PF address */
> +#define GUEST_SYNC_CSL_FA(__stage, __pf_address)		\
> +	GUEST_SYNC_ARGS(__stage, __pf_address, 0, 0, 0)
> +
> +void fred_entry_from_user(struct fred_stack *stack)
> +{
> +	u32 current_stack_level = rdmsr(MSR_IA32_FRED_CONFIG) & 0x3;
> +
> +	GUEST_SYNC_CSL_FA(current_stack_level, stack->event_data);
> +
> +	/* Do NOT go back to user level, continue the next test instead */
> +	stack->ssx = 0x18;
> +	stack->csx = 0x10;
> +	stack->ip = (u64)&asm_done_fault;
> +}
> +
> +void fred_entry_from_kernel(struct fred_stack *stack)
> +{
> +	/*
> +	 * Keep NMI blocked to delay the delivery of the next NMI until
> +	 * returning to user level.
> +	 */
> +	stack->ssx &= ~FRED_SSX_NMI;
> +}
> +
> +#define PUSH_REGS	\
> +	"push %rdi\n"	\
> +	"push %rsi\n"	\
> +	"push %rdx\n"	\
> +	"push %rcx\n"	\
> +	"push %rax\n"	\
> +	"push %r8\n"	\
> +	"push %r9\n"	\
> +	"push %r10\n"	\
> +	"push %r11\n"	\
> +	"push %rbx\n"	\
> +	"push %rbp\n"	\
> +	"push %r12\n"	\
> +	"push %r13\n"	\
> +	"push %r14\n"	\
> +	"push %r15\n"
> +
> +#define POP_REGS	\
> +	"pop %r15\n"	\
> +	"pop %r14\n"	\
> +	"pop %r13\n"	\
> +	"pop %r12\n"	\
> +	"pop %rbp\n"	\
> +	"pop %rbx\n"	\
> +	"pop %r11\n"	\
> +	"pop %r10\n"	\
> +	"pop %r9\n"	\
> +	"pop %r8\n"	\
> +	"pop %rax\n"	\
> +	"pop %rcx\n"	\
> +	"pop %rdx\n"	\
> +	"pop %rsi\n"	\
> +	"pop %rdi\n"
> +
> +/*
> + * FRED entry points.
> + */
> +asm(".pushsection .text\n"
> +    ".type asm_fred_entrypoint_user, @function\n"
> +    ".align 4096\n"
> +    "asm_fred_entrypoint_user:\n"
> +    "endbr64\n"
> +    PUSH_REGS
> +    "movq %rsp, %rdi\n"
> +    "call fred_entry_from_user\n"
> +    POP_REGS
> +    /* Do NOT go back to user level, continue the next test instead */
> +    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
> +
> +    ".fill asm_fred_entrypoint_kernel - ., 1, 0xcc\n"
> +
> +    ".type asm_fred_entrypoint_kernel, @function\n"
> +    ".org asm_fred_entrypoint_user + 256\n"
> +    "asm_fred_entrypoint_kernel:\n"
> +    "endbr64\n"
> +    PUSH_REGS
> +    "movq %rsp, %rdi\n"
> +    "call fred_entry_from_kernel\n"
> +    POP_REGS
> +    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +extern char asm_fred_entrypoint_user[];
> +
> +/*
> + * Prepare a FRED stack frame for ERETU to return to user level code,
> + * nop or ud2.
> + *
> + * Because FRED RSP0 is deliberately not mapped in the guest page table,
> + * delivering an interrupt, an NMI or a #UD from ring 3 causes a nested
> + * #PF, which is then delivered on FRED RSPx (x is 1, 2 or 3, as
> + * determined by the PF_VECTOR field of MSR IA32_FRED_STKLVLS).
> + */
> +asm(".pushsection .text\n"
> +    ".type asm_test_fault, @function\n"
> +    ".align 4096\n"
> +    "asm_test_fault:\n"
> +    "endbr64\n"
> +    "push %rbp\n"
> +    "mov %rsp, %rbp\n"
> +    "and $(~0x3f), %rsp\n"
> +    "push $0\n"
> +    "push $0\n"
> +    "mov $0x2b, %rax\n"
> +    /* Unblock NMI */
> +    "bts $18, %rax\n"
> +    /* Set long mode bit */
> +    "bts $57, %rax\n"
> +    "push %rax\n"
> +    /* No stack required for the FRED user level test code */
> +    "push $0\n"
> +    "pushf\n"
> +    "pop %rax\n"
> +    /* Allow external interrupts */
> +    "bts $9, %rax\n"
> +    "push %rax\n"
> +    "mov $0x33, %rax\n"
> +    "push %rax\n"
> +    "cmp $0, %edi\n"
> +    "jne 1f\n"
> +    "lea asm_user_nop(%rip), %rax\n"
> +    "jmp 2f\n"
> +    "1: lea asm_user_ud(%rip), %rax\n"
> +    "2: push %rax\n"
> +    "push $0\n"
> +    /* ERETU to user level code to allow event delivery immediately */
> +    ".byte 0xf3,0x0f,0x01,0xca\n"
> +    "asm_done_fault:\n"
> +    "mov %rbp, %rsp\n"
> +    "pop %rbp\n"
> +    "ret\n"
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +/*
> + * To fully test the underlying FRED VMX code, run this test a second time
> + * with EPT disabled, so that page faults are injected as nested exceptions.
> + */
> +static void guest_code(void)
> +{
> +	wrmsr(MSR_IA32_FRED_CONFIG,
> +	      FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
> +
> +	wrmsr(MSR_IA32_FRED_RSP1, FRED_VALID_RSP);
> +	wrmsr(MSR_IA32_FRED_RSP2, FRED_VALID_RSP);
> +	wrmsr(MSR_IA32_FRED_RSP3, FRED_VALID_RSP);
> +
> +	/* Enable FRED */
> +	set_cr4(get_cr4() | X86_CR4_FRED);
> +
> +	x2apic_enable();
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 1));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[1]);
> +	/* 1: ud2 to generate #UD */
> +	asm_test_fault(1);
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 2));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[2]);
> +	asm volatile("cli");
> +	/* Create a pending interrupt on current vCPU */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_FIXED | IRQ_VECTOR);
> +	/* Return to ring 3 */
> +	asm_test_fault(0);
> +	x2apic_write_reg(APIC_EOI, 0);
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 3));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[3]);
> +	/*
> +	 * The first NMI is just to have NMI blocked in ring 0, because
> +	 * fred_entry_from_kernel() deliberately clears the NMI bit in
> +	 * FRED stack frame.
> +	 */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_NMI | NMI_VECTOR);
> +	/* The second NMI will be delivered after returning to ring 3 */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_NMI | NMI_VECTOR);
> +	/* Return to ring 3 */
> +	asm_test_fault(0);
> +
> +	GUEST_DONE();
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	struct ucall uc;
> +	uint64_t expected_current_stack_level = 1;
> +
> +	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_FRED));
> +
> +	vm = __vm_create_with_vcpus(VM_SHAPE(VM_MODE_PXXV48_4K_USER), 1, 0,
> +				    guest_code, &vcpu);
> +
> +	while (true) {
> +		uint64_t r;
> +
> +		vcpu_run(vcpu);
> +
> +		r = get_ucall(vcpu, &uc);
> +
> +		if (r == UCALL_DONE)
> +			break;
> +
> +		if (r == UCALL_SYNC) {
> +			TEST_ASSERT((uc.args[1] == expected_current_stack_level) &&
> +				    (uc.args[2] == fred_invalid_rsp[expected_current_stack_level] - 1),
> +				    "Incorrect stack level %lx and #PF address %lx\n",
> +				    uc.args[1], uc.args[2]);
> +			expected_current_stack_level++;
> +		}
> +	}
> +
> +	kvm_vm_free(vm);
> +	return 0;
> +}

-- 
BR,
Muhammad Usama Anjum

^ permalink raw reply	[flat|nested] 51+ messages in thread

	for <steffen.klassert@secunet.com>; Fri, 29 Mar 2024 20:18:54 +0000 (UTC)
Received: from localhost.localdomain (localhost.localdomain [127.0.0.1])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id D6E6713BC1B;
	Fri, 29 Mar 2024 20:18:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=collabora.com header.i=@collabora.com header.b="L2zt8eif"
Received: from madrid.collaboradmins.com (madrid.collaboradmins.com [46.235.227.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4699D13BAD1;
	Fri, 29 Mar 2024 20:18:32 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=46.235.227.194
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1711743514; cv=none; b=j2hUMl9NyGzSDJP9D+OX7+sbNQl02uJeOwLiIYfS7ZIoMaBPLsqmW/jyWFUtGqZN4jPEzBxKaNM7IowzhVMslqTg+//xfLymH6hV62MaIaGnIgOuOii5e9xzwtIRkH5HWMjNpof2iW7QWN7Tcw8zkKXKTjtgqP9CWAgXF8imcAQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1711743514; c=relaxed/simple;
	bh=M1bVLI2QVrqN4cSmcZhh5HNh6nWGHhPJEonGQgPvO00=;
	h=Message-ID:Date:MIME-Version:Cc:Subject:To:References:From:
	 In-Reply-To:Content-Type; b=Q8XWirNIrCY/iG2sgaBNL9z10ipPNp8KvMULAgzBLiQ3oxYnUsFdhW++cX5jhVlGbpdAUnZG20D2pnBZ2mgswRNCJhh4m1A3AuXfmG60vYtoGDkDdiVVeJcQA7IKbu/UkeUEIi7tV7upwlWyDD7BLzxzw0IOs1hlQVJFBV1BUWw=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=collabora.com; spf=pass smtp.mailfrom=collabora.com; dkim=pass (2048-bit key) header.d=collabora.com header.i=@collabora.com header.b=L2zt8eif; arc=none smtp.client-ip=46.235.227.194
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=collabora.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=collabora.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com;
	s=mail; t=1711743510;
	bh=M1bVLI2QVrqN4cSmcZhh5HNh6nWGHhPJEonGQgPvO00=;
	h=Date:Cc:Subject:To:References:From:In-Reply-To:From;
	b=L2zt8eif9xbOYilCC0YsH5YJqm0E16o6ScfkTW8PJ/tzy1yVnA9m7MIrFJkygElI4
	 9hCl+N3ah/v40+3Yx+05eCqrFqcX/Awpvn4TAeuhHQOQ+2N0rNKzn/gOCGMxoZr/Gq
	 DF/h2vk8S2rZYUfUNjluciLLrYgJv+amN4UxvGROCmzTJmOZ+DVJJnw3K4pPOUc4TY
	 i/xNMJah3fDLREZOStrAZWyACogXW5CaWGDtRv5u0TLiGFA9n0LpoIpiTsYcr08Jxl
	 b7svNLTCyPu+7BoeuNWz0Bky0Jlr+lhnkzgi3pd6Fh/OFrM5fTlRMzBRvyWIRIa/JA
	 +luzgRZU3AZeg==
Received: from [100.113.15.66] (ec2-34-240-57-77.eu-west-1.compute.amazonaws.com [34.240.57.77])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	(Authenticated sender: usama.anjum)
	by madrid.collaboradmins.com (Postfix) with ESMTPSA id 9C34B3780C21;
	Fri, 29 Mar 2024 20:18:23 +0000 (UTC)
Message-ID: <42ca1da5-445b-47ca-a952-444eaa921360@collabora.com>
Date: Sat, 30 Mar 2024 01:18:54 +0500
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>, seanjc@google.com,
 pbonzini@redhat.com, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
 bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 shuah@kernel.org, vkuznets@redhat.com, peterz@infradead.org,
 ravi.v.shankar@intel.com, xin@zytor.com
Subject: Re: [PATCH v2 25/25] KVM: selftests: Add fred exception tests
To: Xin Li <xin3.li@intel.com>, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org
References: <20240207172646.3981-1-xin3.li@intel.com>
 <20240207172646.3981-26-xin3.li@intel.com>
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
In-Reply-To: <20240207172646.3981-26-xin3.li@intel.com>

On 2/7/24 10:26 PM, Xin Li wrote:
> Add tests for FRED event data and VMX nested-exception.
> 
> FRED is designed to save a complete event context in its stack frame,
> e.g., FRED saves the faulting linear address of a #PF into a 64-bit
> event data field defined in FRED stack frame.  As such, FRED VMX adds
> event data handling during VMX transitions.
> 
> Besides, FRED introduces event stack levels to dispatch an event handler
> onto a stack based on the current stack level and the stack levels defined
> in the IA32_FRED_STKLVLS MSR for each exception vector.  VMX nested-exception
> support ensures the correct event stack level is chosen when a VM entry
> injects a nested exception, which is regarded as having occurred in ring 0.
> 
> To fully test the underlying FRED VMX code, this test should be run one
> more round with EPT disabled to inject page faults as nested exceptions.
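A minimal C sketch of the stack-level rule described above, under stated assumptions: it reuses the test's FRED_STKLVL() encoding (two bits per vector in IA32_FRED_STKLVLS), while fred_stklvl() and fred_event_stack_level() are hypothetical helpers that model only the cases this test exercises. The sketch ignores any special handling the FRED specification defines for NMI, #MC, #DF or external interrupts. It also assumes that a nested exception hit while delivering a ring-3 event starts from current stack level 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the test's FRED_STKLVL(v, l): two bits per vector. */
#define FRED_STKLVL(v, l)	((uint64_t)(l) << (2 * (v)))

#define UD_VECTOR	6
#define PF_VECTOR	14

/* Extract the 2-bit stack level configured for @vector. */
static unsigned int fred_stklvl(uint64_t stklvls, unsigned int vector)
{
	assert(vector < 32);
	return (stklvls >> (2 * vector)) & 0x3;
}

/*
 * Simplified model (assumption, not the full FRED rule):
 * - an event from ring 3 that is not a nested exception uses level 0;
 * - any other event, including a nested exception hit while delivering
 *   a ring-3 event, is treated as occurring in ring 0 and uses
 *   max(CSL, STKLVLS[vector]).
 * @csl is the current stack level, i.e. IA32_FRED_CONFIG[1:0].
 */
static unsigned int fred_event_stack_level(bool from_ring3, bool nested,
					   unsigned int csl, uint64_t stklvls,
					   unsigned int vector)
{
	unsigned int lvl;

	if (from_ring3 && !nested)
		return 0;
	if (from_ring3)
		csl = 0;	/* assumption: no kernel stack level in use yet */

	lvl = fred_stklvl(stklvls, vector);
	return lvl > csl ? lvl : csl;
}
```

Under this model, with IA32_FRED_STKLVLS set to FRED_STKLVL(PF_VECTOR, 3), a #UD from ring 3 whose delivery faults on the unmapped RSP0 runs its nested #PF handler at stack level 3. That is the value the test's guest reads back from IA32_FRED_CONFIG[1:0] and reports to the host.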
> 
> Originally-by: Shan Kang <shan.kang@intel.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>
Thank you for the new test patch. We have been working to make the
selftests TAP conformant, which cannot be achieved unless new tests use
TAP from the start. Please make your test TAP compliant.
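For reference, a TAP version 13 stream consists of a "TAP version 13" header, a plan line such as "1..3", and one "ok N - description" or "not ok N - description" line per check. In-tree tests normally get this output from kselftest.h (ksft_print_header(), ksft_set_plan(), ksft_test_result() and ksft_finished()). The helpers below are hypothetical and only illustrate the line format, for example one result per FRED stack level this test checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the TAP plan line for @count checks. */
static int tap_plan(char *buf, size_t len, unsigned int count)
{
	return snprintf(buf, len, "1..%u\n", count);
}

/* Hypothetical helper: format one TAP result line (numbering starts at 1). */
static int tap_result(char *buf, size_t len, unsigned int num, bool ok,
		      const char *desc)
{
	assert(num >= 1);
	return snprintf(buf, len, "%sok %u - %s\n", ok ? "" : "not ", num, desc);
}
```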

> ---
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/processor.h  |  32 ++
>  .../testing/selftests/kvm/x86_64/fred_test.c  | 297 ++++++++++++++++++
Please also add the generated binary to .gitignore.

>  3 files changed, 330 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 492e937fab00..eaac13a605f2 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -67,6 +67,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
>  TEST_GEN_PROGS_x86_64 += x86_64/exit_on_emulation_failure_test
>  TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hwcr_msr_test
> +TEST_GEN_PROGS_x86_64 += x86_64/fred_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_evmcs
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index bc5cd8628a20..ef7aaab790e0 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -1275,4 +1275,36 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
>  #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
>  #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
>  
> +/*
> + * FRED related data structures and functions
> + */
> +
> +#define FRED_SSX_NMI		BIT_ULL(18)
> +
> +struct fred_stack {
> +	u64 r15;
> +	u64 r14;
> +	u64 r13;
> +	u64 r12;
> +	u64 bp;
> +	u64 bx;
> +	u64 r11;
> +	u64 r10;
> +	u64 r9;
> +	u64 r8;
> +	u64 ax;
> +	u64 cx;
> +	u64 dx;
> +	u64 si;
> +	u64 di;
> +	u64 error_code;
> +	u64 ip;
> +	u64 csx;
> +	u64 flags;
> +	u64 sp;
> +	u64 ssx;
> +	u64 event_data;
> +	u64 reserved;
> +};
> +
>  #endif /* SELFTEST_KVM_PROCESSOR_H */
> diff --git a/tools/testing/selftests/kvm/x86_64/fred_test.c b/tools/testing/selftests/kvm/x86_64/fred_test.c
> new file mode 100644
> index 000000000000..412afa919568
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/fred_test.c
> @@ -0,0 +1,297 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * FRED nested exception tests
> + *
> + * Copyright (C) 2023, Intel, Inc.
> + */
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <asm/msr-index.h>
> +
> +#include "apic.h"
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "guest_modes.h"
> +#include "processor.h"
> +
> +#define IRQ_VECTOR 0xAA
> +
> +#define FRED_STKLVL(v,l)		(_AT(unsigned long, l) << (2 * (v)))
> +#define FRED_CONFIG_ENTRYPOINT(p)	_AT(unsigned long, (p))
> +
> +/* This address is already mapped in guest page table. */
> +#define FRED_VALID_RSP			0x8000
> +
> +/*
> + * The following addresses are not yet mapped in both EPT and guest page
> + * tables at the beginning.  As a result, it causes an EPT violation VM
> + * exit with an original guest #PF to access any of them for the first
> + * time.
> + *
> + * Use these addresses as guest FRED RSP0 to generate nested #PFs to test
> + * if event data are properly virtualized.
> + */
> +static unsigned long fred_invalid_rsp[4] = {
> +	0x0,
> +	0xf0000000,
> +	0xe0000000,
> +	0xd0000000,
> +};
> +
> +extern char asm_user_nop[];
> +extern char asm_user_ud[];
> +extern char asm_done_fault[];
> +
> +extern void asm_test_fault(int test);
> +
> +/*
> + * user level code for triggering faults.
> + */
> +asm(".pushsection .text\n"
> +    ".align 4096\n"
> +
> +    ".type asm_user_nop, @function\n"
> +    "asm_user_nop:\n"
> +    "1: .byte 0x90\n"
> +    "jmp 1b\n"
> +
> +    ".fill asm_user_ud - ., 1, 0xcc\n"
> +
> +    ".type asm_user_ud, @function\n"
> +    ".org asm_user_nop + 16\n"
> +    "asm_user_ud:\n"
> +    /* Trigger a #UD */
> +    "ud2\n"
> +
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +/* Send current stack level and #PF address */
> +#define GUEST_SYNC_CSL_FA(__stage, __pf_address)		\
> +	GUEST_SYNC_ARGS(__stage, __pf_address, 0, 0, 0)
> +
> +void fred_entry_from_user(struct fred_stack *stack)
> +{
> +	u32 current_stack_level = rdmsr(MSR_IA32_FRED_CONFIG) & 0x3;
> +
> +	GUEST_SYNC_CSL_FA(current_stack_level, stack->event_data);
> +
> +	/* Do NOT go back to user level, continue the next test instead */
> +	stack->ssx = 0x18;
> +	stack->csx = 0x10;
> +	stack->ip = (u64)&asm_done_fault;
> +}
> +
> +void fred_entry_from_kernel(struct fred_stack *stack)
> +{
> +	/*
> +	 * Keep NMI blocked to delay the delivery of the next NMI until
> +	 * returning to user level.
> +	 */
> +	stack->ssx &= ~FRED_SSX_NMI;
> +}
> +
> +#define PUSH_REGS	\
> +	"push %rdi\n"	\
> +	"push %rsi\n"	\
> +	"push %rdx\n"	\
> +	"push %rcx\n"	\
> +	"push %rax\n"	\
> +	"push %r8\n"	\
> +	"push %r9\n"	\
> +	"push %r10\n"	\
> +	"push %r11\n"	\
> +	"push %rbx\n"	\
> +	"push %rbp\n"	\
> +	"push %r12\n"	\
> +	"push %r13\n"	\
> +	"push %r14\n"	\
> +	"push %r15\n"
> +
> +#define POP_REGS	\
> +	"pop %r15\n"	\
> +	"pop %r14\n"	\
> +	"pop %r13\n"	\
> +	"pop %r12\n"	\
> +	"pop %rbp\n"	\
> +	"pop %rbx\n"	\
> +	"pop %r11\n"	\
> +	"pop %r10\n"	\
> +	"pop %r9\n"	\
> +	"pop %r8\n"	\
> +	"pop %rax\n"	\
> +	"pop %rcx\n"	\
> +	"pop %rdx\n"	\
> +	"pop %rsi\n"	\
> +	"pop %rdi\n"
> +
> +/*
> + * FRED entry points.
> + */
> +asm(".pushsection .text\n"
> +    ".type asm_fred_entrypoint_user, @function\n"
> +    ".align 4096\n"
> +    "asm_fred_entrypoint_user:\n"
> +    "endbr64\n"
> +    PUSH_REGS
> +    "movq %rsp, %rdi\n"
> +    "call fred_entry_from_user\n"
> +    POP_REGS
> +    /* Do NOT go back to user level, continue the next test instead */
> +    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
> +
> +    ".fill asm_fred_entrypoint_kernel - ., 1, 0xcc\n"
> +
> +    ".type asm_fred_entrypoint_kernel, @function\n"
> +    ".org asm_fred_entrypoint_user + 256\n"
> +    "asm_fred_entrypoint_kernel:\n"
> +    "endbr64\n"
> +    PUSH_REGS
> +    "movq %rsp, %rdi\n"
> +    "call fred_entry_from_kernel\n"
> +    POP_REGS
> +    ".byte 0xf2,0x0f,0x01,0xca\n"	/* ERETS */
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +extern char asm_fred_entrypoint_user[];
> +
> +/*
> + * Prepare a FRED stack frame for ERETU to return to user level code,
> + * nop or ud2.
> + *
> + * Because FRED RSP0 is deliberately not mapped in guest page table,
> + * the delivery of interrupt/NMI or #UD from ring 3 causes a nested
> + * #PF, which is then delivered on FRED RSPx (x is 1, 2 or 3,
> + * determined by MSR FRED_STKLVL[PF_VECTOR]).
> + */
> +asm(".pushsection .text\n"
> +    ".type asm_test_fault, @function\n"
> +    ".align 4096\n"
> +    "asm_test_fault:\n"
> +    "endbr64\n"
> +    "push %rbp\n"
> +    "mov %rsp, %rbp\n"
> +    "and $(~0x3f), %rsp\n"
> +    "push $0\n"
> +    "push $0\n"
> +    "mov $0x2b, %rax\n"
> +    /* Unblock NMI */
> +    "bts $18, %rax\n"
> +    /* Set long mode bit */
> +    "bts $57, %rax\n"
> +    "push %rax\n"
> +    /* No stack required for the FRED user level test code */
> +    "push $0\n"
> +    "pushf\n"
> +    "pop %rax\n"
> +    /* Allow external interrupts */
> +    "bts $9, %rax\n"
> +    "push %rax\n"
> +    "mov $0x33, %rax\n"
> +    "push %rax\n"
> +    "cmp $0, %edi\n"
> +    "jne 1f\n"
> +    "lea asm_user_nop(%rip), %rax\n"
> +    "jmp 2f\n"
> +    "1: lea asm_user_ud(%rip), %rax\n"
> +    "2: push %rax\n"
> +    "push $0\n"
> +    /* ERETU to user level code to allow event delivery immediately */
> +    ".byte 0xf3,0x0f,0x01,0xca\n"
> +    "asm_done_fault:\n"
> +    "mov %rbp, %rsp\n"
> +    "pop %rbp\n"
> +    "ret\n"
> +    ".align 4096, 0xcc\n"
> +    ".popsection");
> +
> +/*
> + * To fully test the underlying FRED VMX code, this test should be run one
> + * more round with EPT disabled to inject page faults as nested exceptions.
> + */
> +static void guest_code(void)
> +{
> +	wrmsr(MSR_IA32_FRED_CONFIG,
> +	      FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
> +
> +	wrmsr(MSR_IA32_FRED_RSP1, FRED_VALID_RSP);
> +	wrmsr(MSR_IA32_FRED_RSP2, FRED_VALID_RSP);
> +	wrmsr(MSR_IA32_FRED_RSP3, FRED_VALID_RSP);
> +
> +	/* Enable FRED */
> +	set_cr4(get_cr4() | X86_CR4_FRED);
> +
> +	x2apic_enable();
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 1));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[1]);
> +	/* 1: ud2 to generate #UD */
> +	asm_test_fault(1);
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 2));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[2]);
> +	asm volatile("cli");
> +	/* Create a pending interrupt on current vCPU */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_FIXED | IRQ_VECTOR);
> +	/* Return to ring 3 */
> +	asm_test_fault(0);
> +	x2apic_write_reg(APIC_EOI, 0);
> +
> +	wrmsr(MSR_IA32_FRED_STKLVLS, FRED_STKLVL(PF_VECTOR, 3));
> +	wrmsr(MSR_IA32_FRED_RSP0, fred_invalid_rsp[3]);
> +	/*
> +	 * The first NMI is just to have NMI blocked in ring 0, because
> +	 * fred_entry_from_kernel() deliberately clears the NMI bit in
> +	 * FRED stack frame.
> +	 */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_NMI | NMI_VECTOR);
> +	/* The second NMI will be delivered after returning to ring 3 */
> +	x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
> +			 APIC_DM_NMI | NMI_VECTOR);
> +	/* Return to ring 3 */
> +	asm_test_fault(0);
> +
> +	GUEST_DONE();
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	struct ucall uc;
> +	uint64_t expected_current_stack_level = 1;
> +
> +	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_FRED));
> +
> +	vm = __vm_create_with_vcpus(VM_SHAPE(VM_MODE_PXXV48_4K_USER), 1, 0,
> +				    guest_code, &vcpu);
> +
> +	while (true) {
> +		uint64_t r;
> +
> +		vcpu_run(vcpu);
> +
> +		r = get_ucall(vcpu, &uc);
> +
> +		if (r == UCALL_DONE)
> +			break;
> +
> +		if (r == UCALL_SYNC) {
> +			TEST_ASSERT((uc.args[1] == expected_current_stack_level) &&
> +				    (uc.args[2] == fred_invalid_rsp[expected_current_stack_level] - 1),
> +				    "Incorrect stack level %lx and #PF address %lx\n",
> +				    uc.args[1], uc.args[2]);
> +			expected_current_stack_level++;
> +		}
> +	}
> +
> +	kvm_vm_free(vm);
> +	return 0;
> +}
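The entry code relies on struct fred_stack matching the stack exactly. PUSH_REGS pushes %rdi first and %r15 last, so %r15 lands at the address in %rsp that is passed to the C handler. The eight quadwords written by FRED event delivery (error code through the reserved slot) sit directly above the general-purpose registers. The compile-time checks below sketch that layout, with u64 spelled as uint64_t so the fragment builds on its own. The ip, csx and ssx fields are the ones fred_entry_from_user() rewrites before ERETS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the patch's struct fred_stack (u64 replaced by uint64_t). */
struct fred_stack {
	/* Saved by PUSH_REGS: %r15 is pushed last, so it is at offset 0. */
	uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di;
	/* Eight quadwords written by FRED event delivery. */
	uint64_t error_code, ip, csx, flags, sp, ssx, event_data, reserved;
};

_Static_assert(offsetof(struct fred_stack, r15) == 0,
	       "r15 sits at the %rsp handed to the C handler");
_Static_assert(offsetof(struct fred_stack, di) == 14 * 8,
	       "rdi is pushed first, so it is the highest saved GPR");
_Static_assert(offsetof(struct fred_stack, error_code) == 15 * 8,
	       "hardware frame starts right above the saved GPRs");
_Static_assert(sizeof(struct fred_stack) -
	       offsetof(struct fred_stack, error_code) == 64,
	       "hardware-written part is eight quadwords");
```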

-- 
BR,
Muhammad Usama Anjum



* RE: [PATCH v2 00/25] Enable FRED with KVM VMX
  2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
                   ` (25 preceding siblings ...)
  2024-03-27  8:08 ` [PATCH v2 00/25] Enable FRED with KVM VMX Kang, Shan
@ 2024-04-15 17:58 ` Li, Xin3
  26 siblings, 0 replies; 51+ messages in thread
From: Li, Xin3 @ 2024-04-15 17:58 UTC (permalink / raw)
  To: Li, Xin3, linux-kernel, kvm, linux-doc, linux-kselftest
  Cc: seanjc, pbonzini, corbet, tglx, mingo, bp, dave.hansen, x86, hpa,
	shuah, vkuznets, peterz, Shankar, Ravi V, xin

> This patch set enables the Intel flexible return and event delivery
> (FRED) architecture with KVM VMX to allow guests to utilize FRED.
> 

<snip>

> 
> Intel VMX architecture is extended to run FRED guests, and the major changes
> are:
> 
> 1) New VMCS fields for FRED context management, which includes two new
> event data VMCS fields, eight new guest FRED context VMCS fields and eight new
> host FRED context VMCS fields.
> 
> 2) VMX nested-exception support for proper virtualization of stack levels
> introduced with FRED architecture.
> 

<snip>

> 
> Patch 1-2 are cleanups to VMX basic and misc MSRs, which were sent out earlier
> as a preparation for FRED changes:
> https://lore.kernel.org/kvm/20240206182032.1596-1-xin3.li@intel.com/T/#u

I will drop the two cleanup patches in the next iteration.

> Patches 3-15 add FRED support to VMX.
> Patches 16-21 add FRED support to nested VMX.
> Patch 22 exposes FRED and its baseline features to KVM guests.
> Patches 23-25 add FRED selftests.

Please help to review and comment on the FRED KVM/VMX patches.

Thanks!
    Xin


* Re: [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls
  2024-02-07 17:26 ` [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls Xin Li
@ 2024-04-19 10:21   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-19 10:21 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:23AM -0800, Xin Li wrote:
>Enable the secondary VM exit controls to prepare for FRED enabling.
>
>The activation of the secondary VM exit controls is off now, and it
>will be switched on when a VMX feature needing it is enabled.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>



* Re: [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved
  2024-02-07 17:26 ` [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved Xin Li
@ 2024-04-19 10:22   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-19 10:22 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:24AM -0800, Xin Li wrote:
>The CR4.FRED bit, i.e., CR4[32], is no longer a reserved bit when a guest
>enumerates FRED; otherwise, it remains reserved.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>
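
A minimal userspace sketch of the reserved-bit rule described in this changelog. Only the bit position (CR4[32]) comes from the patch; the helper names are hypothetical and are not KVM's actual reserved-bit bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.FRED is CR4[32], per the changelog above. */
#define X86_CR4_FRED	(1ULL << 32)

/*
 * Hypothetical helper: given the CR4 bits that are reserved for reasons
 * unrelated to FRED, return the full reserved mask for a guest.
 * CR4.FRED stays reserved unless the guest enumerates FRED in CPUID.
 */
static uint64_t guest_cr4_reserved_bits(uint64_t other_reserved,
					bool guest_has_fred)
{
	uint64_t reserved = other_reserved | X86_CR4_FRED;

	if (guest_has_fred)
		reserved &= ~X86_CR4_FRED;
	return reserved;
}

/* A CR4 write is accepted only if it sets no reserved bit. */
static bool cr4_write_ok(uint64_t new_cr4, uint64_t reserved)
{
	return (new_cr4 & reserved) == 0;
}
```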


* Re: [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
  2024-02-07 17:26 ` [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config Xin Li
@ 2024-04-19 10:24   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-19 10:24 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:25AM -0800, Xin Li wrote:
>Set up the global vmcs_config for FRED:
>1) Add VM_ENTRY_LOAD_IA32_FRED to KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS to
>   have a FRED CPU load guest FRED MSRs from VMCS upon VM entry.
>2) Add SECONDARY_VM_EXIT_SAVE_IA32_FRED to
>   KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS to have a FRED CPU save
>   guest FRED MSRs to VMCS during VM exit.
>3) Add SECONDARY_VM_EXIT_LOAD_IA32_FRED to
>   KVM_OPTIONAL_VMX_SECONDARY_VM_EXIT_CONTROLS to have a FRED CPU load
>   host FRED MSRs from VMCS during VM exit.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>


* Re: [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
  2024-02-07 17:26 ` [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID Xin Li
@ 2024-04-19 11:02   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-19 11:02 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:26AM -0800, Xin Li wrote:
>Clear FRED VM entry/exit controls when initializing a vCPU, and set
>these controls only if FRED is enumerated once CPUID has been set.
>
>FRED VM entry/exit controls need to be set to establish context
>sufficient to support FRED event delivery immediately after VM entry
>and exit. However, there is no need to save/load FRED MSRs for a
>non-FRED guest, which is not supposed to access them.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>


* Re: [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs
  2024-02-07 17:26 ` [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs Xin Li
@ 2024-04-19 13:35   ` Chao Gao
  2024-04-19 17:06     ` Li, Xin3
  0 siblings, 1 reply; 51+ messages in thread
From: Chao Gao @ 2024-04-19 13:35 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:27AM -0800, Xin Li wrote:
>Add FRED MSRs to the valid passthrough MSR list and set FRED MSRs intercept
>based on FRED enumeration.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

two nits below.

>---
>
>Change since v1:
>* Enable FRED MSRs intercept if FRED is no longer enumerated in CPUID
>  (Chao Gao).
>---
> arch/x86/kvm/vmx/vmx.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 34b6676f60d8..d58ed2d3d379 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -693,6 +693,9 @@ static bool is_valid_passthrough_msr(u32 msr)
> 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
> 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
> 		return true;
>+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
>+		/* FRED MSRs should be passthrough to FRED guests only */

This comment sounds weird. It sounds like the code will be something like:
		if guest supports FRED
			return true
		else
			return false

how about "FRED MSRs are pass-thru'd to guests which enumerate FRED"?

Or to align with above comment for LBR MSRs, just say

/* FRED MSRs. These are handled in vmx_vcpu_config_fred_after_set_cpuid() */

>+		return true;
> 	}
> 
> 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
>@@ -7774,10 +7777,12 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> {
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
>+	bool fred_enumerated;
> 
> 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_FRED);
>+	fred_enumerated = guest_can_use(vcpu, X86_FEATURE_FRED);
> 
>-	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
>+	if (fred_enumerated) {
> 		vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
> 		secondary_vm_exit_controls_setbit(vmx,
> 						  SECONDARY_VM_EXIT_SAVE_IA32_FRED |
>@@ -7788,6 +7793,16 @@ static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> 						    SECONDARY_VM_EXIT_SAVE_IA32_FRED |
> 						    SECONDARY_VM_EXIT_LOAD_IA32_FRED);
> 	}
>+
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP0, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP1, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP2, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP3, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_STKLVLS, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP1, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP2, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP3, MSR_TYPE_RW, !fred_enumerated);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_CONFIG, MSR_TYPE_RW, !fred_enumerated);

Use a for-loop here? e.g., 
	for (i = MSR_IA32_FRED_RSP0; i <= MSR_IA32_FRED_CONFIG; i++)
> }
> 
> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>-- 
>2.43.0
>
>
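A minimal userspace sketch of the for-loop suggested above. It assumes the nine FRED MSR indices from MSR_IA32_FRED_RSP0 through MSR_IA32_FRED_CONFIG form one contiguous range (the numeric values below are assumptions; check msr-index.h). set_intercept_for_msr() is a hypothetical stand-in for vmx_set_intercept_for_msr(vcpu, msr, MSR_TYPE_RW, intercept) that just records the intercept state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed MSR indices (check msr-index.h).  The loop relies on the FRED
 * MSRs RSP0..CONFIG occupying one contiguous index range.
 */
#define MSR_IA32_FRED_RSP0	0x1cc
#define MSR_IA32_FRED_CONFIG	0x1d4
#define NR_FRED_MSRS		(MSR_IA32_FRED_CONFIG - MSR_IA32_FRED_RSP0 + 1)

/* Records the intercept state; a stand-in for the MSR bitmap. */
static bool fred_msr_intercepted[NR_FRED_MSRS];

/* Hypothetical stand-in for vmx_set_intercept_for_msr(). */
static void set_intercept_for_msr(uint32_t msr, bool intercept)
{
	assert(msr >= MSR_IA32_FRED_RSP0 && msr <= MSR_IA32_FRED_CONFIG);
	fred_msr_intercepted[msr - MSR_IA32_FRED_RSP0] = intercept;
}

/* Pass the FRED MSRs through only to guests that enumerate FRED. */
static void config_fred_msr_intercepts(bool fred_enumerated)
{
	uint32_t msr;

	for (msr = MSR_IA32_FRED_RSP0; msr <= MSR_IA32_FRED_CONFIG; msr++)
		set_intercept_for_msr(msr, !fred_enumerated);
}
```

If the indices were ever not contiguous, looping over an explicit array of MSR indices would be the safer choice.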


* Re: [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields
  2024-02-07 17:26 ` [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields Xin Li
@ 2024-04-19 14:01   ` Chao Gao
  2024-04-19 17:02     ` Li, Xin3
  0 siblings, 1 reply; 51+ messages in thread
From: Chao Gao @ 2024-04-19 14:01 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:28AM -0800, Xin Li wrote:
>Initialize host VMCS FRED fields with host FRED MSRs' value and
>guest VMCS FRED fields to 0.
>
>FRED CPU states are managed in 9 new FRED MSRs, as well as a few
>existing CPU registers and MSRs, e.g., CR4.FRED.  To support FRED
>context management, new VMCS fields corresponding to most of FRED
>CPU state MSRs are added to both the host-state and guest-state
>areas of VMCS.
>
>Specifically, no VMCS fields are added for the FRED RSP0 and SSP0 MSRs,
>because these 2 MSRs are used only during ring 3 event delivery.  KVM,
>running in ring 0, can thus run safely even with the guest's FRED RSP0
>and SSP0 loaded, and loading the host's FRED RSP0 and SSP0 can be
>deferred until just before returning to user level.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>---
>
>Changes since v1:
>* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() to decouple
>  KVM's capability to virtualize a feature and host's enabling of a
>  feature (Chao Gao).
>* Move guest FRED states init into __vmx_vcpu_reset() (Chao Gao).
>---
> arch/x86/include/asm/vmx.h | 16 ++++++++++++++++
> arch/x86/kvm/vmx/vmx.c     | 34 ++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
>diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>index cb14f7e315f5..4889754415b5 100644
>--- a/arch/x86/include/asm/vmx.h
>+++ b/arch/x86/include/asm/vmx.h
>@@ -280,12 +280,28 @@ enum vmcs_field {
> 	GUEST_BNDCFGS_HIGH              = 0x00002813,
> 	GUEST_IA32_RTIT_CTL		= 0x00002814,
> 	GUEST_IA32_RTIT_CTL_HIGH	= 0x00002815,
>+	GUEST_IA32_FRED_CONFIG		= 0x0000281a,
>+	GUEST_IA32_FRED_RSP1		= 0x0000281c,
>+	GUEST_IA32_FRED_RSP2		= 0x0000281e,
>+	GUEST_IA32_FRED_RSP3		= 0x00002820,
>+	GUEST_IA32_FRED_STKLVLS		= 0x00002822,
>+	GUEST_IA32_FRED_SSP1		= 0x00002824,
>+	GUEST_IA32_FRED_SSP2		= 0x00002826,
>+	GUEST_IA32_FRED_SSP3		= 0x00002828,
> 	HOST_IA32_PAT			= 0x00002c00,
> 	HOST_IA32_PAT_HIGH		= 0x00002c01,
> 	HOST_IA32_EFER			= 0x00002c02,
> 	HOST_IA32_EFER_HIGH		= 0x00002c03,
> 	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
> 	HOST_IA32_PERF_GLOBAL_CTRL_HIGH	= 0x00002c05,
>+	HOST_IA32_FRED_CONFIG		= 0x00002c08,
>+	HOST_IA32_FRED_RSP1		= 0x00002c0a,
>+	HOST_IA32_FRED_RSP2		= 0x00002c0c,
>+	HOST_IA32_FRED_RSP3		= 0x00002c0e,
>+	HOST_IA32_FRED_STKLVLS		= 0x00002c10,
>+	HOST_IA32_FRED_SSP1		= 0x00002c12,
>+	HOST_IA32_FRED_SSP2		= 0x00002c14,
>+	HOST_IA32_FRED_SSP3		= 0x00002c16,
> 	PIN_BASED_VM_EXEC_CONTROL       = 0x00004000,
> 	CPU_BASED_VM_EXEC_CONTROL       = 0x00004002,
> 	EXCEPTION_BITMAP                = 0x00004004,
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index d58ed2d3d379..b7b772183ee4 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -1470,6 +1470,18 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> 				    (unsigned long)(cpu_entry_stack(cpu) + 1));
> 		}
> 
>+#ifdef CONFIG_X86_64

is this #ifdeffery necessary?

I assume kvm_cpu_cap_has(X86_FEATURE_FRED) is always false for !CONFIG_X86_64.
It looks like most FRED changes in the core kernel don't have such #ifdeffery.

>+		/* Per-CPU FRED MSRs */

Please explain in this comment why these six MSRs are updated here, and why
only they are.

>+		if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
>+			vmcs_write64(HOST_IA32_FRED_RSP1, read_msr(MSR_IA32_FRED_RSP1));
>+			vmcs_write64(HOST_IA32_FRED_RSP2, read_msr(MSR_IA32_FRED_RSP2));
>+			vmcs_write64(HOST_IA32_FRED_RSP3, read_msr(MSR_IA32_FRED_RSP3));
>+			vmcs_write64(HOST_IA32_FRED_SSP1, read_msr(MSR_IA32_FRED_SSP1));
>+			vmcs_write64(HOST_IA32_FRED_SSP2, read_msr(MSR_IA32_FRED_SSP2));
>+			vmcs_write64(HOST_IA32_FRED_SSP3, read_msr(MSR_IA32_FRED_SSP3));
>+		}
>+#endif
>+
> 		vmx->loaded_vmcs->cpu = cpu;
> 	}
> }
>@@ -4321,6 +4333,15 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
> 	 */
> 	vmcs_write16(HOST_DS_SELECTOR, 0);
> 	vmcs_write16(HOST_ES_SELECTOR, 0);
>+
>+	/*
>+	 * FRED MSRs are per-cpu, however FRED CONFIG and STKLVLS MSRs
>+	 * are the same on all CPUs, thus they are initialized here.
>+	 */
>+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
>+		vmcs_write64(HOST_IA32_FRED_CONFIG, read_msr(MSR_IA32_FRED_CONFIG));
>+		vmcs_write64(HOST_IA32_FRED_STKLVLS, read_msr(MSR_IA32_FRED_STKLVLS));
>+	}
> #else
> 	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
> 	vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
>@@ -4865,6 +4886,19 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> 	 */
> 	vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> 	vmx->pi_desc.sn = 1;
>+
>+#ifdef CONFIG_X86_64

ditto

>+	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
>+		vmcs_write64(GUEST_IA32_FRED_CONFIG, 0);
>+		vmcs_write64(GUEST_IA32_FRED_RSP1, 0);
>+		vmcs_write64(GUEST_IA32_FRED_RSP2, 0);
>+		vmcs_write64(GUEST_IA32_FRED_RSP3, 0);
>+		vmcs_write64(GUEST_IA32_FRED_STKLVLS, 0);
>+		vmcs_write64(GUEST_IA32_FRED_SSP1, 0);
>+		vmcs_write64(GUEST_IA32_FRED_SSP2, 0);
>+		vmcs_write64(GUEST_IA32_FRED_SSP3, 0);
>+	}
>+#endif
> }
> 
> static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>-- 
>2.43.0
>
>
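To illustrate the split that the changelog and comments describe, here is a userspace sketch. It assumes (as the vmx_set_constant_host_state() comment states) that FRED CONFIG and STKLVLS hold the same value on every CPU, while RSP1-3 and SSP1-3 point at per-CPU stacks. All types and function names are hypothetical models of vmx_set_constant_host_state() and vmx_vcpu_load_vmcs(), not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH	4

/* Host values of the FRED MSRs that have host-state VMCS fields. */
struct host_fred_msrs {
	uint64_t config, stklvls;	/* identical on every CPU */
	uint64_t rsp[3], ssp[3];	/* RSP1-3 / SSP1-3: per-CPU stacks */
};

/* Hypothetical per-CPU snapshot of the host FRED MSRs. */
static struct host_fred_msrs host_msrs[NR_CPUS_SKETCH];

/* Stand-in for a VMCS: just the host FRED fields and the last CPU. */
struct vmcs_sketch {
	struct host_fred_msrs host;
	int cpu;			/* -1 until first load */
};

/* Models vmx_set_constant_host_state(): written once, from any CPU. */
static void set_constant_host_state(struct vmcs_sketch *v, int cpu)
{
	v->host.config = host_msrs[cpu].config;
	v->host.stklvls = host_msrs[cpu].stklvls;
}

/* Models vmx_vcpu_load_vmcs(): refresh per-CPU fields on migration. */
static void vcpu_load_vmcs(struct vmcs_sketch *v, int cpu)
{
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (v->cpu == cpu)
		return;
	for (i = 0; i < 3; i++) {
		v->host.rsp[i] = host_msrs[cpu].rsp[i];
		v->host.ssp[i] = host_msrs[cpu].ssp[i];
	}
	v->cpu = cpu;
}
```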


* Re: [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest
  2024-02-07 17:26 ` [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest Xin Li
@ 2024-04-19 14:23   ` Chao Gao
  2024-04-19 16:37     ` Li, Xin3
  0 siblings, 1 reply; 51+ messages in thread
From: Chao Gao @ 2024-04-19 14:23 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Wed, Feb 07, 2024 at 09:26:29AM -0800, Xin Li wrote:
>Switch MSR_IA32_FRED_RSP0 between host and guest in
>vmx_prepare_switch_to_{host,guest}().
>
>MSR_IA32_FRED_RSP0 is used during ring 3 event delivery only, thus
>KVM, running on ring 0, can run safely with guest FRED RSP0, i.e.,
>no need to switch between host/guest FRED RSP0 during VM entry and
>exit.
>
>KVM should switch to host FRED RSP0 before returning to user level,
>and switch to guest FRED RSP0 before entering guest mode.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>---
>
>Changes since v1:
>* Don't use guest_cpuid_has() in vmx_prepare_switch_to_{host,guest}(),
>  which are called from IRQ-disabled context (Chao Gao).
>* Reset msr_guest_fred_rsp0 in __vmx_vcpu_reset() (Chao Gao).
>---
> arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++++++
> arch/x86/kvm/vmx/vmx.h |  2 ++
> 2 files changed, 19 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index b7b772183ee4..264378c3b784 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -1337,6 +1337,16 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> 	}
> 
> 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
>+
>+	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
>+		/*
>+		 * MSR_IA32_FRED_RSP0 is top of task stack, which never changes.
>+		 * Thus it should be initialized only once.
>+		 */
>+		if (unlikely(vmx->msr_host_fred_rsp0 == 0))
>+			vmx->msr_host_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);

can we just drop this and use "(unsigned long)task_stack_page(current) + THREAD_SIZE"
as host fred rsp0?

>+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_guest_fred_rsp0);

any reason to not use wrmsrns?

>+	}
> #else
> 	savesegment(fs, fs_sel);
> 	savesegment(gs, gs_sel);
>@@ -1381,6 +1391,11 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> 	invalidate_tss_limit();
> #ifdef CONFIG_X86_64
> 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
>+
>+	if (guest_can_use(&vmx->vcpu, X86_FEATURE_FRED)) {
>+		vmx->msr_guest_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
>+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_host_fred_rsp0);

same question.

>+	}
> #endif
> 	load_fixmap_gdt(raw_smp_processor_id());
> 	vmx->guest_state_loaded = false;
>@@ -4889,6 +4904,8 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> 
> #ifdef CONFIG_X86_64
> 	if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
>+		vmx->msr_guest_fred_rsp0 = 0;
>+
> 		vmcs_write64(GUEST_IA32_FRED_CONFIG, 0);
> 		vmcs_write64(GUEST_IA32_FRED_RSP1, 0);
> 		vmcs_write64(GUEST_IA32_FRED_RSP2, 0);
>diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>index 3ad52437f426..176ad39be406 100644
>--- a/arch/x86/kvm/vmx/vmx.h
>+++ b/arch/x86/kvm/vmx/vmx.h
>@@ -278,6 +278,8 @@ struct vcpu_vmx {
> #ifdef CONFIG_X86_64
> 	u64		      msr_host_kernel_gs_base;
> 	u64		      msr_guest_kernel_gs_base;
>+	u64		      msr_host_fred_rsp0;
>+	u64		      msr_guest_fred_rsp0;
> #endif
> 
> 	u64		      spec_ctrl;
>-- 
>2.43.0
>
>
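A userspace sketch of the lazy RSP0 switching in this patch. A plain variable stands in for MSR_IA32_FRED_RSP0, and the struct and function names are hypothetical; the real code uses read_msr() and wrmsrl() (or wrmsrns(), as suggested above) and only runs when the guest can use FRED. It shows why no VMCS field is needed: the guest value is swapped only when switching between guest and host user context, not on every VM exit.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the physical MSR_IA32_FRED_RSP0 register (hypothetical). */
static uint64_t fred_rsp0_msr;

struct fred_rsp0_cache {
	uint64_t host;	/* host value, cached on first switch (0 = not yet) */
	uint64_t guest;	/* guest value while host state is loaded */
};

/* Models vmx_prepare_switch_to_guest(): load guest RSP0. */
static void switch_rsp0_to_guest(struct fred_rsp0_cache *c)
{
	if (c->host == 0)
		c->host = fred_rsp0_msr;	/* read_msr() in the patch */
	fred_rsp0_msr = c->guest;		/* wrmsrl(), or wrmsrns() */
}

/* Models vmx_prepare_switch_to_host(): save guest RSP0, restore host's. */
static void switch_rsp0_to_host(struct fred_rsp0_cache *c)
{
	c->guest = fred_rsp0_msr;
	fred_rsp0_msr = c->host;
}
```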


* RE: [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest
  2024-04-19 14:23   ` Chao Gao
@ 2024-04-19 16:37     ` Li, Xin3
  0 siblings, 0 replies; 51+ messages in thread
From: Li, Xin3 @ 2024-04-19 16:37 UTC (permalink / raw)
  To: Gao, Chao
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

> >+		if (unlikely(vmx->msr_host_fred_rsp0 == 0))
> >+			vmx->msr_host_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
> 
> can we just drop this and use "(unsigned long)task_stack_page(current) + THREAD_SIZE"
> as host fred rsp0?

I thought about it, but I don't see a strong reason why it's better,
i.e., is RDMSR slower than reading 'stack' from the current task_struct?

> 
> >+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_guest_fred_rsp0);
> 
> any reason to not use wrmsrns?

Good call!


> >+	}
> > #else
> > 	savesegment(fs, fs_sel);
> > 	savesegment(gs, gs_sel);
> >@@ -1381,6 +1391,11 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
> > 	invalidate_tss_limit();
> > #ifdef CONFIG_X86_64
> > 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
> >+
> >+	if (guest_can_use(&vmx->vcpu, X86_FEATURE_FRED)) {
> >+		vmx->msr_guest_fred_rsp0 = read_msr(MSR_IA32_FRED_RSP0);
> >+		wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_host_fred_rsp0);
> 
> same question.

Will do!

Thanks!
    Xin



* RE: [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields
  2024-04-19 14:01   ` Chao Gao
@ 2024-04-19 17:02     ` Li, Xin3
  0 siblings, 0 replies; 51+ messages in thread
From: Li, Xin3 @ 2024-04-19 17:02 UTC (permalink / raw)
  To: Gao, Chao
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

> >+#ifdef CONFIG_X86_64
> 
> is this #ifdeffery necessary?

Yes, otherwise the build fails on 32-bit.

> 
> I assume kvm_cpu_cap_has(X86_FEATURE_FRED) is always false for !CONFIG_X86_64.
> It looks like most FRED changes in the core kernel don't have such #ifdeffery.

Because it's not false at compile time, only at runtime.

> 
> >+		/* Per-CPU FRED MSRs */
> 
> Please explain in this comment why these six MSRs are updated here, and why
> only they are.

The explanation is implicit in "per-CPU"; I will make it more explicit.

Thanks!
    Xin


* RE: [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs
  2024-04-19 13:35   ` Chao Gao
@ 2024-04-19 17:06     ` Li, Xin3
  0 siblings, 0 replies; 51+ messages in thread
From: Li, Xin3 @ 2024-04-19 17:06 UTC (permalink / raw)
  To: Gao, Chao
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

> >+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
> >+		/* FRED MSRs should be passthrough to FRED guests only */
> 
> This comment sounds weird. It sounds like the code will be something like:
> 		if guest supports FRED
> 			return true
> 		else
> 			return false
> 
> how about "FRED MSRs are pass-thru'd to guests which enumerate FRED"?
> 
> Or to align with above comment for LBR MSRs, just say
> 
> /* FRED MSRs. These are handled in vmx_vcpu_config_fred_after_set_cpuid() */
> 

Maybe both, so as not to confuse anyone at all 😊

> >+		return true;
> > 	}
> >
> > 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
> >@@ -7774,10 +7777,12 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> > static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> > {
> > 	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >+	bool fred_enumerated;
> >
> > 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_FRED);
> >+	fred_enumerated = guest_can_use(vcpu, X86_FEATURE_FRED);
> >
> >-	if (guest_can_use(vcpu, X86_FEATURE_FRED)) {
> >+	if (fred_enumerated) {
> > 		vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_FRED);
> > 		secondary_vm_exit_controls_setbit(vmx,
> > 						  SECONDARY_VM_EXIT_SAVE_IA32_FRED |
> >@@ -7788,6 +7793,16 @@ static void vmx_vcpu_config_fred_after_set_cpuid(struct kvm_vcpu *vcpu)
> > 						    SECONDARY_VM_EXIT_SAVE_IA32_FRED |
> > 						    SECONDARY_VM_EXIT_LOAD_IA32_FRED);
> > 	}
> >+
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP0, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP1, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP2, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP3, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_STKLVLS, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP1, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP2, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP3, MSR_TYPE_RW, !fred_enumerated);
> >+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_CONFIG, MSR_TYPE_RW, !fred_enumerated);
> 
> Use a for-loop here? e.g.,
> 	for (i = MSR_IA32_FRED_RSP0; i <= MSR_IA32_FRED_CONFIG; i++)
> > }

Yeah, let me try.


* Re: [PATCH v2 25/25] KVM: selftests: Add fred exception tests
@ 2024-04-24 16:08     ` Sean Christopherson
  0 siblings, 0 replies; 51+ messages in thread
From: Sean Christopherson @ 2024-04-24 16:08 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Xin Li, linux-kernel, kvm, linux-doc, linux-kselftest, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, ravi.v.shankar, xin

On Sat, Mar 30, 2024, Muhammad Usama Anjum wrote:
> On 2/7/24 10:26 PM, Xin Li wrote:
> > Add tests for FRED event data and VMX nested-exception.
> > 
> > FRED is designed to save a complete event context in its stack frame,
> > e.g., FRED saves the faulting linear address of a #PF into a 64-bit
> > event data field defined in FRED stack frame.  As such, FRED VMX adds
> > event data handling during VMX transitions.
> > 
> > Besides, FRED introduces event stack levels to dispatch an event handler
> > onto a stack based on the current stack level and the stack levels defined
> > in the IA32_FRED_STKLVLS MSR for each exception vector.  VMX nested-exception
> > support ensures the correct event stack level is chosen when a VM entry
> > injects a nested exception, which is regarded as having occurred in ring 0.
> > 
> > To fully test the underlying FRED VMX code, this test should be run a
> > second time with EPT disabled, so that page faults are injected as nested
> > exceptions.
> > 
> > Originally-by: Shan Kang <shan.kang@intel.com>
> > Signed-off-by: Xin Li <xin3.li@intel.com>
> Thank you for the new test patch. We have been trying to ensure TAP
> conformance for tests, which cannot be achieved if new tests aren't already
> using TAP.

Who is "we"?

> Please make your test TAP compliant.

This isn't entirely reasonable feedback.  I'm all for getting KVM selftests
TAP-friendly, but the current reality is that the KVM selftests infrastructure
doesn't make it easy to be TAP compliant.  We're working on improving things,
i.e. I do hope/want to get to a state where it's a hard requirement for KVM
selftests to be TAP compliant, but we aren't there yet.

If you have specific feedback on _how_ to make a test TAP compliant, then by all
means provide that feedback.  But a drive-by "make your test TAP compliant" isn't
super helpful.

> > ---
> >  tools/testing/selftests/kvm/Makefile          |   1 +
> >  .../selftests/kvm/include/x86_64/processor.h  |  32 ++
> >  .../testing/selftests/kvm/x86_64/fred_test.c  | 297 ++++++++++++++++++
> Add the generated binary to .gitignore.

This should be unnecessary (though I haven't actually verified by building), as
KVM selftests ignore most everything by default since commit 43e96957e8b8
("KVM: selftests: Use pattern matching in .gitignore").


* Re: [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore
  2024-02-07 17:26 ` [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore Xin Li
@ 2024-04-29  6:31   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-29  6:31 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Thu, Feb 08, 2024 at 01:26:30AM +0800, Xin Li wrote:
>Handle host initiated FRED MSR access requests to allow FRED context
>to be set/get from user level.
>

The changelog isn't accurate because guest accesses are also handled
by this patch, specifically in the "else" branch.

	>+		if (host_initiated) {
	>+			if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
	>+				return 1;
	>+		} else {



> void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>@@ -2019,6 +2037,33 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_KERNEL_GS_BASE:
> 		msr_info->data = vmx_read_guest_kernel_gs_base(vmx);
> 		break;
>+	case MSR_IA32_FRED_RSP0:
>+		msr_info->data = vmx_read_guest_fred_rsp0(vmx);
>+		break;
>+	case MSR_IA32_FRED_RSP1:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP1);
>+		break;
>+	case MSR_IA32_FRED_RSP2:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP2);
>+		break;
>+	case MSR_IA32_FRED_RSP3:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_RSP3);
>+		break;
>+	case MSR_IA32_FRED_STKLVLS:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_STKLVLS);
>+		break;
>+	case MSR_IA32_FRED_SSP1:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP1);
>+		break;
>+	case MSR_IA32_FRED_SSP2:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP2);
>+		break;
>+	case MSR_IA32_FRED_SSP3:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_SSP3);
>+		break;
>+	case MSR_IA32_FRED_CONFIG:
>+		msr_info->data = vmcs_read64(GUEST_IA32_FRED_CONFIG);
>+		break;

how about adding a helper function to convert MSR index to the VMCS field id?
Then do:

	case MSR_IA32_FRED_RSP1 ... MSR_IA32_FRED_STKLVLS:
	case MSR_IA32_FRED_SSP1 ... MSR_IA32_FRED_CONFIG:
		msr_info->data = vmcs_read64(msr_to_vmcs(index));
		break;

and ...

> #endif
> 	case MSR_EFER:
> 		return kvm_get_msr_common(vcpu, msr_info);
>@@ -2226,6 +2271,33 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 			vmx_update_exception_bitmap(vcpu);
> 		}
> 		break;
>+	case MSR_IA32_FRED_RSP0:
>+		vmx_write_guest_fred_rsp0(vmx, data);
>+		break;
>+	case MSR_IA32_FRED_RSP1:
>+		vmcs_write64(GUEST_IA32_FRED_RSP1, data);
>+		break;
>+	case MSR_IA32_FRED_RSP2:
>+		vmcs_write64(GUEST_IA32_FRED_RSP2, data);
>+		break;
>+	case MSR_IA32_FRED_RSP3:
>+		vmcs_write64(GUEST_IA32_FRED_RSP3, data);
>+		break;
>+	case MSR_IA32_FRED_STKLVLS:
>+		vmcs_write64(GUEST_IA32_FRED_STKLVLS, data);
>+		break;
>+	case MSR_IA32_FRED_SSP1:
>+		vmcs_write64(GUEST_IA32_FRED_SSP1, data);
>+		break;
>+	case MSR_IA32_FRED_SSP2:
>+		vmcs_write64(GUEST_IA32_FRED_SSP2, data);
>+		break;
>+	case MSR_IA32_FRED_SSP3:
>+		vmcs_write64(GUEST_IA32_FRED_SSP3, data);
>+		break;
>+	case MSR_IA32_FRED_CONFIG:
>+		vmcs_write64(GUEST_IA32_FRED_CONFIG, data);
>+		break;

	case MSR_IA32_FRED_RSP1 ... MSR_IA32_FRED_STKLVLS:
	case MSR_IA32_FRED_SSP1 ... MSR_IA32_FRED_CONFIG:
		vmcs_write64(msr_to_vmcs(index), data);
		break;

The code will be more compact and generate fewer instructions.  I believe the
CET series can make the same change [*]. Performance here isn't critical; I
just think it looks cumbersome to repeat the same pattern for 8 (and more,
with CET considered) MSRs.

[*]: https://lore.kernel.org/kvm/20240219074733.122080-21-weijiang.yang@intel.com/
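
A userspace sketch of the suggested msr_to_vmcs() approach. The guest VMCS encodings are taken from the vmx.h hunk in patch 08; the MSR index values are assumptions (check msr-index.h), and the toy VMCS array with get_fred_msr()/set_fred_msr() are hypothetical stand-ins for the vmcs_read64()/vmcs_write64() paths in vmx_get_msr()/vmx_set_msr(). FRED RSP0 has no VMCS field, so it maps to 0 and keeps its own path.

```c
#include <assert.h>
#include <stdint.h>

/* FRED MSR indices: assumed values, check arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_FRED_RSP0	0x1cc
#define MSR_IA32_FRED_RSP1	0x1cd
#define MSR_IA32_FRED_RSP2	0x1ce
#define MSR_IA32_FRED_RSP3	0x1cf
#define MSR_IA32_FRED_STKLVLS	0x1d0
#define MSR_IA32_FRED_SSP1	0x1d1
#define MSR_IA32_FRED_SSP2	0x1d2
#define MSR_IA32_FRED_SSP3	0x1d3
#define MSR_IA32_FRED_CONFIG	0x1d4

/* Guest-state VMCS encodings, as added to vmx.h in patch 08. */
enum {
	GUEST_IA32_FRED_CONFIG	= 0x0000281a,
	GUEST_IA32_FRED_RSP1	= 0x0000281c,
	GUEST_IA32_FRED_RSP2	= 0x0000281e,
	GUEST_IA32_FRED_RSP3	= 0x00002820,
	GUEST_IA32_FRED_STKLVLS	= 0x00002822,
	GUEST_IA32_FRED_SSP1	= 0x00002824,
	GUEST_IA32_FRED_SSP2	= 0x00002826,
	GUEST_IA32_FRED_SSP3	= 0x00002828,
};

/* Hypothetical helper: 0 means "no guest VMCS field" (e.g. FRED RSP0). */
static uint32_t msr_to_vmcs(uint32_t msr)
{
	switch (msr) {
	case MSR_IA32_FRED_RSP1:	return GUEST_IA32_FRED_RSP1;
	case MSR_IA32_FRED_RSP2:	return GUEST_IA32_FRED_RSP2;
	case MSR_IA32_FRED_RSP3:	return GUEST_IA32_FRED_RSP3;
	case MSR_IA32_FRED_STKLVLS:	return GUEST_IA32_FRED_STKLVLS;
	case MSR_IA32_FRED_SSP1:	return GUEST_IA32_FRED_SSP1;
	case MSR_IA32_FRED_SSP2:	return GUEST_IA32_FRED_SSP2;
	case MSR_IA32_FRED_SSP3:	return GUEST_IA32_FRED_SSP3;
	case MSR_IA32_FRED_CONFIG:	return GUEST_IA32_FRED_CONFIG;
	default:			return 0;
	}
}

/* Toy VMCS storage for fields 0x2800..0x282f; demo only. */
#define TOY_VMCS_BASE	0x2800u
#define TOY_VMCS_SIZE	0x30u
static uint64_t toy_vmcs[TOY_VMCS_SIZE];

static uint64_t *toy_vmcs_slot(uint32_t field)
{
	assert(field >= TOY_VMCS_BASE && field < TOY_VMCS_BASE + TOY_VMCS_SIZE);
	return &toy_vmcs[field - TOY_VMCS_BASE];
}

/* One shared path replaces eight near-identical cases per direction. */
static int get_fred_msr(uint32_t msr, uint64_t *data)
{
	uint32_t field = msr_to_vmcs(msr);

	if (!field)
		return 1;	/* RSP0 is handled separately */
	*data = *toy_vmcs_slot(field);
	return 0;
}

static int set_fred_msr(uint32_t msr, uint64_t data)
{
	uint32_t field = msr_to_vmcs(msr);

	if (!field)
		return 1;	/* RSP0 is handled separately */
	*toy_vmcs_slot(field) = data;
	return 0;
}
```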

> #endif
> 	case MSR_IA32_SYSENTER_CS:
> 		if (is_guest_mode(vcpu))
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 363b1c080205..4e8d60f248e3 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1451,6 +1451,9 @@ static const u32 msrs_to_save_base[] = {
> 	MSR_STAR,
> #ifdef CONFIG_X86_64
> 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
>+	MSR_IA32_FRED_RSP0, MSR_IA32_FRED_RSP1, MSR_IA32_FRED_RSP2,
>+	MSR_IA32_FRED_RSP3, MSR_IA32_FRED_STKLVLS, MSR_IA32_FRED_SSP1,
>+	MSR_IA32_FRED_SSP2, MSR_IA32_FRED_SSP3, MSR_IA32_FRED_CONFIG,
> #endif
> 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> 	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
>@@ -1892,6 +1895,30 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> 			return 1;
> 
> 		data = (u32)data;
>+		break;
>+	case MSR_IA32_FRED_RSP0 ... MSR_IA32_FRED_CONFIG:
>+		if (index != MSR_IA32_FRED_STKLVLS && is_noncanonical_address(data, vcpu))
>+			return 1;
>+		if ((index >= MSR_IA32_FRED_RSP0 && index <= MSR_IA32_FRED_RSP3) &&
>+		    (data & GENMASK_ULL(5, 0)))
>+			return 1;
>+		if ((index >= MSR_IA32_FRED_SSP1 && index <= MSR_IA32_FRED_SSP3) &&
>+		    (data & GENMASK_ULL(2, 0)))
>+			return 1;
>+
>+		if (host_initiated) {
>+			if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
>+				return 1;

Should be:
			if (!kvm_cpu_cap_has(X86_FEATURE_FRED) && data)

The KVM ABI allows userspace to write only 0 if guests cannot enumerate the
feature. Even better, your next version can be based on Sean's series

https://lore.kernel.org/kvm/20240425181422.3250947-1-seanjc@google.com/T/#md00be687770e1e658fc9fe0eac20b5f0bd230e4c

This way, you can get rid of the "host_initiated" check.
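
A userspace sketch of the value checks from this __kvm_set_msr() hunk, combined with the suggested "!kvm_cpu_cap_has(X86_FEATURE_FRED) && data" rule for host-initiated writes. Several things here are assumptions: the MSR index values (check msr-index.h), a 48-bit canonical check (KVM's is_noncanonical_address() also handles LA57), and modeling only the host-initiated path. The guest-access branch is not shown here. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FRED MSR indices: assumed values, check msr-index.h. */
#define MSR_IA32_FRED_RSP0	0x1cc
#define MSR_IA32_FRED_RSP3	0x1cf
#define MSR_IA32_FRED_STKLVLS	0x1d0
#define MSR_IA32_FRED_SSP1	0x1d1
#define MSR_IA32_FRED_SSP3	0x1d3
#define MSR_IA32_FRED_CONFIG	0x1d4

/* Simplified 48-bit canonical check: bits 63:47 all 0s or all 1s. */
static bool noncanonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top != 0 && top != 0x1ffff;
}

/*
 * Returns true if a host-initiated (userspace) write must fail.
 * kvm_has_fred stands in for kvm_cpu_cap_has(X86_FEATURE_FRED).
 */
static bool fred_host_write_rejected(uint32_t index, uint64_t data,
				     bool kvm_has_fred)
{
	if (index < MSR_IA32_FRED_RSP0 || index > MSR_IA32_FRED_CONFIG)
		return true;	/* not a FRED MSR */
	if (index != MSR_IA32_FRED_STKLVLS && noncanonical48(data))
		return true;
	/* RSP0-RSP3: bits 5:0 must be clear (64-byte aligned). */
	if (index <= MSR_IA32_FRED_RSP3 && (data & 0x3f))
		return true;
	/* SSP1-SSP3: bits 2:0 must be clear (8-byte aligned). */
	if (index >= MSR_IA32_FRED_SSP1 && index <= MSR_IA32_FRED_SSP3 &&
	    (data & 0x7))
		return true;
	/* Per the review: without FRED, userspace may still write 0. */
	if (!kvm_has_fred && data)
		return true;
	return false;
}
```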


* Re: [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled()
  2024-02-07 17:26 ` [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled() Xin Li
@ 2024-04-29  8:24   ` Chao Gao
  2024-05-11  1:24     ` Li, Xin3
  0 siblings, 1 reply; 51+ messages in thread
From: Chao Gao @ 2024-04-29  8:24 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Thu, Feb 08, 2024 at 01:26:31AM +0800, Xin Li wrote:
>Add kvm_is_fred_enabled() to check whether FRED is enabled on a vCPU.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>---
>
>Change since v1:
>* Explain why it is ok to only check CR4.FRED (Chao Gao).
>---
> arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
>diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
>index 75eae9c4998a..1d431c703fdf 100644
>--- a/arch/x86/kvm/kvm_cache_regs.h
>+++ b/arch/x86/kvm/kvm_cache_regs.h
>@@ -187,6 +187,23 @@ static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
> 	return !!kvm_read_cr4_bits(vcpu, cr4_bit);
> }
> 
>+/*
>+ * It's enough to check just CR4.FRED (X86_CR4_FRED) to tell if
>+ * a vCPU is running with FRED enabled, because:
>+ * 1) CR4.FRED can be set to 1 only _after_ IA32_EFER.LMA = 1.
>+ * 2) To leave IA-32e mode, CR4.FRED must be cleared first.
>+ *
>+ * More details at FRED Spec 6.0 Section 4.2 Enabling in CR4.
>+ */

I think we can give more context here, e.g.,

Although FRED architecture applies to 64-bit mode only, there is no need to
check if the CPU is in 64-bit mode (i.e., IA32_EFER.LMA and CS.L) to tell if
FRED is enabled because CR4.FRED=1 implies the CPU is in 64-bit mode.
Specifically,

1) ..
2) ..
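To illustrate why the two invariants in the comment are enough, here is a toy model (plain C, hypothetical toy_* names) in which setting CR4.FRED (bit 32) is refused unless EFER.LMA (bit 10) is set, and leaving IA-32e mode is refused while CR4.FRED is set. Leaving IA-32e mode is modeled as directly clearing EFER.LMA, which is a simplification; architecturally LMA is cleared as a side effect of disabling paging.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_FRED	(1ULL << 32)	/* CR4 bit 32 */
#define EFER_LMA	(1ULL << 10)	/* IA32_EFER bit 10 */

struct toy_vcpu {
	uint64_t cr4;
	uint64_t efer;
};

/* Invariant 1: CR4.FRED can be set only while IA-32e mode is active. */
static int toy_set_cr4(struct toy_vcpu *v, uint64_t cr4)
{
	if ((cr4 & X86_CR4_FRED) && !(v->efer & EFER_LMA))
		return 1;	/* would be #GP */
	v->cr4 = cr4;
	return 0;
}

/* Invariant 2: IA-32e mode cannot be left while CR4.FRED is still set. */
static int toy_clear_lma(struct toy_vcpu *v)
{
	if (v->cr4 & X86_CR4_FRED)
		return 1;	/* would be #GP */
	v->efer &= ~EFER_LMA;
	return 0;
}

/* Given both invariants, CR4.FRED == 1 already implies EFER.LMA == 1. */
static bool toy_is_fred_enabled(const struct toy_vcpu *v)
{
	return (v->cr4 & X86_CR4_FRED) != 0;
}
```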

* Re: [PATCH v2 12/25] KVM: VMX: Handle FRED event data
  2024-02-07 17:26 ` [PATCH v2 12/25] KVM: VMX: Handle FRED event data Xin Li
@ 2024-04-30  3:14   ` Chao Gao
  2024-05-10  9:36     ` Li, Xin3
  0 siblings, 1 reply; 51+ messages in thread
From: Chao Gao @ 2024-04-30  3:14 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

>index ee61d2c25cb0..f622fb90a098 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -1871,9 +1871,29 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
>                vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
>                             vmx->vcpu.arch.event_exit_inst_len);
>                intr_info |= INTR_TYPE_SOFT_EXCEPTION;
>-       } else
>+       } else {
>                intr_info |= INTR_TYPE_HARD_EXCEPTION;
>
>+               if (kvm_is_fred_enabled(vcpu)) {
>+                       u64 event_data = 0;
>+
>+                       if (is_debug(intr_info))
>+                               /*
>+                                * Compared to DR6, FRED #DB event data saved on
>+                                * the stack frame have bits 4 ~ 11 and 16 ~ 31
>+                                * inverted, i.e.,
>+                                *   fred_db_event_data = dr6 ^ 0xFFFF0FF0UL
>+                                */
>+                               event_data = vcpu->arch.dr6 ^ DR6_RESERVED;
>+                       else if (is_page_fault(intr_info))
>+                               event_data = vcpu->arch.cr2;
>+                       else if (is_nm_fault(intr_info))
>+                               event_data = to_vmx(vcpu)->fred_xfd_event_data;
>+
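A small standalone sketch of the #DB conversion used in the hunk above: XOR with 0xFFFF0FF0 flips the active-low DR6 bits (4-11 and 16-31), so FRED event data reports every condition with positive polarity, and the same XOR maps it back. The mask is defined locally here instead of using the kernel's DR6_RESERVED, and both function names are hypothetical.

```c
#include <stdint.h>

/* Active-low DR6 bits per the quoted comment: bits 4-11 and 16-31. */
#define DR6_ACTIVE_LOW_MASK	0xFFFF0FF0ULL

static uint64_t fred_db_event_data_from_dr6(uint64_t dr6)
{
	return dr6 ^ DR6_ACTIVE_LOW_MASK;
}

static uint64_t dr6_from_fred_db_event_data(uint64_t event_data)
{
	return event_data ^ DR6_ACTIVE_LOW_MASK;	/* XOR is its own inverse */
}
```

For example, a single-step #DB with DR6 = 0xFFFF4FF0 (BS, bit 14, set) gives event data 0x4000.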

IMO, deriving an event_data from CR2/DR6 is a little short-sighted because the
event_data and CR2/DR6 __can__ be different, e.g., L1 VMM __can__ set CR2 to A
and event_data field to B (!=A) when injecting #PF.

And this approach cannot be extended to handle a (future) exception whose
event_data isn't tied to a dedicated register like CR2/DR6.

Adding a new field fred_xfd_event_data to struct vcpu_vmx has problems too:
fred_xfd_event_data gets lost during migration. Strictly speaking, event_data
is tied to an exception rather than a CPU; e.g., the CPU may detect a nested
exception while delivering one, and both have their own event_data.

I think we can make event_data a property of exceptions, i.e., add a payload2
to struct kvm_queued_exception, add new APIs to the kvm_queue_exception* family
that accept a payload2, and in VMX code just program payload2 into the VMCS
event_data field if FRED is enabled. The KVM ABI should be extended as well to
pass payload2 to userspace, like how the payload is handled in
kvm_vcpu_ioctl_x86_get/put_vcpu_events.
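A rough standalone sketch of the payload2 idea, under the assumption that event data travels with the queued exception and is copied verbatim into the VMCS at injection time. payload2 is the name proposed above; the toy_* structs and functions are hypothetical, and the VMCS is modeled as a plain struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down kvm_queued_exception with a second payload. */
struct toy_queued_exception {
	uint8_t vector;
	bool has_payload;
	uint64_t payload;	/* existing payload, e.g. the #PF address for CR2 */
	bool has_payload2;
	uint64_t payload2;	/* hypothetical: FRED event data */
};

/* Stand-in for the VMCS INJECTED_EVENT_DATA field. */
struct toy_vmcs {
	uint64_t injected_event_data;
};

/* Hypothetical member of the kvm_queue_exception* family taking payload2. */
static void toy_queue_exception_p2(struct toy_queued_exception *ex, uint8_t vector,
				   uint64_t payload, uint64_t payload2)
{
	ex->vector = vector;
	ex->has_payload = true;
	ex->payload = payload;
	ex->has_payload2 = true;
	ex->payload2 = payload2;
}

/* Injection programs payload2 as-is; nothing is re-derived from CR2/DR6. */
static void toy_inject_exception(const struct toy_queued_exception *ex,
				 struct toy_vmcs *vmcs, bool fred_enabled)
{
	if (fred_enabled)
		vmcs->injected_event_data = ex->has_payload2 ? ex->payload2 : 0;
}
```

With this shape, an L1 that queues a #PF with CR2 = A and event data = B gets exactly B in the injected event data, regardless of what CR2 holds.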

* Re: [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED
  2024-02-07 17:26 ` [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED Xin Li
@ 2024-04-30  7:34   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-30  7:34 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Thu, Feb 08, 2024 at 01:26:33AM +0800, Xin Li wrote:
>Set VMX nested exception bit in the VM-entry interruption information
>VMCS field when injecting a nested exception using FRED event delivery
>to ensure:
>  1) The nested exception is injected on a correct stack level.
>  2) The nested bit defined in FRED stack frame is set.
>
>The event stack level used by FRED event delivery depends on whether the
>event was a nested exception encountered during delivery of another event,
>because a nested exception is "regarded" as happening on ring 0.  E.g.,
>when #PF is configured to use stack level 1 in IA32_FRED_STKLVLS MSR:
>  - nested #PF will be delivered on stack level 1 when encountered in
>    ring 3.
>  - normal #PF will be delivered on stack level 0 when encountered in
>    ring 3.
>
>The VMX nested-exception support ensures the correct event stack level is
>chosen when a VM entry injects a nested exception.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>---
>
>Changes since v1:
>* Set the nested flag when there is an original interrupt (Chao Gao).
>---
> arch/x86/include/asm/kvm_host.h |  6 +++--
> arch/x86/include/asm/vmx.h      |  5 ++--
> arch/x86/kvm/svm/svm.c          |  4 +--
> arch/x86/kvm/vmx/vmx.c          |  8 ++++--
> arch/x86/kvm/x86.c              | 46 ++++++++++++++++++++++++++-------
> arch/x86/kvm/x86.h              |  1 +
> 6 files changed, 53 insertions(+), 17 deletions(-)
>
>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>index 0d88873eba63..ef278ee0b6ca 100644
>--- a/arch/x86/include/asm/kvm_host.h
>+++ b/arch/x86/include/asm/kvm_host.h
>@@ -736,6 +736,7 @@ struct kvm_queued_exception {
> 	u32 error_code;
> 	unsigned long payload;
> 	bool has_payload;
>+	bool nested;

"nested" may be lost after migration.

> };
> 
> struct kvm_vcpu_arch {
>@@ -2060,8 +2061,9 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
> void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
> void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
>-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
>-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
>+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested);
>+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr,
>+			     u32 error_code, bool nested);
> void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
> void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
> 				    struct x86_exception *fault);
>diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>index 6b796c5c9c2b..68af74e48788 100644
>--- a/arch/x86/include/asm/vmx.h
>+++ b/arch/x86/include/asm/vmx.h
>@@ -134,7 +134,7 @@
> #define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
> #define VMX_BASIC_INOUT				BIT_ULL(54)
> #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
>-
>+#define VMX_BASIC_NESTED_EXCEPTION		BIT_ULL(58)

this definition is not used in this patch.

> 
> /* VMX_MISC bits and bitmasks */
> #define VMX_MISC_INTEL_PT			BIT_ULL(14)
>@@ -407,8 +407,9 @@ enum vmcs_field {
> #define INTR_INFO_INTR_TYPE_MASK        0x700           /* 10:8 */
> #define INTR_INFO_DELIVER_CODE_MASK     0x800           /* 11 */
> #define INTR_INFO_UNBLOCK_NMI		0x1000		/* 12 */
>+#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000		/* 13 */
> #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
>-#define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
>+#define INTR_INFO_RESVD_BITS_MASK       0x7fffd000
> 
> #define VECTORING_INFO_VECTOR_MASK           	INTR_INFO_VECTOR_MASK
> #define VECTORING_INFO_TYPE_MASK        	INTR_INFO_INTR_TYPE_MASK
>diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>index e90b429c84f1..c220b690a37c 100644
>--- a/arch/x86/kvm/svm/svm.c
>+++ b/arch/x86/kvm/svm/svm.c
>@@ -4057,10 +4057,10 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
> 
> 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
> 			u32 err = svm->vmcb->control.exit_int_info_err;
>-			kvm_requeue_exception_e(vcpu, vector, err);
>+			kvm_requeue_exception_e(vcpu, vector, err, false);
> 
> 		} else
>-			kvm_requeue_exception(vcpu, vector);
>+			kvm_requeue_exception(vcpu, vector, false);
> 		break;
> 	case SVM_EXITINTINFO_TYPE_INTR:
> 		kvm_queue_interrupt(vcpu, vector, false);
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index f622fb90a098..1f265d526daf 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -1891,6 +1891,8 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
> 				event_data = to_vmx(vcpu)->fred_xfd_event_data;
> 
> 			vmcs_write64(INJECTED_EVENT_DATA, event_data);
>+
>+			intr_info |= ex->nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;
> 		}
> 	}
> 
>@@ -7281,9 +7283,11 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu, bool vectoring)
> 		}
> 
> 		if (event_id & INTR_INFO_DELIVER_CODE_MASK)
>-			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field));
>+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
>+						event_id & INTR_INFO_NESTED_EXCEPTION_MASK);
> 		else
>-			kvm_requeue_exception(vcpu, vector);
>+			kvm_requeue_exception(vcpu, vector,
>+					      event_id & INTR_INFO_NESTED_EXCEPTION_MASK);
> 		break;
> 	case INTR_TYPE_SOFT_INTR:
> 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 00c0062726ae..725819262085 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -645,7 +645,8 @@ static void kvm_leave_nested(struct kvm_vcpu *vcpu)
> 
> static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
> 		unsigned nr, bool has_error, u32 error_code,
>-	        bool has_payload, unsigned long payload, bool reinject)
>+	        bool has_payload, unsigned long payload,
>+		bool reinject, bool nested)
> {
> 	u32 prev_nr;
> 	int class1, class2;
>@@ -696,6 +697,13 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
> 			vcpu->arch.exception.pending = true;
> 			vcpu->arch.exception.injected = false;
> 		}
>+
>+		vcpu->arch.exception.nested = vcpu->arch.exception.nested ||
>+					      (kvm_is_fred_enabled(vcpu) &&
>+					       ((reinject && nested) ||
>+					        vcpu->arch.nmi_injected ||
>+					        vcpu->arch.interrupt.injected));

You can set the nested flag regardless of FRED, because the sole place that
uses this information (vmx_inject_exception()) is already guarded by
kvm_is_fred_enabled().

I would also drop the check on @reinject to make @reinject and @nested
orthogonal (i.e., avoid the artificial rule that nested exceptions can only
be queued via "reinject").

so, how about:
		if (vcpu->arch.nmi_injected || vcpu->arch.interrupt.injected ||
		    nested)
			vcpu->arch.exception.nested = true;
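A standalone sketch of the rule suggested above, plus how the flag would then land in bit 13 of the VM-entry interruption-information field only when FRED is enabled. Constants come from the quoted vmx.h hunk, except INTR_TYPE_HARD_EXCEPTION, which is KVM's existing (3 << 8); the toy_* types and functions are hypothetical. It also shows why INTR_INFO_RESVD_BITS_MASK has to drop bit 13 (0x7ffff000 -> 0x7fffd000).

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_VECTOR_MASK		0xffu
#define INTR_TYPE_HARD_EXCEPTION	(3u << 8)
#define INTR_INFO_DELIVER_CODE_MASK	0x800u
#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000u		/* bit 13 */
#define INTR_INFO_VALID_MASK		0x80000000u
#define INTR_INFO_RESVD_BITS_MASK	0x7fffd000u	/* bit 13 no longer reserved */

struct toy_vcpu_state {
	bool nmi_injected;
	bool interrupt_injected;
	bool exception_nested;
};

/* Suggested rule: record nesting independently of FRED and of @reinject. */
static void toy_mark_nested(struct toy_vcpu_state *s, bool nested)
{
	if (s->nmi_injected || s->interrupt_injected || nested)
		s->exception_nested = true;
}

/* Only the injection path looks at FRED (kvm_is_fred_enabled() in KVM). */
static uint32_t toy_build_intr_info(uint8_t vector, bool has_error_code,
				    bool nested, bool fred)
{
	uint32_t info = INTR_INFO_VALID_MASK | INTR_TYPE_HARD_EXCEPTION |
			(vector & INTR_INFO_VECTOR_MASK);

	if (has_error_code)
		info |= INTR_INFO_DELIVER_CODE_MASK;
	if (fred && nested)
		info |= INTR_INFO_NESTED_EXCEPTION_MASK;
	return info;
}
```

For a nested #PF (vector 14) with an error code under FRED, this yields 0x80002B0E, which is clear of the new reserved mask but would hit the old one.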

>+
> 		vcpu->arch.exception.has_error_code = has_error;
> 		vcpu->arch.exception.vector = nr;
> 		vcpu->arch.exception.error_code = error_code;
>@@ -725,8 +733,28 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
> 		vcpu->arch.exception.injected = false;
> 		vcpu->arch.exception.pending = false;
> 
>+		/*
>+		 * A #DF is NOT a nested event per its definition, however per
>+		 * FRED spec 5.0 Appendix B, its delivery determines the new
>+		 * stack level as is done for events occurring when CPL = 0.
>+		 */
>+		vcpu->arch.exception.nested = false;
>+
> 		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
> 	} else {
>+		/*
>+		 * FRED spec 5.0 Appendix B: delivery of a nested exception
>+		 * determines the new stack level as is done for events
>+		 * occurring when CPL = 0.
>+		 *
>+		 * IOW, FRED event delivery of an event encountered in ring 3
>+		 * normally uses stack level 0 unconditionally.  However, if
>+		 * the event is an exception nested on any earlier event,
>+		 * delivery of the nested exception will consult the FRED MSR
>+		 * IA32_FRED_STKLVLS to determine which stack level to use.
>+		 */
>+		vcpu->arch.exception.nested = kvm_is_fred_enabled(vcpu);

As said above, the nested flag can be set regardless of FRED.
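To make the stack-level rule in the commit message concrete, a simplified sketch based on my reading of the FRED spec rather than on KVM code: IA32_FRED_STKLVLS is assumed to hold two bits per exception vector (bits 2v+1:2v); an event from ring 3 that is neither nested nor a #DF uses stack level 0; otherwise the new level is the larger of the current stack level (CSL) and the configured one. External interrupts (vectors 32-255) are not modeled, the caller supplies the CSL (0 for the ring-3 cases), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DF_VECTOR	8

/* Configured stack level for exception vector 0-31: two bits per vector. */
static unsigned int stklvl_for_vector(uint64_t fred_stklvls, unsigned int vector)
{
	return (fred_stklvls >> (2 * vector)) & 0x3;
}

static unsigned int fred_new_stack_level(uint64_t fred_stklvls, unsigned int vector,
					 unsigned int cpl, unsigned int csl, bool nested)
{
	unsigned int cfg;

	/* Ring 3, not nested, not #DF: always stack level 0. */
	if (cpl == 3 && !nested && vector != DF_VECTOR)
		return 0;

	/* Otherwise treated like an event at CPL 0: never lower the level. */
	cfg = stklvl_for_vector(fred_stklvls, vector);
	return cfg > csl ? cfg : csl;
}
```

With #PF configured for stack level 1, a normal #PF from ring 3 gets level 0 and a nested one gets level 1, matching the example in the commit message.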

* Re: [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail
  2024-02-07 17:26 ` [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail Xin Li
@ 2024-04-30  8:21   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-30  8:21 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Thu, Feb 08, 2024 at 01:26:34AM +0800, Xin Li wrote:
>Refuse to virtualize FRED if FRED consistency checks fail.

After reading this, I realized some consistency checks are missing in
setup_vmcs_config(). In fact, Sean requested some infrastructure for
vmcs_entry_exit_pairs to deal with secondary_vmexit_ctrl.

https://lore.kernel.org/kvm/ZU5F58_KRIHzxrMp@google.com/
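A rough sketch of what such infrastructure might look like: the vmcs_entry_exit_pairs idea (clear both controls when only one side is supported), extended so that the exit half may live in the 64-bit secondary VM-exit controls. The table layout, the toy_* names, and the bit values (especially the FRED ones) are assumptions for illustration, not what Sean asked for verbatim.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are assumptions for illustration; check asm/vmx.h. */
#define VM_ENTRY_LOAD_IA32_EFER			0x00008000u
#define VM_EXIT_LOAD_IA32_EFER			0x00200000u
#define VM_ENTRY_LOAD_IA32_FRED			0x00800000u
#define SECONDARY_VM_EXIT_LOAD_IA32_FRED	(1ULL << 1)

struct toy_vmcs_config {
	uint32_t vmentry_ctrl;
	uint32_t vmexit_ctrl;
	uint64_t secondary_vmexit_ctrl;
};

/* The exit half may sit in the primary or the secondary exit controls. */
struct toy_entry_exit_pair {
	uint32_t entry_control;
	uint64_t exit_control;
	bool exit_is_secondary;
};

static const struct toy_entry_exit_pair toy_pairs[] = {
	{ VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER,           false },
	{ VM_ENTRY_LOAD_IA32_FRED, SECONDARY_VM_EXIT_LOAD_IA32_FRED, true  },
};

/* Clear both halves of any pair that is only half supported. */
static void toy_check_entry_exit_pairs(struct toy_vmcs_config *c)
{
	for (size_t i = 0; i < sizeof(toy_pairs) / sizeof(toy_pairs[0]); i++) {
		const struct toy_entry_exit_pair *p = &toy_pairs[i];
		bool has_entry = c->vmentry_ctrl & p->entry_control;
		bool has_exit = p->exit_is_secondary ?
				(c->secondary_vmexit_ctrl & p->exit_control) :
				(c->vmexit_ctrl & p->exit_control);

		if (has_entry == has_exit)
			continue;

		c->vmentry_ctrl &= ~p->entry_control;
		if (p->exit_is_secondary)
			c->secondary_vmexit_ctrl &= ~p->exit_control;
		else
			c->vmexit_ctrl &= ~(uint32_t)p->exit_control;
	}
}
```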

* Re: [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs()
  2024-02-07 17:26 ` [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs() Xin Li
@ 2024-04-30  9:09   ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-04-30  9:09 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Thu, Feb 08, 2024 at 01:26:35AM +0800, Xin Li wrote:
>Add FRED related VMCS fields to dump_vmcs() to have it dump FRED context.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>---
>
>Change since v1:
>* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() (Chao Gao).
>* Dump guest FRED states only if guest has FRED enabled (Nikolay Borisov).
>---
> arch/x86/kvm/vmx/vmx.c | 46 +++++++++++++++++++++++++++++++++++-------
> 1 file changed, 39 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index a484b9ac2400..e3409607122d 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -6392,7 +6392,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
>        struct vcpu_vmx *vmx = to_vmx(vcpu);
>        u32 vmentry_ctl, vmexit_ctl;
>        u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
>-       u64 tertiary_exec_control;
>+       u64 tertiary_exec_control, secondary_vmexit_ctl;
>        unsigned long cr4;
>        int efer_slot;
>
>@@ -6403,6 +6403,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
>
>        vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS);
>        vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS);
>+       secondary_vmexit_ctl = cpu_has_secondary_vmexit_ctrls() ?
>+                              vmcs_read64(SECONDARY_VM_EXIT_CONTROLS) : 0;
>        cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>        pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
>        cr4 = vmcs_readl(GUEST_CR4);
>@@ -6449,6 +6451,19 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
>        vmx_dump_sel("LDTR:", GUEST_LDTR_SELECTOR);
>        vmx_dump_dtsel("IDTR:", GUEST_IDTR_LIMIT);
>        vmx_dump_sel("TR:  ", GUEST_TR_SELECTOR);
>+#ifdef CONFIG_X86_64
>+       if (kvm_is_fred_enabled(vcpu)) {

FRED MSRs are accessible even if CR4.FRED isn't set, and the #ifdef is ugly.
I think you can simply do:

	if (vmentry_ctl & VM_ENTRY_LOAD_IA32_FRED)

just like the handling of EFER/PAT etc. below.
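A standalone sketch of that gating: format the guest FRED state only when VM_ENTRY_LOAD_IA32_FRED is set in the VM-entry controls, the same way EFER/PAT are handled. The bit value, struct, and function are hypothetical. The sketch also uses one conversion (PRIx64) for every 64-bit value, since the quoted hunk mixes %016lx and %016llx.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define VM_ENTRY_LOAD_IA32_FRED	0x00800000u	/* assumed bit value */

struct toy_guest_fred {
	uint64_t config, stklvls, rsp0, rsp1, rsp2, rsp3;
};

/* Returns the snprintf() result, or 0 when the FRED state is not loaded. */
static int toy_dump_guest_fred(char *buf, size_t len, uint32_t vmentry_ctl,
			       const struct toy_guest_fred *f)
{
	if (!(vmentry_ctl & VM_ENTRY_LOAD_IA32_FRED))
		return 0;

	return snprintf(buf, len,
			"FRED guest: config=0x%016" PRIx64 ", stack levels=0x%016" PRIx64 "\n"
			"RSP0=0x%016" PRIx64 ", RSP1=0x%016" PRIx64 "\n"
			"RSP2=0x%016" PRIx64 ", RSP3=0x%016" PRIx64 "\n",
			f->config, f->stklvls, f->rsp0, f->rsp1, f->rsp2, f->rsp3);
}
```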

>+               pr_err("FRED guest: config=0x%016llx, stack levels=0x%016llx\n"
>+                      "RSP0=0x%016lx, RSP1=0x%016llx\n"
>+                      "RSP2=0x%016llx, RSP3=0x%016llx\n",
>+                      vmcs_read64(GUEST_IA32_FRED_CONFIG),
>+                      vmcs_read64(GUEST_IA32_FRED_STKLVLS),
>+                      read_msr(MSR_IA32_FRED_RSP0),
>+                      vmcs_read64(GUEST_IA32_FRED_RSP1),
>+                      vmcs_read64(GUEST_IA32_FRED_RSP2),
>+                      vmcs_read64(GUEST_IA32_FRED_RSP3));
>+       }
>+#endif
>        efer_slot = vmx_find_loadstore_msr_slot(&vmx->msr_autoload.guest, MSR_EFER);
>        if (vmentry_ctl & VM_ENTRY_LOAD_IA32_EFER)
>                pr_err("EFER= 0x%016llx\n", vmcs_read64(GUEST_IA32_EFER));
>@@ -6496,6 +6511,19 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
>               vmcs_readl(HOST_TR_BASE));
>        pr_err("GDTBase=%016lx IDTBase=%016lx\n",
>               vmcs_readl(HOST_GDTR_BASE), vmcs_readl(HOST_IDTR_BASE));
>+#ifdef CONFIG_X86_64
>+       if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {

ditto

>+               pr_err("FRED host: config=0x%016llx, stack levels=0x%016llx\n"
>+                      "RSP0=0x%016llx, RSP1=0x%016llx\n"
>+                      "RSP2=0x%016llx, RSP3=0x%016llx\n",
>+                      vmcs_read64(HOST_IA32_FRED_CONFIG),
>+                      vmcs_read64(HOST_IA32_FRED_STKLVLS),
>+                      vmx->msr_host_fred_rsp0,
>+                      vmcs_read64(HOST_IA32_FRED_RSP1),
>+                      vmcs_read64(HOST_IA32_FRED_RSP2),
>+                      vmcs_read64(HOST_IA32_FRED_RSP3));
>+       }
>+#endif
>        pr_err("CR0=%016lx CR3=%016lx CR4=%016lx\n",
>               vmcs_readl(HOST_CR0), vmcs_readl(HOST_CR3),
>               vmcs_readl(HOST_CR4));
>@@ -6517,25 +6545,29 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
>        pr_err("*** Control State ***\n");
>        pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
>               cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control);
>-       pr_err("PinBased=0x%08x EntryControls=%08x ExitControls=%08x\n",
>-              pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl);
>+       pr_err("PinBased=0x%08x EntryControls=0x%08x\n",
>+              pin_based_exec_ctrl, vmentry_ctl);
>+       pr_err("ExitControls=0x%08x SecondaryExitControls=0x%016llx\n",
>+              vmexit_ctl, secondary_vmexit_ctl);
>        pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
>               vmcs_read32(EXCEPTION_BITMAP),
>               vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK),
>               vmcs_read32(PAGE_FAULT_ERROR_CODE_MATCH));
>-       pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x\n",
>+       pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x event data=%016llx\n",

s/event data/event_data/

>               vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
>               vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE),
>-              vmcs_read32(VM_ENTRY_INSTRUCTION_LEN));
>+              vmcs_read32(VM_ENTRY_INSTRUCTION_LEN),
>+              kvm_cpu_cap_has(X86_FEATURE_FRED) ? vmcs_read64(INJECTED_EVENT_DATA) : 0);

Again, it is better to check a VM-entry/VM-exit control bit.

>        pr_err("VMExit: intr_info=%08x errcode=%08x ilen=%08x\n",
>               vmcs_read32(VM_EXIT_INTR_INFO),
>               vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
>               vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
>        pr_err("        reason=%08x qualification=%016lx\n",
>               vmcs_read32(VM_EXIT_REASON), vmcs_readl(EXIT_QUALIFICATION));
>-       pr_err("IDTVectoring: info=%08x errcode=%08x\n",
>+       pr_err("IDTVectoring: info=%08x errcode=%08x event data=%016llx\n",

s/event data/event_data/

>               vmcs_read32(IDT_VECTORING_INFO_FIELD),
>-              vmcs_read32(IDT_VECTORING_ERROR_CODE));
>+              vmcs_read32(IDT_VECTORING_ERROR_CODE),
>+              kvm_cpu_cap_has(X86_FEATURE_FRED) ? vmcs_read64(ORIGINAL_EVENT_DATA) : 0);

ditto

* RE: [PATCH v2 12/25] KVM: VMX: Handle FRED event data
  2024-04-30  3:14   ` Chao Gao
@ 2024-05-10  9:36     ` Li, Xin3
  2024-05-11  3:03       ` Chao Gao
  0 siblings, 1 reply; 51+ messages in thread
From: Li, Xin3 @ 2024-05-10  9:36 UTC (permalink / raw)
  To: Gao, Chao
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

> >+               if (kvm_is_fred_enabled(vcpu)) {
> >+                       u64 event_data = 0;
> >+
> >+                       if (is_debug(intr_info))
> >+                               /*
> >+                                * Compared to DR6, FRED #DB event data saved on
> >+                                * the stack frame have bits 4 ~ 11 and 16 ~ 31
> >+                                * inverted, i.e.,
> >+                                *   fred_db_event_data = dr6 ^ 0xFFFF0FF0UL
> >+                                */
> >+                               event_data = vcpu->arch.dr6 ^ DR6_RESERVED;
> >+                       else if (is_page_fault(intr_info))
> >+                               event_data = vcpu->arch.cr2;
> >+                       else if (is_nm_fault(intr_info))
> >+                               event_data = to_vmx(vcpu)->fred_xfd_event_data;
> >+
> 
> IMO, deriving an event_data from CR2/DR6 is a little short-sighted because the
> event_data and CR2/DR6 __can__ be different, e.g., L1 VMM __can__ set CR2 to A
> and event_data field to B (!=A) when injecting #PF.

The VMM should guarantee that a FRED guest _sees_ consistent values in CR2/DR6
and event data. If not, it's just a VMM bug that we need to fix.

> 
> And this approach cannot be extended to handle a (future) exception whose
> event_data isn't tied to a dedicated register like CR2/DR6.

See below.

> Adding a new field fred_xfd_event_data in struct vcpu has problems too:
> fred_xfd_event_data gets lost during migration;

I'm not bothered, because this is not hard to fix, right?

> strictly speaking, event_data is tied
> to an exception rather than a CPU. e.g., the CPU may detect a nested exception when
> delivering one and both have their own event_data.

No, don't get me wrong. Event data has to be _regenerated_ after a nested
exception is handled and the original instruction flow is restarted.
Sometimes the original event could be gone.

We don't say event data is tied to an exception or a CPU; that is just
confusing, or misleading.

> I think we can make event_data a property of exceptions. i.e., add a payload2 to
> struct kvm_queued_exception. and add new APIs to kvm_queue_exception* family to
> accept a payload2 and in VMX code, just program payload2 to the VMCS event_data
> field if FRED is enabled. KVM ABI should be extended as well to pass
> payload2 to userspace like how the payload is handled in
> kvm_vcpu_ioctl_x86_get/put_vcpu_events.

Yes, it's very likely that we will need to add a payload2 in the future,
but NOT now, for two reasons:

1) The first-generation FRED is designed NOT to go too far beyond what
   IDT can do. In the latest FRED spec, FRED event data is conceptually
   an alias of CR2/DR6 (not considering XFD event data for now), and the
   existing payload is a nice match for now.

2) FRED is an extensible CPU architecture, which allows the structure
   of event data to become much bigger and more complicated. Let's not
   assume anything and add a payload2 too early.

* RE: [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled()
  2024-04-29  8:24   ` Chao Gao
@ 2024-05-11  1:24     ` Li, Xin3
  2024-05-11  1:53       ` Chao Gao
  0 siblings, 1 reply; 51+ messages in thread
From: Li, Xin3 @ 2024-05-11  1:24 UTC (permalink / raw)
  To: Gao, Chao
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

> >+/*
> >+ * It's enough to check just CR4.FRED (X86_CR4_FRED) to tell if
> >+ * a vCPU is running with FRED enabled, because:
> >+ * 1) CR4.FRED can be set to 1 only _after_ IA32_EFER.LMA = 1.
> >+ * 2) To leave IA-32e mode, CR4.FRED must be cleared first.
> >+ *
> >+ * More details at FRED Spec 6.0 Section 4.2 Enabling in CR4.
> >+ */
> 
> I think we can give more context here, e.g.,
> 
> Although FRED architecture applies to 64-bit mode only, there is no need to check if
> the CPU is in 64-bit mode (i.e., IA32_EFER.LMA and CS.L) to tell if FRED is enabled
> because CR4.FRED=1 implies the CPU is in 64-bit mode.

What is "more context" here?

> Specifically,



* Re: [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled()
  2024-05-11  1:24     ` Li, Xin3
@ 2024-05-11  1:53       ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-05-11  1:53 UTC (permalink / raw)
  To: Li, Xin3
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Sat, May 11, 2024 at 09:24:12AM +0800, Li, Xin3 wrote:
>> >+/*
>> >+ * It's enough to check just CR4.FRED (X86_CR4_FRED) to tell if
>> >+ * a vCPU is running with FRED enabled, because:
>> >+ * 1) CR4.FRED can be set to 1 only _after_ IA32_EFER.LMA = 1.
>> >+ * 2) To leave IA-32e mode, CR4.FRED must be cleared first.
>> >+ *
>> >+ * More details at FRED Spec 6.0 Section 4.2 Enabling in CR4.
>> >+ */
>> 
>> I think we can give more context here, e.g.,
>> 
>> Although FRED architecture applies to 64-bit mode only, there is no need to check if
>> the CPU is in 64-bit mode (i.e., IA32_EFER.LMA and CS.L) to tell if FRED is enabled
>> because CR4.FRED=1 implies the CPU is in 64-bit mode.
>
>What is "more context" here?

E.g.,
why are IA32_EFER.LMA and the CPU mode related to FRED here?

"It's enough to" implies that something else is not necessary. What is it?

I don't think the original comment makes this super clear.

>
>> Specifically,
>
>

* Re: [PATCH v2 12/25] KVM: VMX: Handle FRED event data
  2024-05-10  9:36     ` Li, Xin3
@ 2024-05-11  3:03       ` Chao Gao
  0 siblings, 0 replies; 51+ messages in thread
From: Chao Gao @ 2024-05-11  3:03 UTC (permalink / raw)
  To: Li, Xin3
  Cc: linux-kernel, kvm, linux-doc, linux-kselftest, seanjc, pbonzini,
	corbet, tglx, mingo, bp, dave.hansen, x86, hpa, shuah, vkuznets,
	peterz, Shankar, Ravi V, xin

On Fri, May 10, 2024 at 05:36:03PM +0800, Li, Xin3 wrote:
>> >+               if (kvm_is_fred_enabled(vcpu)) {
>> >+                       u64 event_data = 0;
>> >+
>> >+                       if (is_debug(intr_info))
>> >+                               /*
>> >+                                * Compared to DR6, FRED #DB event data saved on
>> >+                                * the stack frame have bits 4 ~ 11 and 16 ~ 31
>> >+                                * inverted, i.e.,
>> >+                                *   fred_db_event_data = dr6 ^ 0xFFFF0FF0UL
>> >+                                */
>> >+                               event_data = vcpu->arch.dr6 ^ DR6_RESERVED;
>> >+                       else if (is_page_fault(intr_info))
>> >+                               event_data = vcpu->arch.cr2;
>> >+                       else if (is_nm_fault(intr_info))
>> >+                               event_data = to_vmx(vcpu)->fred_xfd_event_data;
>> >+
>> 
>> IMO, deriving an event_data from CR2/DR6 is a little short-sighted because the
>> event_data and CR2/DR6 __can__ be different, e.g., L1 VMM __can__ set CR2 to A
>> and event_data field to B (!=A) when injecting #PF.
>
>VMM should guarantee a FRED guest _sees_ consistent values in CR2/DR6
>and event data. If not it's just a VMM bug that we need to fix.

I don't see why the VMM should.

I know the hardware guarantees this, and KVM will likely do so too. But I
don't think it is necessary for KVM to assume that an L1 VMM guarantees it:
as long as the L2 guest is enlightened to read event_data from the stack only,
the ABI between the L1 VMM and the L2 guest can be "CR2/DR6 may be out of sync
with event_data". I am not saying it is good for an L1 VMM to deviate from real
hardware behavior, but how the L1 VMM defines this ABI with L2 has nothing to
do with KVM as L0. KVM shouldn't make assumptions about that.

Thread overview: 51+ messages
2024-02-07 17:26 [PATCH v2 00/25] Enable FRED with KVM VMX Xin Li
2024-02-07 17:26 ` [PATCH v2 01/25] KVM: VMX: Cleanup VMX basic information defines and usages Xin Li
2024-02-07 17:26 ` [PATCH v2 02/25] KVM: VMX: Cleanup VMX misc " Xin Li
2024-02-07 17:26 ` [PATCH v2 03/25] KVM: VMX: Add support for the secondary VM exit controls Xin Li
2024-04-19 10:21   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 04/25] KVM: x86: Mark CR4.FRED as not reserved Xin Li
2024-04-19 10:22   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 05/25] KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config Xin Li
2024-04-19 10:24   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 06/25] KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID Xin Li
2024-04-19 11:02   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 07/25] KVM: VMX: Set intercept for FRED MSRs Xin Li
2024-04-19 13:35   ` Chao Gao
2024-04-19 17:06     ` Li, Xin3
2024-02-07 17:26 ` [PATCH v2 08/25] KVM: VMX: Initialize VMCS FRED fields Xin Li
2024-04-19 14:01   ` Chao Gao
2024-04-19 17:02     ` Li, Xin3
2024-02-07 17:26 ` [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest Xin Li
2024-04-19 14:23   ` Chao Gao
2024-04-19 16:37     ` Li, Xin3
2024-02-07 17:26 ` [PATCH v2 10/25] KVM: VMX: Add support for FRED context save/restore Xin Li
2024-04-29  6:31   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 11/25] KVM: x86: Add kvm_is_fred_enabled() Xin Li
2024-04-29  8:24   ` Chao Gao
2024-05-11  1:24     ` Li, Xin3
2024-05-11  1:53       ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 12/25] KVM: VMX: Handle FRED event data Xin Li
2024-04-30  3:14   ` Chao Gao
2024-05-10  9:36     ` Li, Xin3
2024-05-11  3:03       ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 13/25] KVM: VMX: Handle VMX nested exception for FRED Xin Li
2024-04-30  7:34   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 14/25] KVM: VMX: Disable FRED if FRED consistency checks fail Xin Li
2024-04-30  8:21   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 15/25] KVM: VMX: Dump FRED context in dump_vmcs() Xin Li
2024-04-30  9:09   ` Chao Gao
2024-02-07 17:26 ` [PATCH v2 16/25] KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup Xin Li
2024-02-07 17:26 ` [PATCH v2 17/25] KVM: nVMX: Add support for the secondary VM exit controls Xin Li
2024-02-07 17:26 ` [PATCH v2 18/25] KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros Xin Li
2024-02-07 17:26 ` [PATCH v2 19/25] KVM: nVMX: Add FRED VMCS fields Xin Li
2024-02-07 17:26 ` [PATCH v2 20/25] KVM: nVMX: Add support for VMX FRED controls Xin Li
2024-02-07 17:26 ` [PATCH v2 21/25] KVM: nVMX: Add VMCS FRED states checking Xin Li
2024-02-07 17:26 ` [PATCH v2 22/25] KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests Xin Li
2024-02-07 17:26 ` [PATCH v2 23/25] KVM: selftests: Run debug_regs test with FRED enabled Xin Li
2024-02-07 17:26 ` [PATCH v2 24/25] KVM: selftests: Add a new VM guest mode to run user level code Xin Li
2024-02-07 17:26 ` [PATCH v2 25/25] KVM: selftests: Add fred exception tests Xin Li
2024-03-29 20:18   ` Muhammad Usama Anjum
2024-04-24 16:08     ` Sean Christopherson
2024-03-29 20:18   ` Muhammad Usama Anjum
2024-03-29 20:18     ` Muhammad Usama Anjum
2024-03-27  8:08 ` [PATCH v2 00/25] Enable FRED with KVM VMX Kang, Shan
2024-04-15 17:58 ` Li, Xin3
