linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs
@ 2022-08-02 16:07 Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
                   ` (25 more replies)
  0 siblings, 26 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Changes since v4:
- Rebase to the latest kvm/queue (93472b797153).
- Add "KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM"
  [Sean].
- Use "__always_inline" [Sean, Nathan, Paolo]. 
- Collect R-b tags [Max, Sean, Kai].
- Use two-dimentional 'evmcs_unsupported_ctls' array in PATCH 09 [Paolo].
- Numerous indentation fixes [Sean].
- KVM_{REQ,OPT}_VMX_* -> KVM_{REQUIRED,OPTIONAL}_VMX_* [Sean].
- Minor changes to commit messages, comments,... [Sean, Max].

Original description:

Enlightened VMCS v1 definition was updates to include fields for the
following features:
    - PerfGlobalCtrl
    - EnclsExitingBitmap
    - TSC scaling
    - GuestLbrCtl
    - CET
    - SSP
While the information is missing in the publicly available TLFS, the
updated definition comes with a new feature bit in CPUID.0x4000000A.EBX
(BIT 0). Use a made up HV_X64_NESTED_EVMCS1_2022_UPDATE name for it.

Add support for the new revision to KVM. SSP, CET and GuestLbrCtl
features are not currently supported by KVM.

While on it, implement Sean's idea to use vmcs_config for setting up
L1 VMX control MSRs instead of re-reading host MSRs.

Jim Mattson (1):
  KVM: x86: VMX: Replace some Intel model numbers with mnemonics

Sean Christopherson (2):
  KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM
  KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not
    setup

Vitaly Kuznetsov (23):
  KVM: x86: hyper-v: Expose access to debug MSRs in the partition
    privilege flags
  x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
  x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields
  KVM: nVMX: Support several new fields in eVMCSv1
  KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf
  KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields
  KVM: selftests: Switch to updated eVMCSv1 definition
  KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with
    enlightened VMCS
  KVM: selftests: Enable TSC scaling in evmcs selftest
  KVM: VMX: Get rid of eVMCS specific VMX controls sanitization
  KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()
  KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in
    setup_vmcs_config()
  KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING
    in setup_vmcs_config()
  KVM: VMX: Extend VMX controls macro shenanigans
  KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of
    setup_vmcs_config()
  KVM: VMX: Add missing VMEXIT controls to vmcs_config
  KVM: VMX: Add missing CPU based VM execution controls to vmcs_config
  KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of
    setup_vmcs_config()
  KVM: nVMX: Always set required-1 bits of pinbased_ctls to
    PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR
  KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs
  KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config
  KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up
    nested MSR

 arch/x86/include/asm/hyperv-tlfs.h            |  30 +-
 arch/x86/include/asm/kvm_host.h               |   2 +
 arch/x86/kvm/hyperv.c                         |  20 +-
 arch/x86/kvm/vmx/capabilities.h               |  14 +-
 arch/x86/kvm/vmx/evmcs.c                      | 127 +++++++--
 arch/x86/kvm/vmx/evmcs.h                      |  18 +-
 arch/x86/kvm/vmx/nested.c                     |  70 +++--
 arch/x86/kvm/vmx/nested.h                     |   2 +-
 arch/x86/kvm/vmx/vmx.c                        | 256 ++++++++----------
 arch/x86/kvm/vmx/vmx.h                        | 162 +++++++++--
 include/asm-generic/hyperv-tlfs.h             |   2 +
 .../selftests/kvm/include/x86_64/evmcs.h      |  45 ++-
 .../selftests/kvm/include/x86_64/vmx.h        |   2 +
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  31 ++-
 14 files changed, 521 insertions(+), 260 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-18 15:14   ` Sean Christopherson
  2022-08-02 16:07 ` [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

For some features, Hyper-V spec defines two separate CPUID bits: one
listing whether the feature is supported or not and another one showing
whether guest partition was granted access to the feature ("partition
privilege mask"). 'Debug MSRs available' is one of such features. Add
the missing 'access' bit.

Note: hv_check_msr_access() deliberately keeps checking
HV_FEATURE_DEBUG_MSRS_AVAILABLE bit instead of the new HV_ACCESS_DEBUG_MSRS
to not break existing VMMs (QEMU) which only expose one bit. Normally, VMMs
should set either both these bits or none.

Fixes: f97f5a56f597 ("x86/kvm/hyper-v: Add support for synthetic debugger interface")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c             | 1 +
 include/asm-generic/hyperv-tlfs.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index ed804447589c..c284a605e453 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2496,6 +2496,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 			ent->eax |= HV_MSR_RESET_AVAILABLE;
 			ent->eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
 			ent->eax |= HV_ACCESS_FREQUENCY_MSRS;
+			ent->eax |= HV_ACCESS_DEBUG_MSRS;
 			ent->eax |= HV_ACCESS_REENLIGHTENMENT;
 
 			ent->ebx |= HV_POST_MESSAGES;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..1d99dd296a76 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -70,6 +70,8 @@
 #define HV_MSR_GUEST_IDLE_AVAILABLE		BIT(10)
 /* Partition local APIC and TSC frequency registers available */
 #define HV_ACCESS_FREQUENCY_MSRS		BIT(11)
+/* Debug MSRs available */
+#define HV_ACCESS_DEBUG_MSRS			BIT(12)
 /* AccessReenlightenmentControls privilege */
 #define HV_ACCESS_REENLIGHTENMENT		BIT(13)
 /* AccessTscInvariantControls privilege */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Section 1.9 of TLFS v6.0b says:

"All structures are padded in such a way that fields are aligned
naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
and so on)".

'struct enlightened_vmcs' has a glitch:

...
        struct {
                u32                nested_flush_hypercall:1; /*   836: 0  4 */
                u32                msr_bitmap:1;         /*   836: 1  4 */
                u32                reserved:30;          /*   836: 2  4 */
        } hv_enlightenments_control;                     /*   836     4 */
        u32                        hv_vp_id;             /*   840     4 */
        u64                        hv_vm_id;             /*   844     8 */
        u64                        partition_assist_page; /*   852     8 */
...

And the observed values in 'partition_assist_page' make no sense at
all. Fix the layout by padding the structure properly.

Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0a9407dc0859..6f0acc45e67a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -546,7 +546,7 @@ struct hv_enlightened_vmcs {
 	u64 guest_rip;
 
 	u32 hv_clean_fields;
-	u32 hv_padding_32;
+	u32 padding32_1;
 	u32 hv_synthetic_controls;
 	struct {
 		u32 nested_flush_hypercall:1;
@@ -554,7 +554,7 @@ struct hv_enlightened_vmcs {
 		u32 reserved:30;
 	}  __packed hv_enlightenments_control;
 	u32 hv_vp_id;
-
+	u32 padding32_2;
 	u64 hv_vm_id;
 	u64 partition_assist_page;
 	u64 padding64_4[4];
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-18 15:21   ` Sean Christopherson
  2022-08-02 16:07 ` [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Vitaly Kuznetsov
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Updated Hyper-V Enlightened VMCS specification lists several new
fields for the following features:

- PerfGlobalCtrl
- EnclsExitingBitmap
- Tsc Scaling
- GuestLbrCtl
- CET
- SSP

Update the definition. The updated definition is available only when
CPUID.0x4000000A.EBX BIT(0) is '1'. Add a define for it as well.

Note: The latest TLFS is available at
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6f0acc45e67a..ebc27017fa48 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -138,6 +138,17 @@
 #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
 #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
 
+/*
+ * Nested quirks. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits.
+ *
+ * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
+ * published TLFS version. When the bit is set, nested hypervisor can use
+ * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
+ * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
+ * specification).
+ */
+#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
+
 /*
  * This is specific to AMD and specifies that enlightened TLB flush is
  * supported. If guest opts in to this feature, ASID invalidations only
@@ -559,9 +570,20 @@ struct hv_enlightened_vmcs {
 	u64 partition_assist_page;
 	u64 padding64_4[4];
 	u64 guest_bndcfgs;
-	u64 padding64_5[7];
+	u64 guest_ia32_perf_global_ctrl;
+	u64 guest_ia32_s_cet;
+	u64 guest_ssp;
+	u64 guest_ia32_int_ssp_table_addr;
+	u64 guest_ia32_lbr_ctl;
+	u64 padding64_5[2];
 	u64 xss_exit_bitmap;
-	u64 padding64_6[7];
+	u64 encls_exiting_bitmap;
+	u64 host_ia32_perf_global_ctrl;
+	u64 tsc_multiplier;
+	u64 host_ia32_s_cet;
+	u64 host_ssp;
+	u64 host_ia32_int_ssp_table_addr;
+	u64 padding64_6;
 } __packed;
 
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE			0
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1 Vitaly Kuznetsov
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Enlightened VMCS v1 definition was updated with new fields, support
them in KVM by defining VMCS-to-EVMCS conversion.

Note: SSP, CET and Guest LBR features are not supported by KVM yet and
the corresponding fields are not defined in 'enum vmcs_field', leave
them commented out for now.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 6a61b1ae7942..8bea5dea0341 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -28,6 +28,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
 	EVMCS1_FIELD(HOST_IA32_EFER, host_ia32_efer,
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
 	EVMCS1_FIELD(HOST_CR0, host_cr0,
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
 	EVMCS1_FIELD(HOST_CR3, host_cr3,
@@ -78,6 +80,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
 	EVMCS1_FIELD(GUEST_IA32_EFER, guest_ia32_efer,
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_IA32_PERF_GLOBAL_CTRL, guest_ia32_perf_global_ctrl,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
 	EVMCS1_FIELD(GUEST_PDPTR0, guest_pdptr0,
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
 	EVMCS1_FIELD(GUEST_PDPTR1, guest_pdptr1,
@@ -126,6 +130,28 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
 	EVMCS1_FIELD(XSS_EXIT_BITMAP, xss_exit_bitmap,
 		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+	EVMCS1_FIELD(ENCLS_EXITING_BITMAP, encls_exiting_bitmap,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+	EVMCS1_FIELD(TSC_MULTIPLIER, tsc_multiplier,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+	/*
+	 * Not used by KVM:
+	 *
+	 * EVMCS1_FIELD(0x00006828, guest_ia32_s_cet,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	 * EVMCS1_FIELD(0x0000682A, guest_ssp,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC),
+	 * EVMCS1_FIELD(0x0000682C, guest_ia32_int_ssp_table_addr,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	 * EVMCS1_FIELD(0x00002816, guest_ia32_lbr_ctl,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	 * EVMCS1_FIELD(0x00006C18, host_ia32_s_cet,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	 * EVMCS1_FIELD(0x00006C1A, host_ssp,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	 * EVMCS1_FIELD(0x00006C1C, host_ia32_int_ssp_table_addr,
+	 *	     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	 */
 
 	/* 64 bit read only */
 	EVMCS1_FIELD(GUEST_PHYSICAL_ADDRESS, guest_physical_address,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf Vitaly Kuznetsov
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Enlightened VMCS v1 definition was updated with new fields, add
support for them for Hyper-V on KVM.

Note: SSP, CET and Guest LBR features are not supported by KVM yet
and 'struct vmcs12' has no corresponding fields.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ddd4367d4826..270a1d8e4a6e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1607,6 +1607,10 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
 		vmcs12->guest_rflags = evmcs->guest_rflags;
 		vmcs12->guest_interruptibility_info =
 			evmcs->guest_interruptibility_info;
+		/*
+		 * Not present in struct vmcs12:
+		 * vmcs12->guest_ssp = evmcs->guest_ssp;
+		 */
 	}
 
 	if (unlikely(!(hv_clean_fields &
@@ -1653,6 +1657,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
 		vmcs12->host_fs_selector = evmcs->host_fs_selector;
 		vmcs12->host_gs_selector = evmcs->host_gs_selector;
 		vmcs12->host_tr_selector = evmcs->host_tr_selector;
+		vmcs12->host_ia32_perf_global_ctrl = evmcs->host_ia32_perf_global_ctrl;
+		/*
+		 * Not present in struct vmcs12:
+		 * vmcs12->host_ia32_s_cet = evmcs->host_ia32_s_cet;
+		 * vmcs12->host_ssp = evmcs->host_ssp;
+		 * vmcs12->host_ia32_int_ssp_table_addr = evmcs->host_ia32_int_ssp_table_addr;
+		 */
 	}
 
 	if (unlikely(!(hv_clean_fields &
@@ -1720,6 +1731,8 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
 		vmcs12->tsc_offset = evmcs->tsc_offset;
 		vmcs12->virtual_apic_page_addr = evmcs->virtual_apic_page_addr;
 		vmcs12->xss_exit_bitmap = evmcs->xss_exit_bitmap;
+		vmcs12->encls_exiting_bitmap = evmcs->encls_exiting_bitmap;
+		vmcs12->tsc_multiplier = evmcs->tsc_multiplier;
 	}
 
 	if (unlikely(!(hv_clean_fields &
@@ -1767,6 +1780,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
 		vmcs12->guest_bndcfgs = evmcs->guest_bndcfgs;
 		vmcs12->guest_activity_state = evmcs->guest_activity_state;
 		vmcs12->guest_sysenter_cs = evmcs->guest_sysenter_cs;
+		vmcs12->guest_ia32_perf_global_ctrl = evmcs->guest_ia32_perf_global_ctrl;
+		/*
+		 * Not present in struct vmcs12:
+		 * vmcs12->guest_ia32_s_cet = evmcs->guest_ia32_s_cet;
+		 * vmcs12->guest_ia32_lbr_ctl = evmcs->guest_ia32_lbr_ctl;
+		 * vmcs12->guest_ia32_int_ssp_table_addr = evmcs->guest_ia32_int_ssp_table_addr;
+		 */
 	}
 
 	/*
@@ -1869,12 +1889,23 @@ static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
 	 * evmcs->vm_exit_msr_store_count = vmcs12->vm_exit_msr_store_count;
 	 * evmcs->vm_exit_msr_load_count = vmcs12->vm_exit_msr_load_count;
 	 * evmcs->vm_entry_msr_load_count = vmcs12->vm_entry_msr_load_count;
+	 * evmcs->guest_ia32_perf_global_ctrl = vmcs12->guest_ia32_perf_global_ctrl;
+	 * evmcs->host_ia32_perf_global_ctrl = vmcs12->host_ia32_perf_global_ctrl;
+	 * evmcs->encls_exiting_bitmap = vmcs12->encls_exiting_bitmap;
+	 * evmcs->tsc_multiplier = vmcs12->tsc_multiplier;
 	 *
 	 * Not present in struct vmcs12:
 	 * evmcs->exit_io_instruction_ecx = vmcs12->exit_io_instruction_ecx;
 	 * evmcs->exit_io_instruction_esi = vmcs12->exit_io_instruction_esi;
 	 * evmcs->exit_io_instruction_edi = vmcs12->exit_io_instruction_edi;
 	 * evmcs->exit_io_instruction_eip = vmcs12->exit_io_instruction_eip;
+	 * evmcs->host_ia32_s_cet = vmcs12->host_ia32_s_cet;
+	 * evmcs->host_ssp = vmcs12->host_ssp;
+	 * evmcs->host_ia32_int_ssp_table_addr = vmcs12->host_ia32_int_ssp_table_addr;
+	 * evmcs->guest_ia32_s_cet = vmcs12->guest_ia32_s_cet;
+	 * evmcs->guest_ia32_lbr_ctl = vmcs12->guest_ia32_lbr_ctl;
+	 * evmcs->guest_ia32_int_ssp_table_addr = vmcs12->guest_ia32_int_ssp_table_addr;
+	 * evmcs->guest_ssp = vmcs12->guest_ssp;
 	 */
 
 	evmcs->guest_es_selector = vmcs12->guest_es_selector;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1 Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields Vitaly Kuznetsov
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

KVM has to check guest visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID
leaf to know which Enlightened VMCS definition to use (original or 2022
update). Cache the leaf along with other Hyper-V CPUID feature leaves
to make the check quick.

While on it, wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead
of having to zero each particular member when the corresponding CPUID entry
was not found.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/hyperv.c           | 17 ++++++++---------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8281d64a431..ea0ee6167447 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -615,6 +615,8 @@ struct kvm_vcpu_hv {
 		u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
 		u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
 		u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
+		u32 nested_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
+		u32 nested_ebx; /* HYPERV_CPUID_NESTED_FEATURES.EBX */
 	} cpuid_cache;
 };
 
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index c284a605e453..1098915360ae 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2005,31 +2005,30 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
 
 	hv_vcpu = to_hv_vcpu(vcpu);
 
+	memset(&hv_vcpu->cpuid_cache, 0, sizeof(hv_vcpu->cpuid_cache));
+
 	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_FEATURES);
 	if (entry) {
 		hv_vcpu->cpuid_cache.features_eax = entry->eax;
 		hv_vcpu->cpuid_cache.features_ebx = entry->ebx;
 		hv_vcpu->cpuid_cache.features_edx = entry->edx;
-	} else {
-		hv_vcpu->cpuid_cache.features_eax = 0;
-		hv_vcpu->cpuid_cache.features_ebx = 0;
-		hv_vcpu->cpuid_cache.features_edx = 0;
 	}
 
 	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO);
 	if (entry) {
 		hv_vcpu->cpuid_cache.enlightenments_eax = entry->eax;
 		hv_vcpu->cpuid_cache.enlightenments_ebx = entry->ebx;
-	} else {
-		hv_vcpu->cpuid_cache.enlightenments_eax = 0;
-		hv_vcpu->cpuid_cache.enlightenments_ebx = 0;
 	}
 
 	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES);
 	if (entry)
 		hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
-	else
-		hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
+
+	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES);
+	if (entry) {
+		hv_vcpu->cpuid_cache.nested_eax = entry->eax;
+		hv_vcpu->cpuid_cache.nested_ebx = entry->ebx;
+	}
 }
 
 int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition Vitaly Kuznetsov
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

The updated Enlightened VMCS definition has 'encls_exiting_bitmap'
field which needs mapping to VMCS, add the missing encoding.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/vmx.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 99fa1410964c..7d8c980317f7 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -208,6 +208,8 @@ enum vmcs_field {
 	VMWRITE_BITMAP_HIGH		= 0x00002029,
 	XSS_EXIT_BITMAP			= 0x0000202C,
 	XSS_EXIT_BITMAP_HIGH		= 0x0000202D,
+	ENCLS_EXITING_BITMAP		= 0x0000202E,
+	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
 	TSC_MULTIPLIER			= 0x00002032,
 	TSC_MULTIPLIER_HIGH		= 0x00002033,
 	GUEST_PHYSICAL_ADDRESS		= 0x00002400,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Update Enlightened VMCS definition in selftests from KVM.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/include/x86_64/evmcs.h      | 45 +++++++++++++++++--
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 3c9260f8e116..58db74f68af2 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -203,14 +203,25 @@ struct hv_enlightened_vmcs {
 		u32 reserved:30;
 	} hv_enlightenments_control;
 	u32 hv_vp_id;
-
+	u32 padding32_2;
 	u64 hv_vm_id;
 	u64 partition_assist_page;
 	u64 padding64_4[4];
 	u64 guest_bndcfgs;
-	u64 padding64_5[7];
+	u64 guest_ia32_perf_global_ctrl;
+	u64 guest_ia32_s_cet;
+	u64 guest_ssp;
+	u64 guest_ia32_int_ssp_table_addr;
+	u64 guest_ia32_lbr_ctl;
+	u64 padding64_5[2];
 	u64 xss_exit_bitmap;
-	u64 padding64_6[7];
+	u64 encls_exiting_bitmap;
+	u64 host_ia32_perf_global_ctrl;
+	u64 tsc_multiplier;
+	u64 host_ia32_s_cet;
+	u64 host_ssp;
+	u64 host_ia32_int_ssp_table_addr;
+	u64 padding64_6;
 };
 
 #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE                     0
@@ -656,6 +667,18 @@ static inline int evmcs_vmread(uint64_t encoding, uint64_t *value)
 	case VIRTUAL_PROCESSOR_ID:
 		*value = current_evmcs->virtual_processor_id;
 		break;
+	case HOST_IA32_PERF_GLOBAL_CTRL:
+		*value = current_evmcs->host_ia32_perf_global_ctrl;
+		break;
+	case GUEST_IA32_PERF_GLOBAL_CTRL:
+		*value = current_evmcs->guest_ia32_perf_global_ctrl;
+		break;
+	case ENCLS_EXITING_BITMAP:
+		*value = current_evmcs->encls_exiting_bitmap;
+		break;
+	case TSC_MULTIPLIER:
+		*value = current_evmcs->tsc_multiplier;
+		break;
 	default: return 1;
 	}
 
@@ -1169,6 +1192,22 @@ static inline int evmcs_vmwrite(uint64_t encoding, uint64_t value)
 		current_evmcs->virtual_processor_id = value;
 		current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT;
 		break;
+	case HOST_IA32_PERF_GLOBAL_CTRL:
+		current_evmcs->host_ia32_perf_global_ctrl = value;
+		current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1;
+		break;
+	case GUEST_IA32_PERF_GLOBAL_CTRL:
+		current_evmcs->guest_ia32_perf_global_ctrl = value;
+		current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1;
+		break;
+	case ENCLS_EXITING_BITMAP:
+		current_evmcs->encls_exiting_bitmap = value;
+		current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2;
+		break;
+	case TSC_MULTIPLIER:
+		current_evmcs->tsc_multiplier = value;
+		current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2;
+		break;
 	default: return 1;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-18 17:15   ` Sean Christopherson
  2022-08-18 17:19   ` Sean Christopherson
  2022-08-02 16:07 ` [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest Vitaly Kuznetsov
                   ` (16 subsequent siblings)
  25 siblings, 2 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Enlightened VMCS v1 got updated and now includes the required fields
for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
features. For Hyper-V on KVM enablement, KVM can just observe VMX control
MSRs and use the features (with or without eVMCS) when
possible. Hyper-V on KVM case is trickier because of live migration:
the new features require explicit enablement from VMM to not break
it. Luckily, the updated eVMCS revision comes with a feature bit in
CPUID.0x4000000A.EBX.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c     |  2 +-
 arch/x86/kvm/vmx/evmcs.c  | 88 +++++++++++++++++++++++++++++++--------
 arch/x86/kvm/vmx/evmcs.h  | 17 ++------
 arch/x86/kvm/vmx/nested.c |  2 +-
 arch/x86/kvm/vmx/vmx.c    |  2 +-
 5 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1098915360ae..8a2b24f9bbf6 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 		case HYPERV_CPUID_NESTED_FEATURES:
 			ent->eax = evmcs_ver;
 			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
-
+			ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
 			break;
 
 		case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 8bea5dea0341..e8497f9854a1 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
+enum evmcs_revision {
+	EVMCSv1_2016,
+	EVMCSv1_2022,
+	EVMCS_REVISION_MAX,
+};
+
+enum evmcs_unsupported_ctrl_type {
+	EVMCS_EXIT_CTLS,
+	EVMCS_ENTRY_CTLS,
+	EVMCS_2NDEXEC,
+	EVMCS_PINCTRL,
+	EVMCS_VMFUNC,
+	EVMCS_CTRL_MAX,
+};
+
+static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {
+	[EVMCS_EXIT_CTLS] = {
+		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
+		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
+	},
+	[EVMCS_ENTRY_CTLS] = {
+		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
+		[EVMCSv1_2022] =  EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
+	},
+	[EVMCS_2NDEXEC] = {
+		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
+		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
+	},
+	[EVMCS_PINCTRL] = {
+		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
+		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
+	},
+	[EVMCS_VMFUNC] = {
+		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
+		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
+	},
+};
+
+static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
+				      enum evmcs_unsupported_ctrl_type ctrl_type)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
+
+	if (!hv_vcpu)
+		return 0;
+
+	if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
+		evmcs_rev = EVMCSv1_2022;
+
+	return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
+}
+
+void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 {
 	u32 ctl_low = (u32)*pdata;
 	u32 ctl_high = (u32)(*pdata >> 32);
@@ -380,72 +433,73 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
 	switch (msr_index) {
 	case MSR_IA32_VMX_EXIT_CTLS:
 	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-		ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+		ctl_high &= ~evmcs_get_unsupported_ctls(vcpu, EVMCS_EXIT_CTLS);
 		break;
 	case MSR_IA32_VMX_ENTRY_CTLS:
 	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-		ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+		ctl_high &= ~evmcs_get_unsupported_ctls(vcpu, EVMCS_ENTRY_CTLS);
 		break;
 	case MSR_IA32_VMX_PROCBASED_CTLS2:
-		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+		ctl_high &= ~evmcs_get_unsupported_ctls(vcpu, EVMCS_2NDEXEC);
 		break;
 	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
 	case MSR_IA32_VMX_PINBASED_CTLS:
-		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+		ctl_high &= ~evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
 		break;
 	case MSR_IA32_VMX_VMFUNC:
-		ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
+		ctl_low &= ~evmcs_get_unsupported_ctls(vcpu, EVMCS_VMFUNC);
 		break;
 	}
 
 	*pdata = ctl_low | ((u64)ctl_high << 32);
 }
 
-int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
+int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	int ret = 0;
 	u32 unsupp_ctl;
 
 	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
-		EVMCS1_UNSUPPORTED_PINCTRL;
+		evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
 	if (unsupp_ctl) {
 		trace_kvm_nested_vmenter_failed(
-			"eVMCS: unsupported pin-based VM-execution controls",
+			"eVMCS: unsupported pin-based VM-execution controls: ",
 			unsupp_ctl);
 		ret = -EINVAL;
 	}
 
 	unsupp_ctl = vmcs12->secondary_vm_exec_control &
-		EVMCS1_UNSUPPORTED_2NDEXEC;
+		evmcs_get_unsupported_ctls(vcpu, EVMCS_2NDEXEC);
 	if (unsupp_ctl) {
 		trace_kvm_nested_vmenter_failed(
-			"eVMCS: unsupported secondary VM-execution controls",
+			"eVMCS: unsupported secondary VM-execution controls: ",
 			unsupp_ctl);
 		ret = -EINVAL;
 	}
 
 	unsupp_ctl = vmcs12->vm_exit_controls &
-		EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+		evmcs_get_unsupported_ctls(vcpu, EVMCS_EXIT_CTLS);
 	if (unsupp_ctl) {
 		trace_kvm_nested_vmenter_failed(
-			"eVMCS: unsupported VM-exit controls",
+			"eVMCS: unsupported VM-exit controls: ",
 			unsupp_ctl);
 		ret = -EINVAL;
 	}
 
 	unsupp_ctl = vmcs12->vm_entry_controls &
-		EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+		evmcs_get_unsupported_ctls(vcpu, EVMCS_ENTRY_CTLS);
 	if (unsupp_ctl) {
 		trace_kvm_nested_vmenter_failed(
-			"eVMCS: unsupported VM-entry controls",
+			"eVMCS: unsupported VM-entry controls: ",
 			unsupp_ctl);
 		ret = -EINVAL;
 	}
 
-	unsupp_ctl = vmcs12->vm_function_control & EVMCS1_UNSUPPORTED_VMFUNC;
+	unsupp_ctl = vmcs12->vm_function_control &
+		evmcs_get_unsupported_ctls(vcpu, EVMCS_VMFUNC);
 	if (unsupp_ctl) {
 		trace_kvm_nested_vmenter_failed(
-			"eVMCS: unsupported VM-function controls",
+			"eVMCS: unsupported VM-function controls: ",
 			unsupp_ctl);
 		ret = -EINVAL;
 	}
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index f886a8ff0342..4b809c79ae63 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -37,16 +37,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
  *	EPTP_LIST_ADDRESS               = 0x00002024,
  *	VMREAD_BITMAP                   = 0x00002026,
  *	VMWRITE_BITMAP                  = 0x00002028,
- *
- *	TSC_MULTIPLIER                  = 0x00002032,
  *	PLE_GAP                         = 0x00004020,
  *	PLE_WINDOW                      = 0x00004022,
  *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
- *      GUEST_IA32_PERF_GLOBAL_CTRL     = 0x00002808,
- *      HOST_IA32_PERF_GLOBAL_CTRL      = 0x00002c04,
- *
- * Currently unsupported in KVM:
- *	GUEST_IA32_RTIT_CTL		= 0x00002814,
  */
 #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \
 				    PIN_BASED_VMX_PREEMPTION_TIMER)
@@ -58,12 +51,10 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
 	 SECONDARY_EXEC_ENABLE_PML |					\
 	 SECONDARY_EXEC_ENABLE_VMFUNC |					\
 	 SECONDARY_EXEC_SHADOW_VMCS |					\
-	 SECONDARY_EXEC_TSC_SCALING |					\
 	 SECONDARY_EXEC_PAUSE_LOOP_EXITING)
 #define EVMCS1_UNSUPPORTED_VMEXIT_CTRL					\
-	(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |				\
-	 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
-#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
+	(VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (0)
 #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
 
 struct evmcs_field {
@@ -243,7 +234,7 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
 uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
-void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
-int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
+void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 270a1d8e4a6e..edb2f9c74d71 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2895,7 +2895,7 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 
 	if (to_vmx(vcpu)->nested.enlightened_vmcs_enabled)
-		return nested_evmcs_check_controls(vmcs12);
+		return nested_evmcs_check_controls(vcpu, vmcs12);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7f8331d6f7e..bd6f8552102a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1933,7 +1933,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 */
 		if (!msr_info->host_initiated &&
 		    vmx->nested.enlightened_vmcs_enabled)
-			nested_evmcs_filter_control_msr(msr_info->index,
+			nested_evmcs_filter_control_msr(vcpu, msr_info->index,
 							&msr_info->data);
 		break;
 	case MSR_IA32_RTIT_CTL:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization Vitaly Kuznetsov
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

The updated Enlightened VMCS v1 definition enables TSC scaling, test
that SECONDARY_EXEC_TSC_SCALING can now be enabled.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../testing/selftests/kvm/x86_64/evmcs_test.c | 31 +++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 99bc202243d2..21a7a792a010 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -18,6 +18,9 @@
 
 #include "vmx.h"
 
+/* Test flags */
+#define HOST_HAS_TSC_SCALING BIT(0)
+
 static int ud_count;
 
 static void guest_ud_handler(struct ex_regs *regs)
@@ -64,11 +67,14 @@ void l2_guest_code(void)
 	vmcall();
 	rdmsr_gs_base(); /* intercepted */
 
+	/* TSC scaling */
+	vmcall();
+
 	/* Done, exit to L1 and never come back.  */
 	vmcall();
 }
 
-void guest_code(struct vmx_pages *vmx_pages)
+void guest_code(struct vmx_pages *vmx_pages, u64 test_flags)
 {
 #define L2_GUEST_STACK_SIZE 64
 	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
@@ -150,6 +156,18 @@ void guest_code(struct vmx_pages *vmx_pages)
 	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
 	GUEST_SYNC(11);
 
+	if (test_flags & HOST_HAS_TSC_SCALING) {
+		GUEST_ASSERT((rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2) >> 32) &
+			     SECONDARY_EXEC_TSC_SCALING);
+		/* Try enabling TSC scaling */
+		vmwrite(SECONDARY_VM_EXEC_CONTROL, vmreadz(SECONDARY_VM_EXEC_CONTROL) |
+			SECONDARY_EXEC_TSC_SCALING);
+		vmwrite(TSC_MULTIPLIER, 1);
+	}
+	GUEST_ASSERT(!vmresume());
+	GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	GUEST_SYNC(12);
+
 	/* Try enlightened vmptrld with an incorrect GPA */
 	evmcs_vmptrld(0xdeadbeef, vmx_pages->enlightened_vmcs);
 	GUEST_ASSERT(vmlaunch());
@@ -204,6 +222,7 @@ int main(int argc, char *argv[])
 	struct kvm_vm *vm;
 	struct kvm_run *run;
 	struct ucall uc;
+	u64 test_flags = 0;
 	int stage;
 
 	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
@@ -212,11 +231,19 @@ int main(int argc, char *argv[])
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS));
 
+	if ((kvm_get_feature_msr(MSR_IA32_VMX_PROCBASED_CTLS2) >> 32) &
+	    SECONDARY_EXEC_TSC_SCALING) {
+		test_flags |= HOST_HAS_TSC_SCALING;
+		pr_info("TSC scaling is supported, adding to test\n");
+	} else {
+		pr_info("TSC scaling is not supported\n");
+	}
+
 	vcpu_set_hv_cpuid(vcpu);
 	vcpu_enable_evmcs(vcpu);
 
 	vcpu_alloc_vmx(vm, &vmx_pages_gva);
-	vcpu_args_set(vcpu, 1, vmx_pages_gva);
+	vcpu_args_set(vcpu, 2, vmx_pages_gva, test_flags);
 
 	vm_init_descriptor_tables(vm);
 	vcpu_init_descriptor_tables(vcpu);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (9 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() Vitaly Kuznetsov
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

With the updated eVMCSv1 definition, there's no known 'problematic'
controls which are exposed in VMX control MSRs but are not present in
eVMCSv1: all known Hyper-V versions either don't expose the new fields
by not setting bits in the VMX feature controls or support the new
eVMCS revision.

Get rid of VMX control MSRs filtering for KVM on Hyper-V.

Note: VMX control MSRs filtering for Hyper-V on KVM
(nested_evmcs_filter_control_msr()) stays as even the updated eVMCSv1
definition doesn't have all the features implemented by KVM and some
fields are still missing. Moreover, nested_evmcs_filter_control_msr()
has to support the original eVMCSv1 version when VMM wishes so.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c | 13 -------------
 arch/x86/kvm/vmx/evmcs.h |  1 -
 arch/x86/kvm/vmx/vmx.c   |  5 -----
 3 files changed, 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index e8497f9854a1..4340ef636ac3 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -320,19 +320,6 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 };
 const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
 
-#if IS_ENABLED(CONFIG_HYPERV)
-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
-{
-	vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
-	vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
-	vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
-	vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
-
-	vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
-	vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
-}
-#endif
-
 bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa)
 {
 	struct hv_vp_assist_page assist_page;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 4b809c79ae63..0feac101cce4 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -203,7 +203,6 @@ static inline void evmcs_load(u64 phys_addr)
 	vp_ap->enlighten_vmentry = 1;
 }
 
-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
 #else /* !IS_ENABLED(CONFIG_HYPERV) */
 static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
 static inline void evmcs_write32(unsigned long field, u32 value) {}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bd6f8552102a..7a18a1828dc0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2768,11 +2768,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->vmexit_ctrl         = _vmexit_control;
 	vmcs_conf->vmentry_ctrl        = _vmentry_control;
 
-#if IS_ENABLED(CONFIG_HYPERV)
-	if (enlightened_vmcs)
-		evmcs_sanitize_exec_ctrls(vmcs_conf);
-#endif
-
 	return 0;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (10 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING " Vitaly Kuznetsov
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer()
and setup_vmcs_config() doesn't check its existence. On the contrary,
nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing
check and filter the bit out in vmx_vmentry_ctrl().

No (real) functional change intended as all existing CPUs supporting
long mode and VMX are supposed to have it.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7a18a1828dc0..5429101eea87 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2689,6 +2689,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
 	min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
+#ifdef CONFIG_X86_64
+	min |= VM_ENTRY_IA32E_MODE;
+#endif
 	opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
 	      VM_ENTRY_LOAD_IA32_PAT |
 	      VM_ENTRY_LOAD_IA32_EFER |
@@ -4323,9 +4326,14 @@ static u32 vmx_vmentry_ctrl(void)
 	if (vmx_pt_mode_is_system())
 		vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP |
 				  VM_ENTRY_LOAD_IA32_RTIT_CTL);
-	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
-	return vmentry_ctrl &
-		~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER);
+	/*
+	 * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically.
+	 */
+	vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+			  VM_ENTRY_LOAD_IA32_EFER |
+			  VM_ENTRY_IA32E_MODE);
+
+	return vmentry_ctrl;
 }
 
 static u32 vmx_vmexit_ctrl(void)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in setup_vmcs_config()
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (11 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING " Vitaly Kuznetsov
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

CPU_BASED_{INTR,NMI}_WINDOW_EXITING controls are toggled dynamically by
vmx_enable_{irq,nmi}_window, handle_interrupt_window(), handle_nmi_window()
but setup_vmcs_config() doesn't check their existence. Add the check and
filter the controls out in vmx_exec_control().

Note: KVM explicitly supports CPUs without VIRTUAL_NMIS and all these CPUs
are supposedly lacking NMI_WINDOW_EXITING too. Adjust cpu_has_virtual_nmis()
accordingly.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/capabilities.h | 3 ++-
 arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index c5e5dfef69c7..faee1db8b0e0 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -82,7 +82,8 @@ static inline bool cpu_has_vmx_basic_inout(void)
 
 static inline bool cpu_has_virtual_nmis(void)
 {
-	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
+	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
+	       vmcs_config.cpu_based_exec_ctrl & CPU_BASED_NMI_WINDOW_EXITING;
 }
 
 static inline bool cpu_has_vmx_preemption_timer(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5429101eea87..554326651372 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2566,10 +2566,12 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      CPU_BASED_MWAIT_EXITING |
 	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
-	      CPU_BASED_RDPMC_EXITING;
+	      CPU_BASED_RDPMC_EXITING |
+	      CPU_BASED_INTR_WINDOW_EXITING;
 
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
+	      CPU_BASED_NMI_WINDOW_EXITING |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
 	      CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
@@ -4380,6 +4382,10 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 {
 	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
 
+	/* INTR_WINDOW_EXITING and NMI_WINDOW_EXITING are toggled dynamically */
+	exec_control &= ~(CPU_BASED_INTR_WINDOW_EXITING |
+			  CPU_BASED_NMI_WINDOW_EXITING);
+
 	if (vmx->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
 		exec_control &= ~CPU_BASED_MOV_DR_EXITING;
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in setup_vmcs_config()
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (12 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING " Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Vitaly Kuznetsov
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

SECONDARY_EXEC_ENCLS_EXITING is the only control which is conditionally
added to the 'optional' checklist in setup_vmcs_config() but the special
case can be avoided by always checking for its presence first and filtering
out the result later.

Note: the situation when SECONDARY_EXEC_ENCLS_EXITING is present but
cpu_has_sgx() is false is possible when SGX is "soft-disabled", e.g. if
software writes MCE control MSRs or there's an uncorrectable #MC.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 554326651372..2323783024b7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2607,9 +2607,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			SECONDARY_EXEC_PT_CONCEAL_VMX |
 			SECONDARY_EXEC_ENABLE_VMFUNC |
 			SECONDARY_EXEC_BUS_LOCK_DETECTION |
-			SECONDARY_EXEC_NOTIFY_VM_EXITING;
-		if (cpu_has_sgx())
-			opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
+			SECONDARY_EXEC_NOTIFY_VM_EXITING |
+			SECONDARY_EXEC_ENCLS_EXITING;
+
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2656,6 +2656,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		vmx_cap->vpid = 0;
 	}
 
+	if (!cpu_has_sgx())
+		_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
+
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
 		u64 opt3 = TERTIARY_EXEC_IPI_VIRT;
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (13 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING " Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans Vitaly Kuznetsov
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

From: Sean Christopherson <seanjc@google.com>

Don't toggle VM_ENTRY_IA32E_MODE in 32-bit kernels/KVM and instead bug
the VM if KVM attempts to run the guest with EFER.LMA=1. KVM doesn't
support running 64-bit guests with 32-bit hosts.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2323783024b7..f5217ba9269c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3041,10 +3041,15 @@ int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 		return 0;
 
 	vcpu->arch.efer = efer;
+#ifdef CONFIG_X86_64
 	if (efer & EFER_LMA)
 		vm_entry_controls_setbit(vmx, VM_ENTRY_IA32E_MODE);
 	else
 		vm_entry_controls_clearbit(vmx, VM_ENTRY_IA32E_MODE);
+#else
+	if (KVM_BUG_ON(efer & EFER_LMA, vcpu->kvm))
+		return 1;
+#endif
 
 	vmx_setup_uret_msrs(vmx);
 	return 0;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (14 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() Vitaly Kuznetsov
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

When VMX controls macros are used to set or clear a control bit, make
sure that this bit was checked in setup_vmcs_config() and thus is properly
reflected in vmcs_config.

Opportunistically drop pointless "< 0" check for adjust_vmx_controls()'s
return value.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 114 +++++++-----------------------
 arch/x86/kvm/vmx/vmx.h | 155 +++++++++++++++++++++++++++++++++++------
 2 files changed, 157 insertions(+), 112 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f5217ba9269c..75cadb371fbb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -865,7 +865,7 @@ unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx)
 	return flags;
 }
 
-static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
+static __always_inline void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
 	vm_entry_controls_clearbit(vmx, entry);
@@ -923,7 +923,7 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 }
 
-static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
+static __always_inline void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit,
 		unsigned long guest_val_vmcs, unsigned long host_val_vmcs,
 		u64 guest_val, u64 host_val)
@@ -2527,7 +2527,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				    struct vmx_capability *vmx_cap)
 {
 	u32 vmx_msr_low, vmx_msr_high;
-	u32 min, opt, min2, opt2;
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
@@ -2553,29 +2552,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
-	min = CPU_BASED_HLT_EXITING |
-#ifdef CONFIG_X86_64
-	      CPU_BASED_CR8_LOAD_EXITING |
-	      CPU_BASED_CR8_STORE_EXITING |
-#endif
-	      CPU_BASED_CR3_LOAD_EXITING |
-	      CPU_BASED_CR3_STORE_EXITING |
-	      CPU_BASED_UNCOND_IO_EXITING |
-	      CPU_BASED_MOV_DR_EXITING |
-	      CPU_BASED_USE_TSC_OFFSETTING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
-	      CPU_BASED_INVLPG_EXITING |
-	      CPU_BASED_RDPMC_EXITING |
-	      CPU_BASED_INTR_WINDOW_EXITING;
-
-	opt = CPU_BASED_TPR_SHADOW |
-	      CPU_BASED_USE_MSR_BITMAPS |
-	      CPU_BASED_NMI_WINDOW_EXITING |
-	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
-	      CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
-	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
-				&_cpu_based_exec_control) < 0)
+
+	if (adjust_vmx_controls(KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL,
+				KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL,
+				MSR_IA32_VMX_PROCBASED_CTLS,
+				&_cpu_based_exec_control))
 		return -EIO;
 #ifdef CONFIG_X86_64
 	if (_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)
@@ -2583,36 +2564,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 					   ~CPU_BASED_CR8_STORE_EXITING;
 #endif
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
-		min2 = 0;
-		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
-			SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
-			SECONDARY_EXEC_WBINVD_EXITING |
-			SECONDARY_EXEC_ENABLE_VPID |
-			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST |
-			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
-			SECONDARY_EXEC_DESC |
-			SECONDARY_EXEC_ENABLE_RDTSCP |
-			SECONDARY_EXEC_ENABLE_INVPCID |
-			SECONDARY_EXEC_APIC_REGISTER_VIRT |
-			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
-			SECONDARY_EXEC_SHADOW_VMCS |
-			SECONDARY_EXEC_XSAVES |
-			SECONDARY_EXEC_RDSEED_EXITING |
-			SECONDARY_EXEC_RDRAND_EXITING |
-			SECONDARY_EXEC_ENABLE_PML |
-			SECONDARY_EXEC_TSC_SCALING |
-			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
-			SECONDARY_EXEC_PT_USE_GPA |
-			SECONDARY_EXEC_PT_CONCEAL_VMX |
-			SECONDARY_EXEC_ENABLE_VMFUNC |
-			SECONDARY_EXEC_BUS_LOCK_DETECTION |
-			SECONDARY_EXEC_NOTIFY_VM_EXITING |
-			SECONDARY_EXEC_ENCLS_EXITING;
-
-		if (adjust_vmx_controls(min2, opt2,
+		if (adjust_vmx_controls(KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL,
+					KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
-					&_cpu_based_2nd_exec_control) < 0)
+					&_cpu_based_2nd_exec_control))
 			return -EIO;
 	}
 #ifndef CONFIG_X86_64
@@ -2659,32 +2614,21 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	if (!cpu_has_sgx())
 		_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
 
-	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
-		u64 opt3 = TERTIARY_EXEC_IPI_VIRT;
-
-		_cpu_based_3rd_exec_control = adjust_vmx_controls64(opt3,
+	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
+		_cpu_based_3rd_exec_control =
+			adjust_vmx_controls64(KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL,
 					      MSR_IA32_VMX_PROCBASED_CTLS3);
-	}
 
-	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
-#ifdef CONFIG_X86_64
-	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
-#endif
-	opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
-	      VM_EXIT_LOAD_IA32_PAT |
-	      VM_EXIT_LOAD_IA32_EFER |
-	      VM_EXIT_CLEAR_BNDCFGS |
-	      VM_EXIT_PT_CONCEAL_PIP |
-	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
-	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
-				&_vmexit_control) < 0)
+	if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_EXIT_CONTROLS,
+				KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS,
+				MSR_IA32_VMX_EXIT_CTLS,
+				&_vmexit_control))
 		return -EIO;
 
-	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
-	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
-		 PIN_BASED_VMX_PREEMPTION_TIMER;
-	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
-				&_pin_based_exec_control) < 0)
+	if (adjust_vmx_controls(KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL,
+				KVM_OPTIONAL_VMX_PIN_BASED_VM_EXEC_CONTROL,
+				MSR_IA32_VMX_PINBASED_CTLS,
+				&_pin_based_exec_control))
 		return -EIO;
 
 	if (cpu_has_broken_vmx_preemption_timer())
@@ -2693,18 +2637,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
 		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
-	min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
-#ifdef CONFIG_X86_64
-	min |= VM_ENTRY_IA32E_MODE;
-#endif
-	opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
-	      VM_ENTRY_LOAD_IA32_PAT |
-	      VM_ENTRY_LOAD_IA32_EFER |
-	      VM_ENTRY_LOAD_BNDCFGS |
-	      VM_ENTRY_PT_CONCEAL_PIP |
-	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
-	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
-				&_vmentry_control) < 0)
+	if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS,
+				KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS,
+				MSR_IA32_VMX_ENTRY_CTLS,
+				&_vmentry_control))
 		return -EIO;
 
 	for (i = 0; i < ARRAY_SIZE(vmcs_entry_exit_pairs); i++) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index fb8e3480a9d7..8dd942aeaa4a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -485,29 +485,138 @@ static inline u8 vmx_get_rvi(void)
 	return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
 }
 
-#define BUILD_CONTROLS_SHADOW(lname, uname, bits)				\
-static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val)	\
-{										\
-	if (vmx->loaded_vmcs->controls_shadow.lname != val) {			\
-		vmcs_write##bits(uname, val);					\
-		vmx->loaded_vmcs->controls_shadow.lname = val;			\
-	}									\
-}										\
-static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs)	\
-{										\
-	return vmcs->controls_shadow.lname;					\
-}										\
-static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)		\
-{										\
-	return __##lname##_controls_get(vmx->loaded_vmcs);			\
-}										\
-static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)	\
-{										\
-	lname##_controls_set(vmx, lname##_controls_get(vmx) | val);		\
-}										\
-static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val)	\
-{										\
-	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);		\
+#define __KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS				\
+	(VM_ENTRY_LOAD_DEBUG_CONTROLS)
+#ifdef CONFIG_X86_64
+	#define KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS			\
+		(__KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS |			\
+		 VM_ENTRY_IA32E_MODE)
+#else
+	#define KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS			\
+		__KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS
+#endif
+#define KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS				\
+	(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |				\
+	 VM_ENTRY_LOAD_IA32_PAT |					\
+	 VM_ENTRY_LOAD_IA32_EFER |					\
+	 VM_ENTRY_LOAD_BNDCFGS |					\
+	 VM_ENTRY_PT_CONCEAL_PIP |					\
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+
+#define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
+	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
+	 VM_EXIT_ACK_INTR_ON_EXIT)
+#ifdef CONFIG_X86_64
+	#define KVM_REQUIRED_VMX_VM_EXIT_CONTROLS			\
+		(__KVM_REQUIRED_VMX_VM_EXIT_CONTROLS |			\
+		 VM_EXIT_HOST_ADDR_SPACE_SIZE)
+#else
+	#define KVM_REQUIRED_VMX_VM_EXIT_CONTROLS			\
+		__KVM_REQUIRED_VMX_VM_EXIT_CONTROLS
+#endif
+#define KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS				\
+	      (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |			\
+	       VM_EXIT_LOAD_IA32_PAT |					\
+	       VM_EXIT_LOAD_IA32_EFER |					\
+	       VM_EXIT_CLEAR_BNDCFGS |					\
+	       VM_EXIT_PT_CONCEAL_PIP |					\
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+
+#define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
+	(PIN_BASED_EXT_INTR_MASK |					\
+	 PIN_BASED_NMI_EXITING)
+#define KVM_OPTIONAL_VMX_PIN_BASED_VM_EXEC_CONTROL			\
+	(PIN_BASED_VIRTUAL_NMIS |					\
+	 PIN_BASED_POSTED_INTR |					\
+	 PIN_BASED_VMX_PREEMPTION_TIMER)
+
+#define __KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL			\
+	(CPU_BASED_HLT_EXITING |					\
+	 CPU_BASED_CR3_LOAD_EXITING |					\
+	 CPU_BASED_CR3_STORE_EXITING |					\
+	 CPU_BASED_UNCOND_IO_EXITING |					\
+	 CPU_BASED_MOV_DR_EXITING |					\
+	 CPU_BASED_USE_TSC_OFFSETTING |					\
+	 CPU_BASED_MWAIT_EXITING |					\
+	 CPU_BASED_MONITOR_EXITING |					\
+	 CPU_BASED_INVLPG_EXITING |					\
+	 CPU_BASED_RDPMC_EXITING |					\
+	 CPU_BASED_INTR_WINDOW_EXITING)
+
+#ifdef CONFIG_X86_64
+	#define KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL		\
+		(__KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL |		\
+		 CPU_BASED_CR8_LOAD_EXITING |				\
+		 CPU_BASED_CR8_STORE_EXITING)
+#else
+	#define KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL		\
+		__KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL
+#endif
+
+#define KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL			\
+	(CPU_BASED_TPR_SHADOW |						\
+	 CPU_BASED_USE_MSR_BITMAPS |					\
+	 CPU_BASED_NMI_WINDOW_EXITING |					\
+	 CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |			\
+	 CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
+
+#define KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL 0
+#define KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL			\
+	(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |			\
+	 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |			\
+	 SECONDARY_EXEC_WBINVD_EXITING |				\
+	 SECONDARY_EXEC_ENABLE_VPID |					\
+	 SECONDARY_EXEC_ENABLE_EPT |					\
+	 SECONDARY_EXEC_UNRESTRICTED_GUEST |				\
+	 SECONDARY_EXEC_PAUSE_LOOP_EXITING |				\
+	 SECONDARY_EXEC_DESC |						\
+	 SECONDARY_EXEC_ENABLE_RDTSCP |					\
+	 SECONDARY_EXEC_ENABLE_INVPCID |				\
+	 SECONDARY_EXEC_APIC_REGISTER_VIRT |				\
+	 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |				\
+	 SECONDARY_EXEC_SHADOW_VMCS |					\
+	 SECONDARY_EXEC_XSAVES |					\
+	 SECONDARY_EXEC_RDSEED_EXITING |				\
+	 SECONDARY_EXEC_RDRAND_EXITING |				\
+	 SECONDARY_EXEC_ENABLE_PML |					\
+	 SECONDARY_EXEC_TSC_SCALING |					\
+	 SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |				\
+	 SECONDARY_EXEC_PT_USE_GPA |					\
+	 SECONDARY_EXEC_PT_CONCEAL_VMX |				\
+	 SECONDARY_EXEC_ENABLE_VMFUNC |					\
+	 SECONDARY_EXEC_BUS_LOCK_DETECTION |				\
+	 SECONDARY_EXEC_NOTIFY_VM_EXITING |				\
+	 SECONDARY_EXEC_ENCLS_EXITING)
+
+#define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
+#define KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL			\
+	(TERTIARY_EXEC_IPI_VIRT)
+
+#define BUILD_CONTROLS_SHADOW(lname, uname, bits)						\
+static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val)			\
+{												\
+	if (vmx->loaded_vmcs->controls_shadow.lname != val) {					\
+		vmcs_write##bits(uname, val);							\
+		vmx->loaded_vmcs->controls_shadow.lname = val;					\
+	}											\
+}												\
+static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs)			\
+{												\
+	return vmcs->controls_shadow.lname;							\
+}												\
+static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)				\
+{												\
+	return __##lname##_controls_get(vmx->loaded_vmcs);					\
+}												\
+static __always_inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)		\
+{												\
+	BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname)));		\
+	lname##_controls_set(vmx, lname##_controls_get(vmx) | val);				\
+}												\
+static __always_inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val)	\
+{												\
+	BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname)));		\
+	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);				\
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config()
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (15 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

As a preparation to reusing the result of setup_vmcs_config() in
nested VMX MSR setup, move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering
to vmx_exec_control().

No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 75cadb371fbb..94d7060aebe1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2558,11 +2558,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				MSR_IA32_VMX_PROCBASED_CTLS,
 				&_cpu_based_exec_control))
 		return -EIO;
-#ifdef CONFIG_X86_64
-	if (_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)
-		_cpu_based_exec_control &= ~CPU_BASED_CR8_LOAD_EXITING &
-					   ~CPU_BASED_CR8_STORE_EXITING;
-#endif
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
 		if (adjust_vmx_controls(KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL,
 					KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL,
@@ -4333,13 +4328,17 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 	if (vmx->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
 		exec_control &= ~CPU_BASED_MOV_DR_EXITING;
 
-	if (!cpu_need_tpr_shadow(&vmx->vcpu)) {
+	if (!cpu_need_tpr_shadow(&vmx->vcpu))
 		exec_control &= ~CPU_BASED_TPR_SHADOW;
+
 #ifdef CONFIG_X86_64
+	if (exec_control & CPU_BASED_TPR_SHADOW)
+		exec_control &= ~(CPU_BASED_CR8_LOAD_EXITING |
+				  CPU_BASED_CR8_STORE_EXITING);
+	else
 		exec_control |= CPU_BASED_CR8_STORE_EXITING |
 				CPU_BASED_CR8_LOAD_EXITING;
 #endif
-	}
 	if (!enable_ept)
 		exec_control |= CPU_BASED_CR3_STORE_EXITING |
 				CPU_BASED_CR3_LOAD_EXITING  |
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (16 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution " Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

As a preparation to reusing the result of setup_vmcs_config() in
nested VMX MSR setup, add the VMEXIT controls which KVM doesn't
use but supports for nVMX to KVM_OPT_VMX_VM_EXIT_CONTROLS and
filter them out in vmx_vmexit_ctrl().

No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 7 +++++++
 arch/x86/kvm/vmx/vmx.h | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94d7060aebe1..a7097c7ed547 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4281,6 +4281,13 @@ static u32 vmx_vmexit_ctrl(void)
 {
 	u32 vmexit_ctrl = vmcs_config.vmexit_ctrl;
 
+	/*
+	 * Not used by KVM and never set in vmcs01 or vmcs02, but emulated for
+	 * nested virtualization and thus allowed to be set in vmcs12.
+	 */
+	vmexit_ctrl &= ~(VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER |
+			 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+
 	if (vmx_pt_mode_is_system())
 		vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
 				 VM_EXIT_CLEAR_IA32_RTIT_CTL);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8dd942aeaa4a..e3b908e7365f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -516,7 +516,10 @@ static inline u8 vmx_get_rvi(void)
 #endif
 #define KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS				\
 	      (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |			\
+	       VM_EXIT_SAVE_IA32_PAT |					\
 	       VM_EXIT_LOAD_IA32_PAT |					\
+	       VM_EXIT_SAVE_IA32_EFER |					\
+	       VM_EXIT_SAVE_VMX_PREEMPTION_TIMER |			\
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution controls to vmcs_config
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (17 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

As a preparation to reusing the result of setup_vmcs_config() in
nested VMX MSR setup, add the CPU based VM execution controls which KVM
doesn't use but supports for nVMX to KVM_OPT_VMX_CPU_BASED_VM_EXEC_CONTROL
and filter them out in vmx_exec_control().

No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 9 +++++++++
 arch/x86/kvm/vmx/vmx.h | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a7097c7ed547..589e5de7fdbb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4328,6 +4328,15 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 {
 	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
 
+	/*
+	 * Not used by KVM, but fully supported for nesting, i.e. are allowed in
+	 * vmcs12 and propagated to vmcs02 when set in vmcs12.
+	 */
+	exec_control &= ~(CPU_BASED_RDTSC_EXITING |
+			  CPU_BASED_USE_IO_BITMAPS |
+			  CPU_BASED_MONITOR_TRAP_FLAG |
+			  CPU_BASED_PAUSE_EXITING);
+
 	/* INTR_WINDOW_EXITING and NMI_WINDOW_EXITING are toggled dynamically */
 	exec_control &= ~(CPU_BASED_INTR_WINDOW_EXITING |
 			  CPU_BASED_NMI_WINDOW_EXITING);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e3b908e7365f..bf7d80ab543c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -557,9 +557,13 @@ static inline u8 vmx_get_rvi(void)
 #endif
 
 #define KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL			\
-	(CPU_BASED_TPR_SHADOW |						\
+	(CPU_BASED_RDTSC_EXITING |					\
+	 CPU_BASED_TPR_SHADOW |						\
+	 CPU_BASED_USE_IO_BITMAPS |					\
+	 CPU_BASED_MONITOR_TRAP_FLAG |					\
 	 CPU_BASED_USE_MSR_BITMAPS |					\
 	 CPU_BASED_NMI_WINDOW_EXITING |					\
+	 CPU_BASED_PAUSE_EXITING |					\
 	 CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |			\
 	 CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (18 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution " Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

From: Sean Christopherson <seanjc@google.com>

Clear the CR3 and INVLPG interception controls at runtime based on
whether or not EPT is being _used_, as opposed to clearing the bits at
setup if EPT is _supported_ in hardware, and then restoring them when EPT
is not used.  Not mucking with the base config will allow using the base
config as the starting point for emulating the VMX capability MSRs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 589e5de7fdbb..a444f68f50f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2580,13 +2580,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP,
 		&vmx_cap->ept, &vmx_cap->vpid);
 
-	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
-		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
-		   enabled */
-		_cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
-					     CPU_BASED_CR3_STORE_EXITING |
-					     CPU_BASED_INVLPG_EXITING);
-	} else if (vmx_cap->ept) {
+	if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) &&
+	    vmx_cap->ept) {
 		pr_warn_once("EPT CAP should not exist if not support "
 				"1-setting enable EPT VM-execution control\n");
 
@@ -4355,10 +4350,11 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 		exec_control |= CPU_BASED_CR8_STORE_EXITING |
 				CPU_BASED_CR8_LOAD_EXITING;
 #endif
-	if (!enable_ept)
-		exec_control |= CPU_BASED_CR3_STORE_EXITING |
-				CPU_BASED_CR3_LOAD_EXITING  |
-				CPU_BASED_INVLPG_EXITING;
+	/* No need to intercept CR3 access or INVPLG when using EPT. */
+	if (enable_ept)
+		exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
+				  CPU_BASED_CR3_STORE_EXITING |
+				  CPU_BASED_INVLPG_EXITING);
 	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
 		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
 				CPU_BASED_MONITOR_EXITING);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (19 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

From: Jim Mattson <jmattson@google.com>

Intel processor code names are more familiar to many readers than
their decimal model numbers.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a444f68f50f5..baf1054765a7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2658,11 +2658,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	 */
 	if (boot_cpu_data.x86 == 0x6) {
 		switch (boot_cpu_data.x86_model) {
-		case 26: /* AAK155 */
-		case 30: /* AAP115 */
-		case 37: /* AAT100 */
-		case 44: /* BC86,AAY89,BD102 */
-		case 46: /* BA97 */
+		case INTEL_FAM6_NEHALEM_EP:	/* AAK155 */
+		case INTEL_FAM6_NEHALEM:	/* AAP115 */
+		case INTEL_FAM6_WESTMERE:	/* AAT100 */
+		case INTEL_FAM6_WESTMERE_EP:	/* BC86,AAY89,BD102 */
+		case INTEL_FAM6_NEHALEM_EX:	/* BA97 */
 			_vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 			_vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
 			pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (20 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-18 17:49   ` Sean Christopherson
  2022-08-02 16:07 ` [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

As a preparation to reusing the result of setup_vmcs_config() for setting
up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling
to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from
hardware_setup(). While it seems reasonable to not expose
LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs,
such change would inevitably break live migration from older KVMs
where the controls are exposed. Keep the status quo for now, L1 hypervisor
itself is supposed to take care of the errata.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 59 +++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index baf1054765a7..ab5d16691c5e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2495,6 +2495,30 @@ static bool cpu_has_sgx(void)
 	return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
 }
 
+/*
+ * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
+ * can't be used due to errata where VM Exit may incorrectly clear
+ * IA32_PERF_GLOBAL_CTRL[34:32]. Work around the errata by using the
+ * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
+ */
+static bool cpu_has_perf_global_ctrl_bug(void)
+{
+	if (boot_cpu_data.x86 == 0x6) {
+		switch (boot_cpu_data.x86_model) {
+		case INTEL_FAM6_NEHALEM_EP:	/* AAK155 */
+		case INTEL_FAM6_NEHALEM:	/* AAP115 */
+		case INTEL_FAM6_WESTMERE:	/* AAT100 */
+		case INTEL_FAM6_WESTMERE_EP:	/* BC86,AAY89,BD102 */
+		case INTEL_FAM6_NEHALEM_EX:	/* BA97 */
+			return true;
+		default:
+			break;
+		}
+	}
+
+	return false;
+}
+
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
 				      u32 msr, u32 *result)
 {
@@ -2650,30 +2674,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		_vmexit_control &= ~x_ctrl;
 	}
 
-	/*
-	 * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
-	 * can't be used due to an errata where VM Exit may incorrectly clear
-	 * IA32_PERF_GLOBAL_CTRL[34:32].  Workaround the errata by using the
-	 * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
-	 */
-	if (boot_cpu_data.x86 == 0x6) {
-		switch (boot_cpu_data.x86_model) {
-		case INTEL_FAM6_NEHALEM_EP:	/* AAK155 */
-		case INTEL_FAM6_NEHALEM:	/* AAP115 */
-		case INTEL_FAM6_WESTMERE:	/* AAT100 */
-		case INTEL_FAM6_WESTMERE_EP:	/* BC86,AAY89,BD102 */
-		case INTEL_FAM6_NEHALEM_EX:	/* BA97 */
-			_vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
-			_vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
-			pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
-					"does not work properly. Using workaround\n");
-			break;
-		default:
-			break;
-		}
-	}
-
-
 	rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
 
 	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -4269,6 +4269,9 @@ static u32 vmx_vmentry_ctrl(void)
 			  VM_ENTRY_LOAD_IA32_EFER |
 			  VM_ENTRY_IA32E_MODE);
 
+	if (cpu_has_perf_global_ctrl_bug())
+		vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+
 	return vmentry_ctrl;
 }
 
@@ -4286,6 +4289,10 @@ static u32 vmx_vmexit_ctrl(void)
 	if (vmx_pt_mode_is_system())
 		vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
 				 VM_EXIT_CLEAR_IA32_RTIT_CTL);
+
+	if (cpu_has_perf_global_ctrl_bug())
+		vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+
 	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
 	return vmexit_ctrl &
 		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
@@ -8192,6 +8199,10 @@ static __init int hardware_setup(void)
 	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
 		return -EIO;
 
+	if (cpu_has_perf_global_ctrl_bug())
+		pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
+			     "does not work properly. Using workaround\n");
+
 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (21 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Similar to exit_ctls_low, entry_ctls_low, procbased_ctls_low,
pinbased_ctls_low should be set to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR
and not host's MSR_IA32_VMX_PINBASED_CTLS value |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR.

The commit eabeaaccfca0 ("KVM: nVMX: Clean up and fix pin-based
execution controls") which introduced '|=' doesn't mention anything
about why this is needed, the change seems rather accidental.

Note: normally, required-1 portion of MSR_IA32_VMX_PINBASED_CTLS should
be equal to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR so no behavioral change
is expected, however, it is (in theory) possible to observe something
different there when e.g. KVM is running as a nested hypervisor. Hope
this doesn't happen in practice.

Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index edb2f9c74d71..c86a0f8bb0f4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6574,7 +6574,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
 		msrs->pinbased_ctls_low,
 		msrs->pinbased_ctls_high);
-	msrs->pinbased_ctls_low |=
+	msrs->pinbased_ctls_low =
 		PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
 	msrs->pinbased_ctls_high &=
 		PIN_BASED_EXT_INTR_MASK |
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (22 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR Vitaly Kuznetsov
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Using raw host MSR values for setting up nested VMX control MSRs is
incorrect as some features need to disabled, e.g. when KVM runs as
a nested hypervisor on Hyper-V and uses Enlightened VMCS or when a
workaround for IA32_PERF_GLOBAL_CTRL is applied. For non-nested VMX, this
is done in setup_vmcs_config() and the result is stored in vmcs_config.
Use it for setting up allowed-1 bits in nested VMX MSRs too.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 30 ++++++++++++------------------
 arch/x86/kvm/vmx/nested.h |  2 +-
 arch/x86/kvm/vmx/vmx.c    |  5 ++---
 3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c86a0f8bb0f4..036a28c96947 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6553,8 +6553,10 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
  * bit in the high half is on if the corresponding bit in the control field
  * may be on. See also vmx_control_verify().
  */
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 {
+	struct nested_vmx_msrs *msrs = &vmcs_conf->nested;
+
 	/*
 	 * Note that as a general rule, the high half of the MSRs (bits in
 	 * the control fields which may be 1) should be initialized by the
@@ -6571,11 +6573,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	 */
 
 	/* pin-based controls */
-	rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
-		msrs->pinbased_ctls_low,
-		msrs->pinbased_ctls_high);
 	msrs->pinbased_ctls_low =
 		PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->pinbased_ctls_high = vmcs_conf->pin_based_exec_ctrl;
 	msrs->pinbased_ctls_high &=
 		PIN_BASED_EXT_INTR_MASK |
 		PIN_BASED_NMI_EXITING |
@@ -6586,12 +6587,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		PIN_BASED_VMX_PREEMPTION_TIMER;
 
 	/* exit controls */
-	rdmsr(MSR_IA32_VMX_EXIT_CTLS,
-		msrs->exit_ctls_low,
-		msrs->exit_ctls_high);
 	msrs->exit_ctls_low =
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
 
+	msrs->exit_ctls_high = vmcs_conf->vmexit_ctrl;
 	msrs->exit_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6607,11 +6606,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
 
 	/* entry controls */
-	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
-		msrs->entry_ctls_low,
-		msrs->entry_ctls_high);
 	msrs->entry_ctls_low =
 		VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->entry_ctls_high = vmcs_conf->vmentry_ctrl;
 	msrs->entry_ctls_high &=
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
@@ -6625,11 +6623,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
 
 	/* cpu-based controls */
-	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
-		msrs->procbased_ctls_low,
-		msrs->procbased_ctls_high);
 	msrs->procbased_ctls_low =
 		CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	msrs->procbased_ctls_high = vmcs_conf->cpu_based_exec_ctrl;
 	msrs->procbased_ctls_high &=
 		CPU_BASED_INTR_WINDOW_EXITING |
 		CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
@@ -6663,12 +6660,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 	 * depend on CPUID bits, they are added later by
 	 * vmx_vcpu_after_set_cpuid.
 	 */
-	if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)
-		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
-		      msrs->secondary_ctls_low,
-		      msrs->secondary_ctls_high);
-
 	msrs->secondary_ctls_low = 0;
+
+	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
 	msrs->secondary_ctls_high &=
 		SECONDARY_EXEC_DESC |
 		SECONDARY_EXEC_ENABLE_RDTSCP |
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 88b00a7359e4..6312c9541c3c 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,7 +17,7 @@ enum nvmx_vmentry_status {
 };
 
 void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
 void nested_vmx_hardware_unsetup(void);
 __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
 void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab5d16691c5e..6461b03bad68 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7402,7 +7402,7 @@ static int __init vmx_check_processor_compat(void)
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
 		return -EIO;
 	if (nested)
-		nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
+		nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
 	if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
 		printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
 				smp_processor_id());
@@ -8357,8 +8357,7 @@ static __init int hardware_setup(void)
 	setup_default_sgx_lepubkeyhash();
 
 	if (nested) {
-		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
-					   vmx_capability.ept);
+		nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);
 
 		r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
 		if (r)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (23 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  2022-08-02 16:07 ` [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR Vitaly Kuznetsov
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

Like other host VMX control MSRs, MSR_IA32_VMX_MISC can be cached in
vmcs_config to avoid the need to re-read it later, e.g. from
cpu_has_vmx_intel_pt() or cpu_has_vmx_shadow_vmcs().

No (real) functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/capabilities.h | 11 +++--------
 arch/x86/kvm/vmx/vmx.c          |  8 +++++---
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index faee1db8b0e0..87c4e46daf37 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -65,6 +65,7 @@ struct vmcs_config {
 	u64 cpu_based_3rd_exec_ctrl;
 	u32 vmexit_ctrl;
 	u32 vmentry_ctrl;
+	u64 misc;
 	struct nested_vmx_msrs nested;
 };
 extern struct vmcs_config vmcs_config;
@@ -225,11 +226,8 @@ static inline bool cpu_has_vmx_vmfunc(void)
 
 static inline bool cpu_has_vmx_shadow_vmcs(void)
 {
-	u64 vmx_msr;
-
 	/* check if the cpu supports writing r/o exit information fields */
-	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
-	if (!(vmx_msr & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
+	if (!(vmcs_config.misc & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
 		return false;
 
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -371,10 +369,7 @@ static inline bool cpu_has_vmx_invvpid_global(void)
 
 static inline bool cpu_has_vmx_intel_pt(void)
 {
-	u64 vmx_msr;
-
-	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
-	return (vmx_msr & MSR_IA32_VMX_MISC_INTEL_PT) &&
+	return (vmcs_config.misc & MSR_IA32_VMX_MISC_INTEL_PT) &&
 		(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_PT_USE_GPA) &&
 		(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_RTIT_CTL);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6461b03bad68..73bc6e18c2bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2557,6 +2557,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
+	u64 misc_msr;
 	int i;
 
 	/*
@@ -2690,6 +2691,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	if (((vmx_msr_high >> 18) & 15) != 6)
 		return -EIO;
 
+	rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
+
 	vmcs_conf->size = vmx_msr_high & 0x1fff;
 	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
 
@@ -2701,6 +2704,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
 	vmcs_conf->vmexit_ctrl         = _vmexit_control;
 	vmcs_conf->vmentry_ctrl        = _vmentry_control;
+	vmcs_conf->misc	= misc_msr;
 
 	return 0;
 }
@@ -8317,11 +8321,9 @@ static __init int hardware_setup(void)
 
 	if (enable_preemption_timer) {
 		u64 use_timer_freq = 5000ULL * 1000 * 1000;
-		u64 vmx_msr;
 
-		rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
 		cpu_preemption_timer_multi =
-			vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+			vmcs_config.misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
 
 		if (tsc_khz)
 			use_timer_freq = (u64)tsc_khz * 1000;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR
  2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
                   ` (24 preceding siblings ...)
  2022-08-02 16:07 ` [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Vitaly Kuznetsov
@ 2022-08-02 16:07 ` Vitaly Kuznetsov
  25 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-02 16:07 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson
  Cc: Anirudh Rayabharam, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	Nathan Chancellor, Michael Kelley, linux-hyperv, linux-kernel

vmcs_config has cached host MSR_IA32_VMX_MISC value, use it for setting
up nested MSR_IA32_VMX_MISC in nested_vmx_setup_ctls_msrs() and avoid the
redundant rdmsr().

No (real) functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 036a28c96947..7f8fa47e9396 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6742,10 +6742,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 		msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
 
 	/* miscellaneous data */
-	rdmsr(MSR_IA32_VMX_MISC,
-		msrs->misc_low,
-		msrs->misc_high);
-	msrs->misc_low &= VMX_MISC_SAVE_EFER_LMA;
+	msrs->misc_low = (u32)vmcs_conf->misc & VMX_MISC_SAVE_EFER_LMA;
 	msrs->misc_low |=
 		MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
 		VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags
  2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
@ 2022-08-18 15:14   ` Sean Christopherson
  2022-08-18 15:20     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 15:14 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> For some features, Hyper-V spec defines two separate CPUID bits: one
> listing whether the feature is supported or not and another one showing
> whether guest partition was granted access to the feature ("partition
> privilege mask"). 'Debug MSRs available' is one of such features. Add
> the missing 'access' bit.
> 
> Note: hv_check_msr_access() deliberately keeps checking
> HV_FEATURE_DEBUG_MSRS_AVAILABLE bit instead of the new HV_ACCESS_DEBUG_MSRS
> to not break existing VMMs (QEMU) which only expose one bit. Normally, VMMs
> should set either both these bits or none.

This is not the right approach long term.  If KVM absolutely cannot unconditionally
switch to checking HV_ACCESS_DEBUG_MSRS because it would break QEMU users, then we
should add a quirk, but sweeping the whole thing under the rug is wrong.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags
  2022-08-18 15:14   ` Sean Christopherson
@ 2022-08-18 15:20     ` Vitaly Kuznetsov
  2022-08-18 15:49       ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-18 15:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> For some features, Hyper-V spec defines two separate CPUID bits: one
>> listing whether the feature is supported or not and another one showing
>> whether guest partition was granted access to the feature ("partition
>> privilege mask"). 'Debug MSRs available' is one of such features. Add
>> the missing 'access' bit.
>> 
>> Note: hv_check_msr_access() deliberately keeps checking
>> HV_FEATURE_DEBUG_MSRS_AVAILABLE bit instead of the new HV_ACCESS_DEBUG_MSRS
>> to not break existing VMMs (QEMU) which only expose one bit. Normally, VMMs
>> should set either both these bits or none.
>
> This is not the right approach long term.  If KVM absolutely cannot unconditionally
> switch to checking HV_ACCESS_DEBUG_MSRS because it would break QEMU users, then we
> should add a quirk, but sweeping the whole thing under the rug is wrong.
>

First, this patch is kind of unrelated to the series so in case it's the
only thing which blocks it from being merged -- let's just pull it out
and discuss separately.

My personal opinion is that in this particular case we actually can
switch to checking HV_ACCESS_DEBUG_MSRS and possibly backport this patch
to stable@ and be done with it as SynDBG is a debug feature which is not
supposed to be used much in the wild. This, however, will not give us
much besides 'purity' in KVM as no sane VMM is supposed to set just one
of the HV_FEATURE_DEBUG_MSRS_AVAILABLE/HV_ACCESS_DEBUG_MSRS bits. TL;DR:
I'm not against the change.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
@ 2022-08-18 15:21   ` Sean Christopherson
  2022-08-18 15:29     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 15:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> Updated Hyper-V Enlightened VMCS specification lists several new
> fields for the following features:
> 
> - PerfGlobalCtrl
> - EnclsExitingBitmap
> - Tsc Scaling
> - GuestLbrCtl
> - CET
> - SSP
> 
> Update the definition. The updated definition is available only when
> CPUID.0x4000000A.EBX BIT(0) is '1'. Add a define for it as well.
> 
> Note: The latest TLFS is available at
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6f0acc45e67a..ebc27017fa48 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -138,6 +138,17 @@
>  #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
>  #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
>  
> +/*
> + * Nested quirks. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits.

The "quirks" part is very confusing, largely because KVM has a well-established
quirks mechanism.  I also don't see "quirks" anywhere in the TLFS documentation.
Can the "Nested quirks" part simply be dropped?

> + *
> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
> + * published TLFS version. When the bit is set, nested hypervisor can use
> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
> + * specification).
> + */
> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)

This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
we expecting an update to the TLFS?

	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
	in the enlightened VMCS.

[*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a

> +
>  /*
>   * This is specific to AMD and specifies that enlightened TLB flush is
>   * supported. If guest opts in to this feature, ASID invalidations only
> @@ -559,9 +570,20 @@ struct hv_enlightened_vmcs {
>  	u64 partition_assist_page;
>  	u64 padding64_4[4];
>  	u64 guest_bndcfgs;
> -	u64 padding64_5[7];
> +	u64 guest_ia32_perf_global_ctrl;
> +	u64 guest_ia32_s_cet;
> +	u64 guest_ssp;
> +	u64 guest_ia32_int_ssp_table_addr;
> +	u64 guest_ia32_lbr_ctl;
> +	u64 padding64_5[2];
>  	u64 xss_exit_bitmap;
> -	u64 padding64_6[7];
> +	u64 encls_exiting_bitmap;
> +	u64 host_ia32_perf_global_ctrl;
> +	u64 tsc_multiplier;
> +	u64 host_ia32_s_cet;
> +	u64 host_ssp;
> +	u64 host_ia32_int_ssp_table_addr;
> +	u64 padding64_6;
>  } __packed;
>  
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE			0
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-18 15:21   ` Sean Christopherson
@ 2022-08-18 15:29     ` Vitaly Kuznetsov
  2022-08-18 17:57       ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-18 15:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> Updated Hyper-V Enlightened VMCS specification lists several new
>> fields for the following features:
>> 
>> - PerfGlobalCtrl
>> - EnclsExitingBitmap
>> - Tsc Scaling
>> - GuestLbrCtl
>> - CET
>> - SSP
>> 
>> Update the definition. The updated definition is available only when
>> CPUID.0x4000000A.EBX BIT(0) is '1'. Add a define for it as well.
>> 
>> Note: The latest TLFS is available at
>> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
>> 
>> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/include/asm/hyperv-tlfs.h | 26 ++++++++++++++++++++++++--
>>  1 file changed, 24 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
>> index 6f0acc45e67a..ebc27017fa48 100644
>> --- a/arch/x86/include/asm/hyperv-tlfs.h
>> +++ b/arch/x86/include/asm/hyperv-tlfs.h
>> @@ -138,6 +138,17 @@
>>  #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
>>  #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
>>  
>> +/*
>> + * Nested quirks. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits.
>
> The "quirks" part is very confusing, largely because KVM has a well-established
> quirks mechanism.  I also don't see "quirks" anywhere in the TLFS documentation.
> Can the "Nested quirks" part simply be dropped?
>

Yes, it's completely made up (just like I made up
HV_X64_NESTED_EVMCS1_2022_UPDATE), let's drop it.

>> + *
>> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
>> + * published TLFS version. When the bit is set, nested hypervisor can use
>> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
>> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
>> + * specification).
>> + */
>> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
>
> This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
> we expecting an update to the TLFS?
>
> 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
> 	in the enlightened VMCS.
>
> [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
>

Oh well, better this than nothing. I'll ping the people who told me
about this bit that their description is incomplete.

>> +
>>  /*
>>   * This is specific to AMD and specifies that enlightened TLB flush is
>>   * supported. If guest opts in to this feature, ASID invalidations only
>> @@ -559,9 +570,20 @@ struct hv_enlightened_vmcs {
>>  	u64 partition_assist_page;
>>  	u64 padding64_4[4];
>>  	u64 guest_bndcfgs;
>> -	u64 padding64_5[7];
>> +	u64 guest_ia32_perf_global_ctrl;
>> +	u64 guest_ia32_s_cet;
>> +	u64 guest_ssp;
>> +	u64 guest_ia32_int_ssp_table_addr;
>> +	u64 guest_ia32_lbr_ctl;
>> +	u64 padding64_5[2];
>>  	u64 xss_exit_bitmap;
>> -	u64 padding64_6[7];
>> +	u64 encls_exiting_bitmap;
>> +	u64 host_ia32_perf_global_ctrl;
>> +	u64 tsc_multiplier;
>> +	u64 host_ia32_s_cet;
>> +	u64 host_ssp;
>> +	u64 host_ia32_int_ssp_table_addr;
>> +	u64 padding64_6;
>>  } __packed;
>>  
>>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE			0
>> -- 
>> 2.35.3
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags
  2022-08-18 15:20     ` Vitaly Kuznetsov
@ 2022-08-18 15:49       ` Sean Christopherson
  2022-08-18 15:59         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 15:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> For some features, Hyper-V spec defines two separate CPUID bits: one
> >> listing whether the feature is supported or not and another one showing
> >> whether guest partition was granted access to the feature ("partition
> >> privilege mask"). 'Debug MSRs available' is one of such features. Add
> >> the missing 'access' bit.
> >> 
> >> Note: hv_check_msr_access() deliberately keeps checking
> >> HV_FEATURE_DEBUG_MSRS_AVAILABLE bit instead of the new HV_ACCESS_DEBUG_MSRS
> >> to not break existing VMMs (QEMU) which only expose one bit. Normally, VMMs
> >> should set either both these bits or none.
> >
> > This is not the right approach long term.  If KVM absolutely cannot unconditionally
> > switch to checking HV_ACCESS_DEBUG_MSRS because it would break QEMU users, then we
> > should add a quirk, but sweeping the whole thing under the rug is wrong.
> >
> 
> First, this patch is kind of unrelated to the series so in case it's the
> only thing which blocks it from being merged -- let's just pull it out
> and discuss separately.

Regarding the series, are there any true dependencies between the eVMCS patches
(1 - 11) and the VMCS sanitization rework (12 - 26)?  I.e. can the VMCS rework
be queued ahead of the eVMCS v1 support?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags
  2022-08-18 15:49       ` Sean Christopherson
@ 2022-08-18 15:59         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-18 15:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> For some features, Hyper-V spec defines two separate CPUID bits: one
>> >> listing whether the feature is supported or not and another one showing
>> >> whether guest partition was granted access to the feature ("partition
>> >> privilege mask"). 'Debug MSRs available' is one of such features. Add
>> >> the missing 'access' bit.
>> >> 
>> >> Note: hv_check_msr_access() deliberately keeps checking
>> >> HV_FEATURE_DEBUG_MSRS_AVAILABLE bit instead of the new HV_ACCESS_DEBUG_MSRS
>> >> to not break existing VMMs (QEMU) which only expose one bit. Normally, VMMs
>> >> should set either both these bits or none.
>> >
>> > This is not the right approach long term.  If KVM absolutely cannot unconditionally
>> > switch to checking HV_ACCESS_DEBUG_MSRS because it would break QEMU users, then we
>> > should add a quirk, but sweeping the whole thing under the rug is wrong.
>> >
>> 
>> First, this patch is kind of unrelated to the series so in case it's the
>> only thing which blocks it from being merged -- let's just pull it out
>> and discuss separately.
>
> Regarding the series, are there any true dependencies between the eVMCS patches
> (1 - 11) and the VMCS sanitization rework (12 - 26)?  I.e. can the VMCS rework
> be queued ahead of the eVMCS v1 support?

My memory is a bit blurry already but I think PATCH11 ("KVM: VMX: Get
rid of eVMCS specific VMX controls sanitization") needs to go before
PATCH24 ("KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs")
to have "bug compatibility" and resolve Jim's concern: guest visible
VMX feature MSR values are not supposed to change. Currently, we filter
out unsupported features from eVMCS for KVM itself but not for L1 as we
expose raw host MSR values there. This is likely broken if L1 decides to
*use* these features for real but that's another story.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
@ 2022-08-18 17:15   ` Sean Christopherson
  2022-08-19  8:06     ` Vitaly Kuznetsov
  2022-08-18 17:19   ` Sean Christopherson
  1 sibling, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 17:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> Enlightened VMCS v1 got updated and now includes the required fields
> for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
> features. For Hyper-V on KVM enablement, KVM can just observe VMX control
> MSRs and use the features (with or without eVMCS) when
> possible. Hyper-V on KVM case is trickier because of live migration:
> the new features require explicit enablement from VMM to not break
> it. Luckily, the updated eVMCS revision comes with a feature bit in
> CPUID.0x4000000A.EBX.

The refactor to implement the 2-d array should go in a prep patch.  Unless you
(or anyone else) objects to the feedback below, I'll do the split myself, post v6,
and queue this up (without patch 1, and assuming there're no major issues in the
back half of the series).

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c     |  2 +-
>  arch/x86/kvm/vmx/evmcs.c  | 88 +++++++++++++++++++++++++++++++--------
>  arch/x86/kvm/vmx/evmcs.h  | 17 ++------
>  arch/x86/kvm/vmx/nested.c |  2 +-
>  arch/x86/kvm/vmx/vmx.c    |  2 +-
>  5 files changed, 78 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1098915360ae..8a2b24f9bbf6 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  		case HYPERV_CPUID_NESTED_FEATURES:
>  			ent->eax = evmcs_ver;
>  			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
> -
> +			ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
>  			break;
>  
>  		case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 8bea5dea0341..e8497f9854a1 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> -void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> +enum evmcs_revision {
> +	EVMCSv1_2016,
> +	EVMCSv1_2022,
> +	EVMCS_REVISION_MAX,

"MAX" is technically wrong, the "max" entry is usually the last valid entry.  This
should be NR_EVMCS_REVISIONS, and then NR_EVMCS_CTRLS below.

> +};
> +
> +enum evmcs_unsupported_ctrl_type {

I think drop the "unsupported" here, because the naming gets weird when trying to
use it for something other than indexing evmcs_unsupported_ctls (see bottom).

> +	EVMCS_EXIT_CTLS,
> +	EVMCS_ENTRY_CTLS,

s/CTLS/CTRLS for consistency.

> +	EVMCS_2NDEXEC,
> +	EVMCS_PINCTRL,
> +	EVMCS_VMFUNC,
> +	EVMCS_CTRL_MAX,
> +};
> +
> +static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {

This can be "const".  And s/ctls/ctrls for consistency.

> +	[EVMCS_EXIT_CTLS] = {
> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
> +	},
> +	[EVMCS_ENTRY_CTLS] = {
> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
> +		[EVMCSv1_2022] =  EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
> +	},
> +	[EVMCS_2NDEXEC] = {
> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
> +	},
> +	[EVMCS_PINCTRL] = {
> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
> +	},
> +	[EVMCS_VMFUNC] = {
> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
> +	},
> +};
> +
> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
> +				      enum evmcs_unsupported_ctrl_type ctrl_type)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
> +
> +	if (!hv_vcpu)

This is a functiontal change, and I don't think it's correct either.  Previously,
KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
vcpu->arch.hyperv is non-NULL.  nested_enable_evmcs() doesn't require a Hyper-V
vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.

So I believe this should be:

	if (hv_vcpu &&
	    hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
		evmcs_rev = EVMCSv1_2022;

> +		return 0;
> +
> +	if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
> +		evmcs_rev = EVMCSv1_2022;
> +
> +	return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
> +}
> +

...

> -int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> +int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  {
>  	int ret = 0;
>  	u32 unsupp_ctl;
>  
>  	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
> -		EVMCS1_UNSUPPORTED_PINCTRL;
> +		evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
>  	if (unsupp_ctl) {
>  		trace_kvm_nested_vmenter_failed(
> -			"eVMCS: unsupported pin-based VM-execution controls",
> +			"eVMCS: unsupported pin-based VM-execution controls: ",
>  			unsupp_ctl);

Egad!  This is all broken, at least in theory.  The second param to
trace_kvm_nested_vmenter_failed() is the error code, not the bad value.  It mostly
works because __print_symbolic() falls back to printing the hex value on error,
but that relies on there being no collisions/matches between unsupp_ctl and the
set of VMX_VMENTER_INSTRUCTION_ERRORS.  E.g. if there's a collision then the
tracepoint will pretty print a string and cause confusion.

And checking every control even after one fails seems unnecessary subtle.  It
provides marginal benefit while increasing the probability that we screw up and
squash "ret" to zero.  Yeah, it might save some onion peeling, but if that happens
during production and now during initial development of a feature, then someone
has much bigger problems to worry about.

Unless there are objections, I'll fold in a patch to yield:

#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK

static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type type, u32 val)
{
	return val & evmcs_get_unsupported_ctls(type);
}

int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
{
	if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
					       vmcs12->pin_based_vm_exec_control)))
		return -EINVAL;

	if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
					       vmcs12->secondary_vm_exec_control)))
		return -EINVAL;

	if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
					       vmcs12->vm_exit_controls)))
		return -EINVAL;

	if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
					       vmcs12->vm_entry_controls)))
		return -EINVAL;

	if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
					       vmcs12->vm_function_control)))
		return -EINVAL;

	return 0;
}

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
  2022-08-18 17:15   ` Sean Christopherson
@ 2022-08-18 17:19   ` Sean Christopherson
  2022-08-19  7:42     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 17:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index f886a8ff0342..4b809c79ae63 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -37,16 +37,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
>   *	EPTP_LIST_ADDRESS               = 0x00002024,
>   *	VMREAD_BITMAP                   = 0x00002026,
>   *	VMWRITE_BITMAP                  = 0x00002028,
> - *
> - *	TSC_MULTIPLIER                  = 0x00002032,
>   *	PLE_GAP                         = 0x00004020,
>   *	PLE_WINDOW                      = 0x00004022,
>   *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
> - *      GUEST_IA32_PERF_GLOBAL_CTRL     = 0x00002808,
> - *      HOST_IA32_PERF_GLOBAL_CTRL      = 0x00002c04,
> - *
> - * Currently unsupported in KVM:
> - *	GUEST_IA32_RTIT_CTL		= 0x00002814,

Almost forgot: is deleting this chunk of the comment intentional?

>   */
>  #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \
>  				    PIN_BASED_VMX_PREEMPTION_TIMER)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()
  2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
@ 2022-08-18 17:49   ` Sean Christopherson
  2022-08-19  7:48     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 17:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> While it seems reasonable to not expose LOAD_IA32_PERF_GLOBAL_CTRL controls
> to L1 hypervisor on buggy CPUs, such change would inevitably break live
> migration from older KVMs where the controls are exposed. Keep the status quo
> for now, L1 hypervisor itself is supposed to take care of the errata.

As noted before, this statement is wrong as it requires guest FMS == host FMS,
but it's irrelevant because KVM can emulate the control unconditionally.  I'll
test and fold in my suggested patch[*] (assuming it works) and reword this part
of the changelog.  Ah, and I'll also need to fold in a patch to actually emulate
the controls without hardware support.

[*] https://lore.kernel.org/all/YtnZmCutdd5tpUmz@google.com

> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 59 +++++++++++++++++++++++++-----------------
>  1 file changed, 35 insertions(+), 24 deletions(-)
> 

...

> @@ -8192,6 +8199,10 @@ static __init int hardware_setup(void)
>  	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
>  		return -EIO;
>  
> +	if (cpu_has_perf_global_ctrl_bug())
> +		pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
> +			     "does not work properly. Using workaround\n");

Any objections to opportunistically tweaking this?

		pr_warn_once("kvm: CPU has VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL erratum,"
			     "using MSR load/store lists for PERF_GLOBAL_CTRL\n");

> +
>  	if (boot_cpu_has(X86_FEATURE_NX))
>  		kvm_enable_efer_bits(EFER_NX);
>  
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-18 15:29     ` Vitaly Kuznetsov
@ 2022-08-18 17:57       ` Sean Christopherson
  2022-08-22  9:18         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-18 17:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
> >> + * published TLFS version. When the bit is set, nested hypervisor can use
> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
> >> + * specification).
> >> + */
> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
> >
> > This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
> > we expecting an update to the TLFS?
> >
> > 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
> > 	in the enlightened VMCS.
> >
> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
> >
> 
> Oh well, better this than nothing. I'll ping the people who told me
> about this bit that their description is incomplete.

Not that it changes anything, but I'd rather have no documentation.  I'd much rather
KVM say "this is the undocumented behavior" than "the document behavior is wrong".

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-18 17:19   ` Sean Christopherson
@ 2022-08-19  7:42     ` Vitaly Kuznetsov
  2022-08-19 14:49       ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-19  7:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> index f886a8ff0342..4b809c79ae63 100644
>> --- a/arch/x86/kvm/vmx/evmcs.h
>> +++ b/arch/x86/kvm/vmx/evmcs.h
>> @@ -37,16 +37,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
>>   *	EPTP_LIST_ADDRESS               = 0x00002024,
>>   *	VMREAD_BITMAP                   = 0x00002026,
>>   *	VMWRITE_BITMAP                  = 0x00002028,
>> - *
>> - *	TSC_MULTIPLIER                  = 0x00002032,
>>   *	PLE_GAP                         = 0x00004020,
>>   *	PLE_WINDOW                      = 0x00004022,
>>   *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
>> - *      GUEST_IA32_PERF_GLOBAL_CTRL     = 0x00002808,
>> - *      HOST_IA32_PERF_GLOBAL_CTRL      = 0x00002c04,
>> - *
>> - * Currently unsupported in KVM:
>> - *	GUEST_IA32_RTIT_CTL		= 0x00002814,
>
> Almost forgot: is deleting this chunk of the comment intentional?
>

Intentional or not (I forgot :-), GUEST_IA32_RTIT_CTL is supported/used
by KVM since

commit f99e3daf94ff35dd4a878d32ff66e1fd35223ad6
Author: Chao Peng <chao.p.peng@linux.intel.com>
Date:   Wed Oct 24 16:05:10 2018 +0800

    KVM: x86: Add Intel PT virtualization work mode

...
 
commit bf8c55d8dc094c85a3f98cd302a4dddb720dd63f
Author: Chao Peng <chao.p.peng@linux.intel.com>
Date:   Wed Oct 24 16:05:14 2018 +0800

    KVM: x86: Implement Intel PT MSRs read/write emulation

but there's no corresponding field in eVMCS. It would probably be better
to remove "Currently unsupported in KVM:" line leaving

"GUEST_IA32_RTIT_CTL             = 0x00002814" 

in place. 

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()
  2022-08-18 17:49   ` Sean Christopherson
@ 2022-08-19  7:48     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-19  7:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> While it seems reasonable to not expose LOAD_IA32_PERF_GLOBAL_CTRL controls
>> to L1 hypervisor on buggy CPUs, such change would inevitably break live
>> migration from older KVMs where the controls are exposed. Keep the status quo
>> for now, L1 hypervisor itself is supposed to take care of the errata.
>
> As noted before, this statement is wrong as it requires guest FMS == host FMS,
> but it's irrelevant because KVM can emulate the control unconditionally.  I'll
> test and fold in my suggested patch[*] (assuming it works) and reword this part
> of the changelog.  Ah, and I'll also need to fold in a patch to actually emulate
> the controls without hardware support.
>
> [*] https://lore.kernel.org/all/YtnZmCutdd5tpUmz@google.com

Oh, I missed the part that my changelog is actually wrong when Paolo
said "Can you send this as a separate patch", no objections to re-wording!

>
>> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/vmx/vmx.c | 59 +++++++++++++++++++++++++-----------------
>>  1 file changed, 35 insertions(+), 24 deletions(-)
>> 
>
> ...
>
>> @@ -8192,6 +8199,10 @@ static __init int hardware_setup(void)
>>  	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
>>  		return -EIO;
>>  
>> +	if (cpu_has_perf_global_ctrl_bug())
>> +		pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
>> +			     "does not work properly. Using workaround\n");
>
> Any objections to opportunistically tweaking this?
>
> 		pr_warn_once("kvm: CPU has VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL erratum,"
> 			     "using MSR load/store lists for PERF_GLOBAL_CTRL\n");
>

Personally I'd say just 

 		pr_warn_once("kvm: CPU has VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL erratum\n");

leaving aside the workaround KVM uses. This is merely an implementation
detail which KVM users most likely don't really need. I have no strong
opinion though, feel free to adjust.

>> +
>>  	if (boot_cpu_has(X86_FEATURE_NX))
>>  		kvm_enable_efer_bits(EFER_NX);
>>  
>> -- 
>> 2.35.3
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-18 17:15   ` Sean Christopherson
@ 2022-08-19  8:06     ` Vitaly Kuznetsov
  2022-08-19 17:02       ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-19  8:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> Enlightened VMCS v1 got updated and now includes the required fields
>> for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
>> features. For Hyper-V on KVM enablement, KVM can just observe VMX control
>> MSRs and use the features (with or without eVMCS) when
>> possible. Hyper-V on KVM case is trickier because of live migration:
>> the new features require explicit enablement from VMM to not break
>> it. Luckily, the updated eVMCS revision comes with a feature bit in
>> CPUID.0x4000000A.EBX.
>
> The refactor to implement the 2-d array should go in a prep patch.  Unless you
> (or anyone else) objects to the feedback below, I'll do the split myself, post v6,
> and queue this up (without patch 1, and assuming there're no major issues in the
> back half of the series).

No objections from me, thanks!

>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/hyperv.c     |  2 +-
>>  arch/x86/kvm/vmx/evmcs.c  | 88 +++++++++++++++++++++++++++++++--------
>>  arch/x86/kvm/vmx/evmcs.h  | 17 ++------
>>  arch/x86/kvm/vmx/nested.c |  2 +-
>>  arch/x86/kvm/vmx/vmx.c    |  2 +-
>>  5 files changed, 78 insertions(+), 33 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 1098915360ae..8a2b24f9bbf6 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>>  		case HYPERV_CPUID_NESTED_FEATURES:
>>  			ent->eax = evmcs_ver;
>>  			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
>> -
>> +			ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
>>  			break;
>>  
>>  		case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 8bea5dea0341..e8497f9854a1 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>>  	return 0;
>>  }
>>  
>> -void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>> +enum evmcs_revision {
>> +	EVMCSv1_2016,
>> +	EVMCSv1_2022,
>> +	EVMCS_REVISION_MAX,
>
> "MAX" is technically wrong, the "max" entry is usually the last valid entry.  This
> should be NR_EVMCS_REVISIONS, and then NR_EVMCS_CTRLS below.
>
>> +};
>> +
>> +enum evmcs_unsupported_ctrl_type {
>
> I think drop the "unsupported" here, because the naming gets weird when trying to
> use it for something other than indexing evmcs_unsupported_ctls (see bottom).
>
>> +	EVMCS_EXIT_CTLS,
>> +	EVMCS_ENTRY_CTLS,
>
> s/CTLS/CTRLS for consistency.
>
>> +	EVMCS_2NDEXEC,
>> +	EVMCS_PINCTRL,
>> +	EVMCS_VMFUNC,
>> +	EVMCS_CTRL_MAX,
>> +};
>> +
>> +static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {
>
> This can be "const".  And s/ctls/ctrls for consistency.
>
>> +	[EVMCS_EXIT_CTLS] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
>> +	},
>> +	[EVMCS_ENTRY_CTLS] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
>> +		[EVMCSv1_2022] =  EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
>> +	},
>> +	[EVMCS_2NDEXEC] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
>> +	},
>> +	[EVMCS_PINCTRL] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
>> +	},
>> +	[EVMCS_VMFUNC] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
>> +	},
>> +};
>> +
>> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
>> +				      enum evmcs_unsupported_ctrl_type ctrl_type)
>> +{
>> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
>> +
>> +	if (!hv_vcpu)
>
> This is a functiontal change, and I don't think it's correct either.  Previously,
> KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
> vcpu->arch.hyperv is non-NULL.  nested_enable_evmcs() doesn't require a Hyper-V
> vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.

Indeed, this *is* correct after PATCH11 when we get rid of VMX feature
MSR filtering for KVM-on-Hyper-V as the remaining use for
evmcs_get_unsupported_ctls() is Hyper-V on KVM and hv_vcpu is not NULL
there. As of this patch, this is incorrect as we're breaking e.g. Linux
guests on KVM on Hyper-V.

>
> So I believe this should be:
>
> 	if (hv_vcpu &&
> 	    hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
> 		evmcs_rev = EVMCSv1_2022;

This looks good to me, thanks!

>
>> +		return 0;
>> +
>> +	if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
>> +		evmcs_rev = EVMCSv1_2022;
>> +
>> +	return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
>> +}
>> +
>
> ...
>
>> -int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
>> +int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  {
>>  	int ret = 0;
>>  	u32 unsupp_ctl;
>>  
>>  	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
>> -		EVMCS1_UNSUPPORTED_PINCTRL;
>> +		evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
>>  	if (unsupp_ctl) {
>>  		trace_kvm_nested_vmenter_failed(
>> -			"eVMCS: unsupported pin-based VM-execution controls",
>> +			"eVMCS: unsupported pin-based VM-execution controls: ",
>>  			unsupp_ctl);
>
> Egad!  This is all broken, at least in theory.  The second param to
> trace_kvm_nested_vmenter_failed() is the error code, not the bad value.  It mostly
> works because __print_symbolic() falls back to printing the hex value on error,
> but that relies on there being no collisions/matches between unsupp_ctl and the
> set of VMX_VMENTER_INSTRUCTION_ERRORS.  E.g. if there's a collision then the
> tracepoint will pretty print a string and cause confusion.
>
> And checking every control even after one fails seems unnecessary subtle.  It
> provides marginal benefit while increasing the probability that we screw up and
> squash "ret" to zero.  Yeah, it might save some onion peeling, but if that happens
> during production and now during initial development of a feature, then someone
> has much bigger problems to worry about.
>
> Unless there are objections, I'll fold in a patch to yield:
>
> #define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
>
> static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type type, u32 val)
> {
> 	return val & evmcs_get_unsupported_ctls(type);
> }
>
> int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> {
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
> 					       vmcs12->pin_based_vm_exec_control)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
> 					       vmcs12->secondary_vm_exec_control)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
> 					       vmcs12->vm_exit_controls)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
> 					       vmcs12->vm_entry_controls)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
> 					       vmcs12->vm_function_control)))
> 		return -EINVAL;
>
> 	return 0;
> }


Well, it's actually nice to see all the controls where KVM screwed up
without the need to instrument kernel, this way when someone comes with
an issue you can immediately see what's wrong in the trace
log. Honestly, I don't remember these firing outside of my development
environment, your patch to make things correct looks good to me.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-19  7:42     ` Vitaly Kuznetsov
@ 2022-08-19 14:49       ` Sean Christopherson
  2022-08-19 15:07         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-19 14:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Fri, Aug 19, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> >> index f886a8ff0342..4b809c79ae63 100644
> >> --- a/arch/x86/kvm/vmx/evmcs.h
> >> +++ b/arch/x86/kvm/vmx/evmcs.h
> >> @@ -37,16 +37,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
> >>   *	EPTP_LIST_ADDRESS               = 0x00002024,
> >>   *	VMREAD_BITMAP                   = 0x00002026,
> >>   *	VMWRITE_BITMAP                  = 0x00002028,
> >> - *
> >> - *	TSC_MULTIPLIER                  = 0x00002032,
> >>   *	PLE_GAP                         = 0x00004020,
> >>   *	PLE_WINDOW                      = 0x00004022,
> >>   *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
> >> - *      GUEST_IA32_PERF_GLOBAL_CTRL     = 0x00002808,
> >> - *      HOST_IA32_PERF_GLOBAL_CTRL      = 0x00002c04,
> >> - *
> >> - * Currently unsupported in KVM:
> >> - *	GUEST_IA32_RTIT_CTL		= 0x00002814,
> >
> > Almost forgot: is deleting this chunk of the comment intentional?
> >
> 
> Intentional or not (I forgot :-), GUEST_IA32_RTIT_CTL is supported/used
> by KVM since
> 
> commit f99e3daf94ff35dd4a878d32ff66e1fd35223ad6
> Author: Chao Peng <chao.p.peng@linux.intel.com>
> Date:   Wed Oct 24 16:05:10 2018 +0800
> 
>     KVM: x86: Add Intel PT virtualization work mode
> 
> ...
>  
> commit bf8c55d8dc094c85a3f98cd302a4dddb720dd63f
> Author: Chao Peng <chao.p.peng@linux.intel.com>
> Date:   Wed Oct 24 16:05:14 2018 +0800
> 
>     KVM: x86: Implement Intel PT MSRs read/write emulation
> 
> but there's no corresponding field in eVMCS. It would probably be better
> to remove "Currently unsupported in KVM:" line leaving
> 
> "GUEST_IA32_RTIT_CTL             = 0x00002814" 
> 
> in place. 

GUEST_IA32_RTIT_CTL isn't supported for nested VMX though, which is how I
interpreted the "Currently unsupported in KVM:".  Would it be accurate to extend
that part of the comment to "Currently unsupported in KVM for nested VMX:"?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-19 14:49       ` Sean Christopherson
@ 2022-08-19 15:07         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-19 15:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Fri, Aug 19, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> >> index f886a8ff0342..4b809c79ae63 100644
>> >> --- a/arch/x86/kvm/vmx/evmcs.h
>> >> +++ b/arch/x86/kvm/vmx/evmcs.h
>> >> @@ -37,16 +37,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
>> >>   *	EPTP_LIST_ADDRESS               = 0x00002024,
>> >>   *	VMREAD_BITMAP                   = 0x00002026,
>> >>   *	VMWRITE_BITMAP                  = 0x00002028,
>> >> - *
>> >> - *	TSC_MULTIPLIER                  = 0x00002032,
>> >>   *	PLE_GAP                         = 0x00004020,
>> >>   *	PLE_WINDOW                      = 0x00004022,
>> >>   *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
>> >> - *      GUEST_IA32_PERF_GLOBAL_CTRL     = 0x00002808,
>> >> - *      HOST_IA32_PERF_GLOBAL_CTRL      = 0x00002c04,
>> >> - *
>> >> - * Currently unsupported in KVM:
>> >> - *	GUEST_IA32_RTIT_CTL		= 0x00002814,
>> >
>> > Almost forgot: is deleting this chunk of the comment intentional?
>> >
>> 
>> Intentional or not (I forgot :-), GUEST_IA32_RTIT_CTL is supported/used
>> by KVM since
>> 
>> commit f99e3daf94ff35dd4a878d32ff66e1fd35223ad6
>> Author: Chao Peng <chao.p.peng@linux.intel.com>
>> Date:   Wed Oct 24 16:05:10 2018 +0800
>> 
>>     KVM: x86: Add Intel PT virtualization work mode
>> 
>> ...
>>  
>> commit bf8c55d8dc094c85a3f98cd302a4dddb720dd63f
>> Author: Chao Peng <chao.p.peng@linux.intel.com>
>> Date:   Wed Oct 24 16:05:14 2018 +0800
>> 
>>     KVM: x86: Implement Intel PT MSRs read/write emulation
>> 
>> but there's no corresponding field in eVMCS. It would probably be better
>> to remove "Currently unsupported in KVM:" line leaving
>> 
>> "GUEST_IA32_RTIT_CTL             = 0x00002814" 
>> 
>> in place. 
>
> GUEST_IA32_RTIT_CTL isn't supported for nested VMX though, which is how I
> interpreted the "Currently unsupported in KVM:".  Would it be accurate to extend
> that part of the comment to "Currently unsupported in KVM for nested VMX:"?

Yea, sounds good to me. 

FWIW, there are other controls which are currently missing in KVM,
e.g. 'guest_ia32_lbr_ctl' (VMX field 0x2816) and we have no 'macro
shenanigans' to catch the moment when support for these gets introduced
in KVM. When this happens, we need to extend VMCS-to-eVMCS mapping
(vmcs_field_to_evmcs_1) to support KVM-on-Hyper-V case and, when the
corresponding field gets added to 'struct vmcs12',
copy_enlightened_to_vmcs12()/copy_vmcs12_to_enlightened(). The whole
process is manual and thus error prone...

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-19  8:06     ` Vitaly Kuznetsov
@ 2022-08-19 17:02       ` Sean Christopherson
  2022-08-22  8:47         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-19 17:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Fri, Aug 19, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
> >> +				      enum evmcs_unsupported_ctrl_type ctrl_type)
> >> +{
> >> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> >> +	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
> >> +
> >> +	if (!hv_vcpu)
> >
> > This is a functiontal change, and I don't think it's correct either.  Previously,
> > KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
> > vcpu->arch.hyperv is non-NULL.  nested_enable_evmcs() doesn't require a Hyper-V
> > vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.
> 
> Indeed, this *is* correct after PATCH11 when we get rid of VMX feature
> MSR filtering for KVM-on-Hyper-V as the remaining use for
> evmcs_get_unsupported_ctls() is Hyper-V on KVM and hv_vcpu is not NULL
> there.

Hmm, nested_vmx_handle_enlightened_vmptrld() will fail without a Hyper-V vCPU, so
filtering eVMCS control iff there's a Hyper-V vCPU makes sense.  But that's a guest
visible change and should be a separate patch.

But that also raises the question of whether or not KVM should honor hyperv_enabled
when filtering MSRs.  Same question for nested VM-Enter.  nested_enlightened_vmentry()
will "fail" without an assist page, and the guest can't set the assist page without
hyperv_enabled==true, but nothing prevents the host from stuffing the assist page.

And on a very related topic, the handling of kvm_hv_vcpu_init() in kvm_hv_set_cpuid()
is buggy.  KVM will not report an error to userspace for KVM_SET_CPUID2 if allocation
fails.  If a later operation successfully create a Hyper-V vCPU, KVM will chug along
with Hyper-V enabled but without having cached the relevant Hyper-V CPUID info.

Less important is that kvm_hv_set_cpuid() should also zero out the CPUID cache if
Hyper-V is disabled.  I'm pretty sure sure all paths check hyperv_enabled before
consuming cpuid_cache, but it's unnecessarily risky.

If we fix the kvm_hv_set_cpuid() allocation failure, then we can also guarantee
that vcpu->arch.hyperv is non-NULL if vcpu->arch.hyperv_enabled==true.  And then
we can add gate guest eVMCS flow on hyperv_enabled, and evmcs_get_unsupported_ctls()
can then WARN if hv_vcpu is NULL.

Assuming I'm not overlooking something, I'll fold in yet more patches.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-19 17:02       ` Sean Christopherson
@ 2022-08-22  8:47         ` Vitaly Kuznetsov
  2022-08-22 16:50           ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22  8:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Fri, Aug 19, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
>> >> +				      enum evmcs_unsupported_ctrl_type ctrl_type)
>> >> +{
>> >> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> >> +	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
>> >> +
>> >> +	if (!hv_vcpu)
>> >
>> > This is a functiontal change, and I don't think it's correct either.  Previously,
>> > KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
>> > vcpu->arch.hyperv is non-NULL.  nested_enable_evmcs() doesn't require a Hyper-V
>> > vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.
>> 
>> Indeed, this *is* correct after PATCH11 when we get rid of VMX feature
>> MSR filtering for KVM-on-Hyper-V as the remaining use for
>> evmcs_get_unsupported_ctls() is Hyper-V on KVM and hv_vcpu is not NULL
>> there.
>
> Hmm, nested_vmx_handle_enlightened_vmptrld() will fail without a Hyper-V vCPU, so
> filtering eVMCS control iff there's a Hyper-V vCPU makes sense.  But that's a guest
> visible change and should be a separate patch.
>

Yes, the change you suggested:

          if (hv_vcpu &&
              hv_vcpu->cpuid_cache.nested_eb & HV_X64_NESTED_EVMCS1_2022_UPDATE) 
			evmcs_rev = EVMCSv1_2022;

seems to keep the status quo so we can discuss dropping filtering when
!hv_vcpu separately.

> But that also raises the question of whether or not KVM should honor hyperv_enabled
> when filtering MSRs.  Same question for nested VM-Enter.  nested_enlightened_vmentry()
> will "fail" without an assist page, and the guest can't set the assist page without
> hyperv_enabled==true, but nothing prevents the host from stuffing the assist page.

The case sounds more like a misbehaving VMM to me. It would probably be
better to fail nested_enlightened_vmentry() immediately on !hyperv_enabled.

>
> And on a very related topic, the handling of kvm_hv_vcpu_init() in kvm_hv_set_cpuid()
> is buggy.  KVM will not report an error to userspace for KVM_SET_CPUID2 if allocation
> fails.  If a later operation successfully create a Hyper-V vCPU, KVM will chug along
> with Hyper-V enabled but without having cached the relevant Hyper-V
> CPUID info.

Indeed, that's probably because kvm_vcpu_after_set_cpuid() itself is
never supposed to fail and thus returns 'void'. I'm not up-to-date on
the discussion whether small allocations can actually fail (and whether
2832 bytes for 'struct kvm_vcpu_hv' is 'small') but propagating -ENOMEM
all the way up to VMM is likely the right way to go.

>
> Less important is that kvm_hv_set_cpuid() should also zero out the CPUID cache if
> Hyper-V is disabled.  I'm pretty sure sure all paths check hyperv_enabled before
> consuming cpuid_cache, but it's unnecessarily risky.

+1

>
> If we fix the kvm_hv_set_cpuid() allocation failure, then we can also guarantee
> that vcpu->arch.hyperv is non-NULL if vcpu->arch.hyperv_enabled==true.  And then
> we can add gate guest eVMCS flow on hyperv_enabled, and evmcs_get_unsupported_ctls()
> can then WARN if hv_vcpu is NULL.
>

Alternatively, we can just KVM_BUG_ON() in kvm_hv_set_cpuid() when
allocation fails, at least for the time being as the VM is likely
useless anyway.

> Assuming I'm not overlooking something, I'll fold in yet more patches.
>

Thanks for the thorough review here and don't hesitate to speak up when
you think it's too much of a change to do upon queueing)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-18 17:57       ` Sean Christopherson
@ 2022-08-22  9:18         ` Vitaly Kuznetsov
  2022-08-22 15:55           ` Sean Christopherson
  2022-08-22 16:13           ` Sean Christopherson
  0 siblings, 2 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22  9:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
>> >> + * published TLFS version. When the bit is set, nested hypervisor can use
>> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
>> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
>> >> + * specification).
>> >> + */
>> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
>> >
>> > This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
>> > we expecting an update to the TLFS?
>> >
>> > 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
>> > 	in the enlightened VMCS.
>> >
>> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
>> >
>> 
>> Oh well, better this than nothing. I'll ping the people who told me
>> about this bit that their description is incomplete.
>
> Not that it changes anything, but I'd rather have no documentation.  I'd much rather
> KVM say "this is the undocumented behavior" than "the document behavior is wrong".
>

So I reached out to Microsoft and their answer was that for all these new
eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
bug (if we expose the feature in VMX feature MSRs but don't set
CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).

What I'm still concerned about is future proofing KVM for new
features. When something is getting added to KVM for which no eVMCS
field is currently defined, both Hyper-V-on-KVM and KVM-on-Hyper-V cases
should be taken care of. It would probably be better to reverse our
filtering, explicitly listing features supported in eVMCS. The lists are
going to be fairly long but at least we won't have to take care of any
new architectural feature added to KVM.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22  9:18         ` Vitaly Kuznetsov
@ 2022-08-22 15:55           ` Sean Christopherson
  2022-08-22 16:21             ` Vitaly Kuznetsov
  2022-08-22 16:13           ` Sean Christopherson
  1 sibling, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-22 15:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <seanjc@google.com> writes:
> >> 
> >> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
> >> >> + * published TLFS version. When the bit is set, nested hypervisor can use
> >> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
> >> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
> >> >> + * specification).
> >> >> + */
> >> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
> >> >
> >> > This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
> >> > we expecting an update to the TLFS?
> >> >
> >> > 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
> >> > 	in the enlightened VMCS.
> >> >
> >> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
> >> >
> >> 
> >> Oh well, better this than nothing. I'll ping the people who told me
> >> about this bit that their description is incomplete.
> >
> > Not that it changes anything, but I'd rather have no documentation.  I'd much rather
> > KVM say "this is the undocumented behavior" than "the document behavior is wrong".
> >
> 
> So I reached out to Microsoft and their answer was that for all these new
> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
> bug (if we expose the feature in VMX feature MSRs but don't set
> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).

I.e. TSC_SCALING shouldn't be gated on the flag?  If so, then the 2-D array approach
is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
expecting any more flags in the future.

What about this for an implementation?

static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	/*
	 * Filtering VMX controls for eVMCS compatibility should only be done
	 * for guest accesses, and all such accesses should be gated on Hyper-V
	 * being enabled and initialized.
	 */
	if (WARN_ON_ONCE(!hv_vcpu))
		return false;

	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
}

static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index)
{
	u32 unsupported_ctrls;

	switch (msr_index) {
	case MSR_IA32_VMX_EXIT_CTLS:
	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
		if (!evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_ENTRY_CTLS:
	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
		if (!evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_PROCBASED_CTLS2:
		return EVMCS1_UNSUPPORTED_2NDEXEC;
	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
	case MSR_IA32_VMX_PINBASED_CTLS:
		return EVMCS1_UNSUPPORTED_PINCTRL;
	case MSR_IA32_VMX_VMFUNC:
		return EVMCS1_UNSUPPORTED_VMFUNC;
	default:
		KVM_BUG_ON(1, vcpu->kvm);
		return 0;
	}
}

void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
{
	u64 unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_index);

	if (msr_index == MSR_IA32_VMX_VMFUNC)
		*pdata &= ~unsupported_ctrls;
	else
		*pdata &= ~(unsupported_ctrls << 32);
}


> What I'm still concerned about is future proofing KVM for new
> features. When something is getting added to KVM for which no eVMCS
> field is currently defined, both Hyper-V-on-KVM and KVM-on-Hyper-V cases
> should be taken care of. It would probably be better to reverse our
> filtering, explicitly listing features supported in eVMCS. The lists are
> going to be fairly long but at least we won't have to take care of any
> new architectural feature added to KVM.

Having the filtering be opt-in crossed my mind as well.  Reversing the filtering
can be done after this series though, correct?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22  9:18         ` Vitaly Kuznetsov
  2022-08-22 15:55           ` Sean Christopherson
@ 2022-08-22 16:13           ` Sean Christopherson
  2022-08-22 16:24             ` Vitaly Kuznetsov
  1 sibling, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-22 16:13 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> So I reached out to Microsoft and their answer was that for all these new
> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
> bug (if we expose the feature in VMX feature MSRs but don't set
> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).

Does this mean that KVM-on-HyperV needs to avoid using the PERF_GLOBAL_CTRL fields
when the bit is not set?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 15:55           ` Sean Christopherson
@ 2022-08-22 16:21             ` Vitaly Kuznetsov
  2022-08-22 17:01               ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22 16:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <seanjc@google.com> writes:
>> >> 
>> >> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
>> >> >> + * published TLFS version. When the bit is set, nested hypervisor can use
>> >> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
>> >> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
>> >> >> + * specification).
>> >> >> + */
>> >> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
>> >> >
>> >> > This bit is now defined[*], but the docs says it's only for perf_global_ctrl.  Are
>> >> > we expecting an update to the TLFS?
>> >> >
>> >> > 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
>> >> > 	in the enlightened VMCS.
>> >> >
>> >> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
>> >> >
>> >> 
>> >> Oh well, better this than nothing. I'll ping the people who told me
>> >> about this bit that their description is incomplete.
>> >
>> > Not that it changes anything, but I'd rather have no documentation.  I'd much rather
>> > KVM say "this is the undocumented behavior" than "the document behavior is wrong".
>> >
>> 
>> So I reached out to Microsoft and their answer was that for all these new
>> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
>> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
>> bug (if we expose the feature in VMX feature MSRs but don't set
>> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).
>
> I.e. TSC_SCALING shouldn't be gated on the flag?  If so, then the 2-D array approach
> is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
> expecting any more flags in the future.
>

Unfortunately, we have to gate the presence of these new features on
something, otherwise VMM has no way to specify which particular eVMCS
"revision" it wants (TL;DR: we will break migration).

My initial implementation was inventing 'eVMCS revision' concept:
https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/

which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).

Going forward, we will still (likely) need something when new fields show up.

> What about this for an implementation?
>
> static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>
> 	/*
> 	 * Filtering VMX controls for eVMCS compatibility should only be done
> 	 * for guest accesses, and all such accesses should be gated on Hyper-V
> 	 * being enabled and initialized.
> 	 */
> 	if (WARN_ON_ONCE(!hv_vcpu))
> 		return false;
>
> 	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
> }
>
> static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index)
> {
> 	u32 unsupported_ctrls;
>
> 	switch (msr_index) {
> 	case MSR_IA32_VMX_EXIT_CTLS:
> 	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_ENTRY_CTLS:
> 	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_PROCBASED_CTLS2:
> 		return EVMCS1_UNSUPPORTED_2NDEXEC;
> 	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> 	case MSR_IA32_VMX_PINBASED_CTLS:
> 		return EVMCS1_UNSUPPORTED_PINCTRL;
> 	case MSR_IA32_VMX_VMFUNC:
> 		return EVMCS1_UNSUPPORTED_VMFUNC;
> 	default:
> 		KVM_BUG_ON(1, vcpu->kvm);
> 		return 0;
> 	}
> }
>
> void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> {
> 	u64 unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_index);
>
> 	if (msr_index == MSR_IA32_VMX_VMFUNC)
> 		*pdata &= ~unsupported_ctrls;
> 	else
> 		*pdata &= ~(unsupported_ctrls << 32);
> }
>

It's smaller and I like it but it would only work in conjunction with
KVM_CAP_HYPERV_ENLIGHTENED_VMCS2...

>
>> What I'm still concerned about is future proofing KVM for new
>> features. When something is getting added to KVM for which no eVMCS
>> field is currently defined, both Hyper-V-on-KVM and KVM-on-Hyper-V cases
>> should be taken care of. It would probably be better to reverse our
>> filtering, explicitly listing features supported in eVMCS. The lists are
>> going to be fairly long but at least we won't have to take care of any
>> new architectural feature added to KVM.
>
> Having the filtering be opt-in crossed my mind as well.  Reversing the filtering
> can be done after this series though, correct?
>

Yes, that's my plan, Get this in to fix the immediate issue with 2022
features and probably reverse the filtering before Microsoft releases
something else :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 16:13           ` Sean Christopherson
@ 2022-08-22 16:24             ` Vitaly Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22 16:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> So I reached out to Microsoft and their answer was that for all these new
>> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
>> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
>> bug (if we expose the feature in VMX feature MSRs but don't set
>> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).
>
> Does this mean that KVM-on-HyperV needs to avoid using the PERF_GLOBAL_CTRL fields
> when the bit is not set?

It doesn't have to, based on the reply I got from Microsoft, if 
PERF_GLOBAL_CTRL is exposed in architectural VMX feature MSRs than eVMCS
fields are guaranteed to be present. The PV bit is 'extra'.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-22  8:47         ` Vitaly Kuznetsov
@ 2022-08-22 16:50           ` Sean Christopherson
  2022-08-22 17:49             ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-22 16:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > But that also raises the question of whether or not KVM should honor hyperv_enabled
> > when filtering MSRs.  Same question for nested VM-Enter.  nested_enlightened_vmentry()
> > will "fail" without an assist page, and the guest can't set the assist page without
> > hyperv_enabled==true, but nothing prevents the host from stuffing the assist page.
> 
> The case sounds more like a misbehaving VMM to me. It would probably be
> better to fail nested_enlightened_vmentry() immediately on !hyperv_enabled.

Hmm, sort of.  If KVM fails explicitly fails nested VM-Enter, then allowing the
guest to read the VMX MSRs with the same buggy setup is odd, e.g. nested VMX is
effectively unsupported at that point since there is nothing the guest can do to
make nested VM-Enter succeed.  Extending the "fail VM-Enter" behavior would be to
inject #GP on RDMSR, and at that point KVM is well into "made up architecture"
behavior.

All in all, I don't think it's worth forcing the issue, even though I do agree that
the VMM is being weird if it's enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS but not
advertising Hyper-V.

> > If we fix the kvm_hv_set_cpuid() allocation failure, then we can also guarantee
> > that vcpu->arch.hyperv is non-NULL if vcpu->arch.hyperv_enabled==true.  And then
> > we can add gate guest eVMCS flow on hyperv_enabled, and evmcs_get_unsupported_ctls()
> > can then WARN if hv_vcpu is NULL.
> >
> 
> Alternatively, we can just KVM_BUG_ON() in kvm_hv_set_cpuid() when
> allocation fails, at least for the time being as the VM is likely
> useless anyway.

I'd prefer not to use KVM_BUG_ON() in this case.  A proper fix isn't that much
more code, and this isn't a KVM bug unless we conciously make it one :-)

> > Assuming I'm not overlooking something, I'll fold in yet more patches.
> >
> 
> Thanks for the thorough review here and don't hesitate to speak up when
> you think it's too much of a change to do upon queueing)

Heh, this definitely snowballed beyond "fixup on queue".  Let's sort out how to
address the filtering issue and then decide how to handle v6.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 16:21             ` Vitaly Kuznetsov
@ 2022-08-22 17:01               ` Sean Christopherson
  2022-08-22 17:46                 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-22 17:01 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> >> So I reached out to Microsoft and their answer was that for all these new
> >> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
> >> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
> >> bug (if we expose the feature in VMX feature MSRs but don't set
> >> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).
> >
> > I.e. TSC_SCALING shouldn't be gated on the flag?  If so, then the 2-D array approach
> > is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
> > expecting any more flags in the future.
> >
> 
> Unfortunately, we have to gate the presence of these new features on
> something, otherwise VMM has no way to specify which particular eVMCS
> "revision" it wants (TL;DR: we will break migration).
> 
> My initial implementation was inventing 'eVMCS revision' concept:
> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/
> 
> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
> 
> Going forward, we will still (likely) need something when new fields show up.

My comments from that thread still apply.  Adding "revisions" or feature flags
isn't maintanable, e.g. at best KVM will end up with a ridiculous number of flags.

Looking at QEMU, which I strongly suspect is the only VMM that enables
KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
before grabbing the VMX MSRs.

So, why not simply apply filtering for host accesses as well?  E.g.

		/*
		 * New Enlightened VMCS fields always lag behind their hardware
		 * counterparts, filter out fields that are not yet defined.
		 */
		if (vmx->nested.enlightened_vmcs_enabled)
			nested_evmcs_filter_control_msr(vcpu, msr_info);

and then the eVMCS can end up being:

static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	/*
	 * PERF_GLOBAL_CTRL is filtered only for guest accesses, and all guest
	 * accesses should be gated on Hyper-V being enabled and initialized.
	 */
	if (WARN_ON_ONCE(!hv_vcpu))
		return false;

	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
}

static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index,
				      bool host_initiated)
{
	u32 unsupported_ctrls;

	switch (msr_index) {
	case MSR_IA32_VMX_EXIT_CTLS:
	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
		if (!host_initiated && !evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_ENTRY_CTLS:
	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
		if (!host_initiated && !evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_PROCBASED_CTLS2:
		return EVMCS1_UNSUPPORTED_2NDEXEC;
	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
	case MSR_IA32_VMX_PINBASED_CTLS:
		return EVMCS1_UNSUPPORTED_PINCTRL;
	case MSR_IA32_VMX_VMFUNC:
		return EVMCS1_UNSUPPORTED_VMFUNC;
	default:
		KVM_BUG_ON(1, vcpu->kvm);
		return 0;
	}
}

void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu,
				     struct msr_data *msr_info)
{
	u64 unsupported_ctrls;
	
	if (!msr_info->host_initiated && !vcpu->arch.hyperv_enabled)
		return;

	unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_info->index,
						       msr_info->host_initiated);
	if (msr_info->index == MSR_IA32_VMX_VMFUNC)
		msr_info->data &= ~unsupported_ctrls;
	else
		msr_info->data &= ~(unsupported_ctrls << 32);
}

static bool nested_evmcs_is_valid_controls(struct kvm_vcpu *vcpu,
					   u32 msr_index, u32 val)
{
	return val & evmcs_get_unsupported_ctls(vcpu, msr_index, false);
}


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 17:01               ` Sean Christopherson
@ 2022-08-22 17:46                 ` Vitaly Kuznetsov
  2022-08-22 18:32                   ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22 17:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> >> So I reached out to Microsoft and their answer was that for all these new
>> >> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
>> >> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
>> >> bug (if we expose the feature in VMX feature MSRs but don't set
>> >> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).
>> >
>> > I.e. TSC_SCALING shouldn't be gated on the flag?  If so, then the 2-D array approach
>> > is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
>> > expecting any more flags in the future.
>> >
>> 
>> Unfortunately, we have to gate the presence of these new features on
>> something, otherwise VMM has no way to specify which particular eVMCS
>> "revision" it wants (TL;DR: we will break migration).
>> 
>> My initial implementation was inventing 'eVMCS revision' concept:
>> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/
>> 
>> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
>> 
>> Going forward, we will still (likely) need something when new fields show up.
>
> My comments from that thread still apply.  Adding "revisions" or feature flags
> isn't maintanable, e.g. at best KVM will end up with a ridiculous number of flags.
>
> Looking at QEMU, which I strongly suspect is the only VMM that enables
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
> before grabbing the VMX MSRs.
>
> So, why not simply apply filtering for host accesses as well?

(I understand that using QEMU to justify KVM's behavior is flawed but...)

QEMU's migration depends on the assumption that identical QEMU's command
lines create identical (from guest PoV) configurations. Assume we have
(simplified)

"-cpu CascadeLake-Sever,hv-evmcs"

on both source and destination but source host is newer, i.e. its KVM
knows about TSC Scaling in eVMCS and destination host has no idea about
it. If we just apply filtering upon vCPU creation, guest visible MSR
values are going to be different, right? Ok, assuming QEMU also migrates
VMX feature MSRs (TODO: check if that's true), we will be able to fail
mirgration late (which is already much worse than not being able to
create the desired configuration on destination, 'fail early') if we use
in-KVM filtering to throw an error to userspace. But if we blindly
filter control MSRs on the destination, 'TscScaling' will just disapper
undreneath the guest. This is unlikely to work.

In any case, what we need, is an option for VMM (read: QEMU) to create
the configuration with 'TscScaling' filtered out even KVM supports the
bit in eVMCS. This way the guest will be able to migrate backwards to an
older KVM which doesn't support it, i.e.

'-cpu CascadeLake-Sever,hv-evmcs'
 creates the 'origin' eVMCS configuration, no TscScaling

'-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.

KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
as a parameter (as we assumed it will always change when new fields are
added, but that turned out to be false). That's why I suggested
KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.

For the issue at hand, 'hv-evmcs-2022' can just set CPUID.0x4000000A.EBX
BIT(0) and then we gate all new fields' existence on it. It doesn't
matter much if we filter host accesses or not in this scheme.

Going all the way back, I'd certainly made the filtering apply to host
writes throwing an error when eVMCS is enabled (and I'd made it per-VM
and mandate that it is enabled prior to getting MSRs) but that doesn't
seem to help us much now.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
  2022-08-22 16:50           ` Sean Christopherson
@ 2022-08-22 17:49             ` Vitaly Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-22 17:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> > But that also raises the question of whether or not KVM should honor hyperv_enabled
>> > when filtering MSRs.  Same question for nested VM-Enter.  nested_enlightened_vmentry()
>> > will "fail" without an assist page, and the guest can't set the assist page without
>> > hyperv_enabled==true, but nothing prevents the host from stuffing the assist page.
>> 
>> The case sounds more like a misbehaving VMM to me. It would probably be
>> better to fail nested_enlightened_vmentry() immediately on !hyperv_enabled.
>
> Hmm, sort of.  If KVM fails explicitly fails nested VM-Enter, then allowing the
> guest to read the VMX MSRs with the same buggy setup is odd, e.g. nested VMX is
> effectively unsupported at that point since there is nothing the guest can do to
> make nested VM-Enter succeed.  Extending the "fail VM-Enter" behavior would be to
> inject #GP on RDMSR, and at that point KVM is well into "made up architecture"
> behavior.
>
> All in all, I don't think it's worth forcing the issue, even though I do agree that
> the VMM is being weird if it's enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS but not
> advertising Hyper-V.

I keep thinking about KVM-on-KVM using Hyper-V features like eVMCS, eMSR
bitmap, 'l2' tlb flush,... when I can't sleep at night sometimes :-)

...

>> 
>> Thanks for the thorough review here and don't hesitate to speak up when
>> you think it's too much of a change to do upon queueing)
>
> Heh, this definitely snowballed beyond "fixup on queue".  Let's sort out how to
> address the filtering issue and then decide how to handle v6.
>

Yep, let's keep the snowball rolling! :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 17:46                 ` Vitaly Kuznetsov
@ 2022-08-22 18:32                   ` Sean Christopherson
  2022-08-23  7:33                     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Sean Christopherson @ 2022-08-22 18:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> >> My initial implementation was inventing 'eVMCS revision' concept:
> >> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/
> >> 
> >> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
> >> 
> >> Going forward, we will still (likely) need something when new fields show up.
> >
> > My comments from that thread still apply.  Adding "revisions" or feature flags
> > isn't maintanable, e.g. at best KVM will end up with a ridiculous number of flags.
> >
> > Looking at QEMU, which I strongly suspect is the only VMM that enables
> > KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
> > before grabbing the VMX MSRs.
> >
> > So, why not simply apply filtering for host accesses as well?
> 
> (I understand that using QEMU to justify KVM's behavior is flawed but...)
> 
> QEMU's migration depends on the assumption that identical QEMU's command
> lines create identical (from guest PoV) configurations. Assume we have
> (simplified)
> 
> "-cpu CascadeLake-Sever,hv-evmcs"
> 
> on both source and destination but source host is newer, i.e. its KVM
> knows about TSC Scaling in eVMCS and destination host has no idea about
> it. If we just apply filtering upon vCPU creation, guest visible MSR
> values are going to be different, right? Ok, assuming QEMU also migrates
> VMX feature MSRs (TODO: check if that's true), we will be able to fail
> mirgration late (which is already much worse than not being able to
> create the desired configuration on destination, 'fail early') if we use
> in-KVM filtering to throw an error to userspace. But if we blindly
> filter control MSRs on the destination, 'TscScaling' will just disapper
> undreneath the guest. This is unlikely to work.

But all of that holds true irrespetive of eVMCS.  If QEMU attempts to migrate a
nested guest from a KVM that supports TSC_SCALING to a KVM that doesn't support
TSC_SCALING, then TSC_SCALING is going to disappear and VM-Entry on the dest will
fail.  I.e. QEMU _must_ be able to detect the incompatibility and not attempt
the migration.  And with that code in place, QEMU doesn't need to do anything new
for eVMCS, it Just Works.

> In any case, what we need, is an option for VMM (read: QEMU) to create
> the configuration with 'TscScaling' filtered out even KVM supports the
> bit in eVMCS. This way the guest will be able to migrate backwards to an
> older KVM which doesn't support it, i.e.
> 
> '-cpu CascadeLake-Sever,hv-evmcs'
>  creates the 'origin' eVMCS configuration, no TscScaling
> 
> '-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.

Again, this conundrum exists irrespective of eVMCS.  Properly solve the problem
for regular nVMX and eVMCS should naturally work.

> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
> as a parameter (as we assumed it will always change when new fields are
> added, but that turned out to be false). That's why I suggested
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.

Enumerating features via versions is such a bad API though, e.g. if there's a
bug with nested TSC_SCALING, userspace can't disable just nested TSC_SCALING
without everything else under the inscrutable "hv-evmcs-2022" being collateral
damage.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-22 18:32                   ` Sean Christopherson
@ 2022-08-23  7:33                     ` Vitaly Kuznetsov
  2022-08-23 15:00                       ` Sean Christopherson
  0 siblings, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-23  7:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> >> My initial implementation was inventing 'eVMCS revision' concept:
>> >> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/
>> >> 
>> >> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
>> >> 
>> >> Going forward, we will still (likely) need something when new fields show up.
>> >
>> > My comments from that thread still apply.  Adding "revisions" or feature flags
>> > isn't maintanable, e.g. at best KVM will end up with a ridiculous number of flags.
>> >
>> > Looking at QEMU, which I strongly suspect is the only VMM that enables
>> > KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
>> > before grabbing the VMX MSRs.
>> >
>> > So, why not simply apply filtering for host accesses as well?
>> 
>> (I understand that using QEMU to justify KVM's behavior is flawed but...)
>> 
>> QEMU's migration depends on the assumption that identical QEMU's command
>> lines create identical (from guest PoV) configurations. Assume we have
>> (simplified)
>> 
>> "-cpu CascadeLake-Sever,hv-evmcs"
>> 
>> on both source and destination but source host is newer, i.e. its KVM
>> knows about TSC Scaling in eVMCS and destination host has no idea about
>> it. If we just apply filtering upon vCPU creation, guest visible MSR
>> values are going to be different, right? Ok, assuming QEMU also migrates
>> VMX feature MSRs (TODO: check if that's true), we will be able to fail
>> mirgration late (which is already much worse than not being able to
>> create the desired configuration on destination, 'fail early') if we use
>> in-KVM filtering to throw an error to userspace. But if we blindly
>> filter control MSRs on the destination, 'TscScaling' will just disapper
>> undreneath the guest. This is unlikely to work.
>
> But all of that holds true irrespetive of eVMCS.  If QEMU attempts to migrate a
> nested guest from a KVM that supports TSC_SCALING to a KVM that doesn't support
> TSC_SCALING, then TSC_SCALING is going to disappear and VM-Entry on the dest will
> fail.  I.e. QEMU _must_ be able to detect the incompatibility and not attempt
> the migration.  And with that code in place, QEMU doesn't need to do anything new
> for eVMCS, it Just Works.

I'm obviously missing something. "-cpu CascadeLake-Sever" presumes
cetain features, including VMX features (e.g. TSC_SCALING), an attempt
to create such vCPU on a CPU which doesn't support it will lead to
immediate failure. So two VMs created on different hosts with

-cpu CascadeLake-Sever

are guaranteed to look exactly the same from guest PoV. This is not true
for '-cpu CascadeLake-Sever,hv-evmcs' (if we do it the way you suggest)
as 'hv-evmcs' will be a *different* filter on each host (which is going
to depend on KVM version, not even on the host's hardware).

>
>> In any case, what we need, is an option for VMM (read: QEMU) to create
>> the configuration with 'TscScaling' filtered out even KVM supports the
>> bit in eVMCS. This way the guest will be able to migrate backwards to an
>> older KVM which doesn't support it, i.e.
>> 
>> '-cpu CascadeLake-Sever,hv-evmcs'
>>  creates the 'origin' eVMCS configuration, no TscScaling
>> 
>> '-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.
>
> Again, this conundrum exists irrespective of eVMCS.  Properly solve the problem
> for regular nVMX and eVMCS should naturally work.

I don't think we have this problem for VMX features as named CPU models
in QEMU encode all of them explicitly, they *must* be present whenever
such vCPU is created.

>
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
>> as a parameter (as we assumed it will always change when new fields are
>> added, but that turned out to be false). That's why I suggested
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.
>
> Enumerating features via versions is such a bad API though, e.g. if there's a
> bug with nested TSC_SCALING, userspace can't disable just nested TSC_SCALING
> without everything else under the inscrutable "hv-evmcs-2022" being collateral
> damage.

Why? Something like 

"-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022,-vmx-tsc-scaling" 

should work well, no? 'hv-evmcs*' are just filters, if the VMX feature
is not there -- it's not there.

We can certainly make this fine-grained and introduce
KVM_CAP_HYPERV_EVMCS_TSC_SCALING instead and make a 'hv-*' flag for it,
however, with Hyper-V and Windows I really don't like 'frankenstein'
Hyper-V configurations which look nothing like genuine Hyper-V as nobody
at Microsoft tests new Windows builds with such configurations. And yes,
bugs were discovered in the past (e.g. PerfGlobalCtrl and Win11).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-23  7:33                     ` Vitaly Kuznetsov
@ 2022-08-23 15:00                       ` Sean Christopherson
  2022-08-23 15:31                         ` Sean Christopherson
  2022-08-23 16:54                         ` Vitaly Kuznetsov
  0 siblings, 2 replies; 59+ messages in thread
From: Sean Christopherson @ 2022-08-23 15:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 23, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> >> QEMU's migration depends on the assumption that identical QEMU's command
> >> lines create identical (from guest PoV) configurations. Assume we have
> >> (simplified)
> >> 
> >> "-cpu CascadeLake-Sever,hv-evmcs"
> >> 
> >> on both source and destination but source host is newer, i.e. its KVM
> >> knows about TSC Scaling in eVMCS and destination host has no idea about
> >> it. If we just apply filtering upon vCPU creation, guest visible MSR
> >> values are going to be different, right? Ok, assuming QEMU also migrates
> >> VMX feature MSRs (TODO: check if that's true), we will be able to fail
> >> mirgration late (which is already much worse than not being able to
> >> create the desired configuration on destination, 'fail early') if we use
> >> in-KVM filtering to throw an error to userspace. But if we blindly
> >> filter control MSRs on the destination, 'TscScaling' will just disapper
> >> undreneath the guest. This is unlikely to work.
> >
> > But all of that holds true irrespetive of eVMCS.  If QEMU attempts to migrate a
> > nested guest from a KVM that supports TSC_SCALING to a KVM that doesn't support
> > TSC_SCALING, then TSC_SCALING is going to disappear and VM-Entry on the dest will
> > fail.  I.e. QEMU _must_ be able to detect the incompatibility and not attempt
> > the migration.  And with that code in place, QEMU doesn't need to do anything new
> > for eVMCS, it Just Works.
> 
> I'm obviously missing something. "-cpu CascadeLake-Sever" presumes
> cetain features, including VMX features (e.g. TSC_SCALING), an attempt
> to create such vCPU on a CPU which doesn't support it will lead to
> immediate failure. So two VMs created on different hosts with
> 
> -cpu CascadeLake-Sever
> 
> are guaranteed to look exactly the same from guest PoV. This is not true
> for '-cpu CascadeLake-Sever,hv-evmcs' (if we do it the way you suggest)
> as 'hv-evmcs' will be a *different* filter on each host (which is going
> to depend on KVM version, not even on the host's hardware).

We're talking about nested VMX, i.e. exposing TSC_SCALING to L1.  QEMU's CLX
definition doesn't include TSC_SCALING.  In fact, none of QEMU's predefined CPU
models supports TSC_SCALING, precisely because KVM didn't support exposing the
feature to L1 until relatively recently.

$ git grep VMX_SECONDARY_EXEC_TSC_SCALING
target/i386/cpu.h:#define VMX_SECONDARY_EXEC_TSC_SCALING              0x02000000
target/i386/kvm/kvm.c:    if (f[FEAT_VMX_SECONDARY_CTLS] & VMX_SECONDARY_EXEC_TSC_SCALING) {

> >> In any case, what we need, is an option for VMM (read: QEMU) to create
> >> the configuration with 'TscScaling' filtered out even KVM supports the
> >> bit in eVMCS. This way the guest will be able to migrate backwards to an
> >> older KVM which doesn't support it, i.e.
> >> 
> >> '-cpu CascadeLake-Sever,hv-evmcs'
> >>  creates the 'origin' eVMCS configuration, no TscScaling
> >> 
> >> '-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.

Ah, I see what you're worried about.  Your concern is that QEMU will add a VMX
feature to a predefined CPU model, but only later gain eVMCS support, and so
"CascadeLake-Server,hv-evmcs" will do different things depending on the KVM
version.

But again, that's already reality.  Run "-cpu CascadeLake-Server" against a KVM
from before commits:

  28c1c9fabf48 ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES")
  1eaafe91a0df ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")

and it will fail.  There are undoubtedly many other features that are similarly
affected, just go back far enough in KVM time.

Or simply run "-cpu CascadeLake-Server" on pre-CLX hardware.  Anything that KVM
doesn't fully emulate will not be present.

> > Again, this conundrum exists irrespective of eVMCS.  Properly solve the problem
> > for regular nVMX and eVMCS should naturally work.
> 
> I don't think we have this problem for VMX features as named CPU models
> in QEMU encode all of them explicitly, they *must* be present whenever
> such vCPU is created.

Yes, and if KVM doesn't support features that CascadeLake-Server requires, spawning
the VM will fail on the destination, as it should.  My point is that this behavior
is not unique to eVMCS.

QEMU/Libvirt must also be prepared for rejection, because it is flat out impossible
to ensure that KVM+hardware supports a specific feature.

> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
> >> as a parameter (as we assumed it will always change when new fields are
> >> added, but that turned out to be false). That's why I suggested
> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.
> >
> > Enumerating features via versions is such a bad API though, e.g. if there's a
> > bug with nested TSC_SCALING, userspace can't disable just nested TSC_SCALING
> > without everything else under the inscrutable "hv-evmcs-2022" being collateral
> > damage.
> 
> Why? Something like 
> 
> "-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022,-vmx-tsc-scaling" 
> 
> should work well, no? 'hv-evmcs*' are just filters, if the VMX feature
> is not there -- it's not there.

Because it's completely unnecessary, adds non-trivial maintenance burden to KVM,
and requires explicit documentation to explain to userspace what "hv-evmcs-2022"
means.

It's unnecessary because if the user is concerned about eVMCS features showing up
in the future, then they should do:

  -cpu CascadeLake-Server,hv-evmcs,-vmx-tsc-scaling,-<any other VMX features not eVMCS-friendly>

If QEMU wants to make that more user friendly, then define CascadeLake-Server-eVMCS
or whatever so that the features that are unlikely be supported for eVMCS are off by
default.  This is no different than QEMU not including nested TSC_SCALING in any of
the predefined models; the developers _know_ KVM doesn't widely support TSC_SCALING,
so it was omitted, even though a real CLX CPU is guaranteed to support TSC_SCALING.

It's non-trivial maintenance for KVM because it would require defining new versions
every time an eVMCS field is added, allowing userspace to specify and restrict
features based on arbitrary versions, and do all of that without conflicting with
whatever PV enumeration Microsoft adds.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-23 15:00                       ` Sean Christopherson
@ 2022-08-23 15:31                         ` Sean Christopherson
  2022-08-23 16:54                         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 59+ messages in thread
From: Sean Christopherson @ 2022-08-23 15:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 23, 2022, Sean Christopherson wrote:
> On Tue, Aug 23, 2022, Vitaly Kuznetsov wrote:
> > >> In any case, what we need, is an option for VMM (read: QEMU) to create
> > >> the configuration with 'TscScaling' filtered out even KVM supports the
> > >> bit in eVMCS. This way the guest will be able to migrate backwards to an
> > >> older KVM which doesn't support it, i.e.
> > >> 
> > >> '-cpu CascadeLake-Sever,hv-evmcs'
> > >>  creates the 'origin' eVMCS configuration, no TscScaling
> > >> 
> > >> '-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.
> 
> Ah, I see what you're worried about.  Your concern is that QEMU will add a VMX
> feature to a predefined CPU model, but only later gain eVMCS support, and so
> "CascadeLake-Server,hv-evmcs" will do different things depending on the KVM
> version.
> 
> But again, that's already reality.  Run "-cpu CascadeLake-Server" against a KVM
> from before commits:
> 
>   28c1c9fabf48 ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES")
>   1eaafe91a0df ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
> 
> and it will fail.  There are undoubtedly many other features that are similarly
> affected, just go back far enough in KVM time.

The one potential issue I see is that KVM currently silently hides TSC_SCALING
and PERF_GLOBAL_CTRL, i.e. migrating from new KVM to old KVM may "succeed" and
then later fail a nested VM-Entry.

PERF_GLOBAL_CTRL is solved because Microsoft has conveniently provided a CPUID
bit.

TSC_SCALING is unlikely to be a problem since it's so new, but if we're worried
about someone doing e.g. "-cpu CascadeLake-Server,hv-evmcs,+vmx-tsc-scaling", then
we can add a KVM quirk to silently hide TSC_SCALING from the guest when eVMCS is
enabled.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-23 15:00                       ` Sean Christopherson
  2022-08-23 15:31                         ` Sean Christopherson
@ 2022-08-23 16:54                         ` Vitaly Kuznetsov
  2022-08-23 20:16                           ` Sean Christopherson
  1 sibling, 1 reply; 59+ messages in thread
From: Vitaly Kuznetsov @ 2022-08-23 16:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> We're talking about nested VMX, i.e. exposing TSC_SCALING to L1.  QEMU's CLX
> definition doesn't include TSC_SCALING.  In fact, none of QEMU's predefined CPU
> models supports TSC_SCALING, precisely because KVM didn't support exposing the
> feature to L1 until relatively recently.
>
> $ git grep VMX_SECONDARY_EXEC_TSC_SCALING
> target/i386/cpu.h:#define VMX_SECONDARY_EXEC_TSC_SCALING              0x02000000
> target/i386/kvm/kvm.c:    if (f[FEAT_VMX_SECONDARY_CTLS] &  VMX_SECONDARY_EXEC_TSC_SCALING) {

(sorry for my persistence but I still believe there are issues which we
won't be able to solve if we take the suggested approach).

You got me. Indeed, "vmx-tsc-scaling" feature is indeed not set for
named CPU models so my example was flawed. Let's swap it with
VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL /
VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL which a bunch of named models
have. So I do the same,

'-cpu CascadeLake-Sever,hv-evmcs'

on both the source host which knows about these eVMCS fields and the
destination host which doesn't.

First problem: CPUID. On the source host, we will have
CPUID.0x4000000A.EBX BIT(0) = 1, and "=0" on the destination. I don't
think we migrate CPUID data (can be wrong, though).

Second, assuming VMX feature MSRs are actually migrated, we must fail on
the destnation because VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL is
trying to get set. We can do this in KVM but note: currently, KVM
filters guest reads but not host's so when you're trying to migrate from
a non-fixed KVM, VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL are
actually present! So how do we distinguinsh in KVM between these two
cases, i.e. how do we know if
VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL were filtered out on the
source (old kvm) or not (new KVM)?

...
>
> Because it's completely unnecessary, adds non-trivial maintenance burden to KVM,
> and requires explicit documentation to explain to userspace what "hv-evmcs-2022"
> means.
>
> It's unnecessary because if the user is concerned about eVMCS features showing up
> in the future, then they should do:
>
>   -cpu CascadeLake-Server,hv-evmcs,-vmx-tsc-scaling,-<any other VMX features not eVMCS-friendly>
>
> If QEMU wants to make that more user friendly, then define CascadeLake-Server-eVMCS
> or whatever so that the features that are unlikely be supported for eVMCS are off by
> default.

I completely agree that what I'm trying to achieve here could've been
done in QEMU from day 1 but we now have what we have: KVM silently
filtering out certain VMX features and zero indication to userspace
VMM whether filtering is being done or not (besides this
CPUID.0x4000000A.EBX BIT(0) bit but I'm not even sure we analyze
source's CPUID data upon migration).

>  This is no different than QEMU not including nested TSC_SCALING in any of
> the predefined models; the developers _know_ KVM doesn't widely support TSC_SCALING,
> so it was omitted, even though a real CLX CPU is guaranteed to support TSC_SCALING.
>

Out of curiosity, what happens if someone sends the following patch to
QEMU:

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a599b..2278f4522b44 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3191,6 +3191,12 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                   { "vmx-xsaves", "on" },
                   { /* end of list */ }
               },
+            { .version = 6,
+              .note = "ARCH_CAPABILITIES, EPT switching, XSAVES, no TSX, TSC_SCALING",
+              .props = (PropValue[]) {
+                  { "vmx-tsc-scaling", "on" },
+                  { /* end of list */ }
+              },
             },
             { /* end of list */ }
         }

Will Paolo remember about eVMCS and reject it?

> It's non-trivial maintenance for KVM because it would require defining new versions
> every time an eVMCS field is added, allowing userspace to specify and restrict
> features based on arbitrary versions, and do all of that without conflicting with
> whatever PV enumeration Microsoft adds.

The update at hand comes with a feature bit so no mater what we do, we
will need a new QEMU flag to support this feature bit. My suggestion was
just that we stretch its definition a bit and encode not only
PERF_GLOBAL_CTRL but all fields which were added. At the same time we
can switch to filtering host reads and failing host writes for what's
missing (and to do so we'll likely need to invert the logic and
explicitly list what eVMCS supports) so we're better prepared to the
next update.

-- 
Vitaly


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
  2022-08-23 16:54                         ` Vitaly Kuznetsov
@ 2022-08-23 20:16                           ` Sean Christopherson
  0 siblings, 0 replies; 59+ messages in thread
From: Sean Christopherson @ 2022-08-23 20:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Anirudh Rayabharam, Wanpeng Li, Jim Mattson,
	Maxim Levitsky, Nathan Chancellor, Michael Kelley, linux-hyperv,
	linux-kernel

On Tue, Aug 23, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > We're talking about nested VMX, i.e. exposing TSC_SCALING to L1.  QEMU's CLX
> > definition doesn't include TSC_SCALING.  In fact, none of QEMU's predefined CPU
> > models supports TSC_SCALING, precisely because KVM didn't support exposing the
> > feature to L1 until relatively recently.
> >
> > $ git grep VMX_SECONDARY_EXEC_TSC_SCALING
> > target/i386/cpu.h:#define VMX_SECONDARY_EXEC_TSC_SCALING              0x02000000
> > target/i386/kvm/kvm.c:    if (f[FEAT_VMX_SECONDARY_CTLS] &  VMX_SECONDARY_EXEC_TSC_SCALING) {
> 
> (sorry for my persistence but I still believe there are issues which we
> won't be able to solve if we take the suggested approach).
> 
> You got me. Indeed, "vmx-tsc-scaling" feature is indeed not set for
> named CPU models so my example was flawed. Let's swap it with
> VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL /
> VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL which a bunch of named models
> have. So I do the same,
> 
> '-cpu CascadeLake-Sever,hv-evmcs'
> 
> on both the source host which knows about these eVMCS fields and the
> destination host which doesn't.
 
> First problem: CPUID. On the source host, we will have
> CPUID.0x4000000A.EBX BIT(0) = 1, and "=0" on the destination. I don't
> think we migrate CPUID data (can be wrong, though).

Huh?  Why would the source have CPUID.0x4000000A.EBX.BIT(0) = 1?  If QEMU is
automatically parroting all KVM-supported Hyper-V features back into KVM via
KVM_SET_CPUID2 _and_ expects the resulting VM to be migratable, then that's a
QEMU bug.

The CPUID bits that matter _have_ to be "migrated", in the sense that the source
and destination absolutely must have compatible CPUID definitions.  The Linux kernel
does not support refreshing CPUID, where as userspace might depending on when the
userspace application starts up[*].  Dropping or adding CPUID bits across migration
is all but guaranteed to cause breakage, e.g. drop the PCID bit and KVM will start
injection #GPs on the destination.

[*] https://lore.kernel.org/lkml/Yvn5BNXfOm3uA7WA@worktop.programming.kicks-ass.net

> Second, assuming VMX feature MSRs are actually migrated, we must fail on

VMX feature MSRs are basically CPL-only CPUID leafs, i.e. they too must be "migrated",
where migrated can be an actual save/restore or QEMU ensuring that the destination
ends up with the same configuration as the source.

> the destnation because VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL is
> trying to get set. We can do this in KVM but note: currently, KVM
> filters guest reads but not host's so when you're trying to migrate from
> a non-fixed KVM, VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL are
> actually present! So how do we distinguinsh in KVM between these two
> cases, i.e. how do we know if
> VMX_VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL were filtered out on the
> source (old kvm) or not (new KVM)?

PERF_GLOBAL_CTRL is "solved" because Microsoft provided a CPUID bit.  First, fix
KVM to filter KVM_GET_MSRS when eVMCS is enabled.  Then, expose PERF_GLOBAL_CTRL
to the guest if and only if the new CPUID bit is set.

That guarantees that userspace has to explicitly enable exposure of the fields.
And again, if QEMU is blindly reflecting Hyper-V CPUID leafs, that's a QEMU bug.

But peeking ahead, I think we're in violent agreement on these points.

> > Because it's completely unnecessary, adds non-trivial maintenance burden to KVM,
> > and requires explicit documentation to explain to userspace what "hv-evmcs-2022"
> > means.
> >
> > It's unnecessary because if the user is concerned about eVMCS features showing up
> > in the future, then they should do:
> >
> >   -cpu CascadeLake-Server,hv-evmcs,-vmx-tsc-scaling,-<any other VMX features not eVMCS-friendly>
> >
> > If QEMU wants to make that more user friendly, then define CascadeLake-Server-eVMCS
> > or whatever so that the features that are unlikely be supported for eVMCS are off by
> > default.
> 
> I completely agree that what I'm trying to achieve here could've been
> done in QEMU from day 1 but we now have what we have: KVM silently
> filtering out certain VMX features and zero indication to userspace
> VMM whether filtering is being done or not (besides this
> CPUID.0x4000000A.EBX BIT(0) bit but I'm not even sure we analyze
> source's CPUID data upon migration).
>
> >  This is no different than QEMU not including nested TSC_SCALING in any of
> > the predefined models; the developers _know_ KVM doesn't widely support TSC_SCALING,
> > so it was omitted, even though a real CLX CPU is guaranteed to support TSC_SCALING.
> >
> 
> Out of curiosity, what happens if someone sends the following patch to
> QEMU:
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 1db1278a599b..2278f4522b44 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3191,6 +3191,12 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>                    { "vmx-xsaves", "on" },
>                    { /* end of list */ }
>                },
> +            { .version = 6,
> +              .note = "ARCH_CAPABILITIES, EPT switching, XSAVES, no TSX, TSC_SCALING",
> +              .props = (PropValue[]) {
> +                  { "vmx-tsc-scaling", "on" },
> +                  { /* end of list */ }
> +              },
>              },
>              { /* end of list */ }
>          }
> 
> Will Paolo remember about eVMCS and reject it?

Ah, I see.  If QEMU adds vmx-tsc-scaling in the future, then creating a VM will
not fail as it should if QEMU runs with an older KVM that silently hides
TSC_SCALING.

Argh.  There's another problem.  KVM will break userspace if KVM starts enforcing
writes to VMX MSRs.  This isn't solvable without new uAPI.  We can handle
PERF_GLOBAL_CTRL and TSC_SCALING by enabling the enforcement after they're no
longer marked unsupported, but that doesn't address all the other controls that
are unsupported.  E.g. PML is in many of QEMU's named CPU models but is unsupported
when eVMCS is enabled.

This might end up looking at lot like your "versioning" approach, except that there
will be exactly two versions: legacy and enforced (or whatever we want to call 'em).

I suspect this may force QEMU to have eVMCS-specific named CPU models.  I don't see
any way around that, "CascadeLake-Server,hv-evmcs" really ends up being a wildly
different vCPU than vanilla "CascadeLake-Server".

> > It's non-trivial maintenance for KVM because it would require defining new versions
> > every time an eVMCS field is added, allowing userspace to specify and restrict
> > features based on arbitrary versions, and do all of that without conflicting with
> > whatever PV enumeration Microsoft adds.
> 
> The update at hand comes with a feature bit so no mater what we do, we
> will need a new QEMU flag to support this feature bit. My suggestion was
> just that we stretch its definition a bit and encode not only
> PERF_GLOBAL_CTRL but all fields which were added.

I really don't think KVM should take liberties with others' "architectural" CPUID
bits.  IMO, redefining Hyper-V's CPUID bits is no different than redefining Intel
or AMD's CPUID bits.

I'm pretty sure it's a moot point though, because we can't gate userspace behavior
on guest CPUID.

> At the same time we can switch to filtering host reads and failing host
> writes for what's missing (and to do so we'll likely need to invert the logic
> and explicitly list what eVMCS supports) so we're better prepared to the next update.

Yep.  Inverting the logic may not be strictly necessary, i.e. might be able to go
on top, but it definitely should be done sooner than later.

As above, we also have to snapshot the "legacy" controls and restrict the guest to
the legacy controls when KVM is _not_ enforcing userspace accesses.

Let me package up what I have so far, do some (very) light testing, and post it as
RFC so that we can make this less theoretical, and so that I can hand things back
off to you.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2022-08-23 20:36 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
2022-08-18 15:14   ` Sean Christopherson
2022-08-18 15:20     ` Vitaly Kuznetsov
2022-08-18 15:49       ` Sean Christopherson
2022-08-18 15:59         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
2022-08-18 15:21   ` Sean Christopherson
2022-08-18 15:29     ` Vitaly Kuznetsov
2022-08-18 17:57       ` Sean Christopherson
2022-08-22  9:18         ` Vitaly Kuznetsov
2022-08-22 15:55           ` Sean Christopherson
2022-08-22 16:21             ` Vitaly Kuznetsov
2022-08-22 17:01               ` Sean Christopherson
2022-08-22 17:46                 ` Vitaly Kuznetsov
2022-08-22 18:32                   ` Sean Christopherson
2022-08-23  7:33                     ` Vitaly Kuznetsov
2022-08-23 15:00                       ` Sean Christopherson
2022-08-23 15:31                         ` Sean Christopherson
2022-08-23 16:54                         ` Vitaly Kuznetsov
2022-08-23 20:16                           ` Sean Christopherson
2022-08-22 16:13           ` Sean Christopherson
2022-08-22 16:24             ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1 Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
2022-08-18 17:15   ` Sean Christopherson
2022-08-19  8:06     ` Vitaly Kuznetsov
2022-08-19 17:02       ` Sean Christopherson
2022-08-22  8:47         ` Vitaly Kuznetsov
2022-08-22 16:50           ` Sean Christopherson
2022-08-22 17:49             ` Vitaly Kuznetsov
2022-08-18 17:19   ` Sean Christopherson
2022-08-19  7:42     ` Vitaly Kuznetsov
2022-08-19 14:49       ` Sean Christopherson
2022-08-19 15:07         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-18 17:49   ` Sean Christopherson
2022-08-19  7:48     ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).