linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS
@ 2021-12-14 14:38 Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 1/5] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Vitaly Kuznetsov
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Windows 11 with enabled Hyper-V role doesn't boot on KVM when Enlightened
VMCS interface is provided to it. The observed behavior doesn't conform to
Hyper-V TLFS. In particular, I'm observing 'VMREAD' instructions trying to
access field 0x4404 ("VM-exit interruption information"). TLFS, however, is
very clear this should not be happening:

"Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is
unsupported and can result in unexpected behavior."

Microsoft confirms this is a bug in Hyper-V which is supposed to get fixed
eventually. For the time being, implement a workaround in KVM allowing 
VMREAD instructions to read from the currently loaded Enlightened VMCS.

Patches 1-2 are unrelated fixes to VMX feature MSR filtering when eVMCS is
enabled. Patches 3 and 4 are preparatory changes, patch 5 implements the
workaround.

Vitaly Kuznetsov (5):
  KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  KVM: nVMX: Rename vmcs_to_field_offset{,_table} to
    vmcs12_field_offset{,_table}
  KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use

 arch/x86/kvm/vmx/evmcs.c  |  4 ++--
 arch/x86/kvm/vmx/evmcs.h  | 48 +++++++++++++++++++++++++++++++--------
 arch/x86/kvm/vmx/nested.c | 42 ++++++++++++++++++++++++----------
 arch/x86/kvm/vmx/vmcs12.c |  4 ++--
 arch/x86/kvm/vmx/vmcs12.h |  6 ++---
 5 files changed, 76 insertions(+), 28 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
@ 2021-12-14 14:38 ` Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 2/5] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
on 'true' MSR data.

Note, none of the currently existing Windows/Hyper-V versions are known
to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index ba6f99f584ac..a7ed30d5647a 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -362,6 +362,7 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
 	case MSR_IA32_VMX_PROCBASED_CTLS2:
 		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
 		break;
+	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
 	case MSR_IA32_VMX_PINBASED_CTLS:
 		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
 		break;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 1/5] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Vitaly Kuznetsov
@ 2021-12-14 14:38 ` Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table} Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.

Note, none of the currently existing Windows/Hyper-V versions are known
to enable 'save VMX-preemption timer value' when eVMCS is in use, the
change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 16731d2cf231..3a461a32128b 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -59,7 +59,9 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
 	 SECONDARY_EXEC_SHADOW_VMCS |					\
 	 SECONDARY_EXEC_TSC_SCALING |					\
 	 SECONDARY_EXEC_PAUSE_LOOP_EXITING)
-#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
+#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL					\
+	(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |				\
+	 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
 #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table}
  2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 1/5] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 2/5] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Vitaly Kuznetsov
@ 2021-12-14 14:38 ` Vitaly Kuznetsov
  2022-01-06 23:36   ` Sean Christopherson
  2021-12-14 14:38 ` [PATCH 4/5] KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Vitaly Kuznetsov
  4 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
blob which is not supposed to be accessed directly. In fact,
vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 6 +++---
 arch/x86/kvm/vmx/vmcs12.c | 4 ++--
 arch/x86/kvm/vmx/vmcs12.h | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9c941535f78c..0b990a6914c1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5086,7 +5086,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 	/* Decode instruction info and find the field to read */
 	field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
 
-	offset = vmcs_field_to_offset(field);
+	offset = vmcs12_field_offset(field);
 	if (offset < 0)
 		return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 
@@ -5189,7 +5189,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 
 	field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
 
-	offset = vmcs_field_to_offset(field);
+	offset = vmcs12_field_offset(field);
 	if (offset < 0)
 		return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 
@@ -6435,7 +6435,7 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
 	max_idx = 0;
 	for (i = 0; i < nr_vmcs12_fields; i++) {
 		/* The vmcs12 table is very, very sparsely populated. */
-		if (!vmcs_field_to_offset_table[i])
+		if (!vmcs12_field_offset_table[i])
 			continue;
 
 		idx = vmcs_field_index(VMCS12_IDX_TO_ENC(i));
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index cab6ba7a5005..61db9bc8842f 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -8,7 +8,7 @@
 	FIELD(number, name),						\
 	[ROL16(number##_HIGH, 6)] = VMCS12_OFFSET(name) + sizeof(u32)
 
-const unsigned short vmcs_field_to_offset_table[] = {
+const unsigned short vmcs12_field_offset_table[] = {
 	FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
 	FIELD(POSTED_INTR_NV, posted_intr_nv),
 	FIELD(GUEST_ES_SELECTOR, guest_es_selector),
@@ -151,4 +151,4 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
 };
-const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs_field_to_offset_table);
+const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offset_table);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 2a45f026ee11..13e2bd017538 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -361,10 +361,10 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(guest_pml_index, 996);
 }
 
-extern const unsigned short vmcs_field_to_offset_table[];
+extern const unsigned short vmcs12_field_offset_table[];
 extern const unsigned int nr_vmcs12_fields;
 
-static inline short vmcs_field_to_offset(unsigned long field)
+static inline short vmcs12_field_offset(unsigned long field)
 {
 	unsigned short offset;
 	unsigned int index;
@@ -377,7 +377,7 @@ static inline short vmcs_field_to_offset(unsigned long field)
 		return -ENOENT;
 
 	index = array_index_nospec(index, nr_vmcs12_fields);
-	offset = vmcs_field_to_offset_table[index];
+	offset = vmcs12_field_offset_table[index];
 	if (offset == 0)
 		return -ENOENT;
 	return offset;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2021-12-14 14:38 ` [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table} Vitaly Kuznetsov
@ 2021-12-14 14:38 ` Vitaly Kuznetsov
  2021-12-14 14:38 ` [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Vitaly Kuznetsov
  4 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

In preparation to allowing reads from Enlightened VMCS from
handle_vmread(), implement evmcs_field_offset() to get the correct
read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V,
is almost what's needed but a few things need to be adjusted. First,
WARN_ON() is unacceptable for handle_vmread() as any field can (in
theory) be supplied by the guest and not all fields are defined in
eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields).
It also sounds like a good idea to WARN_ON() if such fields are ever
accessed by KVM-on-Hyper-V.

Implement dedicated evmcs_field_offset() helper.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c |  3 +--
 arch/x86/kvm/vmx/evmcs.h | 32 ++++++++++++++++++++++++--------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index a7ed30d5647a..87e3dc10edf4 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -12,8 +12,6 @@
 
 DEFINE_STATIC_KEY_FALSE(enable_evmcs);
 
-#if IS_ENABLED(CONFIG_HYPERV)
-
 #define EVMCS1_OFFSET(x) offsetof(struct hv_enlightened_vmcs, x)
 #define EVMCS1_FIELD(number, name, clean_field)[ROL16(number, 6)] = \
 		{EVMCS1_OFFSET(name), clean_field}
@@ -296,6 +294,7 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 };
 const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
 
+#if IS_ENABLED(CONFIG_HYPERV)
 __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
 {
 	vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 3a461a32128b..9bc2521b159e 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -65,8 +65,6 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
 #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
 #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
 
-#if IS_ENABLED(CONFIG_HYPERV)
-
 struct evmcs_field {
 	u16 offset;
 	u16 clean_field;
@@ -75,26 +73,44 @@ struct evmcs_field {
 extern const struct evmcs_field vmcs_field_to_evmcs_1[];
 extern const unsigned int nr_evmcs_1_fields;
 
-static __always_inline int get_evmcs_offset(unsigned long field,
-					    u16 *clean_field)
+static __always_inline int evmcs_field_offset(unsigned long field,
+					      u16 *clean_field)
 {
 	unsigned int index = ROL16(field, 6);
 	const struct evmcs_field *evmcs_field;
 
-	if (unlikely(index >= nr_evmcs_1_fields)) {
-		WARN_ONCE(1, "KVM: accessing unsupported EVMCS field %lx\n",
-			  field);
+	if (unlikely(index >= nr_evmcs_1_fields))
 		return -ENOENT;
-	}
 
 	evmcs_field = &vmcs_field_to_evmcs_1[index];
 
+	/*
+	 * Use offset=0 to detect holes in eVMCS. This offset belongs to
+	 * 'revision_id' but this field has no encoding and is supposed to
+	 * be accessed directly.
+	 */
+	if (unlikely(!evmcs_field->offset))
+		return -ENOENT;
+
 	if (clean_field)
 		*clean_field = evmcs_field->clean_field;
 
 	return evmcs_field->offset;
 }
 
+#if IS_ENABLED(CONFIG_HYPERV)
+
+static __always_inline int get_evmcs_offset(unsigned long field,
+					    u16 *clean_field)
+{
+	int offset = evmcs_field_offset(field, clean_field);
+
+	WARN_ONCE(offset < 0, "KVM: accessing unsupported EVMCS field %lx\n",
+		  field);
+
+	return offset;
+}
+
 static __always_inline void evmcs_write64(unsigned long field, u64 value)
 {
 	u16 clean_field;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2021-12-14 14:38 ` [PATCH 4/5] KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() Vitaly Kuznetsov
@ 2021-12-14 14:38 ` Vitaly Kuznetsov
  2022-01-07  0:03   ` Sean Christopherson
  4 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2021-12-14 14:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
Enlightened VMCS interface is in use:

"Any VMREAD or VMWRITE instructions while an enlightened VMCS is
active is unsupported and can result in unexpected behavior.""

Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
0x4404 ("VM-exit interruption information") are observed. Failing
these attempts with nested_vmx_failInvalid() makes such guests
unbootable.

Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
eventually but for the time being we need a workaround. (Temporary) allow
VMREAD to get data from the currently loaded Enlightened VMCS.

Note: VMWRITE instructions remain forbidden, it is not clear how to
handle them properly and hopefully won't ever be needed.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.h  | 12 ++++++++++++
 arch/x86/kvm/vmx/nested.c | 38 ++++++++++++++++++++++++++++----------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 9bc2521b159e..8d70f9aea94b 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -98,6 +98,18 @@ static __always_inline int evmcs_field_offset(unsigned long field,
 	return evmcs_field->offset;
 }
 
+static inline u64 evmcs_read_any(struct hv_enlightened_vmcs *evmcs,
+				 unsigned long field, u16 offset)
+{
+	/*
+	 * vmcs12_read_any() doesn't care whether the supplied structure
+	 * is 'struct vmcs12' or 'struct hv_enlightened_vmcs' as it takes
+	 * the exact offset of the required field, use it for convenience
+	 * here.
+	 */
+	return vmcs12_read_any((void *)evmcs, field, offset);
+}
+
 #if IS_ENABLED(CONFIG_HYPERV)
 
 static __always_inline int get_evmcs_offset(unsigned long field,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0b990a6914c1..27fedb220a23 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7,6 +7,7 @@
 #include <asm/mmu_context.h>
 
 #include "cpuid.h"
+#include "evmcs.h"
 #include "hyperv.h"
 #include "mmu.h"
 #include "nested.h"
@@ -5074,27 +5075,44 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 
+	/* Normal or Enlightened VMPTRLD must be performed first */
+	if (vmx->nested.current_vmptr == INVALID_GPA &&
+	    !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr))
+		return nested_vmx_failInvalid(vcpu);
+
 	/*
 	 * In VMX non-root operation, when the VMCS-link pointer is INVALID_GPA,
 	 * any VMREAD sets the ALU flags for VMfailInvalid.
 	 */
-	if (vmx->nested.current_vmptr == INVALID_GPA ||
-	    (is_guest_mode(vcpu) &&
-	     get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA))
+	if (is_guest_mode(vcpu) &&
+	    get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA)
 		return nested_vmx_failInvalid(vcpu);
 
 	/* Decode instruction info and find the field to read */
 	field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
 
-	offset = vmcs12_field_offset(field);
-	if (offset < 0)
-		return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+	/*
+	 * Inside guest mode, Enlightened VMCS is not the ultimate source of
+	 * truth, shadow VMCS12/VMCS02 are.
+	 */
+	if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr) && !is_guest_mode(vcpu)) {
+		offset = evmcs_field_offset(field, NULL);
+		if (offset < 0)
+			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 
-	if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field))
-		copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
+		/* Read the field, zero-extended to a u64 value */
+		value = evmcs_read_any(vmx->nested.hv_evmcs, field, offset);
+	} else {
+		offset = vmcs12_field_offset(field);
+		if (offset < 0)
+			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 
-	/* Read the field, zero-extended to a u64 value */
-	value = vmcs12_read_any(vmcs12, field, offset);
+		if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field))
+			copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
+
+		/* Read the field, zero-extended to a u64 value */
+		value = vmcs12_read_any(vmcs12, field, offset);
+	}
 
 	/*
 	 * Now copy part of this value to register or memory, as requested.
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table}
  2021-12-14 14:38 ` [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table} Vitaly Kuznetsov
@ 2022-01-06 23:36   ` Sean Christopherson
  2022-01-07  8:45     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-01-06 23:36 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

On Tue, Dec 14, 2021, Vitaly Kuznetsov wrote:
> vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
> blob which is not supposed to be accessed directly. In fact,
> vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.
> 
> No functional change intended.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
> index 2a45f026ee11..13e2bd017538 100644
> --- a/arch/x86/kvm/vmx/vmcs12.h
> +++ b/arch/x86/kvm/vmx/vmcs12.h
> @@ -361,10 +361,10 @@ static inline void vmx_check_vmcs12_offsets(void)
>  	CHECK_OFFSET(guest_pml_index, 996);
>  }
>  
> -extern const unsigned short vmcs_field_to_offset_table[];
> +extern const unsigned short vmcs12_field_offset_table[];

While we're tweaking names, what about dropping "table" and calling this
vmcs12_field_offsets?

>  extern const unsigned int nr_vmcs12_fields;
>  
> -static inline short vmcs_field_to_offset(unsigned long field)
> +static inline short vmcs12_field_offset(unsigned long field)

And get_vmcs12_field_offset() here to make it more obvious that it's translating
something to an offset, which is communicated by the "to" in the current name.

>  {
>  	unsigned short offset;
>  	unsigned int index;
> @@ -377,7 +377,7 @@ static inline short vmcs_field_to_offset(unsigned long field)
>  		return -ENOENT;
>  
>  	index = array_index_nospec(index, nr_vmcs12_fields);
> -	offset = vmcs_field_to_offset_table[index];
> +	offset = vmcs12_field_offset_table[index];
>  	if (offset == 0)
>  		return -ENOENT;
>  	return offset;
> -- 
> 2.33.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  2021-12-14 14:38 ` [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Vitaly Kuznetsov
@ 2022-01-07  0:03   ` Sean Christopherson
  2022-01-07  8:49     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-01-07  0:03 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

On Tue, Dec 14, 2021, Vitaly Kuznetsov wrote:
> Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
> Enlightened VMCS interface is in use:
> 
> "Any VMREAD or VMWRITE instructions while an enlightened VMCS is
> active is unsupported and can result in unexpected behavior.""
> 
> Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
> 0x4404 ("VM-exit interruption information") are observed. Failing
> these attempts with nested_vmx_failInvalid() makes such guests
> unbootable.
> 
> Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
> eventually but for the time being we need a workaround. (Temporary) allow

Temporarily.  And for the record, I highly doubt this will be temporary :-)

> VMREAD to get data from the currently loaded Enlightened VMCS.

...

> @@ -5074,27 +5075,44 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
>  	if (!nested_vmx_check_permission(vcpu))
>  		return 1;
>  
> +	/* Normal or Enlightened VMPTRLD must be performed first */
> +	if (vmx->nested.current_vmptr == INVALID_GPA &&
> +	    !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr))
> +		return nested_vmx_failInvalid(vcpu);

I believe this is wrong, as it allows this combination

	current_vmptr == INVALID_GPA && evmptr_is_valid() && is_guest_mode()

which is eVMCS with VMCS shadowing exposed to L2.  SECONDARY_EXEC_SHADOW_VMCS is
listed in EVMCS1_UNSUPPORTED_2NDEXEC, so it should be impossible for VMCS shadowing
to be enabled for L2.  And if VMCS shadowing is not enabled, all VMREADs cause
exits to L1, i.e. shouldn't reach this point.  If we want to allow that behavior,
then I think that should be a separate change.

Assuming eVMCS really isn't compatible with shadow VMCS, I believe we can do:

	/*
	 * Decode instruction info and find the field to read.  This can be
	 * done speculatively as there are no side effects
	 */
	field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));

	if (!evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
		/*
		 * In VMX non-root operation, when the VMCS-link pointer is
		 * INVALID_GPA, any VMREAD sets the ALU flags for VMfailInvalid.
		 */
		if (vmx->nested.current_vmptr == INVALID_GPA ||
		    (is_guest_mode(vcpu) &&
		     get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA))
			return nested_vmx_failInvalid(vcpu);

		offset = vmcs12_field_offset(field);
		if (offset < 0)
			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);

		if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field))
			copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);

		/* Read the field, zero-extended to a u64 value */
		value = vmcs12_read_any(vmcs12, field, offset);
	} else {
		/*
		 * <snarky comment about Hyper-V>
		 */
		if (WARN_ON_ONCE(is_guest_mode(vcpu))
			return nested_vmx_failInvalid(vcpu);

		offset = evmcs_field_offset(field, NULL);
		if (offset < 0)
			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);

		/* Read the field, zero-extended to a u64 value */
		value = evmcs_read_any(vmx->nested.hv_evmcs, field, offset);
	}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table}
  2022-01-06 23:36   ` Sean Christopherson
@ 2022-01-07  8:45     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-01-07  8:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Dec 14, 2021, Vitaly Kuznetsov wrote:
>> vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
>> blob which is not supposed to be accessed directly. In fact,
>> vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.
>> 
>> No functional change intended.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
>> index 2a45f026ee11..13e2bd017538 100644
>> --- a/arch/x86/kvm/vmx/vmcs12.h
>> +++ b/arch/x86/kvm/vmx/vmcs12.h
>> @@ -361,10 +361,10 @@ static inline void vmx_check_vmcs12_offsets(void)
>>  	CHECK_OFFSET(guest_pml_index, 996);
>>  }
>>  
>> -extern const unsigned short vmcs_field_to_offset_table[];
>> +extern const unsigned short vmcs12_field_offset_table[];
>
> While we're tweaking names, what about dropping "table" and calling this
> vmcs12_field_offsets?
>

Ok.

>>  extern const unsigned int nr_vmcs12_fields;
>>  
>> -static inline short vmcs_field_to_offset(unsigned long field)
>> +static inline short vmcs12_field_offset(unsigned long field)
>
> And get_vmcs12_field_offset() here to make it more obvious that it's translating
> something to an offset, which is communicated by the "to" in the current name.
>

I think we could've even used just 'vmcs12_field_offset()' as I don't
see any ambiguity in it but 4 additional letters shouldn't hurt.

>>  {
>>  	unsigned short offset;
>>  	unsigned int index;
>> @@ -377,7 +377,7 @@ static inline short vmcs_field_to_offset(unsigned long field)
>>  		return -ENOENT;
>>  
>>  	index = array_index_nospec(index, nr_vmcs12_fields);
>> -	offset = vmcs_field_to_offset_table[index];
>> +	offset = vmcs12_field_offset_table[index];
>>  	if (offset == 0)
>>  		return -ENOENT;
>>  	return offset;
>> -- 
>> 2.33.1
>> 
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  2022-01-07  0:03   ` Sean Christopherson
@ 2022-01-07  8:49     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-01-07  8:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Maxim Levitsky,
	linux-kernel

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Dec 14, 2021, Vitaly Kuznetsov wrote:
>> Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
>> Enlightened VMCS interface is in use:
>> 
>> "Any VMREAD or VMWRITE instructions while an enlightened VMCS is
>> active is unsupported and can result in unexpected behavior.""
>> 
>> Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
>> 0x4404 ("VM-exit interruption information") are observed. Failing
>> these attempts with nested_vmx_failInvalid() makes such guests
>> unbootable.
>> 
>> Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
>> eventually but for the time being we need a workaround. (Temporary) allow
>
> Temporarily.  And for the record, I highly doubt this will be temporary :-)
>

I'm just trying to be optimistic, at least in commit messages, you know :-)

>> VMREAD to get data from the currently loaded Enlightened VMCS.
>
> ...
>
>> @@ -5074,27 +5075,44 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
>>  	if (!nested_vmx_check_permission(vcpu))
>>  		return 1;
>>  
>> +	/* Normal or Enlightened VMPTRLD must be performed first */
>> +	if (vmx->nested.current_vmptr == INVALID_GPA &&
>> +	    !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr))
>> +		return nested_vmx_failInvalid(vcpu);
>
> I believe this is wrong, as it allows this combination
>
> 	current_vmptr == INVALID_GPA && evmptr_is_valid() && is_guest_mode()
>
> which is eVMCS with VMCS shadowing exposed to L2.  SECONDARY_EXEC_SHADOW_VMCS is
> listed in EVMCS1_UNSUPPORTED_2NDEXEC, so it should be impossible for VMCS shadowing
> to be enabled for L2.  And if VMCS shadowing is not enabled, all VMREADs cause
> exits to L1, i.e. shouldn't reach this point.  If we want to allow that behavior,
> then I think that should be a separate change.

I think you're right, there's no need to allow for that. I'll use your
suggestion from below, thanks!

>
> Assuming eVMCS really isn't compatible with shadow VMCS, I believe we can do:
>
> 	/*
> 	 * Decode instruction info and find the field to read.  This can be
> 	 * done speculatively as there are no side effects
> 	 */
> 	field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
>
> 	if (!evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
> 		/*
> 		 * In VMX non-root operation, when the VMCS-link pointer is
> 		 * INVALID_GPA, any VMREAD sets the ALU flags for VMfailInvalid.
> 		 */
> 		if (vmx->nested.current_vmptr == INVALID_GPA ||
> 		    (is_guest_mode(vcpu) &&
> 		     get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA))
> 			return nested_vmx_failInvalid(vcpu);
>
> 		offset = vmcs12_field_offset(field);
> 		if (offset < 0)
> 			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
>
> 		if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field))
> 			copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
>
> 		/* Read the field, zero-extended to a u64 value */
> 		value = vmcs12_read_any(vmcs12, field, offset);
> 	} else {
> 		/*
> 		 * <snarky comment about Hyper-V>
> 		 */
> 		if (WARN_ON_ONCE(is_guest_mode(vcpu))
> 			return nested_vmx_failInvalid(vcpu);
>
> 		offset = evmcs_field_offset(field, NULL);
> 		if (offset < 0)
> 			return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
>
> 		/* Read the field, zero-extended to a u64 value */
> 		value = evmcs_read_any(vmx->nested.hv_evmcs, field, offset);
> 	}
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-01-07  8:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 14:38 [PATCH 0/5] KVM: nVMX: Fix Windows 11 + WSL2 + Enlightened VMCS Vitaly Kuznetsov
2021-12-14 14:38 ` [PATCH 1/5] KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Vitaly Kuznetsov
2021-12-14 14:38 ` [PATCH 2/5] KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Vitaly Kuznetsov
2021-12-14 14:38 ` [PATCH 3/5] KVM: nVMX: Rename vmcs_to_field_offset{,_table} to vmcs12_field_offset{,_table} Vitaly Kuznetsov
2022-01-06 23:36   ` Sean Christopherson
2022-01-07  8:45     ` Vitaly Kuznetsov
2021-12-14 14:38 ` [PATCH 4/5] KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() Vitaly Kuznetsov
2021-12-14 14:38 ` [PATCH 5/5] KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Vitaly Kuznetsov
2022-01-07  0:03   ` Sean Christopherson
2022-01-07  8:49     ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).