KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH RFC 0/3] x86/kvm/hyper-v: fix enlightened VMCS & QEMU4.2
@ 2020-01-15 17:10 Vitaly Kuznetsov
  2020-01-15 17:10 ` [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs() Vitaly Kuznetsov
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-15 17:10 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Jim Mattson, linux-kernel,
	Liran Alon, Roman Kagan

With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails. While the regression is in QEMU, it may still be preferable to
fix this in KVM.

It would be great if we could just omit the VMX feature filtering in KVM
and make this guest's responsibility: if it switches to using enlightened
vmcs it should be aware that not all hardware features are going to be
supported. Genuine Hyper-V, however, fails the test. In particular, it
enables SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES and without
'apic_access_addr' field in eVMCS there's not much we can do in KVM.

The suggested approach in this patch series is: move VMX feature
filtering to vmx_get_msr() so only guest doesn't see them when eVMCS
is enabled (PATCH2) and add a check that it doesn't enable them
(PATCH3).

I can't say that I'm a great fan of this workaround myself, thus RFC.

My initial RFC sent to qemu-devel@:
https://lists.nongnu.org/archive/html/qemu-devel/2020-01/msg00123.html

Vitaly Kuznetsov (3):
  x86/kvm/hyper-v: remove stale evmcs_already_enabled check from
    nested_enable_evmcs()
  x86/kvm/hyper-v: move VMX controls sanitization out of
    nested_enable_evmcs()
  x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for
    nested guests

 arch/x86/kvm/vmx/evmcs.c  | 99 ++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/vmx/evmcs.h  |  2 +
 arch/x86/kvm/vmx/nested.c |  3 ++
 arch/x86/kvm/vmx/vmx.c    | 10 +++-
 4 files changed, 100 insertions(+), 14 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs()
  2020-01-15 17:10 [PATCH RFC 0/3] x86/kvm/hyper-v: fix enlightened VMCS & QEMU4.2 Vitaly Kuznetsov
@ 2020-01-15 17:10 ` Vitaly Kuznetsov
  2020-01-15 22:50   ` Liran Alon
  2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
  2020-01-15 17:10 ` [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests Vitaly Kuznetsov
  2 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-15 17:10 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Jim Mattson, linux-kernel,
	Liran Alon, Roman Kagan

In nested_enable_evmcs() evmcs_already_enabled check doesn't really do
anything: controls are already sanitized and we return '0' regardless.
Just drop the check.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 72359709cdc1..89c3e0caf39f 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -350,17 +350,12 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	bool evmcs_already_enabled = vmx->nested.enlightened_vmcs_enabled;
 
 	vmx->nested.enlightened_vmcs_enabled = true;
 
 	if (vmcs_version)
 		*vmcs_version = nested_get_evmcs_version(vcpu);
 
-	/* We don't support disabling the feature for simplicity. */
-	if (evmcs_already_enabled)
-		return 0;
-
 	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
 	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
 	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 17:10 [PATCH RFC 0/3] x86/kvm/hyper-v: fix enlightened VMCS & QEMU4.2 Vitaly Kuznetsov
  2020-01-15 17:10 ` [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs() Vitaly Kuznetsov
@ 2020-01-15 17:10 ` Vitaly Kuznetsov
  2020-01-15 22:49   ` Liran Alon
                     ` (2 more replies)
  2020-01-15 17:10 ` [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests Vitaly Kuznetsov
  2 siblings, 3 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-15 17:10 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Jim Mattson, linux-kernel,
	Liran Alon, Roman Kagan

With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.

It would be possible to drop VMX feature filtering completely and make
this a guest's responsibility: if it decides to use eVMCS it should know
which fields are available and which are not. Hyper-V mostly complies to
this, however, there is at least one problematic control:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
fail to handle this properly in KVM. It is unclear how this is supposed
to work, genuine Hyper-V doesn't expose the control so it is possible that
this is just a bug (in Hyper-V).

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
this allows userspace to keep setting controls it wants and at the same
time hides them from the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c | 38 ++++++++++++++++++++++++++++++++------
 arch/x86/kvm/vmx/evmcs.h |  1 +
 arch/x86/kvm/vmx/vmx.c   | 10 ++++++++--
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 89c3e0caf39f..b5d6582ba589 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -346,6 +346,38 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
        return 0;
 }
 
+void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
+{
+	u32 ctl_low = (u32)*pdata, ctl_high = (u32)(*pdata >> 32);
+	/*
+	 * Enlightened VMCS doesn't have certain fields, make sure we don't
+	 * expose unsupported controls to L1.
+	 */
+
+	switch (msr_index) {
+	case MSR_IA32_VMX_PINBASED_CTLS:
+	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+		break;
+	case MSR_IA32_VMX_EXIT_CTLS:
+	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+		ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+		break;
+	case MSR_IA32_VMX_ENTRY_CTLS:
+	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+		ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+		break;
+	case MSR_IA32_VMX_PROCBASED_CTLS2:
+		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+		break;
+	case MSR_IA32_VMX_VMFUNC:
+		ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
+		break;
+	}
+
+	*pdata = ctl_low | ((u64)ctl_high << 32);
+}
+
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version)
 {
@@ -356,11 +388,5 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 	if (vmcs_version)
 		*vmcs_version = nested_get_evmcs_version(vcpu);
 
-	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
-	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
-	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
-	vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
-	vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
-
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 07ebf6882a45..b88d9807a796 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
 uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
+void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..8eb74618b8d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
-		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
-				       &msr_info->data);
+		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
+				    &msr_info->data))
+			return 1;
+		if (!msr_info->host_initiated &&
+		    vmx->nested.enlightened_vmcs_enabled)
+			nested_evmcs_filter_control_msr(msr_info->index,
+							&msr_info->data);
+		break;
 	case MSR_IA32_RTIT_CTL:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  2020-01-15 17:10 [PATCH RFC 0/3] x86/kvm/hyper-v: fix enlightened VMCS & QEMU4.2 Vitaly Kuznetsov
  2020-01-15 17:10 ` [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs() Vitaly Kuznetsov
  2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
@ 2020-01-15 17:10 ` Vitaly Kuznetsov
  2020-01-15 22:59   ` Liran Alon
  2 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-15 17:10 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Jim Mattson, linux-kernel,
	Liran Alon, Roman Kagan

Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
controls on for its guests and nested_vmx_check_controls() checks for
that. This is, however, not the case for the controls which are supported
on the host but are missing in enlightened VMCS and when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
non-eVMCS guests by doing less unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx/evmcs.c  | 56 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/evmcs.h  |  1 +
 arch/x86/kvm/vmx/nested.c |  3 +++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index b5d6582ba589..88f462866396 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -4,9 +4,11 @@
 #include <linux/smp.h>
 
 #include "../hyperv.h"
-#include "evmcs.h"
 #include "vmcs.h"
+#include "vmcs12.h"
+#include "evmcs.h"
 #include "vmx.h"
+#include "trace.h"
 
 DEFINE_STATIC_KEY_FALSE(enable_evmcs);
 
@@ -378,6 +380,58 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
 	*pdata = ctl_low | ((u64)ctl_high << 32);
 }
 
+int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
+{
+	int ret = 0;
+	u32 unsupp_ctl;
+
+	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
+		EVMCS1_UNSUPPORTED_PINCTRL;
+	if (unsupp_ctl) {
+		trace_kvm_nested_vmenter_failed(
+			"eVMCS: unsupported pin-based VM-execution controls",
+			unsupp_ctl);
+		ret = -EINVAL;
+	}
+
+	unsupp_ctl = vmcs12->secondary_vm_exec_control &
+		EVMCS1_UNSUPPORTED_2NDEXEC;
+	if (unsupp_ctl) {
+		trace_kvm_nested_vmenter_failed(
+			"eVMCS: unsupported secondary VM-execution controls",
+			unsupp_ctl);
+		ret = -EINVAL;
+	}
+
+	unsupp_ctl = vmcs12->vm_exit_controls &
+		EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+	if (unsupp_ctl) {
+		trace_kvm_nested_vmenter_failed(
+			"eVMCS: unsupported VM-exit controls",
+			unsupp_ctl);
+		ret = -EINVAL;
+	}
+
+	unsupp_ctl = vmcs12->vm_entry_controls &
+		EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+	if (unsupp_ctl) {
+		trace_kvm_nested_vmenter_failed(
+			"eVMCS: unsupported VM-entry controls",
+			unsupp_ctl);
+		ret = -EINVAL;
+	}
+
+	unsupp_ctl = vmcs12->vm_function_control & EVMCS1_UNSUPPORTED_VMFUNC;
+	if (unsupp_ctl) {
+		trace_kvm_nested_vmenter_failed(
+			"eVMCS: unsupported VM-function controls",
+			unsupp_ctl);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version)
 {
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index b88d9807a796..cb7517a5a41c 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -202,5 +202,6 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
 void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
+int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4aea7d304beb..7c720b095663 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2767,6 +2767,9 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
 	    nested_check_vm_entry_controls(vcpu, vmcs12))
 		return -EINVAL;
 
+	if (to_vmx(vcpu)->nested.enlightened_vmcs_enabled)
+		return nested_evmcs_check_controls(vmcs12);
+
 	return 0;
 }
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
@ 2020-01-15 22:49   ` Liran Alon
  2020-01-16  8:37     ` Vitaly Kuznetsov
  2020-01-15 23:27   ` Sean Christopherson
  2020-01-19  8:54   ` Paolo Bonzini
  2 siblings, 1 reply; 34+ messages in thread
From: Liran Alon @ 2020-01-15 22:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm list, Paolo Bonzini, Sean Christopherson, Jim Mattson, LKML,
	Roman Kagan



> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
> with default (matching CPU model) values and in case eVMCS is also enabled,
> fails.
> 
> It would be possible to drop VMX feature filtering completely and make
> this a guest's responsibility: if it decides to use eVMCS it should know
> which fields are available and which are not. Hyper-V mostly complies to
> this, however, there is at least one problematic control:
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
> which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
> fail to handle this properly in KVM. It is unclear how this is supposed
> to work, genuine Hyper-V doesn't expose the control so it is possible that
> this is just a bug (in Hyper-V).

Have you tried contacted someone at Hyper-V team about this?

> 
> Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
> this allows userspace to keep setting controls it wants and at the same
> time hides them from the guest.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> arch/x86/kvm/vmx/evmcs.c | 38 ++++++++++++++++++++++++++++++++------
> arch/x86/kvm/vmx/evmcs.h |  1 +
> arch/x86/kvm/vmx/vmx.c   | 10 ++++++++--
> 3 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 89c3e0caf39f..b5d6582ba589 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -346,6 +346,38 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>        return 0;
> }
> 
> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> +{
> +	u32 ctl_low = (u32)*pdata, ctl_high = (u32)(*pdata >> 32);

Nit: I dislike defining & initialising multiple local vars on same line.

> +	/*
> +	 * Enlightened VMCS doesn't have certain fields, make sure we don't
> +	 * expose unsupported controls to L1.
> +	 */
> +
> +	switch (msr_index) {
> +	case MSR_IA32_VMX_PINBASED_CTLS:
> +	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> +		break;
> +	case MSR_IA32_VMX_EXIT_CTLS:
> +	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> +		break;
> +	case MSR_IA32_VMX_ENTRY_CTLS:
> +	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> +		break;
> +	case MSR_IA32_VMX_PROCBASED_CTLS2:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> +		break;
> +	case MSR_IA32_VMX_VMFUNC:
> +		ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
> +		break;
> +	}
> +
> +	*pdata = ctl_low | ((u64)ctl_high << 32);
> +}
> +
> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> 			uint16_t *vmcs_version)
> {
> @@ -356,11 +388,5 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> 	if (vmcs_version)
> 		*vmcs_version = nested_get_evmcs_version(vcpu);
> 
> -	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> -	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> -	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> -	vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> -	vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
> -
> 	return 0;
> }
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 07ebf6882a45..b88d9807a796 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> 			uint16_t *vmcs_version);
> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
> 
> #endif /* __KVM_X86_VMX_EVMCS_H */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e3394c839dea..8eb74618b8d8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
> 		if (!nested_vmx_allowed(vcpu))
> 			return 1;
> -		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
> -				       &msr_info->data);
> +		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
> +				    &msr_info->data))
> +			return 1;
> +		if (!msr_info->host_initiated &&
> +		    vmx->nested.enlightened_vmcs_enabled)
> +			nested_evmcs_filter_control_msr(msr_info->index,
> +							&msr_info->data);
> +		break;

Nit: It seems more elegant to me to put the call to nested_evmcs_filter_control_msr() inside vmx_get_vmx_msr().

The patch itself makes sense to me and looks correct.
Reviewed-by: Liran Alon <liran.alon@oracle.com>

-Liran


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs()
  2020-01-15 17:10 ` [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs() Vitaly Kuznetsov
@ 2020-01-15 22:50   ` Liran Alon
  0 siblings, 0 replies; 34+ messages in thread
From: Liran Alon @ 2020-01-15 22:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Jim Mattson,
	linux-kernel, Roman Kagan



> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> In nested_enable_evmcs() evmcs_already_enabled check doesn't really do
> anything: controls are already sanitized and we return '0' regardless.
> Just drop the check.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Reviewed-by: Liran Alon <liran.alon@oracle.com>

-Liran

> ---
> arch/x86/kvm/vmx/evmcs.c | 5 -----
> 1 file changed, 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 72359709cdc1..89c3e0caf39f 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -350,17 +350,12 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> 			uint16_t *vmcs_version)
> {
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	bool evmcs_already_enabled = vmx->nested.enlightened_vmcs_enabled;
> 
> 	vmx->nested.enlightened_vmcs_enabled = true;
> 
> 	if (vmcs_version)
> 		*vmcs_version = nested_get_evmcs_version(vcpu);
> 
> -	/* We don't support disabling the feature for simplicity. */
> -	if (evmcs_already_enabled)
> -		return 0;
> -
> 	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> 	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> 	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> -- 
> 2.24.1
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  2020-01-15 17:10 ` [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests Vitaly Kuznetsov
@ 2020-01-15 22:59   ` Liran Alon
  2020-01-16  8:55     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Liran Alon @ 2020-01-15 22:59 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Jim Mattson,
	linux-kernel, Roman Kagan



> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
> controls on for its guests and nested_vmx_check_controls() checks for
> that. This is, however, not the case for the controls which are supported
> on the host but are missing in enlightened VMCS and when eVMCS is in use.
> 
> It would certainly be possible to add these missing checks to
> nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
> preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
> non-eVMCS guests by doing less unrelated checks. Create a separate
> nested_evmcs_check_controls() for this purpose.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> arch/x86/kvm/vmx/evmcs.c  | 56 ++++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/evmcs.h  |  1 +
> arch/x86/kvm/vmx/nested.c |  3 +++
> 3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index b5d6582ba589..88f462866396 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -4,9 +4,11 @@
> #include <linux/smp.h>
> 
> #include "../hyperv.h"
> -#include "evmcs.h"
> #include "vmcs.h"
> +#include "vmcs12.h"
> +#include "evmcs.h"
> #include "vmx.h"
> +#include "trace.h"
> 
> DEFINE_STATIC_KEY_FALSE(enable_evmcs);
> 
> @@ -378,6 +380,58 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> 	*pdata = ctl_low | ((u64)ctl_high << 32);
> }
> 
> +int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> +{
> +	int ret = 0;
> +	u32 unsupp_ctl;
> +
> +	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
> +		EVMCS1_UNSUPPORTED_PINCTRL;
> +	if (unsupp_ctl) {
> +		trace_kvm_nested_vmenter_failed(
> +			"eVMCS: unsupported pin-based VM-execution controls",
> +			unsupp_ctl);

Why not move "CC” macro from nested.c to nested.h and use it here as-well instead of replicating it’s logic?

-Liran


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
  2020-01-15 22:49   ` Liran Alon
@ 2020-01-15 23:27   ` Sean Christopherson
  2020-01-15 23:30     ` Liran Alon
  2020-01-19  8:54   ` Paolo Bonzini
  2 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2020-01-15 23:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On Wed, Jan 15, 2020 at 06:10:13PM +0100, Vitaly Kuznetsov wrote:
> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
> with default (matching CPU model) values and in case eVMCS is also enabled,
> fails.

As in, Qemu is blindly throwing values at KVM and complains on failure?
That seems like a Qemu bug, especially since Qemu needs to explicitly do
KVM_CAP_HYPERV_ENLIGHTENED_VMCS to enable eVMCS.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 23:27   ` Sean Christopherson
@ 2020-01-15 23:30     ` Liran Alon
  2020-01-16  8:51       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Liran Alon @ 2020-01-15 23:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, kvm, Paolo Bonzini, Jim Mattson, linux-kernel,
	Roman Kagan



> On 16 Jan 2020, at 1:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Wed, Jan 15, 2020 at 06:10:13PM +0100, Vitaly Kuznetsov wrote:
>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
>> with default (matching CPU model) values and in case eVMCS is also enabled,
>> fails.
> 
> As in, Qemu is blindly throwing values at KVM and complains on failure?
> That seems like a Qemu bug, especially since Qemu needs to explicitly do
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS to enable eVMCS.

See: https://patchwork.kernel.org/patch/11316021/
For more context.

-Liran

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 22:49   ` Liran Alon
@ 2020-01-16  8:37     ` Vitaly Kuznetsov
  2020-02-03 15:11       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-16  8:37 UTC (permalink / raw)
  To: Liran Alon
  Cc: kvm list, Paolo Bonzini, Sean Christopherson, Jim Mattson, LKML,
	Roman Kagan

Liran Alon <liran.alon@oracle.com> writes:

>> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> 
>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
>> with default (matching CPU model) values and in case eVMCS is also enabled,
>> fails.
>> 
>> It would be possible to drop VMX feature filtering completely and make
>> this a guest's responsibility: if it decides to use eVMCS it should know
>> which fields are available and which are not. Hyper-V mostly complies to
>> this, however, there is at least one problematic control:
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
>> which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
>> fail to handle this properly in KVM. It is unclear how this is supposed
>> to work, genuine Hyper-V doesn't expose the control so it is possible that
>> this is just a bug (in Hyper-V).
>
> Have you tried contacted someone at Hyper-V team about this?
>

Yes, I have.

>> 
>> Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
>> this allows userspace to keep setting controls it wants and at the same
>> time hides them from the guest.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> arch/x86/kvm/vmx/evmcs.c | 38 ++++++++++++++++++++++++++++++++------
>> arch/x86/kvm/vmx/evmcs.h |  1 +
>> arch/x86/kvm/vmx/vmx.c   | 10 ++++++++--
>> 3 files changed, 41 insertions(+), 8 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 89c3e0caf39f..b5d6582ba589 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -346,6 +346,38 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>>        return 0;
>> }
>> 
>> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>> +{
>> +	u32 ctl_low = (u32)*pdata, ctl_high = (u32)(*pdata >> 32);
>
> Nit: I dislike defining & initialising multiple local vars on same line.
>
>> +	/*
>> +	 * Enlightened VMCS doesn't have certain fields, make sure we don't
>> +	 * expose unsupported controls to L1.
>> +	 */
>> +
>> +	switch (msr_index) {
>> +	case MSR_IA32_VMX_PINBASED_CTLS:
>> +	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
>> +		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
>> +		break;
>> +	case MSR_IA32_VMX_EXIT_CTLS:
>> +	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
>> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
>> +		break;
>> +	case MSR_IA32_VMX_ENTRY_CTLS:
>> +	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
>> +		break;
>> +	case MSR_IA32_VMX_PROCBASED_CTLS2:
>> +		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
>> +		break;
>> +	case MSR_IA32_VMX_VMFUNC:
>> +		ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
>> +		break;
>> +	}
>> +
>> +	*pdata = ctl_low | ((u64)ctl_high << 32);
>> +}
>> +
>> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>> 			uint16_t *vmcs_version)
>> {
>> @@ -356,11 +388,5 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>> 	if (vmcs_version)
>> 		*vmcs_version = nested_get_evmcs_version(vcpu);
>> 
>> -	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
>> -	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
>> -	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
>> -	vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
>> -	vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
>> -
>> 	return 0;
>> }
>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> index 07ebf6882a45..b88d9807a796 100644
>> --- a/arch/x86/kvm/vmx/evmcs.h
>> +++ b/arch/x86/kvm/vmx/evmcs.h
>> @@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
>> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>> 			uint16_t *vmcs_version);
>> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
>> 
>> #endif /* __KVM_X86_VMX_EVMCS_H */
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index e3394c839dea..8eb74618b8d8 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>> 		if (!nested_vmx_allowed(vcpu))
>> 			return 1;
>> -		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
>> -				       &msr_info->data);
>> +		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
>> +				    &msr_info->data))
>> +			return 1;
>> +		if (!msr_info->host_initiated &&
>> +		    vmx->nested.enlightened_vmcs_enabled)
>> +			nested_evmcs_filter_control_msr(msr_info->index,
>> +							&msr_info->data);
>> +		break;
>
> Nit: It seems more elegant to me to put the call to nested_evmcs_filter_control_msr() inside vmx_get_vmx_msr().
>

Sure, will move it there (in case we actually decide to merge this)

> The patch itself makes sense to me and looks correct.
> Reviewed-by: Liran Alon <liran.alon@oracle.com>

Thanks!

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 23:30     ` Liran Alon
@ 2020-01-16  8:51       ` Vitaly Kuznetsov
  2020-01-16 16:19         ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-16  8:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Liran Alon, kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Roman Kagan

Liran Alon <liran.alon@oracle.com> writes:

>> On 16 Jan 2020, at 1:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>> 
>> On Wed, Jan 15, 2020 at 06:10:13PM +0100, Vitaly Kuznetsov wrote:
>>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
>>> with default (matching CPU model) values and in case eVMCS is also enabled,
>>> fails.
>> 
>> As in, Qemu is blindly throwing values at KVM and complains on failure?
>> That seems like a Qemu bug, especially since Qemu needs to explicitly do
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS to enable eVMCS.
>
> See: https://patchwork.kernel.org/patch/11316021/
> For more context.

Ya,

while it would certainly be possible to require that userspace takes
into account KVM_CAP_HYPERV_ENLIGHTENED_VMCS (which is an opt-in) when
doing KVM_SET_MSRS there doesn't seem to be an existing (easy) way to
figure out which VMX controls were filtered out after enabling
KVM_CAP_HYPERV_ENLIGHTENED_VMCS: KVM_GET_MSRS returns global
&vmcs_config.nested values for VMX MSRs (vmx_get_msr_feature()).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  2020-01-15 22:59   ` Liran Alon
@ 2020-01-16  8:55     ` Vitaly Kuznetsov
  2020-01-16 16:21       ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-16  8:55 UTC (permalink / raw)
  To: Liran Alon
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Jim Mattson,
	linux-kernel, Roman Kagan

Liran Alon <liran.alon@oracle.com> writes:

>> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> 
>> Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
>> controls on for its guests and nested_vmx_check_controls() checks for
>> that. This is, however, not the case for the controls which are supported
>> on the host but are missing in enlightened VMCS and when eVMCS is in use.
>> 
>> It would certainly be possible to add these missing checks to
>> nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
>> preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
>> non-eVMCS guests by doing less unrelated checks. Create a separate
>> nested_evmcs_check_controls() for this purpose.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> arch/x86/kvm/vmx/evmcs.c  | 56 ++++++++++++++++++++++++++++++++++++++-
>> arch/x86/kvm/vmx/evmcs.h  |  1 +
>> arch/x86/kvm/vmx/nested.c |  3 +++
>> 3 files changed, 59 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index b5d6582ba589..88f462866396 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -4,9 +4,11 @@
>> #include <linux/smp.h>
>> 
>> #include "../hyperv.h"
>> -#include "evmcs.h"
>> #include "vmcs.h"
>> +#include "vmcs12.h"
>> +#include "evmcs.h"
>> #include "vmx.h"
>> +#include "trace.h"
>> 
>> DEFINE_STATIC_KEY_FALSE(enable_evmcs);
>> 
>> @@ -378,6 +380,58 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>> 	*pdata = ctl_low | ((u64)ctl_high << 32);
>> }
>> 
>> +int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
>> +{
>> +	int ret = 0;
>> +	u32 unsupp_ctl;
>> +
>> +	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
>> +		EVMCS1_UNSUPPORTED_PINCTRL;
>> +	if (unsupp_ctl) {
>> +		trace_kvm_nested_vmenter_failed(
>> +			"eVMCS: unsupported pin-based VM-execution controls",
>> +			unsupp_ctl);
>
> Why not move "CC” macro from nested.c to nested.h and use it here as-well instead of replicating it’s logic?
>

Because error messages I add are both human readable and conform to SDM!
:-)

On a more serious not yes, we should probably do that. We may even want
to use it in non-nesting (and non VMX) code in KVM.

Thanks,

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-16  8:51       ` Vitaly Kuznetsov
@ 2020-01-16 16:19         ` Sean Christopherson
  2020-01-16 16:57           ` Vitaly Kuznetsov
  2020-01-18 21:42           ` Paolo Bonzini
  0 siblings, 2 replies; 34+ messages in thread
From: Sean Christopherson @ 2020-01-16 16:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Liran Alon, kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Roman Kagan

On Thu, Jan 16, 2020 at 09:51:47AM +0100, Vitaly Kuznetsov wrote:
> Liran Alon <liran.alon@oracle.com> writes:
> 
> >> On 16 Jan 2020, at 1:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> >> 
> >> On Wed, Jan 15, 2020 at 06:10:13PM +0100, Vitaly Kuznetsov wrote:
> >>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
> >>> with default (matching CPU model) values and in case eVMCS is also enabled,
> >>> fails.
> >> 
> >> As in, Qemu is blindly throwing values at KVM and complains on failure?
> >> That seems like a Qemu bug, especially since Qemu needs to explicitly do
> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS to enable eVMCS.
> >
> > See: https://patchwork.kernel.org/patch/11316021/
> > For more context.
> 
> Ya,
> 
> while it would certainly be possible to require that userspace takes
> into account KVM_CAP_HYPERV_ENLIGHTENED_VMCS (which is an opt-in) when
> doing KVM_SET_MSRS there doesn't seem to be an existing (easy) way to
> figure out which VMX controls were filtered out after enabling
> KVM_CAP_HYPERV_ENLIGHTENED_VMCS: KVM_GET_MSRS returns global
> &vmcs_config.nested values for VMX MSRs (vmx_get_msr_feature()).

Ah, I was looking at the call to vmx_get_vmx_msr(&vmx->nested.msrs, ...)
in vmx_get_msr().

Why not just do this in Qemu?  IMO that's not a major ask, e.g. Qemu is
doing a decent amount of manual adjustment anyways.  And Qemu isn't even
using the result of KVM_GET_MSRS so I don't think it's fair to say this is
solely KVM's fault.

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 1d10046a6c..6545bb323e 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -2623,6 +2623,23 @@ static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f)
              MSR_VMX_EPT_UC | MSR_VMX_EPT_WB : 0);
     uint64_t fixed_vmx_ept_vpid = kvm_vmx_ept_vpid & fixed_vmx_ept_mask;

+    /* Hyper-V's eVMCS does't support certain features, adjust accordingly. */
+    if (cpu->hyperv_evmcs) {
+        f[FEAT_VMX_PINBASED_CTLS] &= ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
+                                       VMX_PIN_BASED_POSTED_INTR);
+        f[FEAT_VMX_EXIT_CTLS] &= ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+        f[FEAT_VMX_ENTRY_CTLS] &= ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+        f[FEAT_VMX_SECONDARY_CTLS] &= ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+                                        VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+                                        VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
+                                        VMX_SECONDARY_EXEC_ENABLE_PML |
+                                        VMX_SECONDARY_EXEC_ENABLE_VMFUNC |
+                                        VMX_SECONDARY_EXEC_SHADOW_VMCS |
+                                        /* VMX_SECONDARY_EXEC_TSC_SCALING | */
+                                        VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING);
+        f[FEAT_VMX_VMFUNC]         &= ~MSR_VMX_VMFUNC_EPT_SWITCHING;
+    }
+
     kvm_msr_entry_add(cpu, MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
                       make_vmx_msr_value(MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
                                          f[FEAT_VMX_PROCBASED_CTLS]));

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  2020-01-16  8:55     ` Vitaly Kuznetsov
@ 2020-01-16 16:21       ` Sean Christopherson
  2020-01-19  8:57         ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2020-01-16 16:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Liran Alon, kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Roman Kagan

On Thu, Jan 16, 2020 at 09:55:57AM +0100, Vitaly Kuznetsov wrote:
> Liran Alon <liran.alon@oracle.com> writes:
> 
> >> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> 
> >> Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
> >> controls on for its guests and nested_vmx_check_controls() checks for
> >> that. This is, however, not the case for the controls which are supported
> >> on the host but are missing in enlightened VMCS and when eVMCS is in use.
> >> 
> >> It would certainly be possible to add these missing checks to
> >> nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
> >> preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
> >> non-eVMCS guests by doing less unrelated checks. Create a separate
> >> nested_evmcs_check_controls() for this purpose.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >> arch/x86/kvm/vmx/evmcs.c  | 56 ++++++++++++++++++++++++++++++++++++++-
> >> arch/x86/kvm/vmx/evmcs.h  |  1 +
> >> arch/x86/kvm/vmx/nested.c |  3 +++
> >> 3 files changed, 59 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> >> index b5d6582ba589..88f462866396 100644
> >> --- a/arch/x86/kvm/vmx/evmcs.c
> >> +++ b/arch/x86/kvm/vmx/evmcs.c
> >> @@ -4,9 +4,11 @@
> >> #include <linux/smp.h>
> >> 
> >> #include "../hyperv.h"
> >> -#include "evmcs.h"
> >> #include "vmcs.h"
> >> +#include "vmcs12.h"
> >> +#include "evmcs.h"
> >> #include "vmx.h"
> >> +#include "trace.h"
> >> 
> >> DEFINE_STATIC_KEY_FALSE(enable_evmcs);
> >> 
> >> @@ -378,6 +380,58 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> >> 	*pdata = ctl_low | ((u64)ctl_high << 32);
> >> }
> >> 
> >> +int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> >> +{
> >> +	int ret = 0;
> >> +	u32 unsupp_ctl;
> >> +
> >> +	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
> >> +		EVMCS1_UNSUPPORTED_PINCTRL;
> >> +	if (unsupp_ctl) {
> >> +		trace_kvm_nested_vmenter_failed(
> >> +			"eVMCS: unsupported pin-based VM-execution controls",
> >> +			unsupp_ctl);
> >
> > Why not move "CC” macro from nested.c to nested.h and use it here as-well instead of replicating it’s logic?
> >
> 
> Because error messages I add are both human readable and conform to SDM!
> :-)
> 
> On a more serious not yes, we should probably do that. We may even want
> to use it in non-nesting (and non VMX) code in KVM.

No, the CC() macro is short for Consistency Check, i.e. specific to nVMX.
Even if KVM ends up adding nested_evmcs_check_controls(), which I'm hoping
can be avoided, I'd still be hesitant to expose CC() in nested.h.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-16 16:19         ` Sean Christopherson
@ 2020-01-16 16:57           ` Vitaly Kuznetsov
  2020-01-17  6:31             ` Sean Christopherson
  2020-01-18 21:42           ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-16 16:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Liran Alon, kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Roman Kagan

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Thu, Jan 16, 2020 at 09:51:47AM +0100, Vitaly Kuznetsov wrote:
>> Liran Alon <liran.alon@oracle.com> writes:
>> 
>> >> On 16 Jan 2020, at 1:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>> >> 
>> >> On Wed, Jan 15, 2020 at 06:10:13PM +0100, Vitaly Kuznetsov wrote:
>> >>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
>> >>> with default (matching CPU model) values and in case eVMCS is also enabled,
>> >>> fails.
>> >> 
>> >> As in, Qemu is blindly throwing values at KVM and complains on failure?
>> >> That seems like a Qemu bug, especially since Qemu needs to explicitly do
>> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS to enable eVMCS.
>> >
>> > See: https://patchwork.kernel.org/patch/11316021/
>> > For more context.
>> 
>> Ya,
>> 
>> while it would certainly be possible to require that userspace takes
>> into account KVM_CAP_HYPERV_ENLIGHTENED_VMCS (which is an opt-in) when
>> doing KVM_SET_MSRS there doesn't seem to be an existing (easy) way to
>> figure out which VMX controls were filtered out after enabling
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS: KVM_GET_MSRS returns global
>> &vmcs_config.nested values for VMX MSRs (vmx_get_msr_feature()).
>
> Ah, I was looking at the call to vmx_get_vmx_msr(&vmx->nested.msrs, ...)
> in vmx_get_msr().
>
> Why not just do this in Qemu?  IMO that's not a major ask, e.g. Qemu is
> doing a decent amount of manual adjustment anyways.  And Qemu isn't even
> using the result of KVM_GET_MSRS so I don't think it's fair to say this is
> solely KVM's fault.
>
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 1d10046a6c..6545bb323e 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -2623,6 +2623,23 @@ static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f)
>               MSR_VMX_EPT_UC | MSR_VMX_EPT_WB : 0);
>      uint64_t fixed_vmx_ept_vpid = kvm_vmx_ept_vpid & fixed_vmx_ept_mask;
>
> +    /* Hyper-V's eVMCS does't support certain features, adjust accordingly. */
> +    if (cpu->hyperv_evmcs) {
> +        f[FEAT_VMX_PINBASED_CTLS] &= ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
> +                                       VMX_PIN_BASED_POSTED_INTR);
> +        f[FEAT_VMX_EXIT_CTLS] &= ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> +        f[FEAT_VMX_ENTRY_CTLS] &= ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> +        f[FEAT_VMX_SECONDARY_CTLS] &= ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +                                        VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> +                                        VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +                                        VMX_SECONDARY_EXEC_ENABLE_PML |
> +                                        VMX_SECONDARY_EXEC_ENABLE_VMFUNC |
> +                                        VMX_SECONDARY_EXEC_SHADOW_VMCS |
> +                                        /* VMX_SECONDARY_EXEC_TSC_SCALING | */
> +                                        VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING);
> +        f[FEAT_VMX_VMFUNC]         &= ~MSR_VMX_VMFUNC_EPT_SWITCHING;
> +    }
> +
>      kvm_msr_entry_add(cpu, MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
>                        make_vmx_msr_value(MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
>                                           f[FEAT_VMX_PROCBASED_CTLS]));
>

I accuse you of not reading my PATCH0 :-)

https://lists.nongnu.org/archive/html/qemu-devel/2020-01/msg00123.html
does exactly this :-) 

P.S. I expect Paolo to comment on which hack he hates less :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-16 16:57           ` Vitaly Kuznetsov
@ 2020-01-17  6:31             ` Sean Christopherson
  0 siblings, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2020-01-17  6:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Liran Alon, kvm, Paolo Bonzini, Jim Mattson, linux-kernel, Roman Kagan

On Thu, Jan 16, 2020 at 05:57:31PM +0100, Vitaly Kuznetsov wrote:
> I accuse you of not reading my PATCH0 :-)

I read it, I just didn't click the link :-D

> https://lists.nongnu.org/archive/html/qemu-devel/2020-01/msg00123.html
> does exactly this :-) 
> 
> P.S. I expect Paolo to comment on which hack he hates less :-)
> 
> -- 
> Vitaly
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-16 16:19         ` Sean Christopherson
  2020-01-16 16:57           ` Vitaly Kuznetsov
@ 2020-01-18 21:42           ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-18 21:42 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Liran Alon, kvm, Jim Mattson, linux-kernel, Roman Kagan

On 16/01/20 17:19, Sean Christopherson wrote:
> Why not just do this in Qemu?  IMO that's not a major ask, e.g. Qemu is
> doing a decent amount of manual adjustment anyways.  And Qemu isn't even
> using the result of KVM_GET_MSRS so I don't think it's fair to say this is
> solely KVM's fault.

IMHO the features should stay available in case the guest chooses not to
use eVMCS.  A guest that uses eVMCS should know which features it cannot
use and not enable them.

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
  2020-01-15 22:49   ` Liran Alon
  2020-01-15 23:27   ` Sean Christopherson
@ 2020-01-19  8:54   ` Paolo Bonzini
  2020-01-22  5:47     ` Sean Christopherson
  2 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-19  8:54 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm
  Cc: Sean Christopherson, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On 15/01/20 18:10, Vitaly Kuznetsov wrote:
> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
> with default (matching CPU model) values and in case eVMCS is also enabled,
> fails.
> 
> It would be possible to drop VMX feature filtering completely and make
> this a guest's responsibility: if it decides to use eVMCS it should know
> which fields are available and which are not. Hyper-V mostly complies to
> this, however, there is at least one problematic control:
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
> which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
> fail to handle this properly in KVM. It is unclear how this is supposed
> to work, genuine Hyper-V doesn't expose the control so it is possible that
> this is just a bug (in Hyper-V).

Yes, it most likely is and it would be nice if Microsoft fixed it, but I
guess we're stuck with it for existing Windows versions.  Well, for one
we found a bug in Hyper-V and not the converse. :)

There is a problem with this approach, in that we're stuck with it
forever due to live migration.  But I guess if in the future eVMCS v2
adds an apic_address field we can limit the hack to eVMCS v1.  Another
possibility is to use the quirks mechanism but it's overkill for now.

Unless there are objections, I plan to apply these patches.

Paolo


> Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
> this allows userspace to keep setting controls it wants and at the same
> time hides them from the guest.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx/evmcs.c | 38 ++++++++++++++++++++++++++++++++------
>  arch/x86/kvm/vmx/evmcs.h |  1 +
>  arch/x86/kvm/vmx/vmx.c   | 10 ++++++++--
>  3 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 89c3e0caf39f..b5d6582ba589 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -346,6 +346,38 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>         return 0;
>  }
>  
> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> +{
> +	u32 ctl_low = (u32)*pdata, ctl_high = (u32)(*pdata >> 32);
> +	/*
> +	 * Enlightened VMCS doesn't have certain fields, make sure we don't
> +	 * expose unsupported controls to L1.
> +	 */
> +
> +	switch (msr_index) {
> +	case MSR_IA32_VMX_PINBASED_CTLS:
> +	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> +		break;
> +	case MSR_IA32_VMX_EXIT_CTLS:
> +	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> +		break;
> +	case MSR_IA32_VMX_ENTRY_CTLS:
> +	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> +		break;
> +	case MSR_IA32_VMX_PROCBASED_CTLS2:
> +		ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> +		break;
> +	case MSR_IA32_VMX_VMFUNC:
> +		ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
> +		break;
> +	}
> +
> +	*pdata = ctl_low | ((u64)ctl_high << 32);
> +}
> +
>  int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  			uint16_t *vmcs_version)
>  {
> @@ -356,11 +388,5 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  	if (vmcs_version)
>  		*vmcs_version = nested_get_evmcs_version(vcpu);
>  
> -	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> -	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> -	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> -	vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> -	vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
> -
>  	return 0;
>  }
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 07ebf6882a45..b88d9807a796 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
>  uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
>  int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>  			uint16_t *vmcs_version);
> +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
>  
>  #endif /* __KVM_X86_VMX_EVMCS_H */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e3394c839dea..8eb74618b8d8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		if (!nested_vmx_allowed(vcpu))
>  			return 1;
> -		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
> -				       &msr_info->data);
> +		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
> +				    &msr_info->data))
> +			return 1;
> +		if (!msr_info->host_initiated &&
> +		    vmx->nested.enlightened_vmcs_enabled)
> +			nested_evmcs_filter_control_msr(msr_info->index,
> +							&msr_info->data);
> +		break;
>  	case MSR_IA32_RTIT_CTL:
>  		if (pt_mode != PT_MODE_HOST_GUEST)
>  			return 1;
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  2020-01-16 16:21       ` Sean Christopherson
@ 2020-01-19  8:57         ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-19  8:57 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Liran Alon, kvm, Jim Mattson, linux-kernel, Roman Kagan

On 16/01/20 17:21, Sean Christopherson wrote:
> On Thu, Jan 16, 2020 at 09:55:57AM +0100, Vitaly Kuznetsov wrote:
>> Liran Alon <liran.alon@oracle.com> writes:
>>
>>>> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>>
>>>> Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
>>>> controls on for its guests and nested_vmx_check_controls() checks for
>>>> that. This is, however, not the case for the controls which are supported
>>>> on the host but are missing in enlightened VMCS and when eVMCS is in use.
>>>>
>>>> It would certainly be possible to add these missing checks to
>>>> nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
>>>> preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
>>>> non-eVMCS guests by doing less unrelated checks. Create a separate
>>>> nested_evmcs_check_controls() for this purpose.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> ---
>>>> arch/x86/kvm/vmx/evmcs.c  | 56 ++++++++++++++++++++++++++++++++++++++-
>>>> arch/x86/kvm/vmx/evmcs.h  |  1 +
>>>> arch/x86/kvm/vmx/nested.c |  3 +++
>>>> 3 files changed, 59 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>>>> index b5d6582ba589..88f462866396 100644
>>>> --- a/arch/x86/kvm/vmx/evmcs.c
>>>> +++ b/arch/x86/kvm/vmx/evmcs.c
>>>> @@ -4,9 +4,11 @@
>>>> #include <linux/smp.h>
>>>>
>>>> #include "../hyperv.h"
>>>> -#include "evmcs.h"
>>>> #include "vmcs.h"
>>>> +#include "vmcs12.h"
>>>> +#include "evmcs.h"
>>>> #include "vmx.h"
>>>> +#include "trace.h"
>>>>
>>>> DEFINE_STATIC_KEY_FALSE(enable_evmcs);
>>>>
>>>> @@ -378,6 +380,58 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>>>> 	*pdata = ctl_low | ((u64)ctl_high << 32);
>>>> }
>>>>
>>>> +int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
>>>> +{
>>>> +	int ret = 0;
>>>> +	u32 unsupp_ctl;
>>>> +
>>>> +	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
>>>> +		EVMCS1_UNSUPPORTED_PINCTRL;
>>>> +	if (unsupp_ctl) {
>>>> +		trace_kvm_nested_vmenter_failed(
>>>> +			"eVMCS: unsupported pin-based VM-execution controls",
>>>> +			unsupp_ctl);
>>>
>>> Why not move "CC” macro from nested.c to nested.h and use it here as-well instead of replicating it’s logic?
>>>
>>
>> Because error messages I add are both human readable and conform to SDM!
>> :-)
>>
>> On a more serious not yes, we should probably do that. We may even want
>> to use it in non-nesting (and non VMX) code in KVM.
> 
> No, the CC() macro is short for Consistency Check, i.e. specific to nVMX.
> Even if KVM ends up adding nested_evmcs_check_controls(), which I'm hoping
> can be avoided, I'd still be hesitant to expose CC() in nested.h.
> 

For now let's keep Vitaly's patch as is.  It's definitely a good one as
it would catch Hyper-V's issue immediately even without patch 2 (which
is the only really contentious change).

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-19  8:54   ` Paolo Bonzini
@ 2020-01-22  5:47     ` Sean Christopherson
  2020-01-22  9:37       ` Vitaly Kuznetsov
  2020-01-22 14:33       ` Paolo Bonzini
  0 siblings, 2 replies; 34+ messages in thread
From: Sean Christopherson @ 2020-01-22  5:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, Jim Mattson, linux-kernel, Liran Alon,
	Roman Kagan

On Sun, Jan 19, 2020 at 09:54:44AM +0100, Paolo Bonzini wrote:
> On 15/01/20 18:10, Vitaly Kuznetsov wrote:
> > With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
> > with default (matching CPU model) values and in case eVMCS is also enabled,
> > fails.
> > 
> > It would be possible to drop VMX feature filtering completely and make
> > this a guest's responsibility: if it decides to use eVMCS it should know
> > which fields are available and which are not. Hyper-V mostly complies to
> > this, however, there is at least one problematic control:
> > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
> > which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
> > fail to handle this properly in KVM. It is unclear how this is supposed
> > to work, genuine Hyper-V doesn't expose the control so it is possible that
> > this is just a bug (in Hyper-V).
> 
> Yes, it most likely is and it would be nice if Microsoft fixed it, but I
> guess we're stuck with it for existing Windows versions.  Well, for one
> we found a bug in Hyper-V and not the converse. :)
> 
> There is a problem with this approach, in that we're stuck with it
> forever due to live migration.  But I guess if in the future eVMCS v2
> adds an apic_address field we can limit the hack to eVMCS v1.  Another
> possibility is to use the quirks mechanism but it's overkill for now.
> 
> Unless there are objections, I plan to apply these patches.

Doesn't applying this patch contradict your earlier opinion?  This patch
would still hide the affected controls from the guest because the host
controls enlightened_vmcs_enabled.

On Sat, Jan 18, 2020 at 10:42:31PM +0100, Paolo Bonzini wrote:
> IMHO the features should stay available in case the guest chooses not to
> use eVMCS.  A guest that uses eVMCS should know which features it cannot
> use and not enable them.

Makes sense, wasn't thinking about the scenario where the guest doesn't
support eVMCS or doesn't want to use it for whatever reason.

Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
manually adding eVMCS consistency checks on the disallowed bits and handle
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
clearing it from the eVMCS?  Or alternatively, squashing all the disallowed
bits.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22  5:47     ` Sean Christopherson
@ 2020-01-22  9:37       ` Vitaly Kuznetsov
  2020-01-22 14:33       ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-22  9:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Sat, Jan 18, 2020 at 10:42:31PM +0100, Paolo Bonzini wrote:
>> IMHO the features should stay available in case the guest chooses not to
>> use eVMCS.  A guest that uses eVMCS should know which features it cannot
>> use and not enable them.
>
> Makes sense, wasn't thinking about the scenario where the guest doesn't
> support eVMCS or doesn't want to use it for whatever reason.
>
> Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
> manually adding eVMCS consistency checks on the disallowed bits and handle
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
> clearing it from the eVMCS?

Unfortunately, this doesn't work because ... Windows. Not only Hyper-V
enables SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, it actually expects it
to work (somehow) so when I do

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 72359709cdc1..e6c30eec2817 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -361,11 +361,5 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
        if (evmcs_already_enabled)
                return 0;
 
-       vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
-       vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
-       vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
-       vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
-       vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
-
        return 0;
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bb8afe0c5e7f..cd1f5a1c884b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1590,7 +1590,7 @@ static int copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx)
                        evmcs->pin_based_vm_exec_control;
                vmcs12->vm_exit_controls = evmcs->vm_exit_controls;
                vmcs12->secondary_vm_exec_control =
-                       evmcs->secondary_vm_exec_control;
+                       evmcs->secondary_vm_exec_control & ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
        }
 
        if (unlikely(!(evmcs->hv_clean_fields &

Hyper-V 2016 with > 1 vCPU fails to boot :-(

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22  5:47     ` Sean Christopherson
  2020-01-22  9:37       ` Vitaly Kuznetsov
@ 2020-01-22 14:33       ` Paolo Bonzini
  2020-01-22 15:08         ` Vitaly Kuznetsov
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-22 14:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, kvm, Jim Mattson, linux-kernel, Liran Alon,
	Roman Kagan

On 22/01/20 06:47, Sean Christopherson wrote:
>> Yes, it most likely is and it would be nice if Microsoft fixed it, but I
>> guess we're stuck with it for existing Windows versions.  Well, for one
>> we found a bug in Hyper-V and not the converse. :)
>>
>> There is a problem with this approach, in that we're stuck with it
>> forever due to live migration.  But I guess if in the future eVMCS v2
>> adds an apic_address field we can limit the hack to eVMCS v1.  Another
>> possibility is to use the quirks mechanism but it's overkill for now.
>>
>> Unless there are objections, I plan to apply these patches.
> Doesn't applying this patch contradict your earlier opinion?  This patch
> would still hide the affected controls from the guest because the host
> controls enlightened_vmcs_enabled.

It does.  Unfortunately the key sentence is "we're stuck with it for
existing Windows versions". :(

> Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
> manually adding eVMCS consistency checks on the disallowed bits and handle
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
> clearing it from the eVMCS?  Or alternatively, squashing all the disallowed
> bits.

Hmm, that is also a possibility.  It's a very hacky one, but I guess
adding APIC virtualization to eVMCS would require bumping the version to
2.  Vitaly, what do you think?

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22 14:33       ` Paolo Bonzini
@ 2020-01-22 15:08         ` Vitaly Kuznetsov
  2020-01-22 15:51           ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-22 15:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 22/01/20 06:47, Sean Christopherson wrote:
>>> Yes, it most likely is and it would be nice if Microsoft fixed it, but I
>>> guess we're stuck with it for existing Windows versions.  Well, for one
>>> we found a bug in Hyper-V and not the converse. :)
>>>
>>> There is a problem with this approach, in that we're stuck with it
>>> forever due to live migration.  But I guess if in the future eVMCS v2
>>> adds an apic_address field we can limit the hack to eVMCS v1.  Another
>>> possibility is to use the quirks mechanism but it's overkill for now.
>>>
>>> Unless there are objections, I plan to apply these patches.
>> Doesn't applying this patch contradict your earlier opinion?  This patch
>> would still hide the affected controls from the guest because the host
>> controls enlightened_vmcs_enabled.
>
> It does.  Unfortunately the key sentence is "we're stuck with it for
> existing Windows versions". :(
>
>> Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
>> manually adding eVMCS consistency checks on the disallowed bits and handle
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
>> clearing it from the eVMCS?  Or alternatively, squashing all the disallowed
>> bits.
>
> Hmm, that is also a possibility.  It's a very hacky one, but I guess
> adding APIC virtualization to eVMCS would require bumping the version to
> 2.  Vitaly, what do you think?

As I already replied to Sean I like the idea to filter out unsupported
controls from eVMCS but unfortunately it doesn't work: Hyper-V actually
expects APIC virtualization to work when it enables
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (I have no idea how without
apic_access_addr field but). I checked and at least Hyper-V 2016 doesn't
boot (when >1 vCPU).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22 15:08         ` Vitaly Kuznetsov
@ 2020-01-22 15:51           ` Sean Christopherson
  2020-01-22 16:29             ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2020-01-22 15:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On Wed, Jan 22, 2020 at 04:08:55PM +0100, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 22/01/20 06:47, Sean Christopherson wrote:
> >>> Yes, it most likely is and it would be nice if Microsoft fixed it, but I
> >>> guess we're stuck with it for existing Windows versions.  Well, for one
> >>> we found a bug in Hyper-V and not the converse. :)
> >>>
> >>> There is a problem with this approach, in that we're stuck with it
> >>> forever due to live migration.  But I guess if in the future eVMCS v2
> >>> adds an apic_address field we can limit the hack to eVMCS v1.  Another
> >>> possibility is to use the quirks mechanism but it's overkill for now.
> >>>
> >>> Unless there are objections, I plan to apply these patches.
> >> Doesn't applying this patch contradict your earlier opinion?  This patch
> >> would still hide the affected controls from the guest because the host
> >> controls enlightened_vmcs_enabled.
> >
> > It does.  Unfortunately the key sentence is "we're stuck with it for
> > existing Windows versions". :(

Ah, I didn't understand what "it" referred to :-)

> >> Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
> >> manually adding eVMCS consistency checks on the disallowed bits and handle
> >> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
> >> clearing it from the eVMCS?  Or alternatively, squashing all the disallowed
> >> bits.
> >
> > Hmm, that is also a possibility.  It's a very hacky one, but I guess
> > adding APIC virtualization to eVMCS would require bumping the version to
> > 2.  Vitaly, what do you think?
> 
> As I already replied to Sean I like the idea to filter out unsupported
> controls from eVMCS but unfortunately it doesn't work: Hyper-V actually
> expects APIC virtualization to work when it enables
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (I have no idea how without
> apic_access_addr field but). I checked and at least Hyper-V 2016 doesn't
> boot (when >1 vCPU).

Nice.

I still don't see what we gain from applying this patch.  Once eVMCS is
enabled by userspace, which presumably happens before the guest is launched,
the guest will see the eVMCS-unfriendly controls as being unsupported, both
for eVMCS and regular VMCS.  AFAICT, we're adding a fairly ugly hack to KVM
just so that KVM can lie to userspace about what controls will be exposed to
the guest.

Can we extend the API to use cap->args[1] to control whether or not the
unsupported controls are removed from vmx->nested.msrs?  Userspace could
pass '1' to leave the controls untouched and then surgically hide the
controls that the guest is too dumb to know it shouldn't use by writing the
appropriate MSRs.  Assuming existing userspace is expected/required to zero
out args[1..3], this would be fully backwards compatible.


diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 72359709cdc1..241a769be738 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -346,8 +346,8 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
        return 0;
 }

-int nested_enable_evmcs(struct kvm_vcpu *vcpu,
-                       uint16_t *vmcs_version)
+int nested_enable_evmcs(struct kvm_vcpu *vcpu, uint16_t *vmcs_version,
+                       bool allow_unsupported_controls)
 {
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        bool evmcs_already_enabled = vmx->nested.enlightened_vmcs_enabled;
@@ -358,7 +358,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
                *vmcs_version = nested_get_evmcs_version(vcpu);

        /* We don't support disabling the feature for simplicity. */
-       if (evmcs_already_enabled)
+       if (evmcs_already_enabled || allow_unsupported_controls)
                return 0;

        vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cccc52e2d0a..5e1b8d51277b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4005,7 +4005,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
        case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
                if (!kvm_x86_ops->nested_enable_evmcs)
                        return -ENOTTY;
-               r = kvm_x86_ops->nested_enable_evmcs(vcpu, &vmcs_version);
+               r = kvm_x86_ops->nested_enable_evmcs(vcpu, &vmcs_version,
+                                                    cap->args[1]);
                if (!r) {
                        user_ptr = (void __user *)(uintptr_t)cap->args[0];
                        if (copy_to_user(user_ptr, &vmcs_version,

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22 15:51           ` Sean Christopherson
@ 2020-01-22 16:29             ` Vitaly Kuznetsov
  2020-01-22 16:40               ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-22 16:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Wed, Jan 22, 2020 at 04:08:55PM +0100, Vitaly Kuznetsov wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>> > On 22/01/20 06:47, Sean Christopherson wrote:
>> >>> Yes, it most likely is and it would be nice if Microsoft fixed it, but I
>> >>> guess we're stuck with it for existing Windows versions.  Well, for one
>> >>> we found a bug in Hyper-V and not the converse. :)
>> >>>
>> >>> There is a problem with this approach, in that we're stuck with it
>> >>> forever due to live migration.  But I guess if in the future eVMCS v2
>> >>> adds an apic_address field we can limit the hack to eVMCS v1.  Another
>> >>> possibility is to use the quirks mechanism but it's overkill for now.
>> >>>
>> >>> Unless there are objections, I plan to apply these patches.
>> >> Doesn't applying this patch contradict your earlier opinion?  This patch
>> >> would still hide the affected controls from the guest because the host
>> >> controls enlightened_vmcs_enabled.
>> >
>> > It does.  Unfortunately the key sentence is "we're stuck with it for
>> > existing Windows versions". :(
>
> Ah, I didn't understand what "it" referred to :-)
>
>> >> Rather than update vmx->nested.msrs or filter vmx_get_msr(), what about
>> >> manually adding eVMCS consistency checks on the disallowed bits and handle
>> >> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES as a one-off case by simply
>> >> clearing it from the eVMCS?  Or alternatively, squashing all the disallowed
>> >> bits.
>> >
>> > Hmm, that is also a possibility.  It's a very hacky one, but I guess
>> > adding APIC virtualization to eVMCS would require bumping the version to
>> > 2.  Vitaly, what do you think?
>> 
>> As I already replied to Sean I like the idea to filter out unsupported
>> controls from eVMCS but unfortunately it doesn't work: Hyper-V actually
>> expects APIC virtualization to work when it enables
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (I have no idea how without
>> apic_access_addr field but). I checked and at least Hyper-V 2016 doesn't
>> boot (when >1 vCPU).
>
> Nice.
>
> I still don't see what we gain from applying this patch.  Once eVMCS is
> enabled by userspace, which presumably happens before the guest is launched,
> the guest will see the eVMCS-unfriendly controls as being unsupported, both
> for eVMCS and regular VMCS.  AFAICT, we're adding a fairly ugly hack to KVM
> just so that KVM can lie to userspace about what controls will be exposed to
> the guest.
>
> Can we extend the API to use cap->args[1] to control whether or not the
> unsupported controls are removed from vmx->nested.msrs?  Userspace could
> pass '1' to leave the controls untouched and then surgically hide the
> controls that the guest is too dumb to know it shouldn't use by writing the
> appropriate MSRs.  Assuming existing userspace is expected/required to zero
> out args[1..3], this would be fully backwards compatible.

Yes, in case we're back to the idea to filter things out in QEMU we can
do this. What I don't like is that every other userspace which decides
to enable eVMCS will have to perform the exact same surgery as in case
it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
filtering (or KVM_SET_MSRS will fail) and in case it opts for
allow_unsupported_controls=1 Windows guests just won't boot without the
filtering.

It seems to be 1:1, eVMCSv1 requires the filter.

>
>
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 72359709cdc1..241a769be738 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -346,8 +346,8 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>         return 0;
>  }
>
> -int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> -                       uint16_t *vmcs_version)
> +int nested_enable_evmcs(struct kvm_vcpu *vcpu, uint16_t *vmcs_version,
> +                       bool allow_unsupported_controls)

Personally, I'd call it 'keep_unsupported_controls'.

>  {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>         bool evmcs_already_enabled = vmx->nested.enlightened_vmcs_enabled;
> @@ -358,7 +358,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>                 *vmcs_version = nested_get_evmcs_version(vcpu);
>
>         /* We don't support disabling the feature for simplicity. */
> -       if (evmcs_already_enabled)
> +       if (evmcs_already_enabled || allow_unsupported_controls)
>                 return 0;
>
>         vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0cccc52e2d0a..5e1b8d51277b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4005,7 +4005,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>         case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>                 if (!kvm_x86_ops->nested_enable_evmcs)
>                         return -ENOTTY;
> -               r = kvm_x86_ops->nested_enable_evmcs(vcpu, &vmcs_version);
> +               r = kvm_x86_ops->nested_enable_evmcs(vcpu, &vmcs_version,
> +                                                    cap->args[1]);
>                 if (!r) {
>                         user_ptr = (void __user *)(uintptr_t)cap->args[0];
>                         if (copy_to_user(user_ptr, &vmcs_version,
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22 16:29             ` Vitaly Kuznetsov
@ 2020-01-22 16:40               ` Paolo Bonzini
  2020-01-23  9:15                 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-22 16:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Sean Christopherson
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On 22/01/20 17:29, Vitaly Kuznetsov wrote:
> Yes, in case we're back to the idea to filter things out in QEMU we can
> do this. What I don't like is that every other userspace which decides
> to enable eVMCS will have to perform the exact same surgery as in case
> it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
> filtering (or KVM_SET_MSRS will fail) and in case it opts for
> allow_unsupported_controls=1 Windows guests just won't boot without the
> filtering.
> 
> It seems to be 1:1, eVMCSv1 requires the filter.

Yes, that's the point.  It *is* a hack in KVM, but it is generally
preferrable to have an easier API for userspace, if there's only one way
to do it.

Though we could be a bit more "surgical" and only remove
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES---thus minimizing the impact on
non-eVMCS guests.  Vitaly, can you prepare a v2 that does that and adds
a huge "hack alert" comment that explains the discussion?

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-22 16:40               ` Paolo Bonzini
@ 2020-01-23  9:15                 ` Vitaly Kuznetsov
  2020-01-23 19:09                   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-23  9:15 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 22/01/20 17:29, Vitaly Kuznetsov wrote:
>> Yes, in case we're back to the idea to filter things out in QEMU we can
>> do this. What I don't like is that every other userspace which decides
>> to enable eVMCS will have to perform the exact same surgery as in case
>> it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
>> filtering (or KVM_SET_MSRS will fail) and in case it opts for
>> allow_unsupported_controls=1 Windows guests just won't boot without the
>> filtering.
>> 
>> It seems to be 1:1, eVMCSv1 requires the filter.
>
> Yes, that's the point.  It *is* a hack in KVM, but it is generally
> preferrable to have an easier API for userspace, if there's only one way
> to do it.
>
> Though we could be a bit more "surgical" and only remove
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES---thus minimizing the impact on
> non-eVMCS guests.  Vitaly, can you prepare a v2 that does that and adds
> a huge "hack alert" comment that explains the discussion?

Yes, sure. I'd like to do more testing to make sure filtering out
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is enough for other Hyper-V
versions too (who knows how many bugs are there :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-23  9:15                 ` Vitaly Kuznetsov
@ 2020-01-23 19:09                   ` Vitaly Kuznetsov
  2020-01-24 17:25                     ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-23 19:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan,
	Sean Christopherson

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 22/01/20 17:29, Vitaly Kuznetsov wrote:
>>> Yes, in case we're back to the idea to filter things out in QEMU we can
>>> do this. What I don't like is that every other userspace which decides
>>> to enable eVMCS will have to perform the exact same surgery as in case
>>> it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
>>> filtering (or KVM_SET_MSRS will fail) and in case it opts for
>>> allow_unsupported_controls=1 Windows guests just won't boot without the
>>> filtering.
>>> 
>>> It seems to be 1:1, eVMCSv1 requires the filter.
>>
>> Yes, that's the point.  It *is* a hack in KVM, but it is generally
>> preferrable to have an easier API for userspace, if there's only one way
>> to do it.
>>
>> Though we could be a bit more "surgical" and only remove
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES---thus minimizing the impact on
>> non-eVMCS guests.  Vitaly, can you prepare a v2 that does that and adds
>> a huge "hack alert" comment that explains the discussion?
>
> Yes, sure. I'd like to do more testing to make sure filtering out
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is enough for other Hyper-V
> versions too (who knows how many bugs are there :-)

... and the answer is -- more than one :-)

I've tested Hyper-V 2016/2019 BIOS and UEFI-booted and the minimal
viable set seems to be:

MSR_IA32_VMX_PROCBASED_CTLS2: 
	~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES

MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS:
	~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL

MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS:
	~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL
 
with these filtered out all 4 versions are at least able to boot with >1
vCPU and run a nested guest (different from Windows management
partition).

This still feels a bit fragile as who knows under which circumstances
Hyper-V might want to enable additional (missing) controls.

If there are no objections and if we still think it would be beneficial
to minimize the list of controls we filter out (and not go with the full
set like my RFC suggests), I'll prepare v2. (v1, actually, this was RFC).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-23 19:09                   ` Vitaly Kuznetsov
@ 2020-01-24 17:25                     ` Sean Christopherson
  2020-01-27 15:38                       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2020-01-24 17:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On Thu, Jan 23, 2020 at 08:09:03PM +0100, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> >
> >> On 22/01/20 17:29, Vitaly Kuznetsov wrote:
> >>> Yes, in case we're back to the idea to filter things out in QEMU we can
> >>> do this. What I don't like is that every other userspace which decides
> >>> to enable eVMCS will have to perform the exact same surgery as in case
> >>> it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
> >>> filtering (or KVM_SET_MSRS will fail) and in case it opts for
> >>> allow_unsupported_controls=1 Windows guests just won't boot without the
> >>> filtering.
> >>> 
> >>> It seems to be 1:1, eVMCSv1 requires the filter.
> >>
> >> Yes, that's the point.  It *is* a hack in KVM, but it is generally
> >> preferrable to have an easier API for userspace, if there's only one way
> >> to do it.
> >>
> >> Though we could be a bit more "surgical" and only remove
> >> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES---thus minimizing the impact on
> >> non-eVMCS guests.  Vitaly, can you prepare a v2 that does that and adds
> >> a huge "hack alert" comment that explains the discussion?
> >
> > Yes, sure. I'd like to do more testing to make sure filtering out
> > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is enough for other Hyper-V
> > versions too (who knows how many bugs are there :-)
> 
> ... and the answer is -- more than one :-)
> 
> I've tested Hyper-V 2016/2019 BIOS and UEFI-booted and the minimal
> viable set seems to be:
> 
> MSR_IA32_VMX_PROCBASED_CTLS2: 
> 	~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
> 
> MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> 	~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
> 
> MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS:
> 	~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL
>  
> with these filtered out all 4 versions are at least able to boot with >1
> vCPU and run a nested guest (different from Windows management
> partition).
> 
> This still feels a bit fragile as who knows under which circumstances
> Hyper-V might want to enable additional (missing) controls.

No strong opinion, I'm good either way.

> If there are no objections and if we still think it would be beneficial
> to minimize the list of controls we filter out (and not go with the full
> set like my RFC suggests), I'll prepare v2. (v1, actually, this was RFC).

One last idea, can we keep the MSR filtering as is and add the hack in
vmx_restore_control_msr()?  That way the (userspace) host and guest see
the same values when reading the affected MSRs, and eVMCS wouldn't need
it's own hook to do consistency checks.

@@ -1181,28 +1181,38 @@ static int
 vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 {
        u64 supported;
-       u32 *lowp, *highp;
+       u32 *lowp, *highp, evmcs_unsupported;

        switch (msr_index) {
        case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
                lowp = &vmx->nested.msrs.pinbased_ctls_low;
                highp = &vmx->nested.msrs.pinbased_ctls_high;
+               if (vmx->nested.enlightened_vmcs_enabled)
+                       evmcs_unsupported = EVMCS1_UNSUPPORTED_PINCTRL;
                break;
        case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
                lowp = &vmx->nested.msrs.procbased_ctls_low;
                highp = &vmx->nested.msrs.procbased_ctls_high;
+               if (vmx->nested.enlightened_vmcs_enabled)
+                       evmcs_unsupported = 0;
                break;
        case MSR_IA32_VMX_TRUE_EXIT_CTLS:
                lowp = &vmx->nested.msrs.exit_ctls_low;
                highp = &vmx->nested.msrs.exit_ctls_high;
+               if (vmx->nested.enlightened_vmcs_enabled)
+                       evmcs_unsupported = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
                break;
        case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
                lowp = &vmx->nested.msrs.entry_ctls_low;
                highp = &vmx->nested.msrs.entry_ctls_high;
+               if (vmx->nested.enlightened_vmcs_enabled)
+                       evmcs_unsupported = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
                break;
        case MSR_IA32_VMX_PROCBASED_CTLS2:
                lowp = &vmx->nested.msrs.secondary_ctls_low;
                highp = &vmx->nested.msrs.secondary_ctls_high;
+               if (vmx->nested.enlightened_vmcs_enabled)
+                       evmcs_unsupported = EVMCS1_UNSUPPORTED_2NDEXEC;
                break;
        default:
                BUG();
@@ -1210,6 +1220,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)

        supported = vmx_control_msr(*lowp, *highp);

+       /* HACK! */
+       data &= ~(u64)evmcs_unsupported << 32;
+
        /* Check must-be-1 bits are still 1. */
        if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-24 17:25                     ` Sean Christopherson
@ 2020-01-27 15:38                       ` Vitaly Kuznetsov
  2020-01-27 17:53                         ` Paolo Bonzini
  2020-01-27 18:17                         ` Sean Christopherson
  0 siblings, 2 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-27 15:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Thu, Jan 23, 2020 at 08:09:03PM +0100, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> 
>> > Paolo Bonzini <pbonzini@redhat.com> writes:
>> >
>> >> On 22/01/20 17:29, Vitaly Kuznetsov wrote:
>> >>> Yes, in case we're back to the idea to filter things out in QEMU we can
>> >>> do this. What I don't like is that every other userspace which decides
>> >>> to enable eVMCS will have to perform the exact same surgery as in case
>> >>> it sets allow_unsupported_controls=0 it'll have to know (hardcode) the
>> >>> filtering (or KVM_SET_MSRS will fail) and in case it opts for
>> >>> allow_unsupported_controls=1 Windows guests just won't boot without the
>> >>> filtering.
>> >>> 
>> >>> It seems to be 1:1, eVMCSv1 requires the filter.
>> >>
>> >> Yes, that's the point.  It *is* a hack in KVM, but it is generally
>> >> preferrable to have an easier API for userspace, if there's only one way
>> >> to do it.
>> >>
>> >> Though we could be a bit more "surgical" and only remove
>> >> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES---thus minimizing the impact on
>> >> non-eVMCS guests.  Vitaly, can you prepare a v2 that does that and adds
>> >> a huge "hack alert" comment that explains the discussion?
>> >
>> > Yes, sure. I'd like to do more testing to make sure filtering out
>> > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is enough for other Hyper-V
>> > versions too (who knows how many bugs are there :-)
>> 
>> ... and the answer is -- more than one :-)
>> 
>> I've tested Hyper-V 2016/2019 BIOS and UEFI-booted and the minimal
>> viable set seems to be:
>> 
>> MSR_IA32_VMX_PROCBASED_CTLS2: 
>> 	~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
>> 
>> MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>> 	~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
>> 
>> MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS:
>> 	~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL
>>  
>> with these filtered out all 4 versions are at least able to boot with >1
>> vCPU and run a nested guest (different from Windows management
>> partition).
>> 
>> This still feels a bit fragile as who knows under which circumstances
>> Hyper-V might want to enable additional (missing) controls.
>
> No strong opinion, I'm good either way.
>
>> If there are no objections and if we still think it would be beneficial
>> to minimize the list of controls we filter out (and not go with the full
>> set like my RFC suggests), I'll prepare v2. (v1, actually, this was RFC).
>
> One last idea, can we keep the MSR filtering as is and add the hack in
> vmx_restore_control_msr()?  That way the (userspace) host and guest see
> the same values when reading the affected MSRs, and eVMCS wouldn't need
> it's own hook to do consistency checks.

Yes but (if I'm not mistaken) we'll have then to keep the filtering we
currently do in nested_enable_evmcs(): if userspace doesn't do
KVM_SET_MSR for VMX MSRs (QEMU<4.2) then the filtering in
vmx_restore_control_msr() won't happen and the guest will see the
unfiltered set of controls...

>
> @@ -1181,28 +1181,38 @@ static int
>  vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>  {
>         u64 supported;
> -       u32 *lowp, *highp;
> +       u32 *lowp, *highp, evmcs_unsupported;
>
>         switch (msr_index) {
>         case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
>                 lowp = &vmx->nested.msrs.pinbased_ctls_low;
>                 highp = &vmx->nested.msrs.pinbased_ctls_high;
> +               if (vmx->nested.enlightened_vmcs_enabled)
> +                       evmcs_unsupported = EVMCS1_UNSUPPORTED_PINCTRL;
>                 break;
>         case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
>                 lowp = &vmx->nested.msrs.procbased_ctls_low;
>                 highp = &vmx->nested.msrs.procbased_ctls_high;
> +               if (vmx->nested.enlightened_vmcs_enabled)
> +                       evmcs_unsupported = 0;
>                 break;
>         case MSR_IA32_VMX_TRUE_EXIT_CTLS:
>                 lowp = &vmx->nested.msrs.exit_ctls_low;
>                 highp = &vmx->nested.msrs.exit_ctls_high;
> +               if (vmx->nested.enlightened_vmcs_enabled)
> +                       evmcs_unsupported = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
>                 break;
>         case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>                 lowp = &vmx->nested.msrs.entry_ctls_low;
>                 highp = &vmx->nested.msrs.entry_ctls_high;
> +               if (vmx->nested.enlightened_vmcs_enabled)
> +                       evmcs_unsupported = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
>                 break;
>         case MSR_IA32_VMX_PROCBASED_CTLS2:
>                 lowp = &vmx->nested.msrs.secondary_ctls_low;
>                 highp = &vmx->nested.msrs.secondary_ctls_high;
> +               if (vmx->nested.enlightened_vmcs_enabled)
> +                       evmcs_unsupported = EVMCS1_UNSUPPORTED_2NDEXEC;
>                 break;
>         default:
>                 BUG();
> @@ -1210,6 +1220,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>
>         supported = vmx_control_msr(*lowp, *highp);
>
> +       /* HACK! */
> +       data &= ~(u64)evmcs_unsupported << 32;
> +
>         /* Check must-be-1 bits are still 1. */
>         if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-27 15:38                       ` Vitaly Kuznetsov
@ 2020-01-27 17:53                         ` Paolo Bonzini
  2020-01-27 21:52                           ` Vitaly Kuznetsov
  2020-01-27 18:17                         ` Sean Christopherson
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2020-01-27 17:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Sean Christopherson
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On 27/01/20 16:38, Vitaly Kuznetsov wrote:
>>> If there are no objections and if we still think it would be beneficial
>>> to minimize the list of controls we filter out (and not go with the full
>>> set like my RFC suggests), I'll prepare v2. (v1, actually, this was RFC).
>> One last idea, can we keep the MSR filtering as is and add the hack in
>> vmx_restore_control_msr()?  That way the (userspace) host and guest see
>> the same values when reading the affected MSRs, and eVMCS wouldn't need
>> it's own hook to do consistency checks.
> Yes but (if I'm not mistaken) we'll have then to keep the filtering we
> currently do in nested_enable_evmcs(): if userspace doesn't do
> KVM_SET_MSR for VMX MSRs (QEMU<4.2) then the filtering in
> vmx_restore_control_msr() won't happen and the guest will see the
> unfiltered set of controls...
> 

Indeed.  The place you used in the RFC is the best we can do, I am afraid.

Paolo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-27 15:38                       ` Vitaly Kuznetsov
  2020-01-27 17:53                         ` Paolo Bonzini
@ 2020-01-27 18:17                         ` Sean Christopherson
  1 sibling, 0 replies; 34+ messages in thread
From: Sean Christopherson @ 2020-01-27 18:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

On Mon, Jan 27, 2020 at 04:38:27PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > One last idea, can we keep the MSR filtering as is and add the hack in
> > vmx_restore_control_msr()?  That way the (userspace) host and guest see
> > the same values when reading the affected MSRs, and eVMCS wouldn't need
> > it's own hook to do consistency checks.
> 
> Yes but (if I'm not mistaken) we'll have then to keep the filtering we
> currently do in nested_enable_evmcs(): if userspace doesn't do
> KVM_SET_MSR for VMX MSRs (QEMU<4.2) then the filtering in
> vmx_restore_control_msr() won't happen and the guest will see the
> unfiltered set of controls...

Ya, my thought was to add this on top of the nested_enable_evmcs() code.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-27 17:53                         ` Paolo Bonzini
@ 2020-01-27 21:52                           ` Vitaly Kuznetsov
  0 siblings, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-27 21:52 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Jim Mattson, linux-kernel, Liran Alon, Roman Kagan

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 27/01/20 16:38, Vitaly Kuznetsov wrote:
>>>> If there are no objections and if we still think it would be beneficial
>>>> to minimize the list of controls we filter out (and not go with the full
>>>> set like my RFC suggests), I'll prepare v2. (v1, actually, this was RFC).
>>> One last idea, can we keep the MSR filtering as is and add the hack in
>>> vmx_restore_control_msr()?  That way the (userspace) host and guest see
>>> the same values when reading the affected MSRs, and eVMCS wouldn't need
>>> it's own hook to do consistency checks.
>> Yes but (if I'm not mistaken) we'll have then to keep the filtering we
>> currently do in nested_enable_evmcs(): if userspace doesn't do
>> KVM_SET_MSR for VMX MSRs (QEMU<4.2) then the filtering in
>> vmx_restore_control_msr() won't happen and the guest will see the
>> unfiltered set of controls...
>> 
>
> Indeed.  The place you used in the RFC is the best we can do, I am afraid.
>

In case we decide to filter out the full set of unsupported stuff
there's basically nothing to change, feel free to just treat the RFC as
non-RFC :-) (and personally, I'd prefer to keep the 'full set' in the
filter as it is less fragile; the 'short list' I came up with is the
result of my experiments on one hardware host only and I'm not sure what
may make Hyper-V behave differently).

I can re-submit, of course, if needed.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  2020-01-16  8:37     ` Vitaly Kuznetsov
@ 2020-02-03 15:11       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 34+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-03 15:11 UTC (permalink / raw)
  To: Liran Alon
  Cc: kvm list, Paolo Bonzini, Sean Christopherson, Jim Mattson, LKML,
	Roman Kagan

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Liran Alon <liran.alon@oracle.com> writes:
>
>>> On 15 Jan 2020, at 19:10, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>> 
>>> With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
>>> with default (matching CPU model) values and in case eVMCS is also enabled,
>>> fails.
>>> 
>>> It would be possible to drop VMX feature filtering completely and make
>>> this a guest's responsibility: if it decides to use eVMCS it should know
>>> which fields are available and which are not. Hyper-V mostly complies to
>>> this, however, there is at least one problematic control:
>>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
>>> which Hyper-V enables. As there is no 'apic_addr_field' in eVMCS, we
>>> fail to handle this properly in KVM. It is unclear how this is supposed
>>> to work, genuine Hyper-V doesn't expose the control so it is possible that
>>> this is just a bug (in Hyper-V).
>>
>> Have you tried contacted someone at Hyper-V team about this?
>>
>
> Yes, I have.

I heard back from MS, they admited the bug and suggested us ... to hide
the control from L1 when eVMCS is enabled. No surprises here. They also
said the bug is unlikely to get fixed in the existing Hyper-V versions
(2016 and 2019) so it seems we're stuck with the hack for awhile :-(

...

>>> 
>>> #endif /* __KVM_X86_VMX_EVMCS_H */
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index e3394c839dea..8eb74618b8d8 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>>> 		if (!nested_vmx_allowed(vcpu))
>>> 			return 1;
>>> -		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
>>> -				       &msr_info->data);
>>> +		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
>>> +				    &msr_info->data))
>>> +			return 1;
>>> +		if (!msr_info->host_initiated &&
>>> +		    vmx->nested.enlightened_vmcs_enabled)
>>> +			nested_evmcs_filter_control_msr(msr_info->index,
>>> +							&msr_info->data);
>>> +		break;
>>
>> Nit: It seems more elegant to me to put the call to nested_evmcs_filter_control_msr() inside vmx_get_vmx_msr().
>>
>
> Sure, will move it there (in case we actually decide to merge this)
>

Thinking more about it, we can't check host_initiated in
vmx_get_vmx_msr() as it's not passed there, this can probably get
changed but I don't see a big difference so I'll probably keep the hack
where it is now.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, back to index

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 17:10 [PATCH RFC 0/3] x86/kvm/hyper-v: fix enlightened VMCS & QEMU4.2 Vitaly Kuznetsov
2020-01-15 17:10 ` [PATCH RFC 1/3] x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs() Vitaly Kuznetsov
2020-01-15 22:50   ` Liran Alon
2020-01-15 17:10 ` [PATCH RFC 2/3] x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() Vitaly Kuznetsov
2020-01-15 22:49   ` Liran Alon
2020-01-16  8:37     ` Vitaly Kuznetsov
2020-02-03 15:11       ` Vitaly Kuznetsov
2020-01-15 23:27   ` Sean Christopherson
2020-01-15 23:30     ` Liran Alon
2020-01-16  8:51       ` Vitaly Kuznetsov
2020-01-16 16:19         ` Sean Christopherson
2020-01-16 16:57           ` Vitaly Kuznetsov
2020-01-17  6:31             ` Sean Christopherson
2020-01-18 21:42           ` Paolo Bonzini
2020-01-19  8:54   ` Paolo Bonzini
2020-01-22  5:47     ` Sean Christopherson
2020-01-22  9:37       ` Vitaly Kuznetsov
2020-01-22 14:33       ` Paolo Bonzini
2020-01-22 15:08         ` Vitaly Kuznetsov
2020-01-22 15:51           ` Sean Christopherson
2020-01-22 16:29             ` Vitaly Kuznetsov
2020-01-22 16:40               ` Paolo Bonzini
2020-01-23  9:15                 ` Vitaly Kuznetsov
2020-01-23 19:09                   ` Vitaly Kuznetsov
2020-01-24 17:25                     ` Sean Christopherson
2020-01-27 15:38                       ` Vitaly Kuznetsov
2020-01-27 17:53                         ` Paolo Bonzini
2020-01-27 21:52                           ` Vitaly Kuznetsov
2020-01-27 18:17                         ` Sean Christopherson
2020-01-15 17:10 ` [PATCH RFC 3/3] x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests Vitaly Kuznetsov
2020-01-15 22:59   ` Liran Alon
2020-01-16  8:55     ` Vitaly Kuznetsov
2020-01-16 16:21       ` Sean Christopherson
2020-01-19  8:57         ` Paolo Bonzini

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git