kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests
@ 2019-08-29 20:56 Krish Sadhukhan
  2019-08-29 20:56 ` [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL " Krish Sadhukhan
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-29 20:56 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini, jmattson

Patch 1: Adds KVM check for GUEST_DEBUGCTL
Patch 2: Adds KVM check for GUEST_DR7
Patch 3: Fixes a bug in __enter_guest() in kvm-unit-test source
Patch 4: Adds kvm-unit-tests related to patch# 1 and 2


[PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
[PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
[PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state
[PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of

 arch/x86/kvm/vmx/nested.c | 10 ++++++++++
 arch/x86/kvm/x86.c        |  2 +-
 arch/x86/kvm/x86.h        | 12 ++++++++++++
 3 files changed, 23 insertions(+), 1 deletion(-)

Krish Sadhukhan (2):
      nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
      nVMX: Check GUEST_DR7 on vmentry of nested guests

 x86/vmx.c       |  9 +++++----
 x86/vmx_tests.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 4 deletions(-)

Krish Sadhukhan (2):
      nVMX: __enter_guest() should not set "launched" state when VM-entry fails
      nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
  2019-08-29 20:56 [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
@ 2019-08-29 20:56 ` Krish Sadhukhan
  2019-08-29 22:12   ` Jim Mattson
  2019-08-29 20:56 ` [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 " Krish Sadhukhan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-29 20:56 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini, jmattson

According to section "Checks on Guest Control Registers, Debug Registers, and
and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
of nested guests:

    If the "load debug controls" VM-entry control is 1, bits reserved in the
    IA32_DEBUGCTL MSR must be 0 in the field for that register. The first
    processors to support the virtual-machine extensions supported only the
    1-setting of this control and thus performed this check unconditionally.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 4 ++++
 arch/x86/kvm/x86.h        | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 46af3a5e9209..0b234e95e0ed 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2677,6 +2677,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
 		return -EINVAL;
 
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
+	    !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
+		return -EINVAL;
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
 	    !kvm_pat_valid(vmcs12->guest_ia32_pat))
 		return -EINVAL;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a470ff0868c5..28ba6d0c359f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -354,6 +354,12 @@ static inline bool kvm_pat_valid(u64 data)
 	return (data | ((data & 0x0202020202020202ull) << 1)) == data;
 }
 
+static inline bool kvm_debugctl_valid(u64 data)
+{
+	/* Bits 2, 3, 4, 5, 13 and [31:16] are reserved */
+	return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
+}
+
 void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu);
 void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-29 20:56 [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
  2019-08-29 20:56 ` [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL " Krish Sadhukhan
@ 2019-08-29 20:56 ` Krish Sadhukhan
  2019-08-29 22:26   ` Jim Mattson
  2019-12-21  0:27   ` Jim Mattson
  2019-08-29 20:56 ` [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails Krish Sadhukhan
  2019-08-29 20:56 ` [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
  3 siblings, 2 replies; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-29 20:56 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini, jmattson

According to section "Checks on Guest Control Registers, Debug Registers, and
and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
of nested guests:

    If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
    field must be 0.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++++
 arch/x86/kvm/x86.c        | 2 +-
 arch/x86/kvm/x86.h        | 6 ++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0b234e95e0ed..f04619daf906 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2681,6 +2681,12 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	    !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
 		return -EINVAL;
 
+#ifdef CONFIG_X86_64
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
+	    !kvm_dr7_valid(vmcs12->guest_dr7))
+		return -EINVAL;
+#endif
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
 	    !kvm_pat_valid(vmcs12->guest_ia32_pat))
 		return -EINVAL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fafd81d2c9ea..423a7a573608 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1051,7 +1051,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 	case 5:
 		/* fall through */
 	default: /* 7 */
-		if (val & 0xffffffff00000000ULL)
+		if (!kvm_dr7_valid(val))
 			return -1; /* #GP */
 		vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
 		kvm_update_dr7(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 28ba6d0c359f..4e55851fc3fb 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -360,6 +360,12 @@ static inline bool kvm_debugctl_valid(u64 data)
 	return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
 }
 
+static inline bool kvm_dr7_valid(u64 data)
+{
+	/* Bits [63:32] are reserved */
+	return ((data & 0xFFFFFFFF00000000ull) ? false : true);
+}
+
 void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu);
 void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails
  2019-08-29 20:56 [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
  2019-08-29 20:56 ` [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL " Krish Sadhukhan
  2019-08-29 20:56 ` [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 " Krish Sadhukhan
@ 2019-08-29 20:56 ` Krish Sadhukhan
  2019-09-04 15:42   ` Sean Christopherson
  2019-08-29 20:56 ` [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
  3 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-29 20:56 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini, jmattson

Bit# 31 in VM-exit reason is set by hardware in both cases of early VM-entry
failures and VM-entry failures due to invalid guest state. Whenever VM-entry
fails, the nested VMCS is not in "launched" state any more. Hence,
__enter_guest() should not set the "launched" state when a VM-entry fails.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
---
 x86/vmx.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index 872ba11..183d11b 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -1805,6 +1805,8 @@ static void check_for_guest_termination(void)
  */
 static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
 {
+	bool vm_entry_failure;
+
 	TEST_ASSERT_MSG(v2_guest_main,
 			"Never called test_set_guest_func!");
 
@@ -1812,15 +1814,14 @@ static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
 			"Called enter_guest() after guest returned.");
 
 	vmx_enter_guest(failure);
+	vm_entry_failure = vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE;
 	if ((abort_flag & ABORT_ON_EARLY_VMENTRY_FAIL && failure->early) ||
-	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE &&
-	    vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE)) {
-
+	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE && vm_entry_failure)) {
 		print_vmentry_failure_info(failure);
 		abort();
 	}
 
-	if (!failure->early) {
+	if (!vm_entry_failure) {
 		launched = 1;
 		check_for_guest_termination();
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests
  2019-08-29 20:56 [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
                   ` (2 preceding siblings ...)
  2019-08-29 20:56 ` [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails Krish Sadhukhan
@ 2019-08-29 20:56 ` Krish Sadhukhan
  2019-08-29 23:17   ` Jim Mattson
  3 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-29 20:56 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar, pbonzini, jmattson

According to section "Checks on Guest Control Registers, Debug Registers, and
and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
of nested guests:

    If the "load debug controls" VM-entry control is 1,

       - bits reserved in the IA32_DEBUGCTL MSR must be 0 in the field for
         that register. The first processors to support the virtual-machine
         extensions supported only the 1-setting of this control and thus
         performed this check unconditionally.

       - bits 63:32 in the DR7 field must be 0.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
---
 x86/vmx_tests.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 8ad2674..0207caf 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -7154,6 +7154,64 @@ static void test_load_guest_pat(void)
 	test_pat(GUEST_PAT, "GUEST_PAT", ENT_CONTROLS, ENT_LOAD_PAT);
 }
 
+/*
+ * If the “load debug controls” VM-entry control is 1,
+ *
+ *   - bits reserved in the IA32_DEBUGCTL MSR must be 0 in the field for
+ *     that register.
+ *   - bits 63:32 in the DR7 field must be 0.
+ */
+static void test_debugctl(void)
+{
+	u64 debugctl_saved = vmcs_read(GUEST_DEBUGCTL);
+	u32 entry_ctl_saved = vmcs_read(ENT_CONTROLS);
+	u64 tmp;
+	int i;
+	u64 dr7_saved = vmcs_read(GUEST_DR7);
+
+	if (!(ctrl_exit_rev.clr & ENT_LOAD_DBGCTLS)) {
+		printf("\"IA32_DEBUGCTL\" VM-entry control not supported\n");
+		return;
+	}
+
+	vmx_set_test_stage(1);
+	test_set_guest(guest_state_test_main);
+
+#define	DEBUGCTL_RESERVED_BITS	0xFFFFFFFFFFFF203C
+
+	if (!(entry_ctl_saved & ENT_LOAD_DBGCTLS))
+		vmcs_write(ENT_CONTROLS, entry_ctl_saved | ENT_LOAD_DBGCTLS);
+
+	for (i = 2; i < 32; (i >= 16 ? i = i + 4 : i++)) {
+		if (!((1 << i) & DEBUGCTL_RESERVED_BITS))
+			continue;
+		tmp = debugctl_saved | (1 << i);
+		vmcs_write(GUEST_DEBUGCTL, tmp);
+		enter_guest_with_invalid_guest_state();
+		report_guest_state_test("ENT_LOAD_DBGCTLS enabled",
+				        VMX_FAIL_STATE | VMX_ENTRY_FAILURE,
+				        tmp, "GUEST_DEBUGCTL");
+	}
+
+	for (i = 32; i < 64; i = i + 4) {
+		tmp = dr7_saved | (1ull << i);
+		vmcs_write(GUEST_DR7, tmp);
+		enter_guest_with_invalid_guest_state();
+		report_guest_state_test("ENT_LOAD_DBGCTLS enabled",
+				        VMX_FAIL_STATE | VMX_ENTRY_FAILURE,
+				        tmp, "GUEST_DR7");
+	}
+
+	/*
+	 * Let the guest finish execution
+	 */
+	vmx_set_test_stage(2);
+	vmcs_write(GUEST_DEBUGCTL, debugctl_saved);
+	vmcs_write(ENT_CONTROLS, entry_ctl_saved);
+	vmcs_write(GUEST_DR7, dr7_saved);
+	enter_guest();
+}
+
 /*
  * Check that the virtual CPU checks the VMX Guest State Area as
  * documented in the Intel SDM.
@@ -7161,6 +7219,7 @@ static void test_load_guest_pat(void)
 static void vmx_guest_state_area_test(void)
 {
 	test_load_guest_pat();
+	test_debugctl();
 }
 
 static bool valid_vmcs_for_vmentry(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
  2019-08-29 20:56 ` [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL " Krish Sadhukhan
@ 2019-08-29 22:12   ` Jim Mattson
  2019-08-30 23:26     ` Krish Sadhukhan
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Mattson @ 2019-08-29 22:12 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>     If the "load debug controls" VM-entry control is 1, bits reserved in the
>     IA32_DEBUGCTL MSR must be 0 in the field for that register. The first
>     processors to support the virtual-machine extensions supported only the
>     1-setting of this control and thus performed this check unconditionally.
>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 4 ++++
>  arch/x86/kvm/x86.h        | 6 ++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 46af3a5e9209..0b234e95e0ed 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2677,6 +2677,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
>             !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
>                 return -EINVAL;
>
> +       if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> +           !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
> +               return -EINVAL;
> +
>         if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
>             !kvm_pat_valid(vmcs12->guest_ia32_pat))
>                 return -EINVAL;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index a470ff0868c5..28ba6d0c359f 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -354,6 +354,12 @@ static inline bool kvm_pat_valid(u64 data)
>         return (data | ((data & 0x0202020202020202ull) << 1)) == data;
>  }
>
> +static inline bool kvm_debugctl_valid(u64 data)
> +{
> +       /* Bits 2, 3, 4, 5, 13 and [31:16] are reserved */
> +       return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
> +}

This should actually be consistent with the constraints in kvm_set_msr_common:

case MSR_IA32_DEBUGCTLMSR:
        if (!data) {
                /* We support the non-activated case already */
                break;
        } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
                /* Values other than LBR and BTF are vendor-specific,
                   thus reserved and should throw a #GP */
                return 1;
        }

Also, as I said earlier...

I'd rather see this built on an interface like:

bool kvm_valid_msr_value(u32 msr_index, u64 value);

Strange that we allow IA32_DEBUGCTL.BTF, since kvm_vcpu_do_singlestep
ignores it. And vLBR still isn't a thing, is it?

It's a bit scary to me that we allow any architecturally legal
IA32_DEBUGCTL bits to be set today. There's probably a CVE in there
somewhere.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-29 20:56 ` [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 " Krish Sadhukhan
@ 2019-08-29 22:26   ` Jim Mattson
  2019-08-30 23:07     ` Krish Sadhukhan
  2019-12-21  0:27   ` Jim Mattson
  1 sibling, 1 reply; 27+ messages in thread
From: Jim Mattson @ 2019-08-29 22:26 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>     If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>     field must be 0.

Can't we just let the hardware check guest DR7? This results in
"VM-entry failure due to invalid guest state," right? And we just
reflect that to L1?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests
  2019-08-29 20:56 ` [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
@ 2019-08-29 23:17   ` Jim Mattson
  2019-08-30  1:12     ` Nadav Amit
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Mattson @ 2019-08-29 23:17 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Thu, Aug 29, 2019 at 2:30 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>     If the "load debug controls" VM-entry control is 1,
>
>        - bits reserved in the IA32_DEBUGCTL MSR must be 0 in the field for
>          that register. The first processors to support the virtual-machine
>          extensions supported only the 1-setting of this control and thus
>          performed this check unconditionally.
>
>        - bits 63:32 in the DR7 field must be 0.
>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> ---
> ...
> +#define        DEBUGCTL_RESERVED_BITS  0xFFFFFFFFFFFF203C

For the virtual CPU implemented by kvm today, only bits 0 and 1 are allowed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests
  2019-08-29 23:17   ` Jim Mattson
@ 2019-08-30  1:12     ` Nadav Amit
  0 siblings, 0 replies; 27+ messages in thread
From: Nadav Amit @ 2019-08-30  1:12 UTC (permalink / raw)
  To: Jim Mattson, Krish Sadhukhan
  Cc: kvm list, Radim Krčmář, Paolo Bonzini

> On Aug 29, 2019, at 4:17 PM, Jim Mattson <jmattson@google.com> wrote:
> 
> On Thu, Aug 29, 2019 at 2:30 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>> According to section "Checks on Guest Control Registers, Debug Registers, and
>> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
>> of nested guests:
>> 
>>    If the "load debug controls" VM-entry control is 1,
>> 
>>       - bits reserved in the IA32_DEBUGCTL MSR must be 0 in the field for
>>         that register. The first processors to support the virtual-machine
>>         extensions supported only the 1-setting of this control and thus
>>         performed this check unconditionally.
>> 
>>       - bits 63:32 in the DR7 field must be 0.
>> 
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
>> ---
>> ...
>> +#define        DEBUGCTL_RESERVED_BITS  0xFFFFFFFFFFFF203C
> 
> For the virtual CPU implemented by kvm today, only bits 0 and 1 are allowed.

Please build the tests so they would pass on bare-metal.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-29 22:26   ` Jim Mattson
@ 2019-08-30 23:07     ` Krish Sadhukhan
  2019-08-30 23:15       ` Jim Mattson
  0 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-30 23:07 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Radim Krčmář, Paolo Bonzini



On 08/29/2019 03:26 PM, Jim Mattson wrote:
> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>> According to section "Checks on Guest Control Registers, Debug Registers, and
>> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
>> of nested guests:
>>
>>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>>      field must be 0.
> Can't we just let the hardware check guest DR7? This results in
> "VM-entry failure due to invalid guest state," right? And we just
> reflect that to L1?

Just trying to understand the reason why this particular check can be 
deferred to the hardware.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-30 23:07     ` Krish Sadhukhan
@ 2019-08-30 23:15       ` Jim Mattson
  2019-09-02  0:33         ` Jim Mattson
  2019-12-20 23:45         ` Jim Mattson
  0 siblings, 2 replies; 27+ messages in thread
From: Jim Mattson @ 2019-08-30 23:15 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
>
> On 08/29/2019 03:26 PM, Jim Mattson wrote:
> > On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com> wrote:
> >> According to section "Checks on Guest Control Registers, Debug Registers, and
> >> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> >> of nested guests:
> >>
> >>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
> >>      field must be 0.
> > Can't we just let the hardware check guest DR7? This results in
> > "VM-entry failure due to invalid guest state," right? And we just
> > reflect that to L1?
>
> Just trying to understand the reason why this particular check can be
> deferred to the hardware.

The vmcs02 field has the same value as the vmcs12 field, and the
physical CPU has the same requirements as the virtual CPU.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
  2019-08-29 22:12   ` Jim Mattson
@ 2019-08-30 23:26     ` Krish Sadhukhan
  2019-09-01 23:55       ` Jim Mattson
  0 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-08-30 23:26 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Radim Krčmář, Paolo Bonzini



On 08/29/2019 03:12 PM, Jim Mattson wrote:
> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>> According to section "Checks on Guest Control Registers, Debug Registers, and
>> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
>> of nested guests:
>>
>>      If the "load debug controls" VM-entry control is 1, bits reserved in the
>>      IA32_DEBUGCTL MSR must be 0 in the field for that register. The first
>>      processors to support the virtual-machine extensions supported only the
>>      1-setting of this control and thus performed this check unconditionally.
>>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
>> ---
>>   arch/x86/kvm/vmx/nested.c | 4 ++++
>>   arch/x86/kvm/x86.h        | 6 ++++++
>>   2 files changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 46af3a5e9209..0b234e95e0ed 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -2677,6 +2677,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
>>              !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
>>                  return -EINVAL;
>>
>> +       if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
>> +           !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
>> +               return -EINVAL;
>> +
>>          if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
>>              !kvm_pat_valid(vmcs12->guest_ia32_pat))
>>                  return -EINVAL;
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index a470ff0868c5..28ba6d0c359f 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -354,6 +354,12 @@ static inline bool kvm_pat_valid(u64 data)
>>          return (data | ((data & 0x0202020202020202ull) << 1)) == data;
>>   }
>>
>> +static inline bool kvm_debugctl_valid(u64 data)
>> +{
>> +       /* Bits 2, 3, 4, 5, 13 and [31:16] are reserved */
>> +       return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
>> +}
> This should actually be consistent with the constraints in kvm_set_msr_common:
>
> case MSR_IA32_DEBUGCTLMSR:
>          if (!data) {
>                  /* We support the non-activated case already */
>                  break;
>          } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
>                  /* Values other than LBR and BTF are vendor-specific,
>                     thus reserved and should throw a #GP */
>                  return 1;
>          }
>
> Also, as I said earlier...
>
> I'd rather see this built on an interface like:
>
> bool kvm_valid_msr_value(u32 msr_index, u64 value);

Yes, I forgot to do it. Will send a patch for this...

>
> Strange that we allow IA32_DEBUGCTL.BTF, since kvm_vcpu_do_singlestep
> ignores it. And vLBR still isn't a thing, is it?

Yes, DEBUGCTLMSR_LBR isn't used.
Good catch !

>
> It's a bit scary to me that we allow any architecturally legal
> IA32_DEBUGCTL bits to be set today. There's probably a CVE in there
> somewhere.
Is it appropriate to disable those two bits as well, then ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL on vmentry of nested guests
  2019-08-30 23:26     ` Krish Sadhukhan
@ 2019-09-01 23:55       ` Jim Mattson
  0 siblings, 0 replies; 27+ messages in thread
From: Jim Mattson @ 2019-09-01 23:55 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Fri, Aug 30, 2019 at 4:26 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
>
> On 08/29/2019 03:12 PM, Jim Mattson wrote:
> > On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com> wrote:
> >> According to section "Checks on Guest Control Registers, Debug Registers, and
> >> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> >> of nested guests:
> >>
> >>      If the "load debug controls" VM-entry control is 1, bits reserved in the
> >>      IA32_DEBUGCTL MSR must be 0 in the field for that register. The first
> >>      processors to support the virtual-machine extensions supported only the
> >>      1-setting of this control and thus performed this check unconditionally.
> >>
> >> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> >> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> >> ---
> >>   arch/x86/kvm/vmx/nested.c | 4 ++++
> >>   arch/x86/kvm/x86.h        | 6 ++++++
> >>   2 files changed, 10 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >> index 46af3a5e9209..0b234e95e0ed 100644
> >> --- a/arch/x86/kvm/vmx/nested.c
> >> +++ b/arch/x86/kvm/vmx/nested.c
> >> @@ -2677,6 +2677,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> >>              !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
> >>                  return -EINVAL;
> >>
> >> +       if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> >> +           !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
> >> +               return -EINVAL;
> >> +
> >>          if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
> >>              !kvm_pat_valid(vmcs12->guest_ia32_pat))
> >>                  return -EINVAL;
> >> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >> index a470ff0868c5..28ba6d0c359f 100644
> >> --- a/arch/x86/kvm/x86.h
> >> +++ b/arch/x86/kvm/x86.h
> >> @@ -354,6 +354,12 @@ static inline bool kvm_pat_valid(u64 data)
> >>          return (data | ((data & 0x0202020202020202ull) << 1)) == data;
> >>   }
> >>
> >> +static inline bool kvm_debugctl_valid(u64 data)
> >> +{
> >> +       /* Bits 2, 3, 4, 5, 13 and [31:16] are reserved */
> >> +       return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
> >> +}
> > This should actually be consistent with the constraints in kvm_set_msr_common:
> >
> > case MSR_IA32_DEBUGCTLMSR:
> >          if (!data) {
> >                  /* We support the non-activated case already */
> >                  break;
> >          } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
> >                  /* Values other than LBR and BTF are vendor-specific,
> >                     thus reserved and should throw a #GP */
> >                  return 1;
> >          }
> >
> > Also, as I said earlier...
> >
> > I'd rather see this built on an interface like:
> >
> > bool kvm_valid_msr_value(u32 msr_index, u64 value);
>
> Yes, I forgot to do it. Will send a patch for this...
>
> >
> > Strange that we allow IA32_DEBUGCTL.BTF, since kvm_vcpu_do_singlestep
> > ignores it. And vLBR still isn't a thing, is it?
>
> Yes, DEBUGCTLMSR_LBR isn't used.
> Good catch !
>
> >
> > It's a bit scary to me that we allow any architecturally legal
> > IA32_DEBUGCTL bits to be set today. There's probably a CVE in there
> > somewhere.
> Is it appropriate to disable those two bits as well, then ?

IA32_DEBUGCTL.BTF is just broken, and should be fixed.
IA32_DEBUGCTL.LBR is probably a long way from actually working, but
IIRC, Windows gets cranky if it can't set the bit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-30 23:15       ` Jim Mattson
@ 2019-09-02  0:33         ` Jim Mattson
       [not found]           ` <e229bea2-acb2-e268-6281-d8e467c3282e@oracle.com>
  2019-09-04 17:14           ` Sean Christopherson
  2019-12-20 23:45         ` Jim Mattson
  1 sibling, 2 replies; 27+ messages in thread
From: Jim Mattson @ 2019-09-02  0:33 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
> >
> >
> >
> > On 08/29/2019 03:26 PM, Jim Mattson wrote:
> > > On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> > > <krish.sadhukhan@oracle.com> wrote:
> > >> According to section "Checks on Guest Control Registers, Debug Registers, and
> > >> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> > >> of nested guests:
> > >>
> > >>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
> > >>      field must be 0.
> > > Can't we just let the hardware check guest DR7? This results in
> > > "VM-entry failure due to invalid guest state," right? And we just
> > > reflect that to L1?
> >
> > Just trying to understand the reason why this particular check can be
> > deferred to the hardware.
>
> The vmcs02 field has the same value as the vmcs12 field, and the
> physical CPU has the same requirements as the virtual CPU.

Actually, you're right. There is a problem. With the current
implementation, there's a priority inversion if the vmcs12 contains
both illegal guest state for which the checks are deferred to
hardware, and illegal entries in the VM-entry MSR-load area. In this
case, we will synthesize a "VM-entry failure due to MSR loading"
rather than a "VM-entry failure due to invalid guest state."

There are so many checks on guest state that it's really compelling to
defer as many as possible to hardware. However, we need to fix the
aforesaid priority inversion. Instead of returning early from
nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
could induce a "VM-entry failure due to MSR loading" for the next
VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
to L1 (along with the appropriate exit qualification).

The tricky part is in undoing the successful MSR writes if we reflect
EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
writes to IA32_TIME_STAMP_COUNTER).

Alternatively, we could perform validity checks on the entire vmcs12
VM-entry MSR-load area before writing any of the MSRs. This may be
easier, but it would certainly be slower. We would have to be wary of
situations where processing an earlier entry affects the validity of a
later entry. (If we take this route, then we would also have to
process the valid prefix of the VM-entry MSR-load area when we reflect
EXIT_REASON_MSR_LOAD_FAIL to L1.)

Note that this approach could be extended to permit the deferral of
some control field checks to hardware as well. As long as the control
field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
enforces the same constraints as the physical CPU, deferral should be
fine. We just have to make sure that we induce a "VM-entry failure due
to invalid guest state" for the next VM-entry of vmcs02 if any
software checks on guest state fail, rather than immediately
synthesizing an "VM-entry failure due to invalid guest state" during
the construction of vmcs02.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails
  2019-08-29 20:56 ` [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails Krish Sadhukhan
@ 2019-09-04 15:42   ` Sean Christopherson
  2019-09-13 20:37     ` Krish Sadhukhan
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-09-04 15:42 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm, rkrcmar, pbonzini, jmattson

On Thu, Aug 29, 2019 at 04:56:34PM -0400, Krish Sadhukhan wrote:
> Bit# 31 in VM-exit reason is set by hardware in both cases of early VM-entry
> failures and VM-entry failures due to invalid guest state.

This is incorrect, VMCS.EXIT_REASON is not written on a VM-Fail.  If the
tests are passing, you're probably consuming a stale EXIT_REASON.

> Whenever VM-entry
> fails, the nested VMCS is not in "launched" state any more. Hence,
> __enter_guest() should not set the "launched" state when a VM-entry fails.
> 
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> ---
>  x86/vmx.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/x86/vmx.c b/x86/vmx.c
> index 872ba11..183d11b 100644
> --- a/x86/vmx.c
> +++ b/x86/vmx.c
> @@ -1805,6 +1805,8 @@ static void check_for_guest_termination(void)
>   */
>  static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
>  {
> +	bool vm_entry_failure;
> +
>  	TEST_ASSERT_MSG(v2_guest_main,
>  			"Never called test_set_guest_func!");
>  
> @@ -1812,15 +1814,14 @@ static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
>  			"Called enter_guest() after guest returned.");
>  
>  	vmx_enter_guest(failure);
> +	vm_entry_failure = vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE;

Rather than duplicating the code in vmx_run(), what if we move this check
into vmx_enter_guest() and rework struct vmentry_failure?  The code was
originally designed to handle only VM-Fail conditions, we should clean it
up instead of bolting more stuff on top.  E.g.:

struct vmentry_status {
	/* Did we attempt VMLAUNCH or VMRESUME */ 
	bool vmlaunch;
	/* Instruction mnemonic (for convenience). */
	const char *instr;
	/* VM-Enter passed all consistency checks, i.e. did not fail. */
	bool succeeded;
	/* VM-Enter failed before loading guest state, i.e. VM-Fail. */
	bool vm_fail;
	/* Contents of RFLAGS on VM-Fail, EXIT_REASON on VM-Exit.  */
	union {
		unsigned long vm_fail_flags;
		unsigned long vm_exit_reason;
	};
};

static void vmx_enter_guest(struct vmentry_status *status)
{
	status->vm_fail = 0;

	in_guest = 1;
	asm volatile (
		"mov %[HOST_RSP], %%rdi\n\t"
		"vmwrite %%rsp, %%rdi\n\t"
		LOAD_GPR_C
		"cmpb $0, %[launched]\n\t"
		"jne 1f\n\t"
		"vmlaunch\n\t"
		"jmp 2f\n\t"
		"1: "
		"vmresume\n\t"
		"2: "
		SAVE_GPR_C
		"pushf\n\t"
		"pop %%rdi\n\t"
		"mov %%rdi, %[vm_fail_flags]\n\t"
		"movl $1, %[vm_fail]\n\t"
		"jmp 3f\n\t"
		"vmx_return:\n\t"
		SAVE_GPR_C
		"3: \n\t"
		: [vm_fail]"+m"(status->vm_fail),
		  [vm_fail_flags]"=m"(status->vm_fail_flags)
		: [launched]"m"(launched), [HOST_RSP]"i"(HOST_RSP)
		: "rdi", "memory", "cc"
	);
	in_guest = 0;

	if (!status->vm_fail)
		status->vm_exit_reason = vmcs_read(EXI_REASON);
		
	status->succeeded = !status->vm_fail &&
			    !(status->vm_exit_reason & VMX_ENTRY_FAILURE);

	status->vmlaunch = !launched;
	status->instr = launched ? "vmresume" : "vmlaunch";

	if (status->succeeded)
		launched = 1;
}

>  	if ((abort_flag & ABORT_ON_EARLY_VMENTRY_FAIL && failure->early) ||
> -	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE &&
> -	    vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE)) {
> -
> +	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE && vm_entry_failure)) {
>  		print_vmentry_failure_info(failure);
>  		abort();
>  	}
>  
> -	if (!failure->early) {
> +	if (!vm_entry_failure) {
>  		launched = 1;
>  		check_for_guest_termination();
>  	}
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
       [not found]           ` <e229bea2-acb2-e268-6281-d8e467c3282e@oracle.com>
@ 2019-09-04 16:44             ` Jim Mattson
  2019-09-04 16:58               ` Sean Christopherson
  2019-09-04 18:05               ` Krish Sadhukhan
  0 siblings, 2 replies; 27+ messages in thread
From: Jim Mattson @ 2019-09-04 16:44 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
>
> On 09/01/2019 05:33 PM, Jim Mattson wrote:
>
> On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>
> On 08/29/2019 03:26 PM, Jim Mattson wrote:
>
> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>      field must be 0.
>
> Can't we just let the hardware check guest DR7? This results in
> "VM-entry failure due to invalid guest state," right? And we just
> reflect that to L1?
>
> Just trying to understand the reason why this particular check can be
> deferred to the hardware.
>
> The vmcs02 field has the same value as the vmcs12 field, and the
> physical CPU has the same requirements as the virtual CPU.
>
> Actually, you're right. There is a problem. With the current
> implementation, there's a priority inversion if the vmcs12 contains
> both illegal guest state for which the checks are deferred to
> hardware, and illegal entries in the VM-entry MSR-load area. In this
> case, we will synthesize a "VM-entry failure due to MSR loading"
> rather than a "VM-entry failure due to invalid guest state."
>
> There are so many checks on guest state that it's really compelling to
> defer as many as possible to hardware. However, we need to fix the
> aforesaid priority inversion. Instead of returning early from
> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
> could induce a "VM-entry failure due to MSR loading" for the next
> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
> to L1 (along with the appropriate exit qualification).
>
>
> Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ?

Yes.

> The tricky part is in undoing the successful MSR writes if we reflect
> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
> writes to IA32_TIME_STAMP_COUNTER).
>
>
> Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry,  you are saying that the induced "VM-entry failure due to MSR loading"  will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02  continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ?

Sorry; I don't understand the questions.
>
> There are two other scenarios there:
>
>     1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry
>     2. Guest state is illegal but VM-entry MSR-Load area is fine
>
> In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ?  If so, we do we care about reverting MSR-writes  because the SDM section 26.8 say,
>
>         "Processor state is loaded as would be done on a VM exit (see Section 27.5)"

I'm not sure how the referenced section of the SDM is relevant. Are
you assuming that every MSR in the VM-entry MSR load area also appears
in the VM-exit MSR load area? That certainly isn't the case.

> Alternatively, we could perform validity checks on the entire vmcs12
> VM-entry MSR-load area before writing any of the MSRs. This may be
> easier, but it would certainly be slower. We would have to be wary of
> situations where processing an earlier entry affects the validity of a
> later entry. (If we take this route, then we would also have to
> process the valid prefix of the VM-entry MSR-load area when we reflect
> EXIT_REASON_MSR_LOAD_FAIL to L1.)

Forget this paragraph. Even if all of the checks pass, we still have
to undo all of the MSR-writes in the event of a deferred "VM-entry
failure due to invalid guest state."

> Note that this approach could be extended to permit the deferral of
> some control field checks to hardware as well.
>
>
> Why can't the first approach be used for VM-entry controls as well ?

Sorry; I don't understand this question either.

>  As long as the control
> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
> enforces the same constraints as the physical CPU, deferral should be
> fine. We just have to make sure that we induce a "VM-entry failure due
> to invalid guest state" for the next VM-entry of vmcs02 if any
> software checks on guest state fail, rather than immediately
> synthesizing an "VM-entry failure due to invalid guest state" during
> the construction of vmcs02.
>
>
> Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ?

Why do you feel that getting the priority correct is so important for
this one check in particular? I'd be surprised if any hypervisor ever
assembled a VMCS that failed this check.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-04 16:44             ` Jim Mattson
@ 2019-09-04 16:58               ` Sean Christopherson
  2019-09-04 18:05               ` Krish Sadhukhan
  1 sibling, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-09-04 16:58 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Krish Sadhukhan, kvm list, Radim Krčmář, Paolo Bonzini

On Wed, Sep 04, 2019 at 09:44:58AM -0700, Jim Mattson wrote:
> On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
> > Is it OK to keep this Guest check in software for now and then remove it
> > once we have a solution in place ?
> 
> Why do you feel that getting the priority correct is so important for
> this one check in particular? I'd be surprised if any hypervisor ever
> assembled a VMCS that failed this check.

Agreed.  I don't see much value in adding a subset of guest state checks,
and adding every check will be painfully slow.  IMO we're better off
finding a solution that allows deferring guest state checks to hardware.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-02  0:33         ` Jim Mattson
       [not found]           ` <e229bea2-acb2-e268-6281-d8e467c3282e@oracle.com>
@ 2019-09-04 17:14           ` Sean Christopherson
  1 sibling, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2019-09-04 17:14 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Krish Sadhukhan, kvm list, Radim Krčmář, Paolo Bonzini

On Sun, Sep 01, 2019 at 05:33:26PM -0700, Jim Mattson wrote:
> Actually, you're right. There is a problem. With the current
> implementation, there's a priority inversion if the vmcs12 contains
> both illegal guest state for which the checks are deferred to
> hardware, and illegal entries in the VM-entry MSR-load area. In this
> case, we will synthesize a "VM-entry failure due to MSR loading"
> rather than a "VM-entry failure due to invalid guest state."
> 
> There are so many checks on guest state that it's really compelling to
> defer as many as possible to hardware. However, we need to fix the
> aforesaid priority inversion. Instead of returning early from
> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
> could induce a "VM-entry failure due to MSR loading" for the next
> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
> to L1 (along with the appropriate exit qualification).
> 
> The tricky part is in undoing the successful MSR writes if we reflect
> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
> writes to IA32_TIME_STAMP_COUNTER).
> 
> Alternatively, we could perform validity checks on the entire vmcs12
> VM-entry MSR-load area before writing any of the MSRs. This may be
> easier, but it would certainly be slower. We would have to be wary of
> situations where processing an earlier entry affects the validity of a
> later entry. (If we take this route, then we would also have to
> process the valid prefix of the VM-entry MSR-load area when we reflect
> EXIT_REASON_MSR_LOAD_FAIL to L1.)

Maybe a hybrid of the two, e.g. updates that are difficult to unwind
are deferred until all checks pass.  I suspect the set of MSRs that are
difficult to unwind doesn't overlap with the set of MSRs that can affect
the legality of a downstream WRMSR.

> Note that this approach could be extended to permit the deferral of
> some control field checks to hardware as well. As long as the control
> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
> enforces the same constraints as the physical CPU, deferral should be
> fine.

I doubt it's worth the effort/complexity to defer control checks to
hardware.  There aren't any control fields that are guaranteed to be
copied verbatim, e.g. we'd need accurate prediction of the final value.
Eliminating the basic vmx_control_verify() doesn't save much as those are
quite speedy, whereas eliminating the individual checks would need to
ensure that *all* fields involved in the check are copied verbatim.

> We just have to make sure that we induce a "VM-entry failure due
> to invalid guest state" for the next VM-entry of vmcs02 if any
> software checks on guest state fail, rather than immediately
> synthesizing an "VM-entry failure due to invalid guest state" during
> the construction of vmcs02.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-04 16:44             ` Jim Mattson
  2019-09-04 16:58               ` Sean Christopherson
@ 2019-09-04 18:05               ` Krish Sadhukhan
  2019-09-04 18:20                 ` Jim Mattson
  1 sibling, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-09-04 18:05 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Radim Krčmář, Paolo Bonzini


On 9/4/19 9:44 AM, Jim Mattson wrote:
> On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>>
>>
>> On 09/01/2019 05:33 PM, Jim Mattson wrote:
>>
>> On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
>>
>> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
>> <krish.sadhukhan@oracle.com> wrote:
>>
>> On 08/29/2019 03:26 PM, Jim Mattson wrote:
>>
>> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
>> <krish.sadhukhan@oracle.com> wrote:
>>
>> According to section "Checks on Guest Control Registers, Debug Registers, and
>> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
>> of nested guests:
>>
>>       If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>>       field must be 0.
>>
>> Can't we just let the hardware check guest DR7? This results in
>> "VM-entry failure due to invalid guest state," right? And we just
>> reflect that to L1?
>>
>> Just trying to understand the reason why this particular check can be
>> deferred to the hardware.
>>
>> The vmcs02 field has the same value as the vmcs12 field, and the
>> physical CPU has the same requirements as the virtual CPU.
>>
>> Actually, you're right. There is a problem. With the current
>> implementation, there's a priority inversion if the vmcs12 contains
>> both illegal guest state for which the checks are deferred to
>> hardware, and illegal entries in the VM-entry MSR-load area. In this
>> case, we will synthesize a "VM-entry failure due to MSR loading"
>> rather than a "VM-entry failure due to invalid guest state."
>>
>> There are so many checks on guest state that it's really compelling to
>> defer as many as possible to hardware. However, we need to fix the
>> aforesaid priority inversion. Instead of returning early from
>> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
>> could induce a "VM-entry failure due to MSR loading" for the next
>> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
>> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
>> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
>> to L1 (along with the appropriate exit qualification).
>>
>>
>> Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ?
> Yes.
>
>> The tricky part is in undoing the successful MSR writes if we reflect
>> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
>> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
>> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
>> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
>> writes to IA32_TIME_STAMP_COUNTER).
>>
>>
>> Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry,  you are saying that the induced "VM-entry failure due to MSR loading"  will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02  continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ?
> Sorry; I don't understand the questions.


Let's say that all guest state checks are deferred to hardware and that 
they all will pass. Now, the VM-entry MSR-load area contains an illegal 
entry and we modify nested_vmx_enter_non_root_mode() to induce a 
"VM-entry failure due to MSR loading" for the next VM-entry of vmcs02. I 
wanted to understand how that induced error ultimately leads to a 
VM-entry failure ?


>> There are two other scenarios there:
>>
>>      1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry
>>      2. Guest state is illegal but VM-entry MSR-Load area is fine
>>
>> In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ?  If so, we do we care about reverting MSR-writes  because the SDM section 26.8 say,
>>
>>          "Processor state is loaded as would be done on a VM exit (see Section 27.5)"
> I'm not sure how the referenced section of the SDM is relevant. Are
> you assuming that every MSR in the VM-entry MSR load area also appears
> in the VM-exit MSR load area? That certainly isn't the case.
>
>> Alternatively, we could perform validity checks on the entire vmcs12
>> VM-entry MSR-load area before writing any of the MSRs. This may be
>> easier, but it would certainly be slower. We would have to be wary of
>> situations where processing an earlier entry affects the validity of a
>> later entry. (If we take this route, then we would also have to
>> process the valid prefix of the VM-entry MSR-load area when we reflect
>> EXIT_REASON_MSR_LOAD_FAIL to L1.)
> Forget this paragraph. Even if all of the checks pass, we still have
> to undo all of the MSR-writes in the event of a deferred "VM-entry
> failure due to invalid guest state."
>
>> Note that this approach could be extended to permit the deferral of
>> some control field checks to hardware as well.
>>
>>
>> Why can't the first approach be used for VM-entry controls as well ?
> Sorry; I don't understand this question either.


Since you mentioned,

     "Note that this approach could be extended to permit the deferral 
of some control field checks..."

So it seemed that only the second approach was applicable to deferring 
VM-entry control checks to hardware. Hence I asked why the first 
approach can't be used.


>
>>   As long as the control
>> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
>> enforces the same constraints as the physical CPU, deferral should be
>> fine. We just have to make sure that we induce a "VM-entry failure due
>> to invalid guest state" for the next VM-entry of vmcs02 if any
>> software checks on guest state fail, rather than immediately
>> synthesizing an "VM-entry failure due to invalid guest state" during
>> the construction of vmcs02.
>>
>>
>> Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ?
> Why do you feel that getting the priority correct is so important for
> this one check in particular? I'd be surprised if any hypervisor ever
> assembled a VMCS that failed this check.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-04 18:05               ` Krish Sadhukhan
@ 2019-09-04 18:20                 ` Jim Mattson
  2019-09-09  4:11                   ` Krish Sadhukhan
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Mattson @ 2019-09-04 18:20 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Wed, Sep 4, 2019 at 11:07 AM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 9/4/19 9:44 AM, Jim Mattson wrote:
> > On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com> wrote:
> >>
> >>
> >> On 09/01/2019 05:33 PM, Jim Mattson wrote:
> >>
> >> On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
> >>
> >> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
> >> <krish.sadhukhan@oracle.com> wrote:
> >>
> >> On 08/29/2019 03:26 PM, Jim Mattson wrote:
> >>
> >> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> >> <krish.sadhukhan@oracle.com> wrote:
> >>
> >> According to section "Checks on Guest Control Registers, Debug Registers, and
> >> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> >> of nested guests:
> >>
> >>       If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
> >>       field must be 0.
> >>
> >> Can't we just let the hardware check guest DR7? This results in
> >> "VM-entry failure due to invalid guest state," right? And we just
> >> reflect that to L1?
> >>
> >> Just trying to understand the reason why this particular check can be
> >> deferred to the hardware.
> >>
> >> The vmcs02 field has the same value as the vmcs12 field, and the
> >> physical CPU has the same requirements as the virtual CPU.
> >>
> >> Actually, you're right. There is a problem. With the current
> >> implementation, there's a priority inversion if the vmcs12 contains
> >> both illegal guest state for which the checks are deferred to
> >> hardware, and illegal entries in the VM-entry MSR-load area. In this
> >> case, we will synthesize a "VM-entry failure due to MSR loading"
> >> rather than a "VM-entry failure due to invalid guest state."
> >>
> >> There are so many checks on guest state that it's really compelling to
> >> defer as many as possible to hardware. However, we need to fix the
> >> aforesaid priority inversion. Instead of returning early from
> >> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
> >> could induce a "VM-entry failure due to MSR loading" for the next
> >> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
> >> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
> >> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
> >> to L1 (along with the appropriate exit qualification).
> >>
> >>
> >> Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ?
> > Yes.
> >
> >> The tricky part is in undoing the successful MSR writes if we reflect
> >> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
> >> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
> >> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
> >> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
> >> writes to IA32_TIME_STAMP_COUNTER).
> >>
> >>
> >> Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry,  you are saying that the induced "VM-entry failure due to MSR loading"  will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02  continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ?
> > Sorry; I don't understand the questions.
>
>
> Let's say that all guest state checks are deferred to hardware and that
> they all will pass. Now, the VM-entry MSR-load area contains an illegal
> entry and we modify nested_vmx_enter_non_root_mode() to induce a
> "VM-entry failure due to MSR loading" for the next VM-entry of vmcs02. I
> wanted to understand how that induced error ultimately leads to a
> VM-entry failure ?

One possible implementation is as follows:

While nested_vmx_load_msr() is processing the vmcs12 VM-entry MSR-load
area, it finds an error in entry <i>. We could set up the vmcs02
VM-entry MSR-load area so that the first entry has <i+1> in the
reserved bits, and the VM-entry MSR-load count is greater than 0.
Since the reserved bits must be one, when we try to launch/resume the
vmcs02 in vmx_vcpu_run(), it will result in "VM-entry failure due to
MSR loading." We can then reflect that to the guest, setting the
vmcs12 exit qualification field from the reserved bits in the first
entry of the vmcs02 VM-entry MSR-load area, rather than passing on the
exit qualification field from the vmcs02. Of course, this doesn't work
if <i> is MAX_UINT32, but I suspect you've already got bigger problems
in that case.

>
> >> There are two other scenarios there:
> >>
> >>      1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry
> >>      2. Guest state is illegal but VM-entry MSR-Load area is fine
> >>
> >> In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ?  If so, we do we care about reverting MSR-writes  because the SDM section 26.8 say,
> >>
> >>          "Processor state is loaded as would be done on a VM exit (see Section 27.5)"
> > I'm not sure how the referenced section of the SDM is relevant. Are
> > you assuming that every MSR in the VM-entry MSR load area also appears
> > in the VM-exit MSR load area? That certainly isn't the case.
> >
> >> Alternatively, we could perform validity checks on the entire vmcs12
> >> VM-entry MSR-load area before writing any of the MSRs. This may be
> >> easier, but it would certainly be slower. We would have to be wary of
> >> situations where processing an earlier entry affects the validity of a
> >> later entry. (If we take this route, then we would also have to
> >> process the valid prefix of the VM-entry MSR-load area when we reflect
> >> EXIT_REASON_MSR_LOAD_FAIL to L1.)
> > Forget this paragraph. Even if all of the checks pass, we still have
> > to undo all of the MSR-writes in the event of a deferred "VM-entry
> > failure due to invalid guest state."
> >
> >> Note that this approach could be extended to permit the deferral of
> >> some control field checks to hardware as well.
> >>
> >>
> >> Why can't the first approach be used for VM-entry controls as well ?
> > Sorry; I don't understand this question either.
>
>
> Since you mentioned,
>
>      "Note that this approach could be extended to permit the deferral
> of some control field checks..."
>
> So it seemed that only the second approach was applicable to deferring
> VM-entry control checks to hardware. Hence I asked why the first
> approach can't be used.

By "this approach," I meant the deferred delivery of an error
discovered in software.

> >
> >>   As long as the control
> >> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
> >> enforces the same constraints as the physical CPU, deferral should be
> >> fine. We just have to make sure that we induce a "VM-entry failure due
> >> to invalid guest state" for the next VM-entry of vmcs02 if any
> >> software checks on guest state fail, rather than immediately
> >> synthesizing an "VM-entry failure due to invalid guest state" during
> >> the construction of vmcs02.
> >>
> >>
> >> Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ?
> > Why do you feel that getting the priority correct is so important for
> > this one check in particular? I'd be surprised if any hypervisor ever
> > assembled a VMCS that failed this check.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-04 18:20                 ` Jim Mattson
@ 2019-09-09  4:11                   ` Krish Sadhukhan
  2019-09-09 15:56                     ` Jim Mattson
  0 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-09-09  4:11 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Radim Krčmář, Paolo Bonzini


On 9/4/19 11:20 AM, Jim Mattson wrote:
> On Wed, Sep 4, 2019 at 11:07 AM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>>
>> On 9/4/19 9:44 AM, Jim Mattson wrote:
>>> On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
>>> <krish.sadhukhan@oracle.com> wrote:
>>>>
>>>> On 09/01/2019 05:33 PM, Jim Mattson wrote:
>>>>
>>>> On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
>>>>
>>>> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
>>>> <krish.sadhukhan@oracle.com> wrote:
>>>>
>>>> On 08/29/2019 03:26 PM, Jim Mattson wrote:
>>>>
>>>> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
>>>> <krish.sadhukhan@oracle.com> wrote:
>>>>
>>>> According to section "Checks on Guest Control Registers, Debug Registers, and
>>>> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
>>>> of nested guests:
>>>>
>>>>        If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>>>>        field must be 0.
>>>>
>>>> Can't we just let the hardware check guest DR7? This results in
>>>> "VM-entry failure due to invalid guest state," right? And we just
>>>> reflect that to L1?
>>>>
>>>> Just trying to understand the reason why this particular check can be
>>>> deferred to the hardware.
>>>>
>>>> The vmcs02 field has the same value as the vmcs12 field, and the
>>>> physical CPU has the same requirements as the virtual CPU.
>>>>
>>>> Actually, you're right. There is a problem. With the current
>>>> implementation, there's a priority inversion if the vmcs12 contains
>>>> both illegal guest state for which the checks are deferred to
>>>> hardware, and illegal entries in the VM-entry MSR-load area. In this
>>>> case, we will synthesize a "VM-entry failure due to MSR loading"
>>>> rather than a "VM-entry failure due to invalid guest state."
>>>>
>>>> There are so many checks on guest state that it's really compelling to
>>>> defer as many as possible to hardware. However, we need to fix the
>>>> aforesaid priority inversion. Instead of returning early from
>>>> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
>>>> could induce a "VM-entry failure due to MSR loading" for the next
>>>> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
>>>> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
>>>> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
>>>> to L1 (along with the appropriate exit qualification).
>>>>
>>>>
>>>> Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ?
>>> Yes.
>>>
>>>> The tricky part is in undoing the successful MSR writes if we reflect
>>>> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
>>>> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
>>>> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
>>>> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
>>>> writes to IA32_TIME_STAMP_COUNTER).
>>>>
>>>>
>>>> Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry,  you are saying that the induced "VM-entry failure due to MSR loading"  will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02  continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ?
>>> Sorry; I don't understand the questions.
>>
>> Let's say that all guest state checks are deferred to hardware and that
>> they all will pass. Now, the VM-entry MSR-load area contains an illegal
>> entry and we modify nested_vmx_enter_non_root_mode() to induce a
>> "VM-entry failure due to MSR loading" for the next VM-entry of vmcs02. I
>> wanted to understand how that induced error ultimately leads to a
>> VM-entry failure ?
> One possible implementation is as follows:
>
> While nested_vmx_load_msr() is processing the vmcs12 VM-entry MSR-load
> area, it finds an error in entry <i>. We could set up the vmcs02
> VM-entry MSR-load area so that the first entry has <i+1> in the
> reserved bits, and the VM-entry MSR-load count is greater than 0.
> Since the reserved bits must be one, when we try to launch/resume the
> vmcs02 in vmx_vcpu_run(), it will result in "VM-entry failure due to
> MSR loading." We can then reflect that to the guest, setting the
> vmcs12 exit qualification field from the reserved bits in the first
> entry of the vmcs02 VM-entry MSR-load area, rather than passing on the
> exit qualification field from the vmcs02. Of course, this doesn't work
> if <i> is MAX_UINT32, but I suspect you've already got bigger problems
> in that case.


It seems like a good solution. The only problem I see in this is that 
using the reserved bits is not guaranteed to work forever as the 
hardware vendors can decide to use them anytime.

Instead, I was wondering whether we could set bits 31:0 in the first 
entry in the VM-entry MSR-load area of vmcs02 to a value of C0000100H. 
According to Intel SDM, this will cause VM-entry to fail:

            "The value of bits 31:0 is either C0000100H (the 
IA32_FS_BASE MSR) or C0000101 (the IA32_GS_BASE MSR)."

We can use bits 127:64 of that entry to indicate which MSR entry in the 
vmcs12 MSR-load area had an error and then we synthesize an exit 
qualification from that information.


>
>>>> There are two other scenarios there:
>>>>
>>>>       1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry
>>>>       2. Guest state is illegal but VM-entry MSR-Load area is fine
>>>>
>>>> In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ?  If so, we do we care about reverting MSR-writes  because the SDM section 26.8 say,
>>>>
>>>>           "Processor state is loaded as would be done on a VM exit (see Section 27.5)"
>>> I'm not sure how the referenced section of the SDM is relevant. Are
>>> you assuming that every MSR in the VM-entry MSR load area also appears
>>> in the VM-exit MSR load area? That certainly isn't the case.
>>>
>>>> Alternatively, we could perform validity checks on the entire vmcs12
>>>> VM-entry MSR-load area before writing any of the MSRs. This may be
>>>> easier, but it would certainly be slower. We would have to be wary of
>>>> situations where processing an earlier entry affects the validity of a
>>>> later entry. (If we take this route, then we would also have to
>>>> process the valid prefix of the VM-entry MSR-load area when we reflect
>>>> EXIT_REASON_MSR_LOAD_FAIL to L1.)
>>> Forget this paragraph. Even if all of the checks pass, we still have
>>> to undo all of the MSR-writes in the event of a deferred "VM-entry
>>> failure due to invalid guest state."
>>>
>>>> Note that this approach could be extended to permit the deferral of
>>>> some control field checks to hardware as well.
>>>>
>>>>
>>>> Why can't the first approach be used for VM-entry controls as well ?
>>> Sorry; I don't understand this question either.
>>
>> Since you mentioned,
>>
>>       "Note that this approach could be extended to permit the deferral
>> of some control field checks..."
>>
>> So it seemed that only the second approach was applicable to deferring
>> VM-entry control checks to hardware. Hence I asked why the first
>> approach can't be used.
> By "this approach," I meant the deferred delivery of an error
> discovered in software.
>
>>>>    As long as the control
>>>> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
>>>> enforces the same constraints as the physical CPU, deferral should be
>>>> fine. We just have to make sure that we induce a "VM-entry failure due
>>>> to invalid guest state" for the next VM-entry of vmcs02 if any
>>>> software checks on guest state fail, rather than immediately
>>>> synthesizing an "VM-entry failure due to invalid guest state" during
>>>> the construction of vmcs02.
>>>>
>>>>
>>>> Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ?
>>> Why do you feel that getting the priority correct is so important for
>>> this one check in particular? I'd be surprised if any hypervisor ever
>>> assembled a VMCS that failed this check.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-09-09  4:11                   ` Krish Sadhukhan
@ 2019-09-09 15:56                     ` Jim Mattson
  0 siblings, 0 replies; 27+ messages in thread
From: Jim Mattson @ 2019-09-09 15:56 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Sun, Sep 8, 2019 at 9:11 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:

> It seems like a good solution. The only problem I see in this is that
> using the reserved bits is not guaranteed to work forever as the
> hardware vendors can decide to use them anytime.

Unlikely, but point taken.

> Instead, I was wondering whether we could set bits 31:0 in the first
> entry in the VM-entry MSR-load area of vmcs02 to a value of C0000100H.
> According to Intel SDM, this will cause VM-entry to fail:
>
>             "The value of bits 31:0 is either C0000100H (the
> IA32_FS_BASE MSR) or C0000101 (the IA32_GS_BASE MSR)."
>
> We can use bits 127:64 of that entry to indicate which MSR entry in the
> vmcs12 MSR-load area had an error and then we synthesize an exit
> qualification from that information.

That seems reasonable to me.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails
  2019-09-04 15:42   ` Sean Christopherson
@ 2019-09-13 20:37     ` Krish Sadhukhan
  2019-09-13 21:06       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Krish Sadhukhan @ 2019-09-13 20:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, rkrcmar, pbonzini, jmattson


On 9/4/19 8:42 AM, Sean Christopherson wrote:
> On Thu, Aug 29, 2019 at 04:56:34PM -0400, Krish Sadhukhan wrote:
>> Bit# 31 in VM-exit reason is set by hardware in both cases of early VM-entry
>> failures and VM-entry failures due to invalid guest state.
> This is incorrect, VMCS.EXIT_REASON is not written on a VM-Fail.  If the
> tests are passing, you're probably consuming a stale EXIT_REASON.

In vmx_vcpu_run(),

         if (vmx->fail || (vmx->exit_reason & 
VMX_EXIT_REASONS_FAILED_VMENTRY))
                 return;

         vmx->loaded_vmcs->launched = 1;

we return without setting "launched" whenever bit# 31 is set in Exit 
Reason. If VM-entry fails due to invalid guest state or due to errors in 
VM-entry MSR-loading area, bit#31 is set.  As a result, L2 is not in 
"launched" state when we return to L1.  Tests that want to VMRESUME L2 
after fixing the bad guest state or the bad MSR-loading area, fail with 
VM-Instruction Error 5,

         "Early vmresume failure: error number is 5. See Intel 30.4."

>
>> Whenever VM-entry
>> fails, the nested VMCS is not in "launched" state any more. Hence,
>> __enter_guest() should not set the "launched" state when a VM-entry fails.
>>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
>> ---
>>   x86/vmx.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/x86/vmx.c b/x86/vmx.c
>> index 872ba11..183d11b 100644
>> --- a/x86/vmx.c
>> +++ b/x86/vmx.c
>> @@ -1805,6 +1805,8 @@ static void check_for_guest_termination(void)
>>    */
>>   static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
>>   {
>> +	bool vm_entry_failure;
>> +
>>   	TEST_ASSERT_MSG(v2_guest_main,
>>   			"Never called test_set_guest_func!");
>>   
>> @@ -1812,15 +1814,14 @@ static void __enter_guest(u8 abort_flag, struct vmentry_failure *failure)
>>   			"Called enter_guest() after guest returned.");
>>   
>>   	vmx_enter_guest(failure);
>> +	vm_entry_failure = vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE;
> Rather than duplicating the code in vmx_run(), what if we move this check
> into vmx_enter_guest() and rework struct vmentry_failure?  The code was
> originally designed to handle only VM-Fail conditions, we should clean it
> up instead of bolting more stuff on top.  E.g.:
>
> struct vmentry_status {
> 	/* Did we attempt VMLAUNCH or VMRESUME */
> 	bool vmlaunch;
> 	/* Instruction mnemonic (for convenience). */
> 	const char *instr;
> 	/* VM-Enter passed all consistency checks, i.e. did not fail. */
> 	bool succeeded;
> 	/* VM-Enter failed before loading guest state, i.e. VM-Fail. */
> 	bool vm_fail;
> 	/* Contents of RFLAGS on VM-Fail, EXIT_REASON on VM-Exit.  */
> 	union {
> 		unsigned long vm_fail_flags;
> 		unsigned long vm_exit_reason;
> 	};
> };
>
> static void vmx_enter_guest(struct vmentry_status *status)
> {
> 	status->vm_fail = 0;
>
> 	in_guest = 1;
> 	asm volatile (
> 		"mov %[HOST_RSP], %%rdi\n\t"
> 		"vmwrite %%rsp, %%rdi\n\t"
> 		LOAD_GPR_C
> 		"cmpb $0, %[launched]\n\t"
> 		"jne 1f\n\t"
> 		"vmlaunch\n\t"
> 		"jmp 2f\n\t"
> 		"1: "
> 		"vmresume\n\t"
> 		"2: "
> 		SAVE_GPR_C
> 		"pushf\n\t"
> 		"pop %%rdi\n\t"
> 		"mov %%rdi, %[vm_fail_flags]\n\t"
> 		"movl $1, %[vm_fail]\n\t"
> 		"jmp 3f\n\t"
> 		"vmx_return:\n\t"
> 		SAVE_GPR_C
> 		"3: \n\t"
> 		: [vm_fail]"+m"(status->vm_fail),
> 		  [vm_fail_flags]"=m"(status->vm_fail_flags)
> 		: [launched]"m"(launched), [HOST_RSP]"i"(HOST_RSP)
> 		: "rdi", "memory", "cc"
> 	);
> 	in_guest = 0;
>
> 	if (!status->vm_fail)
> 		status->vm_exit_reason = vmcs_read(EXI_REASON);
> 		
> 	status->succeeded = !status->vm_fail &&
> 			    !(status->vm_exit_reason & VMX_ENTRY_FAILURE);
>
> 	status->vmlaunch = !launched;
> 	status->instr = launched ? "vmresume" : "vmlaunch";
>
> 	if (status->succeeded)
> 		launched = 1;
> }


This looks good. Do you want to send a patch or you want me to add it to 
the current set ?


>
>>   	if ((abort_flag & ABORT_ON_EARLY_VMENTRY_FAIL && failure->early) ||
>> -	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE &&
>> -	    vmcs_read(EXI_REASON) & VMX_ENTRY_FAILURE)) {
>> -
>> +	    (abort_flag & ABORT_ON_INVALID_GUEST_STATE && vm_entry_failure)) {
>>   		print_vmentry_failure_info(failure);
>>   		abort();
>>   	}
>>   
>> -	if (!failure->early) {
>> +	if (!vm_entry_failure) {
>>   		launched = 1;
>>   		check_for_guest_termination();
>>   	}
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails
  2019-09-13 20:37     ` Krish Sadhukhan
@ 2019-09-13 21:06       ` Sean Christopherson
  2019-09-16 19:12         ` Krish Sadhukhan
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2019-09-13 21:06 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm, rkrcmar, pbonzini, jmattson

On Fri, Sep 13, 2019 at 01:37:55PM -0700, Krish Sadhukhan wrote:
> 
> On 9/4/19 8:42 AM, Sean Christopherson wrote:
> >On Thu, Aug 29, 2019 at 04:56:34PM -0400, Krish Sadhukhan wrote:
> >>Bit# 31 in VM-exit reason is set by hardware in both cases of early VM-entry
> >>failures and VM-entry failures due to invalid guest state.
> >This is incorrect, VMCS.EXIT_REASON is not written on a VM-Fail.  If the
> >tests are passing, you're probably consuming a stale EXIT_REASON.
> 
> In vmx_vcpu_run(),
> 
>         if (vmx->fail || (vmx->exit_reason &
> VMX_EXIT_REASONS_FAILED_VMENTRY))
>                 return;
> 
>         vmx->loaded_vmcs->launched = 1;
> 
> we return without setting "launched" whenever bit# 31 is set in Exit Reason.
> If VM-entry fails due to invalid guest state or due to errors in VM-entry
> MSR-loading area, bit#31 is set.  As a result, L2 is not in "launched" state
> when we return to L1.  Tests that want to VMRESUME L2 after fixing the bad
> guest state or the bad MSR-loading area, fail with VM-Instruction Error 5,
> 
>         "Early vmresume failure: error number is 5. See Intel 30.4."

Yes, a VMCS isn't marked launched if it generates a VM-Exit due to a
failed consistency check.  But as that code shows, a failed consistency
check results in said VM-Exit *or* a VM-Fail.  Cosnsitency checks that
fail early, i.e. before loading guest state, generate VM-Fail, any check
that fails after the CPU has started loading guest state manifests as a
VM-Exit.  VMCS.EXIT_REASON isn't touched in the VM-Fail case.

E.g. in CHECKS ON VMX CONTROLS AND HOST-STATE AREA, the SDM states:

  VM entry fails if any of these checks fail.  When such failures occur,
  control is passed to the next instruction, RFLAGS.ZF is set to 1 to
  indicate the failure, and the VM-instruction error field is loaded with
  an error number that indicates whether the failure was due to the
  controls or the host-state area (see Chapter 30).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails
  2019-09-13 21:06       ` Sean Christopherson
@ 2019-09-16 19:12         ` Krish Sadhukhan
  0 siblings, 0 replies; 27+ messages in thread
From: Krish Sadhukhan @ 2019-09-16 19:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, rkrcmar, pbonzini, jmattson



On 09/13/2019 02:06 PM, Sean Christopherson wrote:
> On Fri, Sep 13, 2019 at 01:37:55PM -0700, Krish Sadhukhan wrote:
>> On 9/4/19 8:42 AM, Sean Christopherson wrote:
>>> On Thu, Aug 29, 2019 at 04:56:34PM -0400, Krish Sadhukhan wrote:
>>>> Bit# 31 in VM-exit reason is set by hardware in both cases of early VM-entry
>>>> failures and VM-entry failures due to invalid guest state.
>>> This is incorrect, VMCS.EXIT_REASON is not written on a VM-Fail.  If the
>>> tests are passing, you're probably consuming a stale EXIT_REASON.
>> In vmx_vcpu_run(),
>>
>>          if (vmx->fail || (vmx->exit_reason &
>> VMX_EXIT_REASONS_FAILED_VMENTRY))
>>                  return;
>>
>>          vmx->loaded_vmcs->launched = 1;
>>
>> we return without setting "launched" whenever bit# 31 is set in Exit Reason.
>> If VM-entry fails due to invalid guest state or due to errors in VM-entry
>> MSR-loading area, bit#31 is set.  As a result, L2 is not in "launched" state
>> when we return to L1.  Tests that want to VMRESUME L2 after fixing the bad
>> guest state or the bad MSR-loading area, fail with VM-Instruction Error 5,
>>
>>          "Early vmresume failure: error number is 5. See Intel 30.4."
> Yes, a VMCS isn't marked launched if it generates a VM-Exit due to a
> failed consistency check.  But as that code shows, a failed consistency
> check results in said VM-Exit *or* a VM-Fail.  Cosnsitency checks that
> fail early, i.e. before loading guest state, generate VM-Fail, any check
> that fails after the CPU has started loading guest state manifests as a
> VM-Exit.  VMCS.EXIT_REASON isn't touched in the VM-Fail case.
>
> E.g. in CHECKS ON VMX CONTROLS AND HOST-STATE AREA, the SDM states:
>
>    VM entry fails if any of these checks fail.  When such failures occur,
>    control is passed to the next instruction, RFLAGS.ZF is set to 1 to
>    indicate the failure, and the VM-instruction error field is loaded with
>    an error number that indicates whether the failure was due to the
>    controls or the host-state area (see Chapter 30).

The fix done by Marc Orr in

         "[kvm-unit-tests PATCH v2] x86: nvmx: test max atomic switch MSRs"

fixes this problem.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-30 23:15       ` Jim Mattson
  2019-09-02  0:33         ` Jim Mattson
@ 2019-12-20 23:45         ` Jim Mattson
  1 sibling, 0 replies; 27+ messages in thread
From: Jim Mattson @ 2019-12-20 23:45 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
> >
> >
> >
> > On 08/29/2019 03:26 PM, Jim Mattson wrote:
> > > On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> > > <krish.sadhukhan@oracle.com> wrote:
> > >> According to section "Checks on Guest Control Registers, Debug Registers, and
> > >> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> > >> of nested guests:
> > >>
> > >>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
> > >>      field must be 0.
> > > Can't we just let the hardware check guest DR7? This results in
> > > "VM-entry failure due to invalid guest state," right? And we just
> > > reflect that to L1?
> >
> > Just trying to understand the reason why this particular check can be
> > deferred to the hardware.
>
> The vmcs02 field has the same value as the vmcs12 field, and the
> physical CPU has the same requirements as the virtual CPU.

Sadly, I was mistaken. The guest DR7 value is not transferred from
vmcs12 to vmcs02. It is set prior to the vmcs02 VM-entry by
kvm_set_dr(). Unfortunately, that function synthesizes a #GP if any
bit in the high dword of DR7 is set. So, you are correct, Krish: this
field must be checked in software.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
  2019-08-29 20:56 ` [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 " Krish Sadhukhan
  2019-08-29 22:26   ` Jim Mattson
@ 2019-12-21  0:27   ` Jim Mattson
  1 sibling, 0 replies; 27+ messages in thread
From: Jim Mattson @ 2019-12-21  0:27 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: kvm list, Radim Krčmář, Paolo Bonzini

On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>     If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>     field must be 0.
>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 6 ++++++
>  arch/x86/kvm/x86.c        | 2 +-
>  arch/x86/kvm/x86.h        | 6 ++++++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0b234e95e0ed..f04619daf906 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2681,6 +2681,12 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
>             !kvm_debugctl_valid(vmcs12->guest_ia32_debugctl))
>                 return -EINVAL;
>
> +#ifdef CONFIG_X86_64
> +       if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> +           !kvm_dr7_valid(vmcs12->guest_dr7))
> +               return -EINVAL;
> +#endif
> +
>         if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
>             !kvm_pat_valid(vmcs12->guest_ia32_pat))
>                 return -EINVAL;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fafd81d2c9ea..423a7a573608 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1051,7 +1051,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
>         case 5:
>                 /* fall through */
>         default: /* 7 */
> -               if (val & 0xffffffff00000000ULL)
> +               if (!kvm_dr7_valid(val))
>                         return -1; /* #GP */
>                 vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
>                 kvm_update_dr7(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 28ba6d0c359f..4e55851fc3fb 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -360,6 +360,12 @@ static inline bool kvm_debugctl_valid(u64 data)
>         return ((data & 0xFFFFFFFFFFFF203Cull) ? false : true);
>  }
>
> +static inline bool kvm_dr7_valid(u64 data)

This should be 'unsigned long data.'

> +{
> +       /* Bits [63:32] are reserved */
> +       return ((data & 0xFFFFFFFF00000000ull) ? false : true);

    return !(data & 0xFFFFFFFF00000000ull);

Or, shorter:

    return (u32)data == data;

> +}
> +
>  void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu);
>  void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu);
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-12-21  0:27 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29 20:56 [PATCH 0/4] KVM: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
2019-08-29 20:56 ` [PATCH 1/4] KVM: nVMX: Check GUEST_DEBUGCTL " Krish Sadhukhan
2019-08-29 22:12   ` Jim Mattson
2019-08-30 23:26     ` Krish Sadhukhan
2019-09-01 23:55       ` Jim Mattson
2019-08-29 20:56 ` [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 " Krish Sadhukhan
2019-08-29 22:26   ` Jim Mattson
2019-08-30 23:07     ` Krish Sadhukhan
2019-08-30 23:15       ` Jim Mattson
2019-09-02  0:33         ` Jim Mattson
     [not found]           ` <e229bea2-acb2-e268-6281-d8e467c3282e@oracle.com>
2019-09-04 16:44             ` Jim Mattson
2019-09-04 16:58               ` Sean Christopherson
2019-09-04 18:05               ` Krish Sadhukhan
2019-09-04 18:20                 ` Jim Mattson
2019-09-09  4:11                   ` Krish Sadhukhan
2019-09-09 15:56                     ` Jim Mattson
2019-09-04 17:14           ` Sean Christopherson
2019-12-20 23:45         ` Jim Mattson
2019-12-21  0:27   ` Jim Mattson
2019-08-29 20:56 ` [PATCH 3/4] kvm-unit-test: nVMX: __enter_guest() should not set "launched" state when VM-entry fails Krish Sadhukhan
2019-09-04 15:42   ` Sean Christopherson
2019-09-13 20:37     ` Krish Sadhukhan
2019-09-13 21:06       ` Sean Christopherson
2019-09-16 19:12         ` Krish Sadhukhan
2019-08-29 20:56 ` [PATCH 4/4] kvm-unit-test: nVMX: Check GUEST_DEBUGCTL and GUEST_DR7 on vmentry of nested guests Krish Sadhukhan
2019-08-29 23:17   ` Jim Mattson
2019-08-30  1:12     ` Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).