All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 1/3] KVM: X86: Fix operand size during instruction decoding
@ 2017-11-03  0:35 Wanpeng Li
  2017-11-03  0:35 ` [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
  2017-11-03  0:35 ` [PATCH v4 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure Wanpeng Li
  0 siblings, 2 replies; 5+ messages in thread
From: Wanpeng Li @ 2017-11-03  0:35 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Wanpeng Li, Nadav Amit, Pedro Fonseca

From: Wanpeng Li <wanpeng.li@hotmail.com>

Pedro reported:
  During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
  instruction under KVM produces different results on both memory and the SP
  register depending on whether EPT support is enabled. With EPT the SP is
  reduced by 4 bytes (and the written value is 0-padded) but without EPT support
  it is only reduced by 2 bytes. The difference can be observed when the CS.DB
  field is 1 (32-bit) but not when it's 0 (16-bit).

The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D
also should be respected instead of just default operand-size/66H prefix during
instruction decoding. This patch fixes it by also adjusting operand-size according
to CS.D.

Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v3 -> v4:
 * def_ad_bytes must be changed to 4
 * separate X86EMUL_MODE_PROT16 altogether from the others
v2 -> v3:
 * cleanup the codes 
v1 -> v2:
 * respect cs.d for real/vm8096, other modes have already 
   been considered in init_emulate_ctxt().

 arch/x86/kvm/emulate.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8079d14..b4a87de 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5000,6 +5000,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	bool op_prefix = false;
 	bool has_seg_override = false;
 	struct opcode opcode;
+	u16 dummy;
+	struct desc_struct desc;
 
 	ctxt->memop.type = OP_NONE;
 	ctxt->memopp = NULL;
@@ -5018,6 +5020,11 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 	switch (mode) {
 	case X86EMUL_MODE_REAL:
 	case X86EMUL_MODE_VM86:
+		def_op_bytes = def_ad_bytes = 2;
+		ctxt->ops->get_segment(ctxt, &dummy, &desc, NULL, VCPU_SREG_CS);
+		if (desc.d)
+			def_op_bytes = def_ad_bytes = 4;
+		break;
 	case X86EMUL_MODE_PROT16:
 		def_op_bytes = def_ad_bytes = 2;
 		break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  2017-11-03  0:35 [PATCH v4 1/3] KVM: X86: Fix operand size during instruction decoding Wanpeng Li
@ 2017-11-03  0:35 ` Wanpeng Li
  2017-11-03 17:52   ` Jim Mattson
  2017-11-03  0:35 ` [PATCH v4 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure Wanpeng Li
  1 sibling, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2017-11-03  0:35 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li, Jim Mattson

From: Wanpeng Li <wanpeng.li@hotmail.com>

According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the
following checks are performed on the field for the IA32_BNDCFGS MSR:
 - Bits reserved in the IA32_BNDCFGS MSR must be 0.
 - The linear address in bits 63:12 must be canonical.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v3 -> v4:
 * simply condition
 * use && instead of nested "if"s

 arch/x86/kvm/vmx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e6c8ffa..6cf3972 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10805,6 +10805,11 @@ static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			return 1;
 	}
 
+	if (kvm_mpx_supported() &&
+		(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) ||
+		(vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD)))
+			return 1;
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v4 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  2017-11-03  0:35 [PATCH v4 1/3] KVM: X86: Fix operand size during instruction decoding Wanpeng Li
  2017-11-03  0:35 ` [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
@ 2017-11-03  0:35 ` Wanpeng Li
  1 sibling, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2017-11-03  0:35 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li, Jim Mattson

From: Wanpeng Li <wanpeng.li@hotmail.com>

Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure 
properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1) 
null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1.

In L1:

BUG: unable to handle kernel paging request at ffffffffc015bf8f
 IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
 PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
 Oops: 0003 [#1] PREEMPT SMP
 CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
 RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
 Call Trace:
 WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002

In L0:

-----------[ cut here ]------------
 WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc7+ #25
 RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 Call Trace:
  paging64_page_fault+0x500/0xde0 [kvm]
  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
  ? __asan_storeN+0x12/0x20
  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
  ? lock_acquire+0x2c0/0x2c0
  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
  ? sched_clock+0x1f/0x30
  ? check_chain_key+0x137/0x1e0
  ? __lock_acquire+0x83c/0x2420
  ? kvm_multiple_exception+0xf2/0x220 [kvm]
  ? debug_check_no_locks_freed+0x240/0x240
  ? debug_smp_processor_id+0x17/0x20
  ? __lock_is_held+0x9e/0x100
  kvm_mmu_page_fault+0x90/0x180 [kvm]
  kvm_handle_page_fault+0x15c/0x310 [kvm]
  ? __lock_is_held+0x9e/0x100
  handle_exception+0x3c7/0x4d0 [kvm_intel]
  vmx_handle_exit+0x103/0x1010 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]

The commit avoids to load host state of vmcs12 as vmcs01's guest state 
since vmcs12 is not modified (except for the VM-instruction error field)
if the checking of vmcs control area fails. However, the mmu context is 
switched to nested mmu in prepare_vmcs02() and it will not be reloaded 
since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME 
fails. This patch fixes it by reloading mmu context when nested 
VMLAUNCH/VMRESUME fails.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v3 -> v4:
 * move it to a new function load_vmcs12_mmu_host_state

 arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6cf3972..8aefb91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	kvm_clear_interrupt_queue(vcpu);
 }
 
+static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
+			struct vmcs12 *vmcs12)
+{
+	u32 entry_failure_code;
+
+	nested_ept_uninit_mmu_context(vcpu);
+
+	/*
+	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
+	 * couldn't have changed.
+	 */
+	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
+		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
+
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+}
+
 /*
  * A part of what we need to when the nested L2 guest exits and we want to
  * run its L1 parent, is to reset L1's guest state to the host state specified
@@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 				   struct vmcs12 *vmcs12)
 {
 	struct kvm_segment seg;
-	u32 entry_failure_code;
 
 	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
 		vcpu->arch.efer = vmcs12->host_ia32_efer;
@@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
 	vmx_set_cr4(vcpu, vmcs12->host_cr4);
 
-	nested_ept_uninit_mmu_context(vcpu);
-
-	/*
-	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
-	 * couldn't have changed.
-	 */
-	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
-		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
-
-	if (!enable_ept)
-		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	load_vmcs12_mmu_host_state(vcpu, vmcs12);
 
 	if (enable_vpid) {
 		/*
@@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 * accordingly.
 	 */
 	nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+
+	load_vmcs12_mmu_host_state(vcpu, vmcs12);
+
 	/*
 	 * The emulated instruction was already skipped in
 	 * nested_vmx_run, but the updated RIP was never
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  2017-11-03  0:35 ` [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
@ 2017-11-03 17:52   ` Jim Mattson
  2017-11-06  1:32     ` Wanpeng Li
  0 siblings, 1 reply; 5+ messages in thread
From: Jim Mattson @ 2017-11-03 17:52 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li

On Thu, Nov 2, 2017 at 5:35 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the
> following checks are performed on the field for the IA32_BNDCFGS MSR:
>  - Bits reserved in the IA32_BNDCFGS MSR must be 0.
>  - The linear address in bits 63:12 must be canonical.
>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Jim Mattson <jmattson@google.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v3 -> v4:
>  * simply condition
>  * use && instead of nested "if"s
>
>  arch/x86/kvm/vmx.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e6c8ffa..6cf3972 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10805,6 +10805,11 @@ static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>                         return 1;
>         }
>
> +       if (kvm_mpx_supported() &&

I think you kept the wrong conjunct from the previous version. Drop
kvm_mpx_supported() and restore the test for
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS).

> +               (is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) ||
> +               (vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD)))
> +                       return 1;
> +
>         return 0;
>  }
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  2017-11-03 17:52   ` Jim Mattson
@ 2017-11-06  1:32     ` Wanpeng Li
  0 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2017-11-06  1:32 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Radim Krčmář, Wanpeng Li

2017-11-04 1:52 GMT+08:00 Jim Mattson <jmattson@google.com>:
> On Thu, Nov 2, 2017 at 5:35 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the
>> following checks are performed on the field for the IA32_BNDCFGS MSR:
>>  - Bits reserved in the IA32_BNDCFGS MSR must be 0.
>>  - The linear address in bits 63:12 must be canonical.
>>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v3 -> v4:
>>  * simply condition
>>  * use && instead of nested "if"s
>>
>>  arch/x86/kvm/vmx.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index e6c8ffa..6cf3972 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10805,6 +10805,11 @@ static int check_vmentry_postreqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>                         return 1;
>>         }
>>
>> +       if (kvm_mpx_supported() &&
>
> I think you kept the wrong conjunct from the previous version. Drop
> kvm_mpx_supported() and restore the test for
> (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS).

Yeah, I misunderstand the English.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-11-06  1:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-03  0:35 [PATCH v4 1/3] KVM: X86: Fix operand size during instruction decoding Wanpeng Li
2017-11-03  0:35 ` [PATCH v4 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
2017-11-03 17:52   ` Jim Mattson
2017-11-06  1:32     ` Wanpeng Li
2017-11-03  0:35 ` [PATCH v4 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure Wanpeng Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.