* [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions.
@ 2016-11-28  4:18 Kyle Huey
  2016-11-28  4:18 ` [PATCH 1/5] KVM: x86: Add a return value to kvm_emulate_cpuid Kyle Huey
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

KVM does not currently honor the trap flag when emulating instructions that
cause VM exits. This is observable from guest userspace: try single-stepping
over a CPUID instruction in gdb inside a KVM guest. The program stops two
instructions after the CPUID instead of on the instruction immediately
following it.
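
For illustration, a minimal reproducer along these lines (a hypothetical
test program, not part of this series) shows the problem:

	/* repro.c: single-step over CPUID under gdb inside a KVM guest */
	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* leaf 0: vendor string comes back in ebx/edx/ecx */
		__get_cpuid(0, &eax, &ebx, &ecx, &edx);
		printf("vendor: %.4s%.4s%.4s\n",
		       (char *)&ebx, (char *)&edx, (char *)&ecx);
		return 0;
	}

Issuing "stepi" in gdb on the cpuid instruction lands two instructions past
it rather than on the instruction immediately following it.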

To fix this, skip_emulated_instruction can check RFLAGS.TF; patch 5 does
this. To handle both the guest-set TF case and the KVM_GUESTDBG_SINGLESTEP
case, skip_emulated_instruction needs to be able to indicate to its callers
that an exit to userspace is required. Patches 1-4 are largely plumbing to
make this possible.
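
As a rough sketch of the resulting convention (illustrative only;
handle_some_exit is a made-up name, skip_emulated_instruction is the real
VMX helper), an exit handler ends up looking like:

	static int handle_some_exit(struct kvm_vcpu *vcpu)
	{
		/* ... emulate the instruction's side effects ... */

		/*
		 * 1 means "resume the guest", 0 means "exit to userspace"
		 * (e.g. KVM_EXIT_DEBUG when single-stepping).
		 */
		return skip_emulated_instruction(vcpu);
	}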

Traps triggered by task-switch instructions require some additional handling
and are not implemented here. KVM_GUESTDBG_SINGLESTEP traps can be squashed
by instructions that also trigger exits to userspace, such as HLT, MOV to
CR8, and IO instructions. I believe (although I have not tested it) that KVM
will simply generate another trap on the next instruction, so this is no
worse than the current behavior.

These patches only fix this issue for VMX. I don't have AMD silicon to test
on.

A small patch to kvm-unit-tests is coming in a separate email.


* [PATCH 1/5] KVM: x86: Add a return value to kvm_emulate_cpuid
  2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
@ 2016-11-28  4:18 ` Kyle Huey
  2016-11-28  4:18 ` [PATCH 2/5] KVM: VMX: Reorder some skip_emulated_instruction calls Kyle Huey
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

skip_emulated_instruction is going to grow a return value, and we'll need
to return it from kvm_emulate_cpuid.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/cpuid.c            | 3 ++-
 arch/x86/kvm/svm.c              | 3 +--
 arch/x86/kvm/vmx.c              | 3 +--
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 77cb3f9..80bad5c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1129,17 +1129,17 @@ void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
 int kvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 struct x86_emulate_ctxt;
 
 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
 int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port);
-void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
+int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 25f0f15..07cc629 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -874,22 +874,23 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
 		*ecx = best->ecx;
 		*edx = best->edx;
 	} else
 		*eax = *ebx = *ecx = *edx = 0;
 	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx);
 }
 EXPORT_SYMBOL_GPL(kvm_cpuid);
 
-void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
+int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
 	u32 eax, ebx, ecx, edx;
 
 	eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
 	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
 	kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
 	kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
 	kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
 	kvm_x86_ops->skip_emulated_instruction(vcpu);
+	return 1;
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5e64e656..5bdffcd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3234,18 +3234,17 @@ static int task_switch_interception(struct vcpu_svm *svm)
 		return 0;
 	}
 	return 1;
 }
 
 static int cpuid_interception(struct vcpu_svm *svm)
 {
 	svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
-	kvm_emulate_cpuid(&svm->vcpu);
-	return 1;
+	return kvm_emulate_cpuid(&svm->vcpu);
 }
 
 static int iret_interception(struct vcpu_svm *svm)
 {
 	++svm->vcpu.stat.nmi_window_exits;
 	clr_intercept(svm, INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
 	svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e86219..e4af9699 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5832,18 +5832,17 @@ static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 
 static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vmcs_writel(GUEST_DR7, val);
 }
 
 static int handle_cpuid(struct kvm_vcpu *vcpu)
 {
-	kvm_emulate_cpuid(vcpu);
-	return 1;
+	return kvm_emulate_cpuid(vcpu);
 }
 
 static int handle_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
 	struct msr_data msr_info;
 
 	msr_info.index = ecx;
-- 
2.10.2


* [PATCH 2/5] KVM: VMX: Reorder some skip_emulated_instruction calls
  2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
  2016-11-28  4:18 ` [PATCH 1/5] KVM: x86: Add a return value to kvm_emulate_cpuid Kyle Huey
@ 2016-11-28  4:18 ` Kyle Huey
  2016-11-28  4:18 ` [PATCH 3/5] KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 Kyle Huey
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

The functions being moved ahead of skip_emulated_instruction here don't
need updated IPs, and moving skip_emulated_instruction to the end will
make it easier to return its return value.
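
For example, once the return value exists (it is added in a later patch of
this series), each such site can collapse to roughly:

	nested_vmx_succeed(vcpu);
	return skip_emulated_instruction(vcpu);

instead of calling skip_emulated_instruction() mid-handler and returning 1
separately.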

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/kvm/vmx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e4af9699..f2f9cf5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5703,18 +5703,18 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 				vcpu->run->exit_reason = KVM_EXIT_SET_TPR;
 				return 0;
 			}
 		}
 		break;
 	case 2: /* clts */
 		handle_clts(vcpu);
 		trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
-		skip_emulated_instruction(vcpu);
 		vmx_fpu_activate(vcpu);
+		skip_emulated_instruction(vcpu);
 		return 1;
 	case 1: /*mov from cr*/
 		switch (cr) {
 		case 3:
 			val = kvm_read_cr3(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
 			skip_emulated_instruction(vcpu);
@@ -6128,18 +6128,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	int ret;
 	gpa_t gpa;
 
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
-		skip_emulated_instruction(vcpu);
 		trace_kvm_fast_mmio(gpa);
+		skip_emulated_instruction(vcpu);
 		return 1;
 	}
 
 	ret = handle_mmio_page_fault(vcpu, gpa, true);
 	if (likely(ret == RET_MMIO_PF_EMULATE))
 		return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
 					      EMULATE_DONE;
 
@@ -6502,18 +6502,18 @@ static __exit void hardware_unsetup(void)
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
 	if (ple_gap)
 		grow_ple_window(vcpu);
 
-	skip_emulated_instruction(vcpu);
 	kvm_vcpu_on_spin(vcpu);
+	skip_emulated_instruction(vcpu);
 
 	return 1;
 }
 
 static int handle_nop(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
 	return 1;
@@ -6957,18 +6957,18 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 	vmx->nested.vmcs02_num = 0;
 
 	hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_REL_PINNED);
 	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
 
 	vmx->nested.vmxon = true;
 
-	skip_emulated_instruction(vcpu);
 	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
 	return 1;
 
 out_shadow_vmcs:
 	kfree(vmx->nested.cached_vmcs12);
 
 out_cached_vmcs12:
 	free_page((unsigned long)vmx->nested.msr_bitmap);
 
@@ -7078,18 +7078,18 @@ static void free_nested(struct vcpu_vmx *vmx)
 }
 
 /* Emulate the VMXOFF instruction */
 static int handle_vmoff(struct kvm_vcpu *vcpu)
 {
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 	free_nested(to_vmx(vcpu));
-	skip_emulated_instruction(vcpu);
 	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
 	return 1;
 }
 
 /* Emulate the VMCLEAR instruction */
 static int handle_vmclear(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	gpa_t vmptr;
@@ -7119,18 +7119,18 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
 	}
 	vmcs12 = kmap(page);
 	vmcs12->launch_state = 0;
 	kunmap(page);
 	nested_release_page(page);
 
 	nested_free_vmcs02(vmx, vmptr);
 
-	skip_emulated_instruction(vcpu);
 	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
 	return 1;
 }
 
 static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch);
 
 /* Emulate the VMLAUNCH instruction */
 static int handle_vmlaunch(struct kvm_vcpu *vcpu)
 {
-- 
2.10.2


* [PATCH 3/5] KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
  2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
  2016-11-28  4:18 ` [PATCH 1/5] KVM: x86: Add a return value to kvm_emulate_cpuid Kyle Huey
  2016-11-28  4:18 ` [PATCH 2/5] KVM: VMX: Reorder some skip_emulated_instruction calls Kyle Huey
@ 2016-11-28  4:18 ` Kyle Huey
  2016-11-28  4:18 ` [PATCH 4/5] KVM: x86: Add a return value to skip_emulated_instruction and propagate it Kyle Huey
  2016-11-28  4:18 ` [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction Kyle Huey
  4 siblings, 0 replies; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

We can't return both the pass/fail boolean for the vmcs and the upcoming
continue/exit-to-userspace boolean for skip_emulated_instruction out of
nested_vmx_check_vmcs12, so move skip_emulated_instruction out of it instead.

Additionally, VMLAUNCH/VMRESUME only trigger single-step exceptions when
they advance the IP to the following instruction, not when they a) succeed,
b) fail MSR validation, or c) throw an exception. Add a
skip_emulated_instruction_no_trap variant that will not check RFLAGS.TF.
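
A sketch of where the split is headed over the rest of the series (the
return value and the RFLAGS.TF check are only added by later patches):

	static void skip_emulated_instruction_no_trap(struct kvm_vcpu *vcpu)
	{
		/* advance RIP past the instruction, clear interrupt shadow */
	}

	static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
	{
		skip_emulated_instruction_no_trap(vcpu);
		/* later: check RFLAGS.TF and raise #DB or KVM_EXIT_DEBUG */
		return 1;
	}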

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/kvm/vmx.c | 60 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f2f9cf5..9cc4c41 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2455,28 +2455,33 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 		interruptibility |= GUEST_INTR_STATE_MOV_SS;
 	else if (mask & KVM_X86_SHADOW_INT_STI)
 		interruptibility |= GUEST_INTR_STATE_STI;
 
 	if ((interruptibility != interruptibility_old))
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
 }
 
-static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static void skip_emulated_instruction_no_trap(struct kvm_vcpu *vcpu)
 {
 	unsigned long rip;
 
 	rip = kvm_rip_read(vcpu);
 	rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 	kvm_rip_write(vcpu, rip);
 
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
 }
 
+static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+	skip_emulated_instruction_no_trap(vcpu);
+}
+
 /*
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
@@ -7319,33 +7324,36 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.
  */
 static int nested_vmx_check_vmcs12(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	if (vmx->nested.current_vmptr == -1ull) {
 		nested_vmx_failInvalid(vcpu);
-		skip_emulated_instruction(vcpu);
 		return 0;
 	}
 	return 1;
 }
 
 static int handle_vmread(struct kvm_vcpu *vcpu)
 {
 	unsigned long field;
 	u64 field_value;
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	gva_t gva = 0;
 
-	if (!nested_vmx_check_permission(vcpu) ||
-	    !nested_vmx_check_vmcs12(vcpu))
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!nested_vmx_check_vmcs12(vcpu)) {
+		skip_emulated_instruction(vcpu);
 		return 1;
+	}
 
 	/* Decode instruction info and find the field to read */
 	field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
 	/* Read the field, zero-extended to a u64 field_value */
 	if (vmcs12_read_any(vcpu, field, &field_value) < 0) {
 		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 		skip_emulated_instruction(vcpu);
 		return 1;
@@ -7383,19 +7391,23 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 	 * mode, and eventually we need to write that into a field of several
 	 * possible lengths. The code below first zero-extends the value to 64
 	 * bit (field_value), and then copies only the appropriate number of
 	 * bits into the vmcs12 field.
 	 */
 	u64 field_value = 0;
 	struct x86_exception e;
 
-	if (!nested_vmx_check_permission(vcpu) ||
-	    !nested_vmx_check_vmcs12(vcpu))
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!nested_vmx_check_vmcs12(vcpu)) {
+		skip_emulated_instruction(vcpu);
 		return 1;
+	}
 
 	if (vmx_instruction_info & (1u << 10))
 		field_value = kvm_register_readl(vcpu,
 			(((vmx_instruction_info) >> 3) & 0xf));
 	else {
 		if (get_vmx_mem_address(vcpu, exit_qualification,
 				vmx_instruction_info, false, &gva))
 			return 1;
@@ -10041,21 +10053,22 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 {
 	struct vmcs12 *vmcs12;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int cpu;
 	struct loaded_vmcs *vmcs02;
 	bool ia32e;
 	u32 msr_entry_idx;
 
-	if (!nested_vmx_check_permission(vcpu) ||
-	    !nested_vmx_check_vmcs12(vcpu))
+	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 
-	skip_emulated_instruction(vcpu);
+	if (!nested_vmx_check_vmcs12(vcpu))
+		goto out;
+
 	vmcs12 = get_vmcs12(vcpu);
 
 	if (enable_shadow_vmcs)
 		copy_shadow_to_vmcs12(vmx);
 
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when
@@ -10065,43 +10078,43 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	 * To speed up the normal (success) code path, we should avoid checking
 	 * for misconfigurations which will anyway be caught by the processor
 	 * when using the merged vmcs02.
 	 */
 	if (vmcs12->launch_state == launch) {
 		nested_vmx_failValid(vcpu,
 			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
 			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
-		return 1;
+		goto out;
 	}
 
 	if (vmcs12->guest_activity_state != GUEST_ACTIVITY_ACTIVE &&
 	    vmcs12->guest_activity_state != GUEST_ACTIVITY_HLT) {
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (!nested_get_vmcs12_pages(vcpu, vmcs12)) {
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (nested_vmx_check_msr_bitmap_controls(vcpu, vmcs12)) {
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (nested_vmx_check_apicv_controls(vcpu, vmcs12)) {
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (nested_vmx_check_msr_switch_controls(vcpu, vmcs12)) {
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
 				vmx->nested.nested_vmx_true_procbased_ctls_low,
 				vmx->nested.nested_vmx_procbased_ctls_high) ||
 	    !vmx_control_verify(vmcs12->secondary_vm_exec_control,
 				vmx->nested.nested_vmx_secondary_ctls_low,
 				vmx->nested.nested_vmx_secondary_ctls_high) ||
@@ -10111,36 +10124,36 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	    !vmx_control_verify(vmcs12->vm_exit_controls,
 				vmx->nested.nested_vmx_true_exit_ctls_low,
 				vmx->nested.nested_vmx_exit_ctls_high) ||
 	    !vmx_control_verify(vmcs12->vm_entry_controls,
 				vmx->nested.nested_vmx_true_entry_ctls_low,
 				vmx->nested.nested_vmx_entry_ctls_high))
 	{
 		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
 	    ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
 		nested_vmx_failValid(vcpu,
 			VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
-		return 1;
+		goto out;
 	}
 
 	if (!nested_cr0_valid(vcpu, vmcs12->guest_cr0) ||
 	    ((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
 		nested_vmx_entry_failure(vcpu, vmcs12,
 			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-		return 1;
+		goto out;
 	}
 	if (vmcs12->vmcs_link_pointer != -1ull) {
 		nested_vmx_entry_failure(vcpu, vmcs12,
 			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_VMCS_LINK_PTR);
-		return 1;
+		goto out;
 	}
 
 	/*
 	 * If the load IA32_EFER VM-entry control is 1, the following checks
 	 * are performed on the field for the IA32_EFER MSR:
 	 * - Bits reserved in the IA32_EFER MSR must be 0.
 	 * - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
 	 *   the IA-32e mode guest VM-exit control. It must also be identical
@@ -10150,17 +10163,17 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER) {
 		ia32e = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) != 0;
 		if (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
 		    ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
 		    ((vmcs12->guest_cr0 & X86_CR0_PG) &&
 		     ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) {
 			nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-			return 1;
+			goto out;
 		}
 	}
 
 	/*
 	 * If the load IA32_EFER VM-exit control is 1, bits reserved in the
 	 * IA32_EFER MSR must be 0 in the field for that register. In addition,
 	 * the values of the LMA and LME bits in the field must each be that of
 	 * the host address-space size VM-exit control.
@@ -10168,29 +10181,30 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) {
 		ia32e = (vmcs12->vm_exit_controls &
 			 VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;
 		if (!kvm_valid_efer(vcpu, vmcs12->host_ia32_efer) ||
 		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA) ||
 		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)) {
 			nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-			return 1;
+			goto out;
 		}
 	}
 
 	/*
 	 * We're finally done with prerequisite checking, and can start with
 	 * the nested entry.
 	 */
 
 	vmcs02 = nested_get_current_vmcs02(vmx);
 	if (!vmcs02)
 		return -ENOMEM;
 
+	skip_emulated_instruction_no_trap(vcpu);
 	enter_guest_mode(vcpu);
 
 	if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
 		vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
 
 	cpu = get_cpu();
 	vmx->loaded_vmcs = vmcs02;
 	vmx_vcpu_put(vcpu);
@@ -10222,16 +10236,20 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
 	 * returned as far as L1 is concerned. It will only return (and set
 	 * the success flag) when L2 exits (see nested_vmx_vmexit()).
 	 */
 	return 1;
+
+out:
+	skip_emulated_instruction(vcpu);
+	return 1;
 }
 
 /*
  * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
  * because L2 may have changed some cr0 bits directly (CRO_GUEST_HOST_MASK).
  * This function returns the new value we should put in vmcs12.guest_cr0.
  * It's not enough to just return the vmcs02 GUEST_CR0. Rather,
  *  1. Bits that neither L0 nor L1 trapped, were set directly by L2 and are now
-- 
2.10.2


* [PATCH 4/5] KVM: x86: Add a return value to skip_emulated_instruction and propagate it.
  2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
                   ` (2 preceding siblings ...)
  2016-11-28  4:18 ` [PATCH 3/5] KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 Kyle Huey
@ 2016-11-28  4:18 ` Kyle Huey
  2016-11-28  4:18 ` [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction Kyle Huey
  4 siblings, 0 replies; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

Return 1 (for the moment) from skip_emulated_instruction and propagate it
through. This is straightforward, except for:

- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
  MOVs to CR8, and HLT. If skip_emulated_instruction does trigger a
  KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
  IN/OUT/MOV CR8/HLT also triggers an exit to userspace, the latter will
  take precedence. The single-step trap will be triggered again on the next
  instruction, which is the current behavior (see the sketch after this
  list).
- XSAVES/XRSTORS, which should never be called.
- Task switch instructions, which would require additional handling (e.g.
  the task switch bit) and are instead modified to use the no_trap
  variant.
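
For the userspace-exiting cases, the combination looks roughly like the
handle_io hunk below (illustrative excerpt):

	int ret = skip_emulated_instruction(vcpu);

	/*
	 * If the instruction itself also requests an exit to userspace,
	 * that exit takes precedence and any pending KVM_EXIT_DEBUG from
	 * single-stepping is squashed; the trap fires again on the next
	 * instruction instead.
	 */
	return kvm_fast_pio_out(vcpu, size, port) && ret;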

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/cpuid.c            |   3 +-
 arch/x86/kvm/svm.c              |  16 ++--
 arch/x86/kvm/vmx.c              | 175 ++++++++++++++++------------------------
 arch/x86/kvm/x86.c              |  22 +++--
 5 files changed, 92 insertions(+), 128 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 80bad5c..f846610 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -919,17 +919,17 @@ struct kvm_x86_ops {
 	u32 (*get_pkru)(struct kvm_vcpu *vcpu);
 	void (*fpu_activate)(struct kvm_vcpu *vcpu);
 	void (*fpu_deactivate)(struct kvm_vcpu *vcpu);
 
 	void (*tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*run)(struct kvm_vcpu *vcpu);
 	int (*handle_exit)(struct kvm_vcpu *vcpu);
-	void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
+	int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
 	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
 	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
 	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
 				unsigned char *hypercall_addr);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
 	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code,
@@ -1363,17 +1363,17 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
 bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
-void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
+int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 
 int kvm_is_in_guest(void);
 
 int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
 int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 07cc629..d733ac2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -885,12 +885,11 @@ int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 
 	eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
 	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
 	kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
 	kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
 	kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
-	kvm_x86_ops->skip_emulated_instruction(vcpu);
-	return 1;
+	return kvm_x86_ops->skip_emulated_instruction(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5bdffcd..ed28be0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -603,37 +603,38 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 
 	if (mask == 0)
 		svm->vmcb->control.int_state &= ~SVM_INTERRUPT_SHADOW_MASK;
 	else
 		svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
 
 }
 
-static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (svm->vmcb->control.next_rip != 0) {
 		WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
 		svm->next_rip = svm->vmcb->control.next_rip;
 	}
 
 	if (!svm->next_rip) {
 		if (emulate_instruction(vcpu, EMULTYPE_SKIP) !=
 				EMULATE_DONE)
 			printk(KERN_DEBUG "%s: NOP\n", __func__);
-		return;
+		return 1;
 	}
 	if (svm->next_rip - kvm_rip_read(vcpu) > MAX_INST_SIZE)
 		printk(KERN_ERR "%s: ip 0x%lx next 0x%llx\n",
 		       __func__, kvm_rip_read(vcpu), svm->next_rip);
 
 	kvm_rip_write(vcpu, svm->next_rip);
 	svm_set_interrupt_shadow(vcpu, 0);
+	return 1;
 }
 
 static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code,
 				bool reinject)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -3146,18 +3147,17 @@ static int skinit_interception(struct vcpu_svm *svm)
 	trace_kvm_skinit(svm->vmcb->save.rip, kvm_register_read(&svm->vcpu, VCPU_REGS_RAX));
 
 	kvm_queue_exception(&svm->vcpu, UD_VECTOR);
 	return 1;
 }
 
 static int wbinvd_interception(struct vcpu_svm *svm)
 {
-	kvm_emulate_wbinvd(&svm->vcpu);
-	return 1;
+	return kvm_emulate_wbinvd(&svm->vcpu);
 }
 
 static int xsetbv_interception(struct vcpu_svm *svm)
 {
 	u64 new_bv = kvm_read_edx_eax(&svm->vcpu);
 	u32 index = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
 
 	if (kvm_set_xcr(&svm->vcpu, index, new_bv) == 0) {
@@ -3270,19 +3270,17 @@ static int emulate_on_interception(struct vcpu_svm *svm)
 static int rdpmc_interception(struct vcpu_svm *svm)
 {
 	int err;
 
 	if (!static_cpu_has(X86_FEATURE_NRIPS))
 		return emulate_on_interception(svm);
 
 	err = kvm_rdpmc(&svm->vcpu);
-	kvm_complete_insn_gp(&svm->vcpu, err);
-
-	return 1;
+	return kvm_complete_insn_gp(&svm->vcpu, err);
 }
 
 static bool check_selective_cr0_intercepted(struct vcpu_svm *svm,
 					    unsigned long val)
 {
 	unsigned long cr0 = svm->vcpu.arch.cr0;
 	bool ret = false;
 	u64 intercept;
@@ -3369,19 +3367,17 @@ static int cr_interception(struct vcpu_svm *svm)
 			break;
 		default:
 			WARN(1, "unhandled read from CR%d", cr);
 			kvm_queue_exception(&svm->vcpu, UD_VECTOR);
 			return 1;
 		}
 		kvm_register_write(&svm->vcpu, reg, val);
 	}
-	kvm_complete_insn_gp(&svm->vcpu, err);
-
-	return 1;
+	return kvm_complete_insn_gp(&svm->vcpu, err);
 }
 
 static int dr_interception(struct vcpu_svm *svm)
 {
 	int reg, dr;
 	unsigned long val;
 
 	if (svm->vcpu.guest_debug == 0) {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9cc4c41..f404aef 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2467,19 +2467,20 @@ static void skip_emulated_instruction_no_trap(struct kvm_vcpu *vcpu)
 	rip = kvm_rip_read(vcpu);
 	rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 	kvm_rip_write(vcpu, rip);
 
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
 }
 
-static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction_no_trap(vcpu);
+	return 1;
 }
 
 /*
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
@@ -5511,17 +5512,17 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
 		      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
 			vcpu->arch.dr6 &= ~15;
 			vcpu->arch.dr6 |= dr6 | DR6_RTM;
 			if (!(dr6 & ~DR6_RESERVED)) /* icebp */
-				skip_emulated_instruction(vcpu);
+				skip_emulated_instruction_no_trap(vcpu);
 
 			kvm_queue_exception(vcpu, DB_VECTOR);
 			return 1;
 		}
 		kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1;
 		kvm_run->debug.arch.dr7 = vmcs_readl(GUEST_DR7);
 		/* fall through */
 	case BP_VECTOR:
@@ -5556,33 +5557,38 @@ static int handle_triple_fault(struct kvm_vcpu *vcpu)
 {
 	vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
 	return 0;
 }
 
 static int handle_io(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
-	int size, in, string;
+	int size, in, string, ret;
 	unsigned port;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	string = (exit_qualification & 16) != 0;
 	in = (exit_qualification & 8) != 0;
 
 	++vcpu->stat.io_exits;
 
 	if (string || in)
 		return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 
 	port = exit_qualification >> 16;
 	size = (exit_qualification & 7) + 1;
-	skip_emulated_instruction(vcpu);
 
-	return kvm_fast_pio_out(vcpu, size, port);
+	ret = skip_emulated_instruction(vcpu);
+
+	/*
+	 * TODO: we might be squashing a KVM_GUESTDBG_SINGLESTEP-triggered
+	 * KVM_EXIT_DEBUG here.
+	 */
+	return kvm_fast_pio_out(vcpu, size, port) && ret;
 }
 
 static void
 vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 {
 	/*
 	 * Patch in the VMCALL instruction:
 	 */
@@ -5670,80 +5676,79 @@ static void handle_clts(struct kvm_vcpu *vcpu)
 }
 
 static int handle_cr(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification, val;
 	int cr;
 	int reg;
 	int err;
+	int ret;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	cr = exit_qualification & 15;
 	reg = (exit_qualification >> 8) & 15;
 	switch ((exit_qualification >> 4) & 3) {
 	case 0: /* mov to cr */
 		val = kvm_register_readl(vcpu, reg);
 		trace_kvm_cr_write(cr, val);
 		switch (cr) {
 		case 0:
 			err = handle_set_cr0(vcpu, val);
-			kvm_complete_insn_gp(vcpu, err);
-			return 1;
+			return kvm_complete_insn_gp(vcpu, err);
 		case 3:
 			err = kvm_set_cr3(vcpu, val);
-			kvm_complete_insn_gp(vcpu, err);
-			return 1;
+			return kvm_complete_insn_gp(vcpu, err);
 		case 4:
 			err = handle_set_cr4(vcpu, val);
-			kvm_complete_insn_gp(vcpu, err);
-			return 1;
+			return kvm_complete_insn_gp(vcpu, err);
 		case 8: {
 				u8 cr8_prev = kvm_get_cr8(vcpu);
 				u8 cr8 = (u8)val;
 				err = kvm_set_cr8(vcpu, cr8);
-				kvm_complete_insn_gp(vcpu, err);
+				ret = kvm_complete_insn_gp(vcpu, err);
 				if (lapic_in_kernel(vcpu))
-					return 1;
+					return ret;
 				if (cr8_prev <= cr8)
-					return 1;
+					return ret;
+				/*
+				 * TODO: we might be squashing a
+				 * KVM_GUESTDBG_SINGLESTEP-triggered
+				 * KVM_EXIT_DEBUG here.
+				 */
 				vcpu->run->exit_reason = KVM_EXIT_SET_TPR;
 				return 0;
 			}
 		}
 		break;
 	case 2: /* clts */
 		handle_clts(vcpu);
 		trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
 		vmx_fpu_activate(vcpu);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	case 1: /*mov from cr*/
 		switch (cr) {
 		case 3:
 			val = kvm_read_cr3(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		case 8:
 			val = kvm_get_cr8(vcpu);
 			kvm_register_write(vcpu, reg, val);
 			trace_kvm_cr_read(cr, val);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		break;
 	case 3: /* lmsw */
 		val = (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f;
 		trace_kvm_cr_write(0, (kvm_read_cr0(vcpu) & ~0xful) | val);
 		kvm_lmsw(vcpu, val);
 
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	default:
 		break;
 	}
 	vcpu->run->exit_reason = 0;
 	vcpu_unimpl(vcpu, "unhandled control register: op %d cr %d\n",
 	       (int)(exit_qualification >> 4) & 3, cr);
 	return 0;
 }
@@ -5804,18 +5809,17 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 
 		if (kvm_get_dr(vcpu, dr, &val))
 			return 1;
 		kvm_register_write(vcpu, reg, val);
 	} else
 		if (kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg)))
 			return 1;
 
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static u64 vmx_get_dr6(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.dr6;
 }
 
 static void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
@@ -5858,18 +5862,17 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
 	trace_kvm_msr_read(ecx, msr_info.data);
 
 	/* FIXME: handling of bits 32:63 of rax, rdx */
 	vcpu->arch.regs[VCPU_REGS_RAX] = msr_info.data & -1u;
 	vcpu->arch.regs[VCPU_REGS_RDX] = (msr_info.data >> 32) & -1u;
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_wrmsr(struct kvm_vcpu *vcpu)
 {
 	struct msr_data msr;
 	u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
 	u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
 		| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
@@ -5879,18 +5882,17 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu)
 	msr.host_initiated = false;
 	if (kvm_set_msr(vcpu, &msr) != 0) {
 		trace_kvm_msr_write_ex(ecx, data);
 		kvm_inject_gp(vcpu, 0);
 		return 1;
 	}
 
 	trace_kvm_msr_write(ecx, data);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
 {
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	return 1;
 }
 
@@ -5924,43 +5926,39 @@ static int handle_invd(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
 static int handle_invlpg(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
 	kvm_mmu_invlpg(vcpu, exit_qualification);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_rdpmc(struct kvm_vcpu *vcpu)
 {
 	int err;
 
 	err = kvm_rdpmc(vcpu);
-	kvm_complete_insn_gp(vcpu, err);
-
-	return 1;
+	return kvm_complete_insn_gp(vcpu, err);
 }
 
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
-	kvm_emulate_wbinvd(vcpu);
-	return 1;
+	return kvm_emulate_wbinvd(vcpu);
 }
 
 static int handle_xsetbv(struct kvm_vcpu *vcpu)
 {
 	u64 new_bv = kvm_read_edx_eax(vcpu);
 	u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX);
 
 	if (kvm_set_xcr(vcpu, index, new_bv) == 0)
-		skip_emulated_instruction(vcpu);
+		return skip_emulated_instruction(vcpu);
 	return 1;
 }
 
 static int handle_xsaves(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
 	WARN(1, "this should never happen\n");
 	return 1;
@@ -5984,18 +5982,17 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 		/*
 		 * Sane guest uses MOV to write EOI, with written value
 		 * not cared. So make a short-circuit here by avoiding
 		 * heavy instruction emulation.
 		 */
 		if ((access_type == TYPE_LINEAR_APIC_INST_WRITE) &&
 		    (offset == APIC_EOI)) {
 			kvm_lapic_set_eoi(vcpu);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 	}
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
 static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -6057,17 +6054,17 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
 			break;
 		}
 	}
 	tss_selector = exit_qualification;
 
 	if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
 		       type != INTR_TYPE_EXT_INTR &&
 		       type != INTR_TYPE_NMI_INTR))
-		skip_emulated_instruction(vcpu);
+		skip_emulated_instruction_no_trap(vcpu);
 
 	if (kvm_task_switch(vcpu, tss_selector,
 			    type == INTR_TYPE_SOFT_INTR ? idt_index : -1, reason,
 			    has_error_code, error_code) == EMULATE_FAIL) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
 		vcpu->run->internal.ndata = 0;
 		return 0;
@@ -6134,18 +6131,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	int ret;
 	gpa_t gpa;
 
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		trace_kvm_fast_mmio(gpa);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	ret = handle_mmio_page_fault(vcpu, gpa, true);
 	if (likely(ret == RET_MMIO_PF_EMULATE))
 		return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
 					      EMULATE_DONE;
 
 	if (unlikely(ret == RET_MMIO_PF_INVALID))
@@ -6508,25 +6504,22 @@ static __exit void hardware_unsetup(void)
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
 	if (ple_gap)
 		grow_ple_window(vcpu);
 
 	kvm_vcpu_on_spin(vcpu);
-	skip_emulated_instruction(vcpu);
-
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_nop(struct kvm_vcpu *vcpu)
 {
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_mwait(struct kvm_vcpu *vcpu)
 {
 	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
 	return handle_nop(vcpu);
 }
 
@@ -6823,59 +6816,53 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason,
 		 *
 		 * Note - IA32_VMX_BASIC[48] will never be 1
 		 * for the nested case;
 		 * which replaces physical address width with 32
 		 *
 		 */
 		if (!PAGE_ALIGNED(vmptr) || (vmptr >> maxphyaddr)) {
 			nested_vmx_failInvalid(vcpu);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 
 		page = nested_get_page(vcpu, vmptr);
 		if (page == NULL ||
 		    *(u32 *)kmap(page) != VMCS12_REVISION) {
 			nested_vmx_failInvalid(vcpu);
 			kunmap(page);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		kunmap(page);
 		vmx->nested.vmxon_ptr = vmptr;
 		break;
 	case EXIT_REASON_VMCLEAR:
 		if (!PAGE_ALIGNED(vmptr) || (vmptr >> maxphyaddr)) {
 			nested_vmx_failValid(vcpu,
 					     VMXERR_VMCLEAR_INVALID_ADDRESS);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 
 		if (vmptr == vmx->nested.vmxon_ptr) {
 			nested_vmx_failValid(vcpu,
 					     VMXERR_VMCLEAR_VMXON_POINTER);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		break;
 	case EXIT_REASON_VMPTRLD:
 		if (!PAGE_ALIGNED(vmptr) || (vmptr >> maxphyaddr)) {
 			nested_vmx_failValid(vcpu,
 					     VMXERR_VMPTRLD_INVALID_ADDRESS);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 
 		if (vmptr == vmx->nested.vmxon_ptr) {
 			nested_vmx_failValid(vcpu,
 					     VMXERR_VMCLEAR_VMXON_POINTER);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		break;
 	default:
 		return 1; /* shouldn't happen */
 	}
 
 	if (vmpointer)
 		*vmpointer = vmptr;
@@ -6921,18 +6908,17 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
 	if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMON, NULL))
 		return 1;
 
 	if (vmx->nested.vmxon) {
 		nested_vmx_failValid(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	if ((vmx->msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
 			!= VMXON_NEEDED_FEATURES) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
 	}
 
@@ -6963,18 +6949,17 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 
 	hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_REL_PINNED);
 	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
 
 	vmx->nested.vmxon = true;
 
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 
 out_shadow_vmcs:
 	kfree(vmx->nested.cached_vmcs12);
 
 out_cached_vmcs12:
 	free_page((unsigned long)vmx->nested.msr_bitmap);
 
 out_msr_bitmap:
@@ -7084,18 +7069,17 @@ static void free_nested(struct vcpu_vmx *vmx)
 
 /* Emulate the VMXOFF instruction */
 static int handle_vmoff(struct kvm_vcpu *vcpu)
 {
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 	free_nested(to_vmx(vcpu));
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 /* Emulate the VMCLEAR instruction */
 static int handle_vmclear(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	gpa_t vmptr;
 	struct vmcs12 *vmcs12;
@@ -7125,18 +7109,17 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
 	vmcs12 = kmap(page);
 	vmcs12->launch_state = 0;
 	kunmap(page);
 	nested_release_page(page);
 
 	nested_free_vmcs02(vmx, vmptr);
 
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch);
 
 /* Emulate the VMLAUNCH instruction */
 static int handle_vmlaunch(struct kvm_vcpu *vcpu)
 {
 	return nested_vmx_run(vcpu, true);
@@ -7340,28 +7323,25 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 	u64 field_value;
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	gva_t gva = 0;
 
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 
-	if (!nested_vmx_check_vmcs12(vcpu)) {
-		skip_emulated_instruction(vcpu);
-		return 1;
-	}
+	if (!nested_vmx_check_vmcs12(vcpu))
+		return skip_emulated_instruction(vcpu);
 
 	/* Decode instruction info and find the field to read */
 	field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
 	/* Read the field, zero-extended to a u64 field_value */
 	if (vmcs12_read_any(vcpu, field, &field_value) < 0) {
 		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 	/*
 	 * Now copy part of this value to register or memory, as requested.
 	 * Note that the number of bits actually copied is 32 or 64 depending
 	 * on the guest's mode (32 or 64 bit), not on the given field's length.
 	 */
 	if (vmx_instruction_info & (1u << 10)) {
 		kvm_register_writel(vcpu, (((vmx_instruction_info) >> 3) & 0xf),
@@ -7371,18 +7351,17 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 				vmx_instruction_info, true, &gva))
 			return 1;
 		/* _system ok, as nested_vmx_check_permission verified cpl=0 */
 		kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, gva,
 			     &field_value, (is_long_mode(vcpu) ? 8 : 4), NULL);
 	}
 
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 
 static int handle_vmwrite(struct kvm_vcpu *vcpu)
 {
 	unsigned long field;
 	gva_t gva;
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -7394,20 +7373,18 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 	 * bits into the vmcs12 field.
 	 */
 	u64 field_value = 0;
 	struct x86_exception e;
 
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
 
-	if (!nested_vmx_check_vmcs12(vcpu)) {
-		skip_emulated_instruction(vcpu);
-		return 1;
-	}
+	if (!nested_vmx_check_vmcs12(vcpu))
+		return skip_emulated_instruction(vcpu);
 
 	if (vmx_instruction_info & (1u << 10))
 		field_value = kvm_register_readl(vcpu,
 			(((vmx_instruction_info) >> 3) & 0xf));
 	else {
 		if (get_vmx_mem_address(vcpu, exit_qualification,
 				vmx_instruction_info, false, &gva))
 			return 1;
@@ -7418,29 +7395,26 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 		}
 	}
 
 
 	field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
 	if (vmcs_field_readonly(field)) {
 		nested_vmx_failValid(vcpu,
 			VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	if (vmcs12_write_any(vcpu, field, field_value) < 0) {
 		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 /* Emulate the VMPTRLD instruction */
 static int handle_vmptrld(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	gpa_t vmptr;
 
@@ -7451,27 +7425,25 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
 		return 1;
 
 	if (vmx->nested.current_vmptr != vmptr) {
 		struct vmcs12 *new_vmcs12;
 		struct page *page;
 		page = nested_get_page(vcpu, vmptr);
 		if (page == NULL) {
 			nested_vmx_failInvalid(vcpu);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		new_vmcs12 = kmap(page);
 		if (new_vmcs12->revision_id != VMCS12_REVISION) {
 			kunmap(page);
 			nested_release_page_clean(page);
 			nested_vmx_failValid(vcpu,
 				VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 
 		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
 		/*
 		 * Load VMCS12 from guest memory since it is not already
@@ -7485,18 +7457,17 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
 				      SECONDARY_EXEC_SHADOW_VMCS);
 			vmcs_write64(VMCS_LINK_POINTER,
 				     __pa(vmx->vmcs01.shadow_vmcs));
 			vmx->nested.sync_shadow_vmcs = true;
 		}
 	}
 
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 /* Emulate the VMPTRST instruction */
 static int handle_vmptrst(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	gva_t vmcs_gva;
@@ -7511,18 +7482,17 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	/* ok to use *_system, as nested_vmx_check_permission verified cpl=0 */
 	if (kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, vmcs_gva,
 				 (void *)&to_vmx(vcpu)->nested.current_vmptr,
 				 sizeof(u64), &e)) {
 		kvm_inject_page_fault(vcpu, &e);
 		return 1;
 	}
 	nested_vmx_succeed(vcpu);
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 /* Emulate the INVEPT instruction */
 static int handle_invept(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmx_instruction_info, types;
 	unsigned long type;
@@ -7550,18 +7520,17 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
 	types = (vmx->nested.nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
 
 	if (type >= 32 || !(types & (1 << type))) {
 		nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	/* According to the Intel VMX instruction reference, the memory
 	 * operand is read even if it isn't needed (e.g., for type==global)
 	 */
 	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
 			vmx_instruction_info, false, &gva))
 		return 1;
@@ -7582,18 +7551,17 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 		nested_vmx_succeed(vcpu);
 		break;
 	default:
 		BUG_ON(1);
 		break;
 	}
 
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_invvpid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmx_instruction_info;
 	unsigned long type, types;
 	gva_t gva;
@@ -7614,18 +7582,17 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
 	types = (vmx->nested.nested_vmx_vpid_caps &
 			VMX_VPID_EXTENT_SUPPORTED_MASK) >> 8;
 
 	if (type >= 32 || !(types & (1 << type))) {
 		nested_vmx_failValid(vcpu,
 			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	/* according to the intel vmx instruction reference, the memory
 	 * operand is read even if it isn't needed (e.g., for type==global)
 	 */
 	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
 			vmx_instruction_info, false, &gva))
 		return 1;
@@ -7637,33 +7604,30 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
 	case VMX_VPID_EXTENT_SINGLE_CONTEXT:
 	case VMX_VPID_EXTENT_SINGLE_NON_GLOBAL:
 		if (!vpid) {
 			nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-			skip_emulated_instruction(vcpu);
-			return 1;
+			return skip_emulated_instruction(vcpu);
 		}
 		break;
 	case VMX_VPID_EXTENT_ALL_CONTEXT:
 		break;
 	default:
 		WARN_ON_ONCE(1);
-		skip_emulated_instruction(vcpu);
-		return 1;
+		return skip_emulated_instruction(vcpu);
 	}
 
 	__vmx_flush_tlb(vcpu, vmx->nested.vpid02);
 	nested_vmx_succeed(vcpu);
 
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 static int handle_pml_full(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
 
 	trace_kvm_pml_full(vcpu->vcpu_id);
 
@@ -10238,18 +10202,17 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
 	 * returned as far as L1 is concerned. It will only return (and set
 	 * the success flag) when L2 exits (see nested_vmx_vmexit()).
 	 */
 	return 1;
 
 out:
-	skip_emulated_instruction(vcpu);
-	return 1;
+	return skip_emulated_instruction(vcpu);
 }
 
 /*
  * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
  * because L2 may have changed some cr0 bits directly (CRO_GUEST_HOST_MASK).
  * This function returns the new value we should put in vmcs12.guest_cr0.
  * It's not enough to just return the vmcs02 GUEST_CR0. Rather,
  *  1. Bits that neither L0 nor L1 trapped, were set directly by L2 and are now
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ec59301..fbcacf8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -420,22 +420,24 @@ void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, true);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
-void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
+int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
 	if (err)
 		kvm_inject_gp(vcpu, 0);
 	else
-		kvm_x86_ops->skip_emulated_instruction(vcpu);
+		return kvm_x86_ops->skip_emulated_instruction(vcpu);
+
+	return 1;
 }
 EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
 
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	++vcpu->stat.pf_guest;
 	vcpu->arch.cr2 = fault->address;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
@@ -4808,18 +4810,18 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
 	} else
 		wbinvd();
 	return X86EMUL_CONTINUE;
 }
 
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
 {
-	kvm_x86_ops->skip_emulated_instruction(vcpu);
-	return kvm_emulate_wbinvd_noskip(vcpu);
+	kvm_emulate_wbinvd_noskip(vcpu);
+	return kvm_x86_ops->skip_emulated_instruction(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
 
 
 
 static void emulator_wbinvd(struct x86_emulate_ctxt *ctxt)
 {
 	kvm_emulate_wbinvd_noskip(emul_to_vcpu(ctxt));
@@ -6002,18 +6004,22 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 		vcpu->run->exit_reason = KVM_EXIT_HLT;
 		return 0;
 	}
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
 
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
-	kvm_x86_ops->skip_emulated_instruction(vcpu);
-	return kvm_vcpu_halt(vcpu);
+	int ret = kvm_x86_ops->skip_emulated_instruction(vcpu);
+	/*
+	 * TODO: we might be squashing a GUESTDBG_SINGLESTEP-triggered
+	 * KVM_EXIT_DEBUG here.
+	 */
+	return kvm_vcpu_halt(vcpu) && ret;
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
 /*
  * kvm_pv_kick_cpu_op:  Kick a vcpu.
  *
  * @apicid - apicid of vcpu to be kicked.
  */
@@ -6034,19 +6040,19 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.apicv_active = false;
 	kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
-	int op_64_bit, r = 1;
+	int op_64_bit, r;
 
-	kvm_x86_ops->skip_emulated_instruction(vcpu);
+	r = kvm_x86_ops->skip_emulated_instruction(vcpu);
 
 	if (kvm_hv_hypercall_enabled(vcpu->kvm))
 		return kvm_hv_hypercall(vcpu);
 
 	nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	a0 = kvm_register_read(vcpu, VCPU_REGS_RBX);
 	a1 = kvm_register_read(vcpu, VCPU_REGS_RCX);
 	a2 = kvm_register_read(vcpu, VCPU_REGS_RDX);
-- 
2.10.2


* [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
                   ` (3 preceding siblings ...)
  2016-11-28  4:18 ` [PATCH 4/5] KVM: x86: Add a return value to skip_emulated_instruction and propagate it Kyle Huey
@ 2016-11-28  4:18 ` Kyle Huey
  2016-11-28 11:42   ` Paolo Bonzini
  4 siblings, 1 reply; 11+ messages in thread
From: Kyle Huey @ 2016-11-28  4:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel

Similar to the code in kvm_vcpu_check_singlestep, check for RFLAGS.TF and,
depending on whether the host (KVM_GUESTDBG_SINGLESTEP) or the guest set it,
synthesize an exit to userspace or queue a #DB exception.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/kvm/vmx.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f404aef..6583e97 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2470,16 +2470,37 @@ static void skip_emulated_instruction_no_trap(struct kvm_vcpu *vcpu)
 
 	/* skipping an emulated instruction also counts */
 	vmx_set_interrupt_shadow(vcpu, 0);
 }
 
 static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction_no_trap(vcpu);
+
+	if (unlikely(vmx_get_rflags(vcpu) & X86_EFLAGS_TF)) {
+		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
+			vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
+						    DR6_RTM;
+			vcpu->run->debug.arch.pc = vcpu->arch.singlestep_rip;
+			vcpu->run->debug.arch.exception = DB_VECTOR;
+			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
+			return 0;
+		}
+
+		/*
+		 * "Certain debug exceptions may clear bit 0-3.  The
+		 * remaining contents of the DR6 register are never
+		 * cleared by the processor".
+		 */
+		vcpu->arch.dr6 &= ~15;
+		vcpu->arch.dr6 |= DR6_BS | DR6_RTM;
+		kvm_queue_exception(vcpu, DB_VECTOR);
+	}
+
 	return 1;
 }
 
 /*
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28  4:18 ` [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction Kyle Huey
@ 2016-11-28 11:42   ` Paolo Bonzini
  2016-11-28 16:13     ` Kyle Huey
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-11-28 11:42 UTC (permalink / raw)
  To: Kyle Huey, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Joerg Roedel
  Cc: kvm, linux-kernel



On 28/11/2016 05:18, Kyle Huey wrote:
> +
> +	if (unlikely(vmx_get_rflags(vcpu) & X86_EFLAGS_TF)) {
> +		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +			vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
> +						    DR6_RTM;
> +			vcpu->run->debug.arch.pc = vcpu->arch.singlestep_rip;
> +			vcpu->run->debug.arch.exception = DB_VECTOR;
> +			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
> +			return 0;
> +		}
> +
> +		/*
> +		 * "Certain debug exceptions may clear bit 0-3.  The
> +		 * remaining contents of the DR6 register are never
> +		 * cleared by the processor".
> +		 */
> +		vcpu->arch.dr6 &= ~15;
> +		vcpu->arch.dr6 |= DR6_BS | DR6_RTM;
> +		kvm_queue_exception(vcpu, DB_VECTOR);
> +	}

This code is pretty much the same as kvm_vcpu_check_singlestep.  Let's 
not duplicate it; skip_emulated_instruction can instead be
implemented in x86.c, like

	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
	int r = EMULATE_DONE;

	/* This would be the no_trap variant */
	kvm_x86_ops->skip_emulated_instruction(vcpu);
	kvm_vcpu_check_singlestep(vcpu, rflags, &r);
	return r == EMULATE_DONE;

(because x86.c/vmx.c/svm.c are separate modules, when moving the function
to x86.c you should rename it to kvm_skip_emulated_instruction).

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28 11:42   ` Paolo Bonzini
@ 2016-11-28 16:13     ` Kyle Huey
  2016-11-28 17:19       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kyle Huey @ 2016-11-28 16:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, kvm list, open list

On Mon, Nov 28, 2016 at 3:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 28/11/2016 05:18, Kyle Huey wrote:
>> +
>> +     if (unlikely(vmx_get_rflags(vcpu) & X86_EFLAGS_TF)) {
>> +             if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>> +                     vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
>> +                                                 DR6_RTM;
>> +                     vcpu->run->debug.arch.pc = vcpu->arch.singlestep_rip;
>> +                     vcpu->run->debug.arch.exception = DB_VECTOR;
>> +                     vcpu->run->exit_reason = KVM_EXIT_DEBUG;
>> +                     return 0;
>> +             }
>> +
>> +             /*
>> +              * "Certain debug exceptions may clear bit 0-3.  The
>> +              * remaining contents of the DR6 register are never
>> +              * cleared by the processor".
>> +              */
>> +             vcpu->arch.dr6 &= ~15;
>> +             vcpu->arch.dr6 |= DR6_BS | DR6_RTM;
>> +             kvm_queue_exception(vcpu, DB_VECTOR);
>> +     }
>
> This code is pretty much the same as kvm_vcpu_check_singlestep.  Let's
> not duplicate it; skip_emulated_instruction can instead be
> implemented in x86.c, like
>
>         unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>         int r = EMULATE_DONE;
>
>         /* This would be the no_trap variant */
>         kvm_x86_ops->skip_emulated_instruction(vcpu);
>         kvm_vcpu_check_singlestep(vcpu, rflags, &r);
>         return r == EMULATE_DONE;
>
> (because x86.c/vmx.c/svm.c are separate modules, when moving the function
> to x86.c you should rename it to kvm_skip_emulated_instruction).
>
> Paolo

They're not exactly the same.  For some reason I don't understand,
kvm_vcpu_check_singlestep clears the trap flag.  Perhaps that is also
a bug?

- Kyle

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28 16:13     ` Kyle Huey
@ 2016-11-28 17:19       ` Paolo Bonzini
  2016-11-28 18:34         ` Kyle Huey
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-11-28 17:19 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, kvm list, open list



On 28/11/2016 17:13, Kyle Huey wrote:
> On Mon, Nov 28, 2016 at 3:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> This code is pretty much the same as kvm_vcpu_check_singlestep.  Let's
>> not duplicate it; skip_emulated_instruction can instead be
>> implemented in x86.c, like
>>
>>         unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>>         int r = EMULATE_DONE;
>>
>>         /* This would be the no_trap variant */
>>         kvm_x86_ops->skip_emulated_instruction(vcpu);
>>         kvm_vcpu_check_singlestep(vcpu, rflags, &r);
>>         return r == EMULATE_DONE;
>>
>> (because x86.c/vmx.c/svm.c are separate modules, when moving the function
>> to x86.c you should rename it to kvm_skip_emulated_instruction).
>>
>> Paolo
> 
> They're not exactly the same.  For some reason I don't understand,
> kvm_vcpu_check_singlestep clears the trap flag.  Perhaps that is also
> a bug?

The Intel manual says "The processor clears the TF flag before calling
the exception handler" (17.3.1.4), so I think you should do it too.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28 17:19       ` Paolo Bonzini
@ 2016-11-28 18:34         ` Kyle Huey
  2016-11-28 22:43           ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kyle Huey @ 2016-11-28 18:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, kvm list, open list

On Mon, Nov 28, 2016 at 9:19 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 28/11/2016 17:13, Kyle Huey wrote:
>> On Mon, Nov 28, 2016 at 3:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> This code is pretty much the same as kvm_vcpu_check_singlestep.  Let's
>>> not duplicate it; skip_emulated_instruction can instead be
>>> implemented in x86.c, like
>>>
>>>         unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>>>         int r = EMULATE_DONE;
>>>
>>>         /* This would be the no_trap variant */
>>>         kvm_x86_ops->skip_emulated_instruction(vcpu);
>>>         kvm_vcpu_check_singlestep(vcpu, rflags, &r);
>>>         return r == EMULATE_DONE;
>>>
>>> (because x86.c/vmx.c/svm.c are separate modules, when moving the function
>>> to x86.c you should rename it to kvm_skip_emulated_instruction).
>>>
>>> Paolo
>>
>> They're not exactly the same.  For some reason I don't understand,
>> kvm_vcpu_check_singlestep clears the trap flag.  Perhaps that is also
>> a bug?
>
> The Intel manual says "The processor clears the TF flag before calling
> the exception handler" (17.3.1.4), so I think you should do it too.

The processor does this automatically. "When accessing an exception or
interrupt handler through either an interrupt gate or a trap gate, the
processor clears the TF flag in the EFLAGS register after it saves the
contents of the EFLAGS register on the stack." (Vol 3, 6.12.1.2)
Empirically, this holds when injecting an exception on VM entry. If
you take the x86/debug.c test from kvm-unit-tests and inspect RFLAGS
in handle_db (not regs->rflags, but the actual RFLAGS register while
running the exception handler), TF is clear. And if you modify my
patch to clear TF before returning, the single stepping ceases after
the CPUID instruction because the TF was in fact cleared for good.
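
The check I have in mind is roughly the following; this is only an
illustration, not the actual x86/debug.c code, and the ex_regs type and
the two globals are stand-ins:

	static volatile unsigned long db_live_rflags;
	static volatile int db_hits;

	static void handle_db(struct ex_regs *regs)
	{
		unsigned long fl;

		/* pushf reads RFLAGS as it is right now, inside the
		 * handler, not the saved copy in regs->rflags. */
		asm volatile("pushf; pop %0" : "=r"(fl));

		db_live_rflags = fl;	/* bit 8 (TF) is observed clear */
		db_hits++;
	}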

- Kyle

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction
  2016-11-28 18:34         ` Kyle Huey
@ 2016-11-28 22:43           ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-11-28 22:43 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, kvm list, open list



On 28/11/2016 19:34, Kyle Huey wrote:
>> > The Intel manual says "The processor clears the TF flag before calling
>> > the exception handler" (17.3.1.4), so I think you should do it too.
> The processor does this automatically. "When accessing an exception or
> interrupt handler through either an interrupt gate or a trap gate, the
> processor clears the TF flag in the EFLAGS register after it saves the
> contents of the EFLAGS register on the stack." (Vol 3, 6.12.1.2)
> Empirically, this holds when injecting an exception on VM entry. If
> you take the x86/debug.c test from kvm-unit-tests and inspect RFLAGS
> in handle_db (not regs->rflags, but the actual RFLAGS register while
> running the exception handler), TF is clear. And if you modify my
> patch to clear TF before returning, the single stepping ceases after
> the CPUID instruction because the TF was in fact cleared for good.

Ok, then that would be a bug in kvm_vcpu_check_singlestep.  (Since
kvm_vcpu_check_singlestep is mostly interesting for real-mode emulation,
I checked kvm_inject_realmode_interrupt, and it clears TF too, in
__emulate_int_real.)
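
From memory, and paraphrasing emulate.c rather than quoting it exactly,
the flag update in __emulate_int_real after FLAGS, CS and IP have been
pushed for the real-mode vector is along the lines of

	ctxt->eflags &= ~(X86_EFLAGS_IF | X86_EFLAGS_TF | X86_EFLAGS_AC);

so that injection path already clears TF on its own.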

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-11-28 22:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-28  4:18 [PATCH 0/5] KVM: VMX: Fix single stepping with emulated instructions Kyle Huey
2016-11-28  4:18 ` [PATCH 1/5] KVM: x86: Add a return value to kvm_emulate_cpuid Kyle Huey
2016-11-28  4:18 ` [PATCH 2/5] KVM: VMX: Reorder some skip_emulated_instruction calls Kyle Huey
2016-11-28  4:18 ` [PATCH 3/5] KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 Kyle Huey
2016-11-28  4:18 ` [PATCH 4/5] KVM: x86: Add a return value to skip_emulated_instruction and propagate it Kyle Huey
2016-11-28  4:18 ` [PATCH 5/5] KVM: VMX: Handle RFLAGS.TF in skip_emulated_instruction Kyle Huey
2016-11-28 11:42   ` Paolo Bonzini
2016-11-28 16:13     ` Kyle Huey
2016-11-28 17:19       ` Paolo Bonzini
2016-11-28 18:34         ` Kyle Huey
2016-11-28 22:43           ` Paolo Bonzini
