kvm.vger.kernel.org archive mirror
* [PATCH 0/3] Handle monitor trap flag during instruction emulation
@ 2020-01-13 22:10 Oliver Upton
  2020-01-13 22:10 ` [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery Oliver Upton
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 22:10 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Oliver Upton

KVM already provides guests the ability to use the 'monitor trap flag'
VM-execution control. Support for this flag is provided by the fact that
KVM unconditionally forwards MTF VM-exits to the guest (if requested),
as KVM doesn't utilize MTF. While this provides support during hardware
instruction execution, it is insufficient for instruction emulation.
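
For reference, the existing reflection point is the MTF case in
nested_vmx_exit_reflected() (pre-patch code, quoted from the second
patch's diff below):

	case EXIT_REASON_MONITOR_TRAP_FLAG:
		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);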

Should L0 emulate an instruction on behalf of L2, L0 should also
synthesize an MTF VM-exit into L1 if the control is set.

The first patch fixes the handling of #DB payloads for both Intel and
AMD. To support MTF, KVM must also populate the 'pending debug
exceptions' field, rather than directly manipulating the debug register
state. Additionally, the exception payload associated with #DB is said
to be compatible with the 'pending debug exceptions' field in VMX. This
does not map cleanly into an AMD DR6 register, requiring bit 12 (enabled
breakpoint on Intel, reserved MBZ on AMD) to be masked off.
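
In rough terms, the AMD-side translation amounts to the following (a
condensed form of what the first patch adds to svm.c):

	payload &= ~BIT(12);		 /* PDE-only bit, reserved MBZ in DR6 */
	vcpu->arch.dr6 &= ~DR_TRAP_BITS; /* hardware refreshes bits 3:0 */
	vcpu->arch.dr6 |= payload;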

The second patch implements MTF under instruction emulation by adding
vendor-specific hooks to kvm_skip_emulated_instruction(). Should any
non-debug exception be pending before this call, MTF will follow event
delivery. Otherwise, an MTF VM-exit may be synthesized directly into L1.
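
From L1's perspective nothing changes; a minimal sketch of the expected
behavior, using helpers from the kvm-unit-tests patch below:

	vmcs_write(CPU_EXEC_CTRL0, vmcs_read(CPU_EXEC_CTRL0) | CPU_MTF);
	enter_guest();			/* L2 runs (or L0 emulates) one insn */
	assert_exit_reason(VMX_MTF);	/* MTF VM-exit reflected into L1 */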

The third patch introduces tests to kvm-unit-tests. These tests pass
both under virtualization and on bare metal.

Oliver Upton (2):
  KVM: x86: Add vendor-specific #DB payload delivery
  KVM: x86: Emulate MTF when performing instruction emulation

 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              | 25 +++++++++++++++++++++
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/nested.h       |  5 +++++
 arch/x86/kvm/vmx/vmx.c          | 39 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              | 27 ++++++-----------------
 6 files changed, 78 insertions(+), 22 deletions(-)

-- 
2.25.0.rc1.283.g88dfdc4193-goog



* [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery
  2020-01-13 22:10 [PATCH 0/3] Handle monitor trap flag during instruction emulation Oliver Upton
@ 2020-01-13 22:10 ` Oliver Upton
  2020-01-13 22:52   ` Sean Christopherson
  2020-01-13 22:10 ` [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation Oliver Upton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 22:10 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Oliver Upton, Peter Shier, Jim Mattson

VMX and SVM differ slightly in the handling of #DB debug-trap exceptions
upon VM-entry. VMX defines the 'pending debug exceptions' field in the
VMCS for hardware to inject the event and update debug register state
upon VM-entry.

SVM provides no such mechanism for maintaining the state of a pending
debug exception, as the debug register state is updated before delivery
of #VMEXIT.

Lastly, while KVM defines the exception payload for debug-trap
exceptions as compatible with the 'pending debug exceptions' VMCS field,
it is not compatible with the DR6 register across both vendors.

Split the #DB payload delivery between SVM and VMX to capture the
nuanced differences in instruction emulation. Utilize the 'pending debug
exceptions' field on VMX to deliver the payload. On SVM, directly update
register state with the appropriate requested bits from the exception
payload.
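
For reference, the relevant bits in the payload ('pending debug
exceptions' format) versus DR6, as I understand them:

	bits 3:0  B0-B3, matched breakpoint conditions (same in both)
	bit  12   'enabled breakpoint' (PDE-only; reserved MBZ in DR6,
		  and on AMD in particular)
	bit  14   BS, single-step trap (same in both)
	bit  16   RTM region indication; inverted sense relative to
		  DR6.RTM, hence the XOR in the current x86.c code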

Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery")

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          | 20 +++++++++++++++++++-
 arch/x86/kvm/x86.c              | 21 +--------------------
 4 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b79cd6aa4075..4739ca11885d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1100,6 +1100,7 @@ struct kvm_x86_ops {
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
 	void (*queue_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
+	void (*deliver_db_payload)(struct kvm_vcpu *vcpu, unsigned long payload);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 122d4ce3b1ab..16ded16af997 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5615,6 +5615,25 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
 	svm_complete_interrupts(svm);
 }
 
+static void svm_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
+{
+	/*
+	 * The exception payload is defined as compatible with the 'pending
+	 * debug exceptions' field in VMX, not the DR6 register. Clear bit 12
+	 * (enabled breakpoint) in the payload, which is reserved MBZ in DR6.
+	 */
+	payload &= ~BIT(12);
+
+	/*
+	 * The processor updates bits 3:0 according to the matched breakpoint
+	 * conditions on every debug breakpoint or general-detect condition.
+	 * Hardware will not clear any other bits in DR6. Clear bits 3:0 and set
+	 * the bits requested in the exception payload.
+	 */
+	vcpu->arch.dr6 &= ~DR_TRAP_BITS;
+	vcpu->arch.dr6 |= payload;
+}
+
 static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -7308,6 +7327,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.set_nmi = svm_inject_nmi,
 	.queue_exception = svm_queue_exception,
 	.cancel_injection = svm_cancel_injection,
+	.deliver_db_payload = svm_deliver_db_payload,
 	.interrupt_allowed = svm_interrupt_allowed,
 	.nmi_allowed = svm_nmi_allowed,
 	.get_nmi_mask = svm_get_nmi_mask,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..148696199c88 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1613,6 +1613,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
 	bool has_error_code = vcpu->arch.exception.has_error_code;
+	bool has_payload = vcpu->arch.exception.has_payload;
 	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
@@ -1640,7 +1641,13 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	} else
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
-	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
+	/*
+	 * Debug-trap exceptions are injected into the guest via the 'pending
+	 * debug exceptions' vmcs field and thus should not be injected into the
+	 * guest using the general event injection mechanism.
+	 */
+	if (nr != DB_VECTOR || !has_payload)
+		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
 
 	vmx_clear_hlt(vcpu);
 }
@@ -6398,6 +6405,16 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
 }
 
+static void vmx_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
+{
+	/*
+	 * Synthesized debug exceptions that have an associated payload must be
+	 * traps, and thus the 'pending debug exceptions' field can be used to
+	 * allow hardware to inject the event upon VM-entry.
+	 */
+	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, payload);
+}
+
 static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 {
 	int i, nr_msrs;
@@ -7821,6 +7838,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.set_nmi = vmx_inject_nmi,
 	.queue_exception = vmx_queue_exception,
 	.cancel_injection = vmx_cancel_injection,
+	.deliver_db_payload = vmx_deliver_db_payload,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
 	.get_nmi_mask = vmx_get_nmi_mask,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf917139de6b..c14174c033e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -415,26 +415,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 
 	switch (nr) {
 	case DB_VECTOR:
-		/*
-		 * "Certain debug exceptions may clear bit 0-3.  The
-		 * remaining contents of the DR6 register are never
-		 * cleared by the processor".
-		 */
-		vcpu->arch.dr6 &= ~DR_TRAP_BITS;
-		/*
-		 * DR6.RTM is set by all #DB exceptions that don't clear it.
-		 */
-		vcpu->arch.dr6 |= DR6_RTM;
-		vcpu->arch.dr6 |= payload;
-		/*
-		 * Bit 16 should be set in the payload whenever the #DB
-		 * exception should clear DR6.RTM. This makes the payload
-		 * compatible with the pending debug exceptions under VMX.
-		 * Though not currently documented in the SDM, this also
-		 * makes the payload compatible with the exit qualification
-		 * for #DB exceptions under VMX.
-		 */
-		vcpu->arch.dr6 ^= payload & DR6_RTM;
+		kvm_x86_ops->deliver_db_payload(vcpu, payload);
 		break;
 	case PF_VECTOR:
 		vcpu->arch.cr2 = payload;
-- 
2.25.0.rc1.283.g88dfdc4193-goog



* [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-13 22:10 [PATCH 0/3] Handle monitor trap flag during instruction emulation Oliver Upton
  2020-01-13 22:10 ` [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery Oliver Upton
@ 2020-01-13 22:10 ` Oliver Upton
  2020-01-14  0:05   ` Sean Christopherson
  2020-01-13 22:10 ` [kvm-unit-tests PATCH 3/3] x86: VMX: Add tests for monitor trap flag Oliver Upton
  2020-01-13 22:35 ` [PATCH 0/3] Handle monitor trap flag during instruction emulation Sean Christopherson
  3 siblings, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 22:10 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Oliver Upton, Peter Shier, Jim Mattson

Since commit 5f3d45e7f282 ("kvm/x86: add support for
MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
flag processor-based execution control for its L2 guest. KVM simply
forwards any MTF VM-exits to the L1 guest, which works for normal
instruction execution.

However, when KVM needs to emulate an instruction on behalf of an L2
guest, the monitor trap flag is not emulated. Add the necessary logic to
kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
instruction emulation for L2.

Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG")

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  5 +++++
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/nested.h       |  5 +++++
 arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++++++
 arch/x86/kvm/x86.c              |  6 ++++++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4739ca11885d..89dcdc7201ae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1092,6 +1092,7 @@ struct kvm_x86_ops {
 	void (*run)(struct kvm_vcpu *vcpu);
 	int (*handle_exit)(struct kvm_vcpu *vcpu);
 	int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
+	void (*emulation_complete)(struct kvm_vcpu *vcpu);
 	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
 	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
 	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 16ded16af997..f21eec4443d5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -802,6 +802,10 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void svm_emulation_complete(struct kvm_vcpu *vcpu)
+{
+}
+
 static void svm_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -7320,6 +7324,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.run = svm_vcpu_run,
 	.handle_exit = handle_exit,
 	.skip_emulated_instruction = skip_emulated_instruction,
+	.emulation_complete = svm_emulation_complete,
 	.set_interrupt_shadow = svm_set_interrupt_shadow,
 	.get_interrupt_shadow = svm_get_interrupt_shadow,
 	.patch_hypercall = svm_patch_hypercall,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4aea7d304beb..ee26f2d10a09 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5578,7 +5578,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 	case EXIT_REASON_MWAIT_INSTRUCTION:
 		return nested_cpu_has(vmcs12, CPU_BASED_MWAIT_EXITING);
 	case EXIT_REASON_MONITOR_TRAP_FLAG:
-		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
+		return nested_cpu_has_mtf(vmcs12);
 	case EXIT_REASON_MONITOR_INSTRUCTION:
 		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_EXITING);
 	case EXIT_REASON_PAUSE_INSTRUCTION:
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..901d2745bc93 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -238,6 +238,11 @@ static inline bool nested_cpu_has_save_preemption_timer(struct vmcs12 *vmcs12)
 	    VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 }
 
+static inline bool nested_cpu_has_mtf(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
+}
+
 /*
  * In nested virtualization, check if L1 asked to exit on external interrupts.
  * For most existing hypervisors, this will always return true.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 148696199c88..8d3b693c3d3a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1595,6 +1595,24 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void vmx_emulation_complete(struct kvm_vcpu *vcpu)
+{
+	if (!(is_guest_mode(vcpu) &&
+	      nested_cpu_has_mtf(get_vmcs12(vcpu))))
+		return;
+
+	/*
+	 * Per the SDM, MTF takes priority over debug-trap instructions. As
+	 * instruction emulation is completed (i.e. at the instruction
+	 * boundary), any #DB exception must be a trap. Emulate an MTF VM-exit
+	 * into L1 should there be a debug-trap exception pending or no
+	 * exception pending.
+	 */
+	if (!vcpu->arch.exception.pending ||
+	    vcpu->arch.exception.nr == DB_VECTOR)
+		nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
+}
+
 static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -7831,6 +7849,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.run = vmx_vcpu_run,
 	.handle_exit = vmx_handle_exit,
 	.skip_emulated_instruction = skip_emulated_instruction,
+	.emulation_complete = vmx_emulation_complete,
 	.set_interrupt_shadow = vmx_set_interrupt_shadow,
 	.get_interrupt_shadow = vmx_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c14174c033e4..d3af7a8a3c4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6546,6 +6546,12 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 */
 	if (unlikely(rflags & X86_EFLAGS_TF))
 		r = kvm_vcpu_do_singlestep(vcpu);
+	/*
+	 * Allow for vendor-specific handling of completed emulation before
+	 * returning.
+	 */
+	if (r)
+		kvm_x86_ops->emulation_complete(vcpu);
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
-- 
2.25.0.rc1.283.g88dfdc4193-goog



* [kvm-unit-tests PATCH 3/3] x86: VMX: Add tests for monitor trap flag
  2020-01-13 22:10 [PATCH 0/3] Handle monitor trap flag during instruction emulation Oliver Upton
  2020-01-13 22:10 ` [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery Oliver Upton
  2020-01-13 22:10 ` [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation Oliver Upton
@ 2020-01-13 22:10 ` Oliver Upton
  2020-01-13 22:35 ` [PATCH 0/3] Handle monitor trap flag during instruction emulation Sean Christopherson
  3 siblings, 0 replies; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 22:10 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Oliver Upton, Peter Shier, Jim Mattson

Tests to verify that MTF VM-exits into the host are synthesized when the
'monitor trap flag' processor-based VM-execution control is set under
various conditions.

Expect an MTF VM-exit if instruction execution produces no events other
than MTF. Should instruction execution produce a concurrent debug-trap
and MTF event, expect an MTF VM-exit with the 'pending debug exceptions'
VMCS field set. Expect an MTF VM-exit to follow event delivery should
instruction execution generate a higher-priority event, such as a
general-protection fault. Lastly, expect an MTF VM-exit to follow
delivery of the #DB raised by a privileged software exception
(ICEBP/INT1).

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 x86/vmx.h       |   1 +
 x86/vmx_tests.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 6214400f2b53..6adf0916564b 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -399,6 +399,7 @@ enum Ctrl0 {
 	CPU_NMI_WINDOW		= 1ul << 22,
 	CPU_IO			= 1ul << 24,
 	CPU_IO_BITMAP		= 1ul << 25,
+	CPU_MTF			= 1ul << 27,
 	CPU_MSR_BITMAP		= 1ul << 28,
 	CPU_MONITOR		= 1ul << 29,
 	CPU_PAUSE		= 1ul << 30,
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index fce773c18b8b..5beed8e29b4b 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -4952,6 +4952,162 @@ static void test_vmx_preemption_timer(void)
 	vmcs_write(EXI_CONTROLS, saved_exit);
 }
 
+extern unsigned char test_mtf1;
+extern unsigned char test_mtf2;
+extern unsigned char test_mtf3;
+
+__attribute__((noclone)) static void test_mtf_guest(void)
+{
+	asm ("vmcall;\n\t"
+	     "out %al, $0x80;\n\t"
+	     "test_mtf1:\n\t"
+	     "vmcall;\n\t"
+	     "out %al, $0x80;\n\t"
+	     "test_mtf2:\n\t"
+	     /*
+	      * Prepare for the 'MOV CR3' test. Attempt to induce a
+	      * general-protection fault by moving a non-canonical address into
+	      * CR3. The 'MOV CR3' instruction does not take an imm64 operand,
+	      * so we must MOV the desired value into a register first.
+	      *
+	      * MOV RAX is done before the VMCALL such that MTF is only enabled
+	      * for the instruction under test.
+	      */
+	     "mov $0x8000000000000000, %rax;\n\t"
+	     "vmcall;\n\t"
+	     "mov %rax, %cr3;\n\t"
+	     "test_mtf3:\n\t"
+	     "vmcall;\n\t"
+	     /*
+	      * ICEBP/INT1 instruction. Though the instruction is now
+	      * documented, don't rely on assemblers enumerating the
+	      * instruction. Resort to hand assembly.
+	      */
+	     ".byte 0xf1;\n\t");
+}
+
+static void test_mtf_gp_handler(struct ex_regs *regs)
+{
+	regs->rip = (unsigned long) &test_mtf3;
+}
+
+static void test_mtf_db_handler(struct ex_regs *regs)
+{
+}
+
+static void enable_mtf(void)
+{
+	u32 ctrl0 = vmcs_read(CPU_EXEC_CTRL0);
+
+	vmcs_write(CPU_EXEC_CTRL0, ctrl0 | CPU_MTF);
+}
+
+static void disable_mtf(void)
+{
+	u32 ctrl0 = vmcs_read(CPU_EXEC_CTRL0);
+
+	vmcs_write(CPU_EXEC_CTRL0, ctrl0 & ~CPU_MTF);
+}
+
+static void enable_tf(void)
+{
+	unsigned long rflags = vmcs_read(GUEST_RFLAGS);
+
+	vmcs_write(GUEST_RFLAGS, rflags | X86_EFLAGS_TF);
+}
+
+static void disable_tf(void)
+{
+	unsigned long rflags = vmcs_read(GUEST_RFLAGS);
+
+	vmcs_write(GUEST_RFLAGS, rflags & ~X86_EFLAGS_TF);
+}
+
+static void report_mtf(const char *insn_name, unsigned long exp_rip)
+{
+	unsigned long rip = vmcs_read(GUEST_RIP);
+
+	assert_exit_reason(VMX_MTF);
+	report(rip == exp_rip, "MTF VM-exit after %s instruction. RIP: 0x%lx (expected 0x%lx)",
+	       insn_name, rip, exp_rip);
+}
+
+static void vmx_mtf_test(void)
+{
+	unsigned long pending_dbg;
+	handler old_gp, old_db;
+
+	if (!(ctrl_cpu_rev[0].clr & CPU_MTF)) {
+		printf("CPU does not support the 'monitor trap flag' processor-based VM-execution control.\n");
+		return;
+	}
+
+	test_set_guest(test_mtf_guest);
+
+	/* Expect an MTF VM-exit after OUT instruction */
+	enter_guest();
+	skip_exit_vmcall();
+
+	enable_mtf();
+	enter_guest();
+	report_mtf("OUT", (unsigned long) &test_mtf1);
+	disable_mtf();
+
+	/*
+	 * Concurrent #DB trap and MTF on instruction boundary. Expect MTF
+	 * VM-exit with populated 'pending debug exceptions' VMCS field.
+	 */
+	enter_guest();
+	skip_exit_vmcall();
+
+	enable_mtf();
+	enable_tf();
+
+	enter_guest();
+	report_mtf("OUT", (unsigned long) &test_mtf2);
+	pending_dbg = vmcs_read(GUEST_PENDING_DEBUG);
+	report(pending_dbg & DR_STEP,
+	       "'pending debug exceptions' field after MTF VM-exit: 0x%lx (expected 0x%lx)",
+	       pending_dbg, (unsigned long) DR_STEP);
+
+	disable_mtf();
+	disable_tf();
+	vmcs_write(GUEST_PENDING_DEBUG, 0);
+
+	/*
+	 * #GP exception takes priority over MTF. Expect MTF VM-exit with rIP
+	 * advanced to first instruction of #GP handler.
+	 */
+	enter_guest();
+	skip_exit_vmcall();
+
+	old_gp = handle_exception(GP_VECTOR, test_mtf_gp_handler);
+
+	enable_mtf();
+	enter_guest();
+	report_mtf("MOV CR3", (unsigned long) get_idt_addr(&boot_idt[GP_VECTOR]));
+	disable_mtf();
+
+	/*
+	 * Concurrent MTF and privileged software exception (i.e. ICEBP/INT1).
+	 * MTF should follow the delivery of the #DB trap, though the SDM
+	 * doesn't provide a clear indication of the relative priority.
+	 */
+	enter_guest();
+	skip_exit_vmcall();
+
+	handle_exception(GP_VECTOR, old_gp);
+	old_db = handle_exception(DB_VECTOR, test_mtf_db_handler);
+
+	enable_mtf();
+	enter_guest();
+	report_mtf("INT1", (unsigned long) get_idt_addr(&boot_idt[DB_VECTOR]));
+	disable_mtf();
+
+	enter_guest();
+	handle_exception(DB_VECTOR, old_db);
+}
+
 /*
  * Tests for VM-execution control fields
  */
@@ -9443,5 +9599,6 @@ struct vmx_test vmx_tests[] = {
 	TEST(atomic_switch_max_msrs_test),
 	TEST(atomic_switch_overflow_msrs_test),
 	TEST(rdtsc_vmexit_diff_test),
+	TEST(vmx_mtf_test),
 	{ NULL, NULL, NULL, NULL, NULL, {0} },
 };
-- 
2.25.0.rc1.283.g88dfdc4193-goog



* Re: [PATCH 0/3] Handle monitor trap flag during instruction emulation
  2020-01-13 22:10 [PATCH 0/3] Handle monitor trap flag during instruction emulation Oliver Upton
                   ` (2 preceding siblings ...)
  2020-01-13 22:10 ` [kvm-unit-tests PATCH 3/3] x86: VMX: Add tests for monitor trap flag Oliver Upton
@ 2020-01-13 22:35 ` Sean Christopherson
  2020-01-13 23:06   ` Oliver Upton
  3 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-01-13 22:35 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Paolo Bonzini

On Mon, Jan 13, 2020 at 02:10:50PM -0800, Oliver Upton wrote:
> KVM already provides guests the ability to use the 'monitor trap flag'
> VM-execution control. Support for this flag is provided by the fact that
> KVM unconditionally forwards MTF VM-exits to the guest (if requested),
> as KVM doesn't utilize MTF. While this provides support during hardware
> instruction execution, it is insufficient for instruction emulation.
> 
> Should L0 emulate an instruction on behalf of L2, L0 should also
> synthesize an MTF VM-exit into L1 if the control is set.
> 
> The first patch fixes the handling of #DB payloads for both Intel and
> AMD. To support MTF, KVM must also populate the 'pending debug
> exceptions' field, rather than directly manipulating the debug register
> state. Additionally, the exception payload associated with #DB is said
> to be compatible with the 'pending debug exceptions' field in VMX. This
> does not map cleanly into an AMD DR6 register, requiring bit 12 (enabled
> breakpoint on Intel, reserved MBZ on AMD) to be masked off.
> 
> The second patch implements MTF under instruction emulation by adding
> vendor-specific hooks to kvm_skip_emulated_instruction(). Should any
> non-debug exception be pending before this call, MTF will follow event
> delivery. Otherwise, an MTF VM-exit may be synthesized directly into L1.
> 
> The third patch introduces tests to kvm-unit-tests. These tests pass
> both under virtualization and on bare metal.
> 
> Oliver Upton (2):
>   KVM: x86: Add vendor-specific #DB payload delivery
>   KVM: x86: Emulate MTF when performing instruction emulation
> 
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/svm.c              | 25 +++++++++++++++++++++
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  arch/x86/kvm/vmx/nested.h       |  5 +++++
>  arch/x86/kvm/vmx/vmx.c          | 39 ++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c              | 27 ++++++-----------------
>  6 files changed, 78 insertions(+), 22 deletions(-)
> 
> -- 

What commit is this series based on?  It doesn't apply cleanly on the
current kvm/master or kvm/queue.


* Re: [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery
  2020-01-13 22:10 ` [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery Oliver Upton
@ 2020-01-13 22:52   ` Sean Christopherson
  2020-01-13 23:16     ` Oliver Upton
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-01-13 22:52 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Paolo Bonzini, Peter Shier, Jim Mattson

On Mon, Jan 13, 2020 at 02:10:51PM -0800, Oliver Upton wrote:
> VMX and SVM differ slightly in the handling of #DB debug-trap exceptions
> upon VM-entry. VMX defines the 'pending debug exceptions' field in the
> VMCS for hardware to inject the event and update debug register state
> upon VM-entry.
> 
> SVM provides no such mechanism for maintaining the state of a pending
> debug exception, as the debug register state is updated before delivery
> of #VMEXIT.
> 
> Lastly, while KVM defines the exception payload for debug-trap
> exceptions as compatible with the 'pending debug exceptions' VMCS field,
> it is not compatible with the DR6 register across both vendors.
> 
> Split the #DB payload delivery between SVM and VMX to capture the
> nuanced differences in instruction emulation. Utilize the 'pending debug
> exceptions' field on VMX to deliver the payload. On SVM, directly update
> register state with the appropriate requested bits from the exception
> payload.
> 
> Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery")

Without even looking at the code, I recommend splitting this into two
patches: first fix SVM without introducing any functional changes to VMX,
and then change VMX to use the PENDING_DBG field.  That way the SVM code
isn't affected if for some reason the VMX patch needs to be reverted.

Not that I doubt the correctness of the VMX change, I just expect problems
by default whenever anything so much as breathes on dr6 :-)

> Signed-off-by: Oliver Upton <oupton@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm.c              | 20 ++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c          | 20 +++++++++++++++++++-
>  arch/x86/kvm/x86.c              | 21 +--------------------
>  4 files changed, 41 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b79cd6aa4075..4739ca11885d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1100,6 +1100,7 @@ struct kvm_x86_ops {
>  	void (*set_nmi)(struct kvm_vcpu *vcpu);
>  	void (*queue_exception)(struct kvm_vcpu *vcpu);
>  	void (*cancel_injection)(struct kvm_vcpu *vcpu);
> +	void (*deliver_db_payload)(struct kvm_vcpu *vcpu, unsigned long payload);
>  	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
>  	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
>  	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 122d4ce3b1ab..16ded16af997 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5615,6 +5615,25 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
>  	svm_complete_interrupts(svm);
>  }
>  
> +static void svm_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> +{
> +	/*
> +	 * The exception payload is defined as compatible with the 'pending
> +	 * debug exceptions' field in VMX, not the DR6 register. Clear bit 12
> +	 * (enabled breakpoint) in the payload, which is reserved MBZ in DR6.
> +	 */
> +	payload &= ~BIT(12);
> +
> +	/*
> +	 * The processor updates bits 3:0 according to the matched breakpoint
> +	 * conditions on every debug breakpoint or general-detect condition.
> +	 * Hardware will not clear any other bits in DR6. Clear bits 3:0 and set
> +	 * the bits requested in the exception payload.
> +	 */
> +	vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> +	vcpu->arch.dr6 |= payload;
> +}
> +
>  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -7308,6 +7327,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.set_nmi = svm_inject_nmi,
>  	.queue_exception = svm_queue_exception,
>  	.cancel_injection = svm_cancel_injection,
> +	.deliver_db_payload = svm_deliver_db_payload,
>  	.interrupt_allowed = svm_interrupt_allowed,
>  	.nmi_allowed = svm_nmi_allowed,
>  	.get_nmi_mask = svm_get_nmi_mask,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e3394c839dea..148696199c88 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1613,6 +1613,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned nr = vcpu->arch.exception.nr;
>  	bool has_error_code = vcpu->arch.exception.has_error_code;
> +	bool has_payload = vcpu->arch.exception.has_payload;
>  	u32 error_code = vcpu->arch.exception.error_code;
>  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
>  
> @@ -1640,7 +1641,13 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>  	} else
>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
>  
> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> +	/*
> +	 * Debug-trap exceptions are injected into the guest via the 'pending
> +	 * debug exceptions' vmcs field and thus should not be injected into the
> +	 * guest using the general event injection mechanism.
> +	 */
> +	if (nr != DB_VECTOR || !has_payload)
> +		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
>  
>  	vmx_clear_hlt(vcpu);
>  }
> @@ -6398,6 +6405,16 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>  	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>  }
>  
> +static void vmx_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> +{
> +	/*
> +	 * Synthesized debug exceptions that have an associated payload must be
> +	 * traps, and thus the 'pending debug exceptions' field can be used to
> +	 * allow hardware to inject the event upon VM-entry.
> +	 */

I'm confused by this, probably because I don't understand what you mean by
"Synthesized debug exceptions".  E.g. code #DBs are fault-like and update
dr6.

> +	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, payload);
> +}
> +
>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>  {
>  	int i, nr_msrs;
> @@ -7821,6 +7838,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.set_nmi = vmx_inject_nmi,
>  	.queue_exception = vmx_queue_exception,
>  	.cancel_injection = vmx_cancel_injection,
> +	.deliver_db_payload = vmx_deliver_db_payload,
>  	.interrupt_allowed = vmx_interrupt_allowed,
>  	.nmi_allowed = vmx_nmi_allowed,
>  	.get_nmi_mask = vmx_get_nmi_mask,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf917139de6b..c14174c033e4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -415,26 +415,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
>  
>  	switch (nr) {
>  	case DB_VECTOR:
> -		/*
> -		 * "Certain debug exceptions may clear bit 0-3.  The
> -		 * remaining contents of the DR6 register are never
> -		 * cleared by the processor".
> -		 */
> -		vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> -		/*
> -		 * DR6.RTM is set by all #DB exceptions that don't clear it.
> -		 */
> -		vcpu->arch.dr6 |= DR6_RTM;
> -		vcpu->arch.dr6 |= payload;
> -		/*
> -		 * Bit 16 should be set in the payload whenever the #DB
> -		 * exception should clear DR6.RTM. This makes the payload
> -		 * compatible with the pending debug exceptions under VMX.
> -		 * Though not currently documented in the SDM, this also
> -		 * makes the payload compatible with the exit qualification
> -		 * for #DB exceptions under VMX.
> -		 */
> -		vcpu->arch.dr6 ^= payload & DR6_RTM;
> +		kvm_x86_ops->deliver_db_payload(vcpu, payload);
>  		break;
>  	case PF_VECTOR:
>  		vcpu->arch.cr2 = payload;
> -- 
> 2.25.0.rc1.283.g88dfdc4193-goog
> 


* Re: [PATCH 0/3] Handle monitor trap flag during instruction emulation
  2020-01-13 22:35 ` [PATCH 0/3] Handle monitor trap flag during instruction emulation Sean Christopherson
@ 2020-01-13 23:06   ` Oliver Upton
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 23:06 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini

On Mon, Jan 13, 2020 at 02:35:04PM -0800, Sean Christopherson wrote:
> On Mon, Jan 13, 2020 at 02:10:50PM -0800, Oliver Upton wrote:
> > KVM already provides guests the ability to use the 'monitor trap flag'
> > VM-execution control. Support for this flag is provided by the fact that
> > KVM unconditionally forwards MTF VM-exits to the guest (if requested),
> > as KVM doesn't utilize MTF. While this provides support during hardware
> > instruction execution, it is insufficient for instruction emulation.
> > 
> > Should L0 emulate an instruction on behalf of L2, L0 should also
> > synthesize an MTF VM-exit into L1 if the control is set.
> > 
> > The first patch fixes the handling of #DB payloads for both Intel and
> > AMD. To support MTF, KVM must also populate the 'pending debug
> > exceptions' field, rather than directly manipulating the debug register
> > state. Additionally, the exception payload associated with #DB is said
> > to be compatible with the 'pending debug exceptions' field in VMX. This
> > does not map cleanly into an AMD DR6 register, requiring bit 12 (enabled
> > breakpoint on Intel, reserved MBZ on AMD) to be masked off.
> > 
> > The second patch implements MTF under instruction emulation by adding
> > vendor-specific hooks to kvm_skip_emulated_instruction(). Should any
> > non-debug exception be pending before this call, MTF will follow event
> > delivery. Otherwise, an MTF VM-exit may be synthesized directly into L1.
> > 
> > The third patch introduces tests to kvm-unit-tests. These tests pass
> > both under virtualization and on bare metal.
> > 
> > Oliver Upton (2):
> >   KVM: x86: Add vendor-specific #DB payload delivery
> >   KVM: x86: Emulate MTF when performing instruction emulation
> > 
> >  arch/x86/include/asm/kvm_host.h |  2 ++
> >  arch/x86/kvm/svm.c              | 25 +++++++++++++++++++++
> >  arch/x86/kvm/vmx/nested.c       |  2 +-
> >  arch/x86/kvm/vmx/nested.h       |  5 +++++
> >  arch/x86/kvm/vmx/vmx.c          | 39 ++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/x86.c              | 27 ++++++-----------------
> >  6 files changed, 78 insertions(+), 22 deletions(-)
> > 
> > -- 
> 
> What commit is this series based on?  It doesn't apply cleanly on the
> current kvm/master or kvm/queue.

Blech. I use torvalds/master for initial review before sending out (woo,
Gerrit!). Seems I sent out my set based on torvalds, not kvm. I'll
rebase in v2 (while addressing your comments).

Thanks for the prompt reply, Sean :)

--
Best,
Oliver


* Re: [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery
  2020-01-13 22:52   ` Sean Christopherson
@ 2020-01-13 23:16     ` Oliver Upton
  2020-01-13 23:51       ` Oliver Upton
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 23:16 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini, Peter Shier, Jim Mattson

On Mon, Jan 13, 2020 at 02:52:10PM -0800, Sean Christopherson wrote:
> On Mon, Jan 13, 2020 at 02:10:51PM -0800, Oliver Upton wrote:
> > VMX and SVM differ slightly in the handling of #DB debug-trap exceptions
> > upon VM-entry. VMX defines the 'pending debug exceptions' field in the
> > VMCS for hardware to inject the event and update debug register state
> > upon VM-entry.
> > 
> > SVM provides no such mechanism for maintaining the state of a pending
> > debug exception, as the debug register state is updated before delivery
> > of #VMEXIT.
> > 
> > Lastly, while KVM defines the exception payload for debug-trap
> > exceptions as compatible with the 'pending debug exceptions' VMCS field,
> > it is not compatible with the DR6 register across both vendors.
> > 
> > Split the #DB payload delivery between SVM and VMX to capture the
> > nuanced differences in instruction emulation. Utilize the 'pending debug
> > exceptions' field on VMX to deliver the payload. On SVM, directly update
> > register state with the appropriate requested bits from the exception
> > payload.
> > 
> > Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery")
> 
> Without even looking at the code, I recommend splitting this into two
> patches: first fix SVM without introducing any functional changes to VMX,
> and then change VMX to use the PENDING_DBG field.  That way the SVM code
> isn't affected if for some reason the VMX patch needs to be reverted.

Alright, I'll split this between vendors in the SVM patch, but keep the
existing DR6 magic in both. I'll also mask BIT(12) in the SVM patch.

> Not that I doubt the correctness of the VMX change, I just expect problems
> by default whenever anything so much as breathes on dr6 :-)

Hah. I'll muck with VMX afterwards :)

As always, thanks for the great, prompt review Sean.

> > Signed-off-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> > Reviewed-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/svm.c              | 20 ++++++++++++++++++++
> >  arch/x86/kvm/vmx/vmx.c          | 20 +++++++++++++++++++-
> >  arch/x86/kvm/x86.c              | 21 +--------------------
> >  4 files changed, 41 insertions(+), 21 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index b79cd6aa4075..4739ca11885d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1100,6 +1100,7 @@ struct kvm_x86_ops {
> >  	void (*set_nmi)(struct kvm_vcpu *vcpu);
> >  	void (*queue_exception)(struct kvm_vcpu *vcpu);
> >  	void (*cancel_injection)(struct kvm_vcpu *vcpu);
> > +	void (*deliver_db_payload)(struct kvm_vcpu *vcpu, unsigned long payload);
> >  	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
> >  	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
> >  	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 122d4ce3b1ab..16ded16af997 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -5615,6 +5615,25 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
> >  	svm_complete_interrupts(svm);
> >  }
> >  
> > +static void svm_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> > +{
> > +	/*
> > +	 * The exception payload is defined as compatible with the 'pending
> > +	 * debug exceptions' field in VMX, not the DR6 register. Clear bit 12
> > +	 * (enabled breakpoint) in the payload, which is reserved MBZ in DR6.
> > +	 */
> > +	payload &= ~BIT(12);
> > +
> > +	/*
> > +	 * The processor updates bits 3:0 according to the matched breakpoint
> > +	 * conditions on every debug breakpoint or general-detect condition.
> > +	 * Hardware will not clear any other bits in DR6. Clear bits 3:0 and set
> > +	 * the bits requested in the exception payload.
> > +	 */
> > +	vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> > +	vcpu->arch.dr6 |= payload;
> > +}
> > +
> >  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > @@ -7308,6 +7327,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >  	.set_nmi = svm_inject_nmi,
> >  	.queue_exception = svm_queue_exception,
> >  	.cancel_injection = svm_cancel_injection,
> > +	.deliver_db_payload = svm_deliver_db_payload,
> >  	.interrupt_allowed = svm_interrupt_allowed,
> >  	.nmi_allowed = svm_nmi_allowed,
> >  	.get_nmi_mask = svm_get_nmi_mask,
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e3394c839dea..148696199c88 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1613,6 +1613,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	unsigned nr = vcpu->arch.exception.nr;
> >  	bool has_error_code = vcpu->arch.exception.has_error_code;
> > +	bool has_payload = vcpu->arch.exception.has_payload;
> >  	u32 error_code = vcpu->arch.exception.error_code;
> >  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
> >  
> > @@ -1640,7 +1641,13 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> >  	} else
> >  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
> >  
> > -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> > +	/*
> > +	 * Debug-trap exceptions are injected into the guest via the 'pending
> > +	 * debug exceptions' vmcs field and thus should not be injected into the
> > +	 * guest using the general event injection mechanism.
> > +	 */
> > +	if (nr != DB_VECTOR || !has_payload)
> > +		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> >  
> >  	vmx_clear_hlt(vcpu);
> >  }
> > @@ -6398,6 +6405,16 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> >  	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> >  }
> >  
> > +static void vmx_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> > +{
> > +	/*
> > +	 * Synthesized debug exceptions that have an associated payload must be
> > +	 * traps, and thus the 'pending debug exceptions' field can be used to
> > +	 * allow hardware to inject the event upon VM-entry.
> > +	 */
> 
> I'm confused by this, probably because I don't understand what you mean by
> "Synthesized debug exceptions".  E.g. code #DBs are fault-like and update
> dr6.
> 
> > +	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, payload);
> > +}
> > +
> >  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >  {
> >  	int i, nr_msrs;
> > @@ -7821,6 +7838,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >  	.set_nmi = vmx_inject_nmi,
> >  	.queue_exception = vmx_queue_exception,
> >  	.cancel_injection = vmx_cancel_injection,
> > +	.deliver_db_payload = vmx_deliver_db_payload,
> >  	.interrupt_allowed = vmx_interrupt_allowed,
> >  	.nmi_allowed = vmx_nmi_allowed,
> >  	.get_nmi_mask = vmx_get_nmi_mask,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cf917139de6b..c14174c033e4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -415,26 +415,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
> >  
> >  	switch (nr) {
> >  	case DB_VECTOR:
> > -		/*
> > -		 * "Certain debug exceptions may clear bit 0-3.  The
> > -		 * remaining contents of the DR6 register are never
> > -		 * cleared by the processor".
> > -		 */
> > -		vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> > -		/*
> > -		 * DR6.RTM is set by all #DB exceptions that don't clear it.
> > -		 */
> > -		vcpu->arch.dr6 |= DR6_RTM;
> > -		vcpu->arch.dr6 |= payload;
> > -		/*
> > -		 * Bit 16 should be set in the payload whenever the #DB
> > -		 * exception should clear DR6.RTM. This makes the payload
> > -		 * compatible with the pending debug exceptions under VMX.
> > -		 * Though not currently documented in the SDM, this also
> > -		 * makes the payload compatible with the exit qualification
> > -		 * for #DB exceptions under VMX.
> > -		 */
> > -		vcpu->arch.dr6 ^= payload & DR6_RTM;
> > +		kvm_x86_ops->deliver_db_payload(vcpu, payload);
> >  		break;
> >  	case PF_VECTOR:
> >  		vcpu->arch.cr2 = payload;
> > -- 
> > 2.25.0.rc1.283.g88dfdc4193-goog
> > 


* Re: [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery
  2020-01-13 23:16     ` Oliver Upton
@ 2020-01-13 23:51       ` Oliver Upton
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Upton @ 2020-01-13 23:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini, Peter Shier, Jim Mattson

On Mon, Jan 13, 2020 at 03:16:54PM -0800, Oliver Upton wrote:
> On Mon, Jan 13, 2020 at 02:52:10PM -0800, Sean Christopherson wrote:
> > On Mon, Jan 13, 2020 at 02:10:51PM -0800, Oliver Upton wrote:
> > > VMX and SVM differ slightly in the handling of #DB debug-trap exceptions
> > > upon VM-entry. VMX defines the 'pending debug exceptions' field in the
> > > VMCS for hardware to inject the event and update debug register state
> > > upon VM-entry.
> > > 
> > > SVM provides no such mechanism for maintaining the state of a pending
> > > debug exception, as the debug register state is updated before delivery
> > > of #VMEXIT.
> > > 
> > > Lastly, while KVM defines the exception payload for debug-trap
> > > exceptions as compatible with the 'pending debug exceptions' VMCS field,
> > > it is not compatible with the DR6 register across both vendors.
> > > 
> > > Split the #DB payload delivery between SVM and VMX to capture the
> > > nuanced differences in instruction emulation. Utilize the 'pending debug
> > > exceptions' field on VMX to deliver the payload. On SVM, directly update
> > > register state with the appropriate requested bits from the exception
> > > payload.
> > > 
> > > Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery")
> > 
> > Without even looking at the code, I recommend splitting this into two
> > patches: first fix SVM without introducing any functional changes to VMX,
> > and then change VMX to use the PENDING_DBG field.  That way the SVM code
> > isn't affected if for some reason the VMX patch needs to be reverted.
> 
> Alright, I'll split this between vendors in the SVM patch, but keep the
> existing DR6 magic in both. I'll also mask BIT(12) in the SVM patch.
> 
> > Not that I doubt the correctness of the VMX change, I just expect problems
> > by default whenever anything so much as breathes on dr6 :-)
> 
> Hah. I'll muck with VMX afterwards :)
> 
> As always, thanks for the great, prompt review Sean.
> 
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > Reviewed-by: Peter Shier <pshier@google.com>
> > > Reviewed-by: Jim Mattson <jmattson@google.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  1 +
> > >  arch/x86/kvm/svm.c              | 20 ++++++++++++++++++++
> > >  arch/x86/kvm/vmx/vmx.c          | 20 +++++++++++++++++++-
> > >  arch/x86/kvm/x86.c              | 21 +--------------------
> > >  4 files changed, 41 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index b79cd6aa4075..4739ca11885d 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1100,6 +1100,7 @@ struct kvm_x86_ops {
> > >  	void (*set_nmi)(struct kvm_vcpu *vcpu);
> > >  	void (*queue_exception)(struct kvm_vcpu *vcpu);
> > >  	void (*cancel_injection)(struct kvm_vcpu *vcpu);
> > > +	void (*deliver_db_payload)(struct kvm_vcpu *vcpu, unsigned long payload);
> > >  	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
> > >  	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
> > >  	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 122d4ce3b1ab..16ded16af997 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -5615,6 +5615,25 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
> > >  	svm_complete_interrupts(svm);
> > >  }
> > >  
> > > +static void svm_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> > > +{
> > > +	/*
> > > +	 * The exception payload is defined as compatible with the 'pending
> > > +	 * debug exceptions' field in VMX, not the DR6 register. Clear bit 12
> > > +	 * (enabled breakpoint) in the payload, which is reserved MBZ in DR6.
> > > +	 */
> > > +	payload &= ~BIT(12);
> > > +
> > > +	/*
> > > +	 * The processor updates bits 3:0 according to the matched breakpoint
> > > +	 * conditions on every debug breakpoint or general-detect condition.
> > > +	 * Hardware will not clear any other bits in DR6. Clear bits 3:0 and set
> > > +	 * the bits requested in the exception payload.
> > > +	 */
> > > +	vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> > > +	vcpu->arch.dr6 |= payload;
> > > +}
> > > +
> > >  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > > @@ -7308,6 +7327,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >  	.set_nmi = svm_inject_nmi,
> > >  	.queue_exception = svm_queue_exception,
> > >  	.cancel_injection = svm_cancel_injection,
> > > +	.deliver_db_payload = svm_deliver_db_payload,
> > >  	.interrupt_allowed = svm_interrupt_allowed,
> > >  	.nmi_allowed = svm_nmi_allowed,
> > >  	.get_nmi_mask = svm_get_nmi_mask,
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index e3394c839dea..148696199c88 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1613,6 +1613,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > >  	unsigned nr = vcpu->arch.exception.nr;
> > >  	bool has_error_code = vcpu->arch.exception.has_error_code;
> > > +	bool has_payload = vcpu->arch.exception.has_payload;
> > >  	u32 error_code = vcpu->arch.exception.error_code;
> > >  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
> > >  
> > > @@ -1640,7 +1641,13 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
> > >  	} else
> > >  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
> > >  
> > > -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> > > +	/*
> > > +	 * Debug-trap exceptions are injected into the guest via the 'pending
> > > +	 * debug exceptions' vmcs field and thus should not be injected into the
> > > +	 * guest using the general event injection mechanism.
> > > +	 */
> > > +	if (nr != DB_VECTOR || !has_payload)
> > > +		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
> > >  
> > >  	vmx_clear_hlt(vcpu);
> > >  }
> > > @@ -6398,6 +6405,16 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> > >  	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
> > >  }
> > >  
> > > +static void vmx_deliver_db_payload(struct kvm_vcpu *vcpu, unsigned long payload)
> > > +{
> > > +	/*
> > > +	 * Synthesized debug exceptions that have an associated payload must be
> > > +	 * traps, and thus the 'pending debug exceptions' field can be used to
> > > +	 * allow hardware to inject the event upon VM-entry.
> > > +	 */
> > 
> > I'm confused by this, probably because I don't understand what you mean by
> > "Synthesized debug exceptions".  E.g. code #DBs are fault-like and update
> > dr6.

Yeah, this comment could benefit from being cleaned up. By 'synthesized'
I meant exceptions that we are going to inject as a result of
instruction emulation.

AFAIK, KVM only emulates single-step #DBs during instruction emulation,
and as such we should never reach this with a software #DB (I tested
this in the kvm-unit-tests patch). I think that software #DBs should get
reinjected by handle_exception_nmi(), which manipulates DR6 and calls
kvm_queue_exception() (i.e. no payload).
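
Roughly, the two paths look like this (call sites abbreviated, from
memory):

	/* Emulated single-step in kvm_vcpu_do_singlestep(): trap-like,
	 * payload in the 'pending debug exceptions' format.
	 */
	kvm_queue_exception_p(vcpu, DB_VECTOR, DR6_BS);

	/* Reinjected #DB from handle_exception_nmi(): DR6 is updated
	 * directly, no payload attached.
	 */
	kvm_queue_exception(vcpu, DB_VECTOR);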

Regardless, I need to expand this comment to ensure it is abundantly
clear what is happening.

Sean, given my above comments, do you still see any issues with this
change (beyond comment fix)?

Sorry I missed this in the first mail, thank you for bringing attention
to this.

> > 
> > > +	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, payload);
> > > +}
> > > +
> > >  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> > >  {
> > >  	int i, nr_msrs;
> > > @@ -7821,6 +7838,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > >  	.set_nmi = vmx_inject_nmi,
> > >  	.queue_exception = vmx_queue_exception,
> > >  	.cancel_injection = vmx_cancel_injection,
> > > +	.deliver_db_payload = vmx_deliver_db_payload,
> > >  	.interrupt_allowed = vmx_interrupt_allowed,
> > >  	.nmi_allowed = vmx_nmi_allowed,
> > >  	.get_nmi_mask = vmx_get_nmi_mask,
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index cf917139de6b..c14174c033e4 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -415,26 +415,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
> > >  
> > >  	switch (nr) {
> > >  	case DB_VECTOR:
> > > -		/*
> > > -		 * "Certain debug exceptions may clear bit 0-3.  The
> > > -		 * remaining contents of the DR6 register are never
> > > -		 * cleared by the processor".
> > > -		 */
> > > -		vcpu->arch.dr6 &= ~DR_TRAP_BITS;
> > > -		/*
> > > -		 * DR6.RTM is set by all #DB exceptions that don't clear it.
> > > -		 */
> > > -		vcpu->arch.dr6 |= DR6_RTM;
> > > -		vcpu->arch.dr6 |= payload;
> > > -		/*
> > > -		 * Bit 16 should be set in the payload whenever the #DB
> > > -		 * exception should clear DR6.RTM. This makes the payload
> > > -		 * compatible with the pending debug exceptions under VMX.
> > > -		 * Though not currently documented in the SDM, this also
> > > -		 * makes the payload compatible with the exit qualification
> > > -		 * for #DB exceptions under VMX.
> > > -		 */
> > > -		vcpu->arch.dr6 ^= payload & DR6_RTM;
> > > +		kvm_x86_ops->deliver_db_payload(vcpu, payload);
> > >  		break;
> > >  	case PF_VECTOR:
> > >  		vcpu->arch.cr2 = payload;
> > > -- 
> > > 2.25.0.rc1.283.g88dfdc4193-goog
> > > 


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-13 22:10 ` [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation Oliver Upton
@ 2020-01-14  0:05   ` Sean Christopherson
  2020-01-14 17:58     ` Jim Mattson
  2020-01-15 22:51     ` Oliver Upton
  0 siblings, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2020-01-14  0:05 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Paolo Bonzini, Peter Shier, Jim Mattson

On Mon, Jan 13, 2020 at 02:10:52PM -0800, Oliver Upton wrote:
> Since commit 5f3d45e7f282 ("kvm/x86: add support for
> MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
> flag processor-based execution control for its L2 guest. KVM simply
> forwards any MTF VM-exits to the L1 guest, which works for normal
> instruction execution.
> 
> However, when KVM needs to emulate an instruction on behalf of an L2
> guest, the monitor trap flag is not emulated. Add the necessary logic to
> kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
> instruction emulation for L2.

This only fixes the flows where KVM is doing "fast" emulation, or whatever
you want to call it.  The full emulation paths are still missing proper MTF
support.  Dunno if it makes sense to fix it all at once, e.g. if the full
emulation is stupid complex, but at a minimum the gaps should be called out
with a brief explanation of why they're not being addressed.
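
Roughly, the call paths in play (abbreviated):

	vmx fast paths (I/O, CPUID, ...)
	  -> kvm_skip_emulated_instruction()
	       -> kvm_x86_ops->emulation_complete()	<- handled by this patch

	kvm_emulate_instruction()
	  -> x86_emulate_instruction()
	       -> advances RIP itself on writeback and never reaches
		  the new hook				<- MTF still missed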

> Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm.c              |  5 +++++
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  arch/x86/kvm/vmx/nested.h       |  5 +++++
>  arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++++++
>  arch/x86/kvm/x86.c              |  6 ++++++
>  6 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4739ca11885d..89dcdc7201ae 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1092,6 +1092,7 @@ struct kvm_x86_ops {
>  	void (*run)(struct kvm_vcpu *vcpu);
>  	int (*handle_exit)(struct kvm_vcpu *vcpu);
>  	int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
> +	void (*emulation_complete)(struct kvm_vcpu *vcpu);
>  	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
>  	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
>  	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 16ded16af997..f21eec4443d5 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -802,6 +802,10 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static void svm_emulation_complete(struct kvm_vcpu *vcpu)
> +{
> +}
> +
>  static void svm_queue_exception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -7320,6 +7324,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.run = svm_vcpu_run,
>  	.handle_exit = handle_exit,
>  	.skip_emulated_instruction = skip_emulated_instruction,
> +	.emulation_complete = svm_emulation_complete,
>  	.set_interrupt_shadow = svm_set_interrupt_shadow,
>  	.get_interrupt_shadow = svm_get_interrupt_shadow,
>  	.patch_hypercall = svm_patch_hypercall,
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 4aea7d304beb..ee26f2d10a09 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5578,7 +5578,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
>  	case EXIT_REASON_MWAIT_INSTRUCTION:
>  		return nested_cpu_has(vmcs12, CPU_BASED_MWAIT_EXITING);
>  	case EXIT_REASON_MONITOR_TRAP_FLAG:
> -		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
> +		return nested_cpu_has_mtf(vmcs12);
>  	case EXIT_REASON_MONITOR_INSTRUCTION:
>  		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_EXITING);
>  	case EXIT_REASON_PAUSE_INSTRUCTION:
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index fc874d4ead0f..901d2745bc93 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -238,6 +238,11 @@ static inline bool nested_cpu_has_save_preemption_timer(struct vmcs12 *vmcs12)
>  	    VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>  }
>  
> +static inline bool nested_cpu_has_mtf(struct vmcs12 *vmcs12)
> +{
> +	return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
> +}
> +
>  /*
>   * In nested virtualization, check if L1 asked to exit on external interrupts.
>   * For most existing hypervisors, this will always return true.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 148696199c88..8d3b693c3d3a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1595,6 +1595,24 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static void vmx_emulation_complete(struct kvm_vcpu *vcpu)
> +{
> +	if (!(is_guest_mode(vcpu) &&
> +	      nested_cpu_has_mtf(get_vmcs12(vcpu))))
> +		return;
> +
> +	/*
> +	 * Per the SDM, MTF takes priority over debug-trap instructions. As

Except for T-bit #DBs.  Thankfully KVM doesn't emulate them.  :-D

> +	 * instruction emulation is completed (i.e. at the instruction
> +	 * boundary), any #DB exception must be a trap. Emulate an MTF VM-exit
> +	 * into L1 should there be a debug-trap exception pending or no
> +	 * exception pending.
> +	 */
> +	if (!vcpu->arch.exception.pending ||
> +	    vcpu->arch.exception.nr == DB_VECTOR)
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);

Doing a direct nested_vmx_vmexit() in kvm_skip_emulated_instruction() feels
like a hack.  It might work for now, i.e. while only the "fast" emulation
paths deal with MTF, but I have a feeling that we'll want MTF handling to
live in vmx_check_nested_events(), or possibly in a custom callback that is
invoked from similar call sites.

Actually, I thought of a case it breaks.  If HLT is passed through from
L1->L2, this approach will process the VM-Exit prior to handling the HLT
in L0.  KVM will record the wrong activity state in vmcs12 ("active"
instead of "halted") and then incorrectly put L1 into KVM_MP_STATE_HALTED.

Another case, which may or may not be possible, is if INIT is recognized
on the same instruction, in which case it takes priority over MTF.  SMI
might also be an issue.

> +}
> +
>  static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
>  {
>  	/*
> @@ -7831,6 +7849,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.run = vmx_vcpu_run,
>  	.handle_exit = vmx_handle_exit,
>  	.skip_emulated_instruction = skip_emulated_instruction,
> +	.emulation_complete = vmx_emulation_complete,
>  	.set_interrupt_shadow = vmx_set_interrupt_shadow,
>  	.get_interrupt_shadow = vmx_get_interrupt_shadow,
>  	.patch_hypercall = vmx_patch_hypercall,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c14174c033e4..d3af7a8a3c4b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6546,6 +6546,12 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	 */
>  	if (unlikely(rflags & X86_EFLAGS_TF))
>  		r = kvm_vcpu_do_singlestep(vcpu);
> +	/*
> +	 * Allow for vendor-specific handling of completed emulation before
> +	 * returning.
> +	 */
> +	if (r)
> +		kvm_x86_ops->emulation_complete(vcpu);
>  	return r;
>  }
>  EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
> -- 
> 2.25.0.rc1.283.g88dfdc4193-goog
> 


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-14  0:05   ` Sean Christopherson
@ 2020-01-14 17:58     ` Jim Mattson
  2020-01-14 18:28       ` Sean Christopherson
  2020-01-15 22:51     ` Oliver Upton
  1 sibling, 1 reply; 15+ messages in thread
From: Jim Mattson @ 2020-01-14 17:58 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Oliver Upton, kvm list, Paolo Bonzini, Peter Shier

On Mon, Jan 13, 2020 at 4:05 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:

> Another case, which may or may not be possible, is if INIT is recognized
> on the same instruction, in which case it takes priority over MTF.  SMI
> might also be an issue.

Don't we already have a priority inversion today when INIT or SMI are
coincident with a debug trap on the previous instruction (e.g.
single-step trap on an emulated instruction)?


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-14 17:58     ` Jim Mattson
@ 2020-01-14 18:28       ` Sean Christopherson
  2020-01-17 21:43         ` Jim Mattson
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2020-01-14 18:28 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Oliver Upton, kvm list, Paolo Bonzini, Peter Shier

On Tue, Jan 14, 2020 at 09:58:22AM -0800, Jim Mattson wrote:
> On Mon, Jan 13, 2020 at 4:05 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> 
> > Another case, which may or may not be possible, is if INIT is recognized
> > on the same instruction, in which case it takes priority over MTF.  SMI
> > might also be an issue.
> 
> Don't we already have a priority inversion today when INIT or SMI are
> coincident with a debug trap on the previous instruction (e.g.
> single-step trap on an emulated instruction)?

Liran fixed the INIT issue in commit 4b9852f4f389 ("KVM: x86: Fix INIT
signal handling in various CPU states").

SMI still appears to be inverted.


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-14  0:05   ` Sean Christopherson
  2020-01-14 17:58     ` Jim Mattson
@ 2020-01-15 22:51     ` Oliver Upton
  2020-01-21 15:28       ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2020-01-15 22:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini, Peter Shier, Jim Mattson

On Mon, Jan 13, 2020 at 04:05:18PM -0800, Sean Christopherson wrote:
> On Mon, Jan 13, 2020 at 02:10:52PM -0800, Oliver Upton wrote:
> > Since commit 5f3d45e7f282 ("kvm/x86: add support for
> > MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
> > flag processor-based execution control for its L2 guest. KVM simply
> > forwards any MTF VM-exits to the L1 guest, which works for normal
> > instruction execution.
> > 
> > However, when KVM needs to emulate an instruction on behalf of an L2
> > guest, the monitor trap flag is not emulated. Add the necessary logic to
> > kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
> > instruction emulation for L2.
> 
> This only fixes the flows where KVM is doing "fast" emulation, or whatever
> you want to call it.  The full emulation paths are still missing proper MTF
> support.  Dunno if it makes sense to fix it all at once, e.g. if the full
> emulation is stupid complex, but at a minimum the gaps should be called out
> with a brief explanation of why they're not being addressed.

Good point. I'm instead inclined to call the emulation_complete() hook
(will rename as appropriate) from kvm_vcpu_do_singlestep(), as that
appears to be how x86_emulate_instruction() processes the trap flag. I
need to take a deeper look and test to ensure this change also fixes
MTF for full instruction emulation.

> > Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
> > 
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> > Reviewed-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/svm.c              |  5 +++++
> >  arch/x86/kvm/vmx/nested.c       |  2 +-
> >  arch/x86/kvm/vmx/nested.h       |  5 +++++
> >  arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++++++
> >  arch/x86/kvm/x86.c              |  6 ++++++
> >  6 files changed, 37 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 4739ca11885d..89dcdc7201ae 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1092,6 +1092,7 @@ struct kvm_x86_ops {
> >  	void (*run)(struct kvm_vcpu *vcpu);
> >  	int (*handle_exit)(struct kvm_vcpu *vcpu);
> >  	int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
> > +	void (*emulation_complete)(struct kvm_vcpu *vcpu);
> >  	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
> >  	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
> >  	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 16ded16af997..f21eec4443d5 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -802,6 +802,10 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >  
> > +static void svm_emulation_complete(struct kvm_vcpu *vcpu)
> > +{
> > +}
> > +
> >  static void svm_queue_exception(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > @@ -7320,6 +7324,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >  	.run = svm_vcpu_run,
> >  	.handle_exit = handle_exit,
> >  	.skip_emulated_instruction = skip_emulated_instruction,
> > +	.emulation_complete = svm_emulation_complete,
> >  	.set_interrupt_shadow = svm_set_interrupt_shadow,
> >  	.get_interrupt_shadow = svm_get_interrupt_shadow,
> >  	.patch_hypercall = svm_patch_hypercall,
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 4aea7d304beb..ee26f2d10a09 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5578,7 +5578,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
> >  	case EXIT_REASON_MWAIT_INSTRUCTION:
> >  		return nested_cpu_has(vmcs12, CPU_BASED_MWAIT_EXITING);
> >  	case EXIT_REASON_MONITOR_TRAP_FLAG:
> > -		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
> > +		return nested_cpu_has_mtf(vmcs12);
> >  	case EXIT_REASON_MONITOR_INSTRUCTION:
> >  		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_EXITING);
> >  	case EXIT_REASON_PAUSE_INSTRUCTION:
> > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > index fc874d4ead0f..901d2745bc93 100644
> > --- a/arch/x86/kvm/vmx/nested.h
> > +++ b/arch/x86/kvm/vmx/nested.h
> > @@ -238,6 +238,11 @@ static inline bool nested_cpu_has_save_preemption_timer(struct vmcs12 *vmcs12)
> >  	    VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> >  }
> >  
> > +static inline bool nested_cpu_has_mtf(struct vmcs12 *vmcs12)
> > +{
> > +	return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
> > +}
> > +
> >  /*
> >   * In nested virtualization, check if L1 asked to exit on external interrupts.
> >   * For most existing hypervisors, this will always return true.
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 148696199c88..8d3b693c3d3a 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1595,6 +1595,24 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >  
> > +static void vmx_emulation_complete(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!(is_guest_mode(vcpu) &&
> > +	      nested_cpu_has_mtf(get_vmcs12(vcpu))))
> > +		return;
> > +
> > +	/*
> > +	 * Per the SDM, MTF takes priority over debug-trap instructions. As
> 
> Except for T-bit #DBs.  Thankfully KVM doesn't emulate them.  :-D
> 
> > +	 * instruction emulation is completed (i.e. at the instruction
> > +	 * boundary), any #DB exception must be a trap. Emulate an MTF VM-exit
> > +	 * into L1 should there be a debug-trap exception pending or no
> > +	 * exception pending.
> > +	 */
> > +	if (!vcpu->arch.exception.pending ||
> > +	    vcpu->arch.exception.nr == DB_VECTOR)
> > +		nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
> 
> Doing a direct nested_vmx_vmexit() in kvm_skip_emulated_instruction() feels
> like a hack.  It might work for now, i.e. while only the "fast" emulation
> paths deal with MTF, but I have a feeling that we'll want MTF handling to
> live in vmx_check_nested_events(), or possibly in a custom callback that is
> invoked from similar call sites.

I'm now fairly convinced that kvm_vcpu_do_singlestep() is the right
place to call a hook that determines whether an MTF VM-exit is pending,
as x86_emulate_instruction() also uses this function to handle the trap
flag.

> Actually, I thought of a case it breaks.  If HLT is passed through from
> L1->L2, this approach will process the VM-Exit prior to handling the HLT
> in L0.  KVM will record the wrong activity state in vmcs12 ("active"
> instead of "halted") and then incorrectly put L1 into KVM_MP_STATE_HALTED.

Great catch!

I'm doing some experimentation with HLT to figure out how to correctly
handle this.

> Another case, which may or may not be possible, is if INIT is recognized
> on the same instruction, in which case it takes priority over MTF.  SMI
> might also be an issue.

I've changed this locally to defer delivery of the MTF VM-exit to
vmx_check_nested_events(). Tests look fine, and it seems this would
also resolve the INIT signal priority issue.
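
Something along these lines, perhaps (a sketch only -- the mtf_pending
field is hypothetical, as is the exact shape of the hook):

	static void vmx_emulation_complete(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		/* Only record that an MTF VM-exit is due at this
		 * instruction boundary; delivery is deferred to
		 * vmx_check_nested_events() so that higher-priority
		 * events (e.g. INIT) are evaluated first. */
		vmx->nested.mtf_pending = is_guest_mode(vcpu) &&
			nested_cpu_has_mtf(get_vmcs12(vcpu)) &&
			(!vcpu->arch.exception.pending ||
			 vcpu->arch.exception.nr == DB_VECTOR);
	}

and in vmx_check_nested_events():

	if (vmx->nested.mtf_pending) {
		vmx->nested.mtf_pending = false;
		nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
		return 0;
	}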

> > +}
> > +
> >  static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
> >  {
> >  	/*
> > @@ -7831,6 +7849,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >  	.run = vmx_vcpu_run,
> >  	.handle_exit = vmx_handle_exit,
> >  	.skip_emulated_instruction = skip_emulated_instruction,
> > +	.emulation_complete = vmx_emulation_complete,
> >  	.set_interrupt_shadow = vmx_set_interrupt_shadow,
> >  	.get_interrupt_shadow = vmx_get_interrupt_shadow,
> >  	.patch_hypercall = vmx_patch_hypercall,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index c14174c033e4..d3af7a8a3c4b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6546,6 +6546,12 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >  	 */
> >  	if (unlikely(rflags & X86_EFLAGS_TF))
> >  		r = kvm_vcpu_do_singlestep(vcpu);
> > +	/*
> > +	 * Allow for vendor-specific handling of completed emulation before
> > +	 * returning.
> > +	 */
> > +	if (r)
> > +		kvm_x86_ops->emulation_complete(vcpu);
> >  	return r;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
> > -- 
> > 2.25.0.rc1.283.g88dfdc4193-goog
> > 


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-14 18:28       ` Sean Christopherson
@ 2020-01-17 21:43         ` Jim Mattson
  0 siblings, 0 replies; 15+ messages in thread
From: Jim Mattson @ 2020-01-17 21:43 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Oliver Upton, kvm list, Paolo Bonzini, Peter Shier

On Tue, Jan 14, 2020 at 10:28 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Jan 14, 2020 at 09:58:22AM -0800, Jim Mattson wrote:
> > On Mon, Jan 13, 2020 at 4:05 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> >
> > > Another case, which may or may not be possible, is if INIT is recognized
> > > on the same instruction, in which case it takes priority over MTF.  SMI
> > > might also be an issue.
> >
> > Don't we already have a priority inversion today when INIT or SMI are
> > coincident with a debug trap on the previous instruction (e.g.
> > single-step trap on an emulated instruction)?
>
> Liran fixed the INIT issue in commit 4b9852f4f389 ("KVM: x86: Fix INIT
> signal handling in various CPU states").
>
> SMI still appears to be inverted.

I find the call graph for vmx_check_nested_events() very confusing. It's
called as many as three times per call to vcpu_enter_guest():

    1. From kvm_vcpu_running(), before the call to vcpu_enter_guest().
    2. From inject_pending_event(), in vcpu_enter_guest(), after all
of the calls to kvm_check_request().
    3. From inject_pending_event(), after injecting (but not
reinjecting) an event, but not if we've processed an SMI or an NMI.

Can this possibly respect the architected priorities? I'm skeptical.

Within the body of vmx_check_nested_events(), the following priorities
are imposed:

1. INIT
2. Intercepted fault or trap on previous instruction
3. VMX preemption timer
4. NMI
5. External interrupt

(2) is appropriately placed for "traps on the previous instruction,"
but it looks like there is a priority inversion between INIT and an
intercepted fault on the previous instruction. In fact, because of the
first two calls to vmx_check_nested_events() listed above, there is a
priority inversion between an *unintercepted* fault on the previous
instruction and any of {INIT, VMX preemption timer, NMI, external
interrupt}.

Or is there some subtlety here that I'm missing?


* Re: [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation
  2020-01-15 22:51     ` Oliver Upton
@ 2020-01-21 15:28       ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2020-01-21 15:28 UTC (permalink / raw)
  To: Oliver Upton, Sean Christopherson; +Cc: kvm, Peter Shier, Jim Mattson

On 15/01/20 23:51, Oliver Upton wrote:
> Good point. I'm instead inclined to call the emulation_complete() hook
> (will rename as appropriate) from kvm_vcpu_do_singlestep(), as that
> appears to be how x86_emulate_instruction() processes the trap flag. I
> need to take a deeper look and test to ensure this change also fixes
> MTF for full instruction emulation.

Cool, I was going to suggest the same.  You can use the forced emulation
prefix to send your testcase down the x86_emulate_instruction path.
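
For instance, a minimal sketch (this assumes the host is booted with
kvm.force_emulation_prefix=1, and that KVM_FEP below matches the
ud2 + "kvm" prefix bytes KVM strips before emulating):

	#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"

	static void fep_nop(void)
	{
		/* The prefixed NOP traps to L0 as a #UD and is then
		 * decoded and executed by x86_emulate_instruction(),
		 * exercising the full emulator rather than the "fast"
		 * skip path. */
		asm volatile(KVM_FEP "nop");
	}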

Paolo



Thread overview: 15+ messages
2020-01-13 22:10 [PATCH 0/3] Handle monitor trap flag during instruction emulation Oliver Upton
2020-01-13 22:10 ` [PATCH 1/3] KVM: x86: Add vendor-specific #DB payload delivery Oliver Upton
2020-01-13 22:52   ` Sean Christopherson
2020-01-13 23:16     ` Oliver Upton
2020-01-13 23:51       ` Oliver Upton
2020-01-13 22:10 ` [PATCH 2/3] KVM: x86: Emulate MTF when performing instruction emulation Oliver Upton
2020-01-14  0:05   ` Sean Christopherson
2020-01-14 17:58     ` Jim Mattson
2020-01-14 18:28       ` Sean Christopherson
2020-01-17 21:43         ` Jim Mattson
2020-01-15 22:51     ` Oliver Upton
2020-01-21 15:28       ` Paolo Bonzini
2020-01-13 22:10 ` [kvm-unit-tests PATCH 3/3] x86: VMX: Add tests for monitor trap flag Oliver Upton
2020-01-13 22:35 ` [PATCH 0/3] Handle monitor trap flag during instruction emulation Sean Christopherson
2020-01-13 23:06   ` Oliver Upton
