* [PATCH v2 0/3] KVM: VMX: handle preemption timer value of zero
@ 2018-09-19 22:50 Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 1/3] KVM: VMX: immediately mark preemption timer expired only for zero value Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Paolo Bonzini @ 2018-09-19 22:50 UTC (permalink / raw)
  To: linux-kernel, kvm



A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest.  KVM serendipitously
emulates this behavior when running as L0, but doesn't ensure
correct emulation when running at L1 or higher.  Explicitly
emulate the architectural behavior of a timer value of '0'.

v1->v2:
- move flag to vmx->loaded_vmcs
- extract arming the timer to a separate function instead of using a boolean
- clean up SVM

Sean Christopherson (3):
  KVM: VMX: immediately mark preemption timer expired only for zero
    value
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: use preemption timer to force immediate VMExit

 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              |  2 +
 arch/x86/kvm/vmx.c              | 94 ++++++++++++++++++++++++++---------------
 arch/x86/kvm/x86.c              |  8 +++-
 4 files changed, 70 insertions(+), 36 deletions(-)

-- 
1.8.3.1



* [PATCH 1/3] KVM: VMX: immediately mark preemption timer expired only for zero value
  2018-09-19 22:50 [PATCH v2 0/3] KVM: VMX: handle preemption timer value of zero Paolo Bonzini
@ 2018-09-19 22:50 ` Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 2/3] KVM: VMX: modify preemption timer bit only when arming timer Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 3/3] KVM: VMX: use preemption timer to force immediate VMExit Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2018-09-19 22:50 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest.  This architectural
definition exists to ensure that a previously expired timer is
correctly recognized by the CPU, as it is possible for the timer
to reach zero and not trigger a VMExit when a higher priority
VMExit is signalled instead, e.g. a pending #DB that morphs into
a VMExit.

Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX:
Fully emulate preemption timer") special-cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
timer value of '1' has no architectural guarantee regarding when
it is delivered.

Modify the timer emulation to trigger an immediate VMExit if and
only if the timer value is '0', and document precisely why '0' is
special.  Do this even if calibration of the virtual TSC failed,
i.e. a value of '0' causes an immediate VMExit regardless of the
timer's frequency.  Making only '0' a special case gives KVM leeway
to be more aggressive in ensuring the VMExit is injected prior to
executing instructions in the nested guest, and also eliminates any
ambiguity as to why '1' was a special case, e.g. why the threshold
for a "short timeout" wasn't set to 10, 100, 1000, etc.
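
For illustration, a minimal sketch of the resulting flow (the
function name is hypothetical; the zero check, the unit conversion,
and the hrtimer call are modeled on the diff below, but this is a
sketch, not the verbatim upstream code):

static void start_preemption_timer_sketch(struct kvm_vcpu *vcpu)
{
	u64 timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/*
	 * '0' architecturally guarantees a VMExit before any guest
	 * instruction executes, so expire the timer immediately,
	 * even if the virtual TSC was never calibrated.
	 */
	if (timeout == 0) {
		vmx_preemption_timer_fn(&vmx->nested.preemption_timer);
		return;
	}

	/* Unknown tick rate; a non-zero timeout can't be converted. */
	if (vcpu->arch.virtual_tsc_khz == 0)
		return;

	/*
	 * One timer tick is 2^VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE
	 * TSC cycles, so ns = ticks * 2^rate * 10^6 / tsc_khz.
	 */
	timeout <<= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE;
	timeout *= 1000000;
	do_div(timeout, vcpu->arch.virtual_tsc_khz);
	hrtimer_start(&vmx->nested.preemption_timer,
		      ns_to_ktime(timeout), HRTIMER_MODE_REL);
}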

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 533a327372c8..4655d6dd6759 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11427,16 +11427,18 @@ static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu)
 	u64 preemption_timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	if (vcpu->arch.virtual_tsc_khz == 0)
-		return;
-
-	/* Make sure short timeouts reliably trigger an immediate vmexit.
-	 * hrtimer_start does not guarantee this. */
-	if (preemption_timeout <= 1) {
+	/*
+	 * A timer value of zero is architecturally guaranteed to cause
+	 * a VMExit prior to executing any instructions in the guest.
+	 */
+	if (preemption_timeout == 0) {
 		vmx_preemption_timer_fn(&vmx->nested.preemption_timer);
 		return;
 	}
 
+	if (vcpu->arch.virtual_tsc_khz == 0)
+		return;
+
 	preemption_timeout <<= VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE;
 	preemption_timeout *= 1000000;
 	do_div(preemption_timeout, vcpu->arch.virtual_tsc_khz);
-- 
1.8.3.1




* [PATCH 2/3] KVM: VMX: modify preemption timer bit only when arming timer
  2018-09-19 22:50 [PATCH v2 0/3] KVM: VMX: handle preemption timer value of zero Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 1/3] KVM: VMX: immediately mark preemption timer expired only for zero value Paolo Bonzini
@ 2018-09-19 22:50 ` Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 3/3] KVM: VMX: use preemption timer to force immediate VMExit Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2018-09-19 22:50 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths.  For example, the preemption timer can be used to force an
immediate VMExit.  Cache the armed status of the timer to avoid
redundant VMREADs and VMWRITEs, e.g. if the timer stays armed
across multiple VMEnters/VMExits.
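
As a rough sketch of that caching pattern (the helper name is
hypothetical; the body mirrors the vmx_arm_hv_timer()/hv_timer_armed
logic introduced in the diff below):

static void vmx_arm_hv_timer_sketch(struct vcpu_vmx *vmx, u32 val)
{
	/* The timer value itself is rewritten on every VMEnter. */
	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);

	/*
	 * Touch the pin-based controls only on a disarmed->armed
	 * transition; vmcs_set_bits() is a VMREAD plus a VMWRITE,
	 * both skipped when the timer stays armed across many
	 * VMEnter/VMExit cycles.
	 */
	if (!vmx->loaded_vmcs->hv_timer_armed) {
		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
			      PIN_BASED_VMX_PREEMPTION_TIMER);
		vmx->loaded_vmcs->hv_timer_armed = true;
	}
}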

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 61 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4655d6dd6759..ed80673572d2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -397,6 +397,7 @@ struct loaded_vmcs {
 	int cpu;
 	bool launched;
 	bool nmi_known_unmasked;
+	bool hv_timer_armed;
 	/* Support for vnmi-less CPUs */
 	int soft_vnmi_blocked;
 	ktime_t entry_time;
@@ -10595,24 +10596,38 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 					msrs[i].host, false);
 }
 
-static void vmx_arm_hv_timer(struct kvm_vcpu *vcpu)
+static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u64 val)
+{
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
+	if (!vmx->loaded_vmcs->hv_timer_armed)
+		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
+			      PIN_BASED_VMX_PREEMPTION_TIMER);
+	vmx->loaded_vmcs->hv_timer_armed = true;
+}
+
+static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 tscl;
 	u32 delta_tsc;
 
-	if (vmx->hv_deadline_tsc == -1)
-		return;
+	if (vmx->hv_deadline_tsc != -1) {
+		tscl = rdtsc();
+		if (vmx->hv_deadline_tsc > tscl)
+			/* set_hv_timer ensures the delta fits in 32-bits */
+			delta_tsc = (u32)((vmx->hv_deadline_tsc - tscl) >>
+				cpu_preemption_timer_multi);
+		else
+			delta_tsc = 0;
 
-	tscl = rdtsc();
-	if (vmx->hv_deadline_tsc > tscl)
-		/* sure to be 32 bit only because checked on set_hv_timer */
-		delta_tsc = (u32)((vmx->hv_deadline_tsc - tscl) >>
-			cpu_preemption_timer_multi);
-	else
-		delta_tsc = 0;
+		vmx_arm_hv_timer(vmx, delta_tsc);
+		return;
+	}
 
-	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, delta_tsc);
+	if (vmx->loaded_vmcs->hv_timer_armed)
+		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
+				PIN_BASED_VMX_PREEMPTION_TIMER);
+	vmx->loaded_vmcs->hv_timer_armed = false;
 }
 
 static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
@@ -10672,7 +10687,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	atomic_switch_perf_msrs(vmx);
 
-	vmx_arm_hv_timer(vcpu);
+	vmx_update_hv_timer(vcpu);
 
 	/*
 	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
@@ -12078,11 +12093,10 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 
 	exec_control = vmcs12->pin_based_vm_exec_control;
 
-	/* Preemption timer setting is only taken from vmcs01.  */
-	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	/* Preemption timer setting is computed directly in vmx_vcpu_run.  */
 	exec_control |= vmcs_config.pin_based_exec_ctrl;
-	if (vmx->hv_deadline_tsc == -1)
-		exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	vmx->loaded_vmcs->hv_timer_armed = false;
 
 	/* Posted interrupts setting is only taken from vmcs12.  */
 	if (nested_cpu_has_posted_intr(vmcs12)) {
@@ -13255,12 +13269,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
-	if (vmx->hv_deadline_tsc == -1)
-		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
-				PIN_BASED_VMX_PREEMPTION_TIMER);
-	else
-		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
-			      PIN_BASED_VMX_PREEMPTION_TIMER);
+
 	if (kvm_has_tsc_control)
 		decache_tsc_multiplier(vmx);
 
@@ -13464,18 +13473,12 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
 		return -ERANGE;
 
 	vmx->hv_deadline_tsc = tscl + delta_tsc;
-	vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
-			PIN_BASED_VMX_PREEMPTION_TIMER);
-
 	return delta_tsc == 0;
 }
 
 static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	vmx->hv_deadline_tsc = -1;
-	vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
-			PIN_BASED_VMX_PREEMPTION_TIMER);
+	to_vmx(vcpu)->hv_deadline_tsc = -1;
 }
 #endif
 
-- 
1.8.3.1




* [PATCH 3/3] KVM: VMX: use preemption timer to force immediate VMExit
  2018-09-19 22:50 [PATCH v2 0/3] KVM: VMX: handle preemption timer value of zero Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 1/3] KVM: VMX: immediately mark preemption timer expired only for zero value Paolo Bonzini
  2018-09-19 22:50 ` [PATCH 2/3] KVM: VMX: modify preemption timer bit only when arming timer Paolo Bonzini
@ 2018-09-19 22:50 ` Paolo Bonzini
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2018-09-19 22:50 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest.  Use the
preemption timer (if it's supported) to trigger an immediate VMExit
in place of the current method of sending a self-IPI.  This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).

When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI.  Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed.  But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0.  Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running,
e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an
INTR.

For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing.  The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter.  Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.

Note: this approach is viable even on CPUs with a broken preemption
timer, as "broken" in this context only means the timer counts at
the wrong rate.  There are no known errata affecting a timer value
of '0'.

[1] I/O SMIs also have guarantees on when they arrive, but I have
    no idea if/how those are emulated in KVM.
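
Roughly, the resulting dispatch looks as follows (a sketch stitched
together from the diffs below, not a standalone excerpt):

	/* x86.c: vendor-neutral fallback, keeps the self-IPI. */
	void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
	{
		smp_send_reschedule(vcpu->cpu);
	}

	/* vmx.c: latch a flag instead of sending an IPI... */
	static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
	{
		to_vmx(vcpu)->req_immediate_exit = true;
	}

	/*
	 * ...which vmx_update_hv_timer() turns into a zero-valued
	 * preemption timer, guaranteeing a VMExit before the guest
	 * executes a single instruction, at any nesting level.
	 */
	if (vmx->req_immediate_exit) {
		vmx_arm_hv_timer(vmx, 0);
		return;
	}

SVM, and VMX on hardware without a preemption timer, retain the old
self-IPI behavior by pointing .request_immediate_exit at
__kvm_request_immediate_exit.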

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              |  2 ++
 arch/x86/kvm/vmx.c              | 21 ++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  8 +++++++-
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8e90488c3d56..bffb25b50425 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1055,6 +1055,7 @@ struct kvm_x86_ops {
 	bool (*umip_emulated)(void);
 
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
+	void (*request_immediate_exit)(struct kvm_vcpu *vcpu);
 
 	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 
@@ -1482,6 +1483,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
+void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
 
 int kvm_is_in_guest(void);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c7f1c3fd782d..d96092b35936 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7148,6 +7148,8 @@ static int svm_unregister_enc_region(struct kvm *kvm,
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,
 
+	.request_immediate_exit = __kvm_request_immediate_exit,
+
 	.sched_in = svm_sched_in,
 
 	.pmu_ops = &amd_pmu_ops,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ed80673572d2..c9a57e2a5999 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1020,6 +1020,8 @@ struct vcpu_vmx {
 	int ple_window;
 	bool ple_window_dirty;
 
+	bool req_immediate_exit;
+
 	/* Support for PML */
 #define PML_ENTITY_NUM		512
 	struct page *pml_pg;
@@ -2865,6 +2867,8 @@ static void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	u16 fs_sel, gs_sel;
 	int i;
 
+	vmx->req_immediate_exit = false;
+
 	if (vmx->loaded_cpu_state)
 		return;
 
@@ -7967,6 +7971,9 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	if (!cpu_has_vmx_preemption_timer())
+		kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
+
 	if (cpu_has_vmx_preemption_timer() && enable_preemption_timer) {
 		u64 vmx_msr;
 
@@ -9209,7 +9216,8 @@ static int handle_pml_full(struct kvm_vcpu *vcpu)
 
 static int handle_preemption_timer(struct kvm_vcpu *vcpu)
 {
-	kvm_lapic_expired_hv_timer(vcpu);
+	if (!to_vmx(vcpu)->req_immediate_exit)
+		kvm_lapic_expired_hv_timer(vcpu);
 	return 1;
 }
 
@@ -10611,6 +10619,11 @@ static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
 	u64 tscl;
 	u32 delta_tsc;
 
+	if (vmx->req_immediate_exit) {
+		vmx_arm_hv_timer(vmx, 0);
+		return;
+	}
+
 	if (vmx->hv_deadline_tsc != -1) {
 		tscl = rdtsc();
 		if (vmx->hv_deadline_tsc > tscl)
@@ -12879,6 +12892,11 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 	return 0;
 }
 
+static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
+{
+	to_vmx(vcpu)->req_immediate_exit = true;
+}
+
 static u32 vmx_get_preemption_timer_value(struct kvm_vcpu *vcpu)
 {
 	ktime_t remaining =
@@ -14135,6 +14153,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	.umip_emulated = vmx_umip_emulated,
 
 	.check_nested_events = vmx_check_nested_events,
+	.request_immediate_exit = vmx_request_immediate_exit,
 
 	.sched_in = vmx_sched_in,
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c870203737f..9d0fda9056de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7361,6 +7361,12 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
+void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
+{
+	smp_send_reschedule(vcpu->cpu);
+}
+EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
+
 /*
  * Returns 1 to let vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -7565,7 +7571,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (req_immediate_exit) {
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		smp_send_reschedule(vcpu->cpu);
+		kvm_x86_ops->request_immediate_exit(vcpu);
 	}
 
 	trace_kvm_entry(vcpu->vcpu_id);
-- 
1.8.3.1


