* [PATCH v3 00/28] KVM: nSVM: event fixes and migration support
@ 2020-05-29 15:39 Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 01/30] KVM: x86: track manually whether an event has been injected Paolo Bonzini
                   ` (30 more replies)
  0 siblings, 31 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

This is basically the same as v2 except that it has a small fix to
"KVM: x86: enable event window in inject_pending_event", where
a second pending interrupt or NMI was not enabling the window-open
vmexit (caught by apic.flat).  In addition I've renamed
inject_pending_event to handle_processor_events.

The series now passes kvm-unit-tests and various nested hypervisor tests,
so it's *really* ready for review!  (Thanks, Krish, for looking at it
so far).

I'm quite pleased with the overall look of the code, though the
INT_CTL arbitration is a bit ugly.  I have plans to implement nested
vGIF and vLS, and then I will probably clean it up.

Paolo

Paolo Bonzini (28):
  KVM: x86: track manually whether an event has been injected
  KVM: x86: enable event window in inject_pending_event
  KVM: nSVM: inject exceptions via svm_check_nested_events
  KVM: nSVM: remove exit_required
  KVM: nSVM: correctly inject INIT vmexits
  KVM: SVM: always update CR3 in VMCB
  KVM: nVMX: always update CR3 in VMCS
  KVM: nSVM: move map argument out of enter_svm_guest_mode
  KVM: nSVM: extract load_nested_vmcb_control
  KVM: nSVM: extract preparation of VMCB for nested run
  KVM: nSVM: move MMU setup to nested_prepare_vmcb_control
  KVM: nSVM: clean up tsc_offset update
  KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area
  KVM: nSVM: remove trailing padding for struct vmcb_control_area
  KVM: nSVM: save all control fields in svm->nested
  KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR
  KVM: nSVM: synchronize VMCB controls updated by the processor on every
    vmexit
  KVM: nSVM: remove unnecessary if
  KVM: nSVM: extract svm_set_gif
  KVM: SVM: preserve VGIF across VMCB switch
  KVM: nSVM: synthesize correct EXITINTINFO on vmexit
  KVM: nSVM: remove HF_VINTR_MASK
  KVM: nSVM: remove HF_HIF_MASK
  KVM: nSVM: split nested_vmcb_check_controls
  KVM: nSVM: leave guest mode when clearing EFER.SVME
  KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu
  selftests: kvm: add a SVM version of state-test
  KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE

Vitaly Kuznetsov (2):
  selftests: kvm: introduce cpu_has_svm() check
  selftests: kvm: fix smm test on SVM

 arch/x86/include/asm/kvm_host.h               |  12 +-
 arch/x86/include/asm/svm.h                    |   9 +-
 arch/x86/include/uapi/asm/kvm.h               |  17 +-
 arch/x86/kvm/cpuid.h                          |   5 +
 arch/x86/kvm/irq.c                            |   1 +
 arch/x86/kvm/mmu.h                            |   2 +-
 arch/x86/kvm/mmu/mmu.c                        |  14 +-
 arch/x86/kvm/svm/nested.c                     | 624 ++++++++++++------
 arch/x86/kvm/svm/svm.c                        | 154 ++---
 arch/x86/kvm/svm/svm.h                        |  33 +-
 arch/x86/kvm/vmx/nested.c                     |   5 -
 arch/x86/kvm/vmx/vmx.c                        |  25 +-
 arch/x86/kvm/x86.c                            | 146 ++--
 .../selftests/kvm/include/x86_64/svm_util.h   |  10 +
 tools/testing/selftests/kvm/x86_64/smm_test.c |  19 +-
 .../testing/selftests/kvm/x86_64/state_test.c |  62 +-
 16 files changed, 708 insertions(+), 430 deletions(-)

-- 
2.26.2



* [PATCH 01/30] KVM: x86: track manually whether an event has been injected
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 02/30] KVM: x86: enable event window in inject_pending_event Paolo Bonzini
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Instead of calling kvm_event_needs_reinjection, track its
future return value in a variable.  This will be useful in
the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 329bdd2eb2cf..77b9b4e66673 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7717,11 +7717,14 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 static int inject_pending_event(struct kvm_vcpu *vcpu)
 {
 	int r;
+	bool can_inject = true;
 
 	/* try to reinject previous events if any */
 
-	if (vcpu->arch.exception.injected)
+	if (vcpu->arch.exception.injected) {
 		kvm_x86_ops.queue_exception(vcpu);
+		can_inject = false;
+	}
 	/*
 	 * Do not inject an NMI or interrupt if there is a pending
 	 * exception.  Exceptions and interrupts are recognized at
@@ -7737,10 +7740,13 @@ static int inject_pending_event(struct kvm_vcpu *vcpu)
 	 * fully complete the previous instruction.
 	 */
 	else if (!vcpu->arch.exception.pending) {
-		if (vcpu->arch.nmi_injected)
+		if (vcpu->arch.nmi_injected) {
 			kvm_x86_ops.set_nmi(vcpu);
-		else if (vcpu->arch.interrupt.injected)
+			can_inject = false;
+		} else if (vcpu->arch.interrupt.injected) {
 			kvm_x86_ops.set_irq(vcpu);
+			can_inject = false;
+		}
 	}
 
 	WARN_ON_ONCE(vcpu->arch.exception.injected &&
@@ -7790,10 +7796,11 @@ static int inject_pending_event(struct kvm_vcpu *vcpu)
 		}
 
 		kvm_x86_ops.queue_exception(vcpu);
+		can_inject = false;
 	}
 
-	/* Don't consider new event if we re-injected an event */
-	if (kvm_event_needs_reinjection(vcpu))
+	/* Finish re-injection before considering new events */
+	if (!can_inject)
 		return 0;
 
 	if (vcpu->arch.smi_pending &&
-- 
2.26.2




* [PATCH 02/30] KVM: x86: enable event window in inject_pending_event
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 01/30] KVM: x86: track manually whether an event has been injected Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 03/30] KVM: nSVM: inject exceptions via svm_check_nested_events Paolo Bonzini
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

In case an interrupt arrives after nested.check_events but before the
call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt
window even if the interrupt is actually going to be a vmexit.  This is
useless rather than harmful, but it really complicates reasoning about
SVM's handling of the VINTR intercept.  We'd like to never bother with
the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in
that case there is no interrupt window and we can just exit the nested
guest whenever we want.

This patch moves the opening of the interrupt window inside
inject_pending_event.  This consolidates the check for pending
interrupt/NMI/SMI in one place, and makes KVM's usage of immediate
exits more consistent, extending it beyond just nested virtualization.

There are two functional changes here.  They only affect corner cases,
but overall they simplify inject_pending_event.

- re-injection of still-pending events will also use req_immediate_exit
instead of interrupt-window intercepts.  This should have no impact on
performance on Intel, since it simply replaces an interrupt-window or
NMI-window exit with a preemption-timer exit.  On AMD, which has no
equivalent of the preemption timer, it may incur some overhead, but an
actual effect on performance should only be visible in pathological cases.

- kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true
if an interrupt, NMI or SMI is blocked by nested_run_pending.  This
makes sense because entering the VM will allow it to make progress
and deliver the event.
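
To make the new flow easier to follow, here is a condensed sketch of the
convention the *_allowed hooks and their caller now follow.  It is a
simplification of the x86.c hunk below (only the NMI case is shown), not
code to apply:

    /*
     * Each *_allowed hook now returns -EBUSY when injection must be
     * retried after an immediate exit (e.g. nested_run_pending), 0 when
     * the event is architecturally blocked (a window-open intercept will
     * fire later), and a positive value when it can be injected now.
     */
    if (vcpu->arch.nmi_pending) {
            r = can_inject ? kvm_x86_ops.nmi_allowed(vcpu, true) : -EBUSY;
            if (r < 0)
                    goto busy;              /* sets *req_immediate_exit */
            if (r) {                        /* inject the NMI right away */
                    --vcpu->arch.nmi_pending;
                    vcpu->arch.nmi_injected = true;
                    kvm_x86_ops.set_nmi(vcpu);
                    can_inject = false;
            }
            if (vcpu->arch.nmi_pending)     /* still pending: open the NMI window */
                    kvm_x86_ops.enable_nmi_window(vcpu);
    }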

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   8 +--
 arch/x86/kvm/svm/svm.c          |  24 +++----
 arch/x86/kvm/vmx/vmx.c          |  20 +++---
 arch/x86/kvm/x86.c              | 117 ++++++++++++++++++--------------
 4 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db261da578f3..7707bd4b0593 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1136,8 +1136,8 @@ struct kvm_x86_ops {
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
 	void (*queue_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
-	bool (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
-	bool (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
+	int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
+	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
 	void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
@@ -1234,10 +1234,10 @@ struct kvm_x86_ops {
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
 
-	bool (*smi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
+	int (*smi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
 	int (*pre_leave_smm)(struct kvm_vcpu *vcpu, const char *smstate);
-	int (*enable_smi_window)(struct kvm_vcpu *vcpu);
+	void (*enable_smi_window)(struct kvm_vcpu *vcpu);
 
 	int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
 	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d877a0f70cac..5da494847a2f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3053,15 +3053,15 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-static bool svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	if (svm->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 
 	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
-		return false;
+		return -EBUSY;
 
 	return !svm_nmi_blocked(vcpu);
 }
@@ -3112,18 +3112,18 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 	return (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK);
 }
 
-static bool svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	if (svm->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 
 	/*
 	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
 	 * e.g. if the IRQ arrived asynchronously after checking nested events.
 	 */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
-		return false;
+		return -EBUSY;
 
 	return !svm_interrupt_blocked(vcpu);
 }
@@ -3793,15 +3793,15 @@ bool svm_smi_blocked(struct kvm_vcpu *vcpu)
 	return is_smm(vcpu);
 }
 
-static bool svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	if (svm->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 
 	/* An SMI must not be injected into L2 if it's supposed to VM-Exit.  */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_smi(svm))
-		return false;
+		return -EBUSY;
 
 	return !svm_smi_blocked(vcpu);
 }
@@ -3848,7 +3848,7 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	return 0;
 }
 
-static int enable_smi_window(struct kvm_vcpu *vcpu)
+static void enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -3856,9 +3856,9 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
 		if (vgif_enabled(svm))
 			set_intercept(svm, INTERCEPT_STGI);
 		/* STGI will cause a vm exit */
-		return 1;
+	} else {
+		/* We must be in SMM; RSM will cause a vmexit anyway.  */
 	}
-	return 0;
 }
 
 static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4e76e30b661c..b3a41645e157 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4551,14 +4551,14 @@ bool vmx_nmi_blocked(struct kvm_vcpu *vcpu)
 		 GUEST_INTR_STATE_NMI));
 }
 
-static bool vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	if (to_vmx(vcpu)->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 
 	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(vcpu))
-		return false;
+		return -EBUSY;
 
 	return !vmx_nmi_blocked(vcpu);
 }
@@ -4573,17 +4573,17 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
 		(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }
 
-static bool vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	if (to_vmx(vcpu)->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 
        /*
         * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
         * e.g. if the IRQ arrived asynchronously after checking nested events.
         */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
-		return false;
+		return -EBUSY;
 
 	return !vmx_interrupt_blocked(vcpu);
 }
@@ -7757,11 +7757,11 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEAT_CTL_LMCE_ENABLED;
 }
 
-static bool vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	/* we need a nested vmexit to enter SMM, postpone if run is pending */
 	if (to_vmx(vcpu)->nested.nested_run_pending)
-		return false;
+		return -EBUSY;
 	return !is_smm(vcpu);
 }
 
@@ -7799,9 +7799,9 @@ static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	return 0;
 }
 
-static int enable_smi_window(struct kvm_vcpu *vcpu)
+static void enable_smi_window(struct kvm_vcpu *vcpu)
 {
-	return 0;
+	/* RSM will cause a vmexit anyway.  */
 }
 
 static bool vmx_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77b9b4e66673..0ee828f60d05 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7714,7 +7714,7 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 	kvm_x86_ops.update_cr8_intercept(vcpu, tpr, max_irr);
 }
 
-static int inject_pending_event(struct kvm_vcpu *vcpu)
+static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
 	int r;
 	bool can_inject = true;
@@ -7760,8 +7760,8 @@ static int inject_pending_event(struct kvm_vcpu *vcpu)
 	 */
 	if (is_guest_mode(vcpu)) {
 		r = kvm_x86_ops.nested_ops->check_events(vcpu);
-		if (r != 0)
-			return r;
+		if (r < 0)
+			goto busy;
 	}
 
 	/* try to inject new event if pending */
@@ -7799,27 +7799,69 @@ static int inject_pending_event(struct kvm_vcpu *vcpu)
 		can_inject = false;
 	}
 
-	/* Finish re-injection before considering new events */
-	if (!can_inject)
-		return 0;
+	/*
+	 * Finally, inject interrupt events.  If an event cannot be injected
+	 * due to architectural conditions (e.g. IF=0) a window-open exit
+	 * will re-request KVM_REQ_EVENT.  Sometimes however an event is pending
+	 * and can architecturally be injected, but we cannot do it right now:
+	 * an interrupt could have arrived just now and we have to inject it
+	 * as a vmexit, or there could already an event in the queue, which is
+	 * indicated by can_inject.  In that case we request an immediate exit
+	 * in order to make progress and get back here for another iteration.
+	 * The kvm_x86_ops hooks communicate this by returning -EBUSY.
+	 */
+	if (vcpu->arch.smi_pending) {
+		r = can_inject ? kvm_x86_ops.smi_allowed(vcpu, true) : -EBUSY;
+		if (r < 0)
+			goto busy;
+		if (r) {
+			vcpu->arch.smi_pending = false;
+			++vcpu->arch.smi_count;
+			enter_smm(vcpu);
+			can_inject = false;
+		} else
+			kvm_x86_ops.enable_smi_window(vcpu);
+	}
+
+	if (vcpu->arch.nmi_pending) {
+		r = can_inject ? kvm_x86_ops.nmi_allowed(vcpu, true) : -EBUSY;
+		if (r < 0)
+			goto busy;
+		if (r) {
+			--vcpu->arch.nmi_pending;
+			vcpu->arch.nmi_injected = true;
+			kvm_x86_ops.set_nmi(vcpu);
+			can_inject = false;
+			WARN_ON(kvm_x86_ops.nmi_allowed(vcpu, true) < 0);
+		}
+		if (vcpu->arch.nmi_pending)
+			kvm_x86_ops.enable_nmi_window(vcpu);
+	}
 
-	if (vcpu->arch.smi_pending &&
-	    kvm_x86_ops.smi_allowed(vcpu, true)) {
-		vcpu->arch.smi_pending = false;
-		++vcpu->arch.smi_count;
-		enter_smm(vcpu);
-	} else if (vcpu->arch.nmi_pending &&
-		   kvm_x86_ops.nmi_allowed(vcpu, true)) {
-		--vcpu->arch.nmi_pending;
-		vcpu->arch.nmi_injected = true;
-		kvm_x86_ops.set_nmi(vcpu);
-	} else if (kvm_cpu_has_injectable_intr(vcpu) &&
-		   kvm_x86_ops.interrupt_allowed(vcpu, true)) {
-		kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false);
-		kvm_x86_ops.set_irq(vcpu);
+	if (kvm_cpu_has_injectable_intr(vcpu)) {
+		r = can_inject ? kvm_x86_ops.interrupt_allowed(vcpu, true) : -EBUSY;
+		if (r < 0)
+			goto busy;
+		if (r) {
+			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false);
+			kvm_x86_ops.set_irq(vcpu);
+			WARN_ON(kvm_x86_ops.interrupt_allowed(vcpu, true) < 0);
+		}
+		if (kvm_cpu_has_injectable_intr(vcpu))
+			kvm_x86_ops.enable_irq_window(vcpu);
 	}
 
-	return 0;
+	if (is_guest_mode(vcpu) &&
+	    kvm_x86_ops.nested_ops->hv_timer_pending &&
+	    kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
+		*req_immediate_exit = true;
+
+	WARN_ON(vcpu->arch.exception.pending);
+	return;
+
+busy:
+	*req_immediate_exit = true;
+	return;
 }
 
 static void process_nmi(struct kvm_vcpu *vcpu)
@@ -8357,36 +8399,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			goto out;
 		}
 
-		if (inject_pending_event(vcpu) != 0)
-			req_immediate_exit = true;
-		else {
-			/* Enable SMI/NMI/IRQ window open exits if needed.
-			 *
-			 * SMIs have three cases:
-			 * 1) They can be nested, and then there is nothing to
-			 *    do here because RSM will cause a vmexit anyway.
-			 * 2) There is an ISA-specific reason why SMI cannot be
-			 *    injected, and the moment when this changes can be
-			 *    intercepted.
-			 * 3) Or the SMI can be pending because
-			 *    inject_pending_event has completed the injection
-			 *    of an IRQ or NMI from the previous vmexit, and
-			 *    then we request an immediate exit to inject the
-			 *    SMI.
-			 */
-			if (vcpu->arch.smi_pending && !is_smm(vcpu))
-				if (!kvm_x86_ops.enable_smi_window(vcpu))
-					req_immediate_exit = true;
-			if (vcpu->arch.nmi_pending)
-				kvm_x86_ops.enable_nmi_window(vcpu);
-			if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
-				kvm_x86_ops.enable_irq_window(vcpu);
-			if (is_guest_mode(vcpu) &&
-			    kvm_x86_ops.nested_ops->hv_timer_pending &&
-			    kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
-				req_immediate_exit = true;
-			WARN_ON(vcpu->arch.exception.pending);
-		}
+		inject_pending_event(vcpu, &req_immediate_exit);
+		if (req_int_win)
+			kvm_x86_ops.enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
 			update_cr8_intercept(vcpu);
-- 
2.26.2




* [PATCH 03/30] KVM: nSVM: inject exceptions via svm_check_nested_events
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 01/30] KVM: x86: track manually whether an event has been injected Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 02/30] KVM: x86: enable event window in inject_pending_event Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 04/30] KVM: nSVM: remove exit_required Paolo Bonzini
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

This allows exceptions injected by the emulator to be properly delivered
as vmexits.  The code also becomes simpler, because we can just let all
L0-intercepted exceptions go through the usual path.  In particular, our
emulation of the VMX #DB exit qualification is very much simplified,
because the vmexit injection path can use kvm_deliver_exception_payload
to update DR6.
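
In outline, the new delivery path looks like this; it is a simplified
sketch of the svm_check_nested_events hunk below, with unrelated events
and details omitted:

    if (vcpu->arch.exception.pending) {
            if (block_nested_events)
                    return -EBUSY;  /* retry once reinjection/VMRUN is done */
            if (!nested_exit_on_exception(svm))
                    return 0;       /* not intercepted by L1: normal L0 path */
            /* builds SVM_EXIT_EXCP_BASE + nr with error code and payload */
            nested_svm_inject_exception_vmexit(svm);
            return 0;
    }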

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm/nested.c       | 136 +++++++++++++-------------------
 arch/x86/kvm/svm/svm.c          |   9 ---
 arch/x86/kvm/svm/svm.h          |   1 +
 arch/x86/kvm/x86.c              |  13 +--
 5 files changed, 58 insertions(+), 103 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7707bd4b0593..e6f2e1a2dab6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1495,6 +1495,8 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
+void kvm_update_dr7(struct kvm_vcpu *vcpu);
+
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f39e0d578b9b..1ae13fd17028 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -111,6 +111,8 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	h = &svm->nested.hsave->control;
 	g = &svm->nested;
 
+	svm->nested.host_intercept_exceptions = h->intercept_exceptions;
+
 	c->intercept_cr = h->intercept_cr;
 	c->intercept_dr = h->intercept_dr;
 	c->intercept_exceptions = h->intercept_exceptions;
@@ -616,50 +618,6 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
 }
 
-/* DB exceptions for our internal use must not cause vmexit */
-static int nested_svm_intercept_db(struct vcpu_svm *svm)
-{
-	unsigned long dr6 = svm->vmcb->save.dr6;
-
-	/* Always catch it and pass it to userspace if debugging.  */
-	if (svm->vcpu.guest_debug &
-	    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
-		return NESTED_EXIT_HOST;
-
-	/* if we're not singlestepping, it's not ours */
-	if (!svm->nmi_singlestep)
-		goto reflected_db;
-
-	/* if it's not a singlestep exception, it's not ours */
-	if (!(dr6 & DR6_BS))
-		goto reflected_db;
-
-	/* if the guest is singlestepping, it should get the vmexit */
-	if (svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF) {
-		disable_nmi_singlestep(svm);
-		goto reflected_db;
-	}
-
-	/* it's ours, the nested hypervisor must not see this one */
-	return NESTED_EXIT_HOST;
-
-reflected_db:
-	/*
-	 * Synchronize guest DR6 here just like in kvm_deliver_exception_payload;
-	 * it will be moved into the nested VMCB by nested_svm_vmexit.  Once
-	 * exceptions will be moved to svm_check_nested_events, all this stuff
-	 * will just go away and we could just return NESTED_EXIT_HOST
-	 * unconditionally.  db_interception will queue the exception, which
-	 * will be processed by svm_check_nested_events if a nested vmexit is
-	 * required, and we will just use kvm_deliver_exception_payload to copy
-	 * the payload to DR6 before vmexit.
-	 */
-	WARN_ON(svm->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT);
-	svm->vcpu.arch.dr6 &= ~(DR_TRAP_BITS | DR6_RTM);
-	svm->vcpu.arch.dr6 |= dr6 & ~DR6_FIXED_1;
-	return NESTED_EXIT_DONE;
-}
-
 static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
 {
 	unsigned port, size, iopm_len;
@@ -710,20 +668,12 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		break;
 	}
 	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
-		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
-		if (svm->nested.intercept_exceptions & excp_bits) {
-			if (exit_code == SVM_EXIT_EXCP_BASE + DB_VECTOR)
-				vmexit = nested_svm_intercept_db(svm);
-			else if (exit_code == SVM_EXIT_EXCP_BASE + BP_VECTOR &&
-				 svm->vcpu.guest_debug & KVM_GUESTDBG_USE_SW_BP)
-				vmexit = NESTED_EXIT_HOST;
-			else
-				vmexit = NESTED_EXIT_DONE;
-		}
-		/* async page fault always cause vmexit */
-		else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
-			 svm->vcpu.arch.exception.nested_apf != 0)
-			vmexit = NESTED_EXIT_DONE;
+		/*
+		 * Host-intercepted exceptions have been checked already in
+		 * nested_svm_exit_special.  There is nothing to do here,
+		 * the vmexit is injected by svm_check_nested_events.
+		 */
+		vmexit = NESTED_EXIT_DONE;
 		break;
 	}
 	case SVM_EXIT_ERR: {
@@ -768,35 +718,45 @@ int nested_svm_check_permissions(struct vcpu_svm *svm)
 	return 0;
 }
 
-int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
-			       bool has_error_code, u32 error_code)
+static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
-	int vmexit;
+	unsigned int nr = svm->vcpu.arch.exception.nr;
 
-	if (!is_guest_mode(&svm->vcpu))
-		return 0;
+	return (svm->nested.intercept_exceptions & (1 << nr));
+}
 
-	vmexit = nested_svm_intercept(svm);
-	if (vmexit != NESTED_EXIT_DONE)
-		return 0;
+static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
+{
+	unsigned int nr = svm->vcpu.arch.exception.nr;
 
 	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
 	svm->vmcb->control.exit_code_hi = 0;
-	svm->vmcb->control.exit_info_1 = error_code;
+
+	if (svm->vcpu.arch.exception.has_error_code)
+		svm->vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
 
 	/*
 	 * EXITINFO2 is undefined for all exception intercepts other
 	 * than #PF.
 	 */
-	if (svm->vcpu.arch.exception.nested_apf)
-		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
-	else if (svm->vcpu.arch.exception.has_payload)
-		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
-	else
-		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+	if (nr == PF_VECTOR) {
+		if (svm->vcpu.arch.exception.nested_apf)
+			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
+		else if (svm->vcpu.arch.exception.has_payload)
+			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
+		else
+			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+	} else if (nr == DB_VECTOR) {
+		/* See inject_pending_event.  */
+		kvm_deliver_exception_payload(&svm->vcpu);
+		if (svm->vcpu.arch.dr7 & DR7_GD) {
+			svm->vcpu.arch.dr7 &= ~DR7_GD;
+			kvm_update_dr7(&svm->vcpu);
+		}
+	} else
+		WARN_ON(svm->vcpu.arch.exception.has_payload);
 
-	svm->nested.exit_required = true;
-	return vmexit;
+	nested_svm_vmexit(svm);
 }
 
 static void nested_svm_smi(struct vcpu_svm *svm)
@@ -835,6 +795,15 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 		kvm_event_needs_reinjection(vcpu) || svm->nested.exit_required ||
 		svm->nested.nested_run_pending;
 
+	if (vcpu->arch.exception.pending) {
+		if (block_nested_events)
+			return -EBUSY;
+		if (!nested_exit_on_exception(svm))
+			return 0;
+		nested_svm_inject_exception_vmexit(svm);
+		return 0;
+	}
+
 	if (vcpu->arch.smi_pending && !svm_smi_blocked(vcpu)) {
 		if (block_nested_events)
 			return -EBUSY;
@@ -872,18 +841,19 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 	switch (exit_code) {
 	case SVM_EXIT_INTR:
 	case SVM_EXIT_NMI:
-	case SVM_EXIT_EXCP_BASE + MC_VECTOR:
-		return NESTED_EXIT_HOST;
 	case SVM_EXIT_NPF:
-		/* For now we are always handling NPFs when using them */
-		if (npt_enabled)
+		return NESTED_EXIT_HOST;
+	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
+		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
+
+		if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
 			return NESTED_EXIT_HOST;
-		break;
-	case SVM_EXIT_EXCP_BASE + PF_VECTOR:
-		/* Trap async PF even if not shadowing */
-		if (!npt_enabled || svm->vcpu.arch.apf.host_apf_reason)
+		else if (exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR &&
+			 svm->vcpu.arch.apf.host_apf_reason)
+			/* Trap async PF even if not shadowing */
 			return NESTED_EXIT_HOST;
 		break;
+	}
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5da494847a2f..89b6fb1e0abc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -331,17 +331,8 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
 	bool has_error_code = vcpu->arch.exception.has_error_code;
-	bool reinject = vcpu->arch.exception.injected;
 	u32 error_code = vcpu->arch.exception.error_code;
 
-	/*
-	 * If we are within a nested VM we'd better #VMEXIT and let the guest
-	 * handle the exception
-	 */
-	if (!reinject &&
-	    nested_svm_check_exception(svm, nr, has_error_code, error_code))
-		return;
-
 	kvm_deliver_exception_payload(&svm->vcpu);
 
 	if (nr == BP_VECTOR && !nrips) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5cc559ab862d..8342032291fc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -86,6 +86,7 @@ struct nested_state {
 	u64 hsave_msr;
 	u64 vm_cr_msr;
 	u64 vmcb;
+	u32 host_intercept_exceptions;
 
 	/* These are the merged vectors */
 	u32 *msrpm;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ee828f60d05..f0fa610bed91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1072,7 +1072,7 @@ static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void kvm_update_dr7(struct kvm_vcpu *vcpu)
+void kvm_update_dr7(struct kvm_vcpu *vcpu)
 {
 	unsigned long dr7;
 
@@ -1085,6 +1085,7 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu)
 	if (dr7 & DR7_BP_EN_MASK)
 		vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }
+EXPORT_SYMBOL_GPL(kvm_update_dr7);
 
 static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
 {
@@ -7778,16 +7779,6 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 					     X86_EFLAGS_RF);
 
 		if (vcpu->arch.exception.nr == DB_VECTOR) {
-			/*
-			 * This code assumes that nSVM doesn't use
-			 * check_nested_events(). If it does, the
-			 * DR6/DR7 changes should happen before L1
-			 * gets a #VMEXIT for an intercepted #DB in
-			 * L2.  (Under VMX, on the other hand, the
-			 * DR6/DR7 changes should not happen in the
-			 * event of a VM-exit to L1 for an intercepted
-			 * #DB in L2.)
-			 */
 			kvm_deliver_exception_payload(vcpu);
 			if (vcpu->arch.dr7 & DR7_GD) {
 				vcpu->arch.dr7 &= ~DR7_GD;
-- 
2.26.2




* [PATCH 04/30] KVM: nSVM: remove exit_required
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (2 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 03/30] KVM: nSVM: inject exceptions via svm_check_nested_events Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 05/30] KVM: nSVM: correctly inject INIT vmexits Paolo Bonzini
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Pending events are now turned into vmexits before vmentry, in
svm_check_nested_events, rather than after a vmexit.  Therefore,
exit_required is not set anymore and we can remove it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c |  3 +--
 arch/x86/kvm/svm/svm.c    | 14 --------------
 arch/x86/kvm/svm/svm.h    |  3 ---
 3 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1ae13fd17028..3d17c62a84a3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -792,8 +792,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	bool block_nested_events =
-		kvm_event_needs_reinjection(vcpu) || svm->nested.exit_required ||
-		svm->nested.nested_run_pending;
+		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
 
 	if (vcpu->arch.exception.pending) {
 		if (block_nested_events)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 89b6fb1e0abc..545f63ebc720 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2889,13 +2889,6 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (npt_enabled)
 		vcpu->arch.cr3 = svm->vmcb->save.cr3;
 
-	if (unlikely(svm->nested.exit_required)) {
-		nested_svm_vmexit(svm);
-		svm->nested.exit_required = false;
-
-		return 1;
-	}
-
 	if (is_guest_mode(vcpu)) {
 		int vmexit;
 
@@ -3327,13 +3320,6 @@ static fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
 
-	/*
-	 * A vmexit emulation is required before the vcpu can be executed
-	 * again.
-	 */
-	if (unlikely(svm->nested.exit_required))
-		return EXIT_FASTPATH_NONE;
-
 	/*
 	 * Disable singlestep if we're injecting an interrupt/exception.
 	 * We don't want our modified rflags to be pushed on the stack where
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8342032291fc..89fab75dd4f5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,9 +95,6 @@ struct nested_state {
 	u64 vmcb_msrpm;
 	u64 vmcb_iopm;
 
-	/* A VMEXIT is required but not yet emulated */
-	bool exit_required;
-
 	/* A VMRUN has started but has not yet been performed, so
 	 * we cannot inject a nested vmexit yet.  */
 	bool nested_run_pending;
-- 
2.26.2




* [PATCH 05/30] KVM: nSVM: correctly inject INIT vmexits
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (3 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 04/30] KVM: nSVM: remove exit_required Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 06/30] KVM: SVM: always update CR3 in VMCB Paolo Bonzini
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

The usual drill at this point, except there is no code to remove because this
case was not handled at all.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3d17c62a84a3..dcac4c3510ab 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -25,6 +25,7 @@
 #include "trace.h"
 #include "mmu.h"
 #include "x86.h"
+#include "lapic.h"
 #include "svm.h"
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -788,11 +789,37 @@ static void nested_svm_intr(struct vcpu_svm *svm)
 	nested_svm_vmexit(svm);
 }
 
+static inline bool nested_exit_on_init(struct vcpu_svm *svm)
+{
+	return (svm->nested.intercept & (1ULL << INTERCEPT_INIT));
+}
+
+static void nested_svm_init(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.exit_code   = SVM_EXIT_INIT;
+	svm->vmcb->control.exit_info_1 = 0;
+	svm->vmcb->control.exit_info_2 = 0;
+
+	nested_svm_vmexit(svm);
+}
+
+
 static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	bool block_nested_events =
 		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (lapic_in_kernel(vcpu) &&
+	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
+		if (block_nested_events)
+			return -EBUSY;
+		if (!nested_exit_on_init(svm))
+			return 0;
+		nested_svm_init(svm);
+		return 0;
+	}
 
 	if (vcpu->arch.exception.pending) {
 		if (block_nested_events)
-- 
2.26.2




* [PATCH 06/30] KVM: SVM: always update CR3 in VMCB
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (4 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 05/30] KVM: nSVM: correctly inject INIT vmexits Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 17:41   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 07/30] KVM: nVMX: always update CR3 in VMCS Paolo Bonzini
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

svm_load_mmu_pgd delays the write of the guest's CR3 into the VMCB to
enter_svm_guest_mode as an optimization, but this is only correct before
the nested vmentry.  If userspace is modifying CR3 with KVM_SET_SREGS
after the VM has already been put in guest mode, the value of CR3 will
not be updated.  Remove the optimization, which almost never triggers
anyway.  This was added in commit 689f3bf21628 ("KVM: x86: unify callbacks
to load paging root", 2020-03-16) just to keep the two vendor-specific
modules closer; the next patch applies the same fix to VMX.

Fixes: 689f3bf21628 ("KVM: x86: unify callbacks to load paging root")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c |  6 +-----
 arch/x86/kvm/svm/svm.c    | 16 +++++-----------
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dcac4c3510ab..8756c9f463fd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -256,11 +256,7 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
 	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
 	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
-	if (npt_enabled) {
-		svm->vmcb->save.cr3 = nested_vmcb->save.cr3;
-		svm->vcpu.arch.cr3 = nested_vmcb->save.cr3;
-	} else
-		(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
+	(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
 
 	/* Guest paging mode is active - reset mmu */
 	kvm_mmu_reset_context(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 545f63ebc720..feb96a410f2d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3447,7 +3447,6 @@ static fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	bool update_guest_cr3 = true;
 	unsigned long cr3;
 
 	cr3 = __sme_set(root);
@@ -3456,18 +3455,13 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root)
 		mark_dirty(svm->vmcb, VMCB_NPT);
 
 		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
-		if (is_guest_mode(vcpu))
-			update_guest_cr3 = false;
-		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
-			cr3 = vcpu->arch.cr3;
-		else /* CR3 is already up-to-date.  */
-			update_guest_cr3 = false;
+		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+			return;
+		cr3 = vcpu->arch.cr3;
 	}
 
-	if (update_guest_cr3) {
-		svm->vmcb->save.cr3 = cr3;
-		mark_dirty(svm->vmcb, VMCB_CR);
-	}
+	svm->vmcb->save.cr3 = cr3;
+	mark_dirty(svm->vmcb, VMCB_CR);
 }
 
 static int is_disabled(void)
-- 
2.26.2




* [PATCH 07/30] KVM: nVMX: always update CR3 in VMCS
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (5 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 06/30] KVM: SVM: always update CR3 in VMCB Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode Paolo Bonzini
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

vmx_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as
an optimization, but this is only correct before the nested vmentry.
If userspace is modifying CR3 with KVM_SET_SREGS after the VM has
already been put in guest mode, the value of CR3 will not be updated.
Remove the optimization, which almost never triggers anyway.

Fixes: 04f11ef45810 ("KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3a41645e157..7b55dc6230a9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3085,10 +3085,7 @@ void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd)
 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
 		}
 
-		/* Loading vmcs02.GUEST_CR3 is handled by nested VM-Enter. */
-		if (is_guest_mode(vcpu))
-			update_guest_cr3 = false;
-		else if (!enable_unrestricted_guest && !is_paging(vcpu))
+		if (!enable_unrestricted_guest && !is_paging(vcpu))
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
 		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
 			guest_cr3 = vcpu->arch.cr3;
-- 
2.26.2




* [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (6 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 07/30] KVM: nVMX: always update CR3 in VMCS Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 18:10   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 09/30] KVM: nSVM: extract load_nested_vmcb_control Paolo Bonzini
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart,
since the map is not used elsewhere in the function.  There are just
two callers, so move the unmap there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 14 ++++++--------
 arch/x86/kvm/svm/svm.c    |  3 ++-
 arch/x86/kvm/svm/svm.h    |  2 +-
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8756c9f463fd..8e98d5e420a5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -229,7 +229,7 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 }
 
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
-			  struct vmcb *nested_vmcb, struct kvm_host_map *map)
+			  struct vmcb *nested_vmcb)
 {
 	bool evaluate_pending_interrupts =
 		is_intercept(svm, INTERCEPT_VINTR) ||
@@ -304,8 +304,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm->vmcb->control.pause_filter_thresh =
 		nested_vmcb->control.pause_filter_thresh;
 
-	kvm_vcpu_unmap(&svm->vcpu, map, true);
-
 	/* Enter Guest-Mode */
 	enter_guest_mode(&svm->vcpu);
 
@@ -368,10 +366,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 		nested_vmcb->control.exit_code_hi = 0;
 		nested_vmcb->control.exit_info_1  = 0;
 		nested_vmcb->control.exit_info_2  = 0;
-
-		kvm_vcpu_unmap(&svm->vcpu, &map, true);
-
-		return ret;
+		goto out;
 	}
 
 	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
@@ -414,7 +409,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	copy_vmcb_control_area(hsave, vmcb);
 
 	svm->nested.nested_run_pending = 1;
-	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, &map);
+	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb);
 
 	if (!nested_svm_vmrun_msrpm(svm)) {
 		svm->vmcb->control.exit_code    = SVM_EXIT_ERR;
@@ -425,6 +420,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 		nested_svm_vmexit(svm);
 	}
 
+out:
+	kvm_vcpu_unmap(&svm->vcpu, &map, true);
+
 	return ret;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index feb96a410f2d..76b3f553815e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3814,7 +3814,8 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 		if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb), &map) == -EINVAL)
 			return 1;
 		nested_vmcb = map.hva;
-		enter_svm_guest_mode(svm, vmcb, nested_vmcb, &map);
+		enter_svm_guest_mode(svm, vmcb, nested_vmcb);
+		kvm_vcpu_unmap(&svm->vcpu, &map, true);
 	}
 	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 89fab75dd4f5..33e3f09d7a8e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -395,7 +395,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 }
 
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
-			  struct vmcb *nested_vmcb, struct kvm_host_map *map);
+			  struct vmcb *nested_vmcb);
 int nested_svm_vmrun(struct vcpu_svm *svm);
 void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
 int nested_svm_vmexit(struct vcpu_svm *svm);
-- 
2.26.2




* [PATCH 09/30] KVM: nSVM: extract load_nested_vmcb_control
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (7 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run Paolo Bonzini
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

When restoring SVM nested state, the control state cache in svm->nested
will have to be filled, but the save state will not have to be moved
into svm->vmcb.  Therefore, pull the code that handles the control area
into a separate function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8e98d5e420a5..fc0c6d1678eb 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -228,6 +228,23 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 	return true;
 }
 
+static void load_nested_vmcb_control(struct vcpu_svm *svm,
+				     struct vmcb_control_area *control)
+{
+	svm->nested.nested_cr3 = control->nested_cr3;
+
+	svm->nested.vmcb_msrpm = control->msrpm_base_pa & ~0x0fffULL;
+	svm->nested.vmcb_iopm  = control->iopm_base_pa  & ~0x0fffULL;
+
+	/* cache intercepts */
+	svm->nested.intercept_cr         = control->intercept_cr;
+	svm->nested.intercept_dr         = control->intercept_dr;
+	svm->nested.intercept_exceptions = control->intercept_exceptions;
+	svm->nested.intercept            = control->intercept;
+
+	svm->vcpu.arch.tsc_offset += control->tsc_offset;
+}
+
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			  struct vmcb *nested_vmcb)
 {
@@ -235,15 +252,16 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 		is_intercept(svm, INTERCEPT_VINTR) ||
 		is_intercept(svm, INTERCEPT_IRET);
 
+	svm->nested.vmcb = vmcb_gpa;
 	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
 		svm->vcpu.arch.hflags |= HF_HIF_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
 
-	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
-		svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
+	load_nested_vmcb_control(svm, &nested_vmcb->control);
+
+	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
 		nested_svm_init_mmu_context(&svm->vcpu);
-	}
 
 	/* Load the nested guest state */
 	svm->vmcb->save.es = nested_vmcb->save.es;
@@ -274,25 +292,15 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm->vcpu.arch.dr6  = nested_vmcb->save.dr6;
 	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
 
-	svm->nested.vmcb_msrpm = nested_vmcb->control.msrpm_base_pa & ~0x0fffULL;
-	svm->nested.vmcb_iopm  = nested_vmcb->control.iopm_base_pa  & ~0x0fffULL;
-
-	/* cache intercepts */
-	svm->nested.intercept_cr         = nested_vmcb->control.intercept_cr;
-	svm->nested.intercept_dr         = nested_vmcb->control.intercept_dr;
-	svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
-	svm->nested.intercept            = nested_vmcb->control.intercept;
-
 	svm_flush_tlb(&svm->vcpu);
-	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
 
-	svm->vcpu.arch.tsc_offset += nested_vmcb->control.tsc_offset;
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset;
 
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	svm->vmcb->control.virt_ext = nested_vmcb->control.virt_ext;
 	svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
 	svm->vmcb->control.int_state = nested_vmcb->control.int_state;
@@ -313,8 +321,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	 */
 	recalc_intercepts(svm);
 
-	svm->nested.vmcb = vmcb_gpa;
-
 	/*
 	 * If L1 had a pending IRQ/NMI before executing VMRUN,
 	 * which wasn't delivered because it was disallowed (e.g.
-- 
2.26.2




* [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (8 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 09/30] KVM: nSVM: extract load_nested_vmcb_control Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 18:27   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 11/30] KVM: nSVM: move MMU setup to nested_prepare_vmcb_control Paolo Bonzini
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Split out filling svm->vmcb.save and svm->vmcb.control before VMRUN.
Only the latter will be useful when restoring nested SVM state.

This patch introduces no semantic change, so the MMU setup is still
done in nested_prepare_vmcb_save.  The next patch will clean up things.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 40 +++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fc0c6d1678eb..73be7af79453 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -245,21 +245,8 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
 	svm->vcpu.arch.tsc_offset += control->tsc_offset;
 }
 
-void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
-			  struct vmcb *nested_vmcb)
+static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
-	bool evaluate_pending_interrupts =
-		is_intercept(svm, INTERCEPT_VINTR) ||
-		is_intercept(svm, INTERCEPT_IRET);
-
-	svm->nested.vmcb = vmcb_gpa;
-	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
-		svm->vcpu.arch.hflags |= HF_HIF_MASK;
-	else
-		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
-
-	load_nested_vmcb_control(svm, &nested_vmcb->control);
-
 	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
 		nested_svm_init_mmu_context(&svm->vcpu);
 
@@ -291,7 +278,10 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
 	svm->vcpu.arch.dr6  = nested_vmcb->save.dr6;
 	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
+}
 
+static void nested_prepare_vmcb_control(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
+{
 	svm_flush_tlb(&svm->vcpu);
 	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -321,6 +311,26 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	 */
 	recalc_intercepts(svm);
 
+	mark_all_dirty(svm->vmcb);
+}
+
+void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
+			  struct vmcb *nested_vmcb)
+{
+	bool evaluate_pending_interrupts =
+		is_intercept(svm, INTERCEPT_VINTR) ||
+		is_intercept(svm, INTERCEPT_IRET);
+
+	svm->nested.vmcb = vmcb_gpa;
+	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
+		svm->vcpu.arch.hflags |= HF_HIF_MASK;
+	else
+		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
+
+	load_nested_vmcb_control(svm, &nested_vmcb->control);
+	nested_prepare_vmcb_save(svm, nested_vmcb);
+	nested_prepare_vmcb_control(svm, nested_vmcb);
+
 	/*
 	 * If L1 had a pending IRQ/NMI before executing VMRUN,
 	 * which wasn't delivered because it was disallowed (e.g.
@@ -336,8 +346,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	enable_gif(svm);
 	if (unlikely(evaluate_pending_interrupts))
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
-
-	mark_all_dirty(svm->vmcb);
 }
 
 int nested_svm_vmrun(struct vcpu_svm *svm)
-- 
2.26.2




* [PATCH 11/30] KVM: nSVM: move MMU setup to nested_prepare_vmcb_control
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (9 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 12/30] KVM: nSVM: clean up tsc_offset update Paolo Bonzini
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Everything that is needed during nested state restore is now part of
nested_prepare_vmcb_control.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 73be7af79453..a85cc7376a82 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -247,9 +247,6 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
 
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
-	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
-		nested_svm_init_mmu_context(&svm->vcpu);
-
 	/* Load the nested guest state */
 	svm->vmcb->save.es = nested_vmcb->save.es;
 	svm->vmcb->save.cs = nested_vmcb->save.cs;
@@ -263,9 +260,6 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
 	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
 	(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
 
-	/* Guest paging mode is active - reset mmu */
-	kvm_mmu_reset_context(&svm->vcpu);
-
 	svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = nested_vmcb->save.cr2;
 	kvm_rax_write(&svm->vcpu, nested_vmcb->save.rax);
 	kvm_rsp_write(&svm->vcpu, nested_vmcb->save.rsp);
@@ -282,6 +276,12 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
 
 static void nested_prepare_vmcb_control(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
+	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
+		nested_svm_init_mmu_context(&svm->vcpu);
+
+	/* Guest paging mode is active - reset mmu */
+	kvm_mmu_reset_context(&svm->vcpu);
+
 	svm_flush_tlb(&svm->vcpu);
 	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
-- 
2.26.2




* [PATCH 12/30] KVM: nSVM: clean up tsc_offset update
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (10 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 11/30] KVM: nSVM: move MMU setup to nested_prepare_vmcb_control Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 13/30] KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area Paolo Bonzini
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Use l1_tsc_offset to compute svm->vcpu.arch.tsc_offset and
svm->vmcb->control.tsc_offset, instead of relying on hsave.
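
Just to illustrate the arithmetic the patch implements (a stand-alone sketch
with made-up names, not the real KVM fields):

    #include <stdint.h>

    static uint64_t tsc_offset_for_l2(uint64_t l1_tsc_offset,
                                      uint64_t l2_control_tsc_offset)
    {
            /* On VMRUN: L2 runs with L1's offset plus the offset L1 requested. */
            return l1_tsc_offset + l2_control_tsc_offset;
    }

    static uint64_t tsc_offset_for_l1(uint64_t l1_tsc_offset)
    {
            /* On #VMEXIT: simply go back to L1's own offset, no hsave lookup. */
            return l1_tsc_offset;
    }

In both directions svm->vmcb->control.tsc_offset just mirrors the new
vcpu offset.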

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a85cc7376a82..5ca403a69148 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -241,8 +241,6 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
 	svm->nested.intercept_dr         = control->intercept_dr;
 	svm->nested.intercept_exceptions = control->intercept_exceptions;
 	svm->nested.intercept            = control->intercept;
-
-	svm->vcpu.arch.tsc_offset += control->tsc_offset;
 }
 
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
@@ -288,7 +286,8 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm, struct vmcb *neste
 	else
 		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
 
-	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset;
+	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
+		svm->vcpu.arch.l1_tsc_offset + nested_vmcb->control.tsc_offset;
 
 	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	svm->vmcb->control.virt_ext = nested_vmcb->control.virt_ext;
@@ -553,7 +552,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	/* Restore the original control entries */
 	copy_vmcb_control_area(vmcb, hsave);
 
-	svm->vcpu.arch.tsc_offset = svm->vmcb->control.tsc_offset;
+	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
+		svm->vcpu.arch.l1_tsc_offset;
+
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/30] KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (11 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 12/30] KVM: nSVM: clean up tsc_offset update Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 14/30] KVM: nSVM: remove trailing padding for struct vmcb_control_area Paolo Bonzini
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

This will come in handy when we put a struct vmcb_control_area in
svm->nested.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5ca403a69148..fd9742c1a860 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -141,11 +141,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	c->intercept |= g->intercept;
 }
 
-static void copy_vmcb_control_area(struct vmcb *dst_vmcb, struct vmcb *from_vmcb)
+static void copy_vmcb_control_area(struct vmcb_control_area *dst,
+				   struct vmcb_control_area *from)
 {
-	struct vmcb_control_area *dst  = &dst_vmcb->control;
-	struct vmcb_control_area *from = &from_vmcb->control;
-
 	dst->intercept_cr         = from->intercept_cr;
 	dst->intercept_dr         = from->intercept_dr;
 	dst->intercept_exceptions = from->intercept_exceptions;
@@ -419,7 +417,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	else
 		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
 
-	copy_vmcb_control_area(hsave, vmcb);
+	copy_vmcb_control_area(&hsave->control, &vmcb->control);
 
 	svm->nested.nested_run_pending = 1;
 	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb);
@@ -550,7 +548,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
 
 	/* Restore the original control entries */
-	copy_vmcb_control_area(vmcb, hsave);
+	copy_vmcb_control_area(&vmcb->control, &hsave->control);
 
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
 		svm->vcpu.arch.l1_tsc_offset;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 14/30] KVM: nSVM: remove trailing padding for struct vmcb_control_area
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (12 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 13/30] KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 15/30] KVM: nSVM: save all control fields in svm->nested Paolo Bonzini
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Allow placing the VMCB structs on the stack or in other structs without
wasting too much space.  Add BUILD_BUG_ON as a quick safeguard against typos.
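
Schematically, the layout trick looks like this in stand-alone C (sizes as
quoted in the hunks below; the struct names here are placeholders only):

    #include <stddef.h>
    #include <stdint.h>

    struct ctl  { uint8_t bytes[256];   } __attribute__((__packed__));
    struct save { uint8_t bytes[0x298]; } __attribute__((__packed__));

    struct vmcb_like {
            struct ctl control;
            /* Pad back out to 1K here, instead of inside the control area. */
            uint8_t reserved_control[1024 - sizeof(struct ctl)];
            struct save save;
    } __attribute__((__packed__));

    /* The BUILD_BUG_ON equivalents: catch typos in the field lists early. */
    _Static_assert(sizeof(struct ctl) == 256, "control area size");
    _Static_assert(offsetof(struct vmcb_like, save) == 1024, "save area offset");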

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/svm.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6ece8561ba66..8a1f5382a4ea 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -96,7 +96,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u8 reserved_6[8];	/* Offset 0xe8 */
 	u64 avic_logical_id;	/* Offset 0xf0 */
 	u64 avic_physical_id;	/* Offset 0xf8 */
-	u8 reserved_7[768];
 };
 
 
@@ -203,8 +202,16 @@ struct __attribute__ ((__packed__)) vmcb_save_area {
 	u64 last_excp_to;
 };
 
+
+static inline void __unused_size_checks(void)
+{
+	BUILD_BUG_ON(sizeof(struct vmcb_save_area) != 0x298);
+	BUILD_BUG_ON(sizeof(struct vmcb_control_area) != 256);
+}
+
 struct __attribute__ ((__packed__)) vmcb {
 	struct vmcb_control_area control;
+	u8 reserved_control[1024 - sizeof(struct vmcb_control_area)];
 	struct vmcb_save_area save;
 };
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 15/30] KVM: nSVM: save all control fields in svm->nested
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (13 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 14/30] KVM: nSVM: remove trailing padding for struct vmcb_control_area Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 16/30] KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR Paolo Bonzini
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

In preparation for nested SVM save/restore, store all data that matters
from the VMCB control area into svm->nested.  It will then become part
of the nested SVM state that is saved by KVM_GET_NESTED_STATE and
restored by KVM_SET_NESTED_STATE, just like the cached vmcs12 for nVMX.
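
The resulting pattern, reduced to a stand-alone sketch (names are made up;
the real fields are visible in the diff below):

    #include <stdbool.h>
    #include <stdint.h>

    struct ctl_cache {
            uint64_t intercept;
            uint64_t msrpm_base_pa;
            /* ...plus every other control field that has to be migratable. */
    };

    struct nested_sketch {
            struct ctl_cache ctl;
    };

    /* On VMRUN: snapshot the guest's control area once... */
    static void cache_controls(struct nested_sketch *n,
                               const struct ctl_cache *guest_control)
    {
            n->ctl = *guest_control;
            n->ctl.msrpm_base_pa &= ~0x0fffULL;   /* same canonicalization */
    }

    /* ...and make every later intercept check read only the cache. */
    static bool exit_on(const struct nested_sketch *n, int intercept_bit)
    {
            return n->ctl.intercept & (1ULL << intercept_bit);
    }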

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 73 +++++++++++++++++----------------------
 arch/x86/kvm/svm/svm.c    |  4 +--
 arch/x86/kvm/svm/svm.h    | 20 +++--------
 3 files changed, 39 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fd9742c1a860..1e5f460b5540 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -60,7 +60,7 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	u64 cr3 = svm->nested.nested_cr3;
+	u64 cr3 = svm->nested.ctl.nested_cr3;
 	u64 pdpte;
 	int ret;
 
@@ -75,7 +75,7 @@ static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	return svm->nested.nested_cr3;
+	return svm->nested.ctl.nested_cr3;
 }
 
 static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
@@ -100,8 +100,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 
 void recalc_intercepts(struct vcpu_svm *svm)
 {
-	struct vmcb_control_area *c, *h;
-	struct nested_state *g;
+	struct vmcb_control_area *c, *h, *g;
 
 	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 
@@ -110,7 +109,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
 	c = &svm->vmcb->control;
 	h = &svm->nested.hsave->control;
-	g = &svm->nested;
+	g = &svm->nested.ctl;
 
 	svm->nested.host_intercept_exceptions = h->intercept_exceptions;
 
@@ -180,7 +179,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 	 */
 	int i;
 
-	if (!(svm->nested.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
 		return true;
 
 	for (i = 0; i < MSRPM_OFFSETS; i++) {
@@ -191,7 +190,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 			break;
 
 		p      = msrpm_offsets[i];
-		offset = svm->nested.vmcb_msrpm + (p * 4);
+		offset = svm->nested.ctl.msrpm_base_pa + (p * 4);
 
 		if (kvm_vcpu_read_guest(&svm->vcpu, offset, &value, 4))
 			return false;
@@ -229,16 +228,10 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
 static void load_nested_vmcb_control(struct vcpu_svm *svm,
 				     struct vmcb_control_area *control)
 {
-	svm->nested.nested_cr3 = control->nested_cr3;
+	copy_vmcb_control_area(&svm->nested.ctl, control);
 
-	svm->nested.vmcb_msrpm = control->msrpm_base_pa & ~0x0fffULL;
-	svm->nested.vmcb_iopm  = control->iopm_base_pa  & ~0x0fffULL;
-
-	/* cache intercepts */
-	svm->nested.intercept_cr         = control->intercept_cr;
-	svm->nested.intercept_dr         = control->intercept_dr;
-	svm->nested.intercept_exceptions = control->intercept_exceptions;
-	svm->nested.intercept            = control->intercept;
+	svm->nested.ctl.msrpm_base_pa &= ~0x0fffULL;
+	svm->nested.ctl.iopm_base_pa  &= ~0x0fffULL;
 }
 
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
@@ -270,34 +263,32 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
 	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
 }
 
-static void nested_prepare_vmcb_control(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
+static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 {
-	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
+	if (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
 		nested_svm_init_mmu_context(&svm->vcpu);
 
 	/* Guest paging mode is active - reset mmu */
 	kvm_mmu_reset_context(&svm->vcpu);
 
 	svm_flush_tlb(&svm->vcpu);
-	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
+	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
 
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
-		svm->vcpu.arch.l1_tsc_offset + nested_vmcb->control.tsc_offset;
+		svm->vcpu.arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
 
-	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
-	svm->vmcb->control.virt_ext = nested_vmcb->control.virt_ext;
-	svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
-	svm->vmcb->control.int_state = nested_vmcb->control.int_state;
-	svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
-	svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
+	svm->vmcb->control.int_ctl             = svm->nested.ctl.int_ctl | V_INTR_MASKING_MASK;
+	svm->vmcb->control.virt_ext            = svm->nested.ctl.virt_ext;
+	svm->vmcb->control.int_vector          = svm->nested.ctl.int_vector;
+	svm->vmcb->control.int_state           = svm->nested.ctl.int_state;
+	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
+	svm->vmcb->control.event_inj_err       = svm->nested.ctl.event_inj_err;
 
-	svm->vmcb->control.pause_filter_count =
-		nested_vmcb->control.pause_filter_count;
-	svm->vmcb->control.pause_filter_thresh =
-		nested_vmcb->control.pause_filter_thresh;
+	svm->vmcb->control.pause_filter_count  = svm->nested.ctl.pause_filter_count;
+	svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
 
 	/* Enter Guest-Mode */
 	enter_guest_mode(&svm->vcpu);
@@ -326,7 +317,7 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 
 	load_nested_vmcb_control(svm, &nested_vmcb->control);
 	nested_prepare_vmcb_save(svm, nested_vmcb);
-	nested_prepare_vmcb_control(svm, nested_vmcb);
+	nested_prepare_vmcb_control(svm);
 
 	/*
 	 * If L1 had a pending IRQ/NMI before executing VMRUN,
@@ -556,7 +547,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
 
-	svm->nested.nested_cr3 = 0;
+	svm->nested.ctl.nested_cr3 = 0;
 
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
@@ -606,7 +597,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	u32 offset, msr, value;
 	int write, mask;
 
-	if (!(svm->nested.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
 
 	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
@@ -620,7 +611,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	/* Offset is in 32 bit units but need in 8 bit units */
 	offset *= 4;
 
-	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.vmcb_msrpm + offset, &value, 4))
+	if (kvm_vcpu_read_guest(&svm->vcpu, svm->nested.ctl.msrpm_base_pa + offset, &value, 4))
 		return NESTED_EXIT_DONE;
 
 	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
@@ -633,13 +624,13 @@ static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
 	u8 start_bit;
 	u64 gpa;
 
-	if (!(svm->nested.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
+	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
 		return NESTED_EXIT_HOST;
 
 	port = svm->vmcb->control.exit_info_1 >> 16;
 	size = (svm->vmcb->control.exit_info_1 & SVM_IOIO_SIZE_MASK) >>
 		SVM_IOIO_SIZE_SHIFT;
-	gpa  = svm->nested.vmcb_iopm + (port / 8);
+	gpa  = svm->nested.ctl.iopm_base_pa + (port / 8);
 	start_bit = port % 8;
 	iopm_len = (start_bit + size > 8) ? 2 : 1;
 	mask = (0xf >> (4 - size)) << start_bit;
@@ -665,13 +656,13 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		break;
 	case SVM_EXIT_READ_CR0 ... SVM_EXIT_WRITE_CR8: {
 		u32 bit = 1U << (exit_code - SVM_EXIT_READ_CR0);
-		if (svm->nested.intercept_cr & bit)
+		if (svm->nested.ctl.intercept_cr & bit)
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
 	case SVM_EXIT_READ_DR0 ... SVM_EXIT_WRITE_DR7: {
 		u32 bit = 1U << (exit_code - SVM_EXIT_READ_DR0);
-		if (svm->nested.intercept_dr & bit)
+		if (svm->nested.ctl.intercept_dr & bit)
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
@@ -690,7 +681,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 	}
 	default: {
 		u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
-		if (svm->nested.intercept & exit_bits)
+		if (svm->nested.ctl.intercept & exit_bits)
 			vmexit = NESTED_EXIT_DONE;
 	}
 	}
@@ -730,7 +721,7 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
 	unsigned int nr = svm->vcpu.arch.exception.nr;
 
-	return (svm->nested.intercept_exceptions & (1 << nr));
+	return (svm->nested.ctl.intercept_exceptions & (1 << nr));
 }
 
 static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
@@ -798,7 +789,7 @@ static void nested_svm_intr(struct vcpu_svm *svm)
 
 static inline bool nested_exit_on_init(struct vcpu_svm *svm)
 {
-	return (svm->nested.intercept & (1ULL << INTERCEPT_INIT));
+	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_INIT));
 }
 
 static void nested_svm_init(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 76b3f553815e..ec98c5979656 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2173,7 +2173,7 @@ static bool check_selective_cr0_intercepted(struct vcpu_svm *svm,
 	bool ret = false;
 	u64 intercept;
 
-	intercept = svm->nested.intercept;
+	intercept = svm->nested.ctl.intercept;
 
 	if (!is_guest_mode(&svm->vcpu) ||
 	    (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0))))
@@ -3649,7 +3649,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 		    info->intercept == x86_intercept_clts)
 			break;
 
-		intercept = svm->nested.intercept;
+		intercept = svm->nested.ctl.intercept;
 
 		if (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0)))
 			break;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 33e3f09d7a8e..dd5418f20256 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -91,22 +91,12 @@ struct nested_state {
 	/* These are the merged vectors */
 	u32 *msrpm;
 
-	/* gpa pointers to the real vectors */
-	u64 vmcb_msrpm;
-	u64 vmcb_iopm;
-
 	/* A VMRUN has started but has not yet been performed, so
 	 * we cannot inject a nested vmexit yet.  */
 	bool nested_run_pending;
 
-	/* cache for intercepts of the guest */
-	u32 intercept_cr;
-	u32 intercept_dr;
-	u32 intercept_exceptions;
-	u64 intercept;
-
-	/* Nested Paging related state */
-	u64 nested_cr3;
+	/* cache for control fields of the guest */
+	struct vmcb_control_area ctl;
 };
 
 struct vcpu_svm {
@@ -381,17 +371,17 @@ static inline bool svm_nested_virtualize_tpr(struct kvm_vcpu *vcpu)
 
 static inline bool nested_exit_on_smi(struct vcpu_svm *svm)
 {
-	return (svm->nested.intercept & (1ULL << INTERCEPT_SMI));
+	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_SMI));
 }
 
 static inline bool nested_exit_on_intr(struct vcpu_svm *svm)
 {
-	return (svm->nested.intercept & (1ULL << INTERCEPT_INTR));
+	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_INTR));
 }
 
 static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 {
-	return (svm->nested.intercept & (1ULL << INTERCEPT_NMI));
+	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_NMI));
 }
 
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 16/30] KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (14 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 15/30] KVM: nSVM: save all control fields in svm->nested Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit Paolo Bonzini
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Restore the INT_CTL value from the guest's VMCB once we've stopped using
it, so that virtual interrupts can be injected as requested by L1.
V_TPR, however, is already up to date, and it can change whenever the guest
writes to CR8, so keep it.
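
The bit surgery in svm_clear_vintr then boils down to this (stand-alone
sketch; the mask values mirror asm/svm.h and are repeated here purely for
illustration):

    #include <stdint.h>

    #define V_TPR_MASK              0x0fu
    #define V_GIF_MASK              (1u << 9)
    #define V_INTR_MASKING_MASK     (1u << 24)
    #define V_GIF_ENABLE_MASK       (1u << 25)

    static uint32_t clear_vintr_int_ctl(uint32_t live_int_ctl,
                                        uint32_t l1_int_ctl, int guest_mode)
    {
            const uint32_t keep = V_TPR_MASK | V_GIF_ENABLE_MASK |
                                  V_GIF_MASK | V_INTR_MASKING_MASK;

            /* Drop everything that VINTR injection had clobbered (V_IRQ etc.). */
            uint32_t int_ctl = live_int_ctl & keep;

            /* While in L2, put back whatever L1 had requested in those bits. */
            if (guest_mode)
                    int_ctl |= l1_int_ctl & ~keep;

            return int_ctl;
    }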

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ec98c5979656..4122ba86bac2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1365,9 +1365,17 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 
 static void svm_clear_vintr(struct vcpu_svm *svm)
 {
+	const u32 mask = V_TPR_MASK | V_GIF_ENABLE_MASK | V_GIF_MASK | V_INTR_MASKING_MASK;
 	clr_intercept(svm, INTERCEPT_VINTR);
 
-	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+	/* Drop int_ctl fields related to VINTR injection.  */
+	svm->vmcb->control.int_ctl &= mask;
+	if (is_guest_mode(&svm->vcpu)) {
+		WARN_ON((svm->vmcb->control.int_ctl & V_TPR_MASK) !=
+			(svm->nested.ctl.int_ctl & V_TPR_MASK));
+		svm->vmcb->control.int_ctl |= svm->nested.ctl.int_ctl & ~mask;
+	}
+
 	mark_dirty(svm->vmcb, VMCB_INTR);
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (15 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 16/30] KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-30  2:06   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 18/30] KVM: nSVM: remove unnecessary if Paolo Bonzini
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

The control state changes on every L2->L0 vmexit, and we will have to
serialize it in the nested state.  So keep it up to date in svm->nested.ctl
and just copy it back into the nested VMCB in nested_svm_vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 57 ++++++++++++++++++++++-----------------
 arch/x86/kvm/svm/svm.c    |  5 +++-
 arch/x86/kvm/svm/svm.h    |  1 +
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1e5f460b5540..921466eba556 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -234,6 +234,34 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
 	svm->nested.ctl.iopm_base_pa  &= ~0x0fffULL;
 }
 
+/*
+ * Synchronize fields that are written by the processor, so that
+ * they can be copied back into the nested_vmcb.
+ */
+void sync_nested_vmcb_control(struct vcpu_svm *svm)
+{
+	u32 mask;
+	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
+	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
+
+	/* Only a few fields of int_ctl are written by the processor.  */
+	mask = V_IRQ_MASK | V_TPR_MASK;
+	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
+	    is_intercept(svm, SVM_EXIT_VINTR)) {
+		/*
+		 * In order to request an interrupt window, L0 is usurping
+		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
+		 * even if it was clear in L1's VMCB.  Restoring it would be
+		 * wrong.  However, in this case V_IRQ will remain true until
+		 * interrupt_window_interception calls svm_clear_vintr and
+		 * restores int_ctl.  We can just leave it aside.
+		 */
+		mask &= ~V_IRQ_MASK;
+	}
+	svm->nested.ctl.int_ctl        &= ~mask;
+	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
+}
+
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
 	/* Load the nested guest state */
@@ -471,6 +499,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	/* Exit Guest-Mode */
 	leave_guest_mode(&svm->vcpu);
 	svm->nested.vmcb = 0;
+	WARN_ON_ONCE(svm->nested.nested_run_pending);
 
 	/* in case we halted in L2 */
 	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
@@ -497,8 +526,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	nested_vmcb->save.dr6    = svm->vcpu.arch.dr6;
 	nested_vmcb->save.cpl    = vmcb->save.cpl;
 
-	nested_vmcb->control.int_ctl           = vmcb->control.int_ctl;
-	nested_vmcb->control.int_vector        = vmcb->control.int_vector;
 	nested_vmcb->control.int_state         = vmcb->control.int_state;
 	nested_vmcb->control.exit_code         = vmcb->control.exit_code;
 	nested_vmcb->control.exit_code_hi      = vmcb->control.exit_code_hi;
@@ -510,34 +537,16 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (svm->nrips_enabled)
 		nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
-	/*
-	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
-	 * to make sure that we do not lose injected events. So check event_inj
-	 * here and copy it to exit_int_info if it is valid.
-	 * Exit_int_info and event_inj can't be both valid because the case
-	 * below only happens on a VMRUN instruction intercept which has
-	 * no valid exit_int_info set.
-	 */
-	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
-		struct vmcb_control_area *nc = &nested_vmcb->control;
-
-		nc->exit_int_info     = vmcb->control.event_inj;
-		nc->exit_int_info_err = vmcb->control.event_inj_err;
-	}
-
-	nested_vmcb->control.tlb_ctl           = 0;
-	nested_vmcb->control.event_inj         = 0;
-	nested_vmcb->control.event_inj_err     = 0;
+	nested_vmcb->control.int_ctl           = svm->nested.ctl.int_ctl;
+	nested_vmcb->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
+	nested_vmcb->control.event_inj         = svm->nested.ctl.event_inj;
+	nested_vmcb->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
 	nested_vmcb->control.pause_filter_count =
 		svm->vmcb->control.pause_filter_count;
 	nested_vmcb->control.pause_filter_thresh =
 		svm->vmcb->control.pause_filter_thresh;
 
-	/* We always set V_INTR_MASKING and remember the old value in hflags */
-	if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
-		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
-
 	/* Restore the original control entries */
 	copy_vmcb_control_area(&vmcb->control, &hsave->control);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4122ba86bac2..b710e62ace16 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3427,7 +3427,10 @@ static fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	sync_cr8_to_lapic(vcpu);
 
 	svm->next_rip = 0;
-	svm->nested.nested_run_pending = 0;
+	if (is_guest_mode(&svm->vcpu)) {
+		sync_nested_vmcb_control(svm);
+		svm->nested.nested_run_pending = 0;
+	}
 
 	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index dd5418f20256..7e79f0af1204 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -394,6 +394,7 @@ int nested_svm_check_permissions(struct vcpu_svm *svm);
 int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 			       bool has_error_code, u32 error_code);
 int nested_svm_exit_special(struct vcpu_svm *svm);
+void sync_nested_vmcb_control(struct vcpu_svm *svm);
 
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 18/30] KVM: nSVM: remove unnecessary if
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (16 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 19/30] KVM: nSVM: extract svm_set_gif Paolo Bonzini
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

kvm_vcpu_apicv_active must be false when nested virtualization is enabled,
so there is no need to check it in clgi_interception.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b710e62ace16..7383f821eb3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2015,8 +2015,7 @@ static int clgi_interception(struct vcpu_svm *svm)
 	disable_gif(svm);
 
 	/* After a CLGI no interrupts should come */
-	if (!kvm_vcpu_apicv_active(&svm->vcpu))
-		svm_clear_vintr(svm);
+	svm_clear_vintr(svm);
 
 	return ret;
 }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 19/30] KVM: nSVM: extract svm_set_gif
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (17 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 18/30] KVM: nSVM: remove unnecessary if Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-06-05 20:33   ` Qian Cai
  2020-05-29 15:39 ` [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch Paolo Bonzini
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Extract the code that is needed to implement CLGI and STGI,
so that we can run it from VMRUN and vmexit (and in the future,
KVM_SET_NESTED_STATE).  Skip the request for KVM_REQ_EVENT unless needed,
subsuming the evaluate_pending_interrupts optimization that is found
in enter_svm_guest_mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/irq.c        |  1 +
 arch/x86/kvm/svm/nested.c | 22 ++---------------
 arch/x86/kvm/svm/svm.c    | 51 ++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h    |  1 +
 4 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 54f7ea68083b..99d118ffc67d 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -83,6 +83,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
 
 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
+EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr);
 
 /*
  * check if there is pending interrupt without
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 921466eba556..7e4a506828c9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -333,10 +333,6 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			  struct vmcb *nested_vmcb)
 {
-	bool evaluate_pending_interrupts =
-		is_intercept(svm, INTERCEPT_VINTR) ||
-		is_intercept(svm, INTERCEPT_IRET);
-
 	svm->nested.vmcb = vmcb_gpa;
 	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
 		svm->vcpu.arch.hflags |= HF_HIF_MASK;
@@ -347,21 +343,7 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	nested_prepare_vmcb_save(svm, nested_vmcb);
 	nested_prepare_vmcb_control(svm);
 
-	/*
-	 * If L1 had a pending IRQ/NMI before executing VMRUN,
-	 * which wasn't delivered because it was disallowed (e.g.
-	 * interrupts disabled), L0 needs to evaluate if this pending
-	 * event should cause an exit from L2 to L1 or be delivered
-	 * directly to L2.
-	 *
-	 * Usually this would be handled by the processor noticing an
-	 * IRQ/NMI window request.  However, VMRUN can unblock interrupts
-	 * by implicitly setting GIF, so force L0 to perform pending event
-	 * evaluation by requesting a KVM_REQ_EVENT.
-	 */
-	enable_gif(svm);
-	if (unlikely(evaluate_pending_interrupts))
-		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+	svm_set_gif(svm, true);
 }
 
 int nested_svm_vmrun(struct vcpu_svm *svm)
@@ -505,7 +487,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
 	/* Give the current vmcb to the guest */
-	disable_gif(svm);
+	svm_set_gif(svm, false);
 
 	nested_vmcb->save.es     = vmcb->save.es;
 	nested_vmcb->save.cs     = vmcb->save.cs;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7383f821eb3b..e48e4173bc60 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1981,6 +1981,38 @@ static int vmrun_interception(struct vcpu_svm *svm)
 	return nested_svm_vmrun(svm);
 }
 
+void svm_set_gif(struct vcpu_svm *svm, bool value)
+{
+	if (value) {
+		/*
+		 * If VGIF is enabled, the STGI intercept is only added to
+		 * detect the opening of the SMI/NMI window; remove it now.
+		 * Likewise, clear the VINTR intercept, we will set it
+		 * again while processing KVM_REQ_EVENT if needed.
+		 */
+		if (vgif_enabled(svm))
+			clr_intercept(svm, INTERCEPT_STGI);
+		if (is_intercept(svm, SVM_EXIT_VINTR))
+			svm_clear_vintr(svm);
+
+		enable_gif(svm);
+		if (svm->vcpu.arch.smi_pending ||
+		    svm->vcpu.arch.nmi_pending ||
+		    kvm_cpu_has_injectable_intr(&svm->vcpu))
+			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+	} else {
+		disable_gif(svm);
+
+		/*
+		 * After a CLGI no interrupts should come.  But if vGIF is
+		 * in use, we still rely on the VINTR intercept (rather than
+		 * STGI) to detect an open interrupt window.
+		*/
+		if (!vgif_enabled(svm))
+			svm_clear_vintr(svm);
+	}
+}
+
 static int stgi_interception(struct vcpu_svm *svm)
 {
 	int ret;
@@ -1988,18 +2020,8 @@ static int stgi_interception(struct vcpu_svm *svm)
 	if (nested_svm_check_permissions(svm))
 		return 1;
 
-	/*
-	 * If VGIF is enabled, the STGI intercept is only added to
-	 * detect the opening of the SMI/NMI window; remove it now.
-	 */
-	if (vgif_enabled(svm))
-		clr_intercept(svm, INTERCEPT_STGI);
-
 	ret = kvm_skip_emulated_instruction(&svm->vcpu);
-	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
-
-	enable_gif(svm);
-
+	svm_set_gif(svm, true);
 	return ret;
 }
 
@@ -2011,12 +2033,7 @@ static int clgi_interception(struct vcpu_svm *svm)
 		return 1;
 
 	ret = kvm_skip_emulated_instruction(&svm->vcpu);
-
-	disable_gif(svm);
-
-	/* After a CLGI no interrupts should come */
-	svm_clear_vintr(svm);
-
+	svm_set_gif(svm, false);
 	return ret;
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7e79f0af1204..10b7b55720a0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -357,6 +357,7 @@ void disable_nmi_singlestep(struct vcpu_svm *svm);
 bool svm_smi_blocked(struct kvm_vcpu *vcpu);
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu);
 bool svm_interrupt_blocked(struct kvm_vcpu *vcpu);
+void svm_set_gif(struct vcpu_svm *svm, bool value);
 
 /* nested.c */
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (18 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 19/30] KVM: nSVM: extract svm_set_gif Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-31 23:11   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 21/30] KVM: nSVM: synthesize correct EXITINTINFO on vmexit Paolo Bonzini
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

There is only one GIF flag for the whole processor, so make sure it is not clobbered
when switching to L2 (in which case we also have to include the V_GIF_ENABLE_MASK,
lest we confuse enable_gif/disable_gif/gif_set).  When going back, L1 could in
theory have entered L2 without issuing a CLGI, so make sure that svm_set_gif
is done last, after svm->vmcb->control.int_ctl has been copied back from hsave.
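
Schematically, the int_ctl that L2 runs with becomes (sketch only, same mask
as in the hunk below; bit values are repeated here just for illustration):

    #include <stdint.h>

    #define V_GIF_MASK              (1u << 9)
    #define V_INTR_MASKING_MASK     (1u << 24)
    #define V_GIF_ENABLE_MASK       (1u << 25)

    static uint32_t vmrun_int_ctl(uint32_t l1_requested_int_ctl,
                                  uint32_t hsave_int_ctl)
    {
            /* These bits must keep the value L0 had in its own VMCB (hsave)... */
            const uint32_t host_owned = V_INTR_MASKING_MASK |
                                        V_GIF_ENABLE_MASK | V_GIF_MASK;

            /* ...while L1 gets to pick everything else. */
            return (l1_requested_int_ctl & ~host_owned) |
                   (hsave_int_ctl & host_owned);
    }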

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7e4a506828c9..6c7f0bffdf01 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -293,6 +293,7 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
 
 static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 {
+	const u32 mask = V_INTR_MASKING_MASK | V_GIF_ENABLE_MASK | V_GIF_MASK;
 	if (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
 		nested_svm_init_mmu_context(&svm->vcpu);
 
@@ -308,7 +309,10 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
 		svm->vcpu.arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
 
-	svm->vmcb->control.int_ctl             = svm->nested.ctl.int_ctl | V_INTR_MASKING_MASK;
+	svm->vmcb->control.int_ctl             =
+		(svm->nested.ctl.int_ctl & ~mask) |
+		(svm->nested.hsave->control.int_ctl & mask);
+
 	svm->vmcb->control.virt_ext            = svm->nested.ctl.virt_ext;
 	svm->vmcb->control.int_vector          = svm->nested.ctl.int_vector;
 	svm->vmcb->control.int_state           = svm->nested.ctl.int_state;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 21/30] KVM: nSVM: synthesize correct EXITINTINFO on vmexit
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (19 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 22/30] KVM: nSVM: remove HF_VINTR_MASK Paolo Bonzini
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

This logic was added to nested VMX right when nested_run_pending was
introduced, but it is not yet there in nSVM.  Since we can have pending
events that L0 injected directly into L2 on vmentry, we have to transfer
them into L1's queue.

For this to work, one important change is required: svm_complete_interrupts
(which clears the "injected" fields from the previous VMRUN, and updates them
from svm->vmcb's EXITINTINFO) must be placed before we inject the vmexit.
This is not too scary though; VMX even does it in vmx_vcpu_run.

While at it, the nested_vmexit_inject tracepoint is moved towards the
end of nested_svm_vmexit.  This ensures that the synthesized EXITINTINFO
is visible in the trace.
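
For reference, the EXITINTINFO word being synthesized has the usual EVTINJ
layout (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11,
valid in bit 31); a stand-alone sketch of the encoding, mirroring the
SVM_EVTINJ_* constants:

    #include <stdint.h>

    #define EVTINJ_VEC_MASK         0xffu
    #define EVTINJ_TYPE_INTR        (0u << 8)
    #define EVTINJ_TYPE_NMI         (2u << 8)
    #define EVTINJ_TYPE_EXEPT       (3u << 8)
    #define EVTINJ_TYPE_SOFT        (4u << 8)
    #define EVTINJ_VALID_ERR        (1u << 11)
    #define EVTINJ_VALID            (1u << 31)

    static uint32_t make_exit_int_info(uint32_t type, uint8_t vector,
                                       int has_error_code)
    {
            uint32_t info = EVTINJ_VALID | type | (vector & EVTINJ_VEC_MASK);

            /* The error code itself goes into the separate exit_int_info_err. */
            if (has_error_code)
                    info |= EVTINJ_VALID_ERR;
            return info;
    }

For example, a #PF (vector 14) that L0 had injected into L2 would be
transferred as make_exit_int_info(EVTINJ_TYPE_EXEPT, 14, 1), with the error
code stored alongside it.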

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 59 +++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.c    |  4 +--
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6c7f0bffdf01..c3c04fef11df 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -262,6 +262,43 @@ void sync_nested_vmcb_control(struct vcpu_svm *svm)
 	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
 }
 
+/*
+ * Transfer any event that L0 or L1 wanted to inject into L2 to
+ * EXIT_INT_INFO.
+ */
+static void nested_vmcb_save_pending_event(struct vcpu_svm *svm,
+					   struct vmcb *nested_vmcb)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	u32 exit_int_info = 0;
+	unsigned int nr;
+
+	if (vcpu->arch.exception.injected) {
+		nr = vcpu->arch.exception.nr;
+		exit_int_info = nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT;
+
+		if (vcpu->arch.exception.has_error_code) {
+			exit_int_info |= SVM_EVTINJ_VALID_ERR;
+			nested_vmcb->control.exit_int_info_err =
+				vcpu->arch.exception.error_code;
+		}
+
+	} else if (vcpu->arch.nmi_injected) {
+		exit_int_info = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
+
+	} else if (vcpu->arch.interrupt.injected) {
+		nr = vcpu->arch.interrupt.nr;
+		exit_int_info = nr | SVM_EVTINJ_VALID;
+
+		if (vcpu->arch.interrupt.soft)
+			exit_int_info |= SVM_EVTINJ_TYPE_SOFT;
+		else
+			exit_int_info |= SVM_EVTINJ_TYPE_INTR;
+	}
+
+	nested_vmcb->control.exit_int_info = exit_int_info;
+}
+
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
 	/* Load the nested guest state */
@@ -466,13 +503,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	struct vmcb *vmcb = svm->vmcb;
 	struct kvm_host_map map;
 
-	trace_kvm_nested_vmexit_inject(vmcb->control.exit_code,
-				       vmcb->control.exit_info_1,
-				       vmcb->control.exit_info_2,
-				       vmcb->control.exit_int_info,
-				       vmcb->control.exit_int_info_err,
-				       KVM_ISA_SVM);
-
 	rc = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.vmcb), &map);
 	if (rc) {
 		if (rc == -EINVAL)
@@ -517,8 +547,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	nested_vmcb->control.exit_code_hi      = vmcb->control.exit_code_hi;
 	nested_vmcb->control.exit_info_1       = vmcb->control.exit_info_1;
 	nested_vmcb->control.exit_info_2       = vmcb->control.exit_info_2;
-	nested_vmcb->control.exit_int_info     = vmcb->control.exit_int_info;
-	nested_vmcb->control.exit_int_info_err = vmcb->control.exit_int_info_err;
+
+	if (nested_vmcb->control.exit_code != SVM_EXIT_ERR)
+		nested_vmcb_save_pending_event(svm, nested_vmcb);
 
 	if (svm->nrips_enabled)
 		nested_vmcb->control.next_rip  = vmcb->control.next_rip;
@@ -539,9 +570,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
 		svm->vcpu.arch.l1_tsc_offset;
 
-	kvm_clear_exception_queue(&svm->vcpu);
-	kvm_clear_interrupt_queue(&svm->vcpu);
-
 	svm->nested.ctl.nested_cr3 = 0;
 
 	/* Restore selected save entries */
@@ -570,6 +598,13 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	mark_all_dirty(svm->vmcb);
 
+	trace_kvm_nested_vmexit_inject(nested_vmcb->control.exit_code,
+				       nested_vmcb->control.exit_info_1,
+				       nested_vmcb->control.exit_info_2,
+				       nested_vmcb->control.exit_int_info,
+				       nested_vmcb->control.exit_int_info_err,
+				       KVM_ISA_SVM);
+
 	kvm_vcpu_unmap(&svm->vcpu, &map, true);
 
 	nested_svm_uninit_mmu_context(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e48e4173bc60..422b1cc62a20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2913,6 +2913,8 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (npt_enabled)
 		vcpu->arch.cr3 = svm->vmcb->save.cr3;
 
+	svm_complete_interrupts(svm);
+
 	if (is_guest_mode(vcpu)) {
 		int vmexit;
 
@@ -2932,8 +2934,6 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 			return 1;
 	}
 
-	svm_complete_interrupts(svm);
-
 	if (svm->vmcb->control.exit_code == SVM_EXIT_ERR) {
 		kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		kvm_run->fail_entry.hardware_entry_failure_reason
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 22/30] KVM: nSVM: remove HF_VINTR_MASK
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (20 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 21/30] KVM: nSVM: synthesize correct EXITINTINFO on vmexit Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 23/30] KVM: nSVM: remove HF_HIF_MASK Paolo Bonzini
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can
use it instead of vcpu->arch.hflags to check whether L2 is running
in V_INTR_MASKING mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/svm/nested.c       | 6 +-----
 arch/x86/kvm/svm/svm.c          | 2 +-
 arch/x86/kvm/svm/svm.h          | 4 +++-
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e6f2e1a2dab6..0dfc522f96cc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1596,7 +1596,6 @@ enum {
 
 #define HF_GIF_MASK		(1 << 0)
 #define HF_HIF_MASK		(1 << 1)
-#define HF_VINTR_MASK		(1 << 2)
 #define HF_NMI_MASK		(1 << 3)
 #define HF_IRET_MASK		(1 << 4)
 #define HF_GUEST_MASK		(1 << 5) /* VCPU is in guest-mode */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c3c04fef11df..6967fe884eaf 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -118,7 +118,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	c->intercept_exceptions = h->intercept_exceptions;
 	c->intercept = h->intercept;
 
-	if (svm->vcpu.arch.hflags & HF_VINTR_MASK) {
+	if (g->int_ctl & V_INTR_MASKING_MASK) {
 		/* We only want the cr8 intercept bits of L1 */
 		c->intercept_cr &= ~(1U << INTERCEPT_CR8_READ);
 		c->intercept_cr &= ~(1U << INTERCEPT_CR8_WRITE);
@@ -338,10 +338,6 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 	kvm_mmu_reset_context(&svm->vcpu);
 
 	svm_flush_tlb(&svm->vcpu);
-	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
-		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
-	else
-		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
 
 	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
 		svm->vcpu.arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 422b1cc62a20..6867dba4f736 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3104,7 +3104,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 
 	if (is_guest_mode(vcpu)) {
 		/* As long as interrupts are being delivered...  */
-		if ((svm->vcpu.arch.hflags & HF_VINTR_MASK)
+		if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
 		    ? !(svm->vcpu.arch.hflags & HF_HIF_MASK)
 		    : !(kvm_get_rflags(vcpu) & X86_EFLAGS_IF))
 			return true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 10b7b55720a0..be8e830f83fa 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -367,7 +367,9 @@ void svm_set_gif(struct vcpu_svm *svm, bool value);
 
 static inline bool svm_nested_virtualize_tpr(struct kvm_vcpu *vcpu)
 {
-	return is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return is_guest_mode(vcpu) && (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK);
 }
 
 static inline bool nested_exit_on_smi(struct vcpu_svm *svm)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 23/30] KVM: nSVM: remove HF_HIF_MASK
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (21 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 22/30] KVM: nSVM: remove HF_VINTR_MASK Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 24/30] KVM: nSVM: split nested_vmcb_check_controls Paolo Bonzini
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

The L1 flags can be found in the save area of svm->nested.hsave; fish
them from there so that there is one fewer thing to migrate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/svm/nested.c       | 5 -----
 arch/x86/kvm/svm/svm.c          | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0dfc522f96cc..3485f8454088 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1595,7 +1595,6 @@ enum {
 };
 
 #define HF_GIF_MASK		(1 << 0)
-#define HF_HIF_MASK		(1 << 1)
 #define HF_NMI_MASK		(1 << 3)
 #define HF_IRET_MASK		(1 << 4)
 #define HF_GUEST_MASK		(1 << 5) /* VCPU is in guest-mode */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6967fe884eaf..65ecc8586f75 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -371,11 +371,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			  struct vmcb *nested_vmcb)
 {
 	svm->nested.vmcb = vmcb_gpa;
-	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
-		svm->vcpu.arch.hflags |= HF_HIF_MASK;
-	else
-		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
-
 	load_nested_vmcb_control(svm, &nested_vmcb->control);
 	nested_prepare_vmcb_save(svm, nested_vmcb);
 	nested_prepare_vmcb_control(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6867dba4f736..bc08221f6743 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3105,7 +3105,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu)) {
 		/* As long as interrupts are being delivered...  */
 		if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
-		    ? !(svm->vcpu.arch.hflags & HF_HIF_MASK)
+		    ? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)
 		    : !(kvm_get_rflags(vcpu) & X86_EFLAGS_IF))
 			return true;
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 24/30] KVM: nSVM: split nested_vmcb_check_controls
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (22 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 23/30] KVM: nSVM: remove HF_HIF_MASK Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME Paolo Bonzini
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

The authoritative state does not come from the VMCB once in guest mode,
but KVM_SET_NESTED_STATE can still perform checks on L1's provided SVM
controls because we get them from userspace.

Therefore, split out a function to do them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 65ecc8586f75..bd3a89cd4070 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -203,26 +203,31 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 	return true;
 }
 
-static bool nested_vmcb_checks(struct vmcb *vmcb)
+static bool nested_vmcb_check_controls(struct vmcb_control_area *control)
 {
-	if ((vmcb->save.efer & EFER_SVME) == 0)
+	if ((control->intercept & (1ULL << INTERCEPT_VMRUN)) == 0)
 		return false;
 
-	if (((vmcb->save.cr0 & X86_CR0_CD) == 0) &&
-	    (vmcb->save.cr0 & X86_CR0_NW))
+	if (control->asid == 0)
 		return false;
 
-	if ((vmcb->control.intercept & (1ULL << INTERCEPT_VMRUN)) == 0)
+	if ((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
+	    !npt_enabled)
 		return false;
 
-	if (vmcb->control.asid == 0)
+	return true;
+}
+
+static bool nested_vmcb_checks(struct vmcb *vmcb)
+{
+	if ((vmcb->save.efer & EFER_SVME) == 0)
 		return false;
 
-	if ((vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
-	    !npt_enabled)
+	if (((vmcb->save.cr0 & X86_CR0_CD) == 0) &&
+	    (vmcb->save.cr0 & X86_CR0_NW))
 		return false;
 
-	return true;
+	return nested_vmcb_check_controls(&vmcb->control);
 }
 
 static void load_nested_vmcb_control(struct vcpu_svm *svm,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (23 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 24/30] KVM: nSVM: split nested_vmcb_check_controls Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-06-01  2:26   ` Krish Sadhukhan
  2020-05-29 15:39 ` [PATCH 26/30] KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu Paolo Bonzini
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

According to the AMD manual, the effect of turning off EFER.SVME while a
guest is running is undefined.  We make it leave guest mode immediately,
similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 16 ++++++++++++++++
 arch/x86/kvm/svm/svm.c    | 10 ++++++++--
 arch/x86/kvm/svm/svm.h    |  1 +
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bd3a89cd4070..369eca73fe3e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -618,6 +618,22 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	return 0;
 }
 
+/*
+ * Forcibly leave nested mode in order to be able to reset the VCPU later on.
+ */
+void svm_leave_nested(struct vcpu_svm *svm)
+{
+	if (is_guest_mode(&svm->vcpu)) {
+		struct vmcb *hsave = svm->nested.hsave;
+		struct vmcb *vmcb = svm->vmcb;
+
+		svm->nested.nested_run_pending = 0;
+		leave_guest_mode(&svm->vcpu);
+		copy_vmcb_control_area(&vmcb->control, &hsave->control);
+		nested_svm_uninit_mmu_context(&svm->vcpu);
+	}
+}
+
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 {
 	u32 offset, msr, value;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bc08221f6743..b4db9a980469 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -265,6 +265,7 @@ static int get_npt_level(struct kvm_vcpu *vcpu)
 
 void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	vcpu->arch.efer = efer;
 
 	if (!npt_enabled) {
@@ -275,8 +276,13 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 			efer &= ~EFER_LME;
 	}
 
-	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
-	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
+	if (!(efer & EFER_SVME)) {
+		svm_leave_nested(svm);
+		svm_set_gif(svm, true);
+	}
+
+	svm->vmcb->save.efer = efer | EFER_SVME;
+	mark_dirty(svm->vmcb, VMCB_CR);
 }
 
 static int is_external_interrupt(u32 info)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index be8e830f83fa..6ac4c00a5d82 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -389,6 +389,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 
 void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			  struct vmcb *nested_vmcb);
+void svm_leave_nested(struct vcpu_svm *svm);
 int nested_svm_vmrun(struct vcpu_svm *svm);
 void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
 int nested_svm_vmexit(struct vcpu_svm *svm);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 26/30] KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (24 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 27/30] selftests: kvm: introduce cpu_has_svm() check Paolo Bonzini
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

This allows fetching the registers from the hsave area when setting
up the NPT shadow MMU, and is needed for KVM_SET_NESTED_STATE (which
runs long after the CR0, CR4 and EFER values in vcpu have been switched
to hold L2 guest state).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.h        |  2 +-
 arch/x86/kvm/mmu/mmu.c    | 14 +++++++++-----
 arch/x86/kvm/svm/nested.c |  5 ++++-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 048e865ad485..0ad06bfe2c2c 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -57,7 +57,7 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots);
-void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
+void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty, gpa_t new_eptp);
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fd1c9145505c..2e62a03410c7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4952,7 +4952,7 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 	return role;
 }
 
-void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
+void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer)
 {
 	struct kvm_mmu *context = vcpu->arch.mmu;
 	union kvm_mmu_role new_role =
@@ -4961,11 +4961,11 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
 
-	if (!is_paging(vcpu))
+	if (!(cr0 & X86_CR0_PG))
 		nonpaging_init_context(vcpu, context);
-	else if (is_long_mode(vcpu))
+	else if (efer & EFER_LMA)
 		paging64_init_context(vcpu, context);
-	else if (is_pae(vcpu))
+	else if (cr4 & X86_CR4_PAE)
 		paging32E_init_context(vcpu, context);
 	else
 		paging32_init_context(vcpu, context);
@@ -5043,7 +5043,11 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = vcpu->arch.mmu;
 
-	kvm_init_shadow_mmu(vcpu);
+	kvm_init_shadow_mmu(vcpu,
+			    kvm_read_cr0_bits(vcpu, X86_CR0_PG),
+			    kvm_read_cr4_bits(vcpu, X86_CR4_PAE),
+			    vcpu->arch.efer);
+
 	context->get_guest_pgd     = get_cr3;
 	context->get_pdptr         = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 369eca73fe3e..c712fe577029 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -80,10 +80,13 @@ static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
 
 static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *hsave = svm->nested.hsave;
+
 	WARN_ON(mmu_is_nested(vcpu));
 
 	vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-	kvm_init_shadow_mmu(vcpu);
+	kvm_init_shadow_mmu(vcpu, X86_CR0_PG, hsave->save.cr4, hsave->save.efer);
 	vcpu->arch.mmu->get_guest_pgd     = nested_svm_get_tdp_cr3;
 	vcpu->arch.mmu->get_pdptr         = nested_svm_get_tdp_pdptr;
 	vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 27/30] selftests: kvm: introduce cpu_has_svm() check
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (25 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 26/30] KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 28/30] selftests: kvm: add a SVM version of state-test Paolo Bonzini
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Vitaly Kuznetsov

From: Vitaly Kuznetsov <vkuznets@redhat.com>

Many tests will want to check whether the CPU is Intel or AMD from
guest code; add cpu_has_svm() as a static inline in svm_util.h.
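
For illustration, guest code can branch on it to pick the right nested
setup path.  This is only a sketch along the lines of the state_test
changes later in this series; svm_setup_l1() and vmx_setup_l1() are
hypothetical placeholders for whatever per-vendor setup a test does:

	static void guest_code(void *arg)
	{
		if (!arg)
			return;			/* no nested state requested */

		if (cpu_has_svm())
			svm_setup_l1(arg);	/* arg is a struct svm_test_data * */
		else
			vmx_setup_l1(arg);	/* arg is a struct vmx_pages * */
	}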

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200529130407.57176-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/svm_util.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index cd037917fece..b1057773206a 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -35,4 +35,14 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
 void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa);
 void nested_svm_check_supported(void);
 
+static inline bool cpu_has_svm(void)
+{
+	u32 eax = 0x80000001, ecx;
+
+	asm("cpuid" :
+	    "=a" (eax), "=c" (ecx) : "0" (eax) : "ebx", "edx");
+
+	return ecx & CPUID_SVM;
+}
+
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 28/30] selftests: kvm: add a SVM version of state-test
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (26 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 27/30] selftests: kvm: introduce cpu_has_svm() check Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 29/30] selftests: kvm: fix smm test on SVM Paolo Bonzini
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

The test is similar to the existing one for VMX, but simpler because we
don't have to test shadow VMCS or vmptrld/vmptrst/vmclear.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../testing/selftests/kvm/x86_64/state_test.c | 62 +++++++++++++++----
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/state_test.c b/tools/testing/selftests/kvm/x86_64/state_test.c
index 5b1a016edf55..d43b6f99b66c 100644
--- a/tools/testing/selftests/kvm/x86_64/state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/state_test.c
@@ -18,14 +18,46 @@
 #include "kvm_util.h"
 #include "processor.h"
 #include "vmx.h"
+#include "svm_util.h"
 
 #define VCPU_ID		5
+#define L2_GUEST_STACK_SIZE 256
 
-void l2_guest_code(void)
+void svm_l2_guest_code(void)
 {
+	GUEST_SYNC(4);
+	/* Exit to L1 */
+	vmcall();
 	GUEST_SYNC(6);
+	/* Done, exit to L1 and never come back.  */
+	vmcall();
+}
 
-        /* Exit to L1 */
+static void svm_l1_guest_code(struct svm_test_data *svm)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	GUEST_ASSERT(svm->vmcb_gpa);
+	/* Prepare for L2 execution. */
+	generic_svm_setup(svm, svm_l2_guest_code,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	GUEST_SYNC(3);
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	GUEST_SYNC(5);
+	vmcb->save.rip += 3;
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	GUEST_SYNC(7);
+}
+
+void vmx_l2_guest_code(void)
+{
+	GUEST_SYNC(6);
+
+	/* Exit to L1 */
 	vmcall();
 
 	/* L1 has now set up a shadow VMCS for us.  */
@@ -42,10 +74,9 @@ void l2_guest_code(void)
 	vmcall();
 }
 
-void l1_guest_code(struct vmx_pages *vmx_pages)
+static void vmx_l1_guest_code(struct vmx_pages *vmx_pages)
 {
-#define L2_GUEST_STACK_SIZE 64
-        unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
 
 	GUEST_ASSERT(vmx_pages->vmcs_gpa);
 	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
@@ -56,7 +87,7 @@ void l1_guest_code(struct vmx_pages *vmx_pages)
 	GUEST_SYNC(4);
 	GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
 
-	prepare_vmcs(vmx_pages, l2_guest_code,
+	prepare_vmcs(vmx_pages, vmx_l2_guest_code,
 		     &l2_guest_stack[L2_GUEST_STACK_SIZE]);
 
 	GUEST_SYNC(5);
@@ -106,20 +137,24 @@ void l1_guest_code(struct vmx_pages *vmx_pages)
 	GUEST_ASSERT(vmresume());
 }
 
-void guest_code(struct vmx_pages *vmx_pages)
+static void __attribute__((__flatten__)) guest_code(void *arg)
 {
 	GUEST_SYNC(1);
 	GUEST_SYNC(2);
 
-	if (vmx_pages)
-		l1_guest_code(vmx_pages);
+	if (arg) {
+		if (cpu_has_svm())
+			svm_l1_guest_code(arg);
+		else
+			vmx_l1_guest_code(arg);
+	}
 
 	GUEST_DONE();
 }
 
 int main(int argc, char *argv[])
 {
-	vm_vaddr_t vmx_pages_gva = 0;
+	vm_vaddr_t nested_gva = 0;
 
 	struct kvm_regs regs1, regs2;
 	struct kvm_vm *vm;
@@ -136,8 +171,11 @@ int main(int argc, char *argv[])
 	vcpu_regs_get(vm, VCPU_ID, &regs1);
 
 	if (kvm_check_cap(KVM_CAP_NESTED_STATE)) {
-		vcpu_alloc_vmx(vm, &vmx_pages_gva);
-		vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+		if (kvm_get_supported_cpuid_entry(0x80000001)->ecx & CPUID_SVM)
+			vcpu_alloc_svm(vm, &nested_gva);
+		else
+			vcpu_alloc_vmx(vm, &nested_gva);
+		vcpu_args_set(vm, VCPU_ID, 1, nested_gva);
 	} else {
 		pr_info("will skip nested state checks\n");
 		vcpu_args_set(vm, VCPU_ID, 1, 0);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 29/30] selftests: kvm: fix smm test on SVM
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (27 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 28/30] selftests: kvm: add a SVM version of state-test Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-05-29 15:39 ` [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Paolo Bonzini
  2020-05-29 17:59 ` [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Sean Christopherson
  30 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Vitaly Kuznetsov

From: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM_CAP_NESTED_STATE is now supported for AMD too, but the smm test still
acts as if it were Intel-only.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200529130407.57176-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/x86_64/smm_test.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index 8230b6bc6b8f..6f8f478b3ceb 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -17,6 +17,7 @@
 #include "kvm_util.h"
 
 #include "vmx.h"
+#include "svm_util.h"
 
 #define VCPU_ID	      1
 
@@ -58,7 +59,7 @@ void self_smi(void)
 	      APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
 }
 
-void guest_code(struct vmx_pages *vmx_pages)
+void guest_code(void *arg)
 {
 	uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
 
@@ -72,8 +73,11 @@ void guest_code(struct vmx_pages *vmx_pages)
 
 	sync_with_host(4);
 
-	if (vmx_pages) {
-		GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	if (arg) {
+		if (cpu_has_svm())
+			generic_svm_setup(arg, NULL, NULL);
+		else
+			GUEST_ASSERT(prepare_for_vmx_operation(arg));
 
 		sync_with_host(5);
 
@@ -87,7 +91,7 @@ void guest_code(struct vmx_pages *vmx_pages)
 
 int main(int argc, char *argv[])
 {
-	vm_vaddr_t vmx_pages_gva = 0;
+	vm_vaddr_t nested_gva = 0;
 
 	struct kvm_regs regs;
 	struct kvm_vm *vm;
@@ -114,8 +118,11 @@ int main(int argc, char *argv[])
 	vcpu_set_msr(vm, VCPU_ID, MSR_IA32_SMBASE, SMRAM_GPA);
 
 	if (kvm_check_cap(KVM_CAP_NESTED_STATE)) {
-		vcpu_alloc_vmx(vm, &vmx_pages_gva);
-		vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva);
+		if (kvm_get_supported_cpuid_entry(0x80000001)->ecx & CPUID_SVM)
+			vcpu_alloc_svm(vm, &nested_gva);
+		else
+			vcpu_alloc_vmx(vm, &nested_gva);
+		vcpu_args_set(vm, VCPU_ID, 1, nested_gva);
 	} else {
 		pr_info("will skip SMM test with VMX enabled\n");
 		vcpu_args_set(vm, VCPU_ID, 1, 0);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (28 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 29/30] selftests: kvm: fix smm test on SVM Paolo Bonzini
@ 2020-05-29 15:39 ` Paolo Bonzini
  2020-06-02  0:11   ` Krish Sadhukhan
  2020-05-29 17:59 ` [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Sean Christopherson
  30 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 15:39 UTC (permalink / raw)
  To: linux-kernel, kvm

Similar to VMX, the state that is captured through the currently available
IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was
running at the moment when the process was interrupted to save its state.

In particular, the SVM-specific state for nested virtualization includes
the L1 saved state (including the interrupt flag), the cached L2 controls,
and the GIF.
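
A rough sketch of the userspace flow the new ioctls are meant to support
(kvm_fd/vcpu_fd are the usual file descriptors; error handling and the
actual migration plumbing are omitted, so treat this as an assumption
about usage rather than part of the patch):

	/* Maximum buffer size is advertised by the capability check. */
	int max = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
	struct kvm_nested_state *state = calloc(1, max);

	/* Save on the source: the flags say whether L2 was running and
	 * whether a VMRUN is pending; on SVM the header carries vmcb_pa. */
	state->size = max;
	ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state);

	/* ... transfer the buffer along with the usual register state ... */

	/* Restore on the destination before the first KVM_RUN. */
	ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state);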

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/uapi/asm/kvm.h |  17 +++-
 arch/x86/kvm/cpuid.h            |   5 ++
 arch/x86/kvm/svm/nested.c       | 147 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/vmx/nested.c       |   5 --
 arch/x86/kvm/x86.c              |   3 +-
 6 files changed, 171 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 3f3f780c8c65..12075a9de1c1 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -385,18 +385,22 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
-#define KVM_STATE_NESTED_FORMAT_SVM	1	/* unused */
+#define KVM_STATE_NESTED_FORMAT_SVM	1
 
 #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
 #define KVM_STATE_NESTED_EVMCS		0x00000004
 #define KVM_STATE_NESTED_MTF_PENDING	0x00000008
+#define KVM_STATE_NESTED_GIF_SET	0x00000100
 
 #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
 
 #define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
 
+#define KVM_STATE_NESTED_SVM_VMCB_SIZE	0x1000
+
+
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
 	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -411,6 +415,15 @@ struct kvm_vmx_nested_state_hdr {
 	} smm;
 };
 
+struct kvm_svm_nested_state_data {
+	/* Save area only used if KVM_STATE_NESTED_RUN_PENDING.  */
+	__u8 vmcb12[KVM_STATE_NESTED_SVM_VMCB_SIZE];
+};
+
+struct kvm_svm_nested_state_hdr {
+	__u64 vmcb_pa;
+};
+
 /* for KVM_CAP_NESTED_STATE */
 struct kvm_nested_state {
 	__u16 flags;
@@ -419,6 +432,7 @@ struct kvm_nested_state {
 
 	union {
 		struct kvm_vmx_nested_state_hdr vmx;
+		struct kvm_svm_nested_state_hdr svm;
 
 		/* Pad the header to 128 bytes.  */
 		__u8 pad[120];
@@ -431,6 +445,7 @@ struct kvm_nested_state {
 	 */
 	union {
 		struct kvm_vmx_nested_state_data vmx[0];
+		struct kvm_svm_nested_state_data svm[0];
 	} data;
 };
 
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 63a70f6a3df3..05434cd9342f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -303,4 +303,9 @@ static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature)
 		kvm_cpu_cap_set(x86_feature);
 }
 
+static inline bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	return PAGE_ALIGNED(gpa) && !(gpa >> cpuid_maxphyaddr(vcpu));
+}
+
 #endif
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c712fe577029..6b1049148c1b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -25,6 +25,7 @@
 #include "trace.h"
 #include "mmu.h"
 #include "x86.h"
+#include "cpuid.h"
 #include "lapic.h"
 #include "svm.h"
 
@@ -238,6 +239,8 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
 {
 	copy_vmcb_control_area(&svm->nested.ctl, control);
 
+	/* Copy it here because nested_svm_check_controls will check it.  */
+	svm->nested.ctl.asid           = control->asid;
 	svm->nested.ctl.msrpm_base_pa &= ~0x0fffULL;
 	svm->nested.ctl.iopm_base_pa  &= ~0x0fffULL;
 }
@@ -930,6 +933,150 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 	return NESTED_EXIT_CONTINUE;
 }
 
+static int svm_get_nested_state(struct kvm_vcpu *vcpu,
+				struct kvm_nested_state __user *user_kvm_nested_state,
+				u32 user_data_size)
+{
+	struct vcpu_svm *svm;
+	struct kvm_nested_state kvm_state = {
+		.flags = 0,
+		.format = KVM_STATE_NESTED_FORMAT_SVM,
+		.size = sizeof(kvm_state),
+	};
+	struct vmcb __user *user_vmcb = (struct vmcb __user *)
+		&user_kvm_nested_state->data.svm[0];
+
+	if (!vcpu)
+		return kvm_state.size + KVM_STATE_NESTED_SVM_VMCB_SIZE;
+
+	svm = to_svm(vcpu);
+
+	if (user_data_size < kvm_state.size)
+		goto out;
+
+	/* First fill in the header and copy it out.  */
+	if (is_guest_mode(vcpu)) {
+		kvm_state.hdr.svm.vmcb_pa = svm->nested.vmcb;
+		kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
+		kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
+
+		if (svm->nested.nested_run_pending)
+			kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
+	}
+
+	if (gif_set(svm))
+		kvm_state.flags |= KVM_STATE_NESTED_GIF_SET;
+
+	if (copy_to_user(user_kvm_nested_state, &kvm_state, sizeof(kvm_state)))
+		return -EFAULT;
+
+	if (!is_guest_mode(vcpu))
+		goto out;
+
+	/*
+	 * Copy over the full size of the VMCB rather than just the size
+	 * of the structs.
+	 */
+	if (clear_user(user_vmcb, KVM_STATE_NESTED_SVM_VMCB_SIZE))
+		return -EFAULT;
+	if (copy_to_user(&user_vmcb->control, &svm->nested.ctl,
+			 sizeof(user_vmcb->control)))
+		return -EFAULT;
+	if (copy_to_user(&user_vmcb->save, &svm->nested.hsave->save,
+			 sizeof(user_vmcb->save)))
+		return -EFAULT;
+
+out:
+	return kvm_state.size;
+}
+
+static int svm_set_nested_state(struct kvm_vcpu *vcpu,
+				struct kvm_nested_state __user *user_kvm_nested_state,
+				struct kvm_nested_state *kvm_state)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *hsave = svm->nested.hsave;
+	struct vmcb __user *user_vmcb = (struct vmcb __user *)
+		&user_kvm_nested_state->data.svm[0];
+	struct vmcb_control_area ctl;
+	struct vmcb_save_area save;
+	u32 cr0;
+
+	if (kvm_state->format != KVM_STATE_NESTED_FORMAT_SVM)
+		return -EINVAL;
+
+	if (kvm_state->flags & ~(KVM_STATE_NESTED_GUEST_MODE |
+				 KVM_STATE_NESTED_RUN_PENDING |
+				 KVM_STATE_NESTED_GIF_SET))
+		return -EINVAL;
+
+	/*
+	 * If in guest mode, vcpu->arch.efer actually refers to the L2 guest's
+	 * EFER.SVME, but EFER.SVME still has to be 1 for VMRUN to succeed.
+	 */
+	if (!(vcpu->arch.efer & EFER_SVME)) {
+		/* GIF=1 and no guest mode are required if SVME=0.  */
+		if (kvm_state->flags != KVM_STATE_NESTED_GIF_SET)
+			return -EINVAL;
+	}
+
+	/* SMM temporarily disables SVM, so we cannot be in guest mode.  */
+	if (is_smm(vcpu) && (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
+		return -EINVAL;
+
+	if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) {
+		svm_leave_nested(svm);
+		goto out_set_gif;
+	}
+
+	if (!page_address_valid(vcpu, kvm_state->hdr.svm.vmcb_pa))
+		return -EINVAL;
+	if (kvm_state->size < sizeof(*kvm_state) + KVM_STATE_NESTED_SVM_VMCB_SIZE)
+		return -EINVAL;
+	if (copy_from_user(&ctl, &user_vmcb->control, sizeof(ctl)))
+		return -EFAULT;
+	if (copy_from_user(&save, &user_vmcb->save, sizeof(save)))
+		return -EFAULT;
+
+	if (!nested_vmcb_check_controls(&ctl))
+		return -EINVAL;
+
+	/*
+	 * Processor state contains L2 state.  Check that it is
+	 * valid for guest mode (see nested_vmcb_checks).
+	 */
+	cr0 = kvm_read_cr0(vcpu);
+        if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
+                return -EINVAL;
+
+	/*
+	 * Validate host state saved from before VMRUN (see
+	 * nested_svm_check_permissions).
+	 * TODO: validate reserved bits for all saved state.
+	 */
+	if (!(save.cr0 & X86_CR0_PG))
+		return -EINVAL;
+
+	/*
+	 * All checks done, we can enter guest mode.  L1 control fields
+	 * come from the nested save state.  Guest state is already
+	 * in the registers, the save area of the nested state instead
+	 * contains saved L1 state.
+	 */
+	copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
+	hsave->save = save;
+
+	svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
+	load_nested_vmcb_control(svm, &ctl);
+	nested_prepare_vmcb_control(svm);
+
+out_set_gif:
+	svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));
+	return 0;
+}
+
 struct kvm_x86_nested_ops svm_nested_ops = {
 	.check_events = svm_check_nested_events,
+	.get_state = svm_get_nested_state,
+	.set_state = svm_set_nested_state,
 };
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b4db9a980469..3871bfb40594 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1193,6 +1193,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 		svm->avic_is_running = true;
 
 	svm->nested.hsave = page_address(hsave_page);
+	clear_page(svm->nested.hsave);
 
 	svm->msrpm = page_address(msrpm_pages);
 	svm_vcpu_init_msrpm(svm->msrpm);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 51ebb60e1533..106fc6fceb97 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -437,11 +437,6 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 	}
 }
 
-static bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
-{
-	return PAGE_ALIGNED(gpa) && !(gpa >> cpuid_maxphyaddr(vcpu));
-}
-
 static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
 					       struct vmcs12 *vmcs12)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f0fa610bed91..d4aa7dc662d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4628,7 +4628,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 		if (kvm_state.flags &
 		    ~(KVM_STATE_NESTED_RUN_PENDING | KVM_STATE_NESTED_GUEST_MODE
-		      | KVM_STATE_NESTED_EVMCS | KVM_STATE_NESTED_MTF_PENDING))
+		      | KVM_STATE_NESTED_EVMCS | KVM_STATE_NESTED_MTF_PENDING
+		      | KVM_STATE_NESTED_GIF_SET))
 			break;
 
 		/* nested_run_pending implies guest_mode.  */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/30] KVM: SVM: always update CR3 in VMCB
  2020-05-29 15:39 ` [PATCH 06/30] KVM: SVM: always update CR3 in VMCB Paolo Bonzini
@ 2020-05-29 17:41   ` Krish Sadhukhan
  2020-05-29 17:56     ` Sean Christopherson
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-29 17:41 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> svm_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02

Did you mean to say enter_svm_guest_mode here ?
> as
> an optimization, but this is only correct before the nested vmentry.
> If userspace is modifying CR3 with KVM_SET_SREGS after the VM has
> already been put in guest mode, the value of CR3 will not be updated.
> Remove the optimization, which almost never triggers anyway.
> This was was added in commit 689f3bf21628 ("KVM: x86: unify callbacks
> to load paging root", 2020-03-16) just to keep the two vendor-specific
> modules closer, but we'll fix VMX too.
>
> Fixes: 689f3bf21628 ("KVM: x86: unify callbacks to load paging root")
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c |  6 +-----
>   arch/x86/kvm/svm/svm.c    | 16 +++++-----------
>   2 files changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dcac4c3510ab..8756c9f463fd 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -256,11 +256,7 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
>   	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
>   	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
> -	if (npt_enabled) {
> -		svm->vmcb->save.cr3 = nested_vmcb->save.cr3;
> -		svm->vcpu.arch.cr3 = nested_vmcb->save.cr3;
> -	} else
> -		(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
> +	(void)kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
>   
>   	/* Guest paging mode is active - reset mmu */
>   	kvm_mmu_reset_context(&svm->vcpu);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 545f63ebc720..feb96a410f2d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3447,7 +3447,6 @@ static fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> -	bool update_guest_cr3 = true;
>   	unsigned long cr3;
>   
>   	cr3 = __sme_set(root);
> @@ -3456,18 +3455,13 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root)
>   		mark_dirty(svm->vmcb, VMCB_NPT);
>   
>   		/* Loading L2's CR3 is handled by enter_svm_guest_mode.  */
> -		if (is_guest_mode(vcpu))
> -			update_guest_cr3 = false;
> -		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> -			cr3 = vcpu->arch.cr3;
> -		else /* CR3 is already up-to-date.  */
> -			update_guest_cr3 = false;
> +		if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> +			return;
> +		cr3 = vcpu->arch.cr3;
>   	}
>   
> -	if (update_guest_cr3) {
> -		svm->vmcb->save.cr3 = cr3;
> -		mark_dirty(svm->vmcb, VMCB_CR);
> -	}
> +	svm->vmcb->save.cr3 = cr3;
> +	mark_dirty(svm->vmcb, VMCB_CR);
>   }
>   
>   static int is_disabled(void)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/30] KVM: SVM: always update CR3 in VMCB
  2020-05-29 17:41   ` Krish Sadhukhan
@ 2020-05-29 17:56     ` Sean Christopherson
  0 siblings, 0 replies; 50+ messages in thread
From: Sean Christopherson @ 2020-05-29 17:56 UTC (permalink / raw)
  To: Krish Sadhukhan; +Cc: Paolo Bonzini, linux-kernel, kvm

On Fri, May 29, 2020 at 10:41:58AM -0700, Krish Sadhukhan wrote:
> 
> On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> >svm_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02
> 
> Did you mean to say enter_svm_guest_mode here ?

Heh, looks like Vitaly passed the s/vmcs/vmcb bug on to Paolo, too. :-D

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 00/28] KVM: nSVM: event fixes and migration support
  2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
                   ` (29 preceding siblings ...)
  2020-05-29 15:39 ` [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Paolo Bonzini
@ 2020-05-29 17:59 ` Sean Christopherson
  2020-05-29 19:07   ` Paolo Bonzini
  30 siblings, 1 reply; 50+ messages in thread
From: Sean Christopherson @ 2020-05-29 17:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

> [PATCH v3 00/28] KVM: nSVM: event fixes and migration support

You've got something funky going on with the way you generate cover letters,
looks like it doesn't count patches authored by someone else.  The 'v3' is
also missing from the patches, though I suppose some heathens do that on
purpose.

> Paolo Bonzini (28):
> 
> Vitaly Kuznetsov (2):

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode
  2020-05-29 15:39 ` [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode Paolo Bonzini
@ 2020-05-29 18:10   ` Krish Sadhukhan
  2020-05-29 19:04     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-29 18:10 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart,
> since the map is not used elsewhere in the function.  There are
> just two calls, so move it there.


The last sentence sounds a bit incomplete.

Also, does it make sense to mention the reason why unmapping is not 
required before we enter guest mode ?

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 14 ++++++--------
>   arch/x86/kvm/svm/svm.c    |  3 ++-
>   arch/x86/kvm/svm/svm.h    |  2 +-
>   3 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 8756c9f463fd..8e98d5e420a5 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -229,7 +229,7 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
>   }
>   
>   void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
> -			  struct vmcb *nested_vmcb, struct kvm_host_map *map)
> +			  struct vmcb *nested_vmcb)
>   {
>   	bool evaluate_pending_interrupts =
>   		is_intercept(svm, INTERCEPT_VINTR) ||
> @@ -304,8 +304,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   	svm->vmcb->control.pause_filter_thresh =
>   		nested_vmcb->control.pause_filter_thresh;
>   
> -	kvm_vcpu_unmap(&svm->vcpu, map, true);
> -
>   	/* Enter Guest-Mode */
>   	enter_guest_mode(&svm->vcpu);
>   
> @@ -368,10 +366,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>   		nested_vmcb->control.exit_code_hi = 0;
>   		nested_vmcb->control.exit_info_1  = 0;
>   		nested_vmcb->control.exit_info_2  = 0;
> -
> -		kvm_vcpu_unmap(&svm->vcpu, &map, true);
> -
> -		return ret;
> +		goto out;
>   	}
>   
>   	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
> @@ -414,7 +409,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>   	copy_vmcb_control_area(hsave, vmcb);
>   
>   	svm->nested.nested_run_pending = 1;
> -	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, &map);
> +	enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb);
>   
>   	if (!nested_svm_vmrun_msrpm(svm)) {
>   		svm->vmcb->control.exit_code    = SVM_EXIT_ERR;
> @@ -425,6 +420,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>   		nested_svm_vmexit(svm);
>   	}
>   
> +out:
> +	kvm_vcpu_unmap(&svm->vcpu, &map, true);
> +
>   	return ret;
>   }
>   
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index feb96a410f2d..76b3f553815e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3814,7 +3814,8 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>   		if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb), &map) == -EINVAL)
>   			return 1;
>   		nested_vmcb = map.hva;
> -		enter_svm_guest_mode(svm, vmcb, nested_vmcb, &map);
> +		enter_svm_guest_mode(svm, vmcb, nested_vmcb);
> +		kvm_vcpu_unmap(&svm->vcpu, &map, true);
>   	}
>   	return 0;
>   }
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 89fab75dd4f5..33e3f09d7a8e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -395,7 +395,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
>   }
>   
>   void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
> -			  struct vmcb *nested_vmcb, struct kvm_host_map *map);
> +			  struct vmcb *nested_vmcb);
>   int nested_svm_vmrun(struct vcpu_svm *svm);
>   void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
>   int nested_svm_vmexit(struct vcpu_svm *svm);

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run
  2020-05-29 15:39 ` [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run Paolo Bonzini
@ 2020-05-29 18:27   ` Krish Sadhukhan
  2020-05-29 19:02     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-29 18:27 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> Split out filling svm->vmcb.save and svm->vmcb.control before VMRUN.
> Only the latter will be useful when restoring nested SVM state.
>
> This patch introduces no semantic change, so the MMU setup is still
> done in nested_prepare_vmcb_save.  The next patch will clean up things.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 40 +++++++++++++++++++++++----------------
>   1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index fc0c6d1678eb..73be7af79453 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -245,21 +245,8 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
>   	svm->vcpu.arch.tsc_offset += control->tsc_offset;
>   }
>   
> -void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
> -			  struct vmcb *nested_vmcb)
> +static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)


Not a big deal, but I feel that it helps a lot in readability if we keep 
the names symmetric. This one could be named prepare_nested_vmcb_save to 
match load_nested_vmcb_control that you created in the previous patch. 
Or load_nested_vmcb_control could be renamed to nested_load_vmcb_control 
to match the name here.

>   {
> -	bool evaluate_pending_interrupts =
> -		is_intercept(svm, INTERCEPT_VINTR) ||
> -		is_intercept(svm, INTERCEPT_IRET);
> -
> -	svm->nested.vmcb = vmcb_gpa;
> -	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
> -		svm->vcpu.arch.hflags |= HF_HIF_MASK;
> -	else
> -		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
> -
> -	load_nested_vmcb_control(svm, &nested_vmcb->control);
> -
>   	if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
>   		nested_svm_init_mmu_context(&svm->vcpu);
>   
> @@ -291,7 +278,10 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
>   	svm->vcpu.arch.dr6  = nested_vmcb->save.dr6;
>   	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
> +}
>   
> +static void nested_prepare_vmcb_control(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
> +{
>   	svm_flush_tlb(&svm->vcpu);
>   	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
>   		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
> @@ -321,6 +311,26 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   	 */
>   	recalc_intercepts(svm);
>   
> +	mark_all_dirty(svm->vmcb);
> +}
> +
> +void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
> +			  struct vmcb *nested_vmcb)
> +{
> +	bool evaluate_pending_interrupts =
> +		is_intercept(svm, INTERCEPT_VINTR) ||
> +		is_intercept(svm, INTERCEPT_IRET);
> +
> +	svm->nested.vmcb = vmcb_gpa;
> +	if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
> +		svm->vcpu.arch.hflags |= HF_HIF_MASK;
> +	else
> +		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
> +
> +	load_nested_vmcb_control(svm, &nested_vmcb->control);
> +	nested_prepare_vmcb_save(svm, nested_vmcb);
> +	nested_prepare_vmcb_control(svm, nested_vmcb);
> +
>   	/*
>   	 * If L1 had a pending IRQ/NMI before executing VMRUN,
>   	 * which wasn't delivered because it was disallowed (e.g.
> @@ -336,8 +346,6 @@ void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   	enable_gif(svm);
>   	if (unlikely(evaluate_pending_interrupts))
>   		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
> -
> -	mark_all_dirty(svm->vmcb);
>   }
>   
>   int nested_svm_vmrun(struct vcpu_svm *svm)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run
  2020-05-29 18:27   ` Krish Sadhukhan
@ 2020-05-29 19:02     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

On 29/05/20 20:27, Krish Sadhukhan wrote:
>>
>> +static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct
>> vmcb *nested_vmcb)
> 
> 
> Not a big deal, but I feel that it helps a lot in readability if we keep
> the names symmetric. This one could be named prepare_nested_vmcb_save to
> match load_nested_vmcb_control that you created in the previous patch.
> Or load_nested_vmcb_control could be renamed to nested_load_vmcb_control
> to match the name here.

This is actually intended: while load_nested_vmcb_control loads the
members of nested_vmcb->control into svm->nested, the two functions in
this patch prepare the svm->vmcb.  A couple patches later,
nested_prepare_vmcb_control will not use nested_vmcb anymore.

I could use nested_load_nested_vmcb_control, but that is just too ugly!
 Instead, the best thing to do would be to use the vmcb01/vmcb02/vmcb12
names as in nVMX, in which case the functions would become
nested_load_vmcb12_control and nested_prepare_vmcb02_{save,control}.
However this is a bit hard to do right now because the svm->vmcb acts as
both vmcb01 and vmcb02 depending on what is running.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode
  2020-05-29 18:10   ` Krish Sadhukhan
@ 2020-05-29 19:04     ` Paolo Bonzini
  2020-05-29 20:02       ` Krish Sadhukhan
  0 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 19:04 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

On 29/05/20 20:10, Krish Sadhukhan wrote:
>> Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart,
>> since the map is not used elsewhere in the function.  There are
>> just two calls, so move it there.
> 
> The last sentence sounds a bit incomplete.

Good point---more precisely, "calls" should be "callers".  "It" refers
to "unmapping".

> Also, does it make sense to mention the reason why unmapping is not
> required before we enter guest mode ?

Unmapping is a host operation and is not visible to the guest; is this
what you mean?

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 00/28] KVM: nSVM: event fixes and migration support
  2020-05-29 17:59 ` [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Sean Christopherson
@ 2020-05-29 19:07   ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-29 19:07 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm

On 29/05/20 19:59, Sean Christopherson wrote:
>> [PATCH v3 00/28] KVM: nSVM: event fixes and migration support
> You've got something funky going on with the way you generate cover letters,
> looks like it doesn't count patches authored by someone else.  The 'v3' is
> also missing from the patches, though I suppose some heathens do that on
> purpose.

No, I simply had a draft of the cover letter ready, and when I decided
to include Vitaly's patches I didn't update the subject.

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode
  2020-05-29 19:04     ` Paolo Bonzini
@ 2020-05-29 20:02       ` Krish Sadhukhan
  0 siblings, 0 replies; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-29 20:02 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 12:04 PM, Paolo Bonzini wrote:
> On 29/05/20 20:10, Krish Sadhukhan wrote:
>>> Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart,
>>> since the map is not used elsewhere in the function.  There are
>>> just two calls, so move it there.
>> The last sentence sounds a bit incomplete.
> Good point---more precisely, "calls" should be "callers".  "It" refers
> to "unmapping".
>
>> Also, does it make sense to mention the reason why unmapping is not
>> required before we enter guest mode ?
> Unmapping is a host operation and is not visible to the guest; is this
> what you mean?


Yes, I was wondering whether we could mention it in the commit message...


-Krish

>
> Paolo
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit
  2020-05-29 15:39 ` [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit Paolo Bonzini
@ 2020-05-30  2:06   ` Krish Sadhukhan
  2020-05-30  5:10     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-30  2:06 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> The control state changes on every L2->L0 vmexit, and we will have to
> serialize it in the nested state.  So keep it up to date in svm->nested.ctl
> and just copy them back to the nested VMCB in nested_svm_vmexit.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 57 ++++++++++++++++++++++-----------------
>   arch/x86/kvm/svm/svm.c    |  5 +++-
>   arch/x86/kvm/svm/svm.h    |  1 +
>   3 files changed, 38 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 1e5f460b5540..921466eba556 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -234,6 +234,34 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
>   	svm->nested.ctl.iopm_base_pa  &= ~0x0fffULL;
>   }
>   
> +/*
> + * Synchronize fields that are written by the processor, so that
> + * they can be copied back into the nested_vmcb.
> + */
> +void sync_nested_vmcb_control(struct vcpu_svm *svm)
> +{
> +	u32 mask;
> +	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> +	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> +
> +	/* Only a few fields of int_ctl are written by the processor.  */
> +	mask = V_IRQ_MASK | V_TPR_MASK;
> +	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> +	    is_intercept(svm, SVM_EXIT_VINTR)) {
> +		/*
> +		 * In order to request an interrupt window, L0 is usurping
> +		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> +		 * even if it was clear in L1's VMCB.  Restoring it would be
> +		 * wrong.  However, in this case V_IRQ will remain true until
> +		 * interrupt_window_interception calls svm_clear_vintr and
> +		 * restores int_ctl.  We can just leave it aside.
> +		 */
> +		mask &= ~V_IRQ_MASK;
> +	}
> +	svm->nested.ctl.int_ctl        &= ~mask;
> +	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
> +}
> +
>   static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
>   {
>   	/* Load the nested guest state */
> @@ -471,6 +499,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	/* Exit Guest-Mode */
>   	leave_guest_mode(&svm->vcpu);
>   	svm->nested.vmcb = 0;
> +	WARN_ON_ONCE(svm->nested.nested_run_pending);
>   
>   	/* in case we halted in L2 */
>   	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
> @@ -497,8 +526,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	nested_vmcb->save.dr6    = svm->vcpu.arch.dr6;
>   	nested_vmcb->save.cpl    = vmcb->save.cpl;
>   
> -	nested_vmcb->control.int_ctl           = vmcb->control.int_ctl;
> -	nested_vmcb->control.int_vector        = vmcb->control.int_vector;


While it's not related to this patch, I am wondering if we need the 
following line in enter_svm_guest_mode():

     svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;


If int_vector is used only to trigger a #VMEXIT after we have decided to
inject a virtual interrupt, we probably don't need the vector value from the
nested vmcb.

>   	nested_vmcb->control.int_state         = vmcb->control.int_state;
>   	nested_vmcb->control.exit_code         = vmcb->control.exit_code;
>   	nested_vmcb->control.exit_code_hi      = vmcb->control.exit_code_hi;
> @@ -510,34 +537,16 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	if (svm->nrips_enabled)
>   		nested_vmcb->control.next_rip  = vmcb->control.next_rip;
>   
> -	/*
> -	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> -	 * to make sure that we do not lose injected events. So check event_inj
> -	 * here and copy it to exit_int_info if it is valid.
> -	 * Exit_int_info and event_inj can't be both valid because the case
> -	 * below only happens on a VMRUN instruction intercept which has
> -	 * no valid exit_int_info set.
> -	 */
> -	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
> -		struct vmcb_control_area *nc = &nested_vmcb->control;
> -
> -		nc->exit_int_info     = vmcb->control.event_inj;
> -		nc->exit_int_info_err = vmcb->control.event_inj_err;
> -	}
> -
> -	nested_vmcb->control.tlb_ctl           = 0;
> -	nested_vmcb->control.event_inj         = 0;
> -	nested_vmcb->control.event_inj_err     = 0;
> +	nested_vmcb->control.int_ctl           = svm->nested.ctl.int_ctl;
> +	nested_vmcb->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
> +	nested_vmcb->control.event_inj         = svm->nested.ctl.event_inj;
> +	nested_vmcb->control.event_inj_err     = svm->nested.ctl.event_inj_err;
>   
>   	nested_vmcb->control.pause_filter_count =
>   		svm->vmcb->control.pause_filter_count;
>   	nested_vmcb->control.pause_filter_thresh =
>   		svm->vmcb->control.pause_filter_thresh;
>   
> -	/* We always set V_INTR_MASKING and remember the old value in hflags */
> -	if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
> -		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
> -
>   	/* Restore the original control entries */
>   	copy_vmcb_control_area(&vmcb->control, &hsave->control);
>   
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 4122ba86bac2..b710e62ace16 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3427,7 +3427,10 @@ static fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   	sync_cr8_to_lapic(vcpu);
>   
>   	svm->next_rip = 0;
> -	svm->nested.nested_run_pending = 0;
> +	if (is_guest_mode(&svm->vcpu)) {
> +		sync_nested_vmcb_control(svm);
> +		svm->nested.nested_run_pending = 0;


I don't see nested_run_pending for nSVM in either the Linus tree or the KVM
tree. Is there a patch that adds it but hasn't been pushed yet?

> +	}
>   
>   	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index dd5418f20256..7e79f0af1204 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -394,6 +394,7 @@ int nested_svm_check_permissions(struct vcpu_svm *svm);
>   int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
>   			       bool has_error_code, u32 error_code);
>   int nested_svm_exit_special(struct vcpu_svm *svm);
> +void sync_nested_vmcb_control(struct vcpu_svm *svm);
>   
>   extern struct kvm_x86_nested_ops svm_nested_ops;
>   

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit
  2020-05-30  2:06   ` Krish Sadhukhan
@ 2020-05-30  5:10     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-05-30  5:10 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

On 30/05/20 04:06, Krish Sadhukhan wrote:
>>
>>   -    nested_vmcb->control.int_ctl           = vmcb->control.int_ctl;
>> -    nested_vmcb->control.int_vector        = vmcb->control.int_vector;
> 
> 
> While it's not related to this patch, I am wondering if we need the
> following line in enter_svm_guest_mode():
> 
>     svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
> 
> If int_vector is used only to trigger a #VMEXIT after we have decided to
> inject a virtual interrupt, we probably don't need the vector value from the
> nested vmcb.

KVM does not use int_vector, but other hypervisors sometimes trigger
virtual interrupts using int_ctl+int_vector instead of eventinj
(unlike KVM, they don't intercept EXIT_VINTR).  Hyper-V operates like that.
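
To make the difference concrete, here is a rough sketch (the vector is
arbitrary, V_INTR_PRIO/TPR handling is omitted, and the field names are
the ones from struct vmcb_control_area):

	/* KVM's approach: inject the event directly on the next VMRUN. */
	vmcb->control.event_inj = vector | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;

	/* int_ctl/int_vector approach (e.g. Hyper-V): request a virtual
	 * interrupt that the CPU delivers once the guest can accept it. */
	vmcb->control.int_ctl   |= V_IRQ_MASK;
	vmcb->control.int_vector = vector;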

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch
  2020-05-29 15:39 ` [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch Paolo Bonzini
@ 2020-05-31 23:11   ` Krish Sadhukhan
  2020-06-01  7:30     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-05-31 23:11 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> There is only one GIF flag for the whole processor, so make sure it is not clobbered
> when switching to L2 (in which case we also have to include the V_GIF_ENABLE_MASK,
> lest we confuse enable_gif/disable_gif/gif_set).  When going back, L1 could in
> theory have entered L2 without issuing a CLGI so make sure the svm_set_gif is
> done last, after svm->vmcb->control.int_ctl has been copied back from hsave.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 7e4a506828c9..6c7f0bffdf01 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -293,6 +293,7 @@ static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_v
>   
>   static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
>   {
> +	const u32 mask = V_INTR_MASKING_MASK | V_GIF_ENABLE_MASK | V_GIF_MASK;
>   	if (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE)
>   		nested_svm_init_mmu_context(&svm->vcpu);
>   
> @@ -308,7 +309,10 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
>   	svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset =
>   		svm->vcpu.arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
>   
> -	svm->vmcb->control.int_ctl             = svm->nested.ctl.int_ctl | V_INTR_MASKING_MASK;
> +	svm->vmcb->control.int_ctl             =
> +		(svm->nested.ctl.int_ctl & ~mask) |
> +		(svm->nested.hsave->control.int_ctl & mask);


If this is the very first VMRUN, do we have any int_ctl saved in hsave ?

> +
>   	svm->vmcb->control.virt_ext            = svm->nested.ctl.virt_ext;
>   	svm->vmcb->control.int_vector          = svm->nested.ctl.int_vector;
>   	svm->vmcb->control.int_state           = svm->nested.ctl.int_state;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME
  2020-05-29 15:39 ` [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME Paolo Bonzini
@ 2020-06-01  2:26   ` Krish Sadhukhan
  2020-06-01  7:28     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-06-01  2:26 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> According to the AMD manual, the effect of turning off EFER.SVME while a
> guest is running is undefined.  We make it leave guest mode immediately,
> similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL.


I see that svm_set_efer() is called in enter_svm_guest_mode() and 
nested_svm_vmexit(). In the VMRUN path, we have already checked 
EFER.SVME in nested_vmcb_checks(). So if it was not set, we wouldn't 
come to enter_svm_guest_mode(). Your fix is only for the #VMEXIT path then ?

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 16 ++++++++++++++++
>   arch/x86/kvm/svm/svm.c    | 10 ++++++++--
>   arch/x86/kvm/svm/svm.h    |  1 +
>   3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index bd3a89cd4070..369eca73fe3e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -618,6 +618,22 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	return 0;
>   }
>   
> +/*
> + * Forcibly leave nested mode in order to be able to reset the VCPU later on.
> + */
> +void svm_leave_nested(struct vcpu_svm *svm)
> +{
> +	if (is_guest_mode(&svm->vcpu)) {
> +		struct vmcb *hsave = svm->nested.hsave;
> +		struct vmcb *vmcb = svm->vmcb;
> +
> +		svm->nested.nested_run_pending = 0;
> +		leave_guest_mode(&svm->vcpu);
> +		copy_vmcb_control_area(&vmcb->control, &hsave->control);
> +		nested_svm_uninit_mmu_context(&svm->vcpu);
> +	}
> +}
> +
>   static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
>   {
>   	u32 offset, msr, value;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index bc08221f6743..b4db9a980469 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -265,6 +265,7 @@ static int get_npt_level(struct kvm_vcpu *vcpu)
>   
>   void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>   {
> +	struct vcpu_svm *svm = to_svm(vcpu);
>   	vcpu->arch.efer = efer;
>   
>   	if (!npt_enabled) {
> @@ -275,8 +276,13 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>   			efer &= ~EFER_LME;
>   	}
>   
> -	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
> -	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
> +	if (!(efer & EFER_SVME)) {
> +		svm_leave_nested(svm);
> +		svm_set_gif(svm, true);
> +	}
> +
> +	svm->vmcb->save.efer = efer | EFER_SVME;
> +	mark_dirty(svm->vmcb, VMCB_CR);
>   }
>   
>   static int is_external_interrupt(u32 info)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index be8e830f83fa..6ac4c00a5d82 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -389,6 +389,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
>   
>   void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>   			  struct vmcb *nested_vmcb);
> +void svm_leave_nested(struct vcpu_svm *svm);
>   int nested_svm_vmrun(struct vcpu_svm *svm);
>   void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
>   int nested_svm_vmexit(struct vcpu_svm *svm);

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME
  2020-06-01  2:26   ` Krish Sadhukhan
@ 2020-06-01  7:28     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-06-01  7:28 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

On 01/06/20 04:26, Krish Sadhukhan wrote:
> On 5/29/20 8:39 AM, Paolo Bonzini wrote:
>> According to the AMD manual, the effect of turning off EFER.SVME while a
>> guest is running is undefined.  We make it leave guest mode immediately,
>> similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL.
> 
> I see that svm_set_efer() is called in enter_svm_guest_mode() and
> nested_svm_vmexit(). In the VMRUN path, we have already checked
> EFER.SVME in nested_vmcb_checks(). So if it was not set, we wouldn't
> come to enter_svm_guest_mode(). Your fix is only for the #VMEXIT path
> then ?

No, it's for KVM_SET_MSR.

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch
  2020-05-31 23:11   ` Krish Sadhukhan
@ 2020-06-01  7:30     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-06-01  7:30 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

On 01/06/20 01:11, Krish Sadhukhan wrote:
>>
>> +    svm->vmcb->control.int_ctl             =
>> +        (svm->nested.ctl.int_ctl & ~mask) |
>> +        (svm->nested.hsave->control.int_ctl & mask);
> 
> 
> If this is the very first VMRUN, do we have any int_ctl saved in hsave ?

Yes, copy_vmcb_control_area(hsave, vmcb) is called before
enter_svm_guest_mode (which calls nested_prepare_vmcb_control).

Paolo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE
  2020-05-29 15:39 ` [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Paolo Bonzini
@ 2020-06-02  0:11   ` Krish Sadhukhan
  2020-06-04 14:47     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Krish Sadhukhan @ 2020-06-02  0:11 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm


On 5/29/20 8:39 AM, Paolo Bonzini wrote:
> Similar to VMX, the state that is captured through the currently available
> IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was
> running at the moment when the process was interrupted to save its state.
>
> In particular, the SVM-specific state for nested virtualization includes
> the L1 saved state (including the interrupt flag), the cached L2 controls,
> and the GIF.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/include/uapi/asm/kvm.h |  17 +++-
>   arch/x86/kvm/cpuid.h            |   5 ++
>   arch/x86/kvm/svm/nested.c       | 147 ++++++++++++++++++++++++++++++++
>   arch/x86/kvm/svm/svm.c          |   1 +
>   arch/x86/kvm/vmx/nested.c       |   5 --
>   arch/x86/kvm/x86.c              |   3 +-
>   6 files changed, 171 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 3f3f780c8c65..12075a9de1c1 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -385,18 +385,22 @@ struct kvm_sync_regs {
>   #define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
>   
>   #define KVM_STATE_NESTED_FORMAT_VMX	0
> -#define KVM_STATE_NESTED_FORMAT_SVM	1	/* unused */
> +#define KVM_STATE_NESTED_FORMAT_SVM	1
>   
>   #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
>   #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
>   #define KVM_STATE_NESTED_EVMCS		0x00000004
>   #define KVM_STATE_NESTED_MTF_PENDING	0x00000008
> +#define KVM_STATE_NESTED_GIF_SET	0x00000100
>   
>   #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
>   #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
>   
>   #define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
>   
> +#define KVM_STATE_NESTED_SVM_VMCB_SIZE	0x1000
> +
> +
>   struct kvm_vmx_nested_state_data {
>   	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
>   	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
> @@ -411,6 +415,15 @@ struct kvm_vmx_nested_state_hdr {
>   	} smm;
>   };
>   
> +struct kvm_svm_nested_state_data {
> +	/* Save area only used if KVM_STATE_NESTED_RUN_PENDING.  */
> +	__u8 vmcb12[KVM_STATE_NESTED_SVM_VMCB_SIZE];
> +};
> +
> +struct kvm_svm_nested_state_hdr {
> +	__u64 vmcb_pa;
> +};
> +
>   /* for KVM_CAP_NESTED_STATE */
>   struct kvm_nested_state {
>   	__u16 flags;
> @@ -419,6 +432,7 @@ struct kvm_nested_state {
>   
>   	union {
>   		struct kvm_vmx_nested_state_hdr vmx;
> +		struct kvm_svm_nested_state_hdr svm;
>   
>   		/* Pad the header to 128 bytes.  */
>   		__u8 pad[120];
> @@ -431,6 +445,7 @@ struct kvm_nested_state {
>   	 */
>   	union {
>   		struct kvm_vmx_nested_state_data vmx[0];
> +		struct kvm_svm_nested_state_data svm[0];
>   	} data;
>   };
>   
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 63a70f6a3df3..05434cd9342f 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -303,4 +303,9 @@ static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature)
>   		kvm_cpu_cap_set(x86_feature);
>   }
>   
> +static inline bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
> +{
> +	return PAGE_ALIGNED(gpa) && !(gpa >> cpuid_maxphyaddr(vcpu));
> +}
> +
>   #endif
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index c712fe577029..6b1049148c1b 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -25,6 +25,7 @@
>   #include "trace.h"
>   #include "mmu.h"
>   #include "x86.h"
> +#include "cpuid.h"
>   #include "lapic.h"
>   #include "svm.h"
>   
> @@ -238,6 +239,8 @@ static void load_nested_vmcb_control(struct vcpu_svm *svm,
>   {
>   	copy_vmcb_control_area(&svm->nested.ctl, control);
>   
> +	/* Copy it here because nested_svm_check_controls will check it.  */
> +	svm->nested.ctl.asid           = control->asid;
>   	svm->nested.ctl.msrpm_base_pa &= ~0x0fffULL;
>   	svm->nested.ctl.iopm_base_pa  &= ~0x0fffULL;
>   }
> @@ -930,6 +933,150 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
>   	return NESTED_EXIT_CONTINUE;
>   }
>   
> +static int svm_get_nested_state(struct kvm_vcpu *vcpu,
> +				struct kvm_nested_state __user *user_kvm_nested_state,
> +				u32 user_data_size)
> +{
> +	struct vcpu_svm *svm;
> +	struct kvm_nested_state kvm_state = {
> +		.flags = 0,
> +		.format = KVM_STATE_NESTED_FORMAT_SVM,
> +		.size = sizeof(kvm_state),
> +	};
> +	struct vmcb __user *user_vmcb = (struct vmcb __user *)
> +		&user_kvm_nested_state->data.svm[0];
> +
> +	if (!vcpu)
> +		return kvm_state.size + KVM_STATE_NESTED_SVM_VMCB_SIZE;
> +
> +	svm = to_svm(vcpu);
> +
> +	if (user_data_size < kvm_state.size)
> +		goto out;
> +
> +	/* First fill in the header and copy it out.  */
> +	if (is_guest_mode(vcpu)) {
> +		kvm_state.hdr.svm.vmcb_pa = svm->nested.vmcb;
> +		kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
> +		kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
> +
> +		if (svm->nested.nested_run_pending)
> +			kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
> +	}
> +
> +	if (gif_set(svm))
> +		kvm_state.flags |= KVM_STATE_NESTED_GIF_SET;
> +
> +	if (copy_to_user(user_kvm_nested_state, &kvm_state, sizeof(kvm_state)))
> +		return -EFAULT;
> +
> +	if (!is_guest_mode(vcpu))
> +		goto out;
> +
> +	/*
> +	 * Copy over the full size of the VMCB rather than just the size
> +	 * of the structs.
> +	 */
> +	if (clear_user(user_vmcb, KVM_STATE_NESTED_SVM_VMCB_SIZE))
> +		return -EFAULT;
> +	if (copy_to_user(&user_vmcb->control, &svm->nested.ctl,
> +			 sizeof(user_vmcb->control)))
> +		return -EFAULT;
> +	if (copy_to_user(&user_vmcb->save, &svm->nested.hsave->save,
> +			 sizeof(user_vmcb->save)))
> +		return -EFAULT;
> +
> +out:
> +	return kvm_state.size;
> +}
> +
> +static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> +				struct kvm_nested_state __user *user_kvm_nested_state,
> +				struct kvm_nested_state *kvm_state)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *hsave = svm->nested.hsave;
> +	struct vmcb __user *user_vmcb = (struct vmcb __user *)
> +		&user_kvm_nested_state->data.svm[0];
> +	struct vmcb_control_area ctl;
> +	struct vmcb_save_area save;
> +	u32 cr0;
> +
> +	if (kvm_state->format != KVM_STATE_NESTED_FORMAT_SVM)
> +		return -EINVAL;
> +
> +	if (kvm_state->flags & ~(KVM_STATE_NESTED_GUEST_MODE |
> +				 KVM_STATE_NESTED_RUN_PENDING |
> +				 KVM_STATE_NESTED_GIF_SET))
> +		return -EINVAL;
> +
> +	/*
> +	 * If in guest mode, vcpu->arch.efer actually refers to the L2 guest's
> +	 * EFER.SVME, but EFER.SVME still has to be 1 for VMRUN to succeed.
> +	 */
> +	if (!(vcpu->arch.efer & EFER_SVME)) {
> +		/* GIF=1 and no guest mode are required if SVME=0.  */
> +		if (kvm_state->flags != KVM_STATE_NESTED_GIF_SET)
> +			return -EINVAL;
> +	}
> +
> +	/* SMM temporarily disables SVM, so we cannot be in guest mode.  */
> +	if (is_smm(vcpu) && (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
> +		return -EINVAL;
> +
> +	if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) {


Should this be done at the beginning of the function? If this flag 
isn't set, we probably don't want to come this far.

> +		svm_leave_nested(svm);
> +		goto out_set_gif;
> +	}
> +
> +	if (!page_address_valid(vcpu, kvm_state->hdr.svm.vmcb_pa))
> +		return -EINVAL;
> +	if (kvm_state->size < sizeof(*kvm_state) + KVM_STATE_NESTED_SVM_VMCB_SIZE)
> +		return -EINVAL;
> +	if (copy_from_user(&ctl, &user_vmcb->control, sizeof(ctl)))
> +		return -EFAULT;
> +	if (copy_from_user(&save, &user_vmcb->save, sizeof(save)))
> +		return -EFAULT;
> +
> +	if (!nested_vmcb_check_controls(&ctl))
> +		return -EINVAL;
> +
> +	/*
> +	 * Processor state contains L2 state.  Check that it is
> +	 * valid for guest mode (see nested_vmcb_checks).
> +	 */
> +	cr0 = kvm_read_cr0(vcpu);
> +        if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
> +                return -EINVAL;


Does it make sense to create a wrapper for the CR0 checks? We have 
these checks in nested_vmcb_check_controls() also.

> +
> +	/*
> +	 * Validate host state saved from before VMRUN (see
> +	 * nested_svm_check_permissions).
> +	 * TODO: validate reserved bits for all saved state.
> +	 */
> +	if (!(save.cr0 & X86_CR0_PG))
> +		return -EINVAL;
> +
> +	/*
> +	 * All checks done, we can enter guest mode.  L1 control fields
> +	 * come from the nested save state.  Guest state is already
> +	 * in the registers, the save area of the nested state instead
> +	 * contains saved L1 state.
> +	 */
> +	copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
> +	hsave->save = save;
> +
> +	svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
> +	load_nested_vmcb_control(svm, &ctl);
> +	nested_prepare_vmcb_control(svm);
> +
> +out_set_gif:
> +	svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));
> +	return 0;
> +}
> +
>   struct kvm_x86_nested_ops svm_nested_ops = {
>   	.check_events = svm_check_nested_events,
> +	.get_state = svm_get_nested_state,
> +	.set_state = svm_set_nested_state,
>   };
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b4db9a980469..3871bfb40594 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1193,6 +1193,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>   		svm->avic_is_running = true;
>   
>   	svm->nested.hsave = page_address(hsave_page);
> +	clear_page(svm->nested.hsave);
>   
>   	svm->msrpm = page_address(msrpm_pages);
>   	svm_vcpu_init_msrpm(svm->msrpm);
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 51ebb60e1533..106fc6fceb97 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -437,11 +437,6 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
>   	}
>   }
>   
> -static bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
> -{
> -	return PAGE_ALIGNED(gpa) && !(gpa >> cpuid_maxphyaddr(vcpu));
> -}
> -
>   static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
>   					       struct vmcs12 *vmcs12)
>   {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f0fa610bed91..d4aa7dc662d5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4628,7 +4628,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   
>   		if (kvm_state.flags &
>   		    ~(KVM_STATE_NESTED_RUN_PENDING | KVM_STATE_NESTED_GUEST_MODE
> -		      | KVM_STATE_NESTED_EVMCS | KVM_STATE_NESTED_MTF_PENDING))
> +		      | KVM_STATE_NESTED_EVMCS | KVM_STATE_NESTED_MTF_PENDING
> +		      | KVM_STATE_NESTED_GIF_SET))
>   			break;
>   
>   		/* nested_run_pending implies guest_mode.  */
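
As far as consuming the new format goes, the !vcpu early return in
svm_get_nested_state() is what the KVM_CAP_NESTED_STATE capability check
uses to report the required buffer size.  A minimal sketch of the
userspace side follows (hypothetical helper name, most error handling
omitted; this is not code from the series):

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

static struct kvm_nested_state *save_nested_state(int kvm_fd, int vcpu_fd)
{
	/* KVM_CHECK_EXTENSION returns the maximum state size in bytes. */
	int size = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
	struct kvm_nested_state *state;

	if (size < (int)sizeof(*state))
		return NULL;			/* nested state not supported */

	state = calloc(1, size);
	if (!state)
		return NULL;

	state->size = size;			/* tell KVM how large the buffer is */
	if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0) {
		free(state);
		return NULL;
	}

	/* Restore later with: ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state) */
	return state;
}

On SVM, data.svm[0].vmcb12 only carries a captured VMCB when
KVM_STATE_NESTED_GUEST_MODE is set, matching the is_guest_mode() branch
in svm_get_nested_state() above.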


* Re: [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE
  2020-06-02  0:11   ` Krish Sadhukhan
@ 2020-06-04 14:47     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-06-04 14:47 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm

Sorry I missed this.

On 02/06/20 02:11, Krish Sadhukhan wrote:
>>
>> +
>> +    /* SMM temporarily disables SVM, so we cannot be in guest mode.  */
>> +    if (is_smm(vcpu) && (kvm_state->flags &
>> KVM_STATE_NESTED_GUEST_MODE))
>> +        return -EINVAL;
>> +
>> +    if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) {
> 
> 
> Should this be done at the beginning of the function? If this flag
> isn't set, we probably don't want to come this far.

Up to that point in the function we have only done consistency checks.
Those have to be done no matter what (for example, checking that GIF=1
if SVME=0).

>> +        svm_leave_nested(svm);
>> +        goto out_set_gif;
>> +    }
>> +
>> +    if (!page_address_valid(vcpu, kvm_state->hdr.svm.vmcb_pa))
>> +        return -EINVAL;
>> +    if (kvm_state->size < sizeof(*kvm_state) +
>> KVM_STATE_NESTED_SVM_VMCB_SIZE)
>> +        return -EINVAL;
>> +    if (copy_from_user(&ctl, &user_vmcb->control, sizeof(ctl)))
>> +        return -EFAULT;
>> +    if (copy_from_user(&save, &user_vmcb->save, sizeof(save)))
>> +        return -EFAULT;
>> +
>> +    if (!nested_vmcb_check_controls(&ctl))
>> +        return -EINVAL;
>> +
>> +    /*
>> +     * Processor state contains L2 state.  Check that it is
>> +     * valid for guest mode (see nested_vmcb_checks).
>> +     */
>> +    cr0 = kvm_read_cr0(vcpu);
>> +        if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
>> +                return -EINVAL;
> 
> 
>> Does it make sense to create a wrapper for the CR0 checks? We have
> these checks in nested_vmcb_check_controls() also.

Not in nested_vmcb_check_controls() (rather in nested_vmcb_checks(), as
mentioned in the comment).

If there were more checks, a wrapper would certainly make sense.  Right
now, however, there are only two checks in svm_set_nested_state(), and
they come from two different functions, so I chose to duplicate them.

Paolo
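
Should more call sites appear, such a wrapper would be little more than
the hypothetical helper below (the name is made up; it mirrors the CD/NW
check that nested_vmcb_checks() and svm_set_nested_state() both perform):

/*
 * Hypothetical helper, not part of the series: CR0.CD = 0 together with
 * CR0.NW = 1 is an architecturally illegal combination, so VMRUN (and
 * therefore restoring L2 state) must reject it.
 */
static bool nested_svm_cr0_valid(u64 cr0)
{
	return (cr0 & X86_CR0_CD) || !(cr0 & X86_CR0_NW);
}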



* Re: [PATCH 19/30] KVM: nSVM: extract svm_set_gif
  2020-05-29 15:39 ` [PATCH 19/30] KVM: nSVM: extract svm_set_gif Paolo Bonzini
@ 2020-06-05 20:33   ` Qian Cai
  2020-06-08 11:11     ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Qian Cai @ 2020-06-05 20:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, torvalds

On Fri, May 29, 2020 at 11:39:23AM -0400, Paolo Bonzini wrote:
> Extract the code that is needed to implement CLGI and STGI,
> so that we can run it from VMRUN and vmexit (and in the future,
> KVM_SET_NESTED_STATE).  Skip the request for KVM_REQ_EVENT unless needed,
> subsuming the evaluate_pending_interrupts optimization that is found
> in enter_svm_guest_mode.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/irq.c        |  1 +
>  arch/x86/kvm/svm/nested.c | 22 ++---------------
>  arch/x86/kvm/svm/svm.c    | 51 ++++++++++++++++++++++++++-------------
>  arch/x86/kvm/svm/svm.h    |  1 +
>  4 files changed, 38 insertions(+), 37 deletions(-)
[]
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1981,6 +1981,38 @@ static int vmrun_interception(struct vcpu_svm *svm)
>  	return nested_svm_vmrun(svm);
>  }
>  
> +void svm_set_gif(struct vcpu_svm *svm, bool value)
> +{
> +	if (value) {
> +		/*
> +		 * If VGIF is enabled, the STGI intercept is only added to
> +		 * detect the opening of the SMI/NMI window; remove it now.
> +		 * Likewise, clear the VINTR intercept, we will set it
> +		 * again while processing KVM_REQ_EVENT if needed.
> +		 */
> +		if (vgif_enabled(svm))
> +			clr_intercept(svm, INTERCEPT_STGI);
> +		if (is_intercept(svm, SVM_EXIT_VINTR))

A simple qemu-kvm invocation triggers the warning below.  (Looks like
the patch has already been pulled into mainline.)

# qemu-kvm -name ubuntu-18.04-server-cloudimg -cpu host -smp 2 -m 2G \
  -hda ubuntu-18.04-server-cloudimg.qcow2 \
  -cdrom ubuntu-18.04-server-cloudimg.iso -nic user,hostfwd=tcp::2222-:22 \
  -nographic -device vfio-pci,host=0000:04:0a.0

[ 1362.284812][ T2195] ================================================================================
[ 1362.294789][ T2195] UBSAN: shift-out-of-bounds in arch/x86/kvm/svm/svm.h:316:47
[ 1362.302209][ T2195] shift exponent 100 is too large for 64-bit type 'long long unsigned int'
[ 1362.310715][ T2195] CPU: 51 PID: 2195 Comm: qemu-kvm Not tainted 5.7.0-next-20200605 #6
[ 1362.319244][ T2195] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[ 1362.328530][ T2195] Call Trace:
[ 1362.331710][ T2195]  dump_stack+0xa7/0xea
[ 1362.335758][ T2195]  ubsan_epilogue+0x9/0x45
[ 1362.340070][ T2195]  __ubsan_handle_shift_out_of_bounds.cold.13+0x14/0x98
[ 1362.347386][ T2195]  ? __kasan_check_write+0x14/0x20
[ 1362.352405][ T2195]  ? set_msr_interception+0x1b8/0x300 [kvm_amd]
[ 1362.358558][ T2195]  svm_set_gif.cold.64+0x16/0xd1 [kvm_amd]
[ 1362.364276][ T2195]  svm_set_efer+0xbc/0xc0 [kvm_amd]
svm_set_efer at arch/x86/kvm/svm/svm.c:281
[ 1362.369866][ T2195]  init_vmcb+0x107c/0x1b80 [kvm_amd]
[ 1362.375056][ T2195]  svm_create_vcpu+0x237/0x360 [kvm_amd]
[ 1362.380677][ T2195]  kvm_arch_vcpu_create+0x490/0x5f0 [kvm]
[ 1362.386381][ T2195]  kvm_vm_ioctl+0x10c5/0x17f0 [kvm]
[ 1362.391561][ T2195]  ? kvm_unregister_device_ops+0xd0/0xd0 [kvm]
[ 1362.398118][ T2195]  ? validate_chain+0xab/0x1b30
[ 1362.402864][ T2195]  ? __kasan_check_read+0x11/0x20
[ 1362.407784][ T2195]  ? check_chain_key+0x1df/0x2e0
[ 1362.412615][ T2195]  ? __lock_acquire+0x74c/0xc10
[ 1362.417363][ T2195]  ? match_held_lock+0x20/0x2f0
[ 1362.422593][ T2195]  ? check_chain_key+0x1df/0x2e0
[ 1362.427425][ T2195]  ? find_held_lock+0xca/0xf0
[ 1362.431997][ T2195]  ? __fget_files+0x172/0x270
[ 1362.436565][ T2195]  ? lock_downgrade+0x3e0/0x3e0
[ 1362.441310][ T2195]  ? rcu_read_lock_held+0xac/0xc0
[ 1362.446744][ T2195]  ? rcu_read_lock_sched_held+0xe0/0xe0
[ 1362.452190][ T2195]  ? __fget_files+0x18c/0x270
[ 1362.456759][ T2195]  ? __fget_light+0xf2/0x110
[ 1362.461242][ T2195]  ksys_ioctl+0x26e/0xc60
[ 1362.465464][ T2195]  ? generic_block_fiemap+0x70/0x70
[ 1362.471024][ T2195]  ? find_held_lock+0xca/0xf0
[ 1362.475597][ T2195]  ? __task_pid_nr_ns+0x145/0x290
[ 1362.480517][ T2195]  ? check_flags.part.26+0x86/0x230
[ 1362.485613][ T2195]  ? __kasan_check_read+0x11/0x20
[ 1362.490535][ T2195]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1362.497009][ T2195]  ? do_syscall_64+0x23/0x340
[ 1362.501581][ T2195]  ? rcu_read_lock_sched_held+0xac/0xe0
[ 1362.507023][ T2195]  ? mark_held_locks+0x34/0xb0
[ 1362.511679][ T2195]  ? do_syscall_64+0x29/0x340
[ 1362.516254][ T2195]  __x64_sys_ioctl+0x43/0x4c
[ 1362.521177][ T2195]  do_syscall_64+0x64/0x340
[ 1362.525573][ T2195]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1362.531366][ T2195] RIP: 0033:0x7f8c609cb87b
[ 1362.535676][ T2195] Code: Bad RIP value.
[ 1362.539633][ T2195] RSP: 002b:00007f8c517fb6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1362.548399][ T2195] RAX: ffffffffffffffda RBX: 0000564c201f9110 RCX: 00007f8c609cb87b
[ 1362.556287][ T2195] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000a
[ 1362.564178][ T2195] RBP: 0000564c201f9110 R08: 0000564c1e47cd50 R09: 0000564c201f9110
[ 1362.572455][ T2195] R10: 0000564c1ebfd680 R11: 0000000000000246 R12: 0000564c201954d0
[ 1362.580344][ T2195] R13: 00007ffe3a592d6f R14: 00007ffe3a592e00 R15: 00007f8c517fb840
[ 1362.588287][ T2195] ================================================================================

> +			svm_clear_vintr(svm);
> +
> +		enable_gif(svm);
> +		if (svm->vcpu.arch.smi_pending ||
> +		    svm->vcpu.arch.nmi_pending ||
> +		    kvm_cpu_has_injectable_intr(&svm->vcpu))
> +			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
> +	} else {
> +		disable_gif(svm);
> +
> +		/*
> +		 * After a CLGI no interrupts should come.  But if vGIF is
> +		 * in use, we still rely on the VINTR intercept (rather than
> +		 * STGI) to detect an open interrupt window.
> +		*/
> +		if (!vgif_enabled(svm))
> +			svm_clear_vintr(svm);
> +	}
> +}
> +


* Re: [PATCH 19/30] KVM: nSVM: extract svm_set_gif
  2020-06-05 20:33   ` Qian Cai
@ 2020-06-08 11:11     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2020-06-08 11:11 UTC (permalink / raw)
  To: Qian Cai; +Cc: linux-kernel, kvm, torvalds

On 05/06/20 22:33, Qian Cai wrote:
>> +	if (value) {
>> +		/*
>> +		 * If VGIF is enabled, the STGI intercept is only added to
>> +		 * detect the opening of the SMI/NMI window; remove it now.
>> +		 * Likewise, clear the VINTR intercept, we will set it
>> +		 * again while processing KVM_REQ_EVENT if needed.
>> +		 */
>> +		if (vgif_enabled(svm))
>> +			clr_intercept(svm, INTERCEPT_STGI);
>> +		if (is_intercept(svm, SVM_EXIT_VINTR))
> A simple qemu-kvm invocation triggers the warning below.  (Looks like
> the patch has already been pulled into mainline.)

You're right, that should have been INTERCEPT_VINTR.  I'll post a fix.

Thanks,

Paolo
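
For reference, SVM_EXIT_VINTR is exit code 0x64 (decimal 100), while
is_intercept() shifts a 64-bit intercept bitmap by its argument -- hence
the "shift exponent 100" in the UBSAN splat above.  Given that, the fix
being referred to presumably boils down to passing the intercept bit
rather than the exit code, roughly:

	/*
	 * Sketch of the correction (not the patch that was eventually
	 * posted): INTERCEPT_VINTR is bit 4 of the intercept bitmap,
	 * whereas SVM_EXIT_VINTR (0x64) is far out of range for the shift.
	 */
	if (is_intercept(svm, INTERCEPT_VINTR))
		svm_clear_vintr(svm);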



end of thread

Thread overview: 50+ messages
2020-05-29 15:39 [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Paolo Bonzini
2020-05-29 15:39 ` [PATCH 01/30] KVM: x86: track manually whether an event has been injected Paolo Bonzini
2020-05-29 15:39 ` [PATCH 02/30] KVM: x86: enable event window in inject_pending_event Paolo Bonzini
2020-05-29 15:39 ` [PATCH 03/30] KVM: nSVM: inject exceptions via svm_check_nested_events Paolo Bonzini
2020-05-29 15:39 ` [PATCH 04/30] KVM: nSVM: remove exit_required Paolo Bonzini
2020-05-29 15:39 ` [PATCH 05/30] KVM: nSVM: correctly inject INIT vmexits Paolo Bonzini
2020-05-29 15:39 ` [PATCH 06/30] KVM: SVM: always update CR3 in VMCB Paolo Bonzini
2020-05-29 17:41   ` Krish Sadhukhan
2020-05-29 17:56     ` Sean Christopherson
2020-05-29 15:39 ` [PATCH 07/30] KVM: nVMX: always update CR3 in VMCS Paolo Bonzini
2020-05-29 15:39 ` [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode Paolo Bonzini
2020-05-29 18:10   ` Krish Sadhukhan
2020-05-29 19:04     ` Paolo Bonzini
2020-05-29 20:02       ` Krish Sadhukhan
2020-05-29 15:39 ` [PATCH 09/30] KVM: nSVM: extract load_nested_vmcb_control Paolo Bonzini
2020-05-29 15:39 ` [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run Paolo Bonzini
2020-05-29 18:27   ` Krish Sadhukhan
2020-05-29 19:02     ` Paolo Bonzini
2020-05-29 15:39 ` [PATCH 11/30] KVM: nSVM: move MMU setup to nested_prepare_vmcb_control Paolo Bonzini
2020-05-29 15:39 ` [PATCH 12/30] KVM: nSVM: clean up tsc_offset update Paolo Bonzini
2020-05-29 15:39 ` [PATCH 13/30] KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area Paolo Bonzini
2020-05-29 15:39 ` [PATCH 14/30] KVM: nSVM: remove trailing padding for struct vmcb_control_area Paolo Bonzini
2020-05-29 15:39 ` [PATCH 15/30] KVM: nSVM: save all control fields in svm->nested Paolo Bonzini
2020-05-29 15:39 ` [PATCH 16/30] KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR Paolo Bonzini
2020-05-29 15:39 ` [PATCH 17/30] KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit Paolo Bonzini
2020-05-30  2:06   ` Krish Sadhukhan
2020-05-30  5:10     ` Paolo Bonzini
2020-05-29 15:39 ` [PATCH 18/30] KVM: nSVM: remove unnecessary if Paolo Bonzini
2020-05-29 15:39 ` [PATCH 19/30] KVM: nSVM: extract svm_set_gif Paolo Bonzini
2020-06-05 20:33   ` Qian Cai
2020-06-08 11:11     ` Paolo Bonzini
2020-05-29 15:39 ` [PATCH 20/30] KVM: SVM: preserve VGIF across VMCB switch Paolo Bonzini
2020-05-31 23:11   ` Krish Sadhukhan
2020-06-01  7:30     ` Paolo Bonzini
2020-05-29 15:39 ` [PATCH 21/30] KVM: nSVM: synthesize correct EXITINTINFO on vmexit Paolo Bonzini
2020-05-29 15:39 ` [PATCH 22/30] KVM: nSVM: remove HF_VINTR_MASK Paolo Bonzini
2020-05-29 15:39 ` [PATCH 23/30] KVM: nSVM: remove HF_HIF_MASK Paolo Bonzini
2020-05-29 15:39 ` [PATCH 24/30] KVM: nSVM: split nested_vmcb_check_controls Paolo Bonzini
2020-05-29 15:39 ` [PATCH 25/30] KVM: nSVM: leave guest mode when clearing EFER.SVME Paolo Bonzini
2020-06-01  2:26   ` Krish Sadhukhan
2020-06-01  7:28     ` Paolo Bonzini
2020-05-29 15:39 ` [PATCH 26/30] KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu Paolo Bonzini
2020-05-29 15:39 ` [PATCH 27/30] selftests: kvm: introduce cpu_has_svm() check Paolo Bonzini
2020-05-29 15:39 ` [PATCH 28/30] selftests: kvm: add a SVM version of state-test Paolo Bonzini
2020-05-29 15:39 ` [PATCH 29/30] selftests: kvm: fix smm test on SVM Paolo Bonzini
2020-05-29 15:39 ` [PATCH 30/30] KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Paolo Bonzini
2020-06-02  0:11   ` Krish Sadhukhan
2020-06-04 14:47     ` Paolo Bonzini
2020-05-29 17:59 ` [PATCH v3 00/28] KVM: nSVM: event fixes and migration support Sean Christopherson
2020-05-29 19:07   ` Paolo Bonzini
