linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting
@ 2021-12-13 10:46 Maxim Levitsky
  2021-12-13 10:46 ` [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF Maxim Levitsky
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

This patch series aims to lift the long-standing restriction of
using AVIC only when nested virtualization is not exposed to the
guest.

Notes about specific patches:

Patch 1 - this is an unrelated fix to KVM for a corner case I found
while writing a unit test for the feature.

Patch 2 addresses the fact that AVIC appears to be disabled
in CPUID on several Milan systems I am testing on.

This adds a workaround (with a big warning and a kernel taint) to enable
it anyway if you really know what you are doing.

Patch 3 is from Paolo and was written after our long discussion about
the various races that AVIC inhibition is subject to. Thanks Paolo!
It replaces two patches I sent in the previous version, which attempted
to fix the same issue but weren't quite right.

Patch 4 is more or less the same as patch 5 in V1, but with proper
justification.

Patch 5 is the patch that adds the AVIC co-existence itself; in this
version I fixed (and partially tested) it with regard to AVIC inhibition
due to the interrupt window.

Everything was tested on a Zen3 (Milan) machine.

On Zen2 machines, erratum #1235 makes my tests fail quite fast.
For general use, though, this erratum doesn't usually cause
long hangs.

Best regards,
	Maxim Levitsky

Maxim Levitsky (5):
  KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets
    L2 control EFLAGS.IF
  KVM: SVM: allow to force AVIC to be enabled
  KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when
    inhibiting it
  KVM: SVM: allow AVIC to co-exist with a nested guest running

 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 ++-
 arch/x86/kvm/lapic.c               |  5 +-
 arch/x86/kvm/svm/avic.c            | 91 +++++++++++++++++++-----------
 arch/x86/kvm/svm/nested.c          | 11 ++--
 arch/x86/kvm/svm/svm.c             | 51 +++++++++++------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/x86.c                 | 17 +++++-
 8 files changed, 125 insertions(+), 59 deletions(-)

-- 
2.26.3




* [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
  2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
@ 2021-12-13 10:46 ` Maxim Levitsky
  2021-12-13 11:34   ` Paolo Bonzini
  2021-12-13 10:46 ` [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

Fix a corner case in which the L1 hypervisor intercepts interrupts
(INTERCEPT_INTR) and either doesn't use virtual interrupt masking
(V_INTR_MASKING) or enters a nested guest with EFLAGS.IF disabled
prior to the entry.

In this case, despite the fact that L1 intercepts the interrupts,
KVM still needs to set up an interrupt window to wait before it
can deliver the INTR vmexit.

Currently, KVM instead enters an endless loop of 'req_immediate_exit'.

Note that on VMX this case is impossible, as there is only the
'vmexit on external interrupts' execution control, which is either set,
in which case both the host's and the guest's EFLAGS.IF are ignored,
or clear, in which case no VM exit is delivered.
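
To make the intended semantics easier to see, here is a minimal
stand-alone sketch of the return contract (illustration only: the
booleans below are stubs standing in for the real KVM/SVM state
queries, and the logic is meant to be equivalent to the hunk below):

#include <stdbool.h>
#include <errno.h>

static bool nested_run_pending;	/* svm->nested.nested_run_pending */
static bool intr_blocked;	/* svm_interrupt_blocked(vcpu) */
static bool guest_mode;		/* is_guest_mode(vcpu) */
static bool exit_on_intr;	/* nested_exit_on_intr(svm) */

/*
 * -EBUSY: injection is not possible right now, retry later
 *  0:     the interrupt is blocked, an interrupt window is needed
 *         (even when the INTR is destined to become an L1 vmexit)
 *  1:     the interrupt can be injected
 */
static int interrupt_allowed(bool for_injection)
{
	if (nested_run_pending)
		return -EBUSY;

	if (intr_blocked)
		return 0;

	/* An IRQ must VM-exit to L1, not be injected into L2 */
	if (for_injection && guest_mode && exit_on_intr)
		return -EBUSY;

	return 1;
}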

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e57e6857e0630..c9668a3b51011 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3372,17 +3372,21 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	bool blocked;
+
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
+	blocked = svm_interrupt_blocked(vcpu);
+
 	/*
 	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
 	 * e.g. if the IRQ arrived asynchronously after checking nested events.
 	 */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
-		return -EBUSY;
-
-	return !svm_interrupt_blocked(vcpu);
+		return !blocked ? -EBUSY : 0;
+	else
+		return !blocked;
 }
 
 static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
-- 
2.26.3



* [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled
  2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
  2021-12-13 10:46 ` [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF Maxim Levitsky
@ 2021-12-13 10:46 ` Maxim Levitsky
  2022-01-04 22:25   ` Sean Christopherson
  2021-12-13 10:46 ` [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

Apparently on some systems AVIC is disabled in CPUID but still usable.

Allow the user to override the CPUID if the user is willing to
take the risk.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c9668a3b51011..468cc385c35f0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -206,6 +206,9 @@ module_param(tsc_scaling, int, 0444);
 static bool avic;
 module_param(avic, bool, 0444);
 
+static bool force_avic;
+module_param_unsafe(force_avic, bool, 0444);
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -4656,10 +4659,14 @@ static __init int svm_hardware_setup(void)
 			nrips = false;
 	}
 
-	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
 
 	if (enable_apicv) {
-		pr_info("AVIC enabled\n");
+		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
+			pr_warn("AVIC is not supported in CPUID but force enabled");
+			pr_warn("Your system might crash and burn");
+		} else
+			pr_info("AVIC enabled\n");
 
 		amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 	} else {
-- 
2.26.3



* [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
  2021-12-13 10:46 ` [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF Maxim Levitsky
  2021-12-13 10:46 ` [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
@ 2021-12-13 10:46 ` Maxim Levitsky
  2022-01-04 22:52   ` Sean Christopherson
  2021-12-13 10:46 ` [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
  2021-12-13 10:46 ` [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
  4 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.

All this complicated logic is actually exactly how we can handle an
incomplete IPI vmexit; the only difference lies in who sets IRR, whether
KVM or the processor.

The incomplete IPI vmexit also has the same races as svm_deliver_avic_intr;
therefore, just reuse avic_kick_target_vcpu for it as well.
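
As an aside, the pairing can be shown with a small user-space analogue
(a sketch using C11 atomics, not the actual KVM code; the struct and
names are simplified stand-ins, with smp_store_release/smp_load_acquire
mapped to the release/acquire orderings):

#include <stdatomic.h>
#include <stdbool.h>

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu {
	_Atomic int mode;
	atomic_bool apicv_active;
};

/* vCPU thread, mirroring vcpu_enter_guest(): the release store to mode
 * publishes every write that precedes it, including any clearing of
 * apicv_active. */
static void enter_guest(struct vcpu *v)
{
	atomic_store_explicit(&v->mode, IN_GUEST_MODE, memory_order_release);
}

/* Sender thread, mirroring avic_kick_target_vcpu(): if the acquire load
 * observes IN_GUEST_MODE, a prior disabling of the AVIC is guaranteed
 * to be visible to the apicv_active read below. */
static bool sender_sees_avic_active(struct vcpu *v)
{
	bool in_guest = atomic_load_explicit(&v->mode,
				memory_order_acquire) == IN_GUEST_MODE;
	bool active = atomic_load_explicit(&v->apicv_active,
				memory_order_relaxed); /* READ_ONCE() */

	return in_guest && active;
}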

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-with: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
 arch/x86/kvm/x86.c      |  4 +-
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 90364d02f22aa..34f62da2fbadd 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
+{
+	bool in_guest_mode;
+
+	/*
+	 * vcpu->arch.apicv_active is read after vcpu->mode.  Pairs
+	 * with smp_store_release in vcpu_enter_guest.
+	 */
+	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
+	if (READ_ONCE(vcpu->arch.apicv_active)) {
+		if (in_guest_mode) {
+			/*
+			 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
+			 * is in the guest.  If the vCPU is not in the guest, hardware will
+			 * automatically process AVIC interrupts at VMRUN.
+			 *
+			 * Note, the vCPU could get migrated to a different pCPU at any
+			 * point, which could result in signalling the wrong/previous
+			 * pCPU.  But if that happens the vCPU is guaranteed to do a
+			 * VMRUN (after being migrated) and thus will process pending
+			 * interrupts, i.e. a doorbell is not needed (and the spurious
+			 * one is harmless).
+			 */
+			int cpu = READ_ONCE(vcpu->cpu);
+			if (cpu != get_cpu())
+				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
+			put_cpu();
+		} else {
+			/*
+			 * Wake the vCPU if it was blocking.  KVM will then detect the
+			 * pending IRQ when checking if the vCPU has a wake event.
+			 */
+			kvm_vcpu_wake_up(vcpu);
+		}
+	} else {
+		/* Compare this case with __apic_accept_irq.  */
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 				   u32 icrl, u32 icrh)
 {
@@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
 					GET_APIC_DEST_FIELD(icrh),
-					icrl & APIC_DEST_MASK))
-			kvm_vcpu_wake_up(vcpu);
+					icrl & APIC_DEST_MASK)) {
+			vcpu->arch.apic->irr_pending = true;
+			avic_kick_target_vcpu(vcpu);
+		}
 	}
 }
 
@@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-	if (!vcpu->arch.apicv_active)
-		return -1;
-
+	/*
+	 * Below, we have to handle anyway the case of AVIC being disabled
+	 * in the middle of this function, and there is hardly any overhead
+	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
+	 * the kick ourselves for disabled APICv.
+	 */
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
 
 	/*
@@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	 * the doorbell if the vCPU is already running in the guest.
 	 */
 	smp_mb__after_atomic();
-
-	/*
-	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
-	 * is in the guest.  If the vCPU is not in the guest, hardware will
-	 * automatically process AVIC interrupts at VMRUN.
-	 */
-	if (vcpu->mode == IN_GUEST_MODE) {
-		int cpu = READ_ONCE(vcpu->cpu);
-
-		/*
-		 * Note, the vCPU could get migrated to a different pCPU at any
-		 * point, which could result in signalling the wrong/previous
-		 * pCPU.  But if that happens the vCPU is guaranteed to do a
-		 * VMRUN (after being migrated) and thus will process pending
-		 * interrupts, i.e. a doorbell is not needed (and the spurious
-		 * one is harmless).
-		 */
-		if (cpu != get_cpu())
-			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
-		put_cpu();
-	} else {
-		/*
-		 * Wake the vCPU if it was blocking.  KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
-		 */
-		kvm_vcpu_wake_up(vcpu);
-	}
-
+	avic_kick_target_vcpu(vcpu);
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 85127b3e3690b..81a74d86ee5eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * result in virtual interrupt delivery.
 	 */
 	local_irq_disable();
-	vcpu->mode = IN_GUEST_MODE;
+
+	/* Store vcpu->apicv_active before vcpu->mode.  */
+	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
-- 
2.26.3



* [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
                   ` (2 preceding siblings ...)
  2021-12-13 10:46 ` [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
@ 2021-12-13 10:46 ` Maxim Levitsky
  2022-01-04 22:57   ` Sean Christopherson
  2021-12-13 10:46 ` [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
  4 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

kvm_apic_update_apicv is called while AVIC is still active, thus IRR bits
can be set by the CPU after it is called, without causing irr_pending
to be set to true.

Also the logic in avic_kick_target_vcpu doesn't expect a race with this
function.

To keep it simple, just leave irr_pending set to true and
let the next interrupt injection into the guest clear it.
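
For illustration, here is a small user-space sketch of the window being
closed (an analogue using pthreads and C11 atomics, not kernel code;
whether the race actually manifests on a given run is timing dependent):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int virr;			/* stand-in for the 256-bit vIRR */
static atomic_bool irr_pending = true;

/* Plays the role of the still-active AVIC hardware */
static void *avic_hw(void *arg)
{
	(void)arg;
	atomic_store(&virr, 1);		/* hardware queues a vector */
	return NULL;
}

/* Plays the role of the old kvm_apic_update_apicv() behavior */
static void *update_apicv(void *arg)
{
	(void)arg;
	/* apic_search_irr() analogue: scan, then publish the result.
	 * A vIRR bit set after the scan is then never noticed; the
	 * patch keeps irr_pending untouched instead. */
	atomic_store(&irr_pending, atomic_load(&virr) != 0);
	return NULL;
}

int main(void)
{
	pthread_t hw, sw;

	pthread_create(&hw, NULL, avic_hw, NULL);
	pthread_create(&sw, NULL, update_apicv, NULL);
	pthread_join(hw, NULL);
	pthread_join(sw, NULL);

	if (atomic_load(&virr) && !atomic_load(&irr_pending))
		puts("lost interrupt: vIRR bit set but irr_pending clear");
	else
		puts("no race on this run");
	return 0;
}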

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index baca9fa37a91c..6e1fbbf4c508b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2312,7 +2312,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		apic->irr_pending = true;
 		apic->isr_count = 1;
 	} else {
-		apic->irr_pending = (apic_search_irr(apic) != -1);
+		/*
+		 * Don't touch irr_pending, let it be cleared when
+		 * we process the interrupt
+		 */
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
 }
-- 
2.26.3



* [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running
  2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
                   ` (3 preceding siblings ...)
  2021-12-13 10:46 ` [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
@ 2021-12-13 10:46 ` Maxim Levitsky
  2022-01-05 21:56   ` Sean Christopherson
  4 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 10:46 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Ingo Molnar, Maxim Levitsky

Inhibit the AVIC of a vCPU that is running nested for the duration of
the nested run, so that all interrupts arriving from both its sibling
vCPUs and from KVM are delivered using normal IPIs and cause that vCPU
to vmexit.

Note that unlike normal AVIC inhibition, there is no need to update the
AVIC mmio memslot, because the nested guest uses its own set of paging
tables. That also means that AVIC doesn't need to be inhibited VM-wide.

Note that in theory, when a nested guest doesn't intercept physical
interrupts, KVM could continue using AVIC to deliver them to it, but
this patch doesn't bother doing so.

Besides, once nested AVIC is implemented, the nested guest will likely
use it, which would rule out this optimization anyway (the real AVIC
cannot serve both L1 and L2 at the same time).
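
Restated as a stand-alone sketch (the declarations below are stubs
assumed for illustration; this mirrors the kvm_vcpu_update_apicv()
hunk further down), the activation decision becomes:

#include <stdbool.h>

struct kvm_vcpu;

/* VM-wide state: true when no inhibit reasons are set */
bool kvm_apicv_activated_vm_wide(struct kvm_vcpu *vcpu);
/* Optional per-vCPU veto; for SVM this is is_guest_mode(vcpu) */
bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);

static bool vcpu_apicv_should_activate(struct kvm_vcpu *vcpu)
{
	bool activate = kvm_apicv_activated_vm_wide(vcpu);

	/* A vCPU in guest mode keeps its AVIC inhibited even when
	 * the VM-wide inhibit mask is clear. */
	if (apicv_check_inhibit)
		activate = activate && !apicv_check_inhibit(vcpu);

	return activate;
}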

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 ++++++-
 arch/x86/kvm/svm/avic.c            |  6 +++++-
 arch/x86/kvm/svm/nested.c          | 11 ++++++-----
 arch/x86/kvm/svm/svm.c             | 30 +++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/x86.c                 | 13 ++++++++++++-
 7 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c2b007171abd2..d9d7459ef9e8f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -119,6 +119,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP_NULL(apicv_check_inhibit)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e863d569c89a4..78b3793cc08c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,7 +1036,6 @@ struct kvm_x86_msr_filter {
 
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
-#define APICV_INHIBIT_REASON_NESTED     2
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC	5
@@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	/*
+	 * Returns true if for some reason APICv (e.g. guest mode)
+	 * must be inhibited on this vCPU
+	 */
+	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 34f62da2fbadd..5a8304938f51e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	return 0;
 }
 
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu);
+}
+
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	return false;
@@ -950,7 +955,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC) |
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cf206855ebf09..bf17c2d7cf321 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,11 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	/*
-	 * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
-	 * avic_physical_id.
-	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
@@ -659,6 +654,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 
 	svm_set_gif(svm, true);
 
+	if (kvm_vcpu_apicv_active(vcpu))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
@@ -923,6 +921,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
 		kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
 
+	if (kvm_apicv_activated(vcpu->kvm))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 468cc385c35f0..e4228580286e8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1383,7 +1383,8 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 	/*
 	 * The following fields are ignored when AVIC is enabled
 	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+	if (!is_guest_mode(&svm->vcpu))
+		WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	svm_set_intercept(svm, INTERCEPT_VINTR);
 
@@ -2853,10 +2854,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	svm_clear_vintr(to_svm(vcpu));
 
 	/*
-	 * For AVIC, the only reason to end up here is ExtINTs.
+	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
+	 *
+	 * If running nested, still uninhibit the AVIC in case the irq
+	 * window was requested while not running nested.
+	 * Any vCPU that runs nested keeps its AVIC inhibited anyway,
+	 * due to the per-vCPU AVIC inhibition override.
 	 */
+
 	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
@@ -3410,8 +3417,16 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * unless we have pending ExtINT since it cannot be injected
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
+		 *
+		 * If running nested, this vCPU will use separate page tables
+		 * which don't have L1's AVIC mapped, and the AVIC is
+		 * already inhibited thus there is no need for global
+		 * AVIC inhibition.
 		 */
-		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
+		if (!is_guest_mode(vcpu))
+			kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
 		svm_set_vintr(svm);
 	}
 }
@@ -3881,14 +3896,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_request_apicv_update(vcpu->kvm, false,
 						 APICV_INHIBIT_REASON_X2APIC);
-
-		/*
-		 * Currently, AVIC does not work with nested virtualization.
-		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-		 */
-		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			kvm_request_apicv_update(vcpu->kvm, false,
-						 APICV_INHIBIT_REASON_NESTED);
 	}
 	init_vmcb_after_set_cpuid(vcpu);
 }
@@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+	.apicv_check_inhibit = avic_is_vcpu_inhibited,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index daa8ca84afccd..545684ea37353 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		       uint32_t guest_irq, bool set);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81a74d86ee5eb..125599855af47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
 			goto out;
+
+		/* Nested VM exit might need to update APICv status */
+		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+			kvm_vcpu_update_apicv(vcpu);
 	}
 
 	/* try to inject new event if pending */
@@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
+
+	if (kvm_x86_ops.apicv_check_inhibit)
+		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
+
 	if (vcpu->arch.apicv_active == activate)
 		goto out;
 
@@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * per-VM state, and responsing vCPUs must wait for the update
 		 * to complete before servicing KVM_REQ_APICV_UPDATE.
 		 */
-		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+		if (!is_guest_mode(vcpu))
+			WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+		else
+			WARN_ON(kvm_vcpu_apicv_active(vcpu));
 
 		exit_fastpath = static_call(kvm_x86_run)(vcpu);
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
-- 
2.26.3



* Re: [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
  2021-12-13 10:46 ` [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF Maxim Levitsky
@ 2021-12-13 11:34   ` Paolo Bonzini
  2021-12-13 13:07     ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2021-12-13 11:34 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Wanpeng Li, Ingo Molnar

On 12/13/21 11:46, Maxim Levitsky wrote:
> Fix a corner case in which the L1 hypervisor intercepts interrupts
> (INTERCEPT_INTR) and either doesn't use virtual interrupt masking
> (V_INTR_MASKING) or enters a nested guest with EFLAGS.IF disabled
> prior to the entry.
> 
> In this case, despite the fact that L1 intercepts the interrupts,
> KVM still needs to set up an interrupt window to wait before it
> can deliver the INTR vmexit.
> 
> Currently, KVM instead enters an endless loop of 'req_immediate_exit'.
> 
> Note that on VMX this case is impossible, as there is only the
> 'vmexit on external interrupts' execution control, which is either set,
> in which case both the host's and the guest's EFLAGS.IF are ignored,
> or clear, in which case no VM exit is delivered.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e57e6857e0630..c9668a3b51011 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3372,17 +3372,21 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>   static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +	bool blocked;
> +
>   	if (svm->nested.nested_run_pending)
>   		return -EBUSY;
>   
> +	blocked = svm_interrupt_blocked(vcpu);
> +
>   	/*
>   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
>   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
>   	 */
>   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
> -		return -EBUSY;
> -
> -	return !svm_interrupt_blocked(vcpu);
> +		return !blocked ? -EBUSY : 0;
> +	else
> +		return !blocked;
>   }
>   
>   static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
> 

Right, another case is when CLGI is not trapped and the guest therefore
runs with GIF=0.  I think that means that a similar change has to be
done in all the *_allowed functions.

I would write it as

   	if (svm->nested.nested_run_pending)
   		return -EBUSY;
   
	if (svm_interrupt_blocked(vcpu))
		return 0;

   	/*
   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
   	 */
   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
		return -EBUSY;
	return 1;

Paolo


* Re: [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
  2021-12-13 11:34   ` Paolo Bonzini
@ 2021-12-13 13:07     ` Maxim Levitsky
  2021-12-13 13:15       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 13:07 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Wanpeng Li, Ingo Molnar

On Mon, 2021-12-13 at 12:34 +0100, Paolo Bonzini wrote:
> On 12/13/21 11:46, Maxim Levitsky wrote:
> > Fix a corner case in which the L1 hypervisor intercepts interrupts
> > (INTERCEPT_INTR) and either doesn't use virtual interrupt masking
> > (V_INTR_MASKING) or enters a nested guest with EFLAGS.IF disabled
> > prior to the entry.
> > 
> > In this case, despite the fact that L1 intercepts the interrupts,
> > KVM still needs to set up an interrupt window to wait before it
> > can deliver the INTR vmexit.
> > 
> > Currently, KVM instead enters an endless loop of 'req_immediate_exit'.
> > 
> > Note that on VMX this case is impossible, as there is only the
> > 'vmexit on external interrupts' execution control, which is either set,
> > in which case both the host's and the guest's EFLAGS.IF are ignored,
> > or clear, in which case no VM exit is delivered.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/svm/svm.c | 10 +++++++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index e57e6857e0630..c9668a3b51011 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3372,17 +3372,21 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> >   static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >   {
> >   	struct vcpu_svm *svm = to_svm(vcpu);
> > +	bool blocked;
> > +
> >   	if (svm->nested.nested_run_pending)
> >   		return -EBUSY;
> >   
> > +	blocked = svm_interrupt_blocked(vcpu);
> > +
> >   	/*
> >   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
> >   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
> >   	 */
> >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
> > -		return -EBUSY;
> > -
> > -	return !svm_interrupt_blocked(vcpu);
> > +		return !blocked ? -EBUSY : 0;
> > +	else
> > +		return !blocked;
> >   }
> >   
> >   static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
> > 
> 
> Right, another case is when CLGI is not trapped and the guest therefore
> runs with GIF=0.  I think that means that a similar change has to be
> done in all the *_allowed functions.

I think that SVM sets the real GIF to 1 on VM entry regardless of whether it is trapped or not.

However, if CLGI is not trapped, and EFLAGS.IF isn't trapped either, one could
enter a guest that has EFLAGS.IF == 0; the guest could then disable GIF,
enable EFLAGS.IF, and then re-enable GIF, but then GIF enablement should
trigger our interrupt window VINTR as well.


> 
> I would write it as
> 
>    	if (svm->nested.nested_run_pending)
>    		return -EBUSY;
>    
> 	if (svm_interrupt_blocked(vcpu))
> 		return 0;
> 
>    	/*
>    	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
>    	 * e.g. if the IRQ arrived asynchronously after checking nested events.
>    	 */
>    	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
> 		return -EBUSY;
> 	return 1;

Thanks! I was worried about breaking the non-nested case, but looking again
at the code, it is logically equivalent.


Thanks for the review,
	Best regards,
		Maxim Levitsky

> 
> Paolo
> 




* Re: [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
  2021-12-13 13:07     ` Maxim Levitsky
@ 2021-12-13 13:15       ` Paolo Bonzini
  2021-12-13 13:29         ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2021-12-13 13:15 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Wanpeng Li, Ingo Molnar

On 12/13/21 14:07, Maxim Levitsky wrote:
>> Right, another case is when CLGI is not trapped and the guest therefore
>> runs with GIF=0.  I think that means that a similar change has to be
>> done in all the *_allowed functions.
>
> I think that SVM sets the real GIF to 1 on VM entry regardless of whether it is trapped or not.

Yes, the issue is only when CLGI is not trapped (and vGIF is disabled).

> However, if CLGI is not trapped, and EFLAGS.IF isn't trapped either, one could
> enter a guest that has EFLAGS.IF == 0; the guest could then disable GIF,
> enable EFLAGS.IF, and then re-enable GIF, but then GIF enablement should
> trigger our interrupt window VINTR as well.

While GIF=0, you have svm_nmi_blocked returning true and svm_nmi_allowed 
returning -EBUSY; that's wrong, isn't it?

Paolo



* Re: [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF
  2021-12-13 13:15       ` Paolo Bonzini
@ 2021-12-13 13:29         ` Maxim Levitsky
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-12-13 13:29 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Wanpeng Li, Ingo Molnar

On Mon, 2021-12-13 at 14:15 +0100, Paolo Bonzini wrote:
> On 12/13/21 14:07, Maxim Levitsky wrote:
> > > Right, another case is when CLGI is not trapped and the guest therefore
> > > runs with GIF=0.  I think that means that a similar change has to be
> > > done in all the *_allowed functions.
> > 
> > I think that SVM sets the real GIF to 1 on VM entry regardless of whether it is trapped or not.
> 
> Yes, the issue is only when CLGI is not trapped (and vGIF is disabled).

Yes, but I just wanted to clarify that GIF is initially enabled on VM entry
regardless of whether it is trapped or not; after that, the guest can indeed
disable GIF if CLGI/STGI are not trapped and vGIF is disabled.

> 
> > However, if CLGI is not trapped, and EFLAGS.IF isn't trapped either, one could
> > enter a guest that has EFLAGS.IF == 0; the guest could then disable GIF,
> > enable EFLAGS.IF, and then re-enable GIF, but then GIF enablement should
> > trigger our interrupt window VINTR as well.
> 
> While GIF=0 you have svm_nmi_blocked returning true and svm_nmi_allowed 
> returning -EBUSY; that's wrong isn't it?

Yes, 100% agreed; a patch (and a unit test for this as well) is on the way!

Best regards,
	Maxim Levitsky
> 
> Paolo
> 




* Re: [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled
  2021-12-13 10:46 ` [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
@ 2022-01-04 22:25   ` Sean Christopherson
  2022-01-05 10:54     ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2022-01-04 22:25 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> Apparently on some systems AVIC is disabled in CPUID but still usable.
> 
> Allow the user to override the CPUID if the user is willing to
> take the risk.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index c9668a3b51011..468cc385c35f0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -206,6 +206,9 @@ module_param(tsc_scaling, int, 0444);
>  static bool avic;
>  module_param(avic, bool, 0444);
>  
> +static bool force_avic;
> +module_param_unsafe(force_avic, bool, 0444);
> +
>  bool __read_mostly dump_invalid_vmcb;
>  module_param(dump_invalid_vmcb, bool, 0644);
>  
> @@ -4656,10 +4659,14 @@ static __init int svm_hardware_setup(void)
>  			nrips = false;
>  	}
>  
> -	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
> +	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
>  
>  	if (enable_apicv) {
> -		pr_info("AVIC enabled\n");
> +		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
> +			pr_warn("AVIC is not supported in CPUID but force enabled");
> +			pr_warn("Your system might crash and burn");
> +		} else

Needs curly braces, though arguably the "AVIC enabled" part should be printed
regardless of boot_cpu_has(X86_FEATURE_AVIC).

> +			pr_info("AVIC enabled\n");

This is all more than a bit terrifying, though I can see the usefulness for a
developer.  At the very least, this should taint the kernel.  This should also
probably be buried behind a Kconfig that is itself buried behind EXPERT.


* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2021-12-13 10:46 ` [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
@ 2022-01-04 22:52   ` Sean Christopherson
  2022-01-05 11:03     ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2022-01-04 22:52 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> inhibited, it might read a stale value of vcpu->arch.apicv_active
> which can lead to the target vCPU not noticing the interrupt.
> 
> To fix this use load-acquire/store-release so that, if the target vCPU
> is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
> AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
> KVM_REQ_EVENT-based delivery.
> 
> All this complicated logic is actually exactly how we can handle an
> incomplete IPI vmexit; the only difference lies in who sets IRR, whether
> KVM or the processor.
> 
> The incomplete IPI vmexit also has the same races as svm_deliver_avic_intr;
> therefore, just reuse avic_kick_target_vcpu for it as well.
> 
> Reported-by: Maxim Levitsky <mlevitsk@redhat.com>

Heh, probably don't need a Reported-by for a patch you wrote :-)

> Co-developed-with: Paolo Bonzini <pbonzini@redhat.com>

Co-developed-by: is preferred, and should be accompanied by Paolo's SoB.

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
>  arch/x86/kvm/x86.c      |  4 +-
>  2 files changed, 55 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 90364d02f22aa..34f62da2fbadd 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
> +{
> +	bool in_guest_mode;
> +
> +	/*
> +	 * vcpu->arch.apicv_active is read after vcpu->mode.  Pairs

This should say "must be read", not "is read".  It's obvious from the code that
apicv_active is read second; the comment is there to say that it _must_ be read
after vcpu->mode.

> +	 * with smp_store_release in vcpu_enter_guest.
> +	 */
> +	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

IMO, it's marginally clearer to initialize the bool.

	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

> +	if (READ_ONCE(vcpu->arch.apicv_active)) {
> +		if (in_guest_mode) {
> +			/*
> +			 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> +			 * is in the guest.  If the vCPU is not in the guest, hardware will
> +			 * automatically process AVIC interrupts at VMRUN.

Might as well wrap these comments at 80 chars since they're being moved.  Or
maybe even better....

	/* blah blah blah */
	if (!READ_ONCE(vcpu->arch.apicv_active)) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
		return;
	}

	if (in_guest_mode) {
		...
	} else {
		....
	}

...so that the existing comments can be preserved as is.

> +			 *
> +			 * Note, the vCPU could get migrated to a different pCPU at any
> +			 * point, which could result in signalling the wrong/previous
> +			 * pCPU.  But if that happens the vCPU is guaranteed to do a
> +			 * VMRUN (after being migrated) and thus will process pending
> +			 * interrupts, i.e. a doorbell is not needed (and the spurious
> +			 * one is harmless).
> +			 */
> +			int cpu = READ_ONCE(vcpu->cpu);
> +			if (cpu != get_cpu())
> +				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> +			put_cpu();
> +		} else {
> +			/*
> +			 * Wake the vCPU if it was blocking.  KVM will then detect the
> +			 * pending IRQ when checking if the vCPU has a wake event.
> +			 */
> +			kvm_vcpu_wake_up(vcpu);
> +		}
> +	} else {
> +		/* Compare this case with __apic_accept_irq.  */

Honestly, this comment isn't very helpful.  It only takes a few lines to say:

		/*
		 * Manually signal the event, the __apic_accept_irq() fallback
		 * path can't be used if AVIC is disabled after the vector is
		 * already queued in the vIRR.
		 */

(incorporating more feedback below)

> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +
>  static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
>  				   u32 icrl, u32 icrh)
>  {
> @@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
>  					GET_APIC_DEST_FIELD(icrh),
> -					icrl & APIC_DEST_MASK))
> -			kvm_vcpu_wake_up(vcpu);
> +					icrl & APIC_DEST_MASK)) {
> +			vcpu->arch.apic->irr_pending = true;
> +			avic_kick_target_vcpu(vcpu);
> +		}
>  	}
>  }
>  
> @@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  
>  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  {
> -	if (!vcpu->arch.apicv_active)
> -		return -1;
> -
> +	/*
> +	 * Below, we have to handle anyway the case of AVIC being disabled
> +	 * in the middle of this function, and there is hardly any overhead
> +	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
> +	 * the kick ourselves for disabled APICv.

Hmm, my preference would be to keep the "return -1" even though apicv_active must
be rechecked.  That would help highlight that returning "failure" after this point
is not an option as it would result in kvm_lapic_set_irr() being called twice.

> +	 */
>  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
>  
>  	/*
> @@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  	 * the doorbell if the vCPU is already running in the guest.
>  	 */
>  	smp_mb__after_atomic();
> -
> -	/*
> -	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> -	 * is in the guest.  If the vCPU is not in the guest, hardware will
> -	 * automatically process AVIC interrupts at VMRUN.
> -	 */
> -	if (vcpu->mode == IN_GUEST_MODE) {
> -		int cpu = READ_ONCE(vcpu->cpu);
> -
> -		/*
> -		 * Note, the vCPU could get migrated to a different pCPU at any
> -		 * point, which could result in signalling the wrong/previous
> -		 * pCPU.  But if that happens the vCPU is guaranteed to do a
> -		 * VMRUN (after being migrated) and thus will process pending
> -		 * interrupts, i.e. a doorbell is not needed (and the spurious
> -		 * one is harmless).
> -		 */
> -		if (cpu != get_cpu())
> -			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> -		put_cpu();
> -	} else {
> -		/*
> -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> -		 */
> -		kvm_vcpu_wake_up(vcpu);
> -	}
> -
> +	avic_kick_target_vcpu(vcpu);
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 85127b3e3690b..81a74d86ee5eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 * result in virtual interrupt delivery.
>  	 */
>  	local_irq_disable();
> -	vcpu->mode = IN_GUEST_MODE;
> +
> +	/* Store vcpu->apicv_active before vcpu->mode.  */
> +	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
>  
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>  
> -- 
> 2.26.3
> 


* Re: [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  2021-12-13 10:46 ` [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
@ 2022-01-04 22:57   ` Sean Christopherson
  2022-01-05 10:56     ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2022-01-04 22:57 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> kvm_apic_update_apicv is called while AVIC is still active, thus IRR bits
> can be set by the CPU after it is called, without causing irr_pending
> to be set to true.
> 
> Also the logic in avic_kick_target_vcpu doesn't expect a race with this
> function.
> 
> To keep it simple, just leave irr_pending set to true and
> let the next interrupt injection into the guest clear it.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index baca9fa37a91c..6e1fbbf4c508b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2312,7 +2312,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
>  		apic->irr_pending = true;
>  		apic->isr_count = 1;
>  	} else {
> -		apic->irr_pending = (apic_search_irr(apic) != -1);
> +		/*
> +		 * Don't touch irr_pending, let it be cleared when
> +		 * we process the interrupt

Please don't use pronouns in comments, e.g. who is "we" in this context?  Please
also say _why_.  IIUC, this could more precisely be:

		/*
		 * Don't clear irr_pending, searching the IRR can race with
		 * updates from the CPU as APICv is still active from hardware's
		 * perspective.  The flag will be cleared as appropriate when
		 * KVM injects the interrupt.
		 */

> +		 */
>  		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
>  	}
>  }
> -- 
> 2.26.3
> 


* Re: [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled
  2022-01-04 22:25   ` Sean Christopherson
@ 2022-01-05 10:54     ` Maxim Levitsky
  2022-01-05 17:46       ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2022-01-05 10:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Tue, 2022-01-04 at 22:25 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > Apparently on some systems AVIC is disabled in CPUID but still usable.
> > 
> > Allow the user to override the CPUID if the user is willing to
> > take the risk.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index c9668a3b51011..468cc385c35f0 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -206,6 +206,9 @@ module_param(tsc_scaling, int, 0444);
> >  static bool avic;
> >  module_param(avic, bool, 0444);
> >  
> > +static bool force_avic;
> > +module_param_unsafe(force_avic, bool, 0444);
> > +
> >  bool __read_mostly dump_invalid_vmcb;
> >  module_param(dump_invalid_vmcb, bool, 0644);
> >  
> > @@ -4656,10 +4659,14 @@ static __init int svm_hardware_setup(void)
> >  			nrips = false;
> >  	}
> >  
> > -	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
> > +	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
> >  
> >  	if (enable_apicv) {
> > -		pr_info("AVIC enabled\n");
> > +		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
> > +			pr_warn("AVIC is not supported in CPUID but force enabled");
> > +			pr_warn("Your system might crash and burn");
> > +		} else
> 
> Needs curly braces, though arguably the "AVIC enabled" part should be printed
> regardless of boot_cpu_has(X86_FEATURE_AVIC).
> 
> > +			pr_info("AVIC enabled\n");
> 
> This is all more than a bit terrifying, though I can see the usefulness for a
> developer.  At the very least, this should taint the kernel.  This should also
> probably be buried behind a Kconfig that is itself buried behind EXPERT.
> 
I used 'module_param_unsafe', which does taint the kernel.

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  2022-01-04 22:57   ` Sean Christopherson
@ 2022-01-05 10:56     ` Maxim Levitsky
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2022-01-05 10:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Tue, 2022-01-04 at 22:57 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > kvm_apic_update_apicv is called while AVIC is still active, thus IRR bits
> > can be set by the CPU after it is called, without causing irr_pending
> > to be set to true.
> > 
> > Also the logic in avic_kick_target_vcpu doesn't expect a race with this
> > function.
> > 
> > To keep it simple, just leave irr_pending set to true and
> > let the next interrupt injection into the guest clear it.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/lapic.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index baca9fa37a91c..6e1fbbf4c508b 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2312,7 +2312,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
> >  		apic->irr_pending = true;
> >  		apic->isr_count = 1;
> >  	} else {
> > -		apic->irr_pending = (apic_search_irr(apic) != -1);
> > +		/*
> > +		 * Don't touch irr_pending, let it be cleared when
> > +		 * we process the interrupt
> 
> Please don't use pronouns in comments, e.g. who is "we" in this context?  Please
> also say _why_.  IIUC, this could more precisely be:

Yes, good point. I will fix this.

Best regards,
	Maxim Levitsky

> 
> 		/*
> 		 * Don't clear irr_pending, searching the IRR can race with
> 		 * updates from the CPU as APICv is still active from hardware's
> 		 * perspective.  The flag will be cleared as appropriate when
> 		 * KVM injects the interrupt.
> 		 */
> 
> > +		 */
> >  		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
> >  	}
> >  }
> > -- 
> > 2.26.3
> > 




* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-04 22:52   ` Sean Christopherson
@ 2022-01-05 11:03     ` Maxim Levitsky
  2022-01-05 11:54       ` Paolo Bonzini
  2022-01-07 15:32       ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Maxim Levitsky @ 2022-01-05 11:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Tue, 2022-01-04 at 22:52 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > which can lead to the target vCPU not noticing the interrupt.
> > 
> > To fix this use load-acquire/store-release so that, if the target vCPU
> > is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
> > AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
> > KVM_REQ_EVENT-based delivery.
> > 
> > All this complicated logic is actually exactly how we can handle an
> > incomplete IPI vmexit; the only difference lies in who sets IRR, whether
> > KVM or the processor.
> > 
> > The incomplete IPI vmexit also has the same races as svm_deliver_avic_intr;
> > therefore, just reuse avic_kick_target_vcpu for it as well.
> > 
> > Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Heh, probably don't need a Reported-by for a patch you wrote :-)

Paolo gave me this version; I pretty much sent it as is. We had a few
iterations of this patch before we agreed that the race was finally gone.

> 
> > Co-developed-with: Paolo Bonzini <pbonzini@redhat.com>
> 
> Co-developed-by: is preferred, and should be accompanied by Paolo's SoB.

This is the first time I've used this format, so I didn't know about it.

> 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
> >  arch/x86/kvm/x86.c      |  4 +-
> >  2 files changed, 55 insertions(+), 34 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 90364d02f22aa..34f62da2fbadd 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> >  	return 0;
> >  }
> >  
> > +static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
> > +{
> > +	bool in_guest_mode;
> > +
> > +	/*
> > +	 * vcpu->arch.apicv_active is read after vcpu->mode.  Pairs
> 
> This should say "must be read", not "is read".  It's obvious from the code that
> apicv_active is read second; the comment is there to say that it _must_ be read
> after vcpu->mode.
> 
> > +	 * with smp_store_release in vcpu_enter_guest.
> > +	 */
> > +	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
> 
> IMO, it's marginally clearer to initialize the bool.
> 
> 	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
> 
> > +	if (READ_ONCE(vcpu->arch.apicv_active)) {
> > +		if (in_guest_mode) {
> > +			/*
> > +			 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> > +			 * is in the guest.  If the vCPU is not in the guest, hardware will
> > +			 * automatically process AVIC interrupts at VMRUN.
> 
> Might as well wrap these comments at 80 chars since they're being moved.  Or
> maybe even better....
> 
> 	/* blah blah blah */
> 	if (!READ_ONCE(vcpu->arch.apicv_active)) {
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 		kvm_vcpu_kick(vcpu);
> 		return;
> 	}
> 
> 	if (in_guest_mode) {
> 		...
> 	} else {
> 		....
> 	}
> 
> ...so that the existing comments can be preserved as is.
> 
> > +			 *
> > +			 * Note, the vCPU could get migrated to a different pCPU at any
> > +			 * point, which could result in signalling the wrong/previous
> > +			 * pCPU.  But if that happens the vCPU is guaranteed to do a
> > +			 * VMRUN (after being migrated) and thus will process pending
> > +			 * interrupts, i.e. a doorbell is not needed (and the spurious
> > +			 * one is harmless).
> > +			 */
> > +			int cpu = READ_ONCE(vcpu->cpu);
> > +			if (cpu != get_cpu())
> > +				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> > +			put_cpu();
> > +		} else {
> > +			/*
> > +			 * Wake the vCPU if it was blocking.  KVM will then detect the
> > +			 * pending IRQ when checking if the vCPU has a wake event.
> > +			 */
> > +			kvm_vcpu_wake_up(vcpu);
> > +		}
> > +	} else {
> > +		/* Compare this case with __apic_accept_irq.  */
> 
> Honestly, this comment isn't very helpful.  It only takes a few lines to say:
> 
> 		/*
> 		 * Manually signal the event, the __apic_accept_irq() fallback
> 		 * path can't be used if AVIC is disabled after the vector is
> 		 * already queued in the vIRR.
> 		 */
> 
> (incorporating more feedback below)
> 
> > +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +		kvm_vcpu_kick(vcpu);
> > +	}
> > +}
> > +
> >  static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> >  				   u32 icrl, u32 icrh)
> >  {
> > @@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> >  		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> >  					GET_APIC_DEST_FIELD(icrh),
> > -					icrl & APIC_DEST_MASK))
> > -			kvm_vcpu_wake_up(vcpu);
> > +					icrl & APIC_DEST_MASK)) {
> > +			vcpu->arch.apic->irr_pending = true;
> > +			avic_kick_target_vcpu(vcpu);
> > +		}
> >  	}
> >  }
> >  
> > @@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> >  
> >  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  {
> > -	if (!vcpu->arch.apicv_active)
> > -		return -1;
> > -
> > +	/*
> > +	 * Below, we have to handle the case of AVIC being disabled in the
> > +	 * middle of this function anyway, and there is hardly any overhead
> > +	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
> > +	 * the kick ourselves for disabled APICv.
> 
> Hmm, my preference would be to keep the "return -1" even though apicv_active must
> be rechecked.  That would help highlight that returning "failure" after this point
> is not an option as it would result in kvm_lapic_set_irr() being called twice.

I don't mind either way - keeping the early return will fix the tracepoint I
recently added to report the number of interrupts that were delivered by
AVIC/APICv - with the patch as is, all of them count as such.


I will also address all the other feedback about the comments and send a new version soon.

Thanks for the review!
Best regards,
	Maxim Levitsky

> 
> > +	 */
> >  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
> >  
> >  	/*
> > @@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  	 * the doorbell if the vCPU is already running in the guest.
> >  	 */
> >  	smp_mb__after_atomic();
> > -
> > -	/*
> > -	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> > -	 * is in the guest.  If the vCPU is not in the guest, hardware will
> > -	 * automatically process AVIC interrupts at VMRUN.
> > -	 */
> > -	if (vcpu->mode == IN_GUEST_MODE) {
> > -		int cpu = READ_ONCE(vcpu->cpu);
> > -
> > -		/*
> > -		 * Note, the vCPU could get migrated to a different pCPU at any
> > -		 * point, which could result in signalling the wrong/previous
> > -		 * pCPU.  But if that happens the vCPU is guaranteed to do a
> > -		 * VMRUN (after being migrated) and thus will process pending
> > -		 * interrupts, i.e. a doorbell is not needed (and the spurious
> > -		 * one is harmless).
> > -		 */
> > -		if (cpu != get_cpu())
> > -			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> > -		put_cpu();
> > -	} else {
> > -		/*
> > -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> > -		 * pending IRQ when checking if the vCPU has a wake event.
> > -		 */
> > -		kvm_vcpu_wake_up(vcpu);
> > -	}
> > -
> > +	avic_kick_target_vcpu(vcpu);
> >  	return 0;
> >  }
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 85127b3e3690b..81a74d86ee5eb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	 * result in virtual interrupt delivery.
> >  	 */
> >  	local_irq_disable();
> > -	vcpu->mode = IN_GUEST_MODE;
> > +
> > +	/* Store vcpu->apicv_active before vcpu->mode.  */
> > +	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
> >  
> >  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> >  
> > -- 
> > 2.26.3
> > 
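
To make the ordering argument above concrete, here is a condensed sketch of
the acquire/release pairing; the side-by-side layout is illustrative, not the
actual KVM code:

	bool in_guest_mode, active;

	/* vCPU side, vcpu_enter_guest(): apicv_active is updated first. */
	vcpu->arch.apicv_active = false;		/* A: e.g. AVIC inhibited */
	smp_store_release(&vcpu->mode, IN_GUEST_MODE);	/* B */

	/* Sender side, avic_kick_target_vcpu(): loads in the opposite order. */
	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE); /* C */
	active = READ_ONCE(vcpu->arch.apicv_active);			  /* D */

	/*
	 * If C observes the value stored at B, the acquire/release pair
	 * guarantees that D observes A: the sender can never see the vCPU
	 * in guest mode together with a stale apicv_active == true, which
	 * is exactly the combination that could lose the interrupt.
	 */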



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-05 11:03     ` Maxim Levitsky
@ 2022-01-05 11:54       ` Paolo Bonzini
  2022-01-05 12:15         ` Maxim Levitsky
  2022-01-07 15:32       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2022-01-05 11:54 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Wanpeng Li, Ingo Molnar

On 1/5/22 12:03, Maxim Levitsky wrote:
>> Hmm, my preference would be to keep the "return -1" even though apicv_active must
>> be rechecked.  That would help highlight that returning "failure" after this point
>> is not an option as it would result in kvm_lapic_set_irr() being called twice.
> I don't mind either way - keeping the early return will fix the tracepoint I
> recently added to report the number of interrupts that were delivered by
> AVIC/APICv - with the patch as is, all of them count as such.

Perhaps we can move the tracepoints into the delivery functions.  This
also makes them more precise in the rare case where apicv_active changes 
in the middle of the function.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-05 11:54       ` Paolo Bonzini
@ 2022-01-05 12:15         ` Maxim Levitsky
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2022-01-05 12:15 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Wed, 2022-01-05 at 12:54 +0100, Paolo Bonzini wrote:
> On 1/5/22 12:03, Maxim Levitsky wrote:
> > > Hmm, my preference would be to keep the "return -1" even though apicv_active must
> > > be rechecked.  That would help highlight that returning "failure" after this point
> > > is not an option as it would result in kvm_lapic_set_irr() being called twice.
> > I don't mind either way - keeping the early return will fix the tracepoint I
> > recently added to report the number of interrupts that were delivered by
> > AVIC/APICv - with the patch as is, all of them count as such.
> 
> Perhaps we can move the tracepoints into the delivery functions.  This
> also makes them more precise in the rare case where apicv_active changes 
> in the middle of the function.

That is what I was thinking of doing as well, but I don't mind keeping the 'return -1' either.
Best regards,
	Maxim Levitsky

> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled
  2022-01-05 10:54     ` Maxim Levitsky
@ 2022-01-05 17:46       ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2022-01-05 17:46 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Wed, Jan 05, 2022, Maxim Levitsky wrote:
> On Tue, 2022-01-04 at 22:25 +0000, Sean Christopherson wrote:
> > This is all more than a bit terrifying, though I can see the usefulness for a
> > developer.  At the very least, this should taint the kernel.  This should also
> > probably be buried behind a Kconfig that is itself buried behind EXPERT.
> > 
> I used 'module_param_unsafe', which does taint the kernel.

Ah, neat, TIL.  Thanks!
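
For illustration, a minimal sketch of the mechanism; the parameter name and
description here are assumptions, not necessarily the ones in the patch:

	/* An "unsafe" module parameter taints the kernel (TAINT_USER) when set. */
	static bool force_avic;
	module_param_unsafe(force_avic, bool, 0444);
	MODULE_PARM_DESC(force_avic,
		"Force AVIC on even when the CPU does not advertise it (unsafe)");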

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running
  2021-12-13 10:46 ` [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
@ 2022-01-05 21:56   ` Sean Christopherson
  2022-01-06  8:44     ` Maxim Levitsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2022-01-05 21:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> @@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
>  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>  
>  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> +
> +	/*
> +	 * Returns false if for some reason APICv (e.g guest mode)
> +	 * must be inhibited on this vCPU

Comment is wrong, code returns 'true' if AVIC is inhibited due to is_guest_mode().
Even better, rename the hook to something that's more self-documenting.

vcpu_is_apicv_inhibited() jumps to mind, but that's a bad name since it's not
called by kvm_vcpu_apicv_active().  Maybe vcpu_has_apicv_inhibit_condition()?

> +	 */
> +	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 34f62da2fbadd..5a8304938f51e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  	return 0;
>  }
>  
> +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)

This should follow whatever wording we decide on for the kvm_x86_ops hook.  In
isolation, this name is too close to kvm_vcpu_apicv_active() and could be wrongly
assumed to mean that APICv is not inhibited for _any_ reason on this vCPU if it
returns false.

> +{
> +	return is_guest_mode(vcpu);
> +}
> +
>  bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
>  {
>  	return false;

...

> @@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.complete_emulated_msr = svm_complete_emulated_msr,
>  
>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +	.apicv_check_inhibit = avic_is_vcpu_inhibited,

This can technically be NULL if nested=0.

>  };
>  
>  /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index daa8ca84afccd..545684ea37353 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
>  void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
>  void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
>  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
> +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
>  bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
>  		       uint32_t guest_irq, bool set);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 81a74d86ee5eb..125599855af47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
>  		r = kvm_check_nested_events(vcpu);
>  		if (r < 0)
>  			goto out;
> +
> +		/* Nested VM exit might need to update APICv status */
> +		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
> +			kvm_vcpu_update_apicv(vcpu);
>  	}
>  
>  	/* try to inject new event if pending */
> @@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  	down_read(&vcpu->kvm->arch.apicv_update_lock);
>  
>  	activate = kvm_apicv_activated(vcpu->kvm);
> +
> +	if (kvm_x86_ops.apicv_check_inhibit)
> +		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);

> Might as well use static_call().  This would also be a candidate for
DEFINE_STATIC_CALL_RET0, though that's overkill if this is the only call site.

> +
>  	if (vcpu->arch.apicv_active == activate)
>  		goto out;
>  
> @@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		 * per-VM state, and responsing vCPUs must wait for the update
>  		 * to complete before servicing KVM_REQ_APICV_UPDATE.
>  		 */
> -		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> +		if (!is_guest_mode(vcpu))
> +			WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> +		else
> +			WARN_ON(kvm_vcpu_apicv_active(vcpu));

Won't this fire on VMX?

>  
>  		exit_fastpath = static_call(kvm_x86_run)(vcpu);
>  		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running
  2022-01-05 21:56   ` Sean Christopherson
@ 2022-01-06  8:44     ` Maxim Levitsky
  2022-01-06 17:41       ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2022-01-06  8:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Wed, 2022-01-05 at 21:56 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > @@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
> >  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> >  
> >  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> > +
> > +	/*
> > +	 * Returns false if for some reason APICv (e.g guest mode)
> > +	 * must be inhibited on this vCPU
> 
> Comment is wrong, code returns 'true' if AVIC is inhibited due to is_guest_mode().
> Even better, rename the hook to something that's more self-documenting.
> 
> vcpu_is_apicv_inhibited() jumps to mind, but that's a bad name since it's not
> called by kvm_vcpu_apicv_active().  Maybe vcpu_has_apicv_inhibit_condition()?

Yep. I also kind of don't like the name, but I didn't think of anything better.
vcpu_has_apicv_inhibit_condition seems like a good idea.

> 
> > +	 */
> > +	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
> >  };
> >  
> >  struct kvm_x86_nested_ops {
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 34f62da2fbadd..5a8304938f51e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  	return 0;
> >  }
> >  
> > +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
> 
> This should follow whatever wording we decide on for the kvm_x86_ops hook.  In
> isolation, this name is too close to kvm_vcpu_apicv_active() and could be wrongly
> assumed to mean that APICv is not inhibited for _any_ reason on this vCPU if it
> returns false.
I will think of a better name.


> 
> > +{
> > +	return is_guest_mode(vcpu);
> > +}
> > +
> >  bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
> >  {
> >  	return false;
> 
> ...
> 
> > @@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >  	.complete_emulated_msr = svm_complete_emulated_msr,
> >  
> >  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> > +	.apicv_check_inhibit = avic_is_vcpu_inhibited,
> 
> This can technically be NULL if nested=0.
Good idea; it is now possible to do that after the recent refactoring.

> 
> >  };
> >  
> >  /*
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index daa8ca84afccd..545684ea37353 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> >  void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
> >  void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
> >  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
> > +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
> >  bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
> >  int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> >  		       uint32_t guest_irq, bool set);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 81a74d86ee5eb..125599855af47 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
> >  		r = kvm_check_nested_events(vcpu);
> >  		if (r < 0)
> >  			goto out;
> > +
> > +		/* Nested VM exit might need to update APICv status */
> > +		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
> > +			kvm_vcpu_update_apicv(vcpu);
> >  	}
> >  
> >  	/* try to inject new event if pending */
> > @@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> >  	down_read(&vcpu->kvm->arch.apicv_update_lock);
> >  
> >  	activate = kvm_apicv_activated(vcpu->kvm);
> > +
> > +	if (kvm_x86_ops.apicv_check_inhibit)
> > +		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
> 
> > Might as well use static_call().  This would also be a candidate for
> DEFINE_STATIC_CALL_RET0, though that's overkill if this is the only call site.
This is also something that should be done, but I prefer to do it in one go.
There are several nesting-related functions that were not converted to static_call
(like .check_nested_events).

Also, I recently found that we have KVM_X86_OP and KVM_X86_OP_NULL, which are
effectively the same thing - another candidate for refactoring, so I prefer to
refactor all of this in one patch series.



> 
> > +
> >  	if (vcpu->arch.apicv_active == activate)
> >  		goto out;
> >  
> > @@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  		 * per-VM state, and responsing vCPUs must wait for the update
> >  		 * to complete before servicing KVM_REQ_APICV_UPDATE.
> >  		 */
> > -		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> > +		if (!is_guest_mode(vcpu))
> > +			WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> > +		else
> > +			WARN_ON(kvm_vcpu_apicv_active(vcpu));
> 
> Won't this fire on VMX?

Yes it will! Good catch. It's almost like I would like to have an .apicv_is_avic
boolean for such cases :-) I'll think of something.

Best regards,
	Maxim Levitsky

> 
> >  
> >  		exit_fastpath = static_call(kvm_x86_run)(vcpu);
> >  		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> > -- 
> > 2.26.3
> > 
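
For reference, a sketch of the two call-site patterns discussed above for an
optional hook; the hook name follows the patch, but the exact static-call
plumbing is an assumption, not the final code:

	/* Open-coded NULL check, as in the patch: */
	if (kvm_x86_ops.apicv_check_inhibit)
		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);

	/*
	 * With the hook declared via DEFINE_STATIC_CALL_RET0, a NULL
	 * implementation is patched to return 0, so no NULL check is
	 * needed at the call site:
	 */
	activate = activate && !static_call(kvm_x86_apicv_check_inhibit)(vcpu);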



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running
  2022-01-06  8:44     ` Maxim Levitsky
@ 2022-01-06 17:41       ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2022-01-06 17:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Paolo Bonzini,
	Dave Hansen, H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Thu, Jan 06, 2022, Maxim Levitsky wrote:
> Also, I recently found that we have KVM_X86_OP and KVM_X86_OP_NULL, which are
> effectively the same thing - another candidate for refactoring, so I prefer to
> refactor all of this in one patch series.

I have a mostly-complete patch series that removes KVM_X86_OP_NULL, will get it
finished and posted after I get through my review backlog.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-05 11:03     ` Maxim Levitsky
  2022-01-05 11:54       ` Paolo Bonzini
@ 2022-01-07 15:32       ` Paolo Bonzini
  2022-01-07 23:42         ` Sean Christopherson
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2022-01-07 15:32 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Wanpeng Li, Ingo Molnar

On 1/5/22 12:03, Maxim Levitsky wrote:
>>> -	if (!vcpu->arch.apicv_active)
>>> -		return -1;
>>> -
>>> +	/*
>>> +	 * Below, we have to handle the case of AVIC being disabled in the
>>> +	 * middle of this function anyway, and there is hardly any overhead
>>> +	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
>>> +	 * the kick ourselves for disabled APICv.
>> Hmm, my preference would be to keep the "return -1" even though apicv_active must
>> be rechecked.  That would help highlight that returning "failure" after this point
>> is not an option as it would result in kvm_lapic_set_irr() being called twice.
> I don't mind either way - keeping the early return will fix the tracepoint I
> recently added to report the number of interrupts that were delivered by
> AVIC/APICv - with the patch as is, all of them count as such.

The reasoning here is that, unlike on VMX, we have to react to
vcpu->arch.apicv_active becoming false halfway through the function anyway.

Removing the early return means that there's one less case of load 
(mis)reordering that the reader has to check.  So I really would prefer 
to remove it.

Agreed with the other feedback.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-07 15:32       ` Paolo Bonzini
@ 2022-01-07 23:42         ` Sean Christopherson
  2022-01-10 15:10           ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2022-01-07 23:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Wanpeng Li, Ingo Molnar

On Fri, Jan 07, 2022, Paolo Bonzini wrote:
> On 1/5/22 12:03, Maxim Levitsky wrote:
> > > > -	if (!vcpu->arch.apicv_active)
> > > > -		return -1;
> > > > -
> > > > +	/*
> > > > +	 * Below, we have to handle the case of AVIC being disabled in the
> > > > +	 * middle of this function anyway, and there is hardly any overhead
> > > > +	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
> > > > +	 * the kick ourselves for disabled APICv.
> > > Hmm, my preference would be to keep the "return -1" even though apicv_active must
> > > be rechecked.  That would help highlight that returning "failure" after this point
> > > is not an option as it would result in kvm_lapic_set_irr() being called twice.
> > I don't mind either way - keeping the early return will fix the tracepoint I
> > recently added to report the number of interrupts that were delivered by
> > AVIC/APICv - with the patch as is, all of them count as such.
> 
> The reasoning here is that, unlike on VMX, we have to react to
> vcpu->arch.apicv_active becoming false halfway through the function anyway.
> 
> Removing the early return means that there's one less case of load
> (mis)reordering that the reader has to check.

Yeah, I don't disagree, but the flip side is that without the early check, it's
not all that obvious that SVM must not return -1.  And when AVIC isn't supported
or is disabled at the module level, flowing into AVIC "specific" IRR logic is
a bit weird.  And the LAPIC code effectively becomes Intel-only.

To make everyone happy, and fix the tracepoint issue, what about moving delivery
into vendor code?  E.g. the below (incomplete), with SVM functions renamed so that
anything that isn't guaranteed to be AVIC specific uses svm_ instead of avic_.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index baca9fa37a91..a9ac724c6305 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1096,14 +1096,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
                                                       apic->regs + APIC_TMR);
                }

-               if (static_call(kvm_x86_deliver_posted_interrupt)(vcpu, vector)) {
-                       kvm_lapic_set_irr(vector, apic);
-                       kvm_make_request(KVM_REQ_EVENT, vcpu);
-                       kvm_vcpu_kick(vcpu);
-               } else {
-                       trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
-                                                  trig_mode, vector);
-               }
+               static_call(kvm_x86_deliver_interrupt)(vcpu, vector);
                break;

        case APIC_DM_REMRD:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fe06b02994e6..1fadd14ea884 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4012,6 +4012,18 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
        return 0;
 }

+static void vmx_deliver_interrupt(struct kvm_vcpu *vcpu, int vector)
+{
+       if (vmx_deliver_posted_interrupt(vcpu, vector)) {
+               kvm_lapic_set_irr(vector, apic);
+               kvm_make_request(KVM_REQ_EVENT, vcpu);
+               kvm_vcpu_kick(vcpu);
+       } else {
+               trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
+                                          trig_mode, vector);
+       }
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -7651,7 +7663,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
        .hwapic_isr_update = vmx_hwapic_isr_update,
        .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
        .sync_pir_to_irr = vmx_sync_pir_to_irr,
-       .deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+       .deliver_interrupt = vmx_deliver_interrupt,
        .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,

        .set_tss_addr = vmx_set_tss_addr,


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-01-07 23:42         ` Sean Christopherson
@ 2022-01-10 15:10           ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2022-01-10 15:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, Jim Mattson, Thomas Gleixner, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Borislav Petkov, linux-kernel, Dave Hansen,
	H. Peter Anvin, Wanpeng Li, Ingo Molnar

On 1/8/22 00:42, Sean Christopherson wrote:
> Yeah, I don't disagree, but the flip side is that without the early check, it's
> not all that obvious that SVM must not return -1.

But that's what the comment is about.

> And when AVIC isn't supported
> or is disabled at the module level, flowing into AVIC "specific" IRR logic is
> a bit weird.  And the LAPIC code effectively becomes Intel-only.

Yeah, I agree that this particular aspect is a bit weird at first.  But
it is also natural if you consider how the IRR is implemented for AVIC vs.
APICv, especially with respect to incomplete IPIs.  And the LAPIC code
becomes a blueprint for AVIC's fallback path, i.e. you can see that the
AVIC code effectively becomes the one in lapic.c when you have either
!vcpu->arch.apicv_active or an incomplete IPI.

I don't hate the code below; it loses a bit of the expressiveness of the
LAPIC code, but I guess it's a good middle ground.

Paolo

> To make everyone happy, and fix the tracepoint issue, what about moving delivery
> into vendor code?  E.g. the below (incomplete), with SVM functions renamed so that
> anything that isn't guaranteed to be AVIC specific uses svm_ instead of avic_.
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index baca9fa37a91..a9ac724c6305 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1096,14 +1096,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>                                                         apic->regs + APIC_TMR);
>                  }
> 
> -               if (static_call(kvm_x86_deliver_posted_interrupt)(vcpu, vector)) {
> -                       kvm_lapic_set_irr(vector, apic);
> -                       kvm_make_request(KVM_REQ_EVENT, vcpu);
> -                       kvm_vcpu_kick(vcpu);
> -               } else {
> -                       trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
> -                                                  trig_mode, vector);
> -               }
> +               static_call(kvm_x86_deliver_interrupt)(vcpu, vector);
>                  break;
> 
>          case APIC_DM_REMRD:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fe06b02994e6..1fadd14ea884 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4012,6 +4012,18 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>          return 0;
>   }
> 
> +static void vmx_deliver_interrupt(struct kvm_vcpu *vcpu, int vector)
> +{
> +       if (vmx_deliver_posted_interrupt(vcpu, vector)) {
> +               kvm_lapic_set_irr(vector, apic);
> +               kvm_make_request(KVM_REQ_EVENT, vcpu);
> +               kvm_vcpu_kick(vcpu);
> +       } else {
> +               trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
> +                                          trig_mode, vector);
> +       }
> +}
> +
>   /*
>    * Set up the vmcs's constant host-state fields, i.e., host-state fields that
>    * will not change in the lifetime of the guest.
> @@ -7651,7 +7663,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>          .hwapic_isr_update = vmx_hwapic_isr_update,
>          .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
>          .sync_pir_to_irr = vmx_sync_pir_to_irr,
> -       .deliver_posted_interrupt = vmx_deliver_posted_interrupt,
> +       .deliver_interrupt = vmx_deliver_interrupt,
>          .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
> 
>          .set_tss_addr = vmx_set_tss_addr,
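
To summarize the parallel described above, a condensed side-by-side sketch of
the lapic.c fallback and the AVIC path (illustrative only):

	/* lapic.c fallback when posted delivery is refused: */
	kvm_lapic_set_irr(vector, apic);
	kvm_make_request(KVM_REQ_EVENT, vcpu);
	kvm_vcpu_kick(vcpu);

	/*
	 * AVIC fallback in avic_kick_target_vcpu(): the vIRR bit is already
	 * set, either by KVM in svm_deliver_avic_intr() or by the CPU on an
	 * incomplete IPI, so only the request and the kick remain:
	 */
	kvm_make_request(KVM_REQ_EVENT, vcpu);
	kvm_vcpu_kick(vcpu);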
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-01-10 15:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-13 10:46 [PATCH v2 0/5] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
2021-12-13 10:46 ` [PATCH v2 1/5] KVM: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control EFLAGS.IF Maxim Levitsky
2021-12-13 11:34   ` Paolo Bonzini
2021-12-13 13:07     ` Maxim Levitsky
2021-12-13 13:15       ` Paolo Bonzini
2021-12-13 13:29         ` Maxim Levitsky
2021-12-13 10:46 ` [PATCH v2 2/5] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
2022-01-04 22:25   ` Sean Christopherson
2022-01-05 10:54     ` Maxim Levitsky
2022-01-05 17:46       ` Sean Christopherson
2021-12-13 10:46 ` [PATCH v2 3/5] KVM: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
2022-01-04 22:52   ` Sean Christopherson
2022-01-05 11:03     ` Maxim Levitsky
2022-01-05 11:54       ` Paolo Bonzini
2022-01-05 12:15         ` Maxim Levitsky
2022-01-07 15:32       ` Paolo Bonzini
2022-01-07 23:42         ` Sean Christopherson
2022-01-10 15:10           ` Paolo Bonzini
2021-12-13 10:46 ` [PATCH v2 4/5] KVM: x86: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
2022-01-04 22:57   ` Sean Christopherson
2022-01-05 10:56     ` Maxim Levitsky
2021-12-13 10:46 ` [PATCH v2 5/5] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
2022-01-05 21:56   ` Sean Christopherson
2022-01-06  8:44     ` Maxim Levitsky
2022-01-06 17:41       ` Sean Christopherson
