linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting
@ 2021-12-09 11:54 Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 1/6] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

This patch series aims to lift the long-standing restriction of
using AVIC only when nested virtualization is not exposed
to the guest.

Notes about specific patches:

Patch 1 addresses the fact that AVIC appears to be disabled
in CPUID on several Milan systems I am testing on.
It adds a workaround (with a big warning and a kernel taint) to enable
AVIC anyway if you really know what you are doing.

It is possible that those systems have AVIC disabled as a workaround
for AVIC errata #1235, which might already be fixed but which the OEM
might not yet have re-enabled out of caution.

Patch 6 adds the AVIC co-existence itself. It was tested with a modification
of the ipi_stress unit test, which I will soon post upstream, that makes one
of the vCPUs enter a nested guest and receive IPIs while the nested guest is running.

It was tested on a Zen3 (Milan) machine. On the Zen2 machines I have,
errata #1235 makes my test fail quite quickly.

Best regards,
	Maxim Levitsky

Maxim Levitsky (6):
  KVM: SVM: allow to force AVIC to be enabled
  KVM: x86: add a tracepoint for APICv/AVIC interrupt delivery
  KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs
  KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  KVM: SVM: allow AVIC to co-exist with a nested guest running

 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 +++++-
 arch/x86/kvm/lapic.c               |  6 ++++-
 arch/x86/kvm/svm/avic.c            | 35 +++++++++++++++++++++++-----
 arch/x86/kvm/svm/nested.c          | 13 ++++++-----
 arch/x86/kvm/svm/svm.c             | 37 +++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/trace.h               | 24 +++++++++++++++++++
 arch/x86/kvm/x86.c                 | 17 ++++++++++++--
 9 files changed, 111 insertions(+), 30 deletions(-)

-- 
2.26.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/6] KVM: SVM: allow to force AVIC to be enabled
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 2/6] KVM: x86: add a tracepoint for APICv/AVIC interrupt delivery Maxim Levitsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

Apparently, on some systems AVIC is disabled in CPUID but still usable.

Allow the user to override CPUID and enable it anyway, if the user
is willing to take the risk.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dde0106ffc47..6fbce42b9776 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -206,6 +206,9 @@ module_param(tsc_scaling, int, 0444);
 static bool avic;
 module_param(avic, bool, 0444);
 
+static bool force_avic;
+module_param_unsafe(force_avic, bool, 0444);
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -1058,10 +1061,14 @@ static __init int svm_hardware_setup(void)
 			nrips = false;
 	}
 
-	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
 
 	if (enable_apicv) {
-		pr_info("AVIC enabled\n");
+		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
+			pr_warn("AVIC is not supported in CPUID but force enabled\n");
+			pr_warn("Your system might crash and burn\n");
+		} else
+			pr_info("AVIC enabled\n");
 
 		amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 	}
-- 
2.26.3



* [PATCH 2/6] KVM: x86: add a tracepoint for APICv/AVIC interrupt delivery
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 1/6] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition Maxim Levitsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

This allows seeing how many interrupts were delivered via
APICv/AVIC from the host.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c |  3 +++
 arch/x86/kvm/trace.h | 24 ++++++++++++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 40270d7bc597..c5028e6b0f96 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1100,6 +1100,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			kvm_lapic_set_irr(vector, apic);
 			kvm_make_request(KVM_REQ_EVENT, vcpu);
 			kvm_vcpu_kick(vcpu);
+		} else {
+			trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
+						   trig_mode, vector);
 		}
 		break;
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 953b0fcb21ee..92e6f6702f00 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1356,6 +1356,30 @@ TRACE_EVENT(kvm_apicv_update_request,
 		  __entry->bit)
 );
 
+TRACE_EVENT(kvm_apicv_accept_irq,
+	    TP_PROTO(__u32 apicid, __u16 dm, __u16 tm, __u8 vec),
+	    TP_ARGS(apicid, dm, tm, vec),
+
+	TP_STRUCT__entry(
+		__field(	__u32,		apicid		)
+		__field(	__u16,		dm		)
+		__field(	__u16,		tm		)
+		__field(	__u8,		vec		)
+	),
+
+	TP_fast_assign(
+		__entry->apicid		= apicid;
+		__entry->dm		= dm;
+		__entry->tm		= tm;
+		__entry->vec		= vec;
+	),
+
+	TP_printk("apicid %x vec %u (%s|%s)",
+		  __entry->apicid, __entry->vec,
+		  __print_symbolic((__entry->dm >> 8 & 0x7), kvm_deliver_mode),
+		  __entry->tm ? "level" : "edge")
+);
+
 /*
  * Tracepoint for AMD AVIC
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1aaf37e1bd0f..26cb3a4cd0e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12693,6 +12693,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
-- 
2.26.3



* [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 1/6] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 2/6] KVM: x86: add a tracepoint for APICv/AVIC interrupt delivery Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  2021-12-09 14:11   ` Paolo Bonzini
  2021-12-09 11:54 ` [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs Maxim Levitsky
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 859ad2dc50f1..8c1b934bfa9b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	 * automatically process AVIC interrupts at VMRUN.
 	 */
 	if (vcpu->mode == IN_GUEST_MODE) {
+
+		/*
+		 * At this point we had read the vcpu->arch.apicv_active == true
+		 * and the vcpu->mode == IN_GUEST_MODE.
+		 * Since we have a memory barrier after setting IN_GUEST_MODE,
+		 * it ensures that AVIC inhibition is complete and thus
+		 * the target is really running with AVIC enabled.
+		 */
+
 		int cpu = READ_ONCE(vcpu->cpu);
 
 		/*
@@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		put_cpu();
 	} else {
 		/*
-		 * Wake the vCPU if it was blocking.  KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
+		 * Kick the target vCPU otherwise, to make sure
+		 * it processes the interrupt even if its AVIC is inhibited.
 		 */
-		kvm_vcpu_wake_up(vcpu);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 	}
 
 	return 0;
-- 
2.26.3



* [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
                   ` (2 preceding siblings ...)
  2021-12-09 11:54 ` [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  2021-12-09 15:38   ` Sean Christopherson
  2021-12-09 11:54 ` [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv Maxim Levitsky
  2021-12-09 11:54 ` [PATCH 6/6] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
  5 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

If the target vCPU has AVIC inhibited while the source vCPU doesn't,
we need to set irr_pending for the target to notice the interrupt.
Set it unconditionally to be safe, the same as in svm_deliver_avic_intr.

Also, if the target has AVIC inhibited, the same kind of races
that happen in svm_deliver_avic_intr can happen here as well,
so apply the same approach of kicking the target vCPUs.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8c1b934bfa9b..bdfc37caa64a 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -304,8 +304,17 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
 					GET_APIC_DEST_FIELD(icrh),
-					icrl & APIC_DEST_MASK))
-			kvm_vcpu_wake_up(vcpu);
+					icrl & APIC_DEST_MASK)) {
+
+			vcpu->arch.apic->irr_pending = true;
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
+			/*
+			 * The target vCPU might have AVIC inhibited,
+			 * so we have to kick it, to make sure it processes
+			 * the interrupt.
+			 */
+			kvm_vcpu_kick(vcpu);
+		}
 	}
 }
 
-- 
2.26.3



* [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
                   ` (3 preceding siblings ...)
  2021-12-09 11:54 ` [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  2021-12-09 14:12   ` Paolo Bonzini
  2021-12-10 12:07   ` Paolo Bonzini
  2021-12-09 11:54 ` [PATCH 6/6] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
  5 siblings, 2 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

It is possible that, during an AVIC incomplete IPI vmexit,
the handler will set irr_pending to true,
but the target vCPU will still see the IRR bit as not set,
due to the apparent lack of memory ordering between the sending CPU's
vIRR write (which is supposed to happen prior to the AVIC incomplete
IPI vmexit) and the write of irr_pending in that handler.

The AVIC incomplete IPI handler sets this boolean, then issues
a write barrier, and then raises KVM_REQ_EVENT;
thus, when we later process the KVM_REQ_EVENT, we will notice
the vIRR bits that were set.

Also, reorder the call to kvm_apic_update_apicv to be after
.refresh_apicv_exec_ctrl, although that doesn't guarantee
that it will see up-to-date IRR bits.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 3 ++-
 arch/x86/kvm/x86.c   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c5028e6b0f96..ecd6111b9a0d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2314,7 +2314,8 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		apic->irr_pending = true;
 		apic->isr_count = 1;
 	} else {
-		apic->irr_pending = (apic_search_irr(apic) != -1);
+		if (apic_search_irr(apic) != -1)
+			apic->irr_pending = true;
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26cb3a4cd0e9..ca037ac2ea08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9542,8 +9542,8 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 		goto out;
 
 	vcpu->arch.apicv_active = activate;
-	kvm_apic_update_apicv(vcpu);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
+	kvm_apic_update_apicv(vcpu);
 
 	/*
 	 * When APICv gets disabled, we may still have injected interrupts
-- 
2.26.3



* [PATCH 6/6] KVM: SVM: allow AVIC to co-exist with a nested guest running
  2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
                   ` (4 preceding siblings ...)
  2021-12-09 11:54 ` [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv Maxim Levitsky
@ 2021-12-09 11:54 ` Maxim Levitsky
  5 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 11:54 UTC (permalink / raw)
  To: kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson,
	Maxim Levitsky

Inhibit the AVIC of a vCPU that is running nested, for the
duration of the nested run, so that all interrupts arriving
from both its vCPU siblings and from KVM are delivered using
normal IPIs and cause that vCPU to vmexit.

Note that, in theory, when a nested guest doesn't intercept
physical interrupts, we could continue using AVIC to deliver them
to it, but we don't bother doing so for now.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 ++++++-
 arch/x86/kvm/svm/avic.c            |  6 +++++-
 arch/x86/kvm/svm/nested.c          | 13 +++++++------
 arch/x86/kvm/svm/svm.c             | 26 ++++++++++++++------------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/x86.c                 | 14 +++++++++++++-
 7 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index cefe1d81e2e8..c531dc0fca11 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -121,6 +121,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP_NULL(apicv_check_inhibit)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d5fede05eb5f..a0f17d5284e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,7 +1036,6 @@ struct kvm_x86_msr_filter {
 
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
-#define APICV_INHIBIT_REASON_NESTED     2
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC	5
@@ -1498,6 +1497,12 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	/*
+	 * Returns true if for some reason APICv (e.g. guest mode)
+	 * must be inhibited on this vCPU.
+	 */
+	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index bdfc37caa64a..c0550f505c7e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	return 0;
 }
 
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu);
+}
+
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	return false;
@@ -950,7 +955,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC) |
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cf206855ebf0..cd07049670c9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,12 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	/*
-	 * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
-	 * avic_physical_id.
-	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
-
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
 	svm->vmcb->control.iopm_base_pa = svm->vmcb01.ptr->control.iopm_base_pa;
@@ -659,6 +653,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 
 	svm_set_gif(svm, true);
 
+	if (kvm_vcpu_apicv_active(vcpu))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
@@ -697,6 +694,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
+
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
 
 	if (!nested_vmcb_check_save(vcpu) ||
@@ -923,6 +921,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
 		kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
 
+	if (kvm_apicv_activated(vcpu->kvm))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6fbce42b9776..ab70ee8e1b8c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1620,7 +1620,8 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 	/*
 	 * The following fields are ignored when AVIC is enabled
 	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+	if (!is_guest_mode(&svm->vcpu))
+		WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	svm_set_intercept(svm, INTERCEPT_VINTR);
 
@@ -3090,11 +3091,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	svm_clear_vintr(to_svm(vcpu));
 
 	/*
-	 * For AVIC, the only reason to end up here is ExtINTs.
+	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
+	 *
+	 * If running nested, this vCPU has AVIC inhibited during the
+	 * nested run, and can use the IRQ window.
 	 */
-	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
+
+	if (!is_guest_mode(vcpu))
+		kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
 	return 1;
@@ -3644,7 +3650,10 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
 		 */
-		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
+		if (!is_guest_mode(vcpu))
+			kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
 		svm_set_vintr(svm);
 	}
 }
@@ -4114,14 +4123,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_request_apicv_update(vcpu->kvm, false,
 						 APICV_INHIBIT_REASON_X2APIC);
-
-		/*
-		 * Currently, AVIC does not work with nested virtualization.
-		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-		 */
-		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			kvm_request_apicv_update(vcpu->kvm, false,
-						 APICV_INHIBIT_REASON_NESTED);
 	}
 	init_vmcb_after_set_cpuid(vcpu);
 }
@@ -4719,6 +4720,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+	.apicv_check_inhibit = avic_is_vcpu_inhibited,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 83ced47fa9b9..2b628e9bc5c9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		       uint32_t guest_irq, bool set);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca037ac2ea08..00ec878c5872 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
 			goto out;
+
+		/* Nested VM exit might need to update APICv status */
+		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+			kvm_vcpu_update_apicv(vcpu);
 	}
 
 	/* try to inject new event if pending */
@@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
+
+	if (kvm_x86_ops.apicv_check_inhibit)
+		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
+
 	if (vcpu->arch.apicv_active == activate)
 		goto out;
 
@@ -9578,6 +9586,7 @@ void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 
 	if (!!old != !!new) {
 		trace_kvm_apicv_update_request(activate, bit);
+
 		/*
 		 * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
 		 * false positives in the sanity check WARN in svm_vcpu_run().
@@ -9932,7 +9941,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * per-VM state, and responsing vCPUs must wait for the update
 		 * to complete before servicing KVM_REQ_APICV_UPDATE.
 		 */
-		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+		if (!is_guest_mode(vcpu))
+			WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+		else
+			WARN_ON(kvm_vcpu_apicv_active(vcpu));
 
 		exit_fastpath = static_call(kvm_x86_run)(vcpu);
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
-- 
2.26.3



* Re: [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 11:54 ` [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition Maxim Levitsky
@ 2021-12-09 14:11   ` Paolo Bonzini
  2021-12-09 14:26     ` Maxim Levitsky
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-12-09 14:11 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On 12/9/21 12:54, Maxim Levitsky wrote:
> If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> inhibited, it might read a stale value of vcpu->arch.apicv_active
> which can lead to the target vCPU not noticing the interrupt.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
>   1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 859ad2dc50f1..8c1b934bfa9b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>   	 * automatically process AVIC interrupts at VMRUN.
>   	 */
>   	if (vcpu->mode == IN_GUEST_MODE) {
> +
> +		/*
> +		 * At this point we had read the vcpu->arch.apicv_active == true
> +		 * and the vcpu->mode == IN_GUEST_MODE.
> +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> +		 * it ensures that AVIC inhibition is complete and thus
> +		 * the target is really running with AVIC enabled.
> +		 */
> +
>   		int cpu = READ_ONCE(vcpu->cpu);

I don't think it's correct.  The vCPU has apicv_active written (in 
kvm_vcpu_update_apicv) before vcpu->mode.

For the acquire/release pair to work properly you need to 1) read 
apicv_active *after* vcpu->mode here 2) use store_release and 
load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

Paolo

>   		/*
> @@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>   		put_cpu();
>   	} else {
>   		/*
> -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> +		 * Kick the target vCPU otherwise, to make sure
> +		 * it processes the interrupt even if its AVIC is inhibited.
>   		 */
> -		kvm_vcpu_wake_up(vcpu);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
>   	}
>   
>   	return 0;
> 



* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-09 11:54 ` [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv Maxim Levitsky
@ 2021-12-09 14:12   ` Paolo Bonzini
  2021-12-09 15:03     ` Maxim Levitsky
  2021-12-10 12:07   ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-12-09 14:12 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On 12/9/21 12:54, Maxim Levitsky wrote:
> Also reorder call to kvm_apic_update_apicv to be after
> .refresh_apicv_exec_ctrl, although that doesn't guarantee
> that it will see up to date IRR bits.

Can you spell out why do that?

Paolo


* Re: [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 14:11   ` Paolo Bonzini
@ 2021-12-09 14:26     ` Maxim Levitsky
  2021-12-09 15:27       ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 14:26 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > which can lead to the target vCPU not noticing the interrupt.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> >   1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 859ad2dc50f1..8c1b934bfa9b 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >   	 * automatically process AVIC interrupts at VMRUN.
> >   	 */
> >   	if (vcpu->mode == IN_GUEST_MODE) {
> > +
> > +		/*
> > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > +		 * it ensures that AVIC inhibition is complete and thus
> > +		 * the target is really running with AVIC enabled.
> > +		 */
> > +
> >   		int cpu = READ_ONCE(vcpu->cpu);
> 
> I don't think it's correct.  The vCPU has apicv_active written (in 
> kvm_vcpu_update_apicv) before vcpu->mode.

I thought that we had a full memory barrier just prior to setting IN_GUEST_MODE,
thus if I saw vcpu->mode == IN_GUEST_MODE then I would see the correct apicv_active value.
But apparently the memory barrier is after setting vcpu->mode.


> 
> For the acquire/release pair to work properly you need to 1) read 
> apicv_active *after* vcpu->mode here 2) use store_release and 
> load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
which is what I expected to be there.

And yes, I see now that I need a read barrier here as well. I am still learning this.

Best regards,
	Maxim Levitsky

> 
> Paolo
> 
> >   		/*
> > @@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >   		put_cpu();
> >   	} else {
> >   		/*
> > -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> > -		 * pending IRQ when checking if the vCPU has a wake event.
> > +		 * Kick the target vCPU otherwise, to make sure
> > +		 * it processes the interrupt even if its AVIC is inhibited.
> >   		 */
> > -		kvm_vcpu_wake_up(vcpu);
> > +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +		kvm_vcpu_kick(vcpu);
> >   	}
> >   
> >   	return 0;
> > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-09 14:12   ` Paolo Bonzini
@ 2021-12-09 15:03     ` Maxim Levitsky
  0 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 15:03 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On Thu, 2021-12-09 at 15:12 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > Also reorder call to kvm_apic_update_apicv to be after
> > .refresh_apicv_exec_ctrl, although that doesn't guarantee
> > that it will see up to date IRR bits.
> 
> Can you spell out why do that?


Here is what I saw happening in kvm_vcpu_update_apicv when we are about to disable AVIC:

1. We call kvm_apic_update_apicv, which sets irr_pending == false
because there is nothing in the IRR yet.

2. We call kvm_x86_refresh_apicv_exec_ctrl, which disables AVIC.

If an IPI arrives between 1 and 2, the IRR bits are set, and since there is
legitimately no VM exit happening, irr_pending is never set to true.


This is why I reordered those calls and added a memory barrier between them
(but I didn't post it in the series).

However, I then found that even with the incomplete IPI handler setting irr_pending,
I can observe irr_pending == true here but no bits set in the IRR, so kvm_apic_update_apicv
would reset it. I expected the VM exit to act as a write barrier, but it seems that it doesn't.

However, I ended up fixing the incomplete IPI handler to always:
	- set irr_pending
	- raise KVM_REQ_EVENT
	- kick the vCPU

Kicking a sleeping vCPU just wakes it up, and otherwise the vCPU
kick only sends an IPI when the target vCPU is in guest mode anyway.

I think that ensures for good that the interrupt will be processed by
this vCPU regardless of the order of these calls or any barrier between them.

The only thing I kept is making kvm_apic_update_apicv never clear
irr_pending, to make sure it doesn't reset it if it sees the writes out of order.

Later, KVM_REQ_EVENT should see the writes in order because kvm_make_request
includes a write barrier, and the kick should ensure that the vCPU will
process that request.

So in summary, this reordering is no longer needed, but it seems more logical
to scan the IRR after we disable AVIC.
Or, on second thought, I think we should drop the IRR scan from here altogether,
now that the callers do vCPU kicks.

Best regards,
	Maxim Levitsky





> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 14:26     ` Maxim Levitsky
@ 2021-12-09 15:27       ` Sean Christopherson
  2021-12-09 15:33         ` Maxim Levitsky
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-12-09 15:27 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson

On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > which can lead to the target vCPU not noticing the interrupt.
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > >   1 file changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > >   	 * automatically process AVIC interrupts at VMRUN.
> > >   	 */
> > >   	if (vcpu->mode == IN_GUEST_MODE) {
> > > +
> > > +		/*
> > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > +		 * it ensures that AVIC inhibition is complete and thus
> > > +		 * the target is really running with AVIC enabled.
> > > +		 */
> > > +
> > >   		int cpu = READ_ONCE(vcpu->cpu);
> > 
> > I don't think it's correct.  The vCPU has apicv_active written (in 
> > kvm_vcpu_update_apicv) before vcpu->mode.
> 
> I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> But apparently the memory barrier is after setting vcpu->mode.
> 
> 
> > 
> > For the acquire/release pair to work properly you need to 1) read 
> > apicv_active *after* vcpu->mode here 2) use store_release and 
> > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> 
> store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> which I expected to be there.
> 
> And yes I see now, I need a read barrier here as well. I am still learning this.

Sans barriers and comments, can't this be written as returning an "error" if the
vCPU is not IN_GUEST_MODE?  Effectively the same thing, but a little more precise
and it avoids duplicating the lapic.c code.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 26ed5325c593..cddf7a8da3ea 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)

 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-       if (!vcpu->arch.apicv_active)
+       if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
                return -1;

        kvm_lapic_set_irr(vec, vcpu->arch.apic);
@@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
                put_cpu();
        } else {
                /*
-                * Wake the vCPU if it was blocking.  KVM will then detect the
-                * pending IRQ when checking if the vCPU has a wake event.
+                * Wake the vCPU if it is blocking.  If the vCPU exited the
+                * guest since the previous vcpu->mode check, it's guaranteed
+                * to see the event before re-entering the guest.
                 */
                kvm_vcpu_wake_up(vcpu);
        }


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 15:27       ` Sean Christopherson
@ 2021-12-09 15:33         ` Maxim Levitsky
  2021-12-09 15:35           ` Maxim Levitsky
  0 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 15:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson

On Thu, 2021-12-09 at 15:27 +0000, Sean Christopherson wrote:
> On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> > On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > > which can lead to the target vCPU not noticing the interrupt.
> > > > 
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > > >   1 file changed, 13 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > > --- a/arch/x86/kvm/svm/avic.c
> > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > > >   	 * automatically process AVIC interrupts at VMRUN.
> > > >   	 */
> > > >   	if (vcpu->mode == IN_GUEST_MODE) {
> > > > +
> > > > +		/*
> > > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > > +		 * it ensures that AVIC inhibition is complete and thus
> > > > +		 * the target is really running with AVIC enabled.
> > > > +		 */
> > > > +
> > > >   		int cpu = READ_ONCE(vcpu->cpu);
> > > 
> > > I don't think it's correct.  The vCPU has apicv_active written (in 
> > > kvm_vcpu_update_apicv) before vcpu->mode.
> > 
> > I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> > thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> > But apparently the memory barrier is after setting vcpu->mode.
> > 
> > 
> > > For the acquire/release pair to work properly you need to 1) read 
> > > apicv_active *after* vcpu->mode here 2) use store_release and 
> > > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> > 
> > store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> > which I expected to be there.
> > 
> > And yes I see now, I need a read barrier here as well. I am still learning this.
> 
> Sans barriers and comments, can't this be written as returning an "error" if the
> vCPU is not IN_GUEST_MODE?  Effectively the same thing, but a little more precise
> and it avoids duplicating the lapic.c code.

Yes, besides the fact that we have already set the vIRR bit, so if I return -1 here, it will be set again
(and these bits are set using atomic ops).

I don't know how much that matters, except that while a vCPU runs a nested guest,
callers wishing to send an IPI to it will go through this code path a lot
(even once I implement nested AVIC, as that is a separate thing used only by L2).

Best regards,
	Maxim Levitsky

> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 26ed5325c593..cddf7a8da3ea 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> 
>  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  {
> -       if (!vcpu->arch.apicv_active)
> +       if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
>                 return -1;
> 
>         kvm_lapic_set_irr(vec, vcpu->arch.apic);
> @@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>                 put_cpu();
>         } else {
>                 /*
> -                * Wake the vCPU if it was blocking.  KVM will then detect the
> -                * pending IRQ when checking if the vCPU has a wake event.
> +                * Wake the vCPU if it is blocking.  If the vCPU exited the
> +                * guest since the previous vcpu->mode check, it's guaranteed
> +                * to see the event before re-entering the guest.
>                  */
>                 kvm_vcpu_wake_up(vcpu);
>         }
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition
  2021-12-09 15:33         ` Maxim Levitsky
@ 2021-12-09 15:35           ` Maxim Levitsky
  0 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-09 15:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson

On Thu, 2021-12-09 at 17:33 +0200, Maxim Levitsky wrote:
> On Thu, 2021-12-09 at 15:27 +0000, Sean Christopherson wrote:
> > On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> > > On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > > > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > > > which can lead to the target vCPU not noticing the interrupt.
> > > > > 
> > > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > ---
> > > > >   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > > > >   1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > > > --- a/arch/x86/kvm/svm/avic.c
> > > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > > > >   	 * automatically process AVIC interrupts at VMRUN.
> > > > >   	 */
> > > > >   	if (vcpu->mode == IN_GUEST_MODE) {
> > > > > +
> > > > > +		/*
> > > > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > > > +		 * it ensures that AVIC inhibition is complete and thus
> > > > > +		 * the target is really running with AVIC enabled.
> > > > > +		 */
> > > > > +
> > > > >   		int cpu = READ_ONCE(vcpu->cpu);
> > > > 
> > > > I don't think it's correct.  The vCPU has apicv_active written (in 
> > > > kvm_vcpu_update_apicv) before vcpu->mode.
> > > 
> > > I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> > > thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> > > But apparently the memory barrier is after setting vcpu->mode.
> > > 
> > > 
> > > > For the acquire/release pair to work properly you need to 1) read 
> > > > apicv_active *after* vcpu->mode here 2) use store_release and 
> > > > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> > > 
> > > store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> > > which I expected to be there.
> > > 
> > > And yes I see now, I need a read barrier here as well. I am still learning this.
> > 
> > Sans barriers and comments, can't this be written as returning an "error" if the
> > vCPU is not IN_GUEST_MODE?  Effectively the same thing, but a little more precise
> > and it avoids duplicating the lapic.c code.
> 
> Yes, besides the fact that we have already set the vIRR bit, so if I return -1 here, it will be set again
> (and these bits are set using atomic ops).
> 
> I don't know how much that matters, except that while a vCPU runs a nested guest,
> callers wishing to send an IPI to it will go through this code path a lot
> (even once I implement nested AVIC, as that is a separate thing used only by L2).

Ah, I hit send too soon; it makes sense to me now!
Best regards,
	Maxim Levitsky
> 
> Best regards,
> 	Maxim Levitsky
> 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 26ed5325c593..cddf7a8da3ea 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> > 
> >  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  {
> > -       if (!vcpu->arch.apicv_active)
> > +       if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
> >                 return -1;
> > 
> >         kvm_lapic_set_irr(vec, vcpu->arch.apic);
> > @@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >                 put_cpu();
> >         } else {
> >                 /*
> > -                * Wake the vCPU if it was blocking.  KVM will then detect the
> > -                * pending IRQ when checking if the vCPU has a wake event.
> > +                * Wake the vCPU if it is blocking.  If the vCPU exited the
> > +                * guest since the previous vcpu->mode check, it's guaranteed
> > +                * to see the event before re-entering the guest.
> >                  */
> >                 kvm_vcpu_wake_up(vcpu);
> >         }
> > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs
  2021-12-09 11:54 ` [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs Maxim Levitsky
@ 2021-12-09 15:38   ` Sean Christopherson
  2021-12-10 11:37     ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-12-09 15:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson

On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> If the target vCPU has AVIC inhibited while the source vCPU isn't,
> we need to set irr_pending, for the target to notice the interrupt.
> Do it always to be safe, the same as in svm_deliver_avic_intr.
> 
> Also if the target has AVIC inhibited, the same kind of races
> that happen in svm_deliver_avic_intr can happen here as well,
> so apply the same approach of kicking the target vCPUs.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/avic.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 8c1b934bfa9b..bdfc37caa64a 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -304,8 +304,17 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
>  					GET_APIC_DEST_FIELD(icrh),
> -					icrl & APIC_DEST_MASK))
> -			kvm_vcpu_wake_up(vcpu);
> +					icrl & APIC_DEST_MASK)) {

What about leveraging svm_deliver_avic_intr() to handle this so that all the
logic to handle this mess is more or less contained in one location?  And if the
vCPU has made its way back to the guest with APICv=1, we'd avoid an unnecessary
kick.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 26ed5325c593..cf9f5caa6e1b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -304,8 +304,12 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
                                        GET_APIC_DEST_FIELD(icrh),
-                                       icrl & APIC_DEST_MASK))
-                       kvm_vcpu_wake_up(vcpu);
+                                       icrl & APIC_DEST_MASK)) {
+                       if (svm_deliver_avic_intr(vcpu, -1)) {
+                               vcpu->arch.apic->irr_pending = true;
+                               kvm_make_request(KVM_REQ_EVENT, vcpu);
+                       }
+               }
        }
 }
 

And change svm_deliver_avic_intr() to ignore a negative vector, probably with a
comment saying that means the vector has already been set in the IRR.

	if (vec > 0) {
		kvm_lapic_set_irr(vec, vcpu->arch.apic);

		/*
		 * Pairs with the smp_mb_*() after setting vcpu->guest_mode in
		 * vcpu_enter_guest() to ensure the write to the vIRR is ordered before
		 * the read of guest_mode, which guarantees that either VMRUN will see
		 * and process the new vIRR entry, or that the below code will signal
		 * the doorbell if the vCPU is already running in the guest.
		 */
		smp_mb__after_atomic();
	}


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs
  2021-12-09 15:38   ` Sean Christopherson
@ 2021-12-10 11:37     ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2021-12-10 11:37 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson

On 12/9/21 16:38, Sean Christopherson wrote:
> +                       if (svm_deliver_avic_intr(vcpu, -1)) {
> +                               vcpu->arch.apic->irr_pending = true;
> +                               kvm_make_request(KVM_REQ_EVENT, vcpu);
> +                       }

This is a good idea, but the details depend on patch 3, so I'll send a 
series of my own.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-09 11:54 ` [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv Maxim Levitsky
  2021-12-09 14:12   ` Paolo Bonzini
@ 2021-12-10 12:07   ` Paolo Bonzini
  2021-12-10 12:20     ` Maxim Levitsky
  1 sibling, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-12-10 12:07 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On 12/9/21 12:54, Maxim Levitsky wrote:
> It is possible that during the AVIC incomplete IPI vmexit,
> its handler will set irr_pending to true,
> but the target vCPU will still see the IRR bit not set,
> due to the apparent lack of memory ordering between CPU's vIRR write
> that is supposed to happen prior to the AVIC incomplete IPI
> vmexit and the write of the irr_pending in that handler.

Are you sure about this?  Store-to-store ordering should be 
guaranteed---if not by the architecture---by existing memory barriers 
between vmrun returning and avic_incomplete_ipi_interception().  For 
example, srcu_read_lock implies an smp_mb().

Even more damning: no matter what internal black magic the processor 
could be using to write to IRR, the processor needs to order the writes 
against reads of IsRunning on processors without the erratum.  That 
would be equivalent to flushing the store buffer, and it would imply 
that the write of vIRR is ordered before the write to irr_pending.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-10 12:07   ` Paolo Bonzini
@ 2021-12-10 12:20     ` Maxim Levitsky
  2021-12-10 12:47       ` Maxim Levitsky
  0 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-10 12:20 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On Fri, 2021-12-10 at 13:07 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > It is possible that during the AVIC incomplete IPI vmexit,
> > its handler will set irr_pending to true,
> > but the target vCPU will still see the IRR bit not set,
> > due to the apparent lack of memory ordering between CPU's vIRR write
> > that is supposed to happen prior to the AVIC incomplete IPI
> > vmexit and the write of the irr_pending in that handler.
> 
> Are you sure about this?  Store-to-store ordering should be 
> guaranteed---if not by the architecture---by existing memory barriers 
> between vmrun returning and avic_incomplete_ipi_interception().  For 
> example, srcu_read_lock implies an smp_mb().
> 
> Even more damning: no matter what internal black magic the processor 
> could be using to write to IRR, the processor needs to order the writes 
> against reads of IsRunning on processors without the erratum.  That 
> would be equivalent to flushing the store buffer, and it would imply 
> that the write of vIRR is ordered before the write to irr_pending.
> 
> Paolo
> 
Yes, I am almost 100% sure now that this patch is wrong:
the code was just seeing irr_pending == true because it is set
to true while APICv/AVIC is in use, and was not yet seeing the vIRR bits
because they hadn't arrived yet. Thus this patch isn't needed.

Thanks again for the help!
I am testing your version of the fixes for the AVIC inhibition races,
and then I'll send a new version of these patches.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-10 12:20     ` Maxim Levitsky
@ 2021-12-10 12:47       ` Maxim Levitsky
  2021-12-10 13:03         ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-10 12:47 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On Fri, 2021-12-10 at 14:20 +0200, Maxim Levitsky wrote:
> On Fri, 2021-12-10 at 13:07 +0100, Paolo Bonzini wrote:
> > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > It is possible that during the AVIC incomplete IPI vmexit,
> > > its handler will set irr_pending to true,
> > > but the target vCPU will still see the IRR bit not set,
> > > due to the apparent lack of memory ordering between CPU's vIRR write
> > > that is supposed to happen prior to the AVIC incomplete IPI
> > > vmexit and the write of the irr_pending in that handler.
> > 
> > Are you sure about this?  Store-to-store ordering should be 
> > guaranteed---if not by the architecture---by existing memory barriers 
> > between vmrun returning and avic_incomplete_ipi_interception().  For 
> > example, srcu_read_lock implies an smp_mb().
> > 
> > Even more damning: no matter what internal black magic the processor 
> > could be using to write to IRR, the processor needs to order the writes 
> > against reads of IsRunning on processors without the erratum.  That 
> > would be equivalent to flushing the store buffer, and it would imply 
> > that the write of vIRR is ordered before the write to irr_pending.
> > 
> > Paolo
> > 
> Yes, I am almost 100% sure now that this patch is wrong:
> the code was just seeing irr_pending == true because it is set
> to true while APICv/AVIC is in use, and was not yet seeing the vIRR bits
> because they hadn't arrived yet. Thus this patch isn't needed.
> 
> Thanks again for the help!
> I am testing your version of the fixes for the AVIC inhibition races,
> and then I'll send a new version of these patches.
> 
> Best regards,
> 	Maxim Levitsky

And yet that patch is needed for a different reason.

If the sender has AVIC enabled, it can turn on vIRR bits at any moment
without setting irr_pending = true - there are no VM exits happening
on the sender side.

If we scan the vIRR here and see no bits, and *then* disable AVIC,
there is a window where they could legitimately be turned on without any CPU errata,
and we will not have irr_pending == true, and thus the following
KVM_REQ_EVENT will make no difference.

Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
will also work, and if the AVIC errata is present, slightly reduces
the chances of it happening.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-10 12:47       ` Maxim Levitsky
@ 2021-12-10 13:03         ` Paolo Bonzini
  2021-12-10 13:10           ` Maxim Levitsky
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-12-10 13:03 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On 12/10/21 13:47, Maxim Levitsky wrote:
> If we scan the vIRR here and see no bits, and *then* disable AVIC,
> there is a window where they could legitimately be turned on without any CPU errata,
> and we will not have irr_pending == true, and thus the following
> KVM_REQ_EVENT will make no difference.

Right.

> Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
> will also work,

Yeah, I think that's preferable.  irr_pending == true is a conservative 
setting that works; irr_pending will be evaluated again on the first 
call to apic_clear_irr and that's enough.

With that justification, you don't need to reorder the call to 
kvm_apic_update_apicv to be after kvm_x86_refresh_apicv_exec_ctrl.

Paolo

>   and if the avic errata is present, reduce slightly
> the chances of it happening.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv
  2021-12-10 13:03         ` Paolo Bonzini
@ 2021-12-10 13:10           ` Maxim Levitsky
  0 siblings, 0 replies; 21+ messages in thread
From: Maxim Levitsky @ 2021-12-10 13:10 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Dave Hansen, Joerg Roedel, H. Peter Anvin,
	Vitaly Kuznetsov, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Ingo Molnar, Thomas Gleixner, Jim Mattson, Sean Christopherson

On Fri, 2021-12-10 at 14:03 +0100, Paolo Bonzini wrote:
> On 12/10/21 13:47, Maxim Levitsky wrote:
> > If we scan vIRR here and see no bits, and*then*  disable AVIC,
> > there is a window where the they could legit be turned on without any cpu errata,
> > and we will not have irr_pending == true, and thus the following
> > KVM_REQ_EVENT will make no difference.
> 
> Right.
> 
> > Not touching irr_pending and letting just the KVM_REQ_EVENT do the work
> > will work too,
> 
> Yeah, I think that's preferable.  irr_pending == true is a conservative 
> setting that works; irr_pending will be evaluated again on the first 
> call to apic_clear_irr and that's enough.
> 
> With that justification, you don't need to reorder the call to 
> kvm_apic_update_apicv to be after kvm_x86_refresh_apicv_exec_ctrl.

Yes, exactly! But there is no need to scan the IRR here since irr_pending is already
true at that point anyway; it is always true while AVIC is enabled.


Best regards,
	Maxim Levitsky
> 
> Paolo
> 
> >   and if the avic errata is present, reduce slightly
> > the chances of it happening.



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-12-10 13:10 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-09 11:54 [PATCH 0/6] RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting Maxim Levitsky
2021-12-09 11:54 ` [PATCH 1/6] KVM: SVM: allow to force AVIC to be enabled Maxim Levitsky
2021-12-09 11:54 ` [PATCH 2/6] KVM: x86: add a tracepoint for APICv/AVIC interrupt delivery Maxim Levitsky
2021-12-09 11:54 ` [PATCH 3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition Maxim Levitsky
2021-12-09 14:11   ` Paolo Bonzini
2021-12-09 14:26     ` Maxim Levitsky
2021-12-09 15:27       ` Sean Christopherson
2021-12-09 15:33         ` Maxim Levitsky
2021-12-09 15:35           ` Maxim Levitsky
2021-12-09 11:54 ` [PATCH 4/6] KVM: SVM: fix races in the AVIC incomplete IPI delivery to vCPUs Maxim Levitsky
2021-12-09 15:38   ` Sean Christopherson
2021-12-10 11:37     ` Paolo Bonzini
2021-12-09 11:54 ` [PATCH 5/6] KVM: x86: never clear irr_pending in kvm_apic_update_apicv Maxim Levitsky
2021-12-09 14:12   ` Paolo Bonzini
2021-12-09 15:03     ` Maxim Levitsky
2021-12-10 12:07   ` Paolo Bonzini
2021-12-10 12:20     ` Maxim Levitsky
2021-12-10 12:47       ` Maxim Levitsky
2021-12-10 13:03         ` Paolo Bonzini
2021-12-10 13:10           ` Maxim Levitsky
2021-12-09 11:54 ` [PATCH 6/6] KVM: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).