All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] KVM: x86: APICv fixes
@ 2022-04-20  1:37 Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 1/4] KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled Sean Christopherson
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-04-20  1:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky, Gaoning Pan,
	Yongkang Jia

Patch 1 is brown paper bag fix for a copy+paste bug (thankfully from this
cycle).

Patch 2 fixes a nVMX + APICv bug where KVM essentially corrupts vmcs02 if
an APICv update arrives while L2 is running.  Found when testing a slight
variation of patch 3.

Patch 3 fixes a race where an APICv update that occurs while a vCPU is
being created will fail to notify that vCPU due it not yet being visible
to the updater.

Patch 4 is a minor optimization/cleanup found by inspection when digging
into everything else.

v2:
  - Collect reviews. [Maxim]
  - Reword patch 2's changelog to make it clear that ignoring vmcs02
    is perfectly ok. [Maxim]

v1: https://lore.kernel.org/all/20220416034249.2609491-3-seanjc@google.com

Sean Christopherson (4):
  KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
  KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
  KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
  KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled

 arch/x86/kvm/vmx/nested.c |  5 +++++
 arch/x86/kvm/vmx/vmx.c    |  5 +++++
 arch/x86/kvm/vmx/vmx.h    |  1 +
 arch/x86/kvm/x86.c        | 20 ++++++++++++++++++--
 4 files changed, 29 insertions(+), 2 deletions(-)


base-commit: 150866cd0ec871c765181d145aa0912628289c8a
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/4] KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
  2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
@ 2022-04-20  1:37 ` Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 2/4] KVM: nVMX: Defer APICv updates while L2 is active until L1 is active Sean Christopherson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-04-20  1:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky, Gaoning Pan,
	Yongkang Jia

Set the DISABLE inhibit, not the ABSENT inhibit, if APICv is disabled via
module param.  A recent refactoring to add a wrapper for setting/clearing
inhibits unintentionally changed the flag, probably due to a copy+paste
goof.

Fixes: 4f4c4a3ee53c ("KVM: x86: Trace all APICv inhibit changes and capture overall status")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab336f7c82e4..753296902535 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9159,7 +9159,7 @@ static void kvm_apicv_init(struct kvm *kvm)
 
 	if (!enable_apicv)
 		set_or_clear_apicv_inhibit(inhibits,
-					   APICV_INHIBIT_REASON_ABSENT, true);
+					   APICV_INHIBIT_REASON_DISABLE, true);
 }
 
 static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/4] KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
  2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 1/4] KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled Sean Christopherson
@ 2022-04-20  1:37 ` Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 3/4] KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race Sean Christopherson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-04-20  1:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky, Gaoning Pan,
	Yongkang Jia

Defer APICv updates that occur while L2 is active until nested VM-Exit,
i.e. until L1 regains control.  vmx_refresh_apicv_exec_ctrl() assumes L1
is active and (a) stomps all over vmcs02 and (b) neglects to ever updated
vmcs01.  E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no
APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv
becomes unhibited while L2 is active, KVM will set various APICv controls
in vmcs02 and trigger a failed VM-Entry.  The kicker is that, unless
running with nested_early_check=1, KVM blames L1 and chaos ensues.

In all cases, ignoring vmcs02 and always deferring the inhibition change
to vmcs01 is correct (or at least acceptable).  The ABSENT and DISABLE
inhibitions cannot truly change while L2 is active (see below).

IRQ_BLOCKING can change, but it is firmly a best effort debug feature.
Furthermore, only L2's APIC is accelerated/virtualized to the full extent
possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR
interception will apply to the virtual APIC managed by KVM.
The exception is the SELF_IPI register when x2APIC is enabled, but that's
an acceptable hole.

Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the
MSRs to L2, but for that to work in any sane capacity, L1 would need to
pass through IRQs to L2 as well, and IRQs must be intercepted to enable
virtual interrupt delivery.  I.e. exposing Auto EOI to L2 and enabling
VID for L2 are, for all intents and purposes, mutually exclusive.

Lack of dynamic toggling is also why this scenario is all but impossible
to encounter in KVM's current form.  But a future patch will pend an
APICv update request _during_ vCPU creation to plug a race where a vCPU
that's being created doesn't get included in the "all vCPUs request"
because it's not yet visible to other vCPUs.  If userspaces restores L2
after VM creation (hello, KVM selftests), the first KVM_RUN will occur
while L2 is active and thus service the APICv update request made during
VM creation.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 5 +++++
 arch/x86/kvm/vmx/vmx.c    | 5 +++++
 arch/x86/kvm/vmx/vmx.h    | 1 +
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a6688663da4d..f5cb18e00e78 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4640,6 +4640,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 	}
 
+	if (vmx->nested.update_vmcs01_apicv_status) {
+		vmx->nested.update_vmcs01_apicv_status = false;
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+	}
+
 	if ((vm_exit_reason != -1) &&
 	    (enable_shadow_vmcs || evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)))
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cf8581978bce..4c407a34b11e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4174,6 +4174,11 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (is_guest_mode(vcpu)) {
+		vmx->nested.update_vmcs01_apicv_status = true;
+		return;
+	}
+
 	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
 	if (cpu_has_secondary_exec_ctrls()) {
 		if (kvm_vcpu_apicv_active(vcpu))
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9c6bfcd84008..b98c7e96697a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -183,6 +183,7 @@ struct nested_vmx {
 	bool change_vmcs01_virtual_apic_mode;
 	bool reload_vmcs01_apic_access_page;
 	bool update_vmcs01_cpu_dirty_logging;
+	bool update_vmcs01_apicv_status;
 
 	/*
 	 * Enlightened VMCS has been enabled. It does not mean that L1 has to
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/4] KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
  2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 1/4] KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 2/4] KVM: nVMX: Defer APICv updates while L2 is active until L1 is active Sean Christopherson
@ 2022-04-20  1:37 ` Sean Christopherson
  2022-04-20  1:37 ` [PATCH v2 4/4] KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled Sean Christopherson
  2022-04-20 10:51 ` [PATCH v2 0/4] KVM: x86: APICv fixes Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-04-20  1:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky, Gaoning Pan,
	Yongkang Jia

Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an
in-kernel local APIC and APICv enabled at the module level.  Consuming
kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can
race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens
before the vCPU is fully onlined, i.e. it won't get the request made to
"all" vCPUs.  If APICv is globally inhibited between setting apicv_active
and onlining the vCPU, the vCPU will end up running with APICv enabled
and trigger KVM's sanity check.

Mark APICv as active during vCPU creation if APICv is enabled at the
module level, both to be optimistic about it's final state, e.g. to avoid
additional VMWRITEs on VMX, and because there are likely bugs lurking
since KVM checks apicv_active in multiple vCPU creation paths.  While
keeping the current behavior of consuming kvm_apicv_activated() is
arguably safer from a regression perspective, force apicv_active so that
vCPU creation runs with deterministic state and so that if there are bugs,
they are found sooner than later, i.e. not when some crazy race condition
is hit.

  WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Modules linked in:
  CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
  RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Call Trace:
   <TASK>
   vcpu_run arch/x86/kvm/x86.c:10039 [inline]
   kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234
   kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:874 [inline]
   __se_sys_ioctl fs/ioctl.c:860 [inline]
   __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a
call to KVM_SET_GUEST_DEBUG.

  r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0)
  r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
  ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async)
  r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async)
  r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002)
  ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5})
  ioctl$KVM_RUN(r2, 0xae80, 0x0)

Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Fixes: 8df14af42f00 ("kvm: x86: Add support for dynamic APICv activation")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 753296902535..09a270cc1c8f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11259,8 +11259,21 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
 		if (r < 0)
 			goto fail_mmu_destroy;
-		if (kvm_apicv_activated(vcpu->kvm))
+
+		/*
+		 * Defer evaluating inhibits until the vCPU is first run, as
+		 * this vCPU will not get notified of any changes until this
+		 * vCPU is visible to other vCPUs (marked online and added to
+		 * the set of vCPUs).  Opportunistically mark APICv active as
+		 * VMX in particularly is highly unlikely to have inhibits.
+		 * Ignore the current per-VM APICv state so that vCPU creation
+		 * is guaranteed to run with a deterministic value, the request
+		 * will ensure the vCPU gets the correct state before VM-Entry.
+		 */
+		if (enable_apicv) {
 			vcpu->arch.apicv_active = true;
+			kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+		}
 	} else
 		static_branch_inc(&kvm_has_noapic_vcpu);
 
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 4/4] KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
  2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-04-20  1:37 ` [PATCH v2 3/4] KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race Sean Christopherson
@ 2022-04-20  1:37 ` Sean Christopherson
  2022-04-20 10:51 ` [PATCH v2 0/4] KVM: x86: APICv fixes Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-04-20  1:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky, Gaoning Pan,
	Yongkang Jia

Skip the APICv inhibit update for KVM_GUESTDBG_BLOCKIRQ if APICv is
disabled at the module level to avoid having to acquire the mutex and
potentially process all vCPUs. The DISABLE inhibit will (barring bugs)
never be lifted, so piling on more inhibits is unnecessary.

Fixes: cae72dcc3b21 ("KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09a270cc1c8f..16c5fa7d165d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11048,6 +11048,9 @@ static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
 
+	if (!enable_apicv)
+		return;
+
 	down_write(&kvm->arch.apicv_update_lock);
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: APICv fixes
  2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-04-20  1:37 ` [PATCH v2 4/4] KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled Sean Christopherson
@ 2022-04-20 10:51 ` Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2022-04-20 10:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Maxim Levitsky, Gaoning Pan, Yongkang Jia

Queued, thanks.

Paolo



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-20 10:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20  1:37 [PATCH v2 0/4] KVM: x86: APICv fixes Sean Christopherson
2022-04-20  1:37 ` [PATCH v2 1/4] KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled Sean Christopherson
2022-04-20  1:37 ` [PATCH v2 2/4] KVM: nVMX: Defer APICv updates while L2 is active until L1 is active Sean Christopherson
2022-04-20  1:37 ` [PATCH v2 3/4] KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race Sean Christopherson
2022-04-20  1:37 ` [PATCH v2 4/4] KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled Sean Christopherson
2022-04-20 10:51 ` [PATCH v2 0/4] KVM: x86: APICv fixes Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.