[PATCH 0/5] Fix a race between posted interrupt delivery and migration in a nested VM

* [PATCH 0/5] Fix a race between posted interrupt delivery and migration in a nested VM
@ 2022-08-02 23:07 Mingwei Zhang
  2022-08-02 23:07 ` [PATCH 1/5] KVM: x86: Get vmcs12 pages before checking pending interrupts Mingwei Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Mingwei Zhang @ 2022-08-02 23:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Oliver Upton, Mingwei Zhang

This patch set aims to fix a race condition between posted interrupt
delivery and migration for a nested VM. In particular, we proves that when
a nested vCPU is halted and just migrated, it will lose a posted
interrupt from another vCPU in the same VM.

The patches consist of 1 kernel change which is the fix and the rest of
the changes generate a selftest that articulates such a racing scenario
to prove the existence of the race. In summary, running this test on an
unpatched kernel will generate a warning [1] and with that, we proves that
there is the loss of a posted interrupt. Note, the warning will only happen
once per reboot, since it is a WARN_ON_ONCE.

[1] The kernel warning happens at arch/x86/kvm/vmx/vmx.c:

static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
	...
	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
		!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
		WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) <= HERE
		return false;
	...
}

The dump is there:

[237880.809453] ------------[ cut here ]------------
[237880.809455] WARNING: CPU: 21 PID: 112454 at
arch/x86/kvm/vmx/vmx.c:3973 vmx_guest_apic_has_interrupt+0x79/0xe0
[kvm_intel]
[237880.809469] Modules linked in: kvm_intel vfat fat i2c_mux_pca954x
i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O)
[237880.809479] CPU: 21 PID: 112454 Comm: vmx_migrate_pi_ Tainted: G S
O      5.19.0-smp-DEV #2
		......
[237880.809484] RIP: 0010:vmx_guest_apic_has_interrupt+0x79/0xe0
[kvm_intel]
[237880.809491] Code: c6 76 2d 41 81 e6 f0 00 00 00 48 8b 83 68 25 00 00
b9 f0 00 00 00 23 88 a0 00 00 00 44 39 f1 0f 92 c0 eb c0 0f 0b 31 c0 eb
ba <0f> 0b 31 c0 eb b4 80 3d 41 c9 02 00 00 74 39 48 c7 c7 18 f3 12 c0
[237880.809493] RSP: 0018:ffff88815c9e7d80 EFLAGS: 00010246
[237880.809495] RAX: ffff88813acbd000 RBX: ffff8881943ec9c0 RCX:
00000000ffffffff
[237880.809497] RDX: 0000000000000000 RSI: ffff8881d8676000 RDI:
ffff8881943ec9c0
[237880.809499] RBP: ffff88815c9e7d90 R08: ffff88815c9e7ce8 R09:
ffff88815c9e7cf0
[237880.809500] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000009aa8
[237880.809501] R13: ffff8881943ec9c0 R14: ffff8881943ed101 R15:
ffff8881943ec9c0
[237880.809503] FS:  00000000006283c0(0000) GS:ffff88af80740000(0000)
knlGS:0000000000000000
[237880.809505] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[237880.809506] CR2: 00007f9314b4f001 CR3: 00000001cd7b0005 CR4:
00000000003726e0
[237880.809508] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[237880.809509] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[237880.809511] Call Trace:
[237880.809512]  <TASK>
[237880.809514]  kvm_vcpu_has_events+0xe1/0x150
[237880.809519]  vcpu_run+0xee/0x2c0
[237880.809523]  kvm_arch_vcpu_ioctl_run+0x355/0x610
[237880.809526]  kvm_vcpu_ioctl+0x551/0x610
[237880.809531]  ? do_futex+0xc8/0x160
[237880.809537]  __se_sys_ioctl+0x77/0xc0
[237880.809541]  __x64_sys_ioctl+0x1d/0x20
[237880.809543]  do_syscall_64+0x44/0xa0
[237880.809549]  ? irqentry_exit+0x12/0x30
[237880.809552]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[237880.809555] RIP: 0033:0x471777
...
[237880.809570]  </TASK>
[237880.809571] ---[ end trace 0000000000000000 ]---

Jim Mattson (1):
  selftests: KVM: Test if posted interrupt delivery race with migration

Mingwei Zhang (3):
  selftests: KVM/x86: Add APIC state into kvm_x86_state
  selftests: KVM: Introduce vcpu_run_interruptable()
  selftests: KVM: Add support for posted interrupt handling in L2

Oliver Upton (1):
  kvm: x86: get vmcs12 pages before checking pending interrupts

 arch/x86/kvm/x86.c                            |  17 ++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |  12 +
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/include/x86_64/vmx.h        |  10 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  11 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  16 +
 .../kvm/x86_64/vmx_migrate_pi_pending.c       | 289 ++++++++++++++++++
 10 files changed, 360 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_migrate_pi_pending.c

-- 
2.37.1.455.g008518b4e5-goog

^ permalink raw reply	[flat|nested] 28+ messages in thread