* Deadlock due to EPT_VIOLATION
@ 2023-05-23 14:02 Brian Rak
  2023-05-23 16:22 ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Rak @ 2023-05-23 14:02 UTC (permalink / raw)
  To: kvm

We've been hitting an issue lately where KVM guests (w/ qemu) have been 
getting stuck in a loop of EPT_VIOLATIONs, and end up requiring a guest 
reboot to fix.

On Intel machines the trace ends up looking like:

     CPU-2386625 [094] 6598425.465404: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465405: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465405: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
     CPU-2386625 [094] 6598425.465406: kvm_inj_virq:         IRQ 0xec 
[reinjected]
     CPU-2386625 [094] 6598425.465406: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465407: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465407: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
     CPU-2386625 [094] 6598425.465408: kvm_inj_virq:         IRQ 0xec 
[reinjected]
     CPU-2386625 [094] 6598425.465408: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465409: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465410: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
     CPU-2386625 [094] 6598425.465410: kvm_inj_virq:         IRQ 0xec 
[reinjected]
     CPU-2386625 [094] 6598425.465410: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465411: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465412: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
     CPU-2386625 [094] 6598425.465413: kvm_inj_virq:         IRQ 0xec 
[reinjected]
     CPU-2386625 [094] 6598425.465413: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465414: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465414: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
     CPU-2386625 [094] 6598425.465415: kvm_inj_virq:         IRQ 0xec 
[reinjected]
     CPU-2386625 [094] 6598425.465415: kvm_entry:            vcpu 1, rip 
0xffffffffc0771aa2
     CPU-2386625 [094] 6598425.465417: kvm_exit:             reason 
EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
     CPU-2386625 [094] 6598425.465417: kvm_page_fault:       vcpu 1 rip 
0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683

on AMD machines, we end up with:

     CPU-14414 [063] 3039492.055571: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055571: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b
     CPU-14414 [063] 3039492.055572: kvm_exit:             reason 
EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
     CPU-14414 [063] 3039492.055572: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055573: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b
     CPU-14414 [063] 3039492.055574: kvm_exit:             reason 
EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
     CPU-14414 [063] 3039492.055574: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055575: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b
     CPU-14414 [063] 3039492.055575: kvm_exit:             reason 
EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
     CPU-14414 [063] 3039492.055576: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055576: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b
     CPU-14414 [063] 3039492.055577: kvm_exit:             reason 
EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
     CPU-14414 [063] 3039492.055577: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055578: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b
     CPU-14414 [063] 3039492.055579: kvm_exit:             reason 
EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
     CPU-14414 [063] 3039492.055579: kvm_page_fault:       vcpu 0 rip 
0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
     CPU-14414 [063] 3039492.055580: kvm_entry:            vcpu 0, rip 
0xffffffffb172ab2b


The qemu process ends up looking like this once it happens:

     0x00007fdc6a51be26 in internal_fallocate64 (fd=-514841856, 
offset=16, len=140729021657088) at ../sysdeps/posix/posix_fallocate64.c:36
     36     return EINVAL;
     (gdb) thread apply all bt

     Thread 6 (Thread 0x7fdbdefff700 (LWP 879746) "vnc_worker"):
     #0  futex_wait_cancelable (private=0, expected=0, 
futex_word=0x7fdc688f66cc) at ../sysdeps/nptl/futex-internal.h:186
     #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, 
mutex=0x7fdc688f66d8, cond=0x7fdc688f66a0) at pthread_cond_wait.c:508
     #2  __pthread_cond_wait (cond=cond@entry=0x7fdc688f66a0, 
mutex=mutex@entry=0x7fdc688f66d8) at pthread_cond_wait.c:638
     #3  0x0000563424cbd32b in qemu_cond_wait_impl (cond=0x7fdc688f66a0, 
mutex=0x7fdc688f66d8, file=0x563424d302b4 "../../ui/vnc-jobs.c", 
line=248) at ../../util/qemu-thread-posix.c:220
     #4  0x00005634247dac33 in vnc_worker_thread_loop 
(queue=0x7fdc688f66a0) at ../../ui/vnc-jobs.c:248
     #5  0x00005634247db8f8 in vnc_worker_thread 
(arg=arg@entry=0x7fdc688f66a0) at ../../ui/vnc-jobs.c:361
     #6  0x0000563424cbc7e9 in qemu_thread_start (args=0x7fdbdeffcf30) 
at ../../util/qemu-thread-posix.c:505
     #7  0x00007fdc6a8e1ea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
     #8  0x00007fdc6a527a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

     Thread 5 (Thread 0x7fdbe5dff700 (LWP 879738) "CPU 1/KVM"):
     #0  0x00007fdc6a51d5f7 in preadv64v2 (fd=1756258112, 
vector=0x563424b5f007 <kvm_vcpu_ioctl+103>, count=0, offset=0, 
flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
     #1  0x0000000000000000 in ?? ()

     Thread 4 (Thread 0x7fdbe6fff700 (LWP 879737) "CPU 0/KVM"):
     #0  0x00007fdc6a51d5f7 in preadv64v2 (fd=1755834304, 
vector=0x563424b5f007 <kvm_vcpu_ioctl+103>, count=0, offset=0, 
flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
     #1  0x0000000000000000 in ?? ()

     Thread 3 (Thread 0x7fdbe83ff700 (LWP 879735) "IO mon_iothread"):
     #0  0x00007fdc6a51bd2f in internal_fallocate64 (fd=-413102080, 
offset=3, len=4294967295) at ../sysdeps/posix/posix_fallocate64.c:32
     #1  0x000d5572b9bb0764 in ?? ()
     #2  0x000000016891db00 in ?? ()
     #3  0xffffffff7fffffff in ?? ()
     #4  0xf6b8254512850600 in ?? ()
     #5  0x0000000000000000 in ?? ()

     Thread 2 (Thread 0x7fdc693ff700 (LWP 879730) "qemu-kvm"):
     #0  0x00007fdc6a5212e9 in ?? () from 
target:/lib/x86_64-linux-gnu/libc.so.6
     #1  0x0000563424cbd9aa in qemu_futex_wait (val=<optimized out>, 
f=<optimized out>) at ./include/qemu/futex.h:29
     #2  qemu_event_wait (ev=ev@entry=0x5634254bd1a8 
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:430
     #3  0x0000563424cc6d80 in call_rcu_thread (opaque=opaque@entry=0x0) 
at ../../util/rcu.c:261
     #4  0x0000563424cbc7e9 in qemu_thread_start (args=0x7fdc693fcf30) 
at ../../util/qemu-thread-posix.c:505
     #5  0x00007fdc6a8e1ea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
     #6  0x00007fdc6a527a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

     Thread 1 (Thread 0x7fdc69c3a680 (LWP 879712) "qemu-kvm"):
     #0  0x00007fdc6a51be26 in internal_fallocate64 (fd=-514841856, 
offset=16, len=140729021657088) at ../sysdeps/posix/posix_fallocate64.c:36
     #1  0x0000000000000000 in ?? ()

We first started seeing this back in 5.19, and we're still seeing it as 
of 6.1.24 (likely later too, we don't have a ton of data for newer 
versions).  We haven't been able to link this to any specific hardware.  
It appears to happen more often on Intel, but our sample size is much 
larger there.  Guest operating system type/version doesn't appear to 
matter.  This usually happens to guests with a heavy network/disk 
workload, but it can happen to even idle guests. This has happened on 
qemu 7.0 and 7.2 (upgrading to 7.2.2 is on our list to do).

Where do we go from here?  We haven't really made a lot of progress in 
figuring out why this keeps happening, nor have we been able to come up 
with a reliable way to reproduce it.


* Re: Deadlock due to EPT_VIOLATION
  2023-05-23 14:02 Deadlock due to EPT_VIOLATION Brian Rak
@ 2023-05-23 16:22 ` Sean Christopherson
  2023-05-24 13:39   ` Brian Rak
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-05-23 16:22 UTC (permalink / raw)
  To: brak; +Cc: kvm

Nit, this isn't deadlock.  It may or may not even be a livelock AFAICT.  The vCPUs
are simply stuck and not making forward progress, _why_ they aren't making forward
progress is unknown at this point (obviously :-) ).

On Tue, May 23, 2023, Brian Rak wrote:
> We've been hitting an issue lately where KVM guests (w/ qemu) have been
> getting stuck in a loop of EPT_VIOLATIONs, and end up requiring a guest
> reboot to fix.
> 
> On Intel machines the trace ends up looking like:
> 
>     CPU-2386625 [094] 6598425.465404: kvm_entry:            vcpu 1, rip 0xffffffffc0771aa2
>     CPU-2386625 [094] 6598425.465405: kvm_exit:             reason EPT_VIOLATION rip 0xffffffffc0771aa2 info 683 800000ec
>     CPU-2386625 [094] 6598425.465405: kvm_page_fault:       vcpu 1 rip 0xffffffffc0771aa2 address 0x0000000002594fe0 error_code 0x683
>     CPU-2386625 [094] 6598425.465406: kvm_inj_virq:         IRQ 0xec [reinjected]
>     [...]
> 
> on AMD machines, we end up with:
> 
>     CPU-14414 [063] 3039492.055571: kvm_page_fault:       vcpu 0 rip 0xffffffffb172ab2b address 0x0000000f88eb2ff8 error_code 0x200000006
>     CPU-14414 [063] 3039492.055571: kvm_entry:            vcpu 0, rip 0xffffffffb172ab2b
>     CPU-14414 [063] 3039492.055572: kvm_exit:             reason EXIT_NPF rip 0xffffffffb172ab2b info 200000006 f88eb2ff8
>     [...]

In both cases, the TDP fault (EPT violation on Intel, #NPF on AMD) is occurring
when translating a guest paging structure.  I can't glean much from the AMD case,
but in the Intel trace, the fault occurs during delivery of the timer interrupt
(vector 0xec).  That may or may not be relevant to what's going on.

It's definitely suspicious that both traces show that the guest is stuck faulting
on a guest paging structure.  Purely from a probability perspective, the odds of
that being a coincidence are low, though definitely not impossible.
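
For reference, a quick sketch of decoding the exit qualification from the trace above
(bit layout per the Intel SDM's EPT-violation exit qualification; 0x683 is the value in
the kvm_exit/kvm_page_fault lines):

    # bit 0 = data read, bit 1 = data write, bit 2 = instruction fetch,
    # bit 7 = guest linear address valid,
    # bit 8 (meaningful when bit 7 is set) = access was to the final translation;
    #        if clear, the access was to a guest paging-structure entry.
    q=0x683
    printf 'read=%d write=%d fetch=%d gla_valid=%d final_translation=%d\n' \
        $((q & 1)) $((q >> 1 & 1)) $((q >> 2 & 1)) $((q >> 7 & 1)) $((q >> 8 & 1))
    # -> read=1 write=1 fetch=0 gla_valid=1 final_translation=0, i.e. the fault hit
    #    a paging-structure entry, not the final guest physical page.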

> The qemu process ends up looking like this once it happens:
> 
>     0x00007fdc6a51be26 in internal_fallocate64 (fd=-514841856, offset=16, len=140729021657088) at ../sysdeps/posix/posix_fallocate64.c:36
>     36     return EINVAL;
>     (gdb) thread apply all bt
>     [...]
> 
> We first started seeing this back in 5.19, and we're still seeing it as of
> 6.1.24 (likely later too, we don't have a ton of data for newer versions).
> We haven't been able to link this to any specific hardware.

Just to double check, this is the host kernel version, correct?  When you upgraded
to kernel 5.19, did you change anything else in the stack?  E.g. did you upgrade
QEMU at the same time?  And what kernel were you upgrading from?

> It appears to happen more often on Intel, but our sample size is much larger
> there.  Guest operating system type/version doesn't appear to matter.  This
> usually happens to guests with a heavy network/disk workload, but it can
> happen to even idle guests. This has happened on qemu 7.0 and 7.2 (upgrading
> to 7.2.2 is on our list to do).
> 
> Where do we go from here?  We haven't really made a lot of progress in
> figuring out why this keeps happening, nor have we been able to come up with
> a reliable way to reproduce it.

Is it possible to capture a failure with the trace_kvm_unmap_hva_range,
kvm_mmu_spte_requested and kvm_mmu_set_spte tracepoints enabled?  That will hopefully
provide insight into why the vCPU keeps faulting, e.g. should show if KVM is
installing a "bad" SPTE, or if KVM is doing nothing and intentionally retrying
the fault because there are constant and/or unresolved mmu_notifier events.  My
guess is that it's the latter (KVM doing nothing) due to the fallocate() calls
in the stack, but that's really just a guess.
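
Something like the following should capture those on the host while a vCPU is spinning
(a rough sketch, assuming trace-cmd is available; the tracepoints above should be covered
by the kvm and kvmmmu event groups):

    # Record the kvm and kvmmmu event groups for a few seconds while the guest is stuck.
    trace-cmd record -e kvm -e kvmmmu sleep 5
    # Then pull out just the events of interest.
    trace-cmd report | grep -E 'kvm_unmap_hva_range|kvm_mmu_spte_requested|kvm_mmu_set_spte'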

The other thing that would be helpful would be getting kernel stack traces of the
relevant tasks/threads.  The vCPU stack traces won't be interesting, but it'll
likely help to see what the fallocate() tasks are doing.
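
Something like this on the host is one way to get those (a rough sketch; run as root,
and <qemu-pid> is a placeholder for the stuck QEMU process):

    # Dump the kernel-mode stack of every thread in the QEMU process.
    for t in /proc/<qemu-pid>/task/*; do
        echo "== $(cat $t/comm) (tid $(basename $t)) =="
        cat $t/stack
    done
    # "echo t > /proc/sysrq-trigger" will also dump all task backtraces to dmesg.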


* Re: Deadlock due to EPT_VIOLATION
  2023-05-23 16:22 ` Sean Christopherson
@ 2023-05-24 13:39   ` Brian Rak
  2023-05-26 16:59     ` Brian Rak
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Rak @ 2023-05-24 13:39 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm


On 5/23/2023 12:22 PM, Sean Christopherson wrote:
> Nit, this isn't deadlock.  It may or may not even be a livelock AFAICT.  The vCPUs
> are simply stuck and not making forward progress, _why_ they aren't making forward
> progress is unknown at this point (obviously :-) ).
>
> [...]
>
>> We first started seeing this back in 5.19, and we're still seeing it as of
>> 6.1.24 (likely later too, we don't have a ton of data for newer versions).
>> We haven't been able to link this to any specific hardware.
> Just to double check, this is the host kernel version, correct?  When you upgraded
> to kernel 5.19, did you change anything else in the stack?  E.g. did you upgrade
> QEMU at the same time?  And what kernel were you upgrading from?
Those are host kernel versions, correct.  We went from 5.15 -> 5.19.  
That was the only change at the time.
>
>> It appears to happen more often on Intel, but our sample size is much larger
>> there.  Guest operating system type/version doesn't appear to matter.  This
>> usually happens to guests with a heavy network/disk workload, but it can
>> happen to even idle guests. This has happened on qemu 7.0 and 7.2 (upgrading
>> to 7.2.2 is on our list to do).
>>
>> Where do we go from here?  We haven't really made a lot of progress in
>> figuring out why this keeps happening, nor have we been able to come up with
>> a reliable way to reproduce it.
> Is it possible to capture a failure with the trace_kvm_unmap_hva_range,
> kvm_mmu_spte_requested and kvm_mmu_set_spte tracepoints enabled?  That will hopefully
> provide insight into why the vCPU keeps faulting, e.g. should show if KVM is
> installing a "bad" SPTE, or if KVM is doing nothing and intentionally retrying
> the fault because there are constant and/or unresolved mmu_notifier
> guess is that it's the latter (KVM doing nothing) due to the fallocate() calls
> in the stack, but that's really just a guess.

In that trace, I had all the kvm/kvmmmu events enabled (trace-cmd record 
-e kvm -e kvmmmu).  Just to be sure, I repeated this with `trace-cmd 
record -e all`:

              CPU-1365880 [038] 5559771.610941: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610941: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610942: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610942: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610942: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610943: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610943: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610943: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610944: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610944: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610944: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610945: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610945: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610945: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610946: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610946: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610946: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610947: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610947: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610947: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610948: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610948: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610948: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610948: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610949: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610949: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610949: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610950: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610950: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610950: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610950: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610951: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610951: write_msr:            
1d9, value 4000
              CPU-1365880 [038] 5559771.610951: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffb0e9bed0 info 83 0
              CPU-1365880 [038] 5559771.610952: kvm_page_fault:       
vcpu 0 rip 0xffffffffb0e9bed0 address 0x000000016e5b2ff8 error_code 0x83
              CPU-1365880 [038] 5559771.610952: kvm_entry:            
vcpu 0, rip 0xffffffffb0e9bed0
              CPU-1365880 [038] 5559771.610952: rcu_utilization:      
Start context switch
              CPU-1365880 [038] 5559771.610952: rcu_utilization:      
End context switch
              CPU-1365880 [038] 5559771.610953: write_msr:            
1d9, value 4000

> The other thing that would be helpful would be getting kernel stack traces of the
> relevant tasks/threads.  The vCPU stack traces won't be interesting, but it'll
> likely help to see what the fallocate() tasks are doing.
I'll see what I can come up with here, I was running into some 
difficulty getting useful stack traces out of the VM


* Re: Deadlock due to EPT_VIOLATION
  2023-05-24 13:39   ` Brian Rak
@ 2023-05-26 16:59     ` Brian Rak
  2023-05-26 21:02       ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Rak @ 2023-05-26 16:59 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm


On 5/24/2023 9:39 AM, Brian Rak wrote:
>
> On 5/23/2023 12:22 PM, Sean Christopherson wrote:
> [...]
>> The other thing that would be helpful would be getting kernel stack 
>> traces of the
>> relevant tasks/threads.  The vCPU stack traces won't be interesting, 
>> but it'll
>> likely help to see what the fallocate() tasks are doing.
> I'll see what I can come up with here, I was running into some 
> difficulty getting useful stack traces out of the VM

I didn't have any luck gathering guest-level stack traces - kaslr makes 
it pretty difficult even if I have the guest kernel symbols.

Interestingly, we've been seeing this on Windows guests as well, which would rule 
out a bug within the guest kernel.  This was from a Windows guest:

              CPU-5638  [034] 5446408.762619: rcu_utilization:      End 
context switch
              CPU-5638  [034] 5446408.762620: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762620: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762621: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762621: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762622: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762622: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762622: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762623: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762624: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762624: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762625: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762625: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762625: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762625: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762626: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762627: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762627: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762628: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762628: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762628: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762628: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762629: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762630: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762630: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762631: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762631: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762631: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762632: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762633: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762633: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762633: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762634: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762634: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762635: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762635: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762636: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762636: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762636: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762637: kvm_inj_virq: IRQ 0xd1 
[reinjected]
              CPU-5638  [034] 5446408.762638: kvm_entry: vcpu 0, rip 
0xfffff802057fee67
              CPU-5638  [034] 5446408.762638: rcu_utilization: Start 
context switch
              CPU-5638  [034] 5446408.762638: rcu_utilization: End 
context switch
              CPU-5638  [034] 5446408.762639: write_msr: 1d9, value 4000
              CPU-5638  [034] 5446408.762639: kvm_exit: reason 
EPT_VIOLATION rip 0xfffff802057fee67 info 683 800000d1
              CPU-5638  [034] 5446408.762640: kvm_page_fault: vcpu 0 rip 
0xfffff802057fee67 address 0x0000000054200f80 error_code 0x683
              CPU-5638  [034] 5446408.762640: kvm_inj_virq: IRQ 0xd1 
[reinjected]


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-05-26 16:59     ` Brian Rak
@ 2023-05-26 21:02       ` Sean Christopherson
  2023-05-30 17:35         ` Brian Rak
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-05-26 21:02 UTC (permalink / raw)
  To: brak; +Cc: kvm

On Fri, May 26, 2023, Brian Rak wrote:
> 
> On 5/24/2023 9:39 AM, Brian Rak wrote:
> > 
> > On 5/23/2023 12:22 PM, Sean Christopherson wrote:
> > > The other thing that would be helpful would be getting kernel stack
> > > traces of the
> > > relevant tasks/threads.  The vCPU stack traces won't be interesting,
> > > but it'll
> > > likely help to see what the fallocate() tasks are doing.
> > I'll see what I can come up with here, I was running into some
> > difficulty getting useful stack traces out of the VM
> 
> I didn't have any luck gathering guest-level stack traces - kaslr makes it
> pretty difficult even if I have the guest kernel symbols.

Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
to see what the fallocate() in the *host* is doing.

Another datapoint that might provide insight would be seeing if/how KVM's page
faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
the guest is stuck.

Are you able to run modified host kernels?  If so, the easiest next step, assuming
stack traces don't provide a smoking gun, would be to add printks into the page
fault path to see why KVM is retrying instead of installing a SPTE.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-05-26 21:02       ` Sean Christopherson
@ 2023-05-30 17:35         ` Brian Rak
  2023-05-30 18:36           ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Rak @ 2023-05-30 17:35 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm


On 5/26/2023 5:02 PM, Sean Christopherson wrote:
> On Fri, May 26, 2023, Brian Rak wrote:
>> On 5/24/2023 9:39 AM, Brian Rak wrote:
>>> On 5/23/2023 12:22 PM, Sean Christopherson wrote:
>>>> The other thing that would be helpful would be getting kernel stack
>>>> traces of the
>>>> relevant tasks/threads.  The vCPU stack traces won't be interesting,
>>>> but it'll
>>>> likely help to see what the fallocate() tasks are doing.
>>> I'll see what I can come up with here, I was running into some
>>> difficulty getting useful stack traces out of the VM
>> I didn't have any luck gathering guest-level stack traces - kaslr makes it
>> pretty difficult even if I have the guest kernel symbols.
> Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
> to see what the fallocate() in the *host* is doing.

Ah - here's a different instance of it with a full backtrace from the host:

(gdb) thread apply all bt

Thread 8 (Thread 0x7fbbd0bff700 (LWP 353251) "vnc_worker"):
#0  futex_wait_cancelable (private=0, expected=0, 
futex_word=0x7fbddd4f6b2c) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, 
mutex=0x7fbddd4f6b38, cond=0x7fbddd4f6b00) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7fbddd4f6b00, 
mutex=mutex@entry=0x7fbddd4f6b38) at pthread_cond_wait.c:638
#3  0x00005593ea9a232b in qemu_cond_wait_impl (cond=0x7fbddd4f6b00, 
mutex=0x7fbddd4f6b38, file=0x5593eaa152b4 "../../ui/vnc-jobs.c", 
line=248) at ../../util/qemu-thread-posix.c:220
#4  0x00005593ea4bfc33 in vnc_worker_thread_loop (queue=0x7fbddd4f6b00) 
at ../../ui/vnc-jobs.c:248
#5  0x00005593ea4c08f8 in vnc_worker_thread 
(arg=arg@entry=0x7fbddd4f6b00) at ../../ui/vnc-jobs.c:361
#6  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd0bfcf30) at 
../../util/qemu-thread-posix.c:505
#7  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#8  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fbbd85ff700 (LWP 353248) "CPU 3/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd42f7c0, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd42f7c0) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd42f7c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd85fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fbbd97ff700 (LWP 353247) "CPU 2/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd4213c0, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd4213c0) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd4213c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd97fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fbbda9ff700 (LWP 353246) "CPU 1/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6f5e40, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6f5e40) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd6f5e40) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbda9fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fbbdbbff700 (LWP 353245) "CPU 0/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6a8c00, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6a8c00) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd6a8c00) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdbbfcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fbbdcfff700 (LWP 353244) "IO mon_iothread"):
#0  0x00007fbddf11b96f in __GI___poll (fds=0x7fbbdc209000, nfds=3, 
timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fbddf6eb0ae in ?? () from 
target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fbddf6eb40b in g_main_loop_run () from 
target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00005593ea880509 in iothread_run 
(opaque=opaque@entry=0x7fbddd51db00) at ../../iothread.c:74
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdcffcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fbdddfff700 (LWP 353235) "qemu-kvm"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005593ea9a29aa in qemu_futex_wait (val=<optimized out>, 
f=<optimized out>) at ./include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5593eb1a21a8 <rcu_call_ready_event>) 
at ../../util/qemu-thread-posix.c:430
#3  0x00005593ea9abd80 in call_rcu_thread (opaque=opaque@entry=0x0) at 
../../util/rcu.c:261
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbdddffcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fbdde874680 (LWP 353228) "qemu-kvm"):
#0  0x00007fbddf11ba66 in __ppoll (fds=0x7fbbd39b9f00, nfds=21, 
timeout=<optimized out>, timeout@entry=0x7fff9ff32c90, 
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x00005593ea9b70f1 in ppoll (__ss=0x0, __timeout=0x7fff9ff32c90, 
__nfds=<optimized out>, __fds=<optimized out>) at 
/usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
timeout=timeout@entry=2429393072) at ../../util/qemu-timer.c:351
#3  0x00005593ea9b4955 in os_host_main_loop_wait (timeout=2429393072) at 
../../util/main-loop.c:315
#4  main_loop_wait (nonblocking=nonblocking@entry=0) at 
../../util/main-loop.c:606
#5  0x00005593ea64ca91 in qemu_main_loop () at ../../softmmu/runstate.c:739
#6  0x00005593ea84c8e7 in qemu_default_main () at ../../softmmu/main.c:37
#7  0x00007fbddf04fd0a in __libc_start_main (main=0x5593ea497620 <main>, 
argc=109, argv=0x7fff9ff32e58, init=<optimized out>, fini=<optimized 
out>, rtld_fini=<optimized out>, stack_end=0x7fff9ff32e48) at 
../csu/libc-start.c:308
#8  0x00005593ea498dda in _start ()

and trace with -e all from that same guest:

              CPU-353246 [041] 5473778.770320: write_msr:            
1d9, value 4000
              CPU-353248 [027] 5473778.770321: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353248 [027] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770321: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770321: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353247 [019] 5473778.770321: write_msr:            
1d9, value 4000
              CPU-353248 [027] 5473778.770321: kvm_page_fault:       
vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
              CPU-353245 [039] 5473778.770321: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353246 [041] 5473778.770322: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353247 [019] 5473778.770322: kvm_page_fault:       
vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
              CPU-353248 [027] 5473778.770322: kvm_entry:            
vcpu 3, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770322: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770322: rcu_utilization:      
Start context switch
              CPU-353248 [027] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353247 [019] 5473778.770323: kvm_entry:            
vcpu 2, rip 0xffffffffbc7648b0
              CPU-353245 [039] 5473778.770323: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770323: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353247 [019] 5473778.770323: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770323: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770323: kvm_page_fault:       
vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
              CPU-353246 [041] 5473778.770323: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353247 [019] 5473778.770323: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770324: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770324: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353248 [027] 5473778.770324: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770324: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353248 [027] 5473778.770324: kvm_page_fault:       
vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
              CPU-353246 [041] 5473778.770324: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353245 [039] 5473778.770324: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770325: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770325: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770325: rcu_utilization:      End 
context switch
              CPU-353246 [041] 5473778.770325: rcu_utilization:      End 
context switch
              CPU-353247 [019] 5473778.770325: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353248 [027] 5473778.770325: kvm_entry:            
vcpu 3, rip 0xffffffffbc7648b0
              CPU-353247 [019] 5473778.770325: kvm_page_fault:       
vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
              CPU-353248 [027] 5473778.770325: rcu_utilization:      
Start context switch
              CPU-353248 [027] 5473778.770326: rcu_utilization:      End 
context switch
              CPU-353245 [039] 5473778.770326: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770326: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770326: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353247 [019] 5473778.770326: kvm_entry:            
vcpu 2, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770326: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353245 [039] 5473778.770326: kvm_page_fault:       
vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
              CPU-353246 [041] 5473778.770326: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353247 [019] 5473778.770326: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770327: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770327: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770327: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353248 [027] 5473778.770327: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770327: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770327: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770328: rcu_utilization:      End 
context switch

> Another datapoint that might provide insight would be seeing if/how KVM's page
> faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
> the guest is stuck.

It looks like pf_taken is the only real one incrementing:

# cd /sys/kernel/debug/kvm/353228-12

# tail -n1 pf*; sleep 5; echo "======"; tail -n1 pf*
==> pf_emulate <==
12353

==> pf_fast <==
56

==> pf_fixed <==
27039604264

==> pf_guest <==
0

==> pf_mmio_spte_created <==
12353

==> pf_spurious <==
2348694

==> pf_taken <==
74486522452
======
==> pf_emulate <==
12353

==> pf_fast <==
56

==> pf_fixed <==
27039604264

==> pf_guest <==
0

==> pf_mmio_spte_created <==
12353

==> pf_spurious <==
2348694

==> pf_taken <==
74499574212
# tail -n1 *tlb*; sleep 5; echo "======"; tail -n1 *tlb*
==> remote_tlb_flush <==
1731549762

==> remote_tlb_flush_requests <==
3092024754

==> tlb_flush <==
8203297646
======
==> remote_tlb_flush <==
1731549762

==> remote_tlb_flush_requests <==
3092024754

==> tlb_flush <==
8203297649


> Are you able to run modified host kernels?  If so, the easiest next step, assuming
> stack traces don't provide a smoking gun, would be to add printks into the page
> fault path to see why KVM is retrying instead of installing a SPTE.
We can, but it can take quite some time from when we do the update to 
actually seeing results.  This problem is inconsistent at best, and even 
though we're seeing it a ton of times a day, it can show up anywhere.  
Even if we rolled it out today, we'd still be looking at weeks/months 
before we had any significant number of machines on it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-05-30 17:35         ` Brian Rak
@ 2023-05-30 18:36           ` Sean Christopherson
  2023-05-31 17:40             ` Brian Rak
  2023-07-21 14:34             ` Amaan Cheval
  0 siblings, 2 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-05-30 18:36 UTC (permalink / raw)
  To: brak; +Cc: kvm

On Tue, May 30, 2023, Brian Rak wrote:
> 
> On 5/26/2023 5:02 PM, Sean Christopherson wrote:
> > On Fri, May 26, 2023, Brian Rak wrote:
> > > On 5/24/2023 9:39 AM, Brian Rak wrote:
> > > > On 5/23/2023 12:22 PM, Sean Christopherson wrote:
> > > > > The other thing that would be helpful would be getting kernel stack
> > > > > traces of the
> > > > > relevant tasks/threads.  The vCPU stack traces won't be interesting,
> > > > > but it'll
> > > > > likely help to see what the fallocate() tasks are doing.
> > > > I'll see what I can come up with here, I was running into some
> > > > difficulty getting useful stack traces out of the VM
> > > I didn't have any luck gathering guest-level stack traces - kaslr makes it
> > > pretty difficult even if I have the guest kernel symbols.
> > Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
> > to see what the fallocate() in the *host* is doing.
> 
> Ah - here's a different instance of it with a full backtrace from the host:

Gah, I wasn't specific enough again.  Though there's no longer an fallocate() for
any of the threads, so that's probably a moot point.  What I wanted to see is what
exactly the host kernel was doing, e.g. if something in the host memory management
was indirectly preventing vCPUs from making forward progress.  But that doesn't
seem to be the case here, and I would expect other problems if fallocate() was
stuck.  So ignore that request for now.

> > Another datapoint that might provide insight would be seeing if/how KVM's page
> > faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
> > the guest is stuck.
> 
> It looks like pf_taken is the only real one incrementing:

Drat.  That's what I expected, but it doesn't narrow down the search much.

> > Are you able to run modified host kernels?  If so, the easiest next step, assuming
> > stack traces don't provide a smoking gun, would be to add printks into the page
> > fault path to see why KVM is retrying instead of installing a SPTE.
> We can, but it can take quite some time from when we do the update to
> actually seeing results.  This problem is inconsistent at best, and even
> though we're seeing it a ton of times a day, it can show up anywhere.
> Even if we rolled it out today, we'd still be looking at weeks/months before
> we had any significant number of machines on it.

Would you be able to run a bpftrace program on a host with a stuck guest?  If so,
I believe I could craft a program for the kvm_exit tracepoint that would rule out
or confirm two of the three likely culprits.

Can you also dump the kvm.ko module params?  E.g. `tail /sys/module/kvm/parameters/*`

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-05-30 18:36           ` Sean Christopherson
@ 2023-05-31 17:40             ` Brian Rak
  2023-07-21 14:34             ` Amaan Cheval
  1 sibling, 0 replies; 48+ messages in thread
From: Brian Rak @ 2023-05-31 17:40 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm


On 5/30/2023 2:36 PM, Sean Christopherson wrote:
> On Tue, May 30, 2023, Brian Rak wrote:
>> On 5/26/2023 5:02 PM, Sean Christopherson wrote:
>>> On Fri, May 26, 2023, Brian Rak wrote:
>>>> On 5/24/2023 9:39 AM, Brian Rak wrote:
>>>>> On 5/23/2023 12:22 PM, Sean Christopherson wrote:
>>>>>> The other thing that would be helpful would be getting kernel stack
>>>>>> traces of the
>>>>>> relevant tasks/threads.  The vCPU stack traces won't be interesting,
>>>>>> but it'll
>>>>>> likely help to see what the fallocate() tasks are doing.
>>>>> I'll see what I can come up with here, I was running into some
>>>>> difficulty getting useful stack traces out of the VM
>>>> I didn't have any luck gathering guest-level stack traces - kaslr makes it
>>>> pretty difficult even if I have the guest kernel symbols.
>>> Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
>>> to see what the fallocate() in the *host* is doing.
>> Ah - here's a different instance of it with a full backtrace from the host:
> Gah, I wasn't specific enough again.  Though there's no longer an fallocate() for
> any of the threads, so that's probably a moot point.  What I wanted to see is what
> exactly the host kernel was doing, e.g. if something in the host memory management
> was indirectly preventing vCPUs from making forward progress.  But that doesn't
> seem to be the case here, and I would expect other problems if fallocate() was
> stuck.  So ignore that request for now.
>
>>> Another datapoint that might provide insight would be seeing if/how KVM's page
>>> faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
>>> the guest is stuck.
>> It looks like pf_taken is the only real one incrementing:
> Drat.  That's what I expected, but it doesn't narrow down the search much.
>
>>> Are you able to run modified host kernels?  If so, the easiest next step, assuming
>>> stack traces don't provide a smoking gun, would be to add printks into the page
>>> fault path to see why KVM is retrying instead of installing a SPTE.
>> We can, but it can take quite some time from when we do the update to
>> actually seeing results.  This problem is inconsistent at best, and even
>> though we're seeing it a ton of times a day, it can show up anywhere.
>> Even if we rolled it out today, we'd still be looking at weeks/months before
>> we had any significant number of machines on it.
> Would you be able to run a bpftrace program on a host with a stuck guest?  If so,
> I believe I could craft a program for the kvm_exit tracepoint that would rule out
> or confirm two of the three likely culprits.
>
> Can you also dump the kvm.ko module params?  E.g. `tail /sys/module/kvm/parameters/*`

Yes, we can run bpftrace programs.

# tail /sys/module/kvm/parameters/*
==> /sys/module/kvm/parameters/eager_page_split <==
Y

==> /sys/module/kvm/parameters/enable_pmu <==
Y

==> /sys/module/kvm/parameters/enable_vmware_backdoor <==
N

==> /sys/module/kvm/parameters/flush_on_reuse <==
N

==> /sys/module/kvm/parameters/force_emulation_prefix <==
0

==> /sys/module/kvm/parameters/halt_poll_ns <==
200000

==> /sys/module/kvm/parameters/halt_poll_ns_grow <==
2

==> /sys/module/kvm/parameters/halt_poll_ns_grow_start <==
10000

==> /sys/module/kvm/parameters/halt_poll_ns_shrink <==
0

==> /sys/module/kvm/parameters/ignore_msrs <==
N

==> /sys/module/kvm/parameters/kvmclock_periodic_sync <==
Y

==> /sys/module/kvm/parameters/lapic_timer_advance_ns <==
-1

==> /sys/module/kvm/parameters/min_timer_period_us <==
200

==> /sys/module/kvm/parameters/mitigate_smt_rsb <==
N

==> /sys/module/kvm/parameters/mmio_caching <==
Y

==> /sys/module/kvm/parameters/nx_huge_pages <==
Y

==> /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms <==
0

==> /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio <==
60

==> /sys/module/kvm/parameters/pi_inject_timer <==
0

==> /sys/module/kvm/parameters/report_ignored_msrs <==
Y

==> /sys/module/kvm/parameters/tdp_mmu <==
Y

==> /sys/module/kvm/parameters/tsc_tolerance_ppm <==
250

==> /sys/module/kvm/parameters/vector_hashing <==
Y



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-05-30 18:36           ` Sean Christopherson
  2023-05-31 17:40             ` Brian Rak
@ 2023-07-21 14:34             ` Amaan Cheval
  2023-07-21 17:37               ` Sean Christopherson
  1 sibling, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-07-21 14:34 UTC (permalink / raw)
  To: seanjc; +Cc: brak, kvm


Hey Sean,

I'm helping Brian look into this issue.

> Would you be able to run a bpftrace program on a host with a stuck guest?  If so,
> I believe I could craft a program for the kvm_exit tracepoint that would rule out
> or confirm two of the three likely culprits.

Could you share your thoughts on what the 2-3 likely culprits might be, and the
bpftrace program if possible?

I ran one just to dump all args on the kvm_exit tracepoint on an affected host;
here's a snippet:


```
# bpftrace -e 'tracepoint:kvm:kvm_exit { printf("%s: rip=%lx reason=%u isa=%u info1=%lx info2=%lx intr=%u error=%u vcpu=%u \n", comm, args->guest_rip, args->exit_reason, args->isa, args->info1, args->info2, args->intr_info, args->error_code, args->vcpu_id); }'

CPU 3/KVM: rip=ffffffffa746d5f8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=3
CPU 3/KVM: rip=ffffffffa746d5fa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=3
CPU 0/KVM: rip=ffffffffa746d5f8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5fa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5f8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5fa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5f8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5fa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5f8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa746d5fa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=0
CPU 0/KVM: rip=ffffffffa7b88eaa reason=12 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 3/KVM: rip=7ff4543b74e4 reason=1 isa=1 info1=0 info2=0 intr=2147483884 error=0 vcpu=3
CPU 0/KVM: rip=ffffffff94f3ff15 reason=1 isa=1 info1=0 info2=0 intr=2147483884 error=0 vcpu=0
CPU 0/KVM: rip=ffffffff94e683a8 reason=32 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 0/KVM: rip=ffffffff94e683aa reason=1 isa=1 info1=0 info2=0 intr=2147483894 error=0 vcpu=0
CPU 0/KVM: rip=ffffffff95516005 reason=12 isa=1 info1=0 info2=0 intr=0 error=0 vcpu=0
CPU 3/KVM: rip=7ff45260dd24 reason=48 isa=1 info1=181 info2=0 intr=0 error=0 vcpu=3
CPU 3/KVM: rip=7ff45260dd24 reason=48 isa=1 info1=181 info2=0 intr=0 error=0 vcpu=3
CPU 3/KVM: rip=7ff45260df88 reason=48 isa=1 info1=181 info2=0 intr=0 error=0 vcpu=3
CPU 3/KVM: rip=7ff45260df88 reason=48 isa=1 info1=181 info2=0 intr=0 error=0 vcpu=3
```

I've also run a `function_graph` trace on some of the affected hosts, if you
think it might be helpful to have a look at that to see what the host kernel
might be doing while the guests are looping on EPT_VIOLATIONs. Nothing obvious
stands out to me right now.

We suspected KSM briefly, but ruled that out by turning KSM off and unmerging
KSM pages - after doing that, a guest VM still locked up / started looping
EPT_VIOLATIONS (like in Brian's original email), so it's unlikely this is KSM specific.

Another interesting observation we made was that when we migrate a guest to a
different host, the guest _stays_ locked up and throws EPT violations on the new
host as well - so it's unlikely the issue is in the guest kernel itself (since
we see it across guest operating systems), but perhaps the host kernel is
messing the state of the guest kernel up in a way that keeps it locked up after
migrating as well?

If you have any thoughts on anything else to try, let me know!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-07-21 14:34             ` Amaan Cheval
@ 2023-07-21 17:37               ` Sean Christopherson
  2023-07-24 12:08                 ` Amaan Cheval
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-07-21 17:37 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: brak, kvm

On Fri, Jul 21, 2023, Amaan Cheval wrote:
> I've also run a `function_graph` trace on some of the affected hosts, if you
> think it might be helpful to have a look at that to see what the host kernel
> might be doing while the guests are looping on EPT_VIOLATIONs. Nothing obvious
> stands out to me right now.

It wouldn't hurt to see it.

> We suspected KSM briefly, but ruled that out by turning KSM off and unmerging
> KSM pages - after doing that, a guest VM still locked up / started looping
> EPT_VIOLATIONS (like in Brian's original email), so it's unlikely this is KSM specific.
> 
> Another interesting observation we made was that when we migrate a guest to a
> different host, the guest _stays_ locked up and throws EPT violations on the new
> host as well 

Ooh, that's *very* interesting.  That pretty much rules out memslot and mmu_notifier
issues.

>- so it's unlikely the issue is in the guest kernel itself (since
> we see it across guest operating systems), but perhaps the host kernel is
> messing the state of the guest kernel up in a way that keeps it locked up after
> migrating as well?
> 
> If you have any thoughts on anything else to try, let me know!

Good news and bad news.  Good news: I have a plausible theory as to what might be
going wrong.  Bad news: if my theory is correct, our princess is in another castle
(the bug isn't in KVM).

One of the scenario where KVM retries page faults is if KVM asynchronously faults-in
the host backing page.  If faulting in the page would require I/O, e.g. because
it's been swapped out, instead of synchronously doing the I/O on the vCPU task,
KVM uses a workqueue to fault in the page and immediately resumes the guest.

There are a variety of conditions that must be met to try an async page fault, but
assuming you aren't disable HLT VM-Exit, i.e. aren't letting the guest execute HLT,
it really just boils down to IRQs being enabled in the guest, which looking at the
traces is pretty much guaranteed to be true.

What's _supposed_ to happen is that async_pf_execute() successfully faults in the
page via get_user_pages_remote(), and then KVM installs a mapping for the guest
either in kvm_arch_async_page_ready() or by resuming the guest and cleanly handling
the retried guest page fault.

What I suspect is happening is that get_user_pages_remote() fails for some reason,
i.e. the workqueue doesn't fault in the page, and the vCPU gets stuck trying to
fault in a page that can't be faulted in for whatever reason.  AFAICT, nothing in
KVM will actually complain or even surface the problem in tracepoints (yeah, that's
not good).

Circling back to the bad news, if that's indeed what's happening, it likely means
there's a bug somewhere else in the stack.  E.g. it could be core mm/, might be
in the block layer, in swap, possibly in the exact filesystem you're using, etc.

Note, there's also a paravirt extension to async #PFs, where instead of putting
the vCPU into a synthetic halted state, KVM instead *may* inject a synthetic #PF
into the guest, e.g. so that the guest can go run a different task while the
faulting task is blocked.  But this really is just a note, guest enabling of PV
async #PF shouldn't actually matter, again assuming my theory is correct.

To mostly confirm this is likely what's happening, can you enable all of the async
#PF tracepoints in KVM?  The exact tracepoints might vary dependending on which kernel
version you're running, just enable everything with "async" in the name, e.g.

  # ls -1 /sys/kernel/debug/tracing/events/kvm | grep async
  kvm_async_pf_completed/
  kvm_async_pf_not_present/
  kvm_async_pf_ready/
  kvm_async_pf_repeated_fault/
  kvm_try_async_get_page/

If kvm_try_async_get_page() is more or less keeping pace with the "pf_taken" stat,
then this is likely what's happening.
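
For example, an untested sketch that counts the async tracepoint against the
overall guest page fault rate (using the tracepoint names listed above) would be
something like:

  # bpftrace -e '
      // async gets vs. total guest page faults, dumped every 5 seconds
      tracepoint:kvm:kvm_try_async_get_page { @async = count(); }
      tracepoint:kvm:kvm_page_fault { @pf = count(); }
      interval:s:5 { print(@async); print(@pf); clear(@async); clear(@pf); }'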

And then to really confirm, this small bpf program will yell if get_user_pages_remote()
fails when attempting to get a single page (which is always the case for KVM's async
#PF usage).

FWIW, get_user_pages_remote() isn't used all that much, e.g. when running a VM in
my setup, KVM is the only user.  So you can likely aggressively instrument
get_user_pages_remote() via bpf without major problems, or maybe even assume that
any call is from KVM.

$ tail gup_remote.bt 
kretfunc:get_user_pages_remote
{
        if ( args->nr_pages == 1 && retval != 1 ) {
                printf("Failed remote gup() on address %lx, ret = %d\n", args->start, retval);
        }
}


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-07-21 17:37               ` Sean Christopherson
@ 2023-07-24 12:08                 ` Amaan Cheval
  2023-07-25 17:30                   ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-07-24 12:08 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: brak, kvm

> > I've also run a `function_graph` trace on some of the affected hosts, if you
> > think it might be helpful...
>
> It wouldn't hurt to see it.
>

Here you go:
https://transfer.sh/SfXSCHp5xI/ept-function-graph.log

> > Another interesting observation we made was that when we migrate a guest to a
> > different host, the guest _stays_ locked up and throws EPT violations on the new
> > host as well
>
> Ooh, that's *very* interesting.  That pretty much rules out memslot and mmu_notifier
> issues.

Good to know, thanks!

> What I suspect is happening is that get_user_pages_remote() fails for some reason,
> i.e. the workqueue doesn't fault in the page, and the vCPU gets stuck trying to
> fault in a page that can't be faulted in for whatever reason.  AFAICT, nothing in
> KVM will actually complain or even surface the problem in tracepoints (yeah, that's
> not good).

Thanks for the explanation, I did suspect something similar seeing how the page
faults / EPT_VIOLATIONs tend to loop on the same eip/rip/instruction and address
(not always, but quite often).

> To mostly confirm this is likely what's happening, can you enable all of the async
> #PF tracepoints in KVM?  The exact tracepoints might vary dependending on which kernel
> version you're running, just enable everything with "async" in the name, e.g.
>
>   # ls -1 /sys/kernel/debug/tracing/events/kvm | grep async
>   kvm_async_pf_completed/
>   kvm_async_pf_not_present/
>   kvm_async_pf_ready/
>   kvm_async_pf_repeated_fault/
>   kvm_try_async_get_page/
>
> If kvm_try_async_get_page() is more or less keeping pace with the "pf_taken" stat,
> then this is likely what's happening.

I did this and, unfortunately, don't see any of these functions being called at
all despite EPT_VIOLATIONs still being thrown and pf_taken still climbing. (Tried
both with `trace-cmd -e ...` and using `bpftrace`, and none of those functions
are being called during the deadlock/guest being stuck.)

> And then to really confirm, this small bpf program will yell if get_user_pages_remote()
> fails when attempting get a single page (which is always the case for KVM's async
> #PF usage).
>
> $ tail gup_remote.bt
> kretfunc:get_user_pages_remote
> {
>         if ( args->nr_pages == 1 && retval != 1 ) {
>                 printf("Failed remote gup() on address %lx, ret = %d\n", args->start, retval);
>         }
> }
>

Our hosts don't have kfunc/kretfunc support (`bpftrace --info` reports `kret: no`),
but I tried just a kprobe to verify that get_user_pages_remote is being called at
all - it does not seem like it is, unfortunately:

```
# bpftrace -e 'kprobe:get_user_pages_remote { @[comm] = count(); }'
Attaching 1 probe...
^C
#
```

So I guess that disproves the async #PF theory? Any other potential causes you
can think of, or anything we can try on faulting hosts that might help illuminate
the issue further?

Thanks for your time and help!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-07-24 12:08                 ` Amaan Cheval
@ 2023-07-25 17:30                   ` Sean Christopherson
  2023-08-02 14:21                     ` Amaan Cheval
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-07-25 17:30 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: brak, kvm

On Mon, Jul 24, 2023, Amaan Cheval wrote:
> > > I've also run a `function_graph` trace on some of the affected hosts, if you
> > > think it might be helpful...
> >
> > It wouldn't hurt to see it.
> >
> 
> Here you go:
> https://transfer.sh/SfXSCHp5xI/ept-function-graph.log

Yeesh.  There is a ridiculous amount of potentially problematic activity.  KSM is
active in that trace, it looks like NUMA balancing might be in play, there might
be hugepage shattering, etc.

> > > Another interesting observation we made was that when we migrate a guest to a
> > > different host, the guest _stays_ locked up and throws EPT violations on the new
> > > host as well
> >
> > Ooh, that's *very* interesting.  That pretty much rules out memslot and mmu_notifier
> > issues.
> 
> Good to know, thanks!

Let me rephrase that statement: it rules out a certain class of memslot and
mmu_notifier bugs, namely bugs where KVM would incorrectly leave an invalidation
refcount (for lack of a better term) elevated.  It doesn't mean memslot changes
and/or mmu_notifier events aren't at fault.

Can you migrate a hung guest to a host that is completely unloaded?  And ideally,
disable KSM and NUMA autobalancing on the target host.  And then get a
function_graph trace on that host, assuming the vCPU remains stuck.  There is *so*
much going on in the above graph that it's impossible to determine if there's a
kernel bug, e.g. it's possible the vCPU is stuck purely because it's being thrashed
to the point where it can't make forward progress.

> > To mostly confirm this is likely what's happening, can you enable all of the async
> > #PF tracepoints in KVM?  The exact tracepoints might vary dependending on which kernel
> > version you're running, just enable everything with "async" in the name, e.g.
> >
> >   # ls -1 /sys/kernel/debug/tracing/events/kvm | grep async
> >   kvm_async_pf_completed/
> >   kvm_async_pf_not_present/
> >   kvm_async_pf_ready/
> >   kvm_async_pf_repeated_fault/
> >   kvm_try_async_get_page/
> >
> > If kvm_try_async_get_page() is more or less keeping pace with the "pf_taken" stat,
> > then this is likely what's happening.
> 
> I did this and unfortunately, don't see any of these functions being
> called at all despite
> EPT_VIOLATIONs still being thrown and pf_taken still climbing. (Tried both with
> `trace-cmd -e ...` and using `bpftrace` and none of those functions
> are being called
> during the deadlock/guest being stuck.)

Well fudge.

> > And then to really confirm, this small bpf program will yell if get_user_pages_remote()
> > fails when attempting get a single page (which is always the case for KVM's async
> > #PF usage).
> >
> > $ tail gup_remote.bt
> > kretfunc:get_user_pages_remote
> > {
> >         if ( args->nr_pages == 1 && retval != 1 ) {
> >                 printf("Failed remote gup() on address %lx, ret = %d\n", args->start, retval);
> >         }
> > }
> >
> 
> Our hosts don't have kfunc/kretfunc support (`bpftrace --info` reports
> `kret: no`),
> but I tried just a kprobe to verify that get_user_pages_remote is
> being called at all -
> does not seem like it is, unfortunately:
> 
> ```
> # bpftrace -e 'kprobe:get_user_pages_remote { @[comm] = count(); }'
> Attaching 1 probe...
> ^C
> #
> ```
> 
> So I guess that disproves the async #PF theory?

Yeah.  Definitely not related async page fault.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-07-25 17:30                   ` Sean Christopherson
@ 2023-08-02 14:21                     ` Amaan Cheval
  2023-08-02 15:34                       ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-08-02 14:21 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: brak, kvm

> Yeesh.  There is a ridiculous amount of potentially problematic activity.  KSM is
> active in that trace, it looks like NUMA balancing might be in play,

Sorry about the delayed response - it seems like the majority of locked up guest
VMs stop throwing repeated EPT_VIOLATIONs as soon as we turn `numa_balancing`
off.
They still remain locked up, but that might be because the original cause of the
looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
any ways you can think of that that might happen)?

----

We experimented with numa_balancing + transparent hugepage settings in certain
data centers (to determine if the settings make the lockups disappear) and the
incidence rate of locked up guests has lowered significantly for the
numa_balancing=0 and thp=1 case, but numa_balancing=0 and thp=0 are still
locking up / looping on EPT_VIOLATIONs at about the same rate (or slightly
lower than both numa_balancing=thp=1).

Here's a function_graph trace of a host which had numa_balancing=0, thp=1, ksm=2
(KSM unloaded and unmerged after it was initially on):

https://transfer.sh/M4WdfxaTJs/ept-fn-graph.log

```
# bpftrace -e 'kprobe:handle_ept_violation { @ept[comm] = count(); }
tracepoint:kvm:kvm_page_fault { @pf[comm] = count(); }'
Attaching 2 probes...
^C

@ept[CPU 0/KVM]: 52
@ept[CPU 3/KVM]: 61
@ept[CPU 2/KVM]: 112
@ept[CPU 1/KVM]: 257

@pf[CPU 0/KVM]: 52
@pf[CPU 3/KVM]: 61
@pf[CPU 2/KVM]: 111
@pf[CPU 1/KVM]: 262
```

> there might be hugepage shattering, etc.

Is there a BPF program / another way we can confirm this is the case? I think
the fact that guests lock up at about the same rate with thp=0, numa_balancing=0
as with thp=1, numa_balancing=1 is interesting and relevant.

The thp=1, numa_balancing=0 configuration seems to have the fewest guests locking up.

> Let me rephrase that statement: it rules out a certain class of memslot and
> mmu_notifier bugs, namely bugs where KVM would incorrectly leave an invalidation
> refcount (for lack of a better term) elevated.  It doesn't mean memslot changes
> and/or mmu_notifier events aren't at fault.

I see, thanks!

> kernel bug, e.g. it's possible the vCPU is stuck purely because it's being thrashed
> to the point where it can't make forward progress.

Given that the guest stays locked-up post-migration on a completely unloaded
host, I think this is unlikely unless the thrashing also corrupts the guest's
state before the migration somehow?

> Yeah.  Definitely not related async page fault.

I guess the biggest lead currently is why `numa_balancing=1` increases the
odds of this issue occurring, and why it is specifically more likely with
transparent hugepages off (`thp=0`)?

To be clear, the lockups occur in all configurations we've tried so far, so none
of these are likely the direct cause, just relevant factors.

If there are any changes in the kernel that might help illuminate the issue
further, we can run a custom kernel and migrate a guest to the modified host -
let me know if there's anything that might help!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-02 14:21                     ` Amaan Cheval
@ 2023-08-02 15:34                       ` Sean Christopherson
  2023-08-02 16:45                         ` Amaan Cheval
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-02 15:34 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: brak, kvm

On Wed, Aug 02, 2023, Amaan Cheval wrote:
> > Yeesh.  There is a ridiculous amount of potentially problematic activity.  KSM is
> > active in that trace, it looks like NUMA balancing might be in play,
> 
> Sorry about the delayed response - it seems like the majority of locked up guest
> VMs stop throwing repeated EPT_VIOLATIONs as soon as we turn `numa_balancing`
> off.

LOL, NUMA autobalancing.  I have a longstanding hatred of that feature.  I'm sure
there are setups where it adds value, but from my perspective it's nothing but
pain and misery.

> They still remain locked up, but that might be because the original cause of the
> looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
> any ways you can think of that that might happen)?

Define "remain locked up".  If the vCPUs are actively running in the guest and
making forward progress, i.e. not looping on VM-Exits on a single RIP, then they
aren't stuck from KVM's perspective.

But that doesn't mean the guest didn't take punitive action when a vCPU was
effectively stalled indefinitely by KVM, e.g. from the guest's perspective the
stuck vCPU will likely manifest as a soft lockup, and that could lead to a panic()
if the guest is a Linux kernel running with softlockup_panic=1.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-02 15:34                       ` Sean Christopherson
@ 2023-08-02 16:45                         ` Amaan Cheval
  2023-08-02 17:52                           ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-08-02 16:45 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: brak, kvm

> LOL, NUMA autobalancing.  I have a longstanding hatred of that feature.  I'm sure
> there are setups where it adds value, but from my perspective it's nothing but
> pain and misery.

Do you think autobalancing is increasing the odds of some edge-case race
condition, perhaps? I find it really curious that numa_balancing definitely
affects this issue, but particularly when thp=0. Is it just that there are too
many EPT entries to install when transparent hugepages are disabled, increasing
the likelihood of a race condition / lock contention of some sort?

> > They still remain locked up, but that might be because the original cause of the
> > looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
> > any ways you can think of that that might happen)?
>
> Define "remain locked up".  If the vCPUs are actively running in the guest and
> making forward progress, i.e. not looping on VM-Exits on a single RIP, then they
> aren't stuck from KVM's perspective.

Right, the traces look like they're not stuck (i.e. no looping on the same
RIP). By "remain locked up" I mean that the VM is unresponsive on both the
console and services (such as ssh) used to connect to it.

> But that doesn't mean the guest didn't take punitive action when a vCPU was
> effectively stalled indefinitely by KVM, e.g. from the guest's perspective the
> stuck vCPU will likely manifest as a soft lockup, and that could lead to a panic()
> if the guest is a Linux kernel running with softlockup_panic=1.

So far we haven't had any guest kernels with softlockup_panic=1 have this issue,
so it's hard to confirm, but it makes sense that the guest took punitive action
in response to being stalled.

Any thoughts on how we might reproduce the issue or trace it down better?

Anything look suspect in the function_graph trace?
(Note that this was on a host that had numa_balancing=0,thp=1 from before
the guest booted, and it still ended up in the EPT_VIOLATION loop and
"locked up" (unresponsive on console).)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-02 16:45                         ` Amaan Cheval
@ 2023-08-02 17:52                           ` Sean Christopherson
  2023-08-08 15:34                             ` Amaan Cheval
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-02 17:52 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: brak, kvm

On Wed, Aug 02, 2023, Amaan Cheval wrote:
> > LOL, NUMA autobalancing.  I have a longstanding hatred of that feature.  I'm sure
> > there are setups where it adds value, but from my perspective it's nothing but
> > pain and misery.
> 
> Do you think autobalancing is increasing the odds of some edge-case race
> condition, perhaps?
> I find it really curious that numa_balancing definitely affects this issue, but
> particularly when thp=0. Is it just too many EPT entries to install
> when transparent hugepages is disabled, increasing the likelihood of
> a race condition / lock contention of some sort?

NUMA balancing works by zapping PTEs[*] in userspace page tables for mappings to
remote memory, and then migrating the data to local memory on the resulting page
fault.  When that memory is being used to back a KVM guest, zapping the userspace
(primary) PTEs triggers an mmu_notifier event that in turn zaps KVM's PTEs, a.k.a.
SPTEs (which used to mean Shadow PTEs, but we're retroactively redefining SPTE to
also mean Secondary PTEs so that it's correct when shadow paging isn't being used).

If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
dislike NUMA balancing is that it's all too easy to end up with subtle bugs
and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
actually being able to move the page in the end, i.e. it's (IMO) too easy for
NUMA balancing to get false positives when determining whether or not to try and
migrate a page.

That said, it's definitely very unexpected that NUMA balancing would be zapping
SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
possible that that's what's happening, but quite unlikely, especially since it
sounds like you're seeing issues even with NUMA balancing disabled.

More likely is that there is a bug somewhere that results in the mmu_notifier
event refcount staying incorrectly elevated, but that type of bug shouldn't follow
the VM across a live migration...

[*] Not technically a full zap of the PTE, it's just marked PROT_NONE, i.e.
    !PRESENT, but on the KVM side of things it does manifest as a full zap of the
    SPTE.
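
For a rough picture of how much of that activity is hitting the guest's memory
while a vCPU is spinning, something like this could be run on the host (untested
sketch; it assumes migrate_misplaced_page() exists under that name on your kernel
and is kprobe-able).  Keying by comm means migrations triggered from the vCPU
threads ("CPU X/KVM") should stand out:

  # bpftrace -e '
      // NUMA balancing migration attempts vs. guest page faults, per thread
      kprobe:migrate_misplaced_page { @numa_migrate[comm] = count(); }
      tracepoint:kvm:kvm_page_fault { @pf[comm] = count(); }'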

> > > They still remain locked up, but that might be because the original cause of the
> > > looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
> > > any ways you can think of that that might happen)?
> >
> > Define "remain locked up".  If the vCPUs are actively running in the guest and
> > making forward progress, i.e. not looping on VM-Exits on a single RIP, then they
> > aren't stuck from KVM's perspective.
> 
> Right, the traces look like they're not stuck (i.e. no looping on the same
> RIP). By "remain locked up" I mean that the VM is unresponsive on both the
> console and services (such as ssh) used to connect to it.
> 
> > But that doesn't mean the guest didn't take punitive action when a vCPU was
> > effectively stalled indefinitely by KVM, e.g. from the guest's perspective the
> > stuck vCPU will likely manifest as a soft lockup, and that could lead to a panic()
> > if the guest is a Linux kernel running with softlockup_panic=1.
> 
> So far we haven't had any guest kernels with softlockup_panic=1 have this issue,
> so it's hard to confirm, but it makes sense that the guest took punitive action
> in response to being stalled.
> 
> Any thoughts on how we might reproduce the issue or trace it down better?

Before going further, can you confirm that this earlier statement is correct?

 : Another interesting observation we made was that when we migrate a guest to a
 : different host, the guest _stays_ locked up and throws EPT violations on the new
 : host as well

Specifically, after migration, is the vCPU still fully stuck on EPT violations,
i.e. not making forward progress from KVM's perspective?  Or is the guest "stuck"
after migration purely because the guest itself gave up?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-02 17:52                           ` Sean Christopherson
@ 2023-08-08 15:34                             ` Amaan Cheval
  2023-08-08 17:07                               ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-08-08 15:34 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: brak, kvm

Hey Sean,

> If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
> mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
> dislike NUMA balancing is that it's all too easy to end up with subtle bugs
> and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
> actually being able to move the page in the end, i.e. it's (IMO) too easy for
> NUMA balancing to get false positives when determining whether or not to try and
> migrate a page.

What are some situations where it might not be able to move the page in the end?

> That said, it's definitely very unexpected that NUMA balancing would be zapping
> SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
> possible that that's what's happening, but quite unlikely, especially since it
> sounds like you're seeing issues even with NUMA balancing disabled.

Yep, we're definitely seeing the issue occur even with numa_balancing disabled,
but the likelihood of it occurring has significantly dropped since we've disabled
numa_balancing.

> More likely is that there is a bug somewhere that results in the mmu_notifier
> event refcount staying incorrectly elevated, but that type of bug shouldn't follow
> the VM across a live migration...

Good news! We managed to live migrate a guest and that did "fix it".

The console was locked-up on the login screen before migration for about 6.5
hours, looping EPT_VIOLATIONs.
Post migration, we saw `rcu_sched detected stalls on CPUs/tasks` on the console,
and then the VM resumed normal operation. Here's a screenshot of the console (it
was "locked up"/frozen on the login screen until the migration):

https://i.imgur.com/n6CSsAv.png

> [*] Not technically a full zap of the PTE, it's just marked PROT_NONE, i.e.
>     !PRESENT, but on the KVM side of things it does manifest as a full zap of the
>     SPTE.

Thank you so much for that detailed explanation!

A colleague also modified a host kernel with KFI (Kernel Function
Instrumentation) and wrote a kernel module that intercepts the vmexit handler,
handle_ept_violation, and does an EPT walk for the faulting GPA, checking each
level's pfn against /proc/iomem.

Assuming the EPT walking code is correct, we see this surprising result of a
PDPTE's pfn=0:

```
[15295.792019] kvm-kfi: enter: handle_ept_violation
[15295.792021] kvm-kfi: ept walk: eptp=0x103aaa05e gpa=0x792d4ff8
[15295.792023]   PML4E : [0x103aaa05e] pfn=0x103aaa         : is within the range: 0x100000-0x3fffffff: System RAM
[15295.792026]   PDPTE : [0x0] pfn=0x0         : is within the range: 0x0-0xfff: Reserved
[15295.792029]   PDE   : [0xf000eef3f000e2c3] pfn=0xeef3f000e [large] : is within the range: 0x100000000-0x1075ffffff: System RAM
```

For comparison, the same module's output on a host without any "locked up"
guests:

```
[13956.578732] kvm-kfi: ept walk: eptp=0x1061b505e gpa=0xfcf28
[13956.578733]   PML4E : [0x1061b505e] pfn=0x1061b5         : is within the range: 0x100000-0x3fffffff: System RAM
[13956.578736]   PDPTE : [0x11f29a907] pfn=0x11f29a         : is within the range: 0x100000-0x3fffffff: System RAM
[13956.578739]   PDE   : [0x11c205907] pfn=0x11c205         : is within the range: 0x100000-0x3fffffff: System RAM
[13956.578741]   PTE   : [0x11c204907] pfn=0x11c204         : is within the range: 0x100000-0x3fffffff: System RAM
```

Does this seem to indicate an mmu_notifier refcount issue to you, given that
migration did fix it? Any way to verify?

We haven't found any guests with `softlockup_panic=1` yet, and since we can't
reproduce the issue on command ourselves yet, we might have to wait a bit - but
I imagine that the fact that live migration "fixed" the locked up guest confirms
that the other guests that didn't get "fixed" were likely softlocked from the
CPU stalling?

If you have any suggestions on how modifying the host kernel (and then migrating
a locked up guest to it) or eBPF programs that might help illuminate the issue
further, let me know!

Thanks for all your help so far!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-08 15:34                             ` Amaan Cheval
@ 2023-08-08 17:07                               ` Sean Christopherson
  2023-08-10  0:48                                 ` Eric Wheeler
  2023-08-15  0:30                                 ` Eric Wheeler
  0 siblings, 2 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-08 17:07 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: brak, kvm

On Tue, Aug 08, 2023, Amaan Cheval wrote:
> Hey Sean,
> 
> > If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
> > mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
> > dislike NUMA balancing is that it's all too easy to end up with subtle bugs
> > and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
> > actually being able to move the page in the end, i.e. it's (IMO) too easy for
> > NUMA balancing to get false positives when determining whether or not to try and
> > migrate a page.
> 
> What are some situations where it might not be able to move the page in the end?

There's a pretty big list, see the "failure" paths of do_numa_page() and
migrate_misplaced_page().

> > That said, it's definitely very unexpected that NUMA balancing would be zapping
> > SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
> > possible that that's what's happening, but quite unlikely, especially since it
> > sounds like you're seeing issues even with NUMA balancing disabled.
> 
> Yep, we're definitely seeing the issue occur even with numa_balancing
> disabled, but the likelihood of it occurring has significantly dropped since
> we've disabled numa_balancing.

So the fact that this occurs with NUMA balancing disabled means the problem likely
isn't with NUMA balancing itself.  NUMA balancing probably exposes some underlying
issue due to it generating a near-constant stream of mmu_notifier invalidation.

> > More likely is that there is a bug somewhere that results in the mmu_notifier
> > event refcount staying incorrectly elevated, but that type of bug shouldn't follow
> > the VM across a live migration...
> 
> Good news! We managed to live migrate a guest and that did "fix it".

...

> A colleague also modified a host kernel with KFI (Kernel Function
> Instrumentation) and wrote a kernel module that intercepts the vmexit handler,
> handle_ept_violation, and does an EPT walk, comparing each pfn against
> /proc/iomem.
> 
> Assuming the EPT walking code is correct, we see this surprising result of a
> PDPTE's pfn=0:

Not surprising.  The entire EPTE is zero, i.e. has been zapped by KVM.  This is
exactly what is expected.

> Does this seem to indicate an mmu_notifier refcount issue to you, given that
> migration did fix it? Any way to verify?

It doesn't move the needle either way, it just confirms what we already know: the
vCPU is repeatedly taking !PRESENT faults.  The unexpected part is that KVM never
"fixes" the fault and never outright fails.

> We haven't found any guests with `softlockup_panic=1` yet, and since we can't
> reproduce the issue on command ourselves yet, we might have to wait a bit - but
> I imagine that the fact that live migration "fixed" the locked up guest confirms
> that the other guests that didn't get "fixed" were likely softlocked from the
> CPU stalling?

Yes.

> If you have any suggestions on how modifying the host kernel (and then migrating
> a locked up guest to it) or eBPF programs that might help illuminate the issue
> further, let me know!
> 
> Thanks for all your help so far!

Since it sounds like you can test with a custom kernel, try running with this
patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
below expands said tracepoint to capture information about mmu_notifiers and
memslots generation.  With luck, it will reveal a smoking gun.

---
 arch/x86/kvm/mmu/mmu.c          | 10 ----------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.h      | 10 ++++++++++
 arch/x86/kvm/trace.h            | 28 ++++++++++++++++++++++++++--
 4 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9e4cd8b4a202..122bfc884293 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2006,16 +2006,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 	return true;
 }
 
-static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	if (sp->role.invalid)
-		return true;
-
-	/* TDP MMU pages do not use the MMU generation. */
-	return !is_tdp_mmu_page(sp) &&
-	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
-}
-
 struct mmu_page_path {
 	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
 	unsigned int idx[PT64_ROOT_MAX_LEVEL];
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1ef670058e5..cf7ba0abaa8f 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -6,6 +6,8 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_host.h>
 
+#include "mmu.h"
+
 #ifdef CONFIG_KVM_PROVE_MMU
 #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
 #else
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 0a63b1afabd3..a0d7c8acf78f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -76,4 +76,14 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
 #endif
 
+static inline bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (sp->role.invalid)
+		return true;
+
+	/* TDP MMU pages do not use the MMU generation. */
+	return !is_tdp_mmu_page(sp) &&
+	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
 #endif /* __KVM_X86_MMU_TDP_MMU_H */
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 83843379813e..ff4a384ab03a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -8,6 +8,8 @@
 #include <asm/clocksource.h>
 #include <asm/pvclock-abi.h>
 
+#include "mmu/tdp_mmu.h"
+
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
 
@@ -405,6 +407,13 @@ TRACE_EVENT(kvm_page_fault,
 		__field(	unsigned long,	guest_rip	)
 		__field(	u64,		fault_address	)
 		__field(	u64,		error_code	)
+		__field(	unsigned long,  mmu_invalidate_seq)
+		__field(	long,  mmu_invalidate_in_progress)
+		__field(	unsigned long,  mmu_invalidate_range_start)
+		__field(	unsigned long,  mmu_invalidate_range_end)
+		__field(	bool,		root_is_valid)
+		__field(	bool,		root_has_sp)
+		__field(	bool,		root_is_obsolete)
 	),
 
 	TP_fast_assign(
@@ -412,11 +421,26 @@ TRACE_EVENT(kvm_page_fault,
 		__entry->guest_rip	= kvm_rip_read(vcpu);
 		__entry->fault_address	= fault_address;
 		__entry->error_code	= error_code;
+		__entry->mmu_invalidate_seq		= vcpu->kvm->mmu_invalidate_seq;
+		__entry->mmu_invalidate_in_progress	= vcpu->kvm->mmu_invalidate_in_progress;
+		__entry->mmu_invalidate_range_start	= vcpu->kvm->mmu_invalidate_range_start;
+		__entry->mmu_invalidate_range_end	= vcpu->kvm->mmu_invalidate_range_end;
+		__entry->root_is_valid			= VALID_PAGE(vcpu->arch.mmu->root.hpa);
+		__entry->root_has_sp			= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
+							  to_shadow_page(vcpu->arch.mmu->root.hpa);
+		__entry->root_is_obsolete		= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
+							  to_shadow_page(vcpu->arch.mmu->root.hpa) &&
+							  is_obsolete_sp(vcpu->kvm, to_shadow_page(vcpu->arch.mmu->root.hpa));
 	),
 
-	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx",
+	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx, seq = 0x%lx, in_prog = 0x%lx, start = 0x%lx, end = 0x%lx, root = %s",
 		  __entry->vcpu_id, __entry->guest_rip,
-		  __entry->fault_address, __entry->error_code)
+		  __entry->fault_address, __entry->error_code,
+		  __entry->mmu_invalidate_seq, __entry->mmu_invalidate_in_progress,
+		  __entry->mmu_invalidate_range_start, __entry->mmu_invalidate_range_end,
+		  !__entry->root_is_valid ? "invalid" :
+		  !__entry->root_has_sp ? "no shadow page" :
+		  __entry->root_is_obsolete ? "obsolete" : "fresh")
 );
 
 /*

base-commit: 240f736891887939571854bd6d734b6c9291f22e
-- 


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-08 17:07                               ` Sean Christopherson
@ 2023-08-10  0:48                                 ` Eric Wheeler
  2023-08-10  1:27                                   ` Eric Wheeler
  2023-08-15  0:30                                 ` Eric Wheeler
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-10  0:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Tue, 8 Aug 2023, Sean Christopherson wrote:
> On Tue, Aug 08, 2023, Amaan Cheval wrote:
> > Hey Sean,
> > 
> > > If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
> > > mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
> > > dislike NUMA balancing is that it's all too easy to end up with subtle bugs
> > > and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
> > > actually being able to move the page in the end, i.e. it's (IMO) too easy for
> > > NUMA balancing to get false positives when determining whether or not to try and
> > > migrate a page.
> > 
> > What are some situations where it might not be able to move the page in the end?
> 
> There's a pretty big list, see the "failure" paths of do_numa_page() and
> migrate_misplaced_page().
> 
> > > That said, it's definitely very unexpected that NUMA balancing would be zapping
> > > SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
> > > possible that that's what's happening, but quite unlikely, especially since it
> > > sounds like you're seeing issues even with NUMA balancing disabled.
> > 
> > Yep, we're definitely seeing the issue occur even with numa_balancing
> > disabled, but the likelihood of it occurring has significantly dropped since
> > we've disabled numa_balancing.
> 
> So the fact that this occurs with NUMA balancing disabled means the problem likely
> isn't with NUMA balancing itself.  NUMA balancing probably exposes some underlying
> issue due to it generating a near-constant stream of mmu_notifier invalidation.
> 
> > > More likely is that there is a bug somewhere that results in the mmu_notifier
> > > event refcount staying incorrectly elevated, but that type of bug shouldn't follow
> > > the VM across a live migration...
> > 
> > Good news! We managed to live migrate a guest and that did "fix it".

Does the VM make progress even if it is migrated to a kernel that presents
the bug?

What was kernel version being migrated from and to?

For example, was it from a >5.19 kernel to something earlier than 5.19?

For example, if the hung VM remains stuck after migrating to a >5.19 
kernel but _not_ to a <5.19 kernel, then maybe bisect is an option.


--
Eric Wheeler




> ...
> 
> > A colleague also modified a host kernel with KFI (Kernel Function
> > Instrumentation) and wrote a kernel module that intercepts the vmexit handler,
> > handle_ept_violation, and does an EPT walk, comparing each pfn against
> > /proc/iomem.
> > 
> > Assuming the EPT walking code is correct, we see this surprising result of a
> > PDPTE's pfn=0:
> 
> Not surprising.  The entire EPTE is zero, i.e. has been zapped by KVM.  This is
> exactly what is expected.
> 
> > Does this seem to indicate an mmu_notifier refcount issue to you, given that
> > migration did fix it? Any way to verify?
> 
> It doesn't move the needle either way, it just confirms what we already know: the
> vCPU is repeatedly taking !PRESENT faults.  The unexpected part is that KVM never
> "fixes" the fault and never outright fails.
> 
> > We haven't found any guests with `softlockup_panic=1` yet, and since we can't
> > reproduce the issue on command ourselves yet, we might have to wait a bit - but
> > I imagine that the fact that live migration "fixed" the locked up guest confirms
> > that the other guests that didn't get "fixed" were likely softlocked from the
> > CPU stalling?
> 
> Yes.
> 
> > If you have any suggestions on how modifying the host kernel (and then migrating
> > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > further, let me know!
> > 
> > Thanks for all your help so far!
> 
> Since it sounds like you can test with a custom kernel, try running with this
> patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> below expands said tracepoint to capture information about mmu_notifiers and
> memslots generation.  With luck, it will reveal a smoking gun.
> 
> ---
>  arch/x86/kvm/mmu/mmu.c          | 10 ----------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 ++
>  arch/x86/kvm/mmu/tdp_mmu.h      | 10 ++++++++++
>  arch/x86/kvm/trace.h            | 28 ++++++++++++++++++++++++++--
>  4 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 9e4cd8b4a202..122bfc884293 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2006,16 +2006,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
>  	return true;
>  }
>  
> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> -{
> -	if (sp->role.invalid)
> -		return true;
> -
> -	/* TDP MMU pages do not use the MMU generation. */
> -	return !is_tdp_mmu_page(sp) &&
> -	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> -}
> -
>  struct mmu_page_path {
>  	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
>  	unsigned int idx[PT64_ROOT_MAX_LEVEL];
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index f1ef670058e5..cf7ba0abaa8f 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -6,6 +6,8 @@
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_host.h>
>  
> +#include "mmu.h"
> +
>  #ifdef CONFIG_KVM_PROVE_MMU
>  #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
>  #else
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 0a63b1afabd3..a0d7c8acf78f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -76,4 +76,14 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
>  #endif
>  
> +static inline bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	if (sp->role.invalid)
> +		return true;
> +
> +	/* TDP MMU pages do not use the MMU generation. */
> +	return !is_tdp_mmu_page(sp) &&
> +	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +}
> +
>  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 83843379813e..ff4a384ab03a 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -8,6 +8,8 @@
>  #include <asm/clocksource.h>
>  #include <asm/pvclock-abi.h>
>  
> +#include "mmu/tdp_mmu.h"
> +
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM kvm
>  
> @@ -405,6 +407,13 @@ TRACE_EVENT(kvm_page_fault,
>  		__field(	unsigned long,	guest_rip	)
>  		__field(	u64,		fault_address	)
>  		__field(	u64,		error_code	)
> +		__field(	unsigned long,  mmu_invalidate_seq)
> +		__field(	long,  mmu_invalidate_in_progress)
> +		__field(	unsigned long,  mmu_invalidate_range_start)
> +		__field(	unsigned long,  mmu_invalidate_range_end)
> +		__field(	bool,		root_is_valid)
> +		__field(	bool,		root_has_sp)
> +		__field(	bool,		root_is_obsolete)
>  	),
>  
>  	TP_fast_assign(
> @@ -412,11 +421,26 @@ TRACE_EVENT(kvm_page_fault,
>  		__entry->guest_rip	= kvm_rip_read(vcpu);
>  		__entry->fault_address	= fault_address;
>  		__entry->error_code	= error_code;
> +		__entry->mmu_invalidate_seq		= vcpu->kvm->mmu_invalidate_seq;
> +		__entry->mmu_invalidate_in_progress	= vcpu->kvm->mmu_invalidate_in_progress;
> +		__entry->mmu_invalidate_range_start	= vcpu->kvm->mmu_invalidate_range_start;
> +		__entry->mmu_invalidate_range_end	= vcpu->kvm->mmu_invalidate_range_end;
> +		__entry->root_is_valid			= VALID_PAGE(vcpu->arch.mmu->root.hpa);
> +		__entry->root_has_sp			= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> +							  to_shadow_page(vcpu->arch.mmu->root.hpa);
> +		__entry->root_is_obsolete		= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> +							  to_shadow_page(vcpu->arch.mmu->root.hpa) &&
> +							  is_obsolete_sp(vcpu->kvm, to_shadow_page(vcpu->arch.mmu->root.hpa));
>  	),
>  
> -	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx",
> +	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx, seq = 0x%lx, in_prog = 0x%lx, start = 0x%lx, end = 0x%lx, root = %s",
>  		  __entry->vcpu_id, __entry->guest_rip,
> -		  __entry->fault_address, __entry->error_code)
> +		  __entry->fault_address, __entry->error_code,
> +		  __entry->mmu_invalidate_seq, __entry->mmu_invalidate_in_progress,
> +		  __entry->mmu_invalidate_range_start, __entry->mmu_invalidate_range_end,
> +		  !__entry->root_is_valid ? "invalid" :
> +		  !__entry->root_has_sp ? "no shadow page" :
> +		  __entry->root_is_obsolete ? "obsolete" : "fresh")
>  );
>  
>  /*
> 
> base-commit: 240f736891887939571854bd6d734b6c9291f22e
> -- 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-10  0:48                                 ` Eric Wheeler
@ 2023-08-10  1:27                                   ` Eric Wheeler
  2023-08-10 23:58                                     ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-10  1:27 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Wed, 9 Aug 2023, Eric Wheeler wrote:
> On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > On Tue, Aug 08, 2023, Amaan Cheval wrote:
> > > Hey Sean,
> > > 
> > > > If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
> > > > mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
> > > > dislike NUMA balancing is that it's all too easy to end up with subtle bugs
> > > > and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
> > > > actually being able to move the page in the end, i.e. it's (IMO) too easy for
> > > > NUMA balancing to get false positives when determining whether or not to try and
> > > > migrate a page.
> > > 
> > > What are some situations where it might not be able to move the page in the end?
> > 
> > There's a pretty big list, see the "failure" paths of do_numa_page() and
> > migrate_misplaced_page().
> > 
> > > > That said, it's definitely very unexpected that NUMA balancing would be zapping
> > > > SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
> > > > possible that that's what's happening, but quite unlikely, especially since it
> > > > sounds like you're seeing issues even with NUMA balancing disabled.

Brak indicated that they've seen this as early as v5.19.  IIRC, Hunter
said that v5.15 is working fine, so I went through the >v5.15 and <v5.19
commit logs for KVM that appear to be related to EPT. Of course if the
problem is outside of KVM, then this is moot, but maybe these are worth
a second look.

Sean, could any of these commits cause or hint at the problem?


  54275f74c KVM: x86/mmu: Don't attempt fast page fault just because EPT is in use
	- this mentions !PRESENT related to faulting out of mmu_lock.

  ec283cb1d KVM: x86/mmu: remove ept_ad field
	- looks like a simple patch, but could there be a reason that
	  this is somehow invalid in corner cases?  Here is the relevant 
	  diff snippet:

	+++ b/arch/x86/kvm/mmu/mmu.c
	@@ -5007,7 +5007,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
	 
			context->shadow_root_level = level;
	 
	-               context->ept_ad = accessed_dirty;

	+++ b/arch/x86/kvm/mmu/paging_tmpl.h
	-       #define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
	+       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)

  ca2a7c22a KVM: x86/mmu: Derive EPT violation RWX bits from EPTE RWX bits
	- "No functional change intended" but it mentions EPT
	  violations.  Could something unintentional have happened here?

  4f4aa80e3 KVM: X86: Handle implicit supervisor access with SMAP
	- This is a small change, but maybe it would be worth a quick review
	
  5b22bbe71 KVM: X86: Change the type of access u32 to u64
	- This is just a datatype change in 5.17-rc3, probably not it.

-Eric

> > > 
> > > Yep, we're definitely seeing the issue occur even with numa_balancing
> > > disabled, but the likelihood of it occurring has significantly dropped since
> > > we've disabled numa_balancing.
> > 
> > So the fact that this occurs with NUMA balancing disabled means the problem likely
> > isn't with NUMA balancing itself.  NUMA balancing probably exposes some underlying
> > issue due to it generating a near-constant stream of mmu_notifier invalidation.
> > 
> > > > More likely is that there is a bug somewhere that results in the mmu_notifier
> > > > event refcount staying incorrectly elevated, but that type of bug shouldn't follow
> > > > the VM across a live migration...
> > > 
> > > Good news! We managed to live migrate a guest and that did "fix it".
> 
> Does the VM make progress even if it is migrated to a kernel that presents
> the bug?
> 
> What was kernel version being migrated from and to?
> 
> For example, was it from a >5.19 kernel to something earlier than 5.19?
> 
> For example, if the hung VM remains stuck after migrating to a >5.19 
> kernel but _not_ to a <5.19 kernel, then maybe bisect is an option.
> 
> 
> --
> Eric Wheeler
> 
> 
> 
> 
> > ...
> > 
> > > A colleague also modified a host kernel with KFI (Kernel Function
> > > Instrumentation) and wrote a kernel module that intercepts the vmexit handler,
> > > handle_ept_violation, and does an EPT walk, comparing each pfn against
> > > /proc/iomem.
> > > 
> > > Assuming the EPT walking code is correct, we see this surprising result of a
> > > PDPTE's pfn=0:
> > 
> > Not surprising.  The entire EPTE is zero, i.e. has been zapped by KVM.  This is
> > exactly what is expected.
> > 
> > > Does this seem to indicate an mmu_notifier refcount issue to you, given that
> > > migration did fix it? Any way to verify?
> > 
> > It doesn't move the needle either way, it just confirms what we already know: the
> > vCPU is repeatedly taking !PRESENT faults.  The unexpected part is that KVM never
> > "fixes" the fault and never outright fails.
> > 
> > > We haven't found any guests with `softlockup_panic=1` yet, and since we can't
> > > reproduce the issue on command ourselves yet, we might have to wait a bit - but
> > > I imagine that the fact that live migration "fixed" the locked up guest confirms
> > > that the other guests that didn't get "fixed" were likely softlocked from the
> > > CPU stalling?
> > 
> > Yes.
> > 
> > > If you have any suggestions on how modifying the host kernel (and then migrating
> > > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > > further, let me know!
> > > 
> > > Thanks for all your help so far!
> > 
> > Since it sounds like you can test with a custom kernel, try running with this
> > patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> > below expands said tracepoint to capture information about mmu_notifiers and
> > memslots generation.  With luck, it will reveal a smoking gun.
> > 
> > ---
> >  arch/x86/kvm/mmu/mmu.c          | 10 ----------
> >  arch/x86/kvm/mmu/mmu_internal.h |  2 ++
> >  arch/x86/kvm/mmu/tdp_mmu.h      | 10 ++++++++++
> >  arch/x86/kvm/trace.h            | 28 ++++++++++++++++++++++++++--
> >  4 files changed, 38 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 9e4cd8b4a202..122bfc884293 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2006,16 +2006,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
> >  	return true;
> >  }
> >  
> > -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > -{
> > -	if (sp->role.invalid)
> > -		return true;
> > -
> > -	/* TDP MMU pages do not use the MMU generation. */
> > -	return !is_tdp_mmu_page(sp) &&
> > -	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> > -}
> > -
> >  struct mmu_page_path {
> >  	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
> >  	unsigned int idx[PT64_ROOT_MAX_LEVEL];
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index f1ef670058e5..cf7ba0abaa8f 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -6,6 +6,8 @@
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_host.h>
> >  
> > +#include "mmu.h"
> > +
> >  #ifdef CONFIG_KVM_PROVE_MMU
> >  #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
> >  #else
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> > index 0a63b1afabd3..a0d7c8acf78f 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.h
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> > @@ -76,4 +76,14 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
> >  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
> >  #endif
> >  
> > +static inline bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > +{
> > +	if (sp->role.invalid)
> > +		return true;
> > +
> > +	/* TDP MMU pages do not use the MMU generation. */
> > +	return !is_tdp_mmu_page(sp) &&
> > +	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> > +}
> > +
> >  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index 83843379813e..ff4a384ab03a 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -8,6 +8,8 @@
> >  #include <asm/clocksource.h>
> >  #include <asm/pvclock-abi.h>
> >  
> > +#include "mmu/tdp_mmu.h"
> > +
> >  #undef TRACE_SYSTEM
> >  #define TRACE_SYSTEM kvm
> >  
> > @@ -405,6 +407,13 @@ TRACE_EVENT(kvm_page_fault,
> >  		__field(	unsigned long,	guest_rip	)
> >  		__field(	u64,		fault_address	)
> >  		__field(	u64,		error_code	)
> > +		__field(	unsigned long,  mmu_invalidate_seq)
> > +		__field(	long,  mmu_invalidate_in_progress)
> > +		__field(	unsigned long,  mmu_invalidate_range_start)
> > +		__field(	unsigned long,  mmu_invalidate_range_end)
> > +		__field(	bool,		root_is_valid)
> > +		__field(	bool,		root_has_sp)
> > +		__field(	bool,		root_is_obsolete)
> >  	),
> >  
> >  	TP_fast_assign(
> > @@ -412,11 +421,26 @@ TRACE_EVENT(kvm_page_fault,
> >  		__entry->guest_rip	= kvm_rip_read(vcpu);
> >  		__entry->fault_address	= fault_address;
> >  		__entry->error_code	= error_code;
> > +		__entry->mmu_invalidate_seq		= vcpu->kvm->mmu_invalidate_seq;
> > +		__entry->mmu_invalidate_in_progress	= vcpu->kvm->mmu_invalidate_in_progress;
> > +		__entry->mmu_invalidate_range_start	= vcpu->kvm->mmu_invalidate_range_start;
> > +		__entry->mmu_invalidate_range_end	= vcpu->kvm->mmu_invalidate_range_end;
> > +		__entry->root_is_valid			= VALID_PAGE(vcpu->arch.mmu->root.hpa);
> > +		__entry->root_has_sp			= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> > +							  to_shadow_page(vcpu->arch.mmu->root.hpa);
> > +		__entry->root_is_obsolete		= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> > +							  to_shadow_page(vcpu->arch.mmu->root.hpa) &&
> > +							  is_obsolete_sp(vcpu->kvm, to_shadow_page(vcpu->arch.mmu->root.hpa));
> >  	),
> >  
> > -	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx",
> > +	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx, seq = 0x%lx, in_prog = 0x%lx, start = 0x%lx, end = 0x%lx, root = %s",
> >  		  __entry->vcpu_id, __entry->guest_rip,
> > -		  __entry->fault_address, __entry->error_code)
> > +		  __entry->fault_address, __entry->error_code,
> > +		  __entry->mmu_invalidate_seq, __entry->mmu_invalidate_in_progress,
> > +		  __entry->mmu_invalidate_range_start, __entry->mmu_invalidate_range_end,
> > +		  !__entry->root_is_valid ? "invalid" :
> > +		  !__entry->root_has_sp ? "no shadow page" :
> > +		  __entry->root_is_obsolete ? "obsolete" : "fresh")
> >  );
> >  
> >  /*
> > 
> > base-commit: 240f736891887939571854bd6d734b6c9291f22e
> > -- 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-10  1:27                                   ` Eric Wheeler
@ 2023-08-10 23:58                                     ` Sean Christopherson
  2023-08-11 12:37                                       ` Amaan Cheval
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-10 23:58 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Wed, Aug 09, 2023, Eric Wheeler wrote:
> On Wed, 9 Aug 2023, Eric Wheeler wrote:
> > On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > > On Tue, Aug 08, 2023, Amaan Cheval wrote:
> > > > Hey Sean,
> > > > 
> > > > > If NUMA balancing is going nuclear and constantly zapping PTEs, the resulting
> > > > > mmu_notifier events could theoretically stall a vCPU indefinitely.  The reason I
> > > > > dislike NUMA balancing is that it's all too easy to end up with subtle bugs
> > > > > and/or misconfigured setups where the NUMA balancing logic zaps PTEs/SPTEs without
> > > > > actually being able to move the page in the end, i.e. it's (IMO) too easy for
> > > > > NUMA balancing to get false positives when determining whether or not to try and
> > > > > migrate a page.
> > > > 
> > > > What are some situations where it might not be able to move the page in the end?
> > > 
> > > There's a pretty big list, see the "failure" paths of do_numa_page() and
> > > migrate_misplaced_page().
> > > 
> > > > > That said, it's definitely very unexpected that NUMA balancing would be zapping
> > > > > SPTEs to the point where a vCPU can't make forward progress.   It's theoretically
> > > > > possible that that's what's happening, but quite unlikely, especially since it
> > > > > sounds like you're seeing issues even with NUMA balancing disabled.
> 
> Brak indicated that they've seen this as early as v5.19.  IIRC, Hunter
> said that v5.15 is working fine, so I went through the >v5.15 and <v5.19
> commit logs for KVM that appear to be related to EPT. Of course if the
> problem is outside of KVM, then this is moot, but maybe these are worth
> a second look.
> 
> Sean, could any of these commits cause or hint at the problem?

No, it's extremely unlikely any of these are related.  FWIW, my money is on this
being a bug in generic KVM code or even outside of KVM, not a bug in KVM x86's MMU.
But I'm not confident enough to bet real money ;-)

>   54275f74c KVM: x86/mmu: Don't attempt fast page fault just because EPT is in use
> 	- this mentions !PRESENT related to faulting out of mmu_lock.
> 
>   ec283cb1d KVM: x86/mmu: remove ept_ad field
> 	- looks like a simple patch, but could there be a reason that
> 	  this is somehow invalid in corner cases?  Here is the relevant 
> 	  diff snippet:
> 
> 	+++ b/arch/x86/kvm/mmu/mmu.c
> 	@@ -5007,7 +5007,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
> 	 
> 			context->shadow_root_level = level;
> 	 
> 	-               context->ept_ad = accessed_dirty;
> 
> 	+++ b/arch/x86/kvm/mmu/paging_tmpl.h
> 	-       #define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
> 	+       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
> 
>   ca2a7c22a KVM: x86/mmu: Derive EPT violation RWX bits from EPTE RWX bits
> 	- "No functional change intended" but it mentions EPT
> 	  violations.  Could something unintentional have happened here?
> 
>   4f4aa80e3 KVM: X86: Handle implicit supervisor access with SMAP
> 	- This is a small change, but maybe it would be worth a quick review
> 	
>   5b22bbe71 KVM: X86: Change the type of access u32 to u64
> 	- This is just a datatype change in 5.17-rc3, probably not it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-10 23:58                                     ` Sean Christopherson
@ 2023-08-11 12:37                                       ` Amaan Cheval
  2023-08-11 18:02                                         ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Amaan Cheval @ 2023-08-11 12:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Eric Wheeler, brak, kvm

> There's a pretty big list, see the "failure" paths of do_numa_page() and
> migrate_misplaced_page().

Gotcha, thank you!

...

> Since it sounds like you can test with a custom kernel, try running with this
> patch and then enable the kvm_page_fault tracepoint when a vCPU gets
> stuck.  The below expands said tracepoint to capture information about
> mmu_notifiers and memslots generation.  With luck, it will reveal a smoking
> gun.

Thanks for the patch there. We tried migrating a locked up guest to a host with
this modified kernel twice (logs below). The guest "fixed itself" post
migration, so the results may not have captured the "problematic" kind of
page-fault, but here they are.

Complete logs of kvm_page_fault tracepoint events, starting just before the
migration (with 0 guests before the migration, so the first logs should be of
the problematic guest) as it resolves the lockup:

1. https://transfer.sh/QjB3MjeBqh/trace-kvm-kpf2.log
2. https://transfer.sh/wEFQm4hLHs/trace-kvm-pf.log

Truncated logs of `trace-cmd record -e kvm -e kvmmmu` in case context helps:

1. https://transfer.sh/FoFsNoFQCP/trace-kvm2.log
2. https://transfer.sh/LBFJryOfu7/trace-kvm.log

Note that for migration #2 in both sets above (trace-kvm-pf.log and
trace-kvm.log), we mistakenly didn't confirm that the guest was locked up before
migration. It most likely was, but if trace #2 doesn't present the same
symptoms, that's why.

Off an uneducated glance, it seems like `in_prog = 0x1` at least once for every
`seq` / kvm_page_fault that seems to be "looping" and staying unresolved -
indicating a lock contention, perhaps, in trying to invalidate/read/write the
same page range?

Any leads on where in the source code I could look to understand how that might
happen?

----

@Eric

> Does the VM make progress even if it is migrated to a kernel that presents the
> bug?

We're unsure which kernel versions do present the bug, so it's hard to say.
We've definitely seen it occur on kernels 5.15.49 to 6.1.38, but beyond that, we
don't know for certain. (Potentially as early as 5.10.103, though!)

> What was kernel version being migrated from and to?

The live migration that resolved the issue was from 6.1.12 to 6.5.0-rc2.

The traces above are for this live migration (source 6.1.x to target host
6.5.0-rc2).

Another migration was from 6.1.x to 6.1.39 (not for these traces). All of these
times the guest resumed/made progress post-migration.

> For example, was it from a >5.19 kernel to something earlier than 5.19?

No, we haven't tried migrating to < 5.19 yet - we have very few hosts running
kernels that old.

> For example, if the hung VM remains stuck after migrating to a >5.19 kernel
> but _not_ to a <5.19 kernel, then maybe bisect is an option.

From what Sean and I discussed above, we suspect that the VM remaining stuck is
likely due to the guest kernel soft-locking after stalling on the original bug.

We do know this issue _occurs_ as late as 6.1.38 at least (i.e. hosts running
6.1.38 have had guests lockup - we don't have hosts on more recent kernels, so
this isn't proof that it's been fixed since then, nor is migration proof of
that, IMO).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-11 12:37                                       ` Amaan Cheval
@ 2023-08-11 18:02                                         ` Sean Christopherson
  2023-08-12  0:50                                           ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-11 18:02 UTC (permalink / raw)
  To: Amaan Cheval; +Cc: Eric Wheeler, brak, kvm

On Fri, Aug 11, 2023, Amaan Cheval wrote:
> > Since it sounds like you can test with a custom kernel, try running with this
> > patch and then enable the kvm_page_fault tracepoint when a vCPU gets
> > stuck.  The below expands said tracepoint to capture information about
> > mmu_notifiers and memslots generation.  With luck, it will reveal a smoking
> > gun.
> 
> Thanks for the patch there. We tried migrating a locked up guest to a host with
> this modified kernel twice (logs below). The guest "fixed itself" post
> migration, so the results may not have captured the "problematic" kind of
> page-fault, but here they are.

The traces need to be captured from the host where a vCPU is stuck.

> Complete logs of kvm_page_fault tracepoint events, starting just before the
> migration (with 0 guests before the migration, so the first logs should be of
> the problematic guest) as it resolves the lockup:
> 
> 1. https://transfer.sh/QjB3MjeBqh/trace-kvm-kpf2.log
> 2. https://transfer.sh/wEFQm4hLHs/trace-kvm-pf.log
> 
> Truncated logs of `trace-cmd record -e kvm -e kvmmmu` in case context helps:
> 
> 1. https://transfer.sh/FoFsNoFQCP/trace-kvm2.log
> 2. https://transfer.sh/LBFJryOfu7/trace-kvm.log
> 
> Note that for migration #2 in both sets above (trace-kvm-pf.log and
> trace-kvm.log), we mistakenly didn't confirm that the guest was locked up before
> migration. It most likely was, but if trace #2 doesn't present the same
> symptoms, that's why.
> 
> Off an uneducated glance, it seems like `in_prog = 0x1` at least once for every
> `seq` / kvm_page_fault that seems to be "looping" and staying unresolved -

This is completely expected.   The "in_prog" thing is just saying that a vCPU
took a fault while there was an mmu_notifier event in-progress.

> indicating a lock contention, perhaps, in trying to invalidate/read/write the
> same page range?

No, just a collision between the primary MMU invalidating something, e.g. to move
a page or do KSM stuff, and a vCPU accessing the page in question.

> We do know this issue _occurs_ as late as 6.1.38 at least (i.e. hosts running
> 6.1.38 have had guests lockup - we don't have hosts on more recent kernels, so
> this isn't proof that it's been fixed since then, nor is migration proof of
> that, IMO).

Note, if my hunch is correct, it's the act of migrating to a different *host* that
resolves the problem, not the fact that the migration is to a different kernel.
E.g. I would expect that migrating to the exact same kernel would still unstick
the vCPU.

What I suspect is happening is that the in-progress count gets left high, e.g.
because of a start() without a paired end(), and that causes KVM to refuse to
install mappings for the affected range of guest memory.  Or possibly that the
problematic host is generating an absolutely massive storm of invalidations and
unintentionally DoS's the guest.

Either way, migrating the VM to a new host and thus a new KVM instance essentially
resets all of that metadata and allows KVM to fault-in pages and establish mappings.
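
For reference, the gate that prevents KVM from installing a mapping looks
roughly like the sketch below (modeled on mmu_invalidate_retry() in
include/linux/kvm_host.h, simplified and with a made-up name; the real code also
considers the range being invalidated).  If the in-progress count is left
non-zero, every fault bails out here and the vCPU spins on EPT violations:

/* Simplified sketch, not the actual helper; assumes KVM-internal context. */
static inline bool fault_must_retry(struct kvm *kvm, unsigned long mmu_seq)
{
	/* An invalidation is in flight; any SPTE installed now could be stale. */
	if (unlikely(kvm->mmu_invalidate_in_progress))
		return true;

	/* An invalidation completed after the fault snapshotted mmu_seq. */
	if (kvm->mmu_invalidate_seq != mmu_seq)
		return true;

	return false;
}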

Actually, one thing you could try to unstick a VM would be to do an intra-host
migration, i.e. migrate it to a new KVM instance on the same host.  If that "fixes"
the guest, then the bug is likely an mmu_notifier counting bug and not an
invalidation storm.

But the easiest thing would be to catch a host in the act, i.e. capture traces
with my debug patch from a host with a stuck vCPU.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-11 18:02                                         ` Sean Christopherson
@ 2023-08-12  0:50                                           ` Eric Wheeler
  2023-08-14 17:29                                             ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-12  0:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Fri, 11 Aug 2023, Sean Christopherson wrote:
> On Fri, Aug 11, 2023, Amaan Cheval wrote:
> > > Since it sounds like you can test with a custom kernel, try running with this
> > > patch and then enable the kvm_page_fault tracepoint when a vCPU gets
> > > stuck.  The below expands said tracepoint to capture information about
> > > mmu_notifiers and memslots generation.  With luck, it will reveal a smoking
> > > gun.
> > 
> > Thanks for the patch there. We tried migrating a locked up guest to a host with
> > this modified kernel twice (logs below). The guest "fixed itself" post
> > migration, so the results may not have captured the "problematic" kind of
> > page-fault, but here they are.
> 
> The traces need to be captured from the host where a vCPU is stuck.
> 
> > Complete logs of kvm_page_fault tracepoint events, starting just before the
> > migration (with 0 guests before the migration, so the first logs should be of
> > the problematic guest) as it resolves the lockup:
> > 
> > 1. https://transfer.sh/QjB3MjeBqh/trace-kvm-kpf2.log
> > 2. https://transfer.sh/wEFQm4hLHs/trace-kvm-pf.log
> > 
> > Truncated logs of `trace-cmd record -e kvm -e kvmmmu` in case context helps:
> > 
> > 1. https://transfer.sh/FoFsNoFQCP/trace-kvm2.log
> > 2. https://transfer.sh/LBFJryOfu7/trace-kvm.log
> > 
> > Note that for migration #2 in both sets above (trace-kvm-pf.log and
> > trace-kvm.log), we mistakenly didn't confirm that the guest was locked up before
> > migration. It most likely was, but if trace #2 doesn't present the same
> > symptoms, that's why.
> > 
> > Off an uneducated glance, it seems like `in_prog = 0x1` at least once for every
> > `seq` / kvm_page_fault that seems to be "looping" and staying unresolved -
> 
> This is completely expected.   The "in_prog" thing is just saying that a vCPU
> took a fault while there was an mmu_notifier event in-progress.
> 
> > indicating a lock contention, perhaps, in trying to invalidate/read/write the
> > same page range?
> 
> No, just a collision between the primary MMU invalidating something, e.g. to move
> a page or do KSM stuff, and a vCPU accessing the page in question.
> 
> > We do know this issue _occurs_ as late as 6.1.38 at least (i.e. hosts running
> > 6.1.38 have had guests lockup - we don't have hosts on more recent kernels, so
> > this isn't proof that it's been fixed since then, nor is migration proof of
> > that, IMO).
> 
> Note, if my hunch is correct, it's the act of migrating to a different *host* that
> resolves the problem, not the fact that the migration is to a different kernel.
> E.g. I would expect that migrating to the exact same kernel would still unstick
> the vCPU.
> 
> What I suspect is happening is that the in-progress count gets left high, e.g.
> because of a start() without a paired end(), and that causes KVM to refuse to
> install mappings for the affected range of guest memory.  Or possibly that the
> problematic host is generating an absolutely massive storm of invalidations and
> unintentionally DoS's the guest.


It would be great to write a micro benchmark of sorts that generates
EPT page invalidation pressure, and run it on a test system inside a
virtual machine to see if we can get it to fault:

Can you suggest the type(s) of memory operations that could be written in
user space (or kernel space as a module) to find a test case that forces
it to fail within a reasonable period of time?

We were thinking of memory-mapping lots of page-sized mappings from
/dev/zero, then randomly writing to and freeing them once there are tons of
them allocated, doing this across multiple threads, while simultaneously
using `taskset` (or `virsh vcpupin`) on the host to move the guest vCPUs
across NUMA boundaries, and also with NUMA balancing turned on.
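
Roughly along these lines, for example (a quick, untested userspace sketch of
the idea; the thread and mapping counts are arbitrary, and it uses anonymous
mappings, which should behave the same as mapping /dev/zero):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NMAPS		4096
#define NTHREADS	8

/* Map, dirty, and unmap page-sized regions at random to keep the guest's
 * page tables (and therefore the host's EPT/NPT) churning. */
static void *churn(void *arg)
{
	void *maps[NMAPS] = { 0 };
	long pgsz = sysconf(_SC_PAGESIZE);
	unsigned int seed = (unsigned long)&seed;

	for (;;) {
		int i = rand_r(&seed) % NMAPS;

		if (maps[i]) {
			munmap(maps[i], pgsz);
			maps[i] = NULL;
			continue;
		}
		maps[i] = mmap(NULL, pgsz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (maps[i] == MAP_FAILED) {
			maps[i] = NULL;
			continue;
		}
		memset(maps[i], 0x5a, pgsz);	/* force the page to be faulted in */
	}
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, churn, NULL);
	pause();	/* run until killed; pin/migrate vCPUs from the host meanwhile */
	return 0;
}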

I have also considered passing a device like null_blk.ko into the guest,
and then doing memory mappings against it in the same way to put pressure
on the direct IO path from KVM into the guest user space.

If you (or anyone else) have other suggestions then I would love to hear 
it. Maybe we can make a reproducer for this.


--
Eric Wheeler


> 
> Either way, migrating the VM to a new host and thus a new KVM instance essentially
> resets all of that metadata and allows KVM to fault-in pages and establish mappings.
> 
> Actually, one thing you could try to unstick a VM would be to do an intra-host
> migration, i.e. migrate it to a new KVM instance on the same host.  If that "fixes"
> the guest, then the bug is likely an mmu_notifier counting bug and not an
> invalidation storm.
> 
> But the easiest thing would be to catch a host in the act, i.e. capture traces
> with my debug patch from a host with a stuck vCPU.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-12  0:50                                           ` Eric Wheeler
@ 2023-08-14 17:29                                             ` Sean Christopherson
  0 siblings, 0 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-14 17:29 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Fri, Aug 11, 2023, Eric Wheeler wrote:
> On Fri, 11 Aug 2023, Sean Christopherson wrote:
> > What I suspect is happening is that the in-progress count gets left high, e.g.
> > because of a start() without a paired end(), and that causes KVM to refuse to
> > install mappings for the affected range of guest memory.  Or possibly that the
> > problematic host is generating an absolutely massive storm of invalidations and
> > unintentionally DoS's the guest.
> 
> 
> It would be great to write a micro benchmark of sorts that generates
> EPT page invalidation pressure, and run it on a test system inside a
> virtual machine to see if we can get it to fault:
> 
> Can you suggest the type(s) of memory operations that could be written in
> user space (or kernel space as a module) to find a test case that forces
> it to fail within a reasonable period of time?

Easiest thing would be to toggle PROT_EXEC via mprotect() on guest memory.  KVM
ignores PROT_EXEC so that guest memory doesn't need to be mapped executable in
the VMM, i.e. toggling PROT_EXEC won't cause spurious failures but it will still
trigger mmu_notifier invalidations.
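
E.g. a sidecar thread in the VMM could do something like the sketch below, where
"hva" and "len" stand in for the host virtual address range backing a memslot
(obtaining those is left out, and there's no error handling):

#include <sys/mman.h>

/* Sketch only: repeatedly toggle PROT_EXEC on the host mapping that backs
 * guest memory.  Each mprotect() should generate an mmu_notifier invalidation
 * (and thus SPTE zaps) while the guest keeps running, since KVM doesn't
 * require the host mapping to be executable. */
static void hammer_invalidations(void *hva, size_t len)
{
	for (;;) {
		mprotect(hva, len, PROT_READ | PROT_WRITE);
		mprotect(hva, len, PROT_READ | PROT_WRITE | PROT_EXEC);
	}
}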

Side topic, can you provide your host Kconfig?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-08 17:07                               ` Sean Christopherson
  2023-08-10  0:48                                 ` Eric Wheeler
@ 2023-08-15  0:30                                 ` Eric Wheeler
  2023-08-15 16:10                                   ` Sean Christopherson
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-15  0:30 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > If you have any suggestions on how modifying the host kernel (and then migrating
> > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > further, let me know!
> > 
> > Thanks for all your help so far!
> 
> Since it sounds like you can test with a custom kernel, try running with this
> patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> below expands said tracepoint to capture information about mmu_notifiers and
> memslots generation.  With luck, it will reveal a smoking gun.

Getting this patch into production systems is challenging; perhaps live
patching is an option.


Questions:

1. Do you know if this would be safe to insert as a live kernel patch?
For example, does adding to TRACE_EVENT modify a struct (which is not
live-patch-safe) or is it something that should plug in with simple
function redirection?
	

2. Before we try it, do you know off the top of your head if the patch
below relies on any code that Linux v6.1 would not have?


--
Eric Wheeler



> 
> ---
>  arch/x86/kvm/mmu/mmu.c          | 10 ----------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 ++
>  arch/x86/kvm/mmu/tdp_mmu.h      | 10 ++++++++++
>  arch/x86/kvm/trace.h            | 28 ++++++++++++++++++++++++++--
>  4 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 9e4cd8b4a202..122bfc884293 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2006,16 +2006,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
>  	return true;
>  }
>  
> -static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> -{
> -	if (sp->role.invalid)
> -		return true;
> -
> -	/* TDP MMU pages do not use the MMU generation. */
> -	return !is_tdp_mmu_page(sp) &&
> -	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> -}
> -
>  struct mmu_page_path {
>  	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
>  	unsigned int idx[PT64_ROOT_MAX_LEVEL];
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index f1ef670058e5..cf7ba0abaa8f 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -6,6 +6,8 @@
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_host.h>
>  
> +#include "mmu.h"
> +
>  #ifdef CONFIG_KVM_PROVE_MMU
>  #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
>  #else
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 0a63b1afabd3..a0d7c8acf78f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -76,4 +76,14 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
>  #endif
>  
> +static inline bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	if (sp->role.invalid)
> +		return true;
> +
> +	/* TDP MMU pages do not use the MMU generation. */
> +	return !is_tdp_mmu_page(sp) &&
> +	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +}
> +
>  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 83843379813e..ff4a384ab03a 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -8,6 +8,8 @@
>  #include <asm/clocksource.h>
>  #include <asm/pvclock-abi.h>
>  
> +#include "mmu/tdp_mmu.h"
> +
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM kvm
>  
> @@ -405,6 +407,13 @@ TRACE_EVENT(kvm_page_fault,
>  		__field(	unsigned long,	guest_rip	)
>  		__field(	u64,		fault_address	)
>  		__field(	u64,		error_code	)
> +		__field(	unsigned long,  mmu_invalidate_seq)
> +		__field(	long,  mmu_invalidate_in_progress)
> +		__field(	unsigned long,  mmu_invalidate_range_start)
> +		__field(	unsigned long,  mmu_invalidate_range_end)
> +		__field(	bool,		root_is_valid)
> +		__field(	bool,		root_has_sp)
> +		__field(	bool,		root_is_obsolete)
>  	),
>  
>  	TP_fast_assign(
> @@ -412,11 +421,26 @@ TRACE_EVENT(kvm_page_fault,
>  		__entry->guest_rip	= kvm_rip_read(vcpu);
>  		__entry->fault_address	= fault_address;
>  		__entry->error_code	= error_code;
> +		__entry->mmu_invalidate_seq		= vcpu->kvm->mmu_invalidate_seq;
> +		__entry->mmu_invalidate_in_progress	= vcpu->kvm->mmu_invalidate_in_progress;
> +		__entry->mmu_invalidate_range_start	= vcpu->kvm->mmu_invalidate_range_start;
> +		__entry->mmu_invalidate_range_end	= vcpu->kvm->mmu_invalidate_range_end;
> +		__entry->root_is_valid			= VALID_PAGE(vcpu->arch.mmu->root.hpa);
> +		__entry->root_has_sp			= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> +							  to_shadow_page(vcpu->arch.mmu->root.hpa);
> +		__entry->root_is_obsolete		= VALID_PAGE(vcpu->arch.mmu->root.hpa) &&
> +							  to_shadow_page(vcpu->arch.mmu->root.hpa) &&
> +							  is_obsolete_sp(vcpu->kvm, to_shadow_page(vcpu->arch.mmu->root.hpa));
>  	),
>  
> -	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx",
> +	TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx, seq = 0x%lx, in_prog = 0x%lx, start = 0x%lx, end = 0x%lx, root = %s",
>  		  __entry->vcpu_id, __entry->guest_rip,
> -		  __entry->fault_address, __entry->error_code)
> +		  __entry->fault_address, __entry->error_code,
> +		  __entry->mmu_invalidate_seq, __entry->mmu_invalidate_in_progress,
> +		  __entry->mmu_invalidate_range_start, __entry->mmu_invalidate_range_end,
> +		  !__entry->root_is_valid ? "invalid" :
> +		  !__entry->root_has_sp ? "no shadow page" :
> +		  __entry->root_is_obsolete ? "obsolete" : "fresh")
>  );
>  
>  /*
> 
> base-commit: 240f736891887939571854bd6d734b6c9291f22e
> -- 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-15  0:30                                 ` Eric Wheeler
@ 2023-08-15 16:10                                   ` Sean Christopherson
  2023-08-16 23:54                                     ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-15 16:10 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Mon, Aug 14, 2023, Eric Wheeler wrote:
> On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > > If you have any suggestions on how modifying the host kernel (and then migrating
> > > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > > further, let me know!
> > > 
> > > Thanks for all your help so far!
> > 
> > Since it sounds like you can test with a custom kernel, try running with this
> > patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> > below expands said tracepoint to capture information about mmu_notifiers and
> > memslots generation.  With luck, it will reveal a smoking gun.
> 
> Getting this patch into production systems is challenging; perhaps live
> patching is an option.

Ah, I take it that when you gathered information after a live migration you were
migrating VMs into a sidecar environment.

> Questions:
> 
> 1. Do you know if this would be safe to insert as a live kernel patch?

Hmm, probably not safe.

> For example, does adding to TRACE_EVENT modify a struct (which is not
> live-patch-safe) or is it something that should plug in with simple
> function redirection?

Yes, the tracepoint defines a struct, e.g. in this case trace_event_raw_kvm_page_fault.

Looking back, I think I misinterpreted an earlier response regarding bpftrace and
unnecessarily abandoned that tactic. *sigh*

If your environment provides btf info, then this bpftrace program should provide
the mmu_notifier half of the tracepoint hack-a-patch.  If this yields nothing
interesting then we can try diving into whether or not the mmu_root is stale, but
let's cross that bridge when we have to.

I recommend loading this only when you have a stuck vCPU, it'll be quite noisy.

kprobe:handle_ept_violation
{
	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
}

If you don't have BTF info, we can still use a bpf program, but to get at the
fields of interest, I think we'd have to resort to pointer arithmetic with struct
offsets grabbed from your build.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-15 16:10                                   ` Sean Christopherson
@ 2023-08-16 23:54                                     ` Eric Wheeler
  2023-08-17 18:21                                       ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-16 23:54 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Tue, 15 Aug 2023, Sean Christopherson wrote:
> On Mon, Aug 14, 2023, Eric Wheeler wrote:
> > On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > > > If you have any suggestions on how modifying the host kernel (and then migrating
> > > > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > > > further, let me know!
> > > > 
> > > > Thanks for all your help so far!
> > > 
> > > Since it sounds like you can test with a custom kernel, try running with this
> > > patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> > > below expands said tracepoint to capture information about mmu_notifiers and
> > > memslots generation.  With luck, it will reveal a smoking gun.
> > 
> > Getting this patch into production systems is challenging, perhaps live
> > patching is an option:
> 
> Ah, I take when you gathered information after a live migration you were migrating
> VMs into a sidecar environment.
> 
> > Questions:
> > 
> > 1. Do you know if this would be safe to insert as a live kernel patch?
> 
> Hmm, probably not safe.
> 
> > For example, does adding to TRACE_EVENT modify a struct (which is not
> > live-patch-safe) or is it something that should plug in with simple
> > function redirection?
> 
> Yes, the tracepoint defines a struct, e.g. in this case trace_event_raw_kvm_page_fault.
> 
> Looking back, I think I misinterpreted an earlier response regarding bpftrace and
> unnecessarily abandoned that tactic. *sigh*
> 
> If your environment provides btf info, then this bpftrace program should provide
> the mmu_notifier half of the tracepoint hack-a-patch.  If this yields nothing
> interesting then we can try diving into whether or not the mmu_root is stale, but
> let's cross that bridge when we have to.
> 
> I recommend loading this only when you have a stuck vCPU, it'll be quite noisy.
> 
> kprobe:handle_ept_violation
> {
> 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> }
> 
> If you don't have BTF info, we can still use a bpf program, but to get at the
> fields of interested, I think we'd have to resort to pointer arithmetic with struct
> offsets grab from your build.

We have BTF, so hurray for not needing struct offsets!

I am testing this on a host that is not (yet) known to be stuck. Please do 
a quick sanity check for me and make sure this looks like the kind of 
output that you want to see:

I had to shrink the printf line because it was longer than 64 bytes. I put 
the process ID as the first item and changed %lx to %08lx for visual 
alignment. Aside from that, it is the same as what you provided.

We're piping it through `uniq -c` to only see interesting changes (and 
show counts) because it is extremely noisy. If this looks good to you then 
please confirm and I will run it on a production system after a lock-up:

	kprobe:handle_ept_violation
	{
		printf("ept[%u] vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
		       ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
			arg0, 
		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
	}

Questions:
  - Should pid be zero?  (Note this is not yet running on a host with a 
    locked-up guest, in case that is the reason.)

  - Can you think of any reason that this would be unsafe? (Forgive my 
    paranoia, but of course this will be running on a production
    hypervisor.)

  - Can you think of any adjustments to the bpf script above before 
    running this for real?


Here is an example trace on a test host that isn't locked up:

 ~]# bpftrace handle_ept_violation.bt | grep ^ept --line-buffered | uniq -c
   1926 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
 215722 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
  66280 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
18609437 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
     30 ept[0] vcpu=ffff96955de90000 seq=001fa362 inprog=0 start=7fa25ef0f000 end=7fa25ef10000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa44e inprog=0 start=7fa23f789000 end=7fa23f78a000
      2 ept[0] vcpu=ffff96955de92340 seq=001fa59f inprog=0 start=7fa23dfe8000 end=7fa23dfe9000
      2 ept[0] vcpu=ffff96955de92340 seq=001fa5a0 inprog=0 start=7fa23b723000 end=7fa23b724000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa5a1 inprog=0 start=7fa238d50000 end=7fa238d51000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa5a5 inprog=0 start=7fa24d920000 end=7fa24d921000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa5a6 inprog=0 start=7fa238a73000 end=7fa238a74000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa5ea inprog=0 start=7fa244791000 end=7fa244792000
      1 ept[0] vcpu=ffff96955de92340 seq=001fa5eb inprog=0 start=7fa24c988000 end=7fa24c989000
      3 ept[0] vcpu=ffff96955de92340 seq=001fa5ec inprog=0 start=7fa23f78b000 end=7fa23f78c000
      2 ept[0] vcpu=ffff96955de92340 seq=001fa5ed inprog=0 start=7fa24256a000 end=7fa24256b000
      2 ept[0] vcpu=ffff96955de92340 seq=001fa5ee inprog=0 start=7fa24ed2b000 end=7fa24ed2c000

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-16 23:54                                     ` Eric Wheeler
@ 2023-08-17 18:21                                       ` Sean Christopherson
  2023-08-18  0:55                                         ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-17 18:21 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Wed, Aug 16, 2023, Eric Wheeler wrote:
> On Tue, 15 Aug 2023, Sean Christopherson wrote:
> > On Mon, Aug 14, 2023, Eric Wheeler wrote:
> > > On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > > > > If you have any suggestions on how modifying the host kernel (and then migrating
> > > > > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > > > > further, let me know!
> > > > > 
> > > > > Thanks for all your help so far!
> > > > 
> > > > Since it sounds like you can test with a custom kernel, try running with this
> > > > patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> > > > below expands said tracepoint to capture information about mmu_notifiers and
> > > > memslots generation.  With luck, it will reveal a smoking gun.
> > > 
> > > Getting this patch into production systems is challenging, perhaps live
> > > patching is an option:
> > 
> > Ah, I take when you gathered information after a live migration you were migrating
> > VMs into a sidecar environment.
> > 
> > > Questions:
> > > 
> > > 1. Do you know if this would be safe to insert as a live kernel patch?
> > 
> > Hmm, probably not safe.
> > 
> > > For example, does adding to TRACE_EVENT modify a struct (which is not
> > > live-patch-safe) or is it something that should plug in with simple
> > > function redirection?
> > 
> > Yes, the tracepoint defines a struct, e.g. in this case trace_event_raw_kvm_page_fault.
> > 
> > Looking back, I think I misinterpreted an earlier response regarding bpftrace and
> > unnecessarily abandoned that tactic. *sigh*
> > 
> > If your environment provides btf info, then this bpftrace program should provide
> > the mmu_notifier half of the tracepoint hack-a-patch.  If this yields nothing
> > interesting then we can try diving into whether or not the mmu_root is stale, but
> > let's cross that bridge when we have to.
> > 
> > I recommend loading this only when you have a stuck vCPU, it'll be quite noisy.
> > 
> > kprobe:handle_ept_violation
> > {
> > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > }
> > 
> > If you don't have BTF info, we can still use a bpf program, but to get at the
> > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > offsets grab from your build.
> 
> We have BTF, so hurray for not needing struct offsets!
> 
> I am testing this on a host that is not (yet) known to be stuck. Please do 
> a quick sanity check for me and make sure this looks like the kind of 
> output that you want to see:
> 
> I had to shrink the printf line because it was longer than 64 bytes. I put 
> the process ID as the first item and changed %lx to %08lx for visual 
> alignment. Aside from that, it is the same as what you provided.
> 
> We're piping it through `uniq -c` to only see interesting changes (and 
> show counts) because it is extremely noisy. If this looks good to you then 
> please confirm and I will run it on a production system after a lock-up:
> 
> 	kprobe:handle_ept_violation
> 	{
> 		printf("ept[%u] vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
> 		       ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> 			arg0, 
> 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> 	}
> 
> Questions:
>   - Should pid be zero?  (Note this is not yet running on a host with a 
>     locked-up guest, in case that is the reason.)

No.  I'm not at all familiar with PID management; I just copy+pasted from
pid_nr(), which is what KVM uses when displaying the pid in debugfs.  I printed
the PID purely to be able to unambiguously correlate prints to vCPUs without
needing to cross-reference kernel addresses.  I.e. having the PID makes life
easier, but it shouldn't be strictly necessary.
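
For reference, pid_nr() is just a NULL-tolerant read of the pid number; paraphrased
from include/linux/pid.h, it is roughly:

static inline pid_t pid_nr(struct pid *pid)
{
	pid_t nr = 0;

	if (pid)
		nr = pid->numbers[0].nr;
	return nr;
}

So a NULL vcpu->pid, or a probe read that fails in bpftrace, would read back as 0;
I haven't dug into which of those (if either) is happening in your setup.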

>   - Can you think of any reason that this would be unsafe? (Forgive my 
>     paranoia, but of course this will be running on a production
>     hypervisor.)

Printing the raw address of the vCPU structure will effectively neuter KASLR, but
KASLR isn't all that much of a barrier, and whoever has permission to load a BPF
program on the system can do far, far more damage.

>   - Can you think of any adjustments to the bpf script above before 
>     running this for real?

You could try to make it less noisy or more precise, e.g. by tailoring it to
print only information on the vCPU that is stuck.  If the noise isn't a problem
though, I would keep it as-is; the more information the better.

> Here is an example trace on a test host that isn't locked up:
> 
>  ~]# bpftrace handle_ept_violation.bt | grep ^ept --line-buffered | uniq -c
>    1926 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
>  215722 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
>   66280 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
> 18609437 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000

Woah.  That's over 2 *billion* invalidations for a single VM.  Even if that's a
long-lived VM, that still seems rather insane.  E.g. if the uptime of that VM
*on that host* is 6 months, my back-of-the-napkin math says that's nearly
100 invalidations every second for 6 months straight.

Bit 31 being set in relative isolation almost makes me wonder if mmu_invalidate_seq
got corrupted somehow.  Either that or you are thrashing that VM with a vengeance.
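
For reference, the notifier bookkeeping in virt/kvm/kvm_main.c is roughly the
following (a paraphrased sketch, barriers and comments trimmed); seq ticks once per
*completed* invalidation, so a value around 0x8009c7eb really does mean on the order
of 2 billion completed invalidations, assuming nothing corrupted the field:

void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, unsigned long end)
{
	kvm->mmu_invalidate_in_progress++;
	if (kvm->mmu_invalidate_in_progress == 1) {
		kvm->mmu_invalidate_range_start = start;
		kvm->mmu_invalidate_range_end = end;
	} else {
		/* concurrent invalidations are collapsed into one min/max range */
		kvm->mmu_invalidate_range_start =
			min(kvm->mmu_invalidate_range_start, start);
		kvm->mmu_invalidate_range_end =
			max(kvm->mmu_invalidate_range_end, end);
	}
}

void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, unsigned long end)
{
	kvm->mmu_invalidate_seq++;		/* one tick per completed invalidation */
	smp_wmb();				/* seq bump must be visible before... */
	kvm->mmu_invalidate_in_progress--;	/* ...the in-progress count drops */
}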

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-17 18:21                                       ` Sean Christopherson
@ 2023-08-18  0:55                                         ` Eric Wheeler
  2023-08-18 14:33                                           ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Thu, 17 Aug 2023, Sean Christopherson wrote:
> On Wed, Aug 16, 2023, Eric Wheeler wrote:
> > On Tue, 15 Aug 2023, Sean Christopherson wrote:
> > > On Mon, Aug 14, 2023, Eric Wheeler wrote:
> > > > On Tue, 8 Aug 2023, Sean Christopherson wrote:
> > > > > > If you have any suggestions on how modifying the host kernel (and then migrating
> > > > > > a locked up guest to it) or eBPF programs that might help illuminate the issue
> > > > > > further, let me know!
> > > > > > 
> > > > > > Thanks for all your help so far!
> > > > > 
> > > > > Since it sounds like you can test with a custom kernel, try running with this
> > > > > patch and then enable the kvm_page_fault tracepoint when a vCPU gets stuck.  The
> > > > > below expands said tracepoint to capture information about mmu_notifiers and
> > > > > memslots generation.  With luck, it will reveal a smoking gun.
> > > > 
> > > > Getting this patch into production systems is challenging, perhaps live
> > > > patching is an option:
> > > 
> > > Ah, I take when you gathered information after a live migration you were migrating
> > > VMs into a sidecar environment.
> > > 
> > > > Questions:
> > > > 
> > > > 1. Do you know if this would be safe to insert as a live kernel patch?
> > > 
> > > Hmm, probably not safe.
> > > 
> > > > For example, does adding to TRACE_EVENT modify a struct (which is not
> > > > live-patch-safe) or is it something that should plug in with simple
> > > > function redirection?
> > > 
> > > Yes, the tracepoint defines a struct, e.g. in this case trace_event_raw_kvm_page_fault.
> > > 
> > > Looking back, I think I misinterpreted an earlier response regarding bpftrace and
> > > unnecessarily abandoned that tactic. *sigh*
> > > 
> > > If your environment provides btf info, then this bpftrace program should provide
> > > the mmu_notifier half of the tracepoint hack-a-patch.  If this yields nothing
> > > interesting then we can try diving into whether or not the mmu_root is stale, but
> > > let's cross that bridge when we have to.
> > > 
> > > I recommend loading this only when you have a stuck vCPU, it'll be quite noisy.
> > > 
> > > kprobe:handle_ept_violation
> > > {
> > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > }
> > > 
> > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > offsets grab from your build.
> > 
> > We have BTF, so hurray for not needing struct offsets!

Well, I was only partly right: not all hosts have BTF.

What is involved in doing this with struct offsets for Linux v6.1.x?

> > I am testing this on a host that is not (yet) known to be stuck. Please do 
> > a quick sanity check for me and make sure this looks like the kind of 
> > output that you want to see:
> > 
> > I had to shrink the printf line because it was longer than 64 bytes. I put 
> > the process ID as the first item and changed %lx to %08lx for visual 
> > alignment. Aside from that, it is the same as what you provided.
> > 
> > We're piping it through `uniq -c` to only see interesting changes (and 
> > show counts) because it is extremely noisy. If this looks good to you then 
> > please confirm and I will run it on a production system after a lock-up:
> > 
> > 	kprobe:handle_ept_violation
> > 	{
> > 		printf("ept[%u] vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
> > 		       ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > 			arg0, 
> > 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > 		       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > 	}
> > 
> > Questions:
> >   - Should pid be zero?  (Note this is not yet running on a host with a 
> >     locked-up guest, in case that is the reason.)
> 
> No.  I'm not at all familiar with PID management, I just copy+pasted from
> pid_nr(), which is what KVM uses when displaying the pid in debugfs.  I printed
> the PID purely to be able to unambiguously correlated prints to vCPUs without
> needing to cross reference kernel addresses.  I.e. having the PID makes life
> easier, but it shouldn't be strictly necessary.

ok

> >   - Can you think of any reason that this would be unsafe? (Forgive my 
> >     paranoia, but of course this will be running on a production
> >     hypervisor.)
> 
> Printing the raw address of the vCPU structure will effectively neuter KASLR, but
> KASLR isn't all that much of a barrier, and whoever has permission to load a BPF
> program on the system can do far, far more damage.

agreed
 
> >   - Can you think of any adjustments to the bpf script above before 
> >     running this for real?
> 
> You could try and make it less noisy or more precise, e.g. by tailoring it to
> print only information on the vCPU that is stuck.  If the noise isn't a problem
> though, I would keep it as-is, the more information the better.

Ok, will leave it as-is

> > Here is an example trace on a test host that isn't locked up:
> > 
> >  ~]# bpftrace handle_ept_violation.bt | grep ^ept --line-buffered | uniq -c
> >    1926 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
> >  215722 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
> >   66280 ept[0] vcpu=ffff969569468000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
> > 18609437 ept[0] vcpu=ffff9695684b8000 seq=8009c7eb inprog=0 start=7f5993200000 end=7f5993400000
> 
> Woah.  That's over 2 *billion* invalidations for a single VM.  Even if that's a
> long-lived VM, that's still seems rather insane.  E.g. if the uptime of that VM
> *on that host* is 6 months, my back of the napkin math says that that's nearly
> 100 invalidations every second for 6 months straight.
> 
> Bit 31 being set in relative isolation almost makes me wonder if mmu_invalidate_seq
> got corrupted somehow.  Either that or you are thrashing that VM with a vengeance.
 
Not sure what is happening on that host; it could be being thrashed by
another dev trying to reproduce the bug for a bisect, but we don't have a
reproducer yet...



--
Eric Wheeler



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-18  0:55                                         ` Eric Wheeler
@ 2023-08-18 14:33                                           ` Sean Christopherson
  2023-08-18 23:06                                             ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-18 14:33 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Thu, Aug 17, 2023, Eric Wheeler wrote:
> On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > kprobe:handle_ept_violation
> > > > {
> > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > }
> > > > 
> > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > > offsets grab from your build.
> > > 
> > > We have BTF, so hurray for not needing struct offsets!
> 
> Well, I was part right: not all hosts have BTF.
> 
> What is involved in doing this with struct offsets for Linux v6.1.x?

Unless you are up for a challenge, I'd drop the PID entirely; getting that will
be ugly.

For the KVM info, you need the offset of "kvm" within struct kvm_vcpu (more than
likely it's '0'), and then the offset of each of the mmu_invalidate_* fields within
struct kvm.  These need to come from the exact kernel you're running, though unless
a field is added/removed to/from struct kvm between kernel versions, the offsets
should be stable.

A cheesy/easy way to get the offsets is to feed offsetof() into __aligned and
then compile.  So long as the offset doesn't happen to be a power-of-2, the
compiler will yell.  E.g. with this

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 92c50dc159e8..04ec37f7374a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -543,7 +543,13 @@ struct kvm_hva_range {
  */
 static void kvm_null_fn(void)
 {
+       int v __aligned(offsetof(struct kvm_vcpu, kvm));
+       int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
+       int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
+       int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
+       int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));
 
+       v = w = x = y = z = 0;
 }
 #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)

I get yelled at with (trimmed):

arch/x86/kvm/../../../virt/kvm/kvm_main.c:546:34: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
arch/x86/kvm/../../../virt/kvm/kvm_main.c:547:20: error: requested alignment ‘36960’ is not a positive power of 2
arch/x86/kvm/../../../virt/kvm/kvm_main.c:549:20: error: requested alignment ‘36968’ is not a positive power of 2
arch/x86/kvm/../../../virt/kvm/kvm_main.c:551:20: error: requested alignment ‘36976’ is not a positive power of 2
arch/x86/kvm/../../../virt/kvm/kvm_main.c:553:20: error: requested alignment ‘36984’ is not a positive power of 2

Then take those offsets and do math.  For me, this provides the same output as
the above pretty version.  Just use common sense and verify you're getting sane
data.

kprobe:handle_ept_violation
{
	$kvm = *((uint64 *)((uint64)arg0 + 0));

	printf("vcpu = %lx MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
	       arg0,
               *((uint64 *)($kvm + 36960)),
               *((uint64 *)($kvm + 36968)),
               *((uint64 *)($kvm + 36976)),
               *((uint64 *)($kvm + 36984)));
}


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-18 14:33                                           ` Sean Christopherson
@ 2023-08-18 23:06                                             ` Eric Wheeler
  2023-08-21 20:27                                               ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-18 23:06 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

[-- Attachment #1: Type: text/plain, Size: 8385 bytes --]

On Fri, 18 Aug 2023, Sean Christopherson wrote:
> On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > > kprobe:handle_ept_violation
> > > > > {
> > > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > > }
> > > > > 
> > > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > > > offsets grab from your build.
> > > > 
> > > > We have BTF, so hurray for not needing struct offsets!
> > 
> > Well, I was part right: not all hosts have BTF.

First things first, we got a trace from a machine _with_ BTF!

These the only items showing in-prog=1 (first column is count from uniq -c):

      1 ept[0] vcpu=ffff9c436d26c680 seq=32524620 inprog=1 start=7f32477d7000 end=7f32477d8000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32524aee inprog=1 start=7f3252209000 end=7f325220a000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527895 inprog=1 start=7f329504d000 end=7f329504e000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325279eb inprog=1 start=7f3296f00000 end=7f3296f01000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325279f5 inprog=1 start=7f3296fae000 end=7f3296faf000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527b4d inprog=1 start=7f329937e000 end=7f329937f000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32525ef6 inprog=1 start=7f3272503000 end=7f3272504000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32526517 inprog=1 start=7f327a568000 end=7f327a569000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325268e8 inprog=1 start=7f327e4a4000 end=7f327e4a5000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527543 inprog=1 start=7f328f8ca000 end=7f328f8cb000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c861ab6 inprog=1 start=7fb4c67de000 end=7fb4c67df000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862600 inprog=1 start=7fb48c132000 end=7fb48c133000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862a8b inprog=1 start=7fb4f06b8000 end=7fb4f06b9000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862b9f inprog=1 start=7fb4f1861000 end=7fb4f1862000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862d33 inprog=1 start=7fb4e72f5000 end=7fb4e72f6000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c86415c inprog=1 start=7fb49fb5a000 end=7fb49fb5b000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c864162 inprog=1 start=7fb49fb59000 end=7fb49fb5a000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862ba1 inprog=1 start=7fb4f0e24000 end=7fb4f0e25000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bab inprog=1 start=7fb4f0e26000 end=7fb4f0e27000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bb1 inprog=1 start=7fb4f0e27000 end=7fb4f0e28000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cbd inprog=1 start=7fb4efffd000 end=7fb4efffe000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cc4 inprog=1 start=7fb4f0692000 end=7fb4f0693000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d32 inprog=1 start=7fb4dd282000 end=7fb4dd283000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d36 inprog=1 start=7fb4e8e97000 end=7fb4e8e98000
      1 ept[0] vcpu=ffff9c436d26c680 seq=3252adeb inprog=1 start=7f326209b000 end=7f326209c000

The entire dump is 22,687 lines if you want to see it, here (expires in 1 week):

	https://privatebin.net/?9a3bff6b6fd2566f#BHjrt4NGpoXL12NWiUDpThifi9E46LNXCy7eWzGXgqYx

> > 
> > What is involved in doing this with struct offsets for Linux v6.1.x?
> 
> Unless you are up for a challenge, I'd drop the PID entirely, getting that will
> be ugly.
> 
> For the KVM info, you need the offset of "kvm" within struct kvm_vcpu (more than
> likely it's '0'), and then the offset of each of the mmu_invaliate_* fields within
> struct kvm.  These need to come from the exact kernel you're running, though unless
> a field is added/removed to/from struct kvm between kernel versions, the offsets
> should be stable.
> 
> A cheesy/easy way to get the offsets is to feed offsetof() into __aligned and
> then compile.  So long as the offset doesn't happen to be a power-of-2, the
> compiler will yell.  E.g. with this
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 92c50dc159e8..04ec37f7374a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -543,7 +543,13 @@ struct kvm_hva_range {
>   */
>  static void kvm_null_fn(void)
>  {
> +       int v __aligned(offsetof(struct kvm_vcpu, kvm));
> +       int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
> +       int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
> +       int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
> +       int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));
>  
> +       v = w = x = y = z = 0;
>  }
>  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> 
> I get yelled at with (trimmed):
> 
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:546:34: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:547:20: error: requested alignment ‘36960’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:549:20: error: requested alignment ‘36968’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:551:20: error: requested alignment ‘36976’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:553:20: error: requested alignment ‘36984’ is not a positive power of 2

Neat trick.

So here are my numbers:

# make modules  KDIR=virt 2>&1 | grep -A1 alignment |grep -v ^-
arch/x86/kvm/../../../virt/kvm/kvm_main.c:568:40: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
  568 |        int v __aligned(offsetof(struct kvm_vcpu, kvm));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:569:40: error: requested alignment ‘39552’ is not a positive power of 2
  569 |        int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:570:40: error: requested alignment ‘39560’ is not a positive power of 2
  570 |        int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:571:40: error: requested alignment ‘39568’ is not a positive power of 2
  571 |        int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:572:40: error: requested alignment ‘39576’ is not a positive power of 2
  572 |        int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));

and the resulting script:
	kprobe:handle_ept_violation
	{
		$kvm = *((uint64 *)((uint64)arg0 + 0));

		printf("vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
			arg0, 
		       *((uint64 *)($kvm + 39552)),
		       *((uint64 *)($kvm + 39560)),
		       *((uint64 *)($kvm + 39568)),
		       *((uint64 *)($kvm + 39576))
		       );
	}

... but the output shows all 0's except vcpu:

	# bpftrace ./handle_ept_violation.bt |grep ^vcpu | uniq -c
	     11 vcpu=ffff9d518541c680 seq=00000000 inprog=0 start=00000000 end=00000000
	     29 vcpu=ffff9d80cc120000 seq=00000000 inprog=0 start=00000000 end=00000000
	    331 vcpu=ffff9d5f1d1a2340 seq=00000000 inprog=0 start=00000000 end=00000000
	    858 vcpu=ffff9d80c7b98000 seq=00000000 inprog=0 start=00000000 end=00000000
	   2183 vcpu=ffff9d6033fb2340 seq=00000000 inprog=0 start=00000000 end=00000000

Did I do something wrong here?

-Eric

> 
> Then take those offsets and do math.  For me, this provides the same output as
> the above pretty version.  Just use common sense and verify you're getting sane
> data.
> 
> kprobe:handle_ept_violation
> {
> 	$kvm = *((uint64 *)((uint64)arg0 + 0));
> 
> 	printf("vcpu = %lx MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 	       arg0,
>                *((uint64 *)($kvm + 36960)),
>                *((uint64 *)($kvm + 36968)),
>                *((uint64 *)($kvm + 36976)),
>                *((uint64 *)($kvm + 36984)));
> }
> 
> 






--
Eric Wheeler


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-18 23:06                                             ` Eric Wheeler
@ 2023-08-21 20:27                                               ` Eric Wheeler
  2023-08-21 23:51                                                 ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-21 20:27 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

[-- Attachment #1: Type: text/plain, Size: 13628 bytes --]

On Fri, 18 Aug 2023, Eric Wheeler wrote:
> On Fri, 18 Aug 2023, Sean Christopherson wrote:
> > On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > > > kprobe:handle_ept_violation
> > > > > > {
> > > > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > > > }
> > > > > > 
> > > > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > > > > offsets grab from your build.
> > > > > 
> > > > > We have BTF, so hurray for not needing struct offsets!

We found a new sample in 6.1.38, right after a lockup, where _all_ log
entries show inprog=1, in case that is interesting. Here is a sample;
there are 500,000+ entries, so let me know if you want the whole log.

To me, these are opaque numbers.  What do they represent?  What are you looking for in them?

      1 ept[0] vcpu=ffff9964cdc48000 seq=80854227 inprog=1 start=7fa3183a3000 end=7fa3183a4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854228 inprog=1 start=7fa3183a3000 end=7fa3183a4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854229 inprog=1 start=7fa3183a4000 end=7fa3183a5000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422a inprog=1 start=7fa3183a4000 end=7fa3183a5000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422b inprog=1 start=7fa3183a8000 end=7fa3183a9000
      2 ept[0] vcpu=ffff9964cdc48000 seq=8085422d inprog=1 start=7fa3183a9000 end=7fa3183aa000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422e inprog=1 start=7fa3183a9000 end=7fa3183aa000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854232 inprog=1 start=7fa3183ac000 end=7fa3183ad000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854233 inprog=1 start=7fa3183ad000 end=7fa3183ae000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854235 inprog=1 start=7fa3183ae000 end=7fa3183af000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854236 inprog=1 start=7fa3183ae000 end=7fa3183af000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854237 inprog=1 start=7fa3183b1000 end=7fa3183b2000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854239 inprog=1 start=7fa3183b3000 end=7fa3183b4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085423a inprog=1 start=7fa3183b3000 end=7fa3183b4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085423d inprog=1 start=7fa3183b7000 end=7fa3183b8000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085423e inprog=1 start=7fa3183b7000 end=7fa3183b8000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085423f inprog=1 start=7fa3183b8000 end=7fa3183b9000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854240 inprog=1 start=7fa3183b8000 end=7fa3183b9000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854241 inprog=1 start=7fa3183b9000 end=7fa3183ba000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854242 inprog=1 start=7fa3183b9000 end=7fa3183ba000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854243 inprog=1 start=7fa3183ba000 end=7fa3183bb000
      2 ept[0] vcpu=ffff9964cdc48000 seq=80854244 inprog=1 start=7fa3183ba000 end=7fa3183bb000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854246 inprog=1 start=7fa3183bb000 end=7fa3183bc000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854247 inprog=1 start=7fa3183bc000 end=7fa3183bd000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854248 inprog=1 start=7fa3183bc000 end=7fa3183bd000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854249 inprog=1 start=7fa3183bd000 end=7fa3183be000
      2 ept[0] vcpu=ffff9964cdc48000 seq=8085424b inprog=1 start=7fa3183be000 end=7fa3183bf000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085424c inprog=1 start=7fa3183be000 end=7fa3183bf000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085424e inprog=1 start=7fa3183bf000 end=7fa3183c0000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854250 inprog=1 start=7fa3183c0000 end=7fa3183c1000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854251 inprog=1 start=7fa3183c1000 end=7fa3183c2000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854255 inprog=1 start=7fa3183c5000 end=7fa3183c6000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854256 inprog=1 start=7fa3183c5000 end=7fa3183c6000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854257 inprog=1 start=7fa3183c6000 end=7fa3183c7000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854259 inprog=1 start=7fa3183c7000 end=7fa3183c8000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085425a inprog=1 start=7fa3183c7000 end=7fa3183c8000
      2 ept[0] vcpu=ffff9964cdc48000 seq=8085425b inprog=1 start=7fa3183c8000 end=7fa3183c9000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085425c inprog=1 start=7fa3183c8000 end=7fa3183c9000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085425e inprog=1 start=7fa3183c9000 end=7fa3183ca000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854260 inprog=1 start=7fa3183ca000 end=7fa3183cb000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854261 inprog=1 start=7fa3183cb000 end=7fa3183cc000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854262 inprog=1 start=7fa3183cb000 end=7fa3183cc000
      2 ept[0] vcpu=ffff9964cdc48000 seq=80854263 inprog=1 start=7fa3183ce000 end=7fa3183cf000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854264 inprog=1 start=7fa3183ce000 end=7fa3183cf000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854265 inprog=1 start=7fa3183cf000 end=7fa3183d0000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854266 inprog=1 start=7fa3183cf000 end=7fa3183d0000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854268 inprog=1 start=7fa3183d4000 end=7fa3183d5000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085426b inprog=1 start=7fa3183d6000 end=7fa3183d7000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085426c inprog=1 start=7fa3183d6000 end=7fa3183d7000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085426d inprog=1 start=7fa3183d7000 end=7fa3183d8000

Thanks,

-Eric


> 
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32524620 inprog=1 start=7f32477d7000 end=7f32477d8000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32524aee inprog=1 start=7f3252209000 end=7f325220a000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32527895 inprog=1 start=7f329504d000 end=7f329504e000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=325279eb inprog=1 start=7f3296f00000 end=7f3296f01000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=325279f5 inprog=1 start=7f3296fae000 end=7f3296faf000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32527b4d inprog=1 start=7f329937e000 end=7f329937f000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32525ef6 inprog=1 start=7f3272503000 end=7f3272504000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32526517 inprog=1 start=7f327a568000 end=7f327a569000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=325268e8 inprog=1 start=7f327e4a4000 end=7f327e4a5000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=32527543 inprog=1 start=7f328f8ca000 end=7f328f8cb000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c861ab6 inprog=1 start=7fb4c67de000 end=7fb4c67df000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c862600 inprog=1 start=7fb48c132000 end=7fb48c133000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c862a8b inprog=1 start=7fb4f06b8000 end=7fb4f06b9000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c862b9f inprog=1 start=7fb4f1861000 end=7fb4f1862000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c862d33 inprog=1 start=7fb4e72f5000 end=7fb4e72f6000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c86415c inprog=1 start=7fb49fb5a000 end=7fb49fb5b000
>       1 ept[0] vcpu=ffff9c43d5618000 seq=1c864162 inprog=1 start=7fb49fb59000 end=7fb49fb5a000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862ba1 inprog=1 start=7fb4f0e24000 end=7fb4f0e25000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bab inprog=1 start=7fb4f0e26000 end=7fb4f0e27000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bb1 inprog=1 start=7fb4f0e27000 end=7fb4f0e28000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cbd inprog=1 start=7fb4efffd000 end=7fb4efffe000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cc4 inprog=1 start=7fb4f0692000 end=7fb4f0693000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d32 inprog=1 start=7fb4dd282000 end=7fb4dd283000
>       1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d36 inprog=1 start=7fb4e8e97000 end=7fb4e8e98000
>       1 ept[0] vcpu=ffff9c436d26c680 seq=3252adeb inprog=1 start=7f326209b000 end=7f326209c000
> 
> The entire dump is 22,687 lines if you want to see it, here (expires in 1 week):
> 
> 	https://privatebin.net/?9a3bff6b6fd2566f#BHjrt4NGpoXL12NWiUDpThifi9E46LNXCy7eWzGXgqYx
> 
> > > 
> > > What is involved in doing this with struct offsets for Linux v6.1.x?
> > 
> > Unless you are up for a challenge, I'd drop the PID entirely, getting that will
> > be ugly.
> > 
> > For the KVM info, you need the offset of "kvm" within struct kvm_vcpu (more than
> > likely it's '0'), and then the offset of each of the mmu_invaliate_* fields within
> > struct kvm.  These need to come from the exact kernel you're running, though unless
> > a field is added/removed to/from struct kvm between kernel versions, the offsets
> > should be stable.
> > 
> > A cheesy/easy way to get the offsets is to feed offsetof() into __aligned and
> > then compile.  So long as the offset doesn't happen to be a power-of-2, the
> > compiler will yell.  E.g. with this
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 92c50dc159e8..04ec37f7374a 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -543,7 +543,13 @@ struct kvm_hva_range {
> >   */
> >  static void kvm_null_fn(void)
> >  {
> > +       int v __aligned(offsetof(struct kvm_vcpu, kvm));
> > +       int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
> > +       int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
> > +       int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
> > +       int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));
> >  
> > +       v = w = x = y = z = 0;
> >  }
> >  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> > 
> > I get yelled at with (trimmed):
> > 
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:546:34: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:547:20: error: requested alignment ‘36960’ is not a positive power of 2
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:549:20: error: requested alignment ‘36968’ is not a positive power of 2
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:551:20: error: requested alignment ‘36976’ is not a positive power of 2
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:553:20: error: requested alignment ‘36984’ is not a positive power of 2
> 
> Neat trick.
> 
> So here are my numbers:
> 
> # make modules  KDIR=virt 2>&1 | grep -A1 alignment |grep -v ^-
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:568:40: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
>   568 |        int v __aligned(offsetof(struct kvm_vcpu, kvm));
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:569:40: error: requested alignment ‘39552’ is not a positive power of 2
>   569 |        int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:570:40: error: requested alignment ‘39560’ is not a positive power of 2
>   570 |        int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:571:40: error: requested alignment ‘39568’ is not a positive power of 2
>   571 |        int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:572:40: error: requested alignment ‘39576’ is not a positive power of 2
>   572 |        int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));
> 
> and the resulting script:
> 	kprobe:handle_ept_violation
> 	{
> 		$kvm = *((uint64 *)((uint64)arg0 + 0));
> 
> 		printf("vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
> 			arg0, 
> 		       *((uint64 *)($kvm + 39552)),
> 		       *((uint64 *)($kvm + 39560)),
> 		       *((uint64 *)($kvm + 39568)),
> 		       *((uint64 *)($kvm + 39576))
> 		       );
> 	}
> 
> ... but the output shows all 0's except vcpu:
> 
> 	# bpftrace ./handle_ept_violation.bt |grep ^vcpu | uniq -c
> 	     11 vcpu=ffff9d518541c680 seq=00000000 inprog=0 start=00000000 end=00000000
> 	     29 vcpu=ffff9d80cc120000 seq=00000000 inprog=0 start=00000000 end=00000000
> 	    331 vcpu=ffff9d5f1d1a2340 seq=00000000 inprog=0 start=00000000 end=00000000
> 	    858 vcpu=ffff9d80c7b98000 seq=00000000 inprog=0 start=00000000 end=00000000
> 	   2183 vcpu=ffff9d6033fb2340 seq=00000000 inprog=0 start=00000000 end=00000000
> 
> Did I do something wrong here?
> 
> -Eric
> 
> > 
> > Then take those offsets and do math.  For me, this provides the same output as
> > the above pretty version.  Just use common sense and verify you're getting sane
> > data.
> > 
> > kprobe:handle_ept_violation
> > {
> > 	$kvm = *((uint64 *)((uint64)arg0 + 0));
> > 
> > 	printf("vcpu = %lx MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > 	       arg0,
> >                *((uint64 *)($kvm + 36960)),
> >                *((uint64 *)($kvm + 36968)),
> >                *((uint64 *)($kvm + 36976)),
> >                *((uint64 *)($kvm + 36984)));
> > }
> > 
> > 
> 
> 
> 
> 
> 
> 
> --
> Eric Wheeler
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-21 20:27                                               ` Eric Wheeler
@ 2023-08-21 23:51                                                 ` Sean Christopherson
  2023-08-22  0:11                                                   ` Sean Christopherson
  2023-08-22  1:10                                                   ` Eric Wheeler
  0 siblings, 2 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-21 23:51 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Mon, Aug 21, 2023, Eric Wheeler wrote:
> On Fri, 18 Aug 2023, Eric Wheeler wrote:
> > On Fri, 18 Aug 2023, Sean Christopherson wrote:
> > > On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > > > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > > > > kprobe:handle_ept_violation
> > > > > > > {
> > > > > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > > > > }
> > > > > > > 
> > > > > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > > > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > > > > > offsets grab from your build.
> > > > > > 
> > > > > > We have BTF, so hurray for not needing struct offsets!
> 
> We found a new sample in 6.1.38, right after a lockup, where _all_ log 
> entries show inprog=1, in case that is interesting. Here is a sample, 
> there are 500,000+ entries so let me know if you want the whole log.
> 
> To me, these are opaque numbers.  What do they represent?  What are you looking for in them?

inprog is '1' if there is an in-progress mmu_notifier invalidation at the time
of the EPT violation.  start/end are the range that is being invalidated _if_
there is an in-progress invalidation.  If a vCPU were stuck with inprog=1, then
the most likely scenario is that there's an unpaired invalidation, i.e. something
started an invalidation but never completed it.

seq is a sequence count that is incremented when an invalidation completes, e.g.
if a vCPU was stuck and seq were constantly changing, then it would mean that
the primary MMU is invalidating the same range over and over so quickly that the
vCPU can't make forward progress.
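
In code terms, the consumer side is roughly the following check, which the page
fault path runs under mmu_lock before installing a new SPTE (a paraphrased sketch
of the 6.1-era mmu_invalidate_retry_hva() helper, not a verbatim copy):

static int mmu_invalidate_retry_hva(struct kvm *kvm, unsigned long mmu_seq,
				    unsigned long hva)
{
	/*
	 * Bail and retry the fault if the hva being faulted in lies inside a
	 * range that is currently being invalidated...
	 */
	if (kvm->mmu_invalidate_in_progress &&
	    hva >= kvm->mmu_invalidate_range_start &&
	    hva < kvm->mmu_invalidate_range_end)
		return 1;

	/* ...or if an invalidation completed after the fault snapshotted seq. */
	if (kvm->mmu_invalidate_seq != mmu_seq)
		return 1;

	return 0;
}

A vCPU that retries forever is therefore either always landing inside an
in-progress range, always observing a changed seq, or bailing out earlier in the
fault path for some other reason.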

>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854227 inprog=1 start=7fa3183a3000 end=7fa3183a4000

...

> > The entire dump is 22,687 lines if you want to see it, here (expires in 1 week):
> > 
> > 	https://privatebin.net/?9a3bff6b6fd2566f#BHjrt4NGpoXL12NWiUDpThifi9E46LNXCy7eWzGXgqYx

Hrm, so it doesn't look like there's an mmu_notifier bug.  None of the traces show
inprog being '1' for more than a single EPT violation, i.e. there's no incomplete
invalidation.  And the stuck vCPU sees a stable sequence count, so it doesn't appear
to be racing with an invalidation, e.g. in theory an invalidation could start and
finish between the initial VM-Exit and when KVM tries to "fix" the fault.

4983329 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000
7585048 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000
1234546 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000
 865885 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000
4548972 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000
1091282 ept[0] vcpu=ffff9c533e1da340 seq=8002c058 inprog=0 start=7fb47c1f5000 end=7fb47c1f6000

Below is another bpftrace program that will hopefully shrink the haystack to
the point where we can find something via code inspection.

Just a heads up: if we don't find a smoking gun, I likely won't be able to help
beyond high-level guidance, as we're approaching the limit of what I can do with
bpftrace without a significant time investment (which I can't make).

---
struct kvm_page_fault {
	const gpa_t addr;
	const u32 error_code;
	const bool prefetch;

	const bool exec;
	const bool write;
	const bool present;
	const bool rsvd;
	const bool user;

	const bool is_tdp;
	const bool nx_huge_page_workaround_enabled;

	bool huge_page_disallowed;
	u8 max_level;

	u8 req_level;

	u8 goal_level;

	gfn_t gfn;

	struct kvm_memory_slot *slot;

	kvm_pfn_t pfn;
	unsigned long hva;
	bool map_writable;
};

kprobe:handle_ept_violation
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;

	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
		printf("PID = %u stuck, taken = %lu, emulated = %lu, fixed = %lu, spurious = %lu\n",
		       pid, $vcpu->stat.pf_taken, $vcpu->stat.pf_emulate, $vcpu->stat.pf_fixed, $vcpu->stat.pf_spurious);
	}
}

kprobe:kvm_faultin_pfn
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;

	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
		$fault = (struct kvm_page_fault *)arg1;
		$flags = 0;

		if ($fault->slot != 0) {
			$flags = $fault->slot->flags;
		}

		printf("PID = %u stuck, reached kvm_faultin_pfn(), slot = %lx, flags = %lx\n",
		       pid, arg1, $flags);
	}
}

kprobe:make_mmio_spte
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;

	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
		printf("PID %u stuck, reached make_mmio_spte()", pid);
	}
}

kprobe:make_spte
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;

	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
		printf("PID %u stuck, reached make_spte()", pid);
	}
}


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-21 23:51                                                 ` Sean Christopherson
@ 2023-08-22  0:11                                                   ` Sean Christopherson
  2023-08-22  1:10                                                   ` Eric Wheeler
  1 sibling, 0 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-22  0:11 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Mon, Aug 21, 2023, Sean Christopherson wrote:
> Below is another bpftrace program that will hopefully shrink the haystack to
> the point where we can find something via code inspection.

Forgot to say what it actually does: it's essentially printf debugging to see how
far a stuck vCPU gets when trying to handle an EPT violation.  The program should
be silent until a vCPU gets stuck (though I would still wait until there's a stuck
vCPU to load it).  When a vCPU's "faults taken":"faults handled" ratio gets over
5:1, i.e. the vCPU appears to be taking EPT violations without doing anything,
the program will start printing.  Unfortunately, what can be traced via kprobe is
a bit limited because much of the page fault handling path gets inlined.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-21 23:51                                                 ` Sean Christopherson
  2023-08-22  0:11                                                   ` Sean Christopherson
@ 2023-08-22  1:10                                                   ` Eric Wheeler
  2023-08-22 15:11                                                     ` Sean Christopherson
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-22  1:10 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Mon, 21 Aug 2023, Sean Christopherson wrote:
> On Mon, Aug 21, 2023, Eric Wheeler wrote:
> > On Fri, 18 Aug 2023, Eric Wheeler wrote:
> > > On Fri, 18 Aug 2023, Sean Christopherson wrote:
> > > > On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > > > > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > > > > > kprobe:handle_ept_violation
> > > > > > > > {
> > > > > > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > > > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > > > > > }
> > > > > > > > 
> > > > > > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > > > > > fields of interested, I think we'd have to resort to pointer arithmetic with struct
> > > > > > > > offsets grab from your build.
> > > > > > > 
> > > > > > > We have BTF, so hurray for not needing struct offsets!
> > 
> > We found a new sample in 6.1.38, right after a lockup, where _all_ log 
> > entries show inprog=1, in case that is interesting. Here is a sample, 
> > there are 500,000+ entries so let me know if you want the whole log.
> > 
> > To me, these are opaque numbers.  What do they represent?  What are you looking for in them?
> 
> inprog is '1' if there is an in-progress mmu_notifier invalidation at the time
> of the EPT violation.  start/end are the range that is being invalidated _if_
> there is an in-progress invalidation.  If a vCPU were stuck with inprog=1, then
> the most likely scenario is that there's an unpaired invalidation, i.e. something
> started an invalidation but never completed it.
> 
> seq is a sequence count that is incremented when an invalidation completes, e.g.
> if a vCPU was stuck and seq were constantly changing, then it would mean that
> the primary MMU is invalidating the same range over and over so quickly that the
> vCPU can't make forward progress.

Here is another one; I think you described exactly this: the vcpu is 
always the same, and the sequence increments forever:

      1 ept[0] vcpu=ffff9964cdc48000 seq=80854227 inprog=1 start=7fa3183a3000 end=7fa3183a4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854228 inprog=1 start=7fa3183a3000 end=7fa3183a4000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854229 inprog=1 start=7fa3183a4000 end=7fa3183a5000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422a inprog=1 start=7fa3183a4000 end=7fa3183a5000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422b inprog=1 start=7fa3183a8000 end=7fa3183a9000
      2 ept[0] vcpu=ffff9964cdc48000 seq=8085422d inprog=1 start=7fa3183a9000 end=7fa3183aa000
      1 ept[0] vcpu=ffff9964cdc48000 seq=8085422e inprog=1 start=7fa3183a9000 end=7fa3183aa000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854232 inprog=1 start=7fa3183ac000 end=7fa3183ad000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854233 inprog=1 start=7fa3183ad000 end=7fa3183ae000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854235 inprog=1 start=7fa3183ae000 end=7fa3183af000
      1 ept[0] vcpu=ffff9964cdc48000 seq=80854236 inprog=1 start=7fa3183ae000 end=7fa3183af000

Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
was first stuck on one vCPU for most of the time, and toward the end it 
was stuck on a different vCPU:

The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
see in the file, they are not interleaved:

	https://www.linuxglobal.com/out/handle_ept_violation.log2

  # awk '{print $3}' handle_ept_violation.log2 |uniq -c
   555596 vcpu=ffff9964cdc48000
    31784 vcpu=ffff9934ed50c680

> Below is another bpftrace program that will hopefully shrink the 
> haystack to the point where we can find something via code inspection.

Ok thanks, we'll give it a try.

> Just a heads up, if we don't find a smoking gun, I likely won't be able 
> to help beyond high level guidance as we're approaching what I can do 
> with bpftrace without a significant time investment (which I can't 
> make).

I understand; I'm grateful for your help, whatever you can offer.

-Eric

> 
> ---
> struct kvm_page_fault {
> 	const gpa_t addr;
> 	const u32 error_code;
> 	const bool prefetch;
> 
> 	const bool exec;
> 	const bool write;
> 	const bool present;
> 	const bool rsvd;
> 	const bool user;
> 
> 	const bool is_tdp;
> 	const bool nx_huge_page_workaround_enabled;
> 
> 	bool huge_page_disallowed;
> 	u8 max_level;
> 
> 	u8 req_level;
> 
> 	u8 goal_level;
> 
> 	gfn_t gfn;
> 
> 	struct kvm_memory_slot *slot;
> 
> 	kvm_pfn_t pfn;
> 	unsigned long hva;
> 	bool map_writable;
> };
> 
> kprobe:handle_ept_violation
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;
> 
> 	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
> 		printf("PID = %u stuck, taken = %lu, emulated = %lu, fixed = %lu, spurious = %lu\n",
> 		       pid, $vcpu->stat.pf_taken, $vcpu->stat.pf_emulate, $vcpu->stat.pf_fixed, $vcpu->stat.pf_spurious);
> 	}
> }
> 
> kprobe:kvm_faultin_pfn
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;
> 
> 	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
> 		$fault = (struct kvm_page_fault *)arg1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("PID = %u stuck, reached kvm_faultin_pfn(), slot = %lx, flags = %lx\n",
> 		       pid, arg1, $flags);
> 	}
> }
> 
> kprobe:make_mmio_spte
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;
> 
> 	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
> 		printf("PID %u stuck, reached make_mmio_spte()", pid);
> 	}
> }
> 
> kprobe:make_spte
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$nr_handled = $vcpu->stat.pf_emulate + $vcpu->stat.pf_fixed + $vcpu->stat.pf_spurious;
> 
> 	if (($vcpu->stat.pf_taken / 5) > $nr_handled) {
> 		printf("PID %u stuck, reached make_spte()", pid);
> 	}
> }
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-22  1:10                                                   ` Eric Wheeler
@ 2023-08-22 15:11                                                     ` Sean Christopherson
  2023-08-22 21:23                                                       ` Eric Wheeler
  2023-08-23  0:39                                                       ` Eric Wheeler
  0 siblings, 2 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-22 15:11 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Mon, Aug 21, 2023, Eric Wheeler wrote:
> On Mon, 21 Aug 2023, Sean Christopherson wrote:
> > On Mon, Aug 21, 2023, Eric Wheeler wrote:
> > > On Fri, 18 Aug 2023, Eric Wheeler wrote:
> > > > On Fri, 18 Aug 2023, Sean Christopherson wrote:
> > > > > On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > > > > > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > To me, these are opaque numbers.  What do they represent?  What are you looking for in them?
> > 
> > inprog is '1' if there is an in-progress mmu_notifier invalidation at the time
> > of the EPT violation.  start/end are the range that is being invalidated _if_
> > there is an in-progress invalidation.  If a vCPU were stuck with inprog=1, then
> > the most likely scenario is that there's an unpaired invalidation, i.e. something
> > started an invalidation but never completed it.
> > 
> > seq is a sequence count that is incremented when an invalidation completes, e.g.
> > if a vCPU was stuck and seq were constantly changing, then it would mean that
> > the primary MMU is invalidating the same range over and over so quickly that the
> > vCPU can't make forward progress.
> 
> Here is another one; I think you described exactly this: the vcpu is 
> always the same, and the sequence increments forever:
> 
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854227 inprog=1 start=7fa3183a3000 end=7fa3183a4000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854228 inprog=1 start=7fa3183a3000 end=7fa3183a4000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854229 inprog=1 start=7fa3183a4000 end=7fa3183a5000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422a inprog=1 start=7fa3183a4000 end=7fa3183a5000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422b inprog=1 start=7fa3183a8000 end=7fa3183a9000
>       2 ept[0] vcpu=ffff9964cdc48000 seq=8085422d inprog=1 start=7fa3183a9000 end=7fa3183aa000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422e inprog=1 start=7fa3183a9000 end=7fa3183aa000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854232 inprog=1 start=7fa3183ac000 end=7fa3183ad000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854233 inprog=1 start=7fa3183ad000 end=7fa3183ae000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854235 inprog=1 start=7fa3183ae000 end=7fa3183af000
>       1 ept[0] vcpu=ffff9964cdc48000 seq=80854236 inprog=1 start=7fa3183ae000 end=7fa3183af000
> 
> Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> was first stuck on one vCPU for most of the time, and toward the end it 
> was stuck on a different vCPU:
> 
> The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> see in the file, they are not interleaved:
> 
> 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> 
>   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
>    555596 vcpu=ffff9964cdc48000
>     31784 vcpu=ffff9934ed50c680

Hrm, but the address range being invalidated is changing.  Without seeing the
guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
that it looks stuck.

Drat.
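
If you want to see what's driving the invalidations while a vCPU looks stuck,
e.g. NUMA balancing vs. KSM vs. something else, one small side sketch, assuming
kvm_mmu_notifier_invalidate_range_start is kprobe-able on your build, is to
aggregate the kernel stacks behind them:

kprobe:kvm_mmu_notifier_invalidate_range_start
{
        // record who is invalidating guest memory, and how often
        @invalidators[kstack] = count();
}

The stacks should make it fairly obvious whether migration or KSM is the source.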

> > Below is another bpftrace program that will hopefully shrink the 
> > haystack to the point where we can find something via code inspection.
> 
> Ok thanks, we'll give it a try.

Try this version instead.  It's more comprehensive and more precise, e.g. should
only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.

Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.

struct kvm_page_fault {
	const u64 addr;
	const u32 error_code;
	const bool prefetch;

	const bool exec;
	const bool write;
	const bool present;
	const bool rsvd;
	const bool user;

	const bool is_tdp;
	const bool nx_huge_page_workaround_enabled;

	bool huge_page_disallowed;
	u8 max_level;

	u8 req_level;

	u8 goal_level;

	u64 gfn;

	struct kvm_memory_slot *slot;

	u64 pfn;
	unsigned long hva;
	bool map_writable;
};

kprobe:kvm_faultin_pfn
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$kvm = $vcpu->kvm;
	$rip = $vcpu->arch.regs[16];

	if (@last_rip[tid] == $rip) {
		@same[tid]++
	} else {
		@same[tid] = 0;
	}
	@last_rip[tid] = $rip;

	if (@same[tid] > 1000) {
		$fault = (struct kvm_page_fault *)arg1;
		$hva = -1;
		$flags = 0;

		if ($fault->slot != 0) {
			$hva = $fault->slot->userspace_addr +
			       (($fault->gfn - $fault->slot->base_gfn) << 12);
			$flags = $fault->slot->flags;
		}

		printf("%s tid[%u] pid[%u] stuck @ rip %lx (%lu hits), gpa = %lx, hva = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva,
		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
	}
}

kprobe:make_mmio_spte
{
        if (@same[tid] > 1000) {
	        $vcpu = (struct kvm_vcpu *)arg0;
		$rip = $vcpu->arch.regs[16];

		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_mmio_spte()\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
	}
}

kprobe:make_spte
{
        if (@same[tid] > 1000) {
	        $vcpu = (struct kvm_vcpu *)arg0;
		$rip = $vcpu->arch.regs[16];

		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_spte()\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
	}
}
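
Optional: for periodic visibility into the counters without waiting for the
1000-hit threshold, you could bolt on an interval probe, e.g.:

interval:s:30
{
        // dump the per-thread "same RIP" counters every 30 seconds
        print(@same);
}

Not required, it just makes it easier to watch the counters climb in real time.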

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-22 15:11                                                     ` Sean Christopherson
@ 2023-08-22 21:23                                                       ` Eric Wheeler
  2023-08-22 21:32                                                         ` Sean Christopherson
  2023-08-23  0:39                                                       ` Eric Wheeler
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-22 21:23 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Tue, 22 Aug 2023, Sean Christopherson wrote:
> On Mon, Aug 21, 2023, Eric Wheeler wrote:
> > On Mon, 21 Aug 2023, Sean Christopherson wrote:
> > > On Mon, Aug 21, 2023, Eric Wheeler wrote:
> > > > On Fri, 18 Aug 2023, Eric Wheeler wrote:
> > > > > On Fri, 18 Aug 2023, Sean Christopherson wrote:
> > > > > > On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > > > > > > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > To me, these are opaque numbers.  What do they represent?  What are you looking for in them?
> > > 
> > > inprog is '1' if there is an in-progress mmu_notifier invalidation at the time
> > > of the EPT violation.  start/end are the range that is being invalidated _if_
> > > there is an in-progress invalidation.  If a vCPU were stuck with inprog=1, then
> > > the most likely scenario is that there's an unpaired invalidation, i.e. something
> > > started an invalidation but never completed it.
> > > 
> > > seq is a sequence count that is incremented when an invalidation completes, e.g.
> > > if a vCPU was stuck and seq were constantly changing, then it would mean that
> > > the primary MMU is invalidating the same range over and over so quickly that the
> > > vCPU can't make forward progress.
> > 
> > Here is another one; I think you described exactly this: the vcpu is 
> > always the same, and the sequence increments forever:
> > 
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854227 inprog=1 start=7fa3183a3000 end=7fa3183a4000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854228 inprog=1 start=7fa3183a3000 end=7fa3183a4000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854229 inprog=1 start=7fa3183a4000 end=7fa3183a5000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422a inprog=1 start=7fa3183a4000 end=7fa3183a5000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422b inprog=1 start=7fa3183a8000 end=7fa3183a9000
> >       2 ept[0] vcpu=ffff9964cdc48000 seq=8085422d inprog=1 start=7fa3183a9000 end=7fa3183aa000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=8085422e inprog=1 start=7fa3183a9000 end=7fa3183aa000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854232 inprog=1 start=7fa3183ac000 end=7fa3183ad000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854233 inprog=1 start=7fa3183ad000 end=7fa3183ae000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854235 inprog=1 start=7fa3183ae000 end=7fa3183af000
> >       1 ept[0] vcpu=ffff9964cdc48000 seq=80854236 inprog=1 start=7fa3183ae000 end=7fa3183af000
> > 
> > Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> > was first stuck on one vCPU for most of the time, and toward the end it 
> > was stuck on a different vCPU:
> > 
> > The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> > then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> > see in the file, they are not interleaved:
> > 
> > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > 
> >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> >    555596 vcpu=ffff9964cdc48000
> >     31784 vcpu=ffff9934ed50c680
> 
> Hrm, but the address range being invalidated is changing.  Without seeing the
> guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> that it looks stuck.
> 
> Drat.
> 
> > > Below is another bpftrace program that will hopefully shrink the 
> > > haystack to the point where we can find something via code inspection.
> > 
> > Ok thanks, we'll give it a try.
> 
> Try this version instead.  It's more comprehensive and more precise, e.g. should
> only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> 
> Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.

Thanks, we'll try it out.

To confirm, when you say "Enable trace_kvm_exit", is it this:

	echo 1 > /sys/kernel/tracing/events/kvm/kvm_exit/enable

or this (which might be the same):

	echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_exit/enable

or something else?

--
Eric Wheeler



> 
> struct kvm_page_fault {
> 	const u64 addr;
> 	const u32 error_code;
> 	const bool prefetch;
> 
> 	const bool exec;
> 	const bool write;
> 	const bool present;
> 	const bool rsvd;
> 	const bool user;
> 
> 	const bool is_tdp;
> 	const bool nx_huge_page_workaround_enabled;
> 
> 	bool huge_page_disallowed;
> 	u8 max_level;
> 
> 	u8 req_level;
> 
> 	u8 goal_level;
> 
> 	u64 gfn;
> 
> 	struct kvm_memory_slot *slot;
> 
> 	u64 pfn;
> 	unsigned long hva;
> 	bool map_writable;
> };
> 
> kprobe:kvm_faultin_pfn
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$kvm = $vcpu->kvm;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@same[tid] > 1000) {
> 		$fault = (struct kvm_page_fault *)arg1;
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx (%lu hits), gpa = %lx, hva = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	}
> }
> 
> kprobe:make_mmio_spte
> {
>         if (@same[tid] > 1000) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$rip = $vcpu->arch.regs[16];
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_mmio_spte()\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
> 	}
> }
> 
> kprobe:make_spte
> {
>         if (@same[tid] > 1000) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$rip = $vcpu->arch.regs[16];
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_spte()\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
> 	}
> }
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-22 21:23                                                       ` Eric Wheeler
@ 2023-08-22 21:32                                                         ` Sean Christopherson
  0 siblings, 0 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-22 21:32 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Tue, Aug 22, 2023, Eric Wheeler wrote:
> On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > On Mon, Aug 21, 2023, Eric Wheeler wrote:
> > Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> > from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.
> 
> Thanks, we'll try it out.
> 
> To confirm, when you say "Enable trace_kvm_exit", is it this:
> 
> 	echo 1 > /sys/kernel/tracing/events/kvm/kvm_exit/enable
> 
> or this (which might be the same):
> 
> 	echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_exit/enable
> 
> or something else?

Yep, you got it.  They're the same thing; the two paths are just different (common)
places to mount the kernel's debugfs.
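
FWIW, another way to get the same effect is to have bpftrace itself attach to the
tracepoint; that should arm the event just like the echo does, and it also lets
you cross-check the RIP that the other probes report.  A sketch, assuming the
kvm_exit tracepoint exposes guest_rip on your kernel:

tracepoint:kvm:kvm_exit
{
        // cache the RIP reported at VM-Exit, keyed by vCPU thread
        @exit_rip[tid] = args->guest_rip;
}

Either approach is fine; the echo is simpler.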

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-22 15:11                                                     ` Sean Christopherson
  2023-08-22 21:23                                                       ` Eric Wheeler
@ 2023-08-23  0:39                                                       ` Eric Wheeler
  2023-08-23 17:54                                                         ` Sean Christopherson
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-23  0:39 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> > was first stuck on one vCPU for most of the time, and toward the end it 
> > was stuck on a different vCPU:
> > 
> > The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> > then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> > see in the file, they are not interleaved:
> > 
> > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > 
> >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> >    555596 vcpu=ffff9964cdc48000
> >     31784 vcpu=ffff9934ed50c680
> 
> Hrm, but the address range being invalidated is changing.  Without seeing the
> guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> that it looks stuck.
> 
> Drat.
> 
> > > Below is another bpftrace program that will hopefully shrink the 
> > > haystack to the point where we can find something via code inspection.
> > 
> > Ok thanks, we'll give it a try.
> 
> Try this version instead.  It's more comprehensive and more precise, e.g. should
> only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> 
> Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.

Ok, we got a 740MB log (it zips down to 25MB); if you would like to see the
whole thing, it is here:
	http://linuxglobal.com/out/handle_ept_violation-v2.log.gz

For brevity, here is a sample of each 100,000th line:

# zcat handle_ept_violation-v2.log.gz | perl -lne '!($n++%100000) && print'
Attaching 3 probes...
00:30:31:347560 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (375972 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:32:047769 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (706308 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:32:746729 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (1039825 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:33:447298 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (1375881 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:34:160967 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (1715243 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:34:882187 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (2060501 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:35:597351 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (2402485 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:36:323613 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (2749250 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:37:039834 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (3090704 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:37:775801 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (3444375 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:38:564075 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (3824996 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:39:320611 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (4186268 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:40:006544 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (4514744 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:40:708219 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (4850395 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:41:424570 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (5195103 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:42:147032 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (5543824 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:42:878845 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (5887371 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:43:590424 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (6234881 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:44:308041 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (6581426 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:45:039868 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (6925844 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:45:773678 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (7278195 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:46:507469 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (7634480 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:47:252621 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (7997426 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:47:935297 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (8323684 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:48:626000 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (8654836 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:49:344852 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (8991281 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:50:239769 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (9414664 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:50:940747 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (9755243 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:51:666992 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (10101607 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:52:383241 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (10433521 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:53:105012 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (10785355 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:53:867494 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (11149981 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:54:632045 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (11515349 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:55:403662 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (11882055 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:56:182540 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (12251134 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:56:915633 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (12596230 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:57:658496 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (12939661 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:58:395829 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (13290046 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
00:30:59:186697 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (13670748 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000

...

@same[553909]: 14142928


--
Eric Wheeler



> 
> struct kvm_page_fault {
> 	const u64 addr;
> 	const u32 error_code;
> 	const bool prefetch;
> 
> 	const bool exec;
> 	const bool write;
> 	const bool present;
> 	const bool rsvd;
> 	const bool user;
> 
> 	const bool is_tdp;
> 	const bool nx_huge_page_workaround_enabled;
> 
> 	bool huge_page_disallowed;
> 	u8 max_level;
> 
> 	u8 req_level;
> 
> 	u8 goal_level;
> 
> 	u64 gfn;
> 
> 	struct kvm_memory_slot *slot;
> 
> 	u64 pfn;
> 	unsigned long hva;
> 	bool map_writable;
> };
> 
> kprobe:kvm_faultin_pfn
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$kvm = $vcpu->kvm;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@same[tid] > 1000) {
> 		$fault = (struct kvm_page_fault *)arg1;
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx (%lu hits), gpa = %lx, hva = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	}
> }
> 
> kprobe:make_mmio_spte
> {
>         if (@same[tid] > 1000) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$rip = $vcpu->arch.regs[16];
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_mmio_spte()\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
> 	}
> }
> 
> kprobe:make_spte
> {
>         if (@same[tid] > 1000) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$rip = $vcpu->arch.regs[16];
> 
> 		printf("%s tid[%u] pid[%u] stuck @ rip %lx made it to make_spte()\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip);
> 	}
> }
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23  0:39                                                       ` Eric Wheeler
@ 2023-08-23 17:54                                                         ` Sean Christopherson
  2023-08-23 19:44                                                           ` Eric Wheeler
  2023-08-23 22:12                                                           ` Eric Wheeler
  0 siblings, 2 replies; 48+ messages in thread
From: Sean Christopherson @ 2023-08-23 17:54 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Tue, Aug 22, 2023, Eric Wheeler wrote:
> On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > > Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> > > was first stuck on one vCPU for most of the time, and toward the end it 
> > > was stuck on a different vCPU:
> > > 
> > > The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> > > then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> > > see in the file, they are not interleaved:
> > > 
> > > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > > 
> > >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> > >    555596 vcpu=ffff9964cdc48000
> > >     31784 vcpu=ffff9934ed50c680
> > 
> > Hrm, but the address range being invalidated is changing.  Without seeing the
> > guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> > truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> > that it looks stuck.
> > 
> > Drat.
> > 
> > > > Below is another bpftrace program that will hopefully shrink the 
> > > > haystack to the point where we can find something via code inspection.
> > > 
> > > Ok thanks, we'll give it a try.
> > 
> > Try this version instead.  It's more comprehensive and more precise, e.g. should
> > only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> > 
> > Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> > from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.
> 
> Ok, we got a 740MB log (it zips down to 25MB); if you would like to see the
> whole thing, it is here:
> 	http://linuxglobal.com/out/handle_ept_violation-v2.log.gz
> 
> For brevity, here is a sample of each 100,000th line:
> 
> # zcat handle_ept_violation-v2.log.gz | perl -lne '!($n++%100000) && print'
> Attaching 3 probes...
> 00:30:31:347560 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (375972 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000

Argh.  I'm having a bit of a temper tantrum because I forgot to have the printf
spit out the memslot flags.  And I apparently gave you a version without a probe
on kvm_tdp_mmu_map().  Grr.

Can you capture one more trace?  Fingers crossed this is the last one.  Modified
program below.

On the plus side, I'm slowly learning how to effectively use bpf programs.
This version also prints return values and other relevant side effects from
kvm_faultin_pfn(), and I figured out a way to get at the vCPU's root MMU page.

And it stops printing after a vCPU (task) has been stuck for 100k exits, i.e. it
should self-limit the spam.  Even if the vCPU managed to unstick itself after
that point, which is *extremely* unlikely, being stuck for 100k exits all but
guarantees there's a bug somewhere.

So this *should* give us a smoking gun.  Depending on what the gun points at, a
full root cause may still be a long ways off, but I'm pretty sure this mess will
tell us exactly why KVM is refusing to fix the fault.

--

struct kvm_page_fault {
	const u64 addr;
	const u32 error_code;

	const bool prefetch;
	const bool exec;
	const bool write;
	const bool present;

	const bool rsvd;
	const bool user;
	const bool is_tdp;
	const bool nx_huge_page_workaround_enabled;

	bool huge_page_disallowed;

	u8 max_level;
	u8 req_level;
	u8 goal_level;

	u64 gfn;

	struct kvm_memory_slot *slot;

	u64 pfn;
	unsigned long hva;
	bool map_writable;
};

struct kvm_mmu_page {
	struct list_head link;
	struct hlist_node hash_link;

	bool tdp_mmu_page;
	bool unsync;
	u8 mmu_valid_gen;
	bool lpage_disallowed;

	u32 role;
	u64 gfn;

	u64 *spt;

	u64 *shadowed_translation;

	int root_count;
};

kprobe:kvm_faultin_pfn
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$kvm = $vcpu->kvm;
	$rip = $vcpu->arch.regs[16];

	if (@last_rip[tid] == $rip) {
		@same[tid]++
	} else {
		@same[tid] = 0;
	}
	@last_rip[tid] = $rip;

	if (@same[tid] > 1000 && @same[tid] < 100000) {
		$fault = (struct kvm_page_fault *)arg1;
		$hva = -1;
		$flags = 0;

		@__vcpu[tid] = arg0;
		@__fault[tid] = arg1;

		if ($fault->slot != 0) {
			$hva = $fault->slot->userspace_addr +
			       (($fault->gfn - $fault->slot->base_gfn) << 12);
			$flags = $fault->slot->flags;
		}

		printf("%s tid[%u] pid[%u] FAULTIN @ rip %lx (%lu hits), gpa = %lx, hva = %lx, flags = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva, $flags,
		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
	} else {
		@__vcpu[tid] = 0;
		@__fault[tid] = 0;
	}
}

kretprobe:kvm_faultin_pfn
{
	if (@__fault[tid] != 0) {
		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
		$kvm = $vcpu->kvm;
		$fault = (struct kvm_page_fault *)@__fault[tid];
		$hva = -1;
		$flags = 0;

		if ($fault->slot != 0) {
			$hva = $fault->slot->userspace_addr +
			       (($fault->gfn - $fault->slot->base_gfn) << 12);
			$flags = $fault->slot->flags;
		}

		printf("%s tid[%u] pid[%u] FAULTIN_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx (%lx), flags = %lx, pfn = %lx, ret = %lu : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, $hva, $fault->hva, $flags, $fault->pfn, retval,
		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
		printf("%s tid[%u] pid[%u] FAULTIN_ERROR @ rip %lx (%lu hits), ret = %lu\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
	}
}

kprobe:kvm_tdp_mmu_map
{
	$vcpu = (struct kvm_vcpu *)arg0;
	$rip = $vcpu->arch.regs[16];

	if (@last_rip[tid] == $rip) {
		@same[tid]++
	} else {
		@same[tid] = 0;
	}
	@last_rip[tid] = $rip;

	if (@__fault[tid] != 0) {
	        $vcpu = (struct kvm_vcpu *)arg0;
		$fault = (struct kvm_page_fault *)arg1;

                if (@__vcpu[tid] != arg0 || @__fault[tid] != arg1) {
                        printf("%s tid[%u] pid[%u] MAP_ERROR vcpu %lx vs. %lx, fault %lx vs. %lx\n",
                               strftime("%H:%M:%S:%f", nsecs), tid, pid, @__vcpu[tid], arg0, @__fault[tid], arg1);
                }

		printf("%s tid[%u] pid[%u] MAP @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, $fault->hva, $fault->pfn);
	} else {
		@__vcpu[tid] = 0;
		@__fault[tid] = 0;
	}
}

kretprobe:kvm_tdp_mmu_map
{
	if (@__fault[tid] != 0) {
		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
		$fault = (struct kvm_page_fault *)@__fault[tid];
		$hva = -1;
		$flags = 0;

		if ($fault->slot != 0) {
			$hva = $fault->slot->userspace_addr +
			       (($fault->gfn - $fault->slot->base_gfn) << 12);
			$flags = $fault->slot->flags;
		}

		printf("%s tid[%u] pid[%u] MAP_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx, ret = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, $fault->hva, $fault->pfn, retval);
	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
		printf("%s tid[%u] pid[%u] MAP_RET_ERROR @ rip %lx (%lu hits), ret = %lu\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
	}
}

kprobe:tdp_iter_start
{
	if (@__fault[tid] != 0) {
                $vcpu = (struct kvm_vcpu *)@__vcpu[tid];
		$fault = (struct kvm_page_fault *)@__fault[tid];
	        $root = (struct kvm_mmu_page *)arg1;

		printf("%s tid[%u] pid[%u] ITER @ rip %lx (%lu hits), gpa = %lx (%lx), hva = %lx, pfn = %lx, tdp_mmu = %u, role = %x, count = %d\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, arg3 << 12, $fault->hva, $fault->pfn,
                       $root->tdp_mmu_page, $root->role, $root->root_count);
	} else {
		@__vcpu[tid] = 0;
		@__fault[tid] = 0;
	}
}

kprobe:make_mmio_spte
{
        if (@__fault[tid] != 0) {
		$fault = (struct kvm_page_fault *)@__fault[tid];

		printf("%s tid[%u] pid[%u] MMIO @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, $fault->hva, $fault->pfn);
	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
		printf("%s tid[%u] pid[%u] MMIO_ERROR @ rip %lx (%lu hits)\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
	}
}

kprobe:make_spte
{
        if (@__fault[tid] != 0) {
		$fault = (struct kvm_page_fault *)@__fault[tid];

		printf("%s tid[%u] pid[%u] SPTE @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
		       $fault->addr, $fault->hva, $fault->pfn);
	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
		printf("%s tid[%u] pid[%u] SPTE_ERROR @ rip %lx (%lu hits)\n",
		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
	}
}
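
One purely cosmetic addition: bpftrace dumps every non-empty map when it exits,
which is where the trailing @same[...] / @last_rip[...] lines in your last log
came from.  If that noise is unwanted, an END block clears the maps on exit:

END
{
        clear(@last_rip);
        clear(@same);
        clear(@__vcpu);
        clear(@__fault);
}

Totally optional; the dump is harmless.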


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23 17:54                                                         ` Sean Christopherson
@ 2023-08-23 19:44                                                           ` Eric Wheeler
  2023-08-23 22:12                                                           ` Eric Wheeler
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Wheeler @ 2023-08-23 19:44 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Wed, 23 Aug 2023, Sean Christopherson wrote:
> On Tue, Aug 22, 2023, Eric Wheeler wrote:
> > On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > > > Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> > > > was first stuck on one vCPU for most of the time, and toward the end it 
> > > > was stuck on a different vCPU:
> > > > 
> > > > The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> > > > then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> > > > see in the file, they are not interleaved:
> > > > 
> > > > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > > > 
> > > >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> > > >    555596 vcpu=ffff9964cdc48000
> > > >     31784 vcpu=ffff9934ed50c680
> > > 
> > > Hrm, but the address range being invalidated is changing.  Without seeing the
> > > guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> > > truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> > > that it looks stuck.
> > > 
> > > Drat.
> > > 
> > > > > Below is another bpftrace program that will hopefully shrink the 
> > > > > haystack to the point where we can find something via code inspection.
> > > > 
> > > > Ok thanks, we'll give it a try.
> > > 
> > > Try this version instead.  It's more comprehensive and more precise, e.g. should
> > > only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> > > 
> > > Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> > > from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.
> > 
> > Ok, we got a 740MB log (it zips down to 25MB); if you would like to see the
> > whole thing, it is here:
> > 	http://linuxglobal.com/out/handle_ept_violation-v2.log.gz
> > 
> > For brevity, here is a sample of each 100,000th line:
> > 
> > # zcat handle_ept_violation-v2.log.gz | perl -lne '!($n++%100000) && print'
> > Attaching 3 probes...
> > 00:30:31:347560 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (375972 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
> 
> Argh.  I'm having a bit of a temper tantrum because I forgot to have the printf
> spit out the memslot flags.  And I apparently gave you a version without a probe
> on kvm_tdp_mmu_map().  Grr.
> 
> Can you capture one more trace?  Fingers crossed this is the last one.  Modified
> program below.

Sure!

> On the plus side, I'm slowly learning how to effectively use bpf programs.
> This version also prints return values and other relevant side effects from
> kvm_faultin_pfn(), and I figured out a way to get at the vCPU's root MMU page.
> 
> And it stops printing after a vCPU (task) has been stuck for 100k exits, i.e. it
> should self-limit the spam.  Even if the vCPU managed to unstick itself after
> that point, which is *extremely* unlikely, being stuck for 100k exits all but
> guarantees there's a bug somewhere.
> 
> So this *should* give us a smoking gun.  Depending on what the gun points at, a
> full root cause may still be a long ways off, but I'm pretty sure this mess will
> tell us exactly why KVM is refusing to fix the fault.

Installed; now we wait for it to trigger again.

--
Eric Wheeler


> 
> --
> 
> struct kvm_page_fault {
> 	const u64 addr;
> 	const u32 error_code;
> 
> 	const bool prefetch;
> 	const bool exec;
> 	const bool write;
> 	const bool present;
> 
> 	const bool rsvd;
> 	const bool user;
> 	const bool is_tdp;
> 	const bool nx_huge_page_workaround_enabled;
> 
> 	bool huge_page_disallowed;
> 
> 	u8 max_level;
> 	u8 req_level;
> 	u8 goal_level;
> 
> 	u64 gfn;
> 
> 	struct kvm_memory_slot *slot;
> 
> 	u64 pfn;
> 	unsigned long hva;
> 	bool map_writable;
> };
> 
> struct kvm_mmu_page {
> 	struct list_head link;
> 	struct hlist_node hash_link;
> 
> 	bool tdp_mmu_page;
> 	bool unsync;
> 	u8 mmu_valid_gen;
> 	bool lpage_disallowed;
> 
> 	u32 role;
> 	u64 gfn;
> 
> 	u64 *spt;
> 
> 	u64 *shadowed_translation;
> 
> 	int root_count;
> };
> 
> kprobe:kvm_faultin_pfn
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$kvm = $vcpu->kvm;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		$fault = (struct kvm_page_fault *)arg1;
> 		$hva = -1;
> 		$flags = 0;
> 
> 		@__vcpu[tid] = arg0;
> 		@__fault[tid] = arg1;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] FAULTIN @ rip %lx (%lu hits), gpa = %lx, hva = %lx, flags = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva, $flags,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kretprobe:kvm_faultin_pfn
> {
> 	if (@__fault[tid] != 0) {
> 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$kvm = $vcpu->kvm;
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] FAULTIN_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx (%lx), flags = %lx, pfn = %lx, ret = %lu : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $hva, $fault->hva, $flags, $fault->pfn, retval,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] FAULTIN_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> 	}
> }
> 
> kprobe:kvm_tdp_mmu_map
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@__fault[tid] != 0) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$fault = (struct kvm_page_fault *)arg1;
> 
>                 if (@__vcpu[tid] != arg0 || @__fault[tid] != arg1) {
>                         printf("%s tid[%u] pid[%u] MAP_ERROR vcpu %lx vs. %lx, fault %lx vs. %lx\n",
>                                strftime("%H:%M:%S:%f", nsecs), tid, pid, @__vcpu[tid], arg0, @__fault[tid], arg1);
>                 }
> 
> 		printf("%s tid[%u] pid[%u] MAP @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kretprobe:kvm_tdp_mmu_map
> {
> 	if (@__fault[tid] != 0) {
> 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] MAP_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx, ret = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn, retval);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] MAP_RET_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> 	}
> }
> 
> kprobe:tdp_iter_start
> {
> 	if (@__fault[tid] != 0) {
>                 $vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 	        $root = (struct kvm_mmu_page *)arg1;
> 
> 		printf("%s tid[%u] pid[%u] ITER @ rip %lx (%lu hits), gpa = %lx (%lx), hva = %lx, pfn = %lx, tdp_mmu = %u, role = %x, count = %d\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, arg3 << 12, $fault->hva, $fault->pfn,
>                        $root->tdp_mmu_page, $root->role, $root->root_count);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kprobe:make_mmio_spte
> {
>         if (@__fault[tid] != 0) {
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 
> 		printf("%s tid[%u] pid[%u] MMIO @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] MMIO_ERROR @ rip %lx (%lu hits)\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> 	}
> }
> 
> kprobe:make_spte
> {
>         if (@__fault[tid] != 0) {
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 
> 		printf("%s tid[%u] pid[%u] SPTE @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] SPTE_ERROR @ rip %lx (%lu hits)\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> 	}
> }
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23 17:54                                                         ` Sean Christopherson
  2023-08-23 19:44                                                           ` Eric Wheeler
@ 2023-08-23 22:12                                                           ` Eric Wheeler
  2023-08-23 22:32                                                             ` Eric Wheeler
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-23 22:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Wed, 23 Aug 2023, Sean Christopherson wrote:
> On Tue, Aug 22, 2023, Eric Wheeler wrote:
> > On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > > > Here is the whole log with 500,000+ lines over 5 minutes of recording; it 
> > > > was first stuck on one vCPU for most of the time, and toward the end it 
> > > > was stuck on a different vCPU:
> > > > 
> > > > The file starts with 555,596 occurrences of vcpu=ffff9964cdc48000 and is 
> > > > then followed by 31,784 occurrences of vcpu=ffff9934ed50c680.  As you can 
> > > > see in the file, they are not interleaved:
> > > > 
> > > > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > > > 
> > > >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> > > >    555596 vcpu=ffff9964cdc48000
> > > >     31784 vcpu=ffff9934ed50c680
> > > 
> > > Hrm, but the address range being invalidated is changing.  Without seeing the
> > > guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> > > truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> > > that it looks stuck.
> > > 
> > > Drat.
> > > 
> > > > > Below is another bpftrace program that will hopefully shrink the 
> > > > > haystack to the point where we can find something via code inspection.
> > > > 
> > > > Ok thanks, we'll give it a try.
> > > 
> > > Try this version instead.  It's more comprehensive and more precise, e.g. should
> > > only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> > > 
> > > Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> > > from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.
> > 
> > Ok, we got a 740MB log (it zips down to 25MB); if you would like to see the
> > whole thing, it is here:
> > 	http://linuxglobal.com/out/handle_ept_violation-v2.log.gz
> > 
> > For brevity, here is a sample of each 100,000th line:
> > 
> > # zcat handle_ept_violation-v2.log.gz | perl -lne '!($n++%100000) && print'
> > Attaching 3 probes...
> > 00:30:31:347560 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (375972 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
> 
> Argh.  I'm having a bit of a temper tantrum because I forgot to have the printf
> spit out the memslot flags.  And I apparently gave you a version without a probe
> on kvm_tdp_mmu_map().  Grr.
> 
> Can you capture one more trace?  Fingers crossed this is the last one.  

Here it is: http://linuxglobal.com/out/handle_ept_violation-v3.log.gz

Here are the highlights:

# zcat handle_ept_violation-v3.log.gz | grep 3484173 | tail -n30
21:25:50:282711 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92234 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282714 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92235 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282720 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92237 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282723 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92238 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282726 tid[3484173] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (92239 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282743 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92245 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282749 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92247 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282752 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92248 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282755 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92249 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282769 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92254 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282775 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92256 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282778 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92257 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282784 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92259 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282804 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92266 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282813 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92269 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282816 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92270 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282822 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92272 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282825 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92273 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282831 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92275 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282837 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92277 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282845 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92280 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282854 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92283 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282857 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92284 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282863 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92286 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282866 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92287 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282869 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92288 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
@__fault[3484173]: 0
@__vcpu[3484173]: 0
@last_rip[3484173]: 18446744071583984805
@same[3484173]: 11615602


# zcat handle_ept_violation-v3.log.gz | grep 3484174 | tail -n30
21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282418 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90087 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282475 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90100 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282507 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90107 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282524 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90111 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282528 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90112 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282557 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90118 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282660 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90141 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282670 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90143 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282684 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90146 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282693 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90148 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282731 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90156 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282735 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90157 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282739 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90158 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282759 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90162 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282764 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90163 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282786 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90168 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282791 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90169 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282795 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90170 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282800 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90171 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282809 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90173 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282832 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90178 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282841 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90180 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282851 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90182 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
21:25:50:282874 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90187 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
@__fault[3484174]: 0
@__vcpu[3484174]: 0
@last_rip[3484174]: 18446744071583984805
@same[3484174]: 11598962


--
Eric Wheeler




> 
> On the plus side, I'm slowly learning how to effectively use bpf programs.
> This version also prints return values and other relevant side effects from
> kvm_faultin_pfn(), and I figured out a way to get at the vCPU's root MMU page.
> 
> And it stops printing after a vCPU (task) has been stuck for 100k exits, i.e. it
> should self-limit the spam.  Even if the vCPU managed to unstick itself after
> that point, which is *extremely* unlikely, being stuck for 100k exits all but
> guarantees there's a bug somewhere.
> 
> So this *should* give us a smoking gun.  Depending on what the gun points at, a
> full root cause may still be a long ways off, but I'm pretty sure this mess will
> tell us exactly why KVM is refusing to fix the fault.
> 
> --
> 
> struct kvm_page_fault {
> 	const u64 addr;
> 	const u32 error_code;
> 
> 	const bool prefetch;
> 	const bool exec;
> 	const bool write;
> 	const bool present;
> 
> 	const bool rsvd;
> 	const bool user;
> 	const bool is_tdp;
> 	const bool nx_huge_page_workaround_enabled;
> 
> 	bool huge_page_disallowed;
> 
> 	u8 max_level;
> 	u8 req_level;
> 	u8 goal_level;
> 
> 	u64 gfn;
> 
> 	struct kvm_memory_slot *slot;
> 
> 	u64 pfn;
> 	unsigned long hva;
> 	bool map_writable;
> };
> 
> struct kvm_mmu_page {
> 	struct list_head link;
> 	struct hlist_node hash_link;
> 
> 	bool tdp_mmu_page;
> 	bool unsync;
> 	u8 mmu_valid_gen;
> 	bool lpage_disallowed;
> 
> 	u32 role;
> 	u64 gfn;
> 
> 	u64 *spt;
> 
> 	u64 *shadowed_translation;
> 
> 	int root_count;
> }
> 
> kprobe:kvm_faultin_pfn
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$kvm = $vcpu->kvm;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		$fault = (struct kvm_page_fault *)arg1;
> 		$hva = -1;
> 		$flags = 0;
> 
> 		@__vcpu[tid] = arg0;
> 		@__fault[tid] = arg1;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] FAULTIN @ rip %lx (%lu hits), gpa = %lx, hva = %lx, flags = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva, $flags,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kretprobe:kvm_faultin_pfn
> {
> 	if (@__fault[tid] != 0) {
> 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$kvm = $vcpu->kvm;
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] FAULTIN_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx (%lx), flags = %lx, pfn = %lx, ret = %lu : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $hva, $fault->hva, $flags, $fault->pfn, retval,
> 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] FAULTIN_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> 	}
> }
> 
> kprobe:kvm_tdp_mmu_map
> {
> 	$vcpu = (struct kvm_vcpu *)arg0;
> 	$rip = $vcpu->arch.regs[16];
> 
> 	if (@last_rip[tid] == $rip) {
> 		@same[tid]++
> 	} else {
> 		@same[tid] = 0;
> 	}
> 	@last_rip[tid] = $rip;
> 
> 	if (@__fault[tid] != 0) {
> 	        $vcpu = (struct kvm_vcpu *)arg0;
> 		$fault = (struct kvm_page_fault *)arg1;
> 
>                 if (@__vcpu[tid] != arg0 || @__fault[tid] != arg1) {
>                         printf("%s tid[%u] pid[%u] MAP_ERROR vcpu %lx vs. %lx, fault %lx vs. %lx\n",
>                                strftime("%H:%M:%S:%f", nsecs), tid, pid, @__vcpu[tid], arg0, @__fault[tid], arg1);
>                 }
> 
> 		printf("%s tid[%u] pid[%u] MAP @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kretprobe:kvm_tdp_mmu_map
> {
> 	if (@__fault[tid] != 0) {
> 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 		$hva = -1;
> 		$flags = 0;
> 
> 		if ($fault->slot != 0) {
> 			$hva = $fault->slot->userspace_addr +
> 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> 			$flags = $fault->slot->flags;
> 		}
> 
> 		printf("%s tid[%u] pid[%u] MAP_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx, ret = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn, retval);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] MAP_RET_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> 	}
> }
> 
> kprobe:tdp_iter_start
> {
> 	if (@__fault[tid] != 0) {
>                 $vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 	        $root = (struct kvm_mmu_page *)arg1;
> 
> 		printf("%s tid[%u] pid[%u] ITER @ rip %lx (%lu hits), gpa = %lx (%lx), hva = %lx, pfn = %lx, tdp_mmu = %u, role = %x, count = %d\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, arg3 << 12, $fault->hva, $fault->pfn,
>                        $root->tdp_mmu_page, $root->role, $root->root_count);
> 	} else {
> 		@__vcpu[tid] = 0;
> 		@__fault[tid] = 0;
> 	}
> }
> 
> kprobe:make_mmio_spte
> {
>         if (@__fault[tid] != 0) {
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 
> 		printf("%s tid[%u] pid[%u] MMIO @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] MMIO_ERROR @ rip %lx (%lu hits)\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> 	}
> }
> 
> kprobe:make_spte
> {
>         if (@__fault[tid] != 0) {
> 		$fault = (struct kvm_page_fault *)@__fault[tid];
> 
> 		printf("%s tid[%u] pid[%u] SPTE @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> 		       $fault->addr, $fault->hva, $fault->pfn);
> 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> 		printf("%s tid[%u] pid[%u] SPTE_ERROR @ rip %lx (%lu hits)\n",
> 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> 	}
> }
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23 22:12                                                           ` Eric Wheeler
@ 2023-08-23 22:32                                                             ` Eric Wheeler
  2023-08-23 23:21                                                               ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-23 22:32 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Wed, 23 Aug 2023, Eric Wheeler wrote:
> On Wed, 23 Aug 2023, Sean Christopherson wrote:
> > On Tue, Aug 22, 2023, Eric Wheeler wrote:
> > > On Tue, 22 Aug 2023, Sean Christopherson wrote:
> > > > > Here is the whole log with 500,000+ lines over 5 minutes of recording, it 
> > > > > was first stuck on one vcpu for most of the time, and toward the end it 
> > > > > was stuck on a different VCPU:
> > > > > 
> > > > > The file starts with 555,596 occurances of vcpu=ffff9964cdc48000 and is 
> > > > > then followed by 31,784 occurances of vcpu=ffff9934ed50c680.  As you can 
> > > > > see in the file, they are not interleaved:
> > > > > 
> > > > > 	https://www.linuxglobal.com/out/handle_ept_violation.log2
> > > > > 
> > > > >   # awk '{print $3}' handle_ept_violation.log2 |uniq -c
> > > > >    555596 vcpu=ffff9964cdc48000
> > > > >     31784 vcpu=ffff9934ed50c680
> > > > 
> > > > Hrm, but the address range being invalidated is changing.  Without seeing the
> > > > guest RIP, or even a timestamp, it's impossible to tell if the vCPU is well and
> > > > truly stuck or if it's just getting thrashed so hard by NUMA balancing or KSM
> > > > that it looks stuck.
> > > > 
> > > > Drat.
> > > > 
> > > > > > Below is another bpftrace program that will hopefully shrink the 
> > > > > > haystack to the point where we can find something via code inspection.
> > > > > 
> > > > > Ok thanks, we'll give it a try.
> > > > 
> > > > Try this version instead.  It's more comprehensive and more precise, e.g. should
> > > > only trigger on the guest being 100% stuck, and also fixes a PID vs. TID goof.
> > > > 
> > > > Note!  Enable trace_kvm_exit before/when running this to ensure KVM grabs the guest RIP
> > > > from the VMCS.  Without that enabled, RIP from vcpu->arch.regs[16] may be stale.
> > > 
> > > Ok, we got a 740MB log, zips down to 25MB if you would like to see the
> > > whole thing, it is here:
> > > 	http://linuxglobal.com/out/handle_ept_violation-v2.log.gz
> > > 
> > > For brevity, here is a sample of each 100,000th line:
> > > 
> > > # zcat handle_ept_violation-v2.log.gz | perl -lne '!($n++%100000) && print'
> > > Attaching 3 probes...
> > > 00:30:31:347560 tid[553909] pid[553848] stuck @ rip ffffffff80094a41 (375972 hits), gpa = 294a41, hva = 7efc5b094000 : MMU seq = 8000b139, in-prog = 0, start = 7efc6e10f000, end = 7efc6e110000
> > 
> > Argh.  I'm having a bit of a temper tantrum because I forgot to have the printf
> > spit out the memslot flags.  And I apparently gave you a version without a probe
> > on kvm_tdp_mmu_map().  Grr.
> > 
> > Can you capture one more trace?  Fingers crossed this is the last one.  


See the previous email too, because it had >90k hits:
	* >90k hits: http://linuxglobal.com/out/handle_ept_violation-v3.log.gz

However, this one has different content (not just FAULTs), but it has only 1k hits:
	* >1k hits: http://linuxglobal.com/out/handle_ept_violation-v3-2.log.gz

22:23:31:481644 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1196 hits), gpa = cf7b000, hva = 7f6d24f7b000, pfn = 1e35fc6
22:23:31:481645 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1196 hits), gpa = cf7b000, hva = 7f6d24f7b000, pfn = 1e35fc6, ret = 4
22:23:31:481650 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000, hva = 7f6d24f7c000, flags = 0 : MMU seq = 4724e, in-prog = 0, start = 7f6d24f7b000, end = 7f6d24f7c000
22:23:31:481653 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
22:23:31:481658 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481660 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000, hva = 7f6d24f7c000 (7f6d24f7c000), flags = 0, pfn = 1ee7439, ret = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
22:23:31:481665 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1198 hits), gpa = cf7c000, hva = 7f6d24f7c000, flags = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
22:23:31:481666 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1198 hits), gpa = cf7c000, hva = 7f6d24f7c000 (7f6d24f7c000), flags = 0, pfn = 1ee7439, ret = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
22:23:31:481667 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439
22:23:31:481668 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 1ee7439, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481669 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439
22:23:31:481670 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439, ret = 4
22:23:31:481673 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000, hva = 7f6d24f7d000, flags = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
22:23:31:481676 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
22:23:31:481677 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481678 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000, hva = 7f6d24f7d000 (7f6d24f7d000), flags = 0, pfn = 4087d5, ret = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
22:23:31:481682 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1201 hits), gpa = cf7d000, hva = 7f6d24f7d000, flags = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
22:23:31:481683 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1201 hits), gpa = cf7d000, hva = 7f6d24f7d000 (7f6d24f7d000), flags = 0, pfn = 4087d5, ret = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
22:23:31:481684 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5
22:23:31:481684 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 4087d5, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481685 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5
22:23:31:481686 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5, ret = 4
22:23:31:481689 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000, hva = 7f6d24f7e000, flags = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
22:23:31:481691 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
22:23:31:481692 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481694 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000, hva = 7f6d24f7e000 (7f6d24f7e000), flags = 0, pfn = 51c5ee, ret = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
22:23:31:481697 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1204 hits), gpa = cf7e000, hva = 7f6d24f7e000, flags = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
22:23:31:481698 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1204 hits), gpa = cf7e000, hva = 7f6d24f7e000 (7f6d24f7e000), flags = 0, pfn = 51c5ee, ret = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
22:23:31:481699 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee
22:23:31:481699 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 51c5ee, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481700 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee
22:23:31:481701 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee, ret = 4
22:23:31:481703 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000, hva = 7f6d24f7f000, flags = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
22:23:31:481706 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
22:23:31:481707 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481708 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000, hva = 7f6d24f7f000 (7f6d24f7f000), flags = 0, pfn = 1ee50ee, ret = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
22:23:31:481712 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1207 hits), gpa = cf7f000, hva = 7f6d24f7f000, flags = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
22:23:31:481712 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1207 hits), gpa = cf7f000, hva = 7f6d24f7f000 (7f6d24f7f000), flags = 0, pfn = 1ee50ee, ret = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
22:23:31:481714 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
22:23:31:481714 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 1ee50ee, tdp_mmu = 1, role = 3784, count = 1
22:23:31:481715 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
22:23:31:481716 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee, ret = 4
@__fault[142943]: 0
@__fault[212784]: 0
@__vcpu[142943]: 0
@__vcpu[212784]: 0
@last_rip[142943]: 4634265
@last_rip[212784]: 18446735293648697744
@same[212784]: 1
@same[142943]: 20



--
Eric Wheeler


> 
> Here it is: http://linuxglobal.com/out/handle_ept_violation-v3.log.gz
> 
> Here are the highlights:
> 
> # zcat handle_ept_violation-v3.log.gz | grep 3484173 | tail -n30
> 21:25:50:282711 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92234 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282714 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92235 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282720 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92237 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282723 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92238 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282726 tid[3484173] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (92239 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282743 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92245 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282749 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92247 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282752 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92248 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282755 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92249 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282769 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92254 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282775 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92256 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282778 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92257 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282784 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92259 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282804 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92266 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282813 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92269 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282816 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92270 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282822 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92272 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282825 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92273 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282831 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92275 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282837 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92277 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282845 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92280 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282854 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92283 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282857 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92284 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282863 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92286 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282866 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92287 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282869 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92288 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> @__fault[3484173]: 0
> @__vcpu[3484173]: 0
> @last_rip[3484173]: 18446744071583984805
> @same[3484173]: 11615602
> 
> 
> # zcat handle_ept_violation-v3.log.gz | grep 3484174 | tail -n30
> 21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282418 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90087 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282475 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90100 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282507 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90107 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282524 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90111 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282528 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90112 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282557 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90118 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282660 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90141 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282670 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90143 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282684 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90146 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282693 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90148 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282731 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90156 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282735 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90157 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282739 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90158 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282759 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90162 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282764 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90163 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282786 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90168 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282791 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90169 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282795 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90170 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282800 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90171 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282809 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90173 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282832 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90178 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282841 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90180 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282851 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90182 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> 21:25:50:282874 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90187 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> @__fault[3484174]: 0
> @__vcpu[3484174]: 0
> @last_rip[3484174]: 18446744071583984805
> @same[3484174]: 11598962
> 
> 
> --
> Eric Wheeler
> 
> 
> 
> 
> > 
> > On the plus side, I'm slowly learning how to effectively use bpf programs.
> > This version also prints return values and other relevant side effects from
> > kvm_faultin_pfn(), and I figured out a way to get at the vCPU's root MMU page.
> > 
> > And it stops printing after a vCPU (task) has been stuck for 100k exits, i.e. it
> > should self-limit the spam.  Even if the vCPU managed to unstick itself after
> > that point, which is *extremely* unlikely, being stuck for 100k exits all but
> > guarantees there's a bug somewhere.
> > 
> > So this *should* give us a smoking gun.  Depending on what the gun points at, a
> > full root cause may still be a long ways off, but I'm pretty sure this mess will
> > tell us exactly why KVM is refusing to fix the fault.
> > 
> > --
> > 
> > struct kvm_page_fault {
> > 	const u64 addr;
> > 	const u32 error_code;
> > 
> > 	const bool prefetch;
> > 	const bool exec;
> > 	const bool write;
> > 	const bool present;
> > 
> > 	const bool rsvd;
> > 	const bool user;
> > 	const bool is_tdp;
> > 	const bool nx_huge_page_workaround_enabled;
> > 
> > 	bool huge_page_disallowed;
> > 
> > 	u8 max_level;
> > 	u8 req_level;
> > 	u8 goal_level;
> > 
> > 	u64 gfn;
> > 
> > 	struct kvm_memory_slot *slot;
> > 
> > 	u64 pfn;
> > 	unsigned long hva;
> > 	bool map_writable;
> > };
> > 
> > struct kvm_mmu_page {
> > 	struct list_head link;
> > 	struct hlist_node hash_link;
> > 
> > 	bool tdp_mmu_page;
> > 	bool unsync;
> > 	u8 mmu_valid_gen;
> > 	bool lpage_disallowed;
> > 
> > 	u32 role;
> > 	u64 gfn;
> > 
> > 	u64 *spt;
> > 
> > 	u64 *shadowed_translation;
> > 
> > 	int root_count;
> > }
> > 
> > kprobe:kvm_faultin_pfn
> > {
> > 	$vcpu = (struct kvm_vcpu *)arg0;
> > 	$kvm = $vcpu->kvm;
> > 	$rip = $vcpu->arch.regs[16];
> > 
> > 	if (@last_rip[tid] == $rip) {
> > 		@same[tid]++
> > 	} else {
> > 		@same[tid] = 0;
> > 	}
> > 	@last_rip[tid] = $rip;
> > 
> > 	if (@same[tid] > 1000 && @same[tid] < 100000) {
> > 		$fault = (struct kvm_page_fault *)arg1;
> > 		$hva = -1;
> > 		$flags = 0;
> > 
> > 		@__vcpu[tid] = arg0;
> > 		@__fault[tid] = arg1;
> > 
> > 		if ($fault->slot != 0) {
> > 			$hva = $fault->slot->userspace_addr +
> > 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> > 			$flags = $fault->slot->flags;
> > 		}
> > 
> > 		printf("%s tid[%u] pid[%u] FAULTIN @ rip %lx (%lu hits), gpa = %lx, hva = %lx, flags = %lx : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, $rip, @same[tid], $fault->addr, $hva, $flags,
> > 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> > 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> > 	} else {
> > 		@__vcpu[tid] = 0;
> > 		@__fault[tid] = 0;
> > 	}
> > }
> > 
> > kretprobe:kvm_faultin_pfn
> > {
> > 	if (@__fault[tid] != 0) {
> > 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> > 		$kvm = $vcpu->kvm;
> > 		$fault = (struct kvm_page_fault *)@__fault[tid];
> > 		$hva = -1;
> > 		$flags = 0;
> > 
> > 		if ($fault->slot != 0) {
> > 			$hva = $fault->slot->userspace_addr +
> > 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> > 			$flags = $fault->slot->flags;
> > 		}
> > 
> > 		printf("%s tid[%u] pid[%u] FAULTIN_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx (%lx), flags = %lx, pfn = %lx, ret = %lu : MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, $hva, $fault->hva, $flags, $fault->pfn, retval,
> > 		       $kvm->mmu_invalidate_seq, $kvm->mmu_invalidate_in_progress,
> > 		       $kvm->mmu_invalidate_range_start, $kvm->mmu_invalidate_range_end);
> > 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> > 		printf("%s tid[%u] pid[%u] FAULTIN_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> > 	}
> > }
> > 
> > kprobe:kvm_tdp_mmu_map
> > {
> > 	$vcpu = (struct kvm_vcpu *)arg0;
> > 	$rip = $vcpu->arch.regs[16];
> > 
> > 	if (@last_rip[tid] == $rip) {
> > 		@same[tid]++
> > 	} else {
> > 		@same[tid] = 0;
> > 	}
> > 	@last_rip[tid] = $rip;
> > 
> > 	if (@__fault[tid] != 0) {
> > 	        $vcpu = (struct kvm_vcpu *)arg0;
> > 		$fault = (struct kvm_page_fault *)arg1;
> > 
> >                 if (@__vcpu[tid] != arg0 || @__fault[tid] != arg1) {
> >                         printf("%s tid[%u] pid[%u] MAP_ERROR vcpu %lx vs. %lx, fault %lx vs. %lx\n",
> >                                strftime("%H:%M:%S:%f", nsecs), tid, pid, @__vcpu[tid], arg0, @__fault[tid], arg1);
> >                 }
> > 
> > 		printf("%s tid[%u] pid[%u] MAP @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, $fault->hva, $fault->pfn);
> > 	} else {
> > 		@__vcpu[tid] = 0;
> > 		@__fault[tid] = 0;
> > 	}
> > }
> > 
> > kretprobe:kvm_tdp_mmu_map
> > {
> > 	if (@__fault[tid] != 0) {
> > 		$vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> > 		$fault = (struct kvm_page_fault *)@__fault[tid];
> > 		$hva = -1;
> > 		$flags = 0;
> > 
> > 		if ($fault->slot != 0) {
> > 			$hva = $fault->slot->userspace_addr +
> > 			       (($fault->gfn - $fault->slot->base_gfn) << 12);
> > 			$flags = $fault->slot->flags;
> > 		}
> > 
> > 		printf("%s tid[%u] pid[%u] MAP_RET @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx, ret = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, $fault->hva, $fault->pfn, retval);
> > 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> > 		printf("%s tid[%u] pid[%u] MAP_RET_ERROR @ rip %lx (%lu hits), ret = %lu\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid], retval);
> > 	}
> > }
> > 
> > kprobe:tdp_iter_start
> > {
> > 	if (@__fault[tid] != 0) {
> >                 $vcpu = (struct kvm_vcpu *)@__vcpu[tid];
> > 		$fault = (struct kvm_page_fault *)@__fault[tid];
> > 	        $root = (struct kvm_mmu_page *)arg1;
> > 
> > 		printf("%s tid[%u] pid[%u] ITER @ rip %lx (%lu hits), gpa = %lx (%lx), hva = %lx, pfn = %lx, tdp_mmu = %u, role = %x, count = %d\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, arg3 << 12, $fault->hva, $fault->pfn,
> >                        $root->tdp_mmu_page, $root->role, $root->root_count);
> > 	} else {
> > 		@__vcpu[tid] = 0;
> > 		@__fault[tid] = 0;
> > 	}
> > }
> > 
> > kprobe:make_mmio_spte
> > {
> >         if (@__fault[tid] != 0) {
> > 		$fault = (struct kvm_page_fault *)@__fault[tid];
> > 
> > 		printf("%s tid[%u] pid[%u] MMIO @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, $fault->hva, $fault->pfn);
> > 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> > 		printf("%s tid[%u] pid[%u] MMIO_ERROR @ rip %lx (%lu hits)\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> > 	}
> > }
> > 
> > kprobe:make_spte
> > {
> >         if (@__fault[tid] != 0) {
> > 		$fault = (struct kvm_page_fault *)@__fault[tid];
> > 
> > 		printf("%s tid[%u] pid[%u] SPTE @ rip %lx (%lu hits), gpa = %lx, hva = %lx, pfn = %lx\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid, @last_rip[tid], @same[tid],
> > 		       $fault->addr, $fault->hva, $fault->pfn);
> > 	} else if (@same[tid] > 1000 && @same[tid] < 100000) {
> > 		printf("%s tid[%u] pid[%u] SPTE_ERROR @ rip %lx (%lu hits)\n",
> > 		       strftime("%H:%M:%S:%f", nsecs), tid, pid,  @last_rip[tid], @same[tid]);
> > 	}
> > }
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23 22:32                                                             ` Eric Wheeler
@ 2023-08-23 23:21                                                               ` Sean Christopherson
  2023-08-24  0:30                                                                 ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-23 23:21 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Wed, Aug 23, 2023, Eric Wheeler wrote:
> 22:23:31:481644 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1196 hits), gpa = cf7b000, hva = 7f6d24f7b000, pfn = 1e35fc6
> 22:23:31:481645 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1196 hits), gpa = cf7b000, hva = 7f6d24f7b000, pfn = 1e35fc6, ret = 4
> 22:23:31:481650 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000, hva = 7f6d24f7c000, flags = 0 : MMU seq = 4724e, in-prog = 0, start = 7f6d24f7b000, end = 7f6d24f7c000
> 22:23:31:481653 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
> 22:23:31:481658 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481660 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1197 hits), gpa = cf7c000, hva = 7f6d24f7c000 (7f6d24f7c000), flags = 0, pfn = 1ee7439, ret = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
> 22:23:31:481665 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1198 hits), gpa = cf7c000, hva = 7f6d24f7c000, flags = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
> 22:23:31:481666 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1198 hits), gpa = cf7c000, hva = 7f6d24f7c000 (7f6d24f7c000), flags = 0, pfn = 1ee7439, ret = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
> 22:23:31:481667 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439
> 22:23:31:481668 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000 (cf7c000), hva = 7f6d24f7c000, pfn = 1ee7439, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481669 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439
> 22:23:31:481670 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1199 hits), gpa = cf7c000, hva = 7f6d24f7c000, pfn = 1ee7439, ret = 4
> 22:23:31:481673 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000, hva = 7f6d24f7d000, flags = 0 : MMU seq = 4724f, in-prog = 0, start = 7f6d24f7c000, end = 7f6d24f7d000
> 22:23:31:481676 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
> 22:23:31:481677 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481678 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1200 hits), gpa = cf7d000, hva = 7f6d24f7d000 (7f6d24f7d000), flags = 0, pfn = 4087d5, ret = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
> 22:23:31:481682 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1201 hits), gpa = cf7d000, hva = 7f6d24f7d000, flags = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
> 22:23:31:481683 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1201 hits), gpa = cf7d000, hva = 7f6d24f7d000 (7f6d24f7d000), flags = 0, pfn = 4087d5, ret = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
> 22:23:31:481684 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5
> 22:23:31:481684 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000 (cf7d000), hva = 7f6d24f7d000, pfn = 4087d5, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481685 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5
> 22:23:31:481686 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1202 hits), gpa = cf7d000, hva = 7f6d24f7d000, pfn = 4087d5, ret = 4
> 22:23:31:481689 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000, hva = 7f6d24f7e000, flags = 0 : MMU seq = 47250, in-prog = 0, start = 7f6d24f7d000, end = 7f6d24f7e000
> 22:23:31:481691 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
> 22:23:31:481692 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481694 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1203 hits), gpa = cf7e000, hva = 7f6d24f7e000 (7f6d24f7e000), flags = 0, pfn = 51c5ee, ret = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
> 22:23:31:481697 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1204 hits), gpa = cf7e000, hva = 7f6d24f7e000, flags = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
> 22:23:31:481698 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1204 hits), gpa = cf7e000, hva = 7f6d24f7e000 (7f6d24f7e000), flags = 0, pfn = 51c5ee, ret = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
> 22:23:31:481699 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee
> 22:23:31:481699 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000 (cf7e000), hva = 7f6d24f7e000, pfn = 51c5ee, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481700 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee
> 22:23:31:481701 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1205 hits), gpa = cf7e000, hva = 7f6d24f7e000, pfn = 51c5ee, ret = 4
> 22:23:31:481703 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000, hva = 7f6d24f7f000, flags = 0 : MMU seq = 47251, in-prog = 0, start = 7f6d24f7e000, end = 7f6d24f7f000
> 22:23:31:481706 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 0, tdp_mmu = 1, role = 3784, count = 2
> 22:23:31:481707 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 0, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481708 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1206 hits), gpa = cf7f000, hva = 7f6d24f7f000 (7f6d24f7f000), flags = 0, pfn = 1ee50ee, ret = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
> 22:23:31:481712 tid[142943] pid[142917] FAULTIN @ rip ffffffffa43ce877 (1207 hits), gpa = cf7f000, hva = 7f6d24f7f000, flags = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
> 22:23:31:481712 tid[142943] pid[142917] FAULTIN_RET @ rip ffffffffa43ce877 (1207 hits), gpa = cf7f000, hva = 7f6d24f7f000 (7f6d24f7f000), flags = 0, pfn = 1ee50ee, ret = 0 : MMU seq = 47252, in-prog = 0, start = 7f6d24f7f000, end = 7f6d24f80000
> 22:23:31:481714 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> 22:23:31:481714 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 1ee50ee, tdp_mmu = 1, role = 3784, count = 1
> 22:23:31:481715 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> 22:23:31:481716 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee, ret = 4

This vCPU is making forward progress; it just happens to be taking lots of faults
on a single RIP.  MAP_RET's "ret = 4" means KVM did indeed "fix" the fault, and
the faulting GPA/HVA is changing on every fault.  Best guess is that the guest
is zeroing a hugepage, but the guest's 2MiB page is mapped with 4KiB EPT entries,
i.e. the vCPU is doing REP STOS on a 2MiB region.
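
(Back-of-the-envelope check, not something the log states directly: a 2MiB region
backed by 4KiB EPT entries covers 2MiB / 4KiB = 512 pages, and the MAP_RET lines
above advance the GPA by 0x1000, i.e. one 4KiB page, per fixed fault, so a hit
count in the low thousands at a single RIP is roughly what sweeping one 2MiB
hugepage would look like.)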

> > 21:25:50:282711 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92234 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282714 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92235 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282720 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92237 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282723 tid[3484173] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (92238 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282726 tid[3484173] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (92239 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000

...

> > 21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282418 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90087 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282475 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90100 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282507 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90107 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282524 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90111 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282528 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90112 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282557 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90118 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > 21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000

These vCPUs (which belong to the same VM) appear to be well and truly stuck.  The
fact that you got prints from MAP, MAP_RET, ITER, and SPTE for an unrelated (and
not stuck) vCPU is serendipitous, as it all but guarantees that the trace is
"good", i.e. that MAP prints aren't missing because the bpf program is bad.

Since the mmu_notifier info is stable (though the seq is still *insanely* high),
assuming there's no kernel memory corruption, that means that KVM is bailing
because is_page_fault_stale() returns true.  Based on v5.15 being the last known
good kernel for you, that places the blame squarely on commit
a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update").
Note, that commit had an unrelated bug that was fixed by 18c841e1f411 ("KVM: x86: Retry
page fault if MMU reload is pending and root has no sp"), but all flavors of v6.1
have said fix, and that bug caused a crash, not stuck vCPUs.

The "MMU seq = 8002dc25" value still gives me pause.  It's one hell of a coincidence
that all stuck vCPUs have had a sequence counter of 0x800xxxxx.

<time goes by as I keep staring>

Fudge (that's not actually what I said).  *sigh*

Not a coincidence, at all.  The bug is that, in v6.1, is_page_fault_stale() takes
the local @mmu_seq snapshot as an int, whereas the per-VM count is stored as an
unsigned long.  When the sequence sets bit 31, the local @mmu_seq value becomes
a signed *negative* value, and then when that gets passed to mmu_invalidate_retry_hva(),
which correctly takes an unsigned long, the negative value gets sign-extended and
so the comparison ends up being

	if (0x8002dc25 != 0xffffffff8002dc25)

and KVM thinks the sequence count is stale.  I missed it for so long because I
was stupidly looking mostly at upstream code (see below), and because of the subtle
sign-extension behavior (I was mostly on the lookout for a straight truncation
bug where bits[63:32] got dropped).
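
A standalone sketch of the round trip (hypothetical names, not the actual KVM
code), just to show how the int snapshot goes bad once bit 31 is set:

#include <stdio.h>

/* Stand-in for mmu_invalidate_retry_hva(): "stale" if the counts differ. */
static int seq_is_stale(unsigned long cur_seq, unsigned long snap_seq)
{
	return cur_seq != snap_seq;
}

int main(void)
{
	unsigned long mmu_seq = 0x8002dc25ul;	/* bit 31 set */
	int snapshot = mmu_seq;			/* the buggy "int" local */

	/*
	 * snapshot is negative, so it sign-extends to 0xffffffff8002dc25 when
	 * converted back to unsigned long, and the comparison never passes.
	 */
	printf("stale = %d\n", seq_is_stale(mmu_seq, snapshot));	/* prints 1 */
	return 0;
}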

I suspect others haven't hit this issue because no one else is generating anywhere
near the same number of mmu_notifier invalidations, and/or live migrates VMs more
regularly (which effectively resets the sequence count).

The real kicker to all this is that the bug was accidentally fixed in v6.3 by
commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()"),
as that refactoring correctly stored the "local" mmu_seq as an unsigned long.

I'll post the below as a proper patch for inclusion in stable kernels.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-                               struct kvm_page_fault *fault, int mmu_seq)
+                               struct kvm_page_fault *fault,
+                               unsigned long mmu_seq)
 {
        struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 
P.S. FWIW, it's probably worth taking a peek at your NUMA setup and/or KSM settings.
2 billion invalidations is still quite insane, even for a long-lived VM.  E.g.
we (Google) disable NUMA balancing and instead rely on other parts of the stack
to hit our SLOs for NUMA locality.  That certainly has its own challenges, and
might not be viable for your environment, but NUMA balancing is far from a free
lunch for VMs; NUMA balancing is a _lot_ more costly when KVM is on the receiving
end due to the way mmu_notifier invalidations work.   And for KSM, I personally
think KSM is a terrible tradeoff and should never be enabled.  It saves memory at
the cost of CPU cycles, guest performance, and guest security.


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-23 23:21                                                               ` Sean Christopherson
@ 2023-08-24  0:30                                                                 ` Eric Wheeler
  2023-08-24  0:52                                                                   ` Sean Christopherson
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Wheeler @ 2023-08-24  0:30 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm

On Wed, 23 Aug 2023, Sean Christopherson wrote:
> On Wed, Aug 23, 2023, Eric Wheeler wrote:

...

> > 22:23:31:481714 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> > 22:23:31:481714 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 1ee50ee, tdp_mmu = 1, role = 3784, count = 1
> > 22:23:31:481715 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> > 22:23:31:481716 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee, ret = 4
> 
> This vCPU is making forward progress, it just happens to be taking lots of faults
> on a single RIP.  MAP_RET's "ret = 4" means KVM did indeed "fix" the fault, and
> the faulting GPA/HVA is changing on every fault.  Best guess is that the guest
> is zeroing a hugepage, but the guest's 2MiB page is mapped with 4KiB EPT entries,
> i.e. the vCPU is doing REP STOS on a 2MiB region.

...

> > > 21:25:50:282726 tid[3484173] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (92239 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000

...

> > > 21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > > 21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000

...

> These vCPUs (which belong to the same VM) appear to be well and truly stuck.  The
> fact that you got prints from MAP, MAP_RET, ITER, and SPTE for an unrelated (and
> not stuck) vCPU is serendipitous, as it all but guarantees that the trace is
> "good", i.e. that MAP prints aren't missing because the bpf program is bad.
> 
> Since the mmu_notifier info is stable (though the seq is still *insanely* high),
> assuming there's no kernel memory corruption, that means that KVM is bailing
> because is_page_fault_stale() returns true.  Based on v5.15 being the last known
> good kernel for you,

I should note that v5.15 was speculative information that may not be
100% correct.  However, the occurrence of failures <= 5.15 is minuscule
compared to >5.15, as almost all occurrences are in 6.1.x (which could be
biased by the number of 6.1.x deployments).

It is entirely possible that the <= 5.15 issues are false-positives, but
we don't have BTF on anything <6.1.30, so no kprobe traces.

> that places the blame squarely on commit
> a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update").
> Note, that commit had an unrelated bug that was fixed by 18c841e1f411 ("KVM: x86: Retry
> page fault if MMU reload is pending and root has no sp"), but all flavors of v6.1
> have said fix, and that bug caused a crash, not stuck vCPUs.
> 
> The "MMU seq = 8002dc25" value still gives me pause.  It's one hell of a coincidence
> that all stuck vCPUs have had a sequence counter of 0x800xxxxx.
> 
> <time goes by as I keep staring>
> 
> Fudge (that's not actually what I said).  *sigh*
> 
> Not a coincidence, at all.  The bug is that, in v6.1, is_page_fault_stale() takes
> the local @mmu_seq snapshot as an int, whereas the per-VM count is stored as an
> unsigned long.

I'm surprised that there were no compiler warnings about signedness or
type precision.  What would have prevented such a compiler warning?

> When the sequence sets bit 31, the local @mmu_seq value becomes
> a signed *negative* value, and then when that gets passed to mmu_invalidate_retry_hva(),
> which correctly takes an unsigned long, the negative value gets sign-extended and
> so the comparison ends up being
> 
> 	if (0x8002dc25 != 0xffffffff8002dc25)
>
> and KVM thinks the sequence count is stale.  I missed it for so long because I
> was stupidly looking mostly at upstream code (see below), and because of the subtle
> sign-extension behavior (I was mostly on the lookout for a straight truncation
> bug where bits[63:32] got dropped).
> 
> I suspect others haven't hit this issue because no one else is generating anywhere
> near the same number of mmu_notifier invalidations, and/or live migrates VMs more
> regularly (which effectively resets the sequence count).
> 
> The real kicker to all this is that the bug was accidentally fixed in v6.3 by
> commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()"),
> as that refactoring correctly stored the "local" mmu_seq as an unsigned long.
> 
> I'll post the below as a proper patch for inclusion in stable kernels.

Awesome, and well done.  Can you think of a "simple" patch for the
6.1-series that would be live-patch safe?

-Eric

> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 230108a90cf3..beca03556379 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>   * root was invalidated by a memslot update or a relevant mmu_notifier fired.
>   */
>  static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> -                               struct kvm_page_fault *fault, int mmu_seq)
> +                               struct kvm_page_fault *fault,
> +                               unsigned long mmu_seq)
>  {
>         struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
>  
> P.S. FWIW, it's probably worth taking a peek at your NUMA setup and/or KSM settings.
> 2 billion invalidations is still quite insane, even for a long-lived VM.  E.g.
> we (Google) disable NUMA balancing and instead rely on other parts of the stack
> to hit our SLOs for NUMA locality.  That certainly has its own challenges, and
> might not be viable for your environment, but NUMA balancing is far from a free
> lunch for VMs; NUMA balancing is a _lot_ more costly when KVM is on the receiving
> end due to the way mmu_notifier invalidations work.   And for KSM, I personally
> think KSM is a terrible tradeoff and should never be enabled.  It saves memory at
> the cost of CPU cycles, guest performance, and guest security.



--
Eric Wheeler



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-24  0:30                                                                 ` Eric Wheeler
@ 2023-08-24  0:52                                                                   ` Sean Christopherson
  2023-08-24 23:51                                                                     ` Eric Wheeler
  0 siblings, 1 reply; 48+ messages in thread
From: Sean Christopherson @ 2023-08-24  0:52 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Amaan Cheval, brak, kvm

On Wed, Aug 23, 2023, Eric Wheeler wrote:
> On Wed, 23 Aug 2023, Sean Christopherson wrote:
> > Not a coincidence, at all.  The bug is that, in v6.1, is_page_fault_stale() takes
> > the local @mmu_seq snapshot as an int, whereas the per-VM count is stored as an
> > unsigned long.
> 
> I'm surprised that there were no compiler warnings about signedness or
> type precision.  What would have prevented such a compiler warning?

-Wconversion can detect this, but it detects freaking *everything*, i.e. its
signal to noise ratio is straight up awful.  It's so noisy in fact that it's not
even in the kernel's W=1 build, it's pushed down all the way to W=3.  W=1 is
basically "you'll get some noise, but it may find useful stuff".  W=3 is essentially
"don't bother wading through the warnings unless you're masochistic".

E.g. turning it on leads to:

linux/include/linux/kvm_host.h:891:60: error:
conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
  891 |                           (atomic_read(&kvm->online_vcpus) - 1))
      |                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~

which is completely asinine (suppressing the warning would require declaring the
above literal as 1u).
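
As a toy illustration of why the noise and the real bugs look the same to the
compiler (hypothetical code, not from the kernel), gcc -Wconversion flags both
of the functions below, even though only the second one is the kind of conversion
that actually bit us here:

/*
 * -Wsign-conversion: conversion to 'long unsigned int' from 'int' may change
 * the sign of the result.  Benign; the kvm_host.h warning above is this flavor.
 */
static unsigned long last_vcpu_idx(int online_vcpus)
{
	return online_vcpus - 1;
}

/*
 * -Wconversion: conversion from 'long unsigned int' to 'int' may change value.
 * This is exactly the shape of the is_page_fault_stale() bug.
 */
static int snapshot_seq(unsigned long seq)
{
	return seq;
}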

FWIW, I would love to be able to prevent these types of bugs as this isn't the
first implicit conversion bug that has hit KVM x86[*], but the signal to noise
ratio is just so, so bad.

[*] commit d5aaad6f8342 ("KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds")

> > When the sequence sets bit 31, the local @mmu_seq value becomes
> > a signed *negative* value, and then when that gets passed to mmu_invalidate_retry_hva(),
> > which correctly takes an unsigned long, the negative value gets sign-extended and
> > so the comparison ends up being
> > 
> > 	if (0x8002dc25 != 0xffffffff8002dc25)
> >
> > and KVM thinks the sequence count is stale.  I missed it for so long because I
> > was stupidly looking mostly at upstream code (see below), and because of the subtle
> > sign-extension behavior (I was mostly on the lookout for a straight truncation
> > bug where bits[63:32] got dropped).
> > 
> > I suspect others haven't hit this issue because no one else is generating anywhere
> > near the same number of mmu_notifier invalidations, and/or live migrates VMs more
> > regularly (which effectively resets the sequence count).
> > 
> > The real kicker to all this is that the bug was accidentally fixed in v6.3 by
> > commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()"),
> > as that refactoring correctly stored the "local" mmu_seq as an unsigned long.
> > 
> > I'll post the below as a proper patch for inclusion in stable kernels.
> 
> Awesome, and well done.  Can you think of a "simple" patch for the
> 6.1-series that would be live-patch safe?

This is what I'm going to post for 6.1, it's as safe and simple a patch as can
be.  The only potential hiccup for live-patching is that it's all but guaranteed
to be inlined, but the scope creep should be limited to one-level up, e.g. to
direct_page_fault().

Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Aug 23 16:28:12 2023 -0700

    KVM: x86/mmu: Fix a sign-extension bug with mmu_seq that hangs vCPUs
    
    Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
    when checking to see if a page fault is stale, as the sequence count is
    stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
    where KVM will effectively hang vCPUs due to always thinking page faults
    are stale, which results in KVM refusing to "fix" faults.

    mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
    KVM is handling page faults to detect if a userspace mapping relevant to
    the guest was invalidated between snapshotting the counter and acquiring
    mmu_lock, to ensure that the host pfn that KVM retrieved is still fresh.
    If KVM sees that the counter has changed, KVM resumes the guest without
    fixing the fault.
    
    What _should_ happen is that the source of the mmu_notifier invalidations
    eventually goes away, mmu_invalidate_seq will become stable, and KVM can
    once again fix guest page fault(s).
    
    But for a long-lived VM and/or a VM that the host just doesn't particularly
    like, it's possible for a VM to be on the receiving end of 2 billion (with
    a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
    mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
    negative value when implicitly cast to an "int" by is_page_fault_stale(),
    and then sign-extended into a 64-bit unsigned when the signed "int" is
    implicitly cast back to an "unsigned long" on the call to
    mmu_invalidate_retry_hva().
    
    As a result of the casting and sign-extension, given a sequence counter of
    e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
    
            if (0x8002dc25 != 0xffffffff8002dc25)
    
    and signals that the page fault is stale and needs to be retried even
    though the sequence counter is stable, and KVM effectively hangs any vCPU
    that takes a page fault (EPT violation or #NPF when TDP is enabled).
    
    Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
    in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
    how KVM tracks the sequence counter snapshot.
    
    Reported-by: Brian Rak <brak@vultr.com>
    Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
    Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
    Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
    Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
    Signed-off-by: Sean Christopherson <seanjc@google.com>

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-                               struct kvm_page_fault *fault, int mmu_seq)
+                               struct kvm_page_fault *fault,
+                               unsigned long mmu_seq)
 {
        struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Deadlock due to EPT_VIOLATION
  2023-08-24  0:52                                                                   ` Sean Christopherson
@ 2023-08-24 23:51                                                                     ` Eric Wheeler
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Wheeler @ 2023-08-24 23:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Amaan Cheval, brak, kvm


On Wed, 23 Aug 2023, Sean Christopherson wrote:
> On Wed, Aug 23, 2023, Eric Wheeler wrote:
> > On Wed, 23 Aug 2023, Sean Christopherson wrote:
> > > Not a coincidence, at all.  The bug is that, in v6.1, is_page_fault_stale() takes
> > > the local @mmu_seq snapshot as an int, whereas the per-VM count is stored as an
> > > unsigned long.
> > 
> > I'm surprised that there were no compiler warnings about signedness or
> > type precision.  What would have prevented such a compiler warning?
> 
> -Wconversion can detect this, but it detects freaking *everything*, i.e. its
> signal to noise ratio is straight up awful.  It's so noisy in fact that it's not
> even in the kernel's W=1 build, it's pushed down all the way to W=3.  W=1 is
> basically "you'll get some noise, but it may find useful stuff.  W=3 is essentially
> "don't bother wading through the warnings unless you're masochistic".
> 
> E.g. turning it on leads to:
> 
> linux/include/linux/kvm_host.h:891:60: error:
> conversion to ‘long unsigned int’ from ‘int’ may change the sign of the result [-Werror=sign-conversion]
>   891 |                           (atomic_read(&kvm->online_vcpus) - 1))
>       |                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
> 
> which is completely asinine (suppressing the warning would require declaring the
> above literal as 1u).

I can see that.  I suppose we'll never see the kernel compile with -Wall 
-Werror!

 
> FWIW, I would love to be able to prevent these types of bugs as this isn't the
> first implicit conversion bug that has hit KVM x86[*], but the signal to noise
> ratio is just so, so bad.
> 
> [*] commit d5aaad6f8342 ("KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds")
> 
> > > When the sequence sets bit 31, the local @mmu_seq value becomes
> > > a signed *negative* value, and then when that gets passed to mmu_invalidate_retry_hva(),
> > > which correctly takes an unsigned long, the negative value gets sign-extended and
> > > so the comparison ends up being
> > > 
> > > 	if (0x8002dc25 != 0xffffffff8002dc25)
> > >
> > > and KVM thinks the sequence count is stale.  I missed it for so long because I
> > > was stupidly looking mostly at upstream code (see below), and because of the subtle
> > > sign-extension behavior (I was mostly on the lookout for a straight truncation
> > > bug where bits[63:32] got dropped).
> > > 
> > > I suspect others haven't hit this issue because no one else is generating anywhere
> > > near the same number of mmu_notifier invalidations, and/or live migrates VMs more
> > > regularly (which effectively resets the sequence count).
> > > 
> > > The real kicker to all this is that the bug was accidentally fixed in v6.3 by
> > > commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()"),
> > > as that refactoring correctly stored the "local" mmu_seq as an unsigned long.
> > > 
> > > I'll post the below as a proper patch for inclusion in stable kernels.
> > 
> > Awesome, and well done.  Can you think of a "simple" patch for the
> > 6.1-series that would be live-patch safe?
> 
> This is what I'm going to post for 6.1, it's as safe and simple a patch as can
> be.  The only potential hiccup for live-patching is that it's all but guaranteed
> to be inlined, but the scope creep should be limited to one-level up, e.g. to
> direct_page_fault().
> 
> Author: Sean Christopherson <seanjc@google.com>
> Date:   Wed Aug 23 16:28:12 2023 -0700
> 
>     KVM: x86/mmu: Fix a sign-extension bug with mmu_seq that hangs vCPUs
>     
>     Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
>     when checking to see if a page fault is stale, as the sequence count is
>     stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
>     where KVM will effectively hang vCPUs due to always thinking page faults
>     are stale, which results in KVM refusing to "fix" faults.
>     
>     mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
>     KVM is handling page faults to detect if a userspace mapping relevant to
>     the guest was invalidated between snapshotting the counter and acquiring
>     mmu_lock, to ensure that the host pfn that KVM retrieved is still fresh.
>     If KVM sees that the counter has changed, KVM resumes the guest without
>     fixing the fault.
>     
>     What _should_ happen is that the source of the mmu_notifier invalidations
>     eventually goes away, mmu_invalidate_seq will become stable, and KVM can
>     once again fix guest page fault(s).
>     
>     But for a long-lived VM and/or a VM that the host just doesn't particularly
>     like, it's possible for a VM to be on the receiving end of 2 billion (with
>     a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
>     mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
>     negative value when implicitly cast to an "int" by is_page_fault_stale(),
>     and then sign-extended into a 64-bit unsigned when the signed "int" is
>     implicitly cast back to an "unsigned long" on the call to
>     mmu_invalidate_retry_hva().
>     
>     As a result of the casting and sign-extension, given a sequence counter of
>     e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
>     
>             if (0x8002dc25 != 0xffffffff8002dc25)
>     
>     and signals that the page fault is stale and needs to be retried even
>     though the sequence counter is stable, and KVM effectively hangs any vCPU
>     that takes a page fault (EPT violation or #NPF when TDP is enabled).
>     
>     Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
>     in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
>     how KVM tracks the sequence counter snapshot.
>     
>     Reported-by: Brian Rak <brak@vultr.com>
>     Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
>     Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
>     Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com

Thanks again for all your help on this, I enjoyed working on it with you.

-Eric

>     Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
>     Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 230108a90cf3..beca03556379 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>   * root was invalidated by a memslot update or a relevant mmu_notifier fired.
>   */
>  static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> -                               struct kvm_page_fault *fault, int mmu_seq)
> +                               struct kvm_page_fault *fault,
> +                               unsigned long mmu_seq)
>  {
>         struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
>  
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2023-08-24 23:52 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 14:02 Deadlock due to EPT_VIOLATION Brian Rak
2023-05-23 16:22 ` Sean Christopherson
2023-05-24 13:39   ` Brian Rak
2023-05-26 16:59     ` Brian Rak
2023-05-26 21:02       ` Sean Christopherson
2023-05-30 17:35         ` Brian Rak
2023-05-30 18:36           ` Sean Christopherson
2023-05-31 17:40             ` Brian Rak
2023-07-21 14:34             ` Amaan Cheval
2023-07-21 17:37               ` Sean Christopherson
2023-07-24 12:08                 ` Amaan Cheval
2023-07-25 17:30                   ` Sean Christopherson
2023-08-02 14:21                     ` Amaan Cheval
2023-08-02 15:34                       ` Sean Christopherson
2023-08-02 16:45                         ` Amaan Cheval
2023-08-02 17:52                           ` Sean Christopherson
2023-08-08 15:34                             ` Amaan Cheval
2023-08-08 17:07                               ` Sean Christopherson
2023-08-10  0:48                                 ` Eric Wheeler
2023-08-10  1:27                                   ` Eric Wheeler
2023-08-10 23:58                                     ` Sean Christopherson
2023-08-11 12:37                                       ` Amaan Cheval
2023-08-11 18:02                                         ` Sean Christopherson
2023-08-12  0:50                                           ` Eric Wheeler
2023-08-14 17:29                                             ` Sean Christopherson
2023-08-15  0:30                                 ` Eric Wheeler
2023-08-15 16:10                                   ` Sean Christopherson
2023-08-16 23:54                                     ` Eric Wheeler
2023-08-17 18:21                                       ` Sean Christopherson
2023-08-18  0:55                                         ` Eric Wheeler
2023-08-18 14:33                                           ` Sean Christopherson
2023-08-18 23:06                                             ` Eric Wheeler
2023-08-21 20:27                                               ` Eric Wheeler
2023-08-21 23:51                                                 ` Sean Christopherson
2023-08-22  0:11                                                   ` Sean Christopherson
2023-08-22  1:10                                                   ` Eric Wheeler
2023-08-22 15:11                                                     ` Sean Christopherson
2023-08-22 21:23                                                       ` Eric Wheeler
2023-08-22 21:32                                                         ` Sean Christopherson
2023-08-23  0:39                                                       ` Eric Wheeler
2023-08-23 17:54                                                         ` Sean Christopherson
2023-08-23 19:44                                                           ` Eric Wheeler
2023-08-23 22:12                                                           ` Eric Wheeler
2023-08-23 22:32                                                             ` Eric Wheeler
2023-08-23 23:21                                                               ` Sean Christopherson
2023-08-24  0:30                                                                 ` Eric Wheeler
2023-08-24  0:52                                                                   ` Sean Christopherson
2023-08-24 23:51                                                                     ` Eric Wheeler
