From: Brian Rak <brak@vultr.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Subject: Re: Deadlock due to EPT_VIOLATION
Date: Tue, 30 May 2023 13:35:53 -0400
Message-ID: <9fa11f06-bd55-b061-d16a-081351f04a13@gameservers.com>
In-Reply-To: <ZHEefxsu5E3BsPni@google.com>


On 5/26/2023 5:02 PM, Sean Christopherson wrote:
> On Fri, May 26, 2023, Brian Rak wrote:
>> On 5/24/2023 9:39 AM, Brian Rak wrote:
>>> On 5/23/2023 12:22 PM, Sean Christopherson wrote:
>>>> The other thing that would be helpful would be getting kernel stack
>>>> traces of the relevant tasks/threads.  The vCPU stack traces won't be
>>>> interesting, but it'll likely help to see what the fallocate() tasks
>>>> are doing.
>>> I'll see what I can come up with here, I was running into some
>>> difficulty getting useful stack traces out of the VM
>> I didn't have any luck gathering guest-level stack traces - KASLR makes
>> it pretty difficult even if I have the guest kernel symbols.
> Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
> to see what the fallocate() in the *host* is doing.

Ah - here's a different instance of it with a full backtrace from the host:

(gdb) thread apply all bt

Thread 8 (Thread 0x7fbbd0bff700 (LWP 353251) "vnc_worker"):
#0  futex_wait_cancelable (private=0, expected=0, 
futex_word=0x7fbddd4f6b2c) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, 
mutex=0x7fbddd4f6b38, cond=0x7fbddd4f6b00) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7fbddd4f6b00, 
mutex=mutex@entry=0x7fbddd4f6b38) at pthread_cond_wait.c:638
#3  0x00005593ea9a232b in qemu_cond_wait_impl (cond=0x7fbddd4f6b00, 
mutex=0x7fbddd4f6b38, file=0x5593eaa152b4 "../../ui/vnc-jobs.c", 
line=248) at ../../util/qemu-thread-posix.c:220
#4  0x00005593ea4bfc33 in vnc_worker_thread_loop (queue=0x7fbddd4f6b00) 
at ../../ui/vnc-jobs.c:248
#5  0x00005593ea4c08f8 in vnc_worker_thread 
(arg=arg@entry=0x7fbddd4f6b00) at ../../ui/vnc-jobs.c:361
#6  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd0bfcf30) at 
../../util/qemu-thread-posix.c:505
#7  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#8  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fbbd85ff700 (LWP 353248) "CPU 3/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd42f7c0, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd42f7c0) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd42f7c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd85fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fbbd97ff700 (LWP 353247) "CPU 2/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd4213c0, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd4213c0) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd4213c0) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbd97fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fbbda9ff700 (LWP 353246) "CPU 1/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6f5e40, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6f5e40) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd6f5e40) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbda9fcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fbbdbbff700 (LWP 353245) "CPU 0/KVM"):
#0  0x00007fbddf11d237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005593ea844007 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbddd6a8c00, 
type=type@entry=44672) at ../../accel/kvm/kvm-all.c:3035
#2  0x00005593ea844171 in kvm_cpu_exec (cpu=cpu@entry=0x7fbddd6a8c00) at 
../../accel/kvm/kvm-all.c:2850
#3  0x00005593ea8457ed in kvm_vcpu_thread_fn 
(arg=arg@entry=0x7fbddd6a8c00) at ../../accel/kvm/kvm-accel-ops.c:51
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdbbfcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fbbdcfff700 (LWP 353244) "IO mon_iothread"):
#0  0x00007fbddf11b96f in __GI___poll (fds=0x7fbbdc209000, nfds=3, 
timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fbddf6eb0ae in ?? () from 
target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fbddf6eb40b in g_main_loop_run () from 
target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00005593ea880509 in iothread_run 
(opaque=opaque@entry=0x7fbddd51db00) at ../../iothread.c:74
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbbdcffcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fbdddfff700 (LWP 353235) "qemu-kvm"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005593ea9a29aa in qemu_futex_wait (val=<optimized out>, 
f=<optimized out>) at ./include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5593eb1a21a8 <rcu_call_ready_event>) 
at ../../util/qemu-thread-posix.c:430
#3  0x00005593ea9abd80 in call_rcu_thread (opaque=opaque@entry=0x0) at 
../../util/rcu.c:261
#4  0x00005593ea9a17e9 in qemu_thread_start (args=0x7fbdddffcf30) at 
../../util/qemu-thread-posix.c:505
#5  0x00007fbddf51bea7 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#6  0x00007fbddf127a2f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fbdde874680 (LWP 353228) "qemu-kvm"):
#0  0x00007fbddf11ba66 in __ppoll (fds=0x7fbbd39b9f00, nfds=21, 
timeout=<optimized out>, timeout@entry=0x7fff9ff32c90, 
sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x00005593ea9b70f1 in ppoll (__ss=0x0, __timeout=0x7fff9ff32c90, 
__nfds=<optimized out>, __fds=<optimized out>) at 
/usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
timeout=timeout@entry=2429393072) at ../../util/qemu-timer.c:351
#3  0x00005593ea9b4955 in os_host_main_loop_wait (timeout=2429393072) at 
../../util/main-loop.c:315
#4  main_loop_wait (nonblocking=nonblocking@entry=0) at 
../../util/main-loop.c:606
#5  0x00005593ea64ca91 in qemu_main_loop () at ../../softmmu/runstate.c:739
#6  0x00005593ea84c8e7 in qemu_default_main () at ../../softmmu/main.c:37
#7  0x00007fbddf04fd0a in __libc_start_main (main=0x5593ea497620 <main>, 
argc=109, argv=0x7fff9ff32e58, init=<optimized out>, fini=<optimized 
out>, rtld_fini=<optimized out>, stack_end=0x7fff9ff32e48) at 
../csu/libc-start.c:308
#8  0x00005593ea498dda in _start ()

and trace with -e all from that same guest:

              CPU-353246 [041] 5473778.770320: write_msr:            
1d9, value 4000
              CPU-353248 [027] 5473778.770321: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353248 [027] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770321: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770321: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353247 [019] 5473778.770321: write_msr:            
1d9, value 4000
              CPU-353248 [027] 5473778.770321: kvm_page_fault:       
vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
              CPU-353245 [039] 5473778.770321: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770321: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353246 [041] 5473778.770322: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353247 [019] 5473778.770322: kvm_page_fault:       
vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
              CPU-353248 [027] 5473778.770322: kvm_entry:            
vcpu 3, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770322: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770322: rcu_utilization:      
Start context switch
              CPU-353248 [027] 5473778.770322: rcu_utilization:      End 
context switch
              CPU-353247 [019] 5473778.770323: kvm_entry:            
vcpu 2, rip 0xffffffffbc7648b0
              CPU-353245 [039] 5473778.770323: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770323: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353247 [019] 5473778.770323: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770323: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770323: kvm_page_fault:       
vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
              CPU-353246 [041] 5473778.770323: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353247 [019] 5473778.770323: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770324: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770324: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353248 [027] 5473778.770324: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770324: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353248 [027] 5473778.770324: kvm_page_fault:       
vcpu 3 rip 0xffffffffbc7648b0 address 0x0000000274300ff8 error_code 0x683
              CPU-353246 [041] 5473778.770324: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353245 [039] 5473778.770324: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770325: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770325: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770325: rcu_utilization:      End 
context switch
              CPU-353246 [041] 5473778.770325: rcu_utilization:      End 
context switch
              CPU-353247 [019] 5473778.770325: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353248 [027] 5473778.770325: kvm_entry:            
vcpu 3, rip 0xffffffffbc7648b0
              CPU-353247 [019] 5473778.770325: kvm_page_fault:       
vcpu 2 rip 0xffffffffbc7648b0 address 0x000000026e580ff8 error_code 0x683
              CPU-353248 [027] 5473778.770325: rcu_utilization:      
Start context switch
              CPU-353248 [027] 5473778.770326: rcu_utilization:      End 
context switch
              CPU-353245 [039] 5473778.770326: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770326: write_msr:            
1d9, value 4000
              CPU-353245 [039] 5473778.770326: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353247 [019] 5473778.770326: kvm_entry:            
vcpu 2, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770326: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc726140 info 683 0
              CPU-353245 [039] 5473778.770326: kvm_page_fault:       
vcpu 0 rip 0xffffffffbc7648b0 address 0x0000000263670ff8 error_code 0x683
              CPU-353246 [041] 5473778.770326: kvm_page_fault:       
vcpu 1 rip 0xffffffffbc726140 address 0x00000001170aaff8 error_code 0x683
              CPU-353247 [019] 5473778.770326: rcu_utilization:      
Start context switch
              CPU-353247 [019] 5473778.770327: rcu_utilization:      End 
context switch
              CPU-353248 [027] 5473778.770327: write_msr:            
1d9, value 4000
              CPU-353246 [041] 5473778.770327: kvm_entry:            
vcpu 1, rip 0xffffffffbc726140
              CPU-353248 [027] 5473778.770327: kvm_exit:             
reason EPT_VIOLATION rip 0xffffffffbc7648b0 info 683 0
              CPU-353245 [039] 5473778.770327: kvm_entry:            
vcpu 0, rip 0xffffffffbc7648b0
              CPU-353246 [041] 5473778.770327: rcu_utilization:      
Start context switch
              CPU-353246 [041] 5473778.770328: rcu_utilization:      End 
context switch

> Another datapoint that might provide insight would be seeing if/how KVM's page
> faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
> the guest is stuck.

It looks like pf_taken is the only counter that's really incrementing:

# cd /sys/kernel/debug/kvm/353228-12

# tail -n1 pf*; sleep 5; echo "======"; tail -n1 pf*
==> pf_emulate <==
12353

==> pf_fast <==
56

==> pf_fixed <==
27039604264

==> pf_guest <==
0

==> pf_mmio_spte_created <==
12353

==> pf_spurious <==
2348694

==> pf_taken <==
74486522452
======
==> pf_emulate <==
12353

==> pf_fast <==
56

==> pf_fixed <==
27039604264

==> pf_guest <==
0

==> pf_mmio_spte_created <==
12353

==> pf_spurious <==
2348694

==> pf_taken <==
74499574212
# tail -n1 *tlb*; sleep 5; echo "======"; tail -n1 *tlb*
==> remote_tlb_flush <==
1731549762

==> remote_tlb_flush_requests <==
3092024754

==> tlb_flush <==
8203297646
======
==> remote_tlb_flush <==
1731549762

==> remote_tlb_flush_requests <==
3092024754

==> tlb_flush <==
8203297649


> Are you able to run modified host kernels?  If so, the easiest next step, assuming
> stack traces don't provide a smoking gun, would be to add printks into the page
> fault path to see why KVM is retrying instead of installing a SPTE.

We can, but it can take quite some time from when we roll out an update 
to when we actually see results.  This problem is inconsistent at best, 
and even though we're seeing it a ton of times a day, it can show up 
anywhere.  Even if we rolled out a patched kernel today, we'd still be 
looking at weeks or months before we had any significant number of 
machines on it.
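
If it helps make the printk idea concrete, the kind of instrumentation 
we'd be adding would look roughly like the sketch below - purely 
illustrative, the helper itself is hypothetical, and the 
struct kvm_page_fault / direct_page_fault names come from recent 
arch/x86/kvm/mmu/mmu.c, so they may not match whichever host kernel we 
end up patching:

/*
 * Sketch only: a hypothetical helper that would be called just before
 * each RET_PF_RETRY exit in the page fault path (e.g. the mmu_notifier
 * retry / stale-root checks in direct_page_fault()), so the reason a
 * stuck vCPU keeps retrying shows up in dmesg without flooding it.
 */
static void note_pf_retry(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
			  const char *why)
{
	pr_info_ratelimited("kvm: vcpu %d retrying fault at gfn 0x%llx: %s\n",
			    vcpu->vcpu_id, fault->gfn, why);
}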
