From: Brian Rak <brak@vultr.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Subject: Re: Deadlock due to EPT_VIOLATION
Date: Wed, 31 May 2023 13:40:17 -0400
Message-ID: <7fb24485-5049-a64c-0f62-bedebbc5eec2@gameservers.com>
In-Reply-To: <ZHZCEUzr9Ak7rkjG@google.com>


On 5/30/2023 2:36 PM, Sean Christopherson wrote:
> On Tue, May 30, 2023, Brian Rak wrote:
>> On 5/26/2023 5:02 PM, Sean Christopherson wrote:
>>> On Fri, May 26, 2023, Brian Rak wrote:
>>>> On 5/24/2023 9:39 AM, Brian Rak wrote:
>>>>> On 5/23/2023 12:22 PM, Sean Christopherson wrote:
>>>>>> The other thing that would be helpful would be getting kernel stack
>>>>>> traces of the
>>>>>> relevant tasks/threads.  The vCPU stack traces won't be interesting,
>>>>>> but it'll
>>>>>> likely help to see what the fallocate() tasks are doing.
>>>>> I'll see what I can come up with here, I was running into some
>>>>> difficulty getting useful stack traces out of the VM
>>>> I didn't have any luck gathering guest-level stack traces - kaslr makes it
>>>> pretty difficult even if I have the guest kernel symbols.
>>> Sorry, I was hoping to get host stack traces, not guest stack traces.  I am hoping
>>> to see what the fallocate() in the *host* is doing.
>> Ah - here's a different instance of it with a full backtrace from the host:
> Gah, I wasn't specific enough again.  Though there's no longer an fallocate() for
> any of the threads, so that's probably a moot point.  What I wanted to see is what
> exactly the host kernel was doing, e.g. if something in the host memory management
> was indirectly preventing vCPUs from making forward progress.  But that doesn't
> seem to be the case here, and I would expect other problems if fallocate() was
> stuck.  So ignore that request for now.
>
>>> Another datapoint that might provide insight would be seeing if/how KVM's page
>>> faults stats change, e.g. look at /sys/kernel/debug/kvm/pf_* multiple times when
>>> the guest is stuck.
>> It looks like pf_taken is the only real one incrementing:
> Drat.  That's what I expected, but it doesn't narrow down the search much.
>
>>> Are you able to run modified host kernels?  If so, the easiest next step, assuming
>>> stack traces don't provide a smoking gun, would be to add printks into the page
>>> fault path to see why KVM is retrying instead of installing a SPTE.
>> We can, but it can take quite some time from when we do the update to
>> actually seeing results.  This problem is inconsistent at best, and even
>> though we're seeing it a ton of times a day, it can show up anywhere.
>> Even if we rolled it out today, we'd still be looking at weeks/months before
>> we had any significant number of machines on it.
> Would you be able to run a bpftrace program on a host with a stuck guest?  If so,
> I believe I could craft a program for the kvm_exit tracepoint that would rule out
> or confirm two of the three likely culprits.
>
> Can you also dump the kvm.ko module params?  E.g. `tail /sys/module/kvm/parameters/*`

Yes, we can run bpftrace programs.

# tail /sys/module/kvm/parameters/*
==> /sys/module/kvm/parameters/eager_page_split <==
Y

==> /sys/module/kvm/parameters/enable_pmu <==
Y

==> /sys/module/kvm/parameters/enable_vmware_backdoor <==
N

==> /sys/module/kvm/parameters/flush_on_reuse <==
N

==> /sys/module/kvm/parameters/force_emulation_prefix <==
0

==> /sys/module/kvm/parameters/halt_poll_ns <==
200000

==> /sys/module/kvm/parameters/halt_poll_ns_grow <==
2

==> /sys/module/kvm/parameters/halt_poll_ns_grow_start <==
10000

==> /sys/module/kvm/parameters/halt_poll_ns_shrink <==
0

==> /sys/module/kvm/parameters/ignore_msrs <==
N

==> /sys/module/kvm/parameters/kvmclock_periodic_sync <==
Y

==> /sys/module/kvm/parameters/lapic_timer_advance_ns <==
-1

==> /sys/module/kvm/parameters/min_timer_period_us <==
200

==> /sys/module/kvm/parameters/mitigate_smt_rsb <==
N

==> /sys/module/kvm/parameters/mmio_caching <==
Y

==> /sys/module/kvm/parameters/nx_huge_pages <==
Y

==> /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms <==
0

==> /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio <==
60

==> /sys/module/kvm/parameters/pi_inject_timer <==
0

==> /sys/module/kvm/parameters/report_ignored_msrs <==
Y

==> /sys/module/kvm/parameters/tdp_mmu <==
Y

==> /sys/module/kvm/parameters/tsc_tolerance_ppm <==
250

==> /sys/module/kvm/parameters/vector_hashing <==
Y
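
To gather the same parameter dump from several hosts in a form that is easy
to diff, a minimal Python sketch along these lines may help. It reads the
same files that the `tail` command above prints. The helper name
read_module_params is an assumption made for illustration, not an existing
tool, and the default path is simply the directory used above.

```python
from pathlib import Path


def read_module_params(param_dir="/sys/module/kvm/parameters"):
    """Return {parameter_name: value} for each readable file in param_dir.

    Hypothetical helper: mirrors `tail /sys/module/kvm/parameters/*`
    but returns a dict so that dumps from two hosts can be compared.
    """
    root = Path(param_dir)
    params = {}
    if not root.is_dir():
        # Module not loaded (or wrong path): report nothing rather than fail.
        return params
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            continue
    return params
```

Calling read_module_params() on two hosts and comparing the resulting dicts
shows which parameters differ between them.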



