From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use
Date: Thu, 27 May 2021 10:01:20 +0200
Message-ID: <87a6og7ghb.fsf@vitty.brq.redhat.com>
In-Reply-To: <5a6314ff3c7b9cc8e6bdf452008ad1b264c95608.camel@redhat.com>

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Mon, 2021-05-24 at 14:44 +0200, Vitaly Kuznetsov wrote:
>> Maxim Levitsky <mlevitsk@redhat.com> writes:
>> 
>> > On Mon, 2021-05-17 at 15:50 +0200, Vitaly Kuznetsov wrote:
>> > > Changes since v1 (Sean):
>> > > - Drop now-unneeded curly braces in nested_sync_vmcs12_to_shadow().
>> > > - Pass 'evmcs->hv_clean_fields' instead of 'bool from_vmentry' to
>> > >   copy_enlightened_to_vmcs12().
>> > > 
>> > > Commit f5c7e8425f18 ("KVM: nVMX: Always make an attempt to map eVMCS after
>> > > migration") fixed the most obvious reason why Hyper-V on KVM (e.g. Win10
>> > >  + WSL2) was crashing immediately after migration. It was also reported
>> > > that we have more issues to fix as, while the failure rate was lowered
>> > > significantly, it was still possible to observe crashes after several
>> > > dozen migrations. Turns out, the issue arises when we manage to issue
>> > > KVM_GET_NESTED_STATE right after L2->L1 VMEXIT but before L1 gets a chance
>> > > to run. This state is tracked with 'need_vmcs12_to_shadow_sync' flag but
>> > > the flag itself is not part of saved nested state. A few other less 
>> > > significant issues are fixed along the way.
>> > > 
>> > > While there's no proof this series fixes all eVMCS related problems,
>> > > Win10+WSL2 was able to survive 3333 (thanks, Max!) migrations without
>> > > crashing in testing.
>> > > 
>> > > Patches are based on the current kvm/next tree.
>> > > 
>> > > Vitaly Kuznetsov (7):
>> > >   KVM: nVMX: Introduce nested_evmcs_is_used()
>> > >   KVM: nVMX: Release enlightened VMCS on VMCLEAR
>> > >   KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in
>> > >     vmx_get_nested_state()
>> > >   KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid()
>> > >   KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02()
>> > >   KVM: nVMX: Request to sync eVMCS from VMCS12 after migration
>> > >   KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never
>> > >     lost
>> > > 
>> > >  arch/x86/kvm/vmx/nested.c                     | 110 ++++++++++++------
>> > >  .../testing/selftests/kvm/x86_64/evmcs_test.c |  64 +++++-----
>> > >  2 files changed, 115 insertions(+), 59 deletions(-)
>> > > 
>> > 
>> > Hi Vitaly!
>> > 
>> > In addition to the review of this patch series,
>> 
>> Thanks by the way!
> No problem!
>
>> 
>> >  I would like
>> > to share an idea on how to avoid the hack of mapping the evmcs
>> > in nested_vmx_vmexit, because I think I found a possible generic
>> > solution to this and similar issues:
>> > 
>> > The solution is to always set nested_run_pending after 
>> > nested migration (which means that we won't really
>> > need to migrate this flag anymore).
>> > 
>> > I was thinking a lot about it and I think that there is no downside to this,
>> > other than sometimes one extra vmexit after migration.
>> > 
>> > Otherwise there is always a risk of the following scenario:
>> > 
>> >   1. We migrate with nested_run_pending=0 (but don't restore all the state
>> >   yet, like that HV_X64_MSR_VP_ASSIST_PAGE msr,
>> >   or just the guest memory map is not up to date, guest is in smm or something
>> >   like that)
>> > 
>> >   2. Userspace calls some ioctl that causes a nested vmexit
>> > 
>> >   This can happen today if the userspace calls 
>> >   kvm_arch_vcpu_ioctl_get_mpstate -> kvm_apic_accept_events -> kvm_check_nested_events
>> > 
>> >   3. Userspace finally sets correct guest's msrs, correct guest memory map and only
>> >   then calls KVM_RUN
>> > 
>> > This means that at (2) we can't map and write the evmcs/vmcs12/vmcb12 even
>> > if KVM_REQ_GET_NESTED_STATE_PAGES is pending,
>> > but we have to do so to complete the nested vmexit.
>> 
>> Why do we need to write to eVMCS to complete vmexit? AFAICT, there's
>> only one place which calls copy_vmcs12_to_enlightened():
>> nested_sync_vmcs12_to_shadow() which, in its turn, has only 1 caller:
>> vmx_prepare_switch_to_guest() so unless userspace decided to execute
>> not-fully-restored guest this should not happen. I'm probably missing
>> something in your scenario)
> You are right! 
> The evmcs write is delayed to the next vmentry.
>
> However, we are now mapping the evmcs during nested vmexit,
> and this can fail, for example if the HV assist msr is not up to date.
>
> For example consider this: 
>
> 1. Userspace first sets nested state
> 2. Userspace calls KVM_GET_MP_STATE.
> 3. Nested vmexit that happened in 2 will end up not being able to map the evmcs,
> since HV_ASSIST msr is not yet loaded.
>
>
> Also the vmcb write (that is for SVM) _is_ done right away on nested vmexit 
> and conceptually has the same issue.
> (if memory map is not up to date, we might not be able to read/write the 
> vmcb12 on nested vmexit)
>

It seems we have one correct way to restore a guest and a number of
incorrect ones :-) It may happen that this is not even a nested-only
thing (think about trying to restore caps, regs, msrs, cpuids, in a
random sequence). I'd vote for documenting the right one somewhere, even
if we'll just be extracting it from QEMU.

>
>> 
>> > To some extent, the entry to the nested mode after a migration is only complete
>> > when we process the KVM_REQ_GET_NESTED_STATE_PAGES, so we shouldn't interrupt it.
>> > 
>> > This will allow us to avoid dealing with KVM_REQ_GET_NESTED_STATE_PAGES on
>> > nested vmexit path at all. 
>> 
>> Remember, we have three possible states when nested state is
>> transferred:
>> 1) L2 was running
>> 2) L1 was running
>> 3) We're in between L2 and L1 (need_vmcs12_to_shadow_sync = true).
>
> I understand. This suggestion wasn't meant to fix the case 3, but more to fix
> case 1, where we are in L2, migrate, and then immediately decide to 
> do a nested vmexit before we processed the KVM_REQ_GET_NESTED_STATE_PAGES
> request, and potentially also before the guest state was fully uploaded
> (see that KVM_GET_MP_STATE thing).
>  
> In a nutshell, I vote for not allowing nested vmexits from the moment
> when we set the nested state and until the moment we enter the nested
> guest once (maybe with request for immediate vmexit),
> because during this time period, the guest state is not fully consistent.
>

Using 'nested_run_pending=1' perhaps? Or, we can get back to the 'vm_bugged'
idea and kill the guest immediately if something forces such an exit.
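
To make the two options concrete, here is a minimal toy model in plain C.
It is only a sketch under stated assumptions: 'struct toy_vcpu' and all
toy_* functions are hypothetical and are not KVM code; only the flag name
nested_run_pending comes from this thread. Setting the flag when nested
state is restored blocks a nested vmexit until the first run has mapped
the pages, and the assert() stands in for the 'vm_bugged' alternative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only. The struct and the toy_* helpers are hypothetical and
 * do not correspond to real KVM code; only the flag name
 * nested_run_pending is taken from the discussion.
 */
struct toy_vcpu {
	bool in_l2;              /* vCPU is currently in nested (L2) mode */
	bool nested_run_pending; /* L2 entry started but not yet completed */
	bool pages_mapped;       /* stand-in for "nested state pages mapped" */
};

/* Models restoring nested state for a vCPU that was running L2. */
static void toy_set_nested_state(struct toy_vcpu *v)
{
	v->in_l2 = true;
	v->pages_mapped = false;      /* mapping is deferred to the first run */
	v->nested_run_pending = true; /* the idea: always set after migration */
}

/*
 * Models an event check triggered by some ioctl before the first run.
 * Returns true if a nested vmexit (L2 -> L1) was performed.
 */
static bool toy_check_nested_events(struct toy_vcpu *v)
{
	if (!v->in_l2 || v->nested_run_pending)
		return false; /* exit is not allowed yet */

	/* Stand-in for the 'vm_bugged' idea: exiting unmapped is fatal. */
	assert(v->pages_mapped);
	v->in_l2 = false;
	return true;
}

/* Models the first run: pages get mapped and the L2 entry completes. */
static void toy_first_run(struct toy_vcpu *v)
{
	v->pages_mapped = true;
	v->nested_run_pending = false;
}

/* Returns 0 if no nested vmexit happens before the first run. */
static int toy_demo(void)
{
	struct toy_vcpu v = { 0 };

	toy_set_nested_state(&v);
	if (toy_check_nested_events(&v))
		return 1; /* exited before the state was complete */

	toy_first_run(&v);
	if (!toy_check_nested_events(&v))
		return 2; /* exit should be possible after the first run */

	return v.pages_mapped ? 0 : 3;
}
```

In this model the early event check is simply refused, so the only cost
is that the exit happens after the first run instead of before it.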

-- 
Vitaly


