kvm.vger.kernel.org archive mirror
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use
Date: Thu, 27 May 2021 17:11:06 +0300
Message-ID: <3b76c3da7af87c576862fa6a538505fe89a47702.camel@redhat.com>
In-Reply-To: <87a6og7ghb.fsf@vitty.brq.redhat.com>

On Thu, 2021-05-27 at 10:01 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > On Mon, 2021-05-24 at 14:44 +0200, Vitaly Kuznetsov wrote:
> > > Maxim Levitsky <mlevitsk@redhat.com> writes:
> > > 
> > > > On Mon, 2021-05-17 at 15:50 +0200, Vitaly Kuznetsov wrote:
> > > > > Changes since v1 (Sean):
> > > > > - Drop now-unneeded curly braces in nested_sync_vmcs12_to_shadow().
> > > > > - Pass 'evmcs->hv_clean_fields' instead of 'bool from_vmentry' to
> > > > >   copy_enlightened_to_vmcs12().
> > > > > 
> > > > > Commit f5c7e8425f18 ("KVM: nVMX: Always make an attempt to map eVMCS after
> > > > > migration") fixed the most obvious reason why Hyper-V on KVM (e.g. Win10
> > > > > > + WSL2) was crashing immediately after migration. It was also reported
> > > > > > that we have more issues to fix as, while the failure rate was lowered
> > > > > > significantly, it was still possible to observe crashes after several
> > > > > > dozen migrations. Turns out, the issue arises when we manage to issue
> > > > > > KVM_GET_NESTED_STATE right after L2->L1 VMEXIT but before L1 gets a chance
> > > > > to run. This state is tracked with 'need_vmcs12_to_shadow_sync' flag but
> > > > > the flag itself is not part of saved nested state. A few other less 
> > > > > significant issues are fixed along the way.
> > > > > 
> > > > > While there's no proof this series fixes all eVMCS related problems,
> > > > > Win10+WSL2 was able to survive 3333 (thanks, Max!) migrations without
> > > > > crashing in testing.
> > > > > 
> > > > > Patches are based on the current kvm/next tree.
> > > > > 
> > > > > Vitaly Kuznetsov (7):
> > > > >   KVM: nVMX: Introduce nested_evmcs_is_used()
> > > > >   KVM: nVMX: Release enlightened VMCS on VMCLEAR
> > > > >   KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in
> > > > >     vmx_get_nested_state()
> > > > >   KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid()
> > > > >   KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02()
> > > > >   KVM: nVMX: Request to sync eVMCS from VMCS12 after migration
> > > > >   KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never
> > > > >     lost
> > > > > 
> > > > >  arch/x86/kvm/vmx/nested.c                     | 110 ++++++++++++------
> > > > >  .../testing/selftests/kvm/x86_64/evmcs_test.c |  64 +++++-----
> > > > >  2 files changed, 115 insertions(+), 59 deletions(-)
> > > > > 
> > > > 
> > > > Hi Vitaly!
> > > > 
> > > > In addition to the review of this patch series,
> > > 
> > > Thanks by the way!
> > No problem!
> > 
> > > >  I would like
> > > > to share an idea on how to avoid the hack of mapping the evmcs
> > > > in nested_vmx_vmexit, because I think I found a possible generic
> > > > solution to this and similar issues:
> > > > 
> > > > The solution is to always set nested_run_pending after 
> > > > nested migration (which means that we won't really
> > > > need to migrate this flag anymore).
> > > > 
> > > > I was thinking a lot about it and I think that there is no downside to this,
> > > > other than an occasional extra vmexit after migration.
> > > > 
> > > > Otherwise there is always a risk of the following scenario:
> > > > 
> > > >   1. We migrate with nested_run_pending=0 (but don't restore all the state
> > > >   yet, e.g. the HV_X64_MSR_VP_ASSIST_PAGE MSR is not set yet,
> > > >   the guest memory map is not up to date, the guest is in SMM, or something
> > > >   like that)
> > > > 
> > > >   2. Userspace calls some ioctl that causes a nested vmexit
> > > > 
> > > >   This can happen today if the userspace calls 
> > > >   kvm_arch_vcpu_ioctl_get_mpstate -> kvm_apic_accept_events -> kvm_check_nested_events
> > > > 
> > > >   3. Userspace finally sets the correct guest MSRs and guest memory map,
> > > >   and only then calls KVM_RUN
> > > > 
> > > > This means that at (2) we can't map and write the evmcs/vmcs12/vmcb12 even
> > > > if KVM_REQ_GET_NESTED_STATE_PAGES is pending,
> > > > but we have to do so to complete the nested vmexit.
> > > 
> > > Why do we need to write to eVMCS to complete vmexit? AFAICT, there's
> > > only one place which calls copy_vmcs12_to_enlightened():
> > > nested_sync_vmcs12_to_shadow() which, in its turn, has only 1 caller:
> > > vmx_prepare_switch_to_guest() so unless userspace decided to execute
> > > not-fully-restored guest this should not happen. I'm probably missing
> > > something in your scenario)
> > You are right! 
> > The evmcs write is delayed to the next vmentry.
> > 
> > However, we are now mapping the evmcs during nested vmexit,
> > and this can fail, for example if the HV assist MSR is not up to date.
> > 
> > For example consider this: 
> > 
> > 1. Userspace first sets nested state
> > 2. Userspace calls KVM_GET_MP_STATE.
> > 3. The nested vmexit that happens in step 2 will end up not being able to
> > map the evmcs, since the HV_ASSIST MSR is not yet loaded.
> > 
> > 
> > Also the vmcb write (that is for SVM) _is_ done right away on nested vmexit 
> > and conceptually has the same issue.
> > (if memory map is not up to date, we might not be able to read/write the 
> > vmcb12 on nested vmexit)
> > 
> 
> It seems we have one correct way to restore a guest and a number of
> incorrect ones :-) It may happen that this is not even a nested-only
> thing (think about trying to restore caps, regs, msrs, cpuids, in a
> random sequence). I'd vote for documenting the right one somewhere, even
> if we'll just be extracting it from QEMU.
> 
> > > > To some extent, the entry to the nested mode after a migration is only complete
> > > > when we process the KVM_REQ_GET_NESTED_STATE_PAGES, so we shouldn't interrupt it.
> > > > 
> > > > This will allow us to avoid dealing with KVM_REQ_GET_NESTED_STATE_PAGES on
> > > > nested vmexit path at all. 
> > > 
> > > Remember, we have three possible states when nested state is
> > > transferred:
> > > 1) L2 was running
> > > 2) L1 was running
> > > 3) We're in between L2 and L1 (need_vmcs12_to_shadow_sync = true).
> > 
> > I understand. This suggestion wasn't meant to fix the case 3, but more to fix
> > case 1, where we are in L2, migrate, and then immediately decide to 
> > do a nested vmexit before we have processed the KVM_REQ_GET_NESTED_STATE_PAGES
> > request, and potentially also before the guest state was fully uploaded
> > (see that KVM_GET_MP_STATE thing).
> >  
> > In a nutshell, I vote for not allowing nested vmexits from the moment
> > when we set the nested state and until the moment we enter the nested
> > guest once (maybe with request for immediate vmexit),
> > because during this time period, the guest state is not fully consistent.
> > 
> 
> Using 'nested_run_pending=1' perhaps? Or, we can get back to 'vm_bugged'
> idea and kill the guest immediately if something forces such an exit.

Exactly, this is my idea: always set nested_run_pending=1 after the migration.
It shouldn't cause any issues, and it would avoid cases like that.

The variable could then be renamed to something like
'nested_vmexit_not_allowed'.
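
To illustrate the idea, here is a minimal sketch. It is a stand-alone
user-space toy model, not KVM code: every name in it except
nested_run_pending is hypothetical, and the single 'assist_page_restored'
flag stands in for whatever guest state userspace has not uploaded yet.

```c
/* Toy model only; NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
	bool guest_mode;           /* vCPU is currently in L2 */
	bool nested_run_pending;   /* nested vmexit must not happen yet */
	bool assist_page_restored; /* stand-in for not-yet-restored state */
};

enum toy_result { TOY_EXIT_DONE, TOY_EXIT_DEFERRED, TOY_EXIT_FAILED };

/* Models userspace setting nested state for a vCPU that was in L2. */
static void toy_set_nested_state(struct toy_vcpu *v, bool force_pending)
{
	v->guest_mode = true;
	v->nested_run_pending = force_pending;
	v->assist_page_restored = false;
}

/* Models an ioctl path that wants to do a nested vmexit. */
static enum toy_result toy_try_nested_vmexit(struct toy_vcpu *v)
{
	if (!v->guest_mode)
		return TOY_EXIT_DONE;     /* already in L1, nothing to do */
	if (v->nested_run_pending)
		return TOY_EXIT_DEFERRED; /* wait for the first nested entry */
	if (!v->assist_page_restored)
		return TOY_EXIT_FAILED;   /* the evmcs cannot be mapped yet */
	v->guest_mode = false;
	return TOY_EXIT_DONE;
}

/* Models the first run after restore: the nested entry is now complete. */
static void toy_first_run(struct toy_vcpu *v)
{
	assert(v->guest_mode);
	v->nested_run_pending = false;
}

/* Set nested state, then hit a vmexit request before the rest is restored. */
static enum toy_result toy_scenario(bool force_pending)
{
	struct toy_vcpu v;

	toy_set_nested_state(&v, force_pending);
	return toy_try_nested_vmexit(&v);
}
```

In this model toy_scenario(false) ends in TOY_EXIT_FAILED, while
toy_scenario(true) only defers the vmexit until the remaining state has
been restored and toy_first_run() has been called.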

Paolo, what do you think?

Best regards,
	Maxim Levitsky

Thread overview: 36+ messages
2021-05-17 13:50 [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 1/7] KVM: nVMX: Introduce nested_evmcs_is_used() Vitaly Kuznetsov
2021-05-24 12:11   ` Maxim Levitsky
2021-05-24 12:35     ` Vitaly Kuznetsov
2021-05-26 14:34       ` Maxim Levitsky
2021-05-27  7:54         ` Vitaly Kuznetsov
2021-05-27 14:10           ` Maxim Levitsky
2021-05-24 13:54   ` Paolo Bonzini
2021-05-24 14:09     ` Vitaly Kuznetsov
2021-05-24 14:18       ` Paolo Bonzini
2021-05-24 14:37         ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 2/7] KVM: nVMX: Release enlightened VMCS on VMCLEAR Vitaly Kuznetsov
2021-05-24 12:13   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 3/7] KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in vmx_get_nested_state() Vitaly Kuznetsov
2021-05-24 12:26   ` Maxim Levitsky
2021-05-24 13:01     ` Vitaly Kuznetsov
2021-05-24 13:58       ` Paolo Bonzini
2021-05-26 14:44         ` Maxim Levitsky
2021-05-24 13:56   ` Paolo Bonzini
2021-05-24 14:12     ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 4/7] KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid() Vitaly Kuznetsov
2021-05-24 12:27   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 5/7] KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02() Vitaly Kuznetsov
2021-05-24 12:34   ` Maxim Levitsky
2021-05-24 13:07     ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 6/7] KVM: nVMX: Request to sync eVMCS from VMCS12 after migration Vitaly Kuznetsov
2021-05-24 12:35   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 7/7] KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never lost Vitaly Kuznetsov
2021-05-24 12:36   ` Maxim Levitsky
2021-05-24 12:08 ` [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use Maxim Levitsky
2021-05-24 12:44   ` Vitaly Kuznetsov
2021-05-26 14:41     ` Maxim Levitsky
2021-05-27  8:01       ` Vitaly Kuznetsov
2021-05-27 14:11         ` Maxim Levitsky [this message]
2021-05-27 14:17           ` Paolo Bonzini
2021-05-24 14:01 ` Paolo Bonzini
