All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE
@ 2021-06-10 20:39 Jim Mattson
  2021-06-11 10:31 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Mattson @ 2021-06-10 20:39 UTC (permalink / raw)
  To: kvm list; +Cc: Jan Kiszka, Paolo Bonzini

The call to kvm_apic_accept_events(vcpu) from
kvm_arch_vcpu_ioctl_get_mpstate() is just, oh, so wrong. First, it can
change all kinds of vCPU state, which imposes undocumented
dependencies on reliable orderings of the KVM_GET_* family of ioctls
for serializing VM state. But, even worse, it can modify guest memory,
even while all vCPU threads are stopped! Just follow the call chain
down to vmx_complete_nested_posted_interrupt(). That path wasn't there
when the call to kvm_apic_accept_events(vcpu) was added to
kvm_arch_vcpu_ioctl_get_mpstate() back in 2013, but it's there now.

I'm not entirely sure why this call was added in the first place. Was
this simply to avoid serializing the two new bits in
apic->pending_events? It seems that we now *do* serialize
KVM_APIC_INIT in the kvm_vcpu_events struct. Strangely, we call it
smi.latched_init, but it looks like we put the pending_events bit in
there all the time, regardless of what's happening with SMM.

Would anyone object to serializing KVM_APIC_SIPI in the
kvm_vcpu_events struct as well? Or would it be better to resurrect
KVM_MP_STATE_SIPI_RECEIVED? In any event, the call to
kvm_apic_accept_events(vcpu) from kvm_arch_vcpu_ioctl_get_mpstate()
has to go.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE
  2021-06-10 20:39 [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE Jim Mattson
@ 2021-06-11 10:31 ` Paolo Bonzini
  2021-06-14 21:19   ` Jim Mattson
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2021-06-11 10:31 UTC (permalink / raw)
  To: Jim Mattson, kvm list; +Cc: Jan Kiszka

On 10/06/21 22:39, Jim Mattson wrote:
> But, even worse, it can modify guest memory,
> even while all vCPU threads are stopped!

To some extent this is a userspace issue---they could declare vCPU 
threads stopped only after KVM_GET_MPSTATE is done, and only start the 
downtime phase of migration after that.  But it is nevertheless a pretty 
bad excuse.

> Was this simply to avoid serializing the two new bits in 
> apic->pending_events?

Yes, and more precisely to allow some interoperability between old and 
new kernels.

> Would anyone object to serializing KVM_APIC_SIPI in the 
> kvm_vcpu_events struct as well? Or would it be better to resurrect 
> KVM_MP_STATE_SIPI_RECEIVED?

Reusing KVM_VCPUEVENT_VALID_SIPI_VECTOR to mean KVM_APIC_SIPI is set 
would be nice, but it would break new->old migration (because old 
kernels only set KVM_APIC_SIPI if they see KVM_MP_STATE_SIPI_RECEIVED). 
  Can we decide this migration corner case is not something we care about?

Using KVM_MP_STATE_SIPI_RECEIVED solves interoperability issues because 
we never deleted the pre-2013 code, on the other hand 
KVM_MP_STATE_SIPI_RECEIVED assumes the existing mpstate is 
KVM_MP_STATE_INIT_RECEIVED; it does not account very well for the case 
of INIT+SIPI both being pending.  Unlike real hardware, KVM will queue a 
SIPI if it comes before the INIT has been processed, so that even in 
overcommit scenarios it is not possible to fail the INIT-SIPI; dropping 
kvm_apic_accept_events altogether would break this "tweak" across 
migration, which might cause failure to bring up APs.

If we start with just removing guest memory writes, there is an easy way 
out: the tweak does not work in guest mode, where INIT causes a vmexit 
and a pre-queued SIPI will never be delivered.  So, if in guest mode, we 
can ignore the case of pending INIT+SIPI, and only do a minimal version 
of kvm_apic_accept_events() that delays the larger side effects to the 
next KVM_RUN.

For a first improvement, the logic becomes the following:

* if in guest mode and both INIT and SIPI are set, clear KVM_APIC_SIPI 
and exit.  KVM_GET_VCPU_EVENTS will set latched_init.

* if in guest mode and SIPI is set, KVM_APIC_SIPI is transmitted to 
userspace via KVM_MP_STATE_SIPI_RECEIVED and 
KVM_VCPUEVENT_VALID_SIPI_VECTOR.

* if not in guest mode, run kvm_apic_accept_events.  Later this can be 
changed to drop KVM_APIC_SIPI if it's deemed preferrable.

I'll post a few RFC patches.  Selftests would be needed too.

Paolo


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE
  2021-06-11 10:31 ` Paolo Bonzini
@ 2021-06-14 21:19   ` Jim Mattson
  2021-06-15  8:52     ` Maxim Levitsky
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Mattson @ 2021-06-14 21:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list, Jan Kiszka

On Fri, Jun 11, 2021 at 3:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/06/21 22:39, Jim Mattson wrote:
> > But, even worse, it can modify guest memory,
> > even while all vCPU threads are stopped!
>
> To some extent this is a userspace issue---they could declare vCPU
> threads stopped only after KVM_GET_MPSTATE is done, and only start the
> downtime phase of migration after that.  But it is nevertheless a pretty
> bad excuse.

I agree that this could be fixed by documenting the behavior. Since I
don't think there's any existing documentation that says which ioctls
can modify guest memory, such a documentation change wouldn't actually
constitute an API breakage.

BTW, which ioctls can modify guest memory?

And, while we're at it, can we document the required orderings of the
various _GET_ and _SET_ ioctls for save and restore?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE
  2021-06-14 21:19   ` Jim Mattson
@ 2021-06-15  8:52     ` Maxim Levitsky
  0 siblings, 0 replies; 4+ messages in thread
From: Maxim Levitsky @ 2021-06-15  8:52 UTC (permalink / raw)
  To: Jim Mattson, Paolo Bonzini; +Cc: kvm list, Jan Kiszka

On Mon, 2021-06-14 at 14:19 -0700, Jim Mattson wrote:
> On Fri, Jun 11, 2021 at 3:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 10/06/21 22:39, Jim Mattson wrote:
> > > But, even worse, it can modify guest memory,
> > > even while all vCPU threads are stopped!
> > 
> > To some extent this is a userspace issue---they could declare vCPU
> > threads stopped only after KVM_GET_MPSTATE is done, and only start the
> > downtime phase of migration after that.  But it is nevertheless a pretty
> > bad excuse.
> 
> I agree that this could be fixed by documenting the behavior. Since I
> don't think there's any existing documentation that says which ioctls
> can modify guest memory, such a documentation change wouldn't actually
> constitute an API breakage.
> 
> BTW, which ioctls can modify guest memory?
> 
> And, while we're at it, can we document the required orderings of the
> various _GET_ and _SET_ ioctls for save and restore?
> 

I strongly vote to make KVM_GET_MP_STATE not change guest state.
It will backfire one day.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-06-15  8:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10 20:39 [RFC] x86: Eliminate VM state changes during KVM_GET_MP_STATE Jim Mattson
2021-06-11 10:31 ` Paolo Bonzini
2021-06-14 21:19   ` Jim Mattson
2021-06-15  8:52     ` Maxim Levitsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.