Re: [Qemu-devel] [PATCH 6/7] target/i386: add VMX features

From: Eduardo Habkost <ehabkost@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 6/7] target/i386: add VMX features
Date: Fri, 5 Jul 2019 19:33:29 -0300
Message-ID: <20190705223329.GL5198@habkost.net>
In-Reply-To: <6262c798-fc94-5100-8836-e3cbea306282@redhat.com>

On Sat, Jul 06, 2019 at 12:12:49AM +0200, Paolo Bonzini wrote:
> On 05/07/19 23:22, Eduardo Habkost wrote:
> >> +    switch (index) {
> >> +    case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> >> +        default1 = 0x00000016;
> >> +        break;
> >> +    case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
> >> +        default1 = 0x0401e172;
> >> +        break;
> >> +    case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> >> +        default1 = 0x000011ff;
> >> +        break;
> >> +    case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> >> +        default1 = 0x00036dff;
> >> +        break;
> >> +    case MSR_IA32_VMX_PROCBASED_CTLS2:
> >> +        default1 = 0;
> >> +        break;
> > How do you plan to implement backwards compatibility if these
> > defaults ever change?  Shouldn't these values be part of the CPU
> > model definitions so we can update them in the future?
> 
> These are not defaults, they are "default-1 bits": if a feature is
> disabled, these bits are 1 in both halves of the MSR rather than zero.
> The set of default-1 bits is documented and is not going to change in
> the future.
> 
> Some default-1 bits *could* however become features in the future, and
> four of these already have features associated with them:
> vmx-cr3-load-noexit, vmx-cr3-store-noexit, vmx-exit-nosave-debugctl,
> vmx-entry-noload-debugctl.  You can see that they have "no" in their
> name because the feature is about the ability to "do less" rather than
> "do more".

Understood.  Thanks!
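
To check my reading, here is a minimal sketch of how a control MSR
value could be composed from the default-1 mask and the enabled feature
bits.  The function name and variable names are hypothetical and not
taken from the patch.  It assumes the usual layout of the VMX capability
MSRs: the low 32 bits report controls that must be 1, the high 32 bits
report controls that may be 1.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not the patch's code).
 * default1: the default-1 mask for this MSR (e.g. 0x00000016 for pin-based)
 * features: control bits enabled as CPU features
 */
static uint64_t sketch_vmx_ctl_msr(uint32_t default1, uint32_t features)
{
    /* An enabled feature may be set or clear; otherwise default1 decides. */
    uint32_t can_be_one  = features | default1;
    uint32_t can_be_zero = features | ~default1;
    uint32_t must_be_one = ~can_be_zero;

    /* Low half: must-be-one bits.  High half: may-be-one bits. */
    return (uint64_t)must_be_one | ((uint64_t)can_be_one << 32);
}
```

With no features enabled, a default-1 bit ends up set in both halves,
which matches the description above.  Enabling one of the "no" features
(for example bit 15 of the processor-based controls) only clears that
bit in the low half, so the guest gains the ability to turn it off.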

> 
> >> +    uint64_t kvm_vmx_basic =
> >> +        kvm_arch_get_supported_msr_feature(kvm_state,
> >> +                                           MSR_IA32_VMX_BASIC);
> >> +    uint64_t kvm_vmx_misc =
> >> +        kvm_arch_get_supported_msr_feature(kvm_state,
> >> +                                           MSR_IA32_VMX_MISC);
> >> +    uint64_t kvm_vmx_ept_vpid =
> >> +        kvm_arch_get_supported_msr_feature(kvm_state,
> >> +                                           MSR_IA32_VMX_EPT_VPID_CAP);
> > 
> > If the MSR value we're exposing to the guest depends on
> > kvm_arch_get_supported_msr_feature(), how will we ensure this
> > will be safe for live migration?
> 
> Because KVM guarantees that this part of the guest ABI will never
> change.  These values do not come from the host values of the MSRs, they
> are fixed by KVM.  More details below.
> 
> > If we really need to tweak the MSR values based on the host for
> > some reason (which is not clear to me yet), why don't we update
> > env->features[...] at x86_cpu_expand_features() to reflect what
> > the guest is really seeing?
> > 
> > 
> >> +    /*
> >> +     * Bits 0-30, 32-44 and 50-53 come from the host.  KVM should
> >> +     * not change them for backwards compatibility.
> > 
> > Backwards compatibility with what?
> > 
> > Don't we want the MSR values to depend solely on the QEMU command
> > line in the future?
> 
> These bits are: VMCS revision, VMCS size and VMCS memory type.  QEMU
> cannot know them, as they depend on the internal implementation details
> of KVM.
> 
> Now that KVM supports nested virt live migration they cannot change
> anymore---otherwise KVM would break KVM live migration compatibility.
> However, theoretically in the future KVM could add some capability
> (which userspace would have to manually enable) and when the capability
> is enabled the values can change.

Oh, that's the info I was missing.  I always expected
kvm_arch_get_supported_*() to be subject to change (depending on
KVM and hardware capabilities), and not be part of guest ABI.
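
For reference, the three fields listed above can be pulled out of a
VMX_BASIC value as in the sketch below.  The struct and function names
are hypothetical; only the bit ranges (0-30, 32-44, 50-53) come from the
patch comment.

```c
#include <stdint.h>

/* Hypothetical container for the KVM-owned VMX_BASIC fields. */
struct sketch_vmx_basic {
    uint32_t vmcs_revision;   /* bits 0-30  */
    uint32_t vmcs_size;       /* bits 32-44 */
    uint32_t vmcs_memtype;    /* bits 50-53 */
};

/* Extract bits lo..hi (inclusive) of value; width must be below 64. */
static uint64_t sketch_extract(uint64_t value, unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1;
    return (value >> lo) & ((1ULL << width) - 1);
}

static struct sketch_vmx_basic sketch_decode_vmx_basic(uint64_t msr)
{
    struct sketch_vmx_basic f;

    f.vmcs_revision = (uint32_t)sketch_extract(msr, 0, 30);
    f.vmcs_size     = (uint32_t)sketch_extract(msr, 32, 44);
    f.vmcs_memtype  = (uint32_t)sketch_extract(msr, 50, 53);
    return f;
}

/* Mask covering exactly the bits that are taken from KVM. */
static uint64_t sketch_vmx_basic_kvm_mask(void)
{
    return 0x7fffffffULL | (0x1fffULL << 32) | (0xfULL << 50);
}
```

Everything outside that mask would then be expected to depend only on
the QEMU command line.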

Now, if KVM is going to implement the guest ABI guarantee at
KVM_GET_MSRS, that's OK.  Is this going to be obvious to people
touching KVM_GET_MSRS in the future?

What if we do want the guest ABI to change in the future?  How do
you expect QEMU to ask KVM to enable the new guest ABI?  How do
you expect the user to ask QEMU to enable the new guest ABI?

> 
> >> +    /*
> >> +     * Same for bits 0-4 and 25-27.  Bits 16-24 (CR3 target count) can
> >> +     * change in the future but are always zero for now, clear them to be
> >> +     * future proof.  Bits 32-63 in theory could change, though KVM does
> >> +     * not support dual-monitor treatment and probably never will; mask
> >> +     * them out as well.
> >> +     */
> 
> The reasoning is more or less the same here.  These bits are part of the
> guest ABI (preemption timer scaling, CR3 target count, MSR count, MSEG
> revision).  Right now bits 0-4 are 5 and the others are 0; in the future:
> 
> - KVM cannot change bits 0-4 and 32-63 without breaking guest ABI
> (the values must match between what you read and what you set)
> 
> - KVM could change bits 16-24, but it always allows writing a value that
> is _smaller_ than the one you read.  So I'm zeroing those, ensuring no
> future ABI changes.
> 
> - KVM could in theory change bits 25-27: here it also allows writing a
> value that is smaller than the one you read, so guest ABI is preserved.
> Such a change is very unlikely; all Intel silicon has always had 0
> here.  But I can change the code to zero these three bits just like bits
> 16-24.

The complex rules above make me a bit nervous.  Can we at least
make QEMU validate the values returned by
kvm_arch_get_supported_msr_feature() to catch ABI-breaking
mistakes in the future?

-- 
Eduardo