Re: [PATCH] libx86: Introduce x86_cpu_policy_calculate_compatible() with MSR_ARCH_CAPS handling

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>, Wei Liu <wl@xen.org>
Subject: Re: [PATCH] libx86: Introduce x86_cpu_policy_calculate_compatible() with MSR_ARCH_CAPS handling
Date: Wed, 5 May 2021 15:29:26 +0100
Message-ID: <38f5b74f-b005-784b-a92d-8ddb9e1b8d3c@citrix.com>
In-Reply-To: <YJKXZyCHpRg32tyc@Air-de-Roger>

On 05/05/2021 14:02, Roger Pau Monné wrote:
> On Wed, May 05, 2021 at 01:37:48PM +0100, Andrew Cooper wrote:
>> On 05/05/2021 11:04, Roger Pau Monné wrote:
>>> On Tue, May 04, 2021 at 10:31:20PM +0100, Andrew Cooper wrote:
>>>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>>>> index f6cea4e2f9..06039e8aa8 100644
>>>> --- a/xen/lib/x86/policy.c
>>>> +++ b/xen/lib/x86/policy.c
>>>> @@ -29,6 +29,9 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>>>>      if ( ~host->msr->platform_info.raw & guest->msr->platform_info.raw )
>>>>          FAIL_MSR(MSR_INTEL_PLATFORM_INFO);
>>>>  
>>>> +    if ( ~host->msr->arch_caps.raw & guest->msr->arch_caps.raw )
>>>> +        FAIL_MSR(MSR_ARCH_CAPABILITIES);
>>> It might be nice to expand test_is_compatible_{success,failure} to
>>> account for arch_caps being checked now.
>> At some point we're going to need to stop unit testing "does the AND
>> operator work", and limit testing to the interesting corner cases.
>>
>>> Shouldn't this check also take into account that host might not have
>>> RSBA set, but it's legit for a guest policy to have it?
>> When we expose this properly to guests, the max policies will have RSBA
>> set, and the default policies will have RSBA forwarded from hardware
>> and/or the model table.
>>
>> Therefore, we can accept any VM RSBA configuration, irrespective of the
>> particulars of this host, but if you e.g. have a pool of Haswells, the
>> default policy will have RSBA clear across the board, and the VM won't
>> see it.
>>
>>> if ( ~host->msr->arch_caps.raw & guest->msr->arch_caps.raw & ~POL_MASK )
>>>     FAIL_MSR(MSR_ARCH_CAPABILITIES);
>>>
>>> Maybe POL_MASK should be renamed and defined in a header so it's
>>> widely available?
>> No - this would be incorrect.  The polarity of certain bits only matters
>> for levelling calculations, not for "can this VM run on this host"
>> calculations.
>>
>> If the VM has seen RSBA, and Xen doesn't know about it, the VM cannot run.
> But then the logic relation between
> x86_cpu_policy_calculate_compatible and
> x86_cpu_policies_are_compatible is broken AFAICT.
>
> If you give x86_cpu_policy_calculate_compatible one policy with RSBA set
> and one without it will generate a compatible policy, yet that output
> will be regarded as not compatible if fed into
> x86_cpu_policies_are_compatible against the policy that doesn't have
> RSBA set.
>
> I think the output from x86_cpu_policy_calculate_compatible should
> strictly return true when checked against any of the inputs using
> x86_cpu_policies_are_compatible, or else we need to note it somewhere
> because I think it's not the expected behavior.

Welcome to the monumental complexity, and the reason why this isn't 5
minutes of work.  This is just the tip of the iceberg.

"Please create me a policy for a VM" is conducted across PV/HVM default
policies, and/or user settings, while "Can this VM run on this host" is
checked against the max policy.  This split is necessary to cope with
the corner cases.

So no - levelling max policies isn't expected to result in anything
useful, and calling is_compatible with a default (rather than max) host
setting also isn't going to result in a useful answer.

And yes - for some changes, RSBA included, you're going to need to
update Xen on every host in the pool before migration is going to work,
but that's already the case now.
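
To make the split concrete, here is a minimal sketch of both operations on a
single arch_caps word.  Everything in it is a simplification: the caps_*
helpers and rsba_scenario() are made-up names, POL_MASK is assumed to cover
only RSBA, and the real code works on struct cpu_policy rather than a bare
uint64_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAPS_RDCL_NO (UINT64_C(1) << 0)
#define ARCH_CAPS_RSBA    (UINT64_C(1) << 2)

/* Assumption: RSBA is the only inverted-polarity bit being levelled. */
#define POL_MASK ARCH_CAPS_RSBA

/* "Can this VM run on this host": the guest may not have seen any bit
 * which the host side of the comparison lacks.  No polarity handling. */
bool caps_compatible(uint64_t host, uint64_t guest)
{
    return !(~host & guest);
}

/* Levelling: AND the normal bits, OR the inverted-polarity ones. */
uint64_t caps_level(uint64_t a, uint64_t b)
{
    return ((a & b) & ~POL_MASK) | ((a | b) & POL_MASK);
}

/* The scenario from this thread, with illustrative values. */
void rsba_scenario(void)
{
    uint64_t def_a = ARCH_CAPS_RDCL_NO | ARCH_CAPS_RSBA; /* default, host A */
    uint64_t def_b = ARCH_CAPS_RDCL_NO;                  /* default, host B */
    uint64_t max_b = def_b | ARCH_CAPS_RSBA;             /* max has RSBA set */

    uint64_t vm = caps_level(def_a, def_b);

    assert(vm & ARCH_CAPS_RSBA);          /* levelling keeps RSBA */
    assert(!caps_compatible(def_b, vm));  /* checking against default fails */
    assert(caps_compatible(max_b, vm));   /* checking against max succeeds */
}
```

The second assertion is the breakage described above, and the third is the
comparison that is actually intended, given a max policy with RSBA set.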

Tangentially, we haven't started yet on

struct irritating_corner_cases {
    bool vm_not_using_fcs_fds;
    bool vm_not_using_lbr;
    ...
};

which will require explicit user opt-in to override the "No - you can't
migrate across the IvyBridge/Haswell, or pre-Zen/Zen, boundary".

Technically, MXCSR_MASK is also a hard blocker to migration, but we
don't even have that data in a consumable form, and we just might be
extremely lucky and discover that it is restricted to non-64-bit CPUs.

Migration with a VM having turned on LBR is still a disaster.  For now,
we drop everything on the floor, and let the VM explode if the
LBR_FORMAT has changed, or if the number of stack entries changes (which
does change with Hyperthreading enabled/disabled in firmware).
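
For illustration only, the check we'd eventually want might look something
like the sketch below.  Nothing like this exists today; struct lbr_info,
lbr_migration_ok() and lbr_scenario() are hypothetical names, and the format
and depth values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a host's LBR behaviour. */
struct lbr_info {
    uint8_t format;  /* LBR_FORMAT, i.e. PERF_CAPABILITIES bits 5:0 */
    uint8_t depth;   /* number of LBR stack entries */
};

/* A VM which has turned LBR on can only move between hosts which agree
 * on both the record format and the stack depth. */
bool lbr_migration_ok(bool vm_not_using_lbr,
                      const struct lbr_info *src,
                      const struct lbr_info *dst)
{
    if ( vm_not_using_lbr )
        return true;  /* the explicit opt-in: nothing to preserve */

    return src->format == dst->format && src->depth == dst->depth;
}

/* Same CPU, Hyperthreading toggled in firmware (illustrative values). */
void lbr_scenario(void)
{
    struct lbr_info ht_on  = { .format = 5, .depth = 16 };
    struct lbr_info ht_off = { .format = 5, .depth = 32 };

    assert(!lbr_migration_ok(false, &ht_on, &ht_off));
    assert(lbr_migration_ok(true, &ht_on, &ht_off));
    assert(lbr_migration_ok(false, &ht_on, &ht_on));
}
```

The vm_not_using_lbr flag is the same idea as the field in the struct
further up: without the opt-in, a mismatch has to be a hard "no".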

>>>> +
>>>>  #undef FAIL_MSR
>>>>  #undef FAIL_CPUID
>>>>  #undef NA
>>>> @@ -43,6 +46,50 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>>>>      return ret;
>>>>  }
>>>>  
>>>> +int x86_cpu_policy_calculate_compatible(const struct cpu_policy *a,
>>>> +                                        const struct cpu_policy *b,
>>>> +                                        struct cpu_policy *out,
>>>> +                                        struct cpu_policy_errors *err)
>>> I think this should be in an #ifndef __XEN__ protected region?
>>>
>>> There's no need to expose this to the hypervisor, as I would expect it
>>> will never have to do compatible policy generation? (ie: it will
>>> always be done by the toolstack?)
>> As indicated previously, I still think we want this in Xen for the boot
>> paths, but I suppose the guard was my suggestion to you, so is only fair
>> at this point.
> TBH I replied before seeing your email that also had this suggestion.
> If it's indeed going to be used by Xen itself then that's fine, but I
> couldn't figure out why the hypervisor would need to generate
> compatible policies itself.
>
> Maybe it will be used to generate the initial policies?

Yes.

>
>>>> +{
>>>> +    const struct cpuid_policy *ap = a->cpuid, *bp = b->cpuid;
>>>> +    const struct msr_policy *am = a->msr, *bm = b->msr;
>>>> +    struct cpuid_policy *cp = out->cpuid;
>>>> +    struct msr_policy *mp = out->msr;
>>>> +
>>>> +    memset(cp, 0, sizeof(*cp));
>>>> +    memset(mp, 0, sizeof(*mp));
>>>> +
>>>> +    cp->basic.max_leaf = min(ap->basic.max_leaf, bp->basic.max_leaf);
>>>> +
>>>> +    if ( cp->basic.max_leaf >= 7 )
>>>> +    {
>>>> +        cp->feat.max_subleaf = min(ap->feat.max_subleaf, bp->feat.max_subleaf);
>>>> +
>>>> +        cp->feat.raw[0].b = ap->feat.raw[0].b & bp->feat.raw[0].b;
>>>> +        cp->feat.raw[0].c = ap->feat.raw[0].c & bp->feat.raw[0].c;
>>>> +        cp->feat.raw[0].d = ap->feat.raw[0].d & bp->feat.raw[0].d;
>>>> +    }
>>>> +
>>>> +    /* TODO: Far more. */
>>> Right, my proposed patch (07/13) went a bit further and also leveled
>>> 1c, 1d, Da1, e1c, e1d, e7d, e8b and e21a, and we also need to level
>>> a couple of max_leaf fields.
>>>
>>> I'm happy for this to go in first, and I can rebase the extra logic I
>>> have on top of this one.
>> There is a lot of work to do.
>>
>> One thing I haven't addressed yet is the things which don't
>> level, e.g. vendor.  You've got to pick one, and there isn't a
>> mathematical relationship to use between a and b.
>>
>> I think for that, we ought to document that we strictly take from a. 
>> This makes the operation not commutative, and in particular, I don't
>> think we want to waste too much time/effort trying to make cross-vendor
>> cases work - it was a stunt a decade ago, with a huge number of sharp
>> corners, as well as creating a number of XSAs due to poor implementation.
>>
>> For v1, I suggest we firmly stick to the same-vendor case.  It's not as
>> if there is a lack of things to do to make this work.
> OK, so level all the feature fields and pick the non feature parts of
> cpuid strictly from one of the inputs.

The awkward part to address is that we've still got simultaneous
equations with feature handling.  I'm fairly certain that the simple
ANDs which both you and I did won't be sufficient in due course.
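
As a starting point, the v1 rule discussed above (same vendor only, level
the feature words, take the non-feature parts strictly from a) could be
sketched roughly as follows.  struct toy_policy and toy_level() are
invented for the example and are far smaller than the real struct
cpu_policy; the plain AND is exactly the part I expect to need replacing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily cut-down policy. */
struct toy_policy {
    uint32_t vendor;    /* non-feature: does not level */
    uint32_t fms;       /* non-feature: family/model/stepping */
    uint32_t max_leaf;  /* levels with min() */
    uint32_t feat_7b;   /* feature word: levels with AND */
};

int toy_level(const struct toy_policy *a, const struct toy_policy *b,
              struct toy_policy *out)
{
    if ( a->vendor != b->vendor )
        return -EINVAL;  /* v1: same-vendor case only */

    /* Non-feature fields come strictly from a, so the operation is
     * not commutative. */
    out->vendor = a->vendor;
    out->fms    = a->fms;

    out->max_leaf = a->max_leaf < b->max_leaf ? a->max_leaf : b->max_leaf;
    out->feat_7b  = out->max_leaf >= 7 ? (a->feat_7b & b->feat_7b) : 0;

    return 0;
}

void toy_scenario(void)
{
    struct toy_policy a = { .vendor = 1, .fms = 0x306c3,
                            .max_leaf = 13, .feat_7b = 0x29 };
    struct toy_policy b = { .vendor = 1, .fms = 0x306a9,
                            .max_leaf = 13, .feat_7b = 0x21 };
    struct toy_policy ab, ba;

    assert(toy_level(&a, &b, &ab) == 0 && toy_level(&b, &a, &ba) == 0);
    assert(ab.feat_7b == 0x21 && ba.feat_7b == 0x21); /* features agree */
    assert(ab.fms != ba.fms);                         /* the rest doesn't */

    b.vendor = 2;
    assert(toy_level(&a, &b, &ab) == -EINVAL);
}
```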

~Andrew