Re: [PATCH 0/7] i386: Add `machine` parameter to query-cpu-definitions

From: David Hildenbrand <david@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	qemu-devel@nongnu.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Jiri Denemark <jdenemar@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH 0/7] i386: Add `machine` parameter to query-cpu-definitions
Date: Fri, 25 Oct 2019 19:19:55 +0200
Message-ID: <92cda748-8cb0-e95d-8fe1-4f9632762f64@redhat.com>
In-Reply-To: <20191025150040.GC3581@redhat.com>

>> I once was told that if a user actually specified an explicit CPU model in
>> the libvirt XML ("haswell-whatever"), you should not go ahead and make any
>> later changes to that model (guest ABI should not change when you
>> update/restart the guest ...). So this only applies when creating new
>> guests? Or will you change existing model definitions implicitly?
> 
> Libvirt will only ever expand a bare CPU model at the time it first parses
> the XML. So if a mgmt app defines a new persistent guest in libvirt, the
> CPU is expanded then and remains unchanged thereafter, in order to preserve
> ABI compat.

Okay, perfect.

> 
> If using transient guests it's different, as libvirt doesn't store the config
> on disk when the guest isn't running. So mgmt apps using transient guests
> are responsible for picking an explicit versioned model themselves if they
> need stable ABI.

That makes sense.

> 
>>>> Then you can specify "-cpu z13-vX" or "-cpu z13 -cpuv X" (no idea how
>>>> versioned CPU models were implemented) on any QEMU machine. Which is the same
>>>> as telling your customer "please use z13,featX=on" in case you have a good
>>>> reason to not use the host model (along with baselining) but use an explicit
>>>> model.
>>>>
>>>> If you can change the default model of QEMU machines, you can automate this
>>>> process. I am pretty sure this is a corner case, though (e.g., IBRS).
>>>> Usually you have a new QEMU machine and can simply enable the new feature
>>>> from that point on.
>>>
>>> There are now 4 Haswell variants, only some of which are runnable
>>> on any given host, depending on what microcode the user has installed
>>> or what particular Haswell silicon SKU the user purchased. Given the
>>> frequency of new CPU flaws arriving since the first Meltdown/Spectre,
>>> this isn't a corner case, at least for the x86 world & Intel in
>>> particular. Other arches/vendors haven't been quite so badly affected
>>> in this way.
>>
>> On s390x you can assume that such firmware/microcode updates will be on any
>> machine after some time. That is a big difference to x86-64 AFAIK.
> 
> I don't know s390x much, but can we really assume that users promptly
> install firmware updates, any better than users do for x86 or other
> arch? IME corporate bureaucracy can drag out time to update arbitrarily
> long.

That's what you get when you pay premium prices for premium support :D

The real issue when it comes to CPU models on s390x is the variance of 
features of a specific model across environments (especially different 
hypervisors).

>>> If we tied each new Haswell variant to a machine type, then users would
>>> be blocked from consuming a new machine type depending on runnability of
>>> the CPU model. This is not at all desirable, as mgmt apps now have complex
>>> rules on what machine type they can use.
>>
>> So you actually want different CPU variants, which you have already, just in
>> a different form. (e.g., "haswell" will be mapped to "haswell-whatever",
>> just differently via versions)
> 
> Yes, you can think of "Haswell", "Haswell-noTSX", "Haswell-noTSX-IBRS"
> as all being versions of the same thing. There was never any explicit
> association or naming though. So what's changing is that we're defining
> a sane naming scheme for the variants of each model so we don't end
> up with "Haswell-noTSX-IBRS-SSBD-MDS-WHATEVER-NEXT-INTEL-FLAW-IS",
> and we're declaring that a bare "Haswell" will expand to some "best"
> version depending on machine type (but also selectable by mgmt app
> above).

I mean, all you really want is a way to specify "give me the best
haswell you can do with this accelerator", which *could* map to "-cpu
haswell,tsx=off,ibrs=on,ssbd=on" ... but also to something else,
depending on the HW.

I really don't see why we need versioned CPU models for that, all you 
want to do is apply delta updates to the initial model if possible on 
the current accelerator. Just like HW does. See below for a simpler 
approach.

> 
>>> Both these called for making CPU versioning independent of machine
>>> type versioning.
>>>
>>> Essentially the goal with CPU versioning is that the user can request
>>> a bare "Haswell" and libvirt (or the mgmt app) will automatically
>>> expand this to the best Haswell version that the host is able to
>>> support with its CPUs / microcode / BIOS config combination.
>>
>>
>> So if I do a "-cpu haswell -M whatever-machine", as far as I understood
>> reading this, I get the "default CPU model alias for that QEMU machine" and
>> *not* the "best Haswell version that the host is able to support".
>>
>> Or does the default actually also depend on the current host?
> 
> At the QEMU level "haswell" will expand to a particular CPU version
> per machine type. So yes, at the QEMU level machine types might have
> a dependency on the host.
> 
> Above QEMU though, libvirt/mgmt apps can be more dynamic in how they
> expand a bare "haswell" to take account of what the host supports.
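
To make sure I read the versioned approach correctly, here is a rough
Python sketch of the expansion a mgmt layer would have to do. Everything
in it is an assumption on my side: the version names, the feature sets
and the rule "the last runnable entry wins" are placeholders, not what
QEMU or libvirt actually implement.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical version table, least preferred first. The names and the
# feature sets are placeholders, not QEMU's real model definitions.
HASWELL_VERSIONS: List[Tuple[str, FrozenSet[str]]] = [
    ("Haswell-v1", frozenset({"base", "tsx"})),
    ("Haswell-v2", frozenset({"base"})),
    ("Haswell-v3", frozenset({"base", "tsx", "ibrs"})),
    ("Haswell-v4", frozenset({"base", "ibrs"})),
]


def pick_best_runnable(
    versions: List[Tuple[str, FrozenSet[str]]],
    host_features: FrozenSet[str],
) -> Optional[str]:
    """Return the last listed version whose features the host provides."""
    best = None
    for name, required in versions:
        # A version is runnable only if the host offers every feature
        # that the version requires.
        if required <= host_features:
            best = name
    return best


# Expand a bare "Haswell" on a host without TSX but with IBRS.
chosen = pick_best_runnable(HASWELL_VERSIONS, frozenset({"base", "ibrs"}))
```

Note that every layer doing this needs such a table plus a preference
order, which is the part I would like to avoid.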

Let me propose something *much* simpler which would work just fine on 
s390x, and I assume on x86-64 as well.

On s390x we e.g. have the two models:
- "z14-base"
  -> minimum feature set we expect to have in every environment
  -> QEMU version/machine independent, will never change
- "z14"
  -> "default model", can change between QEMU machines
  -> migration-safe

Now, internally we have in addition something that matches:
- "z14-max"
  -> all possible CPU features valid for this model
  -> Includes e.g., nested virt features not in the "z14" model
  -> Can and will change between QEMU versions

Of course we also have:
- "max"
  -> "all features supported by the accelerator in the current host"

What we really want is:
- "z14-best"
  -> "best features for this model supported by the accelerator in the
      current host"

The trick is that *usually*:
	"z14-best" = baseline("z14-max", "max")

Minor exceptions can be easily added (e.g., always disable a certain 
feature). So, what I would be proposing for s390x (and also x86-64, but 
I know that they have more legacy handling) is simply implementing and 
exposing all -best models.
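
As a minimal sketch of the idea (in Python, not QEMU code): if we treat
a CPU model as nothing but a set of feature names, baselining boils down
to a set intersection and the exceptions to a set difference. The
feature names and the always_off parameter below are made up for
illustration, and real baselining has to look at more than features
(e.g., the CPU generation).

```python
from typing import FrozenSet, Iterable


def baseline(model_a: Iterable[str], model_b: Iterable[str]) -> FrozenSet[str]:
    """Features present in both models (simplified: features only)."""
    return frozenset(model_a) & frozenset(model_b)


def best_model(
    model_max: Iterable[str],
    host_max: Iterable[str],
    always_off: Iterable[str] = (),
) -> FrozenSet[str]:
    """Compute "<model>-best" = baseline("<model>-max", "max"), then
    apply the minor exceptions (features that are always disabled)."""
    return baseline(model_max, host_max) - frozenset(always_off)


# Hypothetical feature sets; the names are placeholders.
Z14_MAX = frozenset({"feat-a", "feat-b", "feat-c", "nested-virt"})
HOST_MAX = frozenset({"feat-a", "feat-b", "nested-virt", "feat-newer-gen"})

z14_best = best_model(Z14_MAX, HOST_MAX)
z14_best_no_nested = best_model(Z14_MAX, HOST_MAX, always_off={"nested-virt"})
```

Here "feat-c" is dropped because the host does not offer it, and
"feat-newer-gen" is dropped because it is not valid for the model.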

Why is this good in general?

1. No semantic changes of existing models. What was migration-safe
    remains migration-safe.
2. No CPU versions, less complexity.
3. No changes in the tool stack. Implement it in QEMU and it will be
    supported on every layer. Simply specify/expand "z14-best" and you
    get something that will run and make use of the best (on s390x
    usually all) features available that are valid for this model.

Why is this good for s390x?

1. As discussed, versioning all the different flavors we have is
    neither feasible nor practical.

2. Features that are typically not around (especially nested virt
    features) will be enabled when available, similar to the host model.

I think I can hack a prototype of that in a couple of hours.

-- 

Thanks,

David / dhildenb