All of lore.kernel.org
 help / color / mirror / Atom feed
* KVM_GET_SUPPORTED_CPUID
@ 2019-11-14 20:09 Jim Mattson
  2019-11-15 11:42 ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Jim Mattson @ 2019-11-14 20:09 UTC (permalink / raw)
  To: kvm list

Can someone explain this ioctl to me? The more I look at it, the less
sense it makes to me.

Let's start with leaf 0. If I see 0xd in EAX, does that indicate the
*maximum* supported value in EAX? Or does that mean that only a value
of 0xd is supported for EAX? If I see "AuthenticAMD" in EBX/EDX/ECX,
does that mean that "GenuineIntel" is *not* supported? I thought
people were having reasonable success with cross-vendor migration.

What about leaf 7 EBX? If a bit is clear, does that mean setting the
bit is unsupported? If a bit is set, does that mean clearing the bit
is unsupported? Do those answers still apply for bits 6 and 13, where
a '1' indicates the absence of a feature?

What about leaf 0xa? Kvm's api.txt says, "Note that certain
capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may expose cpuid
features (e.g. MONITOR) which are not supported by kvm in its default
configuration. If userspace enables such capabilities, it is
responsible for modifying the results of this ioctl appropriately."
However, it appears that the vPMU is enabled not by such a capability,
but by filling in leaf 0xa. How does userspace know what leaf 0xa
values are supported by both the hardware and kvm?

And as for KVM_CAP_X86_DISABLE_EXITS, in particular, how is userspace
supposed to know what the appropriate modification to
KVM_GET_SUPPORTED_CPUID is? Is this documented somewhere else?

And as for the "certain capabilities" clause above, should I assume
that any capability enabled by userspace may require modifications to
KVM_GET_SUPPORTED_CPUID?  What might those required modifications be?
Has anyone thought to document them, or better yet, provide an API to
get them?

What about the processor brand string in leaves 0x80000000-0x80000004?
Is a string of NULs really the only supported value?

And just a nit, but why does this ioctl bother returning explicitly
zeroed leaves for unsupported leaves in range?

It would really be nice if I could use this ioctl to write a
"HostSupportsGuest" function based in part on an existing guest's
CPUID information, but that doesn't seem all that feasible, without
intimate knowledge of how the host's implementation of kvm works.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: KVM_GET_SUPPORTED_CPUID
  2019-11-14 20:09 KVM_GET_SUPPORTED_CPUID Jim Mattson
@ 2019-11-15 11:42 ` Paolo Bonzini
  2019-11-15 18:01   ` KVM_GET_SUPPORTED_CPUID Sean Christopherson
  2019-11-15 18:33   ` KVM_GET_SUPPORTED_CPUID Jim Mattson
  0 siblings, 2 replies; 5+ messages in thread
From: Paolo Bonzini @ 2019-11-15 11:42 UTC (permalink / raw)
  To: Jim Mattson, kvm list

On 14/11/19 21:09, Jim Mattson wrote:
> Can someone explain this ioctl to me? The more I look at it, the less
> sense it makes to me.

It certainly has some historical baggage in it, much like
KVM_GET_MSR_INDEX_LIST.  But (unlike KVM_GET_MSR_INDEX_LIST) it's mostly
okay; the issues you report boil down to one of:

1) KVM_GET_SUPPORTED_CPUID being a system ioctl

2) supporting the simple case of taking the output of
KVM_GET_SUPPORTED_CPUID and passing it to KVM_SET_CPUID2

3) CPUID information being poorly designed, or just Intel doing
undesirable things

> Let's start with leaf 0. If I see 0xd in EAX, does that indicate the
> *maximum* supported value in EAX?

This is easy, you can always supply a subset of the values to the guest,
and this includes reducing the value of integer values (such as the
number of leaves) or clearing bits.  It should be documented better,
possibly including with a list of leaves that can be filled by the VMM
as it likes (e.g. cache topology).

> Or does that mean that only a value
> of 0xd is supported for EAX? If I see "AuthenticAMD" in EBX/EDX/ECX,
> does that mean that "GenuineIntel" is *not* supported? I thought
> people were having reasonable success with cross-vendor migration.

This is (2).  But in general passing the host value is the safe choice,
everything else has reasonable success but it's not something I would
recommend in production (and it's something I wouldn't mind removing,
really).

> What about leaf 7 EBX? If a bit is clear, does that mean setting the
> bit is unsupported? If a bit is set, does that mean clearing the bit
> is unsupported? Do those answers still apply for bits 6 and 13, where
> a '1' indicates the absence of a feature?

Again, clearing bits is always supported in theory, but I say "in
theory" because of course bits 6 and 13 are indeed problematic. And
unfortunately the only solutions for those is to stick your head in the
sand and pretend they don't exist.  If bits 6 and 13 were handled
strictly, people would not be able to migrate VMs between e.g. Haswell
and Ivy Bridge machines within the same fleet, which is something people
want to do.  So, this is (3).

A similar case is CPUID[0Ah].EBX (unavailable architectural events).

> What about leaf 0xa? Kvm's api.txt says, "Note that certain
> capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may expose cpuid
> features (e.g. MONITOR) which are not supported by kvm in its default
> configuration. If userspace enables such capabilities, it is
> responsible for modifying the results of this ioctl appropriately."
> However, it appears that the vPMU is enabled not by such a capability,
> but by filling in leaf 0xa.

Right, the supported values are provided by KVM_GET_SUPPORTED_CPUID.  So
as long as you don't zero the PMU version id to 0, PMU is enabled.

> How does userspace know what leaf 0xa
> values are supported by both the hardware and kvm?

Reducing functionality is supported---fewer GP or fixed counters, or
disabling events by *setting* bits in EBX or reducing EAX[31:24].

> And as for KVM_CAP_X86_DISABLE_EXITS, in particular, how is userspace
> supposed to know what the appropriate modification to
> KVM_GET_SUPPORTED_CPUID is? Is this documented somewhere else?
> 
> And as for the "certain capabilities" clause above, should I assume
> that any capability enabled by userspace may require modifications to
> KVM_GET_SUPPORTED_CPUID?  What might those required modifications be?
> Has anyone thought to document them, or better yet, provide an API to
> get them?

And finally this is (1).  It should be documented by the individual
capabilities or ioctls.

With KVM_ENABLE_CAP, the only one that is _absent_ from
KVM_GET_SUPPORTED_CPUID the MONITOR bit.

The opposite case is X2APIC, which is reported as supported in
KVM_GET_SUPPORTED_CPUID even though requires KVM_CREATE_IRQCHIP (or
KVM_ENABLE_CAP + KVM_CAP_SPLIT_IRQCHIP).  Of course any serious VMM uses
in-kernel irqchip, but still it's ugly.

Providing an API to get a known-good value of CPUID for the current VM
(a KVM_GET_SUPPORTED_CPUID vm-ioctl, basically) would be a fine idea.
If anybody is interested, I can think of other cases where this applies.
 It would provide a better way to do (2), essentially.

> What about the processor brand string in leaves 0x80000000-0x80000004?
> Is a string of NULs really the only supported value?

Indeed this should probably return the host values, at least for
consistency with the vendor.

> And just a nit, but why does this ioctl bother returning explicitly
> zeroed leaves for unsupported leaves in range?

No particular reason, it just keeps the code simpler.

> It would really be nice if I could use this ioctl to write a
> "HostSupportsGuest" function based in part on an existing guest's
> CPUID information, but that doesn't seem all that feasible, without
> intimate knowledge of how the host's implementation of kvm works.

It does not depend that much on knowledge of the host's implementation
of KVM.  However, it does depend on tiny details of the CPUID bits.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: KVM_GET_SUPPORTED_CPUID
  2019-11-15 11:42 ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini
@ 2019-11-15 18:01   ` Sean Christopherson
  2019-11-15 18:33   ` KVM_GET_SUPPORTED_CPUID Jim Mattson
  1 sibling, 0 replies; 5+ messages in thread
From: Sean Christopherson @ 2019-11-15 18:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jim Mattson, kvm list

On Fri, Nov 15, 2019 at 12:42:17PM +0100, Paolo Bonzini wrote:
> On 14/11/19 21:09, Jim Mattson wrote:
> > If I see "AuthenticAMD" in EBX/EDX/ECX,
> > does that mean that "GenuineIntel" is *not* supported? I thought
> > people were having reasonable success with cross-vendor migration.
> 
> This is (2).  But in general passing the host value is the safe choice,
> everything else has reasonable success but it's not something I would
> recommend in production (and it's something I wouldn't mind removing,
> really).

Maybe keep it but add a pr_warn_once() to inform the user that exposions
are likely?  Or make it opt-in via a module param?  I've found this useful
for smoke testing patches that touch AMD/Hygon/Zhaoxin/Centaur code.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: KVM_GET_SUPPORTED_CPUID
  2019-11-15 11:42 ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini
  2019-11-15 18:01   ` KVM_GET_SUPPORTED_CPUID Sean Christopherson
@ 2019-11-15 18:33   ` Jim Mattson
  2019-11-15 19:15     ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini
  1 sibling, 1 reply; 5+ messages in thread
From: Jim Mattson @ 2019-11-15 18:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list

On Fri, Nov 15, 2019 at 3:42 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 14/11/19 21:09, Jim Mattson wrote:
> > Can someone explain this ioctl to me? The more I look at it, the less
> > sense it makes to me.
>
> It certainly has some historical baggage in it, much like
> KVM_GET_MSR_INDEX_LIST.  But (unlike KVM_GET_MSR_INDEX_LIST) it's mostly
> okay; the issues you report boil down to one of:
>
> 1) KVM_GET_SUPPORTED_CPUID being a system ioctl
>
> 2) supporting the simple case of taking the output of
> KVM_GET_SUPPORTED_CPUID and passing it to KVM_SET_CPUID2

For this purpose, wouldn't 'DEFAULT' make a lot more sense than
'SUPPORTED' in the name of this ioctl?

> 3) CPUID information being poorly designed, or just Intel doing
> undesirable things
>
> > Let's start with leaf 0. If I see 0xd in EAX, does that indicate the
> > *maximum* supported value in EAX?
>
> This is easy, you can always supply a subset of the values to the guest,
> and this includes reducing the value of integer values (such as the
> number of leaves) or clearing bits.  It should be documented better,
> possibly including with a list of leaves that can be filled by the VMM
> as it likes (e.g. cache topology).

Maybe you can reduce CPUID.0H:EAX, but there are some integer values
that you can't reduce (e.g. CPUID.(EAX=0Dh,ECX=0):ECX). So, I'd argue
that this isn't "easy."

> > Or does that mean that only a value
> > of 0xd is supported for EAX? If I see "AuthenticAMD" in EBX/EDX/ECX,
> > does that mean that "GenuineIntel" is *not* supported? I thought
> > people were having reasonable success with cross-vendor migration.
>
> This is (2).  But in general passing the host value is the safe choice,
> everything else has reasonable success but it's not something I would
> recommend in production (and it's something I wouldn't mind removing,
> really).
>
> > What about leaf 7 EBX? If a bit is clear, does that mean setting the
> > bit is unsupported? If a bit is set, does that mean clearing the bit
> > is unsupported? Do those answers still apply for bits 6 and 13, where
> > a '1' indicates the absence of a feature?
>
> Again, clearing bits is always supported in theory, but I say "in
> theory" because of course bits 6 and 13 are indeed problematic. And
> unfortunately the only solutions for those is to stick your head in the
> sand and pretend they don't exist.  If bits 6 and 13 were handled
> strictly, people would not be able to migrate VMs between e.g. Haswell
> and Ivy Bridge machines within the same fleet, which is something people
> want to do.  So, this is (3).

For these two bits, one could argue that *setting* them is always
supported, at least insofar as *clearing* the normal polarity bits is
supported. If you say that FCS and FDS are "deprecated [sic]" on your
Ivy Bridge platform, but software relies on it, then that software is
just as ill-behaved as software that depends on any other feature that
has been masked off. (Of course, none of the software that actually
depends on this feature actually checks the CPUID bit, since the CPUID
bit was defined after-the-fact.) So, even if you're strict about it,
you can migrate between Haswell and Ivy Bridge.


> A similar case is CPUID[0Ah].EBX (unavailable architectural events).
>
> > What about leaf 0xa? Kvm's api.txt says, "Note that certain
> > capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may expose cpuid
> > features (e.g. MONITOR) which are not supported by kvm in its default
> > configuration. If userspace enables such capabilities, it is
> > responsible for modifying the results of this ioctl appropriately."
> > However, it appears that the vPMU is enabled not by such a capability,
> > but by filling in leaf 0xa.
>
> Right, the supported values are provided by KVM_GET_SUPPORTED_CPUID.  So
> as long as you don't zero the PMU version id to 0, PMU is enabled.
>
> > How does userspace know what leaf 0xa
> > values are supported by both the hardware and kvm?
>
> Reducing functionality is supported---fewer GP or fixed counters, or
> disabling events by *setting* bits in EBX or reducing EAX[31:24].
>
> > And as for KVM_CAP_X86_DISABLE_EXITS, in particular, how is userspace
> > supposed to know what the appropriate modification to
> > KVM_GET_SUPPORTED_CPUID is? Is this documented somewhere else?
> >
> > And as for the "certain capabilities" clause above, should I assume
> > that any capability enabled by userspace may require modifications to
> > KVM_GET_SUPPORTED_CPUID?  What might those required modifications be?
> > Has anyone thought to document them, or better yet, provide an API to
> > get them?
>
> And finally this is (1).  It should be documented by the individual
> capabilities or ioctls.
>
> With KVM_ENABLE_CAP, the only one that is _absent_ from
> KVM_GET_SUPPORTED_CPUID the MONITOR bit.

And leaf 5?

> The opposite case is X2APIC, which is reported as supported in
> KVM_GET_SUPPORTED_CPUID even though requires KVM_CREATE_IRQCHIP (or
> KVM_ENABLE_CAP + KVM_CAP_SPLIT_IRQCHIP).  Of course any serious VMM uses
> in-kernel irqchip, but still it's ugly.
>
> Providing an API to get a known-good value of CPUID for the current VM
> (a KVM_GET_SUPPORTED_CPUID vm-ioctl, basically) would be a fine idea.
> If anybody is interested, I can think of other cases where this applies.
>  It would provide a better way to do (2), essentially.
>
> > What about the processor brand string in leaves 0x80000000-0x80000004?
> > Is a string of NULs really the only supported value?
>
> Indeed this should probably return the host values, at least for
> consistency with the vendor.
>
> > And just a nit, but why does this ioctl bother returning explicitly
> > zeroed leaves for unsupported leaves in range?
>
> No particular reason, it just keeps the code simpler.
>
> > It would really be nice if I could use this ioctl to write a
> > "HostSupportsGuest" function based in part on an existing guest's
> > CPUID information, but that doesn't seem all that feasible, without
> > intimate knowledge of how the host's implementation of kvm works.
>
> It does not depend that much on knowledge of the host's implementation
> of KVM.  However, it does depend on tiny details of the CPUID bits.
>
> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: KVM_GET_SUPPORTED_CPUID
  2019-11-15 18:33   ` KVM_GET_SUPPORTED_CPUID Jim Mattson
@ 2019-11-15 19:15     ` Paolo Bonzini
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2019-11-15 19:15 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list

On 15/11/19 19:33, Jim Mattson wrote:
> On Fri, Nov 15, 2019 at 3:42 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 14/11/19 21:09, Jim Mattson wrote:
>>> Can someone explain this ioctl to me? The more I look at it, the less
>>> sense it makes to me.
>>
>> It certainly has some historical baggage in it, much like
>> KVM_GET_MSR_INDEX_LIST.  But (unlike KVM_GET_MSR_INDEX_LIST) it's mostly
>> okay; the issues you report boil down to one of:
>>
>> 1) KVM_GET_SUPPORTED_CPUID being a system ioctl
>>
>> 2) supporting the simple case of taking the output of
>> KVM_GET_SUPPORTED_CPUID and passing it to KVM_SET_CPUID2
> 
> For this purpose, wouldn't 'DEFAULT' make a lot more sense than
> 'SUPPORTED' in the name of this ioctl?

I'm not sure, this only applies to some relatively unimportant leaves,
such as cache topology or vendor/model name.  So *in general* SUPPORTED
is accurate for leaves that userspace will want to look at in the output
of KVM_GET_SUPPORTED_CPUID.

DEFAULT would imply that it's what you get if you skip KVM_SET_CPUID2.
However, this is not the case.

>> 3) CPUID information being poorly designed, or just Intel doing
>> undesirable things
>>
>>> Let's start with leaf 0. If I see 0xd in EAX, does that indicate the
>>> *maximum* supported value in EAX?
>>
>> This is easy, you can always supply a subset of the values to the guest,
> 
> Maybe you can reduce CPUID.0H:EAX, but there are some integer values
> that you can't reduce (e.g. CPUID.(EAX=0Dh,ECX=0):ECX). So, I'd argue
> that this isn't "easy."

True, "easy" referred to leaf 0 only.  Other leaves may have nasty
interdependencies.  The values of EBX and ECX depend on the ECX>1
subleaves, and EAX (as well as the presence of those subleaves)
generally depends on feature bits from other CPUID leaves.  However, all
this is dependent only on processor features and not on KVM's
implementation.  You can still reduce them if you pay enough attention.
 For example you can reduce CPUID(0Dh,0).ECX as long as you hide some
bits from CPUID(0Dh,0).EDX:EAX.

The only thing you need to know about the KVM implementation, is that it
takes into account CPUID(0Dh,0).EDX:EAX and CPUID(0Dh,1).EDX:ECX to
decide whether to raise an exception on XSETBV or WRMSR(IA32_XSS)
respectively.  So to some extent this is also a documentation issue.

[Aside: fixing leaf 0Dh was my first serious KVM patch. :)]

>>> What about leaf 7 EBX? If a bit is clear, does that mean setting the
>>> bit is unsupported? If a bit is set, does that mean clearing the bit
>>> is unsupported? Do those answers still apply for bits 6 and 13, where
>>> a '1' indicates the absence of a feature?
>>
>> Again, clearing bits is always supported in theory, but I say "in
>> theory" because of course bits 6 and 13 are indeed problematic. And
>> unfortunately the only solutions for those is to stick your head in the
>> sand and pretend they don't exist.  If bits 6 and 13 were handled
>> strictly, people would not be able to migrate VMs between e.g. Haswell
>> and Ivy Bridge machines within the same fleet, which is something people
>> want to do.  So, this is (3).
> 
> For these two bits, one could argue that *setting* them is always
> supported, at least insofar as *clearing* the normal polarity bits is
> supported.

True, they just have reversed polarity.  The issues are that you might
have old VMs started even before the bits were defined, or before you
added support for the bits in your stack.  Setting them might cause
problems in userspace if they were written to know about CPUID(7,0).EBX
but not about the reversed polarity of these bits (this may not apply
exactly to these two bits---I don't know if they were part of the first
batch of features documented for leaf 7---but the same situation may
arise in the future).  At least CPUID(0Ah).EBX is better in this
respect, because you know it exists and what it is going to be used in
the future.  They are just defined backwards.

Of course as you point out those x87 CPUID bits are useless anyway
because nobody checks them.

> If you say that FCS and FDS are "deprecated [sic]" on your
> Ivy Bridge platform, but software relies on it, then that software is
> just as ill-behaved as software that depends on any other feature that
> has been masked off. (Of course, none of the software that actually
> depends on this feature actually checks the CPUID bit, since the CPUID
> bit was defined after-the-fact.) So, even if you're strict about it,
> you can migrate between Haswell and Ivy Bridge. [...]

>> With KVM_ENABLE_CAP, the only one that is _absent_ from
>> KVM_GET_SUPPORTED_CPUID the MONITOR bit.
> 
> And leaf 5?

Hmm I thought all zeroes would be fine.  What does "(default is
processor's monitor granularity)" mean for EAX and EBX?

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-15 19:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-14 20:09 KVM_GET_SUPPORTED_CPUID Jim Mattson
2019-11-15 11:42 ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini
2019-11-15 18:01   ` KVM_GET_SUPPORTED_CPUID Sean Christopherson
2019-11-15 18:33   ` KVM_GET_SUPPORTED_CPUID Jim Mattson
2019-11-15 19:15     ` KVM_GET_SUPPORTED_CPUID Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.