linux-kernel.vger.kernel.org archive mirror
* [RFC] CPUID usage for interaction between Hypervisors and Linux.
@ 2008-10-01 17:14 Alok Kataria
  2008-10-01 17:21 ` H. Peter Anvin
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Alok Kataria @ 2008-10-01 17:14 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar
  Cc: the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

Hi,

Please find below the proposal for the generic use of cpuid space
allotted for hypervisors. Apart from this cpuid space, it is also worth
noting that Intel & AMD reserve the MSRs from 0x40000000 - 0x400000FF
for software use. Though the proposal doesn't talk about MSRs right now,
we should be aware of these reservations as we may want to extend the
way we use CPUID to MSR usage as well.

While we are at it, we also think we should form a group which has at
least one person representing each of the hypervisors interested in
generalizing the hypervisor CPUID space for Linux guest OS. This group
will be informed whenever a new CPUID leaf from the generic space is to
be used. This would help avoid any duplicate definitions for a CPUID
semantic by two different hypervisors. I think most of the people are
subscribed to LKML or the virtualization lists and we should use these
lists as a platform to decide on things. 

Thanks,
Alok

---

Hypervisor CPUID Interface Proposal
-----------------------------------

Intel & AMD have reserved cpuid levels 0x40000000 - 0x400000FF for
software use.  Hypervisors can use these levels to provide an interface
to pass information from the hypervisor to the guest running inside a
virtual machine.

This proposal defines a standard framework for the way in which the
Linux and hypervisor communities incrementally define this CPUID space.

(This proposal may be adopted by other guest OSes.  However, that is not
a requirement because a hypervisor can expose a different CPUID
interface depending on the guest OS type that is specified by the VM
configuration.)

Hypervisor Present Bit:
        Bit 31 of ECX of CPUID leaf 0x1.

        This bit has been reserved by Intel & AMD for use by
        hypervisors, and indicates the presence of a hypervisor.

        Virtual CPUs (hypervisors) set this bit to 1 and physical CPUs
        (all existing and future CPUs) set this bit to zero.  This bit
        can be probed by guest software to detect whether it is
        running inside a virtual machine.
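
A guest-side check for this bit might look like the minimal sketch below.
The helper name is hypothetical, and the ECX value is assumed to have been
obtained by executing CPUID with EAX = 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of ECX from CPUID leaf 0x1: the "hypervisor present" bit. */
#define CPUID_1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Hypothetical helper: leaf1_ecx is the ECX value returned by CPUID(1). */
static bool hypervisor_present(uint32_t leaf1_ecx)
{
	return (leaf1_ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}
```

Only when this returns true does it make sense to go on and probe the
0x400000XX leaves.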

Hypervisor CPUID Information Leaf:
        Leaf 0x40000000.

        This leaf returns the CPUID leaf range supported by the
        hypervisor and the hypervisor vendor signature.

        # EAX: The maximum input value for CPUID supported by the hypervisor.
        # EBX, ECX, EDX: Hypervisor vendor ID signature.
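
As an illustration, the signature registers could be unpacked into a
printable string as sketched below. The function name is hypothetical, and
the byte order (lowest byte of each register first, in EBX, ECX, EDX order)
is assumed to follow the convention of the CPU vendor string in leaf 0.

```c
#include <stdint.h>

/*
 * Hypothetical helper: unpack the 12-byte vendor signature returned in
 * EBX, ECX, EDX by leaf 0x40000000.  out must hold at least 13 bytes.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
			 char out[13])
{
	const uint32_t regs[3] = { ebx, ecx, edx };

	for (int r = 0; r < 3; r++)
		for (int b = 0; b < 4; b++)
			/* Assumed order: lowest byte of each register first. */
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
	out[12] = '\0';
}
```

Extracting bytes with shifts rather than memcpy keeps the result
independent of the host's byte order.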

Hypervisor Specific Leaves:
        Leaf range 0x40000001 - 0x4000000F.

        These cpuid leaves are reserved as hypervisor specific leaves.
        The semantics of these 15 leaves depend on the signature read
        from the "Hypervisor Information Leaf".

Generic Leaves:
        Leaf range 0x40000010 - 0x400000FF.

        The semantics of these leaves are consistent across all
        hypervisors.  This allows the guest kernel to probe and
        interpret these leaves without checking for a hypervisor
        signature.

        A hypervisor can indicate that a leaf or a leaf's field is
        unsupported by returning zero when that leaf or field is probed.

        To avoid the situation where multiple hypervisors attempt to define the
        semantics for the same leaf during development, we can partition
        the generic leaf space to allow each hypervisor to define a part
        of the generic space.

        For instance:
          VMware could define 0x4000001X
          Xen could define 0x4000002X
          KVM could define 0x4000003X
	  and so on...

        Note that hypervisors can implement any leaves that have been
        defined in the generic leaf space whenever common features can
        be found.  For example, VMware hypervisors can implement leafs
        that have been defined in the KVM area 0x4000003X and vice
        versa.

        The kernel can detect the support for a generic field inside 
        leaf 0x400000XY using the following algorithm:

		1.  Get EAX from Leaf 0x40000000, Hypervisor CPUID information.
		    EAX returns the maximum input value for the hypervisor CPUID
		    space.

		    If EAX < 0x400000XY, then the field is not available.

		2.  Else, extract the field from the target Leaf 0x400000XY 
                    by doing cpuid(0x400000XY).

		    If (field == 0), this feature is unsupported/unimplemented
                    by the hypervisor.  The kernel should handle this case 
                    gracefully so that a hypervisor is never required to 
                    support or implement any particular generic leaf.
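
The two steps above can be sketched in C as follows. This is only an
illustration: cpuid_fn is an assumed abstraction over the CPUID instruction,
and fake_cpuid is a stand-in with made-up values so that the sketch can run
anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Assumed abstraction: executes CPUID for the given leaf. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

#define HV_CPUID_INFO_LEAF UINT32_C(0x40000000)

/*
 * Step 1: compare the target leaf against the maximum leaf reported in
 *         EAX of leaf 0x40000000.
 * Step 2: read the target leaf.  The caller must still treat a zero
 *         field as "unsupported by this hypervisor".
 * Returns false if the leaf is out of range; *out is then untouched.
 */
static bool hv_read_generic_leaf(cpuid_fn cpuid, uint32_t leaf,
				 struct cpuid_regs *out)
{
	uint32_t max_leaf = cpuid(HV_CPUID_INFO_LEAF).eax;

	if (max_leaf < leaf)
		return false;
	*out = cpuid(leaf);
	return true;
}

/* Illustrative stand-in for CPUID; every value here is invented. */
static struct cpuid_regs fake_cpuid(uint32_t leaf)
{
	struct cpuid_regs r = { 0, 0, 0, 0 };

	if (leaf == HV_CPUID_INFO_LEAF)
		r.eax = UINT32_C(0x40000010);	/* maximum supported leaf */
	else if (leaf == UINT32_C(0x40000010))
		r.eax = 2400000;		/* EAX set, EBX left at zero */
	return r;
}
```

With the stand-in above, leaf 0x40000010 is readable and its EBX field
reads as zero (unsupported), while leaf 0x40000011 is rejected in step 1.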

--------------------------------------------------------------------------------

Definition of the Generic CPUID space.
        Leaf 0x40000010, Timing Information.

        VMware has defined the first generic leaf to provide timing
        information.  This leaf returns the current TSC frequency and
        current Bus frequency in kHz.

        # EAX: (Virtual) TSC frequency in kHz.
        # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
        # ECX, EDX: RESERVED (Per above, reserved fields are set to zero).
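
A guest might decode this leaf along the lines of the sketch below. The
struct and function names are hypothetical; the inputs are the EAX and EBX
values read from leaf 0x40000010.

```c
#include <stdint.h>

/* Hypothetical decoded form of leaf 0x40000010; 0 means "not reported". */
struct hv_timing {
	uint64_t tsc_hz;	/* (virtual) TSC frequency in Hz */
	uint64_t bus_hz;	/* (virtual) bus / local APIC timer frequency in Hz */
};

/* eax and ebx are the register values read from leaf 0x40000010 (kHz). */
static struct hv_timing hv_decode_timing(uint32_t eax, uint32_t ebx)
{
	struct hv_timing t = {
		.tsc_hz = (uint64_t)eax * 1000,
		.bus_hz = (uint64_t)ebx * 1000,
	};
	return t;
}
```

The conversion is done in 64 bits because a frequency above roughly
4.29 GHz, expressed in Hz, no longer fits in 32 bits.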

--------------------------------------------------------------------------------

Written By,
	Alok N Kataria <akataria@vmware.com>
	Dan Hecht <dhecht@vmware.com>
Inputs from,
	Jun Nakajima <jun.nakajima@intel.com>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:14 [RFC] CPUID usage for interaction between Hypervisors and Linux Alok Kataria
@ 2008-10-01 17:21 ` H. Peter Anvin
  2008-10-01 17:33   ` Alok Kataria
  2008-10-01 17:47 ` H. Peter Anvin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 17:21 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Dan Hecht, Zachary Amsden, virtualization, kvm

Alok Kataria wrote:
> 
> (This proposal may be adopted by other guest OSes.  However, that is not
> a requirement because a hypervisor can expose a different CPUID
> interface depending on the guest OS type that is specified by the VM
> configuration.)
> 

Excuse me, but that is blatantly idiotic.  Expecting the user having to 
configure a VM to match the target OS is *exactly* as stupid as 
expecting the user to reconfigure the BIOS.  It's totally the wrong 
thing to do.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:21 ` H. Peter Anvin
@ 2008-10-01 17:33   ` Alok Kataria
  2008-10-01 17:45     ` H. Peter Anvin
  2008-10-01 18:06     ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 50+ messages in thread
From: Alok Kataria @ 2008-10-01 17:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Daniel Hecht, Zach Amsden, virtualization, kvm

On Wed, 2008-10-01 at 10:21 -0700, H. Peter Anvin wrote:
> Alok Kataria wrote:
> >
> > (This proposal may be adopted by other guest OSes.  However, that is not
> > a requirement because a hypervisor can expose a different CPUID
> > interface depending on the guest OS type that is specified by the VM
> > configuration.)
> >
> 
> Excuse me, but that is blatantly idiotic.  Expecting the user having to
> configure a VM to match the target OS is *exactly* as stupid as
> expecting the user to reconfigure the BIOS.  It's totally the wrong
> thing to do.

Hi Peter, 

It's not a user who has to do anything special here.
There are *intelligent* VM developers out there who can export a
different CPUid interface depending on the guest OS type. And this is
what most of the hypervisors do (not necessarily for CPUID, but for
other things right now).

Alok.
> 
>         -hpa



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:33   ` Alok Kataria
@ 2008-10-01 17:45     ` H. Peter Anvin
  2008-10-01 18:06     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 17:45 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Daniel Hecht, Zach Amsden, virtualization, kvm

Alok Kataria wrote:
> 
> Hi Peter, 
> 
> It's not a user who has to do anything special here.
> There are *intelligent* VM developers out there who can export a
> different CPUid interface depending on the guest OS type. And this is
> what most of the hypervisors do (not necessarily for CPUID, but for
> other things right now).
> 

It doesn't matter, really; it's still the wrong thing to do, for the 
same reason it's the wrong thing in -- for example -- ACPI, which has 
similar "cleverness".

If we want to have a "Linux standard CPUID interface" suite we should 
just put them on a different set of numbers and let a hypervisor export 
all the interfaces.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:14 [RFC] CPUID usage for interaction between Hypervisors and Linux Alok Kataria
  2008-10-01 17:21 ` H. Peter Anvin
@ 2008-10-01 17:47 ` H. Peter Anvin
  2008-10-01 18:04 ` Jeremy Fitzhardinge
       [not found] ` <48E3BBC1.2050607__35819.6151479662$1222884502$gmane$org@goop.org>
  3 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 17:47 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Dan Hecht, Zachary Amsden, virtualization, kvm

Alok Kataria wrote:
> 
> Hypervisor CPUID Interface Proposal
> -----------------------------------
> 
> Intel & AMD have reserved cpuid levels 0x40000000 - 0x400000FF for
> software use.  Hypervisors can use these levels to provide an interface
> to pass information from the hypervisor to the guest running inside a
> virtual machine.
> 
> This proposal defines a standard framework for the way in which the
> Linux and hypervisor communities incrementally define this CPUID space.
> 

I also observe that your proposal provides no means of positive 
identification, i.e. that a hypervisor actually conforms to your proposal.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:14 [RFC] CPUID usage for interaction between Hypervisors and Linux Alok Kataria
  2008-10-01 17:21 ` H. Peter Anvin
  2008-10-01 17:47 ` H. Peter Anvin
@ 2008-10-01 18:04 ` Jeremy Fitzhardinge
  2008-10-01 18:07   ` H. Peter Anvin
  2008-10-01 21:01   ` Alok Kataria
       [not found] ` <48E3BBC1.2050607__35819.6151479662$1222884502$gmane$org@goop.org>
  3 siblings, 2 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 18:04 UTC (permalink / raw)
  To: akataria
  Cc: avi, Rusty Russell, Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

Alok Kataria wrote:
> Hi,
>
> Please find below the proposal for the generic use of cpuid space
> allotted for hypervisors. Apart from this cpuid space another thing
> worth noting would be that, Intel & AMD reserve the MSRs from 0x40000000
> - 0x400000FF for software use. Though the proposal doesn't talk about
> MSR's right now, we should be aware of these reservations as we may want
> to extend the way we use CPUID to MSR usage as well.
>
> While we are at it, we also think we should form a group which has at
> least one person representing each of the hypervisors interested in
> generalizing the hypervisor CPUID space for Linux guest OS. This group
> will be informed whenever a new CPUID leaf from the generic space is to
> be used. This would help avoid any duplicate definitions for a CPUID
> semantic by two different hypervisors. I think most of the people are
> subscribed to LKML or the virtualization lists and we should use these
> lists as a platform to decide on things. 
>
> Thanks,
> Alok
>
> ---
>
> Hypervisor CPUID Interface Proposal
> -----------------------------------
>
> Intel & AMD have reserved cpuid levels 0x40000000 - 0x400000FF for
> software use.  Hypervisors can use these levels to provide an interface
> to pass information from the hypervisor to the guest running inside a
> virtual machine.
>
> This proposal defines a standard framework for the way in which the
> Linux and hypervisor communities incrementally define this CPUID space.
>
> (This proposal may be adopted by other guest OSes.  However, that is not
> a requirement because a hypervisor can expose a different CPUID
> interface depending on the guest OS type that is specified by the VM
> configuration.)
>
> Hypervisor Present Bit:
>         Bit 31 of ECX of CPUID leaf 0x1.
>
>         This bit has been reserved by Intel & AMD for use by
>         hypervisors, and indicates the presence of a hypervisor.
>
>         Virtual CPU's (hypervisors) set this bit to 1 and physical CPU's
>         (all existing and future cpu's) set this bit to zero.  This bit
> 	can be probed by the guest software to detect whether they are
> 	running inside a virtual machine.
>
> Hypervisor CPUID Information Leaf:
>         Leaf 0x40000000.
>
>         This leaf returns the CPUID leaf range supported by the
>         hypervisor and the hypervisor vendor signature.
>
>         # EAX: The maximum input value for CPUID supported by the hypervisor.
>         # EBX, ECX, EDX: Hypervisor vendor ID signature.
>
> Hypervisor Specific Leaves:
>         Leaf range 0x40000001 - 0x4000000F.
>
>         These cpuid leaves are reserved as hypervisor specific leaves.
>         The semantics of these 15 leaves depend on the signature read
>         from the "Hypervisor Information Leaf".
>
> Generic Leaves:
>         Leaf range 0x40000010 - 0x400000FF.
>
>         The semantics of these leaves are consistent across all
>         hypervisors.  This allows the guest kernel to probe and
>         interpret these leaves without checking for a hypervisor
>         signature.
>
>         A hypervisor can indicate that a leaf or a leaf's field is
>         unsupported by returning zero when that leaf or field is probed.
>
>         To avoid the situation where multiple hypervisors attempt to define the
>         semantics for the same leaf during development, we can partition
>         the generic leaf space to allow each hypervisor to define a part
>         of the generic space.
>
>         For instance:
>           VMware could define 0x4000001X
>           Xen could define 0x4000002X
>           KVM could define 0x4000003X
> 	  and so on...
>   

No, we're not getting anywhere.  This is an outright broken idea.  The 
space is too small to be able to chop up in this way, and the number of 
vendors too large to be able to do it without having a central oversight.

The only way this can work is by having explicit positive identification 
of each group of leaves with a signature.  If there's a recognizable 
signature, then you can inspect the rest of the group; if not, then you 
can't.  That way, you can avoid any leaf usage which doesn't conform to 
this model, and you can also simultaneously support multiple hypervisor 
ABIs.  It also accommodates existing hypervisor use of this leaf space, 
even if they currently use a fixed location within it.

A concrete counter-proposal:

The space 0x40000000-0x400000ff is reserved for hypervisor usage.

This region is divided into 16 16-leaf blocks.  Each block has the 
structure:

0x400000x0:
    eax: max used leaf within the leaf block (max 0x400000xf)
    e[bcd]x: leaf block signature.  This may be a hypervisor-specific 
signature, or a generic signature, depending on the contents of the block

A guest may search for any supported Hypervisor ABIs by inspecting each 
leaf at 0x400000x0 for a known signature, and then may choose its mode 
of operation accordingly.  It must ignore any unknown signatures, and 
not touch any of the leaves within an unknown leaf block.
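
The scan described above can be sketched as follows. This is only an
illustration: cpuid_fn is an assumed abstraction over the CPUID instruction,
fake_cpuid is a stand-in that places an example "GenericVMMIF" block at
0x40000020, and the byte order of the signature is assumed to match the
usual CPUID vendor-string convention.

```c
#include <stdint.h>
#include <string.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Assumed abstraction: executes CPUID for the given leaf. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

/*
 * Look at leaf 0x400000x0 of each of the 16 blocks and return the base
 * leaf of the first block whose signature matches sig (12 characters),
 * or 0 if no block matches.  Leaves inside unknown blocks are not read.
 */
static uint32_t hv_find_block(cpuid_fn cpuid, const char *sig)
{
	for (uint32_t x = 0; x < 16; x++) {
		uint32_t base = UINT32_C(0x40000000) + (x << 4);
		struct cpuid_regs r = cpuid(base);
		const uint32_t regs[3] = { r.ebx, r.ecx, r.edx };
		char found[13];

		/* Unpack EBX, ECX, EDX, lowest byte of each register first. */
		for (int i = 0; i < 12; i++)
			found[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xff);
		found[12] = '\0';
		if (strncmp(found, sig, 12) == 0)
			return base;
	}
	return 0;
}

/* Pack four characters into a register value, lowest byte first. */
static uint32_t pack4(const char *s)
{
	return (uint32_t)(unsigned char)s[0] |
	       (uint32_t)(unsigned char)s[1] << 8 |
	       (uint32_t)(unsigned char)s[2] << 16 |
	       (uint32_t)(unsigned char)s[3] << 24;
}

/* Illustrative stand-in for CPUID; the block placement is invented. */
static struct cpuid_regs fake_cpuid(uint32_t leaf)
{
	struct cpuid_regs r = { 0, 0, 0, 0 };

	if (leaf == UINT32_C(0x40000020)) {
		r.eax = UINT32_C(0x40000021);	/* max used leaf in block */
		r.ebx = pack4("Gene");
		r.ecx = pack4("ricV");
		r.edx = pack4("MMIF");
	}
	return r;
}
```

A guest would call hv_find_block() once per ABI it understands and then
consult only the leaves of the blocks that were found.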

Hypervisor vendors who want to add a hypervisor-specific leaf block must 
choose a signature which is recognizably related to their or their 
hypervisor's name.

Signatures starting with "Generic" are reserved for generic leaf blocks.

A guest may scan leaf blocks to enumerate what hypervisor ABIs/hypercall 
interfaces are available to it.  It may mix and match any information 
from leaves it understands.  However, once it starts using a specific 
hypervisor ABI by making hypercalls or doing other operations with 
side-effects, it must commit to using that ABI exclusively (a specific 
hypervisor ABI may include the generic ABI by reference, however).

Correspondingly, a hypervisor must treat any cpuid accesses as 
side-effect free.

Definition of specific blocks:

Generic hypervisor leaf block:
  0x400000x0 signature is "GenericVMMIF" (or something)
  0x400000x1 tsc leaf as you've described

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 17:33   ` Alok Kataria
  2008-10-01 17:45     ` H. Peter Anvin
@ 2008-10-01 18:06     ` Jeremy Fitzhardinge
  2008-10-01 21:05       ` Alok Kataria
  1 sibling, 1 reply; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 18:06 UTC (permalink / raw)
  To: akataria
  Cc: H. Peter Anvin, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Alok Kataria wrote:
> It's not a user who has to do anything special here.
> There are *intelligent* VM developers out there who can export a
> different CPUid interface depending on the guest OS type. And this is
> what most of the hypervisors do (not necessarily for CPUID, but for
> other things right now).
>   

No, that's always a terrible idea.  Sure, it's necessary to deal with 
some backward-compatibility issues, but we shouldn't even consider a new 
interface which assumes this kind of thing.  We want properly enumerable 
interfaces.

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:04 ` Jeremy Fitzhardinge
@ 2008-10-01 18:07   ` H. Peter Anvin
  2008-10-01 18:12     ` Jeremy Fitzhardinge
  2008-10-01 21:01   ` Alok Kataria
  1 sibling, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 18:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
> 
> No, we're not getting anywhere.  This is an outright broken idea.  The 
> space is too small to be able to chop up in this way, and the number of 
> vendors too large to be able to do it without having a central oversight.
> 

I suspect we can get a larger number space if we ask Intel & AMD.  In 
fact, I think we should request that the entire 0x40xxxxxx numberspace 
is assigned to virtualization *anyway*.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:07   ` H. Peter Anvin
@ 2008-10-01 18:12     ` Jeremy Fitzhardinge
  2008-10-01 18:16       ` H. Peter Anvin
  0 siblings, 1 reply; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 18:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>>
>> No, we're not getting anywhere.  This is an outright broken idea.  
>> The space is too small to be able to chop up in this way, and the 
>> number of vendors too large to be able to do it without having a 
>> central oversight.
>>
>
> I suspect we can get a larger number space if we ask Intel & AMD.  In 
> fact, I think we should request that the entire 0x40xxxxxx numberspace 
> is assigned to virtualization *anyway*.

Yes, that would be good.  In that case I'd revise my proposal to make 
each leaf block 256 leaves instead of 16.  But it still needs to be a 
proper enumeration with signatures, rather than assigning fixed points 
in that space to specific interfaces.

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:12     ` Jeremy Fitzhardinge
@ 2008-10-01 18:16       ` H. Peter Anvin
  2008-10-01 18:36         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 18:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
>>
>> I suspect we can get a larger number space if we ask Intel & AMD.  In 
>> fact, I think we should request that the entire 0x40xxxxxx numberspace 
>> is assigned to virtualization *anyway*.
> 
> Yes, that would be good.  In that case I'd revise my proposal to make 
> each leaf block 256 leaves instead of 16.  But it still needs to be a 
> proper enumeration with signatures, rather than assigning fixed points 
> in that space to specific interfaces.
> 

With a sufficiently large block, we could use fixed points, e.g. by 
having each vendor create interfaces in the 0x40SSSSXX range, where SSSS 
is the PCI ID they use for PCI devices.
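
As a rough sketch of the numbering suggested above (the function name is
hypothetical), a vendor's leaf numbers could be derived like this:

```c
#include <stdint.h>

/*
 * Hypothetical helper for the 0x40SSSSXX layout: SSSS is the vendor's
 * PCI vendor ID (for example 0x15ad), XX is the leaf index within that
 * vendor's 256-leaf block.
 */
static uint32_t hv_vendor_leaf(uint16_t pci_vendor_id, uint8_t index)
{
	return UINT32_C(0x40000000) | ((uint32_t)pci_vendor_id << 8) | index;
}
```

Note that an ID of 0x0000 would map onto the existing 0x400000XX range, so
that range stays available for the common identification leaves.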

Note that I said "create interfaces".  It's important that all this is 
about is who specified the interface -- for "what hypervisor is this" 
just use 0x40000000 and disambiguate based on that.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:16       ` H. Peter Anvin
@ 2008-10-01 18:36         ` Jeremy Fitzhardinge
  2008-10-01 18:43           ` H. Peter Anvin
  2008-10-01 20:38           ` Chris Wright
  0 siblings, 2 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 18:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

H. Peter Anvin wrote:
> With a sufficiently large block, we could use fixed points, e.g. by 
> having each vendor create interfaces in the 0x40SSSSXX range, where 
> SSSS is the PCI ID they use for PCI devices.

Sure, you could do that, but you'd still want to have a signature in 
0x40SSSS00 to positively identify the chunk.  And what if you wanted 
more than 256 leaves?

> Note that I said "create interfaces".  It's important that all about 
> this is who specified the interface -- for "what hypervisor is this" 
> just use 0x40000000 and disambiguate based on that.

"What hypervisor is this?" isn't a very interesting question; if you're 
even asking it then it suggests that something has gone wrong.  It's much 
more useful to ask "what interfaces does this hypervisor support?", and 
enumerating a smallish range of well-known leaves looking for signatures 
is the simplest way to do that.  (We could use signatures derived from 
the PCI vendor IDs which would help with managing that namespace.)

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:36         ` Jeremy Fitzhardinge
@ 2008-10-01 18:43           ` H. Peter Anvin
  2008-10-01 19:56             ` Jeremy Fitzhardinge
  2008-10-01 20:38           ` Chris Wright
  1 sibling, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 18:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> With a sufficiently large block, we could use fixed points, e.g. by 
>> having each vendor create interfaces in the 0x40SSSSXX range, where 
>> SSSS is the PCI ID they use for PCI devices.
> 
> Sure, you could do that, but you'd still want to have a signature in 
> 0x40SSSS00 to positively identify the chunk.  And what if you wanted 
> more than 256 leaves?

What you'd want, at least, is a standard CPUID identification and range 
leaf at the top.  256 leaves is a *lot*, though; I'm not saying one 
couldn't run out, but it'd be hard.  Keep in mind that for large objects 
there are "counting" CPUID levels, as much as I personally dislike them, 
and one could easily argue that if you're doing something that would 
require anywhere near 256 leaves you probably are storing bulk data that 
belongs elsewhere.

Of course, if we had some kind of central authority assigning 8-bit IDs 
that would be even better, especially since there are tools in the field 
which already scan on 64K boundaries.  I don't know, though, how likely 
it is that we'll have to deal with 256 hypervisors.

>> Note that I said "create interfaces".  It's important that all about 
>> this is who specified the interface -- for "what hypervisor is this" 
>> just use 0x40000000 and disambiguate based on that.
> 
> "What hypervisor is this?" isn't a very interesting question; if you're 
> even asking it then it suggests that something has gone wrong.  It's much 
> more useful to ask "what interfaces does this hypervisor support?", and 
> enumerating a smallish range of well-known leaves looking for signatures 
> is the simplest way to do that.  (We could use signatures derived from 
> the PCI vendor IDs which would help with managing that namespace.)
> 

I agree completely, of course (except that "what hypervisor is this" 
still has limited usage, especially when it comes to dealing with bug 
workarounds.  Similar to the way we use CPU vendor IDs and stepping 
numbers for physical CPUs.)

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:43           ` H. Peter Anvin
@ 2008-10-01 19:56             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 19:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Dan Hecht,
	Zachary Amsden, virtualization, kvm

H. Peter Anvin wrote:
> What you'd want, at least, is a standard CPUID identification and 
> range leaf at the top.  256 leaves is a *lot*, though; I'm not saying 
> one couldn't run out, but it'd be hard.  Keep in mind that for large 
> objects there are "counting" CPUID levels, as much as I personally 
> dislike them, and one could easily argue that if you're doing 
> something that would require anywhere near 256 leaves you probably are 
> storing bulk data that belongs elsewhere.

I agree, but it just makes the proposal a bit more brittle.

> Of course, if we had some kind of central authority assigning 8-bit 
> IDs that would be even better, especially since there are tools in the 
> field which already scan on 64K boundaries.  I don't know, though, how 
> likely it is that we'll have to deal with 256 hypervisors.

I'm assuming that the likelihood of getting all possible vendors - 
current and future - to agree to a scheme like this is pretty small.  We 
need to come up with something that will work well when there are 
non-cooperative parties to deal with.

> I agree completely, of course (except that "what hypervisor is this" 
> still has limited usage, especially when it comes to dealing with bug 
> workarounds.  Similar to the way we use CPU vendor IDs and stepping 
> numbers for physical CPUs.)

I guess.  It's certainly useful to be able to identify the hypervisor for 
bug reporting and just general status information.  But making 
functional changes on that basis should be a last resort.

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
       [not found] ` <48E3BBC1.2050607__35819.6151479662$1222884502$gmane$org@goop.org>
@ 2008-10-01 20:03   ` Anthony Liguori
  2008-10-01 20:08     ` Jeremy Fitzhardinge
       [not found]     ` <48E3D8A8.604__13396.6479487301$1222891831$gmane$org@goop.org>
  0 siblings, 2 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 20:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: akataria, kvm, the arch/x86 maintainers, Dan Hecht, LKML,
	virtualization, avi, H. Peter Anvin, Ingo Molnar

Jeremy Fitzhardinge wrote:
> Alok Kataria wrote:
> 
> No, we're not getting anywhere.  This is an outright broken idea.  The 
> space is too small to be able to chop up in this way, and the number of 
> vendors too large to be able to do it without having a central oversight.
> 
> The only way this can work is by having explicit positive identification 
> of each group of leaves with a signature.  If there's a recognizable 
> signature, then you can inspect the rest of the group; if not, then you 
> can't.  That way, you can avoid any leaf usage which doesn't conform to 
> this model, and you can also simultaneously support multiple hypervisor 
> ABIs.  It also accommodates existing hypervisor use of this leaf space, 
> even if they currently use a fixed location within it.
> 
> A concrete counter-proposal:

Mmm, cpuid bikeshedding :-)

> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
> 
> This region is divided into 16 16-leaf blocks.  Each block has the 
> structure:
> 
> 0x400000x0:
>     eax: max used leaf within the leaf block (max 0x400000xf)

Why even bother with this?  It doesn't seem necessary in your proposal.

Regards,

Anthony Liguori


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 20:03   ` Anthony Liguori
@ 2008-10-01 20:08     ` Jeremy Fitzhardinge
       [not found]     ` <48E3D8A8.604__13396.6479487301$1222891831$gmane$org@goop.org>
  1 sibling, 0 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 20:08 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: akataria, kvm, the arch/x86 maintainers, Dan Hecht, LKML,
	virtualization, avi, H. Peter Anvin, Ingo Molnar

Anthony Liguori wrote:
> Mmm, cpuid bikeshedding :-)

My shade of blue is better.

>> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
>>
>> This region is divided into 16 16-leaf blocks.  Each block has the 
>> structure:
>>
>> 0x400000x0:
>>     eax: max used leaf within the leaf block (max 0x400000xf)
>
> Why even bother with this?  It doesn't seem necessary in your proposal.

It allows someone to incrementally add things to their block in a fairly 
orderly way.  But more importantly, it's the prevailing idiom, and the 
existing and proposed cpuid schemes already do this, so they'd fit in as-is.

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:36         ` Jeremy Fitzhardinge
  2008-10-01 18:43           ` H. Peter Anvin
@ 2008-10-01 20:38           ` Chris Wright
  2008-10-01 22:38             ` H. Peter Anvin
  1 sibling, 1 reply; 50+ messages in thread
From: Chris Wright @ 2008-10-01 20:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Dan Hecht, Zachary Amsden, virtualization, kvm

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> "What hypervisor is this?" isn't a very interesting question; if you're  
> even asking it then it suggests that something has gone wrong.

It's essentially already happening.  Everyone wants to be a better
hyperv than hyperv ;-)


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:04 ` Jeremy Fitzhardinge
  2008-10-01 18:07   ` H. Peter Anvin
@ 2008-10-01 21:01   ` Alok Kataria
  2008-10-01 21:08     ` Anthony Liguori
  2008-10-01 21:17     ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 50+ messages in thread
From: Alok Kataria @ 2008-10-01 21:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: avi, Rusty Russell, Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	Zach Amsden, virtualization, kvm

On Wed, 2008-10-01 at 11:04 -0700, Jeremy Fitzhardinge wrote:

> No, we're not getting anywhere.  This is an outright broken idea.  The
> space is too small to be able to chop up in this way, and the number of
> vendors too large to be able to do it without having a central oversight.
> 
> The only way this can work is by having explicit positive identification
> of each group of leaves with a signature.  If there's a recognizable
> signature, then you can inspect the rest of the group; if not, then you
> can't.  That way, you can avoid any leaf usage which doesn't conform to
> this model, and you can also simultaneously support multiple hypervisor
> ABIs.  It also accommodates existing hypervisor use of this leaf space,
> even if they currently use a fixed location within it.
> 
> A concrete counter-proposal:
> 
> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
> 
> This region is divided into 16 16-leaf blocks.  Each block has the
> structure:
> 
> 0x400000x0:
>     eax: max used leaf within the leaf block (max 0x400000xf)
>     e[bcd]x: leaf block signature.  This may be a hypervisor-specific
> signature, or a generic signature, depending on the contents of the block
> 
> A guest may search for any supported Hypervisor ABIs by inspecting each
> leaf at 0x400000x0 for a known signature, and then may choose its mode
> of operation accordingly.  It must ignore any unknown signatures, and
> not touch any of the leaves within an unknown leaf block.
> Hypervisor vendors who want to add a hypervisor-specific leaf block must
> choose a signature which is recognizably related to their or their
> hypervisor's name.
> 
> Signatures starting with "Generic" are reserved for generic leaf blocks.
> 
> A guest may scan leaf blocks to enumerate what hypervisor ABIs/hypercall
> interfaces are available to it.  It may mix and match any information
> from leaves it understands.  However, once it starts using a specific
> hypervisor ABI by making hypercalls or doing other operations with
> side-effects, it must commit to using that ABI exclusively (a specific
> hypervisor ABI may include the generic ABI by reference, however).
> 
> Correspondingly, a hypervisor must treat any cpuid accesses as
> side-effect free.
> 
> Definition of specific blocks:
> 
> Generic hypervisor leaf block:
>   0x400000x0 signature is "GenericVMMIF" (or something)
>   0x400000x1 tsc leaf as you've described
> 

I see the following issues with this proposal:

1. Kernel complexity: The complexity this would add to the kernel, to
handle multiple ABI signatures and scan all of these leaf blocks, is
difficult to digest.

2. Divergence in the interface provided by the hypervisors:
	The reason we brought up a flat hierarchy is that we think we should
be moving towards an approach where the guest code doesn't diverge too
much when running under different hypervisors. That is, the guest
essentially does the same thing whether it's running on, say, Xen or
VMware.

This design, IMO, will take us a step backward to what we have already
seen with paravirt ops. Each hypervisor (mostly) defines its own cpuid
block, and the guest correspondingly needs code to handle each of
these cpuid blocks, with the blocks mostly being mutually exclusive.


3. Is there a need for all this over-engineering?
	Aren't we over-engineering a simple interface here? The point is,
there are right now 256 cpuid leaves; do we realistically think we are
ever going to exhaust them all? We are really surprised that people
think this space is too small. It would be interesting to know what
uses you have in mind for cpuid.

Thanks,
Alok



>     J



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
       [not found]     ` <48E3D8A8.604__13396.6479487301$1222891831$gmane$org@goop.org>
@ 2008-10-01 21:03       ` Anthony Liguori
  0 siblings, 0 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 21:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: kvm, the arch/x86 maintainers, Dan Hecht, LKML, virtualization,
	avi, H. Peter Anvin, akataria, Ingo Molnar

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>> Mmm, cpuid bikeshedding :-)
> 
> My shade of blue is better.
> 
>>> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
>>>
>>> This region is divided into 16 16-leaf blocks.  Each block has the 
>>> structure:
>>>
>>> 0x400000x0:
>>>     eax: max used leaf within the leaf block (max 0x400000xf)
>> Why even bother with this?  It doesn't seem necessary in your proposal.
> 
> It allows someone to incrementally add things to their block in a fairly 
> orderly way.  But more importantly, its the prevailing idiom, and the 
> existing and proposed cpuid schemes already do this, so they'd fit in as-is.

We just leave eax as zero.  It wouldn't be that upsetting to change this 
as it would only keep new guests from working on older KVMs.

However, I see little incentive to change anything unless there's 
something compelling that we would get in return.  Since we're only 
talking about Linux guests, it's just as easy for us to add things to 
our paravirt_ops implementation as it would be to add things using this 
new model.

If this was something that other guests were all agreeing to support 
(even if it was just the BSDs and OpenSolaris), then there may be value 
to it.  Right now, I see no real value in changing the status quo.

Regards,

Anthony Liguori


>     J



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 18:06     ` Jeremy Fitzhardinge
@ 2008-10-01 21:05       ` Alok Kataria
  2008-10-01 22:46         ` H. Peter Anvin
  0 siblings, 1 reply; 50+ messages in thread
From: Alok Kataria @ 2008-10-01 21:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, avi, Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	Zach Amsden, virtualization, kvm

On Wed, 2008-10-01 at 11:06 -0700, Jeremy Fitzhardinge wrote:
> Alok Kataria wrote:
> > Its not a user who has to do anything special here.
> > There are *intelligent* VM developers out there who can export a
> > different CPUid interface depending on the guest OS type. And this is
> > what most of the hypervisors do (not necessarily for CPUID, but for
> > other things right now).
> >
> 
> No, that's always a terrible idea.  Sure, it's necessary to deal with
> some backward-compatibility issues, but we shouldn't even consider a new
> interface which assumes this kind of thing.  We want properly enumerable
> interfaces.

The reason we still have to do this is that Microsoft has already
defined a CPUID format which is very different from what you or I are
proposing (with the current case of 256 leaves being available). And I
doubt they would change the way they deal with it in their OS.
Whichever proposal we go with, we will have to export a different CPUID
interface from the hypervisor for the two OSes in question.

So I think this is something that we will have to do anyway, and it is
not worth bringing up in the discussion.

--
Alok

>     J



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:01   ` Alok Kataria
@ 2008-10-01 21:08     ` Anthony Liguori
  2008-10-01 21:15       ` Chris Wright
  2008-10-01 21:23       ` Alok Kataria
  2008-10-01 21:17     ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 21:08 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

Alok Kataria wrote:
> On Wed, 2008-10-01 at 11:04 -0700, Jeremy Fitzhardinge wrote:
>   
> 2. Divergence in the interface provided by the hypervisors  : 
> 	The reason we brought up a flat hierarchy is because we think we should
> be moving towards a approach where the guest code doesn't diverge too
> much when running under different hypervisors. That is the guest
> essentially does the same thing if its running on say Xen or VMware.
>
> This design IMO, will take us a step backward to  what we already have
> seen with para virt ops. Each hypervisor (mostly) defines its own cpuid
> block, the guest correspondingly needs to have code to handle each of
> these cpuid blocks, with these blocks will mostly being exclusive.
>   

What's wrong with what we have in paravirt_ops?  Just agreeing on CPUID 
doesn't help very much.  You still need a mechanism for doing hypercalls 
to implement anything meaningful.  We aren't going to agree on a 
hypercall mechanism.  KVM uses direct hypercall instructions, Xen uses a 
hypercall page, VMware uses VMI, Hyper-V uses MSR writes.  We all have 
already defined the hypercall namespace in a certain way.

We've already gone down the road of trying to make standard paravirtual 
interfaces (via virtio).  No one was sufficiently interested in 
collaborating.  I don't see why other paravirtualizations are going to 
be much different.

Regards,

Anthony Liguori


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:08     ` Anthony Liguori
@ 2008-10-01 21:15       ` Chris Wright
  2008-10-01 21:31         ` Anthony Liguori
  2008-10-01 21:23       ` Alok Kataria
  1 sibling, 1 reply; 50+ messages in thread
From: Chris Wright @ 2008-10-01 21:15 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

* Anthony Liguori (anthony@codemonkey.ws) wrote:
> We've already gone down the road of trying to make standard paravirtual  
> interfaces (via virtio).  No one was sufficiently interested in  
> collaborating.  I don't see why other paravirtualizations are going to  
> be much different.

The point is to be able to support those interfaces.  Presently a Linux guest
will test and find out which HV it's running on, and adapt.  Another
guest will fail to enlighten itself, and perf will suffer...yadda, yadda.

thanks,
-chris


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:01   ` Alok Kataria
  2008-10-01 21:08     ` Anthony Liguori
@ 2008-10-01 21:17     ` Jeremy Fitzhardinge
  2008-10-01 21:34       ` Anthony Liguori
  1 sibling, 1 reply; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-01 21:17 UTC (permalink / raw)
  To: akataria
  Cc: avi, Rusty Russell, Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Alok Kataria wrote:
> 1. Kernel complexity : Just thinking about the complexity that this will
> put in the kernel to handle these multiple ABI signatures and scanning
> all of these leaf block's is difficult to digest.
>   

The scanning for the signatures is trivial; it's not a significant 
amount of code.  Actually implementing them is a different matter, but 
that's the same regardless of where they are placed or how they're 
discovered.  After discovery it's the same either way: there's a leaf 
base with offsets from it.
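
To give a feel for how little code the scan needs, here is a minimal
sketch in Python rather than kernel C.  It assumes the block layout from
the counter-proposal (16 blocks of 16 leaves, signature in ebx/ecx/edx
of the first leaf of each block).  The cpuid callable, the fake leaf
table and the "ExampleHVsig" signature are made up for illustration;
"GenericVMMIF" is the placeholder name floated above, not an agreed
value.

```python
import struct

BASE = 0x40000000      # start of the hypervisor cpuid range
BLOCK_SIZE = 0x10      # 16 leaves per block
NUM_BLOCKS = 16        # 16 blocks cover 0x40000000-0x400000ff


def signature(ebx, ecx, edx):
    """Unpack the 12-byte signature held in ebx, ecx, edx (little-endian)."""
    return struct.pack("<III", ebx, ecx, edx).decode("ascii", errors="replace")


def pack_signature(text):
    """Inverse of signature(): 12 ASCII characters -> (ebx, ecx, edx)."""
    return struct.unpack("<III", text.encode("ascii").ljust(12, b"\0"))


def scan_blocks(cpuid, known):
    """Return {signature: (base_leaf, max_leaf)} for each recognised block.

    cpuid is any callable mapping a leaf number to (eax, ebx, ecx, edx).
    Blocks with unknown signatures are skipped and never probed further.
    """
    found = {}
    for i in range(NUM_BLOCKS):
        base = BASE + i * BLOCK_SIZE
        eax, ebx, ecx, edx = cpuid(base)
        sig = signature(ebx, ecx, edx)
        if sig not in known:
            continue
        # eax is the highest used leaf; clamp it to this block so that
        # an eax of zero simply means "only the base leaf is valid".
        max_leaf = min(max(eax, base), base + BLOCK_SIZE - 1)
        found[sig] = (base, max_leaf)
    return found


def fake_cpuid(leaf):
    """Hypothetical hypervisor exposing one vendor block and one generic block."""
    table = {
        0x40000000: (0x40000001, *pack_signature("ExampleHVsig")),
        0x40000010: (0x40000011, *pack_signature("GenericVMMIF")),
    }
    return table.get(leaf, (0, 0, 0, 0))


blocks = scan_blocks(fake_cpuid, known={"GenericVMMIF"})
```

A guest that only understands the generic block ends up with a single
entry giving the leaf base and the highest valid leaf; everything after
that is an offset from the base, exactly as in a flat scheme.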

> 2. Divergence in the interface provided by the hypervisors  : 
> 	The reason we brought up a flat hierarchy is because we think we should
> be moving towards a approach where the guest code doesn't diverge too
> much when running under different hypervisors. That is the guest
> essentially does the same thing if its running on say Xen or VMware.
>   

I guess, but the bulk of the uses of this stuff are going to be 
hypervisor-specific.  You're hard-pressed to come up with any other 
generic uses beyond tsc.  In general, if a hypervisor is going to put 
something in a special cpuid leaf, it's because there's no other good way 
to represent it.  Generic things are generally going to appear as an 
emulated piece of the virtualized platform, in ACPI, DMI, a 
hardware-defined cpuid leaf, etc...

> 3. Is their a need to do all this over engineering : 
> 	Aren't we over engineering a simple interface over here. The point is,
> there are right now 256 cpuid leafs do we realistically think we are
> ever going to exhaust all these leafs. We are really surprised to know
> that people may think this space is small enough. It would be
> interesting to know what all use you might want to put cpuid for.
>   

Look, if you want to propose a way to use that cpuid space in a 
reasonably flexible way that allows it to be used as the need arises, 
then we can talk about it.  But I think your proposal is a poor way to 
achieve those ends.

If you want blessing for something that you've already implemented and 
shipped, well, you don't need anyone's blessing for that.

    J


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:08     ` Anthony Liguori
  2008-10-01 21:15       ` Chris Wright
@ 2008-10-01 21:23       ` Alok Kataria
  2008-10-01 21:29         ` Anthony Liguori
  1 sibling, 1 reply; 50+ messages in thread
From: Alok Kataria @ 2008-10-01 21:23 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

On Wed, 2008-10-01 at 14:08 -0700, Anthony Liguori wrote:
> Alok Kataria wrote:
> > On Wed, 2008-10-01 at 11:04 -0700, Jeremy Fitzhardinge wrote:
> >
> > 2. Divergence in the interface provided by the hypervisors  :
> >       The reason we brought up a flat hierarchy is because we think we should
> > be moving towards a approach where the guest code doesn't diverge too
> > much when running under different hypervisors. That is the guest
> > essentially does the same thing if its running on say Xen or VMware.
> >
> > This design IMO, will take us a step backward to  what we already have
> > seen with para virt ops. Each hypervisor (mostly) defines its own cpuid
> > block, the guest correspondingly needs to have code to handle each of
> > these cpuid blocks, with these blocks will mostly being exclusive.
> >
> 
> What's wrong with what we have in paravirt_ops? 

Your explanation below answers the question you raised: the problem
is that we need to support each of these different hypercall
mechanisms in the kernel.
I understand that this was the correct thing to do at that moment.
But do we want to go the same way again for CPUID, when we can make it
generic (flat enough) for anybody to use in the same manner and
expose a generic interface to the kernel?

>  Just agreeing on CPUID
> doesn't help very much. 
Yeah, nobody is removing any of the paravirt ops support.

>  You still need a mechanism for doing hypercalls
> to implement anything meaningful.  We aren't going to agree on a
> hypercall mechanism.  KVM uses direct hypercall instructions, Xen uses a
> hypercall page, VMware uses VMI, Hyper-V uses MSR writes.  We all have
> already defined the hypercall namespace in a certain way.

Thanks,
Alok




* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:23       ` Alok Kataria
@ 2008-10-01 21:29         ` Anthony Liguori
  0 siblings, 0 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 21:29 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

Alok Kataria wrote:
> Your explanation below answers the question you raised, the problem
> being we need to have support for each of these different hypercall
> mechanisms in the kernel. 
> I understand that this was the correct thing to do at that moment. 
> But do we want to go the same way again for CPUID when we can make it
> generic (flat enough) for anybody to use it in the same manner and
> expose a generic interface to the kernel.
>   

But what sort of information can be stored in cpuid that's actually 
useful?  Right now we just use it in KVM for feature bits.  Most of the 
stuff that's interesting is stored in shared memory because a guest can 
read that without taking a vmexit or making a hypercall.

We can all agree upon a common mechanism for doing something but if no 
one is using that mechanism to do anything significant, what purpose 
does it serve?

Regards,

Anthony Liguori



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:15       ` Chris Wright
@ 2008-10-01 21:31         ` Anthony Liguori
  0 siblings, 0 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 21:31 UTC (permalink / raw)
  To: Chris Wright
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

Chris Wright wrote:
> * Anthony Liguori (anthony@codemonkey.ws) wrote:
>   
>> We've already gone down the road of trying to make standard paravirtual  
>> interfaces (via virtio).  No one was sufficiently interested in  
>> collaborating.  I don't see why other paravirtualizations are going to  
>> be much different.
>>     
>
> The point is to be able to support those interfaces.  Presently a Linux guest
> will test and find out which HV it's running on, and adapt.  Another
> guest will fail to enlighten itself, and perf will suffer...yadda, yadda.
>   

Agreeing on CPUID does not get us close at all to having shared 
interfaces for paravirtualization.  As I said in another note, there are 
more fundamental things that we differ on (like the hypercall mechanism) 
that are going to make that challenging.

We already are sharing code, when appropriate (see the Xen/KVM PV clock 
interface).

Regards,

Anthony Liguori

> thanks,
> -chris
>   



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:17     ` Jeremy Fitzhardinge
@ 2008-10-01 21:34       ` Anthony Liguori
  2008-10-01 21:43         ` Chris Wright
  2008-10-01 23:47         ` Zachary Amsden
  0 siblings, 2 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-01 21:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: akataria, avi, Rusty Russell, Gerd Hoffmann, H. Peter Anvin,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Daniel Hecht, Zach Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
> Alok Kataria wrote:
>
> I guess, but the bulk of the uses of this stuff are going to be 
> hypervisor-specific.  You're hard-pressed to come up with any other 
> generic uses beyond tsc.

And arguably, storing TSC frequency in CPUID is a terrible interface 
because the TSC frequency can change any time a guest is entered.  It 
really should be a shared memory area so that a guest doesn't have to 
vmexit to read it (like it is with the Xen/KVM paravirt clock).
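
As a rough illustration of the shared-memory approach, here is a
simplified Python sketch loosely modelled on the paravirt clock idea: the
hypervisor publishes a tick-to-nanosecond multiplier plus a reference
point, and bumps a version counter around each update so the guest can
detect a torn read.  The class and field names are invented for this
example, and the real interface has more to it (a shift field and
flags, for instance), so treat this as a sketch and not as the actual ABI.

```python
NSEC_PER_SEC = 1_000_000_000


def compute_mul(tsc_hz):
    """32.32 fixed-point multiplier converting TSC ticks to nanoseconds."""
    return (NSEC_PER_SEC << 32) // tsc_hz


def ticks_to_ns(delta_ticks, mul):
    """Scale a tick delta by the fixed-point multiplier."""
    return (delta_ticks * mul) >> 32


class SharedClockPage:
    """Record written by the hypervisor and read by the guest.

    version is odd while an update is in progress and even otherwise.
    """

    def __init__(self, tsc_hz, tsc_timestamp, system_time_ns):
        self.version = 0
        self.update(tsc_hz, tsc_timestamp, system_time_ns)

    def update(self, tsc_hz, tsc_timestamp, system_time_ns):
        self.version += 1                 # odd: update in progress
        self.mul = compute_mul(tsc_hz)
        self.tsc_timestamp = tsc_timestamp
        self.system_time_ns = system_time_ns
        self.version += 1                 # even: record is consistent


def guest_read_ns(page, tsc_now):
    """Read the clock without any exit: retry if the record changed under us."""
    while True:
        version = page.version
        if version & 1:
            continue                      # hypervisor is mid-update
        mul = page.mul
        tsc_timestamp = page.tsc_timestamp
        base_ns = page.system_time_ns
        if page.version == version:
            return base_ns + ticks_to_ns(tsc_now - tsc_timestamp, mul)


# Guest starts on a 2 GHz host: one second of ticks later, 1e9 ns have passed.
page = SharedClockPage(tsc_hz=2_000_000_000, tsc_timestamp=1000, system_time_ns=5000)
before = guest_read_ns(page, 1000 + 2_000_000_000)

# The hypervisor republishes the record for a 4 GHz TSC (e.g. after a
# migration); the guest picks up the new rate on its next read.
page.update(tsc_hz=4_000_000_000, tsc_timestamp=0, system_time_ns=before)
after = guest_read_ns(page, 4_000_000_000)
```

The point of the design is that a frequency change costs the guest
nothing: the hypervisor rewrites the record, and the next read sees it.
A value cached from a cpuid leaf would have to be re-read explicitly,
with an exit each time.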

Regards,

Anthony Liguori

>   In general, if a hypervisor is going to put something in a special 
> cpuid leaf, its because there's no other good way to represent it.  
> Generic things are generally going to appear as an emulated piece of 
> the virtualized platform, in ACPI, DMI, a hardware-defined cpuid leaf, 
> etc...



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:34       ` Anthony Liguori
@ 2008-10-01 21:43         ` Chris Wright
  2008-10-02 11:29           ` Avi Kivity
  2008-10-01 23:47         ` Zachary Amsden
  1 sibling, 1 reply; 50+ messages in thread
From: Chris Wright @ 2008-10-01 21:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jeremy Fitzhardinge, akataria, avi, Rusty Russell, Gerd Hoffmann,
	H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers, LKML,
	Nakajima, Jun, Daniel Hecht, Zach Amsden, virtualization, kvm

* Anthony Liguori (anthony@codemonkey.ws) wrote:
> And arguably, storing TSC frequency in CPUID is a terrible interface  
> because the TSC frequency can change any time a guest is entered.  It  

True for older hardware; newer hardware should fix this.  I guess the
point is, there are numbers that are easy to measure incorrectly in a
guest.  Doesn't justify the whole thing...


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 20:38           ` Chris Wright
@ 2008-10-01 22:38             ` H. Peter Anvin
  0 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 22:38 UTC (permalink / raw)
  To: Chris Wright
  Cc: Jeremy Fitzhardinge, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Dan Hecht, Zachary Amsden, virtualization, kvm

Chris Wright wrote:
> * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
>> "What hypervisor is this?" isn't a very interesting question; if you're  
>> even asking it then it suggests that something has gone wrong.
> 
> It's essentially already happening.  Everyone wants to be a better
> hyperv than hyperv ;-)

That's a hy-perv?  ;)

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:05       ` Alok Kataria
@ 2008-10-01 22:46         ` H. Peter Anvin
  2008-10-02  1:11           ` Nakajima, Jun
  0 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-01 22:46 UTC (permalink / raw)
  To: akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Nakajima, Jun,
	Daniel Hecht, Zach Amsden, virtualization, kvm

Alok Kataria wrote:
>> No, that's always a terrible idea.  Sure, it's necessary to deal with
>> some backward-compatibility issues, but we shouldn't even consider a new
>> interface which assumes this kind of thing.  We want properly enumerable
>> interfaces.
> 
> The reason we still have to do this is because, Microsoft has already
> defined a CPUID format which is way different than what you or I are
> proposing ( with the current case of 256 leafs being available). And I
> doubt they would change the way they deal with it on their OS. 
> Any proposal that we go with, we will have to export different CPUID
> interface from the hypervisor for the 2 OS in question. 
> 
> So i think this is something that we anyways will have to do and not
> worth binging about in the discussion.

No, that's a good hint that what "you and I" are proposing is utterly 
broken and exactly underscores what I have been stressing about 
noncompliant hypervisors.

All I have seen out of Microsoft only covers CPUID levels 0x40000000 as 
a vendor identification leaf and 0x40000001 as a "hypervisor 
identification leaf", but you might have access to other information.

This further underscores my belief that using 0x400000xx for anything 
"standards-based" at all is utterly futile, and that this space should 
be treated as vendor identification and the rest as vendor-specific. 
Any hope of creating a standard that's actually usable needs to be 
outside this space, e.g. in the 0x40SSSSxx space I proposed earlier.

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:34       ` Anthony Liguori
  2008-10-01 21:43         ` Chris Wright
@ 2008-10-01 23:47         ` Zachary Amsden
  2008-10-02  0:39           ` H. Peter Anvin
  2008-10-02  0:41           ` Anthony Liguori
  1 sibling, 2 replies; 50+ messages in thread
From: Zachary Amsden @ 2008-10-01 23:47 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jeremy Fitzhardinge, Alok Kataria, avi, Rusty Russell,
	Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

On Wed, 2008-10-01 at 14:34 -0700, Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
> > Alok Kataria wrote:
> >
> > I guess, but the bulk of the uses of this stuff are going to be
> > hypervisor-specific.  You're hard-pressed to come up with any other
> > generic uses beyond tsc.
> 
> And arguably, storing TSC frequency in CPUID is a terrible interface
> because the TSC frequency can change any time a guest is entered.  It
> really should be a shared memory area so that a guest doesn't have to
> vmexit to read it (like it is with the Xen/KVM paravirt clock).

It's not terrible, it's actually brilliant.  TSC is part of the
processor architecture; the processor should have a way to tell us what
speed it is.

Having a TSC with no interface to determine the frequency is a terrible
design flaw.  This is what caused the problem in the first place.

And now we're trying to fiddle around with software wizardry what should
be done in hardware in the first place.  Once again, para-virtualization
is basically useless.  We can't agree on a solution without
over-designing some complex system with interface signatures and
multi-vendor cooperation and nonsense.  Solve the non-virtualized
problem and the virtualized problem goes away.

Jun, you work at Intel.  Can you ask for a new architecturally defined
MSR that returns the TSC frequency?  Not a virtualization specific MSR.
A real MSR that would exist on physical processors.  The TSC started as
an MSR anyway.  There should be another MSR that tells the frequency.
If it's hard to do in hardware, it can be a write-once MSR that gets
initialized by the BIOS.  It's really a very simple solution to a very
common problem.  Other MSRs are dedicated to bus speed and so on, this
seems remarkably similar.

Once the physical problem is solved, the virtualized problem doesn't
even exist.  We simply add support for the newly defined MSR and voilà.
Other chipmakers probably agree it's a good idea and go along with it
too, and in the meantime, reading a non-existent MSR is a fairly
harmlessly handled #GP.

I realize it's the wrong thing for us now, but long term, it's the only
architecturally 'correct' approach.  You can even extend it to have
visible TSC frequency changes clocked via performance counter events
(and then get interrupts on those events if you so wish), solving the
dynamic problem too.

Paravirtualization is a symptom of an architectural problem.  We should
always be trying to fix the architecture first.

Zach



* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 23:47         ` Zachary Amsden
@ 2008-10-02  0:39           ` H. Peter Anvin
  2008-10-02  0:57             ` H. Peter Anvin
  2008-10-02  1:11             ` Zachary Amsden
  2008-10-02  0:41           ` Anthony Liguori
  1 sibling, 2 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-02  0:39 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Anthony Liguori, Jeremy Fitzhardinge, Alok Kataria, avi,
	Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

Zachary Amsden wrote:
> 
> Jun, you work at Intel.  Can you ask for a new architecturally defined
> MSR that returns the TSC frequency?  Not a virtualization specific MSR.
> A real MSR that would exist on physical processors.  The TSC started as
> an MSR anyway.  There should be another MSR that tells the frequency.
> If it's hard to do in hardware, it can be a write-once MSR that gets
> initialized by the BIOS.  It's really a very simple solution to a very
> common problem.  Other MSRs are dedicated to bus speed and so on, this
> seems remarkably similar.
> 

Ah, if it was only that simple.  Transmeta actually did this, but it's 
not as useful as you think.

There are at least three crystals in modern PCs: one at 32.768 kHz (for 
the RTC), one at 14.31818 MHz (PIT, PMTMR and HPET), and one at a higher 
frequency (often 200 MHz.)

All the main data distribution clocks in the system are derived from the 
third, which is subject to spread-spectrum modulation due to RFI 
concerns.  Therefore, relying on the *nominal* frequency of this clock 
is vastly incorrect; often by as much as 2%.  Spread-spectrum modulation 
is supposed to vary around zero enough that the spreading averages out, 
but the only way to know what the center frequency actually is is to 
average.  Furthermore, this high-frequency clock is generally not 
calibrated anywhere near as well as the 14 MHz clock; in good designs 
the 14 MHz is actually a TCXO (temperature compensated crystal 
oscillator), which is accurate to something like ±2 ppm.
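
To put those two error figures side by side, here is a small worked
calculation in Python.  It takes the 2% and 2 ppm numbers above at face
value as worst cases and simply converts each into accumulated clock
drift over one day; the function and variable names are only for
illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_per_day(relative_error):
    """Seconds of drift accumulated in one day for a given relative error."""
    return relative_error * SECONDS_PER_DAY


nominal_drift = drift_per_day(0.02)   # trusting a nominal frequency that is 2% off
tcxo_drift = drift_per_day(2e-6)      # a reference accurate to 2 ppm
ratio = nominal_drift / tcxo_drift    # how much worse the nominal figure is
```

That works out to roughly 1728 seconds (about 29 minutes) per day for
the nominal frequency against roughly 0.17 seconds per day for the
TCXO, a factor of about 10,000.  This is why calibrating the TSC
against the 14 MHz reference is worth the trouble compared with
reading a nominal value from a register.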

	-hpa


* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 23:47         ` Zachary Amsden
  2008-10-02  0:39           ` H. Peter Anvin
@ 2008-10-02  0:41           ` Anthony Liguori
  1 sibling, 0 replies; 50+ messages in thread
From: Anthony Liguori @ 2008-10-02  0:41 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Jeremy Fitzhardinge, Alok Kataria, avi, Rusty Russell,
	Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

Zachary Amsden wrote:
> On Wed, 2008-10-01 at 14:34 -0700, Anthony Liguori wrote:
>   
>> Jeremy Fitzhardinge wrote:
>>     
>>> Alok Kataria wrote:
>>>
>>> I guess, but the bulk of the uses of this stuff are going to be
>>> hypervisor-specific.  You're hard-pressed to come up with any other
>>> generic uses beyond tsc.
>>>       
>> And arguably, storing TSC frequency in CPUID is a terrible interface
>> because the TSC frequency can change any time a guest is entered.  It
>> really should be a shared memory area so that a guest doesn't have to
>> vmexit to read it (like it is with the Xen/KVM paravirt clock).
>>     
>
> It's not terrible, it's actually brilliant.

But of course!  Okay, not really :-)

>   TSC is part of the
> processor architecture, the processor should have a way to tell us what speed
> it is.
>   

It does.  1 tick == 1 tick.  The processor doesn't have a concept of 
wall clock time so wall clock units don't make much sense.  If it did, 
I'd say, screw the TSC, just give me a ns granular time stamp and let's 
all forget that the TSC even exists.

> And now we're trying to fiddle around with software wizardry what should
> be done in hardware in the first place.  Once again, para-virtualization
> is basically useless.  We can't agree on a solution without
> over-designing some complex system with interface signatures and
> multi-vendor cooperation and nonsense.  Solve the non-virtualized
> problem and the virtualized problem goes away.
>
> Jun, you work at Intel.  Can you ask for a new architecturally defined
> MSR that returns the TSC frequency?  Not a virtualization specific MSR.
> A real MSR that would exist on physical processors.  The TSC started as
> an MSR anyway.  There should be another MSR that tells the frequency.
> If it's hard to do in hardware, it can be a write-once MSR that gets
> initialized by the BIOS.

rdtscp sort of gives you this.  But still, just give me my rdnsc and 
I'll be happy.

> I realize it's the wrong thing for us now, but long term, it's the only
> architecturally 'correct' approach.  You can even extend it to have
> visible TSC frequency changes clocked via performance counter events
> (and then get interrupts on those events if you so wish), solving the
> dynamic problem too.
>   

So a solution is needed that works for now.  Anything that requires a 
vmexit is bad because the TSC frequency can change quite often.  Even if 
you ignore the troubles with frequency scaling on older processors and 
VCPU migration across NUMA nodes, there will be a very visible change in 
TSC frequency after a live migration.
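
To make the migration point concrete, here is a minimal sketch of the
arithmetic, assuming a guest that reads the TSC frequency once and keeps
using the cached value.  The helper name and the 2 GHz / 3 GHz host
frequencies are illustrative assumptions, not values from this thread.

```python
def tsc_ticks_to_ns(ticks, tsc_khz):
    """Convert a TSC tick delta to nanoseconds, given the TSC frequency in kHz."""
    # ticks / (tsc_khz * 1000) seconds == ticks * 1_000_000 / tsc_khz nanoseconds
    return ticks * 1_000_000 // tsc_khz


# Hypothetical scenario: the guest cached 2 GHz before a live migration,
# but the destination host's TSC actually runs at 3 GHz.
CACHED_KHZ = 2_000_000
ACTUAL_KHZ = 3_000_000

ticks_in_one_second = ACTUAL_KHZ * 1000  # ticks elapsed in one real second

true_ns = tsc_ticks_to_ns(ticks_in_one_second, ACTUAL_KHZ)
stale_ns = tsc_ticks_to_ns(ticks_in_one_second, CACHED_KHZ)
```

With these numbers the guest would account 1.5 s for every real second
until it learns the new frequency, which is why a one-time read is not
enough.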

So there are two possible solutions.  Have a shared memory area that the 
guest can consult that has the latest TSC frequency (this is what KVM 
and Xen do) or have some sort of interrupt mechanism that notifies the 
guest when the TSC frequency changes after which, software can do 
something that vmexits to get the TSC frequency.
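
The shared-memory option can be sketched as a version-counter
(seqlock-style) read, which is, to my understanding, the general pattern
the Xen/KVM paravirt clock follows.  The class below is a single-threaded
stand-in with a hypothetical layout; a real guest would read a page
mapped by the hypervisor and would need memory barriers between the
reads.

```python
class SharedTscInfo:
    """Stand-in for a record the hypervisor shares with the guest.

    The field names are hypothetical, chosen for illustration only.
    """

    def __init__(self, tsc_khz):
        self.version = 0      # even: record is consistent; odd: update in progress
        self.tsc_khz = tsc_khz

    def host_update(self, tsc_khz):
        """What the hypervisor would do when the TSC frequency changes."""
        self.version += 1     # becomes odd, so readers know to retry
        self.tsc_khz = tsc_khz
        self.version += 1     # becomes even again


def guest_read_tsc_khz(info):
    """Read the frequency without a vmexit, retrying if an update raced with us."""
    while True:
        before = info.version
        if before & 1:
            continue          # update in progress, try again
        khz = info.tsc_khz
        if info.version == before:
            return khz        # nothing changed while we were reading


info = SharedTscInfo(2_000_000)
first = guest_read_tsc_khz(info)
info.host_update(2_400_000)   # e.g. after a live migration
second = guest_read_tsc_khz(info)
```

The guest never has to ask the hypervisor anything; it only pays for a
retry in the rare case where it reads during an update.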

The proposed solution doesn't include a TSC frequency change 
notification mechanism.

This is part of the problem with this sort of approach to 
standardization.  It's hard to come up with the best interface at 
first.  You have to try a couple ways, and then everyone can eventually 
standardize on the best one if one ever emerges.

Regards,

Anthony Liguori

> Paravirtualization is a symptom of an architectural problem.  We should
> always be trying to fix the architecture first.
>
> Zach
>
>   


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-02  0:39           ` H. Peter Anvin
@ 2008-10-02  0:57             ` H. Peter Anvin
  2008-10-02  1:11             ` Zachary Amsden
  1 sibling, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-02  0:57 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Anthony Liguori, Jeremy Fitzhardinge, Alok Kataria, avi,
	Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

H. Peter Anvin wrote:
> 
> Ah, if it was only that simple.  Transmeta actually did this, but it's 
> not as useful as you think.
> 

For what it's worth, Transmeta's implementation used CPUID leaf 
0x80860001.ECX to give the TSC frequency rounded to the nearest MHz. 
The caveat of spread-spectrum modulation applies.

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 22:46         ` H. Peter Anvin
@ 2008-10-02  1:11           ` Nakajima, Jun
  2008-10-02  1:24             ` H. Peter Anvin
  0 siblings, 1 reply; 50+ messages in thread
From: Nakajima, Jun @ 2008-10-02  1:11 UTC (permalink / raw)
  To: H. Peter Anvin, akataria
  Cc: Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2438 bytes --]

On 10/1/2008 3:46:45 PM, H. Peter Anvin wrote:
> Alok Kataria wrote:
> > > No, that's always a terrible idea.  Sure, it's necessary to deal
> > > with some backward-compatibility issues, but we shouldn't even
> > > consider a new interface which assumes this kind of thing.  We
> > > want properly enumerable interfaces.
> >
> > The reason we still have to do this is because, Microsoft has
> > already defined a CPUID format which is way different than what you
> > or I are proposing ( with the current case of 256 leafs being
> > available). And I doubt they would change the way they deal with it on their OS.
> > Any proposal that we go with, we will have to export different CPUID
> > interface from the hypervisor for the 2 OS in question.
> >
> > So I think this is something that we will have to do anyway and not
> > worth bringing up in the discussion.
>
> No, that's a good hint that what "you and I" are proposing is utterly
> broken and exactly underscores what I have been stressing about
> noncompliant hypervisors.
>
> All I have seen out of Microsoft only covers CPUID levels 0x40000000
> as a vendor identification leaf and 0x40000001 as a "hypervisor
> identification leaf", but you might have access to other information.

No, it says "Leaf 0x40000001 as hypervisor vendor-neutral interface identification, which determines the semantics of leaves from 0x40000002 through 0x400000FF." The Leaf 0x40000000 returns vendor identifier signature (i.e. hypervisor identification) and the hypervisor CPUID leaf range, as in the proposal.

>
> This further underscores my belief that using 0x400000xx for anything
> "standards-based" at all is utterly futile, and that this space should
> be treated as vendor identification and the rest as vendor-specific.
> Any hope of creating a standard that's actually usable needs to be
> outside this space, e.g. in the 0x40SSSSxx space I proposed earlier.
>

Actually I'm not sure I'm following your logic. Are you saying that using 0x400000xx for anything "standards-based" is utterly futile because Microsoft said "the range is hypervisor vendor-neutral"? Or were you not sure what they meant there? If we are not clear, we can ask them.


>         -hpa
Jun Nakajima | Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-02  0:39           ` H. Peter Anvin
  2008-10-02  0:57             ` H. Peter Anvin
@ 2008-10-02  1:11             ` Zachary Amsden
  2008-10-02  1:21               ` H. Peter Anvin
  1 sibling, 1 reply; 50+ messages in thread
From: Zachary Amsden @ 2008-10-02  1:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Anthony Liguori, Jeremy Fitzhardinge, Alok Kataria, avi,
	Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

On Wed, 2008-10-01 at 17:39 -0700, H. Peter Anvin wrote:
> third, which is subject to spread-spectrum modulation due to RFI
> concerns.  Therefore, relying on the *nominal* frequency of this clock

I'm not suggesting using the nominal value.  I'm suggesting the
measurement be done in the one and only place where there is perfect
control of the system, the processor boot-strapping in the BIOS.

Only the platform designers themselves know the speed of the oscillator
which is modulating the clock and so only they should be calibrating the
speed of the TSC.

If this modulation really does alter the frequency by +/- 2% (seems high
to me, but hey, I don't design motherboards), using an LFO, then
basically all the calibration done in Linux is broken and has been for
some time.  You can't calibrate only once, or risk being off by 2%, you
can't calibrate repeatedly and take the fastest estimate, or you are off
by 2%, and you can't calibrate repeatedly and take the average without
risking SMI noise affecting the lowest clock speed measurement,
contributing unknown error.

Hmm.  Re-reading your e-mail, I see you are saying the nominal frequency
may be off by 2% (and I easily believe that), not necessarily that the
frequency modulation may be 2% (which I still think is high).  Does
anyone know what the actual bounds on spread spectrum modulation are or
how fast the clock is modulated?

Zach


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-02  1:11             ` Zachary Amsden
@ 2008-10-02  1:21               ` H. Peter Anvin
  0 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-02  1:21 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Anthony Liguori, Jeremy Fitzhardinge, Alok Kataria, avi,
	Rusty Russell, Gerd Hoffmann, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	virtualization, kvm

Zachary Amsden wrote:
> 
> I'm not suggesting using the nominal value.  I'm suggesting the
> measurement be done in the one and only place where there is perfect
> control of the system, the processor boot-strapping in the BIOS.
> 
> Only the platform designers themselves know the speed of the oscillator
> which is modulating the clock and so only they should be calibrating the
> speed of the TSC.
> 

No.  *No one*, including the manufacturers, knows the speed of the 
oscillator which is modulating the clock.  What you have to do is 
average over a timespan which is long enough that the SSM averages out 
(a relatively small fraction of a second.)

As for trusting the BIOS on this, that's a total joke.  Firmware vendors 
can't get the most basic details right.

> If this modulation really does alter the frequency by +/- 2% (seems high
> to me, but hey, I don't design motherboards), using an LFO, then
> basically all the calibration done in Linux is broken and has been for
> some time.  You can't calibrate only once, or risk being off by 2%, you
> can't calibrate repeatedly and take the fastest estimate, or you are off
> by 2%, and you can't calibrate repeatedly and take the average without
> risking SMI noise affecting the lowest clock speed measurement,
> contributing unknown error.

You have to calibrate over a sample interval long enough that the SSM 
averages out.

> Hmm.  Re-reading your e-mail, I see you are saying the nominal frequency
> may be off by 2% (and I easily believe that), not necessarily that the
> frequency modulation may be 2% (which I still think is high).  Does
> anyone know what the actual bounds on spread spectrum modulation are or
> how fast the clock is modulated?

No, I'm saying the frequency modulation may be up to 2%.  Typically it 
is something like [-2%,+0%].
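
The effect of a down-spread such as [-2%,+0%] on calibration can be
sketched numerically.  The model below assumes a triangular modulation
profile and a 30 kHz modulation rate, both illustrative assumptions
rather than figures from this thread; the 200 MHz nominal clock is the
example value mentioned earlier.

```python
def instantaneous_hz(t, f_nominal, depth, f_mod):
    """Clock frequency at time t (seconds) under triangular down-spread."""
    phase = (t * f_mod) % 1.0
    tri = 1.0 - abs(2.0 * phase - 1.0)       # ramps 0 -> 1 -> 0 over one period
    return f_nominal * (1.0 - depth * tri)   # never exceeds the nominal value


def measured_hz(t_start, window, f_nominal, depth, f_mod, steps=10_000):
    """Cycles counted during the window divided by its length (midpoint rule)."""
    dt = window / steps
    cycles = sum(
        instantaneous_hz(t_start + (i + 0.5) * dt, f_nominal, depth, f_mod)
        for i in range(steps)
    ) * dt
    return cycles / window


F_NOMINAL = 200e6   # Hz
DEPTH = 0.02        # 2% down-spread, i.e. [-2%, +0%]
F_MOD = 30e3        # Hz, assumed modulation rate

period = 1.0 / F_MOD
full = measured_hz(0.0, period, F_NOMINAL, DEPTH, F_MOD)        # one whole period
short = measured_hz(0.0, period / 10, F_NOMINAL, DEPTH, F_MOD)  # a tenth of it
```

Under these assumptions a whole-period average comes out at 198 MHz, 1%
below nominal, while a window covering only the first tenth of a period
reads about 199.6 MHz.  Neither the nominal value nor a too-short
measurement gives the center frequency.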

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-02  1:11           ` Nakajima, Jun
@ 2008-10-02  1:24             ` H. Peter Anvin
  2008-10-03 22:33               ` Nakajima, Jun
  0 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-02  1:24 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
>>
>> All I have seen out of Microsoft only covers CPUID levels 0x40000000
>> as a vendor identification leaf and 0x40000001 as a "hypervisor
>> identification leaf", but you might have access to other information.
> 
> No, it says "Leaf 0x40000001 as hypervisor vendor-neutral interface identification, which determines the semantics of leaves from 0x40000002 through 0x400000FF." The Leaf 0x40000000 returns vendor identifier signature (i.e. hypervisor identification) and the hypervisor CPUID leaf range, as in the proposal.
> 

In other words, 0x40000002+ is vendor-specific space, based on the 
hypervisor specified in 0x40000001 (in theory); in practice it is based 
on both 0x40000000 and 0x40000001, since M$ seems to use such clever 
identifiers as "Hypervisor 1".

>> This further underscores my belief that using 0x400000xx for anything
>> "standards-based" at all is utterly futile, and that this space should
>> be treated as vendor identification and the rest as vendor-specific.
>> Any hope of creating a standard that's actually usable needs to be
>> outside this space, e.g. in the 0x40SSSSxx space I proposed earlier.
> 
> Actually I'm not sure I'm following your logic. Are you saying that using 0x400000xx for anything "standards-based" is utterly futile because Microsoft said "the range is hypervisor vendor-neutral"? Or were you not sure what they meant there? If we are not clear, we can ask them.
> 

What I'm saying is that Microsoft is effectively squatting on the 
0x400000xx space with their definition.  As written, it's not even clear 
that it will remain consistent between *their own* hypervisors, even 
less anyone else's.

	-hpa


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-01 21:43         ` Chris Wright
@ 2008-10-02 11:29           ` Avi Kivity
  0 siblings, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2008-10-02 11:29 UTC (permalink / raw)
  To: Chris Wright
  Cc: Anthony Liguori, Jeremy Fitzhardinge, akataria, Rusty Russell,
	Gerd Hoffmann, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, LKML, Nakajima, Jun, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Chris Wright wrote:
> * Anthony Liguori (anthony@codemonkey.ws) wrote:
>   
>> And arguably, storing TSC frequency in CPUID is a terrible interface  
>> because the TSC frequency can change any time a guest is entered.  It  
>>     
>
> True for older hardware, newer hardware should fix this.  I guess the
> point is, there are numbers that are easy to measure incorrectly in a guest.
> Doesn't justify the whole thing..
>   

It's not fixed for newer hardware.  Larger systems still have multiple 
tsc frequencies.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-02  1:24             ` H. Peter Anvin
@ 2008-10-03 22:33               ` Nakajima, Jun
  2008-10-03 23:30                 ` H. Peter Anvin
  0 siblings, 1 reply; 50+ messages in thread
From: Nakajima, Jun @ 2008-10-03 22:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3427 bytes --]

On 10/1/2008 6:24:26 PM, H. Peter Anvin wrote:
> Nakajima, Jun wrote:
> > >
> > > All I have seen out of Microsoft only covers CPUID levels
> > > 0x40000000 as a vendor identification leaf and 0x40000001 as a
> > > "hypervisor identification leaf", but you might have access to other information.
> >
> > No, it says "Leaf 0x40000001 as hypervisor vendor-neutral interface
> > identification, which determines the semantics of leaves from
> > 0x40000002 through 0x400000FF." The Leaf 0x40000000 returns vendor
> > identifier signature (i.e. hypervisor identification) and the
> > hypervisor CPUID leaf range, as in the proposal.
> >
>

Resuming the thread :-)

> In other words, 0x40000002+ is vendor-specific space, based on the
> hypervisor specified in 0x40000001 (in theory); in practice both
> 0x40000000:0x40000001 since M$ seem to use clever identifiers as
> "Hypervisor 1".

What it means is that their hypervisor returns the interface signature (i.e. "Hv#1"), and that defines the interface. If we use "Lv_1", for example, we can define the interface 0x40000002 through 0x400000FF for Linux. Since leaf 0x40000000 and 0x40000001 are separate, we can decouple the hypervisor vendor from the interface it supports. This also allows a hypervisor to support multiple interfaces.

And whether a guest wants to use the interface without checking the vendor ID is a different thing. For Linux, we don't want to hardcode the vendor IDs in the upstream code, at least for such a generic interface.

So I think we need to modify the proposal:

Hypervisor interface identification Leaf:
        Leaf 0x40000001.

        This leaf returns the interface signature that the hypervisor implements.
        # EAX: "Lv_1" (or something)
        # EBX, ECX, EDX: Reserved.

Lv_1 interface Leaves:
        Leaf range 0x40000002 - 0x400000FF.
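
A minimal sketch of how a guest might test for such an interface
signature, separately from the vendor ID.  The byte order (first
character in the least significant byte of EAX) is an assumption
following the usual CPUID string convention, and the `cpuid` callable
and the fake table are hypothetical stand-ins for the real instruction.

```python
HV_BASE_LEAF = 0x40000000       # max hypervisor leaf in EAX, vendor ID in EBX/ECX/EDX
HV_INTERFACE_LEAF = 0x40000001  # interface signature in EAX (per the text above)


def pack_signature(text):
    """Pack a 4-character ASCII signature into a 32-bit register value."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("signature must be exactly 4 ASCII characters")
    return int.from_bytes(raw, "little")


def guest_sees_interface(cpuid, signature):
    """Check the interface leaf only; the vendor ID is deliberately ignored."""
    max_leaf = cpuid(HV_BASE_LEAF)[0]
    if max_leaf < HV_INTERFACE_LEAF:
        return False
    return cpuid(HV_INTERFACE_LEAF)[0] == pack_signature(signature)


def make_fake_cpuid(table):
    """Build a cpuid(leaf) -> (eax, ebx, ecx, edx) function from a dict."""
    def cpuid(leaf):
        return table.get(leaf, (0, 0, 0, 0))
    return cpuid


# Hypothetical hypervisor exposing the "Lv_1" interface up to leaf 0x40000010.
fake = make_fake_cpuid({
    HV_BASE_LEAF: (0x40000010, 0, 0, 0),
    HV_INTERFACE_LEAF: (pack_signature("Lv_1"), 0, 0, 0),
})

has_lv1 = guest_sees_interface(fake, "Lv_1")
has_hv1 = guest_sees_interface(fake, "Hv#1")
```

The guest code never compares against a vendor string, which is the
decoupling being proposed here.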

In fact, both Xen and KVM are using the leaf 0x40000001 for different purposes today (Xen: Xen version number, KVM: KVM para-virtualization features). But I don't think this would break their existing binaries mainly because they would need to expose the interface explicitly now.

>
> > > This further underscores my belief that using 0x400000xx for
> > > anything "standards-based" at all is utterly futile, and that this
> > > space should be treated as vendor identification and the rest as
> > > vendor-specific. Any hope of creating a standard that's actually
> > > usable needs to be outside this space, e.g. in the 0x40SSSSxx
> > > space I proposed earlier.
> >
> > Actually I'm not sure I'm following your logic. Are you saying that
> > using 0x400000xx for anything "standards-based" is utterly futile
> > because Microsoft said "the range is hypervisor vendor-neutral"? Or
> > were you not sure what they meant there? If we are not clear, we can
> > ask them.
> >
>
> What I'm saying is that Microsoft is effectively squatting on the
> 0x400000xx space with their definition.  As written, it's not even
> clear that it will remain consistent between *their own* hypervisors,
> even less anyone else's.

I hope the above clarified your concern. You can google-search a more detailed public spec. Let me know if you want to know a specific URL.

>
>         -hpa
>
Jun Nakajima | Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-03 22:33               ` Nakajima, Jun
@ 2008-10-03 23:30                 ` H. Peter Anvin
  2008-10-04  0:27                   ` Nakajima, Jun
  0 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-03 23:30 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
> What it means is that their hypervisor returns the interface signature (i.e. "Hv#1"), and that defines the interface. If we use "Lv_1", for example, we can define the interface 0x40000002 through 0x400000FF for Linux. Since leaf 0x40000000 and 0x40000001 are separate, we can decouple the hypervisor vendor from the interface it supports.

Right so far.

> This also allows a hypervisor to support multiple interfaces.

Wrong.

This isn't a two-way interface.  It's a one-way interface, and it 
*SHOULD BE*; exposing different information depending on what is running 
is a hack that is utterly torturous at best.

> 
> In fact, both Xen and KVM are using the leaf 0x40000001 for different purposes today (Xen: Xen version number, KVM: KVM para-virtualization features). But I don't think this would break their existing binaries mainly because they would need to expose the interface explicitly now.
> 
>>>> This further underscores my belief that using 0x400000xx for
>>>> anything "standards-based" at all is utterly futile, and that this
>>>> space should be treated as vendor identification and the rest as
>>>> vendor-specific. Any hope of creating a standard that's actually
>>>> usable needs to be outside this space, e.g. in the 0x40SSSSxx
>>>> space I proposed earlier.
>>> Actually I'm not sure I'm following your logic. Are you saying that
>>> using 0x400000xx for anything "standards-based" is utterly futile
>>> because Microsoft said "the range is hypervisor vendor-neutral"? Or
>>> were you not sure what they meant there? If we are not clear, we can
>>> ask them.
>>>
>> What I'm saying is that Microsoft is effectively squatting on the
>> 0x400000xx space with their definition.  As written, it's not even
>> clear that it will remain consistent between *their own* hypervisors,
>> even less anyone else's.
> 
> I hope the above clarified your concern. You can google-search a more detailed public spec. Let me know if you want to know a specific URL.
> 

No, it hasn't "clarified my concern" in any way.  It's exactly 
*underscoring* it.  In other words, I consider 0x400000xx unusable for 
anything that is standards-based.  The interfaces everyone is currently 
using aren't designed to export multiple interfaces; they're designed to 
tell the guest which *one* interface is exported.  That is fine, we just 
need to go elsewhere.

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-03 23:30                 ` H. Peter Anvin
@ 2008-10-04  0:27                   ` Nakajima, Jun
  2008-10-04  0:35                     ` H. Peter Anvin
  2008-10-04  8:53                     ` Avi Kivity
  0 siblings, 2 replies; 50+ messages in thread
From: Nakajima, Jun @ 2008-10-04  0:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3494 bytes --]

On 10/3/2008 4:30:29 PM, H. Peter Anvin wrote:
> Nakajima, Jun wrote:
> > What it means is that their hypervisor returns the interface signature (i.e.
> > "Hv#1"), and that defines the interface. If we use "Lv_1", for
> > example, we can define the interface 0x40000002 through 0x400000FF for Linux.
> > Since leaf 0x40000000 and 0x40000001 are separate, we can decouple
> > the hypervisor vendor from the interface it supports.
>
> Right so far.
>
> > This also allows a hypervisor to support multiple interfaces.
>
> Wrong.
>
> This isn't a two-way interface.  It's a one-way interface, and it
> *SHOULD BE*; exposing different information depending on what is
> running is a hack that is utterly torturous at best.

What I mean is that a hypervisor (with a single vendor ID) can support multiple interfaces, exposing a single interface to each guest that would expect a specific interface at runtime.

>
> >
> > In fact, both Xen and KVM are using the leaf 0x40000001 for
> > different purposes today (Xen: Xen version number, KVM: KVM
> > para-virtualization features). But I don't think this would break
> > their existing binaries mainly because they would need to expose the interface explicitly now.
> >
> > > > > This further underscores my belief that using 0x400000xx for
> > > > > anything "standards-based" at all is utterly futile, and that
> > > > > this space should be treated as vendor identification and the
> > > > > rest as vendor-specific. Any hope of creating a standard
> > > > > that's actually usable needs to be outside this space, e.g. in
> > > > > the 0x40SSSSxx space I proposed earlier.
> > > > Actually I'm not sure I'm following your logic. Are you saying
> > > > that using 0x400000xx for anything "standards-based" is utterly
> > > > futile because Microsoft said "the range is hypervisor
> > > > vendor-neutral"? Or were you not sure what they meant there? If
> > > > we are not clear, we can ask them.
> > > >
> > > What I'm saying is that Microsoft is effectively squatting on the
> > > 0x400000xx space with their definition.  As written, it's not even
> > > clear that it will remain consistent between *their own*
> > > hypervisors, even less anyone else's.
> >
> > I hope the above clarified your concern. You can google-search a
> > more detailed public spec. Let me know if you want to know a specific URL.
> >
>
> No, it hasn't "clarified my concern" in any way.  It's exactly
> *underscoring* it.  In other words, I consider 0x400000xx unusable for
> anything that is standards-based.  The interfaces everyone is
> currently using aren't designed to export multiple interfaces; they're
> designed to tell the guest which *one* interface is exported.  That is
> fine, we just need to go elsewhere.
>
>         -hpa

What's the significance of supporting multiple interfaces to the same guest simultaneously, i.e. at _runtime_? We don't want the guests to run on what is literally a Frankenstein machine. And practically, such testing/debugging would be good only for Halloween :-).

The interface space can be distinct, but the contents are defined and implemented independently, thus you might find overlaps, inconsistency, etc. among the interfaces. And why is runtime "multiple interfaces" required for a standards-based interface?

Jun Nakajima | Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-04  0:27                   ` Nakajima, Jun
@ 2008-10-04  0:35                     ` H. Peter Anvin
  2008-10-07 22:30                       ` Nakajima, Jun
  2008-10-04  8:53                     ` Avi Kivity
  1 sibling, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-04  0:35 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
> 
> What I mean is that a hypervisor (with a single vendor ID) can support multiple interfaces, exposing a single interface to each guest that would expect a specific interface at runtime.
> 

Yes, and for the reasons outlined in a previous post in this thread, 
this is an incredibly bad idea.  We already hate the guts of the ACPI 
people for this reason.

> 
> What's the significance of supporting multiple interfaces to the same guest simultaneously, i.e. at _runtime_? We don't want the guests to run on what is literally a Frankenstein machine. And practically, such testing/debugging would be good only for Halloween :-).
> 

By that notion, EVERY CPU currently shipped is a "Frankenstein" CPU, 
since at the very least they export Intel-derived and AMD-derived 
interfaces.  This is, in other words, a ridiculous claim.

> The interface space can be distinct, but the contents are defined and implemented independently, thus you might find overlaps, inconsistency, etc. among the interfaces. And why is runtime "multiple interfaces" required for a standards-based interface?

That is the whole point -- without a central coordinating authority, 
you're going to have to accommodate many definition sources.  Otherwise, 
you're just back to where we started -- each hypervisor exports an 
interface and that's just that.

If there are multiple interface specifications, they should be exported 
simultaneously in non-conflicting numberspaces, and the *GUEST* gets to 
choose what to believe.  We already do this for *all kinds* of 
information, including CPUID.  It's the right thing to do.
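
A rough sketch of what "non-conflicting numberspaces" could look like
from the guest's side, assuming the 0x40SSSSxx layout means a 16-bit
interface ID in bits 8-23 and an 8-bit leaf index in bits 0-7.  That
reading, the function names and the interface IDs below are all
assumptions for illustration.

```python
def interface_leaf(space_id, index):
    """Compose a CPUID leaf number of the form 0x40SSSSxx."""
    if not 0 <= space_id <= 0xFFFF:
        raise ValueError("space_id must fit in 16 bits")
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in 8 bits")
    return 0x40000000 | (space_id << 8) | index


def choose_interface(exported, preferred):
    """Return the first interface the guest prefers that the host exports."""
    for space_id in preferred:
        if space_id in exported:
            return space_id
    return None


# Hypothetical IDs: the host exports three interfaces at once, and the
# guest walks its own preference list without the host knowing or caring.
exported_by_host = {0x0000, 0x0001, 0x1234}
guest_preference = [0x0002, 0x1234, 0x0000]

chosen = choose_interface(exported_by_host, guest_preference)
first_leaf = interface_leaf(chosen, 0x00)
```

Because each interface lives in its own block of leaves, the host can
export all of them unconditionally and the selection logic stays
entirely in the guest.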

	-hpa


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-04  0:27                   ` Nakajima, Jun
  2008-10-04  0:35                     ` H. Peter Anvin
@ 2008-10-04  8:53                     ` Avi Kivity
  1 sibling, 0 replies; 50+ messages in thread
From: Avi Kivity @ 2008-10-04  8:53 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: H. Peter Anvin, akataria, Jeremy Fitzhardinge, Rusty Russell,
	Gerd Hoffmann, Ingo Molnar, the arch/x86 maintainers, LKML,
	Daniel Hecht, Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
> What's the significance of supporting multiple interfaces to the same guest simultaneously, i.e. at _runtime_? We don't want the guests to run on what is literally a Frankenstein machine. And practically, such testing/debugging would be good only for Halloween :-).
>
>   

If you can only expose one interface, you need to have the user choose.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-04  0:35                     ` H. Peter Anvin
@ 2008-10-07 22:30                       ` Nakajima, Jun
  2008-10-07 22:37                         ` H. Peter Anvin
  2008-10-07 23:41                         ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 50+ messages in thread
From: Nakajima, Jun @ 2008-10-07 22:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1301 bytes --]

On 10/3/2008 5:35:39 PM, H. Peter Anvin wrote:
> Nakajima, Jun wrote:
> >
> > What's the significance of supporting multiple interfaces to the
> > same guest simultaneously, i.e. _runtime_? We don't want the guests
> > to run on what is literally a Frankenstein machine. And practically,
> > such testing/debugging would be good only for Halloween :-).
> >
>
> By that notion, EVERY CPU currently shipped is a "Frankenstein" CPU,
> since at very least they export Intel-derived and AMD-derived interfaces.
>  This is in other words, a ridiculous claim.

The big difference here is that you could create a VM at runtime (by combining the existing interfaces) that did not exist before (or was not tested before). For example, a hypervisor could show hyper-v, osx-v (if any), linux-v, etc., and a guest could create a VM with hyper-v MMU, osx-v interrupt handling, Linux-v timer, etc. And such combinations/variations can grow exponentially.

Or are you suggesting that multiple interfaces be _available_ to guests at runtime but the guest chooses one of them?

>         -hpa
>
Jun Nakajima | Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 22:30                       ` Nakajima, Jun
@ 2008-10-07 22:37                         ` H. Peter Anvin
  2008-10-07 23:45                           ` Jeremy Fitzhardinge
  2008-10-07 23:41                         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-07 22:37 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: akataria, Jeremy Fitzhardinge, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
> On 10/3/2008 5:35:39 PM, H. Peter Anvin wrote:
>> Nakajima, Jun wrote:
>>> What's the significance of supporting multiple interfaces to the
>>> same guest simultaneously, i.e. _runtime_? We don't want the guests
>>> to run on such a literally Frankenstein machine. And practically,
>>> such testing/debugging would be good only for Halloween :-).
>>>
>> By that notion, EVERY CPU currently shipped is a "Frankenstein" CPU,
>> since at very least they export Intel-derived and AMD-derived interfaces.
>> This is, in other words, a ridiculous claim.
> 
> The big difference here is that you could create a VM at runtime (by combining the existing interfaces) that did not exist before (or was not tested before). For example, a hypervisor could show hyper-v, osx-v (if any), linux-v, etc., and a guest could create a VM with hyper-v MMU, osx-v interrupt handling, Linux-v timer, etc. And such combinations/variations can grow exponentially.
> 
> Or are you suggesting that multiple interfaces be _available_ to guests at runtime but the guest chooses one of them?
> 

The guest chooses what it wants to use.  We already do this: for 
example, we use CPUID leaf 0x80000006 preferentially to CPUID leaf 2, 
simply because it is a better interface.
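
As a rough illustration of that preference, here is a minimal C sketch.
It is not the kernel's actual code: the enum and the function name
pick_cache_info_source() are hypothetical, and the caller is assumed to
have already read the maximum basic and extended leaf numbers from CPUID
leaves 0 and 0x80000000.

```c
#include <stdint.h>

/* Hypothetical names for illustration; these are not kernel identifiers. */
enum cache_info_source {
	CACHE_INFO_NONE,
	CACHE_INFO_LEAF_2,
	CACHE_INFO_LEAF_80000006,
};

/*
 * max_basic_leaf: EAX returned by CPUID leaf 0.
 * max_ext_leaf:   EAX returned by CPUID leaf 0x80000000.
 */
static enum cache_info_source
pick_cache_info_source(uint32_t max_basic_leaf, uint32_t max_ext_leaf)
{
	/* Only trust the extended range if the value looks like 0x8000xxxx. */
	if ((max_ext_leaf & 0xffff0000u) == 0x80000000u &&
	    max_ext_leaf >= 0x80000006u)
		return CACHE_INFO_LEAF_80000006;

	/* Fall back to the older descriptor-based leaf if it exists. */
	if (max_basic_leaf >= 2)
		return CACHE_INFO_LEAF_2;

	return CACHE_INFO_NONE;
}
```

Both leaves remain visible to the guest; the choice between them is
made entirely on the consumer side.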

And you're absolutely right that the guest may end up picking and 
choosing different parts of the interfaces.  That's how it is supposed 
to work.

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 22:30                       ` Nakajima, Jun
  2008-10-07 22:37                         ` H. Peter Anvin
@ 2008-10-07 23:41                         ` Jeremy Fitzhardinge
  2008-10-07 23:45                           ` H. Peter Anvin
  1 sibling, 1 reply; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-07 23:41 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: H. Peter Anvin, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Nakajima, Jun wrote:
> On 10/3/2008 5:35:39 PM, H. Peter Anvin wrote:
>   
>> Nakajima, Jun wrote:
>>     
>>> What's the significance of supporting multiple interfaces to the
>>> same guest simultaneously, i.e. _runtime_? We don't want the guests
>>> to run on such a literally Frankenstein machine. And practically,
>>> such testing/debugging would be good only for Halloween :-).
>>>
>>>       
>> By that notion, EVERY CPU currently shipped is a "Frankenstein" CPU,
>> since at very least they export Intel-derived and AMD-derived interfaces.
>> This is, in other words, a ridiculous claim.
>>     
>
> The big difference here is that you could create a VM at runtime (by combining the existing interfaces) that did not exist before (or was not tested before). For example, a hypervisor could show hyper-v, osx-v (if any), linux-v, etc., and a guest could create a VM with hyper-v MMU, osx-v interrupt handling, Linux-v timer, etc. And such combinations/variations can grow exponentially.
>   

That would be crazy.

> Or are you suggesting that multiple interfaces be _available_ to guests at runtime but the guest chooses one of them?
>   

Right, that's what I've been suggesting.  I think hypervisors should 
be able to offer multiple ABIs to guests, but a guest has to commit to 
using one exclusively (i.e., once it starts to use one, the others 
turn themselves off, kill the domain, etc.).

    J

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 23:41                         ` Jeremy Fitzhardinge
@ 2008-10-07 23:45                           ` H. Peter Anvin
  2008-10-08  0:40                             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-07 23:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nakajima, Jun, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
>>
>> The big difference here is that you could create a VM at runtime (by 
>> combining the existing interfaces) that did not exist before (or was 
>> not tested before). For example, a hypervisor could show hyper-v, 
>> osx-v (if any), linux-v, etc., and a guest could create a VM with 
>> hyper-v MMU, osx-v interrupt handling, Linux-v timer, etc. And such 
>> combinations/variations can grow exponentially.
> 
> That would be crazy.
> 

Not necessarily, although the example above is extreme.  Redundant 
interfaces are the norm in an evolving platform.

>> Or are you suggesting that multiple interfaces be _available_ to 
>> guests at runtime but the guest chooses one of them?
> 
> Right, that's what I've been suggesting.    I think hypervisors should 
> be able to offer multiple ABIs to guests, but a guest has to commit to 
> using one exclusively (ie, once they start to use one then the others 
> turn themselves off, kill the domain, etc).

Not inherently.  Of course, there may be interfaces which are inherently 
or by policy mutually exclusive, but a hypervisor should only export the 
interfaces it wants a guest to be able to use.

This is particularly so with CPUID, which is a *data export* interface, 
it doesn't perform any action.

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 22:37                         ` H. Peter Anvin
@ 2008-10-07 23:45                           ` Jeremy Fitzhardinge
  2008-10-08  1:09                             ` H. Peter Anvin
  0 siblings, 1 reply; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-07 23:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nakajima, Jun, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

H. Peter Anvin wrote:
> And you're absolutely right that the guest may end up picking and 
> choosing different parts of the interfaces.  That's how it is supposed 
> to work. 

No, that would be a horrible, horrible mistake.  There's no sane way to 
implement that; it would mean that the hypervisor would have to have 
some kind of state model that incorporates all the ABIs in a consistent 
way.  Any guest using multiple ABIs would effectively end up being 
dependent on a particular hypervisor via a frankensteinian interface 
that no other hypervisor would implement in the same way, even if they 
claim to implement the same set of interfaces.

If the hypervisor just needs to deal with one at a time then it can have 
relatively simple ABI<->internal state translation.

However, if you have the notion of hypervisor-agnostic or common 
interfaces, then you can include those as part of the rest of the ABI 
and make it sane (so Xen+common, hyperv+common, etc).

    J

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 23:45                           ` H. Peter Anvin
@ 2008-10-08  0:40                             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 50+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-08  0:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nakajima, Jun, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>>>
>>> The big difference here is that you could create a VM at runtime (by 
>>> combining the existing interfaces) that did not exist before (or was 
>>> not tested before). For example, a hypervisor could show hyper-v, 
>>> osx-v (if any), linux-v, etc., and a guest could create a VM with 
>>> hyper-v MMU, osx-v interrupt handling, Linux-v timer, etc. And such 
>>> combinations/variations can grow exponentially.
>>
>> That would be crazy.
>>
>
> Not necessarily, although the example above is extreme.  Redundant 
> interfaces are the norm in an evolving platform.

Sure.  A common feature across all hypervisor-specific ABIs may get 
subsumed into a generic interface which is equivalent to all the 
others.  That's fine.  But nobody should expect to be able to mix 
hyperV's lazy tlb interface with KVM's pv mmu updates and expect to get 
a working result.

>>> Or are you suggesting that multiple interfaces be _available_ to 
>>> guests at runtime but the guest chooses one of them?
>>
>> Right, that's what I've been suggesting.    I think hypervisors 
>> should be able to offer multiple ABIs to guests, but a guest has to 
>> commit to using one exclusively (ie, once they start to use one then 
>> the others turn themselves off, kill the domain, etc).
>
> Not inherently.  Of course, there may be interfaces which are 
> inherently or by policy mutually exclusive, but a hypervisor should 
> only export the interfaces it wants a guest to be able to use.

It should export any interface that it implements fully, but those 
interfaces may have contradictory or inconsistent semantics which 
prevent them from being used concurrently.

> This is particularly so with CPUID, which is a *data export* 
> interface, it doesn't perform any action. 

Well, sure.  There are two distinct issues:

   1. Using cpuid to get information about the kernel's environment.  If
      the environment is sane, then cpuid is a read-only, side-effect
      free way of getting information, and any information gathered is
      fair game.
   2. One of the pieces of information you can get with cpuid is a
      discovery of what paravirtual hypercall interfaces the environment
      supports, which the guest can compare against its list of
      interfaces that it supports.  If there's some amount of
      intersection, it can decide to use one of those interfaces.
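
The selection step in point 2 can be sketched in C as follows.  This is
only an illustration under stated assumptions: select_pv_interface() is
a hypothetical helper, interfaces are identified by made-up string
names, and the guest's list is assumed to be in preference order.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the first interface in the guest's preference list that the
 * hypervisor also offers, or NULL if the two lists do not intersect.
 * Hypothetical helper for illustration only.
 */
static const char *
select_pv_interface(const char *const *guest_pref, size_t n_guest,
		    const char *const *offered, size_t n_offered)
{
	size_t i, j;

	for (i = 0; i < n_guest; i++)
		for (j = 0; j < n_offered; j++)
			if (strcmp(guest_pref[i], offered[j]) == 0)
				return guest_pref[i];

	return NULL;	/* no common interface: run unparavirtualized */
}
```

The guest commits to exactly one result here, which matches the
"one and only one" rule for hypervisor-specific ABIs.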

I'm saying that *in general* a guest should expect to be able to use one 
and only one of those interfaces.  There will be explicitly defined 
exceptions to that - such as using generic ABIs in addition to 
hypervisor-specific ABIs - but a guest can't expect to be able to mix 
and match.

A tricky issue with selecting an ABI is if two hypervisors end up using 
exactly the same mechanism for implementing hypercalls (or whatever), so 
that there needs to be some explicit way for the guest to nominate which 
interface it's actually using...

    J

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
  2008-10-07 23:45                           ` Jeremy Fitzhardinge
@ 2008-10-08  1:09                             ` H. Peter Anvin
  0 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2008-10-08  1:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nakajima, Jun, akataria, avi, Rusty Russell, Gerd Hoffmann,
	Ingo Molnar, the arch/x86 maintainers, LKML, Daniel Hecht,
	Zach Amsden, virtualization, kvm

Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> And you're absolutely right that the guest may end up picking and 
>> choosing different parts of the interfaces.  That's how it is supposed 
>> to work. 
> 
> No, that would be a horrible, horrible mistake.  There's no sane way to 
> implement that; it would mean that the hypervisor would have to have 
> some kind of state model that incorporates all the ABIs in a consistent 
> way.  Any guest using multiple ABIs would effectively end up being 
> dependent on a particular hypervisor via a frankensteinian interface 
> that no other hypervisor would implement in the same way, even if they 
> claim to implement the same set of interfaces.
> 
> If the hypervisor just needs to deal with one at a time then it can have 
> relatively simple ABI<->internal state translation.
> 
> However, if you have the notion of hypervisor-agnostic or common 
> interfaces, then you can include those as part of the rest of the ABI 
> and make it sane (so Xen+common, hyperv+common, etc).
> 

It depends on what classes of interfaces you're talking about.  I think 
you and Jun have a somewhat narrow definition of "ABI" in this context.  This 
is functionally equivalent to hardware interfaces (after all, that is 
what the hypervisor ABI *is* as far as the kernel is concerned) -- no one 
expects, say, a SATA controller that can run in legacy IDE mode to also 
take AHCI commands at the same time, but the kernel *does* expect that a 
chipset which exports LAPIC, HPET, PMTMR and TSC clock sources can use 
all four at the same time.  In the latter case the interfaces are 
inherently independent and refer to different chunks of hardware which 
just happen to be related in that they all are related to timing.  In 
the former case, we're dealing with *one* piece of hardware which can 
operate in one of two modes.

For hypervisors, you will end up with cases where you have both types -- 
for example, KVM will happily use VMware's video interface, but that 
doesn't mean KVM wants to use VMware's interfaces for storage.  This is 
exactly how it should be: the extent to which this kind of mix and match 
is possible is a matter of the definition of the individual interfaces 
themselves, not of the overall architecture.
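
One way to picture this distinction is the C sketch below.  It is an
assumption-laden illustration, not a proposed design: struct pv_iface,
its excl_group field and selection_is_valid() are all hypothetical
names.  Each interface declares whether it is independent (group 0) or
belongs to a set of mutually exclusive modes of one device, and a set of
chosen interfaces is valid only if no two share a nonzero group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for illustration only. */
struct pv_iface {
	const char *name;
	int excl_group;		/* 0 = independent of everything else */
};

/* True if no two chosen interfaces belong to the same exclusive group. */
static bool selection_is_valid(const struct pv_iface *chosen, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		if (chosen[i].excl_group == 0)
			continue;
		for (j = i + 1; j < n; j++)
			if (chosen[j].excl_group == chosen[i].excl_group)
				return false;
	}
	return true;
}
```

With this model the four clock sources would all sit in group 0 and can
be used together, while legacy IDE mode and AHCI mode of one SATA
controller would share a group and exclude each other.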

	-hpa

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2008-10-08  1:14 UTC | newest]

Thread overview: 50+ messages
2008-10-01 17:14 [RFC] CPUID usage for interaction between Hypervisors and Linux Alok Kataria
2008-10-01 17:21 ` H. Peter Anvin
2008-10-01 17:33   ` Alok Kataria
2008-10-01 17:45     ` H. Peter Anvin
2008-10-01 18:06     ` Jeremy Fitzhardinge
2008-10-01 21:05       ` Alok Kataria
2008-10-01 22:46         ` H. Peter Anvin
2008-10-02  1:11           ` Nakajima, Jun
2008-10-02  1:24             ` H. Peter Anvin
2008-10-03 22:33               ` Nakajima, Jun
2008-10-03 23:30                 ` H. Peter Anvin
2008-10-04  0:27                   ` Nakajima, Jun
2008-10-04  0:35                     ` H. Peter Anvin
2008-10-07 22:30                       ` Nakajima, Jun
2008-10-07 22:37                         ` H. Peter Anvin
2008-10-07 23:45                           ` Jeremy Fitzhardinge
2008-10-08  1:09                             ` H. Peter Anvin
2008-10-07 23:41                         ` Jeremy Fitzhardinge
2008-10-07 23:45                           ` H. Peter Anvin
2008-10-08  0:40                             ` Jeremy Fitzhardinge
2008-10-04  8:53                     ` Avi Kivity
2008-10-01 17:47 ` H. Peter Anvin
2008-10-01 18:04 ` Jeremy Fitzhardinge
2008-10-01 18:07   ` H. Peter Anvin
2008-10-01 18:12     ` Jeremy Fitzhardinge
2008-10-01 18:16       ` H. Peter Anvin
2008-10-01 18:36         ` Jeremy Fitzhardinge
2008-10-01 18:43           ` H. Peter Anvin
2008-10-01 19:56             ` Jeremy Fitzhardinge
2008-10-01 20:38           ` Chris Wright
2008-10-01 22:38             ` H. Peter Anvin
2008-10-01 21:01   ` Alok Kataria
2008-10-01 21:08     ` Anthony Liguori
2008-10-01 21:15       ` Chris Wright
2008-10-01 21:31         ` Anthony Liguori
2008-10-01 21:23       ` Alok Kataria
2008-10-01 21:29         ` Anthony Liguori
2008-10-01 21:17     ` Jeremy Fitzhardinge
2008-10-01 21:34       ` Anthony Liguori
2008-10-01 21:43         ` Chris Wright
2008-10-02 11:29           ` Avi Kivity
2008-10-01 23:47         ` Zachary Amsden
2008-10-02  0:39           ` H. Peter Anvin
2008-10-02  0:57             ` H. Peter Anvin
2008-10-02  1:11             ` Zachary Amsden
2008-10-02  1:21               ` H. Peter Anvin
2008-10-02  0:41           ` Anthony Liguori
     [not found] ` <48E3BBC1.2050607__35819.6151479662$1222884502$gmane$org@goop.org>
2008-10-01 20:03   ` Anthony Liguori
2008-10-01 20:08     ` Jeremy Fitzhardinge
     [not found]     ` <48E3D8A8.604__13396.6479487301$1222891831$gmane$org@goop.org>
2008-10-01 21:03       ` Anthony Liguori