xen-devel.lists.xenproject.org archive mirror
* [PATCH for-4.7] docs: Feature Levelling feature document
@ 2016-05-31 17:05 Andrew Cooper
  2016-06-01  9:29 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrew Cooper @ 2016-05-31 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/features/feature-levelling.pandoc | 211 +++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc
new file mode 100644
index 0000000..50bf099
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,211 @@
+% Feature Levelling
+% Draft 1
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Supported**
+
+   Architecture: x86
+
+      Component: Hypervisor, toolstack, guest
+---------------- ----------------------------------------------------
+
+
+# Overview
+
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfil this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+
+
+# User details
+
+Many factors affect the featureset which a VM may use:
+
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot.  Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+
+N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
+levelling improvements.  These features are of interest to higher level
+toolstacks such as `libvirt` or `XAPI`.
+
+
+# Technical details
+
+The `CPUID` instruction is used by software to query for features.  In the
+virtualisation usecase, guest software should query Xen rather than hardware
+directly.  However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+
+Important files:
+
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+
+## Ability to control CPUID
+
+### HVM
+
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+
+### PV
+
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+
+### Xen Forced Emulation Prefix
+
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+
+> `ud2a; .ascii 'xen'; cpuid`
+
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information.  This only
+works with cooperative software.
+
+### Masking and Override MSRs
+
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves.  These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets.  The exact MSRs, and
+which feature bitmap sets they affect are hardware specific.  These MSRs allow
+features to be hidden by clearing the appropriate bit in the mask, but do
+not allow unsupported features to be advertised.
+
+### CPUID Faulting
+
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instructions executed in PV guests to fault.  This allows Xen
+full control over all information, exactly like HVM guests.
+
+## Compile time
+
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it.  This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+chain.
+
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+time.
+
+## Host boot
+
+As Xen boots, it will enumerate the features it can see.  This is stored as
+the _raw\_featureset_.
+
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.  On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+
+## Domain creation
+
+The toolstack can query each of the calculated featuresets via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+`XEN_SYSCTL_get_cpu_levelling_caps`.
+
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstack's choice to the
+appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+
+# Limitations
+
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them.  e.g. There is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports it.
+
+Some information simply cannot be hidden from guests.  There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+behaviour.
+
+
+# Testing
+
+Feature levelling is a very wide area, and used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+
+
+# Known issues / Areas for improvement
+
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling.  Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored.  This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash.  The CPUID policy should
+be at the head of the migration stream.
+
+MSRs are another source of features for guests.  There is no general provision
+for controlling the available MSRs.  E.g. 64bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+Corruption).
+
+
+# References
+
+[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)
+
+[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)
+
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-05-31 17:05 [PATCH for-4.7] docs: Feature Levelling feature document Andrew Cooper
@ 2016-06-01  9:29 ` Jan Beulich
  2016-06-03 15:36   ` Andrew Cooper
  2016-06-01  9:41 ` Wei Liu
  2016-06-01 10:25 ` Ian Jackson
  2 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2016-06-01  9:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Xen-devel

>>> On 31.05.16 at 19:05, <andrew.cooper3@citrix.com> wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

with one spelling correction:

> +# Overview
> +
> +On native hardware, a kernel will boot, detect features, typically optimise
> +certain codepaths based on the available features, and expect the features to
> +remain available until it shuts down.
> +
> +The same expectation exists for virtual machines, and it is up to the
> +hypervisor/toolstack to fulfil this expectation for the lifetime of the

fulfill

Jan



* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-05-31 17:05 [PATCH for-4.7] docs: Feature Levelling feature document Andrew Cooper
  2016-06-01  9:29 ` Jan Beulich
@ 2016-06-01  9:41 ` Wei Liu
  2016-06-01 10:25 ` Ian Jackson
  2 siblings, 0 replies; 12+ messages in thread
From: Wei Liu @ 2016-06-01  9:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Jan Beulich, Xen-devel

Release-acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-05-31 17:05 [PATCH for-4.7] docs: Feature Levelling feature document Andrew Cooper
  2016-06-01  9:29 ` Jan Beulich
  2016-06-01  9:41 ` Wei Liu
@ 2016-06-01 10:25 ` Ian Jackson
  2016-06-01 12:05   ` Andrew Cooper
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2016-06-01 10:25 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

Andrew Cooper writes ("[PATCH for-4.7] docs: Feature Levelling feature document"):
> +N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
> +levelling improvements.  These features are of interest to higher level
> +toolstacks such as `libvirt` or `XAPI`.

I don't think this is quite the right spin, IYSWIM.  xl does not
currently provide any way to sort this stuff out.  But in principle, I
think there would be ways that it could.

I would prefer a wording which was more encouraging to future
improvements.  Shall I suggest something ?

Ian.


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-01 10:25 ` Ian Jackson
@ 2016-06-01 12:05   ` Andrew Cooper
  2016-06-01 12:14     ` Ian Jackson
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2016-06-01 12:05 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 01/06/16 11:25, Ian Jackson wrote:
> Andrew Cooper writes ("[PATCH for-4.7] docs: Feature Levelling feature document"):
>> +N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
>> +levelling improvements.  These features are of interest to higher level
>> +toolstacks such as `libvirt` or `XAPI`.
> I don't think this is quite the right spin, IYSWIM.  xl does not
> currently provide any way to sort this stuff out.  But in principle, I
> think there would be ways that it could.
>
> I would prefer a wording which was more encouraging to future
> improvements.  Shall I suggest something ?

I guess there are two different issues here.  (Note: I am specifically
distinguishing `xl` as a toolstack itself, from libxl which is just a
library.)

Simply exposing the levelling/featureset information in `xl info` is
certainly a possible thing to do.  Joao has some plans for surfacing the
levelling information in libxl for libvirt to use.

However, without a fundamental redesign of how xl works, it isn't going
to gain multi-host knowledge and consideration during domain creation.

~Andrew


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-01 12:05   ` Andrew Cooper
@ 2016-06-01 12:14     ` Ian Jackson
  2016-06-01 13:11       ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2016-06-01 12:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

Andrew Cooper writes ("Re: [PATCH for-4.7] docs: Feature Levelling feature document"):
> On 01/06/16 11:25, Ian Jackson wrote:
> > I would prefer a wording which was more encouraging to future
> > improvements.  Shall I suggest something ?
> 
> I guess there are two different issues here.  (Note: I am specifically
> distinguishing `xl` as a toolstack itself, from libxl which is a just a
> library.)
> 
> Simply exposing the levelling/featureset information in `xl info` is
> certainly a possible thing to do.  Joao has some plans for surfacing the
> levelling information in libxl for libvirt to use.

Right.

> However, without a fundamental redesign of how xl works, it isn't going
> to gain multi-host knowledge and consideration during domain creation.

IMO xl ought to have the moving parts necessary to allow an
administrator to: 1. collect feature information from their hosts;
2. combine that information into the desired feature set to expose to
guests; 3. specify the feature set in their host configuration; 4. do
all of the above conveniently, without seddery.

We should assume that the administrator has available tools like
GNU parallel, ansible, or whatever.

I don't want to design this now but I do want the feature levelling
documentation to welcome suggestions for it, or at least not to seem
to rule it out.

Ian.


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-01 12:14     ` Ian Jackson
@ 2016-06-01 13:11       ` Andrew Cooper
  2016-06-03 14:59         ` [PATCH v2 " Ian Jackson
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2016-06-01 13:11 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 01/06/16 13:14, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [PATCH for-4.7] docs: Feature Levelling feature document"):
>> On 01/06/16 11:25, Ian Jackson wrote:
>>> I would prefer a wording which was more encouraging to future
>>> improvements.  Shall I suggest something ?
>> I guess there are two different issues here.  (Note: I am specifically
>> distinguishing `xl` as a toolstack itself, from libxl which is a just a
>> library.)
>>
>> Simply exposing the levelling/featureset information in `xl info` is
>> certainly a possible thing to do.  Joao has some plans for surfacing the
>> levelling information in libxl for libvirt to use.
> Right.
>
>> However, without a fundamental redesign of how xl works, it isn't going
>> to gain multi-host knowledge and consideration during domain creation.
> IMO xl ought to have the moving parts necessary to allow an
> administrator to: 1. collect feature information from their hosts;
> 2. combine that information into the desired feature set to expose to
> guests; 3. specify the feature set in their host configuration; 4. do
> all of the above conveniently, without seddery.
>
> We should assume that the administrator has available tools like
> GNU parallel, ansible, or whatever.
>
> I don't want to design this now but I do want the feature levelling
> documentation to welcome suggestions for it, or at least not to seem
> to rule it out.

1) is currently available via the `xen-cpuid` binary introduced,
although I intended it more as a developer tool

Combining is the awkward part, but in the common case, it is just a
bitwise AND of the bitmaps provided by `xen-cpuid`.
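
As a rough illustration of that common case, here is a minimal sketch in
Python.  It is not part of the patch.  It assumes each host's featureset is
available as a colon-separated string of 32-bit hex words; that format, and
the helper names below, are assumptions made for illustration rather than the
confirmed output of `xen-cpuid`.

```python
from functools import reduce


def parse_featureset(text):
    """Parse colon-separated 32-bit hex words (assumed format) into ints."""
    return [int(word, 16) for word in text.strip().split(":")]


def level_featuresets(featuresets):
    """Return the word-by-word bitwise AND of several featuresets.

    A feature survives only if every host advertises it.
    """
    if not featuresets:
        raise ValueError("need at least one featureset")
    length = max(len(fs) for fs in featuresets)
    # A host reporting fewer words is treated as lacking the features in the
    # missing words, so pad with zeros before combining.
    padded = [list(fs) + [0] * (length - len(fs)) for fs in featuresets]
    return [reduce(lambda a, b: a & b, words) for words in zip(*padded)]


def format_featureset(words):
    """Format a featureset back into colon-separated hex words."""
    return ":".join(f"{word:08x}" for word in words)


# Hypothetical featuresets from two hosts (values invented for illustration).
host_a = parse_featureset("178bfbff:f6d83203")
host_b = parse_featureset("078bfbff:f6d83201:00000001")
print(format_featureset(level_featuresets([host_a, host_b])))
```

The result is the largest featureset that both hosts can offer, so a guest
restricted to it should see the same features on either host.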

3) I don't know what you mean about their host configuration.  Do you
mean guest configuration?

All of this works in combination with the existing cpuid= guest
configuration.

~Andrew


* [PATCH v2 for-4.7] docs: Feature Levelling feature document
  2016-06-01 13:11       ` Andrew Cooper
@ 2016-06-03 14:59         ` Ian Jackson
  2016-06-03 15:35           ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2016-06-03 14:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

Andrew Cooper writes ("Re: [PATCH for-4.7] docs: Feature Levelling feature document"):
> On 01/06/16 13:14, Ian Jackson wrote:
> > IMO xl ought to have the moving parts necessary to allow an
> > administrator to: 1. collect feature information from their hosts;
> > 2. combine that information into the desired feature set to expose to
> > guests; 3. specify the feature set in their host configuration; 4. do
> > all of the above conveniently, without seddery.
> >
> > We should assume that the administrator has available tools like
> > GNU parallel, ansible, or whatever.
> >
> > I don't want to design this now but I do want the feature levelling
> > documentation to welcome suggestions for it, or at least not to seem
> > to rule it out.
> 
> 1) is currently available via the `xen-cpuid` binary introduced,
> although I intended it more as a developer tool
> 
> Combining is the awkward part, but in the common case, it is just a
> bitwise AND of the bitmaps provided by `xen-cpuid`.

Right.

> 3) I don't know what you mean about their host configuration.  Do you
> mean guest configuration?

No, I mean that

1. the admin should have the ability to write a default to be used for
   all guests, in one place

2. the admin should have the ability to write this information
   somewhere other than the domain config file (because domain config
   files are often generated by other tools)

> All of this works in combination with the existing cpuid= guest
> configuration.

Great.  Documentation on how to do it `by hand' would be nice but I
don't think it's essential.

Below: incremental diff as a "squash!" patch, followed by combined
updated patch.

Ian.


From 6331d6673cd292e4b8b064b8eef36cb4ed80b72b Mon Sep 17 00:00:00 2001
From: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Fri, 3 Jun 2016 15:43:36 +0100
Subject: [PATCH] squash! docs: Feature Levelling feature document

---
v2: Better wording about xl

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 docs/features/feature-levelling.pandoc |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc
index 50bf099..e1f7231 100644
--- a/docs/features/feature-levelling.pandoc
+++ b/docs/features/feature-levelling.pandoc
@@ -42,10 +42,12 @@ It is unsafe to make any assumption about features remaining consistent across
 a host reboot.  Xen recalculates all information from scratch each boot, and
 provides the information for the toolstack to consume.
 
-N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these
-levelling improvements.  These features are of interest to higher level
-toolstacks such as `libvirt` or `XAPI`.
-
+`xl` currently has no facilities to help the user collect appropriate
+feature information from relevant hosts and compute appropriate
+feature specifications for use in host or domain configurations.
+(`xl` being a single-host toolstack, it would in any case need
+external support for accessing remote hosts e.g. via ssh, in the form of
+automation software like GNU parallel or ansible.)
 
 # Technical details
 
@@ -174,6 +176,9 @@ be of use.
 
 # Known issues / Areas for improvement
 
+The feature querying and levelling functions should be exposed in a
+convenient-to-use way by `xl`.
+
 Xen currently has no concept of per-{socket,core,thread} CPUID information.
 As a result, details such as APIC IDs, topology and cache information do not
 match real hardware, and do not match the documented expectations in the Intel
-- 
1.7.10.4



From debe87a91a0742d402a08baa7e572b2755da629f Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue, 31 May 2016 18:05:45 +0100
Subject: [PATCH] docs: Feature Levelling feature document

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Better wording about xl
---
 docs/features/feature-levelling.pandoc |  216 ++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)
 create mode 100644 docs/features/feature-levelling.pandoc

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc
new file mode 100644
index 0000000..e1f7231
--- /dev/null
+++ b/docs/features/feature-levelling.pandoc
@@ -0,0 +1,216 @@
+% Feature Levelling
+% Draft 1
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Supported**
+
+   Architecture: x86
+
+      Component: Hypervisor, toolstack, guest
+---------------- ----------------------------------------------------
+
+
+# Overview
+
+On native hardware, a kernel will boot, detect features, typically optimise
+certain codepaths based on the available features, and expect the features to
+remain available until it shuts down.
+
+The same expectation exists for virtual machines, and it is up to the
+hypervisor/toolstack to fulfil this expectation for the lifetime of the
+virtual machine, including across migrate/suspend/resume.
+
+
+# User details
+
+Many factors affect the featureset which a VM may use:
+
+* The CPU itself
+* The BIOS/firmware/microcode version and settings
+* The hypervisor version and command line settings
+* Further restrictions the toolstack chooses to apply
+
+A firmware or software upgrade might reduce the available set of features
+(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
+processors), as may editing the settings.
+
+It is unsafe to make any assumption about features remaining consistent across
+a host reboot.  Xen recalculates all information from scratch each boot, and
+provides the information for the toolstack to consume.
+
+`xl` currently has no facilities to help the user collect appropriate
+feature information from relevant hosts and compute appropriate
+feature specifications for use in host or domain configurations.
+(`xl` being a single-host toolstack, it would in any case need
+external support for accessing remote hosts e.g. via ssh, in the form of
+automation software like GNU parallel or ansible.)
+
+# Technical details
+
+The `CPUID` instruction is used by software to query for features.  In the
+virtualisation usecase, guest software should query Xen rather than hardware
+directly.  However, `CPUID` is an unprivileged instruction which doesn't
+fault, complicating the task of hiding hardware features from guests.
+
+Important files:
+
+* Hypervisor
+    * `xen/arch/x86/cpu/*.c`
+    * `xen/arch/x86/cpuid.c`
+    * `xen/include/asm-x86/cpuid-autogen.h`
+    * `xen/include/public/arch-x86/cpufeatureset.h`
+    * `xen/tools/gen-cpuid.py`
+* `libxc`
+    * `tools/libxc/xc_cpuid_x86.c`
+
+## Ability to control CPUID
+
+### HVM
+
+HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
+on all `CPUID` instructions, allowing Xen full control over all information.
+
+### PV
+
+The `CPUID` instruction is unprivileged, so executing it in a PV guest will
+not trap, leaving Xen no direct ability to control the information returned.
+
+### Xen Forced Emulation Prefix
+
+Xen-aware PV software can make use of the 'Forced Emulation Prefix'
+
+> `ud2a; .ascii 'xen'; cpuid`
+
+which Xen recognises as a deliberate attempt to get the fully-controlled
+`CPUID` information rather than the hardware-reported information.  This only
+works with cooperative software.
+
+### Masking and Override MSRs
+
+AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
+direct control of the values returned for certain `CPUID` leaves.  These MSRs
+allow any result to be returned, including the ability to advertise features
+which are not actually supported.
+
+Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
+_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
+instructions requesting specific feature bitmap sets.  The exact MSRs, and
+which feature bitmap sets they affect are hardware specific.  These MSRs allow
+features to be hidden by clearing the appropriate bit in the mask, but do
+not allow unsupported features to be advertised.
+
+### CPUID Faulting
+
+Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
+cause `CPUID` instructions executed in PV guests to fault.  This allows Xen
+full control over all information, exactly like HVM guests.
+
+## Compile time
+
+As some features depend on other features, it is important that, when
+disabling a certain feature, we disable all features which depend on it.  This
+allows runtime logic to be simplified, by being able to rely on testing only
+the single appropriate feature, rather than the entire feature dependency
+chain.
+
+To speed up runtime calculation of feature dependencies, the dependency chain
+is calculated and flattened by `xen/tools/gen-cpuid.py` to create
+`xen/include/asm-x86/cpuid-autogen.h` from
+`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
+disable all dependent features of a specific disabled feature in constant
+time.
+
+## Host boot
+
+As Xen boots, it will enumerate the features it can see.  This is stored as
+the _raw\_featureset_.
+
+Errata checks and command line arguments are then taken into account to reduce
+the _raw\_featureset_ into the _host\_featureset_, which is the set of
+features Xen uses.  On hardware with masking/override MSRs, the default MSR
+values are picked from the _host\_featureset_.
+
+The _host\_featureset_ is then used to calculate the _pv\_featureset_ and
+_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer
+to PV and HVM guests respectively.
+
+In addition, Xen will calculate how much control it has over non-cooperative
+PV `CPUID` instructions, storing this information as _levelling\_caps_.
+
+## Domain creation
+
+The toolstack can query each of the calculated featuresets via
+`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
+`XEN_SYSCTL_get_cpu_levelling_caps`.
+
+These data should be used by the toolstack when choosing the eventual
+featureset to offer to the guest.
+
+Once a featureset has been chosen, it is set (implicitly or explicitly) via
+`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstack's choice to the
+appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
+guest cpuid policy is reflected in the MSRs, which are context switched with
+other vcpu state.
+
+# Limitations
+
+A guest which ignores the provided feature information and manually probes for
+features will be able to find some of them.  e.g. There is no way of forcibly
+preventing a guest from using 1GB superpages if the hardware supports it.
+
+Some information simply cannot be hidden from guests.  There is no way to
+control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception
+behaviour.
+
+
+# Testing
+
+Feature levelling is a very wide area, and used all over the hypervisor.
+Please ask on xen-devel for help identifying more specific tests which could
+be of use.
+
+
+# Known issues / Areas for improvement
+
+The feature querying and levelling functions should be exposed in a
+convenient-to-use way by `xl`.
+
+Xen currently has no concept of per-{socket,core,thread} CPUID information.
+As a result, details such as APIC IDs, topology and cache information do not
+match real hardware, and do not match the documented expectations in the Intel
+and AMD system manuals.
+
+The CPU feature flags are the only information which the toolstack has a
+sensible interface for querying and levelling.  Other information in the CPUID
+policy is important and should be levelled (e.g. maxphysaddr).
+
+The CPUID policy is currently regenerated from scratch by the receiving side,
+once memory and vcpu content has been restored.  This means that the receiving
+Xen cannot verify the memory/vcpu content against the CPUID policy, and can
+end up running a guest which will subsequently crash.  The CPUID policy should
+be at the head of the migration stream.
+
+MSRs are another source of features for guests.  There is no general provision
+for controlling the available MSRs.  E.g. 64bit versions of Windows notice
+changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure
+Corruption).
+
+
+# References
+
+[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)
+
+[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)
+
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-05-31 1        Xen 4.7  Document written
+---------- -------- -------- -------------------------------------------
-- 
1.7.10.4



* Re: [PATCH v2 for-4.7] docs: Feature Levelling feature document
  2016-06-03 14:59         ` [PATCH v2 " Ian Jackson
@ 2016-06-03 15:35           ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2016-06-03 15:35 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 03/06/16 15:59, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [PATCH for-4.7] docs: Feature Levelling feature document"):
>> On 01/06/16 13:14, Ian Jackson wrote:
>>> IMO xl ought to have the moving parts necessary to allow an
>>> administrator to: 1. collect feature information from their hosts;
>>> 2. combine that information into the desired feature set to expose to
>>> guests; 3. specify the feature set in their host configuration; 4. do
>>> all of the above conveniently, without seddery.
>>>
>>> We should assume that the administrator has available tools like
>>> GNU parallel, ansible, or whatever.
>>>
>>> I don't want to design this now but I do want the feature levelling
>>> documentation to welcome suggestions for it, or at least not to seem
>>> to rule it out.
>> 1) is currently available via the `xen-cpuid` binary introduced,
>> although I intended it more as a developer tool
>>
>> Combining is the awkward part, but in the common case, it is just a
>> bitwise AND of the bitmaps provided by `xen-cpuid`.
> Right.
>
>> 3) I don't know what you mean about their host configuration.  Do you
>> mean guest configuration?
> No, I mean that
>
> 1. the admin should have the ability to write a default to be used for
>    all guests, in one place
>
> 2. the admin should have the ability to write this information
>    somewhere other than the domain config file (because domain config
>    files are often generated by other tools)

Ah ok - I see what you mean now.  This is a non-trivial UX problem to
solve, especially as any stashed default is stale as soon as you reboot
the host, but I agree that we can definitely do better than the current
status quo.

>
>> All of this works in combination with the existing cpuid= guest
>> configuration.
> Great.  Documentation on how to do it `by hand' would be nice but I
> don't think it's essential.

Sadly, while the most common case is easy, there are many sharp edges a
user should be aware of before playing in this area.

A part of the submitted series was to do with sanding some of the edges
in Xen, so that a misinformed toolstack can't actually advertise
features to the guest which Xen can't fulfil.
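
For illustration only, the common case mentioned above (a bitwise AND of the
per-host bitmaps) could be sketched roughly as below.  This assumes each
host's featureset has already been collected as a string of colon-separated
32-bit hex words; that input format and the helper names are assumptions made
for the example, not an existing `xl` interface.  A host reporting fewer words
is treated as lacking the features in the missing words, and none of the
sharp edges are handled.

```python
def parse_featureset(text):
    """Parse 'xxxxxxxx:xxxxxxxx:...' (assumed format) into 32-bit integers."""
    return [int(word, 16) for word in text.strip().split(":")]


def level_featuresets(featuresets):
    """Return the word-by-word bitwise AND of all the given featuresets.

    Shorter featuresets are zero-extended, so a feature survives only if
    every host reports it.
    """
    if not featuresets:
        raise ValueError("need at least one featureset")
    width = max(len(fs) for fs in featuresets)
    levelled = [0xFFFFFFFF] * width
    for fs in featuresets:
        padded = fs + [0] * (width - len(fs))
        levelled = [a & b for a, b in zip(levelled, padded)]
    return levelled


def format_featureset(words):
    """Format 32-bit words back into the colon-separated hex form."""
    return ":".join(f"{w:08x}" for w in words)


if __name__ == "__main__":
    # Hypothetical featuresets from two hosts, not real hardware values.
    hosts = ["0000000f:000000ff", "00000003:0000000f"]
    common = level_featuresets([parse_featureset(h) for h in hosts])
    print(format_featureset(common))
```

The result is only the intersection of what the hosts report.  It says nothing
about dependencies between features, or about whether Xen can actually offer
each remaining feature to a given type of guest.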

>
> Below: incremental diff as a "squash!" patch, followed by combined
> updated patch.

Thanks.  I have folded it in and submitted a v2.

~Andrew


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-01  9:29 ` Jan Beulich
@ 2016-06-03 15:36   ` Andrew Cooper
  2016-06-03 15:42     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2016-06-03 15:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Xen-devel

On 01/06/16 10:29, Jan Beulich wrote:
>>>> On 31.05.16 at 19:05, <andrew.cooper3@citrix.com> wrote:
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> with one spelling correction:
>
>> +# Overview
>> +
>> +On native hardware, a kernel will boot, detect features, typically optimise
>> +certain codepaths based on the available features, and expect the features to
>> +remain available until it shuts down.
>> +
>> +The same expectation exists for virtual machines, and it is up to the
>> +hypervisor/toolstack to fulfil this expectation for the lifetime of the
> fulfill

That is the American spelling.  The English spelling does not have a
double l.

~Andrew


* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-03 15:36   ` Andrew Cooper
@ 2016-06-03 15:42     ` Jan Beulich
  2016-06-03 15:53       ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2016-06-03 15:42 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Xen-devel

>>> On 03.06.16 at 17:36, <andrew.cooper3@citrix.com> wrote:
> On 01/06/16 10:29, Jan Beulich wrote:
>>>>> On 31.05.16 at 19:05, <andrew.cooper3@citrix.com> wrote:
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> with one spelling correction:
>>
>>> +# Overview
>>> +
>>> +On native hardware, a kernel will boot, detect features, typically optimise
>>> +certain codepaths based on the available features, and expect the features to
>>> +remain available until it shuts down.
>>> +
>>> +The same expectation exists for virtual machines, and it is up to the
>>> +hypervisor/toolstack to fulfil this expectation for the lifetime of the
>> fulfill
> 
> That is the American spelling.  The English spelling does not have a
> double l.

Oh, very interesting. I would never have thought of this kind of a
difference between British and American English, the more that
you also write "fill" afaik, not "fil". But - good to know, thanks!

Jan



* Re: [PATCH for-4.7] docs: Feature Levelling feature document
  2016-06-03 15:42     ` Jan Beulich
@ 2016-06-03 15:53       ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2016-06-03 15:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Ian Jackson, Xen-devel

On 03/06/16 16:42, Jan Beulich wrote:
>>>> On 03.06.16 at 17:36, <andrew.cooper3@citrix.com> wrote:
>> On 01/06/16 10:29, Jan Beulich wrote:
>>>>>> On 31.05.16 at 19:05, <andrew.cooper3@citrix.com> wrote:
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> with one spelling correction:
>>>
>>>> +# Overview
>>>> +
>>>> +On native hardware, a kernel will boot, detect features, typically optimise
>>>> +certain codepaths based on the available features, and expect the features to
>>>> +remain available until it shuts down.
>>>> +
>>>> +The same expectation exists for virtual machines, and it is up to the
>>>> +hypervisor/toolstack to fulfil this expectation for the lifetime of the
>>> fulfill
>> That is the American spelling.  The English spelling does not have a
>> double l.
> Oh, very interesting. I would never have thought of this kind of a
> difference between British and American English, the more that
> you also write "fill" afaik, not "fil". But - good to know, thanks!

Because English is so well known for its consistency :)

~Andrew


end of thread, other threads:[~2016-06-03 15:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 17:05 [PATCH for-4.7] docs: Feature Levelling feature document Andrew Cooper
2016-06-01  9:29 ` Jan Beulich
2016-06-03 15:36   ` Andrew Cooper
2016-06-03 15:42     ` Jan Beulich
2016-06-03 15:53       ` Andrew Cooper
2016-06-01  9:41 ` Wei Liu
2016-06-01 10:25 ` Ian Jackson
2016-06-01 12:05   ` Andrew Cooper
2016-06-01 12:14     ` Ian Jackson
2016-06-01 13:11       ` Andrew Cooper
2016-06-03 14:59         ` [PATCH v2 " Ian Jackson
2016-06-03 15:35           ` Andrew Cooper
