* [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests
@ 2016-02-05 13:41 Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT Andrew Cooper
                   ` (30 more replies)
  0 siblings, 31 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson, Tim Deegan,
	Jan Beulich

Presented here is v2 of my work to improve cpuid levelling for guests.

This series is available in git form at:
  http://xenbits.xen.org/git-http/people/andrewcoop/xen.git levelling-v2

Major changes from v1 include a rebase onto staging, reworking of the
automatic generation of cpu featureset information, and fixes to xsave
handling for PV guests on Intel.

There is still an outstanding issue with xsave handling for PV guests on AMD
which I am investigating; it is the reason the series is still RFC.

The current cpuid code, both in the hypervisor and toolstack, has grown
organically for a very long time, and is flawed in many ways.  This series
focuses specifically on fixing the bits pertaining to the visible
features, and I will be fixing other areas in future work (e.g. per-core,
per-package values, auditing of incoming migration values, etc.)

These changes alter the workflow of cpuid handling as follows:

Xen boots and evaluates its current capabilities.  It uses this information to
calculate the maximum featuresets it can provide to guests, and provides this
information for toolstack consumption.  A toolstack may then calculate a safe
set of features (taking into account migratability), and set a guest's cpuid
policy.  Xen then takes care of context switching the levelling state.

In particular, this means that PV guests may have different levels while
running on the same host, an option which was not previously available.

Andrew Cooper (30):
  xen/x86: Drop X86_FEATURE_3DNOW_ALT
  xen/x86: Do not store VIA/Cyrix/Centaur CPU features
  xen/x86: Drop cpuinfo_x86.x86_power
  xen/x86: Improvements to pv_cpuid()
  xen/public: Export cpu featureset information in the public API
  xen/x86: Script to automatically process featureset information
  xen/x86: Collect more cpuid feature leaves
  xen/x86: Mask out unknown features from Xen's capabilities
  xen/x86: Store antifeatures inverted in a featureset
  xen/x86: Annotate VM applicability in featureset
  xen/x86: Calculate maximum host and guest featuresets
  xen/x86: Generate deep dependencies of features
  xen/x86: Clear dependent features when clearing a cpu cap
  xen/x86: Improve disabling of features which have dependencies
  xen/x86: Improvements to in-hypervisor cpuid sanity checks
  x86/cpu: Move set_cpumask() calls into c_early_init()
  x86/cpu: Common infrastructure for levelling context switching
  x86/cpu: Rework AMD masking MSR setup
  x86/cpu: Rework Intel masking/faulting setup
  x86/cpu: Context switch cpuid masks and faulting state in
    context_switch()
  x86/pv: Provide custom cpumasks for PV domains
  x86/domctl: Update PV domain cpumasks when setting cpuid policy
  xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  tools/libxc: Modify bitmap operations to take void pointers
  tools/libxc: Use public/featureset.h for cpuid policy generation
  tools/libxc: Expose the automatically generated cpu featuremask
    information
  tools: Utility for dealing with featuresets
  tools/libxc: Wire a featureset through to cpuid policy logic
  tools/libxc: Use featuresets rather than guesswork
  tools/libxc: Calculate xstate cpuid leaf from guest information

 .gitignore                                  |   2 +
 tools/libxc/Makefile                        |   9 +
 tools/libxc/include/xenctrl.h               |  21 +-
 tools/libxc/xc_bitops.h                     |  21 +-
 tools/libxc/xc_cpufeature.h                 | 147 --------
 tools/libxc/xc_cpuid_x86.c                  | 550 ++++++++++++++++------------
 tools/libxl/libxl_cpuid.c                   |   2 +-
 tools/misc/Makefile                         |   4 +
 tools/misc/xen-cpuid.c                      | 394 ++++++++++++++++++++
 tools/ocaml/libs/xc/xenctrl.ml              |   3 +
 tools/ocaml/libs/xc/xenctrl.mli             |   4 +
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  37 +-
 tools/python/xen/lowlevel/xc/xc.c           |   2 +-
 xen/arch/x86/Makefile                       |   1 +
 xen/arch/x86/apic.c                         |   2 +-
 xen/arch/x86/cpu/amd.c                      | 309 ++++++++++------
 xen/arch/x86/cpu/centaur.c                  |   3 -
 xen/arch/x86/cpu/common.c                   |  51 ++-
 xen/arch/x86/cpu/intel.c                    | 269 +++++++++-----
 xen/arch/x86/cpuid.c                        | 227 ++++++++++++
 xen/arch/x86/domain.c                       |  17 +-
 xen/arch/x86/domctl.c                       |  88 +++++
 xen/arch/x86/hvm/hvm.c                      |  56 ++-
 xen/arch/x86/setup.c                        |   3 +
 xen/arch/x86/sysctl.c                       |  66 ++++
 xen/arch/x86/traps.c                        | 223 ++++++-----
 xen/arch/x86/xstate.c                       |   6 +-
 xen/include/Makefile                        |  10 +
 xen/include/asm-x86/cpufeature.h            | 174 +--------
 xen/include/asm-x86/cpuid.h                 |  44 +++
 xen/include/asm-x86/domain.h                |   2 +
 xen/include/asm-x86/processor.h             |  31 +-
 xen/include/public/arch-x86/cpufeatureset.h | 216 +++++++++++
 xen/include/public/sysctl.h                 |  25 ++
 xen/tools/gen-cpuid.py                      | 329 +++++++++++++++++
 35 files changed, 2410 insertions(+), 938 deletions(-)
 delete mode 100644 tools/libxc/xc_cpufeature.h
 create mode 100644 tools/misc/xen-cpuid.c
 create mode 100644 xen/arch/x86/cpuid.c
 create mode 100644 xen/include/asm-x86/cpuid.h
 create mode 100644 xen/include/public/arch-x86/cpufeatureset.h
 create mode 100755 xen/tools/gen-cpuid.py

-- 
2.1.4

^ permalink raw reply	[flat|nested] 139+ messages in thread

* [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 02/30] xen/x86: Do not store VIA/Cyrix/Centaur CPU features Andrew Cooper
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Introducing an X86_FEATURE aliased value turns out to complicate automatic
processing of the feature list.  Drop X86_FEATURE_3DNOW_ALT and use
X86_FEATURE_PBE, extending the comment accordingly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

New in v2
---
 xen/arch/x86/cpu/amd.c           | 9 ++++++---
 xen/include/asm-x86/cpufeature.h | 1 -
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 8ec841b..1ac44e0 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -440,9 +440,12 @@ static void init_amd(struct cpuinfo_x86 *c)
 		wrmsrl(MSR_K7_HWCR, value);
 	}
 
-	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
-	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-	__clear_bit(X86_FEATURE_3DNOW_ALT, c->x86_capability);
+	/*
+	 * Some AMD CPUs duplicate the 3DNow bit in base and extended CPUID
+	 * leaves.  Unfortunately, this aliases PBE on Intel CPUs. Clobber the
+	 * alias, leaving 3DNow in the extended leaf.
+	 */
+	__clear_bit(X86_FEATURE_PBE, c->x86_capability);
 	
 	if (c->x86 == 0xf && c->x86_model < 0x14
 	    && cpu_has(c, X86_FEATURE_LAHF_LM)) {
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 23f9fb2..6583039 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -45,7 +45,6 @@
 #define X86_FEATURE_ACC		(0*32+29) /* Automatic clock control */
 #define X86_FEATURE_IA64	(0*32+30) /* IA-64 processor */
 #define X86_FEATURE_PBE		(0*32+31) /* Pending Break Enable */
-#define X86_FEATURE_3DNOW_ALT	(0*32+31) /* AMD nonstandard 3DNow (Aliases PBE) */
 
 /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
 /* Don't duplicate feature flags which are redundant with Intel! */
-- 
2.1.4


* [PATCH v2 02/30] xen/x86: Do not store VIA/Cyrix/Centaur CPU features
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 03/30] xen/x86: Drop cpuinfo_x86.x86_power Andrew Cooper
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Nothing uses them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

New in v2
---
 xen/arch/x86/cpu/centaur.c       |  3 ---
 xen/include/asm-x86/cpufeature.h | 12 +-----------
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/xen/arch/x86/cpu/centaur.c b/xen/arch/x86/cpu/centaur.c
index c0ac117..b137d55 100644
--- a/xen/arch/x86/cpu/centaur.c
+++ b/xen/arch/x86/cpu/centaur.c
@@ -38,9 +38,6 @@ static void init_c3(struct cpuinfo_x86 *c)
 			wrmsrl(MSR_VIA_RNG, msr_content | RNG_ENABLE);
 			printk(KERN_INFO "CPU: Enabled h/w RNG\n");
 		}
-
-		c->x86_capability[cpufeat_word(X86_FEATURE_XSTORE)]
-                    = cpuid_edx(0xC0000001);
 	}
 
 	if (c->x86 == 0x6 && c->x86_model >= 0xf) {
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 6583039..e7e369b 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -109,17 +109,7 @@
 #define X86_FEATURE_RDRAND 	(4*32+30) /* Digital Random Number Generator */
 #define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running under some hypervisor */
 
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
-#define X86_FEATURE_XSTORE	(5*32+ 2) /* on-CPU RNG present (xstore insn) */
-#define X86_FEATURE_XSTORE_EN	(5*32+ 3) /* on-CPU RNG enabled */
-#define X86_FEATURE_XCRYPT	(5*32+ 6) /* on-CPU crypto (xcrypt insn) */
-#define X86_FEATURE_XCRYPT_EN	(5*32+ 7) /* on-CPU crypto enabled */
-#define X86_FEATURE_ACE2	(5*32+ 8) /* Advanced Cryptography Engine v2 */
-#define X86_FEATURE_ACE2_EN	(5*32+ 9) /* ACE v2 enabled */
-#define X86_FEATURE_PHE		(5*32+ 10) /* PadLock Hash Engine */
-#define X86_FEATURE_PHE_EN	(5*32+ 11) /* PHE enabled */
-#define X86_FEATURE_PMM		(5*32+ 12) /* PadLock Montgomery Multiplier */
-#define X86_FEATURE_PMM_EN	(5*32+ 13) /* PMM enabled */
+/* UNUSED, word 5 */
 
 /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
 #define X86_FEATURE_LAHF_LM     (6*32+ 0) /* LAHF/SAHF in long mode */
-- 
2.1.4


* [PATCH v2 03/30] xen/x86: Drop cpuinfo_x86.x86_power
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 02/30] xen/x86: Do not store VIA/Cyrix/Centaur CPU features Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 04/30] xen/x86: Improvements to pv_cpuid() Andrew Cooper
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Nothing uses it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

New in v2
---
 xen/arch/x86/cpu/amd.c          | 3 +--
 xen/include/asm-x86/processor.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 1ac44e0..c184f57 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -475,8 +475,7 @@ static void init_amd(struct cpuinfo_x86 *c)
 	}
 
 	if (c->extended_cpuid_level >= 0x80000007) {
-		c->x86_power = cpuid_edx(0x80000007);
-		if (c->x86_power & (1<<8)) {
+		if (cpuid_edx(0x80000007) & (1<<8)) {
 			__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
 			__set_bit(X86_FEATURE_NONSTOP_TSC, c->x86_capability);
 			if (c->x86 != 0x11)
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 26ba141..271340e 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -191,7 +191,6 @@ struct cpuinfo_x86 {
     char x86_model_id[64];
     int  x86_cache_size; /* in KB - valid for CPUS which support this call  */
     int  x86_cache_alignment;    /* In bytes */
-    int  x86_power;
     __u32 x86_max_cores; /* cpuid returned max cores value */
     __u32 booted_cores;  /* number of cores as seen by OS */
     __u32 x86_num_siblings; /* cpuid logical cpus per chip value */
-- 
2.1.4


* [PATCH v2 04/30] xen/x86: Improvements to pv_cpuid()
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (2 preceding siblings ...)
  2016-02-05 13:41 ` [PATCH v2 03/30] xen/x86: Drop cpuinfo_x86.x86_power Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

pv_cpuid() has two completely separate paths inside it depending on whether
current is dom0 or a domU.  This causes unnecessary divergence, and
complicates future improvements.  Take steps to undo it.

Changes:
 * Create leaf and subleaf variables and use them consistently, instead of a
   mix of {a,c} and regs->e{a,c}x as the input parameters.
 * Combine the dom0 and domU hypervisor leaf handling, with an early exit.
 * Apply sanity checks to domU as well.  This brings PV domU cpuid handling in
   line with HVM domains and PV dom0.
 * Perform a real cpuid instruction for calculating CPUID.0xD[ECX=0].EBX.  The
   correct xcr0 is in context, and this avoids the O(M*N) loop over the domain
   cpuid policy list which exists currently.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

New in v2
---
 xen/arch/x86/traps.c | 74 ++++++++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e105b95..6a181bb 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -824,51 +824,24 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
 
 void pv_cpuid(struct cpu_user_regs *regs)
 {
-    uint32_t a, b, c, d;
+    uint32_t leaf, subleaf, a, b, c, d;
     struct vcpu *curr = current;
     struct domain *currd = curr->domain;
 
-    a = regs->eax;
+    leaf = a = regs->eax;
     b = regs->ebx;
-    c = regs->ecx;
+    subleaf = c = regs->ecx;
     d = regs->edx;
 
-    if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
-    {
-        unsigned int cpuid_leaf = a, sub_leaf = c;
-
-        if ( !cpuid_hypervisor_leaves(a, c, &a, &b, &c, &d) )
-            domain_cpuid(currd, a, c, &a, &b, &c, &d);
-
-        switch ( cpuid_leaf )
-        {
-        case XSTATE_CPUID:
-        {
-            unsigned int _eax, _ebx, _ecx, _edx;
-            /* EBX value of main leaf 0 depends on enabled xsave features */
-            if ( sub_leaf == 0 && curr->arch.xcr0 )
-            {
-                /* reset EBX to default value first */
-                b = XSTATE_AREA_MIN_SIZE;
-                for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
-                {
-                    if ( !(curr->arch.xcr0 & (1ULL << sub_leaf)) )
-                        continue;
-                    domain_cpuid(currd, cpuid_leaf, sub_leaf,
-                                 &_eax, &_ebx, &_ecx, &_edx);
-                    if ( (_eax + _ebx) > b )
-                        b = _eax + _ebx;
-                }
-            }
-            goto xstate;
-        }
-        }
+    if ( cpuid_hypervisor_leaves(leaf, subleaf, &a, &b, &c, &d) )
         goto out;
-    }
 
-    cpuid_count(a, c, &a, &b, &c, &d);
+    if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
+        domain_cpuid(currd, leaf, subleaf, &a, &b, &c, &d);
+    else
+        cpuid_count(leaf, subleaf, &a, &b, &c, &d);
 
-    if ( (regs->eax & 0x7fffffff) == 0x00000001 )
+    if ( (leaf & 0x7fffffff) == 0x00000001 )
     {
         /* Modify Feature Information. */
         if ( !cpu_has_apic )
@@ -883,7 +856,7 @@ void pv_cpuid(struct cpu_user_regs *regs)
         }
     }
 
-    switch ( regs->_eax )
+    switch ( leaf )
     {
     case 0x00000001:
         /* Modify Feature Information. */
@@ -918,7 +891,7 @@ void pv_cpuid(struct cpu_user_regs *regs)
         break;
 
     case 0x00000007:
-        if ( regs->_ecx == 0 )
+        if ( subleaf == 0 )
             b &= (cpufeat_mask(X86_FEATURE_BMI1) |
                   cpufeat_mask(X86_FEATURE_HLE)  |
                   cpufeat_mask(X86_FEATURE_AVX2) |
@@ -934,14 +907,29 @@ void pv_cpuid(struct cpu_user_regs *regs)
         break;
 
     case XSTATE_CPUID:
-    xstate:
         if ( !cpu_has_xsave )
             goto unsupported;
-        if ( regs->_ecx == 1 )
+        switch ( subleaf )
+        {
+        case 0:
         {
+            uint32_t tmp;
+
+            /*
+             * Always read CPUID.0xD[ECX=0].EBX from hardware, rather than
+             * domain policy.  It varies with enabled xstate, and the correct
+             * xcr0 is in context.
+             */
+            if ( !is_control_domain(currd) && !is_hardware_domain(currd) )
+                cpuid_count(leaf, subleaf, &tmp, &b, &tmp, &tmp);
+            break;
+        }
+
+        case 1:
             a &= (boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] &
                   ~cpufeat_mask(X86_FEATURE_XSAVES));
             b = c = d = 0;
+            break;
         }
         break;
 
@@ -983,15 +971,11 @@ void pv_cpuid(struct cpu_user_regs *regs)
     unsupported:
         a = b = c = d = 0;
         break;
-
-    default:
-        (void)cpuid_hypervisor_leaves(regs->eax, 0, &a, &b, &c, &d);
-        break;
     }
 
  out:
     /* VPMU may decide to modify some of the leaves */
-    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+    vpmu_do_cpuid(leaf, &a, &b, &c, &d);
 
     regs->eax = a;
     regs->ebx = b;
-- 
2.1.4


* [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (3 preceding siblings ...)
  2016-02-05 13:41 ` [PATCH v2 04/30] xen/x86: Improvements to pv_cpuid() Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-12 16:27   ` Jan Beulich
  2016-02-19 17:29   ` Joao Martins
  2016-02-05 13:41 ` [PATCH v2 06/30] xen/x86: Script to automatically process featureset information Andrew Cooper
                   ` (25 subsequent siblings)
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Ian Campbell, Jan Beulich

For the featureset to be a useful object, it needs a stable interpretation, a
property which is missing from the current hw_caps interface.

Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
which will be used by later changes.

To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
changes will change this to being dynamically generated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Ian Campbell <Ian.Campbell@citrix.com>

v2:
 * Rebase over upstream changes
 * Collect all feature introductions from later in the series
 * Restrict API to Xen and toolstack
---
 xen/include/asm-x86/cpufeature.h            | 159 +++--------------------
 xen/include/public/arch-x86/cpufeatureset.h | 195 ++++++++++++++++++++++++++++
 2 files changed, 210 insertions(+), 144 deletions(-)
 create mode 100644 xen/include/public/arch-x86/cpufeatureset.h

diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index e7e369b..eb6eb63 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -11,151 +11,22 @@
 
 #include <xen/const.h>
 
-#define NCAPINTS	9	/* N 32-bit words worth of info */
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */
-#define X86_FEATURE_FPU		(0*32+ 0) /* Onboard FPU */
-#define X86_FEATURE_VME		(0*32+ 1) /* Virtual Mode Extensions */
-#define X86_FEATURE_DE		(0*32+ 2) /* Debugging Extensions */
-#define X86_FEATURE_PSE 	(0*32+ 3) /* Page Size Extensions */
-#define X86_FEATURE_TSC		(0*32+ 4) /* Time Stamp Counter */
-#define X86_FEATURE_MSR		(0*32+ 5) /* Model-Specific Registers, RDMSR, WRMSR */
-#define X86_FEATURE_PAE		(0*32+ 6) /* Physical Address Extensions */
-#define X86_FEATURE_MCE		(0*32+ 7) /* Machine Check Architecture */
-#define X86_FEATURE_CX8		(0*32+ 8) /* CMPXCHG8 instruction */
-#define X86_FEATURE_APIC	(0*32+ 9) /* Onboard APIC */
-#define X86_FEATURE_SEP		(0*32+11) /* SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR	(0*32+12) /* Memory Type Range Registers */
-#define X86_FEATURE_PGE		(0*32+13) /* Page Global Enable */
-#define X86_FEATURE_MCA		(0*32+14) /* Machine Check Architecture */
-#define X86_FEATURE_CMOV	(0*32+15) /* CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
-#define X86_FEATURE_PAT		(0*32+16) /* Page Attribute Table */
-#define X86_FEATURE_PSE36	(0*32+17) /* 36-bit PSEs */
-#define X86_FEATURE_PN		(0*32+18) /* Processor serial number */
-#define X86_FEATURE_CLFLSH	(0*32+19) /* Supports the CLFLUSH instruction */
-#define X86_FEATURE_DS		(0*32+21) /* Debug Store */
-#define X86_FEATURE_ACPI	(0*32+22) /* ACPI via MSR */
-#define X86_FEATURE_MMX		(0*32+23) /* Multimedia Extensions */
-#define X86_FEATURE_FXSR	(0*32+24) /* FXSAVE and FXRSTOR instructions (fast save and restore */
-				          /* of FPU context), and CR4.OSFXSR available */
-#define X86_FEATURE_XMM		(0*32+25) /* Streaming SIMD Extensions */
-#define X86_FEATURE_XMM2	(0*32+26) /* Streaming SIMD Extensions-2 */
-#define X86_FEATURE_SELFSNOOP	(0*32+27) /* CPU self snoop */
-#define X86_FEATURE_HT		(0*32+28) /* Hyper-Threading */
-#define X86_FEATURE_ACC		(0*32+29) /* Automatic clock control */
-#define X86_FEATURE_IA64	(0*32+30) /* IA-64 processor */
-#define X86_FEATURE_PBE		(0*32+31) /* Pending Break Enable */
-
-/* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
-/* Don't duplicate feature flags which are redundant with Intel! */
-#define X86_FEATURE_SYSCALL	(1*32+11) /* SYSCALL/SYSRET */
-#define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
-#define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
-#define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
-#define X86_FEATURE_FFXSR       (1*32+25) /* FFXSR instruction optimizations */
-#define X86_FEATURE_PAGE1GB	(1*32+26) /* 1Gb large page support */
-#define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
-#define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
-#define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */
-#define X86_FEATURE_3DNOW	(1*32+31) /* 3DNow! */
-
-/* Intel-defined CPU features, CPUID level 0x0000000D:1 (eax), word 2 */
-#define X86_FEATURE_XSAVEOPT	(2*32+ 0) /* XSAVEOPT instruction. */
-#define X86_FEATURE_XSAVEC	(2*32+ 1) /* XSAVEC/XRSTORC instructions. */
-#define X86_FEATURE_XGETBV1	(2*32+ 2) /* XGETBV with %ecx=1. */
-#define X86_FEATURE_XSAVES	(2*32+ 3) /* XSAVES/XRSTORS instructions. */
-
-/* Other features, Linux-defined mapping, word 3 */
+#include <public/arch-x86/cpufeatureset.h>
+
+#define FSCAPINTS 9
+#define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
+
+/* Other features, Linux-defined mapping, FSMAX+1 */
 /* This range is used for feature bits which conflict or are synthesized */
-#define X86_FEATURE_CONSTANT_TSC (3*32+ 8) /* TSC ticks at a constant rate */
-#define X86_FEATURE_NONSTOP_TSC	(3*32+ 9) /* TSC does not stop in C states */
-#define X86_FEATURE_ARAT	(3*32+ 10) /* Always running APIC timer */
-#define X86_FEATURE_ARCH_PERFMON (3*32+11) /* Intel Architectural PerfMon */
-#define X86_FEATURE_TSC_RELIABLE (3*32+12) /* TSC is known to be reliable */
-#define X86_FEATURE_XTOPOLOGY    (3*32+13) /* cpu topology enum extensions */
-#define X86_FEATURE_CPUID_FAULTING (3*32+14) /* cpuid faulting */
-#define X86_FEATURE_CLFLUSH_MONITOR (3*32+15) /* clflush reqd with monitor */
-#define X86_FEATURE_APERFMPERF   (3*32+16) /* APERFMPERF */
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
-#define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
-#define X86_FEATURE_PCLMULQDQ	(4*32+ 1) /* Carry-less mulitplication */
-#define X86_FEATURE_DTES64	(4*32+ 2) /* 64-bit Debug Store */
-#define X86_FEATURE_MWAIT	(4*32+ 3) /* Monitor/Mwait support */
-#define X86_FEATURE_DSCPL	(4*32+ 4) /* CPL Qualified Debug Store */
-#define X86_FEATURE_VMXE	(4*32+ 5) /* Virtual Machine Extensions */
-#define X86_FEATURE_SMXE	(4*32+ 6) /* Safer Mode Extensions */
-#define X86_FEATURE_EST		(4*32+ 7) /* Enhanced SpeedStep */
-#define X86_FEATURE_TM2		(4*32+ 8) /* Thermal Monitor 2 */
-#define X86_FEATURE_SSSE3	(4*32+ 9) /* Supplemental Streaming SIMD Extensions-3 */
-#define X86_FEATURE_CID		(4*32+10) /* Context ID */
-#define X86_FEATURE_FMA		(4*32+12) /* Fused Multiply Add */
-#define X86_FEATURE_CX16        (4*32+13) /* CMPXCHG16B */
-#define X86_FEATURE_XTPR	(4*32+14) /* Send Task Priority Messages */
-#define X86_FEATURE_PDCM	(4*32+15) /* Perf/Debug Capability MSR */
-#define X86_FEATURE_PCID	(4*32+17) /* Process Context ID */
-#define X86_FEATURE_DCA		(4*32+18) /* Direct Cache Access */
-#define X86_FEATURE_SSE4_1	(4*32+19) /* Streaming SIMD Extensions 4.1 */
-#define X86_FEATURE_SSE4_2	(4*32+20) /* Streaming SIMD Extensions 4.2 */
-#define X86_FEATURE_X2APIC	(4*32+21) /* Extended xAPIC */
-#define X86_FEATURE_MOVBE	(4*32+22) /* movbe instruction */
-#define X86_FEATURE_POPCNT	(4*32+23) /* POPCNT instruction */
-#define X86_FEATURE_TSC_DEADLINE (4*32+24) /* "tdt" TSC Deadline Timer */
-#define X86_FEATURE_AES		(4*32+25) /* AES instructions */
-#define X86_FEATURE_XSAVE	(4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
-#define X86_FEATURE_OSXSAVE	(4*32+27) /* OSXSAVE */
-#define X86_FEATURE_AVX 	(4*32+28) /* Advanced Vector Extensions */
-#define X86_FEATURE_F16C 	(4*32+29) /* Half-precision convert instruction */
-#define X86_FEATURE_RDRAND 	(4*32+30) /* Digital Random Number Generator */
-#define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running under some hypervisor */
-
-/* UNUSED, word 5 */
-
-/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
-#define X86_FEATURE_LAHF_LM     (6*32+ 0) /* LAHF/SAHF in long mode */
-#define X86_FEATURE_CMP_LEGACY  (6*32+ 1) /* If yes HyperThreading not valid */
-#define X86_FEATURE_SVM         (6*32+ 2) /* Secure virtual machine */
-#define X86_FEATURE_EXTAPIC     (6*32+ 3) /* Extended APIC space */
-#define X86_FEATURE_CR8_LEGACY  (6*32+ 4) /* CR8 in 32-bit mode */
-#define X86_FEATURE_ABM         (6*32+ 5) /* Advanced bit manipulation */
-#define X86_FEATURE_SSE4A       (6*32+ 6) /* SSE-4A */
-#define X86_FEATURE_MISALIGNSSE (6*32+ 7) /* Misaligned SSE mode */
-#define X86_FEATURE_3DNOWPREFETCH (6*32+ 8) /* 3DNow prefetch instructions */
-#define X86_FEATURE_OSVW        (6*32+ 9) /* OS Visible Workaround */
-#define X86_FEATURE_IBS         (6*32+10) /* Instruction Based Sampling */
-#define X86_FEATURE_XOP         (6*32+11) /* extended AVX instructions */
-#define X86_FEATURE_SKINIT      (6*32+12) /* SKINIT/STGI instructions */
-#define X86_FEATURE_WDT         (6*32+13) /* Watchdog timer */
-#define X86_FEATURE_LWP         (6*32+15) /* Light Weight Profiling */
-#define X86_FEATURE_FMA4        (6*32+16) /* 4 operands MAC instructions */
-#define X86_FEATURE_NODEID_MSR  (6*32+19) /* NodeId MSR */
-#define X86_FEATURE_TBM         (6*32+21) /* trailing bit manipulations */
-#define X86_FEATURE_TOPOEXT     (6*32+22) /* topology extensions CPUID leafs */
-#define X86_FEATURE_DBEXT       (6*32+26) /* data breakpoint extension */
-#define X86_FEATURE_MWAITX      (6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 7 */
-#define X86_FEATURE_FSGSBASE	(7*32+ 0) /* {RD,WR}{FS,GS}BASE instructions */
-#define X86_FEATURE_BMI1	(7*32+ 3) /* 1st bit manipulation extensions */
-#define X86_FEATURE_HLE 	(7*32+ 4) /* Hardware Lock Elision */
-#define X86_FEATURE_AVX2	(7*32+ 5) /* AVX2 instructions */
-#define X86_FEATURE_SMEP	(7*32+ 7) /* Supervisor Mode Execution Protection */
-#define X86_FEATURE_BMI2	(7*32+ 8) /* 2nd bit manipulation extensions */
-#define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
-#define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
-#define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
-#define X86_FEATURE_CMT 	(7*32+12) /* Cache Monitoring Technology */
-#define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
-#define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
-#define X86_FEATURE_CAT 	(7*32+15) /* Cache Allocation Technology */
-#define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
-#define X86_FEATURE_ADX		(7*32+19) /* ADCX, ADOX instructions */
-#define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT	(7*32+22) /* PCOMMIT instruction */
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 8 */
-#define X86_FEATURE_PKU	(8*32+ 3) /* Protection Keys for Userspace */
-#define X86_FEATURE_OSPKE	(8*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_CONSTANT_TSC	((FSCAPINTS+0)*32+ 8) /* TSC ticks at a constant rate */
+#define X86_FEATURE_NONSTOP_TSC		((FSCAPINTS+0)*32+ 9) /* TSC does not stop in C states */
+#define X86_FEATURE_ARAT		((FSCAPINTS+0)*32+10) /* Always running APIC timer */
+#define X86_FEATURE_ARCH_PERFMON	((FSCAPINTS+0)*32+11) /* Intel Architectural PerfMon */
+#define X86_FEATURE_TSC_RELIABLE	((FSCAPINTS+0)*32+12) /* TSC is known to be reliable */
+#define X86_FEATURE_XTOPOLOGY		((FSCAPINTS+0)*32+13) /* cpu topology enum extensions */
+#define X86_FEATURE_CPUID_FAULTING	((FSCAPINTS+0)*32+14) /* cpuid faulting */
+#define X86_FEATURE_CLFLUSH_MONITOR	((FSCAPINTS+0)*32+15) /* clflush reqd with monitor */
+#define X86_FEATURE_APERFMPERF		((FSCAPINTS+0)*32+16) /* APERFMPERF */
 
 #define cpufeat_word(idx)	((idx) / 32)
 #define cpufeat_bit(idx)	((idx) % 32)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
new file mode 100644
index 0000000..02d695d
--- /dev/null
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -0,0 +1,195 @@
+/*
+ * arch-x86/cpufeatureset.h
+ *
+ * CPU featureset definitions
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2015 Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__
+#define __XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__
+
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+/*
+ * A featureset is a bitmap of x86 features, represented as a collection of
+ * 32bit words.
+ *
+ * Words are as specified in the vendors' programming manuals, and shall not
+ * contain any synthesised values.  New words may be added to the end of the
+ * featureset.
+ *
+ * All featureset words currently originate from leaves specified for the
+ * CPUID instruction, but this does not preclude other sources of information.
+ */
+
+/* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
+#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
+#define X86_FEATURE_VME           ( 0*32+ 1) /*   Virtual Mode Extensions */
+#define X86_FEATURE_DE            ( 0*32+ 2) /*   Debugging Extensions */
+#define X86_FEATURE_PSE           ( 0*32+ 3) /*   Page Size Extensions */
+#define X86_FEATURE_TSC           ( 0*32+ 4) /*   Time Stamp Counter */
+#define X86_FEATURE_MSR           ( 0*32+ 5) /*   Model-Specific Registers, RDMSR, WRMSR */
+#define X86_FEATURE_PAE           ( 0*32+ 6) /*   Physical Address Extensions */
+#define X86_FEATURE_MCE           ( 0*32+ 7) /*   Machine Check Architecture */
+#define X86_FEATURE_CX8           ( 0*32+ 8) /*   CMPXCHG8 instruction */
+#define X86_FEATURE_APIC          ( 0*32+ 9) /*   Onboard APIC */
+#define X86_FEATURE_SEP           ( 0*32+11) /*   SYSENTER/SYSEXIT */
+#define X86_FEATURE_MTRR          ( 0*32+12) /*   Memory Type Range Registers */
+#define X86_FEATURE_PGE           ( 0*32+13) /*   Page Global Enable */
+#define X86_FEATURE_MCA           ( 0*32+14) /*   Machine Check Architecture */
+#define X86_FEATURE_CMOV          ( 0*32+15) /*   CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
+#define X86_FEATURE_PAT           ( 0*32+16) /*   Page Attribute Table */
+#define X86_FEATURE_PSE36         ( 0*32+17) /*   36-bit PSEs */
+#define X86_FEATURE_PN            ( 0*32+18) /*   Processor serial number */
+#define X86_FEATURE_CLFLSH        ( 0*32+19) /*   CLFLUSH instruction */
+#define X86_FEATURE_DS            ( 0*32+21) /*   Debug Store */
+#define X86_FEATURE_ACPI          ( 0*32+22) /*   ACPI via MSR */
+#define X86_FEATURE_MMX           ( 0*32+23) /*   Multimedia Extensions */
+#define X86_FEATURE_FXSR          ( 0*32+24) /*   FXSAVE and FXRSTOR instructions */
+#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
+#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
+#define X86_FEATURE_SELFSNOOP     ( 0*32+27) /*   CPU self snoop */
+#define X86_FEATURE_HT            ( 0*32+28) /*   Hyper-Threading */
+#define X86_FEATURE_ACC           ( 0*32+29) /*   Automatic clock control */
+#define X86_FEATURE_IA64          ( 0*32+30) /*   IA-64 processor */
+#define X86_FEATURE_PBE           ( 0*32+31) /*   Pending Break Enable */
+
+/* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
+#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
+#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*   Carry-less multiplication */
+#define X86_FEATURE_DTES64        ( 1*32+ 2) /*   64-bit Debug Store */
+#define X86_FEATURE_MWAIT         ( 1*32+ 3) /*   Monitor/Mwait support */
+#define X86_FEATURE_DSCPL         ( 1*32+ 4) /*   CPL Qualified Debug Store */
+#define X86_FEATURE_VMXE          ( 1*32+ 5) /*   Virtual Machine Extensions */
+#define X86_FEATURE_SMXE          ( 1*32+ 6) /*   Safer Mode Extensions */
+#define X86_FEATURE_EST           ( 1*32+ 7) /*   Enhanced SpeedStep */
+#define X86_FEATURE_TM2           ( 1*32+ 8) /*   Thermal Monitor 2 */
+#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */
+#define X86_FEATURE_CID           ( 1*32+10) /*   Context ID */
+#define X86_FEATURE_FMA           ( 1*32+12) /*   Fused Multiply Add */
+#define X86_FEATURE_CX16          ( 1*32+13) /*   CMPXCHG16B */
+#define X86_FEATURE_XTPR          ( 1*32+14) /*   Send Task Priority Messages */
+#define X86_FEATURE_PDCM          ( 1*32+15) /*   Perf/Debug Capability MSR */
+#define X86_FEATURE_PCID          ( 1*32+17) /*   Process Context ID */
+#define X86_FEATURE_DCA           ( 1*32+18) /*   Direct Cache Access */
+#define X86_FEATURE_SSE4_1        ( 1*32+19) /*   Streaming SIMD Extensions 4.1 */
+#define X86_FEATURE_SSE4_2        ( 1*32+20) /*   Streaming SIMD Extensions 4.2 */
+#define X86_FEATURE_X2APIC        ( 1*32+21) /*   Extended xAPIC */
+#define X86_FEATURE_MOVBE         ( 1*32+22) /*   movbe instruction */
+#define X86_FEATURE_POPCNT        ( 1*32+23) /*   POPCNT instruction */
+#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*   TSC Deadline Timer */
+#define X86_FEATURE_AES           ( 1*32+25) /*   AES instructions */
+#define X86_FEATURE_XSAVE         ( 1*32+26) /*   XSAVE/XRSTOR/XSETBV/XGETBV */
+#define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
+#define X86_FEATURE_AVX           ( 1*32+28) /*   Advanced Vector Extensions */
+#define X86_FEATURE_F16C          ( 1*32+29) /*   Half-precision convert instruction */
+#define X86_FEATURE_RDRAND        ( 1*32+30) /*   Digital Random Number Generator */
+#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*   Running under some hypervisor */
+
+/* AMD-defined CPU features, CPUID level 0x80000001.edx, word 2 */
+#define X86_FEATURE_SYSCALL       ( 2*32+11) /*   SYSCALL/SYSRET */
+#define X86_FEATURE_MP            ( 2*32+19) /*   MP Capable. */
+#define X86_FEATURE_NX            ( 2*32+20) /*   Execute Disable */
+#define X86_FEATURE_MMXEXT        ( 2*32+22) /*   AMD MMX extensions */
+#define X86_FEATURE_FFXSR         ( 2*32+25) /*   FFXSR instruction optimizations */
+#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*   1Gb large page support */
+#define X86_FEATURE_RDTSCP        ( 2*32+27) /*   RDTSCP */
+#define X86_FEATURE_LM            ( 2*32+29) /*   Long Mode (x86-64) */
+#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*   AMD 3DNow! extensions */
+#define X86_FEATURE_3DNOW         ( 2*32+31) /*   3DNow! */
+
+/* AMD-defined CPU features, CPUID level 0x80000001.ecx, word 3 */
+#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
+#define X86_FEATURE_CMP_LEGACY    ( 3*32+ 1) /*   If yes HyperThreading not valid */
+#define X86_FEATURE_SVM           ( 3*32+ 2) /*   Secure virtual machine */
+#define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
+#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*   CR8 in 32-bit mode */
+#define X86_FEATURE_ABM           ( 3*32+ 5) /*   Advanced bit manipulation */
+#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*   SSE-4A */
+#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*   Misaligned SSE mode */
+#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*   3DNow prefetch instructions */
+#define X86_FEATURE_OSVW          ( 3*32+ 9) /*   OS Visible Workaround */
+#define X86_FEATURE_IBS           ( 3*32+10) /*   Instruction Based Sampling */
+#define X86_FEATURE_XOP           ( 3*32+11) /*   extended AVX instructions */
+#define X86_FEATURE_SKINIT        ( 3*32+12) /*   SKINIT/STGI instructions */
+#define X86_FEATURE_WDT           ( 3*32+13) /*   Watchdog timer */
+#define X86_FEATURE_LWP           ( 3*32+15) /*   Light Weight Profiling */
+#define X86_FEATURE_FMA4          ( 3*32+16) /*   4 operands MAC instructions */
+#define X86_FEATURE_NODEID_MSR    ( 3*32+19) /*   NodeId MSR */
+#define X86_FEATURE_TBM           ( 3*32+21) /*   trailing bit manipulations */
+#define X86_FEATURE_TOPOEXT       ( 3*32+22) /*   topology extensions CPUID leafs */
+#define X86_FEATURE_DBEXT         ( 3*32+26) /*   data breakpoint extension */
+#define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
+
+/* Intel-defined CPU features, CPUID level 0x0000000D:1.eax, word 4 */
+#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*   XSAVEOPT instruction */
+#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*   XSAVEC/XRSTORC instructions */
+#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*   XGETBV with %ecx=1 */
+#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*   XSAVES/XRSTORS instructions */
+
+/* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
+#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*   {RD,WR}{FS,GS}BASE instructions */
+#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*   TSC_ADJUST MSR available */
+#define X86_FEATURE_BMI1          ( 5*32+ 3) /*   1st bit manipulation extensions */
+#define X86_FEATURE_HLE           ( 5*32+ 4) /*   Hardware Lock Elision */
+#define X86_FEATURE_AVX2          ( 5*32+ 5) /*   AVX2 instructions */
+#define X86_FEATURE_SMEP          ( 5*32+ 7) /*   Supervisor Mode Execution Protection */
+#define X86_FEATURE_BMI2          ( 5*32+ 8) /*   2nd bit manipulation extensions */
+#define X86_FEATURE_ERMS          ( 5*32+ 9) /*   Enhanced REP MOVSB/STOSB */
+#define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
+#define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
+#define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
+#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
+#define X86_FEATURE_MPX           ( 5*32+14) /*   Memory Protection Extensions */
+#define X86_FEATURE_CAT           ( 5*32+15) /*   Cache Allocation Technology */
+#define X86_FEATURE_RDSEED        ( 5*32+18) /*   RDSEED instruction */
+#define X86_FEATURE_ADX           ( 5*32+19) /*   ADCX, ADOX instructions */
+#define X86_FEATURE_SMAP          ( 5*32+20) /*   Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*   PCOMMIT instruction */
+#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*   CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB          ( 5*32+24) /*   CLWB instruction */
+#define X86_FEATURE_SHA           ( 5*32+29) /*   SHA1 & SHA256 instructions */
+
+/* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */
+#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*   PREFETCHWT1 instruction */
+#define X86_FEATURE_PKU           ( 6*32+ 3) /*   Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE         ( 6*32+ 4) /*   OS Protection Keys Enable */
+
+/* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
+#define X86_FEATURE_ITSC          ( 7*32+ 8) /*   Invariant TSC */
+#define X86_FEATURE_EFRO          ( 7*32+10) /*   APERF/MPERF Read Only interface */
+
+/* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
+#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*   CLZERO instruction */
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+#endif /* !__XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 06/30] xen/x86: Script to automatically process featureset information
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (4 preceding siblings ...)
  2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
@ 2016-02-05 13:41 ` Andrew Cooper
  2016-02-12 16:36   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves Andrew Cooper
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Ian Campbell, Jan Beulich

This script consumes include/public/arch-x86/cpufeatureset.h and generates a
single include/asm-x86/cpuid-autogen.h containing all the processed
information.
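As a rough illustration of the parsing step (not the patch's exact code, which lives in xen/tools/gen-cpuid.py and targets Python 2), the feature-definition lines can be matched and evaluated like so:

```python
import re

# Sketch of gen-cpuid.py's definition parser.  The regex confines the
# captured expression to simple "word*32+bit" arithmetic, so eval() on the
# capture group is safe.
feat_regex = re.compile(
    r"^#define X86_FEATURE_([A-Z0-9_]+)"
    r"\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\).*$")

def parse_line(line):
    """Return (name, bit index) for a feature #define, or None."""
    res = feat_regex.match(line)
    if res is None:
        return None
    name, expr = res.groups()
    return name, eval(expr)

print(parse_line("#define X86_FEATURE_FPU           ( 0*32+ 0) /* FPU */"))
```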

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Ian Campbell <Ian.Campbell@citrix.com>

For all intents and purposes, new in v2.  All generated information is now
expressed by #defines (using C structure initialisers for most) and contained
in a single header file.
---
 .gitignore                       |   1 +
 xen/include/Makefile             |  10 ++
 xen/include/asm-x86/cpufeature.h |   4 +-
 xen/tools/gen-cpuid.py           | 191 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100755 xen/tools/gen-cpuid.py

diff --git a/.gitignore b/.gitignore
index 91f690c..b40453e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -252,6 +252,7 @@ xen/include/headers.chk
 xen/include/headers++.chk
 xen/include/asm
 xen/include/asm-*/asm-offsets.h
+xen/include/asm-x86/cpuid-autogen.h
 xen/include/compat/*
 xen/include/config/
 xen/include/generated/
diff --git a/xen/include/Makefile b/xen/include/Makefile
index 9c8188b..268bc9d 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -117,5 +117,15 @@ headers++.chk: $(PUBLIC_HEADERS) Makefile
 
 endif
 
+ifeq ($(XEN_TARGET_ARCH),x86_64)
+
+$(BASEDIR)/include/asm-x86/cpuid-autogen.h: $(BASEDIR)/include/public/arch-x86/cpufeatureset.h $(BASEDIR)/tools/gen-cpuid.py FORCE
+	$(PYTHON) $(BASEDIR)/tools/gen-cpuid.py -i $^ -o $@.new
+	$(call move-if-changed,$@.new,$@)
+
+all: $(BASEDIR)/include/asm-x86/cpuid-autogen.h
+endif
+
 clean::
 	rm -rf compat headers.chk headers++.chk
+	rm -f $(BASEDIR)/include/asm-x86/cpuid-autogen.h
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index eb6eb63..d069563 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -13,7 +13,9 @@
 
 #include <public/arch-x86/cpufeatureset.h>
 
-#define FSCAPINTS 9
+#include <asm/cpuid-autogen.h>
+
+#define FSCAPINTS FEATURESET_NR_ENTRIES
 #define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
 
 /* Other features, Linux-defined mapping, FSMAX+1 */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
new file mode 100755
index 0000000..c8240c0
--- /dev/null
+++ b/xen/tools/gen-cpuid.py
@@ -0,0 +1,191 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+import sys, os, re
+
+class Fail(Exception):
+    pass
+
+class State(object):
+
+    def __init__(self, input, output):
+
+        self.source = input
+        self.input  = open_file_or_fd(input, "r", 2)
+        self.output = open_file_or_fd(output, "w", 2)
+
+        # State parsed from input
+        self.names = {} # Name => value mapping
+
+        # State calculated
+        self.nr_entries = 0 # Number of words in a featureset
+
+def parse_definitions(state):
+    """
+    Parse featureset information from @param state and mutate the global
+    namespace with symbols
+    """
+    feat_regex = re.compile(
+        r"^#define X86_FEATURE_([A-Z0-9_]+)"
+        "\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\).*$")
+
+    this = sys.modules[__name__]
+
+    for l in state.input.readlines():
+        # Short circuit the regex...
+        if not l.startswith("#define X86_FEATURE_"):
+            continue
+
+        res = feat_regex.match(l)
+
+        if res is None:
+            raise Fail("Failed to interpret '%s'" % (l.strip(), ))
+
+        name = res.groups()[0]
+        val = eval(res.groups()[1]) # Regex confines this to a very simple expression
+
+        if hasattr(this, name):
+            raise Fail("Duplicate symbol %s" % (name,))
+
+        if val in state.names:
+            raise Fail("Aliased value between %s and %s" %
+                            (name, state.names[val]))
+
+        # Mutate the current namespace to insert a feature literal with its
+        # bit index
+        setattr(this, name, val)
+
+        # Construct a reverse mapping of value to name
+        state.names[val] = name
+
+
+def featureset_to_uint32s(fs, nr):
+    """ Represent a featureset as a list of C-compatible uint32_t's """
+
+    bitmap = 0L
+    for f in fs:
+        bitmap |= 1L << f
+
+    words = []
+    while bitmap:
+        words.append(bitmap & ((1L << 32) - 1))
+        bitmap >>= 32
+
+    assert len(words) <= nr
+
+    if len(words) < nr:
+        words.extend([0] * (nr - len(words)))
+
+    return [ "0x%08xU" % x for x in words ]
+
+def format_uint32s(words, indent):
+    """ Format a list of uint32_t's suitable for a macro definition """
+    spaces = " " * indent
+    return spaces + (", \\\n" + spaces).join(words) + ", \\"
+
+
+def crunch_numbers(state):
+
+    # Size of bitmaps
+    state.nr_entries = nr_entries = (max(state.names.keys()) >> 5) + 1
+
+
+def write_results(state):
+    state.output.write(
+"""/*
+ * Automatically generated by %s - Do not edit!
+ * Source data: %s
+ */
+#ifndef __XEN_X86__FEATURESET_DATA__
+#define __XEN_X86__FEATURESET_DATA__
+""" % (sys.argv[0], state.source))
+
+    state.output.write(
+"""
+#define FEATURESET_NR_ENTRIES %s
+""" % (state.nr_entries,
+       ))
+
+    state.output.write(
+"""
+#endif /* __XEN_X86__FEATURESET_DATA__ */
+""")
+
+
+def open_file_or_fd(val, mode, buffering):
+    """
+    If 'val' looks like a decimal integer, open it as an fd.  If not, try to
+    open it as a regular file.
+    """
+
+    fd = -1
+    try:
+        # Does it look like an integer?
+        try:
+            fd = int(val, 10)
+        except ValueError:
+            pass
+
+        if fd == 0:
+            return sys.stdin
+        elif fd == 1:
+            return sys.stdout
+        elif fd == 2:
+            return sys.stderr
+
+        # Try to open it...
+        if fd != -1:
+            return os.fdopen(fd, mode, buffering)
+        else:
+            return open(val, mode, buffering)
+
+    except StandardError, e:
+        if fd != -1:
+            raise Fail("Unable to open fd %d: %s: %s" %
+                       (fd, e.__class__.__name__, e))
+        else:
+            raise Fail("Unable to open file '%s': %s: %s" %
+                       (val, e.__class__.__name__, e))
+
+    raise SystemExit(2)
+
+def main():
+    from optparse import OptionParser
+
+    # Change stdout to be line-buffered.
+    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)
+
+    parser = OptionParser(usage = "%prog [options] -i INPUT -o OUTPUT",
+                          description =
+                          "Process featureset information")
+
+    parser.add_option("-i", "--in", dest = "fin", metavar = "<FD or FILE>",
+                      default = "0",
+                      help = "Featureset definitions")
+    parser.add_option("-o", "--out", dest = "fout", metavar = "<FD or FILE>",
+                      default = "1",
+                      help = "Featureset calculated information")
+
+    opts, _ = parser.parse_args()
+
+    if opts.fin is None or opts.fout is None:
+        parser.print_help(sys.stderr)
+        raise SystemExit(1)
+
+    state = State(opts.fin, opts.fout)
+
+    parse_definitions(state)
+    crunch_numbers(state)
+    write_results(state)
+
+
+if __name__ == "__main__":
+    try:
+        sys.exit(main())
+    except Fail, e:
+        print >>sys.stderr, e
+        sys.exit(1)
+    except SystemExit, e:
+        sys.exit(e.code)
+    except KeyboardInterrupt:
+        sys.exit(2)
-- 
2.1.4
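The featureset-to-words conversion in the script above can be restated in Python 3 (the patch itself uses Python 2 long literals) roughly as:

```python
def featureset_to_uint32s(fs, nr):
    """Represent a featureset (iterable of bit indices) as nr C uint32_t's."""
    bitmap = 0
    for f in fs:
        bitmap |= 1 << f

    # Split the arbitrary-precision bitmap into 32-bit words, low word first.
    words = []
    while bitmap:
        words.append(bitmap & ((1 << 32) - 1))
        bitmap >>= 32

    assert len(words) <= nr
    words.extend([0] * (nr - len(words)))  # Zero-pad to the requested width

    return ["0x%08xU" % x for x in words]

print(featureset_to_uint32s([0, 33], 2))
```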


* [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (5 preceding siblings ...)
  2016-02-05 13:41 ` [PATCH v2 06/30] xen/x86: Script to automatically process featureset information Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-12 16:38   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities Andrew Cooper
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

New words are:
 * 0x80000007.edx - Contains Invariant TSC
 * 0x80000008.ebx - Newly used for AMD Zen processors

In addition, replace some open-coded ITSC and EFRO manipulation.
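For reference, the cpu_has()-style test that replaces the open-coded `(1<<8)` checks boils down to the word/bit arithmetic below (a sketch in Python rather than the hypervisor's C; ITSC's position, word 7 bit 8, is taken from the featureset header in this series):

```python
# Word/bit helpers mirroring cpufeat_word()/cpufeat_bit()/cpufeat_mask().
def cpufeat_word(idx): return idx // 32
def cpufeat_bit(idx):  return idx % 32
def cpufeat_mask(idx): return 1 << cpufeat_bit(idx)

X86_FEATURE_ITSC = 7 * 32 + 8   # Invariant TSC, per cpufeatureset.h

def cpu_has(x86_capability, feature):
    """Test a feature bit in a capability array of 32-bit words."""
    return bool(x86_capability[cpufeat_word(feature)] & cpufeat_mask(feature))

caps = [0] * 9
caps[7] = 1 << 8                # as filled from cpuid_edx(0x80000007)
print(cpu_has(caps, X86_FEATURE_ITSC))
```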

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Rely on ordering of generic_identify() to simplify init_amd()
 * Remove open-coded EFRO manipulation as well
---
 xen/arch/x86/cpu/amd.c    | 21 +++------------------
 xen/arch/x86/cpu/common.c |  6 ++++++
 xen/arch/x86/cpu/intel.c  |  2 +-
 xen/arch/x86/domain.c     |  2 +-
 4 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index c184f57..f9dc532 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -294,21 +294,6 @@ int cpu_has_amd_erratum(const struct cpuinfo_x86 *cpu, int osvw_id, ...)
 	return 0;
 }
 
-/* Can this system suffer from TSC drift due to C1 clock ramping? */
-static int c1_ramping_may_cause_clock_drift(struct cpuinfo_x86 *c) 
-{ 
-	if (cpuid_edx(0x80000007) & (1<<8)) {
-		/*
-		 * CPUID.AdvPowerMgmtInfo.TscInvariant
-		 * EDX bit 8, 8000_0007
-		 * Invariant TSC on 8th Gen or newer, use it
-		 * (assume all cores have invariant TSC)
-		 */
-		return 0;
-	}
-	return 1;
-}
-
 /*
  * Disable C1-Clock ramping if enabled in PMM7.CpuLowPwrEnh on 8th-generation
  * cores only. Assume BIOS has setup all Northbridges equivalently.
@@ -475,7 +460,7 @@ static void init_amd(struct cpuinfo_x86 *c)
 	}
 
 	if (c->extended_cpuid_level >= 0x80000007) {
-		if (cpuid_edx(0x80000007) & (1<<8)) {
+		if (cpu_has(c, X86_FEATURE_ITSC)) {
 			__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
 			__set_bit(X86_FEATURE_NONSTOP_TSC, c->x86_capability);
 			if (c->x86 != 0x11)
@@ -600,14 +585,14 @@ static void init_amd(struct cpuinfo_x86 *c)
 		wrmsrl(MSR_K7_PERFCTR3, 0);
 	}
 
-	if (cpuid_edx(0x80000007) & (1 << 10)) {
+	if (cpu_has(c, X86_FEATURE_EFRO)) {
 		rdmsr(MSR_K7_HWCR, l, h);
 		l |= (1 << 27); /* Enable read-only APERF/MPERF bit */
 		wrmsr(MSR_K7_HWCR, l, h);
 	}
 
 	/* Prevent TSC drift in non single-processor, single-core platforms. */
-	if ((smp_processor_id() == 1) && c1_ramping_may_cause_clock_drift(c))
+	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
 	set_cpuidmask(c);
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 335f044..a99cc7c 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -269,6 +269,12 @@ static void generic_identify(struct cpuinfo_x86 *c)
 
 	if (c->extended_cpuid_level >= 0x80000004)
 		get_model_name(c); /* Default name */
+	if (c->extended_cpuid_level >= 0x80000007)
+		c->x86_capability[cpufeat_word(X86_FEATURE_ITSC)]
+			= cpuid_edx(0x80000007);
+	if (c->extended_cpuid_level >= 0x80000008)
+		c->x86_capability[cpufeat_word(X86_FEATURE_CLZERO)]
+			= cpuid_ebx(0x80000008);
 
 	/* Intel-defined flags: level 0x00000007 */
 	if ( c->cpuid_level >= 0x00000007 )
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index d4f574b..bdf89f6 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -281,7 +281,7 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
 		(c->x86 == 0x6 && c->x86_model >= 0x0e))
 		__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
-	if (cpuid_edx(0x80000007) & (1u<<8)) {
+	if (cpu_has(c, X86_FEATURE_ITSC)) {
 		__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
 		__set_bit(X86_FEATURE_NONSTOP_TSC, c->x86_capability);
 		__set_bit(X86_FEATURE_TSC_RELIABLE, c->x86_capability);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d43f7b..8f2c0b6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2605,7 +2605,7 @@ void domain_cpuid(
              */
             if ( (input == 0x80000007) && /* Advanced Power Management */
                  !d->disable_migrate && !d->arch.vtsc )
-                *edx &= ~(1u<<8); /* TSC Invariant */
+                *edx &= ~cpufeat_mask(X86_FEATURE_ITSC);
 
             return;
         }
-- 
2.1.4


* [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (6 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-12 16:43   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset Andrew Cooper
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

If Xen doesn't know about a feature, it is unsafe for use and should be
deliberately hidden from Xen's capabilities.

This doesn't make a practical difference yet, but will make a difference
later when the guest featuresets are seeded from the host featureset.
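The masking performed in identify_cpu() is just a per-word AND against the autogenerated known-features bitmap; a minimal sketch (Python standing in for the hypervisor's C loop):

```python
def sanitise_caps(x86_capability, known_features):
    """Drop any capability bit Xen does not know about."""
    return [cap & known
            for cap, known in zip(x86_capability, known_features)]

# Hypothetical values: the hardware reports bits 0-7 in word 0, but Xen
# only knows about bits 0-3 there.
print(sanitise_caps([0xff, 0x0f], [0x0f, 0xff]))
```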

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Reduced substantially from v1, by using the autogenerated information.
---
 xen/arch/x86/Makefile            |  1 +
 xen/arch/x86/cpu/common.c        |  3 +++
 xen/arch/x86/cpuid.c             | 19 +++++++++++++++++++
 xen/include/asm-x86/cpufeature.h |  3 +--
 xen/include/asm-x86/cpuid.h      | 24 ++++++++++++++++++++++++
 xen/tools/gen-cpuid.py           | 24 ++++++++++++++++++++++++
 6 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/cpuid.c
 create mode 100644 xen/include/asm-x86/cpuid.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 8e6e901..0e2b1d5 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -12,6 +12,7 @@ obj-y += bitops.o
 obj-bin-y += bzimage.init.o
 obj-bin-y += clear_page.o
 obj-bin-y += copy_page.o
+obj-y += cpuid.o
 obj-y += compat.o x86_64/compat.o
 obj-$(CONFIG_KEXEC) += crash.o
 obj-y += debug.o
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index a99cc7c..151dfe4 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -341,6 +341,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	 * The vendor-specific functions might have changed features.  Now
 	 * we do "generic changes."
 	 */
+	for (i = 0; i < FSCAPINTS; ++i) {
+		c->x86_capability[i] &= known_features[i];
+	}
 
 	for (i = 0 ; i < NCAPINTS ; ++i)
 		c->x86_capability[i] &= ~cleared_caps[i];
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
new file mode 100644
index 0000000..fb3a6ac
--- /dev/null
+++ b/xen/arch/x86/cpuid.c
@@ -0,0 +1,19 @@
+#include <xen/lib.h>
+#include <asm/cpuid.h>
+
+const uint32_t known_features[] = INIT_KNOWN_FEATURES;
+
+static void __maybe_unused build_assertions(void)
+{
+    BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index d069563..a984a81 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -13,9 +13,8 @@
 
 #include <public/arch-x86/cpufeatureset.h>
 
-#include <asm/cpuid-autogen.h>
+#include <asm/cpuid.h>
 
-#define FSCAPINTS FEATURESET_NR_ENTRIES
 #define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
 
 /* Other features, Linux-defined mapping, FSMAX+1 */
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
new file mode 100644
index 0000000..6cca5ea
--- /dev/null
+++ b/xen/include/asm-x86/cpuid.h
@@ -0,0 +1,24 @@
+#ifndef __X86_CPUID_H__
+#define __X86_CPUID_H__
+
+#include <asm/cpuid-autogen.h>
+
+#define FSCAPINTS FEATURESET_NR_ENTRIES
+
+#ifndef __ASSEMBLY__
+#include <xen/types.h>
+
+extern const uint32_t known_features[FSCAPINTS];
+
+#endif /* __ASSEMBLY__ */
+#endif /* !__X86_CPUID_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index c8240c0..0843be6 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -19,6 +19,8 @@ class State(object):
 
         # State calculated
         self.nr_entries = 0 # Number of words in a featureset
+        self.common = 0 # Common features between 1d and e1d
+        self.known = [] # All known features
 
 def parse_definitions(state):
     """
@@ -89,6 +91,22 @@ def crunch_numbers(state):
     # Size of bitmaps
     state.nr_entries = nr_entries = (max(state.names.keys()) >> 5) + 1
 
+    # Features common between 1d and e1d.
+    common_1d = (FPU, VME, DE, PSE, TSC, MSR, PAE, MCE, CX8, APIC,
+                 MTRR, PGE, MCA, CMOV, PAT, PSE36, MMX, FXSR)
+
+    # All known features.  Duplicate the common features in e1d
+    e1d_base = (SYSCALL >> 5) << 5
+    state.known = featureset_to_uint32s(
+        state.names.keys() + [ e1d_base + (x % 32) for x in common_1d ],
+        nr_entries)
+
+    # Fold common back into names
+    for f in common_1d:
+        state.names[e1d_base + (f % 32)] = "E1D_" + state.names[f]
+
+    state.common = featureset_to_uint32s(common_1d, 1)[0]
+
 
 def write_results(state):
     state.output.write(
@@ -103,7 +121,13 @@ def write_results(state):
     state.output.write(
 """
 #define FEATURESET_NR_ENTRIES %s
+
+#define INIT_COMMON_FEATURES %s
+
+#define INIT_KNOWN_FEATURES { \\\n%s\n}
 """ % (state.nr_entries,
+       state.common,
+       format_uint32s(state.known, 4),
        ))
 
     state.output.write(
-- 
2.1.4


* [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (7 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-12 16:47   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset Andrew Cooper
                   ` (21 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Awkwardly, some new feature bits mean "Feature $X no longer works".
Store these inverted in a featureset.

This permits safe zero-extending of a smaller featureset as part of a
comparison, and safe reasoning (subset?, superset?, compatible? etc.)
without specific knowledge of the meaning of each bit.
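With every antifeature stored inverted (e.g. NO_FPU_SEL exposed as FPU_SEL in this patch), a set bit always means "capability present", so a shorter featureset zero-extends safely and compatibility reduces to a plain bitwise subset test; a sketch:

```python
def is_subset(sub, superset):
    """True if every capability in 'sub' is also present in 'superset'.

    Both are lists of 32-bit words; a missing trailing word is equivalent
    to zero, which is why the inverted storage matters: zero uniformly
    means "capability absent".
    """
    # Zero-extend the shorter featureset.
    n = max(len(sub), len(superset))
    sub = sub + [0] * (n - len(sub))
    superset = superset + [0] * (n - len(superset))
    return all((s & ~p) == 0 for s, p in zip(sub, superset))

print(is_subset([0b0011], [0b0111, 0b1]))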

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2: Annotate inverted features using a magic comment and autogeneration.
---
 xen/arch/x86/cpu/common.c                   |  1 +
 xen/arch/x86/cpuid.c                        |  2 ++
 xen/include/asm-x86/cpufeature.h            |  2 +-
 xen/include/asm-x86/cpuid.h                 |  1 +
 xen/include/public/arch-x86/cpufeatureset.h | 18 +++++++++++++++++-
 xen/tools/gen-cpuid.py                      | 15 ++++++++++++++-
 6 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 151dfe4..39c340b 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -343,6 +343,7 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	 */
 	for (i = 0; i < FSCAPINTS; ++i) {
 		c->x86_capability[i] &= known_features[i];
+		c->x86_capability[i] ^= inverted_features[i];
 	}
 
 	for (i = 0 ; i < NCAPINTS ; ++i)
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index fb3a6ac..30a3392 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -2,10 +2,12 @@
 #include <asm/cpuid.h>
 
 const uint32_t known_features[] = INIT_KNOWN_FEATURES;
+const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
 
 static void __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(inverted_features) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index a984a81..f228fa2 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -65,7 +65,7 @@
 
 #define cpu_has_smep            boot_cpu_has(X86_FEATURE_SMEP)
 #define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
-#define cpu_has_fpu_sel         (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL))
+#define cpu_has_fpu_sel         boot_cpu_has(X86_FEATURE_FPU_SEL)
 
 #define cpu_has_ffxsr           ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
                                  && boot_cpu_has(X86_FEATURE_FFXSR))
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 6cca5ea..341dbc1 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -9,6 +9,7 @@
 #include <xen/types.h>
 
 extern const uint32_t known_features[FSCAPINTS];
+extern const uint32_t inverted_features[FSCAPINTS];
 
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 02d695d..2748cfd 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -37,10 +37,26 @@
 * contain any synthesised values.  New words may be added to the end of
  * featureset.
  *
+ * "Anti" features have their representation inverted.  This permits safe
+ * zero-extending of a smaller featureset as part of a comparison, and safe
+ * reasoning (subset?, superset?, compatible? etc.) without specific knowledge
+ * of the meaning of each bit.
+ *
  * All featureset words currently originate from leaves specified for the
 * CPUID instruction, but this does not preclude other sources of information.
  */
 
+/*
+ * Attribute syntax:
+ *
+ * Attributes for a particular feature are provided as characters before the
+ * first space in the comment immediately following the feature value.
+ *
+ * Inverted: '!'
+ *   This feature has its value in a featureset inverted, compared to how it
+ *   is specified by vendor architecture manuals.
+ */
+
 /* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
 #define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
 #define X86_FEATURE_VME           ( 0*32+ 1) /*   Virtual Mode Extensions */
@@ -158,7 +174,7 @@
 #define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
 #define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
 #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
-#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
+#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX           ( 5*32+14) /*   Memory Protection Extensions */
 #define X86_FEATURE_CAT           ( 5*32+15) /*   Cache Allocation Technology */
 #define X86_FEATURE_RDSEED        ( 5*32+18) /*   RDSEED instruction */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 0843be6..9e0cc34 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -16,11 +16,13 @@ class State(object):
 
         # State parsed from input
         self.names = {} # Name => value mapping
+        self.raw_inverted = []
 
         # State calculated
         self.nr_entries = 0 # Number of words in a featureset
         self.common = 0 # Common features between 1d and e1d
         self.known = [] # All known features
+        self.inverted = [] # Features with inverted representations
 
 def parse_definitions(state):
     """
@@ -29,7 +31,8 @@ def parse_definitions(state):
     """
     feat_regex = re.compile(
         r"^#define X86_FEATURE_([A-Z0-9_]+)"
-        "\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\).*$")
+        "\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
+        "\s+/\*([!]*) .*$")
 
     this = sys.modules[__name__]
 
@@ -45,6 +48,7 @@ def parse_definitions(state):
 
         name = res.groups()[0]
         val = eval(res.groups()[1]) # Regex confines this to a very simple expression
+        attr = res.groups()[2]
 
         if hasattr(this, name):
             raise Fail("Duplicate symbol %s" % (name,))
@@ -60,6 +64,11 @@ def parse_definitions(state):
         # Construct a reverse mapping of value to name
         state.names[val] = name
 
+        if len(attr):
+
+            if "!" in attr:
+                state.raw_inverted.append(val)
+
 
 def featureset_to_uint32s(fs, nr):
     """ Represent a featureset as a list of C-compatible uint32_t's """
@@ -106,6 +115,7 @@ def crunch_numbers(state):
         state.names[e1d_base + (f % 32)] = "E1D_" + state.names[f]
 
     state.common = featureset_to_uint32s(common_1d, 1)[0]
+    state.inverted = featureset_to_uint32s(state.raw_inverted, nr_entries)
 
 
 def write_results(state):
@@ -125,9 +135,12 @@ def write_results(state):
 #define INIT_COMMON_FEATURES %s
 
 #define INIT_KNOWN_FEATURES { \\\n%s\n}
+
+#define INIT_INVERTED_FEATURES { \\\n%s\n}
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
+       format_uint32s(state.inverted, 4),
        ))
 
     state.output.write(
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread
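The safety property this patch claims -- that storing antifeatures inverted makes zero-extension of a shorter featureset safe in comparisons -- can be illustrated with a short sketch. The helper name `is_subset` is hypothetical, not code from the series:

```python
def is_subset(small, large):
    """Check whether featureset `small` is a subset of `large`,
    zero-extending `small` if it has fewer words (assumes
    len(small) <= len(large)).

    Because antifeatures such as FPU_SEL are stored inverted, a zero
    bit uniformly means "less capable", so a missing (zero-extended)
    word never claims a capability and plain bitwise subset testing
    stays valid without knowing what any individual bit means.
    """
    small = small + [0] * (len(large) - len(small))
    return all((s & ~l) == 0 for s, l in zip(small, large))

host  = [0b1011, 0b0001]   # host capabilities, antifeatures already inverted
guest = [0b0011]           # older, shorter guest featureset
print(is_subset(guest, host))  # -> True
```

A guest word with a bit the host lacks (e.g. `[0b0100]` against the same host) fails the test, regardless of whether that bit encodes a feature or an inverted antifeature.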

* [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (8 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-12 17:05   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Use attributes to specify whether a feature is applicable for exposure to:
 1) All guests
 2) HVM guests
 3) HVM HAP guests

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2: Annotate features using a magic comment and autogeneration.
---
 xen/include/public/arch-x86/cpufeatureset.h | 187 ++++++++++++++--------------
 xen/tools/gen-cpuid.py                      |  32 ++++-
 2 files changed, 127 insertions(+), 92 deletions(-)

diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 2748cfd..d10b725 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -55,139 +55,144 @@
  * Inverted: '!'
  *   This feature has its value in a featureset inverted, compared to how it
  *   is specified by vendor architecture manuals.
+ *
+ * Applicability to guests: 'A', 'S' or 'H'
+ *   'A' = All guests.
+ *   'S' = All HVM guests (not PV guests).
+ *   'H' = HVM HAP guests (not PV or HVM Shadow guests).
  */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
-#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
-#define X86_FEATURE_VME           ( 0*32+ 1) /*   Virtual Mode Extensions */
-#define X86_FEATURE_DE            ( 0*32+ 2) /*   Debugging Extensions */
-#define X86_FEATURE_PSE           ( 0*32+ 3) /*   Page Size Extensions */
-#define X86_FEATURE_TSC           ( 0*32+ 4) /*   Time Stamp Counter */
-#define X86_FEATURE_MSR           ( 0*32+ 5) /*   Model-Specific Registers, RDMSR, WRMSR */
-#define X86_FEATURE_PAE           ( 0*32+ 6) /*   Physical Address Extensions */
-#define X86_FEATURE_MCE           ( 0*32+ 7) /*   Machine Check Architecture */
-#define X86_FEATURE_CX8           ( 0*32+ 8) /*   CMPXCHG8 instruction */
-#define X86_FEATURE_APIC          ( 0*32+ 9) /*   Onboard APIC */
-#define X86_FEATURE_SEP           ( 0*32+11) /*   SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR          ( 0*32+12) /*   Memory Type Range Registers */
-#define X86_FEATURE_PGE           ( 0*32+13) /*   Page Global Enable */
-#define X86_FEATURE_MCA           ( 0*32+14) /*   Machine Check Architecture */
-#define X86_FEATURE_CMOV          ( 0*32+15) /*   CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
-#define X86_FEATURE_PAT           ( 0*32+16) /*   Page Attribute Table */
-#define X86_FEATURE_PSE36         ( 0*32+17) /*   36-bit PSEs */
+#define X86_FEATURE_FPU           ( 0*32+ 0) /*A  Onboard FPU */
+#define X86_FEATURE_VME           ( 0*32+ 1) /*S  Virtual Mode Extensions */
+#define X86_FEATURE_DE            ( 0*32+ 2) /*A  Debugging Extensions */
+#define X86_FEATURE_PSE           ( 0*32+ 3) /*S  Page Size Extensions */
+#define X86_FEATURE_TSC           ( 0*32+ 4) /*A  Time Stamp Counter */
+#define X86_FEATURE_MSR           ( 0*32+ 5) /*A  Model-Specific Registers, RDMSR, WRMSR */
+#define X86_FEATURE_PAE           ( 0*32+ 6) /*A  Physical Address Extensions */
+#define X86_FEATURE_MCE           ( 0*32+ 7) /*A  Machine Check Architecture */
+#define X86_FEATURE_CX8           ( 0*32+ 8) /*A  CMPXCHG8 instruction */
+#define X86_FEATURE_APIC          ( 0*32+ 9) /*A  Onboard APIC */
+#define X86_FEATURE_SEP           ( 0*32+11) /*A  SYSENTER/SYSEXIT */
+#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */
+#define X86_FEATURE_PGE           ( 0*32+13) /*S  Page Global Enable */
+#define X86_FEATURE_MCA           ( 0*32+14) /*A  Machine Check Architecture */
+#define X86_FEATURE_CMOV          ( 0*32+15) /*A  CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
+#define X86_FEATURE_PAT           ( 0*32+16) /*A  Page Attribute Table */
+#define X86_FEATURE_PSE36         ( 0*32+17) /*S  36-bit PSEs */
 #define X86_FEATURE_PN            ( 0*32+18) /*   Processor serial number */
-#define X86_FEATURE_CLFLSH        ( 0*32+19) /*   CLFLUSH instruction */
+#define X86_FEATURE_CLFLSH        ( 0*32+19) /*A  CLFLUSH instruction */
 #define X86_FEATURE_DS            ( 0*32+21) /*   Debug Store */
-#define X86_FEATURE_ACPI          ( 0*32+22) /*   ACPI via MSR */
-#define X86_FEATURE_MMX           ( 0*32+23) /*   Multimedia Extensions */
-#define X86_FEATURE_FXSR          ( 0*32+24) /*   FXSAVE and FXRSTOR instructions */
-#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
-#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
+#define X86_FEATURE_ACPI          ( 0*32+22) /*A  ACPI via MSR */
+#define X86_FEATURE_MMX           ( 0*32+23) /*A  Multimedia Extensions */
+#define X86_FEATURE_FXSR          ( 0*32+24) /*A  FXSAVE and FXRSTOR instructions */
+#define X86_FEATURE_XMM           ( 0*32+25) /*A  Streaming SIMD Extensions */
+#define X86_FEATURE_XMM2          ( 0*32+26) /*A  Streaming SIMD Extensions-2 */
 #define X86_FEATURE_SELFSNOOP     ( 0*32+27) /*   CPU self snoop */
-#define X86_FEATURE_HT            ( 0*32+28) /*   Hyper-Threading */
+#define X86_FEATURE_HT            ( 0*32+28) /*A  Hyper-Threading */
 #define X86_FEATURE_ACC           ( 0*32+29) /*   Automatic clock control */
 #define X86_FEATURE_IA64          ( 0*32+30) /*   IA-64 processor */
 #define X86_FEATURE_PBE           ( 0*32+31) /*   Pending Break Enable */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
-#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
-#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*   Carry-less multiplication */
+#define X86_FEATURE_XMM3          ( 1*32+ 0) /*A  Streaming SIMD Extensions-3 */
+#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*A  Carry-less multiplication */
 #define X86_FEATURE_DTES64        ( 1*32+ 2) /*   64-bit Debug Store */
 #define X86_FEATURE_MWAIT         ( 1*32+ 3) /*   Monitor/Mwait support */
 #define X86_FEATURE_DSCPL         ( 1*32+ 4) /*   CPL Qualified Debug Store */
-#define X86_FEATURE_VMXE          ( 1*32+ 5) /*   Virtual Machine Extensions */
+#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
 #define X86_FEATURE_SMXE          ( 1*32+ 6) /*   Safer Mode Extensions */
 #define X86_FEATURE_EST           ( 1*32+ 7) /*   Enhanced SpeedStep */
 #define X86_FEATURE_TM2           ( 1*32+ 8) /*   Thermal Monitor 2 */
-#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */
+#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*A  Supplemental Streaming SIMD Extensions-3 */
 #define X86_FEATURE_CID           ( 1*32+10) /*   Context ID */
-#define X86_FEATURE_FMA           ( 1*32+12) /*   Fused Multiply Add */
-#define X86_FEATURE_CX16          ( 1*32+13) /*   CMPXCHG16B */
+#define X86_FEATURE_FMA           ( 1*32+12) /*A  Fused Multiply Add */
+#define X86_FEATURE_CX16          ( 1*32+13) /*A  CMPXCHG16B */
 #define X86_FEATURE_XTPR          ( 1*32+14) /*   Send Task Priority Messages */
 #define X86_FEATURE_PDCM          ( 1*32+15) /*   Perf/Debug Capability MSR */
-#define X86_FEATURE_PCID          ( 1*32+17) /*   Process Context ID */
+#define X86_FEATURE_PCID          ( 1*32+17) /*H  Process Context ID */
 #define X86_FEATURE_DCA           ( 1*32+18) /*   Direct Cache Access */
-#define X86_FEATURE_SSE4_1        ( 1*32+19) /*   Streaming SIMD Extensions 4.1 */
-#define X86_FEATURE_SSE4_2        ( 1*32+20) /*   Streaming SIMD Extensions 4.2 */
-#define X86_FEATURE_X2APIC        ( 1*32+21) /*   Extended xAPIC */
-#define X86_FEATURE_MOVBE         ( 1*32+22) /*   movbe instruction */
-#define X86_FEATURE_POPCNT        ( 1*32+23) /*   POPCNT instruction */
-#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*   TSC Deadline Timer */
-#define X86_FEATURE_AES           ( 1*32+25) /*   AES instructions */
-#define X86_FEATURE_XSAVE         ( 1*32+26) /*   XSAVE/XRSTOR/XSETBV/XGETBV */
+#define X86_FEATURE_SSE4_1        ( 1*32+19) /*A  Streaming SIMD Extensions 4.1 */
+#define X86_FEATURE_SSE4_2        ( 1*32+20) /*A  Streaming SIMD Extensions 4.2 */
+#define X86_FEATURE_X2APIC        ( 1*32+21) /*A  Extended xAPIC */
+#define X86_FEATURE_MOVBE         ( 1*32+22) /*A  movbe instruction */
+#define X86_FEATURE_POPCNT        ( 1*32+23) /*A  POPCNT instruction */
+#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*S  TSC Deadline Timer */
+#define X86_FEATURE_AES           ( 1*32+25) /*A  AES instructions */
+#define X86_FEATURE_XSAVE         ( 1*32+26) /*A  XSAVE/XRSTOR/XSETBV/XGETBV */
 #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
-#define X86_FEATURE_AVX           ( 1*32+28) /*   Advanced Vector Extensions */
-#define X86_FEATURE_F16C          ( 1*32+29) /*   Half-precision convert instruction */
-#define X86_FEATURE_RDRAND        ( 1*32+30) /*   Digital Random Number Generator */
-#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*   Running under some hypervisor */
+#define X86_FEATURE_AVX           ( 1*32+28) /*A  Advanced Vector Extensions */
+#define X86_FEATURE_F16C          ( 1*32+29) /*A  Half-precision convert instruction */
+#define X86_FEATURE_RDRAND        ( 1*32+30) /*A  Digital Random Number Generator */
+#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*A  Running under some hypervisor */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.edx, word 2 */
-#define X86_FEATURE_SYSCALL       ( 2*32+11) /*   SYSCALL/SYSRET */
-#define X86_FEATURE_MP            ( 2*32+19) /*   MP Capable. */
-#define X86_FEATURE_NX            ( 2*32+20) /*   Execute Disable */
-#define X86_FEATURE_MMXEXT        ( 2*32+22) /*   AMD MMX extensions */
-#define X86_FEATURE_FFXSR         ( 2*32+25) /*   FFXSR instruction optimizations */
-#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*   1Gb large page support */
-#define X86_FEATURE_RDTSCP        ( 2*32+27) /*   RDTSCP */
-#define X86_FEATURE_LM            ( 2*32+29) /*   Long Mode (x86-64) */
-#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*   AMD 3DNow! extensions */
-#define X86_FEATURE_3DNOW         ( 2*32+31) /*   3DNow! */
+#define X86_FEATURE_SYSCALL       ( 2*32+11) /*A  SYSCALL/SYSRET */
+#define X86_FEATURE_MP            ( 2*32+19) /*A  MP Capable. */
+#define X86_FEATURE_NX            ( 2*32+20) /*A  Execute Disable */
+#define X86_FEATURE_MMXEXT        ( 2*32+22) /*A  AMD MMX extensions */
+#define X86_FEATURE_FFXSR         ( 2*32+25) /*A  FFXSR instruction optimizations */
+#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*H  1Gb large page support */
+#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
+#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
+#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*A  AMD 3DNow! extensions */
+#define X86_FEATURE_3DNOW         ( 2*32+31) /*A  3DNow! */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.ecx, word 3 */
-#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
+#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY    ( 3*32+ 1) /*   If yes HyperThreading not valid */
-#define X86_FEATURE_SVM           ( 3*32+ 2) /*   Secure virtual machine */
+#define X86_FEATURE_SVM           ( 3*32+ 2) /*S  Secure virtual machine */
 #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
-#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*   CR8 in 32-bit mode */
-#define X86_FEATURE_ABM           ( 3*32+ 5) /*   Advanced bit manipulation */
-#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*   SSE-4A */
-#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*   Misaligned SSE mode */
-#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*   3DNow prefetch instructions */
+#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*S  CR8 in 32-bit mode */
+#define X86_FEATURE_ABM           ( 3*32+ 5) /*A  Advanced bit manipulation */
+#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*A  SSE-4A */
+#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*A  Misaligned SSE mode */
+#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*A  3DNow prefetch instructions */
 #define X86_FEATURE_OSVW          ( 3*32+ 9) /*   OS Visible Workaround */
-#define X86_FEATURE_IBS           ( 3*32+10) /*   Instruction Based Sampling */
-#define X86_FEATURE_XOP           ( 3*32+11) /*   extended AVX instructions */
+#define X86_FEATURE_IBS           ( 3*32+10) /*S  Instruction Based Sampling */
+#define X86_FEATURE_XOP           ( 3*32+11) /*A  extended AVX instructions */
 #define X86_FEATURE_SKINIT        ( 3*32+12) /*   SKINIT/STGI instructions */
 #define X86_FEATURE_WDT           ( 3*32+13) /*   Watchdog timer */
-#define X86_FEATURE_LWP           ( 3*32+15) /*   Light Weight Profiling */
-#define X86_FEATURE_FMA4          ( 3*32+16) /*   4 operands MAC instructions */
+#define X86_FEATURE_LWP           ( 3*32+15) /*A  Light Weight Profiling */
+#define X86_FEATURE_FMA4          ( 3*32+16) /*A  4 operands MAC instructions */
 #define X86_FEATURE_NODEID_MSR    ( 3*32+19) /*   NodeId MSR */
-#define X86_FEATURE_TBM           ( 3*32+21) /*   trailing bit manipulations */
+#define X86_FEATURE_TBM           ( 3*32+21) /*A  trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT       ( 3*32+22) /*   topology extensions CPUID leafs */
-#define X86_FEATURE_DBEXT         ( 3*32+26) /*   data breakpoint extension */
+#define X86_FEATURE_DBEXT         ( 3*32+26) /*A  data breakpoint extension */
 #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
 
 /* Intel-defined CPU features, CPUID level 0x0000000D:1.eax, word 4 */
-#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*   XSAVEOPT instruction */
-#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*   XSAVEC/XRSTORC instructions */
-#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*   XGETBV with %ecx=1 */
-#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*   XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*A  XSAVEOPT instruction */
+#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*A  XSAVEC/XRSTORC instructions */
+#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*A  XGETBV with %ecx=1 */
+#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*S  XSAVES/XRSTORS instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
-#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*   {RD,WR}{FS,GS}BASE instructions */
-#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*   TSC_ADJUST MSR available */
-#define X86_FEATURE_BMI1          ( 5*32+ 3) /*   1st bit manipulation extensions */
-#define X86_FEATURE_HLE           ( 5*32+ 4) /*   Hardware Lock Elision */
-#define X86_FEATURE_AVX2          ( 5*32+ 5) /*   AVX2 instructions */
-#define X86_FEATURE_SMEP          ( 5*32+ 7) /*   Supervisor Mode Execution Protection */
-#define X86_FEATURE_BMI2          ( 5*32+ 8) /*   2nd bit manipulation extensions */
-#define X86_FEATURE_ERMS          ( 5*32+ 9) /*   Enhanced REP MOVSB/STOSB */
-#define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
-#define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
+#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
+#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*S  TSC_ADJUST MSR available */
+#define X86_FEATURE_BMI1          ( 5*32+ 3) /*A  1st bit manipulation extensions */
+#define X86_FEATURE_HLE           ( 5*32+ 4) /*A  Hardware Lock Elision */
+#define X86_FEATURE_AVX2          ( 5*32+ 5) /*A  AVX2 instructions */
+#define X86_FEATURE_SMEP          ( 5*32+ 7) /*S  Supervisor Mode Execution Protection */
+#define X86_FEATURE_BMI2          ( 5*32+ 8) /*A  2nd bit manipulation extensions */
+#define X86_FEATURE_ERMS          ( 5*32+ 9) /*A  Enhanced REP MOVSB/STOSB */
+#define X86_FEATURE_INVPCID       ( 5*32+10) /*H  Invalidate Process Context ID */
+#define X86_FEATURE_RTM           ( 5*32+11) /*A  Restricted Transactional Memory */
 #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
-#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */
+#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!A FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX           ( 5*32+14) /*   Memory Protection Extensions */
 #define X86_FEATURE_CAT           ( 5*32+15) /*   Cache Allocation Technology */
-#define X86_FEATURE_RDSEED        ( 5*32+18) /*   RDSEED instruction */
-#define X86_FEATURE_ADX           ( 5*32+19) /*   ADCX, ADOX instructions */
-#define X86_FEATURE_SMAP          ( 5*32+20) /*   Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*   PCOMMIT instruction */
-#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*   CLFLUSHOPT instruction */
-#define X86_FEATURE_CLWB          ( 5*32+24) /*   CLWB instruction */
-#define X86_FEATURE_SHA           ( 5*32+29) /*   SHA1 & SHA256 instructions */
+#define X86_FEATURE_RDSEED        ( 5*32+18) /*A  RDSEED instruction */
+#define X86_FEATURE_ADX           ( 5*32+19) /*A  ADCX, ADOX instructions */
+#define X86_FEATURE_SMAP          ( 5*32+20) /*S  Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*A  PCOMMIT instruction */
+#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*A  CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB          ( 5*32+24) /*A  CLWB instruction */
+#define X86_FEATURE_SHA           ( 5*32+29) /*A  SHA1 & SHA256 instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */
-#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*   PREFETCHWT1 instruction */
-#define X86_FEATURE_PKU           ( 6*32+ 3) /*   Protection Keys for Userspace */
+#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*A  PREFETCHWT1 instruction */
+#define X86_FEATURE_PKU           ( 6*32+ 3) /*H  Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE         ( 6*32+ 4) /*   OS Protection Keys Enable */
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
@@ -195,7 +200,7 @@
 #define X86_FEATURE_EFRO          ( 7*32+10) /*   APERF/MPERF Read Only interface */
 
 /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
-#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*   CLZERO instruction */
+#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*A  CLZERO instruction */
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 #endif /* !__XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__ */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 9e0cc34..5f0f892 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -17,12 +17,18 @@ class State(object):
         # State parsed from input
         self.names = {} # Name => value mapping
         self.raw_inverted = []
+        self.raw_pv = []
+        self.raw_hvm_shadow = []
+        self.raw_hvm_hap = []
 
         # State calculated
         self.nr_entries = 0 # Number of words in a featureset
         self.common = 0 # Common features between 1d and e1d
         self.known = [] # All known features
         self.inverted = [] # Features with inverted representations
+        self.pv = []
+        self.hvm_shadow = []
+        self.hvm_hap = []
 
 def parse_definitions(state):
     """
@@ -32,7 +38,7 @@ def parse_definitions(state):
     feat_regex = re.compile(
         r"^#define X86_FEATURE_([A-Z0-9_]+)"
         "\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-        "\s+/\*([!]*) .*$")
+        "\s+/\*([\w!]*) .*$")
 
     this = sys.modules[__name__]
 
@@ -69,6 +75,18 @@ def parse_definitions(state):
             if "!" in attr:
                 state.raw_inverted.append(val)
 
+            if "A" in attr:
+                state.raw_pv.append(val)
+                state.raw_hvm_shadow.append(val)
+                state.raw_hvm_hap.append(val)
+            elif "S" in attr:
+                state.raw_hvm_shadow.append(val)
+                state.raw_hvm_hap.append(val)
+            elif "H" in attr:
+                state.raw_hvm_hap.append(val)
+            else:
+                raise Fail("Unrecognised attributes '%s' for %s" % (attr, name))
+
 
 def featureset_to_uint32s(fs, nr):
     """ Represent a featureset as a list of C-compatible uint32_t's """
@@ -116,6 +134,9 @@ def crunch_numbers(state):
 
     state.common = featureset_to_uint32s(common_1d, 1)[0]
     state.inverted = featureset_to_uint32s(state.raw_inverted, nr_entries)
+    state.pv = featureset_to_uint32s(state.raw_pv, nr_entries)
+    state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
+    state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
 
 def write_results(state):
@@ -137,10 +158,19 @@ def write_results(state):
 #define INIT_KNOWN_FEATURES { \\\n%s\n}
 
 #define INIT_INVERTED_FEATURES { \\\n%s\n}
+
+#define INIT_PV_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_HAP_FEATURES { \\\n%s\n}
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
        format_uint32s(state.inverted, 4),
+       format_uint32s(state.pv, 4),
+       format_uint32s(state.hvm_shadow, 4),
+       format_uint32s(state.hvm_hap, 4),
        ))
 
     state.output.write(
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread
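The attribute cascade this patch adds to gen-cpuid.py ('A' implies 'S' implies 'H' membership) can be sketched standalone. The regex is the one from the patch; `classify` is a hypothetical wrapper assuming the line matches, and the `eval` is confined to the simple arithmetic the regex admits, as in the real script:

```python
import re

# Matches e.g.:
# #define X86_FEATURE_PCID          ( 1*32+17) /*H  Process Context ID */
feat_regex = re.compile(
    r"^#define X86_FEATURE_([A-Z0-9_]+)"
    r"\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
    r"\s+/\*([\w!]*) .*$")

def classify(line):
    """Return (name, bit value, guest types) for one cpufeatureset.h line."""
    m = feat_regex.match(line)
    name, val, attr = m.group(1), eval(m.group(2)), m.group(3)
    if "A" in attr:                      # all guests
        types = {"pv", "hvm_shadow", "hvm_hap"}
    elif "S" in attr:                    # all HVM guests, not PV
        types = {"hvm_shadow", "hvm_hap"}
    elif "H" in attr:                    # HAP-only HVM guests
        types = {"hvm_hap"}
    else:
        types = set()                    # not exposed to any guest
    return name, val, types

line = "#define X86_FEATURE_PCID          ( 1*32+17) /*H  Process Context ID */"
print(classify(line))  # -> ('PCID', 49, {'hvm_hap'})
```

Each of the three resulting lists is then folded into uint32 words and emitted as INIT_PV_FEATURES, INIT_HVM_SHADOW_FEATURES and INIT_HVM_HAP_FEATURES for the hypervisor to use as masks.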

* [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (9 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-15 13:37   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Andrew Cooper
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

All of this information will be used by the toolstack to make informed
levelling decisions for VMs, and by Xen to sanity check toolstack-provided
information.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpuid.c        | 152 ++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c        |   3 +
 xen/include/asm-x86/cpuid.h |  17 +++++
 3 files changed, 172 insertions(+)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 30a3392..1af0e6c 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -1,13 +1,165 @@
 #include <xen/lib.h>
 #include <asm/cpuid.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <asm/processor.h>
+
+#define COMMON_1D INIT_COMMON_FEATURES
 
 const uint32_t known_features[] = INIT_KNOWN_FEATURES;
 const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
 
+static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
+static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
+static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+
+uint32_t __read_mostly raw_featureset[FSCAPINTS];
+uint32_t __read_mostly host_featureset[FSCAPINTS];
+uint32_t __read_mostly pv_featureset[FSCAPINTS];
+uint32_t __read_mostly hvm_featureset[FSCAPINTS];
+
+static void sanitise_featureset(uint32_t *fs)
+{
+    unsigned int i;
+
+    for ( i = 0; i < FSCAPINTS; ++i )
+    {
+        /* Clamp to known mask. */
+        fs[i] &= known_features[i];
+    }
+
+    switch ( boot_cpu_data.x86_vendor )
+    {
+    case X86_VENDOR_INTEL:
+        /* Intel clears the common bits in e1d. */
+        fs[FEATURESET_e1d] &= ~COMMON_1D;
+        break;
+
+    case X86_VENDOR_AMD:
+        /* AMD duplicates the common bits between 1d and e1d. */
+        fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  COMMON_1D) |
+                              (fs[FEATURESET_e1d] & ~COMMON_1D));
+        break;
+    }
+}
+
+static void calculate_raw_featureset(void)
+{
+    unsigned int i, max, tmp;
+
+    max = cpuid_eax(0);
+
+    if ( max >= 1 )
+        cpuid(0x1, &tmp, &tmp,
+              &raw_featureset[FEATURESET_1c],
+              &raw_featureset[FEATURESET_1d]);
+    if ( max >= 7 )
+        cpuid_count(0x7, 0, &tmp,
+                    &raw_featureset[FEATURESET_7b0],
+                    &raw_featureset[FEATURESET_7c0],
+                    &tmp);
+    if ( max >= 0xd )
+        cpuid_count(0xd, 1,
+                    &raw_featureset[FEATURESET_Da1],
+                    &tmp, &tmp, &tmp);
+
+    max = cpuid_eax(0x80000000);
+    if ( max >= 0x80000001 )
+        cpuid(0x80000001, &tmp, &tmp,
+              &raw_featureset[FEATURESET_e1c],
+              &raw_featureset[FEATURESET_e1d]);
+    if ( max >= 0x80000007 )
+        cpuid(0x80000007, &tmp, &tmp, &tmp,
+              &raw_featureset[FEATURESET_e7d]);
+    if ( max >= 0x80000008 )
+        cpuid(0x80000008, &tmp,
+              &raw_featureset[FEATURESET_e8b],
+              &tmp, &tmp);
+
+    for ( i = 0; i < ARRAY_SIZE(raw_featureset); ++i )
+        raw_featureset[i] ^= inverted_features[i];
+}
+
+static void calculate_host_featureset(void)
+{
+    memcpy(host_featureset, boot_cpu_data.x86_capability,
+           sizeof(host_featureset));
+}
+
+static void calculate_pv_featureset(void)
+{
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(pv_featureset); ++i )
+        pv_featureset[i] = host_featureset[i] & pv_featuremask[i];
+
+    /* Unconditionally claim to be able to set the hypervisor bit. */
+    __set_bit(X86_FEATURE_HYPERVISOR, pv_featureset);
+
+    sanitise_featureset(pv_featureset);
+}
+
+static void calculate_hvm_featureset(void)
+{
+    unsigned int i;
+    const uint32_t *hvm_featuremask;
+
+    if ( !hvm_enabled )
+        return;
+
+    hvm_featuremask = hvm_funcs.hap_supported ?
+        hvm_hap_featuremask : hvm_shadow_featuremask;
+
+    for ( i = 0; i < ARRAY_SIZE(hvm_featureset); ++i )
+        hvm_featureset[i] = host_featureset[i] & hvm_featuremask[i];
+
+    /* Unconditionally claim to be able to set the hypervisor bit. */
+    __set_bit(X86_FEATURE_HYPERVISOR, hvm_featureset);
+
+    /*
+     * On AMD, PV guests are entirely unable to use 'sysenter' as Xen runs in
+     * long mode (and init_amd() has cleared it out of host capabilities), but
+     * HVM guests can use it when running in protected mode.
+     */
+    if ( (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+         test_bit(X86_FEATURE_SEP, raw_featureset) )
+        __set_bit(X86_FEATURE_SEP, hvm_featureset);
+
+    /*
+     * With VT-x, some features are only supported by Xen if dedicated
+     * hardware support is also available.
+     */
+    if ( cpu_has_vmx )
+    {
+        if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
+             !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
+            __clear_bit(X86_FEATURE_MPX, hvm_featureset);
+
+        if ( !cpu_has_vmx_xsaves )
+            __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
+
+        if ( !cpu_has_vmx_pcommit )
+            __clear_bit(X86_FEATURE_PCOMMIT, hvm_featureset);
+    }
+
+    sanitise_featureset(hvm_featureset);
+}
+
+void calculate_featuresets(void)
+{
+    calculate_raw_featureset();
+    calculate_host_featureset();
+    calculate_pv_featureset();
+    calculate_hvm_featureset();
+}
+
 static void __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(inverted_features) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 76c7b0f..50e4e51 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -50,6 +50,7 @@
 #include <asm/nmi.h>
 #include <asm/alternative.h>
 #include <asm/mc146818rtc.h>
+#include <asm/cpuid.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1437,6 +1438,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                "Multiple initrd candidates, picking module #%u\n",
                initrdidx);
 
+    calculate_featuresets();
+
     /*
      * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
      * This saves a large number of corner cases interactions with
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 341dbc1..18ba95b 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -5,12 +5,29 @@
 
 #define FSCAPINTS FEATURESET_NR_ENTRIES
 
+#define FEATURESET_1d     0 /* 0x00000001.edx      */
+#define FEATURESET_1c     1 /* 0x00000001.ecx      */
+#define FEATURESET_e1d    2 /* 0x80000001.edx      */
+#define FEATURESET_e1c    3 /* 0x80000001.ecx      */
+#define FEATURESET_Da1    4 /* 0x0000000d:1.eax    */
+#define FEATURESET_7b0    5 /* 0x00000007:0.ebx    */
+#define FEATURESET_7c0    6 /* 0x00000007:0.ecx    */
+#define FEATURESET_e7d    7 /* 0x80000007.edx      */
+#define FEATURESET_e8b    8 /* 0x80000008.ebx      */
+
 #ifndef __ASSEMBLY__
 #include <xen/types.h>
 
 extern const uint32_t known_features[FSCAPINTS];
 extern const uint32_t inverted_features[FSCAPINTS];
 
+extern uint32_t raw_featureset[FSCAPINTS];
+extern uint32_t host_featureset[FSCAPINTS];
+extern uint32_t pv_featureset[FSCAPINTS];
+extern uint32_t hvm_featureset[FSCAPINTS];
+
+void calculate_featuresets(void);
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (10 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-15 14:06   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Some features depend on other features.  Working out and maintaining the exact
dependency tree is complicated, so it is expressed in the automatic generation
script, and flattened for faster runtime use.
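
The flattening can be sketched in a few lines of Python. This is an illustrative model, not the in-tree script: the feature names and bit numbers below are made up, but the breadth-first walk with cycle rejection mirrors the approach described above.

```python
# Toy feature-bit constants (illustrative only).
XSAVE, AVX, AVX2, FMA, XSAVEOPT = range(5)

# Direct dependencies: disabling the key must disable the listed features.
deps = {
    XSAVE: (XSAVEOPT, AVX),
    AVX: (AVX2, FMA),
}

def flatten(deps):
    """Compute, for each feature, every transitively dependent feature,
    breadth-first, raising on a dependency cycle."""
    deep = {}
    for feat in deps:
        seen = [feat]
        to_process = list(deps[feat])
        while to_process:
            f = to_process.pop(0)
            if f in seen:
                raise ValueError("cycle found at feature %d" % f)
            seen.append(f)
            to_process.extend(deps.get(f, ()))
        deep[feat] = seen[1:]  # everything reachable, excluding the root
    return deep

flat = flatten(deps)
# Disabling XSAVE must therefore also disable XSAVEOPT, AVX, AVX2 and FMA.
```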

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

For all intents and purposes, new in v2.
---
 xen/arch/x86/cpuid.c        | 54 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpuid.h |  2 ++
 xen/tools/gen-cpuid.py      | 73 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 1af0e6c..25dcd0e 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -12,6 +12,7 @@ const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
 static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
 static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
 static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
 
 uint32_t __read_mostly raw_featureset[FSCAPINTS];
 uint32_t __read_mostly host_featureset[FSCAPINTS];
@@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
 
 static void sanitise_featureset(uint32_t *fs)
 {
+    uint32_t disabled_features[FSCAPINTS];
     unsigned int i;
 
     for ( i = 0; i < FSCAPINTS; ++i )
     {
         /* Clamp to known mask. */
         fs[i] &= known_features[i];
+
+        /*
+         * Identify which features with deep dependencies have been
+         * disabled.
+         */
+        disabled_features[i] = ~fs[i] & deep_features[i];
+    }
+
+    for_each_set_bit(i, (void *)disabled_features,
+                     sizeof(disabled_features) * 8)
+    {
+        const uint32_t *dfs = lookup_deep_deps(i);
+        unsigned int j;
+
+        ASSERT(dfs); /* deep_features[] should guarantee this. */
+
+        for ( j = 0; j < FSCAPINTS; ++j )
+        {
+            fs[j] &= ~dfs[j];
+            disabled_features[j] &= ~dfs[j];
+        }
     }
 
     switch ( boot_cpu_data.x86_vendor )
@@ -153,6 +176,36 @@ void calculate_featuresets(void)
     calculate_hvm_featureset();
 }
 
+const uint32_t *lookup_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FSCAPINTS];
+    } deep_deps[] = INIT_DEEP_DEPS;
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* Fast early exit. */
+    if ( !test_bit(feature, deep_features) )
+        return NULL;
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 static void __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
@@ -160,6 +213,7 @@ static void __maybe_unused build_assertions(void)
     BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 18ba95b..cd7fa90 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -28,6 +28,8 @@ extern uint32_t hvm_featureset[FSCAPINTS];
 
 void calculate_featuresets(void);
 
+const uint32_t *lookup_deep_deps(uint32_t feature);
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 5f0f892..c44f124 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -138,6 +138,61 @@ def crunch_numbers(state):
     state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
     state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
+    deps = {
+        XSAVE:
+        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
+
+        AVX:
+        (FMA, FMA4, F16C, AVX2, XOP),
+
+        PAE:
+        (LM, ),
+
+        LM:
+        (CX16, LAHF_LM, PAGE1GB),
+
+        XMM:
+        (LM, ),
+
+        XMM2:
+        (LM, ),
+
+        XMM3:
+        (LM, ),
+
+        APIC:
+        (X2APIC, ),
+
+        PSE:
+        (PSE36, ),
+    }
+
+    deep_features = tuple(sorted(deps.keys()))
+    state.deep_deps = {}
+
+    for feat in deep_features:
+
+        seen = [feat]
+        to_process = list(deps[feat])
+
+        while len(to_process):
+            f = to_process.pop(0)
+
+            if f in seen:
+                raise Fail("ERROR: Cycle found with %s when processing %s"
+                           % (state.names[f], state.names[feat]))
+
+            seen.append(f)
+            to_process.extend(deps.get(f, []))
+
+        state.deep_deps[feat] = seen[1:]
+
+    state.deep_features = featureset_to_uint32s(deps.keys(), nr_entries)
+    state.nr_deep_deps = len(state.deep_deps.keys())
+
+    for k, v in state.deep_deps.iteritems():
+        state.deep_deps[k] = featureset_to_uint32s(v, nr_entries)
+
 
 def write_results(state):
     state.output.write(
@@ -164,6 +219,12 @@ def write_results(state):
 #define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
 
 #define INIT_HVM_HAP_FEATURES { \\\n%s\n}
+
+#define NR_DEEP_DEPS %s
+
+#define INIT_DEEP_FEATURES { \\\n%s\n}
+
+#define INIT_DEEP_DEPS { \\
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
@@ -171,10 +232,20 @@ def write_results(state):
        format_uint32s(state.pv, 4),
        format_uint32s(state.hvm_shadow, 4),
        format_uint32s(state.hvm_hap, 4),
+       state.nr_deep_deps,
+       format_uint32s(state.deep_features, 4),
        ))
 
+    for dep in sorted(state.deep_deps.keys()):
+        state.output.write(
+            "    { %#xU, /* %s */ { \\\n%s\n    }, }, \\\n"
+            % (dep, state.names[dep],
+               format_uint32s(state.deep_deps[dep], 8)
+           ))
+
     state.output.write(
-"""
+"""}
+
 #endif /* __XEN_X86__FEATURESET_DATA__ */
 """)
 
-- 
2.1.4


* [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (11 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-15 14:53   ` Jan Beulich
  2016-02-15 14:56   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 14/30] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
                   ` (17 subsequent siblings)
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

When clearing a cpu cap, clear all dependent features.  This avoids having a
featureset with intermediate features disabled, but leaf features enabled.
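
The word-wise clearing this patch performs can be modelled as follows. This is a sketch with two 32-bit feature words and made-up dependency masks, not Xen's real data: clearing a capability sets its own bit in the cleared set, then ORs in the flattened dependency mask for every word.

```python
FSCAPINTS = 2  # number of 32-bit feature words in this toy model

# Flattened deep-dependency masks (illustrative values): clearing feature
# bit 3 must also clear bits 3-4 of word 0 and bit 0 of word 1.
deep_deps = {
    3: [0x00000018, 0x00000001],
}

caps = [0xffffffff, 0xffffffff]   # boot_cpu_data.x86_capability analogue
cleared = [0, 0]                  # cleared_caps analogue

def clear_cap(cap):
    word, bit = divmod(cap, 32)
    caps[word] &= ~(1 << bit)
    cleared[word] |= 1 << bit
    # Clear every dependent feature in one pass over the words.
    for i, mask in enumerate(deep_deps.get(cap, [0] * FSCAPINTS)):
        caps[i] &= ~mask
        cleared[i] |= mask

clear_cap(3)
```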

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpu/common.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 39c340b..e205565 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -53,8 +53,24 @@ static unsigned int cleared_caps[NCAPINTS];
 
 void __init setup_clear_cpu_cap(unsigned int cap)
 {
+	const uint32_t *dfs;
+	unsigned int i;
+
+	if ( test_bit(cap, cleared_caps) )
+		return;
+
 	__clear_bit(cap, boot_cpu_data.x86_capability);
 	__set_bit(cap, cleared_caps);
+
+	dfs = lookup_deep_deps(cap);
+
+	if ( !dfs )
+		return;
+
+	for ( i = 0; i < FSCAPINTS; ++i ) {
+		cleared_caps[i] |= dfs[i];
+		boot_cpu_data.x86_capability[i] &= ~dfs[i];
+	}
 }
 
 static void default_init(struct cpuinfo_x86 * c)
-- 
2.1.4


* [PATCH v2 14/30] xen/x86: Improve disabling of features which have dependencies
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (12 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 13:42 ` [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

APIC and XSAVE have dependent features, which also need disabling if Xen
chooses to disable a feature.

Use setup_clear_cpu_cap() rather than clear_bit(), as it takes care of
dependent features as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2: Move boolean_param() adjacent to use_xsave in xstate_init()
---
 xen/arch/x86/apic.c       |  2 +-
 xen/arch/x86/cpu/common.c | 12 +++---------
 xen/arch/x86/xstate.c     |  6 +++++-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index b9601ad..8df5bd3 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1349,7 +1349,7 @@ void pmu_apic_interrupt(struct cpu_user_regs *regs)
 int __init APIC_init_uniprocessor (void)
 {
     if (enable_local_apic < 0)
-        __clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+        setup_clear_cpu_cap(X86_FEATURE_APIC);
 
     if (!smp_found_config && !cpu_has_apic) {
         skip_ioapic_setup = 1;
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e205565..46d93a6 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -16,9 +16,6 @@
 
 #include "cpu.h"
 
-static bool_t use_xsave = 1;
-boolean_param("xsave", use_xsave);
-
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
@@ -343,12 +340,6 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
 
-        /* Initialize xsave/xrstor features */
-	if ( !use_xsave )
-		__clear_bit(X86_FEATURE_XSAVE, boot_cpu_data.x86_capability);
-
-	if ( cpu_has_xsave )
-		xstate_init(c);
 
    	if ( !opt_pku )
 		setup_clear_cpu_cap(X86_FEATURE_PKU);
@@ -374,6 +365,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
 
 	/* Now the feature flags better reflect actual CPU features! */
 
+	if ( cpu_has_xsave )
+		xstate_init(c);
+
 #ifdef NOISY_CAPS
 	printk(KERN_DEBUG "CPU: After all inits, caps:");
 	for (i = 0; i < NCAPINTS; i++)
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index c5d17ff..56b5df2 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -505,11 +505,15 @@ unsigned int xstate_ctxt_size(u64 xcr0)
 /* Collect the information of processor's extended state */
 void xstate_init(struct cpuinfo_x86 *c)
 {
+    static bool_t __initdata use_xsave = 1;
+    boolean_param("xsave", use_xsave);
+
     bool_t bsp = c == &boot_cpu_data;
     u32 eax, ebx, ecx, edx;
     u64 feature_mask;
 
-    if ( boot_cpu_data.cpuid_level < XSTATE_CPUID )
+    if ( (bsp && !use_xsave) ||
+         boot_cpu_data.cpuid_level < XSTATE_CPUID )
     {
         BUG_ON(!bsp);
         setup_clear_cpu_cap(X86_FEATURE_XSAVE);
-- 
2.1.4


* [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (13 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 14/30] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-15 15:43   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

* Use the boot-generated pv and hvm featureset to clamp the visible features,
  rather than picking and choosing individual features.  This subsumes the
  static feature manipulation.
* More use of compiler-visible &'s and |'s, rather than clear,set bit.
* Remove logic which hides PSE36 out of PAE mode.  This is not how real
  hardware behaves.
* Improve logic to set OSXSAVE.  The bit is cleared by virtue of not being
  valid in a featureset, and should be a strict fast-forward from %cr4.
  Provide a very big health warning for OSXSAVE for PV guests, which is
  non-architectural.
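
The "strict fast-forward from %cr4" mentioned above simply mirrors the guest's CR4.OSXSAVE bit (bit 18) into the CPUID.1:ECX.OSXSAVE bit (bit 27) at query time, rather than treating OSXSAVE as part of the featureset. A minimal model of that behaviour:

```python
X86_CR4_OSXSAVE = 1 << 18      # architectural CR4 bit position
CPUID_1_ECX_OSXSAVE = 1 << 27  # architectural CPUID.1:ECX bit position

def fix_up_osxsave(ecx, cr4):
    """OSXSAVE is never stored in a featureset; it is recomputed from
    the guest's CR4 on every CPUID query."""
    ecx &= ~CPUID_1_ECX_OSXSAVE
    if cr4 & X86_CR4_OSXSAVE:
        ecx |= CPUID_1_ECX_OSXSAVE
    return ecx
```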

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Reinstate some of the dynamic checks for now.  Future development work will
   instate a complete per-domain policy.
 * Fix OSXSAVE handling for PV guests.
---
 xen/arch/x86/hvm/hvm.c |  56 +++++++++---------
 xen/arch/x86/traps.c   | 151 ++++++++++++++++++++++++-------------------------
 2 files changed, 100 insertions(+), 107 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 35ec6c9..03b3868 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -71,6 +71,7 @@
 #include <public/memory.h>
 #include <public/vm_event.h>
 #include <public/arch-x86/cpuid.h>
+#include <asm/cpuid.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Fix up VLAPIC details. */
         *ebx &= 0x00FFFFFFu;
         *ebx |= (v->vcpu_id * 2) << 24;
+
+        *ecx &= hvm_featureset[FEATURESET_1c];
+        *edx &= hvm_featureset[FEATURESET_1d];
+
         if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
-            __clear_bit(X86_FEATURE_APIC & 31, edx);
+            *edx &= ~cpufeat_bit(X86_FEATURE_APIC);
 
         /* Fix up OSXSAVE. */
-        if ( cpu_has_xsave )
-            *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
-                     cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
+        if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE )
+            *ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
 
         /* Don't expose PCID to non-hap hvm. */
         if ( !hap_enabled(d) )
             *ecx &= ~cpufeat_mask(X86_FEATURE_PCID);
-
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
         break;
+
     case 0x7:
         if ( count == 0 )
         {
-            if ( !cpu_has_smep )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
-
-            if ( !cpu_has_smap )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
-
-            /* Don't expose MPX to hvm when VMX support is not available */
-            if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
-                 !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
+            *ebx &= hvm_featureset[FEATURESET_7b0];
+            *ecx &= hvm_featureset[FEATURESET_7c0];
 
             /* Don't expose INVPCID to non-hap hvm. */
             if ( !hap_enabled(d) )
                 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
-
-            /* Don't expose PCOMMIT to hvm when VMX support is not available */
-            if ( !cpu_has_vmx_pcommit )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
         }
-
         break;
+
     case 0xb:
         /* Fix the x2APIC identifier. */
         *edx = v->vcpu_id * 2;
         break;
+
     case 0xd:
         /* EBX value of main leaf 0 depends on enabled xsave features */
         if ( count == 0 && v->arch.xcr0 ) 
@@ -4677,9 +4667,12 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                     *ebx = _eax + _ebx;
             }
         }
+
         if ( count == 1 )
         {
-            if ( cpu_has_xsaves && cpu_has_vmx_xsaves )
+            *eax &= hvm_featureset[FEATURESET_Da1];
+
+            if ( *eax & cpufeat_mask(X86_FEATURE_XSAVES) )
             {
                 *ebx = XSTATE_AREA_MIN_SIZE;
                 if ( v->arch.xcr0 | v->arch.hvm_vcpu.msr_xss )
@@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         break;
 
     case 0x80000001:
+        *ecx &= hvm_featureset[FEATURESET_e1c];
+        *edx &= hvm_featureset[FEATURESET_e1d];
+
         /* We expose RDTSCP feature to guest only when
            tsc_mode == TSC_MODE_DEFAULT and host_tsc_is_safe() returns 1 */
         if ( d->arch.tsc_mode != TSC_MODE_DEFAULT ||
@@ -4702,12 +4698,10 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Hide 1GB-superpage feature if we can't emulate it. */
         if (!hvm_pse1gb_supported(d))
             *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
-        /* Hide data breakpoint extensions if the hardware has no support. */
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            *ecx &= ~cpufeat_mask(X86_FEATURE_DBEXT);
+        break;
+
+    case 0x80000007:
+        *edx &= hvm_featureset[FEATURESET_e7d];
         break;
 
     case 0x80000008:
@@ -4725,6 +4719,8 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         hvm_cpuid(0x80000001, NULL, NULL, NULL, &_edx);
         *eax = (*eax & ~0xffff00) | (_edx & cpufeat_mask(X86_FEATURE_LM)
                                      ? 0x3000 : 0x2000);
+
+        *ebx &= hvm_featureset[FEATURESET_e8b];
         break;
     }
 }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6a181bb..d0f836c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -73,6 +73,7 @@
 #include <asm/hpet.h>
 #include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
+#include <asm/cpuid.h>
 #include <xsm/xsm.h>
 
 /*
@@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
     else
         cpuid_count(leaf, subleaf, &a, &b, &c, &d);
 
-    if ( (leaf & 0x7fffffff) == 0x00000001 )
-    {
-        /* Modify Feature Information. */
-        if ( !cpu_has_apic )
-            __clear_bit(X86_FEATURE_APIC, &d);
-
-        if ( !is_pvh_domain(currd) )
-        {
-            __clear_bit(X86_FEATURE_PSE, &d);
-            __clear_bit(X86_FEATURE_PGE, &d);
-            __clear_bit(X86_FEATURE_PSE36, &d);
-            __clear_bit(X86_FEATURE_VME, &d);
-        }
-    }
-
     switch ( leaf )
     {
     case 0x00000001:
-        /* Modify Feature Information. */
-        if ( !cpu_has_sep )
-            __clear_bit(X86_FEATURE_SEP, &d);
-        __clear_bit(X86_FEATURE_DS, &d);
-        __clear_bit(X86_FEATURE_ACC, &d);
-        __clear_bit(X86_FEATURE_PBE, &d);
-        if ( is_pvh_domain(currd) )
-            __clear_bit(X86_FEATURE_MTRR, &d);
-
-        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
-        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
-        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
-        __clear_bit(X86_FEATURE_VMXE % 32, &c);
-        __clear_bit(X86_FEATURE_SMXE % 32, &c);
-        __clear_bit(X86_FEATURE_TM2 % 32, &c);
+        c &= pv_featureset[FEATURESET_1c];
+        d &= pv_featureset[FEATURESET_1d];
+
         if ( is_pv_32bit_domain(currd) )
-            __clear_bit(X86_FEATURE_CX16 % 32, &c);
-        __clear_bit(X86_FEATURE_XTPR % 32, &c);
-        __clear_bit(X86_FEATURE_PDCM % 32, &c);
-        __clear_bit(X86_FEATURE_PCID % 32, &c);
-        __clear_bit(X86_FEATURE_DCA % 32, &c);
-        if ( !cpu_has_xsave )
-        {
-            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
-            __clear_bit(X86_FEATURE_AVX % 32, &c);
-        }
-        if ( !cpu_has_apic )
-           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
-        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
+            c &= ~cpufeat_mask(X86_FEATURE_CX16);
+
+        /*
+         * !!! Warning - OSXSAVE handling for PV guests is non-architectural !!!
+         *
+         * Architecturally, the correct code here is simply:
+         *
+         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
+         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+         *
+         * However, because of bugs in Xen (before c/s bd19080b, Nov 2010, the
+         * XSAVE cpuid flag leaked into guests despite the feature not being
+         * available for use), buggy workarounds were introduced to Linux (c/s
+         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
+         * incorrectly leaked OSXSAVE into the guest.
+         *
+         * Furthermore, providing architectural OSXSAVE behaviour to many
+         * Linux PV guests triggered a further kernel bug when the fpu code
+         * observes that XSAVEOPT is available, assumes that xsave state had
+         * been set up for the task, and follows a wild pointer.
+         *
+         * Therefore, the leaking of Xen's OSXSAVE setting has become a
+         * de facto part of the PV ABI and can't reasonably be corrected.
+         *
+         * The following situations and logic now applies:
+         *
+         * - Hardware without CPUID faulting support and native CPUID:
+         *    There is nothing Xen can do here.  The host's XSAVE flag will
+         *    leak through and Xen's OSXSAVE choice will leak through.
+         *
+         *    In the case that the guest kernel has not set up OSXSAVE, only
+         *    SSE will be set in xcr0, and guest userspace can't do too much
+         *    damage itself.
+         *
+         * - Enlightened CPUID or CPUID faulting available:
+         *    Xen can fully control what is seen here.  Guest kernels need to
+         *    see the leaked OSXSAVE, but guest userspace is given
+         *    architectural behaviour, to reflect the guest kernel's
+         *    intentions.
+         */
+        if ( (is_pv_domain(currd) && guest_kernel_mode(curr, regs) &&
+              (this_cpu(cr4) & X86_CR4_OSXSAVE)) ||
+             (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
+            c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+        c |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
         break;
 
     case 0x00000007:
         if ( subleaf == 0 )
-            b &= (cpufeat_mask(X86_FEATURE_BMI1) |
-                  cpufeat_mask(X86_FEATURE_HLE)  |
-                  cpufeat_mask(X86_FEATURE_AVX2) |
-                  cpufeat_mask(X86_FEATURE_BMI2) |
-                  cpufeat_mask(X86_FEATURE_ERMS) |
-                  cpufeat_mask(X86_FEATURE_RTM)  |
-                  cpufeat_mask(X86_FEATURE_RDSEED)  |
-                  cpufeat_mask(X86_FEATURE_ADX)  |
-                  cpufeat_mask(X86_FEATURE_FSGSBASE));
+        {
+            b &= pv_featureset[FEATURESET_7b0];
+            c &= pv_featureset[FEATURESET_7c0];
+        }
         else
-            b = 0;
-        a = c = d = 0;
+            b = c = 0;
+        a = d = 0;
         break;
 
     case XSTATE_CPUID:
@@ -926,37 +928,32 @@ void pv_cpuid(struct cpu_user_regs *regs)
         }
 
         case 1:
-            a &= (boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] &
-                  ~cpufeat_mask(X86_FEATURE_XSAVES));
+            a &= pv_featureset[FEATURESET_Da1];
             b = c = d = 0;
             break;
         }
         break;
 
     case 0x80000001:
-        /* Modify Feature Information. */
+        c &= pv_featureset[FEATURESET_e1c];
+        d &= pv_featureset[FEATURESET_e1d];
+
         if ( is_pv_32bit_domain(currd) )
         {
-            __clear_bit(X86_FEATURE_LM % 32, &d);
-            __clear_bit(X86_FEATURE_LAHF_LM % 32, &c);
-        }
-        if ( is_pv_32bit_domain(currd) &&
-             boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            __clear_bit(X86_FEATURE_SYSCALL % 32, &d);
-        __clear_bit(X86_FEATURE_PAGE1GB % 32, &d);
-        __clear_bit(X86_FEATURE_RDTSCP % 32, &d);
-
-        __clear_bit(X86_FEATURE_SVM % 32, &c);
-        if ( !cpu_has_apic )
-           __clear_bit(X86_FEATURE_EXTAPIC % 32, &c);
-        __clear_bit(X86_FEATURE_OSVW % 32, &c);
-        __clear_bit(X86_FEATURE_IBS % 32, &c);
-        __clear_bit(X86_FEATURE_SKINIT % 32, &c);
-        __clear_bit(X86_FEATURE_WDT % 32, &c);
-        __clear_bit(X86_FEATURE_LWP % 32, &c);
-        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
-        __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
-        __clear_bit(X86_FEATURE_MWAITX % 32, &c);
+            d &= ~cpufeat_mask(X86_FEATURE_LM);
+            c &= ~cpufeat_mask(X86_FEATURE_LAHF_LM);
+
+            if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+                d &= ~cpufeat_mask(X86_FEATURE_SYSCALL);
+        }
+        break;
+
+    case 0x80000007:
+        d &= pv_featureset[FEATURESET_e7d];
+        break;
+
+    case 0x80000008:
+        b &= pv_featureset[FEATURESET_e8b];
         break;
 
     case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
-- 
2.1.4


* [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (14 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-16 14:10   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
                   ` (14 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
commandline-provided masks would take effect in Xen's view of the features.

As the masks got applied after the query for features, the redundant call to
generic_identify() would clobber the pre-masking feature information with the
post-masking information.

Move the set_cpuidmask() calls into c_early_init() so their effects take place
before the main query for features in generic_identify().

The cpuid_mask_* command line parameters now limit the entire system, a
feature XenServer was relying on for testing purposes.  Subsequent changes
will cause the mask MSRs to be context switched per-domain, removing the need
to use the command line parameters for heterogeneous levelling purposes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpu/amd.c   |  8 ++++++--
 xen/arch/x86/cpu/intel.c | 34 +++++++++++++++++-----------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index f9dc532..5908cba 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -407,6 +407,11 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                                                          c->cpu_core_id);
 }
 
+static void early_init_amd(struct cpuinfo_x86 *c)
+{
+	set_cpuidmask(c);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -595,14 +600,13 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
-	set_cpuidmask(c);
-
 	check_syscfg_dram_mod_en();
 }
 
 static const struct cpu_dev amd_cpu_dev = {
 	.c_vendor	= "AMD",
 	.c_ident 	= { "AuthenticAMD" },
+	.c_early_init	= early_init_amd,
 	.c_init		= init_amd,
 };
 
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index bdf89f6..ad22375 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -189,6 +189,23 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (boot_cpu_data.x86 == 0xF && boot_cpu_data.x86_model == 3 &&
 	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
 		paddr_bits = 36;
+
+	if (c == &boot_cpu_data && c->x86 == 6) {
+		if (probe_intel_cpuid_faulting())
+			__set_bit(X86_FEATURE_CPUID_FAULTING,
+				  c->x86_capability);
+	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
+		BUG_ON(!probe_intel_cpuid_faulting());
+		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
+	}
+
+	if (!cpu_has_cpuid_faulting)
+		set_cpuidmask(c);
+	else if ((c == &boot_cpu_data) &&
+		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
+		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
+		    opt_cpuid_mask_xsave_eax)))
+		printk("No CPUID feature masking support available\n");
 }
 
 /*
@@ -258,23 +275,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 		detect_ht(c);
 	}
 
-	if (c == &boot_cpu_data && c->x86 == 6) {
-		if (probe_intel_cpuid_faulting())
-			__set_bit(X86_FEATURE_CPUID_FAULTING,
-				  c->x86_capability);
-	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
-		BUG_ON(!probe_intel_cpuid_faulting());
-		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
-	}
-
-	if (!cpu_has_cpuid_faulting)
-		set_cpuidmask(c);
-	else if ((c == &boot_cpu_data) &&
-		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-		    opt_cpuid_mask_xsave_eax)))
-		printk("No CPUID feature masking support available\n");
-
 	/* Work around errata */
 	Intel_errata_workarounds(c);
 
-- 
2.1.4


* [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (15 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 16/30] x86/cpu: Move set_cpuidmask() calls into c_early_init() Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-16 14:15   ` Jan Beulich
  2016-02-17 19:06   ` Konrad Rzeszutek Wilk
  2016-02-05 13:42 ` [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
                   ` (13 subsequent siblings)
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

This change is purely scaffolding to reduce the complexity of the following
three patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2: s/cpumasks/cpuidmasks/
---
 xen/arch/x86/cpu/common.c        |  6 ++++++
 xen/include/asm-x86/cpufeature.h |  1 +
 xen/include/asm-x86/processor.h  | 28 ++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 46d93a6..3fdae96 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -36,6 +36,12 @@ integer_param("cpuid_mask_ext_ecx", opt_cpuid_mask_ext_ecx);
 unsigned int opt_cpuid_mask_ext_edx = ~0u;
 integer_param("cpuid_mask_ext_edx", opt_cpuid_mask_ext_edx);
 
+unsigned int __initdata expected_levelling_cap;
+unsigned int __read_mostly levelling_caps;
+
+DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
+struct cpuidmasks __read_mostly cpuidmask_defaults;
+
 const struct cpu_dev *__read_mostly cpu_devs[X86_VENDOR_NUM] = {};
 
 unsigned int paddr_bits __read_mostly = 36;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f228fa2..8ac6b56 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -95,6 +95,7 @@
 #define cpu_has_xsavec		boot_cpu_has(X86_FEATURE_XSAVEC)
 #define cpu_has_xgetbv1		boot_cpu_has(X86_FEATURE_XGETBV1)
 #define cpu_has_xsaves		boot_cpu_has(X86_FEATURE_XSAVES)
+#define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 271340e..09e82d8 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -574,6 +574,34 @@ void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(unsigned int cpu);
 
+#define LCAP_faulting (1U << 0)
+#define LCAP_1cd      (3U << 1)
+#define LCAP_e1cd     (3U << 3)
+#define LCAP_Da1      (1U << 5)
+#define LCAP_6c       (1U << 6)
+#define LCAP_7ab0     (3U << 7)
+
+/*
+ * Expected levelling capabilities (given cpuid vendor/family information),
+ * and levelling capabilities actually available (given MSR probing).
+ */
+extern unsigned int expected_levelling_cap, levelling_caps;
+
+struct cpuidmasks
+{
+    uint64_t _1cd;
+    uint64_t e1cd;
+    uint64_t Da1;
+    uint64_t _6c;
+    uint64_t _7ab0;
+};
+
+/* Per CPU shadows of masking MSR values, for lazy context switching. */
+DECLARE_PER_CPU(struct cpuidmasks, cpuidmasks);
+
+/* Default masking MSR values, calculated at boot. */
+extern struct cpuidmasks cpuidmask_defaults;
+
 enum get_cpu_vendor {
    gcv_host_early,
    gcv_host_late,
-- 
2.1.4


* [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (16 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-17  7:40   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
                   ` (12 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

This patch is best reviewed as its end result rather than as a diff, as it
rewrites almost all of the setup.

On the BSP, cpuid information is used to evaluate the potentially-available
set of masking MSRs, which are then unconditionally probed, filling in the
availability information and hardware defaults.

The command line parameters are then combined with the hardware defaults to
further restrict the Xen default masking level.  Each CPU is then context
switched into the default levelling state.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Provide extra information if opt_cpu_info
 * Extra comment indicating the expected use of amd_ctxt_switch_levelling()
---
 xen/arch/x86/cpu/amd.c | 267 +++++++++++++++++++++++++++++++------------------
 1 file changed, 170 insertions(+), 97 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 5908cba..1708dd9 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -80,6 +80,13 @@ static inline int wrmsr_amd_safe(unsigned int msr, unsigned int lo,
 	return err;
 }
 
+static void wrmsr_amd(unsigned int msr, uint64_t val)
+{
+	asm volatile("wrmsr" ::
+		     "c" (msr), "a" ((uint32_t)val),
+		     "d" (val >> 32), "D" (0x9c5a203a));
+}
+
 static const struct cpuidmask {
 	uint16_t fam;
 	char rev[2];
@@ -126,126 +133,189 @@ static const struct cpuidmask *__init noinline get_cpuidmask(const char *opt)
 }
 
 /*
+ * Sets caps in expected_levelling_cap, probes for the specified mask MSR, and
+ * sets caps in levelling_caps if it is found.  Processors prior to Fam 10h
+ * required a 32-bit password for masking MSRs.  Reads the default value into
+ * msr_val.
+ */
+static void __init __probe_mask_msr(unsigned int msr, uint64_t caps,
+                                    uint64_t *msr_val)
+{
+	unsigned int hi, lo;
+
+        expected_levelling_cap |= caps;
+
+	if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
+	    (wrmsr_amd_safe(msr, lo, hi) == 0))
+		levelling_caps |= caps;
+
+	*msr_val = ((uint64_t)hi << 32) | lo;
+}
+
+/*
+ * Probe for the existence of the expected masking MSRs.  They might easily
+ * not be available if Xen is running virtualised.
+ */
+static void __init noinline probe_masking_msrs(void)
+{
+	const struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	/*
+	 * First, work out which masking MSRs we should have, based on
+	 * revision and cpuid.
+	 */
+
+	/* Fam11 doesn't support masking at all. */
+	if (c->x86 == 0x11)
+		return;
+
+	__probe_mask_msr(MSR_K8_FEATURE_MASK, LCAP_1cd,
+			 &cpuidmask_defaults._1cd);
+	__probe_mask_msr(MSR_K8_EXT_FEATURE_MASK, LCAP_e1cd,
+			 &cpuidmask_defaults.e1cd);
+
+	if (c->cpuid_level >= 7)
+		__probe_mask_msr(MSR_AMD_L7S0_FEATURE_MASK, LCAP_7ab0,
+				 &cpuidmask_defaults._7ab0);
+
+	if (c->x86 == 0x15 && c->cpuid_level >= 6 && cpuid_ecx(6))
+		__probe_mask_msr(MSR_AMD_THRM_FEATURE_MASK, LCAP_6c,
+				 &cpuidmask_defaults._6c);
+
+	/*
+	 * Don't bother warning about a mismatch if virtualised.  These MSRs
+	 * are not architectural and almost never virtualised.
+	 */
+	if ((expected_levelling_cap == levelling_caps) ||
+	    cpu_has_hypervisor)
+		return;
+
+	printk(XENLOG_WARNING "Mismatch between expected (%#x) "
+	       "and real (%#x) levelling caps: missing %#x\n",
+	       expected_levelling_cap, levelling_caps,
+	       (expected_levelling_cap ^ levelling_caps) & levelling_caps);
+	printk(XENLOG_WARNING "Fam %#x, model %#x level %#x\n",
+	       c->x86, c->x86_model, c->cpuid_level);
+	printk(XENLOG_WARNING
+	       "If not running virtualised, please report a bug\n");
+}
+
+/*
+ * Context switch levelling state to the next domain.  A parameter of NULL is
+ * used to context switch to the default host state, and is used by the BSP/AP
+ * startup code.
+ */
+static void amd_ctxt_switch_levelling(const struct domain *nextd)
+{
+	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct cpuidmasks *masks = &cpuidmask_defaults;
+
+#define LAZY(cap, msr, field)						\
+	({								\
+		if (((levelling_caps & cap) == cap) &&			\
+		    (these_masks->field != masks->field))		\
+		{							\
+			wrmsr_amd(msr, masks->field);			\
+			these_masks->field = masks->field;		\
+		}							\
+	})
+
+	LAZY(LCAP_1cd,  MSR_K8_FEATURE_MASK,       _1cd);
+	LAZY(LCAP_e1cd, MSR_K8_EXT_FEATURE_MASK,   e1cd);
+	LAZY(LCAP_7ab0, MSR_AMD_L7S0_FEATURE_MASK, _7ab0);
+	LAZY(LCAP_6c,   MSR_AMD_THRM_FEATURE_MASK, _6c);
+
+#undef LAZY
+}
+
+/*
  * Mask the features and extended features returned by CPUID.  Parameters are
  * set from the boot line via two methods:
  *
  *   1) Specific processor revision string
  *   2) User-defined masks
  *
- * The processor revision string parameter has precedene.
+ * The user-defined masks take precedence.
  */
-static void set_cpuidmask(const struct cpuinfo_x86 *c)
+static void __init noinline amd_init_levelling(void)
 {
-	static unsigned int feat_ecx, feat_edx;
-	static unsigned int extfeat_ecx, extfeat_edx;
-	static unsigned int l7s0_eax, l7s0_ebx;
-	static unsigned int thermal_ecx;
-	static bool_t skip_feat, skip_extfeat;
-	static bool_t skip_l7s0_eax_ebx, skip_thermal_ecx;
-	static enum { not_parsed, no_mask, set_mask } status;
-	unsigned int eax, ebx, ecx, edx;
-
-	if (status == no_mask)
-		return;
+	const struct cpuidmask *m = NULL;
 
-	if (status == set_mask)
-		goto setmask;
+	probe_masking_msrs();
 
-	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
-	status = no_mask;
+	if (*opt_famrev != '\0') {
+		m = get_cpuidmask(opt_famrev);
 
-	/* Fam11 doesn't support masking at all. */
-	if (c->x86 == 0x11)
-		return;
+		if (!m)
+			printk("Invalid processor string: %s\n", opt_famrev);
+	}
 
-	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-	      opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-	      opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx &
-	      opt_cpuid_mask_thermal_ecx)) {
-		feat_ecx = opt_cpuid_mask_ecx;
-		feat_edx = opt_cpuid_mask_edx;
-		extfeat_ecx = opt_cpuid_mask_ext_ecx;
-		extfeat_edx = opt_cpuid_mask_ext_edx;
-		l7s0_eax = opt_cpuid_mask_l7s0_eax;
-		l7s0_ebx = opt_cpuid_mask_l7s0_ebx;
-		thermal_ecx = opt_cpuid_mask_thermal_ecx;
-	} else if (*opt_famrev == '\0') {
-		return;
-	} else {
-		const struct cpuidmask *m = get_cpuidmask(opt_famrev);
+	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
+		uint32_t ecx, edx, tmp;
 
-		if (!m) {
-			printk("Invalid processor string: %s\n", opt_famrev);
-			printk("CPUID will not be masked\n");
-			return;
+		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);
+
+		if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx)) {
+			ecx &= opt_cpuid_mask_ecx;
+			edx &= opt_cpuid_mask_edx;
+		} else if (m) {
+			ecx &= m->ecx;
+			edx &= m->edx;
 		}
-		feat_ecx = m->ecx;
-		feat_edx = m->edx;
-		extfeat_ecx = m->ext_ecx;
-		extfeat_edx = m->ext_edx;
+
+		cpuidmask_defaults._1cd &= ((uint64_t)ecx << 32) | edx;
 	}
 
-        /* Setting bits in the CPUID mask MSR that are not set in the
-         * unmasked CPUID response can cause those bits to be set in the
-         * masked response.  Avoid that by explicitly masking in software. */
-        feat_ecx &= cpuid_ecx(0x00000001);
-        feat_edx &= cpuid_edx(0x00000001);
-        extfeat_ecx &= cpuid_ecx(0x80000001);
-        extfeat_edx &= cpuid_edx(0x80000001);
+	if ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) {
+		uint32_t ecx, edx, tmp;
 
-	status = set_mask;
-	printk("Writing CPUID feature mask ECX:EDX -> %08Xh:%08Xh\n", 
-	       feat_ecx, feat_edx);
-	printk("Writing CPUID extended feature mask ECX:EDX -> %08Xh:%08Xh\n", 
-	       extfeat_ecx, extfeat_edx);
+		cpuid(0x80000001, &tmp, &tmp, &ecx, &edx);
 
-	if (c->cpuid_level >= 7)
-		cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
-	else
-		ebx = eax = 0;
-	if ((eax | ebx) && ~(l7s0_eax & l7s0_ebx)) {
-		if (l7s0_eax > eax)
-			l7s0_eax = eax;
-		l7s0_ebx &= ebx;
-		printk("Writing CPUID leaf 7 subleaf 0 feature mask EAX:EBX -> %08Xh:%08Xh\n",
-		       l7s0_eax, l7s0_ebx);
-	} else
-		skip_l7s0_eax_ebx = 1;
-
-	/* Only Fam15 has the respective MSR. */
-	ecx = c->x86 == 0x15 && c->cpuid_level >= 6 ? cpuid_ecx(6) : 0;
-	if (ecx && ~thermal_ecx) {
-		thermal_ecx &= ecx;
-		printk("Writing CPUID thermal/power feature mask ECX -> %08Xh\n",
-		       thermal_ecx);
-	} else
-		skip_thermal_ecx = 1;
-
- setmask:
-	/* AMD processors prior to family 10h required a 32-bit password */
-	if (!skip_feat &&
-	    wrmsr_amd_safe(MSR_K8_FEATURE_MASK, feat_edx, feat_ecx)) {
-		skip_feat = 1;
-		printk("Failed to set CPUID feature mask\n");
+		if (~(opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx)) {
+			ecx &= opt_cpuid_mask_ext_ecx;
+			edx &= opt_cpuid_mask_ext_edx;
+		} else if (m) {
+			ecx &= m->ext_ecx;
+			edx &= m->ext_edx;
+		}
+
+		cpuidmask_defaults.e1cd &= ((uint64_t)ecx << 32) | edx;
 	}
 
-	if (!skip_extfeat &&
-	    wrmsr_amd_safe(MSR_K8_EXT_FEATURE_MASK, extfeat_edx, extfeat_ecx)) {
-		skip_extfeat = 1;
-		printk("Failed to set CPUID extended feature mask\n");
+	if ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) {
+		uint32_t eax, ebx, tmp;
+
+		cpuid(0x00000007, &eax, &ebx, &tmp, &tmp);
+
+		if (~(opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx)) {
+			eax &= opt_cpuid_mask_l7s0_eax;
+			ebx &= opt_cpuid_mask_l7s0_ebx;
+		}
+
+		cpuidmask_defaults._7ab0 &= ((uint64_t)eax << 32) | ebx;
 	}
 
-	if (!skip_l7s0_eax_ebx &&
-	    wrmsr_amd_safe(MSR_AMD_L7S0_FEATURE_MASK, l7s0_ebx, l7s0_eax)) {
-		skip_l7s0_eax_ebx = 1;
-		printk("Failed to set CPUID leaf 7 subleaf 0 feature mask\n");
+	if ((levelling_caps & LCAP_6c) == LCAP_6c) {
+		uint32_t ecx = cpuid_ecx(6);
+
+		if (~opt_cpuid_mask_thermal_ecx)
+			ecx &= opt_cpuid_mask_thermal_ecx;
+
+		cpuidmask_defaults._6c &= (~0ULL << 32) | ecx;
 	}
 
-	if (!skip_thermal_ecx &&
-	    (rdmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, &eax, &edx) ||
-	     wrmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, thermal_ecx, edx))){
-		skip_thermal_ecx = 1;
-		printk("Failed to set CPUID thermal/power feature mask\n");
+	if (opt_cpu_info) {
+		printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
+		printk(XENLOG_INFO
+		       "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
+		       "e1c 0x%08x, 7a0 0x%08x, 7b0 0x%08x, 6c 0x%08x\n",
+		       (uint32_t)cpuidmask_defaults._1cd,
+		       (uint32_t)(cpuidmask_defaults._1cd >> 32),
+		       (uint32_t)cpuidmask_defaults.e1cd,
+		       (uint32_t)(cpuidmask_defaults.e1cd >> 32),
+		       (uint32_t)(cpuidmask_defaults._7ab0 >> 32),
+		       (uint32_t)cpuidmask_defaults._7ab0,
+		       (uint32_t)cpuidmask_defaults._6c);
 	}
 }
 
@@ -409,7 +479,10 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 
 static void early_init_amd(struct cpuinfo_x86 *c)
 {
-	set_cpuidmask(c);
+	if (c == &boot_cpu_data)
+		amd_init_levelling();
+
+	amd_ctxt_switch_levelling(NULL);
 }
 
 static void init_amd(struct cpuinfo_x86 *c)
-- 
2.1.4


* [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (17 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-17  7:57   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

This patch is best reviewed as its end result rather than as a diff, as it
rewrites almost all of the setup.

On the BSP, cpuid information is used to evaluate the potentially-available
set of masking MSRs, which are then unconditionally probed, filling in the
availability information and hardware defaults.  A side effect of this is that
probe_intel_cpuid_faulting() can move to being __init.

The command line parameters are then combined with the hardware defaults to
further restrict the Xen default masking level.  Each CPU is then context
switched into the default levelling state.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Style fixes
 * Provide extra information if opt_cpu_info
 * Extra comment indicating the expected use of intel_ctxt_switch_levelling()
---
 xen/arch/x86/cpu/intel.c | 242 +++++++++++++++++++++++++++++------------------
 1 file changed, 150 insertions(+), 92 deletions(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index ad22375..143f497 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -18,11 +18,18 @@
 
 #define select_idle_routine(x) ((void)0)
 
-static unsigned int probe_intel_cpuid_faulting(void)
+static bool_t __init probe_intel_cpuid_faulting(void)
 {
 	uint64_t x;
-	return !rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) &&
-		(x & MSR_PLATFORM_INFO_CPUID_FAULTING);
+
+	if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
+	     !(x & MSR_PLATFORM_INFO_CPUID_FAULTING) )
+		return 0;
+
+	expected_levelling_cap |= LCAP_faulting;
+	levelling_caps |=  LCAP_faulting;
+	__set_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability);
+	return 1;
 }
 
 static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
@@ -44,41 +51,46 @@ void set_cpuid_faulting(bool_t enable)
 }
 
 /*
- * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
- * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,
- * edx = 0xBFEBFBFF when executing CPUID.EAX = 1 normally. If you want to
- * 'rev down' to E8400, you can set these values in these Xen boot parameters.
+ * Set caps in expected_levelling_cap, probe a specific masking MSR, and set
+ * caps in levelling_caps if it is found, or clobber the MSR index if missing.
+ * If present, reads the default value into msr_val.
  */
-static void set_cpuidmask(const struct cpuinfo_x86 *c)
+static void __init __probe_mask_msr(unsigned int *msr, uint64_t caps,
+				    uint64_t *msr_val)
 {
-	static unsigned int msr_basic, msr_ext, msr_xsave;
-	static enum { not_parsed, no_mask, set_mask } status;
-	u64 msr_val;
+	uint64_t val;
 
-	if (status == no_mask)
-		return;
+	expected_levelling_cap |= caps;
 
-	if (status == set_mask)
-		goto setmask;
+	if (rdmsr_safe(*msr, val) || wrmsr_safe(*msr, val))
+		*msr = 0;
+	else
+	{
+		levelling_caps |= caps;
+		*msr_val = val;
+	}
+}
 
-	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
-	status = no_mask;
+/* Indices of the masking MSRs, or 0 if unavailable. */
+static unsigned int __read_mostly msr_basic, msr_ext, msr_xsave;
 
-	if (!~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-	       opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-	       opt_cpuid_mask_xsave_eax))
-		return;
+/*
+ * Probe for the existence of the expected masking MSRs.  They might easily
+ * not be available if Xen is running virtualised.
+ */
+static void __init probe_masking_msrs(void)
+{
+	const struct cpuinfo_x86 *c = &boot_cpu_data;
+	unsigned int exp_msr_basic = 0, exp_msr_ext = 0, exp_msr_xsave = 0;
 
 	/* Only family 6 supports this feature. */
-	if (c->x86 != 6) {
-		printk("No CPUID feature masking support available\n");
+	if (c->x86 != 6)
 		return;
-	}
 
 	switch (c->x86_model) {
 	case 0x17: /* Yorkfield, Wolfdale, Penryn, Harpertown(DP) */
 	case 0x1d: /* Dunnington(MP) */
-		msr_basic = MSR_INTEL_MASK_V1_CPUID1;
+		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V1_CPUID1;
 		break;
 
 	case 0x1a: /* Bloomfield, Nehalem-EP(Gainestown) */
@@ -88,71 +100,126 @@ static void set_cpuidmask(const struct cpuinfo_x86 *c)
 	case 0x2c: /* Gulftown, Westmere-EP */
 	case 0x2e: /* Nehalem-EX(Beckton) */
 	case 0x2f: /* Westmere-EX */
-		msr_basic = MSR_INTEL_MASK_V2_CPUID1;
-		msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
+		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V2_CPUID1;
+		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
 		break;
 
 	case 0x2a: /* SandyBridge */
 	case 0x2d: /* SandyBridge-E, SandyBridge-EN, SandyBridge-EP */
-		msr_basic = MSR_INTEL_MASK_V3_CPUID1;
-		msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
-		msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
+		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V3_CPUID1;
+		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
+		exp_msr_xsave = msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
 		break;
 	}
 
-	status = set_mask;
+	if (msr_basic)
+		__probe_mask_msr(&msr_basic, LCAP_1cd, &cpuidmask_defaults._1cd);
 
-	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx)) {
-		if (msr_basic)
-			printk("Writing CPUID feature mask ecx:edx -> %08x:%08x\n",
-			       opt_cpuid_mask_ecx, opt_cpuid_mask_edx);
-		else
-			printk("No CPUID feature mask available\n");
-	}
-	else
-		msr_basic = 0;
-
-	if (~(opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx)) {
-		if (msr_ext)
-			printk("Writing CPUID extended feature mask ecx:edx -> %08x:%08x\n",
-			       opt_cpuid_mask_ext_ecx, opt_cpuid_mask_ext_edx);
-		else
-			printk("No CPUID extended feature mask available\n");
-	}
-	else
-		msr_ext = 0;
-
-	if (~opt_cpuid_mask_xsave_eax) {
-		if (msr_xsave)
-			printk("Writing CPUID xsave feature mask eax -> %08x\n",
-			       opt_cpuid_mask_xsave_eax);
-		else
-			printk("No CPUID xsave feature mask available\n");
+	if (msr_ext)
+		__probe_mask_msr(&msr_ext, LCAP_e1cd, &cpuidmask_defaults.e1cd);
+
+	if (msr_xsave)
+		__probe_mask_msr(&msr_xsave, LCAP_Da1, &cpuidmask_defaults.Da1);
+
+	/*
+	 * Don't bother warning about a mismatch if virtualised.  These MSRs
+	 * are not architectural and almost never virtualised.
+	 */
+	if ((expected_levelling_cap == levelling_caps) ||
+	    cpu_has_hypervisor)
+		return;
+
+	printk(XENLOG_WARNING "Mismatch between expected (%#x) "
+	       "and real (%#x) levelling caps: missing %#x\n",
+	       expected_levelling_cap, levelling_caps,
+	       (expected_levelling_cap ^ levelling_caps) & levelling_caps);
+	printk(XENLOG_WARNING "Fam %#x, model %#x expected (%#x/%#x/%#x), "
+	       "got (%#x/%#x/%#x)\n", c->x86, c->x86_model,
+	       exp_msr_basic, exp_msr_ext, exp_msr_xsave,
+	       msr_basic, msr_ext, msr_xsave);
+	printk(XENLOG_WARNING
+	       "If not running virtualised, please report a bug\n");
+}
+
+/*
+ * Context switch levelling state to the next domain.  A parameter of NULL is
+ * used to context switch to the default host state, and is used by the BSP/AP
+ * startup code.
+ */
+static void intel_ctxt_switch_levelling(const struct domain *nextd)
+{
+	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct cpuidmasks *masks = &cpuidmask_defaults;
+
+#define LAZY(msr, field)						\
+	({								\
+		if (msr && (these_masks->field != masks->field))	\
+		{							\
+			wrmsrl(msr, masks->field);			\
+			these_masks->field = masks->field;		\
+		}							\
+	})
+
+	LAZY(msr_basic, _1cd);
+	LAZY(msr_ext,   e1cd);
+	LAZY(msr_xsave, Da1);
+
+#undef LAZY
+}
+
+/*
+ * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
+ * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,
+ * edx = 0xBFEBFBFF when executing CPUID.EAX = 1 normally. If you want to
+ * 'rev down' to E8400, you can set these values in these Xen boot parameters.
+ */
+static void __init noinline intel_init_levelling(void)
+{
+	if (!probe_intel_cpuid_faulting())
+		probe_masking_msrs();
+
+	if (msr_basic) {
+		uint32_t ecx, edx, tmp;
+
+		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);
+
+		ecx &= opt_cpuid_mask_ecx;
+		edx &= opt_cpuid_mask_edx;
+
+		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
 	}
-	else
-		msr_xsave = 0;
-
- setmask:
-	if (msr_basic &&
-	    wrmsr_safe(msr_basic,
-		       ((u64)opt_cpuid_mask_edx << 32) | opt_cpuid_mask_ecx)){
-		msr_basic = 0;
-		printk("Failed to set CPUID feature mask\n");
+
+	if (msr_ext) {
+		uint32_t ecx, edx, tmp;
+
+		cpuid(0x80000001, &tmp, &tmp, &ecx, &edx);
+
+		ecx &= opt_cpuid_mask_ext_ecx;
+		edx &= opt_cpuid_mask_ext_edx;
+
+		cpuidmask_defaults.e1cd &= ((u64)edx << 32) | ecx;
 	}
 
-	if (msr_ext &&
-	    wrmsr_safe(msr_ext,
-		       ((u64)opt_cpuid_mask_ext_edx << 32) | opt_cpuid_mask_ext_ecx)){
-		msr_ext = 0;
-		printk("Failed to set CPUID extended feature mask\n");
+	if (msr_xsave) {
+		uint32_t eax, tmp;
+
+		cpuid_count(0x0000000d, 1, &eax, &tmp, &tmp, &tmp);
+
+		eax &= opt_cpuid_mask_xsave_eax;
+
+		cpuidmask_defaults.Da1 &= (~0ULL << 32) | eax;
 	}
 
-	if (msr_xsave &&
-	    (rdmsr_safe(msr_xsave, msr_val) ||
-	     wrmsr_safe(msr_xsave,
-			(msr_val & (~0ULL << 32)) | opt_cpuid_mask_xsave_eax))){
-		msr_xsave = 0;
-		printk("Failed to set CPUID xsave feature mask\n");
+	if (opt_cpu_info) {
+		printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
+		printk(XENLOG_INFO
+		       "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
+		       "e1c 0x%08x, Da1 0x%08x\n",
+		       (uint32_t)(cpuidmask_defaults._1cd >> 32),
+		       (uint32_t)cpuidmask_defaults._1cd,
+		       (uint32_t)(cpuidmask_defaults.e1cd >> 32),
+		       (uint32_t)cpuidmask_defaults.e1cd,
+		       (uint32_t)cpuidmask_defaults.Da1);
 	}
 }
 
@@ -190,22 +257,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
 		paddr_bits = 36;
 
-	if (c == &boot_cpu_data && c->x86 == 6) {
-		if (probe_intel_cpuid_faulting())
-			__set_bit(X86_FEATURE_CPUID_FAULTING,
-				  c->x86_capability);
-	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
-		BUG_ON(!probe_intel_cpuid_faulting());
-		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
-	}
+	if (c == &boot_cpu_data)
+		intel_init_levelling();
+
+	if (test_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability))
+		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
 
-	if (!cpu_has_cpuid_faulting)
-		set_cpuidmask(c);
-	else if ((c == &boot_cpu_data) &&
-		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-		    opt_cpuid_mask_xsave_eax)))
-		printk("No CPUID feature masking support available\n");
+	intel_ctxt_switch_levelling(NULL);
 }
 
 /*
-- 
2.1.4


* [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch()
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (18 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-17  8:06   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

A single ctxt_switch_levelling() function pointer is provided
(defaulting to an empty nop), which is overridden in the appropriate
$VENDOR_init_levelling().

set_cpuid_faulting() is made private and included within
intel_ctxt_switch_levelling().

One functional change is that the faulting configuration is no longer special
cased for dom0.  There was never any need to do so, and the change causes dom0
to observe the same information through native and enlightened cpuid.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Style fixes
 * ASSERT() that faulting is available in set_cpuid_faulting()
---
 xen/arch/x86/cpu/amd.c          |  3 +++
 xen/arch/x86/cpu/common.c       |  7 +++++++
 xen/arch/x86/cpu/intel.c        | 20 +++++++++++++++-----
 xen/arch/x86/domain.c           |  4 +---
 xen/include/asm-x86/processor.h |  2 +-
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 1708dd9..9d162bc 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -317,6 +317,9 @@ static void __init noinline amd_init_levelling(void)
 		       (uint32_t)cpuidmask_defaults._7ab0,
 		       (uint32_t)cpuidmask_defaults._6c);
 	}
+
+	if (levelling_caps)
+		ctxt_switch_levelling = amd_ctxt_switch_levelling;
 }
 
 /*
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 3fdae96..dc2442b 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -90,6 +90,13 @@ static const struct cpu_dev default_cpu = {
 };
 static const struct cpu_dev *this_cpu = &default_cpu;
 
+static void default_ctxt_switch_levelling(const struct domain *nextd)
+{
+	/* Nop */
+}
+void (* __read_mostly ctxt_switch_levelling)(const struct domain *nextd) =
+	default_ctxt_switch_levelling;
+
 bool_t opt_cpu_info;
 boolean_param("cpuinfo", opt_cpu_info);
 
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 143f497..95d44dd 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -32,13 +32,15 @@ static bool_t __init probe_intel_cpuid_faulting(void)
 	return 1;
 }
 
-static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
-void set_cpuid_faulting(bool_t enable)
+static void set_cpuid_faulting(bool_t enable)
 {
+	static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
+	bool_t *this_enabled = &this_cpu(cpuid_faulting_enabled);
 	uint32_t hi, lo;
 
-	if (!cpu_has_cpuid_faulting ||
-	    this_cpu(cpuid_faulting_enabled) == enable )
+	ASSERT(cpu_has_cpuid_faulting);
+
+	if (*this_enabled == enable)
 		return;
 
 	rdmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
@@ -47,7 +49,7 @@ void set_cpuid_faulting(bool_t enable)
 		lo |= MSR_MISC_FEATURES_CPUID_FAULTING;
 	wrmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
 
-	this_cpu(cpuid_faulting_enabled) = enable;
+	*this_enabled = enable;
 }
 
 /*
@@ -151,6 +153,11 @@ static void intel_ctxt_switch_levelling(const struct domain *nextd)
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
 	const struct cpuidmasks *masks = &cpuidmask_defaults;
 
+	if (cpu_has_cpuid_faulting) {
+		set_cpuid_faulting(nextd && is_pv_domain(nextd));
+		return;
+	}
+
 #define LAZY(msr, field)						\
 	({								\
 		if (msr && (these_masks->field != masks->field))	\
@@ -221,6 +228,9 @@ static void __init noinline intel_init_levelling(void)
 		       (uint32_t)cpuidmask_defaults.e1cd,
 		       (uint32_t)cpuidmask_defaults.Da1);
 	}
+
+	if (levelling_caps)
+		ctxt_switch_levelling = intel_ctxt_switch_levelling;
 }
 
 static void early_init_intel(struct cpuinfo_x86 *c)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8f2c0b6..dbce90f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2079,9 +2079,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             load_segments(next);
         }
 
-        set_cpuid_faulting(is_pv_domain(nextd) &&
-                           !is_control_domain(nextd) &&
-                           !is_hardware_domain(nextd));
+        ctxt_switch_levelling(nextd);
     }
 
     context_saved(prev);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 09e82d8..12b6e25 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -210,7 +210,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 extern struct cpuinfo_x86 cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]
 
-extern void set_cpuid_faulting(bool_t enable);
+extern void (*ctxt_switch_levelling)(const struct domain *nextd);
 
 extern u64 host_pat;
 extern bool_t opt_cpu_info;
-- 
2.1.4

* [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (19 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-17  8:13   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

And use them in preference to cpuidmask_defaults on context switch.  HVM
domains must not be masked (to avoid interfering with cpuid calls within the
guest), so always lazily context switch to the host defaults.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * s/cpumasks/cpuidmasks/
 * Use structure assignment
 * Fix error path in arch_domain_create()
---
 xen/arch/x86/cpu/amd.c       |  4 +++-
 xen/arch/x86/cpu/intel.c     |  5 ++++-
 xen/arch/x86/domain.c        | 11 +++++++++++
 xen/include/asm-x86/domain.h |  2 ++
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 9d162bc..deb98ea 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -208,7 +208,9 @@ static void __init noinline probe_masking_msrs(void)
 static void amd_ctxt_switch_levelling(const struct domain *nextd)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
-	const struct cpuidmasks *masks = &cpuidmask_defaults;
+	const struct cpuidmasks *masks =
+            (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
+            ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
 #define LAZY(cap, msr, field)						\
 	({								\
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 95d44dd..b403af4 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -151,13 +151,16 @@ static void __init probe_masking_msrs(void)
 static void intel_ctxt_switch_levelling(const struct domain *nextd)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
-	const struct cpuidmasks *masks = &cpuidmask_defaults;
+	const struct cpuidmasks *masks;
 
 	if (cpu_has_cpuid_faulting) {
 		set_cpuid_faulting(nextd && is_pv_domain(nextd));
 		return;
 	}
 
+	masks = (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
+		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
+
 #define LAZY(msr, field)						\
 	({								\
 		if (msr && (these_masks->field != masks->field))	\
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbce90f..d7cd4d2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -574,6 +574,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
             goto fail;
         clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
 
+        d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
+        if ( !d->arch.pv_domain.cpuidmasks )
+            goto fail;
+        *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;
+
         rc = create_perdomain_mapping(d, GDT_LDT_VIRT_START,
                                       GDT_LDT_MBYTES << (20 - PAGE_SHIFT),
                                       NULL, NULL);
@@ -663,7 +668,10 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
         paging_final_teardown(d);
     free_perdomain_mappings(d);
     if ( is_pv_domain(d) )
+    {
+        xfree(d->arch.pv_domain.cpuidmasks);
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
+    }
     psr_domain_free(d);
     return rc;
 }
@@ -683,7 +691,10 @@ void arch_domain_destroy(struct domain *d)
 
     free_perdomain_mappings(d);
     if ( is_pv_domain(d) )
+    {
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
+        xfree(d->arch.pv_domain.cpuidmasks);
+    }
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4072e27..c464932 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -252,6 +252,8 @@ struct pv_domain
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
+
+    struct cpuidmasks *cpuidmasks;
 };
 
 struct monitor_write_data {
-- 
2.1.4

* [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (20 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-17  8:22   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This allows PV domains with different featuresets to observe different values
from a native cpuid instruction, on supporting hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2:
 * Use switch() rather than if/elseif chain
 * Clamp to static PV featuremask
---
 xen/arch/x86/domctl.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 55aecdc..f06bc02 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include <asm/xstate.h>
 #include <asm/debugger.h>
 #include <asm/psr.h>
+#include <asm/cpuid.h>
 
 static int gdbsx_guest_mem_io(domid_t domid, struct xen_domctl_gdbsx_memio *iop)
 {
@@ -87,6 +88,93 @@ static void update_domain_cpuid_info(struct domain *d,
         d->arch.x86_model = (ctl->eax >> 4) & 0xf;
         if ( d->arch.x86 >= 0x6 )
             d->arch.x86_model |= (ctl->eax >> 12) & 0xf0;
+
+        if ( is_pv_domain(d) )
+        {
+            uint64_t mask = cpuidmask_defaults._1cd;
+            uint32_t ecx = ctl->ecx & pv_featureset[FEATURESET_1c];
+            uint32_t edx = ctl->edx & pv_featureset[FEATURESET_1d];
+
+            switch ( boot_cpu_data.x86_vendor )
+            {
+            case X86_VENDOR_INTEL:
+                mask &= ((uint64_t)edx << 32) | ecx;
+                break;
+
+            case X86_VENDOR_AMD:
+                mask &= ((uint64_t)ecx << 32) | edx;
+                break;
+            }
+
+            d->arch.pv_domain.cpuidmasks->_1cd = mask;
+        }
+        break;
+
+    case 6:
+        if ( is_pv_domain(d) )
+        {
+            uint64_t mask = cpuidmask_defaults._6c;
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+                mask &= (~0ULL << 32) | ctl->ecx;
+
+            d->arch.pv_domain.cpuidmasks->_6c = mask;
+        }
+        break;
+
+    case 7:
+        if ( ctl->input[1] != 0 )
+            break;
+
+        if ( is_pv_domain(d) )
+        {
+            uint64_t mask = cpuidmask_defaults._7ab0;
+            uint32_t eax = ctl->eax;
+            uint32_t ebx = ctl->ebx & pv_featureset[FEATURESET_7b0];
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+                mask &= ((uint64_t)eax << 32) | ebx;
+
+            d->arch.pv_domain.cpuidmasks->_7ab0 = mask;
+        }
+        break;
+
+    case 0xd:
+        if ( ctl->input[1] != 1 )
+            break;
+
+        if ( is_pv_domain(d) )
+        {
+            uint64_t mask = cpuidmask_defaults.Da1;
+            uint32_t eax = ctl->eax & pv_featureset[FEATURESET_Da1];
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                mask &= (~0ULL << 32) | eax;
+
+            d->arch.pv_domain.cpuidmasks->Da1 = mask;
+        }
+        break;
+
+    case 0x80000001:
+        if ( is_pv_domain(d) )
+        {
+            uint64_t mask = cpuidmask_defaults.e1cd;
+            uint32_t ecx = ctl->ecx & pv_featureset[FEATURESET_e1c];
+            uint32_t edx = ctl->edx & pv_featureset[FEATURESET_e1d];
+
+            switch ( boot_cpu_data.x86_vendor )
+            {
+            case X86_VENDOR_INTEL:
+                mask &= ((uint64_t)edx << 32) | ecx;
+                break;
+
+            case X86_VENDOR_AMD:
+                mask &= ((uint64_t)ecx << 32) | edx;
+                break;
+            }
+
+            d->arch.pv_domain.cpuidmasks->e1cd = mask;
+        }
         break;
     }
 }
-- 
2.1.4

* [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (21 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
  2016-02-17  8:30   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
                   ` (7 subsequent siblings)
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Tim Deegan, Rob Hoes,
	Jan Beulich, David Scott

And provide stubs for toolstack use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Rob Hoes <Rob.Hoes@citrix.com>

v2:
 * Rebased to use libxencall
 * Improve hypercall documentation
---
 tools/libxc/include/xenctrl.h       |  3 ++
 tools/libxc/xc_cpuid_x86.c          | 27 +++++++++++++++
 tools/ocaml/libs/xc/xenctrl.ml      |  3 ++
 tools/ocaml/libs/xc/xenctrl.mli     |  4 +++
 tools/ocaml/libs/xc/xenctrl_stubs.c | 35 ++++++++++++++++++++
 xen/arch/x86/sysctl.c               | 66 +++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h         | 25 ++++++++++++++
 7 files changed, 163 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1a5f4ec..5a7500a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2571,6 +2571,9 @@ int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid,
 int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            uint32_t *cos_max, uint32_t *cbm_len,
                            bool *cdp_enabled);
+
+int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
+                          uint32_t *nr_features, uint32_t *featureset);
 #endif
 
 /* Compat shims */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..7b802da 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -33,6 +33,33 @@
 #define DEF_MAX_INTELEXT  0x80000008u
 #define DEF_MAX_AMDEXT    0x8000001cu
 
+int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
+                          uint32_t *nr_features, uint32_t *featureset)
+{
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(featureset,
+                             *nr_features * sizeof(*featureset),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int ret;
+
+    if ( xc_hypercall_bounce_pre(xch, featureset) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_get_cpu_featureset;
+    sysctl.u.cpu_featureset.index = index;
+    sysctl.u.cpu_featureset.nr_features = *nr_features;
+    set_xen_guest_handle(sysctl.u.cpu_featureset.features, featureset);
+
+    ret = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, featureset);
+
+    if ( !ret )
+        *nr_features = sysctl.u.cpu_featureset.nr_features;
+
+    return ret;
+}
+
 struct cpuid_domain_info
 {
     enum
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 58a53a1..75006e7 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -242,6 +242,9 @@ external version_changeset: handle -> string = "stub_xc_version_changeset"
 external version_capabilities: handle -> string =
   "stub_xc_version_capabilities"
 
+type featureset_index = Featureset_raw | Featureset_host | Featureset_pv | Featureset_hvm
+external get_cpu_featureset : handle -> featureset_index -> int64 array = "stub_xc_get_cpu_featureset"
+
 external watchdog : handle -> int -> int32 -> int
   = "stub_xc_watchdog"
 
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 16443df..720e4b2 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -147,6 +147,10 @@ external version_compile_info : handle -> compile_info
 external version_changeset : handle -> string = "stub_xc_version_changeset"
 external version_capabilities : handle -> string
   = "stub_xc_version_capabilities"
+
+type featureset_index = Featureset_raw | Featureset_host | Featureset_pv | Featureset_hvm
+external get_cpu_featureset : handle -> featureset_index -> int64 array = "stub_xc_get_cpu_featureset"
+
 type core_magic = Magic_hvm | Magic_pv
 type core_header = {
   xch_magic : core_magic;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 74928e9..e7adf37 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1214,6 +1214,41 @@ CAMLprim value stub_xc_domain_deassign_device(value xch, value domid, value desc
 	CAMLreturn(Val_unit);
 }
 
+CAMLprim value stub_xc_get_cpu_featureset(value xch, value idx)
+{
+	CAMLparam2(xch, idx);
+	CAMLlocal1(bitmap_val);
+
+	/* Safe, because of the global ocaml lock. */
+	static uint32_t fs_len;
+
+	if (fs_len == 0)
+	{
+		int ret = xc_get_cpu_featureset(_H(xch), 0, &fs_len, NULL);
+
+		if (ret || (fs_len == 0))
+			failwith_xc(_H(xch));
+	}
+
+	{
+		/* To/from hypervisor to retrieve actual featureset */
+		uint32_t fs[fs_len], len = fs_len;
+		unsigned int i;
+
+		int ret = xc_get_cpu_featureset(_H(xch), Int_val(idx), &len, fs);
+
+		if (ret)
+			failwith_xc(_H(xch));
+
+		bitmap_val = caml_alloc(len, 0);
+
+		for (i = 0; i < len; ++i)
+			Store_field(bitmap_val, i, caml_copy_int64(fs[i]));
+	}
+
+	CAMLreturn(bitmap_val);
+}
+
 CAMLprim value stub_xc_watchdog(value xch, value domid, value timeout)
 {
 	CAMLparam3(xch, domid, timeout);
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 58cbd70..837a307 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -30,6 +30,7 @@
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
 #include <asm/psr.h>
+#include <asm/cpuid.h>
 
 struct l3_cache_info {
     int ret;
@@ -190,6 +191,71 @@ long arch_do_sysctl(
         }
         break;
 
+    case XEN_SYSCTL_get_cpu_featureset:
+    {
+        const uint32_t *featureset;
+        unsigned int nr;
+
+        /* Request for maximum number of features? */
+        if ( guest_handle_is_null(sysctl->u.cpu_featureset.features) )
+        {
+            sysctl->u.cpu_featureset.nr_features = FSCAPINTS;
+            if ( __copy_field_to_guest(u_sysctl, sysctl,
+                                       u.cpu_featureset.nr_features) )
+                ret = -EFAULT;
+            break;
+        }
+
+        /* Clip the number of entries. */
+        nr = sysctl->u.cpu_featureset.nr_features;
+        if ( nr > FSCAPINTS )
+            nr = FSCAPINTS;
+
+        switch ( sysctl->u.cpu_featureset.index )
+        {
+        case XEN_SYSCTL_cpu_featureset_raw:
+            featureset = raw_featureset;
+            break;
+
+        case XEN_SYSCTL_cpu_featureset_host:
+            featureset = host_featureset;
+            break;
+
+        case XEN_SYSCTL_cpu_featureset_pv:
+            featureset = pv_featureset;
+            break;
+
+        case XEN_SYSCTL_cpu_featureset_hvm:
+            featureset = hvm_featureset;
+            break;
+
+        default:
+            featureset = NULL;
+            break;
+        }
+
+        /* Bad featureset index? */
+        if ( !ret && !featureset )
+            ret = -EINVAL;
+
+        /* Copy the requested featureset into place. */
+        if ( !ret && copy_to_guest(sysctl->u.cpu_featureset.features,
+                                   featureset, nr) )
+            ret = -EFAULT;
+
+        /* Inform the caller of how many features we wrote. */
+        sysctl->u.cpu_featureset.nr_features = nr;
+        if ( !ret && __copy_field_to_guest(u_sysctl, sysctl,
+                                           u.cpu_featureset.nr_features) )
+            ret = -EFAULT;
+
+        /* Inform the caller if there was more data to provide. */
+        if ( !ret && nr < FSCAPINTS )
+            ret = -ENOBUFS;
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..434cb2d 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,29 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_get_cpu_featureset (x86 specific)
+ *
+ * Return information about the maximum sets of features which can be offered
+ * to different types of guests.  This is all strictly information as found in
+ * `cpuid` feature leaves with no synthetic additions.
+ */
+struct xen_sysctl_cpu_featureset {
+#define XEN_SYSCTL_cpu_featureset_raw      0
+#define XEN_SYSCTL_cpu_featureset_host     1
+#define XEN_SYSCTL_cpu_featureset_pv       2
+#define XEN_SYSCTL_cpu_featureset_hvm      3
+    uint32_t index;       /* IN: Which featureset to query? */
+    uint32_t nr_features; /* IN/OUT: Number of entries in/written to
+                           * 'features', or the maximum number of features if
+                           * the guest handle is NULL.  NB. All featuresets
+                           * come from the same numberspace, so have the same
+                           * maximum length. */
+    XEN_GUEST_HANDLE_64(uint32) features; /* OUT: */
+};
+typedef struct xen_sysctl_cpu_featureset xen_sysctl_cpu_featureset_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpu_featureset_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +814,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_get_cpu_featureset            25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +840,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_cpu_featureset    cpu_featureset;
         uint8_t                             pad[128];
     } u;
 };
-- 
2.1.4

* [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (22 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
  2016-02-08 16:23   ` Tim Deegan
  2016-02-05 13:42 ` [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
                   ` (6 subsequent siblings)
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

The type of the pointer to a bitmap is not interesting; it does not affect the
representation of the block of bits being pointed to.

Make the libxc functions consistent with those in Xen, so they can work just
as well with 'unsigned int *' based bitmaps.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

New in v2
---
 tools/libxc/xc_bitops.h | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
index cd749f4..2a1710f 100644
--- a/tools/libxc/xc_bitops.h
+++ b/tools/libxc/xc_bitops.h
@@ -26,48 +26,53 @@ static inline unsigned long *bitmap_alloc(int nr_bits)
     return calloc(1, bitmap_size(nr_bits));
 }
 
-static inline void bitmap_set(unsigned long *addr, int nr_bits)
+static inline void bitmap_set(void *addr, int nr_bits)
 {
     memset(addr, 0xff, bitmap_size(nr_bits));
 }
 
-static inline void bitmap_clear(unsigned long *addr, int nr_bits)
+static inline void bitmap_clear(void *addr, int nr_bits)
 {
     memset(addr, 0, bitmap_size(nr_bits));
 }
 
-static inline int test_bit(int nr, unsigned long *addr)
+static inline int test_bit(int nr, const void *_addr)
 {
+    const unsigned long *addr = _addr;
     return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
 }
 
-static inline void clear_bit(int nr, unsigned long *addr)
+static inline void clear_bit(int nr, void *_addr)
 {
+    unsigned long *addr = _addr;
     BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
 }
 
-static inline void set_bit(int nr, unsigned long *addr)
+static inline void set_bit(int nr, void *_addr)
 {
+    unsigned long *addr = _addr;
     BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
 }
 
-static inline int test_and_clear_bit(int nr, unsigned long *addr)
+static inline int test_and_clear_bit(int nr, void *addr)
 {
     int oldbit = test_bit(nr, addr);
     clear_bit(nr, addr);
     return oldbit;
 }
 
-static inline int test_and_set_bit(int nr, unsigned long *addr)
+static inline int test_and_set_bit(int nr, void *addr)
 {
     int oldbit = test_bit(nr, addr);
     set_bit(nr, addr);
     return oldbit;
 }
 
-static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
+static inline void bitmap_or(void *_dst, const void *_other,
                              int nr_bits)
 {
+    unsigned long *dst = _dst;
+    const unsigned long *other = _other;
     int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
     for ( i = 0; i < nr_longs; ++i )
         dst[i] |= other[i];
-- 
2.1.4

* [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (23 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
  2016-02-05 13:42 ` [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

Rather than having a different local copy of some of the feature
definitions.

Modify the xc_cpuid_x86.c cpumask helpers to appropriately truncate the
new values.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/xc_cpufeature.h | 147 --------------------------------------------
 tools/libxc/xc_cpuid_x86.c  |   8 +--
 2 files changed, 4 insertions(+), 151 deletions(-)
 delete mode 100644 tools/libxc/xc_cpufeature.h

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
deleted file mode 100644
index ee53679..0000000
--- a/tools/libxc/xc_cpufeature.h
+++ /dev/null
@@ -1,147 +0,0 @@
-/*
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __LIBXC_CPUFEATURE_H
-#define __LIBXC_CPUFEATURE_H
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (edx) */
-#define X86_FEATURE_FPU          0 /* Onboard FPU */
-#define X86_FEATURE_VME          1 /* Virtual Mode Extensions */
-#define X86_FEATURE_DE           2 /* Debugging Extensions */
-#define X86_FEATURE_PSE          3 /* Page Size Extensions */
-#define X86_FEATURE_TSC          4 /* Time Stamp Counter */
-#define X86_FEATURE_MSR          5 /* Model-Specific Registers, RDMSR, WRMSR */
-#define X86_FEATURE_PAE          6 /* Physical Address Extensions */
-#define X86_FEATURE_MCE          7 /* Machine Check Architecture */
-#define X86_FEATURE_CX8          8 /* CMPXCHG8 instruction */
-#define X86_FEATURE_APIC         9 /* Onboard APIC */
-#define X86_FEATURE_SEP         11 /* SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR        12 /* Memory Type Range Registers */
-#define X86_FEATURE_PGE         13 /* Page Global Enable */
-#define X86_FEATURE_MCA         14 /* Machine Check Architecture */
-#define X86_FEATURE_CMOV        15 /* CMOV instruction */
-#define X86_FEATURE_PAT         16 /* Page Attribute Table */
-#define X86_FEATURE_PSE36       17 /* 36-bit PSEs */
-#define X86_FEATURE_PN          18 /* Processor serial number */
-#define X86_FEATURE_CLFLSH      19 /* Supports the CLFLUSH instruction */
-#define X86_FEATURE_DS          21 /* Debug Store */
-#define X86_FEATURE_ACPI        22 /* ACPI via MSR */
-#define X86_FEATURE_MMX         23 /* Multimedia Extensions */
-#define X86_FEATURE_FXSR        24 /* FXSAVE and FXRSTOR instructions */
-#define X86_FEATURE_XMM         25 /* Streaming SIMD Extensions */
-#define X86_FEATURE_XMM2        26 /* Streaming SIMD Extensions-2 */
-#define X86_FEATURE_SELFSNOOP   27 /* CPU self snoop */
-#define X86_FEATURE_HT          28 /* Hyper-Threading */
-#define X86_FEATURE_ACC         29 /* Automatic clock control */
-#define X86_FEATURE_IA64        30 /* IA-64 processor */
-#define X86_FEATURE_PBE         31 /* Pending Break Enable */
-
-/* AMD-defined CPU features, CPUID level 0x80000001 */
-/* Don't duplicate feature flags which are redundant with Intel! */
-#define X86_FEATURE_SYSCALL     11 /* SYSCALL/SYSRET */
-#define X86_FEATURE_MP          19 /* MP Capable. */
-#define X86_FEATURE_NX          20 /* Execute Disable */
-#define X86_FEATURE_MMXEXT      22 /* AMD MMX extensions */
-#define X86_FEATURE_FFXSR       25 /* FFXSR instruction optimizations */
-#define X86_FEATURE_PAGE1GB     26 /* 1Gb large page support */
-#define X86_FEATURE_RDTSCP      27 /* RDTSCP */
-#define X86_FEATURE_LM          29 /* Long Mode (x86-64) */
-#define X86_FEATURE_3DNOWEXT    30 /* AMD 3DNow! extensions */
-#define X86_FEATURE_3DNOW       31 /* 3DNow! */
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (ecx) */
-#define X86_FEATURE_XMM3         0 /* Streaming SIMD Extensions-3 */
-#define X86_FEATURE_PCLMULQDQ    1 /* Carry-less multiplication */
-#define X86_FEATURE_DTES64       2 /* 64-bit Debug Store */
-#define X86_FEATURE_MWAIT        3 /* Monitor/Mwait support */
-#define X86_FEATURE_DSCPL        4 /* CPL Qualified Debug Store */
-#define X86_FEATURE_VMXE         5 /* Virtual Machine Extensions */
-#define X86_FEATURE_SMXE         6 /* Safer Mode Extensions */
-#define X86_FEATURE_EST          7 /* Enhanced SpeedStep */
-#define X86_FEATURE_TM2          8 /* Thermal Monitor 2 */
-#define X86_FEATURE_SSSE3        9 /* Supplemental Streaming SIMD Exts-3 */
-#define X86_FEATURE_CID         10 /* Context ID */
-#define X86_FEATURE_FMA         12 /* Fused Multiply Add */
-#define X86_FEATURE_CX16        13 /* CMPXCHG16B */
-#define X86_FEATURE_XTPR        14 /* Send Task Priority Messages */
-#define X86_FEATURE_PDCM        15 /* Perf/Debug Capability MSR */
-#define X86_FEATURE_PCID        17 /* Process Context ID */
-#define X86_FEATURE_DCA         18 /* Direct Cache Access */
-#define X86_FEATURE_SSE4_1      19 /* Streaming SIMD Extensions 4.1 */
-#define X86_FEATURE_SSE4_2      20 /* Streaming SIMD Extensions 4.2 */
-#define X86_FEATURE_X2APIC      21 /* x2APIC */
-#define X86_FEATURE_MOVBE       22 /* movbe instruction */
-#define X86_FEATURE_POPCNT      23 /* POPCNT instruction */
-#define X86_FEATURE_TSC_DEADLINE 24 /* "tdt" TSC Deadline Timer */
-#define X86_FEATURE_AES         25 /* AES acceleration instructions */
-#define X86_FEATURE_XSAVE       26 /* XSAVE/XRSTOR/XSETBV/XGETBV */
-#define X86_FEATURE_AVX         28 /* Advanced Vector Extensions */
-#define X86_FEATURE_F16C        29 /* Half-precision convert instruction */
-#define X86_FEATURE_RDRAND      30 /* Digital Random Number Generator */
-#define X86_FEATURE_HYPERVISOR  31 /* Running under some hypervisor */
-
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001 */
-#define X86_FEATURE_XSTORE       2 /* on-CPU RNG present (xstore insn) */
-#define X86_FEATURE_XSTORE_EN    3 /* on-CPU RNG enabled */
-#define X86_FEATURE_XCRYPT       6 /* on-CPU crypto (xcrypt insn) */
-#define X86_FEATURE_XCRYPT_EN    7 /* on-CPU crypto enabled */
-#define X86_FEATURE_ACE2         8 /* Advanced Cryptography Engine v2 */
-#define X86_FEATURE_ACE2_EN      9 /* ACE v2 enabled */
-#define X86_FEATURE_PHE         10 /* PadLock Hash Engine */
-#define X86_FEATURE_PHE_EN      11 /* PHE enabled */
-#define X86_FEATURE_PMM         12 /* PadLock Montgomery Multiplier */
-#define X86_FEATURE_PMM_EN      13 /* PMM enabled */
-
-/* More extended AMD flags: CPUID level 0x80000001, ecx */
-#define X86_FEATURE_LAHF_LM      0 /* LAHF/SAHF in long mode */
-#define X86_FEATURE_CMP_LEGACY   1 /* If yes HyperThreading not valid */
-#define X86_FEATURE_SVM          2 /* Secure virtual machine */
-#define X86_FEATURE_EXTAPIC      3 /* Extended APIC space */
-#define X86_FEATURE_CR8_LEGACY   4 /* CR8 in 32-bit mode */
-#define X86_FEATURE_ABM          5 /* Advanced bit manipulation */
-#define X86_FEATURE_SSE4A        6 /* SSE-4A */
-#define X86_FEATURE_MISALIGNSSE  7 /* Misaligned SSE mode */
-#define X86_FEATURE_3DNOWPREFETCH 8 /* 3DNow prefetch instructions */
-#define X86_FEATURE_OSVW         9 /* OS Visible Workaround */
-#define X86_FEATURE_IBS         10 /* Instruction Based Sampling */
-#define X86_FEATURE_XOP         11 /* extended AVX instructions */
-#define X86_FEATURE_SKINIT      12 /* SKINIT/STGI instructions */
-#define X86_FEATURE_WDT         13 /* Watchdog timer */
-#define X86_FEATURE_LWP         15 /* Light Weight Profiling */
-#define X86_FEATURE_FMA4        16 /* 4 operands MAC instructions */
-#define X86_FEATURE_NODEID_MSR  19 /* NodeId MSR */
-#define X86_FEATURE_TBM         21 /* trailing bit manipulations */
-#define X86_FEATURE_TOPOEXT     22 /* topology extensions CPUID leafs */
-#define X86_FEATURE_DBEXT       26 /* data breakpoint extension */
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx) */
-#define X86_FEATURE_FSGSBASE     0 /* {RD,WR}{FS,GS}BASE instructions */
-#define X86_FEATURE_TSC_ADJUST   1 /* Tsc thread offset */
-#define X86_FEATURE_BMI1         3 /* 1st group bit manipulation extensions */
-#define X86_FEATURE_HLE          4 /* Hardware Lock Elision */
-#define X86_FEATURE_AVX2         5 /* AVX2 instructions */
-#define X86_FEATURE_SMEP         7 /* Supervisor Mode Execution Protection */
-#define X86_FEATURE_BMI2         8 /* 2nd group bit manipulation extensions */
-#define X86_FEATURE_ERMS         9 /* Enhanced REP MOVSB/STOSB */
-#define X86_FEATURE_INVPCID     10 /* Invalidate Process Context ID */
-#define X86_FEATURE_RTM         11 /* Restricted Transactional Memory */
-#define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
-#define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
-#define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
-#define X86_FEATURE_PCOMMIT     22 /* PCOMMIT instruction */
-#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
-#define X86_FEATURE_CLWB        24 /* CLWB instruction */
-
-#endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 7b802da..348cbdd 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -22,12 +22,12 @@
 #include <stdlib.h>
 #include <stdbool.h>
 #include "xc_private.h"
-#include "xc_cpufeature.h"
+#include <xen/arch-x86/cpufeatureset.h>
 #include <xen/hvm/params.h>
 
-#define bitmaskof(idx)      (1u << (idx))
-#define clear_bit(idx, dst) ((dst) &= ~(1u << (idx)))
-#define set_bit(idx, dst)   ((dst) |= (1u << (idx)))
+#define bitmaskof(idx)      (1u << ((idx) & 31))
+#define clear_bit(idx, dst) ((dst) &= ~bitmaskof(idx))
+#define set_bit(idx, dst)   ((dst) |=  bitmaskof(idx))
 
 #define DEF_MAX_BASE 0x0000000du
 #define DEF_MAX_INTELEXT  0x80000008u
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (24 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
  2016-02-05 13:42 ` [PATCH v2 27/30] tools: Utility for dealing with featuresets Andrew Cooper
                   ` (4 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

New in v2
---
 tools/libxc/Makefile          |  9 ++++++
 tools/libxc/include/xenctrl.h | 14 ++++++++
 tools/libxc/xc_cpuid_x86.c    | 75 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 0a8614c..30de3fe 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -145,6 +145,15 @@ $(eval $(genpath-target))
 
 xc_private.h: _paths.h
 
+ifeq ($(CONFIG_X86),y)
+
+_xc_cpuid_autogen.h: $(XEN_ROOT)/xen/include/public/arch-x86/cpufeatureset.h $(XEN_ROOT)/xen/tools/gen-cpuid.py
+	$(PYTHON) $(XEN_ROOT)/xen/tools/gen-cpuid.py -i $^ -o $@.new
+	$(call move-if-changed,$@.new,$@)
+
+build: _xc_cpuid_autogen.h
+endif
+
 $(CTRL_LIB_OBJS) $(GUEST_LIB_OBJS) \
 $(CTRL_PIC_OBJS) $(GUEST_PIC_OBJS): xc_private.h
 
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 5a7500a..1da372d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2574,6 +2574,20 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
 
 int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
                           uint32_t *nr_features, uint32_t *featureset);
+
+uint32_t xc_get_cpu_featureset_size(void);
+
+enum xc_static_cpu_featuremask {
+    XC_FEATUREMASK_KNOWN,
+    XC_FEATUREMASK_INVERTED,
+    XC_FEATUREMASK_PV,
+    XC_FEATUREMASK_HVM_SHADOW,
+    XC_FEATUREMASK_HVM_HAP,
+    XC_FEATUREMASK_DEEP_FEATURES,
+};
+const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
+const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
+
 #endif
 
 /* Compat shims */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 348cbdd..7ef37d2 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -22,6 +22,7 @@
 #include <stdlib.h>
 #include <stdbool.h>
 #include "xc_private.h"
+#include "_xc_cpuid_autogen.h"
 #include <xen/arch-x86/cpufeatureset.h>
 #include <xen/hvm/params.h>
 
@@ -60,6 +61,80 @@ int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
     return ret;
 }
 
+uint32_t xc_get_cpu_featureset_size(void)
+{
+    return FEATURESET_NR_ENTRIES;
+}
+
+const uint32_t *xc_get_static_cpu_featuremask(
+    enum xc_static_cpu_featuremask mask)
+{
+    const static uint32_t known[FEATURESET_NR_ENTRIES] = INIT_KNOWN_FEATURES,
+        inverted[FEATURESET_NR_ENTRIES] = INIT_INVERTED_FEATURES,
+        pv[FEATURESET_NR_ENTRIES] = INIT_PV_FEATURES,
+        hvm_shadow[FEATURESET_NR_ENTRIES] = INIT_HVM_SHADOW_FEATURES,
+        hvm_hap[FEATURESET_NR_ENTRIES] = INIT_HVM_HAP_FEATURES,
+        deep_features[FEATURESET_NR_ENTRIES] = INIT_DEEP_FEATURES;
+
+    XC_BUILD_BUG_ON(ARRAY_SIZE(known) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(inverted) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(pv) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(hvm_hap) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FEATURESET_NR_ENTRIES);
+
+    switch ( mask )
+    {
+    case XC_FEATUREMASK_KNOWN:
+        return known;
+
+    case XC_FEATUREMASK_INVERTED:
+        return inverted;
+
+    case XC_FEATUREMASK_PV:
+        return pv;
+
+    case XC_FEATUREMASK_HVM_SHADOW:
+        return hvm_shadow;
+
+    case XC_FEATUREMASK_HVM_HAP:
+        return hvm_hap;
+
+    case XC_FEATUREMASK_DEEP_FEATURES:
+        return deep_features;
+
+    default:
+        return NULL;
+    }
+}
+
+const uint32_t *xc_get_feature_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FEATURESET_NR_ENTRIES];
+    } deep_deps[] = INIT_DEEP_DEPS;
+
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    XC_BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 struct cpuid_domain_info
 {
     enum
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 27/30] tools: Utility for dealing with featuresets
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (25 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:13   ` Wei Liu
  2016-02-05 13:42 ` [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

It is able to report the current featuresets (both the static masks and the
dynamic featuresets from Xen), or to decode an arbitrary featureset into
`/proc/cpuinfo` style strings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2: No linking hackery
---
 .gitignore             |   1 +
 tools/misc/Makefile    |   4 +
 tools/misc/xen-cpuid.c | 394 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 399 insertions(+)
 create mode 100644 tools/misc/xen-cpuid.c

diff --git a/.gitignore b/.gitignore
index b40453e..20ffa2d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -179,6 +179,7 @@ tools/misc/cpuperf/cpuperf-perfcntr
 tools/misc/cpuperf/cpuperf-xen
 tools/misc/xc_shadow
 tools/misc/xen_cpuperf
+tools/misc/xen-cpuid
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
 tools/misc/xenperf
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index a2ef0ec..a94dad9 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -10,6 +10,7 @@ CFLAGS += $(CFLAGS_xeninclude)
 CFLAGS += $(CFLAGS_libxenstore)
 
 # Everything to be installed in regular bin/
+INSTALL_BIN-$(CONFIG_X86)      += xen-cpuid
 INSTALL_BIN-$(CONFIG_X86)      += xen-detect
 INSTALL_BIN                    += xencons
 INSTALL_BIN                    += xencov_split
@@ -68,6 +69,9 @@ clean:
 .PHONY: distclean
 distclean: clean
 
+xen-cpuid: xen-cpuid.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
+
 xen-hvmctx: xen-hvmctx.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
new file mode 100644
index 0000000..d0f2a5c
--- /dev/null
+++ b/tools/misc/xen-cpuid.c
@@ -0,0 +1,394 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <err.h>
+#include <getopt.h>
+#include <string.h>
+
+#include <xenctrl.h>
+
+#define ARRAY_SIZE(a) (sizeof a / sizeof *a)
+static uint32_t nr_features;
+
+static const char *str_1d[32] =
+{
+    [ 0] = "fpu",  [ 1] = "vme",
+    [ 2] = "de",   [ 3] = "pse",
+    [ 4] = "tsc",  [ 5] = "msr",
+    [ 6] = "pae",  [ 7] = "mce",
+    [ 8] = "cx8",  [ 9] = "apic",
+    [10] = "REZ",  [11] = "sysenter",
+    [12] = "mtrr", [13] = "pge",
+    [14] = "mca",  [15] = "cmov",
+    [16] = "pat",  [17] = "pse36",
+    [18] = "psn",  [19] = "clflsh",
+    [20] = "REZ",  [21] = "ds",
+    [22] = "acpi", [23] = "mmx",
+    [24] = "fxsr", [25] = "sse",
+    [26] = "sse2", [27] = "ss",
+    [28] = "htt",  [29] = "tm",
+    [30] = "ia64", [31] = "pbe",
+};
+
+static const char *str_1c[32] =
+{
+    [ 0] = "sse3",    [ 1] = "pclmulqdq",
+    [ 2] = "dtes64",  [ 3] = "monitor",
+    [ 4] = "ds-cpl",  [ 5] = "vmx",
+    [ 6] = "smx",     [ 7] = "est",
+    [ 8] = "tm2",     [ 9] = "ssse3",
+    [10] = "cntx-id", [11] = "sdgb",
+    [12] = "fma",     [13] = "cx16",
+    [14] = "xtpr",    [15] = "pdcm",
+    [16] = "REZ",     [17] = "pcid",
+    [18] = "dca",     [19] = "sse41",
+    [20] = "sse42",   [21] = "x2apic",
+    [22] = "movebe",  [23] = "popcnt",
+    [24] = "tsc-dl",  [25] = "aes",
+    [26] = "xsave",   [27] = "osxsave",
+    [28] = "avx",     [29] = "f16c",
+    [30] = "rdrnd",   [31] = "hyper",
+};
+
+static const char *str_e1d[32] =
+{
+    [ 0] = "fpu",    [ 1] = "vme",
+    [ 2] = "de",     [ 3] = "pse",
+    [ 4] = "tsc",    [ 5] = "msr",
+    [ 6] = "pae",    [ 7] = "mce",
+    [ 8] = "cx8",    [ 9] = "apic",
+    [10] = "REZ",    [11] = "syscall",
+    [12] = "mtrr",   [13] = "pge",
+    [14] = "mca",    [15] = "cmov",
+    [16] = "fcmov",  [17] = "pse36",
+    [18] = "REZ",    [19] = "mp",
+    [20] = "nx",     [21] = "REZ",
+    [22] = "mmx+",   [23] = "mmx",
+    [24] = "fxsr",   [25] = "fxsr+",
+    [26] = "pg1g",   [27] = "rdtscp",
+    [28] = "REZ",    [29] = "lm",
+    [30] = "3dnow+", [31] = "3dnow",
+};
+
+static const char *str_e1c[32] =
+{
+    [ 0] = "lahf_lm",    [ 1] = "cmp",
+    [ 2] = "svm",        [ 3] = "extapic",
+    [ 4] = "cr8d",       [ 5] = "lzcnt",
+    [ 6] = "sse4a",      [ 7] = "msse",
+    [ 8] = "3dnowpf",    [ 9] = "osvw",
+    [10] = "ibs",        [11] = "xop",
+    [12] = "skinit",     [13] = "wdt",
+    [14] = "REZ",        [15] = "lwp",
+    [16] = "fma4",       [17] = "tce",
+    [18] = "REZ",        [19] = "nodeid",
+    [20] = "REZ",        [21] = "tbm",
+    [22] = "topoext",    [23] = "perfctr_core",
+    [24] = "perfctr_nb", [25] = "REZ",
+    [26] = "dbx",        [27] = "perftsc",
+    [28] = "pcx_l2i",    [29] = "monitorx",
+
+    [30 ... 31] = "REZ",
+};
+
+static const char *str_7b0[32] =
+{
+    [ 0] = "fsgsbase", [ 1] = "tsc-adj",
+    [ 2] = "sgx",      [ 3] = "bmi1",
+    [ 4] = "hle",      [ 5] = "avx2",
+    [ 6] = "REZ",      [ 7] = "smep",
+    [ 8] = "bmi2",     [ 9] = "erms",
+    [10] = "invpcid",  [11] = "rtm",
+    [12] = "pqm",      [13] = "depfpp",
+    [14] = "mpx",      [15] = "pqe",
+    [16] = "avx512f",  [17] = "avx512dq",
+    [18] = "rdseed",   [19] = "adx",
+    [20] = "smap",     [21] = "avx512ifma",
+    [22] = "pcomit",   [23] = "clflushopt",
+    [24] = "clwb",     [25] = "pt",
+    [26] = "avx512pf", [27] = "avx512er",
+    [28] = "avx512cd", [29] = "sha",
+    [30] = "avx512bw", [31] = "avx512vl",
+};
+
+static const char *str_Da1[32] =
+{
+    [ 0] = "xsaveopt", [ 1] = "xsavec",
+    [ 2] = "xgetbv1",  [ 3] = "xsaves",
+
+    [4 ... 31] = "REZ",
+};
+
+static const char *str_7c0[32] =
+{
+    [ 0] = "prechwt1", [ 1] = "avx512vbmi",
+    [ 2] = "REZ",      [ 3] = "pku",
+    [ 4] = "ospke",
+
+    [5 ... 31] = "REZ",
+};
+
+static const char *str_e7d[32] =
+{
+    [0 ... 7] = "REZ",
+
+    [ 8] = "itsc",
+
+    [9 ... 31] = "REZ",
+};
+
+static const char *str_e8b[32] =
+{
+    [ 0] = "clzero",
+
+    [1 ... 31] = "REZ",
+};
+
+static struct {
+    const char *name;
+    const char *abbr;
+    const char **strs;
+} decodes[] =
+{
+    { "0x00000001.edx",   "1d",  str_1d },
+    { "0x00000001.ecx",   "1c",  str_1c },
+    { "0x80000001.edx",   "e1d", str_e1d },
+    { "0x80000001.ecx",   "e1c", str_e1c },
+    { "0x0000000d:1.eax", "Da1", str_Da1 },
+    { "0x00000007:0.ebx", "7b0", str_7b0 },
+    { "0x00000007:0.ecx", "7c0", str_7c0 },
+    { "0x80000007.edx",   "e7d", str_e7d },
+    { "0x80000008.ebx",   "e8b", str_e8b },
+};
+
+#define COL_ALIGN "18"
+
+static struct fsinfo {
+    const char *name;
+    uint32_t len;
+    uint32_t *fs;
+} featuresets[] =
+{
+    [XEN_SYSCTL_cpu_featureset_host] = { "Host", 0, NULL },
+    [XEN_SYSCTL_cpu_featureset_raw]  = { "Raw",  0, NULL },
+    [XEN_SYSCTL_cpu_featureset_pv]   = { "PV",   0, NULL },
+    [XEN_SYSCTL_cpu_featureset_hvm]  = { "HVM",  0, NULL },
+};
+
+static void dump_leaf(uint32_t leaf, const char **strs)
+{
+    unsigned i;
+
+    if ( !strs )
+    {
+        printf(" ???");
+        return;
+    }
+
+    for ( i = 0; i < 32; ++i )
+        if ( leaf & (1u << i) )
+            printf(" %s", strs[i] ?: "???" );
+}
+
+static void decode_featureset(const uint32_t *features,
+                              const uint32_t length,
+                              const char *name,
+                              bool detail)
+{
+    unsigned int i;
+
+    printf("%-"COL_ALIGN"s        ", name);
+    for ( i = 0; i < length; ++i )
+        printf("%08x%c", features[i],
+               i < length - 1 ? ':' : '\n');
+
+    if ( !detail )
+        return;
+
+    for ( i = 0; i < length && i < ARRAY_SIZE(decodes); ++i )
+    {
+        printf("  [%02u] %-"COL_ALIGN"s", i, decodes[i].name ?: "<UNKNOWN>");
+        if ( decodes[i].name )
+            dump_leaf(features[i], decodes[i].strs);
+        printf("\n");
+    }
+}
+
+static void get_featureset(xc_interface *xch, unsigned int idx)
+{
+    struct fsinfo *f = &featuresets[idx];
+
+    f->len = xc_get_cpu_featureset_size();
+    f->fs = calloc(nr_features, sizeof(*f->fs));
+
+    if ( !f->fs )
+        err(1, "calloc(, featureset)");
+
+    if ( xc_get_cpu_featureset(xch, idx, &f->len, f->fs) )
+        err(1, "xc_get_featureset()");
+}
+
+static void dump_info(xc_interface *xch, bool detail)
+{
+    unsigned int i;
+
+    printf("nr_features: %u\n", nr_features);
+
+    if ( !detail )
+    {
+        printf("       %"COL_ALIGN"s ", "KEY");
+        for ( i = 0; i < ARRAY_SIZE(decodes); ++i )
+            printf("%-8s ", decodes[i].abbr ?: "???");
+        printf("\n");
+    }
+
+    printf("\nStatic sets:\n");
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_KNOWN),
+                      nr_features, "Known", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_INVERTED),
+                      nr_features, "Inverted", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_PV),
+                      nr_features, "PV Mask", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_HVM_SHADOW),
+                      nr_features, "HVM Shadow Mask", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_HVM_HAP),
+                      nr_features, "HVM Hap Mask", detail);
+
+    printf("\nDynamic sets:\n");
+    for ( i = 0; i < ARRAY_SIZE(featuresets); ++i )
+    {
+        get_featureset(xch, i);
+
+        decode_featureset(featuresets[i].fs, featuresets[i].len,
+                          featuresets[i].name, detail);
+    }
+
+    for ( i = 0; i < ARRAY_SIZE(featuresets); ++i )
+        free(featuresets[i].fs);
+}
+
+int main(int argc, char **argv)
+{
+    enum { MODE_UNKNOWN, MODE_INFO, MODE_DETAIL, MODE_INTERPRET }
+    mode = MODE_UNKNOWN;
+
+    nr_features = xc_get_cpu_featureset_size();
+
+    for ( ;; )
+    {
+        int option_index = 0, c;
+        static struct option long_options[] =
+        {
+            { "help", no_argument, NULL, 'h' },
+            { "info", no_argument, NULL, 'i' },
+            { "detail", no_argument, NULL, 'd' },
+            { "verbose", no_argument, NULL, 'v' },
+            { NULL, 0, NULL, 0 },
+        };
+
+        c = getopt_long(argc, argv, "hidv", long_options, &option_index);
+
+        if ( c == -1 )
+            break;
+
+        switch ( c )
+        {
+        case 'h':
+ option_error:
+            printf("Usage: %s [ info | detail | <featureset>* ]\n", argv[0]);
+            return 0;
+
+        case 'i':
+            mode = MODE_INFO;
+            break;
+
+        case 'd':
+        case 'v':
+            mode = MODE_DETAIL;
+            break;
+
+        default:
+            printf("Bad option '%c'\n", c);
+            goto option_error;
+        }
+    }
+
+    if ( mode == MODE_UNKNOWN )
+    {
+        if ( optind == argc )
+            mode = MODE_INFO;
+        else if ( optind < argc )
+        {
+            if ( !strcmp(argv[optind], "info") )
+            {
+                mode = MODE_INFO;
+                optind++;
+            }
+            else if ( !strcmp(argv[optind], "detail") )
+            {
+                mode = MODE_DETAIL;
+                optind++;
+            }
+            else
+                mode = MODE_INTERPRET;
+        }
+        else
+            mode = MODE_INTERPRET;
+    }
+
+    if ( mode == MODE_INFO || mode == MODE_DETAIL )
+    {
+        xc_interface *xch = xc_interface_open(0, 0, 0);
+
+        if ( !xch )
+            err(1, "xc_interface_open");
+
+        if ( xc_get_cpu_featureset(xch, 0, &nr_features, NULL) )
+            err(1, "xc_get_featureset(, NULL)");
+
+        dump_info(xch, mode == MODE_DETAIL);
+
+        xc_interface_close(xch);
+    }
+    else
+    {
+        uint32_t fs[nr_features + 1];
+
+        while ( optind < argc )
+        {
+            char *ptr = argv[optind++];
+            unsigned int i = 0;
+            int offset;
+
+            memset(fs, 0, sizeof(fs));
+
+            while ( sscanf(ptr, "%x%n", &fs[i], &offset) == 1 )
+            {
+                i++;
+                ptr += offset;
+
+                if ( i == nr_features )
+                    break;
+
+                if ( *ptr == ':' )
+                {
+                    ptr++; continue;
+                }
+                break;
+            }
+
+            decode_featureset(fs, i, "Raw", true);
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (26 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 27/30] tools: Utility for dealing with featuresets Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:13   ` Wei Liu
  2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
                   ` (2 subsequent siblings)
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

Later changes will cause the cpuid generation logic to seed its information
from a featureset.  This patch adds the infrastructure to specify a
featureset, and obtains the appropriate default from Xen if one is omitted.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * Modify existing call rather than introducing a new one.
 * Fix up in-tree callsites.
---
 tools/libxc/include/xenctrl.h       |  4 ++-
 tools/libxc/xc_cpuid_x86.c          | 69 ++++++++++++++++++++++++++++++++-----
 tools/libxl/libxl_cpuid.c           |  2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
 tools/python/xen/lowlevel/xc/xc.c   |  2 +-
 5 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1da372d..230f834 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1927,7 +1927,9 @@ int xc_cpuid_set(xc_interface *xch,
                  const char **config,
                  char **config_transformed);
 int xc_cpuid_apply_policy(xc_interface *xch,
-                          domid_t domid);
+                          domid_t domid,
+                          uint32_t *featureset,
+                          unsigned int nr_features);
 void xc_cpuid_to_str(const unsigned int *regs,
                      char **strs); /* some strs[] may be NULL if ENOMEM */
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 7ef37d2..e762d73 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -148,6 +148,9 @@ struct cpuid_domain_info
     bool pvh;
     uint64_t xfeature_mask;
 
+    uint32_t *featureset;
+    unsigned int nr_features;
+
     /* PV-only information. */
     bool pv64;
 
@@ -179,11 +182,14 @@ static void cpuid(const unsigned int *input, unsigned int *regs)
 }
 
 static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
-                                 struct cpuid_domain_info *info)
+                                 struct cpuid_domain_info *info,
+                                 uint32_t *featureset,
+                                 unsigned int nr_features)
 {
     struct xen_domctl domctl = {};
     xc_dominfo_t di;
     unsigned int in[2] = { 0, ~0U }, regs[4];
+    unsigned int i, host_nr_features = xc_get_cpu_featureset_size();
     int rc;
 
     cpuid(in, regs);
@@ -205,6 +211,23 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
     info->hvm = di.hvm;
     info->pvh = di.pvh;
 
+    info->featureset = calloc(host_nr_features, sizeof(*info->featureset));
+    if ( !info->featureset )
+        return -ENOMEM;
+
+    info->nr_features = host_nr_features;
+
+    if ( featureset )
+    {
+        memcpy(info->featureset, featureset,
+               min(host_nr_features, nr_features) * sizeof(*info->featureset));
+
+        /* Check for truncated set bits. */
+        for ( i = nr_features; i < host_nr_features; ++i )
+            if ( featureset[i] != 0 )
+                return -EOPNOTSUPP;
+    }
+
     /* Get xstate information. */
     domctl.cmd = XEN_DOMCTL_getvcpuextstate;
     domctl.domain = domid;
@@ -229,6 +252,14 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
             return rc;
 
         info->nestedhvm = !!val;
+
+        if ( !featureset )
+        {
+            rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_hvm,
+                                       &host_nr_features, info->featureset);
+            if ( rc )
+                return rc;
+        }
     }
     else
     {
@@ -239,11 +270,24 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
             return rc;
 
         info->pv64 = (width == 8);
+
+        if ( !featureset )
+        {
+            rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_pv,
+                                       &host_nr_features, info->featureset);
+            if ( rc )
+                return rc;
+        }
     }
 
     return 0;
 }
 
+static void free_cpuid_domain_info(struct cpuid_domain_info *info)
+{
+    free(info->featureset);
+}
+
 static void amd_xc_cpuid_policy(xc_interface *xch,
                                 const struct cpuid_domain_info *info,
                                 const unsigned int *input, unsigned int *regs)
@@ -764,16 +808,18 @@ void xc_cpuid_to_str(const unsigned int *regs, char **strs)
     }
 }
 
-int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
+int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
+                          uint32_t *featureset,
+                          unsigned int nr_features)
 {
     struct cpuid_domain_info info = {};
     unsigned int input[2] = { 0, 0 }, regs[4];
     unsigned int base_max, ext_max;
     int rc;
 
-    rc = get_cpuid_domain_info(xch, domid, &info);
+    rc = get_cpuid_domain_info(xch, domid, &info, featureset, nr_features);
     if ( rc )
-        return rc;
+        goto out;
 
     cpuid(input, regs);
     base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
@@ -796,7 +842,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
         {
             rc = xc_cpuid_do_domctl(xch, domid, input, regs);
             if ( rc )
-                return rc;
+                goto out;
         }
 
         /* Intel cache descriptor leaves. */
@@ -824,7 +870,9 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
             break;
     }
 
-    return 0;
+ out:
+    free_cpuid_domain_info(&info);
+    return rc;
 }
 
 /*
@@ -913,9 +961,9 @@ int xc_cpuid_set(
 
     memset(config_transformed, 0, 4 * sizeof(*config_transformed));
 
-    rc = get_cpuid_domain_info(xch, domid, &info);
+    rc = get_cpuid_domain_info(xch, domid, &info, NULL, 0);
     if ( rc )
-        return rc;
+        goto out;
 
     cpuid(input, regs);
 
@@ -966,7 +1014,7 @@ int xc_cpuid_set(
 
     rc = xc_cpuid_do_domctl(xch, domid, input, regs);
     if ( rc == 0 )
-        return 0;
+        goto out;
 
  fail:
     for ( i = 0; i < 4; i++ )
@@ -974,5 +1022,8 @@ int xc_cpuid_set(
         free(config_transformed[i]);
         config_transformed[i] = NULL;
     }
+
+ out:
+    free_cpuid_domain_info(&info);
     return rc;
 }
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index c66e912..fc20157 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -334,7 +334,7 @@ int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
 
 void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid)
 {
-    xc_cpuid_apply_policy(ctx->xch, domid);
+    xc_cpuid_apply_policy(ctx->xch, domid, NULL, 0);
 }
 
 void libxl_cpuid_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index e7adf37..5477df3 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -796,7 +796,7 @@ CAMLprim value stub_xc_domain_cpuid_apply_policy(value xch, value domid)
 #if defined(__i386__) || defined(__x86_64__)
 	int r;
 
-	r = xc_cpuid_apply_policy(_H(xch), _D(domid));
+	r = xc_cpuid_apply_policy(_H(xch), _D(domid), NULL, 0);
 	if (r < 0)
 		failwith_xc(_H(xch));
 #else
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c40a4e9..22a1c9f 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -731,7 +731,7 @@ static PyObject *pyxc_dom_set_policy_cpuid(XcObject *self,
     if ( !PyArg_ParseTuple(args, "i", &domid) )
         return NULL;
 
-    if ( xc_cpuid_apply_policy(self->xc_handle, domid) )
+    if ( xc_cpuid_apply_policy(self->xc_handle, domid, NULL, 0) )
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (27 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 16:13   ` Wei Liu
  2016-02-17  8:55   ` Jan Beulich
  2016-02-05 13:42 ` [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
  2016-02-08 17:26 ` [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling Andrew Cooper
  30 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

It is conceptually wrong to base a VM's featureset on the features visible to
whichever toolstack happens to construct it.

Instead, the featureset used is either an explicit one passed by the
toolstack, or the default which Xen believes it can give to the guest.

Collect all the feature manipulation into a single function which adjusts the
featureset, and perform deep dependency removal.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2: Join several related patches together
---
 tools/libxc/xc_cpuid_x86.c | 331 ++++++++++++++++-----------------------------
 1 file changed, 119 insertions(+), 212 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index e762d73..0e79812 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -21,18 +21,22 @@
 
 #include <stdlib.h>
 #include <stdbool.h>
+#include <limits.h>
 #include "xc_private.h"
+#include "xc_bitops.h"
 #include "_xc_cpuid_autogen.h"
 #include <xen/arch-x86/cpufeatureset.h>
 #include <xen/hvm/params.h>
 
+#define featureword_of(idx) ((idx) >> 5)
 #define bitmaskof(idx)      (1u << ((idx) & 31))
-#define clear_bit(idx, dst) ((dst) &= ~bitmaskof(idx))
-#define set_bit(idx, dst)   ((dst) |=  bitmaskof(idx))
+#define clear_feature(idx, dst) ((dst) &= ~bitmaskof(idx))
+#define set_feature(idx, dst)   ((dst) |=  bitmaskof(idx))
 
 #define DEF_MAX_BASE 0x0000000du
 #define DEF_MAX_INTELEXT  0x80000008u
 #define DEF_MAX_AMDEXT    0x8000001cu
+#define COMMON_1D INIT_COMMON_FEATURES
 
 int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
                           uint32_t *nr_features, uint32_t *featureset)
@@ -304,38 +308,6 @@ static void amd_xc_cpuid_policy(xc_interface *xch,
             regs[0] = DEF_MAX_AMDEXT;
         break;
 
-    case 0x80000001: {
-        if ( !info->pae )
-            clear_bit(X86_FEATURE_PAE, regs[3]);
-
-        /* Filter all other features according to a whitelist. */
-        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM) |
-                    bitmaskof(X86_FEATURE_CMP_LEGACY) |
-                    (info->nestedhvm ? bitmaskof(X86_FEATURE_SVM) : 0) |
-                    bitmaskof(X86_FEATURE_CR8_LEGACY) |
-                    bitmaskof(X86_FEATURE_ABM) |
-                    bitmaskof(X86_FEATURE_SSE4A) |
-                    bitmaskof(X86_FEATURE_MISALIGNSSE) |
-                    bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
-                    bitmaskof(X86_FEATURE_OSVW) |
-                    bitmaskof(X86_FEATURE_XOP) |
-                    bitmaskof(X86_FEATURE_LWP) |
-                    bitmaskof(X86_FEATURE_FMA4) |
-                    bitmaskof(X86_FEATURE_TBM) |
-                    bitmaskof(X86_FEATURE_DBEXT));
-        regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
-                    bitmaskof(X86_FEATURE_NX) |
-                    bitmaskof(X86_FEATURE_LM) |
-                    bitmaskof(X86_FEATURE_PAGE1GB) |
-                    bitmaskof(X86_FEATURE_SYSCALL) |
-                    bitmaskof(X86_FEATURE_MP) |
-                    bitmaskof(X86_FEATURE_MMXEXT) |
-                    bitmaskof(X86_FEATURE_FFXSR) |
-                    bitmaskof(X86_FEATURE_3DNOW) |
-                    bitmaskof(X86_FEATURE_3DNOWEXT));
-        break;
-    }
-
     case 0x80000008:
         /*
          * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus one).
@@ -382,12 +354,6 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
 {
     switch ( input[0] )
     {
-    case 0x00000001:
-        /* ECX[5] is availability of VMX */
-        if ( info->nestedhvm )
-            set_bit(X86_FEATURE_VMXE, regs[2]);
-        break;
-
     case 0x00000004:
         /*
          * EAX[31:26] is Maximum Cores Per Package (minus one).
@@ -403,19 +369,6 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
             regs[0] = DEF_MAX_INTELEXT;
         break;
 
-    case 0x80000001: {
-        /* Only a few features are advertised in Intel's 0x80000001. */
-        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM) |
-                    bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
-                    bitmaskof(X86_FEATURE_ABM));
-        regs[3] &= (bitmaskof(X86_FEATURE_NX) |
-                    bitmaskof(X86_FEATURE_LM) |
-                    bitmaskof(X86_FEATURE_PAGE1GB) |
-                    bitmaskof(X86_FEATURE_SYSCALL) |
-                    bitmaskof(X86_FEATURE_RDTSCP));
-        break;
-    }
-
     case 0x80000005:
         regs[0] = regs[1] = regs[2] = 0;
         break;
@@ -467,11 +420,8 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
     case 1: /* leaf 1 */
-        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
-        if ( !info->hvm )
-            regs[0] &= ~XSAVES;
-        regs[2] &= info->xfeature_mask;
-        regs[3] = 0;
+        regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
+        regs[1] = regs[2] = regs[3] = 0;
         break;
     case 2 ... 63: /* sub-leaves */
         if ( !(info->xfeature_mask & (1ULL << input[1])) )
@@ -503,82 +453,22 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
          */
         regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
 
-        regs[2] &= (bitmaskof(X86_FEATURE_XMM3) |
-                    bitmaskof(X86_FEATURE_PCLMULQDQ) |
-                    bitmaskof(X86_FEATURE_SSSE3) |
-                    bitmaskof(X86_FEATURE_FMA) |
-                    bitmaskof(X86_FEATURE_CX16) |
-                    bitmaskof(X86_FEATURE_PCID) |
-                    bitmaskof(X86_FEATURE_SSE4_1) |
-                    bitmaskof(X86_FEATURE_SSE4_2) |
-                    bitmaskof(X86_FEATURE_MOVBE)  |
-                    bitmaskof(X86_FEATURE_POPCNT) |
-                    bitmaskof(X86_FEATURE_AES) |
-                    bitmaskof(X86_FEATURE_F16C) |
-                    bitmaskof(X86_FEATURE_RDRAND) |
-                    ((info->xfeature_mask != 0) ?
-                     (bitmaskof(X86_FEATURE_AVX) |
-                      bitmaskof(X86_FEATURE_XSAVE)) : 0));
-
-        regs[2] |= (bitmaskof(X86_FEATURE_HYPERVISOR) |
-                    bitmaskof(X86_FEATURE_TSC_DEADLINE) |
-                    bitmaskof(X86_FEATURE_X2APIC));
-
-        regs[3] &= (bitmaskof(X86_FEATURE_FPU) |
-                    bitmaskof(X86_FEATURE_VME) |
-                    bitmaskof(X86_FEATURE_DE) |
-                    bitmaskof(X86_FEATURE_PSE) |
-                    bitmaskof(X86_FEATURE_TSC) |
-                    bitmaskof(X86_FEATURE_MSR) |
-                    bitmaskof(X86_FEATURE_PAE) |
-                    bitmaskof(X86_FEATURE_MCE) |
-                    bitmaskof(X86_FEATURE_CX8) |
-                    bitmaskof(X86_FEATURE_APIC) |
-                    bitmaskof(X86_FEATURE_SEP) |
-                    bitmaskof(X86_FEATURE_MTRR) |
-                    bitmaskof(X86_FEATURE_PGE) |
-                    bitmaskof(X86_FEATURE_MCA) |
-                    bitmaskof(X86_FEATURE_CMOV) |
-                    bitmaskof(X86_FEATURE_PAT) |
-                    bitmaskof(X86_FEATURE_CLFLSH) |
-                    bitmaskof(X86_FEATURE_PSE36) |
-                    bitmaskof(X86_FEATURE_MMX) |
-                    bitmaskof(X86_FEATURE_FXSR) |
-                    bitmaskof(X86_FEATURE_XMM) |
-                    bitmaskof(X86_FEATURE_XMM2) |
-                    bitmaskof(X86_FEATURE_HT));
-            
-        /* We always support MTRR MSRs. */
-        regs[3] |= bitmaskof(X86_FEATURE_MTRR);
-
-        if ( !info->pae )
-        {
-            clear_bit(X86_FEATURE_PAE, regs[3]);
-            clear_bit(X86_FEATURE_PSE36, regs[3]);
-        }
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_XMM3)];
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_FPU)];
         break;
 
     case 0x00000007: /* Intel-defined CPU features */
-        if ( input[1] == 0 ) {
-            regs[1] &= (bitmaskof(X86_FEATURE_TSC_ADJUST) |
-                        bitmaskof(X86_FEATURE_BMI1) |
-                        bitmaskof(X86_FEATURE_HLE)  |
-                        bitmaskof(X86_FEATURE_AVX2) |
-                        bitmaskof(X86_FEATURE_SMEP) |
-                        bitmaskof(X86_FEATURE_BMI2) |
-                        bitmaskof(X86_FEATURE_ERMS) |
-                        bitmaskof(X86_FEATURE_INVPCID) |
-                        bitmaskof(X86_FEATURE_RTM)  |
-                        bitmaskof(X86_FEATURE_RDSEED)  |
-                        bitmaskof(X86_FEATURE_ADX)  |
-                        bitmaskof(X86_FEATURE_SMAP) |
-                        bitmaskof(X86_FEATURE_FSGSBASE) |
-                        bitmaskof(X86_FEATURE_PCOMMIT) |
-                        bitmaskof(X86_FEATURE_CLWB) |
-                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
-        } else
+        if ( input[1] == 0 )
+        {
+            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
+            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
+        }
+        else
+        {
             regs[1] = 0;
-        regs[0] = regs[2] = regs[3] = 0;
+            regs[2] = 0;
+        }
+        regs[0] = regs[3] = 0;
         break;
 
     case 0x0000000d:
@@ -590,14 +480,8 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
         break;
 
     case 0x80000001:
-        if ( !info->pae )
-        {
-            clear_bit(X86_FEATURE_LAHF_LM, regs[2]);
-            clear_bit(X86_FEATURE_LM, regs[3]);
-            clear_bit(X86_FEATURE_NX, regs[3]);
-            clear_bit(X86_FEATURE_PSE36, regs[3]);
-            clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
-        }
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_LAHF_LM)];
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
         break;
 
     case 0x80000007:
@@ -641,64 +525,25 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
                                const struct cpuid_domain_info *info,
                                const unsigned int *input, unsigned int *regs)
 {
-    if ( (input[0] & 0x7fffffff) == 0x00000001 )
-    {
-        clear_bit(X86_FEATURE_VME, regs[3]);
-        if ( !info->pvh )
-        {
-            clear_bit(X86_FEATURE_PSE, regs[3]);
-            clear_bit(X86_FEATURE_PGE, regs[3]);
-        }
-        clear_bit(X86_FEATURE_MCE, regs[3]);
-        clear_bit(X86_FEATURE_MCA, regs[3]);
-        clear_bit(X86_FEATURE_MTRR, regs[3]);
-        clear_bit(X86_FEATURE_PSE36, regs[3]);
-    }
-
     switch ( input[0] )
     {
     case 0x00000001:
-        if ( info->vendor == VENDOR_AMD )
-            clear_bit(X86_FEATURE_SEP, regs[3]);
-        clear_bit(X86_FEATURE_DS, regs[3]);
-        clear_bit(X86_FEATURE_ACC, regs[3]);
-        clear_bit(X86_FEATURE_PBE, regs[3]);
-
-        clear_bit(X86_FEATURE_DTES64, regs[2]);
-        clear_bit(X86_FEATURE_MWAIT, regs[2]);
-        clear_bit(X86_FEATURE_DSCPL, regs[2]);
-        clear_bit(X86_FEATURE_VMXE, regs[2]);
-        clear_bit(X86_FEATURE_SMXE, regs[2]);
-        clear_bit(X86_FEATURE_EST, regs[2]);
-        clear_bit(X86_FEATURE_TM2, regs[2]);
-        if ( !info->pv64 )
-            clear_bit(X86_FEATURE_CX16, regs[2]);
-        if ( info->xfeature_mask == 0 )
-        {
-            clear_bit(X86_FEATURE_XSAVE, regs[2]);
-            clear_bit(X86_FEATURE_AVX, regs[2]);
-        }
-        clear_bit(X86_FEATURE_XTPR, regs[2]);
-        clear_bit(X86_FEATURE_PDCM, regs[2]);
-        clear_bit(X86_FEATURE_PCID, regs[2]);
-        clear_bit(X86_FEATURE_DCA, regs[2]);
-        set_bit(X86_FEATURE_HYPERVISOR, regs[2]);
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_XMM3)];
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_FPU)];
         break;
 
     case 0x00000007:
         if ( input[1] == 0 )
-            regs[1] &= (bitmaskof(X86_FEATURE_BMI1) |
-                        bitmaskof(X86_FEATURE_HLE)  |
-                        bitmaskof(X86_FEATURE_AVX2) |
-                        bitmaskof(X86_FEATURE_BMI2) |
-                        bitmaskof(X86_FEATURE_ERMS) |
-                        bitmaskof(X86_FEATURE_RTM)  |
-                        bitmaskof(X86_FEATURE_RDSEED)  |
-                        bitmaskof(X86_FEATURE_ADX)  |
-                        bitmaskof(X86_FEATURE_FSGSBASE));
+        {
+            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
+            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
+        }
         else
+        {
             regs[1] = 0;
-        regs[0] = regs[2] = regs[3] = 0;
+            regs[2] = 0;
+        }
+        regs[0] = regs[3] = 0;
         break;
 
     case 0x0000000d:
@@ -706,29 +551,8 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
         break;
 
     case 0x80000001:
-        if ( !info->pv64 )
-        {
-            clear_bit(X86_FEATURE_LM, regs[3]);
-            clear_bit(X86_FEATURE_LAHF_LM, regs[2]);
-            if ( info->vendor != VENDOR_AMD )
-                clear_bit(X86_FEATURE_SYSCALL, regs[3]);
-        }
-        else
-        {
-            set_bit(X86_FEATURE_SYSCALL, regs[3]);
-        }
-        if ( !info->pvh )
-            clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
-        clear_bit(X86_FEATURE_RDTSCP, regs[3]);
-
-        clear_bit(X86_FEATURE_SVM, regs[2]);
-        clear_bit(X86_FEATURE_OSVW, regs[2]);
-        clear_bit(X86_FEATURE_IBS, regs[2]);
-        clear_bit(X86_FEATURE_SKINIT, regs[2]);
-        clear_bit(X86_FEATURE_WDT, regs[2]);
-        clear_bit(X86_FEATURE_LWP, regs[2]);
-        clear_bit(X86_FEATURE_NODEID_MSR, regs[2]);
-        clear_bit(X86_FEATURE_TOPOEXT, regs[2]);
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_LAHF_LM)];
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
         break;
 
     case 0x00000005: /* MONITOR/MWAIT */
@@ -808,6 +632,87 @@ void xc_cpuid_to_str(const unsigned int *regs, char **strs)
     }
 }
 
+static void sanitise_featureset(struct cpuid_domain_info *info)
+{
+    const uint32_t fs_size = xc_get_cpu_featureset_size();
+    uint32_t disabled_features[fs_size];
+    static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
+    unsigned int i, b;
+
+    if ( info->hvm )
+    {
+        /* HVM Guest */
+
+        if ( !info->pae )
+            clear_bit(X86_FEATURE_PAE, info->featureset);
+
+        if ( !info->nestedhvm )
+        {
+            clear_bit(X86_FEATURE_SVM, info->featureset);
+            clear_bit(X86_FEATURE_VMXE, info->featureset);
+        }
+    }
+    else
+    {
+        /* PV or PVH Guest */
+
+        if ( !info->pv64 )
+        {
+            clear_bit(X86_FEATURE_LM, info->featureset);
+            if ( info->vendor != VENDOR_AMD )
+                clear_bit(X86_FEATURE_SYSCALL, info->featureset);
+        }
+
+        if ( !info->pvh )
+        {
+            clear_bit(X86_FEATURE_PSE, info->featureset);
+            clear_bit(X86_FEATURE_PSE36, info->featureset);
+            clear_bit(X86_FEATURE_PGE, info->featureset);
+            clear_bit(X86_FEATURE_PAGE1GB, info->featureset);
+        }
+    }
+
+    if ( info->xfeature_mask == 0 )
+        clear_bit(X86_FEATURE_XSAVE, info->featureset);
+
+    /* Disable deep dependencies of disabled features. */
+    for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+        disabled_features[i] = ~info->featureset[i] & deep_features[i];
+
+    for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
+    {
+        const uint32_t *dfs;
+
+        if ( !test_bit(b, disabled_features) ||
+             !(dfs = xc_get_feature_deep_deps(b)) )
+             continue;
+
+        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+        {
+            info->featureset[i] &= ~dfs[i];
+            disabled_features[i] &= ~dfs[i];
+        }
+    }
+
+    switch ( info->vendor )
+    {
+    case VENDOR_INTEL:
+        /* Intel clears the common bits in e1d. */
+        info->featureset[featureword_of(X86_FEATURE_SYSCALL)] &= ~COMMON_1D;
+        break;
+
+    case VENDOR_AMD:
+        /* AMD duplicates the common bits between 1d and e1d. */
+        info->featureset[featureword_of(X86_FEATURE_SYSCALL)] =
+            ((info->featureset[featureword_of(X86_FEATURE_FPU)] & COMMON_1D) |
+             (info->featureset[featureword_of(X86_FEATURE_SYSCALL)] & ~COMMON_1D));
+        break;
+
+    default:
+        break;
+    }
+}
+
 int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
                           uint32_t *featureset,
                           unsigned int nr_features)
@@ -831,6 +736,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
     else
         ext_max = (regs[0] <= DEF_MAX_INTELEXT) ? regs[0] : DEF_MAX_INTELEXT;
 
+    sanitise_featureset(&info);
+
     input[0] = 0;
     input[1] = XEN_CPUID_INPUT_UNUSED;
     for ( ; ; )
@@ -1002,9 +909,9 @@ int xc_cpuid_set(
                 val = polval;
 
             if ( val )
-                set_bit(31 - j, regs[i]);
+                set_feature(31 - j, regs[i]);
             else
-                clear_bit(31 - j, regs[i]);
+                clear_feature(31 - j, regs[i]);
 
             config_transformed[i][j] = config[i][j];
             if ( config[i][j] == 's' )
-- 
2.1.4


* [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (28 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
@ 2016-02-05 13:42 ` Andrew Cooper
  2016-02-05 14:28   ` Jan Beulich
  2016-02-08 17:26 ` [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling Andrew Cooper
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 13:42 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

It is unsafe to generate the guest's xstate leaves from host information, as
doing so prevents the differences between hosts from being hidden.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/xc_cpuid_x86.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 0e79812..810377c 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -380,6 +380,11 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
     }
 }
 
+#define X86_XCR0_X87    (1ULL <<  0)
+#define X86_XCR0_SSE    (1ULL <<  1)
+#define X86_XCR0_AVX    (1ULL <<  2)
+#define X86_XCR0_LWP    (1ULL << 62)
+
 #define XSAVEOPT        (1 << 0)
 #define XSAVEC          (1 << 1)
 #define XGETBV1         (1 << 2)
@@ -389,34 +394,53 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
                                   const struct cpuid_domain_info *info,
                                   const unsigned int *input, unsigned int *regs)
 {
-    if ( info->xfeature_mask == 0 )
+    uint64_t guest_xfeature_mask;
+
+    if ( info->xfeature_mask == 0 ||
+         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
     {
         regs[0] = regs[1] = regs[2] = regs[3] = 0;
         return;
     }
 
+    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
+
+    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_AVX;
+
+    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_LWP;
+
+    /*
+     * Clamp to host mask.  Should be no-op, as guest_xfeature_mask should not
+     * be able to be calculated as larger than info->xfeature_mask.
+     *
+     * TODO - see about making this a harder error.
+     */
+    guest_xfeature_mask &= info->xfeature_mask;
+
     switch ( input[1] )
     {
-    case 0: 
+    case 0:
         /* EAX: low 32bits of xfeature_enabled_mask */
-        regs[0] = info->xfeature_mask & 0xFFFFFFFF;
+        regs[0] = guest_xfeature_mask & 0xFFFFFFFF;
         /* EDX: high 32bits of xfeature_enabled_mask */
-        regs[3] = (info->xfeature_mask >> 32) & 0xFFFFFFFF;
+        regs[3] = (guest_xfeature_mask >> 32) & 0xFFFFFFFF;
         /* ECX: max size required by all HW features */
         {
             unsigned int _input[2] = {0xd, 0x0}, _regs[4];
             regs[2] = 0;
-            for ( _input[1] = 2; _input[1] < 64; _input[1]++ )
+            for ( _input[1] = 2; _input[1] <= 62; _input[1]++ )
             {
                 cpuid(_input, _regs);
                 if ( (_regs[0] + _regs[1]) > regs[2] )
                     regs[2] = _regs[0] + _regs[1];
             }
         }
-        /* EBX: max size required by enabled features. 
-         * This register contains a dynamic value, which varies when a guest 
-         * enables or disables XSTATE features (via xsetbv). The default size 
-         * after reset is 576. */ 
+        /* EBX: max size required by enabled features.
+         * This register contains a dynamic value, which varies when a guest
+         * enables or disables XSTATE features (via xsetbv). The default size
+         * after reset is 576. */
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
     case 1: /* leaf 1 */
@@ -424,7 +448,7 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
         regs[1] = regs[2] = regs[3] = 0;
         break;
     case 2 ... 63: /* sub-leaves */
-        if ( !(info->xfeature_mask & (1ULL << input[1])) )
+        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
         {
             regs[0] = regs[1] = regs[2] = regs[3] = 0;
             break;
-- 
2.1.4


* Re: [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-02-05 13:42 ` [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
@ 2016-02-05 14:28   ` Jan Beulich
  2016-02-05 15:22     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-05 14:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -380,6 +380,11 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
>      }
>  }
>  
> +#define X86_XCR0_X87    (1ULL <<  0)
> +#define X86_XCR0_SSE    (1ULL <<  1)
> +#define X86_XCR0_AVX    (1ULL <<  2)
> +#define X86_XCR0_LWP    (1ULL << 62)

What about all the other bits we meanwhile know?

Jan


* Re: [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-02-05 14:28   ` Jan Beulich
@ 2016-02-05 15:22     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-05 15:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel

On 05/02/16 14:28, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -380,6 +380,11 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
>>      }
>>  }
>>  
>> +#define X86_XCR0_X87    (1ULL <<  0)
>> +#define X86_XCR0_SSE    (1ULL <<  1)
>> +#define X86_XCR0_AVX    (1ULL <<  2)
>> +#define X86_XCR0_LWP    (1ULL << 62)
> What about all the other bits we meanwhile know?

These are the only ones used.

... which leads me to wonder where MPX support went.  I think I may have
accidentally lost it in the rebase.

~Andrew


* Re: [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
@ 2016-02-05 16:12   ` Wei Liu
  2016-02-17  8:30   ` Jan Beulich
  1 sibling, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, Tim Deegan, Rob Hoes, Xen-devel,
	Jan Beulich, David Scott

On Fri, Feb 05, 2016 at 01:42:16PM +0000, Andrew Cooper wrote:
> And provide stubs for toolstack use.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: David Scott <dave@recoil.org>
> CC: Rob Hoes <Rob.Hoes@citrix.com>
> 
> v2:
>  * Rebased to use libxencall
>  * Improve hypercall documentation
> ---
>  tools/libxc/include/xenctrl.h       |  3 ++
>  tools/libxc/xc_cpuid_x86.c          | 27 +++++++++++++++
>  tools/ocaml/libs/xc/xenctrl.ml      |  3 ++
>  tools/ocaml/libs/xc/xenctrl.mli     |  4 +++
>  tools/ocaml/libs/xc/xenctrl_stubs.c | 35 ++++++++++++++++++++
>  xen/arch/x86/sysctl.c               | 66 +++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h         | 25 ++++++++++++++
>  7 files changed, 163 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 1a5f4ec..5a7500a 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2571,6 +2571,9 @@ int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid,
>  int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
>                             uint32_t *cos_max, uint32_t *cbm_len,
>                             bool *cdp_enabled);
> +
> +int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
> +                          uint32_t *nr_features, uint32_t *featureset);
>  #endif
>  
>  /* Compat shims */
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index c142595..7b802da 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -33,6 +33,33 @@
>  #define DEF_MAX_INTELEXT  0x80000008u
>  #define DEF_MAX_AMDEXT    0x8000001cu
>  
> +int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
> +                          uint32_t *nr_features, uint32_t *featureset)
> +{
> +    DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(featureset,
> +                             *nr_features * sizeof(*featureset),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    int ret;
> +
> +    if ( xc_hypercall_bounce_pre(xch, featureset) )
> +        return -1;
> +
> +    sysctl.cmd = XEN_SYSCTL_get_cpu_featureset;
> +    sysctl.u.cpu_featureset.index = index;
> +    sysctl.u.cpu_featureset.nr_features = *nr_features;
> +    set_xen_guest_handle(sysctl.u.cpu_featureset.features, featureset);
> +
> +    ret = do_sysctl(xch, &sysctl);
> +
> +    xc_hypercall_bounce_post(xch, featureset);
> +
> +    if ( !ret )
> +        *nr_features = sysctl.u.cpu_featureset.nr_features;
> +
> +    return ret;
> +}
> +

Looks like a sensible wrapper, so 

Acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
@ 2016-02-05 16:12   ` Wei Liu
  2016-02-08 11:40     ` Andrew Cooper
  2016-02-08 16:23   ` Tim Deegan
  1 sibling, 1 reply; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:17PM +0000, Andrew Cooper wrote:
> The type of the pointer to a bitmap is not interesting; it does not affect the
> representation of the block of bits being pointed to.
> 
> Make the libxc functions consistent with those in Xen, so they can work just
> as well with 'unsigned int *' based bitmaps.

I'm not sure I understand this, the bitmap functions in
xen/include/bitmap.h seem to take unsigned long *.

Not saying that I object to this patch, just this comment looks wrong.

Wei.


* Re: [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation
  2016-02-05 13:42 ` [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
@ 2016-02-05 16:12   ` Wei Liu
  0 siblings, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:18PM +0000, Andrew Cooper wrote:
> Rather than having a different local copy of some of the feature
> definitions.
> 
> Modify the xc_cpuid_x86.c cpumask helpers to appropriate truncate the
> new values.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Using the canonical source instead of a local copy is certainly a good
idea.

Acked-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information
  2016-02-05 13:42 ` [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
@ 2016-02-05 16:12   ` Wei Liu
  2016-02-05 16:15     ` Wei Liu
  0 siblings, 1 reply; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:19PM +0000, Andrew Cooper wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> 
> New in v2
> ---
>  tools/libxc/Makefile          |  9 ++++++
>  tools/libxc/include/xenctrl.h | 14 ++++++++
>  tools/libxc/xc_cpuid_x86.c    | 75 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 98 insertions(+)
> 
> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> index 0a8614c..30de3fe 100644
> --- a/tools/libxc/Makefile
> +++ b/tools/libxc/Makefile
> @@ -145,6 +145,15 @@ $(eval $(genpath-target))
>  
>  xc_private.h: _paths.h
>  
> +ifeq ($(CONFIG_X86),y)
> +
> +_xc_cpuid_autogen.h: $(XEN_ROOT)/xen/include/public/arch-x86/cpufeatureset.h $(XEN_ROOT)/xen/tools/gen-cpuid.py
> +	$(PYTHON) $(XEN_ROOT)/xen/tools/gen-cpuid.py -i $^ -o $@.new

I don't seem to see this file in tree or in this series.

And I think ultimately that file should be maintained by x86
maintainers.

> +	$(call move-if-changed,$@.new,$@)
> +
> +build: _xc_cpuid_autogen.h
> +endif
> +
>  $(CTRL_LIB_OBJS) $(GUEST_LIB_OBJS) \
>  $(CTRL_PIC_OBJS) $(GUEST_PIC_OBJS): xc_private.h
>  
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 5a7500a..1da372d 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2574,6 +2574,20 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
>  
>  int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
>                            uint32_t *nr_features, uint32_t *featureset);
> +
> +uint32_t xc_get_cpu_featureset_size(void);
> +
> +enum xc_static_cpu_featuremask {
> +    XC_FEATUREMASK_KNOWN,
> +    XC_FEATUREMASK_INVERTED,
> +    XC_FEATUREMASK_PV,
> +    XC_FEATUREMASK_HVM_SHADOW,
> +    XC_FEATUREMASK_HVM_HAP,
> +    XC_FEATUREMASK_DEEP_FEATURES,
> +};
> +const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
> +const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
> +
>  #endif
>  
>  /* Compat shims */
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index 348cbdd..7ef37d2 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -22,6 +22,7 @@
>  #include <stdlib.h>
>  #include <stdbool.h>
>  #include "xc_private.h"
> +#include "_xc_cpuid_autogen.h"
>  #include <xen/arch-x86/cpufeatureset.h>
>  #include <xen/hvm/params.h>
>  
> @@ -60,6 +61,80 @@ int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
>      return ret;
>  }
>  
> +uint32_t xc_get_cpu_featureset_size(void)
> +{
> +    return FEATURESET_NR_ENTRIES;
> +}
> +
> +const uint32_t *xc_get_static_cpu_featuremask(
> +    enum xc_static_cpu_featuremask mask)

I can only get a vague idea of how these functions are supposed to work.
I think I will leave this to the hypervisor maintainers.

Wei.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 27/30] tools: Utility for dealing with featuresets
  2016-02-05 13:42 ` [PATCH v2 27/30] tools: Utility for dealing with featuresets Andrew Cooper
@ 2016-02-05 16:13   ` Wei Liu
  0 siblings, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:20PM +0000, Andrew Cooper wrote:
> It is able to report the current featuresets, both the static masks and
> dynamic featuresets from Xen, or to decode an arbitrary featureset into
> `/proc/cpuinfo`-style strings.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

I will let you worry about how to keep this utility in sync with all the
x86 features in the hypervisor. :-)

Is it possible or useful to move the translation between strings and
feature bits to libxc? This is definitely not a blocker for this patch
though.
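A translation of the kind suggested here could be sketched roughly as below. The helper name and the tiny feature-name table are invented for illustration; they are not libxc API, and the real table would be generated from cpufeatureset.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of what such a utility does: walk a featureset
 * (an array of 32-bit words) and append the name of each set bit,
 * /proc/cpuinfo style.  The table here is an invented sample. */
static const char *const feat_names[] = { "fpu", "vme", "de", "pse" };

static void decode_featureset(const uint32_t *fs, unsigned int nr_words,
                              char *buf, size_t len)
{
    unsigned int i;

    buf[0] = '\0';
    for ( i = 0; i < nr_words * 32 &&
                 i < sizeof(feat_names) / sizeof(feat_names[0]); ++i )
        if ( fs[i / 32] & (1u << (i % 32)) )
        {
            strncat(buf, feat_names[i], len - strlen(buf) - 1);
            strncat(buf, " ", len - strlen(buf) - 1);
        }
}
```

Going the other way (string to bit) would walk the same table, which is why keeping it in one place, such as libxc, is attractive.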

Wei.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic
  2016-02-05 13:42 ` [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
@ 2016-02-05 16:13   ` Wei Liu
  0 siblings, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:21PM +0000, Andrew Cooper wrote:
> Later changes will cause the cpuid generation logic to seed their information
> from a featureset.  This patch adds the infrastructure to specify a
> featureset, and will obtain the appropriate default from Xen if omitted.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork
  2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
@ 2016-02-05 16:13   ` Wei Liu
  2016-02-17  8:55   ` Jan Beulich
  1 sibling, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 01:42:22PM +0000, Andrew Cooper wrote:
> It is conceptually wrong to base a VM's featureset on the features visible to
> the toolstack which happens to construct it.
> 

Agreed.

> Instead, the featureset used is either an explicit one passed by the
> toolstack, or the default which Xen believes it can give to the guest.
> 
> Collect all the feature manipulation into a single function which adjusts the
> featureset, and perform deep dependency removal.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

Note that I didn't review this patch in details.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information
  2016-02-05 16:12   ` Wei Liu
@ 2016-02-05 16:15     ` Wei Liu
  0 siblings, 0 replies; 139+ messages in thread
From: Wei Liu @ 2016-02-05 16:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

On Fri, Feb 05, 2016 at 04:12:45PM +0000, Wei Liu wrote:
> On Fri, Feb 05, 2016 at 01:42:19PM +0000, Andrew Cooper wrote:
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> > CC: Ian Campbell <Ian.Campbell@citrix.com>
> > CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Wei Liu <wei.liu2@citrix.com>
> > 
> > New in v2
> > ---
> >  tools/libxc/Makefile          |  9 ++++++
> >  tools/libxc/include/xenctrl.h | 14 ++++++++
> >  tools/libxc/xc_cpuid_x86.c    | 75 +++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 98 insertions(+)
> > 
> > diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> > index 0a8614c..30de3fe 100644
> > --- a/tools/libxc/Makefile
> > +++ b/tools/libxc/Makefile
> > @@ -145,6 +145,15 @@ $(eval $(genpath-target))
> >  
> >  xc_private.h: _paths.h
> >  
> > +ifeq ($(CONFIG_X86),y)
> > +
> > +_xc_cpuid_autogen.h: $(XEN_ROOT)/xen/include/public/arch-x86/cpufeatureset.h $(XEN_ROOT)/xen/tools/gen-cpuid.py
> > +	$(PYTHON) $(XEN_ROOT)/xen/tools/gen-cpuid.py -i $^ -o $@.new
> 
> I don't seem to see this file in tree or in this series.
> 

Stupid me, I missed it. It's in another patch.

So in principle:

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-05 16:12   ` Wei Liu
@ 2016-02-08 11:40     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-08 11:40 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Jackson, Ian Campbell, Xen-devel

On 05/02/16 16:12, Wei Liu wrote:
> On Fri, Feb 05, 2016 at 01:42:17PM +0000, Andrew Cooper wrote:
>> The type of the pointer to a bitmap is not interesting; it does not affect the
>> representation of the block of bits being pointed to.
>>
>> Make the libxc functions consistent with those in Xen, so they can work just
>> as well with 'unsigned int *' based bitmaps.
> I'm not sure I understand this, the bitmap functions in
> xen/include/bitmap.h seem to take unsigned long *.
>
> Not saying that I object to this patch, just this comment looks wrong.

The lower-level bit operations in <asm/bitops.h> are all in terms of
void *, for both x86 and ARM.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
@ 2016-02-08 16:23   ` Tim Deegan
  2016-02-08 16:36     ` Ian Campbell
  1 sibling, 1 reply; 139+ messages in thread
From: Tim Deegan @ 2016-02-08 16:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Xen-devel

At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
> The type of the pointer to a bitmap is not interesting; it does not affect the
> representation of the block of bits being pointed to.

It does affect the alignment, though.  Is this safe on ARM?

Tim.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-08 16:23   ` Tim Deegan
@ 2016-02-08 16:36     ` Ian Campbell
  2016-02-10 10:07       ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Ian Campbell @ 2016-02-08 16:36 UTC (permalink / raw)
  To: Tim Deegan, Andrew Cooper; +Cc: Wei Liu, Ian Jackson, Xen-devel

On Mon, 2016-02-08 at 16:23 +0000, Tim Deegan wrote:
> At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
> > The type of the pointer to a bitmap is not interesting; it does not
> > affect the
> > representation of the block of bits being pointed to.
> 
> It does affect the alignment, though.  Is this safe on ARM?

Good point. These constructs in the patch:

+    const unsigned long *addr = _addr;

Would be broken if _addr were not suitably aligned for an unsigned long.

That probably rules out this approach unfortunately.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 139+ messages in thread

* [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling
  2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (29 preceding siblings ...)
  2016-02-05 13:42 ` [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
@ 2016-02-08 17:26 ` Andrew Cooper
  2016-02-17  9:02   ` Jan Beulich
  30 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-08 17:26 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Will be folded into appropriate patches in v3.
---
 xen/arch/x86/cpu/amd.c | 15 +++++++++++++--
 xen/arch/x86/domctl.c  | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index deb98ea..3e345fe 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -265,7 +265,15 @@ static void __init noinline amd_init_levelling(void)
 			edx &= m->edx;
 		}
 
-		cpuidmask_defaults._1cd &= ((uint64_t)ecx << 32) | edx;
+		/* Fast-forward bits - Must be set. */
+		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
+			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
+		/* Allow the HYPERVISOR bit to be set via guest policy. */
+		ecx |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
+
+		cpuidmask_defaults._1cd = ((uint64_t)ecx << 32) | edx;
 	}
 
 	if ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) {
@@ -281,7 +289,10 @@ static void __init noinline amd_init_levelling(void)
 			edx &= m->ext_edx;
 		}
 
-		cpuidmask_defaults.e1cd &= ((uint64_t)ecx << 32) | edx;
+		/* Fast-forward bits - Must be set. */
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
+		cpuidmask_defaults.e1cd = ((uint64_t)ecx << 32) | edx;
 	}
 
 	if ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) {
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index f06bc02..613bb5c 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -103,6 +103,15 @@ static void update_domain_cpuid_info(struct domain *d,
 
             case X86_VENDOR_AMD:
                 mask &= ((uint64_t)ecx << 32) | edx;
+
+                /* Fast-forward bits - Must be set. */
+                if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
+                    ecx = cpufeat_mask(X86_FEATURE_OSXSAVE);
+                else
+                    ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)ecx << 32) | edx;
                 break;
             }
 
@@ -170,6 +179,12 @@ static void update_domain_cpuid_info(struct domain *d,
 
             case X86_VENDOR_AMD:
                 mask &= ((uint64_t)ecx << 32) | edx;
+
+                /* Fast-forward bits - Must be set. */
+                ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)ecx << 32) | edx;
                 break;
             }
 
-- 
2.1.4
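The "Fast-forward bits - Must be set" comments in the patch above can be illustrated with a loose model. This is an assumption about the levelling semantics for the purpose of the sketch, not AMD's documented MSR behaviour: a fast-forwarded bit such as OSXSAVE reflects runtime state (CR4.OSXSAVE), but only if levelling has not already cleared it, so such bits must stay set in the mask for the guest ever to see them.

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_XSAVE   (1u << 26)
#define FEAT_OSXSAVE (1u << 27)   /* fast-forwarded from CR4.OSXSAVE */

/* Loose model (an assumption, not the documented semantics): a
 * fast-forwarded bit reaches the guest only when it is set in the
 * levelling value AND the runtime state permits it.  If levelling
 * unconditionally cleared it, OSXSAVE could never become visible -
 * hence the patch forces such bits on in cpuidmask_defaults. */
static uint32_t visible_ecx(uint32_t levelling, int guest_cr4_osxsave)
{
    uint32_t ecx = levelling;

    if ( guest_cr4_osxsave && (levelling & FEAT_OSXSAVE) )
        ecx |= FEAT_OSXSAVE;
    else
        ecx &= ~FEAT_OSXSAVE;

    return ecx;
}
```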

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-08 16:36     ` Ian Campbell
@ 2016-02-10 10:07       ` Andrew Cooper
  2016-02-10 10:18         ` Ian Campbell
  2016-02-17 20:06         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-10 10:07 UTC (permalink / raw)
  To: Ian Campbell, Tim Deegan; +Cc: Wei Liu, Ian Jackson, Xen-devel

On 08/02/16 16:36, Ian Campbell wrote:
> On Mon, 2016-02-08 at 16:23 +0000, Tim Deegan wrote:
>> At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
>>> The type of the pointer to a bitmap is not interesting; it does not
>>> affect the
>>> representation of the block of bits being pointed to.
>> It does affect the alignment, though.  Is this safe on ARM?
> Good point. These constructs in the patch:
>
> +    const unsigned long *addr = _addr;
>
> Would be broken if _addr were not suitably aligned for an unsigned long.
>
> That probably rules out this approach unfortunately.

What about reworking libxc bitops in terms of unsigned char?  That
should cover all alignment issues.
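A minimal sketch of that suggestion, with invented helper names rather than the actual libxc code: unsigned char access carries no alignment requirement, so any void * argument is safe, and on a little-endian target the bit numbering agrees with unsigned long based bitmaps.

```c
#include <assert.h>
#include <limits.h>

/* Sketch: bit helpers over unsigned char.  Bytes have no alignment
 * requirement on any architecture, so a misaligned base is fine. */
static int test_bit_c(unsigned int nr, const void *addr)
{
    const unsigned char *p = addr;

    return (p[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1;
}

static void set_bit_c(unsigned int nr, void *addr)
{
    unsigned char *p = addr;

    p[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}
```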

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-10 10:07       ` Andrew Cooper
@ 2016-02-10 10:18         ` Ian Campbell
  2016-02-18 13:37           ` Andrew Cooper
  2016-02-17 20:06         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 139+ messages in thread
From: Ian Campbell @ 2016-02-10 10:18 UTC (permalink / raw)
  To: Andrew Cooper, Tim Deegan; +Cc: Wei Liu, Ian Jackson, Xen-devel

On Wed, 2016-02-10 at 10:07 +0000, Andrew Cooper wrote:
> On 08/02/16 16:36, Ian Campbell wrote:
> > On Mon, 2016-02-08 at 16:23 +0000, Tim Deegan wrote:
> > > At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
> > > > The type of the pointer to a bitmap is not interesting; it does not
> > > > affect the
> > > > representation of the block of bits being pointed to.
> > > It does affect the alignment, though.  Is this safe on ARM?
> > Good point. These constructs in the patch:
> > 
> > +    const unsigned long *addr = _addr;
> > 
> > Would be broken if _addr were not suitably aligned for an unsigned
> > long.
> > 
> > That probably rules out this approach unfortunately.
> 
> What about reworking libxc bitops in terms of unsigned char?  That
> should cover all alignment issues.

Assuming any asm or calls to __builtin_foo backends were adjusted to suit,
that would be ok, would that be compatible with the Xen side though?

Or you could make things a bit cleverer to do the
    const unsigned long *addr = _addr;
in a way which also forces the alignment and adjusts nr to compensate (I
don't see that 4 bytes of alignment is going to make addr point to e.g. a
different sort of memory than _addr).
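That trick could look roughly like this. The name is invented; the sketch assumes a little-endian target, so byte-wise and long-wise bit numbering agree, and that the underlying allocation is itself long-aligned so rounding down stays inside it.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Round the incoming pointer down to unsigned long alignment and advance
 * nr to compensate, so the same bit is addressed through an aligned
 * unsigned long *.  Each skipped byte is CHAR_BIT bits. */
static int test_bit_aligned(unsigned int nr, const void *_addr)
{
    uintptr_t p = (uintptr_t)_addr;
    uintptr_t base = p & ~(uintptr_t)(sizeof(unsigned long) - 1);
    const unsigned long *addr = (const unsigned long *)base;
    unsigned int bits_per_long = sizeof(unsigned long) * CHAR_BIT;

    nr += (unsigned int)(p - base) * CHAR_BIT;

    return (addr[nr / bits_per_long] >> (nr % bits_per_long)) & 1;
}
```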

Ian.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
@ 2016-02-12 16:27   ` Jan Beulich
  2016-02-17 13:08     ` Andrew Cooper
  2016-02-19 17:29   ` Joao Martins
  1 sibling, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 16:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Xen-devel

>>> On 05.02.16 at 14:41, <andrew.cooper3@citrix.com> wrote:
> +/* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
> +#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */

Regardless of you limiting the interface to tools only, I'm not
convinced exposing constants starting with X86_* here is
appropriate.

> +#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
> +#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
> [...]
> +/* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
> +#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */

Apart from that exposing them should be done using canonical instead
of Linux-invented names, i.e. s/XMM/SSE/ for the above lines. I've
had a need to create a patch to do this just earlier today.

> +#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */

Note how this one and ...

> +#define X86_FEATURE_SSE4_1        ( 1*32+19) /*   Streaming SIMD Extensions 4.1 */
> +#define X86_FEATURE_SSE4_2        ( 1*32+20) /*   Streaming SIMD Extensions 4.2 */

... these two already use names matching the SDM.

Since canonicalization of the names implies changes elsewhere, I also
can't really offer to do the adjustments while committing.

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 06/30] xen/x86: Script to automatically process featureset information
  2016-02-05 13:41 ` [PATCH v2 06/30] xen/x86: Script to automatically process featureset information Andrew Cooper
@ 2016-02-12 16:36   ` Jan Beulich
  2016-02-12 16:43     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 16:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Xen-devel

>>> On 05.02.16 at 14:41, <andrew.cooper3@citrix.com> wrote:
> This script consumes include/public/arch-x86/cpufeatureset.h and generates a
> single include/asm-x86/cpuid-autogen.h containing all the processed
> information.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> --- /dev/null
> +++ b/xen/tools/gen-cpuid.py
> @@ -0,0 +1,191 @@

... I can't really comment on this. Am I at least right in understanding
that all it does at this point is produce FEATURESET_NR_ENTRIES?

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves
  2016-02-05 13:42 ` [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves Andrew Cooper
@ 2016-02-12 16:38   ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 16:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> New words are:
>  * 0x80000007.edx - Contains Invariant TSC
>  * 0x80000008.ebx - Newly used for AMD Zen processors
> 
> In addition, replace some open-coded ITSC and EFRO manipulation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 06/30] xen/x86: Script to automatically process featureset information
  2016-02-12 16:36   ` Jan Beulich
@ 2016-02-12 16:43     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-12 16:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Ian Campbell, Xen-devel

On 12/02/16 16:36, Jan Beulich wrote:
>>>> On 05.02.16 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> This script consumes include/public/arch-x86/cpufeatureset.h and generates a
>> single include/asm-x86/cpuid-autogen.h containing all the processed
>> information.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> albeit ...
>
>> --- /dev/null
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -0,0 +1,191 @@
> ... I can't really comment on this. Am I at least right in understanding
> that all it does at this point is produce FEATURESET_NR_ENTRIES?

At this point, yes.

Future patches change this script and the associated C code together.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities
  2016-02-05 13:42 ` [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities Andrew Cooper
@ 2016-02-12 16:43   ` Jan Beulich
  2016-02-12 16:48     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 16:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/arch/x86/cpuid.c
> @@ -0,0 +1,19 @@
> +#include <xen/lib.h>
> +#include <asm/cpuid.h>
> +
> +const uint32_t known_features[] = INIT_KNOWN_FEATURES;
> +
> +static void __maybe_unused build_assertions(void)
> +{
> +    BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);

This is sort of redundant with ...

> --- /dev/null
> +++ b/xen/include/asm-x86/cpuid.h
> @@ -0,0 +1,24 @@
> +#ifndef __X86_CPUID_H__
> +#define __X86_CPUID_H__
> +
> +#include <asm/cpuid-autogen.h>
> +
> +#define FSCAPINTS FEATURESET_NR_ENTRIES
> +
> +#ifndef __ASSEMBLY__
> +#include <xen/types.h>
> +
> +extern const uint32_t known_features[FSCAPINTS];

... the use of FSCAPINTS here. You'd catch more mistakes if you
just used [] here.

But either way
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset
  2016-02-05 13:42 ` [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset Andrew Cooper
@ 2016-02-12 16:47   ` Jan Beulich
  2016-02-12 16:50     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 16:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> Awkwardly, some new feature bits mean "Feature $X no longer works".
> Store these inverted in a featureset.
> 
> This permits safe zero-extending of a smaller featureset as part of a
> comparison, and safe reasoning (subset?, superset?, compatible? etc.)
> without specific knowledge of the meaning of each bit.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <JBeulich@suse.com>
albeit ...

> @@ -158,7 +174,7 @@
>  #define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
>  #define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
>  #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
> -#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
> +#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */

... changes like this to the public interface should normally be
avoided (i.e. you had better left out the "NO" one when you first
created this file).
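The inverted storage described in the quoted commit message can be sketched as follows. The mask is invented for the demo: bit 3 stands for an antifeature ("X no longer works"); storing it flipped ("X still works") makes a plain subset test meaningful without knowing which bits are which, and a zero-extended shorter featureset stays safely conservative.

```c
#include <assert.h>
#include <stdint.h>

#define INVERTED_MASK (1u << 3)   /* invented: bit 3 is an antifeature */

/* Flip antifeature bits on the way into the stored representation. */
static uint32_t to_stored(uint32_t raw)
{
    return raw ^ INVERTED_MASK;
}

/* Is every capability in a also present in b? */
static int is_subset(uint32_t a, uint32_t b)
{
    return (a & ~b) == 0;
}
```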

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities
  2016-02-12 16:43   ` Jan Beulich
@ 2016-02-12 16:48     ` Andrew Cooper
  2016-02-12 17:14       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-12 16:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 12/02/16 16:43, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- /dev/null
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -0,0 +1,19 @@
>> +#include <xen/lib.h>
>> +#include <asm/cpuid.h>
>> +
>> +const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>> +
>> +static void __maybe_unused build_assertions(void)
>> +{
>> +    BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
> This is sort of redundant with ...
>
>> --- /dev/null
>> +++ b/xen/include/asm-x86/cpuid.h
>> @@ -0,0 +1,24 @@
>> +#ifndef __X86_CPUID_H__
>> +#define __X86_CPUID_H__
>> +
>> +#include <asm/cpuid-autogen.h>
>> +
>> +#define FSCAPINTS FEATURESET_NR_ENTRIES
>> +
>> +#ifndef __ASSEMBLY__
>> +#include <xen/types.h>
>> +
>> +extern const uint32_t known_features[FSCAPINTS];
> ... the use of FSCAPINTS here. You'd catch more mistakes if you
> just used [] here.

Not quite.

The extern gives an explicit size so other translation units can use
ARRAY_SIZE().

Without the BUILD_BUG_ON(), const uint32_t known_features[] can actually
be longer than FSCAPINTS, and everything compiles fine.

The BUILD_BUG_ON() was introduced following an off-by-one error in
generating INIT_KNOWN_FEATURES, where ARRAY_SIZE(known_features) was
different in this translation unit than in all others.
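The off-by-one scenario can be sketched as below. FSCAPINTS, the initializer values, and the use of C11 _Static_assert in place of Xen's BUILD_BUG_ON() are all stand-ins for the demo.

```c
/* Hypothetical stand-ins for the generated pieces. */
#define FSCAPINTS 3
#define INIT_KNOWN_FEATURES { 0x1u, 0x2u, 0x4u }
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

const unsigned int known_features[] = INIT_KNOWN_FEATURES;

/* If the generator emitted one entry too many, the unsized definition
 * above would silently grow, while every other translation unit still
 * saw [FSCAPINTS] via the extern - only this check would catch it. */
_Static_assert(ARRAY_SIZE(known_features) == FSCAPINTS,
               "known_features/FSCAPINTS mismatch");
```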

>
> But either way
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks,

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset
  2016-02-12 16:47   ` Jan Beulich
@ 2016-02-12 16:50     ` Andrew Cooper
  2016-02-12 17:15       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-12 16:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 12/02/16 16:47, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> Awkwardly, some new feature bits mean "Feature $X no longer works".
>> Store these inverted in a featureset.
>>
>> This permits safe zero-extending of a smaller featureset as part of a
>> comparison, and safe reasoning (subset?, superset?, compatible? etc.)
>> without specific knowledge of the meaning of each bit.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <JBeulich@suse.com>
> albeit ...
>
>> @@ -158,7 +174,7 @@
>>  #define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
>>  #define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
>>  #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
>> -#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
>> +#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */
> ... changes like this to the public interface should normally be
> avoided (i.e. you had better left out the "NO" one when you first
> created this file).

I couldn't find a neater way of doing this while keeping the name
consistent with its representation.  I took the decision that this is
the lesser of the available evils when making this change.

I am open to alternate suggestions.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-05 13:42 ` [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset Andrew Cooper
@ 2016-02-12 17:05   ` Jan Beulich
  2016-02-12 17:42     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 17:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> Use attributes to specify whether a feature is applicable to be exposed to:
>  1) All guests
>  2) HVM guests
>  3) HVM HAP guests

No provisions for PV-only or shadow-only features?

> +#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */

Why is this being hidden from PV guests (namely Dom0)? Right now
this is being cleared for PVH only.

> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */

Shouldn't this get a "nested-only" class? Also don't we currently
require HAP for nested mode to work?

>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */

Leaving this untouched warrants at least a comment in the commit
message I would think.

> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */

Hmm, I'm confused - on one hand we currently clear this bit for
PV guests, but otoh do_invalid_op() emulates it.

> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
> [...]
> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */

How do you intend to handle exposing these to 64-bit PV guests,
but not to 32-bit ones?

>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */

This currently is left untouched for HVM guests, and gets cleared
only when !cpu_has_apic (i.e. effectively never) for PV ones.

>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */

Why not exposed to HVM (also for _MWAIT as I now notice)?

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities
  2016-02-12 16:48     ` Andrew Cooper
@ 2016-02-12 17:14       ` Jan Beulich
  2016-02-17 13:12         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 17:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 12.02.16 at 17:48, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 16:43, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/x86/cpuid.c
>>> @@ -0,0 +1,19 @@
>>> +#include <xen/lib.h>
>>> +#include <asm/cpuid.h>
>>> +
>>> +const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>>> +
>>> +static void __maybe_unused build_assertions(void)
>>> +{
>>> +    BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
>> This is sort of redundant with ...
>>
>>> --- /dev/null
>>> +++ b/xen/include/asm-x86/cpuid.h
>>> @@ -0,0 +1,24 @@
>>> +#ifndef __X86_CPUID_H__
>>> +#define __X86_CPUID_H__
>>> +
>>> +#include <asm/cpuid-autogen.h>
>>> +
>>> +#define FSCAPINTS FEATURESET_NR_ENTRIES
>>> +
>>> +#ifndef __ASSEMBLY__
>>> +#include <xen/types.h>
>>> +
>>> +extern const uint32_t known_features[FSCAPINTS];
>> ... the use of FSCAPINTS here. You'd catch more mistakes if you
>> just used [] here.
> 
> Not quite.
> 
> The extern gives an explicit size so other translation units can use
> ARRAY_SIZE().

True.

> Without the BUILD_BUG_ON(), const uint32_t known_features[] can actually
> be longer than FSCAPINTS, and everything compiles fine.
> 
> The BUILD_BUG_ON() were introduced following an off-by-one error
> generating INIT_KNOWN_FEATURES, where ARRAY_SIZE(known_features) was
> different in this translation unit than all others.

But what if INIT_KNOWN_FEATURES inits fewer than the intended
number of elements? The remaining array members will be zero, sure,
but I think such a condition would suggest a mistake elsewhere, and
hence might be worth flagging.
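The condition described here can be seen concretely (FSCAPINTS value invented for the demo): with an explicit size, a short initializer compiles silently and is zero-padded, whereas the unsized definition really is shorter, so a size check can trip on it.

```c
#define FSCAPINTS 3
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Sized definition: a short initializer is quietly zero-filled. */
const unsigned int sized[FSCAPINTS] = { 0x1u, 0x2u };

/* Unsized definition: the array genuinely has 2 elements, so a
 * BUILD_BUG_ON-style size comparison against FSCAPINTS would fire. */
const unsigned int unsized[] = { 0x1u, 0x2u };
```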

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset
  2016-02-12 16:50     ` Andrew Cooper
@ 2016-02-12 17:15       ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-12 17:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 12.02.16 at 17:50, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 16:47, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> Awkwardly, some new feature bits mean "Feature $X no longer works".
>>> Store these inverted in a featureset.
>>>
>>> This permits safe zero-extending of a smaller featureset as part of a
>>> comparison, and safe reasoning (subset?, superset?, compatible? etc.)
>>> without specific knowledge of the meaning of each bit.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Acked-by: Jan Beulich <JBeulich@suse.com>
>> albeit ...
>>
>>> @@ -158,7 +174,7 @@
>>>  #define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
>>>  #define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
>>>  #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
>>> -#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
>>> +#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */
>> ... changes like this to the public interface should normally be
>> avoided (i.e. you had better left out the "NO" one when you first
>> created this file).
> 
> I couldn't find a neater way of doing this while keeping the name
> consistent with its representation.  I took the decision that this is
> the lesser of the available evils when making this change.
> 
> I am open to alternate suggestions.

How about you keep it in asm-x86/cpufeatures.h prior to this
patch? But it's not really a big deal provided the series up to
here goes in more or less at the same time...

Jan


* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-12 17:05   ` Jan Beulich
@ 2016-02-12 17:42     ` Andrew Cooper
  2016-02-15  9:20       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-12 17:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 12/02/16 17:05, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> Use attributes to specify whether a feature is applicable to be exposed to:
>>  1) All guests
>>  2) HVM guests
>>  3) HVM HAP guests
> No provisions for PV-only or shadow-only features?

No.  There are not currently any, and I would prefer to avoid
introducing any.

Of course, if a plausible usecase comes along, we can certainly make
changes and accommodate.

>
>> +#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */
> Why is this being hidden from PV guests (namely Dom0)? Right now
> this is being cleared for PHV only.

It is unilaterally hidden from PV domains other than dom0.  We have no
handling for them in emulate_privileged_op(), which means dom0 can't use
them anyway.

>
>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
> Shouldn't this get a "nested-only" class?

I am not sure that would be appropriate.  On the Intel side, this bit is
the only option in cpuid; the VT-x features need MSR-levelling, which is
moderately far away on my TODO list.

Having said that, the AMD side has all nested features in cpuid.  I
guess this is more a problem for whomever steps up and makes nested virt
a properly supported option, but this is way off at this point.

> Also don't we currently require HAP for nested mode to work?

Experimentally, the p2m lock contention caused by Shadow and Nested virt
caused Xen to fall over very frequently with watchdog timeouts.

Having said that, nothing formal is written down one way or another, and
it is possible in limited scenarios to make nested virt work without
hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
statement for Xen.

>
>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
> Leaving this untouched warrants at least a comment in the commit
> message I would think.

The handling for the magic bits

>
>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
> Hmm, I'm confused - on one hand we currently clear this bit for
> PV guests, but otoh do_invalid_op() emulates it.

Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
untangle the swamp which is the vtsc subsystem.

Currently, the dynamic vtsc setting controls whether the RDTSCP feature
flag is visible.

>
>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>> [...]
>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
> How do you intend to handle exposing these to 64-bit PV guests,
> but not to 32-bit ones?

At the end of this series, the deep dependency logic used by the
toolstack, and some remaining dynamic checks in the intercept hooks.

By the end of my plans for full cpuid handling, Xen will create a policy
on each domaincreate, and keep it consistent with calls such as set_width.

>
>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
> This currently is left untouched for HVM guests, and gets cleared
> only when !cpu_has_apic (i.e. effectively never) for PV ones.

There is no HVM support for handling a guest trying to use EXTAPIC, and
PV guests don't get to play with the hardware APIC anyway.  As far as I
can tell, it has always been wrong to ever expose this feature.

>
>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
> Why not exposed to HVM (also for _MWAIT as I now notice)?

Because that is a good chunk of extra work to support.  We would need to
use 4K monitor widths, and extra p2m handling.

~Andrew


* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-12 17:42     ` Andrew Cooper
@ 2016-02-15  9:20       ` Jan Beulich
  2016-02-15 14:38         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15  9:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 17:05, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
>> Shouldn't this get a "nested-only" class?
> 
> I am not sure that would be appropriate.  On the Intel side, this bit is
> the only option in cpuid; the VT-x features need MSR-levelling, which is
> moderately far away on my TODO list.
> 
> Having said that, the AMD side has all nested features in cpuid.  I
> guess this is more a problem for whomever steps up and makes nested virt
> a properly supported option, but this is way off at this point.

Okay then. My hope was that introducing the extra category
wouldn't be too much extra effort inside this series.

>> Also don't we currently require HAP for nested mode to work?
> 
> Experimentally, the p2m lock contention caused by Shadow and Nested virt
> caused Xen to fall over very frequently with watchdog timeouts.
> 
> Having said that, nothing formal is written down one way or another, and
> it is possible in limited scenarios to make nested virt work without
> hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
> statement for Xen.

Same here.

>>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
>> Leaving this untouched warrants at least a comment in the commit
>> message I would think.
> 
> The handling for the magic bits

Unfinished sentence?

>>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
>> Hmm, I'm confused - on one hand we currently clear this bit for
>> PV guests, but otoh do_invalid_op() emulates it.
> 
> Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
> untangle the swamp which is the vtsc subsystem.
> 
> Currently, the dynamic vtsc setting controls whether the RDTSCP feature
> flag is visible.

I don't see where that would be happening - all I see is a single

        __clear_bit(X86_FEATURE_RDTSCP % 32, &d);

in pv_cpuid().

>>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>>> [...]
>>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
>> How do you intend to handle exposing these to 64-bit PV guests,
>> but not to 32-bit ones?
> 
> At the end of this series, the deep dependency logic used by the
> toolstack, and some remaining dynamic checks in the intercept hooks.

Okay, I'll try to remember to look there once I get to that point in
the series.

>>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
>> This currently is left untouched for HVM guests, and gets cleared
>> only when !cpu_has_apic (i.e. effectively never) for PV ones.
> 
> There is no HVM support for handling a guest trying to use EXTAPIC, and
> PV guests don't get to play with the hardware APIC anyway.  As far as I
> can tell, it has always been wrong to ever expose this feature.

Well, that's a fair statement, but should be made in the commit
message (after all it's a behavioral change).

>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
> (MONITORX/MWAITX) */
>> Why not exposed to HVM (also for _MWAIT as I now notice)?
> 
> Because that is a good chunk of extra work to support.  We would need to
> use 4K monitor widths, and extra p2m handling.

I don't understand: The base (_MWAIT) feature being exposed to
guests today, and kernels making use of the feature when available
suggests to me that things work. Are you saying you know
otherwise? (And if there really is a reason to mask the feature all of
the sudden, this should again be justified in the commit message.)

Jan


* Re: [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets
  2016-02-05 13:42 ` [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
@ 2016-02-15 13:37   ` Jan Beulich
  2016-02-15 14:57     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 13:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -1,13 +1,165 @@
>  #include <xen/lib.h>
>  #include <asm/cpuid.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/vmx/vmcs.h>
> +#include <asm/processor.h>
> +
> +#define COMMON_1D INIT_COMMON_FEATURES
>  
>  const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>  const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
>  
> +static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
> +static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
> +static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;

Considering that calculate_featuresets() gets called from
__start_xen(), that function and some others as well as the
above could apparently all live in .init.* sections.

> +uint32_t __read_mostly raw_featureset[FSCAPINTS];
> +uint32_t __read_mostly host_featureset[FSCAPINTS];
> +uint32_t __read_mostly pv_featureset[FSCAPINTS];
> +uint32_t __read_mostly hvm_featureset[FSCAPINTS];
> +
> +static void sanitise_featureset(uint32_t *fs)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < FSCAPINTS; ++i )
> +    {
> +        /* Clamp to known mask. */
> +        fs[i] &= known_features[i];
> +    }
> +
> +    switch ( boot_cpu_data.x86_vendor )
> +    {
> +    case X86_VENDOR_INTEL:
> +        /* Intel clears the common bits in e1d. */
> +        fs[FEATURESET_e1d] &= ~COMMON_1D;
> +        break;
> +
> +    case X86_VENDOR_AMD:
> +        /* AMD duplicates the common bits between 1d and e1d. */
> +        fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  COMMON_1D) |
> +                              (fs[FEATURESET_e1d] & ~COMMON_1D));
> +        break;
> +    }

How is this meant to work with cross vendor migration?

> +static void calculate_raw_featureset(void)
> +{
> +    unsigned int i, max, tmp;
> +
> +    max = cpuid_eax(0);
> +
> +    if ( max >= 1 )
> +        cpuid(0x1, &tmp, &tmp,
> +              &raw_featureset[FEATURESET_1c],
> +              &raw_featureset[FEATURESET_1d]);
> +    if ( max >= 7 )
> +        cpuid_count(0x7, 0, &tmp,
> +                    &raw_featureset[FEATURESET_7b0],
> +                    &raw_featureset[FEATURESET_7c0],
> +                    &tmp);
> +    if ( max >= 0xd )
> +        cpuid_count(0xd, 1,
> +                    &raw_featureset[FEATURESET_Da1],
> +                    &tmp, &tmp, &tmp);
> +
> +    max = cpuid_eax(0x80000000);
> +    if ( max >= 0x80000001 )

I don't recall where it was that I recently stumbled across a similar
check, but this is dangerous: Instead of >= this should check that
the upper 16 bits equal 0x8000 and the lower ones are >= 1.

> +static void calculate_host_featureset(void)
> +{
> +    memcpy(host_featureset, boot_cpu_data.x86_capability,
> +           sizeof(host_featureset));
> +}

Why not simply

#define host_featureset boot_cpu_data.x86_capability

?

> +static void calculate_pv_featureset(void)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(pv_featureset); ++i )
> +        pv_featureset[i] = host_featureset[i] & pv_featuremask[i];

I think when two arrays are involved simply using FSCAPINTS
as the upper bound would be more appropriate (and shorter).

> +static void calculate_hvm_featureset(void)
> +{
> +    unsigned int i;
> +    const uint32_t *hvm_featuremask;
> +
> +    if ( !hvm_enabled )
> +        return;
> +
> +    hvm_featuremask = hvm_funcs.hap_supported ?
> +        hvm_hap_featuremask : hvm_shadow_featuremask;
> +
> +    for ( i = 0; i < ARRAY_SIZE(hvm_featureset); ++i )
> +        hvm_featureset[i] = host_featureset[i] & hvm_featuremask[i];
> +
> +    /* Unconditionally claim to be able to set the hypervisor bit. */
> +    __set_bit(X86_FEATURE_HYPERVISOR, hvm_featureset);
> +
> +    /*
> +     * On AMD, PV guests are entirely unable to use 'sysenter' as Xen runs in
> +     * long mode (and init_amd() has cleared it out of host capabilities), but
> +     * HVM guests are able if running in protected mode.
> +     */
> +    if ( (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> +         test_bit(X86_FEATURE_SEP, raw_featureset) )
> +        __set_bit(X86_FEATURE_SEP, hvm_featureset);
> +
> +    /*
> +     * With VT-x, some features are only supported by Xen if dedicated
> +     * hardware support is also available.
> +     */
> +    if ( cpu_has_vmx )
> +    {
> +        if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
> +             !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
> +            __clear_bit(X86_FEATURE_MPX, hvm_featureset);
> +
> +        if ( !cpu_has_vmx_xsaves )
> +            __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
> +
> +        if ( !cpu_has_vmx_pcommit )
> +            __clear_bit(X86_FEATURE_PCOMMIT, hvm_featureset);
> +    }
> +
> +    sanitise_featureset(pv_featureset);

s/pv_/hvm_/ ?

>  static void __maybe_unused build_assertions(void)

While affecting an earlier patch - __init?

> --- a/xen/include/asm-x86/cpuid.h
> +++ b/xen/include/asm-x86/cpuid.h
> @@ -5,12 +5,29 @@
>  
>  #define FSCAPINTS FEATURESET_NR_ENTRIES
>  
> +#define FEATURESET_1d     0 /* 0x00000001.edx      */
> +#define FEATURESET_1c     1 /* 0x00000001.ecx      */
> +#define FEATURESET_e1d    2 /* 0x80000001.edx      */
> +#define FEATURESET_e1c    3 /* 0x80000001.ecx      */
> +#define FEATURESET_Da1    4 /* 0x0000000d:1.eax    */
> +#define FEATURESET_7b0    5 /* 0x00000007:0.ebx    */
> +#define FEATURESET_7c0    6 /* 0x00000007:0.ecx    */
> +#define FEATURESET_e7d    7 /* 0x80000007.edx      */
> +#define FEATURESET_e8b    8 /* 0x80000008.ebx      */
> +
>  #ifndef __ASSEMBLY__
>  #include <xen/types.h>
>  
>  extern const uint32_t known_features[FSCAPINTS];
>  extern const uint32_t inverted_features[FSCAPINTS];
>  
> +extern uint32_t raw_featureset[FSCAPINTS];
> +extern uint32_t host_featureset[FSCAPINTS];
> +extern uint32_t pv_featureset[FSCAPINTS];
> +extern uint32_t hvm_featureset[FSCAPINTS];

I wonder whether it wouldn't be better to make these accessible
only via function calls, with the functions returning pointers to
const, to avoid seducing people into fiddling with these from
outside cpuid.c.

Jan


* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-05 13:42 ` [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Andrew Cooper
@ 2016-02-15 14:06   ` Jan Beulich
  2016-02-15 15:28     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 14:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>  
>  static void sanitise_featureset(uint32_t *fs)
>  {
> +    uint32_t disabled_features[FSCAPINTS];
>      unsigned int i;
>  
>      for ( i = 0; i < FSCAPINTS; ++i )
>      {
>          /* Clamp to known mask. */
>          fs[i] &= known_features[i];
> +
> +        /*
> +         * Identify which features with deep dependencies have been
> +         * disabled.
> +         */
> +        disabled_features[i] = ~fs[i] & deep_features[i];
> +    }
> +
> +    for_each_set_bit(i, (void *)disabled_features,
> +                     sizeof(disabled_features) * 8)
> +    {
> +        const uint32_t *dfs = lookup_deep_deps(i);
> +        unsigned int j;
> +
> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
> +
> +        for ( j = 0; j < FSCAPINTS; ++j )
> +        {
> +            fs[j] &= ~dfs[j];
> +            disabled_features[j] &= ~dfs[j];
> +        }
>      }

Am I getting the logic in the Python script right that it is indeed
unnecessary for this loop to be restarted even when a feature
at a higher numbered bit position results in a lower numbered
bit getting cleared?

> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>      calculate_hvm_featureset();
>  }
>  
> +const uint32_t *lookup_deep_deps(uint32_t feature)

Do you really mean this rather internal function to be non-static?
Even if there was a later use in this series, it would seem suspicious
to me; in fact I'd have expected for it and ...

> +{
> +    static const struct {
> +        uint32_t feature;
> +        uint32_t fs[FSCAPINTS];
> +    } deep_deps[] = INIT_DEEP_DEPS;

... this data to again be placed in .init.* sections.

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>  
> +    deps = {
> +        XSAVE:
> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
> +
> +        AVX:
> +        (FMA, FMA4, F16C, AVX2, XOP),

I continue to question whether namely XOP, but perhaps also the
others here except maybe AVX2, really is depending on AVX, and
not just on XSAVE.

> +        PAE:
> +        (LM, ),
> +
> +        LM:
> +        (CX16, LAHF_LM, PAGE1GB),
> +
> +        XMM:
> +        (LM, ),
> +
> +        XMM2:
> +        (LM, ),
> +
> +        XMM3:
> +        (LM, ),

I don't think so - SSE2 is a commonly implied prereq for long mode,
but not SSE3. Instead what I'm missing is a dependency of SSEn,
AES, SHA and maybe more on SSE (or maybe directly FXSR as per
above), and of SSE on FXSR. And there may be more, like MMX
really ought to be dependent on FPU or FXSR (not currently
expressible, it seems).

Jan


* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15  9:20       ` Jan Beulich
@ 2016-02-15 14:38         ` Andrew Cooper
  2016-02-15 14:50           ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 14:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 09:20, Jan Beulich wrote:
>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
>>> Shouldn't this get a "nested-only" class?
>> I am not sure that would be appropriate.  On the Intel side, this bit is
>> the only option in cpuid; the VT-x features need MSR-levelling, which is
>> moderately far away on my TODO list.
>>
>> Having said that, the AMD side has all nested features in cpuid.  I
>> guess this is more a problem for whomever steps up and makes nested virt
>> a properly supported option, but this is way off at this point.
> Okay then. My hope was that introducing the extra category
> wouldn't be too much extra effort inside this series.
>>> Also don't we currently require HAP for nested mode to work?
>> Experimentally, the p2m lock contention caused by Shadow and Nested virt
>> caused Xen to fall over very frequently with watchdog timeouts.
>>
>> Having said that, nothing formal is written down one way or another, and
>> it is possible in limited scenarios to make nested virt work without
>> hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
>> statement for Xen.
> Same here.

Where possible, I am deliberately trying not to make policy changes
hidden inside a large functional series.

As far as nested virt specifically goes, I am still not sure what is the
best approach, so would prefer not to change things at this time.

>
>>>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
>>> Leaving this untouched warrants at least a comment in the commit
>>> message I would think.
>> The handling for the magic bits
> Unfinished sentence?

Oops yes, although it was going to be a statement rather than a query.

>
>>>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
>>> Hmm, I'm confused - on one hand we currently clear this bit for
>>> PV guests, but otoh do_invalid_op() emulates it.
>> Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
>> untangle the swamp which is the vtsc subsystem.
>>
>> Currently, the dynamic vtsc setting controls whether the RDTSCP feature
>> flag is visible.
> I don't see where that would be happening - all I see is a single
>
>         __clear_bit(X86_FEATURE_RDTSCP % 32, &d);
>
> in pv_cpuid().

The HVM side is more dynamic.  Either way, the subsystem is a mess.

>
>>>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>>>> [...]
>>>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>>>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
>>> How do you intend to handle exposing these to 64-bit PV guests,
>>> but not to 32-bit ones?
>> At the end of this series, the deep dependency logic used by the
>> toolstack, and some remaining dynamic checks in the intercept hooks.
> Okay, I'll try to remember to look there once I get to that point in
> the series.
>
>>>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
>>> This currently is left untouched for HVM guests, and gets cleared
>>> only when !cpu_has_apic (i.e. effectively never) for PV ones.
>> There is no HVM support for handling a guest trying to use EXTAPIC, and
>> PV guests don't get to play with the hardware APIC anyway.  As far as I
>> can tell, it has always been wrong to ever expose this feature.
> Well, that's a fair statement, but should be made in the commit
> message (after all it's a behavioral change).

I will include it in the commit message in the future.

>
>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>> (MONITORX/MWAITX) */
>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>> Because that is a good chunk of extra work to support.  We would need to
>> use 4K monitor widths, and extra p2m handling.
> I don't understand: The base (_MWAIT) feature being exposed to
> guests today, and kernels making use of the feature when available
> suggests to me that things work. Are you saying you know
> otherwise? (And if there really is a reason to mask the feature all of
> the sudden, this should again be justified in the commit message.)

PV guests had it clobbered by Xen in traps.c

HVM guests have:

vmx.c:
    case EXIT_REASON_MWAIT_INSTRUCTION:
    case EXIT_REASON_MONITOR_INSTRUCTION:
    case EXIT_REASON_GETSEC:
        /*
         * We should never exit on GETSEC because CR4.SMXE is always 0 when
         * running in guest context, and the CPU checks that before getting
         * as far as vmexit.
         */
        WARN_ON(exit_reason == EXIT_REASON_GETSEC);
        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
        break;

and svm.c:
    case VMEXIT_MONITOR:
    case VMEXIT_MWAIT:
        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
        break;

I don't see how a guest could actually use this feature.

~Andrew


* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15 14:38         ` Andrew Cooper
@ 2016-02-15 14:50           ` Jan Beulich
  2016-02-15 14:53             ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 14:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 09:20, Jan Beulich wrote:
>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>> (MONITORX/MWAITX) */
>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>> Because that is a good chunk of extra work to support.  We would need to
>>> use 4K monitor widths, and extra p2m handling.
>> I don't understand: The base (_MWAIT) feature being exposed to
>> guests today, and kernels making use of the feature when available
>> suggests to me that things work. Are you saying you know
>> otherwise? (And if there really is a reason to mask the feature all of
>> the sudden, this should again be justified in the commit message.)
> 
> PV guests had it clobbered by Xen in traps.c
> 
> HVM guests have:
> 
> vmx.c:
>     case EXIT_REASON_MWAIT_INSTRUCTION:
>     case EXIT_REASON_MONITOR_INSTRUCTION:
>[...]
>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>         break;
> 
> and svm.c:
>     case VMEXIT_MONITOR:
>     case VMEXIT_MWAIT:
>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>         break;
> 
> I don't see how a guest could actually use this feature.

Do you see the respective intercepts getting enabled anywhere?
(I don't outside of nested code, which I didn't check in detail.)

Jan


* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15 14:50           ` Jan Beulich
@ 2016-02-15 14:53             ` Andrew Cooper
  2016-02-15 15:02               ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 14:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 14:50, Jan Beulich wrote:
>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>> (MONITORX/MWAITX) */
>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>> Because that is a good chunk of extra work to support.  We would need to
>>>> use 4K monitor widths, and extra p2m handling.
>>> I don't understand: The base (_MWAIT) feature being exposed to
>>> guests today, and kernels making use of the feature when available
>>> suggests to me that things work. Are you saying you know
>>> otherwise? (And if there really is a reason to mask the feature all of
>>> the sudden, this should again be justified in the commit message.)
>> PV guests had it clobbered by Xen in traps.c
>>
>> HVM guests have:
>>
>> vmx.c:
>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>> [...]
>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>         break;
>>
>> and svm.c:
>>     case VMEXIT_MONITOR:
>>     case VMEXIT_MWAIT:
>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>         break;
>>
>> I don't see how a guest could actually use this feature.
> Do you see the respective intercepts getting enabled anywhere?
> (I don't outside of nested code, which I didn't check in detail.)

Yes - the intercepts are always enabled to prevent the guest actually
putting the processor to sleep.

~Andrew


* Re: [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap
  2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
@ 2016-02-15 14:53   ` Jan Beulich
  2016-02-15 15:33     ` Andrew Cooper
  2016-02-15 14:56   ` Jan Beulich
  1 sibling, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 14:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -53,8 +53,24 @@ static unsigned int cleared_caps[NCAPINTS];
>  
>  void __init setup_clear_cpu_cap(unsigned int cap)
>  {
> +	const uint32_t *dfs;
> +	unsigned int i;
> +
> +	if ( test_bit(cap, cleared_caps) )
> +		return;
> +
>  	__clear_bit(cap, boot_cpu_data.x86_capability);
>  	__set_bit(cap, cleared_caps);

Perhaps __test_and_set_bit() above?

> +	dfs = lookup_deep_deps(cap);

Ah, this is a use I can see as valid outside of cpuid.c, but it's still
(as I had expected) one from an __init function.

Jan


* Re: [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap
  2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
  2016-02-15 14:53   ` Jan Beulich
@ 2016-02-15 14:56   ` Jan Beulich
  1 sibling, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 14:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -53,8 +53,24 @@ static unsigned int cleared_caps[NCAPINTS];
>  
>  void __init setup_clear_cpu_cap(unsigned int cap)
>  {
> +	const uint32_t *dfs;
> +	unsigned int i;
> +
> +	if ( test_bit(cap, cleared_caps) )
> +		return;
> +
>  	__clear_bit(cap, boot_cpu_data.x86_capability);
>  	__set_bit(cap, cleared_caps);
> +
> +	dfs = lookup_deep_deps(cap);
> +
> +	if ( !dfs )
> +		return;
> +
> +	for ( i = 0; i < FSCAPINTS; ++i ) {
> +		cleared_caps[i] |= dfs[i];
> +		boot_cpu_data.x86_capability[i] &= ~dfs[i];
> +	}
>  }

Oh, also - you're mixing styles here (note the stray blanks inside
for() and if()) ...

Jan

* Re: [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets
  2016-02-15 13:37   ` Jan Beulich
@ 2016-02-15 14:57     ` Andrew Cooper
  2016-02-15 15:07       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 14:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 13:37, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -1,13 +1,165 @@
>>  #include <xen/lib.h>
>>  #include <asm/cpuid.h>
>> +#include <asm/hvm/hvm.h>
>> +#include <asm/hvm/vmx/vmcs.h>
>> +#include <asm/processor.h>
>> +
>> +#define COMMON_1D INIT_COMMON_FEATURES
>>  
>>  const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>>  const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
>>  
>> +static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
>> +static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
>> +static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
> Considering that calculate_featuresets() gets called from
> __start_xen(), that function and some others as well as the
> above could apparently all live in .init.* sections.

known and inverted features are used from the AP bringup path, so can't
be init.

calculate_featureset(), and these _featuremask[] arrays can be init.

>
>> +uint32_t __read_mostly raw_featureset[FSCAPINTS];
>> +uint32_t __read_mostly host_featureset[FSCAPINTS];
>> +uint32_t __read_mostly pv_featureset[FSCAPINTS];
>> +uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>> +
>> +static void sanitise_featureset(uint32_t *fs)
>> +{
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < FSCAPINTS; ++i )
>> +    {
>> +        /* Clamp to known mask. */
>> +        fs[i] &= known_features[i];
>> +    }
>> +
>> +    switch ( boot_cpu_data.x86_vendor )
>> +    {
>> +    case X86_VENDOR_INTEL:
>> +        /* Intel clears the common bits in e1d. */
>> +        fs[FEATURESET_e1d] &= ~COMMON_1D;
>> +        break;
>> +
>> +    case X86_VENDOR_AMD:
>> +        /* AMD duplicates the common bits between 1d and e1d. */
>> +        fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  COMMON_1D) |
>> +                              (fs[FEATURESET_e1d] & ~COMMON_1D));
>> +        break;
>> +    }
> How is this meant to work with cross vendor migration?

I don't see how cross-vendor is relevant here.  This is about reporting
the host's modified featureset accurately to the toolstack.

Once the deep dependency logic is inserted here, it is possible
that some of the common features are modified, at which point their
shared nature on AMD needs re-calculating.
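For illustration, the vendor-specific normalisation under discussion amounts to something like the following sketch (Python, with a made-up COMMON_1D value standing in for the real INIT_COMMON_FEATURES mask):

```python
# Hypothetical sketch of the sanitise_featureset() vendor handling, using
# Python integers in place of the uint32_t featureset words.  COMMON_1D is
# an arbitrary example mask, not the real INIT_COMMON_FEATURES value.
COMMON_1D = 0x0183FBFF

def normalise_e1d(vendor, fs_1d, fs_e1d):
    if vendor == "intel":
        # Intel reports the common bits only in leaf 1 EDX, so clear
        # them out of extended leaf 1 EDX.
        return fs_e1d & ~COMMON_1D
    if vendor == "amd":
        # AMD duplicates the common bits of leaf 1 EDX into extended
        # leaf 1 EDX, so re-derive them from the (possibly modified) 1d.
        return (fs_1d & COMMON_1D) | (fs_e1d & ~COMMON_1D)
    return fs_e1d
```

This is why the re-calculation matters once deep dependencies can clear common bits: on AMD, a bit dropped from 1d must also vanish from e1d.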

>
>> +static void calculate_raw_featureset(void)
>> +{
>> +    unsigned int i, max, tmp;
>> +
>> +    max = cpuid_eax(0);
>> +
>> +    if ( max >= 1 )
>> +        cpuid(0x1, &tmp, &tmp,
>> +              &raw_featureset[FEATURESET_1c],
>> +              &raw_featureset[FEATURESET_1d]);
>> +    if ( max >= 7 )
>> +        cpuid_count(0x7, 0, &tmp,
>> +                    &raw_featureset[FEATURESET_7b0],
>> +                    &raw_featureset[FEATURESET_7c0],
>> +                    &tmp);
>> +    if ( max >= 0xd )
>> +        cpuid_count(0xd, 1,
>> +                    &raw_featureset[FEATURESET_Da1],
>> +                    &tmp, &tmp, &tmp);
>> +
>> +    max = cpuid_eax(0x80000000);
>> +    if ( max >= 0x80000001 )
> I don't recall where it was that I recently stumbled across a similar
> check, but this is dangerous: Instead of >= this should check that
> the upper 16 bits equal 0x8000 and the lower ones are >= 1.

Ok.
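Jan's point about the extended-leaf check can be sketched as follows (Python, illustrative only; the function name is made up): on hardware without extended leaves, CPUID.0x80000000 may return garbage in EAX, which can pass a plain `>=` comparison by accident.

```python
def ext_leaf_available(max_ext_leaf, wanted):
    # A plain "max_ext_leaf >= wanted" is unsafe: hardware lacking
    # extended leaves may echo unrelated values (e.g. leaf-0 output)
    # in EAX.  Check the 0x8000 signature and the low half separately.
    return ((max_ext_leaf >> 16) == 0x8000 and
            (max_ext_leaf & 0xFFFF) >= (wanted & 0xFFFF))
```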

>
>> +static void calculate_host_featureset(void)
>> +{
>> +    memcpy(host_featureset, boot_cpu_data.x86_capability,
>> +           sizeof(host_featureset));
>> +}
> Why not simply
>
> #define host_featureset boot_cpu_data.x86_capability

ARRAY_SIZE(host_featureset) changes, although I guess this won't matter
if I standardise on FSCAPINTS.

>
> ?
>
>> +static void calculate_pv_featureset(void)
>> +{
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < ARRAY_SIZE(pv_featureset); ++i )
>> +        pv_featureset[i] = host_featureset[i] & pv_featuremask[i];
> I think when two arrays are involved simply using FSCAPINTS
> as the upper bound would be more appropriate (and shorter).
>
>> +static void calculate_hvm_featureset(void)
>> +{
>> +    unsigned int i;
>> +    const uint32_t *hvm_featuremask;
>> +
>> +    if ( !hvm_enabled )
>> +        return;
>> +
>> +    hvm_featuremask = hvm_funcs.hap_supported ?
>> +        hvm_hap_featuremask : hvm_shadow_featuremask;
>> +
>> +    for ( i = 0; i < ARRAY_SIZE(hvm_featureset); ++i )
>> +        hvm_featureset[i] = host_featureset[i] & hvm_featuremask[i];
>> +
>> +    /* Unconditionally claim to be able to set the hypervisor bit. */
>> +    __set_bit(X86_FEATURE_HYPERVISOR, hvm_featureset);
>> +
>> +    /*
>> +     * On AMD, PV guests are entirely unable to use 'sysenter' as Xen runs in
>> +     * long mode (and init_amd() has cleared it out of host capabilities), but
>> +     * HVM guests are able if running in protected mode.
>> +     */
>> +    if ( (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
>> +         test_bit(X86_FEATURE_SEP, raw_featureset) )
>> +        __set_bit(X86_FEATURE_SEP, hvm_featureset);
>> +
>> +    /*
>> +     * With VT-x, some features are only supported by Xen if dedicated
>> +     * hardware support is also available.
>> +     */
>> +    if ( cpu_has_vmx )
>> +    {
>> +        if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
>> +             !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
>> +            __clear_bit(X86_FEATURE_MPX, hvm_featureset);
>> +
>> +        if ( !cpu_has_vmx_xsaves )
>> +            __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
>> +
>> +        if ( !cpu_has_vmx_pcommit )
>> +            __clear_bit(X86_FEATURE_PCOMMIT, hvm_featureset);
>> +    }
>> +
>> +    sanitise_featureset(pv_featureset);
> s/pv_/hvm_/ ?

Oops yes.

>
>>  static void __maybe_unused build_assertions(void)
> While affecting an earlier patch - __init?

Can do, but nothing from this actually gets emitted into a translation unit.

>
>> --- a/xen/include/asm-x86/cpuid.h
>> +++ b/xen/include/asm-x86/cpuid.h
>> @@ -5,12 +5,29 @@
>>  
>>  #define FSCAPINTS FEATURESET_NR_ENTRIES
>>  
>> +#define FEATURESET_1d     0 /* 0x00000001.edx      */
>> +#define FEATURESET_1c     1 /* 0x00000001.ecx      */
>> +#define FEATURESET_e1d    2 /* 0x80000001.edx      */
>> +#define FEATURESET_e1c    3 /* 0x80000001.ecx      */
>> +#define FEATURESET_Da1    4 /* 0x0000000d:1.eax    */
>> +#define FEATURESET_7b0    5 /* 0x00000007:0.ebx    */
>> +#define FEATURESET_7c0    6 /* 0x00000007:0.ecx    */
>> +#define FEATURESET_e7d    7 /* 0x80000007.edx      */
>> +#define FEATURESET_e8b    8 /* 0x80000008.ebx      */
>> +
>>  #ifndef __ASSEMBLY__
>>  #include <xen/types.h>
>>  
>>  extern const uint32_t known_features[FSCAPINTS];
>>  extern const uint32_t inverted_features[FSCAPINTS];
>>  
>> +extern uint32_t raw_featureset[FSCAPINTS];
>> +extern uint32_t host_featureset[FSCAPINTS];
>> +extern uint32_t pv_featureset[FSCAPINTS];
>> +extern uint32_t hvm_featureset[FSCAPINTS];
> I wonder whether it wouldn't be better to make these accessible
> only via function calls, with the functions returning pointers to
> const, to avoid seducing people into fiddling with these from
> outside cpuid.c.

That does make it more awkward to iterate over, although I suppose I
don't do too much of that outside of cpuid.c.

Also, without LTO, the function calls can't actually be omitted.  I am
tempted to leave it as-is and rely on code review to catch misuses.

~Andrew

* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15 14:53             ` Andrew Cooper
@ 2016-02-15 15:02               ` Jan Beulich
  2016-02-15 15:41                 ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 15:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 14:50, Jan Beulich wrote:
>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>> (MONITORX/MWAITX) */
>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>> use 4K monitor widths, and extra p2m handling.
>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>> guests today, and kernels making use of the feature when available
>>>> suggests to me that things work. Are you saying you know
>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>> the sudden, this should again be justified in the commit message.)
>>> PV guests had it clobbered by Xen in traps.c
>>>
>>> HVM guests have:
>>>
>>> vmx.c:
>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>> [...]
>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>         break;
>>>
>>> and svm.c:
>>>     case VMEXIT_MONITOR:
>>>     case VMEXIT_MWAIT:
>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>         break;
>>>
>>> I don't see how a guest could actually use this feature.
>> Do you see the respective intercepts getting enabled anywhere?
>> (I don't outside of nested code, which I didn't check in detail.)
> 
> Yes - the intercepts are always enabled to prevent the guest actually
> putting the processor to sleep.

Hmm, you're right, somehow I've managed to ignore the relevant
lines grep reported. Yet - how do things work then, without the
MWAIT feature flag currently getting cleared?

Jan

* Re: [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets
  2016-02-15 14:57     ` Andrew Cooper
@ 2016-02-15 15:07       ` Jan Beulich
  2016-02-15 15:52         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 15:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 15:57, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 13:37, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/cpuid.c
>>> +++ b/xen/arch/x86/cpuid.c
>>> @@ -1,13 +1,165 @@
>>>  #include <xen/lib.h>
>>>  #include <asm/cpuid.h>
>>> +#include <asm/hvm/hvm.h>
>>> +#include <asm/hvm/vmx/vmcs.h>
>>> +#include <asm/processor.h>
>>> +
>>> +#define COMMON_1D INIT_COMMON_FEATURES
>>>  
>>>  const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>>>  const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
>>>  
>>> +static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
>>> +static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
>>> +static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
>> Considering that calculate_featuresets() gets called from
>> __start_xen(), that function and some others as well as the
>> above could apparently all live in .init.* sections.
> 
> known and inverted features are used from the AP bringup path, so can't
> be init.

Of course. My comment was only about the additions done here.

>>> +uint32_t __read_mostly raw_featureset[FSCAPINTS];
>>> +uint32_t __read_mostly host_featureset[FSCAPINTS];
>>> +uint32_t __read_mostly pv_featureset[FSCAPINTS];
>>> +uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>> +
>>> +static void sanitise_featureset(uint32_t *fs)
>>> +{
>>> +    unsigned int i;
>>> +
>>> +    for ( i = 0; i < FSCAPINTS; ++i )
>>> +    {
>>> +        /* Clamp to known mask. */
>>> +        fs[i] &= known_features[i];
>>> +    }
>>> +
>>> +    switch ( boot_cpu_data.x86_vendor )
>>> +    {
>>> +    case X86_VENDOR_INTEL:
>>> +        /* Intel clears the common bits in e1d. */
>>> +        fs[FEATURESET_e1d] &= ~COMMON_1D;
>>> +        break;
>>> +
>>> +    case X86_VENDOR_AMD:
>>> +        /* AMD duplicates the common bits between 1d and e1d. */
>>> +        fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  COMMON_1D) |
>>> +                              (fs[FEATURESET_e1d] & ~COMMON_1D));
>>> +        break;
>>> +    }
>> How is this meant to work with cross vendor migration?
> 
> I don't see how cross-vendor is relevant here.  This is about reporting
> the hosts modified featureset accurately to the toolstack.

I.e. you're not later going to use what you generate here to also
massage (or at least validate) what guests are going to see?

>>> +static void calculate_host_featureset(void)
>>> +{
>>> +    memcpy(host_featureset, boot_cpu_data.x86_capability,
>>> +           sizeof(host_featureset));
>>> +}
>> Why not simply
>>
>> #define host_featureset boot_cpu_data.x86_capability
> 
> ARRAY_SIZE(host_featureset) changes, although I guess this won't matter
> if I standardise on FSCAPINTS.

Right, since boot_cpu_data.x86_capability[] cannot, according to
earlier patches, have less than FSCAPINTS elements.

>>>  static void __maybe_unused build_assertions(void)
>> While affecting an earlier patch - __init?
> 
> Can do, but nothing from this actually gets emitted into a translation unit.

With the few(?) compiler versions you managed to test against, I
suppose...

>>> --- a/xen/include/asm-x86/cpuid.h
>>> +++ b/xen/include/asm-x86/cpuid.h
>>> @@ -5,12 +5,29 @@
>>>  
>>>  #define FSCAPINTS FEATURESET_NR_ENTRIES
>>>  
>>> +#define FEATURESET_1d     0 /* 0x00000001.edx      */
>>> +#define FEATURESET_1c     1 /* 0x00000001.ecx      */
>>> +#define FEATURESET_e1d    2 /* 0x80000001.edx      */
>>> +#define FEATURESET_e1c    3 /* 0x80000001.ecx      */
>>> +#define FEATURESET_Da1    4 /* 0x0000000d:1.eax    */
>>> +#define FEATURESET_7b0    5 /* 0x00000007:0.ebx    */
>>> +#define FEATURESET_7c0    6 /* 0x00000007:0.ecx    */
>>> +#define FEATURESET_e7d    7 /* 0x80000007.edx      */
>>> +#define FEATURESET_e8b    8 /* 0x80000008.ebx      */
>>> +
>>>  #ifndef __ASSEMBLY__
>>>  #include <xen/types.h>
>>>  
>>>  extern const uint32_t known_features[FSCAPINTS];
>>>  extern const uint32_t inverted_features[FSCAPINTS];
>>>  
>>> +extern uint32_t raw_featureset[FSCAPINTS];
>>> +extern uint32_t host_featureset[FSCAPINTS];
>>> +extern uint32_t pv_featureset[FSCAPINTS];
>>> +extern uint32_t hvm_featureset[FSCAPINTS];
>> I wonder whether it wouldn't be better to make these accessible
>> only via function calls, with the functions returning pointers to
>> const, to avoid seducing people into fiddling with these from
>> outside cpuid.c.
> 
> That does make it more awkward to iterate over, although I suppose I
> don't do too much of that outside of cpuid.c.
> 
> Also, without LTO, the function calls can't actually be omitted.  I am
> tempted to leave it as-is and rely on code review to catch misuses.

Okay then.

Jan

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 14:06   ` Jan Beulich
@ 2016-02-15 15:28     ` Andrew Cooper
  2016-02-15 15:52       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 15:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 14:06, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>  
>>  static void sanitise_featureset(uint32_t *fs)
>>  {
>> +    uint32_t disabled_features[FSCAPINTS];
>>      unsigned int i;
>>  
>>      for ( i = 0; i < FSCAPINTS; ++i )
>>      {
>>          /* Clamp to known mask. */
>>          fs[i] &= known_features[i];
>> +
>> +        /*
>> +         * Identify which features with deep dependencies have been
>> +         * disabled.
>> +         */
>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>> +    }
>> +
>> +    for_each_set_bit(i, (void *)disabled_features,
>> +                     sizeof(disabled_features) * 8)
>> +    {
>> +        const uint32_t *dfs = lookup_deep_deps(i);
>> +        unsigned int j;
>> +
>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>> +
>> +        for ( j = 0; j < FSCAPINTS; ++j )
>> +        {
>> +            fs[j] &= ~dfs[j];
>> +            disabled_features[j] &= ~dfs[j];
>> +        }
>>      }
> Am I getting the logic in the Python script right that it is indeed
> unnecessary for this loop to be restarted even when a feature
> at a higher numbered bit position results in a lower numbered
> bit getting cleared?

Correct - the python logic flattens the dependency tree, so an individual
lookup_deep_deps() gets you all features eventually influenced by
feature i.  The deep features for i may include a feature numerically
lower than i, but because dfs[] contains every feature on the eventual
chain, we don't need to start back from 0 again.

I felt this was far more productive to code at build time, rather than
at runtime.
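The build-time flattening being described here can be sketched as a transitive closure over the dependency dict (Python; a simplified stand-in for what gen-cpuid.py's crunch_numbers() would compute, with hypothetical feature names):

```python
def flatten_deps(deps):
    """Map each feature to the full set of features transitively
    depending on it, so a single lookup suffices at run time.
    Assumes the dependency graph is acyclic."""
    flat = {}

    def closure(feat):
        if feat in flat:
            return flat[feat]
        found = set()
        for child in deps.get(feat, ()):
            found.add(child)
            found |= closure(child)
        flat[feat] = found
        return found

    for feat in deps:
        closure(feat)
    return flat

# Example: disabling XSAVE must also clear AVX and, transitively,
# everything depending on AVX.
example = {"XSAVE": ("AVX",), "AVX": ("AVX2", "FMA")}
flat_example = flatten_deps(example)
```

Because each entry already contains the whole downstream chain, the run-time sanitise loop can clear every affected bit in a single pass, with no restart when a higher-numbered bit clears a lower-numbered one.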

>
>> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>>      calculate_hvm_featureset();
>>  }
>>  
>> +const uint32_t *lookup_deep_deps(uint32_t feature)
> Do you really mean this rather internal function to be non-static?

(You have found the answer already)

> Even if there was a later use in this series, it would seem suspicious
> to me; in fact I'd have expected for it and ...
>
>> +{
>> +    static const struct {
>> +        uint32_t feature;
>> +        uint32_t fs[FSCAPINTS];
>> +    } deep_deps[] = INIT_DEEP_DEPS;
> ... this data to again be placed in .init.* sections.

For this series, I think it can be .init.  For the future cpuid changes,
I am going to need it for validation during the set_cpuid hypercall, so
it will have to move out of .init at that point.

>
>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>  
>> +    deps = {
>> +        XSAVE:
>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>> +
>> +        AVX:
>> +        (FMA, FMA4, F16C, AVX2, XOP),
> I continue to question whether namely XOP, but perhaps also the
> others here except maybe AVX2, really is depending on AVX, and
> not just on XSAVE.

I am sure we have had this argument before.

All VEX encoded SIMD instructions (including XOP, which is listed in the
same category by AMD) are specified to act on 256-bit AVX state, and
require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
instructions encoding %xmm registers, which explicitly zero the upper
128 bits of the associated %ymm register.

This is very clearly a dependency on AVX, even if it isn't written in
one clear concise statement in the instruction manuals.

>
>> +        PAE:
>> +        (LM, ),
>> +
>> +        LM:
>> +        (CX16, LAHF_LM, PAGE1GB),
>> +
>> +        XMM:
>> +        (LM, ),
>> +
>> +        XMM2:
>> +        (LM, ),
>> +
>> +        XMM3:
>> +        (LM, ),
> I don't think so - SSE2 is a commonly implied prereq for long mode,
> but not SSE3. Instead what I'm missing is a dependency of SSEn,
> AES, SHA and maybe more on SSE (or maybe directly FXSR as per
> above), and of SSE on FXSR. And there may be more, like MMX
> really ought to be dependent on FPU or FXSR (not currently
> expressible, as it seems).

I will double check all of these.

~Andrew

* Re: [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap
  2016-02-15 14:53   ` Jan Beulich
@ 2016-02-15 15:33     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 15:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 14:53, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -53,8 +53,24 @@ static unsigned int cleared_caps[NCAPINTS];
>>  
>>  void __init setup_clear_cpu_cap(unsigned int cap)
>>  {
>> +	const uint32_t *dfs;
>> +	unsigned int i;
>> +
>> +	if ( test_bit(cap, cleared_caps) )
>> +		return;
>> +
>>  	__clear_bit(cap, boot_cpu_data.x86_capability);
>>  	__set_bit(cap, cleared_caps);
> Perhaps __test_and_set_bit() above?

Hmm yes - that won't make it atomic.

And I will fix up the style issues.

~Andrew

* Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15 15:02               ` Jan Beulich
@ 2016-02-15 15:41                 ` Andrew Cooper
  2016-02-17 19:02                   ` Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 15:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 15:02, Jan Beulich wrote:
>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>> (MONITORX/MWAITX) */
>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>> guests today, and kernels making use of the feature when available
>>>>> suggests to me that things work. Are you saying you know
>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>> the sudden, this should again be justified in the commit message.)
>>>> PV guests had it clobbered by Xen in traps.c
>>>>
>>>> HVM guests have:
>>>>
>>>> vmx.c:
>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>> [...]
>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>         break;
>>>>
>>>> and svm.c:
>>>>     case VMEXIT_MONITOR:
>>>>     case VMEXIT_MWAIT:
>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>         break;
>>>>
>>>> I don't see how a guest could actually use this feature.
>>> Do you see the respective intercepts getting enabled anywhere?
>>> (I don't outside of nested code, which I didn't check in detail.)
>> Yes - the intercepts are always enabled to prevent the guest actually
>> putting the processor to sleep.
> Hmm, you're right, somehow I've managed to ignore the relevant
> lines grep reported. Yet - how do things work then, without the
> MWAIT feature flag currently getting cleared?

I have never observed it being used.  Do you have some local patches in
the SLES hypervisor?

There is a gross layering violation in xen/enlighten.c, pretending that
MWAIT is present to trick the ACPI code into evaluating _CST() methods
to report back to Xen.  (This is yet another PV-ism which will cause a
headache for a DMLite dom0.)

~Andrew

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-05 13:42 ` [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
@ 2016-02-15 15:43   ` Jan Beulich
  2016-02-15 17:12     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 15:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>          /* Fix up VLAPIC details. */
>          *ebx &= 0x00FFFFFFu;
>          *ebx |= (v->vcpu_id * 2) << 24;
> +
> +        *ecx &= hvm_featureset[FEATURESET_1c];
> +        *edx &= hvm_featureset[FEATURESET_1d];

Looks like I've overlooked an issue in patch 11, which becomes
apparent here: How can you use a domain-independent featureset
here, when features vary between HAP and shadow mode guests?
I.e. in the earlier patch I suppose you need to calculate two
hvm_*_featureset[]s, with the HAP one perhaps empty when
!hvm_funcs.hap_supported.

> @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>          break;
>  
>      case 0x80000001:
> +        *ecx &= hvm_featureset[FEATURESET_e1c];
> +        *edx &= hvm_featureset[FEATURESET_e1d];

Looking at the uses here, wouldn't it be better (less ambiguous) for
these to be named FEATURESET_x1c and FEATURESET_x1d
respectively, since x is not a hex digit, but e is?

> @@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
>      else
>          cpuid_count(leaf, subleaf, &a, &b, &c, &d);
>  
> -    if ( (leaf & 0x7fffffff) == 0x00000001 )
> -    {
> -        /* Modify Feature Information. */
> -        if ( !cpu_has_apic )
> -            __clear_bit(X86_FEATURE_APIC, &d);
> -
> -        if ( !is_pvh_domain(currd) )
> -        {
> -            __clear_bit(X86_FEATURE_PSE, &d);
> -            __clear_bit(X86_FEATURE_PGE, &d);
> -            __clear_bit(X86_FEATURE_PSE36, &d);
> -            __clear_bit(X86_FEATURE_VME, &d);
> -        }
> -    }
> -
>      switch ( leaf )
>      {
>      case 0x00000001:
> -        /* Modify Feature Information. */
> -        if ( !cpu_has_sep )
> -            __clear_bit(X86_FEATURE_SEP, &d);
> -        __clear_bit(X86_FEATURE_DS, &d);
> -        __clear_bit(X86_FEATURE_ACC, &d);
> -        __clear_bit(X86_FEATURE_PBE, &d);
> -        if ( is_pvh_domain(currd) )
> -            __clear_bit(X86_FEATURE_MTRR, &d);
> -
> -        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
> -        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
> -        __clear_bit(X86_FEATURE_VMXE % 32, &c);
> -        __clear_bit(X86_FEATURE_SMXE % 32, &c);
> -        __clear_bit(X86_FEATURE_TM2 % 32, &c);
> +        c &= pv_featureset[FEATURESET_1c];
> +        d &= pv_featureset[FEATURESET_1d];
> +
>          if ( is_pv_32bit_domain(currd) )
> -            __clear_bit(X86_FEATURE_CX16 % 32, &c);
> -        __clear_bit(X86_FEATURE_XTPR % 32, &c);
> -        __clear_bit(X86_FEATURE_PDCM % 32, &c);
> -        __clear_bit(X86_FEATURE_PCID % 32, &c);
> -        __clear_bit(X86_FEATURE_DCA % 32, &c);
> -        if ( !cpu_has_xsave )
> -        {
> -            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
> -            __clear_bit(X86_FEATURE_AVX % 32, &c);
> -        }
> -        if ( !cpu_has_apic )
> -           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
> -        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
> +            c &= ~cpufeat_mask(X86_FEATURE_CX16);

Shouldn't this be taken care of by clearing LM and then applying
your dependencies correction? Or is that meant to only get
enforced later? Is it maybe still worth having both pv64_featureset[]
and pv32_featureset[]?

> +        /*
> +         * !!! Warning - OSXSAVE handling for PV guests is non-architectural !!!
> +         *
> +         * Architecturally, the correct code here is simply:
> +         *
> +         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
> +         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
> +         *
> +         * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the
> +         * XSAVE cpuid flag leaked into guests despite the feature not being
> +         * avilable for use), buggy workarounds where introduced to Linux (c/s
> +         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
> +         * incorrectly leaked OSXSAVE into the guest.
> +         *
>> +         * Furthermore, providing architectural OSXSAVE behaviour to many
> +         * Linux PV guests triggered a further kernel bug when the fpu code
> +         * observes that XSAVEOPT is available, assumes that xsave state had
> +         * been set up for the task, and follows a wild pointer.
> +         *
> +         * Therefore, the leaking of Xen's OSXSAVE setting has become a
>> +         * de facto part of the PV ABI and can't reasonably be corrected.
> +         *
> +         * The following situations and logic now applies:
> +         *
> +         * - Hardware without CPUID faulting support and native CPUID:
>> +         *    There is nothing Xen can do here.  The host's XSAVE flag will
> +         *    leak through and Xen's OSXSAVE choice will leak through.
> +         *
> +         *    In the case that the guest kernel has not set up OSXSAVE, only
> +         *    SSE will be set in xcr0, and guest userspace can't do too much
> +         *    damage itself.
> +         *
> +         * - Enlightened CPUID or CPUID faulting available:
> +         *    Xen can fully control what is seen here.  Guest kernels need to
> +         *    see the leaked OSXSAVE, but guest userspace is given
>> +         *    architectural behaviour, to reflect the guest kernel's
> +         *    intentions.
> +         */

Well, I think all of this is too harsh: In a hypervisor-guest
relationship of the PV kind, I don't view it as entirely wrong to let
the guest kernel learn whether the hypervisor enabled XSAVE
support by inspecting the OSXSAVE bit. From the guest kernel's
perspective, the hypervisor is, in a way, the OS. While I won't
make a requirement of weakening the above a little, I'd appreciate
you doing so.

> +    case 0x80000007:
> +        d &= pv_featureset[FEATURESET_e7d];
> +        break;

By not clearing eax and ebx (not sure about ecx) here you would
again expose flags to guests without proper whitelisting.

> +    case 0x80000008:
> +        b &= pv_featureset[FEATURESET_e8b];
>          break;

Same here for ecx and edx and perhaps the upper 8 bits of eax.

Jan

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 15:28     ` Andrew Cooper
@ 2016-02-15 15:52       ` Jan Beulich
  2016-02-15 16:09         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 15:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 16:28, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 14:06, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>>  
>>>  static void sanitise_featureset(uint32_t *fs)
>>>  {
>>> +    uint32_t disabled_features[FSCAPINTS];
>>>      unsigned int i;
>>>  
>>>      for ( i = 0; i < FSCAPINTS; ++i )
>>>      {
>>>          /* Clamp to known mask. */
>>>          fs[i] &= known_features[i];
>>> +
>>> +        /*
>>> +         * Identify which features with deep dependencies have been
>>> +         * disabled.
>>> +         */
>>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>>> +    }
>>> +
>>> +    for_each_set_bit(i, (void *)disabled_features,
>>> +                     sizeof(disabled_features) * 8)
>>> +    {
>>> +        const uint32_t *dfs = lookup_deep_deps(i);
>>> +        unsigned int j;
>>> +
>>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>>> +
>>> +        for ( j = 0; j < FSCAPINTS; ++j )
>>> +        {
>>> +            fs[j] &= ~dfs[j];
>>> +            disabled_features[j] &= ~dfs[j];
>>> +        }
>>>      }
>> Am I getting the logic in the Python script right that it is indeed
>> unnecessary for this loop to be restarted even when a feature
>> at a higher numbered bit position results in a lower numbered
>> bit getting cleared?
> 
> Correct - the python logic flattens the dependency tree so an individual
> lookup_deep_deps() will get you all features eventually influenced by
> feature i.  It might be that the deep features for i include a feature
> numerically lower than i, but because dfs[] contains all features on the
> eventual chain, we don't need to start back from 0 again.
> 
> I felt this was far more productive to code at build time, rather than
> at runtime.

Oh, definitely.
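
The flattening Andrew describes amounts to taking a transitive closure over
the dependency table at build time, so a single lookup_deep_deps() returns
everything eventually influenced by a feature. A minimal Python sketch (the
feature names and table contents below are illustrative, not the real
gen-cpuid.py data):

```python
# Illustrative dependency table: each feature maps to the features that
# directly depend on it.  Names are examples only.
deps = {
    "XSAVE": {"XSAVEOPT", "XSAVEC", "AVX"},
    "AVX":   {"FMA", "AVX2"},
    "AVX2":  set(),
}

def flatten(feature, deps):
    """Return every feature transitively dependent on `feature`."""
    closure = set()
    stack = [feature]
    while stack:
        f = stack.pop()
        for child in deps.get(f, ()):
            if child not in closure:
                closure.add(child)
                stack.append(child)
    return closure

# Build-time table: one flattened set per feature with deep dependencies.
deep = {f: flatten(f, deps) for f in deps}
```

Because deep["XSAVE"] already contains AVX2 (two levels away), the runtime
clearing loop never needs to restart from bit 0.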

>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
> nr_entries)
>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>  
>>> +    deps = {
>>> +        XSAVE:
>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>> +
>>> +        AVX:
>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>> I continue to question whether XOP in particular, but perhaps also the
>> others here except maybe AVX2, really depends on AVX, and
>> not just on XSAVE.
> 
> I am sure we have had this argument before.

Indeed, hence the "I continue to ...".

> All VEX encoded SIMD instructions (including XOP which is listed in the
> same category by AMD) are specified to act on 256bit AVX state, and
> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
> instructions encoding %xmm registers, which explicitly zero the upper
> 128bits of the associated %ymm register.
> 
> This is very clearly a dependency on AVX, even if it isn't written in
> one clear concise statement in the instruction manuals.

The question is what AVX actually means: To me it's an instruction set
extension, not one of machine state. The machine state extension to
me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
intentionally say "to me", because I can certainly see why this may be
viewed differently.) Note how you yourself have recourse to XCR0,
which is very clearly tied to XSAVE and not AVX, above (and note also
that there's nothing called AVX to be enabled in XCR0, it's YMM that
you talk about).

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets
  2016-02-15 15:07       ` Jan Beulich
@ 2016-02-15 15:52         ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 15:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 15:07, Jan Beulich wrote:
>
>>>> +uint32_t __read_mostly raw_featureset[FSCAPINTS];
>>>> +uint32_t __read_mostly host_featureset[FSCAPINTS];
>>>> +uint32_t __read_mostly pv_featureset[FSCAPINTS];
>>>> +uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>>> +
>>>> +static void sanitise_featureset(uint32_t *fs)
>>>> +{
>>>> +    unsigned int i;
>>>> +
>>>> +    for ( i = 0; i < FSCAPINTS; ++i )
>>>> +    {
>>>> +        /* Clamp to known mask. */
>>>> +        fs[i] &= known_features[i];
>>>> +    }
>>>> +
>>>> +    switch ( boot_cpu_data.x86_vendor )
>>>> +    {
>>>> +    case X86_VENDOR_INTEL:
>>>> +        /* Intel clears the common bits in e1d. */
>>>> +        fs[FEATURESET_e1d] &= ~COMMON_1D;
>>>> +        break;
>>>> +
>>>> +    case X86_VENDOR_AMD:
>>>> +        /* AMD duplicates the common bits between 1d and e1d. */
>>>> +        fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  COMMON_1D) |
>>>> +                              (fs[FEATURESET_e1d] & ~COMMON_1D));
>>>> +        break;
>>>> +    }
>>> How is this meant to work with cross vendor migration?
>> I don't see how cross-vendor is relevant here.  This is about reporting
>> the host's modified featureset accurately to the toolstack.
> I.e. you're not later going to use what you generate here to also
> massage (or at least validate) what guests are going to see?

Oh right - as currently implemented, this will clobber features on an
Intel host attempting to emulate AMD through cross-vendor.

I will reconsider the logic in v3.  This isn't trivial to fix,
especially given that we don't yet have per-domain policies.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 15:52       ` Jan Beulich
@ 2016-02-15 16:09         ` Andrew Cooper
  2016-02-15 16:27           ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 16:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 15:52, Jan Beulich wrote:
>
>>>> --- a/xen/tools/gen-cpuid.py
>>>> +++ b/xen/tools/gen-cpuid.py
>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>> nr_entries)
>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>  
>>>> +    deps = {
>>>> +        XSAVE:
>>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>> +
>>>> +        AVX:
>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>> I continue to question whether XOP in particular, but perhaps also the
>>> others here except maybe AVX2, really depends on AVX, and
>>> not just on XSAVE.
>> I am sure we have had this argument before.
> Indeed, hence the "I continue to ...".
>
>> All VEX encoded SIMD instructions (including XOP which is listed in the
>> same category by AMD) are specified to act on 256bit AVX state, and
>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>> instructions encoding %xmm registers, which explicitly zero the upper
>> 128bits of the associated %ymm register.
>>
>> This is very clearly a dependency on AVX, even if it isn't written in
>> one clear concise statement in the instruction manuals.
> The question is what AVX actually means: To me it's an instruction set
> extension, not one of machine state. The machine state extension to
> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
> intentionally say "to me", because I can certainly see why this may be
> viewed differently.)

The AVX feature bit is also the indicator that the AVX bit may be set in
XCR0, which links it to machine state and not just instruction sets.

>  Note how you yourself have recourse to XCR0,
> which is very clearly tied to XSAVE and not AVX, above (and note also
> that there's nothing called AVX to be enabled in XCR0, it's YMM that
> you talk about).

The key point is this.  If I choose to enable XSAVE and disable AVX for
a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
therefore shouldn't see the features.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 16:09         ` Andrew Cooper
@ 2016-02-15 16:27           ` Jan Beulich
  2016-02-15 19:07             ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-15 16:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 15:52, Jan Beulich wrote:
>>
>>>>> --- a/xen/tools/gen-cpuid.py
>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>>> nr_entries)
>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>  
>>>>> +    deps = {
>>>>> +        XSAVE:
>>>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>> +
>>>>> +        AVX:
>>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>>> I continue to question whether XOP in particular, but perhaps also the
>>>> others here except maybe AVX2, really depends on AVX, and
>>>> not just on XSAVE.
>>> I am sure we have had this argument before.
>> Indeed, hence the "I continue to ...".
>>
>>> All VEX encoded SIMD instructions (including XOP which is listed in the
>>> same category by AMD) are specified to act on 256bit AVX state, and
>>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>>> instructions encoding %xmm registers, which explicitly zero the upper
>>> 128bits of the associated %ymm register.
>>>
>>> This is very clearly a dependency on AVX, even if it isn't written in
>>> one clear concise statement in the instruction manuals.
>> The question is what AVX actually means: To me it's an instruction set
>> extension, not one of machine state. The machine state extension to
>> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
>> intentionally say "to me", because I can certainly see why this may be
>> viewed differently.)
> 
> The AVX feature bit is also the indicator that the AVX bit may be set in
> XCR0, which links it to machine state and not just instruction sets.

No, it's not (and again - there's no bit named AVX in XCR0): Which
bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX,
which is - surprise, surprise - the so called XSTATE leaf (i.e. related
to XSAVE, and not to AVX).

>>  Note how you yourself have recourse to XCR0,
>> which is very clearly tied to XSAVE and not AVX, above (and note also
>> that there's nothing called AVX to be enabled in XCR0, it's YMM that
>> you talk about).
> 
> The key point is this.  If I choose to enable XSAVE and disable AVX for
> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
> therefore shouldn't see the features.

Are you sure? Did you try? Those instructions may not be very
useful without other AVX instructions, but I don't think there's
any coupling. And if I, as an example, look at one of the
3-operand vfmadd instructions, I also don't see any #UD
resulting from the AVX bit being clear (as opposed to various of
the AVX-512 extensions, which clearly document that AVX512F
needs to always be checked). It's only in the textual description
of e.g. FMA or AVX2 detection where such a connection is being
made.

In any event, please don't misunderstand my bringing up of this
as objection to the way you handle things. I merely wanted to
point out again that this is not the only way the (often self-
contradictory) SDM can be understood.

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-15 15:43   ` Jan Beulich
@ 2016-02-15 17:12     ` Andrew Cooper
  2016-02-16 10:06       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 17:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 15:43, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>          /* Fix up VLAPIC details. */
>>          *ebx &= 0x00FFFFFFu;
>>          *ebx |= (v->vcpu_id * 2) << 24;
>> +
>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>> +        *edx &= hvm_featureset[FEATURESET_1d];
> Looks like I've overlooked an issue in patch 11, which becomes
> apparent here: How can you use a domain-independent featureset
> here, when features vary between HAP and shadow mode guests?
> I.e. in the earlier patch I suppose you need to calculate two
> hvm_*_featureset[]s, with the HAP one perhaps empty when
> !hvm_funcs.hap_supported.

Their use here is a halfway house between nothing and the planned full
per-domain policies.

In this case, the "don't expose $X to a non-hap domain" checks have been
retained, to cover the difference.

>
>> @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>          break;
>>  
>>      case 0x80000001:
>> +        *ecx &= hvm_featureset[FEATURESET_e1c];
>> +        *edx &= hvm_featureset[FEATURESET_e1d];
> Looking at the uses here, wouldn't it be better (less ambiguous) for
> these to be named FEATURESET_x1c and FEATURESET_x1d
> respectively, since x is not a hex digit, but e is?

I originally chose e because these are commonly known as the extended
cpuid leaves.  'e' being a hex digit is why the number is specified as
an upper case hex number, as shown with FEATURESET_Da1.

I suppose x could work here, but I don't see it being less ambiguous.
(Perhaps I am clouded by having this syntax firmly embedded in my
expectation).

>
>> @@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>      else
>>          cpuid_count(leaf, subleaf, &a, &b, &c, &d);
>>  
>> -    if ( (leaf & 0x7fffffff) == 0x00000001 )
>> -    {
>> -        /* Modify Feature Information. */
>> -        if ( !cpu_has_apic )
>> -            __clear_bit(X86_FEATURE_APIC, &d);
>> -
>> -        if ( !is_pvh_domain(currd) )
>> -        {
>> -            __clear_bit(X86_FEATURE_PSE, &d);
>> -            __clear_bit(X86_FEATURE_PGE, &d);
>> -            __clear_bit(X86_FEATURE_PSE36, &d);
>> -            __clear_bit(X86_FEATURE_VME, &d);
>> -        }
>> -    }
>> -
>>      switch ( leaf )
>>      {
>>      case 0x00000001:
>> -        /* Modify Feature Information. */
>> -        if ( !cpu_has_sep )
>> -            __clear_bit(X86_FEATURE_SEP, &d);
>> -        __clear_bit(X86_FEATURE_DS, &d);
>> -        __clear_bit(X86_FEATURE_ACC, &d);
>> -        __clear_bit(X86_FEATURE_PBE, &d);
>> -        if ( is_pvh_domain(currd) )
>> -            __clear_bit(X86_FEATURE_MTRR, &d);
>> -
>> -        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
>> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
>> -        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
>> -        __clear_bit(X86_FEATURE_VMXE % 32, &c);
>> -        __clear_bit(X86_FEATURE_SMXE % 32, &c);
>> -        __clear_bit(X86_FEATURE_TM2 % 32, &c);
>> +        c &= pv_featureset[FEATURESET_1c];
>> +        d &= pv_featureset[FEATURESET_1d];
>> +
>>          if ( is_pv_32bit_domain(currd) )
>> -            __clear_bit(X86_FEATURE_CX16 % 32, &c);
>> -        __clear_bit(X86_FEATURE_XTPR % 32, &c);
>> -        __clear_bit(X86_FEATURE_PDCM % 32, &c);
>> -        __clear_bit(X86_FEATURE_PCID % 32, &c);
>> -        __clear_bit(X86_FEATURE_DCA % 32, &c);
>> -        if ( !cpu_has_xsave )
>> -        {
>> -            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
>> -            __clear_bit(X86_FEATURE_AVX % 32, &c);
>> -        }
>> -        if ( !cpu_has_apic )
>> -           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
>> -        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
>> +            c &= ~cpufeat_mask(X86_FEATURE_CX16);
> Shouldn't this be taken care of by clearing LM and then applying
> your dependencies correction? Or is that meant to only get
> enforced later? Is it maybe still worth having both pv64_featureset[]
> and pv32_featureset[]?

Again, this is just used as a halfway house.  Longterm, I plan to read
the featureset straight from the per-domain policy, which won't be
stored as an adhoc array.

>
>> +        /*
>> +         * !!! Warning - OSXSAVE handling for PV guests is non-architectural !!!
>> +         *
>> +         * Architecturally, the correct code here is simply:
>> +         *
>> +         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
>> +         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +         *
>> +         * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the
>> +         * XSAVE cpuid flag leaked into guests despite the feature not being
>> +         * available for use), buggy workarounds were introduced to Linux (c/s
>> +         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
>> +         * incorrectly leaked OSXSAVE into the guest.
>> +         *
>> +         * Furthermore, providing architectural OSXSAVE behaviour to many
>> +         * Linux PV guests triggered a further kernel bug when the fpu code
>> +         * observes that XSAVEOPT is available, assumes that xsave state had
>> +         * been set up for the task, and follows a wild pointer.
>> +         *
>> +         * Therefore, the leaking of Xen's OSXSAVE setting has become a
>> +         * de facto part of the PV ABI and can't reasonably be corrected.
>> +         *
>> +         * The following situations and logic now applies:
>> +         *
>> +         * - Hardware without CPUID faulting support and native CPUID:
>> +         *    There is nothing Xen can do here.  The hosts XSAVE flag will
>> +         *    leak through and Xen's OSXSAVE choice will leak through.
>> +         *
>> +         *    In the case that the guest kernel has not set up OSXSAVE, only
>> +         *    SSE will be set in xcr0, and guest userspace can't do too much
>> +         *    damage itself.
>> +         *
>> +         * - Enlightened CPUID or CPUID faulting available:
>> +         *    Xen can fully control what is seen here.  Guest kernels need to
>> +         *    see the leaked OSXSAVE, but guest userspace is given
>> +         *    architectural behaviour, to reflect the guest kernel's
>> +         *    intentions.
>> +         */
> Well, I think all of this is too harsh: In a hypervisor-guest
> relationship of the PV kind I don't view it as entirely wrong to let the
> guest kernel know whether the hypervisor enabled XSAVE
> support by inspecting the OSXSAVE bit. From the guest kernel's
> perspective, the hypervisor is kind of the OS.

Xen shadows the guests %cr4.  That alone means that the emulated cpuid
should have matched.
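
The behaviour described in the quoted comment reduces to roughly the
following decision (a hypothetical sketch for illustration only, not Xen's
actual implementation; the function and parameter names are invented):

```python
def osxsave_seen_by(caller, guest_kernel_cr4_osxsave, xen_enabled_xsave):
    """Sketch of the non-architectural PV OSXSAVE reporting.

    caller: "kernel" for the guest kernel, "user" for guest userspace
    (reached via enlightened CPUID or CPUID faulting).
    """
    if caller == "kernel":
        # De facto PV ABI: the guest kernel sees Xen's own OSXSAVE setting
        # leak through, because old Linux kernels depend on it.
        return xen_enabled_xsave
    # Userspace gets architectural behaviour, reflecting the guest
    # kernel's own CR4.OSXSAVE choice.
    return guest_kernel_cr4_osxsave
```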

If you are going for the OS argument, that means "Xen has XSAVE turned
on and the guest can use the provided features" and not "Xen supports the
guest configuring its own XCR0", which is what it is taken to mean.  In
both of these cases, deviating from the architectural behaviour confuses
the situation, and adds needless divergence from the native
implementation in guests.

I admit that all of this stems from the complete fiasco which was the
original XSAVE support, but all of it would be unnecessary had buggy
bugfixes not been stacked up on each other.

>  While I won't make it a requirement that you weaken the above a little,
> I'd appreciate you doing so.

I don't wish to be deliberately over the top, but I really don't think
I am in this case.  None of this should be necessary.

>
>> +    case 0x80000007:
>> +        d &= pv_featureset[FEATURESET_e7d];
>> +        break;
> By not clearing eax and ebx (not sure about ecx) here you would
> again expose flags to guests without proper whitelisting.
>
>> +    case 0x80000008:
>> +        b &= pv_featureset[FEATURESET_e8b];
>>          break;
> Same here for ecx and edx and perhaps the upper 8 bits of eax.

Both of these would be changes to how these things are currently
handled, whereby a guest gets to see whatever the toolstack managed to
find in said leaf.  I was hoping to put off some of these decisions, but
they probably need making now.  On the PV side they definitely can't be
fully hidden, as these leaves are not maskable.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 16:27           ` Jan Beulich
@ 2016-02-15 19:07             ` Andrew Cooper
  2016-02-16  9:54               ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-15 19:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 15/02/16 16:27, Jan Beulich wrote:
>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>>>> nr_entries)
>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>  
>>>>>> +    deps = {
>>>>>> +        XSAVE:
>>>>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>> +
>>>>>> +        AVX:
>>>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>>>> I continue to question whether XOP in particular, but perhaps also the
>>>>> others here except maybe AVX2, really depends on AVX, and
>>>>> not just on XSAVE.
>>>> I am sure we have had this argument before.
>>> Indeed, hence the "I continue to ...".
>>>
>>>> All VEX encoded SIMD instructions (including XOP which is listed in the
>>>> same category by AMD) are specified to act on 256bit AVX state, and
>>>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>>>> instructions encoding %xmm registers, which explicitly zero the upper
>>>> 128bits of the associated %ymm register.
>>>>
>>>> This is very clearly a dependency on AVX, even if it isn't written in
>>>> one clear concise statement in the instruction manuals.
>>> The question is what AVX actually means: To me it's an instruction set
>>> extension, not one of machine state. The machine state extension to
>>> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
>>> intentionally say "to me", because I can certainly see why this may be
>>> viewed differently.)
>> The AVX feature bit is also the indicator that the AVX bit may be set in
>> XCR0, which links it to machine state and not just instruction sets.
> No, it's not (and again - there's no bit named AVX in XCR0):

(and again - Intel disagree) The Intel manual uniformly refers to
XCR0.AVX (bit 2).  AMD uses XCR0.YMM.

>  Which
> bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX,
> which is - surprise, surprise - the so called XSTATE leaf (i.e. related
> to XSAVE, and not to AVX).

In hardware, all these bits are almost certainly hardwired on or off. 
Part of the issue here is that with virtualisation, there are a whole
lot more combinations than exist on real hardware.

Whether right or wrong, the guests' values for
CPUID[0xd].EDX:EAX are now generated from the guest featureset.  This is
based on my assumption that that's how real hardware actually works, and
prevents the possibility of them getting out of sync.
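
Generating the XSTATE leaf from the featureset, as described above, could be
sketched like this (a hypothetical illustration of deriving the valid-XCR0
mask from feature flags; the bit positions follow the SDM's XCR0 layout, but
the function and flag names are invented, not Xen's code):

```python
# XCR0 state-component bit positions per the SDM.
X87, SSE, YMM, BNDREGS, BNDCSR = 0, 1, 2, 3, 4

def xstate_mask(has_xsave, has_avx, has_mpx):
    """Sketch: valid-XCR0 bits (CPUID[0xd].EDX:EAX) from feature flags."""
    if not has_xsave:
        return 0
    mask = (1 << X87) | (1 << SSE)   # always valid once XSAVE exists
    if has_avx:
        mask |= 1 << YMM
    if has_mpx:
        mask |= (1 << BNDREGS) | (1 << BNDCSR)
    return mask
```

Under this model, a guest given XSAVE but with AVX hidden gets a mask with
the YMM bit clear, so an XSETBV attempting to enable YMM state would fault,
keeping the XSTATE leaf and the featureset in sync.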

>
>>>  Note how you yourself have recourse to XCR0,
>>> which is very clearly tied to XSAVE and not AVX, above (and note also
>>> that there's nothing called AVX to be enabled in XCR0, it's YMM that
>>> you talk about).
>> The key point is this.  If I choose to enable XSAVE and disable AVX for
>> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
>> therefore shouldn't see the features.
> Are you sure? Did you try?

Yes

void test_main(void)
{
    printk("AVX Testing\n");

    write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT |
              X86_CR4_OSXSAVE);

    asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    xtf_success();
}

with disassembly:

00104000 <test_main>:
  104000:       48 83 ec 08             sub    $0x8,%rsp
  104004:       bf b0 4c 10 00          mov    $0x104cb0,%edi
  104009:       31 c0                   xor    %eax,%eax
  10400b:       e8 b0 c2 ff ff          callq  1002c0 <printk>
  104010:       0f 20 e0                mov    %cr4,%rax
  104013:       48 0d 00 06 04 00       or     $0x40600,%rax
  104019:       0f 22 e0                mov    %rax,%cr4
  10401c:       31 c9                   xor    %ecx,%ecx
  10401e:       31 d2                   xor    %edx,%edx
  104020:       b8 07 00 00 00          mov    $0x7,%eax
  104025:       0f 01 d1                xsetbv
  104028:       c4 e2 f1 98 d0          vfmadd132pd %xmm0,%xmm1,%xmm2
  10402d:       b0 03                   mov    $0x3,%al
  10402f:       0f 01 d1                xsetbv
  104032:       c4 e2 f1 98 d0          vfmadd132pd %xmm0,%xmm1,%xmm2
  104037:       48 83 c4 08             add    $0x8,%rsp
  10403b:       e9 60 d6 ff ff          jmpq   1016a0 <xtf_success>

causes a #UD exception on the second FMA instruction only:

(d3) [  357.071427] --- Xen Test Framework ---
(d3) [  357.094556] Environment: HVM 64bit (Long mode 4 levels)
(d3) [  357.094709] AVX Testing
(d3) [  357.094867] ******************************
(d3) [  357.095050] PANIC: Unhandled exception: vec 6 at 0008:0000000000104032
(d3) [  357.095160] ******************************


>  Those instructions may not be very
> useful without other AVX instructions, but I don't think there's
> any coupling. And if I, as an example, look at one of the
> 3-operand vfmadd instructions, I also don't see any #UD
> resulting from the AVX bit being clear (as opposed to various of
> the AVX-512 extensions, which clearly document that AVX512F
> needs to always be checked). It's only in the textual description
> of e.g. FMA or AVX2 detection where such a connection is being
> made.
>
> In any event, please don't misunderstand my bringing up of this
> as objection to the way you handle things. I merely wanted to
> point out again that this is not the only way the (often self-
> contradictory) SDM can be understood.

The fact that there is ambiguity means that we must be even more careful
when making changes like this.  After all, if there are multiple ways to
interpret the text, you can probably bet that different software takes
contrary interpretations.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-15 19:07             ` Andrew Cooper
@ 2016-02-16  9:54               ` Jan Beulich
  2016-02-17 10:25                 ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-16  9:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 16:27, Jan Beulich wrote:
>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>>>>> nr_entries)
>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>  
>>>>>>> +    deps = {
>>>>>>> +        XSAVE:
>>>>>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>> +
>>>>>>> +        AVX:
>>>>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>>>>> I continue to question whether XOP in particular, but perhaps also the
>>>>>> others here except maybe AVX2, really depends on AVX, and
>>>>>> not just on XSAVE.
>>>>> I am sure we have had this argument before.
>>>> Indeed, hence the "I continue to ...".
>>>>
>>>>> All VEX encoded SIMD instructions (including XOP which is listed in the
>>>>> same category by AMD) are specified to act on 256bit AVX state, and
>>>>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>>>>> instructions encoding %xmm registers, which explicitly zero the upper
>>>>> 128bits of the associated %ymm register.
>>>>>
>>>>> This is very clearly a dependency on AVX, even if it isn't written in
>>>>> one clear concise statement in the instruction manuals.
>>>> The question is what AVX actually means: To me it's an instruction set
>>>> extension, not one of machine state. The machine state extension to
>>>> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
>>>> intentionally say "to me", because I can certainly see why this may be
>>>> viewed differently.)
>>> The AVX feature bit is also the indicator that the AVX bit may be set in
>>> XCR0, which links it to machine state and not just instruction sets.
>> No, it's not (and again - there's no bit named AVX in XCR0):
> 
> (and again - Intel disagree) The Intel manual uniformly refers to
> XCR0.AVX (bit 2).  AMD uses XCR0.YMM.

Interesting. I'm sure early documentation was different, which
would be in line with the bits named YMM and ZMM etc in our
code base. But anyway.

>>  Which
>> bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX,
>> which is - surprise, surprise - the so called XSTATE leaf (i.e. related
>> to XSAVE, and not to AVX).
> 
> In hardware, all these bits are almost certainly hardwired on or off. 
> Part of the issue here is that with virtualisation, there are a whole
> lot more combinations than exist on real hardware.
> 
> Whether right or wrong, the guests' values for
> CPUID[0xd].EDX:EAX are now generated from the guest featureset.  This is
> based on my assumption that that's how real hardware actually works, and
> prevents the possibility of them getting out of sync.

Which I agree with. In this context, however, I wonder how you
mean to do what you say above. I don't recall having seen any
related code, but of course this may well be in one of the patches
I didn't yet get around to looking at.

>>>>  Note how you yourself have recourse to XCR0,
>>>> which is very clearly tied to XSAVE and not AVX, above (and note also
>>>> that there's nothing called AVX to be enabled in XCR0, it's YMM that
>>>> you talk about).
>>> The key point is this.  If I choose to enable XSAVE and disable AVX for
>>> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
>>> therefore shouldn't see the features.
>> Are you sure? Did you try?
> 
> Yes
> 
> void test_main(void)
> {
>     printk("AVX Testing\n");
> 
>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
> 
>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
> 
>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

Here you clear the bit in XCR0, which wasn't the question. The
question was whether VFMADD... would fault when the CPUID
AVX bit is clear.

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-15 17:12     ` Andrew Cooper
@ 2016-02-16 10:06       ` Jan Beulich
  2016-02-17 10:43         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-16 10:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 15:43, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>          /* Fix up VLAPIC details. */
>>>          *ebx &= 0x00FFFFFFu;
>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>> +
>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>> Looks like I've overlooked an issue in patch 11, which becomes
>> apparent here: How can you use a domain-independent featureset
>> here, when features vary between HAP and shadow mode guests?
>> I.e. in the earlier patch I suppose you need to calculate two
>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>> !hvm_funcs.hap_supported.
> 
> Their use here is a halfway house between nothing and the planned full
> per-domain policies.
> 
> In this case, the "don't expose $X to a non-hap domain" checks have been
> retained, to cover the difference.

Well, doesn't it seem to you that doing only half of the HAP/shadow
separation is odd/confusing? I.e. could I talk you into not doing any
such separation (enforcing the non-HAP overrides as is done now)
or finishing the separation to become visible/usable here?

>>> @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>          break;
>>>  
>>>      case 0x80000001:
>>> +        *ecx &= hvm_featureset[FEATURESET_e1c];
>>> +        *edx &= hvm_featureset[FEATURESET_e1d];
>> Looking at the uses here, wouldn't it be better (less ambiguous) for
>> these to be named FEATURESET_x1c and FEATURESET_x1d
>> respectively, since x is not a hex digit, but e is?
> 
> I originally chose e because these are commonly known as the extended
> cpuid leaves.  'e' being a hex digit is why the number is specified as
> an upper case hex number, as shown with FEATURESET_Da1.
> 
> I suppose x could work here, but I don't see it being less ambiguous.
> (Perhaps I am clouded by having this syntax firmly embedded in my
> expectation).

Okay then, I just wanted to point out that the current naming may
be confusing to others (and to a degree it is to me).

>>>      switch ( leaf )
>>>      {
>>>      case 0x00000001:
>>> -        /* Modify Feature Information. */
>>> -        if ( !cpu_has_sep )
>>> -            __clear_bit(X86_FEATURE_SEP, &d);
>>> -        __clear_bit(X86_FEATURE_DS, &d);
>>> -        __clear_bit(X86_FEATURE_ACC, &d);
>>> -        __clear_bit(X86_FEATURE_PBE, &d);
>>> -        if ( is_pvh_domain(currd) )
>>> -            __clear_bit(X86_FEATURE_MTRR, &d);
>>> -
>>> -        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
>>> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
>>> -        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
>>> -        __clear_bit(X86_FEATURE_VMXE % 32, &c);
>>> -        __clear_bit(X86_FEATURE_SMXE % 32, &c);
>>> -        __clear_bit(X86_FEATURE_TM2 % 32, &c);
>>> +        c &= pv_featureset[FEATURESET_1c];
>>> +        d &= pv_featureset[FEATURESET_1d];
>>> +
>>>          if ( is_pv_32bit_domain(currd) )
>>> -            __clear_bit(X86_FEATURE_CX16 % 32, &c);
>>> -        __clear_bit(X86_FEATURE_XTPR % 32, &c);
>>> -        __clear_bit(X86_FEATURE_PDCM % 32, &c);
>>> -        __clear_bit(X86_FEATURE_PCID % 32, &c);
>>> -        __clear_bit(X86_FEATURE_DCA % 32, &c);
>>> -        if ( !cpu_has_xsave )
>>> -        {
>>> -            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
>>> -            __clear_bit(X86_FEATURE_AVX % 32, &c);
>>> -        }
>>> -        if ( !cpu_has_apic )
>>> -           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
>>> -        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
>>> +            c &= ~cpufeat_mask(X86_FEATURE_CX16);
>> Shouldn't this be taken care of by clearing LM and then applying
>> your dependencies correction? Or is that meant to only get
>> enforced later? Is it maybe still worth having both pv64_featureset[]
>> and pv32_featureset[]?
> 
> Again, this is just used as a halfway house.  Longterm, I plan to read
> the featureset straight from the per-domain policy, which won't be
> stored as an adhoc array.

Okay, that's certainly fine in this case I suppose.

>>> +    case 0x80000007:
>>> +        d &= pv_featureset[FEATURESET_e7d];
>>> +        break;
>> By not clearing eax and ebx (not sure about ecx) here you would
>> again expose flags to guests without proper whitelisting.
>>
>>> +    case 0x80000008:
>>> +        b &= pv_featureset[FEATURESET_e8b];
>>>          break;
>> Same here for ecx and edx and perhaps the upper 8 bits of eax.
> 
> Both of these would be changes to how these things are currently
> handled, whereby a guest gets to see whatever the toolstack managed to
> find in said leaf.  I was hoping to put off some of these decisions, but
> they probably need making now.  On the PV side they definitely can't be
> fully hidden, as these leaves are not maskable.

Right, but many are meaningful primarily to kernels, and there we
can hide them.

Since you're switching from blacklisting to whitelisting here, I also
think we need a default label alongside the "unsupported" one.
Similarly I would think XSTATE sub-leaves beyond 63 need hiding
now.
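
A default label of the kind being asked for might be shaped as in the
sketch below. This is illustrative only: the leaf numbers are real, but
the featureset words are arbitrary stand-ins for the real pv_featureset[]
entries.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in featureset words; the real values come from pv_featureset[]. */
static const uint32_t pv_e7d = 0x00000100;
static const uint32_t pv_e8b = 0x00000001;

static void level_leaf(uint32_t leaf, uint32_t *b, uint32_t *c, uint32_t *d)
{
    switch (leaf)
    {
    case 0x80000007:
        *d &= pv_e7d;
        break;

    case 0x80000008:
        *b &= pv_e8b;
        break;

    default:            /* unrecognised leaf: whitelist nothing */
        *b = *c = *d = 0;
        break;
    }
}

/* An unknown leaf should come back fully cleared. */
static int demo(void)
{
    uint32_t b = ~0u, c = ~0u, d = ~0u;

    level_leaf(0x8000001f, &b, &c, &d);
    return b == 0 && c == 0 && d == 0;
}
```

With whitelisting, the default case fails safe: any leaf Xen doesn't
explicitly know how to level hands back zeroes rather than leaking host
values to the guest.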

Jan

* Re: [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-02-05 13:42 ` [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
@ 2016-02-16 14:10   ` Jan Beulich
  2016-02-17 10:45     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-16 14:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
> commandline-provided masks would take effect in Xen's view of the features.
> 
> As the masks got applied after the query for features, the redundant call to
> generic_identify() would clobber the pre-masking feature information with the
> post-masking information.
> 
> Move the set_cpumask() calls into c_early_init() so their effects take place
> before the main query for features in generic_identify().
> 
> The cpuid_mask_* command line parameters now limit the entire system, a
> feature XenServer was relying on for testing purposes.

And I continue to view this as a step backwards, and hence can't
really approve of this change. And XenServer relying on this for
whatever purposes is hardly a good argument here.

Jan

* Re: [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching
  2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
@ 2016-02-16 14:15   ` Jan Beulich
  2016-02-17  8:15     ` Jan Beulich
  2016-02-17 19:06   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-16 14:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -574,6 +574,34 @@ void microcode_set_module(unsigned int);
>  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
>  int microcode_resume_cpu(unsigned int cpu);
>  
> +#define LCAP_faulting (1U << 0)
> +#define LCAP_1cd      (3U << 1)
> +#define LCAP_e1cd     (3U << 3)
> +#define LCAP_Da1      (1U << 5)
> +#define LCAP_6c       (1U << 6)
> +#define LCAP_7ab0     (3U << 7)

I guess the cases where the mask has two set bits are those where two
CPUID output registers are being controlled, but I don't see
what use that pairing is going to be. But with the patch
supposedly going to make sense only in the context of the
following ones, I'll see (and I'd presumably be able to ack this
one then also only when having seen the others).

Jan

* Re: [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup
  2016-02-05 13:42 ` [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
@ 2016-02-17  7:40   ` Jan Beulich
  2016-02-17 10:56     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  7:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -126,126 +133,189 @@ static const struct cpuidmask *__init noinline get_cpuidmask(const char *opt)
>  }
>  
>  /*
> + * Sets caps in expected_levelling_cap, probes for the specified mask MSR, and
> + * set caps in levelling_caps if it is found.  Processors prior to Fam 10h
> + * required a 32-bit password for masking MSRs.  Reads the default value into
> + * msr_val.
> + */
> +static void __init __probe_mask_msr(unsigned int msr, uint64_t caps,

Please reduce the leading underscores to at most one.

> +                                    uint64_t *msr_val)
> +{
> +	unsigned int hi, lo;
> +
> +        expected_levelling_cap |= caps;

Indentation.

> +	if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
> +	    (wrmsr_amd_safe(msr, lo, hi) == 0))
> +		levelling_caps |= caps;
> +
> +	*msr_val = ((uint64_t)hi << 32) | lo;
> +}

Why can't this function, currently returning void, simply return the
value read?
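
For reference, the suggested refactoring could look like the sketch
below. This is a hedged illustration only: the MSR accessors and the two
capability words are stubbed stand-ins for Xen's real
rdmsr_amd_safe()/wrmsr_amd_safe() and globals, with faked MSR contents,
so that only the shape of the return-value interface is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the real Xen helpers and state. */
static uint64_t fake_msr = 0x12345678ull;   /* pretend MSR contents */
static uint64_t expected_levelling_cap, levelling_caps;

static int rdmsr_amd_safe(unsigned int msr, unsigned int *lo, unsigned int *hi)
{
    (void)msr;
    *lo = (unsigned int)fake_msr;
    *hi = (unsigned int)(fake_msr >> 32);
    return 0;                               /* success */
}

static int wrmsr_amd_safe(unsigned int msr, unsigned int lo, unsigned int hi)
{
    (void)msr; (void)lo; (void)hi;
    return 0;                               /* success */
}

/*
 * The shape being suggested: record the capability, probe the MSR, and
 * simply return the value read instead of using an out-parameter.
 */
static uint64_t probe_mask_msr(unsigned int msr, uint64_t caps)
{
    unsigned int hi = 0, lo = 0;

    expected_levelling_cap |= caps;

    if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
        (wrmsr_amd_safe(msr, lo, hi) == 0))
        levelling_caps |= caps;

    return ((uint64_t)hi << 32) | lo;
}
```

The caller then simply writes `msr_val = probe_mask_msr(msr, caps);`,
which also removes one pointer parameter.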

> +/*
> + * Context switch levelling state to the next domain.  A parameter of NULL is
> + * used to context switch to the default host state, and is used by the BSP/AP
> + * startup code.
> + */
> +static void amd_ctxt_switch_levelling(const struct domain *nextd)
> +{
> +	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
> +	const struct cpuidmasks *masks = &cpuidmask_defaults;

May I suggest naming this "defaults", to aid clarity of the code
below?

> +#define LAZY(cap, msr, field)						\
> +	({								\
> +		if (((levelling_caps & cap) == cap) &&			\
> +		    (these_masks->field != masks->field))		\

Perhaps worth swapping the operands of the && and wrapping
the then left side of it in unlikely(), to hopefully make the most
desirable route through this function a branch-less one?
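
The reordering being suggested might look like the following sketch,
with Xen's per-cpu state and wrmsrl() replaced by illustrative stubs so
the lazy-write behaviour can be demonstrated:

```c
#include <assert.h>
#include <stdint.h>

#define unlikely(x) __builtin_expect(!!(x), 0)
#define LCAP_1cd    (3u << 1)

struct cpuidmasks { uint64_t _1cd; };

/* Stand-ins for Xen's per-cpu state and MSR write primitive. */
static struct cpuidmasks these_masks;
static unsigned int msr_writes;

static void wrmsrl(unsigned int msr, uint64_t val)
{
    (void)msr; (void)val;
    msr_writes++;
}

static uint64_t levelling_caps = LCAP_1cd;

/*
 * Reordered as suggested: the "masks differ" test comes first and is
 * wrapped in unlikely(), so the expected case (no change needed, no MSR
 * write) is the fall-through path.
 */
#define LAZY(cap, msr, field)                                           \
    ({                                                                  \
        if (unlikely(these_masks.field != masks->field) &&              \
            ((levelling_caps & (cap)) == (cap)))                        \
        {                                                               \
            wrmsrl(msr, masks->field);                                  \
            these_masks.field = masks->field;                           \
        }                                                               \
    })

static void ctxt_switch_levelling(const struct cpuidmasks *masks)
{
    LAZY(LCAP_1cd, 0xc0011004, _1cd);
}

#undef LAZY

/* Two switches to the same target should result in exactly one MSR write. */
static unsigned int demo(void)
{
    const struct cpuidmasks target = { ._1cd = 0x1234 };

    ctxt_switch_levelling(&target);
    ctxt_switch_levelling(&target);
    return msr_writes;
}
```

With the (usually false) inequality test first and marked unlikely(),
the common no-change path stays cheap and avoids the MSR access entirely.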

> +static void __init noinline amd_init_levelling(void)
>  {
> -	static unsigned int feat_ecx, feat_edx;
> -	static unsigned int extfeat_ecx, extfeat_edx;
> -	static unsigned int l7s0_eax, l7s0_ebx;
> -	static unsigned int thermal_ecx;
> -	static bool_t skip_feat, skip_extfeat;
> -	static bool_t skip_l7s0_eax_ebx, skip_thermal_ecx;
> -	static enum { not_parsed, no_mask, set_mask } status;
> -	unsigned int eax, ebx, ecx, edx;
> -
> -	if (status == no_mask)
> -		return;
> +	const struct cpuidmask *m = NULL;
>  
> -	if (status == set_mask)
> -		goto setmask;
> +	probe_masking_msrs();
>  
> -	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
> -	status = no_mask;
> +	if (*opt_famrev != '\0') {
> +		m = get_cpuidmask(opt_famrev);
>  
> -	/* Fam11 doesn't support masking at all. */
> -	if (c->x86 == 0x11)
> -		return;
> +		if (!m)
> +			printk("Invalid processor string: %s\n", opt_famrev);
> +	}
>  
> -	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
> -	      opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
> -	      opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx &
> -	      opt_cpuid_mask_thermal_ecx)) {
> -		feat_ecx = opt_cpuid_mask_ecx;
> -		feat_edx = opt_cpuid_mask_edx;
> -		extfeat_ecx = opt_cpuid_mask_ext_ecx;
> -		extfeat_edx = opt_cpuid_mask_ext_edx;
> -		l7s0_eax = opt_cpuid_mask_l7s0_eax;
> -		l7s0_ebx = opt_cpuid_mask_l7s0_ebx;
> -		thermal_ecx = opt_cpuid_mask_thermal_ecx;
> -	} else if (*opt_famrev == '\0') {
> -		return;
> -	} else {
> -		const struct cpuidmask *m = get_cpuidmask(opt_famrev);
> +	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
> +		uint32_t ecx, edx, tmp;
>  
> -		if (!m) {
> -			printk("Invalid processor string: %s\n", opt_famrev);
> -			printk("CPUID will not be masked\n");
> -			return;
> +		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);

Didn't you collect raw CPUID output already?

> +		if(~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx)) {
> +			ecx &= opt_cpuid_mask_ecx;
> +			edx &= opt_cpuid_mask_edx;
> +		} else if ( m ) {

Partial Xen coding style slipped in here.

Jan

* Re: [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup
  2016-02-05 13:42 ` [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
@ 2016-02-17  7:57   ` Jan Beulich
  2016-02-17 10:59     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  7:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -18,11 +18,18 @@
>  
>  #define select_idle_routine(x) ((void)0)
>  
> -static unsigned int probe_intel_cpuid_faulting(void)
> +static bool_t __init probe_intel_cpuid_faulting(void)
>  {
>  	uint64_t x;
> -	return !rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) &&
> -		(x & MSR_PLATFORM_INFO_CPUID_FAULTING);
> +
> +	if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
> +	     !(x & MSR_PLATFORM_INFO_CPUID_FAULTING) )
> +		return 0;

Partial Xen coding style again.

> @@ -44,41 +51,46 @@ void set_cpuid_faulting(bool_t enable)
>  }
>  
>  /*
> - * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
> - * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,
> - * edx = 0xBFEBFBFF when executing CPUID.EAX = 1 normally. If you want to
> - * 'rev down' to E8400, you can set these values in these Xen boot parameters.
> + * Set caps in expected_levelling_cap, probe a specific masking MSR, and set
> + * caps in levelling_caps if it is found, or clobber the MSR index if missing.
> + * If present, reads the default value into msr_val.
>   */
> -static void set_cpuidmask(const struct cpuinfo_x86 *c)
> +static void __init __probe_mask_msr(unsigned int *msr, uint64_t caps,
> +				    uint64_t *msr_val)
>  {
> -	static unsigned int msr_basic, msr_ext, msr_xsave;
> -	static enum { not_parsed, no_mask, set_mask } status;
> -	u64 msr_val;
> +	uint64_t val;
>  
> -	if (status == no_mask)
> -		return;
> +	expected_levelling_cap |= caps;
>  
> -	if (status == set_mask)
> -		goto setmask;
> +	if (rdmsr_safe(*msr, val) || wrmsr_safe(*msr, val))
> +		*msr = 0;
> +	else
> +	{
> +		levelling_caps |= caps;
> +		*msr_val = val;
> +	}
> +}

Same as for the AMD side: Perhaps neater if the function returned
the MSR value? (Also again partial Xen coding style here.)

> +/* Indicies of the masking MSRs, or 0 if unavailable. */
> +static unsigned int __read_mostly msr_basic, msr_ext, msr_xsave;

I think this way __read_mostly applies only to msr_basic, which I
don't think is what you want. Also I think you mean "indices" or
"indexes".

> +static void __init probe_masking_msrs(void)
> +{
> +	const struct cpuinfo_x86 *c = &boot_cpu_data;
> +	unsigned int exp_msr_basic = 0, exp_msr_ext = 0, exp_msr_xsave = 0;
>  
>  	/* Only family 6 supports this feature. */
> -	if (c->x86 != 6) {
> -		printk("No CPUID feature masking support available\n");
> +	if (c->x86 != 6)
>  		return;
> -	}
>  
>  	switch (c->x86_model) {
>  	case 0x17: /* Yorkfield, Wolfdale, Penryn, Harpertown(DP) */
>  	case 0x1d: /* Dunnington(MP) */
> -		msr_basic = MSR_INTEL_MASK_V1_CPUID1;
> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V1_CPUID1;
>  		break;
>  
>  	case 0x1a: /* Bloomfield, Nehalem-EP(Gainestown) */
> @@ -88,71 +100,126 @@ static void set_cpuidmask(const struct cpuinfo_x86 *c)
>  	case 0x2c: /* Gulftown, Westmere-EP */
>  	case 0x2e: /* Nehalem-EX(Beckton) */
>  	case 0x2f: /* Westmere-EX */
> -		msr_basic = MSR_INTEL_MASK_V2_CPUID1;
> -		msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V2_CPUID1;
> +		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
>  		break;
>  
>  	case 0x2a: /* SandyBridge */
>  	case 0x2d: /* SandyBridge-E, SandyBridge-EN, SandyBridge-EP */
> -		msr_basic = MSR_INTEL_MASK_V3_CPUID1;
> -		msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
> -		msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V3_CPUID1;
> +		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
> +		exp_msr_xsave = msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
>  		break;
>  	}

Instead of all these changes, and instead of the variable needing
initializers, you could simply initialize all three ext_msr_* right after
the switch().

> +static void intel_ctxt_switch_levelling(const struct domain *nextd)
> +{
> +	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
> +	const struct cpuidmasks *masks = &cpuidmask_defaults;
> +
> +#define LAZY(msr, field)						\
> +	({								\
> +		if (msr && (these_masks->field != masks->field))	\
> +		{							\
> +			wrmsrl(msr, masks->field);			\
> +			these_masks->field = masks->field;		\
> +		}							\
> +	})
> +
> +	LAZY(msr_basic, _1cd);
> +	LAZY(msr_ext,   e1cd);
> +	LAZY(msr_xsave, Da1);

Please either use token concatenation inside the macro body to
eliminate the redundant msr_ prefixes here, or properly
parenthesize the uses of "msr" inside the macro body.
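
The token-concatenation variant could be shaped as below. The masking
MSR indices, per-cpu masks, and wrmsrl() are arbitrary stand-ins for
illustration:

```c
#include <assert.h>
#include <stdint.h>

struct cpuidmasks { uint64_t _1cd, e1cd, Da1; };

static struct cpuidmasks these_masks;
static const struct cpuidmasks defaults = { 0x10, 0x20, 0x30 };

/* Stand-in MSR indices; 0 means "this masking MSR is unavailable". */
static unsigned int msr_basic = 0x478, msr_ext = 0x480, msr_xsave = 0;

static unsigned int msr_writes;
static void wrmsrl(unsigned int msr, uint64_t val)
{
    (void)msr; (void)val;
    msr_writes++;
}

/*
 * Token concatenation as suggested: the caller names just the suffix,
 * and msr_##suffix resolves to the variable inside the macro, so the
 * MSR argument cannot be mis-expanded or need extra parenthesisation.
 */
#define LAZY(suffix, field)                                             \
    ({                                                                  \
        if (msr_##suffix && (these_masks.field != defaults.field))      \
        {                                                               \
            wrmsrl(msr_##suffix, defaults.field);                       \
            these_masks.field = defaults.field;                         \
        }                                                               \
    })

static unsigned int demo(void)
{
    LAZY(basic, _1cd);
    LAZY(ext,   e1cd);
    LAZY(xsave, Da1);   /* msr_xsave == 0, so this one is skipped */
    return msr_writes;
}

#undef LAZY
```

The unavailable-MSR case (index 0) falls out naturally: that field is
never written, matching the "or 0 if unavailable" convention.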

> +	if (opt_cpu_info) {
> +		printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
> +		printk(XENLOG_INFO
> +		       "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
> +		       "e1c 0x%08x, Da1 0x%08x\n",
> +		       (uint32_t)(cpuidmask_defaults._1cd >> 32),
> +		       (uint32_t)cpuidmask_defaults._1cd,
> +		       (uint32_t)(cpuidmask_defaults.e1cd >> 32),
> +		       (uint32_t)cpuidmask_defaults.e1cd,
> +		       (uint32_t)cpuidmask_defaults.Da1);

Could I convince you to make this second printk() dependent
upon there not being CPUID faulting support?

> @@ -190,22 +257,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
>  		paddr_bits = 36;
>  
> -	if (c == &boot_cpu_data && c->x86 == 6) {
> -		if (probe_intel_cpuid_faulting())
> -			__set_bit(X86_FEATURE_CPUID_FAULTING,
> -				  c->x86_capability);
> -	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
> -		BUG_ON(!probe_intel_cpuid_faulting());
> -		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
> -	}
> +	if (c == &boot_cpu_data)
> +		intel_init_levelling();
> +
> +	if (test_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability))
> +            __set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);

Mixing tabs and spaces for indentation.

Jan

* Re: [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch()
  2016-02-05 13:42 ` [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
@ 2016-02-17  8:06   ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> A single ctxt_switch_levelling() function pointer is provided
> (defaulting to an empty nop), which is overridden in the appropriate
> $VENDOR_init_levelling().
> 
> set_cpuid_faulting() is made private and included within
> intel_ctxt_switch_levelling().
> 
> One functional change is that the faulting configuration is no longer
> special cased for dom0.  There was never any need to, and it will cause dom0 to
> observe the same information through native and enlightened cpuid.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains
  2016-02-05 13:42 ` [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
@ 2016-02-17  8:13   ` Jan Beulich
  2016-02-17 11:03     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -208,7 +208,9 @@ static void __init noinline probe_masking_msrs(void)
>  static void amd_ctxt_switch_levelling(const struct domain *nextd)
>  {
>  	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
> -	const struct cpuidmasks *masks = &cpuidmask_defaults;
> +	const struct cpuidmasks *masks =
> +            (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
> +            ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;

Mixing tabs and spaces for indentation.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -574,6 +574,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>              goto fail;
>          clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
>  
> +        d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
> +        if ( !d->arch.pv_domain.cpuidmasks )
> +            goto fail;
> +        *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;

Along the lines of not masking features for the hypervisor's own use
(see the respective comment on the earlier patch) I think this patch,
here or in domain_build.c, should except Dom0 from having the
default masking applied. This shouldn't, however, extend to CPUID
faulting. (Perhaps this rather belongs here so that the non-Dom0
hardware domain case can also be taken care of.)

Jan

* Re: [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching
  2016-02-16 14:15   ` Jan Beulich
@ 2016-02-17  8:15     ` Jan Beulich
  2016-02-17 10:46       ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 16.02.16 at 15:15, <JBeulich@suse.com> wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/include/asm-x86/processor.h
>> +++ b/xen/include/asm-x86/processor.h
>> @@ -574,6 +574,34 @@ void microcode_set_module(unsigned int);
>>  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
>>  int microcode_resume_cpu(unsigned int cpu);
>>  
>> +#define LCAP_faulting (1U << 0)
>> +#define LCAP_1cd      (3U << 1)
>> +#define LCAP_e1cd     (3U << 3)
>> +#define LCAP_Da1      (1U << 5)
>> +#define LCAP_6c       (1U << 6)
>> +#define LCAP_7ab0     (3U << 7)
> 
> I guess the cases where the mask has two set bits are those where two
> CPUID output registers are being controlled, but I don't see
> what use that pairing is going to be. But with the patch
> supposedly going to make sense only in the context of the
> following ones, I'll see (and I'd presumably be able to ack this
> one then also only when having seen the others).

Having seen patches up to and including 21, I still don't see the
point of using 2-bit masks here.

Jan

* Re: [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy
  2016-02-05 13:42 ` [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
@ 2016-02-17  8:22   ` Jan Beulich
  2016-02-17 12:13     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -87,6 +88,93 @@ static void update_domain_cpuid_info(struct domain *d,
>          d->arch.x86_model = (ctl->eax >> 4) & 0xf;
>          if ( d->arch.x86 >= 0x6 )
>              d->arch.x86_model |= (ctl->eax >> 12) & 0xf0;
> +
> +        if ( is_pv_domain(d) )

For clarity, wouldn't it be reasonable to check the respective
capability flag in all of these conditionals, even if without such
checks what gets set below simply won't ever get used? Even
more, the earlier patch allocating d->arch.pv_domain.cpuidmasks
could skip this allocation if none of the masking capability bits
are set (in which case checking the pointer to be non-NULL would
seem to be the right check here then).

Either way, Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

* Re: [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
  2016-02-05 16:12   ` Wei Liu
@ 2016-02-17  8:30   ` Jan Beulich
  2016-02-17 12:17     ` Andrew Cooper
  1 sibling, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, Tim Deegan, Rob Hoes, Xen-devel, David Scott

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -190,6 +191,71 @@ long arch_do_sysctl(
>          }
>          break;
>  
> +    case XEN_SYSCTL_get_cpu_featureset:
> +    {
> +        const uint32_t *featureset;
> +        unsigned int nr;
> +
> +        /* Request for maximum number of features? */
> +        if ( guest_handle_is_null(sysctl->u.cpu_featureset.features) )
> +        {
> +            sysctl->u.cpu_featureset.nr_features = FSCAPINTS;
> +            if ( __copy_field_to_guest(u_sysctl, sysctl,
> +                                       u.cpu_featureset.nr_features) )
> +                ret = -EFAULT;
> +            break;
> +        }
> +
> +        /* Clip the number of entries. */
> +        nr = sysctl->u.cpu_featureset.nr_features;
> +        if ( nr > FSCAPINTS )
> +            nr = FSCAPINTS;

min() (perhaps even allowing to obviate the comment)?
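
As a sketch of the suggestion (FSCAPINTS is given an arbitrary
illustrative value here; in-tree code would use Xen's own min() macro
rather than a helper):

```c
#include <assert.h>

/* Illustrative value; the real FSCAPINTS is generated at build time. */
#define FSCAPINTS 9

/* Stand-in for Xen's min(); shown as a function for self-containment. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Clamp the caller-supplied entry count to the featureset length. */
static unsigned int clip_nr_features(unsigned int requested)
{
    return min_uint(requested, FSCAPINTS);
}
```

The clamp then reads as a single expression, making the "Clip the number
of entries" comment largely redundant.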

> +        switch ( sysctl->u.cpu_featureset.index )
> +        {
> +        case XEN_SYSCTL_cpu_featureset_raw:
> +            featureset = raw_featureset;
> +            break;
> +
> +        case XEN_SYSCTL_cpu_featureset_host:
> +            featureset = host_featureset;
> +            break;
> +
> +        case XEN_SYSCTL_cpu_featureset_pv:
> +            featureset = pv_featureset;
> +            break;
> +
> +        case XEN_SYSCTL_cpu_featureset_hvm:
> +            featureset = hvm_featureset;
> +            break;
> +
> +        default:
> +            featureset = NULL;
> +            break;
> +        }
> +
> +        /* Bad featureset index? */
> +        if ( !ret && !featureset )
> +            ret = -EINVAL;

Nothing above altered "ret" from its zero value, so the check here
is pointless.

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -766,6 +766,29 @@ struct xen_sysctl_tmem_op {
>  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>  
> +/*
> + * XEN_SYSCTL_get_cpu_featureset (x86 specific)
> + *
> + * Return information about the maximum sets of features which can be offered
> + * to different types of guests.  This is all strictly information as found in
> + * `cpuid` feature leaves with no synthetic additions.
> + */

The reference to guests in the comment conflicts with the raw and
host types below.

> +struct xen_sysctl_cpu_featureset {
> +#define XEN_SYSCTL_cpu_featureset_raw      0
> +#define XEN_SYSCTL_cpu_featureset_host     1
> +#define XEN_SYSCTL_cpu_featureset_pv       2
> +#define XEN_SYSCTL_cpu_featureset_hvm      3
> +    uint32_t index;       /* IN: Which featureset to query? */
> +    uint32_t nr_features; /* IN/OUT: Number of entries in/written to
> +                           * 'features', or the maximum number of features if
> +                           * the guest handle is NULL.  NB. All featuresets
> +                           * come from the same numberspace, so have the same
> +                           * maximum length. */
> +    XEN_GUEST_HANDLE_64(uint32) features; /* OUT: */

Stray colon.

Jan

* Re: [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork
  2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
  2016-02-05 16:13   ` Wei Liu
@ 2016-02-17  8:55   ` Jan Beulich
  2016-02-17 13:03     ` Andrew Cooper
  1 sibling, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  8:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel

>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -467,11 +420,8 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
>          regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
>          break;
>      case 1: /* leaf 1 */
> -        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
> -        if ( !info->hvm )
> -            regs[0] &= ~XSAVES;
> -        regs[2] &= info->xfeature_mask;
> -        regs[3] = 0;
> +        regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
> +        regs[1] = regs[2] = regs[3] = 0;
>          break;

This change (to regs[2] handling) reminds me of an apparent issue
in the earlier dependencies patch, which I realized only after having
sent the reply, and then forgot to send another reply for: Shouldn't
features requiring certain XSAVE states depend on that state's
availability instead of just XSAVE? That would make the above use
proper masking for both regs[2] and regs[3] (and also for regs[0]
and regs[3] in the sub-leaf 0 case).

Jan

* Re: [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling
  2016-02-08 17:26 ` [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling Andrew Cooper
@ 2016-02-17  9:02   ` Jan Beulich
  2016-02-17 13:06     ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17  9:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 08.02.16 at 18:26, <andrew.cooper3@citrix.com> wrote:

This fiddles with behavior on AMD only, yet it's not obvious why this
couldn't be done in vendor-independent code (it should, afaict, be
benign for Intel).

Jan

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-16  9:54               ` Jan Beulich
@ 2016-02-17 10:25                 ` Andrew Cooper
  2016-02-17 10:42                   ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 16/02/16 09:54, Jan Beulich wrote:
>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, 
>>>>>> nr_entries)
>>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>>  
>>>>>>>> +    deps = {
>>>>>>>> +        XSAVE:
>>>>>>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>>> +
>>>>>>>> +        AVX:
>>>>>>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>>>>>>> I continue to question whether namely XOP, but perhaps also the
>>>>>>> others here except maybe AVX2, really is depending on AVX, and
>>>>>>> not just on XSAVE.
>>>>>> I am sure we have had this argument before.
>>>>> Indeed, hence the "I continue to ...".
>>>>>
>>>>>> All VEX encoded SIMD instructions (including XOP which is listed in the
>>>>>> same category by AMD) are specified to act on 256bit AVX state, and
>>>>>> require AVX enabled in xcr0 to avoid #UD faults.  This includes VEX
>>>>>> instructions encoding %xmm registers, which explicitly zero the upper
>>>>>> 128bits of the associated %ymm register.
>>>>>>
>>>>>> This is very clearly a dependency on AVX, even if it isn't written in
>>>>>> one clear concise statement in the instruction manuals.
>>>>> The question is what AVX actually means: To me it's an instruction set
>>>>> extension, not one of machine state. The machine state extension to
>>>>> me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
>>>>> intentionally say "to me", because I can certainly see why this may be
>>>>> viewed differently.)
>>>> The AVX feature bit is also the indicator that the AVX bit may be set in
>>>> XCR0, which links it to machine state and not just instruction sets.
>>> No, it's not (and again - there's no bit named AVX in XCR0):
>> (and again - Intel disagree) The Intel manual uniformly refers to
>> XCR0.AVX (bit 2).  AMD uses XCR0.YMM.
> Interesting. I'm sure early documentation was different, which
> would be in line with the bits named YMM and ZMM etc in our
> code base. But anyway.
>
>>>  Which
>>> bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX,
>>> which is - surprise, surprise - the so called XSTATE leaf (i.e. related
>>> to XSAVE, and not to AVX).
>> In hardware, all these bits are almost certainly hardwired on or off. 
>> Part of the issue here is that with virtualisation, there are a whole
>> lot more combinations than exist on real hardware.
>>
>> Whether right or wrong, the guest values for
>> CPUID[0xd].EDX:EAX are now generated from the guest featureset.  This is
>> based on my assumption that that's how real hardware actually works, and
>> prevents the possibility of them getting out of sync.
> Which I agree with. In this context, however, I wonder how you
> mean to do what you say above. I don't recall having seen any
> related code, but of course this may well be in one of the patches
> I didn't yet get around to look at.
>
>>>>>  Note how you yourself have recourse to XCR0,
>>>>> which is very clearly tied to XSAVE and not AVX, above (and note also
>>>>> that there's nothing called AVX to be enabled in XCR0, it's YMM that
>>>>> you talk about).
>>>> The key point is this.  If I choose to enable XSAVE and disable AVX for
>>>> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
>>>> therefore shouldn't see the features.
>>> Are you sure? Did you try?
>> Yes
>>
>> void test_main(void)
>> {
>>     printk("AVX Testing\n");
>>
>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>
>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>
>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
> Here you clear the bit in XCR0, which wasn't the question. The
> question was whether VFMADD... would fault when the CPUID
> AVX bit is clear.

How will the pipeline know whether the guest was offered the AVX flag in
cpuid?

Hiding feature bits does not prevent the functionality from working. 
i.e. 1GB superpages don't stop working if you hide the feature bit.  No
amount of levelling can stop a guest from manually probing for
features/instruction sets.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features
  2016-02-17 10:25                 ` Andrew Cooper
@ 2016-02-17 10:42                   ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 10:42 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 11:25, <andrew.cooper3@citrix.com> wrote:
> On 16/02/16 09:54, Jan Beulich wrote:
>>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>>> The key point is this.  If I choose to enable XSAVE and disable AVX for
>>>>> a domain, that domain is unable to use FMA/FMA4/F16C instructions.  It
>>>>> therefore shouldn't see the features.
>>>> Are you sure? Did you try?
>>> Yes
>>>
>>> void test_main(void)
>>> {
>>>     printk("AVX Testing\n");
>>>
>>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>> Here you clear the bit in XCR0, which wasn't the question. The
>> question was whether VFMADD... would fault when the CPUID
>> AVX bit is clear.
> 
> How will the pipeline know whether the guest was offered the AVX flag in
> cpuid?
> 
> Hiding feature bits does not prevent the functionality from working. 
> i.e. 1GB superpages don't stop working if you hide the feature bit.  No
> amount of levelling can stop a guest from manually probing for
> features/instruction sets.

That wasn't my point. I was intentionally referring to the CPUID
bits, knowing that us masking them in software doesn't affect
hardware in any way. Hence the "Are you sure? Did you try?"
which were really meant in a rhetorical way, as the answer can
only possibly be "no" here (unless you had control over the
hardware).

Jan

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-16 10:06       ` Jan Beulich
@ 2016-02-17 10:43         ` Andrew Cooper
  2016-02-17 10:55           ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 16/02/16 10:06, Jan Beulich wrote:
>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>          /* Fix up VLAPIC details. */
>>>>          *ebx &= 0x00FFFFFFu;
>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>> +
>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>> Looks like I've overlooked an issue in patch 11, which becomes
>>> apparent here: How can you use a domain-independent featureset
>>> here, when features vary between HAP and shadow mode guests?
>>> I.e. in the earlier patch I suppose you need to calculate two
>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>> !hvm_funcs.hap_supported.
>> Their use here is a halfway house between nothing and the planned full
>> per-domain policies.
>>
>> In this case, the "don't expose $X to a non-hap domain" checks have been
>> retained, to cover the difference.
> Well, doesn't it seem to you that doing only half of the HAP/shadow
> separation is odd/confusing? I.e. could I talk you into not doing any
> such separation (enforcing the non-HAP overrides as is done now)
> or finishing the separation to become visible/usable here?

The HAP/shadow distinction is needed in the toolstack to account for the
hap=<bool> option.

The distinction will disappear when per-domain policies are introduced. 
If you notice, the distinction is private to the data generated by the
autogen script, and does not form a part of any API/ABI.  The sysctl
only has a single hvm featureset.
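The single-featureset arrangement described here could be sketched roughly as follows; FSCAPINTS and all names are placeholders, not the series' actual symbols:

```c
#include <stdint.h>

#define FSCAPINTS 2  /* placeholder; the real count is autogenerated */

/* The HAP/shadow choice stays internal to the calculation; only the
 * one combined hvm featureset is exposed via the sysctl. */
static void calculate_hvm_featureset(const uint32_t *host,
                                     const uint32_t *hap_mask,
                                     const uint32_t *shadow_mask,
                                     int hap_supported,
                                     uint32_t *hvm)
{
    const uint32_t *mask = hap_supported ? hap_mask : shadow_mask;
    unsigned int i;

    for ( i = 0; i < FSCAPINTS; ++i )
        hvm[i] = host[i] & mask[i];
}
```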

>>>> +    case 0x80000007:
>>>> +        d &= pv_featureset[FEATURESET_e7d];
>>>> +        break;
>>> By not clearing eax and ebx (not sure about ecx) here you would
>>> again expose flags to guests without proper white listing.
>>>
>>>> +    case 0x80000008:
>>>> +        b &= pv_featureset[FEATURESET_e8b];
>>>>          break;
>>> Same here for ecx and edx and perhaps the upper 8 bits of eax.
>> Both of these would be changes to how these things are currently
>> handled, whereby a guest gets to see whatever the toolstack managed to
>> find in said leaf.  I was hoping to put off some of these decisions, but
>> they probably need making now.  On the PV side they definitely can't be
>> fully hidden, as these leaves are not maskable.
> Right, but many are meaningful primarily to kernels, and there we
> can hide them.
>
> Since you're switching from black to white listing here, I also think
> we need a default label alongside the "unsupported" one here.
> Similarly I would think XSTATE sub-leaves beyond 63 need hiding
> now.

I would prefer not to do that now.  It is conflated with the future
work, deliberately deferred to make this work a manageable size.

At the moment, the toolstack won't ever generate subleaves beyond 63.
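For reference, the white-listing shape under discussion (a default label that zeroes unrecognised leaves, rather than leaking hardware values) might look like this sketch; the leaves, register handling, and featureset indices here are illustrative, not the patch's actual code:

```c
#include <stdint.h>

struct cpuid_leaf { uint32_t a, b, c, d; };

/* Whitelist filter: copy through only bits we explicitly allow.
 * Any unrecognised leaf falls through default and yields all zeroes. */
static struct cpuid_leaf filter_leaf(uint32_t leaf, struct cpuid_leaf hw,
                                     const uint32_t *featureset)
{
    struct cpuid_leaf res = { 0 };

    switch ( leaf )
    {
    case 0x80000007:
        res.d = hw.d & featureset[0];   /* e7d word; index assumed */
        break;

    case 0x80000008:
        res.a = hw.a & 0xffffu;         /* keep only the address sizes */
        res.b = hw.b & featureset[1];   /* e8b word; index assumed */
        break;

    default:                            /* Unknown leaf: expose nothing. */
        break;
    }

    return res;
}
```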

~Andrew

* Re: [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-02-16 14:10   ` Jan Beulich
@ 2016-02-17 10:45     ` Andrew Cooper
  2016-02-17 10:58       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 16/02/16 14:10, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
>> commandline-provided masks would take effect in Xen's view of the features.
>>
>> As the masks got applied after the query for features, the redundant call to
>> generic_identify() would clobber the pre-masking feature information with the
>> post-masking information.
>>
>> Move the set_cpumask() calls into c_early_init() so their effects take place
>> before the main query for features in generic_identify().
>>
>> The cpuid_mask_* command line parameters now limit the entire system, a
>> feature XenServer was relying on for testing purposes.
> And I continue to view this as a step backwards, and hence can't
> really approve of this change.

It is not a step backwards.  Being able to disable features in Xen is
critical for testing.  You accidentally broke that with a patch which
was supposed to be no functional change.

This series means that these command line options are not required for
levelling.

~Andrew

* Re: [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching
  2016-02-17  8:15     ` Jan Beulich
@ 2016-02-17 10:46       ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 08:15, Jan Beulich wrote:
>>>> On 16.02.16 at 15:15, <JBeulich@suse.com> wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/include/asm-x86/processor.h
>>> +++ b/xen/include/asm-x86/processor.h
>>> @@ -574,6 +574,34 @@ void microcode_set_module(unsigned int);
>>>  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
>>>  int microcode_resume_cpu(unsigned int cpu);
>>>  
>>> +#define LCAP_faulting (1U << 0)
>>> +#define LCAP_1cd      (3U << 1)
>>> +#define LCAP_e1cd     (3U << 3)
>>> +#define LCAP_Da1      (1U << 5)
>>> +#define LCAP_6c       (1U << 6)
>>> +#define LCAP_7ab0     (3U << 7)
>> I guess the cases where the mask has two set bits is when two
>> CPUID output registers are being controlled, but I don't see
>> what use that pairing is going to be. But with the patch
>> supposedly going to make sense only in the context of the
>> following ones, I'll see (and I'd presumably be able to ack this
>> one then also only when having seen the others).
> Having seen patches up to and including 21, I still don't see the
> point of using 2-bit masks here.

The previous sysctl interface had individual bits.  I suppose that now I
have dropped that, these could return to single bits.

~Andrew

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-17 10:43         ` Andrew Cooper
@ 2016-02-17 10:55           ` Jan Beulich
  2016-02-17 14:02             ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 10:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 11:43, <andrew.cooper3@citrix.com> wrote:
> On 16/02/16 10:06, Jan Beulich wrote:
>>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>          /* Fix up VLAPIC details. */
>>>>>          *ebx &= 0x00FFFFFFu;
>>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>>> +
>>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>>> Looks like I've overlooked an issue in patch 11, which becomes
>>>> apparent here: How can you use a domain-independent featureset
>>>> here, when features vary between HAP and shadow mode guests?
>>>> I.e. in the earlier patch I suppose you need to calculate two
>>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>>> !hvm_funcs.hap_supported.
>>> Their use here is a halfway house between nothing and the planned full
>>> per-domain policies.
>>>
>>> In this case, the "don't expose $X to a non-hap domain" checks have been
>>> retained, to cover the difference.
>> Well, doesn't it seem to you that doing only half of the HAP/shadow
>> separation is odd/confusing? I.e. could I talk you into not doing any
>> such separation (enforcing the non-HAP overrides as is done now)
>> or finishing the separation to become visible/usable here?
> 
> The HAP/shadow distinction is needed in the toolstack to account for the
> hap=<bool> option.
> 
> The distinction will disappear when per-domain policies are introduced. 
> If you notice, the distinction is private to the data generated by the
> autogen script, and does not form a part of any API/ABI.  The sysctl
> only has a single hvm featureset.

I don't see this as being in line with

    hvm_featuremask = hvm_funcs.hap_supported ?
        hvm_hap_featuremask : hvm_shadow_featuremask;

in patch 11. A shadow mode guest should see exactly the same
set of features, regardless of whether HAP was available (and
enabled) on a host.

>>>>> +    case 0x80000007:
>>>>> +        d &= pv_featureset[FEATURESET_e7d];
>>>>> +        break;
>>>> By not clearing eax and ebx (not sure about ecx) here you would
>>>> again expose flags to guests without proper white listing.
>>>>
>>>>> +    case 0x80000008:
>>>>> +        b &= pv_featureset[FEATURESET_e8b];
>>>>>          break;
>>>> Same here for ecx and edx and perhaps the upper 8 bits of eax.
>>> Both of these would be changes to how these things are currently
>>> handled, whereby a guest gets to see whatever the toolstack managed to
>>> find in said leaf.  I was hoping to put off some of these decisions, but
>>> they probably need making now.  On the PV side they definitely can't be
>>> fully hidden, as these leaves are not maskable.
>> Right, but many are meaningful primarily to kernels, and there we
>> can hide them.
>>
>> Since you're switching from black to white listing here, I also think
>> we need a default label alongside the "unsupported" one here.
>> Similarly I would think XSTATE sub-leaves beyond 63 need hiding
>> now.
> 
> I would prefer not to do that now.  It is conflated with the future
> work, deliberately deferred to make this work a manageable size.

I understand that you need to set some boundaries for this
first step. But I also think that we shouldn't stop in the middle
of switching from black listing to white listing here.

Jan

* Re: [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup
  2016-02-17  7:40   ` Jan Beulich
@ 2016-02-17 10:56     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 07:40, Jan Beulich wrote:
>
>> +	if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
>> +	    (wrmsr_amd_safe(msr, lo, hi) == 0))
>> +		levelling_caps |= caps;
>> +
>> +	*msr_val = ((uint64_t)hi << 32) | lo;
>> +}
> Why can't this function, currently returning void, simply return the
> value read?

Hmm - it can.  This current layout is an artefact of several changes in
design.
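The return-value shape being suggested might look like this sketch. The rdmsr/wrmsr helpers are stand-ins (Xen's real rdmsr_amd_safe()/wrmsr_amd_safe() are hypervisor-internal), returning fabricated values purely so the example is self-contained:

```c
#include <stdint.h>

static uint64_t levelling_caps;

/* Stand-ins for Xen's rdmsr_amd_safe()/wrmsr_amd_safe(); they pretend
 * the MSR exists and holds a fixed value. */
static int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
                          unsigned int *hi)
{
    (void)msr;
    *lo = 0x00220000u;
    *hi = 0x00000800u;
    return 0;
}

static int wrmsr_amd_safe(unsigned int msr, unsigned int lo, unsigned int hi)
{
    (void)msr; (void)lo; (void)hi;
    return 0;
}

/* As suggested: probe the MSR and return the value read directly,
 * rather than passing it back through a pointer parameter. */
static uint64_t probe_mask_msr(unsigned int msr, uint64_t caps)
{
    unsigned int lo = 0, hi = 0;

    if ( (rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
         (wrmsr_amd_safe(msr, lo, hi) == 0) )
        levelling_caps |= caps;

    return ((uint64_t)hi << 32) | lo;
}
```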

>> +static void __init noinline amd_init_levelling(void)
>>  {
>> -	static unsigned int feat_ecx, feat_edx;
>> -	static unsigned int extfeat_ecx, extfeat_edx;
>> -	static unsigned int l7s0_eax, l7s0_ebx;
>> -	static unsigned int thermal_ecx;
>> -	static bool_t skip_feat, skip_extfeat;
>> -	static bool_t skip_l7s0_eax_ebx, skip_thermal_ecx;
>> -	static enum { not_parsed, no_mask, set_mask } status;
>> -	unsigned int eax, ebx, ecx, edx;
>> -
>> -	if (status == no_mask)
>> -		return;
>> +	const struct cpuidmask *m = NULL;
>>  
>> -	if (status == set_mask)
>> -		goto setmask;
>> +	probe_masking_msrs();
>>  
>> -	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
>> -	status = no_mask;
>> +	if (*opt_famrev != '\0') {
>> +		m = get_cpuidmask(opt_famrev);
>>  
>> -	/* Fam11 doesn't support masking at all. */
>> -	if (c->x86 == 0x11)
>> -		return;
>> +		if (!m)
>> +			printk("Invalid processor string: %s\n", opt_famrev);
>> +	}
>>  
>> -	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
>> -	      opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
>> -	      opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx &
>> -	      opt_cpuid_mask_thermal_ecx)) {
>> -		feat_ecx = opt_cpuid_mask_ecx;
>> -		feat_edx = opt_cpuid_mask_edx;
>> -		extfeat_ecx = opt_cpuid_mask_ext_ecx;
>> -		extfeat_edx = opt_cpuid_mask_ext_edx;
>> -		l7s0_eax = opt_cpuid_mask_l7s0_eax;
>> -		l7s0_ebx = opt_cpuid_mask_l7s0_ebx;
>> -		thermal_ecx = opt_cpuid_mask_thermal_ecx;
>> -	} else if (*opt_famrev == '\0') {
>> -		return;
>> -	} else {
>> -		const struct cpuidmask *m = get_cpuidmask(opt_famrev);
>> +	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
>> +		uint32_t ecx, edx, tmp;
>>  
>> -		if (!m) {
>> -			printk("Invalid processor string: %s\n", opt_famrev);
>> -			printk("CPUID will not be masked\n");
>> -			return;
>> +		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);
> Didn't you collect raw CPUID output already?

This is now in c_early_init(), which runs ahead of populating c->x86_capability.

~Andrew

* Re: [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-02-17 10:45     ` Andrew Cooper
@ 2016-02-17 10:58       ` Jan Beulich
  2016-02-18 12:41         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 10:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 11:45, <andrew.cooper3@citrix.com> wrote:
> On 16/02/16 14:10, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
>>> commandline-provided masks would take effect in Xen's view of the features.
>>>
>>> As the masks got applied after the query for features, the redundant call to
>>> generic_identify() would clobber the pre-masking feature information with the
>>> post-masking information.
>>>
>>> Move the set_cpumask() calls into c_early_init() so their effects take place
>>> before the main query for features in generic_identify().
>>>
>>> The cpuid_mask_* command line parameters now limit the entire system, a
>>> feature XenServer was relying on for testing purposes.
>> And I continue to view this as a step backwards, and hence can't
>> really approve of this change.
> 
> It is not a step backwards.  Being able to disable features in Xen is
> critical for testing.  You accidentally broke that with a patch which
> was supposed to be no functional change.

Views differ: I would say I unknowingly fixed this with that patch
(as to me it was always clear that this masking should not apply to
Xen itself).

If that behavior is critical for testing, add a command line option
to enable it.

> This series means that these command line options are not required for
> levelling.

By the end of the series, agreed.

Jan

* Re: [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup
  2016-02-17  7:57   ` Jan Beulich
@ 2016-02-17 10:59     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 10:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 07:57, Jan Beulich wrote:
>
>> +/* Indicies of the masking MSRs, or 0 if unavailable. */
>> +static unsigned int __read_mostly msr_basic, msr_ext, msr_xsave;
> I think this way __read_mostly applies only to msr_basic, which I
> don't think is what you want. Also I think you mean "indices" or
> "indexes".

"Indices" is what I meant.

>
>> +static void __init probe_masking_msrs(void)
>> +{
>> +	const struct cpuinfo_x86 *c = &boot_cpu_data;
>> +	unsigned int exp_msr_basic = 0, exp_msr_ext = 0, exp_msr_xsave = 0;
>>  
>>  	/* Only family 6 supports this feature. */
>> -	if (c->x86 != 6) {
>> -		printk("No CPUID feature masking support available\n");
>> +	if (c->x86 != 6)
>>  		return;
>> -	}
>>  
>>  	switch (c->x86_model) {
>>  	case 0x17: /* Yorkfield, Wolfdale, Penryn, Harpertown(DP) */
>>  	case 0x1d: /* Dunnington(MP) */
>> -		msr_basic = MSR_INTEL_MASK_V1_CPUID1;
>> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V1_CPUID1;
>>  		break;
>>  
>>  	case 0x1a: /* Bloomfield, Nehalem-EP(Gainestown) */
>> @@ -88,71 +100,126 @@ static void set_cpuidmask(const struct cpuinfo_x86 *c)
>>  	case 0x2c: /* Gulftown, Westmere-EP */
>>  	case 0x2e: /* Nehalem-EX(Beckton) */
>>  	case 0x2f: /* Westmere-EX */
>> -		msr_basic = MSR_INTEL_MASK_V2_CPUID1;
>> -		msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
>> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V2_CPUID1;
>> +		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V2_CPUID80000001;
>>  		break;
>>  
>>  	case 0x2a: /* SandyBridge */
>>  	case 0x2d: /* SandyBridge-E, SandyBridge-EN, SandyBridge-EP */
>> -		msr_basic = MSR_INTEL_MASK_V3_CPUID1;
>> -		msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
>> -		msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
>> +		exp_msr_basic = msr_basic = MSR_INTEL_MASK_V3_CPUID1;
>> +		exp_msr_ext   = msr_ext   = MSR_INTEL_MASK_V3_CPUID80000001;
>> +		exp_msr_xsave = msr_xsave = MSR_INTEL_MASK_V3_CPUIDD_01;
>>  		break;
>>  	}
> Instead of all these changes, and instead of the variable needing
> initializers, you could simply initialize all three ext_msr_* right after
> the switch().

That would certainly be neater.

~Andrew

* Re: [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains
  2016-02-17  8:13   ` Jan Beulich
@ 2016-02-17 11:03     ` Andrew Cooper
  2016-02-17 11:14       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 11:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 08:13, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -208,7 +208,9 @@ static void __init noinline probe_masking_msrs(void)
>>  static void amd_ctxt_switch_levelling(const struct domain *nextd)
>>  {
>>  	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
>> -	const struct cpuidmasks *masks = &cpuidmask_defaults;
>> +	const struct cpuidmasks *masks =
>> +            (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
>> +            ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
> Mixing tabs and spaces for indentation.
>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -574,6 +574,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>              goto fail;
>>          clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
>>  
>> +        d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
>> +        if ( !d->arch.pv_domain.cpuidmasks )
>> +            goto fail;
>> +        *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;
> Along the lines of not masking features for the hypervisor's own use
> (see the respective comment on the earlier patch) I think this patch,
> here or in domain_build.c, should except Dom0 from having the
> default masking applied. This shouldn't, however, extend to CPUID
> faulting. (Perhaps this rather belongs here so that the non-Dom0
> hardware domain case can also be taken care of.)

Very specifically not.  It is wrong to special case Dom0 and the
hardware domain, as their cpuid values should be relevant to their VM, not
the host.

The default cpuid policy levels with real hardware, so it's no practical
change at this point.

~Andrew

* Re: [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains
  2016-02-17 11:03     ` Andrew Cooper
@ 2016-02-17 11:14       ` Jan Beulich
  2016-02-18 12:48         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 11:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 12:03, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 08:13, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/cpu/amd.c
>>> +++ b/xen/arch/x86/cpu/amd.c
>>> @@ -208,7 +208,9 @@ static void __init noinline probe_masking_msrs(void)
>>>  static void amd_ctxt_switch_levelling(const struct domain *nextd)
>>>  {
>>>  	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
>>> -	const struct cpuidmasks *masks = &cpuidmask_defaults;
>>> +	const struct cpuidmasks *masks =
>>> +            (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
>>> +            ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
>> Mixing tabs and spaces for indentation.
>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -574,6 +574,11 @@ int arch_domain_create(struct domain *d, unsigned int 
> domcr_flags,
>>>              goto fail;
>>>          clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
>>>  
>>> +        d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
>>> +        if ( !d->arch.pv_domain.cpuidmasks )
>>> +            goto fail;
>>> +        *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;
>> Along the lines of not masking features for the hypervisor's own use
>> (see the respective comment on the earlier patch) I think this patch,
>> here or in domain_build.c, should except Dom0 from having the
>> default masking applied. This shouldn't, however, extend to CPUID
>> faulting. (Perhaps this rather belongs here so that the non-Dom0
>> hardware domain case can also be taken care of.)
> 
> Very specifically not.  It is wrong to special case Dom0 and the
> hardware domain, as their cpuid values should be relevant to their VM, not
> the host.

I can't see how this second half of the sentence is a reason for
not special casing Dom0.

> The default cpuid policy levels with real hardware, so it's no practical
> change at this point.

As long as no-one uses the then deprecated command line options.

Jan

* Re: [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy
  2016-02-17  8:22   ` Jan Beulich
@ 2016-02-17 12:13     ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 12:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 08:22, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -87,6 +88,93 @@ static void update_domain_cpuid_info(struct domain *d,
>>          d->arch.x86_model = (ctl->eax >> 4) & 0xf;
>>          if ( d->arch.x86 >= 0x6 )
>>              d->arch.x86_model |= (ctl->eax >> 12) & 0xf0;
>> +
>> +        if ( is_pv_domain(d) )
> For clarity, wouldn't it be reasonable to check the respective
> capability flag in all of these conditionals, even if without such
> checks what gets set below simply won't ever get used? Even
> more, the earlier patch allocating d->arch.pv_domain.cpuidmasks
> could skip this allocation if none of the masking capability bits
> are set (in which case checking the pointer to be non-NULL would
> seem to be the right check here then).

Can do.

~Andrew

* Re: [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-02-17  8:30   ` Jan Beulich
@ 2016-02-17 12:17     ` Andrew Cooper
  2016-02-17 12:23       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 12:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Tim Deegan, Rob Hoes, Xen-devel, David Scott

On 17/02/16 08:30, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -190,6 +191,71 @@ long arch_do_sysctl(
>>          }
>>          break;
>>  
>> +    case XEN_SYSCTL_get_cpu_featureset:
>> +    {
>> +        const uint32_t *featureset;
>> +        unsigned int nr;
>> +
>> +        /* Request for maximum number of features? */
>> +        if ( guest_handle_is_null(sysctl->u.cpu_featureset.features) )
>> +        {
>> +            sysctl->u.cpu_featureset.nr_features = FSCAPINTS;
>> +            if ( __copy_field_to_guest(u_sysctl, sysctl,
>> +                                       u.cpu_featureset.nr_features) )
>> +                ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        /* Clip the number of entries. */
>> +        nr = sysctl->u.cpu_featureset.nr_features;
>> +        if ( nr > FSCAPINTS )
>> +            nr = FSCAPINTS;
> min() (perhaps even allowing to obviate the comment)?

They are different types, and you specifically objected to min_t() before.

>
>> +        switch ( sysctl->u.cpu_featureset.index )
>> +        {
>> +        case XEN_SYSCTL_cpu_featureset_raw:
>> +            featureset = raw_featureset;
>> +            break;
>> +
>> +        case XEN_SYSCTL_cpu_featureset_host:
>> +            featureset = host_featureset;
>> +            break;
>> +
>> +        case XEN_SYSCTL_cpu_featureset_pv:
>> +            featureset = pv_featureset;
>> +            break;
>> +
>> +        case XEN_SYSCTL_cpu_featureset_hvm:
>> +            featureset = hvm_featureset;
>> +            break;
>> +
>> +        default:
>> +            featureset = NULL;
>> +            break;
>> +        }
>> +
>> +        /* Bad featureset index? */
>> +        if ( !ret && !featureset )
>> +            ret = -EINVAL;
> Nothing above altered "ret" from its zero value, so the check here
> is pointless.

So it is.

>
>> --- a/xen/include/public/sysctl.h
>> +++ b/xen/include/public/sysctl.h
>> @@ -766,6 +766,29 @@ struct xen_sysctl_tmem_op {
>>  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>>  
>> +/*
>> + * XEN_SYSCTL_get_cpu_featureset (x86 specific)
>> + *
>> + * Return information about the maximum sets of features which can be offered
>> + * to different types of guests.  This is all strictly information as found in
>> + * `cpuid` feature leaves with no synthetic additions.
>> + */
> The reference to guests in the comment conflicts with the raw and
> host types below.

I will reword.

~Andrew

* Re: [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-02-17 12:17     ` Andrew Cooper
@ 2016-02-17 12:23       ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 12:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, Tim Deegan, Rob Hoes, Xen-devel, David Scott

>>> On 17.02.16 at 13:17, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 08:30, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -190,6 +191,71 @@ long arch_do_sysctl(
>>>          }
>>>          break;
>>>  
>>> +    case XEN_SYSCTL_get_cpu_featureset:
>>> +    {
>>> +        const uint32_t *featureset;
>>> +        unsigned int nr;
>>> +
>>> +        /* Request for maximum number of features? */
>>> +        if ( guest_handle_is_null(sysctl->u.cpu_featureset.features) )
>>> +        {
>>> +            sysctl->u.cpu_featureset.nr_features = FSCAPINTS;
>>> +            if ( __copy_field_to_guest(u_sysctl, sysctl,
>>> +                                       u.cpu_featureset.nr_features) )
>>> +                ret = -EFAULT;
>>> +            break;
>>> +        }
>>> +
>>> +        /* Clip the number of entries. */
>>> +        nr = sysctl->u.cpu_featureset.nr_features;
>>> +        if ( nr > FSCAPINTS )
>>> +            nr = FSCAPINTS;
>> min() (perhaps even allowing to obviate the comment)?
> 
> They are different types, and you specifically objected to min_t() before.

I commonly object to min_t() when min() can reasonably be used,
but I certainly prefer min_t() over some form of open coding. Apart
from that I suppose FSCAPINTS could easily be given a U suffix?
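The type issue being discussed can be sketched with a Linux/Xen-style type-checked min(); the FSCAPINTS value here is a placeholder. Without the U suffix the two operands differ in signedness and the compile-time check in min() objects, forcing min_t() or open coding:

```c
#include <stdint.h>

/* Xen/Linux-style min(): the pointer comparison is a compile-time
 * check that both operands have exactly the same type. */
#define min(x, y) ({                              \
        __typeof__(x) _x = (x);                   \
        __typeof__(y) _y = (y);                   \
        (void)(&_x == &_y);                       \
        _x < _y ? _x : _y; })

#define FSCAPINTS 10U   /* placeholder count, with the suggested U suffix */

static uint32_t clip_nr_features(uint32_t requested)
{
    /* With the U suffix both sides are unsigned, so plain min() works. */
    return min(requested, FSCAPINTS);
}
```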

Jan

* Re: [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork
  2016-02-17  8:55   ` Jan Beulich
@ 2016-02-17 13:03     ` Andrew Cooper
  2016-02-17 13:19       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 13:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel

On 17/02/16 08:55, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -467,11 +420,8 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
>>          regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
>>          break;
>>      case 1: /* leaf 1 */
>> -        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
>> -        if ( !info->hvm )
>> -            regs[0] &= ~XSAVES;
>> -        regs[2] &= info->xfeature_mask;
>> -        regs[3] = 0;
>> +        regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
>> +        regs[1] = regs[2] = regs[3] = 0;
>>          break;
> This change (to regs[2] handling) reminds me of an apparent issue
> in the earlier dependencies patch, which I realized only after having
> sent the reply, and then forgot to send another reply for: Shouldn't
> features requiring certain XSAVE states depend on that state's
> availability instead of just XSAVE? That would make the above use
> proper masking for both regs[2] and regs[3] (and also for regs[0]
> and regs[3] in the sub-leaf 0 case).

Have you looked at this patch in combination with the following one?

Strictly speaking, there is a difference between the xstate availability
and the features which use them.  However, the former are not level-able
on older hardware.  I have deliberately constructed the former from the
latter, to prevent the two from diverging, unlike real hardware.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling
  2016-02-17  9:02   ` Jan Beulich
@ 2016-02-17 13:06     ` Andrew Cooper
  2016-02-17 13:36       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 13:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 09:02, Jan Beulich wrote:
>>>> On 08.02.16 at 18:26, <andrew.cooper3@citrix.com> wrote:
> This fiddles with behavior on AMD only, yet it's not obvious why this
> couldn't be done in vendor independent code (it should, afaict, be
> benign for Intel).

AMD and Intel levelling are fundamentally different.

The former are override MSRs with some quirks when it comes to the magic
bits, while the latter are strict masks which take effect before the
magic bits are folded in.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-12 16:27   ` Jan Beulich
@ 2016-02-17 13:08     ` Andrew Cooper
  2016-02-17 13:34       ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 13:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Ian Campbell, Xen-devel

On 12/02/16 16:27, Jan Beulich wrote:
>>>> On 05.02.16 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> +/* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
>> +#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
> Regardless of you limiting the interface to tools only, I'm not
> convinced exposing constants starting with X86_* here is
> appropriate.

The toolstack already uses these exact names.  Patch 25 depends on me
not changing any of these.

>
>> +#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
>> +#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
>> [...]
>> +/* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
>> +#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
> Apart from that exposing them should be done using canonical instead
> of Linux-invented names, i.e. s/XMM/SSE/ for the above lines. I've
> had a need to create a patch to do this just earlier today.

I can do this as a preparatory patch.  Any other names you want me to
change?

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities
  2016-02-12 17:14       ` Jan Beulich
@ 2016-02-17 13:12         ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 13:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 12/02/16 17:14, Jan Beulich wrote:
>>>> On 12.02.16 at 17:48, <andrew.cooper3@citrix.com> wrote:
>> On 12/02/16 16:43, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> --- /dev/null
>>>> +++ b/xen/arch/x86/cpuid.c
>>>> @@ -0,0 +1,19 @@
>>>> +#include <xen/lib.h>
>>>> +#include <asm/cpuid.h>
>>>> +
>>>> +const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>>>> +
>>>> +static void __maybe_unused build_assertions(void)
>>>> +{
>>>> +    BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
>>> This is sort of redundant with ...
>>>
>>>> --- /dev/null
>>>> +++ b/xen/include/asm-x86/cpuid.h
>>>> @@ -0,0 +1,24 @@
>>>> +#ifndef __X86_CPUID_H__
>>>> +#define __X86_CPUID_H__
>>>> +
>>>> +#include <asm/cpuid-autogen.h>
>>>> +
>>>> +#define FSCAPINTS FEATURESET_NR_ENTRIES
>>>> +
>>>> +#ifndef __ASSEMBLY__
>>>> +#include <xen/types.h>
>>>> +
>>>> +extern const uint32_t known_features[FSCAPINTS];
>>> ... the use of FSCAPINTS here. You'd catch more mistakes if you
>>> just used [] here.
>> Not quite.
>>
>> The extern gives an explicit size so other translation units can use
>> ARRAY_SIZE().
> True.
>
>> Without the BUILD_BUG_ON(), const uint32_t known_features[] can actually
>> be longer than FSCAPINTS, and everything compiles fine.
>>
>> The BUILD_BUG_ON() were introduced following an off-by-one error
>> generating INIT_KNOWN_FEATURES, where ARRAY_SIZE(known_features) was
>> different in this translation unit than all others.
> But what if INIT_KNOWN_FEATURES inits fewer than the intended
> number of elements. The remaining array members will be zero, sure,
> but I think such a condition would suggest a mistake elsewhere, and
> hence might be worth flagging.

In principle, implicit zero extending is ok.

In practice, the autogen script explicitly zero extends the identifier
to the intended number of words.

~Andrew
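The size mismatch the BUILD_BUG_ON() guards against can be reproduced in miniature (hypothetical, shrunk-down values; the real FSCAPINTS and INIT_KNOWN_FEATURES come from the autogenerated header):

```c
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Made-up values standing in for the autogenerated constants. */
#define FSCAPINTS 3
#define INIT_KNOWN_FEATURES { 0xbfebfbffu, 0x77faf3ffu, 0 }

static const uint32_t known_features[] = INIT_KNOWN_FEATURES;

/* A [] definition quietly accepts an initialiser that is longer or
 * shorter than intended; only this compile-time check keeps
 * ARRAY_SIZE(known_features) equal across all translation units that
 * see the extern declaration with an explicit [FSCAPINTS] bound. */
_Static_assert(ARRAY_SIZE(known_features) == FSCAPINTS,
               "INIT_KNOWN_FEATURES must have exactly FSCAPINTS entries");
```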

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork
  2016-02-17 13:03     ` Andrew Cooper
@ 2016-02-17 13:19       ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 13:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel

>>> On 17.02.16 at 14:03, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 08:55, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -467,11 +420,8 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
>>>          regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
>>>          break;
>>>      case 1: /* leaf 1 */
>>> -        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
>>> -        if ( !info->hvm )
>>> -            regs[0] &= ~XSAVES;
>>> -        regs[2] &= info->xfeature_mask;
>>> -        regs[3] = 0;
>>> +        regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
>>> +        regs[1] = regs[2] = regs[3] = 0;
>>>          break;
>> This change (to regs[2] handling) reminds me of an apparent issue
>> in the earlier dependencies patch, which I realized only after having
>> sent the reply, and then forgot to send another reply for: Shouldn't
>> features requiring certain XSAVE states depend on that state's
>> availability instead of just XSAVE? That would make the above use
>> proper masking for both regs[2] and regs[3] (and also for regs[0]
>> and regs[3] in the sub-leaf 0 case).
> 
> Have you looked at this patch in combination with the following one?

I did look at the next one only after writing this reply, but with
that one not again changing sub-leaf 1 handling I don't see why
this would make a difference.

> Strictly speaking, there is a difference between the xstate availability
> and the features which use them.  However, the former are not level-able
> on older hardware.  I have deliberately constructed the former from the
> latter, to prevent the two from diverging, unlike real hardware.

You mean this

+    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
+
+    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_AVX;
+
+    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_LWP;

I guess? That's exactly what I've been talking about: You make
everything else come from the hypervisor - why not this mask
too (at once avoiding the need to fiddle with this code for each
new state component)? It would just be two more array elements
for sub-leaf 0 EDX:EAX and maybe another two for sub-leaf 1
EDX:ECX (they could really be combined, since the two bit masks
are exclusive of one another). And perhaps there should then
be a bi-directional dependency: CPUID[BASE].AVX (for example)
should depend on CPUID[XSAVE].AVX and vice versa.

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-17 13:08     ` Andrew Cooper
@ 2016-02-17 13:34       ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 13:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Xen-devel

>>> On 17.02.16 at 14:08, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 16:27, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>> +/* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
>>> +#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
>> Regardless of you limiting the interface to tools only, I'm not
>> convinced exposing constants starting with X86_* here is
>> appropriate.
> 
> The toolstack already uses these exact names.  Patch 25 depends on me
> not changing any of these.

I don't see your reply being a meaningful response to my
statement above: If properly adding XEN_ prefixes here means
libxc needs further changes, so be it. Of course I also wouldn't
mind a model similar to that used in public/errno.h, allowing the
including site to control the generated names.

>>> +#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
>>> +#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
>>> [...]
>>> +/* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
>>> +#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
>> Apart from that exposing them should be done using canonical instead
>> of Linux-invented names, i.e. s/XMM/SSE/ for the above lines. I've
>> had a need to create a patch to do this just earlier today.
> 
> I can do this as a preparatory patch.  Any other names you want me to
> change?

According to the SDM

PN -> PSN
CLFLUSH -> CLFSH
SELFSNOOP -> SS
HT -> HTT
ACC -> TM

MWAIT -> MONITOR
VMXE -> VMX
SMXE -> SMX
EST -> EIST
CID -> CNXTID
CX16 -> CMPXCHG16B

CMT -> PQM
CAT -> PQE

some of which I'm not really convinced should be changed (most
notably CLFLUSH and SELFSNOOP, but perhaps also at least CX16,
as that's in line with CX8).

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling
  2016-02-17 13:06     ` Andrew Cooper
@ 2016-02-17 13:36       ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 13:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 14:06, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 09:02, Jan Beulich wrote:
>>>>> On 08.02.16 at 18:26, <andrew.cooper3@citrix.com> wrote:
>> This fiddles with behavior on AMD only, yet it's not obvious why this
>> couldn't be done in vendor independent code (it should, afaict, be
>> benign for Intel).
> 
> AMD and Intel levelling are fundamentally different.
> 
> The former are override MSRs with some quirks when it comes to the magic
> bits, while the latter are strict masks which take effect before the
> magic bits are folded in.

That's what you've derived from observations aiui, not something
written down somewhere. As long as the final effect is the same,
I still think doing such adjustments in vendor independent code
would be better. But I won't insist.

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-17 10:55           ` Jan Beulich
@ 2016-02-17 14:02             ` Andrew Cooper
  2016-02-17 14:45               ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-17 14:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 10:55, Jan Beulich wrote:
>>>> On 17.02.16 at 11:43, <andrew.cooper3@citrix.com> wrote:
>> On 16/02/16 10:06, Jan Beulich wrote:
>>>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>>          /* Fix up VLAPIC details. */
>>>>>>          *ebx &= 0x00FFFFFFu;
>>>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>>>> +
>>>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>>>> Looks like I've overlooked an issue in patch 11, which becomes
>>>>> apparent here: How can you use a domain-independent featureset
>>>>> here, when features vary between HAP and shadow mode guests?
>>>>> I.e. in the earlier patch I suppose you need to calculate two
>>>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>>>> !hvm_funcs.hap_supported.
>>>> Their use here is a halfway house between nothing and the planned full
>>>> per-domain policies.
>>>>
>>>> In this case, the "don't expose $X to a non-hap domain" checks have been
>>>> retained, to cover the difference.
>>> Well, doesn't it seem to you that doing only half of the HAP/shadow
>>> separation is odd/confusing? I.e. could I talk you into not doing any
>>> such separation (enforcing the non-HAP overrides as is done now)
>>> or finishing the separation to become visible/usable here?
>> The HAP/shadow distinction is needed in the toolstack to account for the
>> hap=<bool> option.
>>
>> The distinction will disappear when per-domain policies are introduced. 
>> If you notice, the distinction is private to the data generated by the
>> autogen script, and does not form a part of any API/ABI.  The sysctl
>> only has a single hvm featureset.
> I don't see this as being in line with
>
>     hvm_featuremask = hvm_funcs.hap_supported ?
>         hvm_hap_featuremask : hvm_shadow_featuremask;
>
> in patch 11. A shadow mode guest should see exactly the same
> set of features, regardless of whether HAP was available (and
> enabled) on a host.

A shadow mode guest will see the same features, independently of whether
HAP was available.

This example could in principle be replaced with a clause similar to the
vmx side, but it doesn't remove the need for the shadow mask.

There is currently no way for the toolstack to query the cpuid policy,
which means it must have a correct idea of the eventual policy handed to
the guest.  The pv32/64 split isn't interesting in this regard as it
won't change.  HAP vs shadow however is very important to permit
migrating between hap and non-hap capable hosts.

>
>>>>>> +    case 0x80000007:
>>>>>> +        d &= pv_featureset[FEATURESET_e7d];
>>>>>> +        break;
>>>>> By not clearing eax and ebx (not sure about ecx) here you would
>>>>> again expose flags to guests without proper white listing.
>>>>>
>>>>>> +    case 0x80000008:
>>>>>> +        b &= pv_featureset[FEATURESET_e8b];
>>>>>>          break;
>>>>> Same here for ecx and edx and perhaps the upper 8 bits of eax.
>>>> Both of these would be changes to how these things are currently
>>>> handled, whereby a guest gets to see whatever the toolstack managed to
>>>> find in said leaf.  I was hoping to put off some of these decisions, but
>>>> they probably need making now.  On the PV side they definitely can't be
>>>> fully hidden, as these leaves are not maskable.
>>> Right, but many are meaningful primarily to kernels, and there we
>>> can hide them.
>>>
>>> Since you're switching from black to white listing here, I also think
>>> we need a default label alongside the "unsupported" one here.
>>> Similarly I would think XSTATE sub-leaves beyond 63 need hiding
>>> now.
>> I would prefer not to do that now.  It is conflated with the future
>> work, deliberately deferred to make this work a manageable size.
> I understand that you need to set some boundaries for this
> first step. But I also think that we shouldn't stop in the middle
> of switching from black listing to white listing here.

It was always a mixed black/white list, which has now become somewhat
less blacklist-like.  There are still elements of both.

All this series is intended to do is turn the levellable feature bits
themselves to being strictly controlled.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-17 14:02             ` Andrew Cooper
@ 2016-02-17 14:45               ` Jan Beulich
  2016-02-18 12:17                 ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Jan Beulich @ 2016-02-17 14:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 17.02.16 at 15:02, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 10:55, Jan Beulich wrote:
>>>>> On 17.02.16 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>> On 16/02/16 10:06, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>>>>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>>>          /* Fix up VLAPIC details. */
>>>>>>>          *ebx &= 0x00FFFFFFu;
>>>>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>>>>> +
>>>>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>>>>> Looks like I've overlooked an issue in patch 11, which becomes
>>>>>> apparent here: How can you use a domain-independent featureset
>>>>>> here, when features vary between HAP and shadow mode guests?
>>>>>> I.e. in the earlier patch I suppose you need to calculate two
>>>>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>>>>> !hvm_funcs.hap_supported.
>>>>> Their use here is a halfway house between nothing and the planned full
>>>>> per-domain policies.
>>>>>
>>>>> In this case, the "don't expose $X to a non-hap domain" checks have been
>>>>> retained, to cover the difference.
>>>> Well, doesn't it seem to you that doing only half of the HAP/shadow
>>>> separation is odd/confusing? I.e. could I talk you into not doing any
>>>> such separation (enforcing the non-HAP overrides as is done now)
>>>> or finishing the separation to become visible/usable here?
>>> The HAP/shadow distinction is needed in the toolstack to account for the
>>> hap=<bool> option.
>>>
>>> The distinction will disappear when per-domain policies are introduced. 
>>> If you notice, the distinction is private to the data generated by the
>>> autogen script, and does not form a part of any API/ABI.  The sysctl
>>> only has a single hvm featureset.
>> I don't see this as being in line with
>>
>>     hvm_featuremask = hvm_funcs.hap_supported ?
>>         hvm_hap_featuremask : hvm_shadow_featuremask;
>>
>> in patch 11. A shadow mode guest should see exactly the same
>> set of features, regardless of whether HAP was available (and
>> enabled) on a host.
> 
> A shadow mode guest will see the same features, independently of whether
> HAP was available.

I'm afraid I'm being dense: Either the guest sees the same features,
which to me implies both of hvm_{hap,shadow}_featuremask are
identical, or the two masks are different, resulting in different guest
feature masks (and hence different guest features) depending on
HAP availability. What am I missing?

Jan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-15 15:41                 ` Andrew Cooper
@ 2016-02-17 19:02                   ` Konrad Rzeszutek Wilk
  2016-02-17 19:58                     ` Boris Ostrovsky
  2016-02-18 15:02                     ` Roger Pau Monné
  0 siblings, 2 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-17 19:02 UTC (permalink / raw)
  To: Andrew Cooper, roger.pau, boris.ostrovsky; +Cc: Jan Beulich, Xen-devel

On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
> On 15/02/16 15:02, Jan Beulich wrote:
> >>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
> >> On 15/02/16 14:50, Jan Beulich wrote:
> >>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
> >>>> On 15/02/16 09:20, Jan Beulich wrote:
> >>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
> >>>>>> On 12/02/16 17:05, Jan Beulich wrote:
> >>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> >>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
> >>>>>> (MONITORX/MWAITX) */
> >>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
> >>>>>> Because that is a good chunk of extra work to support.  We would need to
> >>>>>> use 4K monitor widths, and extra p2m handling.
> >>>>> I don't understand: The base (_MWAIT) feature being exposed to
> >>>>> guests today, and kernels making use of the feature when available
> >>>>> suggests to me that things work. Are you saying you know
> >>>>> otherwise? (And if there really is a reason to mask the feature all of
> >>>>> the sudden, this should again be justified in the commit message.)
> >>>> PV guests had it clobbered by Xen in traps.c
> >>>>
> >>>> HVM guests have:
> >>>>
> >>>> vmx.c:
> >>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
> >>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
> >>>> [...]
> >>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> >>>>         break;
> >>>>
> >>>> and svm.c:
> >>>>     case VMEXIT_MONITOR:
> >>>>     case VMEXIT_MWAIT:
> >>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> >>>>         break;
> >>>>
> >>>> I don't see how a guest could actually use this feature.
> >>> Do you see the respective intercepts getting enabled anywhere?
> >>> (I don't outside of nested code, which I didn't check in detail.)
> >> Yes - the intercepts are always enabled to prevent the guest actually
> >> putting the processor to sleep.
> > Hmm, you're right, somehow I've managed to ignore the relevant
> > lines grep reported. Yet - how do things work then, without the
> > MWAIT feature flag currently getting cleared?
> 
> I have never observed it being used.  Do you have some local patches in
> the SLES hypervisor?
> 
> There is some gross layer violation in xen/enlighten.c to pretend that
> MWAIT is present to trick the ACPI code into evaluating _CST() methods
> to report back to Xen.  (This is yet another PV-ism which will cause a
> headache for a DMLite dom0)

Yes indeed. CC-ing Roger, and Boris.

> 
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching
  2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
  2016-02-16 14:15   ` Jan Beulich
@ 2016-02-17 19:06   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-17 19:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

On Fri, Feb 05, 2016 at 01:42:10PM +0000, Andrew Cooper wrote:
> This change is purely scaffolding to reduce the complexity of the following
> three patches.

Keep in mind that the patches may not be applied right after this.

It would be easier to just spell out the three patches.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-17 19:02                   ` Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: " Konrad Rzeszutek Wilk
@ 2016-02-17 19:58                     ` Boris Ostrovsky
  2016-02-18 15:02                     ` Roger Pau Monné
  1 sibling, 0 replies; 139+ messages in thread
From: Boris Ostrovsky @ 2016-02-17 19:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper, roger.pau; +Cc: Jan Beulich, Xen-devel

On 02/17/2016 02:02 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension
>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>
>>>>>> HVM guests have:
>>>>>>
>>>>>> vmx.c:
>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>> [...]
>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>          break;
>>>>>>
>>>>>> and svm.c:
>>>>>>      case VMEXIT_MONITOR:
>>>>>>      case VMEXIT_MWAIT:
>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>          break;
>>>>>>
>>>>>> I don't see how a guest could actually use this feature.
>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>> putting the processor to sleep.
>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>> lines grep reported. Yet - how do things work then, without the
>>> MWAIT feature flag currently getting cleared?


We whitelist CPUID0x00000001.ecx features in 
libxc/xc_cpuid_x86.c:xc_cpuid_hvm_policy() so MWAIT is never set.

-boris


>> I have never observed it being used.  Do you have some local patches in
>> the SLES hypervisor?
>>
>> There is some gross layer violation in xen/enlighten.c to pretend that
>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>> to report back to Xen.  (This is yet another PV-ism which will cause a
>> headache for a DMLite dom0)
> Yes indeed. CC-ing Roger, and Boris.
>
>> ~Andrew
>>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-10 10:07       ` Andrew Cooper
  2016-02-10 10:18         ` Ian Campbell
@ 2016-02-17 20:06         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-17 20:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Tim Deegan, Ian Campbell, Xen-devel

On Wed, Feb 10, 2016 at 10:07:12AM +0000, Andrew Cooper wrote:
> On 08/02/16 16:36, Ian Campbell wrote:
> > On Mon, 2016-02-08 at 16:23 +0000, Tim Deegan wrote:
> >> At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
> >>> The type of the pointer to a bitmap is not interesting; it does not
> >>> affect the
> >>> representation of the block of bits being pointed to.
> >> It does affect the alignment, though.  Is this safe on ARM?
> > Good point. These constructs in the patch:
> >
> > +    const unsigned long *addr = _addr;
> >
> > Would be broken if _addr were not suitably aligned for an unsigned long.
> >
> > That probably rules out this approach unfortunately.
> 
> What about reworking libxc bitops in terms of unsigned char?  That
> should cover all alignment issues.

See 3cab67ac83b1d56c3daedd9c4adfed497a114246

"+/*
+ * xc_bitops.h has macros that do this as well - however they assume that
+ * the bitmask is word aligned but xc_cpumap_t is only guaranteed to be
+ * byte aligned and so we need byte versions for architectures which do
+ * not support misaligned accesses (which is basically everyone
+ * but x86, although even on x86 it can be inefficient).
+ */
"

> 
> ~Andrew
> 

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-17 14:45               ` Jan Beulich
@ 2016-02-18 12:17                 ` Andrew Cooper
  2016-02-18 13:23                   ` Jan Beulich
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 12:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 14:45, Jan Beulich wrote:
>>>> On 17.02.16 at 15:02, <andrew.cooper3@citrix.com> wrote:
>> On 17/02/16 10:55, Jan Beulich wrote:
>>>>>> On 17.02.16 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>> On 16/02/16 10:06, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>>>>          /* Fix up VLAPIC details. */
>>>>>>>>          *ebx &= 0x00FFFFFFu;
>>>>>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>>>>>> +
>>>>>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>>>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>>>>>> Looks like I've overlooked an issue in patch 11, which becomes
>>>>>>> apparent here: How can you use a domain-independent featureset
>>>>>>> here, when features vary between HAP and shadow mode guests?
>>>>>>> I.e. in the earlier patch I suppose you need to calculate two
>>>>>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>>>>>> !hvm_funcs.hap_supported.
>>>>>> Their use here is a halfway house between nothing and the planned full
>>>>>> per-domain policies.
>>>>>>
>>>>>> In this case, the "don't expose $X to a non-hap domain" checks have been
>>>>>> retained, to cover the difference.
>>>>> Well, doesn't it seem to you that doing only half of the HAP/shadow
>>>>> separation is odd/confusing? I.e. could I talk you into not doing any
>>>>> such separation (enforcing the non-HAP overrides as is done now)
>>>>> or finishing the separation to become visible/usable here?
>>>> The HAP/shadow distinction is needed in the toolstack to account for the
>>>> hap=<bool> option.
>>>>
>>>> The distinction will disappear when per-domain policies are introduced. 
>>>> If you notice, the distinction is private to the data generated by the
>>>> autogen script, and does not form a part of any API/ABI.  The sysctl
>>>> only has a single hvm featureset.
>>> I don't see this as being in line with
>>>
>>>     hvm_featuremask = hvm_funcs.hap_supported ?
>>>         hvm_hap_featuremask : hvm_shadow_featuremask;
>>>
>>> in patch 11. A shadow mode guest should see exactly the same
>>> set of features, regardless of whether HAP was available (and
>>> enabled) on a host.
>> A shadow mode guest will see the same features, independently of whether
>> HAP was available.
> I'm afraid I'm being dense: Either the guest sees the same features,
> which to me implies both of hvm_{hap,shadow}_featuremask are
> identical, or the two masks are different, resulting in different guest
> feature masks (and hence different guest features) depending on
> HAP availability. What am I missing?

A guest booted with hap and a guest booted with shadow will see
different features when booted on the same host.  HAP includes 1GB
superpages, PCID, etc.

The problem comes with a shadow guest booted on a hap-capable host. 
Such a guest can safely be migrated to a non-hap capable host, but only
if the toolstack knows that the guest saw a reduced featureset.

As there is still no interface to query what a guest can actually see
(depends on full per-domain policies and no dynamic hiding), the shadow
featuremask is used by the toolstack as a promise of what the Xen
dynamic checks will do.

~Andrew


* Re: [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-02-17 10:58       ` Jan Beulich
@ 2016-02-18 12:41         ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 12:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 10:58, Jan Beulich wrote:
>>>> On 17.02.16 at 11:45, <andrew.cooper3@citrix.com> wrote:
>> On 16/02/16 14:10, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
>>>> commandline-provided masks would take effect in Xen's view of the features.
>>>>
>>>> As the masks got applied after the query for features, the redundant call to
>>>> generic_identify() would clobber the pre-masking feature information with the
>>>> post-masking information.
>>>>
>>>> Move the set_cpumask() calls into c_early_init() so their effects take place
>>>> before the main query for features in generic_identify().
>>>>
>>>> The cpuid_mask_* command line parameters now limit the entire system, a
>>>> feature XenServer was relying on for testing purposes.
>>> And I continue to view this as a step backwards, and hence can't
>>> really approve of this change.
>> It is not a step backwards.  Being able to disable features in Xen is
>> critical for testing.  You accidentally broke that with a patch which
>> was supposed to be no functional change.
> Views differ: I would say I unknowingly fixed this with that patch
> (as to me it was always clear that this masking should not apply to
> Xen itself).
>
> If that behavior is critical for testing, add a command line option
> to enable it.

This series returns the command line options to their original
behaviour, and avoids them needing to be used for levelling. 
Introducing yet another command line option would be pointless.

~Andrew

>
>> This series means that these command line options are not required for
>> levelling.
> By the end of the series, agreed.
>
> Jan
>


* Re: [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains
  2016-02-17 11:14       ` Jan Beulich
@ 2016-02-18 12:48         ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 12:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 17/02/16 11:14, Jan Beulich wrote:
>>>> On 17.02.16 at 12:03, <andrew.cooper3@citrix.com> wrote:
>> On 17/02/16 08:13, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/cpu/amd.c
>>>> +++ b/xen/arch/x86/cpu/amd.c
>>>> @@ -208,7 +208,9 @@ static void __init noinline probe_masking_msrs(void)
>>>>  static void amd_ctxt_switch_levelling(const struct domain *nextd)
>>>>  {
>>>>  	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
>>>> -	const struct cpuidmasks *masks = &cpuidmask_defaults;
>>>> +	const struct cpuidmasks *masks =
>>>> +            (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
>>>> +            ? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
>>> Mixing tabs and spaces for indentation.
>>>
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -574,6 +574,11 @@ int arch_domain_create(struct domain *d, unsigned int 
>> domcr_flags,
>>>>              goto fail;
>>>>          clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
>>>>  
>>>> +        d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
>>>> +        if ( !d->arch.pv_domain.cpuidmasks )
>>>> +            goto fail;
>>>> +        *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;
>>> Along the lines of not masking features for the hypervisor's own use
>>> (see the respective comment on the earlier patch) I think this patch,
>>> here or in domain_build.c, should except Dom0 from having the
>>> default masking applied. This shouldn't, however, extend to CPUID
>>> faulting. (Perhaps this rather belongs here so that the non-Dom0
>>> hardware domain case can also be taken care of.)
>> Very specifically not.  It is wrong to special case Dom0 and the
>> hardware domain, as their cpuid values should relevent to their VM, not
>> the host.
> I can't see how this second half of the sentence is a reason for
> not special casing Dom0.

Dom0 is just a VM which happens to have all the hardware by default.

It has the same requirements as all other VMs when it comes to cpuid;
most notably that it shouldn't see features which it can't use.  The
problem becomes far more obvious with an HVMLite dom0, running an
almost-native kernel.

~Andrew


* Re: [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-02-18 12:17                 ` Andrew Cooper
@ 2016-02-18 13:23                   ` Jan Beulich
  0 siblings, 0 replies; 139+ messages in thread
From: Jan Beulich @ 2016-02-18 13:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.02.16 at 13:17, <andrew.cooper3@citrix.com> wrote:
> On 17/02/16 14:45, Jan Beulich wrote:
>>>>> On 17.02.16 at 15:02, <andrew.cooper3@citrix.com> wrote:
>>> On 17/02/16 10:55, Jan Beulich wrote:
>>>>>>> On 17.02.16 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>>> On 16/02/16 10:06, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 18:12, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 15:43, Jan Beulich wrote:
>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
> unsigned int *ebx,
>>>>>>>>>          /* Fix up VLAPIC details. */
>>>>>>>>>          *ebx &= 0x00FFFFFFu;
>>>>>>>>>          *ebx |= (v->vcpu_id * 2) << 24;
>>>>>>>>> +
>>>>>>>>> +        *ecx &= hvm_featureset[FEATURESET_1c];
>>>>>>>>> +        *edx &= hvm_featureset[FEATURESET_1d];
>>>>>>>> Looks like I've overlooked an issue in patch 11, which becomes
>>>>>>>> apparent here: How can you use a domain-independent featureset
>>>>>>>> here, when features vary between HAP and shadow mode guests?
>>>>>>>> I.e. in the earlier patch I suppose you need to calculate two
>>>>>>>> hvm_*_featureset[]s, with the HAP one perhaps empty when
>>>>>>>> !hvm_funcs.hap_supported.
>>>>>>> Their use here is a halfway house between nothing and the planned full
>>>>>>> per-domain policies.
>>>>>>>
>>>>>>> In this case, the "don't expose $X to a non-hap domain" checks have been
>>>>>>> retained, to cover the difference.
>>>>>> Well, doesn't it seem to you that doing only half of the HAP/shadow
>>>>>> separation is odd/confusing? I.e. could I talk you into not doing any
>>>>>> such separation (enforcing the non-HAP overrides as is done now)
>>>>>> or finishing the separation to become visible/usable here?
>>>>> The HAP/shadow distinction is needed in the toolstack to account for the
>>>>> hap=<bool> option.
>>>>>
>>>>> The distinction will disappear when per-domain policies are introduced. 
>>>>> If you notice, the distinction is private to the data generated by the
>>>>> autogen script, and does not form a part of any API/ABI.  The sysctl
>>>>> only has a single hvm featureset.
>>>> I don't see this as being in line with
>>>>
>>>>     hvm_featuremask = hvm_funcs.hap_supported ?
>>>>         hvm_hap_featuremask : hvm_shadow_featuremask;
>>>>
>>>> in patch 11. A shadow mode guest should see exactly the same
>>>> set of features, regardless of whether HAP was available (and
>>>> enabled) on a host.
>>> A shadow mode guest will see the same features, independently of whether
>>> HAP was available.
>> I'm afraid I'm being dense: Either the guest sees the same features,
>> which to me implies both of hvm_{hap,shadow}_featuremask are
>> identical, or the two masks are different, resulting in different guest
>> feature masks (and hence different guest features) depending on
>> HAP availability. What am I missing?
> 
> A guest booted with hap and a guest booted with shadow will see
> different features when booted on the same host.  Hap includes 1GB
> superpages, PCID, etc.

Okay, now I (think I) at least understand the background. What
I then still don't understand is why you don't - in the code still
visible above - use a HAP/shadow dependent mask, thus taking
care of those differences instead of elsewhere doing dynamic
adjustments.

Jan

> The problem comes with a shadow guest booted on a hap-capable host. 
> Such a guest can safely be migrated to a non-hap capable host, but only
> if the toolstack knows that the guest saw a reduced featureset.
> 
> As there is still no interface to query what a guest can actually see
> (depends on full per-domain policies and no dynamic hiding), the shadow
> featuremask is used by the toolstack as a promise of what the Xen
> dynamic checks will do.
> 
> ~Andrew


* Re: [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers
  2016-02-10 10:18         ` Ian Campbell
@ 2016-02-18 13:37           ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 13:37 UTC (permalink / raw)
  To: Ian Campbell, Tim Deegan; +Cc: Wei Liu, Ian Jackson, Xen-devel

On 10/02/16 10:18, Ian Campbell wrote:
> On Wed, 2016-02-10 at 10:07 +0000, Andrew Cooper wrote:
>> On 08/02/16 16:36, Ian Campbell wrote:
>>> On Mon, 2016-02-08 at 16:23 +0000, Tim Deegan wrote:
>>>> At 13:42 +0000 on 05 Feb (1454679737), Andrew Cooper wrote:
>>>>> The type of the pointer to a bitmap is not interesting; it does not
>>>>> affect the
>>>>> representation of the block of bits being pointed to.
>>>> It does affect the alignment, though.  Is this safe on ARM?
>>> Good point. These constructs in the patch:
>>>
>>> +    const unsigned long *addr = _addr;
>>>
>>> Would be broken if _addr were not suitably aligned for an unsigned
>>> long.
>>>
>>> That probably rules out this approach unfortunately.
>> What about reworking libxc bitops in terms of unsigned char?  That
>> should cover all alignment issues.
> Assuming any asm or calls to __builtin_foo backends were adjusted to suite,
> that would be ok, would that be compatible with the Xen side though?

It is all plain C.  What I mean is

-static inline int test_bit(int nr, unsigned long *addr)
+static inline int test_bit(int nr, const void *_addr)
 {
+    const char *addr = _addr;
     return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
 }

and changing BITMAP_{ENTRY,SHIFT}() to use char rather than unsigned long.

The prototypes still have void *, but char * is used internally, which
matches the minimum alignment of any object passed.

~Andrew


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-17 19:02                   ` Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: " Konrad Rzeszutek Wilk
  2016-02-17 19:58                     ` Boris Ostrovsky
@ 2016-02-18 15:02                     ` Roger Pau Monné
  2016-02-18 15:12                       ` Andrew Cooper
  2016-02-18 15:16                       ` David Vrabel
  1 sibling, 2 replies; 139+ messages in thread
From: Roger Pau Monné @ 2016-02-18 15:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper, boris.ostrovsky
  Cc: Jan Beulich, Xen-devel

El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>
>>>>>> HVM guests have:
>>>>>>
>>>>>> vmx.c:
>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>> [...]
>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>         break;
>>>>>>
>>>>>> and svm.c:
>>>>>>     case VMEXIT_MONITOR:
>>>>>>     case VMEXIT_MWAIT:
>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>         break;
>>>>>>
>>>>>> I don't see how a guest could actually use this feature.
>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>> putting the processor to sleep.
>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>> lines grep reported. Yet - how do things work then, without the
>>> MWAIT feature flag currently getting cleared?
>>
>> I have never observed it being used.  Do you have some local patches in
>> the SLES hypervisor?
>>
>> There is some gross layer violation in xen/enlighten.c to pretend that
>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>> to report back to Xen.  (This is yet another PV-ism which will cause a
>> headache for a DMLite dom0)
> 
> Yes indeed. CC-ing Roger, and Boris.

Yes, all this is indeed not very nice, and we would ideally like to get
rid of it on PVHv2.

Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
order to fetch this information from user-space and send it to Xen using
privcmd?

AFAIK those tools work on most OSes (or at least the ones we care about
as Dom0).

Roger.


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 15:02                     ` Roger Pau Monné
@ 2016-02-18 15:12                       ` Andrew Cooper
  2016-02-18 16:24                         ` Boris Ostrovsky
  2016-02-18 17:03                         ` Roger Pau Monné
  2016-02-18 15:16                       ` David Vrabel
  1 sibling, 2 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 15:12 UTC (permalink / raw)
  To: Roger Pau Monné, Konrad Rzeszutek Wilk, boris.ostrovsky
  Cc: Jan Beulich, Xen-devel

On 18/02/16 15:02, Roger Pau Monné wrote:
> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>
>>>>>>> HVM guests have:
>>>>>>>
>>>>>>> vmx.c:
>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>> [...]
>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> and svm.c:
>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> I don't see how a guest could actually use this feature.
>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>> putting the processor to sleep.
>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>> lines grep reported. Yet - how do things work then, without the
>>>> MWAIT feature flag currently getting cleared?
>>> I have never observed it being used.  Do you have some local patches in
>>> the SLES hypervisor?
>>>
>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>> headache for a DMLite dom0)
>> Yes indeed. CC-ing Roger, and Boris.
> Yes, all this is indeed not very nice, and we would ideally like to get
> rid of it on PVHv2.
>
> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> order to fetch this information from user-space and send it to Xen using
> privcmd?
>
> AFAIK those tools work on most OSes (or at least the ones we care about
> as Dom0).

In general, we can't rely on userspace evaluation of AML.

For trivial AML which evaluates to a constant, it could be interpreted
by userspace, but anything accessing system resources will need
evaluating by the kernel.

~Andrew


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 15:02                     ` Roger Pau Monné
  2016-02-18 15:12                       ` Andrew Cooper
@ 2016-02-18 15:16                       ` David Vrabel
  1 sibling, 0 replies; 139+ messages in thread
From: David Vrabel @ 2016-02-18 15:16 UTC (permalink / raw)
  To: Roger Pau Monné,
	Konrad Rzeszutek Wilk, Andrew Cooper, boris.ostrovsky
  Cc: Jan Beulich, Xen-devel

On 18/02/16 15:02, Roger Pau Monné wrote:
> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>
>>>>>>> HVM guests have:
>>>>>>>
>>>>>>> vmx.c:
>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>> [...]
>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> and svm.c:
>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> I don't see how a guest could actually use this feature.
>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>> putting the processor to sleep.
>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>> lines grep reported. Yet - how do things work then, without the
>>>> MWAIT feature flag currently getting cleared?
>>>
>>> I have never observed it being used.  Do you have some local patches in
>>> the SLES hypervisor?
>>>
>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>> headache for a DMLite dom0)
>>
>> Yes indeed. CC-ing Roger, and Boris.
> 
> Yes, all this is indeed not very nice, and we would ideally like to get
> rid of it on PVHv2.
> 
> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> order to fetch this information from user-space and send it to Xen using
> privcmd?
> 
> AFAIK those tools work on most OSes (or at least the ones we care about
> as Dom0).

If that works, that would be great!

David


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 15:12                       ` Andrew Cooper
@ 2016-02-18 16:24                         ` Boris Ostrovsky
  2016-02-18 16:48                           ` Andrew Cooper
  2016-02-18 17:03                         ` Roger Pau Monné
  1 sibling, 1 reply; 139+ messages in thread
From: Boris Ostrovsky @ 2016-02-18 16:24 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné, Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Xen-devel



On 02/18/2016 10:12 AM, Andrew Cooper wrote:
> On 18/02/16 15:02, Roger Pau Monné wrote:
>> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension
>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>
>>>>>>>> HVM guests have:
>>>>>>>>
>>>>>>>> vmx.c:
>>>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>> [...]
>>>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>          break;
>>>>>>>>
>>>>>>>> and svm.c:
>>>>>>>>      case VMEXIT_MONITOR:
>>>>>>>>      case VMEXIT_MWAIT:
>>>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>          break;
>>>>>>>>
>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>>> putting the processor to sleep.
>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>> lines grep reported. Yet - how do things work then, without the
>>>>> MWAIT feature flag currently getting cleared?
>>>> I have never observed it being used.  Do you have some local patches in
>>>> the SLES hypervisor?
>>>>
>>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>>> headache for a DMLite dom0)
>>> Yes indeed. CC-ing Roger, and Boris.
>> Yes, all this is indeed not very nice, and we would ideally like to get
>> rid of it on PVHv2.

We will have to come up with something else: AIUI the whole point of 
xen_check_mwait() is to come up with proper ECX and EDX values for the 
MWAIT CPUID leaf. Those values are expected to be reported from the 
xen_cpuid() pv_op so that acpi_processor_ffh_state_probe_cpu() can set 
up the C-state structures properly.

The problem is that PVH executes the CPUID instruction natively. (And so 
this must have been broken on PVHv1 as well).

-boris


>>
>> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
>> order to fetch this information from user-space and send it to Xen using
>> privcmd?
>>
>> AFAIK those tools work on most OSes (or at least the ones we care about
>> as Dom0).
> In general, we can't rely on userspace evaluation of AML.
>
> For trivial AML which evaluates to a constant, it could be interpreted
> by userspace, but anything accessing system resources will need
> evaluating by the kernel.
>
> ~Andrew


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 16:24                         ` Boris Ostrovsky
@ 2016-02-18 16:48                           ` Andrew Cooper
  0 siblings, 0 replies; 139+ messages in thread
From: Andrew Cooper @ 2016-02-18 16:48 UTC (permalink / raw)
  To: Boris Ostrovsky, Roger Pau Monné, Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Xen-devel

On 18/02/16 16:24, Boris Ostrovsky wrote:
>
>
> On 02/18/2016 10:12 AM, Andrew Cooper wrote:
>> On 18/02/16 15:02, Roger Pau Monné wrote:
>>> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT
>>>>>>>>>>>>> extension
>>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>>> Because that is a good chunk of extra work to support.  We
>>>>>>>>>>> would need to
>>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>>> guests today, and kernels making use of the feature when
>>>>>>>>>> available
>>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>>> otherwise? (And if there really is a reason to mask the
>>>>>>>>>> feature all of
>>>>>>>>>> the sudden, this should again be justified in the commit
>>>>>>>>>> message.)
>>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>>
>>>>>>>>> HVM guests have:
>>>>>>>>>
>>>>>>>>> vmx.c:
>>>>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>>> [...]
>>>>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op,
>>>>>>>>> HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>>          break;
>>>>>>>>>
>>>>>>>>> and svm.c:
>>>>>>>>>      case VMEXIT_MONITOR:
>>>>>>>>>      case VMEXIT_MWAIT:
>>>>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op,
>>>>>>>>> HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>>          break;
>>>>>>>>>
>>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>>> Yes - the intercepts are always enabled to prevent the guest
>>>>>>> actually
>>>>>>> putting the processor to sleep.
>>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>>> lines grep reported. Yet - how do things work then, without the
>>>>>> MWAIT feature flag currently getting cleared?
>>>>> I have never observed it being used.  Do you have some local
>>>>> patches in
>>>>> the SLES hypervisor?
>>>>>
>>>>> There is some gross layer violation in xen/enlighten.c to pretend
>>>>> that
>>>>> MWAIT is present to trick the ACPI code into evaluating _CST()
>>>>> methods
>>>>> to report back to Xen.  (This is yet another PV-ism which will
>>>>> cause a
>>>>> headache for a DMLite dom0)
>>>> Yes indeed. CC-ing Roger, and Boris.
>>> Yes, all this is indeed not very nice, and we would ideally like to get
>>> rid of it on PVHv2.
>
> We will have to come up with something else: AIUI the whole point of
> xen_check_mwait() is to come up with proper ECX and EDX values for the
> MWAIT CPUID leaf. Those values are expected to be reported from the
> xen_cpuid() pv_op so that acpi_processor_ffh_state_probe_cpu() can set
> the C-state structures properly.
> 
> The problem is that PVH executes the CPUID instruction natively. (And
> so this must have been broken on PVHv1 as well.)

Currently, mwait is unusable by any domains, and will not be offered in
any cpuid policy.

How a particular dom0 goes about deciding to enumerate the ACPI objects
is its own business, but personally I think it is a layering violation
to have the enumeration of an existing ACPI object based on a feature bit.

Dom0, being suitably enlightened, should know that its job is to service
Xen when it comes to ACPI, and unilaterally collect and upload
everything it can.

~Andrew

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 15:12                       ` Andrew Cooper
  2016-02-18 16:24                         ` Boris Ostrovsky
@ 2016-02-18 17:03                         ` Roger Pau Monné
  2016-02-18 22:08                           ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 139+ messages in thread
From: Roger Pau Monné @ 2016-02-18 17:03 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk, boris.ostrovsky
  Cc: Jan Beulich, Xen-devel

On 18/02/16 16:12, Andrew Cooper wrote:
> On 18/02/16 15:02, Roger Pau Monné wrote:
>> On 17/02/16 20:02, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>
>>>>>>>> HVM guests have:
>>>>>>>>
>>>>>>>> vmx.c:
>>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>> [...]
>>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>         break;
>>>>>>>>
>>>>>>>> and svm.c:
>>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>         break;
>>>>>>>>
>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>>> putting the processor to sleep.
>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>> lines grep reported. Yet - how do things work then, without the
>>>>> MWAIT feature flag currently getting cleared?
>>>> I have never observed it being used.  Do you have some local patches in
>>>> the SLES hypervisor?
>>>>
>>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>>> headache for a DMLite dom0)
>>> Yes indeed. CC-ing Roger, and Boris.
>> Yes, all this is indeed not very nice, and we would ideally like to get
>> rid of it on PVHv2.
>>
>> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
>> order to fetch this information from user-space and send it to Xen using
>> privcmd?
>>
>> AFAIK those tools work on most OSes (or at least the ones we care about
>> as Dom0).
> 
> In general, we can't rely on userspace evaluation of AML.
> 
> For trivial AML which evaluates to a constant, it could be interpreted
> by userspace, but anything accessing system resources will need
> evaluating by the kernel.

Hm, I've taken a look at the ACPI tables on one of my systems, and I'm 
not sure, but I suspect the CPU-related methods do indeed have to be 
executed by the kernel. I don't know much ASL, but I gather the 
"Register" keyword means that a specific hardware register must be 
poked, and that probably can't be done from user-space:

[...]
    Scope (\_PR.CPU0)
    {
        Name (_PPC, Zero)  // _PPC: Performance Present Capabilities
        Method (_PCT, 0, NotSerialized)  // _PCT: Performance Control
        {
            \_PR.CPU0._PPC = \_PR.CPPC
            If (((CFGD & One) && (PDC0 & One)))
            {
                Return (Package (0x02)
                {
                    ResourceTemplate ()
                    {
                        Register (FFixedHW, 
                            0x00,               // Bit Width
                            0x00,               // Bit Offset
                            0x0000000000000000, // Address
                            ,)
                    }, 

                    ResourceTemplate ()
                    {
                        Register (FFixedHW, 
                            0x00,               // Bit Width
                            0x00,               // Bit Offset
                            0x0000000000000000, // Address
                            ,)
                    }
                })
            }
        }

        Name (_PSS, Package (0x10)  // _PSS: Performance Supported States
        {
            Package (0x06)
            {
                0x00000834, 
                0x00003A98, 
                0x0000000A, 
                0x0000000A, 
                0x00001500, 
                0x00001500
            }, 

            Package (0x06)
            {
                0x000007D0, 
                0x00003708, 
                0x0000000A, 
                0x0000000A, 
                0x00001400, 
                0x00001400
            }, 
[...]

Do we have a formal list of exactly what Xen wants from ACPI that 
it cannot fetch itself?

I'm quite sure Xen cares about all of the "Processor Vendor-Specific ACPI" 
objects [0]; that should be _PCT, _CST and _PTC (located in \_PR_.CPUN._XXX).

Roger.

[0] http://www.intel.es/content/dam/www/public/us/en/documents/product-specifications/processor-vendor-specific-acpi-specification.pdf


* Re: Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset
  2016-02-18 17:03                         ` Roger Pau Monné
@ 2016-02-18 22:08                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-18 22:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, boris.ostrovsky, Jan Beulich, Xen-devel

. snip..
> >>>>> Hmm, you're right, somehow I've managed to ignore the relevant
> >>>>> lines grep reported. Yet - how do things work then, without the
> >>>>> MWAIT feature flag currently getting cleared?
> >>>> I have never observed it being used.  Do you have some local patches in
> >>>> the SLES hypervisor?
> >>>>
> >>>> There is some gross layer violation in xen/enlighten.c to pretend that
> >>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
> >>>> to report back to Xen.  (This is yet another PV-ism which will cause a
> >>>> headache for a DMLite dom0)
> >>> Yes indeed. CC-ing Roger, and Boris.
> >> Yes, all this is indeed not very nice, and we would ideally like to get
> >> rid of it on PVHv2.
> >>
> >> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> >> order to fetch this information from user-space and send it to Xen using
> >> privcmd?
> >>
> >> AFAIK those tools work on most OSes (or at least the ones we care about
> >> as Dom0).
> > 
> > In general, we can't rely on userspace evaluation of AML.
> > 
> > For trivial AML which evaluates to a constant, it could be interpreted
> > by userspace, but anything accessing system resources will need
> > evaluating by the kernel.
> 
> Hm, I've taken a look at the ACPI tables on one of my systems, and I'm 
... snip ..
> Do we have a formal list of exactly what Xen wants from ACPI that 
> it cannot fetch itself?
> 
> I'm quite sure Xen cares about all of the "Processor Vendor-Specific ACPI" 
> objects [0]; that should be _PCT, _CST and _PTC (located in \_PR_.CPUN._XXX).

Correct. But those values, especially _CST, are modified by the firmware
depending on the ..[something, I can't actually find the code for it].

Here is a copy-and-paste of the commit that put the generic ACPI code
on the path to getting the deeper C-states:

commit 73c154c60be106b47f15d1111fc2d75cc7a436f2
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Feb 13 22:26:32 2012 -0500

    xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
    
    For the hypervisor to take advantage of the MWAIT support it needs
    to extract from the ACPI _CST the register address. But the
    hypervisor does not have the support to parse DSDT so it relies on
    the initial domain (dom0) to parse the ACPI Power Management information
    and push it up to the hypervisor. The pushing of the data is done
    by the processor_harveset_xen module which parses the information that
    the ACPI parser has graciously exposed in 'struct acpi_processor'.
    
    For the ACPI parser to also expose the Cx states for MWAIT, we need
    to expose the MWAIT capability (leaf 1). Furthermore we also need to
    expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
    function.
    
    The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
    operations, but it can't do it since it needs to be backwards compatible.
    Instead we choose to use the native CPUID to figure out if the MWAIT
    capability exists and use the XEN_SET_PDC query hypercall to figure out
    if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
    
    Note: The XEN_SET_PDC query was implemented in c/s 23783:
    "ACPI: add _PDC input override mechanism".
    
    With this in place, instead of
     C3 ACPI IOPORT 415
    we get now
     C3:ACPI FFH INTEL MWAIT 0x20
    
    Note: The cpu_idle which would be calling the mwait variants for idling
    never gets set b/c we set the default pm_idle to be the hypercall variant.

> 
> Roger.
> 
> [0] http://www.intel.es/content/dam/www/public/us/en/documents/product-specifications/processor-vendor-specific-acpi-specification.pdf


* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
  2016-02-12 16:27   ` Jan Beulich
@ 2016-02-19 17:29   ` Joao Martins
  2016-02-19 17:55     ` Andrew Cooper
  1 sibling, 1 reply; 139+ messages in thread
From: Joao Martins @ 2016-02-19 17:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel

On 02/05/2016 01:41 PM, Andrew Cooper wrote:
> For the featureset to be a useful object, it needs a stable interpretation, a
> property which is missing from the current hw_caps interface.
> 
> Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
> which will be used by later changes.
> 
> To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
> changes will change this to being dynamically generated.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Hey Andrew!

There are a few word motions in this patch:

[current]   [this series]
word 0   ->  word 0
word 4   ->  word 1
word 1,6 ->  word 2,3
word 2   ->  word 4
word 7,8 ->  word 5,6
         ->  word 7   (new leaf not previously described)
         ->  word 8   (new leaf not previously described)
word 3   ->  word 9   (linux defined mapping)

Since you're proposing the stabilization of physinfo.hw_caps, and given that it
is exposed through both sysctl and libxl (via libxl_hwcap), shouldn't its size
match the real one (boot_cpu_data.x86_capability), i.e. NCAPINTS? Additionally,
I see that libxl_hwcap is also hardcoded to 8, alongside struct
xen_sysctl_physinfo, when it should be 10?

libxl users could potentially make use of this hwcap field to see what features
the host CPU supports.

Joao

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> 
> v2:
>  * Rebase over upstream changes
>  * Collect all feature introductions from later in the series
>  * Restrict API to Xen and toolstack
> ---
>  xen/include/asm-x86/cpufeature.h            | 159 +++--------------------
>  xen/include/public/arch-x86/cpufeatureset.h | 195 ++++++++++++++++++++++++++++
>  2 files changed, 210 insertions(+), 144 deletions(-)
>  create mode 100644 xen/include/public/arch-x86/cpufeatureset.h
> 
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index e7e369b..eb6eb63 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -11,151 +11,22 @@
>  
>  #include <xen/const.h>
>  
> -#define NCAPINTS	9	/* N 32-bit words worth of info */
> -
> -/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */
> -#define X86_FEATURE_FPU		(0*32+ 0) /* Onboard FPU */
> -#define X86_FEATURE_VME		(0*32+ 1) /* Virtual Mode Extensions */
> -#define X86_FEATURE_DE		(0*32+ 2) /* Debugging Extensions */
> -#define X86_FEATURE_PSE 	(0*32+ 3) /* Page Size Extensions */
> -#define X86_FEATURE_TSC		(0*32+ 4) /* Time Stamp Counter */
> -#define X86_FEATURE_MSR		(0*32+ 5) /* Model-Specific Registers, RDMSR, WRMSR */
> -#define X86_FEATURE_PAE		(0*32+ 6) /* Physical Address Extensions */
> -#define X86_FEATURE_MCE		(0*32+ 7) /* Machine Check Architecture */
> -#define X86_FEATURE_CX8		(0*32+ 8) /* CMPXCHG8 instruction */
> -#define X86_FEATURE_APIC	(0*32+ 9) /* Onboard APIC */
> -#define X86_FEATURE_SEP		(0*32+11) /* SYSENTER/SYSEXIT */
> -#define X86_FEATURE_MTRR	(0*32+12) /* Memory Type Range Registers */
> -#define X86_FEATURE_PGE		(0*32+13) /* Page Global Enable */
> -#define X86_FEATURE_MCA		(0*32+14) /* Machine Check Architecture */
> -#define X86_FEATURE_CMOV	(0*32+15) /* CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
> -#define X86_FEATURE_PAT		(0*32+16) /* Page Attribute Table */
> -#define X86_FEATURE_PSE36	(0*32+17) /* 36-bit PSEs */
> -#define X86_FEATURE_PN		(0*32+18) /* Processor serial number */
> -#define X86_FEATURE_CLFLSH	(0*32+19) /* Supports the CLFLUSH instruction */
> -#define X86_FEATURE_DS		(0*32+21) /* Debug Store */
> -#define X86_FEATURE_ACPI	(0*32+22) /* ACPI via MSR */
> -#define X86_FEATURE_MMX		(0*32+23) /* Multimedia Extensions */
> -#define X86_FEATURE_FXSR	(0*32+24) /* FXSAVE and FXRSTOR instructions (fast save and restore */
> -				          /* of FPU context), and CR4.OSFXSR available */
> -#define X86_FEATURE_XMM		(0*32+25) /* Streaming SIMD Extensions */
> -#define X86_FEATURE_XMM2	(0*32+26) /* Streaming SIMD Extensions-2 */
> -#define X86_FEATURE_SELFSNOOP	(0*32+27) /* CPU self snoop */
> -#define X86_FEATURE_HT		(0*32+28) /* Hyper-Threading */
> -#define X86_FEATURE_ACC		(0*32+29) /* Automatic clock control */
> -#define X86_FEATURE_IA64	(0*32+30) /* IA-64 processor */
> -#define X86_FEATURE_PBE		(0*32+31) /* Pending Break Enable */
> -
> -/* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
> -/* Don't duplicate feature flags which are redundant with Intel! */
> -#define X86_FEATURE_SYSCALL	(1*32+11) /* SYSCALL/SYSRET */
> -#define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
> -#define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
> -#define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
> -#define X86_FEATURE_FFXSR       (1*32+25) /* FFXSR instruction optimizations */
> -#define X86_FEATURE_PAGE1GB	(1*32+26) /* 1Gb large page support */
> -#define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
> -#define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
> -#define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */
> -#define X86_FEATURE_3DNOW	(1*32+31) /* 3DNow! */
> -
> -/* Intel-defined CPU features, CPUID level 0x0000000D:1 (eax), word 2 */
> -#define X86_FEATURE_XSAVEOPT	(2*32+ 0) /* XSAVEOPT instruction. */
> -#define X86_FEATURE_XSAVEC	(2*32+ 1) /* XSAVEC/XRSTORC instructions. */
> -#define X86_FEATURE_XGETBV1	(2*32+ 2) /* XGETBV with %ecx=1. */
> -#define X86_FEATURE_XSAVES	(2*32+ 3) /* XSAVES/XRSTORS instructions. */
> -
> -/* Other features, Linux-defined mapping, word 3 */
> +#include <public/arch-x86/cpufeatureset.h>
> +
> +#define FSCAPINTS 9
> +#define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
> +
> +/* Other features, Linux-defined mapping, FSMAX+1 */
>  /* This range is used for feature bits which conflict or are synthesized */
> -#define X86_FEATURE_CONSTANT_TSC (3*32+ 8) /* TSC ticks at a constant rate */
> -#define X86_FEATURE_NONSTOP_TSC	(3*32+ 9) /* TSC does not stop in C states */
> -#define X86_FEATURE_ARAT	(3*32+ 10) /* Always running APIC timer */
> -#define X86_FEATURE_ARCH_PERFMON (3*32+11) /* Intel Architectural PerfMon */
> -#define X86_FEATURE_TSC_RELIABLE (3*32+12) /* TSC is known to be reliable */
> -#define X86_FEATURE_XTOPOLOGY    (3*32+13) /* cpu topology enum extensions */
> -#define X86_FEATURE_CPUID_FAULTING (3*32+14) /* cpuid faulting */
> -#define X86_FEATURE_CLFLUSH_MONITOR (3*32+15) /* clflush reqd with monitor */
> -#define X86_FEATURE_APERFMPERF   (3*32+16) /* APERFMPERF */
> -
> -/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
> -#define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
> -#define X86_FEATURE_PCLMULQDQ	(4*32+ 1) /* Carry-less mulitplication */
> -#define X86_FEATURE_DTES64	(4*32+ 2) /* 64-bit Debug Store */
> -#define X86_FEATURE_MWAIT	(4*32+ 3) /* Monitor/Mwait support */
> -#define X86_FEATURE_DSCPL	(4*32+ 4) /* CPL Qualified Debug Store */
> -#define X86_FEATURE_VMXE	(4*32+ 5) /* Virtual Machine Extensions */
> -#define X86_FEATURE_SMXE	(4*32+ 6) /* Safer Mode Extensions */
> -#define X86_FEATURE_EST		(4*32+ 7) /* Enhanced SpeedStep */
> -#define X86_FEATURE_TM2		(4*32+ 8) /* Thermal Monitor 2 */
> -#define X86_FEATURE_SSSE3	(4*32+ 9) /* Supplemental Streaming SIMD Extensions-3 */
> -#define X86_FEATURE_CID		(4*32+10) /* Context ID */
> -#define X86_FEATURE_FMA		(4*32+12) /* Fused Multiply Add */
> -#define X86_FEATURE_CX16        (4*32+13) /* CMPXCHG16B */
> -#define X86_FEATURE_XTPR	(4*32+14) /* Send Task Priority Messages */
> -#define X86_FEATURE_PDCM	(4*32+15) /* Perf/Debug Capability MSR */
> -#define X86_FEATURE_PCID	(4*32+17) /* Process Context ID */
> -#define X86_FEATURE_DCA		(4*32+18) /* Direct Cache Access */
> -#define X86_FEATURE_SSE4_1	(4*32+19) /* Streaming SIMD Extensions 4.1 */
> -#define X86_FEATURE_SSE4_2	(4*32+20) /* Streaming SIMD Extensions 4.2 */
> -#define X86_FEATURE_X2APIC	(4*32+21) /* Extended xAPIC */
> -#define X86_FEATURE_MOVBE	(4*32+22) /* movbe instruction */
> -#define X86_FEATURE_POPCNT	(4*32+23) /* POPCNT instruction */
> -#define X86_FEATURE_TSC_DEADLINE (4*32+24) /* "tdt" TSC Deadline Timer */
> -#define X86_FEATURE_AES		(4*32+25) /* AES instructions */
> -#define X86_FEATURE_XSAVE	(4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
> -#define X86_FEATURE_OSXSAVE	(4*32+27) /* OSXSAVE */
> -#define X86_FEATURE_AVX 	(4*32+28) /* Advanced Vector Extensions */
> -#define X86_FEATURE_F16C 	(4*32+29) /* Half-precision convert instruction */
> -#define X86_FEATURE_RDRAND 	(4*32+30) /* Digital Random Number Generator */
> -#define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running under some hypervisor */
> -
> -/* UNUSED, word 5 */
> -
> -/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
> -#define X86_FEATURE_LAHF_LM     (6*32+ 0) /* LAHF/SAHF in long mode */
> -#define X86_FEATURE_CMP_LEGACY  (6*32+ 1) /* If yes HyperThreading not valid */
> -#define X86_FEATURE_SVM         (6*32+ 2) /* Secure virtual machine */
> -#define X86_FEATURE_EXTAPIC     (6*32+ 3) /* Extended APIC space */
> -#define X86_FEATURE_CR8_LEGACY  (6*32+ 4) /* CR8 in 32-bit mode */
> -#define X86_FEATURE_ABM         (6*32+ 5) /* Advanced bit manipulation */
> -#define X86_FEATURE_SSE4A       (6*32+ 6) /* SSE-4A */
> -#define X86_FEATURE_MISALIGNSSE (6*32+ 7) /* Misaligned SSE mode */
> -#define X86_FEATURE_3DNOWPREFETCH (6*32+ 8) /* 3DNow prefetch instructions */
> -#define X86_FEATURE_OSVW        (6*32+ 9) /* OS Visible Workaround */
> -#define X86_FEATURE_IBS         (6*32+10) /* Instruction Based Sampling */
> -#define X86_FEATURE_XOP         (6*32+11) /* extended AVX instructions */
> -#define X86_FEATURE_SKINIT      (6*32+12) /* SKINIT/STGI instructions */
> -#define X86_FEATURE_WDT         (6*32+13) /* Watchdog timer */
> -#define X86_FEATURE_LWP         (6*32+15) /* Light Weight Profiling */
> -#define X86_FEATURE_FMA4        (6*32+16) /* 4 operands MAC instructions */
> -#define X86_FEATURE_NODEID_MSR  (6*32+19) /* NodeId MSR */
> -#define X86_FEATURE_TBM         (6*32+21) /* trailing bit manipulations */
> -#define X86_FEATURE_TOPOEXT     (6*32+22) /* topology extensions CPUID leafs */
> -#define X86_FEATURE_DBEXT       (6*32+26) /* data breakpoint extension */
> -#define X86_FEATURE_MWAITX      (6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
> -
> -/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 7 */
> -#define X86_FEATURE_FSGSBASE	(7*32+ 0) /* {RD,WR}{FS,GS}BASE instructions */
> -#define X86_FEATURE_BMI1	(7*32+ 3) /* 1st bit manipulation extensions */
> -#define X86_FEATURE_HLE 	(7*32+ 4) /* Hardware Lock Elision */
> -#define X86_FEATURE_AVX2	(7*32+ 5) /* AVX2 instructions */
> -#define X86_FEATURE_SMEP	(7*32+ 7) /* Supervisor Mode Execution Protection */
> -#define X86_FEATURE_BMI2	(7*32+ 8) /* 2nd bit manipulation extensions */
> -#define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
> -#define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
> -#define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
> -#define X86_FEATURE_CMT 	(7*32+12) /* Cache Monitoring Technology */
> -#define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
> -#define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
> -#define X86_FEATURE_CAT 	(7*32+15) /* Cache Allocation Technology */
> -#define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
> -#define X86_FEATURE_ADX		(7*32+19) /* ADCX, ADOX instructions */
> -#define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
> -#define X86_FEATURE_PCOMMIT	(7*32+22) /* PCOMMIT instruction */
> -
> -/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 8 */
> -#define X86_FEATURE_PKU	(8*32+ 3) /* Protection Keys for Userspace */
> -#define X86_FEATURE_OSPKE	(8*32+ 4) /* OS Protection Keys Enable */
> +#define X86_FEATURE_CONSTANT_TSC	((FSCAPINTS+0)*32+ 8) /* TSC ticks at a constant rate */
> +#define X86_FEATURE_NONSTOP_TSC		((FSCAPINTS+0)*32+ 9) /* TSC does not stop in C states */
> +#define X86_FEATURE_ARAT		((FSCAPINTS+0)*32+10) /* Always running APIC timer */
> +#define X86_FEATURE_ARCH_PERFMON	((FSCAPINTS+0)*32+11) /* Intel Architectural PerfMon */
> +#define X86_FEATURE_TSC_RELIABLE	((FSCAPINTS+0)*32+12) /* TSC is known to be reliable */
> +#define X86_FEATURE_XTOPOLOGY		((FSCAPINTS+0)*32+13) /* cpu topology enum extensions */
> +#define X86_FEATURE_CPUID_FAULTING	((FSCAPINTS+0)*32+14) /* cpuid faulting */
> +#define X86_FEATURE_CLFLUSH_MONITOR	((FSCAPINTS+0)*32+15) /* clflush reqd with monitor */
> +#define X86_FEATURE_APERFMPERF		((FSCAPINTS+0)*32+16) /* APERFMPERF */
>  
>  #define cpufeat_word(idx)	((idx) / 32)
>  #define cpufeat_bit(idx)	((idx) % 32)
> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
> new file mode 100644
> index 0000000..02d695d
> --- /dev/null
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -0,0 +1,195 @@
> +/*
> + * arch-x86/cpufeatureset.h
> + *
> + * CPU featureset definitions
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright (c) 2015 Citrix Systems, Inc.
> + */
> +
> +#ifndef __XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__
> +#define __XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__
> +
> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> +
> +/*
> + * A featureset is a bitmap of x86 features, represented as a collection of
> + * 32bit words.
> + *
> + * Words are as specified in vendors' programming manuals, and shall not
> + * contain any synthesised values.  New words may be added to the end of
> + * the featureset.
> + *
> + * All featureset words currently originate from leaves specified for the
> + * CPUID instruction, but this does not preclude other sources of information.
> + */
> +
> +/* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
> +#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
> +#define X86_FEATURE_VME           ( 0*32+ 1) /*   Virtual Mode Extensions */
> +#define X86_FEATURE_DE            ( 0*32+ 2) /*   Debugging Extensions */
> +#define X86_FEATURE_PSE           ( 0*32+ 3) /*   Page Size Extensions */
> +#define X86_FEATURE_TSC           ( 0*32+ 4) /*   Time Stamp Counter */
> +#define X86_FEATURE_MSR           ( 0*32+ 5) /*   Model-Specific Registers, RDMSR, WRMSR */
> +#define X86_FEATURE_PAE           ( 0*32+ 6) /*   Physical Address Extensions */
> +#define X86_FEATURE_MCE           ( 0*32+ 7) /*   Machine Check Architecture */
> +#define X86_FEATURE_CX8           ( 0*32+ 8) /*   CMPXCHG8 instruction */
> +#define X86_FEATURE_APIC          ( 0*32+ 9) /*   Onboard APIC */
> +#define X86_FEATURE_SEP           ( 0*32+11) /*   SYSENTER/SYSEXIT */
> +#define X86_FEATURE_MTRR          ( 0*32+12) /*   Memory Type Range Registers */
> +#define X86_FEATURE_PGE           ( 0*32+13) /*   Page Global Enable */
> +#define X86_FEATURE_MCA           ( 0*32+14) /*   Machine Check Architecture */
> +#define X86_FEATURE_CMOV          ( 0*32+15) /*   CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
> +#define X86_FEATURE_PAT           ( 0*32+16) /*   Page Attribute Table */
> +#define X86_FEATURE_PSE36         ( 0*32+17) /*   36-bit PSEs */
> +#define X86_FEATURE_PN            ( 0*32+18) /*   Processor serial number */
> +#define X86_FEATURE_CLFLSH        ( 0*32+19) /*   CLFLUSH instruction */
> +#define X86_FEATURE_DS            ( 0*32+21) /*   Debug Store */
> +#define X86_FEATURE_ACPI          ( 0*32+22) /*   ACPI via MSR */
> +#define X86_FEATURE_MMX           ( 0*32+23) /*   Multimedia Extensions */
> +#define X86_FEATURE_FXSR          ( 0*32+24) /*   FXSAVE and FXRSTOR instructions */
> +#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
> +#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
> +#define X86_FEATURE_SELFSNOOP     ( 0*32+27) /*   CPU self snoop */
> +#define X86_FEATURE_HT            ( 0*32+28) /*   Hyper-Threading */
> +#define X86_FEATURE_ACC           ( 0*32+29) /*   Automatic clock control */
> +#define X86_FEATURE_IA64          ( 0*32+30) /*   IA-64 processor */
> +#define X86_FEATURE_PBE           ( 0*32+31) /*   Pending Break Enable */
> +
> +/* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
> +#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
> +#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*   Carry-less multiplication */
> +#define X86_FEATURE_DTES64        ( 1*32+ 2) /*   64-bit Debug Store */
> +#define X86_FEATURE_MWAIT         ( 1*32+ 3) /*   Monitor/Mwait support */
> +#define X86_FEATURE_DSCPL         ( 1*32+ 4) /*   CPL Qualified Debug Store */
> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*   Virtual Machine Extensions */
> +#define X86_FEATURE_SMXE          ( 1*32+ 6) /*   Safer Mode Extensions */
> +#define X86_FEATURE_EST           ( 1*32+ 7) /*   Enhanced SpeedStep */
> +#define X86_FEATURE_TM2           ( 1*32+ 8) /*   Thermal Monitor 2 */
> +#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */
> +#define X86_FEATURE_CID           ( 1*32+10) /*   Context ID */
> +#define X86_FEATURE_FMA           ( 1*32+12) /*   Fused Multiply Add */
> +#define X86_FEATURE_CX16          ( 1*32+13) /*   CMPXCHG16B */
> +#define X86_FEATURE_XTPR          ( 1*32+14) /*   Send Task Priority Messages */
> +#define X86_FEATURE_PDCM          ( 1*32+15) /*   Perf/Debug Capability MSR */
> +#define X86_FEATURE_PCID          ( 1*32+17) /*   Process Context ID */
> +#define X86_FEATURE_DCA           ( 1*32+18) /*   Direct Cache Access */
> +#define X86_FEATURE_SSE4_1        ( 1*32+19) /*   Streaming SIMD Extensions 4.1 */
> +#define X86_FEATURE_SSE4_2        ( 1*32+20) /*   Streaming SIMD Extensions 4.2 */
> +#define X86_FEATURE_X2APIC        ( 1*32+21) /*   Extended xAPIC */
> +#define X86_FEATURE_MOVBE         ( 1*32+22) /*   movbe instruction */
> +#define X86_FEATURE_POPCNT        ( 1*32+23) /*   POPCNT instruction */
> +#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*   TSC Deadline Timer */
> +#define X86_FEATURE_AES           ( 1*32+25) /*   AES instructions */
> +#define X86_FEATURE_XSAVE         ( 1*32+26) /*   XSAVE/XRSTOR/XSETBV/XGETBV */
> +#define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
> +#define X86_FEATURE_AVX           ( 1*32+28) /*   Advanced Vector Extensions */
> +#define X86_FEATURE_F16C          ( 1*32+29) /*   Half-precision convert instruction */
> +#define X86_FEATURE_RDRAND        ( 1*32+30) /*   Digital Random Number Generator */
> +#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*   Running under some hypervisor */
> +
> +/* AMD-defined CPU features, CPUID level 0x80000001.edx, word 2 */
> +#define X86_FEATURE_SYSCALL       ( 2*32+11) /*   SYSCALL/SYSRET */
> +#define X86_FEATURE_MP            ( 2*32+19) /*   MP Capable. */
> +#define X86_FEATURE_NX            ( 2*32+20) /*   Execute Disable */
> +#define X86_FEATURE_MMXEXT        ( 2*32+22) /*   AMD MMX extensions */
> +#define X86_FEATURE_FFXSR         ( 2*32+25) /*   FFXSR instruction optimizations */
> +#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*   1Gb large page support */
> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*   RDTSCP */
> +#define X86_FEATURE_LM            ( 2*32+29) /*   Long Mode (x86-64) */
> +#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*   AMD 3DNow! extensions */
> +#define X86_FEATURE_3DNOW         ( 2*32+31) /*   3DNow! */
> +
> +/* AMD-defined CPU features, CPUID level 0x80000001.ecx, word 3 */
> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
> +#define X86_FEATURE_CMP_LEGACY    ( 3*32+ 1) /*   If yes HyperThreading not valid */
> +#define X86_FEATURE_SVM           ( 3*32+ 2) /*   Secure virtual machine */
> +#define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
> +#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*   CR8 in 32-bit mode */
> +#define X86_FEATURE_ABM           ( 3*32+ 5) /*   Advanced bit manipulation */
> +#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*   SSE-4A */
> +#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*   Misaligned SSE mode */
> +#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*   3DNow prefetch instructions */
> +#define X86_FEATURE_OSVW          ( 3*32+ 9) /*   OS Visible Workaround */
> +#define X86_FEATURE_IBS           ( 3*32+10) /*   Instruction Based Sampling */
> +#define X86_FEATURE_XOP           ( 3*32+11) /*   extended AVX instructions */
> +#define X86_FEATURE_SKINIT        ( 3*32+12) /*   SKINIT/STGI instructions */
> +#define X86_FEATURE_WDT           ( 3*32+13) /*   Watchdog timer */
> +#define X86_FEATURE_LWP           ( 3*32+15) /*   Light Weight Profiling */
> +#define X86_FEATURE_FMA4          ( 3*32+16) /*   4 operands MAC instructions */
> +#define X86_FEATURE_NODEID_MSR    ( 3*32+19) /*   NodeId MSR */
> +#define X86_FEATURE_TBM           ( 3*32+21) /*   trailing bit manipulations */
> +#define X86_FEATURE_TOPOEXT       ( 3*32+22) /*   topology extensions CPUID leafs */
> +#define X86_FEATURE_DBEXT         ( 3*32+26) /*   data breakpoint extension */
> +#define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
> +
> +/* Intel-defined CPU features, CPUID level 0x0000000D:1.eax, word 4 */
> +#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*   XSAVEOPT instruction */
> +#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*   XSAVEC/XRSTORC instructions */
> +#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*   XGETBV with %ecx=1 */
> +#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*   XSAVES/XRSTORS instructions */
> +
> +/* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
> +#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*   {RD,WR}{FS,GS}BASE instructions */
> +#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*   TSC_ADJUST MSR available */
> +#define X86_FEATURE_BMI1          ( 5*32+ 3) /*   1st bit manipulation extensions */
> +#define X86_FEATURE_HLE           ( 5*32+ 4) /*   Hardware Lock Elision */
> +#define X86_FEATURE_AVX2          ( 5*32+ 5) /*   AVX2 instructions */
> +#define X86_FEATURE_SMEP          ( 5*32+ 7) /*   Supervisor Mode Execution Protection */
> +#define X86_FEATURE_BMI2          ( 5*32+ 8) /*   2nd bit manipulation extensions */
> +#define X86_FEATURE_ERMS          ( 5*32+ 9) /*   Enhanced REP MOVSB/STOSB */
> +#define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
> +#define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
> +#define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
> +#define X86_FEATURE_NO_FPU_SEL    ( 5*32+13) /*   FPU CS/DS stored as zero */
> +#define X86_FEATURE_MPX           ( 5*32+14) /*   Memory Protection Extensions */
> +#define X86_FEATURE_CAT           ( 5*32+15) /*   Cache Allocation Technology */
> +#define X86_FEATURE_RDSEED        ( 5*32+18) /*   RDSEED instruction */
> +#define X86_FEATURE_ADX           ( 5*32+19) /*   ADCX, ADOX instructions */
> +#define X86_FEATURE_SMAP          ( 5*32+20) /*   Supervisor Mode Access Prevention */
> +#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*   PCOMMIT instruction */
> +#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*   CLFLUSHOPT instruction */
> +#define X86_FEATURE_CLWB          ( 5*32+24) /*   CLWB instruction */
> +#define X86_FEATURE_SHA           ( 5*32+29) /*   SHA1 & SHA256 instructions */
> +
> +/* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */
> +#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*   PREFETCHWT1 instruction */
> +#define X86_FEATURE_PKU           ( 6*32+ 3) /*   Protection Keys for Userspace */
> +#define X86_FEATURE_OSPKE         ( 6*32+ 4) /*   OS Protection Keys Enable */
> +
> +/* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
> +#define X86_FEATURE_ITSC          ( 7*32+ 8) /*   Invariant TSC */
> +#define X86_FEATURE_EFRO          ( 7*32+10) /*   APERF/MPERF Read Only interface */
> +
> +/* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
> +#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*   CLZERO instruction */
> +
> +#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
> +#endif /* !__XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> 

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-19 17:29   ` Joao Martins
@ 2016-02-19 17:55     ` Andrew Cooper
  2016-02-19 22:03       ` Joao Martins
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-19 17:55 UTC (permalink / raw)
  To: Joao Martins; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel

On 19/02/16 17:29, Joao Martins wrote:
> On 02/05/2016 01:41 PM, Andrew Cooper wrote:
>> For the featureset to be a useful object, it needs a stable interpretation, a
>> property which is missing from the current hw_caps interface.
>>
>> Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
>> which will be used by later changes.
>>
>> To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
>> changes will change this to being dynamically generated.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Hey Andrew!
>
> There are a few word motions in this patch:

Indeed there are.  They are in aid of getting a new clean interface.

>
> [current]   [this series]
> word 0   ->  word 0
> word 4   ->  word 1
> word 1,6 ->  word 2,3
> word 2   ->  word 4
> word 7,8 ->  word 5,6
>          ->  word 7   (new leaf not previously described)
>          ->  word 8   (new leaf not previously described)
> word 3   ->  word 9   (linux defined mapping)
>
> Since you're proposing the stabilization of physinfo.hw_caps

Stabilising of physinfo.hw_caps is a side effect, but it has shifted
words in the past (c/s 4f4eec3, 6c421a1, 9c907c6).  It is not a stable
interface from Xen, and cannot be relied upon.

It has also never had a published ABI.

>  and given that this
> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
> when it should be 10 ?

Hardcoding of the size in sysctl can be worked around. Fixing libxl is
harder.

The synthetic leaves are internal and should not be exposed.

> libxl users could potentially make use of this hwcap field to see what features
> the host CPU supports.

The purpose of the new featureset interface is to have a stable object
which can be used by higher level toolstacks.

This is done by pretending that hw_caps never existed, and replacing it
wholesale with a bitmap (specified as variable length and safe to
zero-extend), with an ABI in the public header files detailing what each
bit means.

~Andrew


* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-19 17:55     ` Andrew Cooper
@ 2016-02-19 22:03       ` Joao Martins
  2016-02-20 16:17         ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Joao Martins @ 2016-02-19 22:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel

On 02/19/2016 05:55 PM, Andrew Cooper wrote:
> On 19/02/16 17:29, Joao Martins wrote:
>> On 02/05/2016 01:41 PM, Andrew Cooper wrote:
>>> For the featureset to be a useful object, it needs a stable interpretation, a
>>> property which is missing from the current hw_caps interface.
>>>
>>> Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
>>> which will be used by later changes.
>>>
>>> To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
>>> changes will change this to being dynamically generated.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Hey Andrew!
>>
>> There are a few word motions in this patch:
> 
> Indeed there are.  They are in aid of getting a new clean interface.
> 
>>
>> [current]   [this series]
>> word 0   ->  word 0
>> word 4   ->  word 1
>> word 1,6 ->  word 2,3
>> word 2   ->  word 4
>> word 7,8 ->  word 5,6
>>          ->  word 7   (new leaf not previously described)
>>          ->  word 8   (new leaf not previously described)
>> word 3   ->  word 9   (linux defined mapping)
>>
>> Since you're proposing the stabilization of physinfo.hw_caps
> 
> Stabilising of physinfo.hw_caps is a side effect, but it has shifted
> words in the past (c/s 4f4eec3, 6c421a1, 9c907c6).  It is not a stable
> interface from Xen, and cannot be relied upon.
> 
> It has also never had a published ABI.
> 
Thanks for the clarification! I thought that it was sort of a stable API because
it was exposed through libxl, but I got the wrong idea entirely.

>>  and given that this
>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>> when it should be 10 ?
> 
> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
> harder.
> 
> The synthetic leaves are internal and should not be exposed.
> 
>> libxl users could potentially make use of this hwcap field to see what features
>> the host CPU supports.
> 
> The purpose of the new featureset interface is to have a stable object
> which can be used by higher level toolstacks.
> 
> This is done by pretending that hw_caps never existed, and replacing it
> wholesale with a bitmap, (specified as variable length and safe to
> zero-extend), with an ABI in the public header files detailing what each
> bit means.

Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
an equivalent to libxl could also be added? That way users of libxl could also
query the host's and guests' supported features. I would be happy to produce
patches towards that.

Joao

> 
> ~Andrew
> 


* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-19 22:03       ` Joao Martins
@ 2016-02-20 16:17         ` Andrew Cooper
  2016-02-20 17:39           ` Joao Martins
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-20 16:17 UTC (permalink / raw)
  To: Joao Martins; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel

On 19/02/16 22:03, Joao Martins wrote:
> On 02/19/2016 05:55 PM, Andrew Cooper wrote:
>> On 19/02/16 17:29, Joao Martins wrote:
>>> On 02/05/2016 01:41 PM, Andrew Cooper wrote:
>>>> For the featureset to be a useful object, it needs a stable interpretation, a
>>>> property which is missing from the current hw_caps interface.
>>>>
>>>> Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
>>>> which will be used by later changes.
>>>>
>>>> To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
>>>> changes will change this to being dynamically generated.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Hey Andrew!
>>>
>>> There are a few word motions in this patch:
>> Indeed there are.  They are in aid of getting a new clean interface.
>>
>>> [current]   [this series]
>>> word 0   ->  word 0
>>> word 4   ->  word 1
>>> word 1,6 ->  word 2,3
>>> word 2   ->  word 4
>>> word 7,8 ->  word 5,6
>>>          ->  word 7   (new leaf not previously described)
>>>          ->  word 8   (new leaf not previously described)
>>> word 3   ->  word 9   (linux defined mapping)
>>>
>>> Since you're proposing the stabilization of physinfo.hw_caps
>> Stabilising of physinfo.hw_caps is a side effect, but it has shifted
>> words in the past (c/s 4f4eec3, 6c421a1, 9c907c6).  It is not a stable
>> interface from Xen, and cannot be relied upon.
>>
>> It has also never had a published ABI.
>>
> Thanks for the clarification! I thought that it was sort of a stable API because
> it was exposed through libxl, but I got the wrong idea entirely.

Supposedly so.  In reality, a number of poor decisions were made when
declaring certain things stable.

>
>>>  and given that this
>>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>>> when it should be 10 ?
>> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
>> harder.
>>
>> The synthetic leaves are internal and should not be exposed.
>>
>>> libxl users could potentially make use of this hwcap field to see what features
>>> the host CPU supports.
>> The purpose of the new featureset interface is to have a stable object
>> which can be used by higher level toolstacks.
>>
>> This is done by pretending that hw_caps never existed, and replacing it
>> wholesale with a bitmap, (specified as variable length and safe to
>> zero-extend), with an ABI in the public header files detailing what each
>> bit means.
> Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
> an equivalent to libxl could also be added? That way users of libxl could also
> query about the host and guests supported features. I would be happy to produce
> patches towards that.

In principle, this is fine.  Part of this is covered by the xen-cpuid
utility in a later patch.

Despite my plans to further rework guest cpuid handling, the principle
of the {raw,host,pv,hvm}_featuresets is expected to stay, and be usable
in their current form.

However, other details such as the hvm hap and shadow mask are expected
to change moving forwards, so shouldn't be made into a stable API.

Note also that the current "inverted" mask is subject to some redesign,
following the "x86: workaround inability to fully restore FPU state"
issue David was working on.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-20 16:17         ` Andrew Cooper
@ 2016-02-20 17:39           ` Joao Martins
  2016-02-20 19:17             ` Andrew Cooper
  0 siblings, 1 reply; 139+ messages in thread
From: Joao Martins @ 2016-02-20 17:39 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel



On 02/20/2016 04:17 PM, Andrew Cooper wrote:
> On 19/02/16 22:03, Joao Martins wrote:
>> On 02/19/2016 05:55 PM, Andrew Cooper wrote:
>>> On 19/02/16 17:29, Joao Martins wrote:
>>>> On 02/05/2016 01:41 PM, Andrew Cooper wrote:
>>>>> For the featureset to be a useful object, it needs a stable interpretation, a
>>>>> property which is missing from the current hw_caps interface.
>>>>>
>>>>> Additionally, introduce TSC_ADJUST, SHA, PREFETCHWT1, ITSC, EFRO and CLZERO
>>>>> which will be used by later changes.
>>>>>
>>>>> To maintain compilation, FSCAPINTS is currently hardcoded at 9.  Future
>>>>> changes will change this to being dynamically generated.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Hey Andrew!
>>>>
>>>> There are a few word motions in this patch:
>>> Indeed there are.  They are in aid of getting a new clean interface.
>>>
>>>> [current]   [this series]
>>>> word 0   ->  word 0
>>>> word 4   ->  word 1
>>>> word 1,6 ->  word 2,3
>>>> word 2   ->  word 4
>>>> word 7,8 ->  word 5,6
>>>>          ->  word 7   (new leaf not previously described)
>>>>          ->  word 8   (new leaf not previously described)
>>>> word 3   ->  word 9   (linux defined mapping)
>>>>
>>>> Since you're proposing the stabilization of physinfo.hw_caps
>>> Stabilising of physinfo.hw_caps is a side effect, but it has shifted
>>> words in the past (c/s 4f4eec3, 6c421a1, 9c907c6).  It is not a stable
>>> interface from Xen, and cannot be relied upon.
>>>
>>> It has also never had a published ABI.
>>>
>> Thanks for the clarification! I thought that it was sort of a stable API because
>> it was exposed through libxl, but I got the wrong idea entirely.
> 
> Supposedly so.  In reality, a number of poor decisions were made when
> declaring certain things stable.
> 
>>
>>>>  and given that this
>>>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>>>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>>>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>>>> when it should be 10 ?
>>> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
>>> harder.
>>>
>>> The synthetic leaves are internal and should not be exposed.
>>>
>>>> libxl users could potentially make use of this hwcap field to see what features
>>>> the host CPU supports.
>>> The purpose of the new featureset interface is to have a stable object
>>> which can be used by higher level toolstacks.
>>>
>>> This is done by pretending that hw_caps never existed, and replacing it
>>> wholesale with a bitmap, (specified as variable length and safe to
>>> zero-extend), with an ABI in the public header files detailing what each
>>> bit means.
>> Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
>> an equivalent to libxl could also be added? That way users of libxl could also
>> query about the host and guests supported features. I would be happy to produce
>> patches towards that.
> 
> In principle, this is fine.  Part of this is covered by the xen-cpuid
> utility in a later patch.
> 
OK.

> Despite my plans to further rework guest cpuid handling, the principle
> of the {raw,host,pv,hvm}_featuresets is expected to stay, and be usable
> in their current form.
That's great to hear. The reason I brought this up is that libvirt has the
idea of a cpu model and features associated with it (similar to the qemu -cpu
XXX,+feature,-feature stuff, but in a hypervisor-agnostic manner that other
architectures can also use). libvirt could do mostly everything on its own, but
it still needs to know what the host supports. Based on that it then calculates
the lowest common denominator of cpu features to be enabled or masked out for
guests when comparing to an older family in a pool of servers. Though, as you
mention, PV/HVM (with{,out} hap/shadow) guests have different feature sets, so
libvirt might run into errors since it cannot be sure a certain feature will be
set/masked for a certain type of guest. So knowing those (i.e.
{pv,hvm,...}_featuresets) in advance lets libxl users make more reliable use of
the libxl cpuid policies and more correctly normalize the cpuid for each type
of guest.

> 
> However, other details such as the hvm hap and shadow mask are expected
> to change moving forwards, so shouldn't be made into a stable API.
> 
> Note also that the current "inverted" mask is subject to some redesign,
> following the "x86: workaround inability to fully restore FPU state"
> issue David was working on.
Ah, Thanks for the heads up!

Joao



* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-20 17:39           ` Joao Martins
@ 2016-02-20 19:17             ` Andrew Cooper
  2016-02-22 18:50               ` Joao Martins
  0 siblings, 1 reply; 139+ messages in thread
From: Andrew Cooper @ 2016-02-20 19:17 UTC (permalink / raw)
  To: Joao Martins; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel

On 20/02/16 17:39, Joao Martins wrote:
>
>>>>>  and given that this
>>>>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>>>>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>>>>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>>>>> when it should be 10 ?
>>>> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
>>>> harder.
>>>>
>>>> The synthetic leaves are internal and should not be exposed.
>>>>
>>>>> libxl users could potentially make use of this hwcap field to see what features
>>>>> the host CPU supports.
>>>> The purpose of the new featureset interface is to have a stable object
>>>> which can be used by higher level toolstacks.
>>>>
>>>> This is done by pretending that hw_caps never existed, and replacing it
>>>> wholesale with a bitmap, (specified as variable length and safe to
>>>> zero-extend), with an ABI in the public header files detailing what each
>>>> bit means.
>>> Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
>>> an equivalent to libxl could also be added? That way users of libxl could also
>>> query about the host and guests supported features. I would be happy to produce
>>> patches towards that.
>> In principle, this is fine.  Part of this is covered by the xen-cpuid
>> utility in a later patch.
>>
> OK.
>
>> Despite my plans to further rework guest cpuid handling, the principle
>> of the {raw,host,pv,hvm}_featuresets is expected to stay, and be usable
>> in their current form.
> That's great to hear. The reason I brought this up is that libvirt has the
> idea of a cpu model and features associated with it (similar to the qemu -cpu
> XXX,+feature,-feature stuff, but in a hypervisor-agnostic manner that other
> architectures can also use). libvirt could do mostly everything on its own, but
> it still needs to know what the host supports. Based on that it then calculates
> the lowest common denominator of cpu features to be enabled or masked out for
> guests when comparing to an older family in a pool of servers. Though, as you
> mention, PV/HVM (with{,out} hap/shadow) guests have different feature sets, so
> libvirt might run into errors since it cannot be sure a certain feature will be
> set/masked for a certain type of guest. So knowing those (i.e.
> {pv,hvm,...}_featuresets) in advance lets libxl users make more reliable use of
> the libxl cpuid policies and more correctly normalize the cpuid for each type
> of guest.

Does libvirt currently use hw_caps (and my series will inadvertently
break it), or are you looking to do some new work for future benefit?

Sadly, cpuid levelling is a quagmire and not as simple as just choosing
the common subset of bits.  When I started this project I was expecting
it to be bad, but nothing like as bad as it has turned out to be.

As an example, consider the "deprecates fcs/fds" bit, which is the subject of
the "inverted" mask.  The meaning of the bit is "hardware no longer supports
x87 fcs/fds, and they are hardwired to zero".

Originally, the point of the inverted mask was to make a "featureset"
which could be levelled sensibly without specific knowledge of the
meaning of each bit.  This property is important for forwards
compatibility, and avoiding unnecessary complexity in higher level
toolstack components.

However, with hindsight, attempting to level this bit is pointless.  It
is a statement about a change in pre-existing behaviour of an element of
the cpu pipeline, and the pipeline behaviour will not change depending
on how the bit is advertised to the guest.  Another bit, "fdp exception
only" is in a similar bucket.

Other issues, which I haven't even tried to tackle in this series, are
items such as the MXCSR mask.  The real value cannot be levelled, is
expected to remain constant after boot, and liable to induce #GP faults
on fxrstor if it changes.  Alternatively, there is EFER.LMSLE (long mode
segment limit enable) which doesn't even have a feature bit to indicate
availability (not that I can plausibly see an OS actually turning that
feature on).

A toolstack needs to handle all of:
* The maximum "configuration" available to a guest on the available servers.
* Which bits of that can be controlled, and which will simply leak through.
* What the guest actually saw when it booted.

(I use configuration here to include items such as max leaf, max phys
addr, etc which are important to be levelled, but not included in the
plain feature bits in cpuid).

My longterm plans involve:
* Having Xen construct a full "maximum" cpuid policy, rather than just a
featureset.
* Per-domain cpuid policy, seeded from maximum on domain_create, and
modified where appropriate (e.g. hap vs shadow, PV guest switching
between native and compat mode).
* All validity checking for updates in the set_cpuid hypercall rather
than being deferred to the cpuid intercept point.
* A get_cpuid hypercall so a toolstack can actually retrieve the policy
a guest will see.

Even further work involves:
* Put all this information into the migration stream, rather than having
it regenerated by the destination toolstack.
* MSR levelling.

But that is a huge quantity more work, which is why this series focuses
just on the featureset alone, in the hope that the featureset is still a
useful discrete item outside the context of a full cpuid policy.

I guess my question at the end of all this is: what does libvirt currently
handle of all of this?  We certainly can wire the featureset
information through libxl, but it is insufficient in the general case
for making migration safe.

~Andrew


* Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
  2016-02-20 19:17             ` Andrew Cooper
@ 2016-02-22 18:50               ` Joao Martins
  0 siblings, 0 replies; 139+ messages in thread
From: Joao Martins @ 2016-02-22 18:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Ian Campbell, Jan Beulich, Xen-devel



On 02/20/2016 07:17 PM, Andrew Cooper wrote:
> On 20/02/16 17:39, Joao Martins wrote:
>>
>>>>>>  and given that this
>>>>>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>>>>>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>>>>>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>>>>>> when it should be 10 ?
>>>>> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
>>>>> harder.
>>>>>
>>>>> The synthetic leaves are internal and should not be exposed.
>>>>>
>>>>>> libxl users could potentially make use of this hwcap field to see what features
>>>>>> the host CPU supports.
>>>>> The purpose of the new featureset interface is to have a stable object
>>>>> which can be used by higher level toolstacks.
>>>>>
>>>>> This is done by pretending that hw_caps never existed, and replacing it
>>>>> wholesale with a bitmap, (specified as variable length and safe to
>>>>> zero-extend), with an ABI in the public header files detailing what each
>>>>> bit means.
>>>> Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
>>>> an equivalent to libxl could also be added? That way users of libxl could also
>>>> query about the host and guests supported features. I would be happy to produce
>>>> patches towards that.
>>> In principle, this is fine.  Part of this is covered by the xen-cpuid
>>> utility in a later patch.
>>>
>> OK.
>>
>>> Despite my plans to further rework guest cpuid handling, the principle
>>> of the {raw,host,pv,hvm}_featuresets is expected to stay, and be usable
>>> in their current form.
>> That's great to hear. The reason I brought this up is because libvirt has the
>> idea of cpu model and features associated with it (similar to qemu -cpu
>> XXX,+feature,-feature stuff but in an hypervisor agnostic manner that other
>> architectures can also use). libvirt could do mostly everything on its own, but
>> it still needs to know what the host supports. Based on that it then calculates
>> the lowest common denominator of cpu features to be enabled or masked out for
>> guests when comparing to an older family in a pool of servers. Though PV/HVM
>> (with{,out} hap/shadow) have different feature sets as you mention. So libvirt
>> might be thrown into error since a certain feature isn't sure to be set/masked
>> for a certain type of guest. So knowing those (i.e {pv,hvm,...}_featuresets in
>> advance lets libxl users make more reliable usage of the libxl cpuid policies to
>> more correctly normalize the cpuid for each type of guest.
> 
> Does libvirt currently use hw_caps (and my series will inadvertently
> break it), or are you looking to do some new work for future benefit?
Yeah, but only one bit, i.e. PAE on word 0 (which is the only word that was
kept in the same place in your series). I am looking at this for future work
and trying to understand what's missing there. I do have a patch for libvirt to
parse your hw_caps, but given it's not a stable format, it might not make sense
anymore to upstream it.

> 
> Sadly, cpuid levelling is a quagmire and not as simple as just choosing
> the common subset of bits.  When I started this project I was expecting
> it to be bad, but nothing like as bad as it has turned out to be.
> 
Indeed. Perhaps I overstated a bit before when saying "libvirt could do mostly
everything on its own". It certainly doesn't deal with the issues you mention
below; I guess this would be the hypervisor part of it (the qemu/xen/vmware
module in libvirt). I expand a bit below on what libvirt deals with.

> As an example, the "deprecates fcs/fds" bit which is the subject of the
> "inverted" mask.  The meaning of the bit is "hardware no longer supports
> x87 fcs/fds, and they are hardwired to zero".
> 
> Originally, the point of the inverted mask was to make a "featureset"
> which could be levelled sensibly without specific knowledge of the
> meaning of each bit.  This property is important for forwards
> compatibility, and avoiding unnecessary complexity in higher level
> toolstack components.
> 
> However, with hindsight, attempting to level this bit is pointless.  It
> is a statement about a change in pre-existing behaviour of an element of
> the cpu pipeline, and the pipeline behaviour will not change depending
> on how the bit is advertised to the guest.  Another bit, "fdp exception
> only" is in a similar bucket.
> 
> Other issues, which I haven't even tried to tackle in this series, are
> items such as the MXCSR mask.  The real value cannot be levelled, is
> expected to remain constant after boot, and liable to induce #GP faults
> on fxrstor if it changes.  Alternatively, there is EFER.LMSLE (long mode
> segment limit enable) which doesn't even have a feature bit to indicate
> availability (not that I can plausibly see an OS actually turning that
> feature on).
Woah, I wasn't aware of these levelling issues.

> 
> A toolstack needs to handle all of:
> * The maximum "configuration" available to a guest on the available servers.
> * Which bits of that can be controlled, and which will simply leak through.
> * What the guest actually saw when it booted.
> 
> (I use configuration here to include items such as max leaf, max phys
> addr, etc which are important to be levelled, but not included in the
> plain feature bits in cpuid).
> 
> My longterm plans involve:
> * Having Xen construct a full "maximum" cpuid policy, rather than just a
> featureset.
> * Per-domain cpuid policy, seeded from maximum on domain_create, and
> modified where appropriate (e.g. hap vs shadow, PV guest switching
> between native and compat mode).
> * All validity checking for updates in the set_cpuid hypercall rather
> than being deferred to the cpuid intercept point.
> * A get_cpuid hypercall so a toolstack can actually retrieve the policy
> a guest will see.
> 
> Even further work involves:
> * Put all this information into the migration stream, rather than having
> it regenerated by the destination toolstack.
> * MSR levelling.
> 
> But that is a huge quantity more work, which is why this series focuses
> just on the featureset alone, in the hope that the featureset is still a
> useful discrete item outside the context of a full cpuid policy.
> 
> I guess my question at the end of all this is what libvirt currently
> handles of all of this? 

Hmm, libvirt is a high-level toolstack (meaning higher-level than libxl) and
doesn't deal with these things at this level of detail, at least AFAICT. It has
the notion of a CPU model plus individual features, an idea originally borrowed
from qemu as a way of representing the features of each type of host. Each
hypervisor supported in libvirt deals with it in its own way.

It has a CPU map per architecture[0] (x86/ppc only) describing, for example,
what each family of CPUs (Penryn, Broadwell, Opteron, etc) looks like. It also
describes how the features can be checked: on x86, these features are described
by CPUID leaf, subleaf and register output, as you might imagine. Note that the
admin can change these, define custom models, and exclude features from them
too. With these defined, you combine the common features and a model to create
the *guest* CPU definition. Upon bootstrapping the hypervisor driver, libvirt
looks for the most similar model and appends any unmatched features on top of
the host CPU model. The same algorithm is used when comparing a newer family to
an older one in a pool of servers, i.e. comparing CPU definitions.

[This could be viewed as covering the first two items you included above:
* The maximum "configuration" available to a guest on the available servers.
* Which bits of that can be controlled, and which will simply leak through.
Though it wouldn't deal with the configuration as you describe it, just with
the features, deferring the rest to the underlying hypervisor libraries in
use.]

In addition, there are policies attached to each feature: "force", "require",
"optional", "disable" and "forbid". There are also policies describing how the
CPU model you specify should be matched, such as a *minimum* set of features or
an *exact* match of the features described. When booting the guest, libvirt
then checks whether all the features are actually there and whether everything
is in accordance with the feature policies.
[This could be viewed the same as the last item you included above:
* What the guest actually saw when it booted.]

[0]
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml;h=0b6d424db4bdaef7925a2faf7b881f104b1ef4e5;hb=HEAD
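
For illustration, a guest CPU definition in libvirt's domain XML combines a
named model with per-feature policies like so (the particular model and feature
names here are just examples, not a recommendation):

```xml
<!-- Hypothetical guest CPU definition: match mode, base model, and
     per-feature policies as described above. -->
<cpu match="exact">
  <model>Penryn</model>
  <feature policy="require" name="ssse3"/>
  <feature policy="optional" name="aes"/>
  <feature policy="disable" name="lahf_lm"/>
</cpu>
```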


> We certainly can wire the featureset
> information through libxl, but it is insufficient in the general case
> for making migration safe.
Right; given the info and plans you just described, I guess some of those
things aren't there yet and it would involve a lot of "guesswork".

Thanks!
Joao


end of thread, other threads:[~2016-02-22 18:50 UTC | newest]

Thread overview: 139+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-05 13:41 [PATCH RFC v2 00/30] x86: Improvements to cpuid handling for guests Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 01/30] xen/x86: Drop X86_FEATURE_3DNOW_ALT Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 02/30] xen/x86: Do not store VIA/Cyrix/Centaur CPU features Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 03/30] xen/x86: Drop cpuinfo_x86.x86_power Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 04/30] xen/x86: Improvements to pv_cpuid() Andrew Cooper
2016-02-05 13:41 ` [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API Andrew Cooper
2016-02-12 16:27   ` Jan Beulich
2016-02-17 13:08     ` Andrew Cooper
2016-02-17 13:34       ` Jan Beulich
2016-02-19 17:29   ` Joao Martins
2016-02-19 17:55     ` Andrew Cooper
2016-02-19 22:03       ` Joao Martins
2016-02-20 16:17         ` Andrew Cooper
2016-02-20 17:39           ` Joao Martins
2016-02-20 19:17             ` Andrew Cooper
2016-02-22 18:50               ` Joao Martins
2016-02-05 13:41 ` [PATCH v2 06/30] xen/x86: Script to automatically process featureset information Andrew Cooper
2016-02-12 16:36   ` Jan Beulich
2016-02-12 16:43     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 07/30] xen/x86: Collect more cpuid feature leaves Andrew Cooper
2016-02-12 16:38   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 08/30] xen/x86: Mask out unknown features from Xen's capabilities Andrew Cooper
2016-02-12 16:43   ` Jan Beulich
2016-02-12 16:48     ` Andrew Cooper
2016-02-12 17:14       ` Jan Beulich
2016-02-17 13:12         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 09/30] xen/x86: Store antifeatures inverted in a featureset Andrew Cooper
2016-02-12 16:47   ` Jan Beulich
2016-02-12 16:50     ` Andrew Cooper
2016-02-12 17:15       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 10/30] xen/x86: Annotate VM applicability in featureset Andrew Cooper
2016-02-12 17:05   ` Jan Beulich
2016-02-12 17:42     ` Andrew Cooper
2016-02-15  9:20       ` Jan Beulich
2016-02-15 14:38         ` Andrew Cooper
2016-02-15 14:50           ` Jan Beulich
2016-02-15 14:53             ` Andrew Cooper
2016-02-15 15:02               ` Jan Beulich
2016-02-15 15:41                 ` Andrew Cooper
2016-02-17 19:02                   ` Is: PVH dom0 - MWAIT detection logic to get deeper C-states exposed in ACPI AML code. Was:Re: " Konrad Rzeszutek Wilk
2016-02-17 19:58                     ` Boris Ostrovsky
2016-02-18 15:02                     ` Roger Pau Monné
2016-02-18 15:12                       ` Andrew Cooper
2016-02-18 16:24                         ` Boris Ostrovsky
2016-02-18 16:48                           ` Andrew Cooper
2016-02-18 17:03                         ` Roger Pau Monné
2016-02-18 22:08                           ` Konrad Rzeszutek Wilk
2016-02-18 15:16                       ` David Vrabel
2016-02-05 13:42 ` [PATCH v2 11/30] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
2016-02-15 13:37   ` Jan Beulich
2016-02-15 14:57     ` Andrew Cooper
2016-02-15 15:07       ` Jan Beulich
2016-02-15 15:52         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Andrew Cooper
2016-02-15 14:06   ` Jan Beulich
2016-02-15 15:28     ` Andrew Cooper
2016-02-15 15:52       ` Jan Beulich
2016-02-15 16:09         ` Andrew Cooper
2016-02-15 16:27           ` Jan Beulich
2016-02-15 19:07             ` Andrew Cooper
2016-02-16  9:54               ` Jan Beulich
2016-02-17 10:25                 ` Andrew Cooper
2016-02-17 10:42                   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 13/30] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
2016-02-15 14:53   ` Jan Beulich
2016-02-15 15:33     ` Andrew Cooper
2016-02-15 14:56   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 14/30] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
2016-02-15 15:43   ` Jan Beulich
2016-02-15 17:12     ` Andrew Cooper
2016-02-16 10:06       ` Jan Beulich
2016-02-17 10:43         ` Andrew Cooper
2016-02-17 10:55           ` Jan Beulich
2016-02-17 14:02             ` Andrew Cooper
2016-02-17 14:45               ` Jan Beulich
2016-02-18 12:17                 ` Andrew Cooper
2016-02-18 13:23                   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 16/30] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
2016-02-16 14:10   ` Jan Beulich
2016-02-17 10:45     ` Andrew Cooper
2016-02-17 10:58       ` Jan Beulich
2016-02-18 12:41         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 17/30] x86/cpu: Common infrastructure for levelling context switching Andrew Cooper
2016-02-16 14:15   ` Jan Beulich
2016-02-17  8:15     ` Jan Beulich
2016-02-17 10:46       ` Andrew Cooper
2016-02-17 19:06   ` Konrad Rzeszutek Wilk
2016-02-05 13:42 ` [PATCH v2 18/30] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
2016-02-17  7:40   ` Jan Beulich
2016-02-17 10:56     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 19/30] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
2016-02-17  7:57   ` Jan Beulich
2016-02-17 10:59     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 20/30] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
2016-02-17  8:06   ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 21/30] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
2016-02-17  8:13   ` Jan Beulich
2016-02-17 11:03     ` Andrew Cooper
2016-02-17 11:14       ` Jan Beulich
2016-02-18 12:48         ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 22/30] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
2016-02-17  8:22   ` Jan Beulich
2016-02-17 12:13     ` Andrew Cooper
2016-02-05 13:42 ` [PATCH v2 23/30] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-17  8:30   ` Jan Beulich
2016-02-17 12:17     ` Andrew Cooper
2016-02-17 12:23       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 24/30] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-08 11:40     ` Andrew Cooper
2016-02-08 16:23   ` Tim Deegan
2016-02-08 16:36     ` Ian Campbell
2016-02-10 10:07       ` Andrew Cooper
2016-02-10 10:18         ` Ian Campbell
2016-02-18 13:37           ` Andrew Cooper
2016-02-17 20:06         ` Konrad Rzeszutek Wilk
2016-02-05 13:42 ` [PATCH v2 25/30] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 26/30] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
2016-02-05 16:12   ` Wei Liu
2016-02-05 16:15     ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 27/30] tools: Utility for dealing with featuresets Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 28/30] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-05 13:42 ` [PATCH v2 29/30] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
2016-02-05 16:13   ` Wei Liu
2016-02-17  8:55   ` Jan Beulich
2016-02-17 13:03     ` Andrew Cooper
2016-02-17 13:19       ` Jan Beulich
2016-02-05 13:42 ` [PATCH v2 30/30] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
2016-02-05 14:28   ` Jan Beulich
2016-02-05 15:22     ` Andrew Cooper
2016-02-08 17:26 ` [PATCH v2.5 31/30] Fix PV guest XSAVE handling with levelling Andrew Cooper
2016-02-17  9:02   ` Jan Beulich
2016-02-17 13:06     ` Andrew Cooper
2016-02-17 13:36       ` Jan Beulich
