All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/21] x86: Improvements to cpuid handling for guests
@ 2016-04-07 11:57 Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset Andrew Cooper
                   ` (20 more replies)
  0 siblings, 21 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This series is available in git form at:
  http://xenbits.xen.org/git-http/people/andrewcoop/xen.git levelling-v4

Notable changes from v4 are further tweaking of the SSE* dependency tree,
fixing of deep-C and P state collection on some Linux dom0 kernels, fix the
construction of HVM domains on Skylake hardware, XSM hooks for the new
sysctls.

Most patches do now how Acks/Reviews.  The remaining patches are #1-3,6-7
(x86), #8,14 (XSM), #15(ARM), #26 (Toolstack).

The current cpuid code, both in the hypervisor and toolstack, has grown
organically for a very long time, and is flawed in many ways.  This series
focuses specifically on the fixing the bits pertaining to the visible
features, and I will be fixing other areas in future work (e.g. per-core,
per-package values, auditing of incoming migration values, etc.)

These changes alter the workflow of cpuid handling as follows:

Xen boots and evaluates its current capabilities.  It uses this information to
calculate the maximum featuresets it can provide to guests, and provides this
information for toolstack consumption.  A toolstack may then calculate a safe
set of features (taking into account migratability), and sets a guests cpuid
policy.  Xen then takes care of context switching the levelling state.

In particular, this means that PV guests may have different levels while
running on the same host, an option which was not previously available.

Andrew Cooper (21):
  xen/x86: Annotate VM applicability in featureset
  xen/x86: Calculate maximum host and guest featuresets
  xen/x86: Generate deep dependencies of features
  xen/x86: Clear dependent features when clearing a cpu cap
  xen/x86: Improve disabling of features which have dependencies
  xen/x86: Improvements to in-hypervisor cpuid sanity checks
  x86/cpu: Move set_cpumask() calls into c_early_init()
  x86/cpu: Sysctl and common infrastructure for levelling context
    switching
  x86/cpu: Rework AMD masking MSR setup
  x86/cpu: Rework Intel masking/faulting setup
  x86/cpu: Context switch cpuid masks and faulting state in
    context_switch()
  x86/pv: Provide custom cpumasks for PV domains
  x86/domctl: Update PV domain cpumasks when setting cpuid policy
  xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  tools/libxc: Modify bitmap operations to take void pointers
  tools/libxc: Use public/featureset.h for cpuid policy generation
  tools/libxc: Expose the automatically generated cpu featuremask
    information
  tools: Utility for dealing with featuresets
  tools/libxc: Wire a featureset through to cpuid policy logic
  tools/libxc: Use featuresets rather than guesswork
  tools/libxc: Calculate xstate cpuid leaf from guest information

 .gitignore                                   |   1 +
 tools/flask/policy/policy/modules/xen/xen.te |   2 +
 tools/libxc/Makefile                         |   9 +
 tools/libxc/include/xenctrl.h                |  22 +-
 tools/libxc/xc_bitops.h                      |  37 +-
 tools/libxc/xc_cpufeature.h                  | 151 -------
 tools/libxc/xc_cpuid_x86.c                   | 639 +++++++++++++++++----------
 tools/libxl/libxl_cpuid.c                    |   2 +-
 tools/misc/Makefile                          |   4 +
 tools/misc/xen-cpuid.c                       | 394 +++++++++++++++++
 tools/ocaml/libs/xc/xenctrl.ml               |   3 +
 tools/ocaml/libs/xc/xenctrl.mli              |   4 +
 tools/ocaml/libs/xc/xenctrl_stubs.c          |  37 +-
 tools/python/xen/lowlevel/xc/xc.c            |   2 +-
 xen/arch/x86/apic.c                          |   2 +-
 xen/arch/x86/cpu/amd.c                       | 292 ++++++++----
 xen/arch/x86/cpu/common.c                    |  41 +-
 xen/arch/x86/cpu/intel.c                     | 278 ++++++++----
 xen/arch/x86/cpuid.c                         | 218 +++++++++
 xen/arch/x86/crash.c                         |   3 +
 xen/arch/x86/domain.c                        |  18 +-
 xen/arch/x86/domctl.c                        | 138 ++++++
 xen/arch/x86/hvm/hvm.c                       | 125 ++++--
 xen/arch/x86/setup.c                         |   3 +
 xen/arch/x86/sysctl.c                        |  57 +++
 xen/arch/x86/traps.c                         | 254 +++++++----
 xen/arch/x86/xstate.c                        |   6 +-
 xen/include/asm-x86/cpufeature.h             |   4 +
 xen/include/asm-x86/cpuid.h                  |  51 +++
 xen/include/asm-x86/domain.h                 |   2 +
 xen/include/asm-x86/processor.h              |   2 +-
 xen/include/public/arch-x86/cpufeatureset.h  | 190 ++++----
 xen/include/public/sysctl.h                  |  50 +++
 xen/tools/gen-cpuid.py                       | 185 +++++++-
 xen/xsm/flask/hooks.c                        |   6 +
 xen/xsm/flask/policy/access_vectors          |   4 +
 36 files changed, 2400 insertions(+), 836 deletions(-)
 delete mode 100644 tools/libxc/xc_cpufeature.h
 create mode 100644 tools/misc/xen-cpuid.c

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 23:01   ` Jan Beulich
  2016-04-07 11:57 ` [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Use attributes to specify whether a feature is applicable to be exposed to:
 1) All guests
 2) HVM guests
 3) HVM HAP guests
and, via absence of an attribute, to no guests.

There is no current need for other categories (e.g. PV-only features), and
such categories should not be introduced if possible.  These categories follow
from the fact that, with increased hardware support, a guest gets more
features to use.

These settings are derived from the existing code in {pv,hvm}_cpuid(), and
xc_cpuid_x86.c.  One notable exception is EXTAPIC which was previously
erroneously exposed to guests.  PV guests don't get to use the APIC and the
HVM APIC emulation doesn't support extended space.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Annotate features using a magic comment and autogeneration.
v3:
 * Rebase over the new namespaceing changes.
 * Expand commit message.
 * Correct PSE36 to being a HAP-only feature.
v4:
 * Re-break PSE36.
 * Hide LWP from PV guests.
v5:
 * Explicitly identify that attributes are not part of the public API.
---
 xen/include/public/arch-x86/cpufeatureset.h | 190 ++++++++++++++--------------
 xen/tools/gen-cpuid.py                      |  32 ++++-
 2 files changed, 128 insertions(+), 94 deletions(-)

diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 8308972..edd9975 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -75,141 +75,147 @@ enum {
  * Attribute syntax:
  *
  * Attributes for a particular feature are provided as characters before the
- * first space in the comment immediately following the feature value.
+ * first space in the comment immediately following the feature value.  Note -
+ * none of these attributes form part of the Xen public ABI.
  *
  * Special: '!'
  *   This bit has special properties and is not a straight indication of a
  *   piece of new functionality.  Xen will handle these differently,
  *   and may override toolstack settings completely.
+ *
+ * Applicability to guests: 'A', 'S' or 'H'
+ *   'A' = All guests.
+ *   'S' = All HVM guests (not PV guests).
+ *   'H' = HVM HAP guests (not PV or HVM Shadow guests).
  */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
-XEN_CPUFEATURE(FPU,           0*32+ 0) /*   Onboard FPU */
-XEN_CPUFEATURE(VME,           0*32+ 1) /*   Virtual Mode Extensions */
-XEN_CPUFEATURE(DE,            0*32+ 2) /*   Debugging Extensions */
-XEN_CPUFEATURE(PSE,           0*32+ 3) /*   Page Size Extensions */
-XEN_CPUFEATURE(TSC,           0*32+ 4) /*   Time Stamp Counter */
-XEN_CPUFEATURE(MSR,           0*32+ 5) /*   Model-Specific Registers, RDMSR, WRMSR */
-XEN_CPUFEATURE(PAE,           0*32+ 6) /*   Physical Address Extensions */
-XEN_CPUFEATURE(MCE,           0*32+ 7) /*   Machine Check Architecture */
-XEN_CPUFEATURE(CX8,           0*32+ 8) /*   CMPXCHG8 instruction */
-XEN_CPUFEATURE(APIC,          0*32+ 9) /*!  Onboard APIC */
-XEN_CPUFEATURE(SEP,           0*32+11) /*   SYSENTER/SYSEXIT */
-XEN_CPUFEATURE(MTRR,          0*32+12) /*   Memory Type Range Registers */
-XEN_CPUFEATURE(PGE,           0*32+13) /*   Page Global Enable */
-XEN_CPUFEATURE(MCA,           0*32+14) /*   Machine Check Architecture */
-XEN_CPUFEATURE(CMOV,          0*32+15) /*   CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
-XEN_CPUFEATURE(PAT,           0*32+16) /*   Page Attribute Table */
-XEN_CPUFEATURE(PSE36,         0*32+17) /*   36-bit PSEs */
-XEN_CPUFEATURE(CLFLUSH,       0*32+19) /*   CLFLUSH instruction */
+XEN_CPUFEATURE(FPU,           0*32+ 0) /*A  Onboard FPU */
+XEN_CPUFEATURE(VME,           0*32+ 1) /*S  Virtual Mode Extensions */
+XEN_CPUFEATURE(DE,            0*32+ 2) /*A  Debugging Extensions */
+XEN_CPUFEATURE(PSE,           0*32+ 3) /*S  Page Size Extensions */
+XEN_CPUFEATURE(TSC,           0*32+ 4) /*A  Time Stamp Counter */
+XEN_CPUFEATURE(MSR,           0*32+ 5) /*A  Model-Specific Registers, RDMSR, WRMSR */
+XEN_CPUFEATURE(PAE,           0*32+ 6) /*A  Physical Address Extensions */
+XEN_CPUFEATURE(MCE,           0*32+ 7) /*A  Machine Check Architecture */
+XEN_CPUFEATURE(CX8,           0*32+ 8) /*A  CMPXCHG8 instruction */
+XEN_CPUFEATURE(APIC,          0*32+ 9) /*!A Onboard APIC */
+XEN_CPUFEATURE(SEP,           0*32+11) /*A  SYSENTER/SYSEXIT */
+XEN_CPUFEATURE(MTRR,          0*32+12) /*S  Memory Type Range Registers */
+XEN_CPUFEATURE(PGE,           0*32+13) /*S  Page Global Enable */
+XEN_CPUFEATURE(MCA,           0*32+14) /*A  Machine Check Architecture */
+XEN_CPUFEATURE(CMOV,          0*32+15) /*A  CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
+XEN_CPUFEATURE(PAT,           0*32+16) /*A  Page Attribute Table */
+XEN_CPUFEATURE(PSE36,         0*32+17) /*S  36-bit PSEs */
+XEN_CPUFEATURE(CLFLUSH,       0*32+19) /*A  CLFLUSH instruction */
 XEN_CPUFEATURE(DS,            0*32+21) /*   Debug Store */
-XEN_CPUFEATURE(ACPI,          0*32+22) /*   ACPI via MSR */
-XEN_CPUFEATURE(MMX,           0*32+23) /*   Multimedia Extensions */
-XEN_CPUFEATURE(FXSR,          0*32+24) /*   FXSAVE and FXRSTOR instructions */
-XEN_CPUFEATURE(SSE,           0*32+25) /*   Streaming SIMD Extensions */
-XEN_CPUFEATURE(SSE2,          0*32+26) /*   Streaming SIMD Extensions-2 */
-XEN_CPUFEATURE(HTT,           0*32+28) /*!  Hyper-Threading Technology */
+XEN_CPUFEATURE(ACPI,          0*32+22) /*A  ACPI via MSR */
+XEN_CPUFEATURE(MMX,           0*32+23) /*A  Multimedia Extensions */
+XEN_CPUFEATURE(FXSR,          0*32+24) /*A  FXSAVE and FXRSTOR instructions */
+XEN_CPUFEATURE(SSE,           0*32+25) /*A  Streaming SIMD Extensions */
+XEN_CPUFEATURE(SSE2,          0*32+26) /*A  Streaming SIMD Extensions-2 */
+XEN_CPUFEATURE(HTT,           0*32+28) /*!A Hyper-Threading Technology */
 XEN_CPUFEATURE(TM1,           0*32+29) /*   Thermal Monitor 1 */
 XEN_CPUFEATURE(PBE,           0*32+31) /*   Pending Break Enable */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
-XEN_CPUFEATURE(SSE3,          1*32+ 0) /*   Streaming SIMD Extensions-3 */
-XEN_CPUFEATURE(PCLMULQDQ,     1*32+ 1) /*   Carry-less mulitplication */
+XEN_CPUFEATURE(SSE3,          1*32+ 0) /*A  Streaming SIMD Extensions-3 */
+XEN_CPUFEATURE(PCLMULQDQ,     1*32+ 1) /*A  Carry-less mulitplication */
 XEN_CPUFEATURE(DTES64,        1*32+ 2) /*   64-bit Debug Store */
 XEN_CPUFEATURE(MONITOR,       1*32+ 3) /*   Monitor/Mwait support */
 XEN_CPUFEATURE(DSCPL,         1*32+ 4) /*   CPL Qualified Debug Store */
-XEN_CPUFEATURE(VMX,           1*32+ 5) /*   Virtual Machine Extensions */
+XEN_CPUFEATURE(VMX,           1*32+ 5) /*S  Virtual Machine Extensions */
 XEN_CPUFEATURE(SMX,           1*32+ 6) /*   Safer Mode Extensions */
 XEN_CPUFEATURE(EIST,          1*32+ 7) /*   Enhanced SpeedStep */
 XEN_CPUFEATURE(TM2,           1*32+ 8) /*   Thermal Monitor 2 */
-XEN_CPUFEATURE(SSSE3,         1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */
-XEN_CPUFEATURE(FMA,           1*32+12) /*   Fused Multiply Add */
-XEN_CPUFEATURE(CX16,          1*32+13) /*   CMPXCHG16B */
+XEN_CPUFEATURE(SSSE3,         1*32+ 9) /*A  Supplemental Streaming SIMD Extensions-3 */
+XEN_CPUFEATURE(FMA,           1*32+12) /*A  Fused Multiply Add */
+XEN_CPUFEATURE(CX16,          1*32+13) /*A  CMPXCHG16B */
 XEN_CPUFEATURE(XTPR,          1*32+14) /*   Send Task Priority Messages */
 XEN_CPUFEATURE(PDCM,          1*32+15) /*   Perf/Debug Capability MSR */
-XEN_CPUFEATURE(PCID,          1*32+17) /*   Process Context ID */
+XEN_CPUFEATURE(PCID,          1*32+17) /*H  Process Context ID */
 XEN_CPUFEATURE(DCA,           1*32+18) /*   Direct Cache Access */
-XEN_CPUFEATURE(SSE4_1,        1*32+19) /*   Streaming SIMD Extensions 4.1 */
-XEN_CPUFEATURE(SSE4_2,        1*32+20) /*   Streaming SIMD Extensions 4.2 */
-XEN_CPUFEATURE(X2APIC,        1*32+21) /*!  Extended xAPIC */
-XEN_CPUFEATURE(MOVBE,         1*32+22) /*   movbe instruction */
-XEN_CPUFEATURE(POPCNT,        1*32+23) /*   POPCNT instruction */
-XEN_CPUFEATURE(TSC_DEADLINE,  1*32+24) /*   TSC Deadline Timer */
-XEN_CPUFEATURE(AESNI,         1*32+25) /*   AES instructions */
-XEN_CPUFEATURE(XSAVE,         1*32+26) /*   XSAVE/XRSTOR/XSETBV/XGETBV */
+XEN_CPUFEATURE(SSE4_1,        1*32+19) /*A  Streaming SIMD Extensions 4.1 */
+XEN_CPUFEATURE(SSE4_2,        1*32+20) /*A  Streaming SIMD Extensions 4.2 */
+XEN_CPUFEATURE(X2APIC,        1*32+21) /*!A Extended xAPIC */
+XEN_CPUFEATURE(MOVBE,         1*32+22) /*A  movbe instruction */
+XEN_CPUFEATURE(POPCNT,        1*32+23) /*A  POPCNT instruction */
+XEN_CPUFEATURE(TSC_DEADLINE,  1*32+24) /*S  TSC Deadline Timer */
+XEN_CPUFEATURE(AESNI,         1*32+25) /*A  AES instructions */
+XEN_CPUFEATURE(XSAVE,         1*32+26) /*A  XSAVE/XRSTOR/XSETBV/XGETBV */
 XEN_CPUFEATURE(OSXSAVE,       1*32+27) /*!  OSXSAVE */
-XEN_CPUFEATURE(AVX,           1*32+28) /*   Advanced Vector Extensions */
-XEN_CPUFEATURE(F16C,          1*32+29) /*   Half-precision convert instruction */
-XEN_CPUFEATURE(RDRAND,        1*32+30) /*   Digital Random Number Generator */
-XEN_CPUFEATURE(HYPERVISOR,    1*32+31) /*!  Running under some hypervisor */
+XEN_CPUFEATURE(AVX,           1*32+28) /*A  Advanced Vector Extensions */
+XEN_CPUFEATURE(F16C,          1*32+29) /*A  Half-precision convert instruction */
+XEN_CPUFEATURE(RDRAND,        1*32+30) /*A  Digital Random Number Generator */
+XEN_CPUFEATURE(HYPERVISOR,    1*32+31) /*!A Running under some hypervisor */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.edx, word 2 */
-XEN_CPUFEATURE(SYSCALL,       2*32+11) /*   SYSCALL/SYSRET */
-XEN_CPUFEATURE(NX,            2*32+20) /*   Execute Disable */
-XEN_CPUFEATURE(MMXEXT,        2*32+22) /*   AMD MMX extensions */
-XEN_CPUFEATURE(FFXSR,         2*32+25) /*   FFXSR instruction optimizations */
-XEN_CPUFEATURE(PAGE1GB,       2*32+26) /*   1Gb large page support */
-XEN_CPUFEATURE(RDTSCP,        2*32+27) /*   RDTSCP */
-XEN_CPUFEATURE(LM,            2*32+29) /*   Long Mode (x86-64) */
-XEN_CPUFEATURE(3DNOWEXT,      2*32+30) /*   AMD 3DNow! extensions */
-XEN_CPUFEATURE(3DNOW,         2*32+31) /*   3DNow! */
+XEN_CPUFEATURE(SYSCALL,       2*32+11) /*A  SYSCALL/SYSRET */
+XEN_CPUFEATURE(NX,            2*32+20) /*A  Execute Disable */
+XEN_CPUFEATURE(MMXEXT,        2*32+22) /*A  AMD MMX extensions */
+XEN_CPUFEATURE(FFXSR,         2*32+25) /*A  FFXSR instruction optimizations */
+XEN_CPUFEATURE(PAGE1GB,       2*32+26) /*H  1Gb large page support */
+XEN_CPUFEATURE(RDTSCP,        2*32+27) /*S  RDTSCP */
+XEN_CPUFEATURE(LM,            2*32+29) /*A  Long Mode (x86-64) */
+XEN_CPUFEATURE(3DNOWEXT,      2*32+30) /*A  AMD 3DNow! extensions */
+XEN_CPUFEATURE(3DNOW,         2*32+31) /*A  3DNow! */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.ecx, word 3 */
-XEN_CPUFEATURE(LAHF_LM,       3*32+ 0) /*   LAHF/SAHF in long mode */
-XEN_CPUFEATURE(CMP_LEGACY,    3*32+ 1) /*!  If yes HyperThreading not valid */
-XEN_CPUFEATURE(SVM,           3*32+ 2) /*   Secure virtual machine */
+XEN_CPUFEATURE(LAHF_LM,       3*32+ 0) /*A  LAHF/SAHF in long mode */
+XEN_CPUFEATURE(CMP_LEGACY,    3*32+ 1) /*!A If yes HyperThreading not valid */
+XEN_CPUFEATURE(SVM,           3*32+ 2) /*S  Secure virtual machine */
 XEN_CPUFEATURE(EXTAPIC,       3*32+ 3) /*   Extended APIC space */
-XEN_CPUFEATURE(CR8_LEGACY,    3*32+ 4) /*   CR8 in 32-bit mode */
-XEN_CPUFEATURE(ABM,           3*32+ 5) /*   Advanced bit manipulation */
-XEN_CPUFEATURE(SSE4A,         3*32+ 6) /*   SSE-4A */
-XEN_CPUFEATURE(MISALIGNSSE,   3*32+ 7) /*   Misaligned SSE mode */
-XEN_CPUFEATURE(3DNOWPREFETCH, 3*32+ 8) /*   3DNow prefetch instructions */
+XEN_CPUFEATURE(CR8_LEGACY,    3*32+ 4) /*S  CR8 in 32-bit mode */
+XEN_CPUFEATURE(ABM,           3*32+ 5) /*A  Advanced bit manipulation */
+XEN_CPUFEATURE(SSE4A,         3*32+ 6) /*A  SSE-4A */
+XEN_CPUFEATURE(MISALIGNSSE,   3*32+ 7) /*A  Misaligned SSE mode */
+XEN_CPUFEATURE(3DNOWPREFETCH, 3*32+ 8) /*A  3DNow prefetch instructions */
 XEN_CPUFEATURE(OSVW,          3*32+ 9) /*   OS Visible Workaround */
-XEN_CPUFEATURE(IBS,           3*32+10) /*   Instruction Based Sampling */
-XEN_CPUFEATURE(XOP,           3*32+11) /*   extended AVX instructions */
+XEN_CPUFEATURE(IBS,           3*32+10) /*S  Instruction Based Sampling */
+XEN_CPUFEATURE(XOP,           3*32+11) /*A  extended AVX instructions */
 XEN_CPUFEATURE(SKINIT,        3*32+12) /*   SKINIT/STGI instructions */
 XEN_CPUFEATURE(WDT,           3*32+13) /*   Watchdog timer */
-XEN_CPUFEATURE(LWP,           3*32+15) /*   Light Weight Profiling */
-XEN_CPUFEATURE(FMA4,          3*32+16) /*   4 operands MAC instructions */
+XEN_CPUFEATURE(LWP,           3*32+15) /*S  Light Weight Profiling */
+XEN_CPUFEATURE(FMA4,          3*32+16) /*A  4 operands MAC instructions */
 XEN_CPUFEATURE(NODEID_MSR,    3*32+19) /*   NodeId MSR */
-XEN_CPUFEATURE(TBM,           3*32+21) /*   trailing bit manipulations */
+XEN_CPUFEATURE(TBM,           3*32+21) /*A  trailing bit manipulations */
 XEN_CPUFEATURE(TOPOEXT,       3*32+22) /*   topology extensions CPUID leafs */
-XEN_CPUFEATURE(DBEXT,         3*32+26) /*   data breakpoint extension */
+XEN_CPUFEATURE(DBEXT,         3*32+26) /*A  data breakpoint extension */
 XEN_CPUFEATURE(MONITORX,      3*32+29) /*   MONITOR extension (MONITORX/MWAITX) */
 
 /* Intel-defined CPU features, CPUID level 0x0000000D:1.eax, word 4 */
-XEN_CPUFEATURE(XSAVEOPT,      4*32+ 0) /*   XSAVEOPT instruction */
-XEN_CPUFEATURE(XSAVEC,        4*32+ 1) /*   XSAVEC/XRSTORC instructions */
-XEN_CPUFEATURE(XGETBV1,       4*32+ 2) /*   XGETBV with %ecx=1 */
-XEN_CPUFEATURE(XSAVES,        4*32+ 3) /*   XSAVES/XRSTORS instructions */
+XEN_CPUFEATURE(XSAVEOPT,      4*32+ 0) /*A  XSAVEOPT instruction */
+XEN_CPUFEATURE(XSAVEC,        4*32+ 1) /*A  XSAVEC/XRSTORC instructions */
+XEN_CPUFEATURE(XGETBV1,       4*32+ 2) /*A  XGETBV with %ecx=1 */
+XEN_CPUFEATURE(XSAVES,        4*32+ 3) /*S  XSAVES/XRSTORS instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
-XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*   {RD,WR}{FS,GS}BASE instructions */
-XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*   TSC_ADJUST MSR available */
-XEN_CPUFEATURE(BMI1,          5*32+ 3) /*   1st bit manipulation extensions */
-XEN_CPUFEATURE(HLE,           5*32+ 4) /*   Hardware Lock Elision */
-XEN_CPUFEATURE(AVX2,          5*32+ 5) /*   AVX2 instructions */
+XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
+XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*S  TSC_ADJUST MSR available */
+XEN_CPUFEATURE(BMI1,          5*32+ 3) /*A  1st bit manipulation extensions */
+XEN_CPUFEATURE(HLE,           5*32+ 4) /*A  Hardware Lock Elision */
+XEN_CPUFEATURE(AVX2,          5*32+ 5) /*A  AVX2 instructions */
 XEN_CPUFEATURE(FDP_EXCP_ONLY, 5*32+ 6) /*!  x87 FDP only updated on exception. */
-XEN_CPUFEATURE(SMEP,          5*32+ 7) /*   Supervisor Mode Execution Protection */
-XEN_CPUFEATURE(BMI2,          5*32+ 8) /*   2nd bit manipulation extensions */
-XEN_CPUFEATURE(ERMS,          5*32+ 9) /*   Enhanced REP MOVSB/STOSB */
-XEN_CPUFEATURE(INVPCID,       5*32+10) /*   Invalidate Process Context ID */
-XEN_CPUFEATURE(RTM,           5*32+11) /*   Restricted Transactional Memory */
+XEN_CPUFEATURE(SMEP,          5*32+ 7) /*S  Supervisor Mode Execution Protection */
+XEN_CPUFEATURE(BMI2,          5*32+ 8) /*A  2nd bit manipulation extensions */
+XEN_CPUFEATURE(ERMS,          5*32+ 9) /*A  Enhanced REP MOVSB/STOSB */
+XEN_CPUFEATURE(INVPCID,       5*32+10) /*H  Invalidate Process Context ID */
+XEN_CPUFEATURE(RTM,           5*32+11) /*A  Restricted Transactional Memory */
 XEN_CPUFEATURE(PQM,           5*32+12) /*   Platform QoS Monitoring */
 XEN_CPUFEATURE(NO_FPU_SEL,    5*32+13) /*!  FPU CS/DS stored as zero */
-XEN_CPUFEATURE(MPX,           5*32+14) /*   Memory Protection Extensions */
+XEN_CPUFEATURE(MPX,           5*32+14) /*S  Memory Protection Extensions */
 XEN_CPUFEATURE(PQE,           5*32+15) /*   Platform QoS Enforcement */
-XEN_CPUFEATURE(RDSEED,        5*32+18) /*   RDSEED instruction */
-XEN_CPUFEATURE(ADX,           5*32+19) /*   ADCX, ADOX instructions */
-XEN_CPUFEATURE(SMAP,          5*32+20) /*   Supervisor Mode Access Prevention */
-XEN_CPUFEATURE(PCOMMIT,       5*32+22) /*   PCOMMIT instruction */
-XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*   CLFLUSHOPT instruction */
-XEN_CPUFEATURE(CLWB,          5*32+24) /*   CLWB instruction */
-XEN_CPUFEATURE(SHA,           5*32+29) /*   SHA1 & SHA256 instructions */
+XEN_CPUFEATURE(RDSEED,        5*32+18) /*A  RDSEED instruction */
+XEN_CPUFEATURE(ADX,           5*32+19) /*A  ADCX, ADOX instructions */
+XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
+XEN_CPUFEATURE(PCOMMIT,       5*32+22) /*A  PCOMMIT instruction */
+XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
+XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(SHA,           5*32+29) /*A  SHA1 & SHA256 instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */
-XEN_CPUFEATURE(PREFETCHWT1,   6*32+ 0) /*   PREFETCHWT1 instruction */
-XEN_CPUFEATURE(PKU,           6*32+ 3) /*   Protection Keys for Userspace */
+XEN_CPUFEATURE(PREFETCHWT1,   6*32+ 0) /*A  PREFETCHWT1 instruction */
+XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
 XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
@@ -217,7 +223,7 @@ XEN_CPUFEATURE(ITSC,          7*32+ 8) /*   Invariant TSC */
 XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
 
 /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
-XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*   CLZERO instruction */
+XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 
 #endif /* XEN_CPUFEATURE */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 46ef6d1..f971ab2 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -17,12 +17,18 @@ class State(object):
         # State parsed from input
         self.names = {} # Name => value mapping
         self.raw_special = set()
+        self.raw_pv = set()
+        self.raw_hvm_shadow = set()
+        self.raw_hvm_hap = set()
 
         # State calculated
         self.nr_entries = 0 # Number of words in a featureset
         self.common_1d = 0 # Common features between 1d and e1d
         self.known = [] # All known features
         self.special = [] # Features with special semantics
+        self.pv = []
+        self.hvm_shadow = []
+        self.hvm_hap = []
 
 def parse_definitions(state):
     """
@@ -32,7 +38,7 @@ def parse_definitions(state):
     feat_regex = re.compile(
         r"^XEN_CPUFEATURE\(([A-Z0-9_]+),"
         "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-        "\s+/\*([!]*) .*$")
+        "\s+/\*([\w!]*) .*$")
 
     this = sys.modules[__name__]
 
@@ -72,6 +78,16 @@ def parse_definitions(state):
 
             if a == "!":
                 state.raw_special.add(val)
+            elif a in "ASH":
+                if a == "A":
+                    state.raw_pv.add(val)
+                    state.raw_hvm_shadow.add(val)
+                    state.raw_hvm_hap.add(val)
+                elif attr == "S":
+                    state.raw_hvm_shadow.add(val)
+                    state.raw_hvm_hap.add(val)
+                elif attr == "H":
+                    state.raw_hvm_hap.add(val)
             else:
                 raise Fail("Unrecognised attribute '%s' for %s" % (a, name))
 
@@ -124,6 +140,9 @@ def crunch_numbers(state):
 
     state.common_1d = featureset_to_uint32s(common_1d, 1)[0]
     state.special = featureset_to_uint32s(state.raw_special, nr_entries)
+    state.pv = featureset_to_uint32s(state.raw_pv, nr_entries)
+    state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
+    state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
 
 def write_results(state):
@@ -144,11 +163,20 @@ def write_results(state):
 
 #define INIT_KNOWN_FEATURES { \\\n%s\n} 
 
-#define INIT_SPECIAL_FEATURES { \\\n%s\n} 
+#define INIT_SPECIAL_FEATURES { \\\n%s\n}
+
+#define INIT_PV_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_HAP_FEATURES { \\\n%s\n}
 """ % (state.nr_entries,
        state.common_1d,
        format_uint32s(state.known, 4),
        format_uint32s(state.special, 4),
+       format_uint32s(state.pv, 4),
+       format_uint32s(state.hvm_shadow, 4),
+       format_uint32s(state.hvm_hap, 4),
        ))
 
     state.output.write(
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 23:04   ` Jan Beulich
  2016-04-07 11:57 ` [PATCH v5 03/21] xen/x86: Generate deep dependencies of features Andrew Cooper
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

All of this information will be used by the toolstack to make informed
levelling decisions for VMs, and by Xen to sanity check toolstack-provided
information.

The split between the shadow and hap HVM masks is necessary due to the lack of
a "get cpuid policy" hypercall.  Multi-host toolstacks (i.e. not libxl)
dealing with hap and non-hap capable hosts need to be able to calculate that
migrating a shadow guest is safe.

Future planned development work will implement proper cpuid policy handing in
Xen, including a "get policy" hypercall, but until then, the difference is
made available for toolstack use via a non-stable interface.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v3:
 * Move as much as possible into .init.
 * Fix the handing of the shared bits for the cross-vendor case.
 * Fix extended check.
v4:
 * Fix copy&paste error in calculate_hvm_featureset()
v5:
 * Expand commit message, explaining about the shadow and hap masks.
---
 xen/arch/x86/cpuid.c        | 162 ++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c        |   3 +
 xen/include/asm-x86/cpuid.h |  17 +++++
 3 files changed, 182 insertions(+)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 77e008a..41439f8 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -1,14 +1,176 @@
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <asm/cpuid.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <asm/processor.h>
 
 const uint32_t known_features[] = INIT_KNOWN_FEATURES;
 const uint32_t special_features[] = INIT_SPECIAL_FEATURES;
 
+static const uint32_t __initconst pv_featuremask[] = INIT_PV_FEATURES;
+static const uint32_t __initconst hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
+static const uint32_t __initconst hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+
+uint32_t __read_mostly raw_featureset[FSCAPINTS];
+uint32_t __read_mostly pv_featureset[FSCAPINTS];
+uint32_t __read_mostly hvm_featureset[FSCAPINTS];
+
+static void __init sanitise_featureset(uint32_t *fs)
+{
+    unsigned int i;
+
+    for ( i = 0; i < FSCAPINTS; ++i )
+    {
+        /* Clamp to known mask. */
+        fs[i] &= known_features[i];
+    }
+
+    /*
+     * Sort out shared bits.  We are constructing a featureset which needs to
+     * be applicable to a cross-vendor case.  Intel strictly clears the common
+     * bits in e1d, while AMD strictly duplicates them.
+     *
+     * We duplicate them here to be compatible with AMD while on Intel, and
+     * rely on logic closer to the guest to make the featureset stricter if
+     * emulating Intel.
+     */
+    fs[FEATURESET_e1d] = ((fs[FEATURESET_1d]  &  CPUID_COMMON_1D_FEATURES) |
+                          (fs[FEATURESET_e1d] & ~CPUID_COMMON_1D_FEATURES));
+}
+
+static void __init calculate_raw_featureset(void)
+{
+    unsigned int max, tmp;
+
+    max = cpuid_eax(0);
+
+    if ( max >= 1 )
+        cpuid(0x1, &tmp, &tmp,
+              &raw_featureset[FEATURESET_1c],
+              &raw_featureset[FEATURESET_1d]);
+    if ( max >= 7 )
+        cpuid_count(0x7, 0, &tmp,
+                    &raw_featureset[FEATURESET_7b0],
+                    &raw_featureset[FEATURESET_7c0],
+                    &tmp);
+    if ( max >= 0xd )
+        cpuid_count(0xd, 1,
+                    &raw_featureset[FEATURESET_Da1],
+                    &tmp, &tmp, &tmp);
+
+    max = cpuid_eax(0x80000000);
+    if ( (max >> 16) != 0x8000 )
+        return;
+
+    if ( max >= 0x80000001 )
+        cpuid(0x80000001, &tmp, &tmp,
+              &raw_featureset[FEATURESET_e1c],
+              &raw_featureset[FEATURESET_e1d]);
+    if ( max >= 0x80000007 )
+        cpuid(0x80000007, &tmp, &tmp, &tmp,
+              &raw_featureset[FEATURESET_e7d]);
+    if ( max >= 0x80000008 )
+        cpuid(0x80000008, &tmp,
+              &raw_featureset[FEATURESET_e8b],
+              &tmp, &tmp);
+}
+
+static void __init calculate_pv_featureset(void)
+{
+    unsigned int i;
+
+    for ( i = 0; i < FSCAPINTS; ++i )
+        pv_featureset[i] = host_featureset[i] & pv_featuremask[i];
+
+    /* Unconditionally claim to be able to set the hypervisor bit. */
+    __set_bit(X86_FEATURE_HYPERVISOR, pv_featureset);
+
+    /*
+     * Allow the toolstack to set HTT, X2APIC and CMP_LEGACY.  These bits
+     * affect how to interpret topology information in other cpuid leaves.
+     */
+    __set_bit(X86_FEATURE_HTT, pv_featureset);
+    __set_bit(X86_FEATURE_X2APIC, pv_featureset);
+    __set_bit(X86_FEATURE_CMP_LEGACY, pv_featureset);
+
+    sanitise_featureset(pv_featureset);
+}
+
+static void __init calculate_hvm_featureset(void)
+{
+    unsigned int i;
+    const uint32_t *hvm_featuremask;
+
+    if ( !hvm_enabled )
+        return;
+
+    hvm_featuremask = hvm_funcs.hap_supported ?
+        hvm_hap_featuremask : hvm_shadow_featuremask;
+
+    for ( i = 0; i < FSCAPINTS; ++i )
+        hvm_featureset[i] = host_featureset[i] & hvm_featuremask[i];
+
+    /* Unconditionally claim to be able to set the hypervisor bit. */
+    __set_bit(X86_FEATURE_HYPERVISOR, hvm_featureset);
+
+    /*
+     * Allow the toolstack to set HTT, X2APIC and CMP_LEGACY.  These bits
+     * affect how to interpret topology information in other cpuid leaves.
+     */
+    __set_bit(X86_FEATURE_HTT, hvm_featureset);
+    __set_bit(X86_FEATURE_X2APIC, hvm_featureset);
+    __set_bit(X86_FEATURE_CMP_LEGACY, hvm_featureset);
+
+    /*
+     * Xen can provide an APIC emulation to HVM guests even if the host's APIC
+     * isn't enabled.
+     */
+    __set_bit(X86_FEATURE_APIC, hvm_featureset);
+
+    /*
+     * On AMD, PV guests are entirely unable to use SYSENTER as Xen runs in
+     * long mode (and init_amd() has cleared it out of host capabilities), but
+     * HVM guests are able if running in protected mode.
+     */
+    if ( (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+         test_bit(X86_FEATURE_SEP, raw_featureset) )
+        __set_bit(X86_FEATURE_SEP, hvm_featureset);
+
+    /*
+     * With VT-x, some features are only supported by Xen if dedicated
+     * hardware support is also available.
+     */
+    if ( cpu_has_vmx )
+    {
+        if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
+             !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
+            __clear_bit(X86_FEATURE_MPX, hvm_featureset);
+
+        if ( !cpu_has_vmx_xsaves )
+            __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
+
+        if ( !cpu_has_vmx_pcommit )
+            __clear_bit(X86_FEATURE_PCOMMIT, hvm_featureset);
+    }
+
+    sanitise_featureset(hvm_featureset);
+}
+
+void __init calculate_featuresets(void)
+{
+    calculate_raw_featureset();
+    calculate_pv_featureset();
+    calculate_hvm_featureset();
+}
+
 static void __init __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(special_features) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ee65f55..3696c31 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -50,6 +50,7 @@
 #include <asm/nmi.h>
 #include <asm/alternative.h>
 #include <asm/mc146818rtc.h>
+#include <asm/cpuid.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1525,6 +1526,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                "Multiple initrd candidates, picking module #%u\n",
                initrdidx);
 
+    calculate_featuresets();
+
     /*
      * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
      * This saves a large number of corner cases interactions with
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 0ecf357..5041bcd 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -6,12 +6,29 @@
 
 #define FSCAPINTS FEATURESET_NR_ENTRIES
 
+#define FEATURESET_1d     0 /* 0x00000001.edx      */
+#define FEATURESET_1c     1 /* 0x00000001.ecx      */
+#define FEATURESET_e1d    2 /* 0x80000001.edx      */
+#define FEATURESET_e1c    3 /* 0x80000001.ecx      */
+#define FEATURESET_Da1    4 /* 0x0000000d:1.eax    */
+#define FEATURESET_7b0    5 /* 0x00000007:0.ebx    */
+#define FEATURESET_7c0    6 /* 0x00000007:0.ecx    */
+#define FEATURESET_e7d    7 /* 0x80000007.edx      */
+#define FEATURESET_e8b    8 /* 0x80000008.ebx      */
+
 #ifndef __ASSEMBLY__
 #include <xen/types.h>
 
 extern const uint32_t known_features[FSCAPINTS];
 extern const uint32_t special_features[FSCAPINTS];
 
+extern uint32_t raw_featureset[FSCAPINTS];
+#define host_featureset boot_cpu_data.x86_capability
+extern uint32_t pv_featureset[FSCAPINTS];
+extern uint32_t hvm_featureset[FSCAPINTS];
+
+void calculate_featuresets(void);
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 03/21] xen/x86: Generate deep dependencies of features
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 23:18   ` Jan Beulich
  2016-04-07 11:57 ` [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Some features depend on other features.  Working out and maintaining the exact
dependency tree is complicated, so it is expressed in the automatic generation
script.

At runtime, Xen needs to be disable all features which are dependent on a
feature being disabled.  Because of the flattening performed at compile time,
runtime can use a single mask to disable all eventual features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * New.
v3:
 * Vastly more reserch and comments.
v4:
 * Expand commit message.
 * More tweaks to the dependency tree.
 * Avoid for_each_set_bit() walking off the end of disabled_features[].
   Expanding disabled_features[] turns out to be far more simple than
   attempting to opencode for_each_set_bit()
v5:
 * Further tweaking of the SSE* dependencies.
---
 xen/arch/x86/cpuid.c        |  56 ++++++++++++++++
 xen/include/asm-x86/cpuid.h |   2 +
 xen/tools/gen-cpuid.py      | 153 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 210 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 41439f8..e1e0e44 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -11,6 +11,7 @@ const uint32_t special_features[] = INIT_SPECIAL_FEATURES;
 static const uint32_t __initconst pv_featuremask[] = INIT_PV_FEATURES;
 static const uint32_t __initconst hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
 static const uint32_t __initconst hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+static const uint32_t __initconst deep_features[] = INIT_DEEP_FEATURES;
 
 uint32_t __read_mostly raw_featureset[FSCAPINTS];
 uint32_t __read_mostly pv_featureset[FSCAPINTS];
@@ -18,12 +19,36 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
 
 static void __init sanitise_featureset(uint32_t *fs)
 {
+    /* for_each_set_bit() uses unsigned longs.  Extend with zeroes. */
+    uint32_t disabled_features[
+        ROUNDUP(FSCAPINTS, sizeof(unsigned long)/sizeof(uint32_t))] = {};
     unsigned int i;
 
     for ( i = 0; i < FSCAPINTS; ++i )
     {
         /* Clamp to known mask. */
         fs[i] &= known_features[i];
+
+        /*
+         * Identify which features with deep dependencies have been
+         * disabled.
+         */
+        disabled_features[i] = ~fs[i] & deep_features[i];
+    }
+
+    for_each_set_bit(i, (void *)disabled_features,
+                     sizeof(disabled_features) * 8)
+    {
+        const uint32_t *dfs = lookup_deep_deps(i);
+        unsigned int j;
+
+        ASSERT(dfs); /* deep_features[] should guarentee this. */
+
+        for ( j = 0; j < FSCAPINTS; ++j )
+        {
+            fs[j] &= ~dfs[j];
+            disabled_features[j] &= ~dfs[j];
+        }
     }
 
     /*
@@ -164,6 +189,36 @@ void __init calculate_featuresets(void)
     calculate_hvm_featureset();
 }
 
+const uint32_t * __init lookup_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FSCAPINTS];
+    } deep_deps[] __initconst = INIT_DEEP_DEPS;
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* Fast early exit. */
+    if ( !test_bit(feature, deep_features) )
+        return NULL;
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 static void __init __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
@@ -171,6 +226,7 @@ static void __init __maybe_unused build_assertions(void)
     BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 5041bcd..4725672 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -29,6 +29,8 @@ extern uint32_t hvm_featureset[FSCAPINTS];
 
 void calculate_featuresets(void);
 
+const uint32_t *lookup_deep_deps(uint32_t feature);
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index f971ab2..57533de 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -144,6 +144,141 @@ def crunch_numbers(state):
     state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
     state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
+    #
+    # Feature dependency information.
+    #
+    # !!! WARNING !!!
+    #
+    # A lot of this information is derived from the written text of vendors
+    # software manuals, rather than directly from a statement.  As such, it
+    # does contain guesswork and assumptions, and may not accurately match
+    # hardware implementations.
+    #
+    # It is however designed to create an end result for a guest which does
+    # plausibly match real hardware.
+    #
+    # !!! WARNING !!!
+    #
+    # The format of this dictionary is that the feature in the key is a direct
+    # prerequisite of each feature in the value.
+    #
+    # The first consideration is about which functionality is physically built
+    # on top of other features.  The second consideration, which is more
+    # subjective, is whether real hardware would ever be found supporting
+    # feature X but not Y.
+    #
+    deps = {
+        # FPU is taken to mean support for the x87 regisers as well as the
+        # instructions.  MMX is documented to alias the %MM registers over the
+        # x87 %ST registers in hardware.
+        FPU: [MMX],
+
+        # The PSE36 feature indicates that reserved bits in a PSE superpage
+        # may be used as extra physical address bits.
+        PSE: [PSE36],
+
+        # Entering Long Mode requires that %CR4.PAE is set.  The NX and PKU
+        # pagetable bits are only representable in the 64bit PTE format
+        # offered by PAE.
+        PAE: [LM, NX, PKU],
+
+        # APIC is special, but X2APIC does depend on APIC being available in
+        # the first place.
+        APIC: [X2APIC],
+
+        # AMD built MMXExtentions and 3DNow as extentions to MMX.
+        MMX: [MMXEXT, _3DNOW],
+
+        # The FXSAVE/FXRSTOR instructions were introduced into hardware before
+        # SSE, which is why they behave differently based on %CR4.OSFXSAVE and
+        # have their own feature bit.  AMD however introduce the Fast FXSR
+        # feature as an optimisation.
+        FXSR: [FFXSR, SSE],
+
+        # SSE is taken to mean support for the %XMM registers as well as the
+        # instructions.  Several futher instruction sets are built on core
+        # %XMM support, without specific inter-dependencies.
+        SSE: [SSE2, SSE3, SSSE3, SSE4A,
+              AESNI, SHA],
+
+        # SSE2 was re-specified as core instructions for 64bit.
+        SSE2: [LM],
+
+        # SSE4.1 explicitly depends on SSE3 and SSSE3
+        SSE3: [SSE4_1],
+        SSSE3: [SSE4_1],
+
+        # SSE4.2 explicitly depends on SSE4.1
+        SSE4_1: [SSE4_2],
+
+        # AMD specify no relationship between POPCNT and SSE4.2.  Intel
+        # document that SSE4.2 should be checked for before checking for
+        # POPCNT.  However, it has its own feature bit, and operates on GPRs
+        # rather than %XMM state, so doesn't inherently depend on SSE.
+        # Therefore, we do not specify a dependency between SSE4_2 and POPCNT.
+        #
+        # SSE4_2: [POPCNT]
+
+        # The INVPCID instruction depends on PCID infrastructure being
+        # available.
+        PCID: [INVPCID],
+
+        # XSAVE is an extra set of instructions for state management, but
+        # doesn't constitue new state itself.  Some of the dependent features
+        # are instructions built on top of base XSAVE, while others are new
+        # instruction groups which are specified to require XSAVE for state
+        # management.
+        XSAVE: [XSAVEOPT, XSAVEC, XGETBV1, XSAVES,
+                AVX, MPX, PKU, LWP],
+
+        # AVX is taken to mean hardware support for VEX encoded instructions,
+        # 256bit registers, and the instructions themselves.  Each of these
+        # subsequent instruction groups may only be VEX encoded.
+        AVX: [FMA, FMA4, F16C, AVX2, XOP],
+
+        # CX16 is only encodable in Long Mode.  LAHF_LM indicates that the
+        # SAHF/LAHF instructions are reintroduced in Long Mode.  1GB
+        # superpages and PCID are only available in 4 level paging.
+        LM: [CX16, PCID, LAHF_LM, PAGE1GB],
+
+        # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the
+        # standard 3DNow in the earlier K6 processors.
+        _3DNOW: [_3DNOWEXT],
+    }
+
+    deep_features = tuple(sorted(deps.keys()))
+    state.deep_deps = {}
+
+    for feat in deep_features:
+
+        seen = [feat]
+        to_process = list(deps[feat])
+
+        while len(to_process):
+
+            # To debug, uncomment the following lines:
+            # def repl(l):
+            #     return "[" + ", ".join((state.names[x] for x in l)) + "]"
+            # print >>sys.stderr, "Feature %s, seen %s, to_process %s " % \
+            #     (state.names[feat], repl(seen), repl(to_process))
+
+            f = to_process.pop(0)
+
+            if f in seen:
+                raise Fail("ERROR: Cycle found with %s when processing %s"
+                           % (state.names[f], state.names[feat]))
+
+            seen.append(f)
+            to_process = list(set(to_process + deps.get(f, [])))
+
+        state.deep_deps[feat] = seen[1:]
+
+    state.deep_features = featureset_to_uint32s(deps.keys(), nr_entries)
+    state.nr_deep_deps = len(state.deep_deps.keys())
+
+    for k, v in state.deep_deps.iteritems():
+        state.deep_deps[k] = featureset_to_uint32s(v, nr_entries)
+
 
 def write_results(state):
     state.output.write(
@@ -170,6 +305,12 @@ def write_results(state):
 #define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
 
 #define INIT_HVM_HAP_FEATURES { \\\n%s\n}
+
+#define NR_DEEP_DEPS %sU
+
+#define INIT_DEEP_FEATURES { \\\n%s\n}
+
+#define INIT_DEEP_DEPS { \\
 """ % (state.nr_entries,
        state.common_1d,
        format_uint32s(state.known, 4),
@@ -177,10 +318,20 @@ def write_results(state):
        format_uint32s(state.pv, 4),
        format_uint32s(state.hvm_shadow, 4),
        format_uint32s(state.hvm_hap, 4),
+       state.nr_deep_deps,
+       format_uint32s(state.deep_features, 4),
        ))
 
+    for dep in sorted(state.deep_deps.keys()):
+        state.output.write(
+            "    { %#xU, /* %s */ { \\\n%s\n    }, }, \\\n"
+            % (dep, state.names[dep],
+               format_uint32s(state.deep_deps[dep], 8)
+           ))
+
     state.output.write(
-"""
+"""}
+
 #endif /* __XEN_X86__FEATURESET_DATA__ */
 """)
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (2 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 03/21] xen/x86: Generate deep dependencies of features Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 15:36   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

When clearing a cpu cap, clear all dependent features.  This avoids having a
featureset with intermediate features disabled, but leaf features enabled.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v3:
 * Style fixes.  Use __test_and_set_bit()
---
 xen/arch/x86/cpu/common.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index d302272..0942b44 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -53,8 +53,22 @@ static unsigned int cleared_caps[NCAPINTS];
 
 void __init setup_clear_cpu_cap(unsigned int cap)
 {
+	const uint32_t *dfs;
+	unsigned int i;
+
+	if (__test_and_set_bit(cap, cleared_caps))
+		return;
+
 	__clear_bit(cap, boot_cpu_data.x86_capability);
-	__set_bit(cap, cleared_caps);
+	dfs = lookup_deep_deps(cap);
+
+	if (!dfs)
+		return;
+
+	for (i = 0; i < FSCAPINTS; ++i) {
+		cleared_caps[i] |= dfs[i];
+		boot_cpu_data.x86_capability[i] &= ~dfs[i];
+	}
 }
 
 static void default_init(struct cpuinfo_x86 * c)
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (3 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 15:04   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

APIC and XSAVE have dependent features, which also need disabling if Xen
chooses to disable a feature.

Use setup_clear_cpu_cap() rather than clear_bit(), as it takes care of
dependent features as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2: Move boolean_param() adjacent to use_xsave in xstate_init()
---
 xen/arch/x86/apic.c       |  2 +-
 xen/arch/x86/cpu/common.c | 12 +++---------
 xen/arch/x86/xstate.c     |  6 +++++-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index b9601ad..8df5bd3 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1349,7 +1349,7 @@ void pmu_apic_interrupt(struct cpu_user_regs *regs)
 int __init APIC_init_uniprocessor (void)
 {
     if (enable_local_apic < 0)
-        __clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+        setup_clear_cpu_cap(X86_FEATURE_APIC);
 
     if (!smp_found_config && !cpu_has_apic) {
         skip_ioapic_setup = 1;
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 0942b44..b5c023f 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -16,9 +16,6 @@
 
 #include "cpu.h"
 
-static bool_t use_xsave = 1;
-boolean_param("xsave", use_xsave);
-
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
@@ -341,12 +338,6 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
 
-        /* Initialize xsave/xrstor features */
-	if ( !use_xsave )
-		__clear_bit(X86_FEATURE_XSAVE, boot_cpu_data.x86_capability);
-
-	if ( cpu_has_xsave )
-		xstate_init(c);
 
    	if ( !opt_pku )
 		setup_clear_cpu_cap(X86_FEATURE_PKU);
@@ -370,6 +361,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
 
 	/* Now the feature flags better reflect actual CPU features! */
 
+	if ( cpu_has_xsave )
+		xstate_init(c);
+
 #ifdef NOISY_CAPS
 	printk(KERN_DEBUG "CPU: After all inits, caps:");
 	for (i = 0; i < NCAPINTS; i++)
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 8c652bc..4a6c9f6 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -534,11 +534,15 @@ unsigned int xstate_ctxt_size(u64 xcr0)
 /* Collect the information of processor's extended state */
 void xstate_init(struct cpuinfo_x86 *c)
 {
+    static bool_t __initdata use_xsave = 1;
+    boolean_param("xsave", use_xsave);
+
     bool_t bsp = c == &boot_cpu_data;
     u32 eax, ebx, ecx, edx;
     u64 feature_mask;
 
-    if ( boot_cpu_data.cpuid_level < XSTATE_CPUID )
+    if ( (bsp && !use_xsave) ||
+         boot_cpu_data.cpuid_level < XSTATE_CPUID )
     {
         BUG_ON(!bsp);
         setup_clear_cpu_cap(X86_FEATURE_XSAVE);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (4 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:10   ` Konrad Rzeszutek Wilk
  2016-04-08 18:06   ` Jan Beulich
  2016-04-07 11:57 ` [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Currently, {pv,hvm}_cpuid() has a large quantity of essentially-static logic
for modifying the features visible to a guest.  A lot of this can be subsumed
by {pv,hvm}_featuremask, which identify the features available on this
hardware which could be given to a PV or HVM guest.

This is a step in the direction of full per-domain cpuid policies, but lots
more development is needed for that.  As a result, the static checks are
simplified, but the dynamic checks need to remain for now.

As a side effect, some of the logic for special features can be improved.
OSXSAVE and OSPKE will be automatically cleared because of being absent in the
featuremask.  This allows the fast-forward logic to be more simple.

In addition, there are some corrections to the existing logic:

 * Hiding PSE36 out of PAE mode is architecturally wrong.  It turns out that
   it was a bugfix for running HyperV under Xen, which wanted to see PSE36
   even after choosing to use PAE paging.  PSE36 is not supported by shadow
   paging, so is hidden from non-HAP guests, but is still visible for HAP
   guests.  It is also leaked into non-HAP guests when the guest is already
   running in PAE mode.
 * Changing the visibility of RDTSCP based on host TSC stability or virtual
   TSC mode is bogus, so dropped.
 * When emulating Intel to a guest, the common features in e1d should be
   cleared.
 * The APIC bit in e1d (on non-Intel) is also a fast-forward from the
   APIC_BASE MSR.

As a small improvement, use compiler-visible &'s and |'s, rather than
{clear,set}_bit().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Reinstate some of the dynamic checks for now.  Future development work will
   instate a complete per-domain policy.
 * Fix OSXSAVE handling for PV guests.
v3:
 * Better handling of the cross-vendor case.
 * Improvements to the handling of special features.
 * Correct PSE36 to being a HAP-only feature.
 * Yet more OSXSAVE fixes for PV guests.
v4:
 * Leak PSE36 into shadow guests to fix buggy versions of Hyper-V.
 * Leak MTRR into the hardware domain to fix Xenolinux dom0.
 * Change cross-vendor 1D disabling logic.
 * Avoid reading arch.pv_vcpu for PVH guests.
v5:
 * Clarify the commit message regarding PSE36.
 * Drop redundant is_pv_domain().
 * Fix deep-C and P states with modern Linux PVOps on faulting-capable hardware.
---
 xen/arch/x86/hvm/hvm.c           | 125 ++++++++++++-------
 xen/arch/x86/traps.c             | 254 +++++++++++++++++++++++++++------------
 xen/include/asm-x86/cpufeature.h |   2 +
 3 files changed, 263 insertions(+), 118 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b239f74..9ce61cc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -72,6 +72,7 @@
 #include <public/memory.h>
 #include <public/vm_event.h>
 #include <public/arch-x86/cpuid.h>
+#include <asm/cpuid.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -3358,62 +3359,71 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Fix up VLAPIC details. */
         *ebx &= 0x00FFFFFFu;
         *ebx |= (v->vcpu_id * 2) << 24;
+
+        *ecx &= hvm_featureset[FEATURESET_1c];
+        *edx &= hvm_featureset[FEATURESET_1d];
+
+        /* APIC exposed to guests, but Fast-forward MSR_APIC_BASE.EN back in. */
         if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
-            __clear_bit(X86_FEATURE_APIC & 31, edx);
+            *edx &= ~cpufeat_bit(X86_FEATURE_APIC);
 
-        /* Fix up OSXSAVE. */
-        if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) &&
-             (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) )
+        /* OSXSAVE cleared by hvm_featureset.  Fast-forward CR4 back in. */
+        if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE )
             *ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
-        else
-            *ecx &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
 
-        /* Don't expose PCID to non-hap hvm. */
+        /* Don't expose HAP-only features to non-hap guests. */
         if ( !hap_enabled(d) )
+        {
             *ecx &= ~cpufeat_mask(X86_FEATURE_PCID);
 
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
+            /*
+             * PSE36 is not supported in shadow mode.  This bit should be
+             * unilaterally cleared.
+             *
+             * However, an unspecified version of Hyper-V from 2011 refuses
+             * to start as the "cpu does not provide required hw features" if
+             * it can't see PSE36.
+             *
+             * As a workaround, leak the toolstack-provided PSE36 value into a
+             * shadow guest if the guest is already using PAE paging (and
+             * won't care about reverting back to PSE paging).  Otherwise,
+             * knoble it, so a 32bit guest doesn't get the impression that it
+             * could try to use PSE36 paging.
+             */
+            if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
+                *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
+        }
         break;
+
     case 0x7:
         if ( count == 0 )
         {
-            if ( !cpu_has_smep )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
-
-            if ( !cpu_has_smap )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
+            /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */
+            *ebx &= (hvm_featureset[FEATURESET_7b0] &
+                     ~special_features[FEATURESET_7b0]);
+            *ebx |= (host_featureset[FEATURESET_7b0] &
+                     special_features[FEATURESET_7b0]);
 
-            /* Don't expose MPX to hvm when VMX support is not available. */
-            if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
-                 !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
+            *ecx &= hvm_featureset[FEATURESET_7c0];
 
+            /* Don't expose HAP-only features to non-hap guests. */
             if ( !hap_enabled(d) )
             {
-                 /* Don't expose INVPCID to non-hap hvm. */
                  *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
-                 /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
                  *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
             }
 
-            if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
-                 (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+            /* OSPKE cleared by hvm_featureset.  Fast-forward CR4 back in. */
+            if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE )
                 *ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
-            else
-                *ecx &= ~cpufeat_mask(X86_FEATURE_OSPKE);
-
-            /* Don't expose PCOMMIT to hvm when VMX support is not available. */
-            if ( !cpu_has_vmx_pcommit )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
         }
-
         break;
+
     case 0xb:
         /* Fix the x2APIC identifier. */
         *edx = v->vcpu_id * 2;
         break;
+
     case 0xd:
         /* EBX value of main leaf 0 depends on enabled xsave features */
         if ( count == 0 && v->arch.xcr0 ) 
@@ -3430,9 +3440,12 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                     *ebx = _eax + _ebx;
             }
         }
+
         if ( count == 1 )
         {
-            if ( cpu_has_xsaves && cpu_has_vmx_xsaves )
+            *eax &= hvm_featureset[FEATURESET_Da1];
+
+            if ( *eax & cpufeat_mask(X86_FEATURE_XSAVES) )
             {
                 *ebx = XSTATE_AREA_MIN_SIZE;
                 if ( v->arch.xcr0 | v->arch.hvm_vcpu.msr_xss )
@@ -3447,20 +3460,42 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         break;
 
     case 0x80000001:
-        /* We expose RDTSCP feature to guest only when
-           tsc_mode == TSC_MODE_DEFAULT and host_tsc_is_safe() returns 1 */
-        if ( d->arch.tsc_mode != TSC_MODE_DEFAULT ||
-             !host_tsc_is_safe() )
-            *edx &= ~cpufeat_mask(X86_FEATURE_RDTSCP);
-        /* Hide 1GB-superpage feature if we can't emulate it. */
-        if (!hvm_pse1gb_supported(d))
+        *ecx &= hvm_featureset[FEATURESET_e1c];
+        *edx &= hvm_featureset[FEATURESET_e1d];
+
+        /* If not emulating AMD, clear the duplicated features in e1d. */
+        if ( d->arch.x86_vendor != X86_VENDOR_AMD )
+            *edx &= ~CPUID_COMMON_1D_FEATURES;
+        /* fast-forward MSR_APIC_BASE.EN if it hasn't already been clobbered. */
+        else if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
+            *edx &= ~cpufeat_bit(X86_FEATURE_APIC);
+
+        /* Don't expose HAP-only features to non-hap guests. */
+        if ( !hap_enabled(d) )
+        {
             *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
-        /* Hide data breakpoint extensions if the hardware has no support. */
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            *ecx &= ~cpufeat_mask(X86_FEATURE_DBEXT);
+
+            /*
+             * PSE36 is not supported in shadow mode.  This bit should be
+             * unilaterally cleared.
+             *
+             * However, an unspecified version of Hyper-V from 2011 refuses
+             * to start as the "cpu does not provide required hw features" if
+             * it can't see PSE36.
+             *
+             * As a workaround, leak the toolstack-provided PSE36 value into a
+             * shadow guest if the guest is already using PAE paging (and
+             * won't care about reverting back to PSE paging).  Otherwise,
+             * knoble it, so a 32bit guest doesn't get the impression that it
+             * could try to use PSE36 paging.
+             */
+            if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
+                *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
+        }
+        break;
+
+    case 0x80000007:
+        *edx &= hvm_featureset[FEATURESET_e7d];
         break;
 
     case 0x80000008:
@@ -3478,6 +3513,8 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         hvm_cpuid(0x80000001, NULL, NULL, NULL, &_edx);
         *eax = (*eax & ~0xffff00) | (_edx & cpufeat_mask(X86_FEATURE_LM)
                                      ? 0x3000 : 0x2000);
+
+        *ebx &= hvm_featureset[FEATURESET_e8b];
         break;
     }
 }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6fbb1cf..bd2f67b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -73,6 +73,7 @@
 #include <asm/hpet.h>
 #include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
+#include <asm/cpuid.h>
 #include <xsm/xsm.h>
 
 /*
@@ -932,69 +933,162 @@ void pv_cpuid(struct cpu_user_regs *regs)
     else
         cpuid_count(leaf, subleaf, &a, &b, &c, &d);
 
-    if ( (leaf & 0x7fffffff) == 0x00000001 )
-    {
-        /* Modify Feature Information. */
-        if ( !cpu_has_apic )
-            __clear_bit(X86_FEATURE_APIC, &d);
-
-        if ( !is_pvh_domain(currd) )
-        {
-            __clear_bit(X86_FEATURE_PSE, &d);
-            __clear_bit(X86_FEATURE_PGE, &d);
-            __clear_bit(X86_FEATURE_PSE36, &d);
-            __clear_bit(X86_FEATURE_VME, &d);
-        }
-    }
-
     switch ( leaf )
     {
     case 0x00000001:
-        /* Modify Feature Information. */
-        if ( !cpu_has_sep )
-            __clear_bit(X86_FEATURE_SEP, &d);
-        __clear_bit(X86_FEATURE_DS, &d);
-        __clear_bit(X86_FEATURE_TM1, &d);
-        __clear_bit(X86_FEATURE_PBE, &d);
-        if ( is_pvh_domain(currd) )
-            __clear_bit(X86_FEATURE_MTRR, &d);
-
-        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
-        __clear_bit(X86_FEATURE_MONITOR % 32, &c);
-        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
-        __clear_bit(X86_FEATURE_VMX % 32, &c);
-        __clear_bit(X86_FEATURE_SMX % 32, &c);
-        __clear_bit(X86_FEATURE_TM2 % 32, &c);
+        c &= pv_featureset[FEATURESET_1c];
+        d &= pv_featureset[FEATURESET_1d];
+
         if ( is_pv_32bit_domain(currd) )
-            __clear_bit(X86_FEATURE_CX16 % 32, &c);
-        __clear_bit(X86_FEATURE_XTPR % 32, &c);
-        __clear_bit(X86_FEATURE_PDCM % 32, &c);
-        __clear_bit(X86_FEATURE_PCID % 32, &c);
-        __clear_bit(X86_FEATURE_DCA % 32, &c);
-        if ( !cpu_has_xsave )
+            c &= ~cpufeat_mask(X86_FEATURE_CX16);
+
+        if ( !is_pvh_domain(currd) )
         {
-            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
-            __clear_bit(X86_FEATURE_AVX % 32, &c);
+            /*
+             * Delete the PVH condition when HVMLite formally replaces PVH,
+             * and HVM guests no longer enter a PV codepath.
+             */
+
+            /*
+             * !!! OSXSAVE handling for PV guests is non-architectural !!!
+             *
+             * Architecturally, the correct code here is simply:
+             *
+             *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
+             *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+             *
+             * However because of bugs in Xen (before c/s bd19080b, Nov 2010,
+             * the XSAVE cpuid flag leaked into guests despite the feature not
+             * being available for use), buggy workarounds where introduced to
+             * Linux (c/s 947ccf9c, also Nov 2010) which relied on the fact
+             * that Xen also incorrectly leaked OSXSAVE into the guest.
+             *
+             * Furthermore, providing architectural OSXSAVE behaviour to a
+             * many Linux PV guests triggered a further kernel bug when the
+             * fpu code observes that XSAVEOPT is available, assumes that
+             * xsave state had been set up for the task, and follows a wild
+             * pointer.
+             *
+             * Older Linux PVOPS kernels however do require architectural
+             * behaviour.  They observe Xen's leaked OSXSAVE and assume they
+             * can already use XSETBV, dying with a #UD because the shadowed
+             * CR4.OSXSAVE is clear.  This behaviour has been adjusted in all
+             * observed cases via stable backports of the above changeset.
+             *
+             * Therefore, the leaking of Xen's OSXSAVE setting has become a
+             * defacto part of the PV ABI and can't reasonably be corrected.
+             *
+             * The following situations and logic now applies:
+             *
+             * - Hardware without CPUID faulting support and native CPUID:
+             *    There is nothing Xen can do here.  The hosts XSAVE flag will
+             *    leak through and Xen's OSXSAVE choice will leak through.
+             *
+             *    In the case that the guest kernel has not set up OSXSAVE, only
+             *    SSE will be set in xcr0, and guest userspace can't do too much
+             *    damage itself.
+             *
+             * - Enlightened CPUID or CPUID faulting available:
+             *    Xen can fully control what is seen here.  Guest kernels need
+             *    to see the leaked OSXSAVE, but guest userspace is given
+             *    architectural behaviour, to reflect the guest kernels
+             *    intentions.
+             */
+            /* OSXSAVE cleared by pv_featureset.  Fast-forward CR4 back in. */
+            if ( (guest_kernel_mode(curr, regs) &&
+                  (read_cr4() & X86_CR4_OSXSAVE)) ||
+                 (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
+                c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+            /*
+             * At the time of writing, a PV domain is the only viable option
+             * for Dom0.  Several interactions between dom0 and Xen for real
+             * hardware setup have unfortunately been implemented based on
+             * state which incorrectly leaked into dom0.
+             *
+             * These leaks are retained for backwards compatibility, but
+             * restricted to the hardware domains kernel only.
+             */
+            if ( is_hardware_domain(currd) && guest_kernel_mode(curr, regs) )
+            {
+                /*
+                 * MTRR used to unconditionally leak into PV guests.  They
+                 * cannot MTRR infrastructure at all, and shouldn't be able to
+                 * see the feature.
+                 *
+                 * Modern PVOPS Linux self-clobbers the MTRR feature, to avoid
+                 * trying to use the associated MSRs.  Xenolinux-based PV dom0's
+                 * however use the MTRR feature as an indication of the presence
+                 * of the XENPF_{add,del,read}_memtype hypercalls.
+                 */
+                if ( cpu_has_mtrr )
+                    d |= cpufeat_mask(X86_FEATURE_MTRR);
+
+                /*
+                 * MONITOR never leaked into PV guests, as PV guests cannot
+                 * use the MONITOR/MWAIT instructions.  As such, they require
+                 * the feature to not being present in emulated CPUID.
+                 *
+                 * Modern PVOPS Linux try to be cunning and use native CPUID
+                 * to see if the hardware actually supports MONITOR, and by
+                 * extension, deep C states.
+                 *
+                 * If the feature is seen, deep-C state information is
+                 * obtained from the DSDT and handed back to Xen via the
+                 * XENPF_set_processor_pminfo hypercall.
+                 *
+                 * This mechanism is incompatible with an HVM-based hardware
+                 * domain, and also with CPUID Faulting.
+                 *
+                 * Luckily, Xen can be just as 'cunning', and distinguish an
+                 * emulated CPUID from a faulted CPUID by whether a #UD or #GP
+                 * fault is currently being serviced.  Yuck...
+                 */
+                if ( cpu_has_monitor && regs->entry_vector == TRAP_gp_fault )
+                    c |= cpufeat_mask(X86_FEATURE_MONITOR);
+
+                /*
+                 * While MONITOR never leaked into PV guests, EIST always used
+                 * to.
+                 *
+                 * Modern PVOPS will only parse P state information from the
+                 * DSDT and return it to Xen if EIST is seen in the emulated
+                 * CPUID information.
+                 */
+                if ( cpu_has_eist )
+                    c |= cpufeat_mask(X86_FEATURE_EIST);
+            }
         }
-        if ( !cpu_has_apic )
-           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
-        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
+
+        c |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
         break;
 
     case 0x00000007:
         if ( subleaf == 0 )
-            b &= (cpufeat_mask(X86_FEATURE_BMI1) |
-                  cpufeat_mask(X86_FEATURE_HLE)  |
-                  cpufeat_mask(X86_FEATURE_AVX2) |
-                  cpufeat_mask(X86_FEATURE_BMI2) |
-                  cpufeat_mask(X86_FEATURE_ERMS) |
-                  cpufeat_mask(X86_FEATURE_RTM)  |
-                  cpufeat_mask(X86_FEATURE_RDSEED)  |
-                  cpufeat_mask(X86_FEATURE_ADX)  |
-                  cpufeat_mask(X86_FEATURE_FSGSBASE));
+        {
+            /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */
+            b &= (pv_featureset[FEATURESET_7b0] &
+                  ~special_features[FEATURESET_7b0]);
+            b |= (host_featureset[FEATURESET_7b0] &
+                  special_features[FEATURESET_7b0]);
+
+            c &= pv_featureset[FEATURESET_7c0];
+
+            if ( !is_pvh_domain(currd) )
+            {
+                /*
+                 * Delete the PVH condition when HVMLite formally replaces PVH,
+                 * and HVM guests no longer enter a PV codepath.
+                 */
+
+                /* OSPKE cleared by pv_featureset.  Fast-forward CR4 back in. */
+                if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_PKE )
+                    c |= cpufeat_mask(X86_FEATURE_OSPKE);
+            }
+        }
         else
-            b = 0;
-        a = c = d = 0;
+            b = c = 0;
+        a = d = 0;
         break;
 
     case XSTATE_CPUID:
@@ -1017,37 +1111,49 @@ void pv_cpuid(struct cpu_user_regs *regs)
         }
 
         case 1:
-            a &= (boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] &
-                  ~cpufeat_mask(X86_FEATURE_XSAVES));
+            a &= pv_featureset[FEATURESET_Da1];
             b = c = d = 0;
             break;
         }
         break;
 
     case 0x80000001:
-        /* Modify Feature Information. */
+        c &= pv_featureset[FEATURESET_e1c];
+        d &= pv_featureset[FEATURESET_e1d];
+
+        /* If not emulating AMD, clear the duplicated features in e1d. */
+        if ( currd->arch.x86_vendor != X86_VENDOR_AMD )
+            d &= ~CPUID_COMMON_1D_FEATURES;
+
+        /*
+         * MTRR used to unconditionally leak into PV guests.  They cannot MTRR
+         * infrastructure at all, and shouldn't be able to see the feature.
+         *
+         * Modern PVOPS Linux self-clobbers the MTRR feature, to avoid trying
+         * to use the associated MSRs.  Xenolinux-based PV dom0's however use
+         * the MTRR feature as an indication of the presence of the
+         * XENPF_{add,del,read}_memtype hypercalls.
+         */
+        if ( is_hardware_domain(currd) && guest_kernel_mode(curr, regs) &&
+             cpu_has_mtrr )
+            d |= cpufeat_mask(X86_FEATURE_MTRR);
+
         if ( is_pv_32bit_domain(currd) )
         {
-            __clear_bit(X86_FEATURE_LM % 32, &d);
-            __clear_bit(X86_FEATURE_LAHF_LM % 32, &c);
+            d &= ~cpufeat_mask(X86_FEATURE_LM);
+            c &= ~cpufeat_mask(X86_FEATURE_LAHF_LM);
+
+            if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+                d &= ~cpufeat_mask(X86_FEATURE_SYSCALL);
         }
-        if ( is_pv_32bit_domain(currd) &&
-             boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            __clear_bit(X86_FEATURE_SYSCALL % 32, &d);
-        __clear_bit(X86_FEATURE_PAGE1GB % 32, &d);
-        __clear_bit(X86_FEATURE_RDTSCP % 32, &d);
-
-        __clear_bit(X86_FEATURE_SVM % 32, &c);
-        if ( !cpu_has_apic )
-           __clear_bit(X86_FEATURE_EXTAPIC % 32, &c);
-        __clear_bit(X86_FEATURE_OSVW % 32, &c);
-        __clear_bit(X86_FEATURE_IBS % 32, &c);
-        __clear_bit(X86_FEATURE_SKINIT % 32, &c);
-        __clear_bit(X86_FEATURE_WDT % 32, &c);
-        __clear_bit(X86_FEATURE_LWP % 32, &c);
-        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
-        __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
-        __clear_bit(X86_FEATURE_MONITORX % 32, &c);
+        break;
+
+    case 0x80000007:
+        d &= pv_featureset[FEATURESET_e7d];
+        break;
+
+    case 0x80000008:
+        b &= pv_featureset[FEATURESET_e8b];
         break;
 
     case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index e29b024..6a08579 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -81,6 +81,8 @@
 #define cpu_has_xsavec		boot_cpu_has(X86_FEATURE_XSAVEC)
 #define cpu_has_xgetbv1		boot_cpu_has(X86_FEATURE_XGETBV1)
 #define cpu_has_xsaves		boot_cpu_has(X86_FEATURE_XSAVES)
+#define cpu_has_monitor		boot_cpu_has(X86_FEATURE_MONITOR)
+#define cpu_has_eist		boot_cpu_has(X86_FEATURE_EIST)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (5 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 18:09   ` Jan Beulich
  2016-04-07 11:57 ` [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching Andrew Cooper
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
commandline-provided masks would take effect in Xen's view of the processor
features.

As the masks got applied after the query for features, the redundant call to
generic_identify() would clobber the pre-masking feature information with the
post-masking information.

Move the set_cpumask() calls into c_early_init() so the effects of the command
line parameters take place before the main query for features in
generic_identify().

The cpuid_mask_* command line parameters now limit the entire system.
Subsequent changes will cause the mask MSRs to be context switched per-domain,
removing the need to use the command line parameters for heterogeneous
levelling purposes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v5:
 * Tweak wording in the commit message
---
 xen/arch/x86/cpu/amd.c   |  8 ++++++--
 xen/arch/x86/cpu/intel.c | 34 +++++++++++++++++-----------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 47a38c6..5516777 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -407,6 +407,11 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                                                          c->cpu_core_id);
 }
 
+static void early_init_amd(struct cpuinfo_x86 *c)
+{
+	set_cpuidmask(c);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -595,14 +600,13 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
-	set_cpuidmask(c);
-
 	check_syscfg_dram_mod_en();
 }
 
 static const struct cpu_dev amd_cpu_dev = {
 	.c_vendor	= "AMD",
 	.c_ident 	= { "AuthenticAMD" },
+	.c_early_init	= early_init_amd,
 	.c_init		= init_amd,
 };
 
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index bdf89f6..ad22375 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -189,6 +189,23 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (boot_cpu_data.x86 == 0xF && boot_cpu_data.x86_model == 3 &&
 	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
 		paddr_bits = 36;
+
+	if (c == &boot_cpu_data && c->x86 == 6) {
+		if (probe_intel_cpuid_faulting())
+			__set_bit(X86_FEATURE_CPUID_FAULTING,
+				  c->x86_capability);
+	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
+		BUG_ON(!probe_intel_cpuid_faulting());
+		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
+	}
+
+	if (!cpu_has_cpuid_faulting)
+		set_cpuidmask(c);
+	else if ((c == &boot_cpu_data) &&
+		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
+		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
+		    opt_cpuid_mask_xsave_eax)))
+		printk("No CPUID feature masking support available\n");
 }
 
 /*
@@ -258,23 +275,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 		detect_ht(c);
 	}
 
-	if (c == &boot_cpu_data && c->x86 == 6) {
-		if (probe_intel_cpuid_faulting())
-			__set_bit(X86_FEATURE_CPUID_FAULTING,
-				  c->x86_capability);
-	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
-		BUG_ON(!probe_intel_cpuid_faulting());
-		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
-	}
-
-	if (!cpu_has_cpuid_faulting)
-		set_cpuidmask(c);
-	else if ((c == &boot_cpu_data) &&
-		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-		    opt_cpuid_mask_xsave_eax)))
-		printk("No CPUID feature masking support available\n");
-
 	/* Work around errata */
 	Intel_errata_workarounds(c);
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (6 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 16:54   ` Daniel De Graaf
  2016-04-08 16:12   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Daniel De Graaf

A toolstack needs to know how much control Xen has over the visible cpuid
values in PV guests.  Provide an explicit mechanism to query what Xen is
capable of.

This interface will currently report no capabilities.  This change is
scaffolding for future patches, which will introduce detection and switching
logic, after which the interface will report hardware capabilities correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
---
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>

v2:
 * s/cpumasks/cpuidmasks/
v3:
 * Reintroduce XEN_SYSCTL_get_levelling_caps (requested by Joao for some
   libvirt development he has planned).
 * Rename to XEN_SYSCTL_get_cpu_levelling_caps, and rename the constants to
   match the Xen command line options.
v4:
 * Move declarations from processor.h to cpuid.h
 * API corrections for XEN_SYSCTL_get_levelling_caps
v5:
 * XSM policy pieces
---
 tools/flask/policy/policy/modules/xen/xen.te |  1 +
 xen/arch/x86/cpu/common.c                    |  6 ++++++
 xen/arch/x86/sysctl.c                        |  6 ++++++
 xen/include/asm-x86/cpufeature.h             |  1 +
 xen/include/asm-x86/cpuid.h                  | 32 ++++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 23 ++++++++++++++++++++
 xen/xsm/flask/hooks.c                        |  3 +++
 xen/xsm/flask/policy/access_vectors          |  2 ++
 8 files changed, 74 insertions(+)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 7e69ce9..c29b067 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -72,6 +72,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
+    get_cpu_levelling_caps
 };
 
 # Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index b5c023f..7ef75b0 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -36,6 +36,12 @@ integer_param("cpuid_mask_ext_ecx", opt_cpuid_mask_ext_ecx);
 unsigned int opt_cpuid_mask_ext_edx = ~0u;
 integer_param("cpuid_mask_ext_edx", opt_cpuid_mask_ext_edx);
 
+unsigned int __initdata expected_levelling_cap;
+unsigned int __read_mostly levelling_caps;
+
+DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
+struct cpuidmasks __read_mostly cpuidmask_defaults;
+
 const struct cpu_dev *__read_mostly cpu_devs[X86_VENDOR_NUM] = {};
 
 unsigned int paddr_bits __read_mostly = 36;
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 58cbd70..f68cbec 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -190,6 +190,12 @@ long arch_do_sysctl(
         }
         break;
 
+    case XEN_SYSCTL_get_cpu_levelling_caps:
+        sysctl->u.cpu_levelling_caps.caps = levelling_caps;
+        if ( __copy_field_to_guest(u_sysctl, sysctl, u.cpu_levelling_caps.caps) )
+            ret = -EFAULT;
+        break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 6a08579..9a93799 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -83,6 +83,7 @@
 #define cpu_has_xsaves		boot_cpu_has(X86_FEATURE_XSAVES)
 #define cpu_has_monitor		boot_cpu_has(X86_FEATURE_MONITOR)
 #define cpu_has_eist		boot_cpu_has(X86_FEATURE_EIST)
+#define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 4725672..9a21c25 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -3,6 +3,7 @@
 
 #include <asm/cpufeatureset.h>
 #include <asm/cpuid-autogen.h>
+#include <asm/percpu.h>
 
 #define FSCAPINTS FEATURESET_NR_ENTRIES
 
@@ -18,6 +19,7 @@
 
 #ifndef __ASSEMBLY__
 #include <xen/types.h>
+#include <public/sysctl.h>
 
 extern const uint32_t known_features[FSCAPINTS];
 extern const uint32_t special_features[FSCAPINTS];
@@ -31,6 +33,36 @@ void calculate_featuresets(void);
 
 const uint32_t *lookup_deep_deps(uint32_t feature);
 
+/*
+ * Expected levelling capabilities (given cpuid vendor/family information),
+ * and levelling capabilities actually available (given MSR probing).
+ */
+#define LCAP_faulting XEN_SYSCTL_CPU_LEVELCAP_faulting
+#define LCAP_1cd      (XEN_SYSCTL_CPU_LEVELCAP_ecx |        \
+                       XEN_SYSCTL_CPU_LEVELCAP_edx)
+#define LCAP_e1cd     (XEN_SYSCTL_CPU_LEVELCAP_extd_ecx |   \
+                       XEN_SYSCTL_CPU_LEVELCAP_extd_edx)
+#define LCAP_Da1      XEN_SYSCTL_CPU_LEVELCAP_xsave_eax
+#define LCAP_6c       XEN_SYSCTL_CPU_LEVELCAP_thermal_ecx
+#define LCAP_7ab0     (XEN_SYSCTL_CPU_LEVELCAP_l7s0_eax |   \
+                       XEN_SYSCTL_CPU_LEVELCAP_l7s0_ebx)
+extern unsigned int expected_levelling_cap, levelling_caps;
+
+struct cpuidmasks
+{
+    uint64_t _1cd;
+    uint64_t e1cd;
+    uint64_t Da1;
+    uint64_t _6c;
+    uint64_t _7ab0;
+};
+
+/* Per CPU shadows of masking MSR values, for lazy context switching. */
+DECLARE_PER_CPU(struct cpuidmasks, cpuidmasks);
+
+/* Default masking MSR values, calculated at boot. */
+extern struct cpuidmasks cpuidmask_defaults;
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..1ab16db 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,27 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_get_cpu_levelling_caps (x86 specific)
+ *
+ * Return hardware capabilities concerning masking or faulting of the cpuid
+ * instruction for PV guests.
+ */
+struct xen_sysctl_cpu_levelling_caps {
+#define XEN_SYSCTL_CPU_LEVELCAP_faulting    (1ul <<  0) /* CPUID faulting    */
+#define XEN_SYSCTL_CPU_LEVELCAP_ecx         (1ul <<  1) /* 0x00000001.ecx    */
+#define XEN_SYSCTL_CPU_LEVELCAP_edx         (1ul <<  2) /* 0x00000001.edx    */
+#define XEN_SYSCTL_CPU_LEVELCAP_extd_ecx    (1ul <<  3) /* 0x80000001.ecx    */
+#define XEN_SYSCTL_CPU_LEVELCAP_extd_edx    (1ul <<  4) /* 0x80000001.edx    */
+#define XEN_SYSCTL_CPU_LEVELCAP_xsave_eax   (1ul <<  5) /* 0x0000000D:1.eax  */
+#define XEN_SYSCTL_CPU_LEVELCAP_thermal_ecx (1ul <<  6) /* 0x00000006.ecx    */
+#define XEN_SYSCTL_CPU_LEVELCAP_l7s0_eax    (1ul <<  7) /* 0x00000007:0.eax  */
+#define XEN_SYSCTL_CPU_LEVELCAP_l7s0_ebx    (1ul <<  8) /* 0x00000007:0.ebx  */
+    uint32_t caps;
+};
+typedef struct xen_sysctl_cpu_levelling_caps xen_sysctl_cpu_levelling_caps_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpu_levelling_caps_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +812,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_get_cpu_levelling_caps        25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +838,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_cpu_levelling_caps cpu_levelling_caps;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1eaec58..f0e3e5f 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -808,6 +808,9 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+    case XEN_SYSCTL_get_cpu_levelling_caps:
+        return domain_has_xen(current->domain, XEN2__GET_CPU_LEVELLING_CAPS);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 56600bb..31ecf02 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_get_cpu_levelling_caps
+    get_cpu_levelling_caps
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (7 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:13   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This patch is best reviewed as its end result rather than as a diff, as it
rewrites almost all of the setup.

On the BSP, cpuid information is used to evaluate the potential available set
of masking MSRs, and they are unconditionally probed, filling in the
availability information and hardware defaults.

The command line parameters are then combined with the hardware defaults to
further restrict the Xen default masking level.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2:
 * Provide extra information if opt_cpu_info
 * Extra comment indicating the expected use of amd_ctxt_switch_levelling()
v3:
 * Fix the interaction of the fast-forward bits with the override MSRs.
 * Style fixups.
v5:
 * Tweak comments and style.
---
 xen/arch/x86/cpu/amd.c | 281 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 184 insertions(+), 97 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 5516777..93a8a5e 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -80,6 +80,13 @@ static inline int wrmsr_amd_safe(unsigned int msr, unsigned int lo,
 	return err;
 }
 
+static void wrmsr_amd(unsigned int msr, uint64_t val)
+{
+	asm volatile("wrmsr" ::
+		     "c" (msr), "a" ((uint32_t)val),
+		     "d" (val >> 32), "D" (0x9c5a203a));
+}
+
 static const struct cpuidmask {
 	uint16_t fam;
 	char rev[2];
@@ -126,126 +133,203 @@ static const struct cpuidmask *__init noinline get_cpuidmask(const char *opt)
 }
 
 /*
+ * Sets caps in expected_levelling_cap, probes for the specified mask MSR, and
+ * set caps in levelling_caps if it is found.  Processors prior to Fam 10h
+ * required a 32-bit password for masking MSRs.  Returns the default value.
+ */
+static uint64_t __init _probe_mask_msr(unsigned int msr, uint64_t caps)
+{
+	unsigned int hi, lo;
+
+	expected_levelling_cap |= caps;
+
+	if ((rdmsr_amd_safe(msr, &lo, &hi) == 0) &&
+	    (wrmsr_amd_safe(msr, lo, hi) == 0))
+		levelling_caps |= caps;
+
+	return ((uint64_t)hi << 32) | lo;
+}
+
+/*
+ * Probe for the existance of the expected masking MSRs.  They might easily
+ * not be available if Xen is running virtualised.
+ */
+static void __init noinline probe_masking_msrs(void)
+{
+	const struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	/*
+	 * First, work out which masking MSRs we should have, based on
+	 * revision and cpuid.
+	 */
+
+	/* Fam11 doesn't support masking at all. */
+	if (c->x86 == 0x11)
+		return;
+
+	cpuidmask_defaults._1cd =
+		_probe_mask_msr(MSR_K8_FEATURE_MASK, LCAP_1cd);
+	cpuidmask_defaults.e1cd =
+		_probe_mask_msr(MSR_K8_EXT_FEATURE_MASK, LCAP_e1cd);
+
+	if (c->cpuid_level >= 7)
+		cpuidmask_defaults._7ab0 =
+			_probe_mask_msr(MSR_AMD_L7S0_FEATURE_MASK, LCAP_7ab0);
+
+	if (c->x86 == 0x15 && c->cpuid_level >= 6 && cpuid_ecx(6))
+		cpuidmask_defaults._6c =
+			_probe_mask_msr(MSR_AMD_THRM_FEATURE_MASK, LCAP_6c);
+
+	/*
+	 * Don't bother warning about a mismatch if virtualised.  These MSRs
+	 * are not architectural and almost never virtualised.
+	 */
+	if ((expected_levelling_cap == levelling_caps) ||
+	    cpu_has_hypervisor)
+		return;
+
+	printk(XENLOG_WARNING "Mismatch between expected (%#x) "
+	       "and real (%#x) levelling caps: missing %#x\n",
+	       expected_levelling_cap, levelling_caps,
+	       (expected_levelling_cap ^ levelling_caps) & levelling_caps);
+	printk(XENLOG_WARNING "Fam %#x, model %#x level %#x\n",
+	       c->x86, c->x86_model, c->cpuid_level);
+	printk(XENLOG_WARNING
+	       "If not running virtualised, please report a bug\n");
+}
+
+/*
+ * Context switch levelling state to the next domain.  A parameter of NULL is
+ * used to context switch to the default host state (by the cpu bringup-code,
+ * crash path, etc).
+ */
+static void amd_ctxt_switch_levelling(const struct domain *nextd)
+{
+	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct cpuidmasks *masks = &cpuidmask_defaults;
+
+#define LAZY(cap, msr, field)						\
+	({								\
+		if (unlikely(these_masks->field != masks->field) &&	\
+		    ((levelling_caps & cap) == cap))			\
+		{							\
+			wrmsr_amd(msr, masks->field);			\
+			these_masks->field = masks->field;		\
+		}							\
+	})
+
+	LAZY(LCAP_1cd,  MSR_K8_FEATURE_MASK,       _1cd);
+	LAZY(LCAP_e1cd, MSR_K8_EXT_FEATURE_MASK,   e1cd);
+	LAZY(LCAP_7ab0, MSR_AMD_L7S0_FEATURE_MASK, _7ab0);
+	LAZY(LCAP_6c,   MSR_AMD_THRM_FEATURE_MASK, _6c);
+
+#undef LAZY
+}
+
+/*
  * Mask the features and extended features returned by CPUID.  Parameters are
  * set from the boot line via two methods:
  *
  *   1) Specific processor revision string
  *   2) User-defined masks
  *
- * The processor revision string parameter has precedene.
+ * The user-defined masks take precedence.
+ *
+ * AMD "masking msrs" are actually overrides, making it possible to advertise
+ * features which are not supported by the hardware.  Care must be taken to
+ * avoid this, as the accidentally-advertised features will not actually
+ * function.
  */
-static void set_cpuidmask(const struct cpuinfo_x86 *c)
+static void __init noinline amd_init_levelling(void)
 {
-	static unsigned int feat_ecx, feat_edx;
-	static unsigned int extfeat_ecx, extfeat_edx;
-	static unsigned int l7s0_eax, l7s0_ebx;
-	static unsigned int thermal_ecx;
-	static bool_t skip_feat, skip_extfeat;
-	static bool_t skip_l7s0_eax_ebx, skip_thermal_ecx;
-	static enum { not_parsed, no_mask, set_mask } status;
-	unsigned int eax, ebx, ecx, edx;
-
-	if (status == no_mask)
-		return;
+	const struct cpuidmask *m = NULL;
 
-	if (status == set_mask)
-		goto setmask;
+	probe_masking_msrs();
 
-	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
-	status = no_mask;
+	if (*opt_famrev != '\0') {
+		m = get_cpuidmask(opt_famrev);
 
-	/* Fam11 doesn't support masking at all. */
-	if (c->x86 == 0x11)
-		return;
+		if (!m)
+			printk("Invalid processor string: %s\n", opt_famrev);
+	}
 
-	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-	      opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-	      opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx &
-	      opt_cpuid_mask_thermal_ecx)) {
-		feat_ecx = opt_cpuid_mask_ecx;
-		feat_edx = opt_cpuid_mask_edx;
-		extfeat_ecx = opt_cpuid_mask_ext_ecx;
-		extfeat_edx = opt_cpuid_mask_ext_edx;
-		l7s0_eax = opt_cpuid_mask_l7s0_eax;
-		l7s0_ebx = opt_cpuid_mask_l7s0_ebx;
-		thermal_ecx = opt_cpuid_mask_thermal_ecx;
-	} else if (*opt_famrev == '\0') {
-		return;
-	} else {
-		const struct cpuidmask *m = get_cpuidmask(opt_famrev);
+	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
+		uint32_t ecx, edx, tmp;
 
-		if (!m) {
-			printk("Invalid processor string: %s\n", opt_famrev);
-			printk("CPUID will not be masked\n");
-			return;
+		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);
+
+		if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx)) {
+			ecx &= opt_cpuid_mask_ecx;
+			edx &= opt_cpuid_mask_edx;
+		} else if (m) {
+			ecx &= m->ecx;
+			edx &= m->edx;
 		}
-		feat_ecx = m->ecx;
-		feat_edx = m->edx;
-		extfeat_ecx = m->ext_ecx;
-		extfeat_edx = m->ext_edx;
+
+		/* Fast-forward bits - Must be set. */
+		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
+			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
+		/* Allow the HYPERVISOR bit to be set via guest policy. */
+		ecx |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
+
+		cpuidmask_defaults._1cd = ((uint64_t)ecx << 32) | edx;
 	}
 
-        /* Setting bits in the CPUID mask MSR that are not set in the
-         * unmasked CPUID response can cause those bits to be set in the
-         * masked response.  Avoid that by explicitly masking in software. */
-        feat_ecx &= cpuid_ecx(0x00000001);
-        feat_edx &= cpuid_edx(0x00000001);
-        extfeat_ecx &= cpuid_ecx(0x80000001);
-        extfeat_edx &= cpuid_edx(0x80000001);
+	if ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) {
+		uint32_t ecx, edx, tmp;
 
-	status = set_mask;
-	printk("Writing CPUID feature mask ECX:EDX -> %08Xh:%08Xh\n", 
-	       feat_ecx, feat_edx);
-	printk("Writing CPUID extended feature mask ECX:EDX -> %08Xh:%08Xh\n", 
-	       extfeat_ecx, extfeat_edx);
+		cpuid(0x80000001, &tmp, &tmp, &ecx, &edx);
 
-	if (c->cpuid_level >= 7)
-		cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
-	else
-		ebx = eax = 0;
-	if ((eax | ebx) && ~(l7s0_eax & l7s0_ebx)) {
-		if (l7s0_eax > eax)
-			l7s0_eax = eax;
-		l7s0_ebx &= ebx;
-		printk("Writing CPUID leaf 7 subleaf 0 feature mask EAX:EBX -> %08Xh:%08Xh\n",
-		       l7s0_eax, l7s0_ebx);
-	} else
-		skip_l7s0_eax_ebx = 1;
-
-	/* Only Fam15 has the respective MSR. */
-	ecx = c->x86 == 0x15 && c->cpuid_level >= 6 ? cpuid_ecx(6) : 0;
-	if (ecx && ~thermal_ecx) {
-		thermal_ecx &= ecx;
-		printk("Writing CPUID thermal/power feature mask ECX -> %08Xh\n",
-		       thermal_ecx);
-	} else
-		skip_thermal_ecx = 1;
-
- setmask:
-	/* AMD processors prior to family 10h required a 32-bit password */
-	if (!skip_feat &&
-	    wrmsr_amd_safe(MSR_K8_FEATURE_MASK, feat_edx, feat_ecx)) {
-		skip_feat = 1;
-		printk("Failed to set CPUID feature mask\n");
+		if (~(opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx)) {
+			ecx &= opt_cpuid_mask_ext_ecx;
+			edx &= opt_cpuid_mask_ext_edx;
+		} else if (m) {
+			ecx &= m->ext_ecx;
+			edx &= m->ext_edx;
+		}
+
+		/* Fast-forward bits - Must be set. */
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
+		cpuidmask_defaults.e1cd = ((uint64_t)ecx << 32) | edx;
 	}
 
-	if (!skip_extfeat &&
-	    wrmsr_amd_safe(MSR_K8_EXT_FEATURE_MASK, extfeat_edx, extfeat_ecx)) {
-		skip_extfeat = 1;
-		printk("Failed to set CPUID extended feature mask\n");
+	if ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) {
+		uint32_t eax, ebx, tmp;
+
+		cpuid(0x00000007, &eax, &ebx, &tmp, &tmp);
+
+		if (~(opt_cpuid_mask_l7s0_eax & opt_cpuid_mask_l7s0_ebx)) {
+			eax &= opt_cpuid_mask_l7s0_eax;
+			ebx &= opt_cpuid_mask_l7s0_ebx;
+		}
+
+		cpuidmask_defaults._7ab0 &= ((uint64_t)eax << 32) | ebx;
 	}
 
-	if (!skip_l7s0_eax_ebx &&
-	    wrmsr_amd_safe(MSR_AMD_L7S0_FEATURE_MASK, l7s0_ebx, l7s0_eax)) {
-		skip_l7s0_eax_ebx = 1;
-		printk("Failed to set CPUID leaf 7 subleaf 0 feature mask\n");
+	if ((levelling_caps & LCAP_6c) == LCAP_6c) {
+		uint32_t ecx = cpuid_ecx(6);
+
+		if (~opt_cpuid_mask_thermal_ecx)
+			ecx &= opt_cpuid_mask_thermal_ecx;
+
+		cpuidmask_defaults._6c &= (~0ULL << 32) | ecx;
 	}
 
-	if (!skip_thermal_ecx &&
-	    (rdmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, &eax, &edx) ||
-	     wrmsr_amd_safe(MSR_AMD_THRM_FEATURE_MASK, thermal_ecx, edx))){
-		skip_thermal_ecx = 1;
-		printk("Failed to set CPUID thermal/power feature mask\n");
+	if (opt_cpu_info) {
+		printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
+		printk(XENLOG_INFO
+		       "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
+		       "e1c 0x%08x, 7a0 0x%08x, 7b0 0x%08x, 6c 0x%08x\n",
+		       (uint32_t)cpuidmask_defaults._1cd,
+		       (uint32_t)(cpuidmask_defaults._1cd >> 32),
+		       (uint32_t)cpuidmask_defaults.e1cd,
+		       (uint32_t)(cpuidmask_defaults.e1cd >> 32),
+		       (uint32_t)(cpuidmask_defaults._7ab0 >> 32),
+		       (uint32_t)cpuidmask_defaults._7ab0,
+		       (uint32_t)cpuidmask_defaults._6c);
 	}
 }
 
@@ -409,7 +493,10 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 
 static void early_init_amd(struct cpuinfo_x86 *c)
 {
-	set_cpuidmask(c);
+	if (c == &boot_cpu_data)
+		amd_init_levelling();
+
+	amd_ctxt_switch_levelling(NULL);
 }
 
 static void init_amd(struct cpuinfo_x86 *c)
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (8 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:14   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This patch is best reviewed as its end result rather than as a diff, as it
rewrites almost all of the setup.

On the BSP, cpuid information is used to evaluate the potential available set
of masking MSRs, and they are unconditionally probed, filling in the
availability information and hardware defaults.  A side effect of this is that
probe_intel_cpuid_faulting() can move to being __init.

The command line parameters are then combined with the hardware defaults to
further restrict the Xen default masking level.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2:
 * Style fixes.
 * Provide extra information if opt_cpu_info.
 * Extra comment indicating the expected use of intel_ctxt_switch_levelling().
v3:
 * Style fixes.
 * Avoid printing the cpumask defaults if faulting is available.
v5:
 * Tweak comments.
---
 xen/arch/x86/cpu/intel.c | 234 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 149 insertions(+), 85 deletions(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index ad22375..6e1fbbb 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -18,11 +18,18 @@
 
 #define select_idle_routine(x) ((void)0)
 
-static unsigned int probe_intel_cpuid_faulting(void)
+static bool_t __init probe_intel_cpuid_faulting(void)
 {
 	uint64_t x;
-	return !rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) &&
-		(x & MSR_PLATFORM_INFO_CPUID_FAULTING);
+
+	if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
+	    !(x & MSR_PLATFORM_INFO_CPUID_FAULTING))
+		return 0;
+
+	expected_levelling_cap |= LCAP_faulting;
+	levelling_caps |=  LCAP_faulting;
+	__set_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability);
+	return 1;
 }
 
 static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
@@ -44,36 +51,40 @@ void set_cpuid_faulting(bool_t enable)
 }
 
 /*
- * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
- * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,
- * edx = 0xBFEBFBFF when executing CPUID.EAX = 1 normally. If you want to
- * 'rev down' to E8400, you can set these values in these Xen boot parameters.
+ * Set caps in expected_levelling_cap, probe a specific masking MSR, and set
+ * caps in levelling_caps if it is found, or clobber the MSR index if missing.
+ * If preset, reads the default value into msr_val.
  */
-static void set_cpuidmask(const struct cpuinfo_x86 *c)
+static uint64_t __init _probe_mask_msr(unsigned int *msr, uint64_t caps)
 {
-	static unsigned int msr_basic, msr_ext, msr_xsave;
-	static enum { not_parsed, no_mask, set_mask } status;
-	u64 msr_val;
+	uint64_t val = 0;
 
-	if (status == no_mask)
-		return;
+	expected_levelling_cap |= caps;
 
-	if (status == set_mask)
-		goto setmask;
+	if (rdmsr_safe(*msr, val) || wrmsr_safe(*msr, val))
+		*msr = 0;
+	else
+		levelling_caps |= caps;
 
-	ASSERT((status == not_parsed) && (c == &boot_cpu_data));
-	status = no_mask;
+	return val;
+}
 
-	if (!~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-	       opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-	       opt_cpuid_mask_xsave_eax))
-		return;
+/* Indices of the masking MSRs, or 0 if unavailable. */
+static unsigned int __read_mostly msr_basic, __read_mostly msr_ext,
+	__read_mostly msr_xsave;
+
+/*
+ * Probe for the existance of the expected masking MSRs.  They might easily
+ * not be available if Xen is running virtualised.
+ */
+static void __init probe_masking_msrs(void)
+{
+	const struct cpuinfo_x86 *c = &boot_cpu_data;
+	unsigned int exp_msr_basic, exp_msr_ext, exp_msr_xsave;
 
 	/* Only family 6 supports this feature. */
-	if (c->x86 != 6) {
-		printk("No CPUID feature masking support available\n");
+	if (c->x86 != 6)
 		return;
-	}
 
 	switch (c->x86_model) {
 	case 0x17: /* Yorkfield, Wolfdale, Penryn, Harpertown(DP) */
@@ -100,59 +111,121 @@ static void set_cpuidmask(const struct cpuinfo_x86 *c)
 		break;
 	}
 
-	status = set_mask;
+	exp_msr_basic = msr_basic;
+	exp_msr_ext   = msr_ext;
+	exp_msr_xsave = msr_xsave;
 
-	if (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx)) {
-		if (msr_basic)
-			printk("Writing CPUID feature mask ecx:edx -> %08x:%08x\n",
-			       opt_cpuid_mask_ecx, opt_cpuid_mask_edx);
-		else
-			printk("No CPUID feature mask available\n");
-	}
-	else
-		msr_basic = 0;
-
-	if (~(opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx)) {
-		if (msr_ext)
-			printk("Writing CPUID extended feature mask ecx:edx -> %08x:%08x\n",
-			       opt_cpuid_mask_ext_ecx, opt_cpuid_mask_ext_edx);
-		else
-			printk("No CPUID extended feature mask available\n");
-	}
-	else
-		msr_ext = 0;
-
-	if (~opt_cpuid_mask_xsave_eax) {
-		if (msr_xsave)
-			printk("Writing CPUID xsave feature mask eax -> %08x\n",
-			       opt_cpuid_mask_xsave_eax);
-		else
-			printk("No CPUID xsave feature mask available\n");
+	if (msr_basic)
+		cpuidmask_defaults._1cd = _probe_mask_msr(&msr_basic, LCAP_1cd);
+
+	if (msr_ext)
+		cpuidmask_defaults.e1cd = _probe_mask_msr(&msr_ext, LCAP_e1cd);
+
+	if (msr_xsave)
+		cpuidmask_defaults.Da1 = _probe_mask_msr(&msr_xsave, LCAP_Da1);
+
+	/*
+	 * Don't bother warning about a mismatch if virtualised.  These MSRs
+	 * are not architectural and almost never virtualised.
+	 */
+	if ((expected_levelling_cap == levelling_caps) ||
+	    cpu_has_hypervisor)
+		return;
+
+	printk(XENLOG_WARNING "Mismatch between expected (%#x) "
+	       "and real (%#x) levelling caps: missing %#x\n",
+	       expected_levelling_cap, levelling_caps,
+	       (expected_levelling_cap ^ levelling_caps) & levelling_caps);
+	printk(XENLOG_WARNING "Fam %#x, model %#x expected (%#x/%#x/%#x), "
+	       "got (%#x/%#x/%#x)\n", c->x86, c->x86_model,
+	       exp_msr_basic, exp_msr_ext, exp_msr_xsave,
+	       msr_basic, msr_ext, msr_xsave);
+	printk(XENLOG_WARNING
+	       "If not running virtualised, please report a bug\n");
+}
+
+/*
+ * Context switch levelling state to the next domain.  A parameter of NULL is
+ * used to context switch to the default host state (by the cpu bringup-code,
+ * crash path, etc).
+ */
+static void intel_ctxt_switch_levelling(const struct domain *nextd)
+{
+	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct cpuidmasks *masks = &cpuidmask_defaults;
+
+#define LAZY(msr, field)						\
+	({								\
+		if (unlikely(these_masks->field != masks->field) &&	\
+		    (msr))						\
+		{							\
+			wrmsrl((msr), masks->field);			\
+			these_masks->field = masks->field;		\
+		}							\
+	})
+
+	LAZY(msr_basic, _1cd);
+	LAZY(msr_ext,   e1cd);
+	LAZY(msr_xsave, Da1);
+
+#undef LAZY
+}
+
+/*
+ * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
+ * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,
+ * edx = 0xBFEBFBFF when executing CPUID.EAX = 1 normally. If you want to
+ * 'rev down' to E8400, you can set these values in these Xen boot parameters.
+ */
+static void __init noinline intel_init_levelling(void)
+{
+	if (!probe_intel_cpuid_faulting())
+		probe_masking_msrs();
+
+	if (msr_basic) {
+		uint32_t ecx, edx, tmp;
+
+		cpuid(0x00000001, &tmp, &tmp, &ecx, &edx);
+
+		ecx &= opt_cpuid_mask_ecx;
+		edx &= opt_cpuid_mask_edx;
+
+		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
 	}
-	else
-		msr_xsave = 0;
-
- setmask:
-	if (msr_basic &&
-	    wrmsr_safe(msr_basic,
-		       ((u64)opt_cpuid_mask_edx << 32) | opt_cpuid_mask_ecx)){
-		msr_basic = 0;
-		printk("Failed to set CPUID feature mask\n");
+
+	if (msr_ext) {
+		uint32_t ecx, edx, tmp;
+
+		cpuid(0x80000001, &tmp, &tmp, &ecx, &edx);
+
+		ecx &= opt_cpuid_mask_ext_ecx;
+		edx &= opt_cpuid_mask_ext_edx;
+
+		cpuidmask_defaults.e1cd &= ((u64)edx << 32) | ecx;
 	}
 
-	if (msr_ext &&
-	    wrmsr_safe(msr_ext,
-		       ((u64)opt_cpuid_mask_ext_edx << 32) | opt_cpuid_mask_ext_ecx)){
-		msr_ext = 0;
-		printk("Failed to set CPUID extended feature mask\n");
+	if (msr_xsave) {
+		uint32_t eax, tmp;
+
+		cpuid_count(0x0000000d, 1, &eax, &tmp, &tmp, &tmp);
+
+		eax &= opt_cpuid_mask_xsave_eax;
+
+		cpuidmask_defaults.Da1 &= (~0ULL << 32) | eax;
 	}
 
-	if (msr_xsave &&
-	    (rdmsr_safe(msr_xsave, msr_val) ||
-	     wrmsr_safe(msr_xsave,
-			(msr_val & (~0ULL << 32)) | opt_cpuid_mask_xsave_eax))){
-		msr_xsave = 0;
-		printk("Failed to set CPUID xsave feature mask\n");
+	if (opt_cpu_info) {
+		printk(XENLOG_INFO "Levelling caps: %#x\n", levelling_caps);
+
+		if (!cpu_has_cpuid_faulting)
+			printk(XENLOG_INFO
+			       "MSR defaults: 1d 0x%08x, 1c 0x%08x, e1d 0x%08x, "
+			       "e1c 0x%08x, Da1 0x%08x\n",
+			       (uint32_t)(cpuidmask_defaults._1cd >> 32),
+			       (uint32_t)cpuidmask_defaults._1cd,
+			       (uint32_t)(cpuidmask_defaults.e1cd >> 32),
+			       (uint32_t)cpuidmask_defaults.e1cd,
+			       (uint32_t)cpuidmask_defaults.Da1);
 	}
 }
 
@@ -190,22 +263,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
 		paddr_bits = 36;
 
-	if (c == &boot_cpu_data && c->x86 == 6) {
-		if (probe_intel_cpuid_faulting())
-			__set_bit(X86_FEATURE_CPUID_FAULTING,
-				  c->x86_capability);
-	} else if (boot_cpu_has(X86_FEATURE_CPUID_FAULTING)) {
-		BUG_ON(!probe_intel_cpuid_faulting());
+	if (c == &boot_cpu_data)
+		intel_init_levelling();
+
+	if (test_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability))
 		__set_bit(X86_FEATURE_CPUID_FAULTING, c->x86_capability);
-	}
 
-	if (!cpu_has_cpuid_faulting)
-		set_cpuidmask(c);
-	else if ((c == &boot_cpu_data) &&
-		 (~(opt_cpuid_mask_ecx & opt_cpuid_mask_edx &
-		    opt_cpuid_mask_ext_ecx & opt_cpuid_mask_ext_edx &
-		    opt_cpuid_mask_xsave_eax)))
-		printk("No CPUID feature masking support available\n");
+	intel_ctxt_switch_levelling(NULL);
 }
 
 /*
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch()
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (9 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:15   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

A single ctxt_switch_levelling() function pointer is provided
(defaulting to an empty nop), which is overridden in the appropriate
$VENDOR_init_levelling().

set_cpuid_faulting() is made private and included within
intel_ctxt_switch_levelling().

One (attempted) functional change is that the faulting configuration should
not be special cased for dom0.  It turns out that the toolstack relies on the
special case (and indeed, on being a PV domain in the first place) to
correctly build HVM domains.

For now, the control domain is left as a special case, until futher work can
be completed to remove the restriction.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v3:
 * Don't leave cpuid masking/faulting active for the kexec kernel.
v2:
 * Style fixes.
 * ASSERT() that faulting is available in set_cpuid_faulting().
v5:
 * Fix the building of HVM domains from hardware with faulting available.
---
 xen/arch/x86/cpu/amd.c          |  3 +++
 xen/arch/x86/cpu/common.c       |  7 +++++++
 xen/arch/x86/cpu/intel.c        | 37 ++++++++++++++++++++++++++++++++-----
 xen/arch/x86/crash.c            |  3 +++
 xen/arch/x86/domain.c           |  4 +---
 xen/include/asm-x86/processor.h |  2 +-
 6 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 93a8a5e..3e2f4a8 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -331,6 +331,9 @@ static void __init noinline amd_init_levelling(void)
 		       (uint32_t)cpuidmask_defaults._7ab0,
 		       (uint32_t)cpuidmask_defaults._6c);
 	}
+
+	if (levelling_caps)
+		ctxt_switch_levelling = amd_ctxt_switch_levelling;
 }
 
 /*
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 7ef75b0..fe6eab4 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -88,6 +88,13 @@ static const struct cpu_dev default_cpu = {
 };
 static const struct cpu_dev *this_cpu = &default_cpu;
 
+static void default_ctxt_switch_levelling(const struct domain *nextd)
+{
+	/* Nop */
+}
+void (* __read_mostly ctxt_switch_levelling)(const struct domain *nextd) =
+	default_ctxt_switch_levelling;
+
 bool_t opt_cpu_info;
 boolean_param("cpuinfo", opt_cpu_info);
 
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 6e1fbbb..e21c32d 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -32,13 +32,15 @@ static bool_t __init probe_intel_cpuid_faulting(void)
 	return 1;
 }
 
-static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
-void set_cpuid_faulting(bool_t enable)
+static void set_cpuid_faulting(bool_t enable)
 {
+	static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
+	bool_t *this_enabled = &this_cpu(cpuid_faulting_enabled);
 	uint32_t hi, lo;
 
-	if (!cpu_has_cpuid_faulting ||
-	    this_cpu(cpuid_faulting_enabled) == enable )
+	ASSERT(cpu_has_cpuid_faulting);
+
+	if (*this_enabled == enable)
 		return;
 
 	rdmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
@@ -47,7 +49,7 @@ void set_cpuid_faulting(bool_t enable)
 		lo |= MSR_MISC_FEATURES_CPUID_FAULTING;
 	wrmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
 
-	this_cpu(cpuid_faulting_enabled) = enable;
+	*this_enabled = enable;
 }
 
 /*
@@ -154,6 +156,28 @@ static void intel_ctxt_switch_levelling(const struct domain *nextd)
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
 	const struct cpuidmasks *masks = &cpuidmask_defaults;
 
+	if (cpu_has_cpuid_faulting) {
+		/*
+		 * We *should* be enabling faulting for the control domain.
+		 *
+		 * Unfortunately, the domain builder (having only ever been a
+		 * PV guest) expects to be able to see host cpuid state in a
+		 * native CPUID instruction, to correctly build a CPUID policy
+		 * for HVM guests (notably the xstate leaves).
+		 *
+		 * This logic is fundimentally broken for HVM toolstack
+		 * domains, and faulting causes PV guests to behave like HVM
+		 * guests from their point of view.
+		 *
+		 * Future development plans will move responsibility for
+		 * generating the maximum full cpuid policy into Xen, at which
+		 * this problem will disappear.
+		 */
+		set_cpuid_faulting(nextd && is_pv_domain(nextd) &&
+				   !is_control_domain(nextd));
+		return;
+	}
+
 #define LAZY(msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
@@ -227,6 +251,9 @@ static void __init noinline intel_init_levelling(void)
 			       (uint32_t)cpuidmask_defaults.e1cd,
 			       (uint32_t)cpuidmask_defaults.Da1);
 	}
+
+	if (levelling_caps)
+		ctxt_switch_levelling = intel_ctxt_switch_levelling;
 }
 
 static void early_init_intel(struct cpuinfo_x86 *c)
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 888a214..f28f527 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -189,6 +189,9 @@ void machine_crash_shutdown(void)
 
     nmi_shootdown_cpus();
 
+    /* Reset CPUID masking and faulting to the host's default. */
+    ctxt_switch_levelling(NULL);
+
     info = kexec_crash_save_info();
     info->xen_phys_start = xen_phys_start;
     info->dom0_pfn_to_mfn_frame_list_list =
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a4f6db2..cba77a2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2082,9 +2082,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             load_segments(next);
         }
 
-        set_cpuid_faulting(is_pv_domain(nextd) &&
-                           !is_control_domain(nextd) &&
-                           !is_hardware_domain(nextd));
+        ctxt_switch_levelling(nextd);
     }
 
     context_saved(prev);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index a950554..f29d370 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -209,7 +209,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 extern struct cpuinfo_x86 cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]
 
-extern void set_cpuid_faulting(bool_t enable);
+extern void (*ctxt_switch_levelling)(const struct domain *nextd);
 
 extern u64 host_pat;
 extern bool_t opt_cpu_info;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (10 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:17   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

And use them in preference to cpumask_defaults on context switch.  HVM domains
must not be masked (to avoid interfering with cpuid calls within the guest),
so always lazily context switch to the host default.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2:
 * s/cpumasks/cpuidmasks/
 * Use structure assignment
 * Fix error path in arch_domain_create()
v3:
 * Indentation fixes.
 * Only allocate PV cpuidmasks if the host is has cpumasks to use.
---
 xen/arch/x86/cpu/amd.c       |  4 +++-
 xen/arch/x86/cpu/intel.c     |  5 ++++-
 xen/arch/x86/domain.c        | 14 ++++++++++++++
 xen/include/asm-x86/domain.h |  2 ++
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 3e2f4a8..d5afc3e 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -206,7 +206,9 @@ static void __init noinline probe_masking_msrs(void)
 static void amd_ctxt_switch_levelling(const struct domain *nextd)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
-	const struct cpuidmasks *masks = &cpuidmask_defaults;
+	const struct cpuidmasks *masks =
+		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
+		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
 #define LAZY(cap, msr, field)						\
 	({								\
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index e21c32d..fe4736e 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -154,7 +154,7 @@ static void __init probe_masking_msrs(void)
 static void intel_ctxt_switch_levelling(const struct domain *nextd)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
-	const struct cpuidmasks *masks = &cpuidmask_defaults;
+	const struct cpuidmasks *masks;
 
 	if (cpu_has_cpuid_faulting) {
 		/*
@@ -178,6 +178,9 @@ static void intel_ctxt_switch_levelling(const struct domain *nextd)
 		return;
 	}
 
+	masks = (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
+		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
+
 #define LAZY(msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index cba77a2..a64bfdc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -577,6 +577,14 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
             goto fail;
         clear_page(d->arch.pv_domain.gdt_ldt_l1tab);
 
+        if ( levelling_caps & ~LCAP_faulting )
+        {
+            d->arch.pv_domain.cpuidmasks = xmalloc(struct cpuidmasks);
+            if ( !d->arch.pv_domain.cpuidmasks )
+                goto fail;
+            *d->arch.pv_domain.cpuidmasks = cpuidmask_defaults;
+        }
+
         rc = create_perdomain_mapping(d, GDT_LDT_VIRT_START,
                                       GDT_LDT_MBYTES << (20 - PAGE_SHIFT),
                                       NULL, NULL);
@@ -672,7 +680,10 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
         paging_final_teardown(d);
     free_perdomain_mappings(d);
     if ( is_pv_domain(d) )
+    {
+        xfree(d->arch.pv_domain.cpuidmasks);
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
+    }
     psr_domain_free(d);
     return rc;
 }
@@ -692,7 +703,10 @@ void arch_domain_destroy(struct domain *d)
 
     free_perdomain_mappings(d);
     if ( is_pv_domain(d) )
+    {
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
+        xfree(d->arch.pv_domain.cpuidmasks);
+    }
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index de60def..90f021f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -252,6 +252,8 @@ struct pv_domain
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
+
+    struct cpuidmasks *cpuidmasks;
 };
 
 struct monitor_write_data {
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (11 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:26   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This allows PV domains with different featuresets to observe different values
from a native cpuid instruction, on supporting hardware.

It is important to leak the host view of X2APIC, HTT and CMP_LEGACY through to
guests, even though they could be hidden.  These flags affect how to interpret
other cpuid leaves which are not maskable.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2:
 * Use switch() rather than if/elseif chain
 * Clamp to static PV featuremask
v3:
 * Only set a shadow cpumask if it is available in hardware.  This causes
   fewer branches in the context switch.
 * Fix interaction between fastforward bits and override MSR.
 * Fix up the cross-vendor case.
 * Fix the host view of HTT/CMP_LEGACY.
v4:
 * More comments explaining the masking MSRs behaviour.
 * s/CPU/CPUID/
 * Leak host X2APIC.
v5:
 * Fix commit message wrt X2APIC.
---
 xen/arch/x86/domctl.c            | 138 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h |   1 +
 2 files changed, 139 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index e5180ef..cba1e37 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include <asm/xstate.h>
 #include <asm/debugger.h>
 #include <asm/psr.h>
+#include <asm/cpuid.h>
 
 static int gdbsx_guest_mem_io(domid_t domid, struct xen_domctl_gdbsx_memio *iop)
 {
@@ -87,6 +88,143 @@ static void update_domain_cpuid_info(struct domain *d,
         d->arch.x86_model = (ctl->eax >> 4) & 0xf;
         if ( d->arch.x86 >= 0x6 )
             d->arch.x86_model |= (ctl->eax >> 12) & 0xf0;
+
+        if ( is_pv_domain(d) && ((levelling_caps & LCAP_1cd) == LCAP_1cd) )
+        {
+            uint64_t mask = cpuidmask_defaults._1cd;
+            uint32_t ecx = ctl->ecx & pv_featureset[FEATURESET_1c];
+            uint32_t edx = ctl->edx & pv_featureset[FEATURESET_1d];
+
+            /*
+             * Must expose hosts HTT and X2APIC value so a guest using native
+             * CPUID can correctly interpret other leaves which cannot be
+             * masked.
+             */
+            if ( cpu_has_x2apic )
+                ecx |= cpufeat_mask(X86_FEATURE_X2APIC);
+            if ( cpu_has_htt )
+                edx |= cpufeat_mask(X86_FEATURE_HTT);
+
+            switch ( boot_cpu_data.x86_vendor )
+            {
+            case X86_VENDOR_INTEL:
+                /*
+                 * Intel masking MSRs are documented as AND masks.
+                 * Experimentally, they are applied before OSXSAVE and APIC
+                 * are fast-forwarded from real hardware state.
+                 */
+                mask &= ((uint64_t)edx << 32) | ecx;
+                break;
+
+            case X86_VENDOR_AMD:
+                mask &= ((uint64_t)ecx << 32) | edx;
+
+                /*
+                 * AMD masking MSRs are documented as overrides.
+                 * Experimentally, fast-forwarding of the OSXSAVE and APIC
+                 * bits from real hardware state only occurs if the MSR has
+                 * the respective bits set.
+                 */
+                if ( ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
+                    ecx = cpufeat_mask(X86_FEATURE_OSXSAVE);
+                else
+                    ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)ecx << 32) | edx;
+                break;
+            }
+
+            d->arch.pv_domain.cpuidmasks->_1cd = mask;
+        }
+        break;
+
+    case 6:
+        if ( is_pv_domain(d) && ((levelling_caps & LCAP_6c) == LCAP_6c) )
+        {
+            uint64_t mask = cpuidmask_defaults._6c;
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+                mask &= (~0ULL << 32) | ctl->ecx;
+
+            d->arch.pv_domain.cpuidmasks->_6c = mask;
+        }
+        break;
+
+    case 7:
+        if ( ctl->input[1] != 0 )
+            break;
+
+        if ( is_pv_domain(d) && ((levelling_caps & LCAP_7ab0) == LCAP_7ab0) )
+        {
+            uint64_t mask = cpuidmask_defaults._7ab0;
+            uint32_t eax = ctl->eax;
+            uint32_t ebx = ctl->ebx & pv_featureset[FEATURESET_7b0];
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+                mask &= ((uint64_t)eax << 32) | ebx;
+
+            d->arch.pv_domain.cpuidmasks->_7ab0 = mask;
+        }
+        break;
+
+    case 0xd:
+        if ( ctl->input[1] != 1 )
+            break;
+
+        if ( is_pv_domain(d) && ((levelling_caps & LCAP_Da1) == LCAP_Da1) )
+        {
+            uint64_t mask = cpuidmask_defaults.Da1;
+            uint32_t eax = ctl->eax & pv_featureset[FEATURESET_Da1];
+
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                mask &= (~0ULL << 32) | eax;
+
+            d->arch.pv_domain.cpuidmasks->Da1 = mask;
+        }
+        break;
+
+    case 0x80000001:
+        if ( is_pv_domain(d) && ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) )
+        {
+            uint64_t mask = cpuidmask_defaults.e1cd;
+            uint32_t ecx = ctl->ecx & pv_featureset[FEATURESET_e1c];
+            uint32_t edx = ctl->edx & pv_featureset[FEATURESET_e1d];
+
+            /*
+             * Must expose hosts CMP_LEGACY value so a guest using native
+             * CPUID can correctly interpret other leaves which cannot be
+             * masked.
+             */
+            if ( cpu_has_cmp_legacy )
+                ecx |= cpufeat_mask(X86_FEATURE_CMP_LEGACY);
+
+            /* If not emulating AMD, clear the duplicated features in e1d. */
+            if ( d->arch.x86_vendor != X86_VENDOR_AMD )
+                edx &= ~CPUID_COMMON_1D_FEATURES;
+
+            switch ( boot_cpu_data.x86_vendor )
+            {
+            case X86_VENDOR_INTEL:
+                mask &= ((uint64_t)edx << 32) | ecx;
+                break;
+
+            case X86_VENDOR_AMD:
+                mask &= ((uint64_t)ecx << 32) | edx;
+
+                /*
+                 * Fast-forward bits - Must be set in the masking MSR for
+                 * fast-forwarding to occur in hardware.
+                 */
+                ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)ecx << 32) | edx;
+                break;
+            }
+
+            d->arch.pv_domain.cpuidmasks->e1cd = mask;
+        }
         break;
     }
 }
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 9a93799..97c7e9e 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -84,6 +84,7 @@
 #define cpu_has_monitor		boot_cpu_has(X86_FEATURE_MONITOR)
 #define cpu_has_eist		boot_cpu_has(X86_FEATURE_EIST)
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
+#define cpu_has_cmp_legacy	boot_cpu_has(X86_FEATURE_CMP_LEGACY)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (12 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 16:54   ` Daniel De Graaf
  2016-04-08 16:32   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
                   ` (6 subsequent siblings)
  20 siblings, 2 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Daniel De Graaf, Tim Deegan

And provide stubs for toolstack use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Acked-by: Jan Beulich <JBeulich@suse.com>
---
CC: Tim Deegan <tim@xen.org>
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>

v2:
 * Rebased to use libxencall
 * Improve hypercall documentation
v3:
 * Provide libxc implementation for XEN_SYSCTL_get_cpu_levelling_caps as well.
v4:
 * More const.
v5:
 * XSM bits.
---
 tools/flask/policy/policy/modules/xen/xen.te |  1 +
 tools/libxc/include/xenctrl.h                |  4 +++
 tools/libxc/xc_cpuid_x86.c                   | 41 ++++++++++++++++++++++
 tools/ocaml/libs/xc/xenctrl.ml               |  3 ++
 tools/ocaml/libs/xc/xenctrl.mli              |  4 +++
 tools/ocaml/libs/xc/xenctrl_stubs.c          | 35 +++++++++++++++++++
 xen/arch/x86/sysctl.c                        | 51 ++++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 27 +++++++++++++++
 xen/xsm/flask/hooks.c                        |  3 ++
 xen/xsm/flask/policy/access_vectors          |  2 ++
 10 files changed, 171 insertions(+)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index c29b067..a551756 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -73,6 +73,7 @@ allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
     get_cpu_levelling_caps
+    get_cpu_featureset
 };
 
 # Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index e8cb1ec..1c865a3 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2618,6 +2618,10 @@ int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid,
 int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            uint32_t *cos_max, uint32_t *cbm_len,
                            bool *cdp_enabled);
+
+int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps);
+int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
+                          uint32_t *nr_features, uint32_t *featureset);
 #endif
 
 /* Compat shims */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 733add4..5780397 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -33,6 +33,47 @@
 #define DEF_MAX_INTELEXT  0x80000008u
 #define DEF_MAX_AMDEXT    0x8000001cu
 
+int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
+{
+    DECLARE_SYSCTL;
+    int ret;
+
+    sysctl.cmd = XEN_SYSCTL_get_cpu_levelling_caps;
+    ret = do_sysctl(xch, &sysctl);
+
+    if ( !ret )
+        *caps = sysctl.u.cpu_levelling_caps.caps;
+
+    return ret;
+}
+
+int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
+                          uint32_t *nr_features, uint32_t *featureset)
+{
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(featureset,
+                             *nr_features * sizeof(*featureset),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int ret;
+
+    if ( xc_hypercall_bounce_pre(xch, featureset) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_get_cpu_featureset;
+    sysctl.u.cpu_featureset.index = index;
+    sysctl.u.cpu_featureset.nr_features = *nr_features;
+    set_xen_guest_handle(sysctl.u.cpu_featureset.features, featureset);
+
+    ret = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, featureset);
+
+    if ( !ret )
+        *nr_features = sysctl.u.cpu_featureset.nr_features;
+
+    return ret;
+}
+
 struct cpuid_domain_info
 {
     enum
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 58a53a1..75006e7 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -242,6 +242,9 @@ external version_changeset: handle -> string = "stub_xc_version_changeset"
 external version_capabilities: handle -> string =
   "stub_xc_version_capabilities"
 
+type featureset_index = Featureset_raw | Featureset_host | Featureset_pv | Featureset_hvm
+external get_cpu_featureset : handle -> featureset_index -> int64 array = "stub_xc_get_cpu_featureset"
+
 external watchdog : handle -> int -> int32 -> int
   = "stub_xc_watchdog"
 
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 16443df..720e4b2 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -147,6 +147,10 @@ external version_compile_info : handle -> compile_info
 external version_changeset : handle -> string = "stub_xc_version_changeset"
 external version_capabilities : handle -> string
   = "stub_xc_version_capabilities"
+
+type featureset_index = Featureset_raw | Featureset_host | Featureset_pv | Featureset_hvm
+external get_cpu_featureset : handle -> featureset_index -> int64 array = "stub_xc_get_cpu_featureset"
+
 type core_magic = Magic_hvm | Magic_pv
 type core_header = {
   xch_magic : core_magic;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 4ac5dce..e87f14f 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1207,6 +1207,41 @@ CAMLprim value stub_xc_domain_deassign_device(value xch, value domid, value desc
 	CAMLreturn(Val_unit);
 }
 
+CAMLprim value stub_xc_get_cpu_featureset(value xch, value idx)
+{
+	CAMLparam2(xch, idx);
+	CAMLlocal1(bitmap_val);
+
+	/* Safe, because of the global ocaml lock. */
+	static uint32_t fs_len;
+
+	if (fs_len == 0)
+	{
+		int ret = xc_get_cpu_featureset(_H(xch), 0, &fs_len, NULL);
+
+		if (ret || (fs_len == 0))
+			failwith_xc(_H(xch));
+	}
+
+	{
+		/* To/from hypervisor to retrieve actual featureset */
+		uint32_t fs[fs_len], len = fs_len;
+		unsigned int i;
+
+		int ret = xc_get_cpu_featureset(_H(xch), Int_val(idx), &len, fs);
+
+		if (ret)
+			failwith_xc(_H(xch));
+
+		bitmap_val = caml_alloc(len, 0);
+
+		for (i = 0; i < len; ++i)
+			Store_field(bitmap_val, i, caml_copy_int64(fs[i]));
+	}
+
+	CAMLreturn(bitmap_val);
+}
+
 CAMLprim value stub_xc_watchdog(value xch, value domid, value timeout)
 {
 	CAMLparam3(xch, domid, timeout);
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index f68cbec..9c75de6 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -30,6 +30,7 @@
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
 #include <asm/psr.h>
+#include <asm/cpuid.h>
 
 struct l3_cache_info {
     int ret;
@@ -196,6 +197,56 @@ long arch_do_sysctl(
             ret = -EFAULT;
         break;
 
+    case XEN_SYSCTL_get_cpu_featureset:
+    {
+        static const uint32_t *const featureset_table[] = {
+            [XEN_SYSCTL_cpu_featureset_raw]  = raw_featureset,
+            [XEN_SYSCTL_cpu_featureset_host] = host_featureset,
+            [XEN_SYSCTL_cpu_featureset_pv]   = pv_featureset,
+            [XEN_SYSCTL_cpu_featureset_hvm]  = hvm_featureset,
+        };
+        const uint32_t *featureset = NULL;
+        unsigned int nr;
+
+        /* Request for maximum number of features? */
+        if ( guest_handle_is_null(sysctl->u.cpu_featureset.features) )
+        {
+            sysctl->u.cpu_featureset.nr_features = FSCAPINTS;
+            if ( __copy_field_to_guest(u_sysctl, sysctl,
+                                       u.cpu_featureset.nr_features) )
+                ret = -EFAULT;
+            break;
+        }
+
+        /* Clip the number of entries. */
+        nr = min(sysctl->u.cpu_featureset.nr_features, FSCAPINTS);
+
+        /* Look up requested featureset. */
+        if ( sysctl->u.cpu_featureset.index < ARRAY_SIZE(featureset_table) )
+            featureset = featureset_table[sysctl->u.cpu_featureset.index];
+
+        /* Bad featureset index? */
+        if ( !featureset )
+            ret = -EINVAL;
+
+        /* Copy the requested featureset into place. */
+        if ( !ret && copy_to_guest(sysctl->u.cpu_featureset.features,
+                                   featureset, nr) )
+            ret = -EFAULT;
+
+        /* Inform the caller of how many features we wrote. */
+        sysctl->u.cpu_featureset.nr_features = nr;
+        if ( !ret && __copy_field_to_guest(u_sysctl, sysctl,
+                                           u.cpu_featureset.nr_features) )
+            ret = -EFAULT;
+
+        /* Inform the caller if there was more data to provide. */
+        if ( !ret && nr < FSCAPINTS )
+            ret = -ENOBUFS;
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 1ab16db..4596d20 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -787,6 +787,31 @@ struct xen_sysctl_cpu_levelling_caps {
 typedef struct xen_sysctl_cpu_levelling_caps xen_sysctl_cpu_levelling_caps_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpu_levelling_caps_t);
 
+/*
+ * XEN_SYSCTL_get_cpu_featureset (x86 specific)
+ *
+ * Return information about featuresets available on this host.
+ *  -  Raw: The real cpuid values.
+ *  - Host: The values Xen is using, (after command line overrides, etc).
+ *  -   PV: Maximum set of features which can be given to a PV guest.
+ *  -  HVM: Maximum set of features which can be given to a HVM guest.
+ */
+struct xen_sysctl_cpu_featureset {
+#define XEN_SYSCTL_cpu_featureset_raw      0
+#define XEN_SYSCTL_cpu_featureset_host     1
+#define XEN_SYSCTL_cpu_featureset_pv       2
+#define XEN_SYSCTL_cpu_featureset_hvm      3
+    uint32_t index;       /* IN: Which featureset to query? */
+    uint32_t nr_features; /* IN/OUT: Number of entries in/written to
+                           * 'features', or the maximum number of features if
+                           * the guest handle is NULL.  NB. All featuresets
+                           * come from the same numberspace, so have the same
+                           * maximum length. */
+    XEN_GUEST_HANDLE_64(uint32) features; /* OUT: */
+};
+typedef struct xen_sysctl_featureset xen_sysctl_featureset_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_featureset_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -813,6 +838,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
 #define XEN_SYSCTL_get_cpu_levelling_caps        25
+#define XEN_SYSCTL_get_cpu_featureset            26
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -839,6 +865,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
         struct xen_sysctl_cpu_levelling_caps cpu_levelling_caps;
+        struct xen_sysctl_cpu_featureset    cpu_featureset;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index f0e3e5f..1fb0e84 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -811,6 +811,9 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_get_cpu_levelling_caps:
         return domain_has_xen(current->domain, XEN2__GET_CPU_LEVELLING_CAPS);
 
+    case XEN_SYSCTL_get_cpu_featureset:
+        return domain_has_xen(current->domain, XEN2__GET_CPU_FEATURESET);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 31ecf02..0ebb56b 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -95,6 +95,8 @@ class xen2
     pmu_use
 # XEN_SYSCTL_get_cpu_levelling_caps
     get_cpu_levelling_caps
+# XEN_SYSCTL_get_cpu_featureset
+    get_cpu_featureset
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (13 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 13:00   ` Wei Liu
  2016-04-08 16:34   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Julien Grall, Ian Jackson, Wei Liu, Stefano Stabellini

The type of the pointer to a bitmap is not interesting; it does not affect the
representation of the block of bits being pointed to.

Make the libxc functions consistent with those in Xen, so they can work just
as well with 'unsigned int *' based bitmaps.

As part of doing so, change the implementation to be in terms of char rather
than unsigned long.  This fixes alignment concerns with ARM.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Julien Grall <julien.grall@arm.com>

v2:
 * New
v3:
 * Implement in terms of char rather than unsigned long to fix alignment
   issues for ARM.
v4:
 * Fix erronious calculation in bitmap_size()
---
 tools/libxc/xc_bitops.h | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
index cd749f4..3e7a544 100644
--- a/tools/libxc/xc_bitops.h
+++ b/tools/libxc/xc_bitops.h
@@ -6,70 +6,73 @@
 #include <stdlib.h>
 #include <string.h>
 
+/* Needed by several includees, but no longer used for bitops. */
 #define BITS_PER_LONG (sizeof(unsigned long) * 8)
 #define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6)
 
-#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr)/BITS_PER_LONG]
-#define BITMAP_SHIFT(_nr) ((_nr) % BITS_PER_LONG)
+#define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr) / 8]
+#define BITMAP_SHIFT(_nr) ((_nr) % 8)
 
 /* calculate required space for number of longs needed to hold nr_bits */
 static inline int bitmap_size(int nr_bits)
 {
-    int nr_long, nr_bytes;
-    nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG;
-    nr_bytes = nr_long * sizeof(unsigned long);
-    return nr_bytes;
+    return (nr_bits + 7) / 8;
 }
 
-static inline unsigned long *bitmap_alloc(int nr_bits)
+static inline void *bitmap_alloc(int nr_bits)
 {
     return calloc(1, bitmap_size(nr_bits));
 }
 
-static inline void bitmap_set(unsigned long *addr, int nr_bits)
+static inline void bitmap_set(void *addr, int nr_bits)
 {
     memset(addr, 0xff, bitmap_size(nr_bits));
 }
 
-static inline void bitmap_clear(unsigned long *addr, int nr_bits)
+static inline void bitmap_clear(void *addr, int nr_bits)
 {
     memset(addr, 0, bitmap_size(nr_bits));
 }
 
-static inline int test_bit(int nr, unsigned long *addr)
+static inline int test_bit(int nr, const void *_addr)
 {
+    const char *addr = _addr;
     return (BITMAP_ENTRY(nr, addr) >> BITMAP_SHIFT(nr)) & 1;
 }
 
-static inline void clear_bit(int nr, unsigned long *addr)
+static inline void clear_bit(int nr, void *_addr)
 {
+    char *addr = _addr;
     BITMAP_ENTRY(nr, addr) &= ~(1UL << BITMAP_SHIFT(nr));
 }
 
-static inline void set_bit(int nr, unsigned long *addr)
+static inline void set_bit(int nr, void *_addr)
 {
+    char *addr = _addr;
     BITMAP_ENTRY(nr, addr) |= (1UL << BITMAP_SHIFT(nr));
 }
 
-static inline int test_and_clear_bit(int nr, unsigned long *addr)
+static inline int test_and_clear_bit(int nr, void *addr)
 {
     int oldbit = test_bit(nr, addr);
     clear_bit(nr, addr);
     return oldbit;
 }
 
-static inline int test_and_set_bit(int nr, unsigned long *addr)
+static inline int test_and_set_bit(int nr, void *addr)
 {
     int oldbit = test_bit(nr, addr);
     set_bit(nr, addr);
     return oldbit;
 }
 
-static inline void bitmap_or(unsigned long *dst, const unsigned long *other,
+static inline void bitmap_or(void *_dst, const void *_other,
                              int nr_bits)
 {
-    int i, nr_longs = (bitmap_size(nr_bits) / sizeof(unsigned long));
-    for ( i = 0; i < nr_longs; ++i )
+    char *dst = _dst;
+    const char *other = _other;
+    int i;
+    for ( i = 0; i < bitmap_size(nr_bits); ++i )
         dst[i] |= other[i];
 }
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (14 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:37   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson

Rather than having a different local copy of some of the feature
definitions.

Modify the xc_cpuid_x86.c cpumask helpers to appropriate truncate the
new values.

As some of the feature have been renamed in the public API, similar renames
are made here.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

v3:
 * Adjust naming to match Xen.
---
 tools/libxc/xc_cpufeature.h | 151 --------------------------------------------
 tools/libxc/xc_cpuid_x86.c  |  37 ++++++-----
 2 files changed, 20 insertions(+), 168 deletions(-)
 delete mode 100644 tools/libxc/xc_cpufeature.h

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
deleted file mode 100644
index 01dbeec..0000000
--- a/tools/libxc/xc_cpufeature.h
+++ /dev/null
@@ -1,151 +0,0 @@
-/*
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __LIBXC_CPUFEATURE_H
-#define __LIBXC_CPUFEATURE_H
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (edx) */
-#define X86_FEATURE_FPU          0 /* Onboard FPU */
-#define X86_FEATURE_VME          1 /* Virtual Mode Extensions */
-#define X86_FEATURE_DE           2 /* Debugging Extensions */
-#define X86_FEATURE_PSE          3 /* Page Size Extensions */
-#define X86_FEATURE_TSC          4 /* Time Stamp Counter */
-#define X86_FEATURE_MSR          5 /* Model-Specific Registers, RDMSR, WRMSR */
-#define X86_FEATURE_PAE          6 /* Physical Address Extensions */
-#define X86_FEATURE_MCE          7 /* Machine Check Architecture */
-#define X86_FEATURE_CX8          8 /* CMPXCHG8 instruction */
-#define X86_FEATURE_APIC         9 /* Onboard APIC */
-#define X86_FEATURE_SEP         11 /* SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR        12 /* Memory Type Range Registers */
-#define X86_FEATURE_PGE         13 /* Page Global Enable */
-#define X86_FEATURE_MCA         14 /* Machine Check Architecture */
-#define X86_FEATURE_CMOV        15 /* CMOV instruction */
-#define X86_FEATURE_PAT         16 /* Page Attribute Table */
-#define X86_FEATURE_PSE36       17 /* 36-bit PSEs */
-#define X86_FEATURE_PN          18 /* Processor serial number */
-#define X86_FEATURE_CLFLSH      19 /* Supports the CLFLUSH instruction */
-#define X86_FEATURE_DS          21 /* Debug Store */
-#define X86_FEATURE_ACPI        22 /* ACPI via MSR */
-#define X86_FEATURE_MMX         23 /* Multimedia Extensions */
-#define X86_FEATURE_FXSR        24 /* FXSAVE and FXRSTOR instructions */
-#define X86_FEATURE_XMM         25 /* Streaming SIMD Extensions */
-#define X86_FEATURE_XMM2        26 /* Streaming SIMD Extensions-2 */
-#define X86_FEATURE_SELFSNOOP   27 /* CPU self snoop */
-#define X86_FEATURE_HT          28 /* Hyper-Threading */
-#define X86_FEATURE_ACC         29 /* Automatic clock control */
-#define X86_FEATURE_IA64        30 /* IA-64 processor */
-#define X86_FEATURE_PBE         31 /* Pending Break Enable */
-
-/* AMD-defined CPU features, CPUID level 0x80000001 */
-/* Don't duplicate feature flags which are redundant with Intel! */
-#define X86_FEATURE_SYSCALL     11 /* SYSCALL/SYSRET */
-#define X86_FEATURE_MP          19 /* MP Capable. */
-#define X86_FEATURE_NX          20 /* Execute Disable */
-#define X86_FEATURE_MMXEXT      22 /* AMD MMX extensions */
-#define X86_FEATURE_FFXSR       25 /* FFXSR instruction optimizations */
-#define X86_FEATURE_PAGE1GB     26 /* 1Gb large page support */
-#define X86_FEATURE_RDTSCP      27 /* RDTSCP */
-#define X86_FEATURE_LM          29 /* Long Mode (x86-64) */
-#define X86_FEATURE_3DNOWEXT    30 /* AMD 3DNow! extensions */
-#define X86_FEATURE_3DNOW       31 /* 3DNow! */
-
-/* Intel-defined CPU features, CPUID level 0x00000001 (ecx) */
-#define X86_FEATURE_XMM3         0 /* Streaming SIMD Extensions-3 */
-#define X86_FEATURE_PCLMULQDQ    1 /* Carry-less multiplication */
-#define X86_FEATURE_DTES64       2 /* 64-bit Debug Store */
-#define X86_FEATURE_MWAIT        3 /* Monitor/Mwait support */
-#define X86_FEATURE_DSCPL        4 /* CPL Qualified Debug Store */
-#define X86_FEATURE_VMXE         5 /* Virtual Machine Extensions */
-#define X86_FEATURE_SMXE         6 /* Safer Mode Extensions */
-#define X86_FEATURE_EST          7 /* Enhanced SpeedStep */
-#define X86_FEATURE_TM2          8 /* Thermal Monitor 2 */
-#define X86_FEATURE_SSSE3        9 /* Supplemental Streaming SIMD Exts-3 */
-#define X86_FEATURE_CID         10 /* Context ID */
-#define X86_FEATURE_FMA         12 /* Fused Multiply Add */
-#define X86_FEATURE_CX16        13 /* CMPXCHG16B */
-#define X86_FEATURE_XTPR        14 /* Send Task Priority Messages */
-#define X86_FEATURE_PDCM        15 /* Perf/Debug Capability MSR */
-#define X86_FEATURE_PCID        17 /* Process Context ID */
-#define X86_FEATURE_DCA         18 /* Direct Cache Access */
-#define X86_FEATURE_SSE4_1      19 /* Streaming SIMD Extensions 4.1 */
-#define X86_FEATURE_SSE4_2      20 /* Streaming SIMD Extensions 4.2 */
-#define X86_FEATURE_X2APIC      21 /* x2APIC */
-#define X86_FEATURE_MOVBE       22 /* movbe instruction */
-#define X86_FEATURE_POPCNT      23 /* POPCNT instruction */
-#define X86_FEATURE_TSC_DEADLINE 24 /* "tdt" TSC Deadline Timer */
-#define X86_FEATURE_AES         25 /* AES acceleration instructions */
-#define X86_FEATURE_XSAVE       26 /* XSAVE/XRSTOR/XSETBV/XGETBV */
-#define X86_FEATURE_AVX         28 /* Advanced Vector Extensions */
-#define X86_FEATURE_F16C        29 /* Half-precision convert instruction */
-#define X86_FEATURE_RDRAND      30 /* Digital Random Number Generator */
-#define X86_FEATURE_HYPERVISOR  31 /* Running under some hypervisor */
-
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001 */
-#define X86_FEATURE_XSTORE       2 /* on-CPU RNG present (xstore insn) */
-#define X86_FEATURE_XSTORE_EN    3 /* on-CPU RNG enabled */
-#define X86_FEATURE_XCRYPT       6 /* on-CPU crypto (xcrypt insn) */
-#define X86_FEATURE_XCRYPT_EN    7 /* on-CPU crypto enabled */
-#define X86_FEATURE_ACE2         8 /* Advanced Cryptography Engine v2 */
-#define X86_FEATURE_ACE2_EN      9 /* ACE v2 enabled */
-#define X86_FEATURE_PHE         10 /* PadLock Hash Engine */
-#define X86_FEATURE_PHE_EN      11 /* PHE enabled */
-#define X86_FEATURE_PMM         12 /* PadLock Montgomery Multiplier */
-#define X86_FEATURE_PMM_EN      13 /* PMM enabled */
-
-/* More extended AMD flags: CPUID level 0x80000001, ecx */
-#define X86_FEATURE_LAHF_LM      0 /* LAHF/SAHF in long mode */
-#define X86_FEATURE_CMP_LEGACY   1 /* If yes HyperThreading not valid */
-#define X86_FEATURE_SVM          2 /* Secure virtual machine */
-#define X86_FEATURE_EXTAPIC      3 /* Extended APIC space */
-#define X86_FEATURE_CR8_LEGACY   4 /* CR8 in 32-bit mode */
-#define X86_FEATURE_ABM          5 /* Advanced bit manipulation */
-#define X86_FEATURE_SSE4A        6 /* SSE-4A */
-#define X86_FEATURE_MISALIGNSSE  7 /* Misaligned SSE mode */
-#define X86_FEATURE_3DNOWPREFETCH 8 /* 3DNow prefetch instructions */
-#define X86_FEATURE_OSVW         9 /* OS Visible Workaround */
-#define X86_FEATURE_IBS         10 /* Instruction Based Sampling */
-#define X86_FEATURE_XOP         11 /* extended AVX instructions */
-#define X86_FEATURE_SKINIT      12 /* SKINIT/STGI instructions */
-#define X86_FEATURE_WDT         13 /* Watchdog timer */
-#define X86_FEATURE_LWP         15 /* Light Weight Profiling */
-#define X86_FEATURE_FMA4        16 /* 4 operands MAC instructions */
-#define X86_FEATURE_NODEID_MSR  19 /* NodeId MSR */
-#define X86_FEATURE_TBM         21 /* trailing bit manipulations */
-#define X86_FEATURE_TOPOEXT     22 /* topology extensions CPUID leafs */
-#define X86_FEATURE_DBEXT       26 /* data breakpoint extension */
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx) */
-#define X86_FEATURE_FSGSBASE     0 /* {RD,WR}{FS,GS}BASE instructions */
-#define X86_FEATURE_TSC_ADJUST   1 /* Tsc thread offset */
-#define X86_FEATURE_BMI1         3 /* 1st group bit manipulation extensions */
-#define X86_FEATURE_HLE          4 /* Hardware Lock Elision */
-#define X86_FEATURE_AVX2         5 /* AVX2 instructions */
-#define X86_FEATURE_SMEP         7 /* Supervisor Mode Execution Protection */
-#define X86_FEATURE_BMI2         8 /* 2nd group bit manipulation extensions */
-#define X86_FEATURE_ERMS         9 /* Enhanced REP MOVSB/STOSB */
-#define X86_FEATURE_INVPCID     10 /* Invalidate Process Context ID */
-#define X86_FEATURE_RTM         11 /* Restricted Transactional Memory */
-#define X86_FEATURE_MPX         14 /* Memory Protection Extensions */
-#define X86_FEATURE_RDSEED      18 /* RDSEED instruction */
-#define X86_FEATURE_ADX         19 /* ADCX, ADOX instructions */
-#define X86_FEATURE_SMAP        20 /* Supervisor Mode Access Protection */
-#define X86_FEATURE_PCOMMIT     22 /* PCOMMIT instruction */
-#define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
-#define X86_FEATURE_CLWB        24 /* CLWB instruction */
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
-#define X86_FEATURE_PKU     3
-
-#endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 5780397..d3674db 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -22,12 +22,16 @@
 #include <stdlib.h>
 #include <stdbool.h>
 #include "xc_private.h"
-#include "xc_cpufeature.h"
 #include <xen/hvm/params.h>
 
-#define bitmaskof(idx)      (1u << (idx))
-#define clear_bit(idx, dst) ((dst) &= ~(1u << (idx)))
-#define set_bit(idx, dst)   ((dst) |= (1u << (idx)))
+enum {
+#define XEN_CPUFEATURE(name, value) X86_FEATURE_##name = value,
+#include <xen/arch-x86/cpufeatureset.h>
+};
+
+#define bitmaskof(idx)      (1u << ((idx) & 31))
+#define clear_bit(idx, dst) ((dst) &= ~bitmaskof(idx))
+#define set_bit(idx, dst)   ((dst) |=  bitmaskof(idx))
 
 #define DEF_MAX_BASE 0x0000000du
 #define DEF_MAX_INTELEXT  0x80000008u
@@ -223,7 +227,6 @@ static void amd_xc_cpuid_policy(xc_interface *xch,
                     bitmaskof(X86_FEATURE_LM) |
                     bitmaskof(X86_FEATURE_PAGE1GB) |
                     bitmaskof(X86_FEATURE_SYSCALL) |
-                    bitmaskof(X86_FEATURE_MP) |
                     bitmaskof(X86_FEATURE_MMXEXT) |
                     bitmaskof(X86_FEATURE_FFXSR) |
                     bitmaskof(X86_FEATURE_3DNOW) |
@@ -280,7 +283,7 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
     case 0x00000001:
         /* ECX[5] is availability of VMX */
         if ( info->nestedhvm )
-            set_bit(X86_FEATURE_VMXE, regs[2]);
+            set_bit(X86_FEATURE_VMX, regs[2]);
         break;
 
     case 0x00000004:
@@ -398,7 +401,7 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
          */
         regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
 
-        regs[2] &= (bitmaskof(X86_FEATURE_XMM3) |
+        regs[2] &= (bitmaskof(X86_FEATURE_SSE3) |
                     bitmaskof(X86_FEATURE_PCLMULQDQ) |
                     bitmaskof(X86_FEATURE_SSSE3) |
                     bitmaskof(X86_FEATURE_FMA) |
@@ -408,7 +411,7 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
                     bitmaskof(X86_FEATURE_SSE4_2) |
                     bitmaskof(X86_FEATURE_MOVBE)  |
                     bitmaskof(X86_FEATURE_POPCNT) |
-                    bitmaskof(X86_FEATURE_AES) |
+                    bitmaskof(X86_FEATURE_AESNI) |
                     bitmaskof(X86_FEATURE_F16C) |
                     bitmaskof(X86_FEATURE_RDRAND) |
                     ((info->xfeature_mask != 0) ?
@@ -435,13 +438,13 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
                     bitmaskof(X86_FEATURE_MCA) |
                     bitmaskof(X86_FEATURE_CMOV) |
                     bitmaskof(X86_FEATURE_PAT) |
-                    bitmaskof(X86_FEATURE_CLFLSH) |
+                    bitmaskof(X86_FEATURE_CLFLUSH) |
                     bitmaskof(X86_FEATURE_PSE36) |
                     bitmaskof(X86_FEATURE_MMX) |
                     bitmaskof(X86_FEATURE_FXSR) |
-                    bitmaskof(X86_FEATURE_XMM) |
-                    bitmaskof(X86_FEATURE_XMM2) |
-                    bitmaskof(X86_FEATURE_HT));
+                    bitmaskof(X86_FEATURE_SSE) |
+                    bitmaskof(X86_FEATURE_SSE2) |
+                    bitmaskof(X86_FEATURE_HTT));
             
         /* We always support MTRR MSRs. */
         regs[3] |= bitmaskof(X86_FEATURE_MTRR);
@@ -560,15 +563,15 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
         if ( info->vendor == VENDOR_AMD )
             clear_bit(X86_FEATURE_SEP, regs[3]);
         clear_bit(X86_FEATURE_DS, regs[3]);
-        clear_bit(X86_FEATURE_ACC, regs[3]);
+        clear_bit(X86_FEATURE_TM1, regs[3]);
         clear_bit(X86_FEATURE_PBE, regs[3]);
 
         clear_bit(X86_FEATURE_DTES64, regs[2]);
-        clear_bit(X86_FEATURE_MWAIT, regs[2]);
+        clear_bit(X86_FEATURE_MONITOR, regs[2]);
         clear_bit(X86_FEATURE_DSCPL, regs[2]);
-        clear_bit(X86_FEATURE_VMXE, regs[2]);
-        clear_bit(X86_FEATURE_SMXE, regs[2]);
-        clear_bit(X86_FEATURE_EST, regs[2]);
+        clear_bit(X86_FEATURE_VMX, regs[2]);
+        clear_bit(X86_FEATURE_SMX, regs[2]);
+        clear_bit(X86_FEATURE_EIST, regs[2]);
         clear_bit(X86_FEATURE_TM2, regs[2]);
         if ( !info->pv64 )
             clear_bit(X86_FEATURE_CX16, regs[2]);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (15 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-08 16:38   ` Konrad Rzeszutek Wilk
  2016-04-07 11:57 ` [PATCH v5 18/21] tools: Utility for dealing with featuresets Andrew Cooper
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

New in v2
---
 tools/libxc/Makefile          |  9 ++++++
 tools/libxc/include/xenctrl.h | 14 ++++++++
 tools/libxc/xc_cpuid_x86.c    | 75 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 608404f..ef02c9d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -145,6 +145,15 @@ $(eval $(genpath-target))
 
 xc_private.h: _paths.h
 
+ifeq ($(CONFIG_X86),y)
+
+_xc_cpuid_autogen.h: $(XEN_ROOT)/xen/include/public/arch-x86/cpufeatureset.h $(XEN_ROOT)/xen/tools/gen-cpuid.py
+	$(PYTHON) $(XEN_ROOT)/xen/tools/gen-cpuid.py -i $^ -o $@.new
+	$(call move-if-changed,$@.new,$@)
+
+build: _xc_cpuid_autogen.h
+endif
+
 $(CTRL_LIB_OBJS) $(GUEST_LIB_OBJS) \
 $(CTRL_PIC_OBJS) $(GUEST_PIC_OBJS): xc_private.h
 
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1c865a3..3715f51 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2622,6 +2622,20 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps);
 int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
                           uint32_t *nr_features, uint32_t *featureset);
+
+uint32_t xc_get_cpu_featureset_size(void);
+
+enum xc_static_cpu_featuremask {
+    XC_FEATUREMASK_KNOWN,
+    XC_FEATUREMASK_SPECIAL,
+    XC_FEATUREMASK_PV,
+    XC_FEATUREMASK_HVM_SHADOW,
+    XC_FEATUREMASK_HVM_HAP,
+    XC_FEATUREMASK_DEEP_FEATURES,
+};
+const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
+const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
+
 #endif
 
 /* Compat shims */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index d3674db..0cffb36 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -28,6 +28,7 @@ enum {
 #define XEN_CPUFEATURE(name, value) X86_FEATURE_##name = value,
 #include <xen/arch-x86/cpufeatureset.h>
 };
+#include "_xc_cpuid_autogen.h"
 
 #define bitmaskof(idx)      (1u << ((idx) & 31))
 #define clear_bit(idx, dst) ((dst) &= ~bitmaskof(idx))
@@ -78,6 +79,80 @@ int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
     return ret;
 }
 
+uint32_t xc_get_cpu_featureset_size(void)
+{
+    return FEATURESET_NR_ENTRIES;
+}
+
+const uint32_t *xc_get_static_cpu_featuremask(
+    enum xc_static_cpu_featuremask mask)
+{
+    const static uint32_t known[FEATURESET_NR_ENTRIES] = INIT_KNOWN_FEATURES,
+        special[FEATURESET_NR_ENTRIES] = INIT_SPECIAL_FEATURES,
+        pv[FEATURESET_NR_ENTRIES] = INIT_PV_FEATURES,
+        hvm_shadow[FEATURESET_NR_ENTRIES] = INIT_HVM_SHADOW_FEATURES,
+        hvm_hap[FEATURESET_NR_ENTRIES] = INIT_HVM_HAP_FEATURES,
+        deep_features[FEATURESET_NR_ENTRIES] = INIT_DEEP_FEATURES;
+
+    XC_BUILD_BUG_ON(ARRAY_SIZE(known) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(special) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(pv) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(hvm_hap) != FEATURESET_NR_ENTRIES);
+    XC_BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FEATURESET_NR_ENTRIES);
+
+    switch ( mask )
+    {
+    case XC_FEATUREMASK_KNOWN:
+        return known;
+
+    case XC_FEATUREMASK_SPECIAL:
+        return special;
+
+    case XC_FEATUREMASK_PV:
+        return pv;
+
+    case XC_FEATUREMASK_HVM_SHADOW:
+        return hvm_shadow;
+
+    case XC_FEATUREMASK_HVM_HAP:
+        return hvm_hap;
+
+    case XC_FEATUREMASK_DEEP_FEATURES:
+        return deep_features;
+
+    default:
+        return NULL;
+    }
+}
+
+const uint32_t *xc_get_feature_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FEATURESET_NR_ENTRIES];
+    } deep_deps[] = INIT_DEEP_DEPS;
+
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    XC_BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 struct cpuid_domain_info
 {
     enum
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 18/21] tools: Utility for dealing with featuresets
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (16 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 19/21] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson

It is able to reports the current featuresets; both the static masks and
dynamic featuresets from Xen, or to decode an arbitrary featureset into
`/proc/cpuinfo` style strings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

v2: No linking hackary
---
 .gitignore             |   1 +
 tools/misc/Makefile    |   4 +
 tools/misc/xen-cpuid.c | 394 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 399 insertions(+)
 create mode 100644 tools/misc/xen-cpuid.c

diff --git a/.gitignore b/.gitignore
index b40453e..20ffa2d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -179,6 +179,7 @@ tools/misc/cpuperf/cpuperf-perfcntr
 tools/misc/cpuperf/cpuperf-xen
 tools/misc/xc_shadow
 tools/misc/xen_cpuperf
+tools/misc/xen-cpuid
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
 tools/misc/xenperf
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index a2ef0ec..a94dad9 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -10,6 +10,7 @@ CFLAGS += $(CFLAGS_xeninclude)
 CFLAGS += $(CFLAGS_libxenstore)
 
 # Everything to be installed in regular bin/
+INSTALL_BIN-$(CONFIG_X86)      += xen-cpuid
 INSTALL_BIN-$(CONFIG_X86)      += xen-detect
 INSTALL_BIN                    += xencons
 INSTALL_BIN                    += xencov_split
@@ -68,6 +69,9 @@ clean:
 .PHONY: distclean
 distclean: clean
 
+xen-cpuid: xen-cpuid.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
+
 xen-hvmctx: xen-hvmctx.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
new file mode 100644
index 0000000..608c488
--- /dev/null
+++ b/tools/misc/xen-cpuid.c
@@ -0,0 +1,394 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <err.h>
+#include <getopt.h>
+#include <string.h>
+
+#include <xenctrl.h>
+
+#define ARRAY_SIZE(a) (sizeof a / sizeof *a)
+static uint32_t nr_features;
+
+static const char *str_1d[32] =
+{
+    [ 0] = "fpu",  [ 1] = "vme",
+    [ 2] = "de",   [ 3] = "pse",
+    [ 4] = "tsc",  [ 5] = "msr",
+    [ 6] = "pae",  [ 7] = "mce",
+    [ 8] = "cx8",  [ 9] = "apic",
+    [10] = "REZ",  [11] = "sysenter",
+    [12] = "mtrr", [13] = "pge",
+    [14] = "mca",  [15] = "cmov",
+    [16] = "pat",  [17] = "pse36",
+    [18] = "psn",  [19] = "clflush",
+    [20] = "REZ",  [21] = "ds",
+    [22] = "acpi", [23] = "mmx",
+    [24] = "fxsr", [25] = "sse",
+    [26] = "sse2", [27] = "ss",
+    [28] = "htt",  [29] = "tm",
+    [30] = "ia64", [31] = "pbe",
+};
+
+static const char *str_1c[32] =
+{
+    [ 0] = "sse3",    [ 1] = "pclmulqdq",
+    [ 2] = "dtes64",  [ 3] = "monitor",
+    [ 4] = "ds-cpl",  [ 5] = "vmx",
+    [ 6] = "smx",     [ 7] = "est",
+    [ 8] = "tm2",     [ 9] = "ssse3",
+    [10] = "cntx-id", [11] = "sdgb",
+    [12] = "fma",     [13] = "cx16",
+    [14] = "xtpr",    [15] = "pdcm",
+    [16] = "REZ",     [17] = "pcid",
+    [18] = "dca",     [19] = "sse41",
+    [20] = "sse42",   [21] = "x2apic",
+    [22] = "movebe",  [23] = "popcnt",
+    [24] = "tsc-dl",  [25] = "aesni",
+    [26] = "xsave",   [27] = "osxsave",
+    [28] = "avx",     [29] = "f16c",
+    [30] = "rdrnd",   [31] = "hyper",
+};
+
+static const char *str_e1d[32] =
+{
+    [ 0] = "fpu",    [ 1] = "vme",
+    [ 2] = "de",     [ 3] = "pse",
+    [ 4] = "tsc",    [ 5] = "msr",
+    [ 6] = "pae",    [ 7] = "mce",
+    [ 8] = "cx8",    [ 9] = "apic",
+    [10] = "REZ",    [11] = "syscall",
+    [12] = "mtrr",   [13] = "pge",
+    [14] = "mca",    [15] = "cmov",
+    [16] = "fcmov",  [17] = "pse36",
+    [18] = "REZ",    [19] = "mp",
+    [20] = "nx",     [21] = "REZ",
+    [22] = "mmx+",   [23] = "mmx",
+    [24] = "fxsr",   [25] = "fxsr+",
+    [26] = "pg1g",   [27] = "rdtscp",
+    [28] = "REZ",    [29] = "lm",
+    [30] = "3dnow+", [31] = "3dnow",
+};
+
+static const char *str_e1c[32] =
+{
+    [ 0] = "lahf_lm",    [ 1] = "cmp",
+    [ 2] = "svm",        [ 3] = "extapic",
+    [ 4] = "cr8d",       [ 5] = "lzcnt",
+    [ 6] = "sse4a",      [ 7] = "msse",
+    [ 8] = "3dnowpf",    [ 9] = "osvw",
+    [10] = "ibs",        [11] = "xop",
+    [12] = "skinit",     [13] = "wdt",
+    [14] = "REZ",        [15] = "lwp",
+    [16] = "fma4",       [17] = "tce",
+    [18] = "REZ",        [19] = "nodeid",
+    [20] = "REZ",        [21] = "tbm",
+    [22] = "topoext",    [23] = "perfctr_core",
+    [24] = "perfctr_nb", [25] = "REZ",
+    [26] = "dbx",        [27] = "perftsc",
+    [28] = "pcx_l2i",    [29] = "monitorx",
+
+    [30 ... 31] = "REZ",
+};
+
+static const char *str_7b0[32] =
+{
+    [ 0] = "fsgsbase", [ 1] = "tsc-adj",
+    [ 2] = "sgx",      [ 3] = "bmi1",
+    [ 4] = "hle",      [ 5] = "avx2",
+    [ 6] = "REZ",      [ 7] = "smep",
+    [ 8] = "bmi2",     [ 9] = "erms",
+    [10] = "invpcid",  [11] = "rtm",
+    [12] = "pqm",      [13] = "depfpp",
+    [14] = "mpx",      [15] = "pqe",
+    [16] = "avx512f",  [17] = "avx512dq",
+    [18] = "rdseed",   [19] = "adx",
+    [20] = "smap",     [21] = "avx512ifma",
+    [22] = "pcomit",   [23] = "clflushopt",
+    [24] = "clwb",     [25] = "pt",
+    [26] = "avx512pf", [27] = "avx512er",
+    [28] = "avx512cd", [29] = "sha",
+    [30] = "avx512bw", [31] = "avx512vl",
+};
+
+static const char *str_Da1[32] =
+{
+    [ 0] = "xsaveopt", [ 1] = "xsavec",
+    [ 2] = "xgetbv1",  [ 3] = "xsaves",
+
+    [4 ... 31] = "REZ",
+};
+
+static const char *str_7c0[32] =
+{
+    [ 0] = "prechwt1", [ 1] = "avx512vbmi",
+    [ 2] = "REZ",      [ 3] = "pku",
+    [ 4] = "ospke",
+
+    [5 ... 31] = "REZ",
+};
+
+static const char *str_e7d[32] =
+{
+    [0 ... 7] = "REZ",
+
+    [ 8] = "itsc",
+
+    [9 ... 31] = "REZ",
+};
+
+static const char *str_e8b[32] =
+{
+    [ 0] = "clzero",
+
+    [1 ... 31] = "REZ",
+};
+
+static struct {
+    const char *name;
+    const char *abbr;
+    const char **strs;
+} decodes[] =
+{
+    { "0x00000001.edx",   "1d",  str_1d },
+    { "0x00000001.ecx",   "1c",  str_1c },
+    { "0x80000001.edx",   "e1d", str_e1d },
+    { "0x80000001.ecx",   "e1c", str_e1c },
+    { "0x0000000d:1.eax", "Da1", str_Da1 },
+    { "0x00000007:0.ebx", "7b0", str_7b0 },
+    { "0x00000007:0.ecx", "7c0", str_7c0 },
+    { "0x80000007.edx",   "e7d", str_e7d },
+    { "0x80000008.ebx",   "e8b", str_e8b },
+};
+
+#define COL_ALIGN "18"
+
+static struct fsinfo {
+    const char *name;
+    uint32_t len;
+    uint32_t *fs;
+} featuresets[] =
+{
+    [XEN_SYSCTL_cpu_featureset_host] = { "Host", 0, NULL },
+    [XEN_SYSCTL_cpu_featureset_raw]  = { "Raw",  0, NULL },
+    [XEN_SYSCTL_cpu_featureset_pv]   = { "PV",   0, NULL },
+    [XEN_SYSCTL_cpu_featureset_hvm]  = { "HVM",  0, NULL },
+};
+
+static void dump_leaf(uint32_t leaf, const char **strs)
+{
+    unsigned i;
+
+    if ( !strs )
+    {
+        printf(" ???");
+        return;
+    }
+
+    for ( i = 0; i < 32; ++i )
+        if ( leaf & (1u << i) )
+            printf(" %s", strs[i] ?: "???" );
+}
+
+static void decode_featureset(const uint32_t *features,
+                              const uint32_t length,
+                              const char *name,
+                              bool detail)
+{
+    unsigned int i;
+
+    printf("%-"COL_ALIGN"s        ", name);
+    for ( i = 0; i < length; ++i )
+        printf("%08x%c", features[i],
+               i < length - 1 ? ':' : '\n');
+
+    if ( !detail )
+        return;
+
+    for ( i = 0; i < length && i < ARRAY_SIZE(decodes); ++i )
+    {
+        printf("  [%02u] %-"COL_ALIGN"s", i, decodes[i].name ?: "<UNKNOWN>");
+        if ( decodes[i].name )
+            dump_leaf(features[i], decodes[i].strs);
+        printf("\n");
+    }
+}
+
+static void get_featureset(xc_interface *xch, unsigned int idx)
+{
+    struct fsinfo *f = &featuresets[idx];
+
+    f->len = xc_get_cpu_featureset_size();
+    f->fs = calloc(nr_features, sizeof(*f->fs));
+
+    if ( !f->fs )
+        err(1, "calloc(, featureset)");
+
+    if ( xc_get_cpu_featureset(xch, idx, &f->len, f->fs) )
+        err(1, "xc_get_featureset()");
+}
+
+static void dump_info(xc_interface *xch, bool detail)
+{
+    unsigned int i;
+
+    printf("nr_features: %u\n", nr_features);
+
+    if ( !detail )
+    {
+        printf("       %"COL_ALIGN"s ", "KEY");
+        for ( i = 0; i < ARRAY_SIZE(decodes); ++i )
+            printf("%-8s ", decodes[i].abbr ?: "???");
+        printf("\n");
+    }
+
+    printf("\nStatic sets:\n");
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_KNOWN),
+                      nr_features, "Known", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_SPECIAL),
+                      nr_features, "Special", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_PV),
+                      nr_features, "PV Mask", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_HVM_SHADOW),
+                      nr_features, "HVM Shadow Mask", detail);
+    decode_featureset(xc_get_static_cpu_featuremask(XC_FEATUREMASK_HVM_HAP),
+                      nr_features, "HVM Hap Mask", detail);
+
+    printf("\nDynamic sets:\n");
+    for ( i = 0; i < ARRAY_SIZE(featuresets); ++i )
+    {
+        get_featureset(xch, i);
+
+        decode_featureset(featuresets[i].fs, featuresets[i].len,
+                          featuresets[i].name, detail);
+    }
+
+    for ( i = 0; i < ARRAY_SIZE(featuresets); ++i )
+        free(featuresets[i].fs);
+}
+
+int main(int argc, char **argv)
+{
+    enum { MODE_UNKNOWN, MODE_INFO, MODE_DETAIL, MODE_INTERPRET }
+    mode = MODE_UNKNOWN;
+
+    nr_features = xc_get_cpu_featureset_size();
+
+    for ( ;; )
+    {
+        int option_index = 0, c;
+        static struct option long_options[] =
+        {
+            { "help", no_argument, NULL, 'h' },
+            { "info", no_argument, NULL, 'i' },
+            { "detail", no_argument, NULL, 'd' },
+            { "verbose", no_argument, NULL, 'v' },
+            { NULL, 0, NULL, 0 },
+        };
+
+        c = getopt_long(argc, argv, "hidv", long_options, &option_index);
+
+        if ( c == -1 )
+            break;
+
+        switch ( c )
+        {
+        case 'h':
+ option_error:
+            printf("Usage: %s [ info | detail | <featureset>* ]\n", argv[0]);
+            return 0;
+
+        case 'i':
+            mode = MODE_INFO;
+            break;
+
+        case 'd':
+        case 'v':
+            mode = MODE_DETAIL;
+            break;
+
+        default:
+            printf("Bad option '%c'\n", c);
+            goto option_error;
+        }
+    }
+
+    if ( mode == MODE_UNKNOWN )
+    {
+        if ( optind == argc )
+            mode = MODE_INFO;
+        else if ( optind < argc )
+        {
+            if ( !strcmp(argv[optind], "info") )
+            {
+                mode = MODE_INFO;
+                optind++;
+            }
+            else if ( !strcmp(argv[optind], "detail") )
+            {
+                mode = MODE_DETAIL;
+                optind++;
+            }
+            else
+                mode = MODE_INTERPRET;
+        }
+        else
+            mode = MODE_INTERPRET;
+    }
+
+    if ( mode == MODE_INFO || mode == MODE_DETAIL )
+    {
+        xc_interface *xch = xc_interface_open(0, 0, 0);
+
+        if ( !xch )
+            err(1, "xc_interface_open");
+
+        if ( xc_get_cpu_featureset(xch, 0, &nr_features, NULL) )
+            err(1, "xc_get_featureset(, NULL)");
+
+        dump_info(xch, mode == MODE_DETAIL);
+
+        xc_interface_close(xch);
+    }
+    else
+    {
+        uint32_t fs[nr_features + 1];
+
+        while ( optind < argc )
+        {
+            char *ptr = argv[optind++];
+            unsigned int i = 0;
+            int offset;
+
+            memset(fs, 0, sizeof(fs));
+
+            while ( sscanf(ptr, "%x%n", &fs[i], &offset) == 1 )
+            {
+                i++;
+                ptr += offset;
+
+                if ( i == nr_features )
+                    break;
+
+                if ( *ptr == ':' )
+                {
+                    ptr++; continue;
+                }
+                break;
+            }
+
+            decode_featureset(fs, i, "Raw", true);
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 19/21] tools/libxc: Wire a featureset through to cpuid policy logic
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (17 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 18/21] tools: Utility for dealing with featuresets Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 20/21] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
  20 siblings, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson

Later changes will cause the cpuid generation logic to seed their information
from a featureset.  This patch adds the infrastructure to specify a
featureset, and will obtain the appropriate default from Xen if omitted.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

v2:
 * Modify existing call rather than introducing a new one.
 * Fix up in-tree callsites.
---
 tools/libxc/include/xenctrl.h       |  4 ++-
 tools/libxc/xc_cpuid_x86.c          | 69 ++++++++++++++++++++++++++++++++-----
 tools/libxl/libxl_cpuid.c           |  2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
 tools/python/xen/lowlevel/xc/xc.c   |  2 +-
 5 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 3715f51..f5a034a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1985,7 +1985,9 @@ int xc_cpuid_set(xc_interface *xch,
                  const char **config,
                  char **config_transformed);
 int xc_cpuid_apply_policy(xc_interface *xch,
-                          domid_t domid);
+                          domid_t domid,
+                          uint32_t *featureset,
+                          unsigned int nr_features);
 void xc_cpuid_to_str(const unsigned int *regs,
                      char **strs); /* some strs[] may be NULL if ENOMEM */
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 0cffb36..a92f5e4 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -166,6 +166,9 @@ struct cpuid_domain_info
     bool pvh;
     uint64_t xfeature_mask;
 
+    uint32_t *featureset;
+    unsigned int nr_features;
+
     /* PV-only information. */
     bool pv64;
 
@@ -197,11 +200,14 @@ static void cpuid(const unsigned int *input, unsigned int *regs)
 }
 
 static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
-                                 struct cpuid_domain_info *info)
+                                 struct cpuid_domain_info *info,
+                                 uint32_t *featureset,
+                                 unsigned int nr_features)
 {
     struct xen_domctl domctl = {};
     xc_dominfo_t di;
     unsigned int in[2] = { 0, ~0U }, regs[4];
+    unsigned int i, host_nr_features = xc_get_cpu_featureset_size();
     int rc;
 
     cpuid(in, regs);
@@ -223,6 +229,23 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
     info->hvm = di.hvm;
     info->pvh = di.pvh;
 
+    info->featureset = calloc(host_nr_features, sizeof(*info->featureset));
+    if ( !info->featureset )
+        return -ENOMEM;
+
+    info->nr_features = host_nr_features;
+
+    if ( featureset )
+    {
+        memcpy(info->featureset, featureset,
+               min(host_nr_features, nr_features) * sizeof(*info->featureset));
+
+        /* Check for truncated set bits. */
+        for ( i = nr_features; i < host_nr_features; ++i )
+            if ( featureset[i] != 0 )
+                return -EOPNOTSUPP;
+    }
+
     /* Get xstate information. */
     domctl.cmd = XEN_DOMCTL_getvcpuextstate;
     domctl.domain = domid;
@@ -247,6 +270,14 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
             return rc;
 
         info->nestedhvm = !!val;
+
+        if ( !featureset )
+        {
+            rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_hvm,
+                                       &host_nr_features, info->featureset);
+            if ( rc )
+                return rc;
+        }
     }
     else
     {
@@ -257,11 +288,24 @@ static int get_cpuid_domain_info(xc_interface *xch, domid_t domid,
             return rc;
 
         info->pv64 = (width == 8);
+
+        if ( !featureset )
+        {
+            rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_pv,
+                                       &host_nr_features, info->featureset);
+            if ( rc )
+                return rc;
+        }
     }
 
     return 0;
 }
 
+static void free_cpuid_domain_info(struct cpuid_domain_info *info)
+{
+    free(info->featureset);
+}
+
 static void amd_xc_cpuid_policy(xc_interface *xch,
                                 const struct cpuid_domain_info *info,
                                 const unsigned int *input, unsigned int *regs)
@@ -789,16 +833,18 @@ void xc_cpuid_to_str(const unsigned int *regs, char **strs)
     }
 }
 
-int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
+int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
+                          uint32_t *featureset,
+                          unsigned int nr_features)
 {
     struct cpuid_domain_info info = {};
     unsigned int input[2] = { 0, 0 }, regs[4];
     unsigned int base_max, ext_max;
     int rc;
 
-    rc = get_cpuid_domain_info(xch, domid, &info);
+    rc = get_cpuid_domain_info(xch, domid, &info, featureset, nr_features);
     if ( rc )
-        return rc;
+        goto out;
 
     cpuid(input, regs);
     base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
@@ -821,7 +867,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
         {
             rc = xc_cpuid_do_domctl(xch, domid, input, regs);
             if ( rc )
-                return rc;
+                goto out;
         }
 
         /* Intel cache descriptor leaves. */
@@ -849,7 +895,9 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid)
             break;
     }
 
-    return 0;
+ out:
+    free_cpuid_domain_info(&info);
+    return rc;
 }
 
 /*
@@ -938,9 +986,9 @@ int xc_cpuid_set(
 
     memset(config_transformed, 0, 4 * sizeof(*config_transformed));
 
-    rc = get_cpuid_domain_info(xch, domid, &info);
+    rc = get_cpuid_domain_info(xch, domid, &info, NULL, 0);
     if ( rc )
-        return rc;
+        goto out;
 
     cpuid(input, regs);
 
@@ -991,7 +1039,7 @@ int xc_cpuid_set(
 
     rc = xc_cpuid_do_domctl(xch, domid, input, regs);
     if ( rc == 0 )
-        return 0;
+        goto out;
 
  fail:
     for ( i = 0; i < 4; i++ )
@@ -999,5 +1047,8 @@ int xc_cpuid_set(
         free(config_transformed[i]);
         config_transformed[i] = NULL;
     }
+
+ out:
+    free_cpuid_domain_info(&info);
     return rc;
 }
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index c66e912..fc20157 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -334,7 +334,7 @@ int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
 
 void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid)
 {
-    xc_cpuid_apply_policy(ctx->xch, domid);
+    xc_cpuid_apply_policy(ctx->xch, domid, NULL, 0);
 }
 
 void libxl_cpuid_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index e87f14f..22741d5 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -796,7 +796,7 @@ CAMLprim value stub_xc_domain_cpuid_apply_policy(value xch, value domid)
 #if defined(__i386__) || defined(__x86_64__)
 	int r;
 
-	r = xc_cpuid_apply_policy(_H(xch), _D(domid));
+	r = xc_cpuid_apply_policy(_H(xch), _D(domid), NULL, 0);
 	if (r < 0)
 		failwith_xc(_H(xch));
 #else
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index d53870f..812a905 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -731,7 +731,7 @@ static PyObject *pyxc_dom_set_policy_cpuid(XcObject *self,
     if ( !PyArg_ParseTuple(args, "i", &domid) )
         return NULL;
 
-    if ( xc_cpuid_apply_policy(self->xc_handle, domid) )
+    if ( xc_cpuid_apply_policy(self->xc_handle, domid, NULL, 0) )
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 20/21] tools/libxc: Use featuresets rather than guesswork
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (18 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 19/21] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 11:57 ` [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
  20 siblings, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson

It is conceptually wrong to base a VM's featureset on the features visible to
the toolstack which happens to construct it.

Instead, the featureset used is either an explicit one passed by the
toolstack, or the default which Xen believes it can give to the guest.

Collect all the feature manipulation into a single function which adjusts the
featureset, and perform deep dependency removal.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

v2:
 * Join several related patches together.
v3:
 * Correctly adjust HTT/CMP_LEGACY in the policy.  PV guests see host details,
   so get the host features.  HVM guests have their vcpu topology presented in
   an HTT compatible manor (even if ends up reporting 1 cpu), so have
   CMP_LEGACY unconditionally cleared.
---
 tools/libxc/xc_cpuid_x86.c | 356 +++++++++++++++++----------------------------
 1 file changed, 137 insertions(+), 219 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index a92f5e4..fc7e20a 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -21,7 +21,9 @@
 
 #include <stdlib.h>
 #include <stdbool.h>
+#include <limits.h>
 #include "xc_private.h"
+#include "xc_bitops.h"
 #include <xen/hvm/params.h>
 
 enum {
@@ -31,12 +33,14 @@ enum {
 #include "_xc_cpuid_autogen.h"
 
 #define bitmaskof(idx)      (1u << ((idx) & 31))
-#define clear_bit(idx, dst) ((dst) &= ~bitmaskof(idx))
-#define set_bit(idx, dst)   ((dst) |=  bitmaskof(idx))
+#define featureword_of(idx) ((idx) >> 5)
+#define clear_feature(idx, dst) ((dst) &= ~bitmaskof(idx))
+#define set_feature(idx, dst)   ((dst) |=  bitmaskof(idx))
 
 #define DEF_MAX_BASE 0x0000000du
 #define DEF_MAX_INTELEXT  0x80000008u
 #define DEF_MAX_AMDEXT    0x8000001cu
+#define COMMON_1D CPUID_COMMON_1D_FEATURES
 
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
 {
@@ -322,37 +326,6 @@ static void amd_xc_cpuid_policy(xc_interface *xch,
             regs[0] = DEF_MAX_AMDEXT;
         break;
 
-    case 0x80000001: {
-        if ( !info->pae )
-            clear_bit(X86_FEATURE_PAE, regs[3]);
-
-        /* Filter all other features according to a whitelist. */
-        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM) |
-                    bitmaskof(X86_FEATURE_CMP_LEGACY) |
-                    (info->nestedhvm ? bitmaskof(X86_FEATURE_SVM) : 0) |
-                    bitmaskof(X86_FEATURE_CR8_LEGACY) |
-                    bitmaskof(X86_FEATURE_ABM) |
-                    bitmaskof(X86_FEATURE_SSE4A) |
-                    bitmaskof(X86_FEATURE_MISALIGNSSE) |
-                    bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
-                    bitmaskof(X86_FEATURE_OSVW) |
-                    bitmaskof(X86_FEATURE_XOP) |
-                    bitmaskof(X86_FEATURE_LWP) |
-                    bitmaskof(X86_FEATURE_FMA4) |
-                    bitmaskof(X86_FEATURE_TBM) |
-                    bitmaskof(X86_FEATURE_DBEXT));
-        regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
-                    bitmaskof(X86_FEATURE_NX) |
-                    bitmaskof(X86_FEATURE_LM) |
-                    bitmaskof(X86_FEATURE_PAGE1GB) |
-                    bitmaskof(X86_FEATURE_SYSCALL) |
-                    bitmaskof(X86_FEATURE_MMXEXT) |
-                    bitmaskof(X86_FEATURE_FFXSR) |
-                    bitmaskof(X86_FEATURE_3DNOW) |
-                    bitmaskof(X86_FEATURE_3DNOWEXT));
-        break;
-    }
-
     case 0x80000008:
         /*
          * ECX[15:12] is ApicIdCoreSize: ECX[7:0] is NumberOfCores (minus one).
@@ -399,12 +372,6 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
 {
     switch ( input[0] )
     {
-    case 0x00000001:
-        /* ECX[5] is availability of VMX */
-        if ( info->nestedhvm )
-            set_bit(X86_FEATURE_VMX, regs[2]);
-        break;
-
     case 0x00000004:
         /*
          * EAX[31:26] is Maximum Cores Per Package (minus one).
@@ -420,19 +387,6 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
             regs[0] = DEF_MAX_INTELEXT;
         break;
 
-    case 0x80000001: {
-        /* Only a few features are advertised in Intel's 0x80000001. */
-        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM) |
-                    bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
-                    bitmaskof(X86_FEATURE_ABM));
-        regs[3] &= (bitmaskof(X86_FEATURE_NX) |
-                    bitmaskof(X86_FEATURE_LM) |
-                    bitmaskof(X86_FEATURE_PAGE1GB) |
-                    bitmaskof(X86_FEATURE_SYSCALL) |
-                    bitmaskof(X86_FEATURE_RDTSCP));
-        break;
-    }
-
     case 0x80000005:
         regs[0] = regs[1] = regs[2] = 0;
         break;
@@ -444,10 +398,6 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
     }
 }
 
-#define XSAVEOPT        (1 << 0)
-#define XSAVEC          (1 << 1)
-#define XGETBV1         (1 << 2)
-#define XSAVES          (1 << 3)
 /* Configure extended state enumeration leaves (0x0000000D for xsave) */
 static void xc_cpuid_config_xsave(xc_interface *xch,
                                   const struct cpuid_domain_info *info,
@@ -484,9 +434,7 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
     case 1: /* leaf 1 */
-        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
-        if ( !info->hvm )
-            regs[0] &= ~XSAVES;
+        regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
         regs[2] &= info->xfeature_mask;
         regs[3] = 0;
         break;
@@ -520,85 +468,22 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
          */
         regs[1] = (regs[1] & 0x0000ffffu) | ((regs[1] & 0x007f0000u) << 1);
 
-        regs[2] &= (bitmaskof(X86_FEATURE_SSE3) |
-                    bitmaskof(X86_FEATURE_PCLMULQDQ) |
-                    bitmaskof(X86_FEATURE_SSSE3) |
-                    bitmaskof(X86_FEATURE_FMA) |
-                    bitmaskof(X86_FEATURE_CX16) |
-                    bitmaskof(X86_FEATURE_PCID) |
-                    bitmaskof(X86_FEATURE_SSE4_1) |
-                    bitmaskof(X86_FEATURE_SSE4_2) |
-                    bitmaskof(X86_FEATURE_MOVBE)  |
-                    bitmaskof(X86_FEATURE_POPCNT) |
-                    bitmaskof(X86_FEATURE_AESNI) |
-                    bitmaskof(X86_FEATURE_F16C) |
-                    bitmaskof(X86_FEATURE_RDRAND) |
-                    ((info->xfeature_mask != 0) ?
-                     (bitmaskof(X86_FEATURE_AVX) |
-                      bitmaskof(X86_FEATURE_XSAVE)) : 0));
-
-        regs[2] |= (bitmaskof(X86_FEATURE_HYPERVISOR) |
-                    bitmaskof(X86_FEATURE_TSC_DEADLINE) |
-                    bitmaskof(X86_FEATURE_X2APIC));
-
-        regs[3] &= (bitmaskof(X86_FEATURE_FPU) |
-                    bitmaskof(X86_FEATURE_VME) |
-                    bitmaskof(X86_FEATURE_DE) |
-                    bitmaskof(X86_FEATURE_PSE) |
-                    bitmaskof(X86_FEATURE_TSC) |
-                    bitmaskof(X86_FEATURE_MSR) |
-                    bitmaskof(X86_FEATURE_PAE) |
-                    bitmaskof(X86_FEATURE_MCE) |
-                    bitmaskof(X86_FEATURE_CX8) |
-                    bitmaskof(X86_FEATURE_APIC) |
-                    bitmaskof(X86_FEATURE_SEP) |
-                    bitmaskof(X86_FEATURE_MTRR) |
-                    bitmaskof(X86_FEATURE_PGE) |
-                    bitmaskof(X86_FEATURE_MCA) |
-                    bitmaskof(X86_FEATURE_CMOV) |
-                    bitmaskof(X86_FEATURE_PAT) |
-                    bitmaskof(X86_FEATURE_CLFLUSH) |
-                    bitmaskof(X86_FEATURE_PSE36) |
-                    bitmaskof(X86_FEATURE_MMX) |
-                    bitmaskof(X86_FEATURE_FXSR) |
-                    bitmaskof(X86_FEATURE_SSE) |
-                    bitmaskof(X86_FEATURE_SSE2) |
-                    bitmaskof(X86_FEATURE_HTT));
-            
-        /* We always support MTRR MSRs. */
-        regs[3] |= bitmaskof(X86_FEATURE_MTRR);
-
-        if ( !info->pae )
-        {
-            clear_bit(X86_FEATURE_PAE, regs[3]);
-            clear_bit(X86_FEATURE_PSE36, regs[3]);
-        }
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
+        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] |
+                   bitmaskof(X86_FEATURE_HTT));
         break;
 
     case 0x00000007: /* Intel-defined CPU features */
-        if ( input[1] == 0 ) {
-            regs[1] &= (bitmaskof(X86_FEATURE_TSC_ADJUST) |
-                        bitmaskof(X86_FEATURE_BMI1) |
-                        bitmaskof(X86_FEATURE_HLE)  |
-                        bitmaskof(X86_FEATURE_AVX2) |
-                        bitmaskof(X86_FEATURE_SMEP) |
-                        bitmaskof(X86_FEATURE_BMI2) |
-                        bitmaskof(X86_FEATURE_ERMS) |
-                        bitmaskof(X86_FEATURE_INVPCID) |
-                        bitmaskof(X86_FEATURE_RTM)  |
-                        ((info->xfeature_mask != 0) ?
-                        bitmaskof(X86_FEATURE_MPX) : 0)  |
-                        bitmaskof(X86_FEATURE_RDSEED)  |
-                        bitmaskof(X86_FEATURE_ADX)  |
-                        bitmaskof(X86_FEATURE_SMAP) |
-                        bitmaskof(X86_FEATURE_FSGSBASE) |
-                        bitmaskof(X86_FEATURE_PCOMMIT) |
-                        bitmaskof(X86_FEATURE_CLWB) |
-                        bitmaskof(X86_FEATURE_CLFLUSHOPT));
-            regs[2] &= bitmaskof(X86_FEATURE_PKU);
-        } else
-            regs[1] = regs[2] = 0;
-
+        if ( input[1] == 0 )
+        {
+            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
+            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
+        }
+        else
+        {
+            regs[1] = 0;
+            regs[2] = 0;
+        }
         regs[0] = regs[3] = 0;
         break;
 
@@ -611,14 +496,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
         break;
 
     case 0x80000001:
-        if ( !info->pae )
-        {
-            clear_bit(X86_FEATURE_LAHF_LM, regs[2]);
-            clear_bit(X86_FEATURE_LM, regs[3]);
-            clear_bit(X86_FEATURE_NX, regs[3]);
-            clear_bit(X86_FEATURE_PSE36, regs[3]);
-            clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
-        }
+        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
+                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
         break;
 
     case 0x80000007:
@@ -662,68 +542,34 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
                                const struct cpuid_domain_info *info,
                                const unsigned int *input, unsigned int *regs)
 {
-    if ( (input[0] & 0x7fffffff) == 0x00000001 )
-    {
-        clear_bit(X86_FEATURE_VME, regs[3]);
-        if ( !info->pvh )
-        {
-            clear_bit(X86_FEATURE_PSE, regs[3]);
-            clear_bit(X86_FEATURE_PGE, regs[3]);
-        }
-        clear_bit(X86_FEATURE_MCE, regs[3]);
-        clear_bit(X86_FEATURE_MCA, regs[3]);
-        clear_bit(X86_FEATURE_MTRR, regs[3]);
-        clear_bit(X86_FEATURE_PSE36, regs[3]);
-    }
-
     switch ( input[0] )
     {
     case 0x00000001:
-        if ( info->vendor == VENDOR_AMD )
-            clear_bit(X86_FEATURE_SEP, regs[3]);
-        clear_bit(X86_FEATURE_DS, regs[3]);
-        clear_bit(X86_FEATURE_TM1, regs[3]);
-        clear_bit(X86_FEATURE_PBE, regs[3]);
-
-        clear_bit(X86_FEATURE_DTES64, regs[2]);
-        clear_bit(X86_FEATURE_MONITOR, regs[2]);
-        clear_bit(X86_FEATURE_DSCPL, regs[2]);
-        clear_bit(X86_FEATURE_VMX, regs[2]);
-        clear_bit(X86_FEATURE_SMX, regs[2]);
-        clear_bit(X86_FEATURE_EIST, regs[2]);
-        clear_bit(X86_FEATURE_TM2, regs[2]);
-        if ( !info->pv64 )
-            clear_bit(X86_FEATURE_CX16, regs[2]);
-        if ( info->xfeature_mask == 0 )
-        {
-            clear_bit(X86_FEATURE_XSAVE, regs[2]);
-            clear_bit(X86_FEATURE_AVX, regs[2]);
-        }
-        clear_bit(X86_FEATURE_XTPR, regs[2]);
-        clear_bit(X86_FEATURE_PDCM, regs[2]);
-        clear_bit(X86_FEATURE_PCID, regs[2]);
-        clear_bit(X86_FEATURE_DCA, regs[2]);
-        set_bit(X86_FEATURE_HYPERVISOR, regs[2]);
+    {
+        /* Host topology exposed to PV guest.  Provide host value. */
+        bool host_htt = regs[3] & bitmaskof(X86_FEATURE_HTT);
+
+        regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
+        regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] &
+                   ~bitmaskof(X86_FEATURE_HTT));
+
+        if ( host_htt )
+            regs[3] |= bitmaskof(X86_FEATURE_HTT);
         break;
+    }
 
     case 0x00000007:
         if ( input[1] == 0 )
         {
-            regs[1] &= (bitmaskof(X86_FEATURE_BMI1) |
-                        bitmaskof(X86_FEATURE_HLE)  |
-                        bitmaskof(X86_FEATURE_AVX2) |
-                        bitmaskof(X86_FEATURE_BMI2) |
-                        bitmaskof(X86_FEATURE_ERMS) |
-                        bitmaskof(X86_FEATURE_RTM)  |
-                        bitmaskof(X86_FEATURE_RDSEED)  |
-                        bitmaskof(X86_FEATURE_ADX)  |
-                        bitmaskof(X86_FEATURE_FSGSBASE));
-            if ( info->xfeature_mask == 0 )
-                clear_bit(X86_FEATURE_MPX, regs[1]);
+            regs[1] = info->featureset[featureword_of(X86_FEATURE_FSGSBASE)];
+            regs[2] = info->featureset[featureword_of(X86_FEATURE_PREFETCHWT1)];
         }
         else
+        {
             regs[1] = 0;
-        regs[0] = regs[2] = regs[3] = 0;
+            regs[2] = 0;
+        }
+        regs[0] = regs[3] = 0;
         break;
 
     case 0x0000000d:
@@ -731,30 +577,19 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
         break;
 
     case 0x80000001:
-        if ( !info->pv64 )
-        {
-            clear_bit(X86_FEATURE_LM, regs[3]);
-            clear_bit(X86_FEATURE_LAHF_LM, regs[2]);
-            if ( info->vendor != VENDOR_AMD )
-                clear_bit(X86_FEATURE_SYSCALL, regs[3]);
-        }
-        else
-        {
-            set_bit(X86_FEATURE_SYSCALL, regs[3]);
-        }
-        if ( !info->pvh )
-            clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
-        clear_bit(X86_FEATURE_RDTSCP, regs[3]);
-
-        clear_bit(X86_FEATURE_SVM, regs[2]);
-        clear_bit(X86_FEATURE_OSVW, regs[2]);
-        clear_bit(X86_FEATURE_IBS, regs[2]);
-        clear_bit(X86_FEATURE_SKINIT, regs[2]);
-        clear_bit(X86_FEATURE_WDT, regs[2]);
-        clear_bit(X86_FEATURE_LWP, regs[2]);
-        clear_bit(X86_FEATURE_NODEID_MSR, regs[2]);
-        clear_bit(X86_FEATURE_TOPOEXT, regs[2]);
+    {
+        /* Host topology exposed to PV guest.  Provide host CMP_LEGACY value. */
+        bool host_cmp_legacy = regs[2] & bitmaskof(X86_FEATURE_CMP_LEGACY);
+
+        regs[2] = (info->featureset[featureword_of(X86_FEATURE_LAHF_LM)] &
+                   ~bitmaskof(X86_FEATURE_CMP_LEGACY));
+        regs[3] = info->featureset[featureword_of(X86_FEATURE_SYSCALL)];
+
+        if ( host_cmp_legacy )
+            regs[2] |= bitmaskof(X86_FEATURE_CMP_LEGACY);
+
         break;
+    }
 
     case 0x00000005: /* MONITOR/MWAIT */
     case 0x0000000a: /* Architectural Performance Monitor Features */
@@ -833,6 +668,87 @@ void xc_cpuid_to_str(const unsigned int *regs, char **strs)
     }
 }
 
+static void sanitise_featureset(struct cpuid_domain_info *info)
+{
+    const uint32_t fs_size = xc_get_cpu_featureset_size();
+    uint32_t disabled_features[fs_size];
+    static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
+    unsigned int i, b;
+
+    if ( info->hvm )
+    {
+        /* HVM Guest */
+
+        if ( !info->pae )
+            clear_bit(X86_FEATURE_PAE, info->featureset);
+
+        if ( !info->nestedhvm )
+        {
+            clear_bit(X86_FEATURE_SVM, info->featureset);
+            clear_bit(X86_FEATURE_VMX, info->featureset);
+        }
+    }
+    else
+    {
+        /* PV or PVH Guest */
+
+        if ( !info->pv64 )
+        {
+            clear_bit(X86_FEATURE_LM, info->featureset);
+            if ( info->vendor != VENDOR_AMD )
+                clear_bit(X86_FEATURE_SYSCALL, info->featureset);
+        }
+
+        if ( !info->pvh )
+        {
+            clear_bit(X86_FEATURE_PSE, info->featureset);
+            clear_bit(X86_FEATURE_PSE36, info->featureset);
+            clear_bit(X86_FEATURE_PGE, info->featureset);
+            clear_bit(X86_FEATURE_PAGE1GB, info->featureset);
+        }
+    }
+
+    if ( info->xfeature_mask == 0 )
+        clear_bit(X86_FEATURE_XSAVE, info->featureset);
+
+    /* Disable deep dependencies of disabled features. */
+    for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+        disabled_features[i] = ~info->featureset[i] & deep_features[i];
+
+    for ( b = 0; b < sizeof(disabled_features) * CHAR_BIT; ++b )
+    {
+        const uint32_t *dfs;
+
+        if ( !test_bit(b, disabled_features) ||
+             !(dfs = xc_get_feature_deep_deps(b)) )
+             continue;
+
+        for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
+        {
+            info->featureset[i] &= ~dfs[i];
+            disabled_features[i] &= ~dfs[i];
+        }
+    }
+
+    switch ( info->vendor )
+    {
+    case VENDOR_INTEL:
+        /* Intel clears the common bits in e1d. */
+        info->featureset[featureword_of(X86_FEATURE_SYSCALL)] &= ~COMMON_1D;
+        break;
+
+    case VENDOR_AMD:
+        /* AMD duplicates the common bits between 1d and e1d. */
+        info->featureset[featureword_of(X86_FEATURE_SYSCALL)] =
+            ((info->featureset[featureword_of(X86_FEATURE_FPU)] & COMMON_1D) |
+             (info->featureset[featureword_of(X86_FEATURE_SYSCALL)] & ~COMMON_1D));
+        break;
+
+    default:
+        break;
+    }
+}
+
 int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
                           uint32_t *featureset,
                           unsigned int nr_features)
@@ -856,6 +772,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
     else
         ext_max = (regs[0] <= DEF_MAX_INTELEXT) ? regs[0] : DEF_MAX_INTELEXT;
 
+    sanitise_featureset(&info);
+
     input[0] = 0;
     input[1] = XEN_CPUID_INPUT_UNUSED;
     for ( ; ; )
@@ -1027,9 +945,9 @@ int xc_cpuid_set(
                 val = polval;
 
             if ( val )
-                set_bit(31 - j, regs[i]);
+                set_feature(31 - j, regs[i]);
             else
-                clear_bit(31 - j, regs[i]);
+                clear_feature(31 - j, regs[i]);
 
             config_transformed[i][j] = config[i][j];
             if ( config[i][j] == 's' )
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
                   ` (19 preceding siblings ...)
  2016-04-07 11:57 ` [PATCH v5 20/21] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
@ 2016-04-07 11:57 ` Andrew Cooper
  2016-04-07 12:58   ` Wei Liu
  2016-04-08 21:00   ` Jan Beulich
  20 siblings, 2 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 11:57 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, Jan Beulich

The existing logic is broken for heterogeneous migration.  By always
advertising the host maximum xstate, a migration to a less capable host always
fails as Xen cannot accomodate the xcr0_accum in the migration stream.

By calculating xstate from the feature information (which a multi-host
toolstack will have levelled appropriately), the guest will have the current
hosts maximum xstate advertised, allowing for correct migration to less
capable hosts.

In addition, some further improvements and corrections:
 - don't discard the known flags in sub-leaves 2..63 ECX
 - zap sub-leaves beyond 62
 - zap all bits in leaf 1, EBX/ECX.  No XSS features are currently supported.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>

v3:
 * Reintroduce MPX adjustment (this series has been in development since
   before the introduction of MPX upstream, and it got lost in a rebase).
v4:
 * Fold further improvements from Jan.
v5:
 * Reintroduce PKRU, (again, lost due to rebasing).
 * Rewrite the commit message and comments to try and better explain why I am
   deliberatly removing host-specific information from the xstate calculation.
 * Reintroduce 0xFFFFFFFF masks for EAX, to avoid Coverity complaining about
   truncation on assignment.
---
 tools/libxc/xc_cpuid_x86.c | 89 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index fc7e20a..6d14904 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -398,54 +398,115 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
     }
 }
 
+/* XSTATE bits in XCR0. */
+#define X86_XCR0_X87    (1ULL <<  0)
+#define X86_XCR0_SSE    (1ULL <<  1)
+#define X86_XCR0_AVX    (1ULL <<  2)
+#define X86_XCR0_BNDREG (1ULL <<  3)
+#define X86_XCR0_BNDCSR (1ULL <<  4)
+#define X86_XCR0_PKRU   (1ULL <<  9)
+#define X86_XCR0_LWP    (1ULL << 62)
+
+#define X86_XSS_MASK    (0) /* No XSS states supported yet. */
+
+/* Per-component subleaf flags. */
+#define XSTATE_XSS      (1ULL <<  0)
+#define XSTATE_ALIGN64  (1ULL <<  1)
+
 /* Configure extended state enumeration leaves (0x0000000D for xsave) */
 static void xc_cpuid_config_xsave(xc_interface *xch,
                                   const struct cpuid_domain_info *info,
                                   const unsigned int *input, unsigned int *regs)
 {
-    if ( info->xfeature_mask == 0 )
+    uint64_t guest_xfeature_mask;
+
+    if ( info->xfeature_mask == 0 ||
+         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
     {
         regs[0] = regs[1] = regs[2] = regs[3] = 0;
         return;
     }
 
+    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
+
+    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_AVX;
+
+    if ( test_bit(X86_FEATURE_MPX, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_BNDREG | X86_XCR0_BNDCSR;
+
+    if ( test_bit(X86_FEATURE_PKU, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_PKRU;
+
+    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
+        guest_xfeature_mask |= X86_XCR0_LWP;
+
+    /*
+     * In the common case, the toolstack will have queried Xen for the maximum
+     * available featureset, and guest_xfeature_mask should not able to be
+     * calculated as being greater than the host limit, info->xfeature_mask.
+     *
+     * Nothing currently prevents a toolstack (or an optimistic user) from
+     * purposefully trying to select a larger-than-available xstate set.
+     *
+     * To avoid the domain dying with an unexpected fault, clamp the
+     * calculated mask to the host limit.  Future development work will remove
+     * this possibility, when Xen fully audits the complete cpuid polcy set
+     * for a domain.
+     */
+    guest_xfeature_mask &= info->xfeature_mask;
+
     switch ( input[1] )
     {
-    case 0: 
+    case 0:
         /* EAX: low 32bits of xfeature_enabled_mask */
-        regs[0] = info->xfeature_mask & 0xFFFFFFFF;
+        regs[0] = guest_xfeature_mask & 0xFFFFFFFF;
         /* EDX: high 32bits of xfeature_enabled_mask */
-        regs[3] = (info->xfeature_mask >> 32) & 0xFFFFFFFF;
+        regs[3] = guest_xfeature_mask >> 32;
         /* ECX: max size required by all HW features */
         {
             unsigned int _input[2] = {0xd, 0x0}, _regs[4];
             regs[2] = 0;
-            for ( _input[1] = 2; _input[1] < 64; _input[1]++ )
+            for ( _input[1] = 2; _input[1] <= 62; _input[1]++ )
             {
                 cpuid(_input, _regs);
                 if ( (_regs[0] + _regs[1]) > regs[2] )
                     regs[2] = _regs[0] + _regs[1];
             }
         }
-        /* EBX: max size required by enabled features. 
-         * This register contains a dynamic value, which varies when a guest 
-         * enables or disables XSTATE features (via xsetbv). The default size 
-         * after reset is 576. */ 
+        /* EBX: max size required by enabled features.
+         * This register contains a dynamic value, which varies when a guest
+         * enables or disables XSTATE features (via xsetbv). The default size
+         * after reset is 576. */
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
+
     case 1: /* leaf 1 */
         regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
-        regs[2] &= info->xfeature_mask;
-        regs[3] = 0;
+        regs[1] = 0;
+
+        if ( test_bit(X86_FEATURE_XSAVES, info->featureset) )
+        {
+            regs[2] = guest_xfeature_mask & X86_XSS_MASK & 0xFFFFFFFF;
+            regs[3] = (guest_xfeature_mask >> 32) & X86_XSS_MASK;
+        }
+        else
+            regs[2] = regs[3] = 0;
         break;
-    case 2 ... 63: /* sub-leaves */
-        if ( !(info->xfeature_mask & (1ULL << input[1])) )
+
+    case 2 ... 62: /* per-component sub-leaves */
+        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
         {
             regs[0] = regs[1] = regs[2] = regs[3] = 0;
             break;
         }
         /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
-        regs[2] = regs[3] = 0;
+        regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
+        regs[3] = 0;
+        break;
+
+    default:
+        regs[0] = regs[1] = regs[2] = regs[3] = 0;
         break;
     }
 }
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-04-07 11:57 ` [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
@ 2016-04-07 12:58   ` Wei Liu
  2016-04-08 21:00   ` Jan Beulich
  1 sibling, 0 replies; 50+ messages in thread
From: Wei Liu @ 2016-04-07 12:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Jan Beulich, Xen-devel

On Thu, Apr 07, 2016 at 12:57:26PM +0100, Andrew Cooper wrote:
> The existing logic is broken for heterogeneous migration.  By always
> advertising the host maximum xstate, a migration to a less capable host always
> fails as Xen cannot accomodate the xcr0_accum in the migration stream.
> 
> By calculating xstate from the feature information (which a multi-host
> toolstack will have levelled appropriately), the guest will have the current
> hosts maximum xstate advertised, allowing for correct migration to less
> capable hosts.
> 
> In addition, some further improvements and corrections:
>  - don't discard the known flags in sub-leaves 2..63 ECX
>  - zap sub-leaves beyond 62
>  - zap all bits in leaf 1, EBX/ECX.  No XSS features are currently supported.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Since this is mostly x86 code and signed off by two x86 maintainers, I'm
just rubber-stamping this patch:

Acked-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers
  2016-04-07 11:57 ` [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
@ 2016-04-07 13:00   ` Wei Liu
  2016-04-08 16:34   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Wei Liu @ 2016-04-07 13:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Julien Grall, Ian Jackson, Stefano Stabellini, Xen-devel

On Thu, Apr 07, 2016 at 12:57:20PM +0100, Andrew Cooper wrote:
> The type of the pointer to a bitmap is not interesting; it does not affect the
> representation of the block of bits being pointed to.
> 
> Make the libxc functions consistent with those in Xen, so they can work just
> as well with 'unsigned int *' based bitmaps.
> 
> As part of doing so, change the implementation to be in terms of char rather
> than unsigned long.  This fixes alignment concerns with ARM.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

The code looks fine to me.

We've given enough time for ARM folks to object so:

Acked-by: Wei Liu <wei.liu2@citrix.com>

We can always fix this on ARM if it appears to be broken.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching
  2016-04-07 11:57 ` [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching Andrew Cooper
@ 2016-04-07 16:54   ` Daniel De Graaf
  2016-04-08 16:12   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel De Graaf @ 2016-04-07 16:54 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel

On 04/07/2016 07:57 AM, Andrew Cooper wrote:
> A toolstack needs to know how much control Xen has over the visible cpuid
> values in PV guests.  Provide an explicit mechanism to query what Xen is
> capable of.
>
> This interface will currently report no capabilities.  This change is
> scaffolding for future patches, which will introduce detection and switching
> logic, after which the interface will report hardware capabilities correctly.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <JBeulich@suse.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-04-07 11:57 ` [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
@ 2016-04-07 16:54   ` Daniel De Graaf
  2016-04-08 16:32   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel De Graaf @ 2016-04-07 16:54 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Tim Deegan

On 04/07/2016 07:57 AM, Andrew Cooper wrote:
> And provide stubs for toolstack use.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: David Scott <dave@recoil.org>
> Acked-by: Jan Beulich <JBeulich@suse.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset
  2016-04-07 11:57 ` [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset Andrew Cooper
@ 2016-04-07 23:01   ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2016-04-07 23:01 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> Use attributes to specify whether a feature is applicable to be exposed to:
>  1) All guests
>  2) HVM guests
>  3) HVM HAP guests
> and, via absence of an attribute, to no guests.
> 
> There is no current need for other categories (e.g. PV-only features), and
> such categories should not be introduced if possible.  These categories 
> follow
> from the fact that, with increased hardware support, a guest gets more
> features to use.
> 
> These settings are derived from the existing code in {pv,hvm}_cpuid(), and
> xc_cpuid_x86.c.  One notable exception is EXTAPIC which was previously
> erroneously exposed to guests.  PV guests don't get to use the APIC and the
> HVM APIC emulation doesn't support extended space.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets
  2016-04-07 11:57 ` [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
@ 2016-04-07 23:04   ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2016-04-07 23:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> All of this information will be used by the toolstack to make informed
> levelling decisions for VMs, and by Xen to sanity check toolstack-provided
> information.
> 
> The split between the shadow and hap HVM masks is necessary due to the lack of
> a "get cpuid policy" hypercall.  Multi-host toolstacks (i.e. not libxl)
> dealing with hap and non-hap capable hosts need to be able to calculate that
> migrating a shadow guest is safe.
> 
> Future planned development work will implement proper cpuid policy handing in
> Xen, including a "get policy" hypercall, but until then, the difference is
> made available for toolstack use via a non-stable interface.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/21] xen/x86: Generate deep dependencies of features
  2016-04-07 11:57 ` [PATCH v5 03/21] xen/x86: Generate deep dependencies of features Andrew Cooper
@ 2016-04-07 23:18   ` Jan Beulich
  2016-04-07 23:36     ` Andrew Cooper
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2016-04-07 23:18 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> +    deps = {
> +        # FPU is taken to mean support for the x87 regisers as well as the
> +        # instructions.  MMX is documented to alias the %MM registers over the
> +        # x87 %ST registers in hardware.
> +        FPU: [MMX],
> +
> +        # The PSE36 feature indicates that reserved bits in a PSE superpage
> +        # may be used as extra physical address bits.
> +        PSE: [PSE36],
> +
> +        # Entering Long Mode requires that %CR4.PAE is set.  The NX and PKU
> +        # pagetable bits are only representable in the 64bit PTE format
> +        # offered by PAE.
> +        PAE: [LM, NX, PKU],

PKU is explicitly documented to be available only in IA-32e paging,
so I think it should be listed as a dependent of LM.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/21] xen/x86: Generate deep dependencies of features
  2016-04-07 23:18   ` Jan Beulich
@ 2016-04-07 23:36     ` Andrew Cooper
  2016-04-08 15:17       ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-07 23:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 08/04/2016 00:18, Jan Beulich wrote:
>>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
>> +    deps = {
>> +        # FPU is taken to mean support for the x87 regisers as well as the
>> +        # instructions.  MMX is documented to alias the %MM registers over the
>> +        # x87 %ST registers in hardware.
>> +        FPU: [MMX],
>> +
>> +        # The PSE36 feature indicates that reserved bits in a PSE superpage
>> +        # may be used as extra physical address bits.
>> +        PSE: [PSE36],
>> +
>> +        # Entering Long Mode requires that %CR4.PAE is set.  The NX and PKU
>> +        # pagetable bits are only representable in the 64bit PTE format
>> +        # offered by PAE.
>> +        PAE: [LM, NX, PKU],
> PKU is explicitly documented to be available only in IA-32e paging,
> so I think it should be listed as a dependent of LM.

I am not so sure.  Unlike PCID, it is not declared that setting CR4.PKE
will result in a #GP fault out of Long Mode, and the description does
say "If CR4.PKE = 0, or if IA-32e paging is not active, the processor
does not associate linear addresses with protection keys".

This suggests that CR4.PKE can be set out of Long Mode, but all
pagewalks will act as if key0 is in use.  PCID on the other hand goes to
lengths declaring that you can't set CR4.PCIDE out of Long Mode, nor may
you attempt to exit Long Mode if it is set.

I do have some Skylake hardware with PKU available in the office.  I
will investigate what really happens in this case.  (I am also mindful
of the fact that 32bit PV guests could in principle use PKU, at which
point the dependency on LM would be counter-productive.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies
  2016-04-07 11:57 ` [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
@ 2016-04-08 15:04   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 15:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:10PM +0100, Andrew Cooper wrote:
> APIC and XSAVE have dependent features, which also need disabling if Xen
> chooses to disable a feature.
> 
> Use setup_clear_cpu_cap() rather than clear_bit(), as it takes care of
> dependent features as well.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/21] xen/x86: Generate deep dependencies of features
  2016-04-07 23:36     ` Andrew Cooper
@ 2016-04-08 15:17       ` Jan Beulich
  2016-04-08 15:18         ` Andrew Cooper
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2016-04-08 15:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 08.04.16 at 01:36, <andrew.cooper3@citrix.com> wrote:
> On 08/04/2016 00:18, Jan Beulich wrote:
>>>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
>>> +    deps = {
>>> +        # FPU is taken to mean support for the x87 regisers as well as the
>>> +        # instructions.  MMX is documented to alias the %MM registers over the
>>> +        # x87 %ST registers in hardware.
>>> +        FPU: [MMX],
>>> +
>>> +        # The PSE36 feature indicates that reserved bits in a PSE superpage
>>> +        # may be used as extra physical address bits.
>>> +        PSE: [PSE36],
>>> +
>>> +        # Entering Long Mode requires that %CR4.PAE is set.  The NX and PKU
>>> +        # pagetable bits are only representable in the 64bit PTE format
>>> +        # offered by PAE.
>>> +        PAE: [LM, NX, PKU],
>> PKU is explicitly documented to be available only in IA-32e paging,
>> so I think it should be listed as a dependent of LM.
> 
> I am not so sure.  Unlike PCID, it is not declared that setting CR4.PKE
> will result in a #GP fault out of Long Mode, and the description does
> say "If CR4.PKE = 0, or if IA-32e paging is not active, the processor
> does not associate linear addresses with protection keys".
> 
> This suggests that CR4.PKE can be set out of Long Mode, but all
> pagewalks will act as if key0 is in use.  PCID on the other hand goes to
> lengths declaring that you can't set CR4.PCIDE out of Long Mode, nor may
> you attempt to exit Long Mode if it is set.
> 
> I do have some Skylake hardware with PKU available in the office.  I
> will investigate what really happens in this case.  (I am also mindful
> of the fact that 32bit PV guests could in principle use PKU, at which
> point the dependency on LM would be counter-productive.)

With the goal of the series going in soon (today?), wouldn't putting
in the tighter variant be the better approach, allowing us to relax
things if we find they can be relaxed?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 03/21] xen/x86: Generate deep dependencies of features
  2016-04-08 15:17       ` Jan Beulich
@ 2016-04-08 15:18         ` Andrew Cooper
  0 siblings, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2016-04-08 15:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 08/04/16 16:17, Jan Beulich wrote:
>>>> On 08.04.16 at 01:36, <andrew.cooper3@citrix.com> wrote:
>> On 08/04/2016 00:18, Jan Beulich wrote:
>>>>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
>>>> +    deps = {
>>>> +        # FPU is taken to mean support for the x87 regisers as well as the
>>>> +        # instructions.  MMX is documented to alias the %MM registers over the
>>>> +        # x87 %ST registers in hardware.
>>>> +        FPU: [MMX],
>>>> +
>>>> +        # The PSE36 feature indicates that reserved bits in a PSE superpage
>>>> +        # may be used as extra physical address bits.
>>>> +        PSE: [PSE36],
>>>> +
>>>> +        # Entering Long Mode requires that %CR4.PAE is set.  The NX and PKU
>>>> +        # pagetable bits are only representable in the 64bit PTE format
>>>> +        # offered by PAE.
>>>> +        PAE: [LM, NX, PKU],
>>> PKU is explicitly documented to be available only in IA-32e paging,
>>> so I think it should be listed as a dependent of LM.
>> I am not so sure.  Unlike PCID, it is not declared that setting CR4.PKE
>> will result in a #GP fault out of Long Mode, and the description does
>> say "If CR4.PKE = 0, or if IA-32e paging is not active, the processor
>> does not associate linear addresses with protection keys".
>>
>> This suggests that CR4.PKE can be set out of Long Mode, but all
>> pagewalks will act as if key0 is in use.  PCID on the other hand goes to
>> lengths declaring that you can't set CR4.PCIDE out of Long Mode, nor may
>> you attempt to exit Long Mode if it is set.
>>
>> I do have some Skylake hardware with PKU available in the office.  I
>> will investigate what really happens in this case.  (I am also mindful
>> of the fact that 32bit PV guests could in principle use PKU, at which
>> point the dependency on LM would be counter-productive.)
> With the goal of the series going in soon (today?), wouldn't putting
> in the tighter variant be the better approach, allowing us to relax
> things if we find they can be relaxed?

At this point, yes.  I will make a note when changing.

I need to rebase anyway as there is a textual collision with the xsave
bugfixes from yesterday.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap
  2016-04-07 11:57 ` [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
@ 2016-04-08 15:36   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 15:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:09PM +0100, Andrew Cooper wrote:
> When clearing a cpu cap, clear all dependent features.  This avoids having a
> featureset with intermediate features disabled, but leaf features enabled.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v3:
>  * Style fixes.  Use __test_and_set_bit()
> ---
>  xen/arch/x86/cpu/common.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index d302272..0942b44 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -53,8 +53,22 @@ static unsigned int cleared_caps[NCAPINTS];
>  
>  void __init setup_clear_cpu_cap(unsigned int cap)
>  {
> +	const uint32_t *dfs;
> +	unsigned int i;
> +
> +	if (__test_and_set_bit(cap, cleared_caps))
> +		return;
> +
>  	__clear_bit(cap, boot_cpu_data.x86_capability);
> -	__set_bit(cap, cleared_caps);
> +	dfs = lookup_deep_deps(cap);
> +
> +	if (!dfs)
> +		return;
> +
> +	for (i = 0; i < FSCAPINTS; ++i) {
> +		cleared_caps[i] |= dfs[i];
> +		boot_cpu_data.x86_capability[i] &= ~dfs[i];
> +	}
>  }
>  
>  static void default_init(struct cpuinfo_x86 * c)
> -- 
> 2.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-04-07 11:57 ` [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
@ 2016-04-08 16:10   ` Konrad Rzeszutek Wilk
  2016-04-08 18:06   ` Jan Beulich
  1 sibling, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

On Thu, Apr 07, 2016 at 12:57:11PM +0100, Andrew Cooper wrote:
> Currently, {pv,hvm}_cpuid() has a large quantity of essentially-static logic
> for modifying the features visible to a guest.  A lot of this can be subsumed
> by {pv,hvm}_featuremask, which identify the features available on this
> hardware which could be given to a PV or HVM guest.
> 
> This is a step in the direction of full per-domain cpuid policies, but lots
> more development is needed for that.  As a result, the static checks are
> simplified, but the dynamic checks need to remain for now.
> 
> As a side effect, some of the logic for special features can be improved.
> OSXSAVE and OSPKE will be automatically cleared because of being absent in the
> featuremask.  This allows the fast-forward logic to be more simple.
> 
> In addition, there are some corrections to the existing logic:
> 
>  * Hiding PSE36 out of PAE mode is architecturally wrong.  It turns out that
>    it was a bugfix for running HyperV under Xen, which wanted to see PSE36
>    even after choosing to use PAE paging.  PSE36 is not supported by shadow
>    paging, so is hidden from non-HAP guests, but is still visible for HAP
>    guests.  It is also leaked into non-HAP guests when the guest is already
>    running in PAE mode.
>  * Changing the visibility of RDTSCP based on host TSC stability or virtual
>    TSC mode is bogus, so dropped.

Oracle added this in so I am going to do some homework to see if it
was ever used and such. If it was.. then we will have to treat it as
some regression fix or maybe add some more duct-tape if tsc_mode=3 is enabled
in the guest config. <sigh>

>  * When emulating Intel to a guest, the common features in e1d should be
>    cleared.
>  * The APIC bit in e1d (on non-Intel) is also a fast-forward from the
>    APIC_BASE MSR.
> 
> As a small improvement, use compiler-visible &'s and |'s, rather than
> {clear,set}_bit().
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching
  2016-04-07 11:57 ` [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching Andrew Cooper
  2016-04-07 16:54   ` Daniel De Graaf
@ 2016-04-08 16:12   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Daniel De Graaf, Xen-devel

On Thu, Apr 07, 2016 at 12:57:13PM +0100, Andrew Cooper wrote:
> A toolstack needs to know how much control Xen has over the visible cpuid
> values in PV guests.  Provide an explicit mechanism to query what Xen is
> capable of.
> 
> This interface will currently report no capabilities.  This change is
> scaffolding for future patches, which will introduce detection and switching
> logic, after which the interface will report hardware capabilities correctly.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup
  2016-04-07 11:57 ` [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
@ 2016-04-08 16:13   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:14PM +0100, Andrew Cooper wrote:
> This patch is best reviewed as its end result rather than as a diff, as it
> rewrites almost all of the setup.
> 
> On the BSP, cpuid information is used to evaluate the potential available set
> of masking MSRs, and they are unconditionally probed, filling in the
> availability information and hardware defaults.
> 
> The command line parameters are then combined with the hardware defaults to
> further restrict the Xen default masking level.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup
  2016-04-07 11:57 ` [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
@ 2016-04-08 16:14   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:15PM +0100, Andrew Cooper wrote:
> This patch is best reviewed as its end result rather than as a diff, as it
> rewrites almost all of the setup.
> 
> On the BSP, cpuid information is used to evaluate the potential available set
> of masking MSRs, and they are unconditionally probed, filling in the
> availability information and hardware defaults.  A side effect of this is that
> probe_intel_cpuid_faulting() can move to being __init.
> 
> The command line parameters are then combined with the hardware defaults to
> further restrict the Xen default masking level.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch()
  2016-04-07 11:57 ` [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
@ 2016-04-08 16:15   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:16PM +0100, Andrew Cooper wrote:
> A single ctxt_switch_levelling() function pointer is provided
> (defaulting to an empty nop), which is overridden in the appropriate
> $VENDOR_init_levelling().
> 
> set_cpuid_faulting() is made private and included within
> intel_ctxt_switch_levelling().
> 
> One (attempted) functional change is that the faulting configuration should
> not be special cased for dom0.  It turns out that the toolstack relies on the
> special case (and indeed, on being a PV domain in the first place) to
> correctly build HVM domains.
> 
> For now, the control domain is left as a special case, until futher work can
> be completed to remove the restriction.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains
  2016-04-07 11:57 ` [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
@ 2016-04-08 16:17   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:17PM +0100, Andrew Cooper wrote:
> And use them in preference to cpumask_defaults on context switch.  HVM domains
> must not be masked (to avoid interfering with cpuid calls within the guest),
> so always lazily context switch to the host default.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy
  2016-04-07 11:57 ` [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
@ 2016-04-08 16:26   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

On Thu, Apr 07, 2016 at 12:57:18PM +0100, Andrew Cooper wrote:
> This allows PV domains with different featuresets to observe different values
> from a native cpuid instruction, on supporting hardware.
> 
> It is important to leak the host view of X2APIC, HTT and CMP_LEGACY through to
> guests, even though they could be hidden.  These flags affect how to interpret
> other cpuid leaves which are not maskable.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
  2016-04-07 11:57 ` [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
  2016-04-07 16:54   ` Daniel De Graaf
@ 2016-04-08 16:32   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Daniel De Graaf, Tim Deegan, Xen-devel

On Thu, Apr 07, 2016 at 12:57:19PM +0100, Andrew Cooper wrote:
> And provide stubs for toolstack use.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: David Scott <dave@recoil.org>
> Acked-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers
  2016-04-07 11:57 ` [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
  2016-04-07 13:00   ` Wei Liu
@ 2016-04-08 16:34   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Julien Grall, Ian Jackson, Stefano Stabellini, Xen-devel

On Thu, Apr 07, 2016 at 12:57:20PM +0100, Andrew Cooper wrote:
> The type of the pointer to a bitmap is not interesting; it does not affect the
> representation of the block of bits being pointed to.
> 
> Make the libxc functions consistent with those in Xen, so they can work just
> as well with 'unsigned int *' based bitmaps.
> 
> As part of doing so, change the implementation to be in terms of char rather
> than unsigned long.  This fixes alignment concerns with ARM.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation
  2016-04-07 11:57 ` [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
@ 2016-04-08 16:37   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Xen-devel

On Thu, Apr 07, 2016 at 12:57:21PM +0100, Andrew Cooper wrote:
> Rather than having a different local copy of some of the feature
> definitions.
> 
> Modify the xc_cpuid_x86.c cpumask helpers to appropriate truncate the
> new values.
> 
> As some of the feature have been renamed in the public API, similar renames
> are made here.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information
  2016-04-07 11:57 ` [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
@ 2016-04-08 16:38   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 50+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-08 16:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Xen-devel

On Thu, Apr 07, 2016 at 12:57:22PM +0100, Andrew Cooper wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks
  2016-04-07 11:57 ` [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
  2016-04-08 16:10   ` Konrad Rzeszutek Wilk
@ 2016-04-08 18:06   ` Jan Beulich
  1 sibling, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2016-04-08 18:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> Currently, {pv,hvm}_cpuid() has a large quantity of essentially-static logic
> for modifying the features visible to a guest.  A lot of this can be 
> subsumed
> by {pv,hvm}_featuremask, which identify the features available on this
> hardware which could be given to a PV or HVM guest.
> 
> This is a step in the direction of full per-domain cpuid policies, but lots
> more development is needed for that.  As a result, the static checks are
> simplified, but the dynamic checks need to remain for now.
> 
> As a side effect, some of the logic for special features can be improved.
> OSXSAVE and OSPKE will be automatically cleared because of being absent in 
> the
> featuremask.  This allows the fast-forward logic to be more simple.
> 
> In addition, there are some corrections to the existing logic:
> 
>  * Hiding PSE36 out of PAE mode is architecturally wrong.  It turns out that
>    it was a bugfix for running HyperV under Xen, which wanted to see PSE36
>    even after choosing to use PAE paging.  PSE36 is not supported by shadow
>    paging, so is hidden from non-HAP guests, but is still visible for HAP
>    guests.  It is also leaked into non-HAP guests when the guest is already
>    running in PAE mode.
>  * Changing the visibility of RDTSCP based on host TSC stability or virtual
>    TSC mode is bogus, so dropped.
>  * When emulating Intel to a guest, the common features in e1d should be
>    cleared.
>  * The APIC bit in e1d (on non-Intel) is also a fast-forward from the
>    APIC_BASE MSR.
> 
> As a small improvement, use compiler-visible &'s and |'s, rather than
> {clear,set}_bit().
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init()
  2016-04-07 11:57 ` [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
@ 2016-04-08 18:09   ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2016-04-08 18:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> Before c/s 44e24f8567 "x86: don't call generic_identify() redundantly", the
> commandline-provided masks would take effect in Xen's view of the processor
> features.
> 
> As the masks got applied after the query for features, the redundant call to
> generic_identify() would clobber the pre-masking feature information with the
> post-masking information.
> 
> Move the set_cpumask() calls into c_early_init() so the effects of the 
> command
> line parameters take place before the main query for features in
> generic_identify().
> 
> The cpuid_mask_* command line parameters now limit the entire system.
> Subsequent changes will cause the mask MSRs to be context switched 
> per-domain,
> removing the need to use the command line parameters for heterogeneous
> levelling purposes.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with the previously mentioned constraint that due to the behavioral
change this shouldn't get committed independently of sufficiently
many of the following patches (until use of the masking command
line options is no longer necessary to control what PV guests get to
see).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-04-07 11:57 ` [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
  2016-04-07 12:58   ` Wei Liu
@ 2016-04-08 21:00   ` Jan Beulich
  2016-04-08 21:45     ` Andrew Cooper
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2016-04-08 21:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Xen-devel

>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
> The existing logic is broken for heterogeneous migration.  By always
> advertising the host maximum xstate, a migration to a less capable host always
> fails as Xen cannot accomodate the xcr0_accum in the migration stream.

I don't understand this - xcr0_accum is definitely part of the
migration stream.

> v5:
>  * Reintroduce PKRU, (again, lost due to rebasing).
>  * Rewrite the commit message and comments to try and better explain why I am
>    deliberatly removing host-specific information from the xstate calculation.
>  * Reintroduce 0xFFFFFFFF masks for EAX, to avoid Coverity complaining about
>    truncation on assignment.

Urgh - I don't think ugliness like this can be justified by Coverity
complaining. That would be different if the compiler complained
(like compilers other than gcc do).

>  static void xc_cpuid_config_xsave(xc_interface *xch,
>                                    const struct cpuid_domain_info *info,
>                                    const unsigned int *input, unsigned int *regs)
>  {
> -    if ( info->xfeature_mask == 0 )
> +    uint64_t guest_xfeature_mask;
> +
> +    if ( info->xfeature_mask == 0 ||
> +         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
>      {
>          regs[0] = regs[1] = regs[2] = regs[3] = 0;
>          return;
>      }
>  
> +    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
> +
> +    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
> +        guest_xfeature_mask |= X86_XCR0_AVX;
> +
> +    if ( test_bit(X86_FEATURE_MPX, info->featureset) )
> +        guest_xfeature_mask |= X86_XCR0_BNDREG | X86_XCR0_BNDCSR;
> +
> +    if ( test_bit(X86_FEATURE_PKU, info->featureset) )
> +        guest_xfeature_mask |= X86_XCR0_PKRU;
> +
> +    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
> +        guest_xfeature_mask |= X86_XCR0_LWP;

I continue to be unhappy about that white listing, but well...

>      case 1: /* leaf 1 */
>          regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
> -        regs[2] &= info->xfeature_mask;
> -        regs[3] = 0;
> +        regs[1] = 0;
> +
> +        if ( test_bit(X86_FEATURE_XSAVES, info->featureset) )
> +        {
> +            regs[2] = guest_xfeature_mask & X86_XSS_MASK & 0xFFFFFFFF;
> +            regs[3] = (guest_xfeature_mask >> 32) & X86_XSS_MASK;

This is correct only because X86_XSS_MASK == 0(or else >> and &
need to be swapped). Yet if you assume this, the and-ing with
0xFFFFFFFF makes even less sense here than above.

> +    case 2 ... 62: /* per-component sub-leaves */
> +        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
>          {
>              regs[0] = regs[1] = regs[2] = regs[3] = 0;
>              break;
>          }
>          /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
> -        regs[2] = regs[3] = 0;
> +        regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;

I'm sorry for realizing this only now, but shouldn't we hide XSTATE_XSS
from PV guests? Or is this being taken care of elsewhere?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-04-08 21:00   ` Jan Beulich
@ 2016-04-08 21:45     ` Andrew Cooper
  2016-04-08 22:38       ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Cooper @ 2016-04-08 21:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Xen-devel

On 08/04/16 22:00, Jan Beulich wrote:
>>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
>> The existing logic is broken for heterogeneous migration.  By always
>> advertising the host maximum xstate, a migration to a less capable host always
>> fails as Xen cannot accomodate the xcr0_accum in the migration stream.
> I don't understand this - xcr0_accum is definitely part of the
> migration stream.

It is.

The problem is that, by using info->xfeature_mask rather than
guest_feature_mask, attempts by the toolstack to level a guest down on a
more capable host fails, and the guest actually does see un-levelled
xstates (i.e. the current host maximum xstates), and enables them.

When the migration (to a less capable host) does occur, the receiving
Xen (correctly) fails the restore, as it cannot accommodate the xstates
currently in use by the guest.

This is one of the several bugs contributing to why XenServer has never
enabled xsave by default.  (The forthcoming XenServer Dundee release
will be the first version with xsave enabled by default, now that
migration bugs like this are addressed.)

>
>> v5:
>>  * Reintroduce PKRU, (again, lost due to rebasing).
>>  * Rewrite the commit message and comments to try and better explain why I am
>>    deliberatly removing host-specific information from the xstate calculation.
>>  * Reintroduce 0xFFFFFFFF masks for EAX, to avoid Coverity complaining about
>>    truncation on assignment.
> Urgh - I don't think ugliness like this can be justified by Coverity
> complaining. That would be different if the compiler complained
> (like compilers other than gcc do).

Hmm - Clang is happy with the code as was.

I would prefer to avoid the Coverity issue if at all possible.  Would a
(uint32_t) cast be more acceptable?

>
>>  static void xc_cpuid_config_xsave(xc_interface *xch,
>>                                    const struct cpuid_domain_info *info,
>>                                    const unsigned int *input, unsigned int *regs)
>>  {
>> -    if ( info->xfeature_mask == 0 )
>> +    uint64_t guest_xfeature_mask;
>> +
>> +    if ( info->xfeature_mask == 0 ||
>> +         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
>>      {
>>          regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>          return;
>>      }
>>  
>> +    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
>> +
>> +    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_AVX;
>> +
>> +    if ( test_bit(X86_FEATURE_MPX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_BNDREG | X86_XCR0_BNDCSR;
>> +
>> +    if ( test_bit(X86_FEATURE_PKU, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_PKRU;
>> +
>> +    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_LWP;
> I continue to be unhappy about that white listing, but well...

It is all going to move into a single location in Xen in due course, but
as I have explained above, it has to exist somewhere to allow the guest
to safely migrate in a heterogeneous environment.

>
>>      case 1: /* leaf 1 */
>>          regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
>> -        regs[2] &= info->xfeature_mask;
>> -        regs[3] = 0;
>> +        regs[1] = 0;
>> +
>> +        if ( test_bit(X86_FEATURE_XSAVES, info->featureset) )
>> +        {
>> +            regs[2] = guest_xfeature_mask & X86_XSS_MASK & 0xFFFFFFFF;
>> +            regs[3] = (guest_xfeature_mask >> 32) & X86_XSS_MASK;
> This is correct only because X86_XSS_MASK == 0(or else >> and &
> need to be swapped). Yet if you assume this, the and-ing with
> 0xFFFFFFFF makes even less sense here than above.

So it is.  I will submit a bugfix, once we have got an agreement on the
masking angle.

>
>> +    case 2 ... 62: /* per-component sub-leaves */
>> +        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
>>          {
>>              regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>              break;
>>          }
>>          /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
>> -        regs[2] = regs[3] = 0;
>> +        regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
> I'm sorry for realizing this only now, but shouldn't we hide XSTATE_XSS
> from PV guests? Or is this being taken care of elsewhere?

That is covered with a dynamic check in pv_cpuid() at the moment.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information
  2016-04-08 21:45     ` Andrew Cooper
@ 2016-04-08 22:38       ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2016-04-08 22:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Xen-devel

>>> On 08.04.16 at 23:45, <andrew.cooper3@citrix.com> wrote:
> On 08/04/16 22:00, Jan Beulich wrote:
>>>>> On 07.04.16 at 13:57, <andrew.cooper3@citrix.com> wrote:
>>> v5:
>>>  * Reintroduce PKRU, (again, lost due to rebasing).
>>>  * Rewrite the commit message and comments to try and better explain why I am
>>>    deliberatly removing host-specific information from the xstate calculation.
>>>  * Reintroduce 0xFFFFFFFF masks for EAX, to avoid Coverity complaining about
>>>    truncation on assignment.
>> Urgh - I don't think ugliness like this can be justified by Coverity
>> complaining. That would be different if the compiler complained
>> (like compilers other than gcc do).
> 
> Hmm - Clang is happy with the code as was.

I should have said "like some compilers other than gcc do".

> I would prefer to avoid the Coverity issue if at all possible.  Would a
> (uint32_t) cast be more acceptable?

To me, yes. However, only slightly - you know my general
opposition to casts. As to the Coverity aspect - we have tons of
such apparent truncations all over the hypervisor (and likely also
all over the tool stack), so I don't really see the point of uglifying
the code here just to avoid one single further instance.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2016-04-08 22:38 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07 11:57 [PATCH v5 00/21] x86: Improvements to cpuid handling for guests Andrew Cooper
2016-04-07 11:57 ` [PATCH v5 01/21] xen/x86: Annotate VM applicability in featureset Andrew Cooper
2016-04-07 23:01   ` Jan Beulich
2016-04-07 11:57 ` [PATCH v5 02/21] xen/x86: Calculate maximum host and guest featuresets Andrew Cooper
2016-04-07 23:04   ` Jan Beulich
2016-04-07 11:57 ` [PATCH v5 03/21] xen/x86: Generate deep dependencies of features Andrew Cooper
2016-04-07 23:18   ` Jan Beulich
2016-04-07 23:36     ` Andrew Cooper
2016-04-08 15:17       ` Jan Beulich
2016-04-08 15:18         ` Andrew Cooper
2016-04-07 11:57 ` [PATCH v5 04/21] xen/x86: Clear dependent features when clearing a cpu cap Andrew Cooper
2016-04-08 15:36   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 05/21] xen/x86: Improve disabling of features which have dependencies Andrew Cooper
2016-04-08 15:04   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 06/21] xen/x86: Improvements to in-hypervisor cpuid sanity checks Andrew Cooper
2016-04-08 16:10   ` Konrad Rzeszutek Wilk
2016-04-08 18:06   ` Jan Beulich
2016-04-07 11:57 ` [PATCH v5 07/21] x86/cpu: Move set_cpumask() calls into c_early_init() Andrew Cooper
2016-04-08 18:09   ` Jan Beulich
2016-04-07 11:57 ` [PATCH v5 08/21] x86/cpu: Sysctl and common infrastructure for levelling context switching Andrew Cooper
2016-04-07 16:54   ` Daniel De Graaf
2016-04-08 16:12   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 09/21] x86/cpu: Rework AMD masking MSR setup Andrew Cooper
2016-04-08 16:13   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 10/21] x86/cpu: Rework Intel masking/faulting setup Andrew Cooper
2016-04-08 16:14   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 11/21] x86/cpu: Context switch cpuid masks and faulting state in context_switch() Andrew Cooper
2016-04-08 16:15   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 12/21] x86/pv: Provide custom cpumasks for PV domains Andrew Cooper
2016-04-08 16:17   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 13/21] x86/domctl: Update PV domain cpumasks when setting cpuid policy Andrew Cooper
2016-04-08 16:26   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 14/21] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL Andrew Cooper
2016-04-07 16:54   ` Daniel De Graaf
2016-04-08 16:32   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 15/21] tools/libxc: Modify bitmap operations to take void pointers Andrew Cooper
2016-04-07 13:00   ` Wei Liu
2016-04-08 16:34   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 16/21] tools/libxc: Use public/featureset.h for cpuid policy generation Andrew Cooper
2016-04-08 16:37   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 17/21] tools/libxc: Expose the automatically generated cpu featuremask information Andrew Cooper
2016-04-08 16:38   ` Konrad Rzeszutek Wilk
2016-04-07 11:57 ` [PATCH v5 18/21] tools: Utility for dealing with featuresets Andrew Cooper
2016-04-07 11:57 ` [PATCH v5 19/21] tools/libxc: Wire a featureset through to cpuid policy logic Andrew Cooper
2016-04-07 11:57 ` [PATCH v5 20/21] tools/libxc: Use featuresets rather than guesswork Andrew Cooper
2016-04-07 11:57 ` [PATCH v5 21/21] tools/libxc: Calculate xstate cpuid leaf from guest information Andrew Cooper
2016-04-07 12:58   ` Wei Liu
2016-04-08 21:00   ` Jan Beulich
2016-04-08 21:45     ` Andrew Cooper
2016-04-08 22:38       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.