xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] x86: Fixes to TSX handling
@ 2021-05-27 13:25 Andrew Cooper
  2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-05-27 13:25 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

Fix the handling of the HLE/RTM CPUID bits for guests.

Andrew Cooper (3):
  x86/cpuid: Rework HLE and RTM handling
  x86/tsx: Minor cleanup and improvements
  x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead

 docs/misc/xen-command-line.pandoc           |  40 +++++------
 tools/libs/guest/xg_cpuid_x86.c             |   2 +
 xen/arch/x86/cpu/intel.c                    |   3 -
 xen/arch/x86/cpu/vpmu.c                     |   4 +-
 xen/arch/x86/cpuid.c                        |  26 +++----
 xen/arch/x86/hvm/vmx/vmx.c                  |   2 +-
 xen/arch/x86/msr.c                          |   2 +-
 xen/arch/x86/spec_ctrl.c                    |   5 +-
 xen/arch/x86/tsx.c                          | 103 ++++++++++++++++++++++++----
 xen/include/asm-x86/cpufeature.h            |   1 +
 xen/include/asm-x86/processor.h             |   1 +
 xen/include/asm-x86/vpmu.h                  |   1 -
 xen/include/public/arch-x86/cpufeatureset.h |   4 +-
 13 files changed, 131 insertions(+), 63 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling
  2021-05-27 13:25 [PATCH 0/3] x86: Fixes to TSX handling Andrew Cooper
@ 2021-05-27 13:25 ` Andrew Cooper
  2021-05-27 15:03   ` Jan Beulich
  2021-05-27 15:20   ` Roger Pau Monné
  2021-05-27 13:25 ` [PATCH 2/3] x86/tsx: Minor cleanup and improvements Andrew Cooper
  2021-05-27 13:25 ` [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead Andrew Cooper
  2 siblings, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-05-27 13:25 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

The TAA mitigation offered the option to hide the HLE and RTM CPUID bits,
which has caused some migration compatibility problems.

These two bits are special.  Annotate them with ! to emphasise this point.

Hardware Lock Elision (HLE) may or may not be visible in CPUID, but is
disabled in microcode on all CPUs, and has been removed from the architecture.
Do not advertise it to VMs by default.

Restricted Transactional Memory (RTM) may or may not be visible in CPUID, and
may or may not be configured in force-abort mode.  Have tsx_init() note
whether RTM has been configured into force-abort mode, so
guest_common_feature_adjustments() can conditionally hide it from VMs by
default.

The host policy values for HLE/RTM may or may not be set, depending on any
previous running kernel's choice of visibility, and Xen's choice.  TSX is
available on any CPU which enumerates a TSX-hiding mechanism, so instead of
doing a two-step to clobber any hiding, scan CPUID, then set the visibility,
just force visibility of the bits in the first place.

With the HLE/RTM bits now unilaterally visible in the host policy,
xc_cpuid_apply_policy() can construct a more appropriate policy out of thin
air for pre-4.13 VMs with no CPUID data in their migration stream, and
specifically one where HLE/RTM doesn't potentially disappear behind the back
of a running VM.

Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 tools/libs/guest/xg_cpuid_x86.c             |  2 ++
 xen/arch/x86/cpuid.c                        | 24 ++++++++++------------
 xen/arch/x86/spec_ctrl.c                    |  3 ---
 xen/arch/x86/tsx.c                          | 31 +++++++++++++++++++++++++++--
 xen/include/asm-x86/processor.h             |  1 +
 xen/include/public/arch-x86/cpufeatureset.h |  4 ++--
 6 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 1ebc108213..ec5a47fde4 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -511,6 +511,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
          * so migrated-in VM's don't risk seeing features disappearing.
          */
         p->basic.rdrand = test_bit(X86_FEATURE_RDRAND, host_featureset);
+        p->feat.hle = test_bit(X86_FEATURE_HLE, host_featureset);
+        p->feat.rtm = test_bit(X86_FEATURE_RTM, host_featureset);
 
         if ( di.hvm )
         {
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 752bf244ea..55a7b16342 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -375,6 +375,16 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
          boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x3a &&
          cpu_has_rdrand && !is_forced_cpu_cap(X86_FEATURE_RDRAND) )
         __clear_bit(X86_FEATURE_RDRAND, fs);
+
+    /*
+     * On certain hardware, speculative or errata workarounds can result in
+     * TSX being placed in "force-abort" mode, where it doesn't actually
+     * function as expected, but is technically compatible with the ISA.
+     *
+     * Do not advertise RTM to guests by default if it won't actually work.
+     */
+    if ( rtm_disabled )
+        __clear_bit(X86_FEATURE_RTM, fs);
 }
 
 static void __init guest_common_feature_adjustments(uint32_t *fs)
@@ -652,20 +662,6 @@ void recalculate_cpuid_policy(struct domain *d)
             __clear_bit(X86_FEATURE_SYSCALL, max_fs);
     }
 
-    /*
-     * On hardware with MSR_TSX_CTRL, the admin may have elected to disable
-     * TSX and hide the feature bits.  Migrating-in VMs may have been booted
-     * pre-mitigation when the TSX features were visbile.
-     *
-     * This situation is compatible (albeit with a perf hit to any TSX code in
-     * the guest), so allow the feature bits to remain set.
-     */
-    if ( cpu_has_tsx_ctrl )
-    {
-        __set_bit(X86_FEATURE_HLE, max_fs);
-        __set_bit(X86_FEATURE_RTM, max_fs);
-    }
-
     /* Clamp the toolstacks choices to reality. */
     for ( i = 0; i < ARRAY_SIZE(fs); i++ )
         fs[i] &= max_fs[i];
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index cd05f42394..f2782b2d55 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1158,9 +1158,6 @@ void __init init_speculation_mitigations(void)
          ((hw_smt_enabled && opt_smt) ||
           !boot_cpu_has(X86_FEATURE_SC_VERW_IDLE)) )
     {
-        setup_clear_cpu_cap(X86_FEATURE_HLE);
-        setup_clear_cpu_cap(X86_FEATURE_RTM);
-
         opt_tsx = 0;
         tsx_init();
     }
diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c
index 39e483640a..e09e819dce 100644
--- a/xen/arch/x86/tsx.c
+++ b/xen/arch/x86/tsx.c
@@ -15,6 +15,7 @@
  */
 int8_t __read_mostly opt_tsx = -1;
 int8_t __read_mostly cpu_has_tsx_ctrl = -1;
+bool __read_mostly rtm_disabled;
 
 static int __init parse_tsx(const char *s)
 {
@@ -45,6 +46,30 @@ void tsx_init(void)
             rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
         cpu_has_tsx_ctrl = !!(caps & ARCH_CAPS_TSX_CTRL);
+
+        /*
+         * The TSX features (HLE/RTM) are handled specially.  They both
+         * enumerate features but, on certain parts, have mechanisms to be
+         * hidden without disrupting running software.
+         *
+         * At the moment, we're running in an unknown context (WRT hiding -
+         * particularly if another fully fledged kernel ran before us) and
+         * depending on user settings, may elect to continue hiding them from
+         * native CPUID instructions.
+         *
+         * Xen doesn't use TSX itself, but use cpu_has_{hle,rtm} for various
+         * system reasons, mostly errata detection, so the meaning is more
+         * useful as "TSX infrastructure available", as opposed to "features
+         * advertised and working".
+         *
+         * Force the features to be visible in Xen's view if we see any of the
+         * infrastructure capable of hiding them.
+         */
+        if ( cpu_has_tsx_ctrl )
+        {
+            setup_force_cpu_cap(X86_FEATURE_HLE);
+            setup_force_cpu_cap(X86_FEATURE_RTM);
+        }
     }
 
     if ( cpu_has_tsx_ctrl )
@@ -53,9 +78,11 @@ void tsx_init(void)
 
         rdmsrl(MSR_TSX_CTRL, val);
 
+        /* Check bottom bit only.  Higher bits are various sentinels. */
+        rtm_disabled = !(opt_tsx & 1);
+
         val &= ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);
-        /* Check bottom bit only.  Higher bits are various sentinals. */
-        if ( !(opt_tsx & 1) )
+        if ( rtm_disabled )
             val |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
 
         wrmsrl(MSR_TSX_CTRL, val);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 83143d4df8..bc4dc69253 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -627,6 +627,7 @@ static inline uint8_t get_cpu_family(uint32_t raw, uint8_t *model,
 }
 
 extern int8_t opt_tsx, cpu_has_tsx_ctrl;
+extern bool rtm_disabled;
 void tsx_init(void);
 
 enum ap_boot_method {
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index c42f56bdd4..b65af42436 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -197,14 +197,14 @@ XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
 XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*S  TSC_ADJUST MSR available */
 XEN_CPUFEATURE(SGX,           5*32+ 2) /*   Software Guard extensions */
 XEN_CPUFEATURE(BMI1,          5*32+ 3) /*A  1st bit manipulation extensions */
-XEN_CPUFEATURE(HLE,           5*32+ 4) /*A  Hardware Lock Elision */
+XEN_CPUFEATURE(HLE,           5*32+ 4) /*!a Hardware Lock Elision */
 XEN_CPUFEATURE(AVX2,          5*32+ 5) /*A  AVX2 instructions */
 XEN_CPUFEATURE(FDP_EXCP_ONLY, 5*32+ 6) /*!  x87 FDP only updated on exception. */
 XEN_CPUFEATURE(SMEP,          5*32+ 7) /*S  Supervisor Mode Execution Protection */
 XEN_CPUFEATURE(BMI2,          5*32+ 8) /*A  2nd bit manipulation extensions */
 XEN_CPUFEATURE(ERMS,          5*32+ 9) /*A  Enhanced REP MOVSB/STOSB */
 XEN_CPUFEATURE(INVPCID,       5*32+10) /*H  Invalidate Process Context ID */
-XEN_CPUFEATURE(RTM,           5*32+11) /*A  Restricted Transactional Memory */
+XEN_CPUFEATURE(RTM,           5*32+11) /*!A Restricted Transactional Memory */
 XEN_CPUFEATURE(PQM,           5*32+12) /*   Platform QoS Monitoring */
 XEN_CPUFEATURE(NO_FPU_SEL,    5*32+13) /*!  FPU CS/DS stored as zero */
 XEN_CPUFEATURE(MPX,           5*32+14) /*s  Memory Protection Extensions */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] x86/tsx: Minor cleanup and improvements
  2021-05-27 13:25 [PATCH 0/3] x86: Fixes to TSX handling Andrew Cooper
  2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
@ 2021-05-27 13:25 ` Andrew Cooper
  2021-05-27 15:06   ` Jan Beulich
  2021-05-27 15:40   ` Roger Pau Monné
  2021-05-27 13:25 ` [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead Andrew Cooper
  2 siblings, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-05-27 13:25 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

 * Introduce cpu_has_arch_caps and replace boot_cpu_has(X86_FEATURE_ARCH_CAPS)
 * Read CPUID data into the appropriate boot_cpu_data.x86_capability[]
   element, as subsequent changes are going to need more cpu_has_* logic.
 * Use the hi/lo MSR helpers, which substantially improves code generation.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpuid.c             |  2 +-
 xen/arch/x86/hvm/vmx/vmx.c       |  2 +-
 xen/arch/x86/msr.c               |  2 +-
 xen/arch/x86/spec_ctrl.c         |  2 +-
 xen/arch/x86/tsx.c               | 21 ++++++++++++---------
 xen/include/asm-x86/cpufeature.h |  1 +
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 55a7b16342..f3c8950aa3 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -747,7 +747,7 @@ int init_domain_cpuid_policy(struct domain *d)
      * so dom0 can turn off workarounds as appropriate.  Temporary, until the
      * domain policy logic gains a better understanding of MSRs.
      */
-    if ( is_hardware_domain(d) && boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+    if ( is_hardware_domain(d) && cpu_has_arch_caps )
         p->feat.arch_caps = true;
 
     d->arch.cpuid = p;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1450fd1991..7e3e67fdc3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2566,7 +2566,7 @@ static bool __init has_if_pschange_mc(void)
     if ( cpu_has_hypervisor )
         return false;
 
-    if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+    if ( cpu_has_arch_caps )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
     if ( caps & ARCH_CAPS_IF_PSCHANGE_MC_NO )
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index c3a988bd11..374f92b2c5 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -136,7 +136,7 @@ int init_domain_msr_policy(struct domain *d)
      * so dom0 can turn off workarounds as appropriate.  Temporary, until the
      * domain policy logic gains a better understanding of MSRs.
      */
-    if ( is_hardware_domain(d) && boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+    if ( is_hardware_domain(d) && cpu_has_arch_caps )
     {
         uint64_t val;
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index f2782b2d55..739b7913ff 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -885,7 +885,7 @@ void __init init_speculation_mitigations(void)
     bool cpu_has_bug_taa;
     uint64_t caps = 0;
 
-    if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+    if ( cpu_has_arch_caps )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
     hw_smt_enabled = check_smt_enabled();
diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c
index e09e819dce..98ecb71a4a 100644
--- a/xen/arch/x86/tsx.c
+++ b/xen/arch/x86/tsx.c
@@ -34,15 +34,18 @@ void tsx_init(void)
 {
     /*
      * This function is first called between microcode being loaded, and CPUID
-     * being scanned generally.  Calculate from raw data whether MSR_TSX_CTRL
-     * is available.
+     * being scanned generally.  Read into boot_cpu_data.x86_capability[] for
+     * the cpu_has_* bits we care about using here.
      */
     if ( unlikely(cpu_has_tsx_ctrl < 0) )
     {
         uint64_t caps = 0;
 
-        if ( boot_cpu_data.cpuid_level >= 7 &&
-             (cpuid_count_edx(7, 0) & cpufeat_mask(X86_FEATURE_ARCH_CAPS)) )
+        if ( boot_cpu_data.cpuid_level >= 7 )
+            boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_ARCH_CAPS)]
+                = cpuid_count_edx(7, 0);
+
+        if ( cpu_has_arch_caps )
             rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
         cpu_has_tsx_ctrl = !!(caps & ARCH_CAPS_TSX_CTRL);
@@ -74,18 +77,18 @@ void tsx_init(void)
 
     if ( cpu_has_tsx_ctrl )
     {
-        uint64_t val;
+        uint32_t hi, lo;
 
-        rdmsrl(MSR_TSX_CTRL, val);
+        rdmsr(MSR_TSX_CTRL, lo, hi);
 
         /* Check bottom bit only.  Higher bits are various sentinels. */
         rtm_disabled = !(opt_tsx & 1);
 
-        val &= ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);
+        lo &= ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);
         if ( rtm_disabled )
-            val |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
+            lo |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
 
-        wrmsrl(MSR_TSX_CTRL, val);
+        wrmsr(MSR_TSX_CTRL, lo, hi);
     }
     else if ( opt_tsx >= 0 )
         printk_once(XENLOG_WARNING
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 33b2257888..9f5ae3aa0d 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -133,6 +133,7 @@
 #define cpu_has_avx512_vp2intersect boot_cpu_has(X86_FEATURE_AVX512_VP2INTERSECT)
 #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
 #define cpu_has_serialize       boot_cpu_has(X86_FEATURE_SERIALIZE)
+#define cpu_has_arch_caps       boot_cpu_has(X86_FEATURE_ARCH_CAPS)
 
 /* CPUID level 0x00000007:1.eax */
 #define cpu_has_avx_vnni        boot_cpu_has(X86_FEATURE_AVX_VNNI)
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead
  2021-05-27 13:25 [PATCH 0/3] x86: Fixes to TSX handling Andrew Cooper
  2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
  2021-05-27 13:25 ` [PATCH 2/3] x86/tsx: Minor cleanup and improvements Andrew Cooper
@ 2021-05-27 13:25 ` Andrew Cooper
  2021-05-27 15:11   ` Jan Beulich
  2021-05-27 17:24   ` Roger Pau Monné
  2 siblings, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-05-27 13:25 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

This reuses the rtm_disable infrastructure, so CPUID derivation works properly
when TSX is disabled in favour of working PCR3.

vpmu= is not a supported feature, and having this functionality under tsx=
centralises all TSX handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 docs/misc/xen-command-line.pandoc | 40 +++++++++++++++---------------
 xen/arch/x86/cpu/intel.c          |  3 ---
 xen/arch/x86/cpu/vpmu.c           |  4 +--
 xen/arch/x86/tsx.c                | 51 +++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/vpmu.h        |  1 -
 5 files changed, 70 insertions(+), 29 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index c32a397a12..a6facc33ea 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2296,14 +2296,21 @@ pages) must also be specified via the tbuf_size parameter.
 
 Controls for the use of Transactional Synchronization eXtensions.
 
-On Intel parts released in Q3 2019 (with updated microcode), and future parts,
-a control has been introduced which allows TSX to be turned off.
+Several microcode updates are relevant:
 
-On systems with the ability to turn TSX off, this boolean offers system wide
-control of whether TSX is enabled or disabled.
+ * March 2019, fixing the TSX memory ordering errata on all TSX-enabled CPUs
+   to date.  Introduced MSR_TSX_FORCE_ABORT on SKL/SKX/KBL/WHL/CFL parts.  The
+   errata workaround uses Performance Counter 3, so the user can select
+   between working TSX and working perfcounters.
 
-On parts vulnerable to CVE-2019-11135 / TSX Asynchronous Abort, the following
-logic applies:
+ * November 2019, fixing the TSX Async Abort speculative vulnerability.
+   Introduced MSR_TSX_CTRL on all TSX-enabled MDS_NO parts to date,
+   CLX/WHL-R/CFL-R, with the controls becoming architectural moving forward
+   and formally retiring HLE from the architecture.  The user can disable TSX
+   to mitigate TAA, and elect to hide the HLE/RTM CPUID bits.
+
+On systems with the ability to disable TSX off, this boolean offers system
+wide control of whether TSX is enabled or disabled.
 
  * An explicit `tsx=` choice is honoured, even if it is `true` and would
    result in a vulnerable system.
@@ -2311,10 +2318,14 @@ logic applies:
  * When no explicit `tsx=` choice is given, parts vulnerable to TAA will be
    mitigated by disabling TSX, as this is the lowest overhead option.
 
- * If the use of TSX is important, the more expensive TAA mitigations can be
+   If the use of TSX is important, the more expensive TAA mitigations can be
    opted in to with `smt=0 spec-ctrl=md-clear`, at which point TSX will remain
    active by default.
 
+ * When no explicit `tsx=` option is given, parts susceptible to the memory
+   ordering errata default to `true` to enable working TSX.  Alternatively,
+   selecting `tsx=0` will disable TSX and restore PCR3 to a working state.
+
 ### ucode
 > `= List of [ <integer> | scan=<bool>, nmi=<bool>, allow-same=<bool> ]`
 
@@ -2456,20 +2467,7 @@ provide access to a wealth of low level processor information.
 
 *   The `arch` option allows access to the pre-defined architectural events.
 
-*   The `rtm-abort` boolean controls a trade-off between working Restricted
-    Transactional Memory, and working performance counters.
-
-    All processors released to date (Q1 2019) supporting Transactional Memory
-    Extensions suffer an erratum which has been addressed in microcode.
-
-    Processors based on the Skylake microarchitecture with up-to-date
-    microcode internally use performance counter 3 to work around the erratum.
-    A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
-    instruction is executed.
-
-    An alternative mode exists where PCR3 behaves as before, at the cost of
-    `XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
-    activate this alternative mode.
+*   The `rtm-abort` boolean has been superseded.  Use `tsx=0` instead.
 
 *Warning:*
 As the virtualisation is not 100% safe, don't use the vpmu flag on
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 37439071d9..abf8e206d7 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -356,9 +356,6 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
 	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
 		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
 
-	if (cpu_has_tsx_force_abort && opt_rtm_abort)
-		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
-
 	probe_c3_errata(c);
 }
 
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index d8659c63f8..16e91a3694 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -49,7 +49,6 @@ CHECK_pmu_params;
 static unsigned int __read_mostly opt_vpmu_enabled;
 unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
 unsigned int __read_mostly vpmu_features = 0;
-bool __read_mostly opt_rtm_abort;
 
 static DEFINE_SPINLOCK(vpmu_lock);
 static unsigned vpmu_count;
@@ -79,7 +78,8 @@ static int __init parse_vpmu_params(const char *s)
         else if ( !cmdline_strcmp(s, "arch") )
             vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
         else if ( (val = parse_boolean("rtm-abort", s, ss)) >= 0 )
-            opt_rtm_abort = val;
+            printk(XENLOG_WARNING
+                   "'rtm-abort=<bool>' superseded.  Use 'tsx=<bool>' instead\n");
         else
             rc = -EINVAL;
 
diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c
index 98ecb71a4a..338191df7f 100644
--- a/xen/arch/x86/tsx.c
+++ b/xen/arch/x86/tsx.c
@@ -6,7 +6,9 @@
  * Valid values:
  *   1 => Explicit tsx=1
  *   0 => Explicit tsx=0
- *  -1 => Default, implicit tsx=1, may change to 0 to mitigate TAA
+ *  -1 => Default, altered to 0/1 (if unspecified) by:
+ *                 - TAA heuristics/settings for speculative safety
+ *                 - "TSX vs PCR3" select for TSX memory ordering safety
  *  -3 => Implicit tsx=1 (feed-through from spec-ctrl=0)
  *
  * This is arranged such that the bottom bit encodes whether TSX is actually
@@ -50,6 +52,26 @@ void tsx_init(void)
 
         cpu_has_tsx_ctrl = !!(caps & ARCH_CAPS_TSX_CTRL);
 
+        if ( cpu_has_tsx_force_abort )
+        {
+            /*
+             * On an early TSX-enable Skylake part subject to the memory
+             * ordering erratum, with at least the March 2019 microcode.
+             */
+
+            /*
+             * If no explicit tsx= option is provided, pick a default.
+             *
+             * This deliberately overrides the implicit opt_tsx=-3 from
+             * `spec-ctrl=0` because:
+             * - parse_spec_ctrl() ran before any CPU details where know.
+             * - We now know we're running on a CPU not affected by TAA (as
+             *   TSX_FORCE_ABORT is enumerated).
+             */
+            if ( opt_tsx < 0 )
+                opt_tsx = 1;
+        }
+
         /*
          * The TSX features (HLE/RTM) are handled specially.  They both
          * enumerate features but, on certain parts, have mechanisms to be
@@ -75,6 +97,12 @@ void tsx_init(void)
         }
     }
 
+    /*
+     * Note: MSR_TSX_CTRL is enumerated on TSX-enabled MDS_NO and later parts.
+     * MSR_TSX_FORCE_ABORT is enumerated on TSX-enabled pre-MDS_NO Skylake
+     * parts only.  The two features are on a disjoint set of CPUs, and not
+     * offered to guests by hypervisors.
+     */
     if ( cpu_has_tsx_ctrl )
     {
         uint32_t hi, lo;
@@ -90,9 +118,28 @@ void tsx_init(void)
 
         wrmsr(MSR_TSX_CTRL, lo, hi);
     }
+    else if ( cpu_has_tsx_force_abort )
+    {
+        /*
+         * On an early TSX-enable Skylake part subject to the memory ordering
+         * erratum, with at least the March 2019 microcode.
+         */
+        uint32_t hi, lo;
+
+        rdmsr(MSR_TSX_FORCE_ABORT, lo, hi);
+
+        /* Check bottom bit only.  Higher bits are various sentinels. */
+        rtm_disabled = !(opt_tsx & 1);
+
+        lo &= ~TSX_FORCE_ABORT_RTM;
+        if ( rtm_disabled )
+            lo |= TSX_FORCE_ABORT_RTM;
+
+        wrmsr(MSR_TSX_FORCE_ABORT, lo, hi);
+    }
     else if ( opt_tsx >= 0 )
         printk_once(XENLOG_WARNING
-                    "MSR_TSX_CTRL not available - Ignoring tsx= setting\n");
+                    "TSX controls not available - Ignoring tsx= setting\n");
 }
 
 /*
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
index 55f85ba00f..4b0a6ba3da 100644
--- a/xen/include/asm-x86/vpmu.h
+++ b/xen/include/asm-x86/vpmu.h
@@ -126,7 +126,6 @@ static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 extern unsigned int vpmu_mode;
 extern unsigned int vpmu_features;
-extern bool opt_rtm_abort;
 
 /* Context switch */
 static inline void vpmu_switch_from(struct vcpu *prev)
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling
  2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
@ 2021-05-27 15:03   ` Jan Beulich
  2021-05-27 15:20   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-05-27 15:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.05.2021 15:25, Andrew Cooper wrote:
> The TAA mitigation offered the option to hide the HLE and RTM CPUID bits,
> which has caused some migration compatibility problems.
> 
> These two bits are special.  Annotate them with ! to emphasise this point.
> 
> Hardware Lock Elision (HLE) may or may not be visible in CPUID, but is
> disabled in microcode on all CPUs, and has been removed from the architecture.
> Do not advertise it to VMs by default.
> 
> Restricted Transactional Memory (RTM) may or may not be visible in CPUID, and
> may or may not be configured in force-abort mode.  Have tsx_init() note
> whether RTM has been configured into force-abort mode, so
> guest_common_feature_adjustments() can conditionally hide it from VMs by
> default.
> 
> The host policy values for HLE/RTM may or may not be set, depending on any
> previous running kernel's choice of visibility, and Xen's choice.  TSX is
> available on any CPU which enumerates a TSX-hiding mechanism, so instead of
> doing a two-step to clobber any hiding, scan CPUID, then set the visibility,
> just force visibility of the bits in the first place.
> 
> With the HLE/RTM bits now unilaterally visible in the host policy,
> xc_cpuid_apply_policy() can construct a more appropriate policy out of thin
> air for pre-4.13 VMs with no CPUID data in their migration stream, and
> specifically one where HLE/RTM doesn't potentially disappear behind the back
> of a running VM.
> 
> Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/tsx: Minor cleanup and improvements
  2021-05-27 13:25 ` [PATCH 2/3] x86/tsx: Minor cleanup and improvements Andrew Cooper
@ 2021-05-27 15:06   ` Jan Beulich
  2021-05-27 15:40   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-05-27 15:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.05.2021 15:25, Andrew Cooper wrote:
>  * Introduce cpu_has_arch_caps and replace boot_cpu_has(X86_FEATURE_ARCH_CAPS)
>  * Read CPUID data into the appropriate boot_cpu_data.x86_capability[]
>    element, as subsequent changes are going to need more cpu_has_* logic.
>  * Use the hi/lo MSR helpers, which substantially improves code generation.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead
  2021-05-27 13:25 ` [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead Andrew Cooper
@ 2021-05-27 15:11   ` Jan Beulich
  2021-05-27 17:24   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-05-27 15:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.05.2021 15:25, Andrew Cooper wrote:
> This reuses the rtm_disable infrastructure, so CPUID derivation works properly
> when TSX is disabled in favour of working PCR3.
> 
> vpmu= is not a supported feature, and having this functionality under tsx=
> centralises all TSX handling.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling
  2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
  2021-05-27 15:03   ` Jan Beulich
@ 2021-05-27 15:20   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Roger Pau Monné @ 2021-05-27 15:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu

On Thu, May 27, 2021 at 02:25:17PM +0100, Andrew Cooper wrote:
> The TAA mitigation offered the option to hide the HLE and RTM CPUID bits,
> which has caused some migration compatibility problems.
> 
> These two bits are special.  Annotate them with ! to emphasise this point.
> 
> Hardware Lock Elision (HLE) may or may not be visible in CPUID, but is
> disabled in microcode on all CPUs, and has been removed from the architecture.
> Do not advertise it to VMs by default.
> 
> Restricted Transactional Memory (RTM) may or may not be visible in CPUID, and
> may or may not be configured in force-abort mode.  Have tsx_init() note
> whether RTM has been configured into force-abort mode, so
> guest_common_feature_adjustments() can conditionally hide it from VMs by
> default.
> 
> The host policy values for HLE/RTM may or may not be set, depending on any
> previous running kernel's choice of visibility, and Xen's choice.  TSX is
> available on any CPU which enumerates a TSX-hiding mechanism, so instead of
> doing a two-step to clobber any hiding, scan CPUID, then set the visibility,
> just force visibility of the bits in the first place.
> 
> With the HLE/RTM bits now unilaterally visible in the host policy,
> xc_cpuid_apply_policy() can construct a more appropriate policy out of thin
> air for pre-4.13 VMs with no CPUID data in their migration stream, and
> specifically one where HLE/RTM doesn't potentially disappear behind the back
> of a running VM.
> 
> Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/tsx: Minor cleanup and improvements
  2021-05-27 13:25 ` [PATCH 2/3] x86/tsx: Minor cleanup and improvements Andrew Cooper
  2021-05-27 15:06   ` Jan Beulich
@ 2021-05-27 15:40   ` Roger Pau Monné
  1 sibling, 0 replies; 11+ messages in thread
From: Roger Pau Monné @ 2021-05-27 15:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu

On Thu, May 27, 2021 at 02:25:18PM +0100, Andrew Cooper wrote:
>  * Introduce cpu_has_arch_caps and replace boot_cpu_has(X86_FEATURE_ARCH_CAPS)
>  * Read CPUID data into the appropriate boot_cpu_data.x86_capability[]
>    element, as subsequent changes are going to need more cpu_has_* logic.
>  * Use the hi/lo MSR helpers, which substantially improves code generation.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead
  2021-05-27 13:25 ` [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead Andrew Cooper
  2021-05-27 15:11   ` Jan Beulich
@ 2021-05-27 17:24   ` Roger Pau Monné
  2021-05-27 18:33     ` Andrew Cooper
  1 sibling, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2021-05-27 17:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu

On Thu, May 27, 2021 at 02:25:19PM +0100, Andrew Cooper wrote:
> This reuses the rtm_disable infrastructure, so CPUID derivation works properly
> when TSX is disabled in favour of working PCR3.
> 
> vpmu= is not a supported feature, and having this functionality under tsx=
> centralises all TSX handling.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> ---
>  docs/misc/xen-command-line.pandoc | 40 +++++++++++++++---------------
>  xen/arch/x86/cpu/intel.c          |  3 ---
>  xen/arch/x86/cpu/vpmu.c           |  4 +--
>  xen/arch/x86/tsx.c                | 51 +++++++++++++++++++++++++++++++++++++--
>  xen/include/asm-x86/vpmu.h        |  1 -
>  5 files changed, 70 insertions(+), 29 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index c32a397a12..a6facc33ea 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2296,14 +2296,21 @@ pages) must also be specified via the tbuf_size parameter.
>  
>  Controls for the use of Transactional Synchronization eXtensions.
>  
> -On Intel parts released in Q3 2019 (with updated microcode), and future parts,
> -a control has been introduced which allows TSX to be turned off.
> +Several microcode updates are relevant:
>  
> -On systems with the ability to turn TSX off, this boolean offers system wide
> -control of whether TSX is enabled or disabled.
> + * March 2019, fixing the TSX memory ordering errata on all TSX-enabled CPUs
> +   to date.  Introduced MSR_TSX_FORCE_ABORT on SKL/SKX/KBL/WHL/CFL parts.  The
> +   errata workaround uses Performance Counter 3, so the user can select
> +   between working TSX and working perfcounters.
>  
> -On parts vulnerable to CVE-2019-11135 / TSX Asynchronous Abort, the following
> -logic applies:
> + * November 2019, fixing the TSX Async Abort speculative vulnerability.
> +   Introduced MSR_TSX_CTRL on all TSX-enabled MDS_NO parts to date,
> +   CLX/WHL-R/CFL-R, with the controls becoming architectural moving forward
> +   and formally retiring HLE from the architecture.  The user can disable TSX
> +   to mitigate TAA, and elect to hide the HLE/RTM CPUID bits.
> +
> +On systems with the ability to disable TSX off, this boolean offers system
> +wide control of whether TSX is enabled or disabled.
>  
>   * An explicit `tsx=` choice is honoured, even if it is `true` and would
>     result in a vulnerable system.
> @@ -2311,10 +2318,14 @@ logic applies:
>   * When no explicit `tsx=` choice is given, parts vulnerable to TAA will be
>     mitigated by disabling TSX, as this is the lowest overhead option.
>  
> - * If the use of TSX is important, the more expensive TAA mitigations can be
> +   If the use of TSX is important, the more expensive TAA mitigations can be
>     opted in to with `smt=0 spec-ctrl=md-clear`, at which point TSX will remain
>     active by default.
>  
> + * When no explicit `tsx=` option is given, parts susceptible to the memory
> +   ordering errata default to `true` to enable working TSX.  Alternatively,
> +   selecting `tsx=0` will disable TSX and restore PCR3 to a working state.
> +
>  ### ucode
>  > `= List of [ <integer> | scan=<bool>, nmi=<bool>, allow-same=<bool> ]`
>  
> @@ -2456,20 +2467,7 @@ provide access to a wealth of low level processor information.
>  
>  *   The `arch` option allows access to the pre-defined architectural events.
>  
> -*   The `rtm-abort` boolean controls a trade-off between working Restricted
> -    Transactional Memory, and working performance counters.
> -
> -    All processors released to date (Q1 2019) supporting Transactional Memory
> -    Extensions suffer an erratum which has been addressed in microcode.
> -
> -    Processors based on the Skylake microarchitecture with up-to-date
> -    microcode internally use performance counter 3 to work around the erratum.
> -    A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
> -    instruction is executed.
> -
> -    An alternative mode exists where PCR3 behaves as before, at the cost of
> -    `XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
> -    activate this alternative mode.
> +*   The `rtm-abort` boolean has been superseded.  Use `tsx=0` instead.
>  
>  *Warning:*
>  As the virtualisation is not 100% safe, don't use the vpmu flag on
> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> index 37439071d9..abf8e206d7 100644
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -356,9 +356,6 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
>  	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
>  		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
>  
> -	if (cpu_has_tsx_force_abort && opt_rtm_abort)
> -		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
> -
>  	probe_c3_errata(c);
>  }
>  
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index d8659c63f8..16e91a3694 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -49,7 +49,6 @@ CHECK_pmu_params;
>  static unsigned int __read_mostly opt_vpmu_enabled;
>  unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
>  unsigned int __read_mostly vpmu_features = 0;
> -bool __read_mostly opt_rtm_abort;
>  
>  static DEFINE_SPINLOCK(vpmu_lock);
>  static unsigned vpmu_count;
> @@ -79,7 +78,8 @@ static int __init parse_vpmu_params(const char *s)
>          else if ( !cmdline_strcmp(s, "arch") )
>              vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
>          else if ( (val = parse_boolean("rtm-abort", s, ss)) >= 0 )
> -            opt_rtm_abort = val;
> +            printk(XENLOG_WARNING
> +                   "'rtm-abort=<bool>' superseded.  Use 'tsx=<bool>' instead\n");
>          else
>              rc = -EINVAL;
>  
> diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c
> index 98ecb71a4a..338191df7f 100644
> --- a/xen/arch/x86/tsx.c
> +++ b/xen/arch/x86/tsx.c
> @@ -6,7 +6,9 @@
>   * Valid values:
>   *   1 => Explicit tsx=1
>   *   0 => Explicit tsx=0
> - *  -1 => Default, implicit tsx=1, may change to 0 to mitigate TAA
> + *  -1 => Default, altered to 0/1 (if unspecified) by:
> + *                 - TAA heuristics/settings for speculative safety
> + *                 - "TSX vs PCR3" select for TSX memory ordering safety
>   *  -3 => Implicit tsx=1 (feed-through from spec-ctrl=0)
>   *
>   * This is arranged such that the bottom bit encodes whether TSX is actually
> @@ -50,6 +52,26 @@ void tsx_init(void)
>  
>          cpu_has_tsx_ctrl = !!(caps & ARCH_CAPS_TSX_CTRL);
>  
> +        if ( cpu_has_tsx_force_abort )
> +        {
> +            /*
> +             * On an early TSX-enable Skylake part subject to the memory
> +             * ordering erratum, with at least the March 2019 microcode.
> +             */
> +
> +            /*
> +             * If no explicit tsx= option is provided, pick a default.
> +             *
> +             * This deliberately overrides the implicit opt_tsx=-3 from
> +             * `spec-ctrl=0` because:
> +             * - parse_spec_ctrl() ran before any CPU details where know.
> +             * - We now know we're running on a CPU not affected by TAA (as
> +             *   TSX_FORCE_ABORT is enumerated).
> +             */
> +            if ( opt_tsx < 0 )
> +                opt_tsx = 1;
> +        }
> +
>          /*
>           * The TSX features (HLE/RTM) are handled specially.  They both
>           * enumerate features but, on certain parts, have mechanisms to be
> @@ -75,6 +97,12 @@ void tsx_init(void)
>          }
>      }
>  
> +    /*
> +     * Note: MSR_TSX_CTRL is enumerated on TSX-enabled MDS_NO and later parts.
> +     * MSR_TSX_FORCE_ABORT is enumerated on TSX-enabled pre-MDS_NO Skylake
> +     * parts only.  The two features are on a disjoint set of CPUs, and not
> +     * offered to guests by hypervisors.
> +     */
>      if ( cpu_has_tsx_ctrl )
>      {
>          uint32_t hi, lo;
> @@ -90,9 +118,28 @@ void tsx_init(void)
>  
>          wrmsr(MSR_TSX_CTRL, lo, hi);
>      }
> +    else if ( cpu_has_tsx_force_abort )
> +    {
> +        /*
> +         * On an early TSX-enable Skylake part subject to the memory ordering
> +         * erratum, with at least the March 2019 microcode.
> +         */
> +        uint32_t hi, lo;
> +
> +        rdmsr(MSR_TSX_FORCE_ABORT, lo, hi);
> +
> +        /* Check bottom bit only.  Higher bits are various sentinels. */
> +        rtm_disabled = !(opt_tsx & 1);

I think you also calculate rtm_disabled in the previous if case
(cpu_has_tsx_ctrl), maybe that could be pulled out?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead
  2021-05-27 17:24   ` Roger Pau Monné
@ 2021-05-27 18:33     ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-05-27 18:33 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich, Wei Liu

On 27/05/2021 18:24, Roger Pau Monné wrote:
> On Thu, May 27, 2021 at 02:25:19PM +0100, Andrew Cooper wrote:
>> This reuses the rtm_disable infrastructure, so CPUID derivation works properly
>> when TSX is disabled in favour of working PCR3.
>>
>> vpmu= is not a supported feature, and having this functionality under tsx=
>> centralises all TSX handling.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>> ---
>>  docs/misc/xen-command-line.pandoc | 40 +++++++++++++++---------------
>>  xen/arch/x86/cpu/intel.c          |  3 ---
>>  xen/arch/x86/cpu/vpmu.c           |  4 +--
>>  xen/arch/x86/tsx.c                | 51 +++++++++++++++++++++++++++++++++++++--
>>  xen/include/asm-x86/vpmu.h        |  1 -
>>  5 files changed, 70 insertions(+), 29 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index c32a397a12..a6facc33ea 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -2296,14 +2296,21 @@ pages) must also be specified via the tbuf_size parameter.
>>  
>>  Controls for the use of Transactional Synchronization eXtensions.
>>  
>> -On Intel parts released in Q3 2019 (with updated microcode), and future parts,
>> -a control has been introduced which allows TSX to be turned off.
>> +Several microcode updates are relevant:
>>  
>> -On systems with the ability to turn TSX off, this boolean offers system wide
>> -control of whether TSX is enabled or disabled.
>> + * March 2019, fixing the TSX memory ordering errata on all TSX-enabled CPUs
>> +   to date.  Introduced MSR_TSX_FORCE_ABORT on SKL/SKX/KBL/WHL/CFL parts.  The
>> +   errata workaround uses Performance Counter 3, so the user can select
>> +   between working TSX and working perfcounters.
>>  
>> -On parts vulnerable to CVE-2019-11135 / TSX Asynchronous Abort, the following
>> -logic applies:
>> + * November 2019, fixing the TSX Async Abort speculative vulnerability.
>> +   Introduced MSR_TSX_CTRL on all TSX-enabled MDS_NO parts to date,
>> +   CLX/WHL-R/CFL-R, with the controls becoming architectural moving forward
>> +   and formally retiring HLE from the architecture.  The user can disable TSX
>> +   to mitigate TAA, and elect to hide the HLE/RTM CPUID bits.
>> +
>> +On systems with the ability to disable TSX off, this boolean offers system
>> +wide control of whether TSX is enabled or disabled.
>>  
>>   * An explicit `tsx=` choice is honoured, even if it is `true` and would
>>     result in a vulnerable system.
>> @@ -2311,10 +2318,14 @@ logic applies:
>>   * When no explicit `tsx=` choice is given, parts vulnerable to TAA will be
>>     mitigated by disabling TSX, as this is the lowest overhead option.
>>  
>> - * If the use of TSX is important, the more expensive TAA mitigations can be
>> +   If the use of TSX is important, the more expensive TAA mitigations can be
>>     opted in to with `smt=0 spec-ctrl=md-clear`, at which point TSX will remain
>>     active by default.
>>  
>> + * When no explicit `tsx=` option is given, parts susceptible to the memory
>> +   ordering errata default to `true` to enable working TSX.  Alternatively,
>> +   selecting `tsx=0` will disable TSX and restore PCR3 to a working state.
>> +
>>  ### ucode
>>  > `= List of [ <integer> | scan=<bool>, nmi=<bool>, allow-same=<bool> ]`
>>  
>> @@ -2456,20 +2467,7 @@ provide access to a wealth of low level processor information.
>>  
>>  *   The `arch` option allows access to the pre-defined architectural events.
>>  
>> -*   The `rtm-abort` boolean controls a trade-off between working Restricted
>> -    Transactional Memory, and working performance counters.
>> -
>> -    All processors released to date (Q1 2019) supporting Transactional Memory
>> -    Extensions suffer an erratum which has been addressed in microcode.
>> -
>> -    Processors based on the Skylake microarchitecture with up-to-date
>> -    microcode internally use performance counter 3 to work around the erratum.
>> -    A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
>> -    instruction is executed.
>> -
>> -    An alternative mode exists where PCR3 behaves as before, at the cost of
>> -    `XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
>> -    activate this alternative mode.
>> +*   The `rtm-abort` boolean has been superseded.  Use `tsx=0` instead.
>>  
>>  *Warning:*
>>  As the virtualisation is not 100% safe, don't use the vpmu flag on
>> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
>> index 37439071d9..abf8e206d7 100644
>> --- a/xen/arch/x86/cpu/intel.c
>> +++ b/xen/arch/x86/cpu/intel.c
>> @@ -356,9 +356,6 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
>>  	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
>>  		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
>>  
>> -	if (cpu_has_tsx_force_abort && opt_rtm_abort)
>> -		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
>> -
>>  	probe_c3_errata(c);
>>  }
>>  
>> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
>> index d8659c63f8..16e91a3694 100644
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -49,7 +49,6 @@ CHECK_pmu_params;
>>  static unsigned int __read_mostly opt_vpmu_enabled;
>>  unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
>>  unsigned int __read_mostly vpmu_features = 0;
>> -bool __read_mostly opt_rtm_abort;
>>  
>>  static DEFINE_SPINLOCK(vpmu_lock);
>>  static unsigned vpmu_count;
>> @@ -79,7 +78,8 @@ static int __init parse_vpmu_params(const char *s)
>>          else if ( !cmdline_strcmp(s, "arch") )
>>              vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
>>          else if ( (val = parse_boolean("rtm-abort", s, ss)) >= 0 )
>> -            opt_rtm_abort = val;
>> +            printk(XENLOG_WARNING
>> +                   "'rtm-abort=<bool>' superseded.  Use 'tsx=<bool>' instead\n");
>>          else
>>              rc = -EINVAL;
>>  
>> diff --git a/xen/arch/x86/tsx.c b/xen/arch/x86/tsx.c
>> index 98ecb71a4a..338191df7f 100644
>> --- a/xen/arch/x86/tsx.c
>> +++ b/xen/arch/x86/tsx.c
>> @@ -6,7 +6,9 @@
>>   * Valid values:
>>   *   1 => Explicit tsx=1
>>   *   0 => Explicit tsx=0
>> - *  -1 => Default, implicit tsx=1, may change to 0 to mitigate TAA
>> + *  -1 => Default, altered to 0/1 (if unspecified) by:
>> + *                 - TAA heuristics/settings for speculative safety
>> + *                 - "TSX vs PCR3" select for TSX memory ordering safety
>>   *  -3 => Implicit tsx=1 (feed-through from spec-ctrl=0)
>>   *
>>   * This is arranged such that the bottom bit encodes whether TSX is actually
>> @@ -50,6 +52,26 @@ void tsx_init(void)
>>  
>>          cpu_has_tsx_ctrl = !!(caps & ARCH_CAPS_TSX_CTRL);
>>  
>> +        if ( cpu_has_tsx_force_abort )
>> +        {
>> +            /*
>> +             * On an early TSX-enable Skylake part subject to the memory
>> +             * ordering erratum, with at least the March 2019 microcode.
>> +             */
>> +
>> +            /*
>> +             * If no explicit tsx= option is provided, pick a default.
>> +             *
>> +             * This deliberately overrides the implicit opt_tsx=-3 from
>> +             * `spec-ctrl=0` because:
>> +             * - parse_spec_ctrl() ran before any CPU details where know.
>> +             * - We now know we're running on a CPU not affected by TAA (as
>> +             *   TSX_FORCE_ABORT is enumerated).
>> +             */
>> +            if ( opt_tsx < 0 )
>> +                opt_tsx = 1;
>> +        }
>> +
>>          /*
>>           * The TSX features (HLE/RTM) are handled specially.  They both
>>           * enumerate features but, on certain parts, have mechanisms to be
>> @@ -75,6 +97,12 @@ void tsx_init(void)
>>          }
>>      }
>>  
>> +    /*
>> +     * Note: MSR_TSX_CTRL is enumerated on TSX-enabled MDS_NO and later parts.
>> +     * MSR_TSX_FORCE_ABORT is enumerated on TSX-enabled pre-MDS_NO Skylake
>> +     * parts only.  The two features are on a disjoint set of CPUs, and not
>> +     * offered to guests by hypervisors.
>> +     */
>>      if ( cpu_has_tsx_ctrl )
>>      {
>>          uint32_t hi, lo;
>> @@ -90,9 +118,28 @@ void tsx_init(void)
>>  
>>          wrmsr(MSR_TSX_CTRL, lo, hi);
>>      }
>> +    else if ( cpu_has_tsx_force_abort )
>> +    {
>> +        /*
>> +         * On an early TSX-enable Skylake part subject to the memory ordering
>> +         * erratum, with at least the March 2019 microcode.
>> +         */
>> +        uint32_t hi, lo;
>> +
>> +        rdmsr(MSR_TSX_FORCE_ABORT, lo, hi);
>> +
>> +        /* Check bottom bit only.  Higher bits are various sentinels. */
>> +        rtm_disabled = !(opt_tsx & 1);
> I think you also calculate rtm_disabled in the previous if case
> (cpu_has_tsx_ctrl), maybe that could be pulled out?

rtm_disabled needs to not become true if !cpu_has_tsx_ctrl ||
!cpu_has_tsx_force_abort

Otherwise we'll default to disabling the CPUID bits even systems when we
can't we can't actually control TSX behaviour.

~Andrew


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-27 18:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-27 13:25 [PATCH 0/3] x86: Fixes to TSX handling Andrew Cooper
2021-05-27 13:25 ` [PATCH 1/3] x86/cpuid: Rework HLE and RTM handling Andrew Cooper
2021-05-27 15:03   ` Jan Beulich
2021-05-27 15:20   ` Roger Pau Monné
2021-05-27 13:25 ` [PATCH 2/3] x86/tsx: Minor cleanup and improvements Andrew Cooper
2021-05-27 15:06   ` Jan Beulich
2021-05-27 15:40   ` Roger Pau Monné
2021-05-27 13:25 ` [PATCH 3/3] x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead Andrew Cooper
2021-05-27 15:11   ` Jan Beulich
2021-05-27 17:24   ` Roger Pau Monné
2021-05-27 18:33     ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).