Subject: Xen Security Advisory 297 v1 (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091) - Microarchitectural Data Sampling speculative side channel
From: Xen.org security team @ 2019-05-14 17:00 UTC
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 11777 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

 Xen Security Advisory CVE-2018-12126,CVE-2018-12127,CVE-2018-12130,CVE-2019-11091 / XSA-297

       Microarchitectural Data Sampling speculative side channel

ISSUE DESCRIPTION
=================

Microarchitectural Data Sampling refers to a group of speculative
sidechannel vulnerabilities.  They consist of:

 * CVE-2018-12126 - MSBDS - Microarchitectural Store Buffer Data Sampling
 * CVE-2018-12127 - MLPDS - Microarchitectural Load Port Data Sampling
 * CVE-2018-12130 - MFBDS - Microarchitectural Fill Buffer Data Sampling
 * CVE-2019-11091 - MDSUM - Microarchitectural Data Sampling Uncacheable Memory

These issues pertain to the Load Ports, Store Buffers and Fill Buffers
in the pipeline.  The Load Ports are used to service all memory reads.
The Store Buffers service all in-flight speculative writes (including IO
Port writes), while the Fill Buffers service all memory writes which are
post-retirement, and no longer speculative.

Under certain circumstances, a later load which takes a fault or assist
(a condition internal to the processor, e.g. setting a pagetable Access
or Dirty bit) may be forwarded stale data from these buffers during
speculative execution, which may then be leaked via a sidechannel.

MDSUM (Uncacheable Memory) is a special case of the other three.
Previously, the use of uncacheable memory was believed to be safe
against speculative sidechannels.

For more details, see:
  https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html

IMPACT
======

An attacker, which could include a malicious untrusted user process on a
trusted guest, or an untrusted guest, can sample the content of
recently-used memory operands and IO Port writes.

This can include data from:

 * A previously executing context (process, or guest, or
   hypervisor/toolstack) at the same privilege level.
 * A higher privilege context (kernel, hypervisor, SMM) which
   interrupted the attacker's execution.

Vulnerable data is that on the same physical core as the attacker.  This
includes, when hyper-threading is enabled, adjacent threads.

An attacker cannot use this vulnerability to target specific data.  An
attack would likely require sampling over a period of time and the
application of statistical methods to reconstruct interesting data.

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

Only x86 processors are vulnerable.
ARM processors are not believed to be vulnerable.

Only Intel based processors are potentially affected.  Processors from
other manufacturers (eg, AMD) are not believed to be vulnerable.

Please consult the Intel Security Advisory for details on the affected
processors, and which are getting microcode updates.

MITIGATION
==========

This issue can be mitigated with a combination of software, firmware and
configuration changes.

Note that some affected processors are not expected to receive microcode
updates.  For these processors, no mitigation is available, and users
with workloads of concern should consider moving those workloads to
other hardware.

RESOLUTION
==========

New microcode, and possibly a new firmware image is required to prevent
SMM data from being leaked with this vulnerability.  Consult your
hardware vendor.  The microcode update alone may be packaged for
boot-time loading.  Consult your dom0 OS vendor.
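
For illustration only (file names, paths and module layout here are
placeholders; real entries are distribution-specific), a GRUB2 entry
which asks Xen to scan the dom0 initrd for an early-microcode cpio
archive via the `ucode=scan` command line option might look like:

  multiboot2 /boot/xen.gz ucode=scan
  module2    /boot/vmlinuz root=/dev/xvda1
  module2    /boot/initrd.img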

Software updates to Xen (details below) are required to prevent data
leakage from Xen into lower privileged contexts.

Leakage of data from Xen or other guests can only be prevented entirely
by disabling hyper-threading (if available and active in the BIOS), and
by applying the patches to Xen.

The Xen patches use the MD_CLEAR feature (available in the new
microcode) on every exit to guest.  On affected hardware, MD_CLEAR is
used by default (controlled by `spec-ctrl=[no-]md-clear`), subject to
microcode availability.

Note: For compatibility with development versions of this fix,
`spec-ctrl=[no-]mds` is also accepted on Xen 4.12 and earlier as an
alias.  Consult your vendor's documentation in preference to this
advisory.
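
As an illustrative GRUB2 fragment (placeholder paths; real entries are
distribution-specific), the option is appended to Xen's own command
line, in one form or the other:

  multiboot2 /boot/xen.gz spec-ctrl=md-clear     # explicitly enable
  multiboot2 /boot/xen.gz spec-ctrl=no-md-clear  # explicitly disable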

The availability of the MD_CLEAR functionality in microcode is reported
by Xen at boot, e.g.:

  [root@localhost ~]# xl dmesg | grep MD_CLEAR
  (XEN)   Hardware features: IBRS/IBPB STIBP L1D_FLUSH SSBD MD_CLEAR
  (XEN)   Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR
  (XEN)   Support for PV VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR

SMT/Hyper-Threading is not disabled by default because doing so is
expected to be too disruptive to existing configurations.  See the
discussion concerning SMT/Hyper-Threading below.

Guest software updates will be required to prevent data leakage within
the guest, by making use of the new MD_CLEAR functionality at suitable
points.  Consult the vendors of all software used in the guest.
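
As one example of such a check, a Linux guest running a sufficiently
recent kernel reports its MDS mitigation state via sysfs; output along
the following lines indicates that the guest kernel is making use of
VERW (other guest operating systems have their own equivalents):

  [root@guest ~]# cat /sys/devices/system/cpu/vulnerabilities/mds
  Mitigation: Clear CPU buffers; SMT Host state unknown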

xsa297/xsa297-unstable-*.patch      xen-unstable
xsa297/xsa297-4.12-*.patch          Xen 4.12.x
xsa297/xsa297-4.11-*.patch          Xen 4.11.x
xsa297/xsa297-4.10-*.patch          Xen 4.10.x
xsa297/xsa297-4.9-*.patch           Xen 4.9.x
xsa297/xsa297-4.8-*.patch           Xen 4.8.x
xsa297/xsa297-4.7-*.patch           Xen 4.7.x

$ sha256sum xsa297* xsa297*/*
242a4c9b24aeeac14b5f3e5069cd964c061072c52e63c3050a20986868a87a11  xsa297.meta
f45354f35ba1726aa274443394e4e1669a518df3c8b62ec2ca24d4bfcf2f4b8e  xsa297/xsa297-unstable-1.patch
e5a0cbcbd07e5fb20d7dcdd14011281d19baa9c0d958520deede59f11aec3e46  xsa297/xsa297-unstable-2.patch
c055646eabeb03f6c6d62e95e37fe705d7716a47936dbd65671bedaddad895f7  xsa297/xsa297-unstable-3.patch
9c2d4aa8122d3f8c109b32d707dae5d236f8df14e25db5db97743c29432bab75  xsa297/xsa297-unstable-4.patch
af7758b080579e130a9f1bcc5dc6b4a67699a09a5d89280cc2f6c45e26d23ed7  xsa297/xsa297-4.7-1.patch
15d214c38db3f240687a06a07725d31929c942201a07b988b20ba9766d325321  xsa297/xsa297-4.7-2.patch
ab16d609822712b6110867cd0179ade0f05ccb36e975cb8f7e711497b47813f5  xsa297/xsa297-4.7-3.patch
b292aff757800ad8fa37b4db0035a2cf9a580ff8c99816059e0aba9e925f573d  xsa297/xsa297-4.7-4.patch
15556852f8f62c4cb6019dd7731df31983b16076de6730a7a66defd3d0594927  xsa297/xsa297-4.7-5.patch
5308bbcc58b4811ff05fce60899e071e4fe3c9cade792fcc5c1c1b6d51001132  xsa297/xsa297-4.7-6.patch
e9752a539d103c465342a1f87c92dcc8a099f34fc668413f7f134174f2bffc68  xsa297/xsa297-4.7-7.patch
9f22cbc56de9fb439c97aa48952ee0a0ebf92b78c135934270c40840961ddf89  xsa297/xsa297-4.7-8.patch
b56828a7fb74529cc1cac152e2ee69df8de53c09676c14ca24823daf990b193a  xsa297/xsa297-4.8-1.patch
34b04ff0a463c211b30c41048f4a448bd8d83123b426b4de695034b46c7b0952  xsa297/xsa297-4.8-2.patch
4db2dc7aa3dc3fd1ed30160bd686c9a4890351688feafd7e6e3317438bccd561  xsa297/xsa297-4.8-3.patch
febf5797b67133f089baa39d710366eefff04d25438dbda051d33cdb78f59004  xsa297/xsa297-4.8-4.patch
eb75795273cca3c29a883bdb499e8e33237ad016c41080e5eb6f9755334e35c9  xsa297/xsa297-4.8-5.patch
601cd6c6c805bf9200702806e6e807db04406f15487191d1d95c4521c5f3f860  xsa297/xsa297-4.8-6.patch
dc3ef28a7b92788eb06a88d44ea8a1f1b72ef859ff46f6ff7c30376ae0ec17df  xsa297/xsa297-4.8-7.patch
a5dbc30dcc4bed0bf7ea77b73d9dab0c82e5efdc0a93569617b8e7e5221b3ca5  xsa297/xsa297-4.8-8.patch
10c9ec26d23360de805bc4439aee5766c22f12f6229b8ee5ea045d44cec79267  xsa297/xsa297-4.9-1.patch
34b04ff0a463c211b30c41048f4a448bd8d83123b426b4de695034b46c7b0952  xsa297/xsa297-4.9-2.patch
da0d8c5f171379698f9a0cb030c2dda8f37153016da929457f318e4f8a0ee2bb  xsa297/xsa297-4.9-3.patch
151822d5d60bfebc5df61de9a8593f504f6c25cf973a28dcb85e85c0416a0aed  xsa297/xsa297-4.9-4.patch
eb75795273cca3c29a883bdb499e8e33237ad016c41080e5eb6f9755334e35c9  xsa297/xsa297-4.9-5.patch
6636a0c92be892090522317c2b2145d2764168b50d4c1d964d2d52b3f33f54cf  xsa297/xsa297-4.9-6.patch
226de265e8ebb535711f94fb9cc78802f97d092348fb6fb707c15d557659bae9  xsa297/xsa297-4.9-7.patch
44b05fe2fe7d340a6ebfdd61d3d59fb5255410269051b9bcf5bbf400e6712189  xsa297/xsa297-4.9-8.patch
d05db34a8ee61f6762bb94b7c5c976364a3869fa7b70782a22dfb3c563c9d6c5  xsa297/xsa297-4.10-1.patch
6075a134a941cc83a9a7b2e428d7111857ab9bc5f3f75935a2f5d4cb7c7aca0d  xsa297/xsa297-4.10-2.patch
cc35f2c5d5c39120cbea0808b0772e9a3568fb45324023eb3192e9d4bea5dee0  xsa297/xsa297-4.10-3.patch
eb75795273cca3c29a883bdb499e8e33237ad016c41080e5eb6f9755334e35c9  xsa297/xsa297-4.10-4.patch
5f2d3ac26e4fbde707cc7c5d398a3550f05ca4814d60b5ce9d5b6d95e7243a68  xsa297/xsa297-4.10-5.patch
9942f973f341a0d7e439f679fcdbcecc9828f6ca50e8b48ba914b5b40688039b  xsa297/xsa297-4.10-6.patch
c2e33bd9cdd696a71eae276b20eea8dc7e13e6013380ab20635a9931dd2bf1d4  xsa297/xsa297-4.10-7.patch
87e79a9c2a8fd385b3536b2935293dc3efcaca31523b8f2780be5d62cce4248f  xsa297/xsa297-4.11-1.patch
57a63c120053a6bf76c1005735186ad477ea736354a2f60d5517b205f553f600  xsa297/xsa297-4.11-2.patch
bb98de0ccdc74b6db6045cf4869da3788c83eaec12e4a9292293c409f94cd473  xsa297/xsa297-4.11-3.patch
eb75795273cca3c29a883bdb499e8e33237ad016c41080e5eb6f9755334e35c9  xsa297/xsa297-4.11-4.patch
d8aa296014cb15107668e351fd383e1dfd9af56a3d109725f7b5e1d9a2a7cf81  xsa297/xsa297-4.11-5.patch
a28fb9f1cc84b8f91e2b53e629c7729bab60b41675720eb169d8172d9edf58f0  xsa297/xsa297-4.11-6.patch
8310358fb547c306b35762bf8b389700560e277f8d58f780f7215b3f78dfa4cc  xsa297/xsa297-4.11-7.patch
fd3a8d83645799e832767a43647b1255e2e0240ba9c40c6f661d27a08e880770  xsa297/xsa297-4.12-1.patch
6cf827a557c1993db9f7f1c08a8efc3e2ccb955fe2d59c0cadb1b3294d9bfed8  xsa297/xsa297-4.12-2.patch
14d47c1b448c3e81f773e7f762e6af682285d61ce8fade8902f52340cb77efab  xsa297/xsa297-4.12-3.patch
891c181ce03b765795534e20004679bb102150d98c8db2b879a2cabe60689089  xsa297/xsa297-4.12-4.patch
1d17219413ca150e4815469481d19f044b1abcf3dc7f6824caccbc85b3a99702  xsa297/xsa297-4.12-5.patch
3fb437b18d9cba432c424331a083da52b3ad1ebf81fed3413315ee1d8117ca40  xsa297/xsa297-4.12-6.patch
71a8bfd370269bcd3520e8969a9c4fbedb75035e3eb4b12347ac08d67ccf91e0  xsa297/xsa297-4.12-7.patch
$
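
As a sketch of applying these patches to a source tree (using Xen 4.12
as an example, with the patches unpacked alongside the tree;
`xsa297.sums` is a hypothetical file holding the checksum list above):

  $ sha256sum --check xsa297.sums
  $ cd xen-4.12.0
  $ for p in ../xsa297/xsa297-4.12-*.patch; do patch -p1 < "$p"; done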

DISCUSSION CONCERNING SMT/HYPER-THREADING
=========================================

An attacker can perform an MDS-based attack from userspace, with
unprivileged instructions.  As the leakage occurs from stale data
latched in buffers in the pipeline, the only defence is to flush the
buffers before moving to a less privileged context.

On affected processors, when hyper-threading is enabled, the in-flight
memory content of other threads can be observed from whatever context
is executing.  This could be a different piece of userspace, or the
guest kernel, or Xen.  It could also be from a vcpu belonging to a
different guest.

Work is ongoing on xen-devel to develop core-aware scheduling, which
will mitigate the cross-domain leak by ensuring that vcpus from
different domains are never concurrently scheduled on sibling threads.
However, this alone will not prevent cross-privilege level leakage from
within the same domain, including leakage from Xen.

If you have any untrusted code running in VMs, and need to prevent the
risk of data leakage, the only available option at the moment is to
disable hyper-threading.  This is preferably done in the BIOS, but can
also be done by Xen at boot time by specifying `smt=no` on the command
line.
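
For instance (an illustrative fragment again, with placeholder paths):

  multiboot2 /boot/xen.gz smt=no spec-ctrl=md-clear

After boot, the effect can be cross-checked with `xl info`, whose
threads_per_core field should report 1 when SMT is disabled.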

NOTE REGARDING LACK OF EMBARGO
==============================

Despite an attempt to organise predisclosure, the discoverers
ultimately did not authorise one.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAlza5AgMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZYnYH/RMXGBcI79B0qD3lG19j8b//j8iDLskTHSkRg6HU
3UXMu1aAEqtgP3yoDl00vDKBMqfNygduQt5mf+nPgM3QDJi0tU9bkUjq+K+u5GQN
wHj6HXneAtv7pQNo+VCO7JIXg0/XFVC/qLA44Edw9V3rZIq++LzWNQAtXmHs6xlx
0D2gq7B3hsz0ZkM7og9uPCFmkvwXbjq63f2SUmBw6u95lUThu3BDSkG87/HcqBDb
n9MhvcIsy8cfvVt4A47UIPXm48MVQfH5a2Q8yTRK+Ix59vJ2QzAXn8N4WtksI0u6
YDgDihr7yU5xuQAU593Gbs5hOm9eDyTSY2jy0bwPlZb/Nf0=
=XbSi
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa297.meta --]
[-- Type: application/octet-stream, Size: 3631 bytes --]

{
  "XSA": 297,
  "SupportedVersions": [
    "master",
    "4.12",
    "4.11",
    "4.10",
    "4.9",
    "4.8",
    "4.7"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.10": {
      "Recipes": {
        "xen": {
          "StableRef": "aa6978c2688f28e5fc55c960bbfe5e64f9105f84",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.10-1.patch",
            "xsa297/xsa297-4.10-2.patch",
            "xsa297/xsa297-4.10-3.patch",
            "xsa297/xsa297-4.10-4.patch",
            "xsa297/xsa297-4.10-5.patch",
            "xsa297/xsa297-4.10-6.patch",
            "xsa297/xsa297-4.10-7.patch"
          ]
        }
      }
    },
    "4.11": {
      "Recipes": {
        "xen": {
          "StableRef": "3b062f5040a103d86b44c5e8412ff9555b00d06c",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.11-1.patch",
            "xsa297/xsa297-4.11-2.patch",
            "xsa297/xsa297-4.11-3.patch",
            "xsa297/xsa297-4.11-4.patch",
            "xsa297/xsa297-4.11-5.patch",
            "xsa297/xsa297-4.11-6.patch",
            "xsa297/xsa297-4.11-7.patch"
          ]
        }
      }
    },
    "4.12": {
      "Recipes": {
        "xen": {
          "StableRef": "fd2a34c9655acecaaa1541dd84fc670936303175",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.12-1.patch",
            "xsa297/xsa297-4.12-2.patch",
            "xsa297/xsa297-4.12-3.patch",
            "xsa297/xsa297-4.12-4.patch",
            "xsa297/xsa297-4.12-5.patch",
            "xsa297/xsa297-4.12-6.patch",
            "xsa297/xsa297-4.12-7.patch"
          ]
        }
      }
    },
    "4.7": {
      "Recipes": {
        "xen": {
          "StableRef": "7c8db58d3739c805f4c0f773b65157f306b00c2a",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.7-1.patch",
            "xsa297/xsa297-4.7-2.patch",
            "xsa297/xsa297-4.7-3.patch",
            "xsa297/xsa297-4.7-4.patch",
            "xsa297/xsa297-4.7-5.patch",
            "xsa297/xsa297-4.7-6.patch",
            "xsa297/xsa297-4.7-7.patch",
            "xsa297/xsa297-4.7-8.patch"
          ]
        }
      }
    },
    "4.8": {
      "Recipes": {
        "xen": {
          "StableRef": "e9d860f1f657a198d990bdae3e295001bd19223c",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.8-1.patch",
            "xsa297/xsa297-4.8-2.patch",
            "xsa297/xsa297-4.8-3.patch",
            "xsa297/xsa297-4.8-4.patch",
            "xsa297/xsa297-4.8-5.patch",
            "xsa297/xsa297-4.8-6.patch",
            "xsa297/xsa297-4.8-7.patch",
            "xsa297/xsa297-4.8-8.patch"
          ]
        }
      }
    },
    "4.9": {
      "Recipes": {
        "xen": {
          "StableRef": "63d9330ba9fdec7c8e9346e6d85360747d61c947",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-4.9-1.patch",
            "xsa297/xsa297-4.9-2.patch",
            "xsa297/xsa297-4.9-3.patch",
            "xsa297/xsa297-4.9-4.patch",
            "xsa297/xsa297-4.9-5.patch",
            "xsa297/xsa297-4.9-6.patch",
            "xsa297/xsa297-4.9-7.patch",
            "xsa297/xsa297-4.9-8.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "99bb45e684283b3bc621dbc99b1b93c856b4dd1c",
          "Prereqs": [],
          "Patches": [
            "xsa297/xsa297-unstable-1.patch",
            "xsa297/xsa297-unstable-2.patch",
            "xsa297/xsa297-unstable-3.patch",
            "xsa297/xsa297-unstable-4.patch"
          ]
        }
      }
    }
  }
}

[-- Attachment #3: xsa297/xsa297-unstable-1.patch --]
[-- Type: application/octet-stream, Size: 2149 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index a5b5651..daabede 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -69,6 +69,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -76,8 +78,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE("", "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -86,13 +89,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE("", "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* __ASSEMBLY__ */

[-- Attachment #4: xsa297/xsa297-unstable-2.patch --]
[-- Type: application/octet-stream, Size: 7444 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d9ff372..fdf4cfc 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -483,7 +483,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 66c3f05..4e3656f 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -211,6 +211,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
 
         {"avx512-4vnniw",0x00000007,  0, CPUID_REG_EDX,  2,  1},
         {"avx512-4fmaps",0x00000007,  0, CPUID_REG_EDX,  3,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index ee74d86..9d93eee 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -154,6 +154,7 @@ static const char *const str_7d0[32] =
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
     [ 4] = "fsrm",
 
+    [10] = "md-clear",
     /* 12 */                [13] = "tsx-force-abort",
 
     [18] = "pconfig",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index cb170ac..6f59325 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -29,7 +29,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index be1424f..d566d9b 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -297,17 +297,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPS_IBRS_ALL)              ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPS_RDCL_NO)               ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -346,23 +348,25 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 389f95f..637259b 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -51,6 +51,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 2bcc548..55231d4 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */

[-- Attachment #5: xsa297/xsa297-unstable-3.patch --]
[-- Type: application/octet-stream, Size: 6217 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 052228c..33930ce 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -110,6 +110,7 @@ void __dummy__(void)
     BLANK();
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 1d0bf6f..996f89d 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -32,3 +32,6 @@ XEN_CPUFEATURE(SC_RSB_PV,       (FSCAPINTS+0)*32+18) /* RSB overwrite needed for
 XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for HVM */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+23) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+24) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+25) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5bd64b2..f3508c3 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -38,6 +38,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index daabede..1339ddd 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -61,6 +61,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -81,6 +88,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -99,6 +122,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush if necessary will be performed on the return to guest path.
+     */
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 803f7ce..c60093b 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -241,12 +241,16 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

[-- Attachment #6: xsa297/xsa297-unstable-4.patch --]
[-- Type: application/octet-stream, Size: 12740 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers.  The Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures are flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) are
used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SP only, and parts with any other combination of vulnerabilities.

 * SP only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index fdf4cfc..1fc1802 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1903,7 +1903,7 @@ not be able to control the state of the mitigation.
 By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush,l1tf-barrier}=<bool> ]`
 
@@ -1927,9 +1927,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1938,6 +1939,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index d566d9b..5d98cac 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -36,6 +36,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -61,6 +63,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_spec_ctrl(const char *s)
 {
     const char *ss;
@@ -98,6 +103,8 @@ static int __init parse_spec_ctrl(const char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -120,11 +127,13 @@ static int __init parse_spec_ctrl(const char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -136,6 +145,11 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -323,7 +337,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -334,6 +348,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "",
            opt_l1tf_barrier                          ? " L1TF_BARRIER" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
@@ -744,6 +759,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should be not vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -937,6 +1053,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivelent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

[-- Attachment #7: xsa297/xsa297-4.7-1.patch --]
[-- Type: application/octet-stream, Size: 8272 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/tsx: Implement controls for RTM force-abort mode

The CPUID bit and MSR are deliberately not exposed to guests, because they
won't exist on newer processors.  As vPMU isn't security supported, the
misbehaviour of PCR3 isn't expected to impact production deployments.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6be613f29b4205349275d24367bd4c82fb2960dd
master date: 2019-03-12 17:05:21 +0000

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3855cd3..b620768 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1715,7 +1715,7 @@ Use Virtual Processor ID support if available.  This prevents the need for TLB
 flushes on VM entry and exit, increasing performance.
 
 ### vpmu
-> `= ( <boolean> | { bts | ipc | arch [, ...] } )`
+> `= ( <boolean> | { bts | ipc | arch | rtm-abort=<bool> [, ...] } )`
 
 > Default: `off`
 
@@ -1741,6 +1741,21 @@ in the Pre-Defined Architectural Performance Events table from the Intel 64
 and IA-32 Architectures Software Developer's Manual, Volume 3B, System
 Programming Guide, Part 2.
 
+vpmu=rtm-abort controls a trade-off between working Restricted Transactional
+Memory, and working performance counters.
+
+All processors released to date (Q1 2019) supporting Transactional Memory
+Extensions suffer an erratum which has been addressed in microcode.
+
+Processors based on the Skylake microarchitecture with up-to-date
+microcode internally use performance counter 3 to work around the erratum.
+A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
+instruction is executed.
+
+An alternative mode exists where PCR3 behaves as before, at the cost of
+`XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
+activate this alternative mode.
+
 If a boolean is not used, combinations of flags are allowed, comma separated.
 For example, vpmu=arch,bts.
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 184f8ad..288fc48 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,7 +149,11 @@ static const char *str_e8b[32] =
 
 static const char *str_7d0[32] =
 {
-    [0 ... 25] = "REZ",
+    [0 ... 11] = "REZ",
+
+    [12] = "REZ",           [13] = "tsx-force-abort",
+
+    [14 ... 25] = "REZ",
 
     [26] = "ibrsb",         [27] = "stibp",
     [28] = "l1d_flush",     [29] = "arch_caps",
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 181e815..62c1449 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -355,6 +355,9 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
 	if (c->x86 == 6 && cpu_has_clflush &&
 	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
 		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
+
+	if (cpu_has_tsx_force_abort && opt_rtm_abort)
+		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
 }
 
 
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 2f9ddf6..5635a97 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -53,6 +53,7 @@ CHECK_pmu_params;
 static unsigned int __read_mostly opt_vpmu_enabled;
 unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
 unsigned int __read_mostly vpmu_features = 0;
+bool_t __read_mostly opt_rtm_abort;
 static void parse_vpmu_params(char *s);
 custom_param("vpmu", parse_vpmu_params);
 
@@ -63,6 +64,8 @@ static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
 
 static int parse_vpmu_param(char *s, unsigned int len)
 {
+    int val;
+
     if ( !*s || !len )
         return 0;
     if ( !strncmp(s, "bts", len) )
@@ -71,6 +74,8 @@ static int parse_vpmu_param(char *s, unsigned int len)
         vpmu_features |= XENPMU_FEATURE_IPC_ONLY;
     else if ( !strncmp(s, "arch", len) )
         vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
+    else if ( (val = parse_boolean("rtm-abort", s, s + len)) >= 0 )
+        opt_rtm_abort = val;
     else
         return 1;
     return 0;
@@ -97,6 +102,10 @@ static void __init parse_vpmu_params(char *s)
                 break;
             p = sep + 1;
         }
+
+        if ( !vpmu_features ) /* rtm-abort doesn't imply vpmu=1 */
+            break;
+
         /* fall through */
     case 1:
         /* Default VPMU mode */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 522430b..ce22756 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3797,6 +3797,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_PRED_CMD:
     case MSR_FLUSH_CMD:
         /* Write-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_SPEC_CTRL:
@@ -4026,6 +4028,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
 
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_AMD64_NB_CFG:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 583936e..4b257fb 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2904,6 +2904,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_INTEL_PLATFORM_INFO:
         case MSR_ARCH_CAPABILITIES:
             /* The MSR is read-only. */
+        case MSR_TSX_FORCE_ABORT:
+            /* Not offered to guests. */
             goto fail;
 
         case MSR_SPEC_CTRL:
@@ -3075,6 +3077,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_PRED_CMD:
         case MSR_FLUSH_CMD:
             /* Write-only */
+        case MSR_TSX_FORCE_ABORT:
+            /* Not offered to guests. */
             goto fail;
 
         case MSR_SPEC_CTRL:
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f9c8335..c5001b4 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -85,6 +85,7 @@
 #define cpu_has_avx             boot_cpu_has(X86_FEATURE_AVX)
 #define cpu_has_lwp             boot_cpu_has(X86_FEATURE_LWP)
 #define cpu_has_mpx             boot_cpu_has(X86_FEATURE_MPX)
+#define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_rdtscp          boot_cpu_has(X86_FEATURE_RDTSCP)
 #define cpu_has_svm		boot_cpu_has(X86_FEATURE_SVM)
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 76d92d8..c2aa36e 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -50,6 +50,9 @@
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
 
+#define MSR_TSX_FORCE_ABORT             0x0000010f
+#define TSX_FORCE_ABORT_RTM             (_AC(1, ULL) <<  0)
+
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_A_PERFCTR0		0x000004c1
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
index 75b1973..0072d12 100644
--- a/xen/include/asm-x86/vpmu.h
+++ b/xen/include/asm-x86/vpmu.h
@@ -127,6 +127,7 @@ static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 extern unsigned int vpmu_mode;
 extern unsigned int vpmu_features;
+extern bool_t opt_rtm_abort;
 
 /* Context switch */
 static inline void vpmu_switch_from(struct vcpu *prev)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index f5fb483..73de870 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -227,6 +227,7 @@ XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A! STIBP */
 XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */

[-- Attachment #8: xsa297/xsa297-4.7-2.patch --]
[-- Type: application/octet-stream, Size: 4162 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 6145c98..1343e0a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -201,6 +201,68 @@ static int __init parse_spec_ctrl(char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPABILITIES_RDCL_NO;
+
+    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
+        opt_xpti = 0;
+    else
+        opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+}
+
+static __init int parse_xpti(char *s)
+{
+    char *ss;
+    int val, rc = 0;
+
+    /* Inhibit the defaults as an explicit choice has been given. */
+    if ( opt_xpti == -1 )
+        opt_xpti = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
+        {
+        case 0:
+            opt_xpti = 0;
+            break;
+
+        case 1:
+            opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti = (opt_xpti & ~OPT_XPTI_DOM0) |
+                           (val ? OPT_XPTI_DOM0 : 0);
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti = (opt_xpti & ~OPT_XPTI_DOMU) |
+                           (val ? OPT_XPTI_DOMU : 0);
+            else
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf = -1;
 
 static __init int parse_pv_l1tf(char *s)
@@ -639,68 +701,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPABILITIES_RDCL_NO;
-
-    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
-        opt_xpti = 0;
-    else
-        opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
-}
-
-static __init int parse_xpti(char *s)
-{
-    char *ss;
-    int val, rc = 0;
-
-    /* Inhibit the defaults as an explicit choice has been given. */
-    if ( opt_xpti == -1 )
-        opt_xpti = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
-
-    do {
-        ss = strchr(s, ',');
-        if ( ss )
-            *ss = '\0';
-
-        switch ( parse_bool(s) )
-        {
-        case 0:
-            opt_xpti = 0;
-            break;
-
-        case 1:
-            opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti = (opt_xpti & ~OPT_XPTI_DOM0) |
-                           (val ? OPT_XPTI_DOM0 : 0);
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti = (opt_xpti & ~OPT_XPTI_DOMU) |
-                           (val ? OPT_XPTI_DOMU : 0);
-            else
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

[-- Attachment #9: xsa297/xsa297-4.7-3.patch --]
[-- Type: application/octet-stream, Size: 2193 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and will
remain unchanged until the next system reset.

It is a read only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
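
For illustration, a minimal sketch of how the new definitions decompose the
MSR value, using Xen's existing rdmsrl()/MASK_EXTR() helpers (the same
pattern a later patch in this series uses):

    uint64_t val;
    unsigned int cores, threads;

    rdmsrl(MSR_INTEL_CORE_THREAD_COUNT, val);

    /* Bits 15:0 report the number of active threads in the package. */
    threads = MASK_EXTR(val, MSR_CTC_THREAD_MASK);

    /* Bits 31:16 report the number of active cores in the package. */
    cores = MASK_EXTR(val, MSR_CTC_CORE_MASK);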

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ce22756..93cb71e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4026,6 +4026,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         wrmsrl(MSR_FLUSH_CMD, msr_content);
         break;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
     case MSR_TSX_FORCE_ABORT:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 4b257fb..2bc05f1 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2901,6 +2901,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                 wrmsrl(regs->_ecx, msr_content);
             break;
 
+        case MSR_INTEL_CORE_THREAD_COUNT:
         case MSR_INTEL_PLATFORM_INFO:
         case MSR_ARCH_CAPABILITIES:
             /* The MSR is read-only. */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index c2aa36e..f25f38c 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -31,6 +31,10 @@
 #define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSE		(1<<_EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

[-- Attachment #10: xsa297/xsa297-4.7-4.patch --]
[-- Type: application/octet-stream, Size: 4388 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user which has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which has in
practice existed since Nehalem, when booting on real hardware.  Fall back to
using the ACPI table APIC IDs.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
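
A worked example of the ACPI fallback below (assumed values, for
illustration only): with x86_num_siblings == 2, the thread id occupies the
low bit of each APIC ID, so the cores enumerate as

    APIC ID 0 -> core 0, thread 0
    APIC ID 1 -> core 0, thread 1   (non-zero thread id => SMT active)
    APIC ID 2 -> core 1, thread 0

and finding any odd APIC ID in the ACPI tables is sufficient to conclude
that SMT is enabled.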

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index bc16f7e..dbdf740 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -502,7 +502,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 1343e0a..9e368a9 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -395,6 +395,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            opt_pv_l1tf & OPT_PV_L1TF_DOMU  ? "enabled"  : "disabled");
 }
 
+static bool_t __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return 0;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return 1;
+
+    return 0;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool_t __init retpoline_safe(uint64_t caps)
 {
@@ -704,12 +743,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool_t use_spec_ctrl = 0, ibrs = 0;
+    bool_t use_spec_ctrl = 0, ibrs = 0, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -886,8 +927,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && hw_smt_enabled )
     {
         printk("******************************************************\n");
         printk("Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n");

[-- Attachment #11: xsa297/xsa297-4.7-5.patch --]
[-- Type: application/octet-stream, Size: 2359 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of memory clobber with a further
   barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index ee7f18d..631fc36 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -65,6 +65,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -72,10 +74,10 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", %c3)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
-                      "i" (X86_FEATURE_SC_MSR_IDLE)
-                   : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", %c3,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
+                      "i" (X86_FEATURE_SC_MSR_IDLE));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -84,15 +86,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", %c3)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
-                      "i" (X86_FEATURE_SC_MSR_IDLE)
-                   : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", %c3,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
+                      "i" (X86_FEATURE_SC_MSR_IDLE));
+    barrier();
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */

[-- Attachment #12: xsa297/xsa297-4.7-6.patch --]
[-- Type: application/octet-stream, Size: 7320 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
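
Since the feature is advertised purely via CPUID, no new interface is needed
for a guest to detect it.  A minimal guest-side sketch (plain C using GCC's
<cpuid.h>; the bit position matches the libxl table below):

    #include <cpuid.h>
    #include <stdbool.h>

    /* CPUID.(EAX=7,ECX=0):EDX bit 10 advertises MD_CLEAR. */
    static bool has_md_clear(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
            return false;

        return edx & (1u << 10);
    }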

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index b620768..20863b2 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -442,7 +442,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index a677922..35c80b3 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -158,6 +158,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"de",           0x00000001, NA, CPUID_REG_EDX,  2,  1},
         {"vme",          0x00000001, NA, CPUID_REG_EDX,  1,  1},
         {"fpu",          0x00000001, NA, CPUID_REG_EDX,  0,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 288fc48..ba4823f 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,8 +149,9 @@ static const char *str_e8b[32] =
 
 static const char *str_7d0[32] =
 {
-    [0 ... 11] = "REZ",
+    [0 ... 9] = "REZ",
 
+    [10] = "md-clear",      [11] = "REZ",
     [12] = "REZ",           [13] = "tsx-force-abort",
 
     [14 ... 25] = "REZ",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 99778bb..7d7217f 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -27,7 +27,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9e368a9..b2d1757 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -325,17 +325,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPABILITIES_IBRS_ALL)      ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPABILITIES_RDCL_NO)       ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -372,19 +374,21 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Alternatives blocks for protecting against and/or virtualising
      * mitigation support for guests.
      */
-    printk("  Support for VMs: PV:%s%s%s%s, HVM:%s%s%s%s\n",
+    printk("  Support for VMs: PV:%s%s%s%s%s, HVM:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s\n",
            opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index f25f38c..7668969 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -50,6 +50,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 73de870..4282943 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -227,6 +227,7 @@ XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A! STIBP */

[-- Attachment #13: xsa297/xsa297-4.7-7.patch --]
[-- Type: application/octet-stream, Size: 6474 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
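
The flushing side effect exists only for the memory-operand encoding of
VERW; the register form performs the segment check without flushing.  A
minimal standalone sketch of the instruction usage (hedged; the patch itself
drives this through the alternatives framework instead):

    /* Any writeable data selector works; Xen uses __HYPERVISOR_DS32. */
    static unsigned short verw_sel = 0x10; /* example value only */

    static inline void md_clear_flush(void)
    {
        /* VERW sets ZF, hence the condition-code clobber. */
        asm volatile ( "verw %0" :: "m" (verw_sel) : "cc" );
    }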

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 994b23b..2efeeec 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -137,6 +137,7 @@ void __dummy__(void)
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index c5001b4..ec65630 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -36,6 +36,9 @@
 #define X86_FEATURE_SC_RSB_HVM		((FSCAPINTS+0)*32+ 17) /* RSB overwrite needed for HVM */
 #define X86_FEATURE_NO_XPTI		((FSCAPINTS+0)*32+ 18) /* XPTI mitigation not in use */
 #define X86_FEATURE_SC_MSR_IDLE		((FSCAPINTS+0)*32+ 19) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
+#define X86_FEATURE_SC_VERW_PV		((FSCAPINTS+0)*32+ 20) /* VERW used by Xen for PV */
+#define X86_FEATURE_SC_VERW_HVM		((FSCAPINTS+0)*32+ 21) /* VERW used by Xen for HVM */
+#define X86_FEATURE_SC_VERW_IDLE	((FSCAPINTS+0)*32+ 22) /* VERW used by Xen for idle */
 
 #define cpufeat_word(idx)	((idx) / 32)
 #define cpufeat_bit(idx)	((idx) % 32)
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 0cf4a7c..8851552 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -39,6 +39,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 631fc36..0cc9eb9 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -57,6 +57,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -78,6 +85,23 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
                       "i" (X86_FEATURE_SC_MSR_IDLE));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input(ASM_NOP8, "verw %[sel]", %c[feat],
+                      [sel] "m" (info->verw_sel),
+                      [feat] "i" (X86_FEATURE_SC_VERW_IDLE));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -97,6 +121,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0),
                       "i" (X86_FEATURE_SC_MSR_IDLE));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush if necessary will be performed on the return to guest path.
+     */
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 4d864eb..560306f 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -247,12 +247,18 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

[-- Attachment #14: xsa297/xsa297-4.7-8.patch --]
[-- Type: application/octet-stream, Size: 13927 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers.  The Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures are flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) are
used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SP only, and parts with any other combination of vulnerabilities.

 * SP only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, the
   Knights range of processors are immune to L1TF (therefore have no
   MSR_FLUSH_CMD), but are vulnerable to MDS, so do require VERW on the HVM
   path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
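
Illustrative command line usage, derived from the documentation and parsing
hunks below:

    spec-ctrl=md-clear=0   # omit the VERW alternatives entirely
    spec-ctrl=hvm=0        # disables md-clear (and msr-sc, rsb) for HVM only
    spec-ctrl=mds=0        # 'mds' is accepted as a compatibility alias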

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 20863b2..9c6cabc 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1514,7 +1514,7 @@ is being interpreted as a custom timeout in milliseconds. Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1538,9 +1538,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1549,6 +1550,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index fc9e379..8a3e9f1 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -439,6 +439,15 @@ void identify_cpu(struct cpuinfo_x86 *c)
 		if (test_bit(X86_FEATURE_SC_MSR_IDLE,
 			     boot_cpu_data.x86_capability))
 			__set_bit(X86_FEATURE_SC_MSR_IDLE, c->x86_capability);
+		if (test_bit(X86_FEATURE_SC_VERW_PV,
+			     boot_cpu_data.x86_capability))
+			__set_bit(X86_FEATURE_SC_VERW_PV, c->x86_capability);
+		if (test_bit(X86_FEATURE_SC_VERW_HVM,
+			     boot_cpu_data.x86_capability))
+			__set_bit(X86_FEATURE_SC_VERW_HVM, c->x86_capability);
+		if (test_bit(X86_FEATURE_SC_VERW_IDLE,
+			     boot_cpu_data.x86_capability))
+			__set_bit(X86_FEATURE_SC_VERW_IDLE, c->x86_capability);
 
 		/* AND the already accumulated flags with these */
 		for ( i = 0 ; i < NCAPINTS ; i++ )
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index b2d1757..7d37382 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -32,6 +32,8 @@ static bool_t __initdata opt_msr_sc_pv = 1;
 static bool_t __initdata opt_msr_sc_hvm = 1;
 static bool_t __initdata opt_rsb_pv = 1;
 static bool_t __initdata opt_rsb_hvm = 1;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -56,6 +58,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool_t __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool_t __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool_t __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -128,6 +133,8 @@ static int __init parse_spec_ctrl(char *s)
         disable_common:
             opt_rsb_pv = 0;
             opt_rsb_hvm = 0;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -150,11 +157,13 @@ static int __init parse_spec_ctrl(char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -166,6 +175,12 @@ static int __init parse_spec_ctrl(char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -351,7 +366,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -361,7 +376,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf )
@@ -744,6 +760,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = 1;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = 1;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = 1;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = 1;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = 1;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = 1;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -940,6 +1057,50 @@ void __init init_speculation_mitigations(void)
         printk("******************************************************\n");
     }
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivalent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        __set_bit(X86_FEATURE_SC_VERW_PV, boot_cpu_data.x86_capability);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        __set_bit(X86_FEATURE_SC_VERW_IDLE, boot_cpu_data.x86_capability);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        __set_bit(X86_FEATURE_SC_VERW_HVM, boot_cpu_data.x86_capability);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+    {
+        printk("******************************************************\n");
+        printk("Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n");
+        printk("enabled.  Mitigations will not be fully effective.  Please\n");
+        printk("choose an explicit smt=<bool> setting.  See XSA-297.\n");
+        printk("******************************************************\n");
+    }
+
     print_details(thunk, caps);
 
     /*

[-- Attachment #15: xsa297/xsa297-4.8-1.patch --]
[-- Type: application/octet-stream, Size: 8285 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/tsx: Implement controls for RTM force-abort mode

The CPUID bit and MSR are deliberately not exposed to guests, because they
won't exist on newer processors.  As vPMU isn't security supported, the
misbehaviour of PCR3 isn't expected to impact production deployments.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6be613f29b4205349275d24367bd4c82fb2960dd
master date: 2019-03-12 17:05:21 +0000
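
Illustrative command line usage, derived from the parsing and documentation
changes below (note the patch's comment that rtm-abort alone does not switch
vPMU on):

    vpmu=rtm-abort       # performance counter 3 usable; XBEGIN always aborts
    vpmu=arch,rtm-abort  # combine with other vpmu flags

When enabled on hardware advertising TSX_FORCE_ABORT, Xen sets
MSR_TSX_FORCE_ABORT.RTM_ABORT (MSR 0x10f, bit 0) on each CPU as part of the
Intel errata workarounds.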

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 1c826ef..0bf6852 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1781,7 +1781,7 @@ Use Virtual Processor ID support if available.  This prevents the need for TLB
 flushes on VM entry and exit, increasing performance.
 
 ### vpmu
-> `= ( <boolean> | { bts | ipc | arch [, ...] } )`
+> `= ( <boolean> | { bts | ipc | arch | rtm-abort=<bool> [, ...] } )`
 
 > Default: `off`
 
@@ -1807,6 +1807,21 @@ in the Pre-Defined Architectural Performance Events table from the Intel 64
 and IA-32 Architectures Software Developer's Manual, Volume 3B, System
 Programming Guide, Part 2.
 
+vpmu=rtm-abort controls a trade-off between working Restricted Transactional
+Memory, and working performance counters.
+
+All processors released to date (Q1 2019) supporting Transactional Memory
+Extensions suffer an erratum which has been addressed in microcode.
+
+Processors based on the Skylake microarchitecture with up-to-date
+microcode internally use performance counter 3 to work around the erratum.
+A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
+instruction is executed.
+
+An alternative mode exists where PCR3 behaves as before, at the cost of
+`XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
+activate this alternative mode.
+
 If a boolean is not used, combinations of flags are allowed, comma separated.
 For example, vpmu=arch,bts.
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 184f8ad..288fc48 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,7 +149,11 @@ static const char *str_e8b[32] =
 
 static const char *str_7d0[32] =
 {
-    [0 ... 25] = "REZ",
+    [0 ... 11] = "REZ",
+
+    [12] = "REZ",           [13] = "tsx-force-abort",
+
+    [14 ... 25] = "REZ",
 
     [26] = "ibrsb",         [27] = "stibp",
     [28] = "l1d_flush",     [29] = "arch_caps",
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index fe54720..a9a8ec5 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -356,6 +356,9 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
 	if (c->x86 == 6 && cpu_has_clflush &&
 	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
 		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
+
+	if (cpu_has_tsx_force_abort && opt_rtm_abort)
+		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
 }
 
 
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 2f9ddf6..91cb43e 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -53,6 +53,7 @@ CHECK_pmu_params;
 static unsigned int __read_mostly opt_vpmu_enabled;
 unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
 unsigned int __read_mostly vpmu_features = 0;
+bool __read_mostly opt_rtm_abort;
 static void parse_vpmu_params(char *s);
 custom_param("vpmu", parse_vpmu_params);
 
@@ -63,6 +64,8 @@ static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
 
 static int parse_vpmu_param(char *s, unsigned int len)
 {
+    int val;
+
     if ( !*s || !len )
         return 0;
     if ( !strncmp(s, "bts", len) )
@@ -71,6 +74,8 @@ static int parse_vpmu_param(char *s, unsigned int len)
         vpmu_features |= XENPMU_FEATURE_IPC_ONLY;
     else if ( !strncmp(s, "arch", len) )
         vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
+    else if ( (val = parse_boolean("rtm-abort", s, s + len)) >= 0 )
+        opt_rtm_abort = val;
     else
         return 1;
     return 0;
@@ -97,6 +102,10 @@ static void __init parse_vpmu_params(char *s)
                 break;
             p = sep + 1;
         }
+
+        if ( !vpmu_features ) /* rtm-abort doesn't imply vpmu=1 */
+            break;
+
         /* fall through */
     case 1:
         /* Default VPMU mode */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ca64f6a..9582950 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3936,6 +3936,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_PRED_CMD:
     case MSR_FLUSH_CMD:
         /* Write-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_SPEC_CTRL:
@@ -4179,6 +4181,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
 
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_AMD64_NB_CFG:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 75a1dbf..404bdce 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2517,6 +2517,8 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
     case MSR_PRED_CMD:
     case MSR_FLUSH_CMD:
         /* Write-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         break;
 
     case MSR_SPEC_CTRL:
@@ -2745,6 +2747,8 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* The MSR is read-only. */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         break;
 
     case MSR_SPEC_CTRL:
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 89ff249..370e89e 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -93,6 +93,7 @@ XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+24) /* Xen uses MSR_DEBUGCTL.LB
 #define cpu_has_avx             boot_cpu_has(X86_FEATURE_AVX)
 #define cpu_has_lwp             boot_cpu_has(X86_FEATURE_LWP)
 #define cpu_has_mpx             boot_cpu_has(X86_FEATURE_MPX)
+#define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_rdtscp          boot_cpu_has(X86_FEATURE_RDTSCP)
 #define cpu_has_svm		boot_cpu_has(X86_FEATURE_SVM)
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 7bb382f..29ece6a 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,9 @@
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
 
+#define MSR_TSX_FORCE_ABORT             0x0000010f
+#define TSX_FORCE_ABORT_RTM             (_AC(1, ULL) <<  0)
+
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_A_PERFCTR0		0x000004c1
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
index 75b1973..37f521f 100644
--- a/xen/include/asm-x86/vpmu.h
+++ b/xen/include/asm-x86/vpmu.h
@@ -127,6 +127,7 @@ static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 extern unsigned int vpmu_mode;
 extern unsigned int vpmu_features;
+extern bool opt_rtm_abort;
 
 /* Context switch */
 static inline void vpmu_switch_from(struct vcpu *prev)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5eedd07..ed6fbfc 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -236,6 +236,7 @@ XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A! STIBP */
 XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */

[-- Attachment #16: xsa297/xsa297-4.8-2.patch --]
[-- Type: application/octet-stream, Size: 4253 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 25da6a2..9665ec5 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -206,6 +206,73 @@ static int __init parse_spec_ctrl(char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPABILITIES_RDCL_NO;
+
+    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 0;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 0;
+    }
+    else
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 1;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 1;
+    }
+}
+
+static __init int parse_xpti(char *s)
+{
+    char *ss;
+    int val, rc = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti_hwdom = opt_xpti_domu = 1;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
+        {
+        case 0:
+            opt_xpti_hwdom = opt_xpti_domu = 0;
+            break;
+
+        case 1:
+            opt_xpti_hwdom = opt_xpti_domu = 1;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti_hwdom = opt_xpti_domu = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti_hwdom = val;
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti_domu = val;
+            else if ( *s )
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
@@ -639,73 +706,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti_hwdom = -1;
-int8_t __read_mostly opt_xpti_domu = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPABILITIES_RDCL_NO;
-
-    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 0;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 0;
-    }
-    else
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 1;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 1;
-    }
-}
-
-static __init int parse_xpti(char *s)
-{
-    char *ss;
-    int val, rc = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti_hwdom = opt_xpti_domu = 1;
-
-    do {
-        ss = strchr(s, ',');
-        if ( ss )
-            *ss = '\0';
-
-        switch ( parse_bool(s) )
-        {
-        case 0:
-            opt_xpti_hwdom = opt_xpti_domu = 0;
-            break;
-
-        case 1:
-            opt_xpti_hwdom = opt_xpti_domu = 1;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti_hwdom = opt_xpti_domu = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti_hwdom = val;
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti_domu = val;
-            else if ( *s )
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

[-- Attachment #17: xsa297/xsa297-4.8-3.patch --]
[-- Type: application/octet-stream, Size: 2207 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and will
remain unchanged until the next system reset.

It is a read only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9582950..23d6f09 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4179,6 +4179,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         wrmsrl(MSR_FLUSH_CMD, msr_content);
         break;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
     case MSR_TSX_FORCE_ABORT:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 404bdce..232d1b0 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2744,6 +2744,7 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
             wrmsrl(reg, val);
         return X86EMUL_OKAY;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* The MSR is read-only. */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 29ece6a..54f3a66 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -34,6 +34,10 @@
 #define EFER_KNOWN_MASK		(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
 				 EFER_SVME | EFER_LMSLE | EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

[-- Attachment #18: xsa297/xsa297-4.8-4.patch --]
[-- Type: application/octet-stream, Size: 4400 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which has in
practice existed since Nehalem, when booting on real hardware.  Fall back to
using the ACPI table APIC IDs.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 62fcc3b..cb5a322 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -502,7 +502,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9665ec5..2a7267c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -400,6 +400,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            opt_pv_l1tf_domu  ? "enabled"  : "disabled");
 }
 
+static bool __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return false;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return true;
+
+    return false;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool __init retpoline_safe(uint64_t caps)
 {
@@ -709,12 +748,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false;
+    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -887,8 +928,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && hw_smt_enabled )
         warning_add(
             "Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n"
             "enabled.  Please assess your configuration and choose an\n"

[-- Attachment #19: xsa297/xsa297-4.8-5.patch --]
[-- Type: application/octet-stream, Size: 2181 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier.
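
As a sketch of why the extra barrier is wanted (illustrative; barrier() is
the usual empty asm statement in Xen): alternative_input() does not carry
the "memory" clobber which the open-coded asm volatile form had, so an
explicit compiler barrier restores the ordering guarantee.

    /* Prevents the compiler from reordering or caching memory accesses
     * across the patch site, standing in for the lost clobber. */
    #define barrier() asm volatile ( "" ::: "memory" )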

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index c846354..4983071 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -61,6 +61,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -68,8 +70,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -78,13 +81,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */

[-- Attachment #20: xsa297/xsa297-4.8-6.patch --]
[-- Type: application/octet-stream, Size: 7310 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.
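
As a sketch of guest-side use (hypothetical code, not from this series):
once MD_CLEAR (CPUID.7[0].edx[10]) is visible, a guest kernel flushes the
affected buffers by executing VERW with a memory operand referencing a
writeable data segment selector.  The selector value below is an assumption.

    static inline void guest_md_clear(void)
    {
        static const uint16_t sel = 0x10; /* assumed: kernel data segment */

        /* VERW only sets ZF architecturally, hence the "cc" clobber. */
        asm volatile ( "verw %0" :: "m" (sel) : "cc" );
    }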

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 0bf6852..107761d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -456,7 +456,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index d0f4eeb..20d0602 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -158,6 +158,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"de",           0x00000001, NA, CPUID_REG_EDX,  2,  1},
         {"vme",          0x00000001, NA, CPUID_REG_EDX,  1,  1},
         {"fpu",          0x00000001, NA, CPUID_REG_EDX,  0,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 288fc48..ba4823f 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,8 +149,9 @@ static const char *str_e8b[32] =
 
 static const char *str_7d0[32] =
 {
-    [0 ... 11] = "REZ",
+    [0 ... 9] = "REZ",
 
+    [10] = "md-clear",      [11] = "REZ",
     [12] = "REZ",           [13] = "tsx-force-abort",
 
     [14 ... 25] = "REZ",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 99778bb..7d7217f 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -27,7 +27,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 2a7267c..4e6558a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -330,17 +330,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPABILITIES_IBRS_ALL)      ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPABILITIES_RDCL_NO)       ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -377,19 +379,21 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Alternatives blocks for protecting against and/or virtualising
      * mitigation support for guests.
      */
-    printk("  Support for VMs: PV:%s%s%s%s, HVM:%s%s%s%s\n",
+    printk("  Support for VMs: PV:%s%s%s%s%s, HVM:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 54f3a66..92d10e2 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index ed6fbfc..ad4b40b 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -236,6 +236,7 @@ XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A! STIBP */

[-- Attachment #21: xsa297/xsa297-4.8-7.patch --]
[-- Type: application/octet-stream, Size: 6512 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.
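
The "convenient hole" follows from the field ordering visible in the diff
below: on 64-bit, the 4-byte processor_id is followed by a pointer needing
8-byte alignment, so 4 bytes of padding already existed.  A layout sketch
(illustrative, not actual Xen code):

    struct cpu_info_fragment {
        unsigned int processor_id; /* 4 bytes */
        unsigned int verw_sel;     /* occupies what was 4 bytes of padding */
        void *current_vcpu;        /* 8-byte aligned; offsets unchanged */
    };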

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index a170673..4b7bc40 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -137,6 +137,7 @@ void __dummy__(void)
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 370e89e..6057d95 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -32,6 +32,9 @@ XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+21) /* RSB overwrite needed for
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+22) /* XPTI mitigation not in use */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+23) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+24) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+25) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+26) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+27) /* VERW used by Xen for idle */
 
 #define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 1c2799d..adecb5c 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -39,6 +39,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 4983071..98a0a50 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -53,6 +53,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -73,6 +80,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input(ASM_NOP8, "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -91,6 +114,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads'
+     * store buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush, if necessary, will be performed on the return-to-guest path.
+     */
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 4d864eb..560306f 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -247,12 +247,18 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

[-- Attachment #22: xsa297/xsa297-4.8-8.patch --]
[-- Type: application/octet-stream, Size: 12919 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers.  The Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures are flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For a concise overview of the issues and the default logic, the abbreviations
SP (Store Buffer), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading)
are used:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SP only, and parts with any other combination of vulnerabilities.

 * SP only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR, in case the
microcode has been updated but the feature bit is not exposed.
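
Usage sketches (hypothetical command lines, limited to the options this
patch introduces; the parenthetical effects follow from the parsing logic
in the diff below):

    spec-ctrl=md-clear=0   (disable VERW flushing; other defaults kept)
    spec-ctrl=mds=0        (identical effect, via the compatibility alias)
    spec-ctrl=hvm=0        (disable md-clear, msr-sc and rsb for HVM guests)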

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 107761d..87ca056 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1586,7 +1586,7 @@ is being interpreted as a custom timeout in milliseconds. Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1610,9 +1610,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1621,6 +1622,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4e6558a..558626c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -33,6 +33,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -57,6 +59,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -133,6 +138,8 @@ static int __init parse_spec_ctrl(char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -155,11 +162,13 @@ static int __init parse_spec_ctrl(char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -171,6 +180,12 @@ static int __init parse_spec_ctrl(char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -356,7 +371,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -366,7 +381,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -749,6 +765,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -938,6 +1055,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivalent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

[-- Attachment #23: xsa297/xsa297-4.9-1.patch --]
[-- Type: application/octet-stream, Size: 8249 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/tsx: Implement controls for RTM force-abort mode

The CPUID bit and MSR are deliberately not exposed to guests, because they
won't exist on newer processors.  As vPMU isn't security supported, the
misbehaviour of PCR3 isn't expected to impact production deployments.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6be613f29b4205349275d24367bd4c82fb2960dd
master date: 2019-03-12 17:05:21 +0000
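
Usage sketch (hypothetical command lines; note that, per the parsing change
below, rtm-abort deliberately does not imply vpmu=1):

    vpmu=rtm-abort         (restore PCR3; XBEGIN aborts unconditionally)
    vpmu=arch,rtm-abort    (additionally enable the architectural vPMU set)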

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 6009450..10ed971 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1885,7 +1885,7 @@ Use Virtual Processor ID support if available.  This prevents the need for TLB
 flushes on VM entry and exit, increasing performance.
 
 ### vpmu
-> `= ( <boolean> | { bts | ipc | arch [, ...] } )`
+> `= ( <boolean> | { bts | ipc | arch | rtm-abort=<bool> [, ...] } )`
 
 > Default: `off`
 
@@ -1911,6 +1911,21 @@ in the Pre-Defined Architectural Performance Events table from the Intel 64
 and IA-32 Architectures Software Developer's Manual, Volume 3B, System
 Programming Guide, Part 2.
 
+vpmu=rtm-abort controls a trade-off between working Restricted Transactional
+Memory, and working performance counters.
+
+All processors released to date (Q1 2019) supporting Transactional Memory
+Extensions suffer an erratum which has been addressed in microcode.
+
+Processors based on the Skylake microarchitecture with up-to-date
+microcode internally use performance counter 3 to work around the erratum.
+A consequence is that the counter gets reprogrammed whenever an `XBEGIN`
+instruction is executed.
+
+An alternative mode exists where PCR3 behaves as before, at the cost of
+`XBEGIN` unconditionally aborting.  Enabling `rtm-abort` mode will
+activate this alternative mode.
+
 If a boolean is not used, combinations of flags are allowed, comma separated.
 For example, vpmu=arch,bts.
 
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index d6e60be..702c072 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -157,7 +157,11 @@ static const char *str_7d0[32] =
 
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
 
-    [4 ... 25] = "REZ",
+    [4 ... 11] = "REZ",
+
+    [12] = "REZ",           [13] = "tsx-force-abort",
+
+    [14 ... 25] = "REZ",
 
     [26] = "ibrsb",         [27] = "stibp",
     [28] = "l1d_flush",     [29] = "arch_caps",
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index a7c0d49..449273c 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -356,6 +356,9 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
 	if (c->x86 == 6 && cpu_has_clflush &&
 	    (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
 		__set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
+
+	if (cpu_has_tsx_force_abort && opt_rtm_abort)
+		wrmsrl(MSR_TSX_FORCE_ABORT, TSX_FORCE_ABORT_RTM);
 }
 
 
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 40da7e3..8b7a7a9 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -53,6 +53,7 @@ CHECK_pmu_params;
 static unsigned int __read_mostly opt_vpmu_enabled;
 unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
 unsigned int __read_mostly vpmu_features = 0;
+bool __read_mostly opt_rtm_abort;
 static void parse_vpmu_params(char *s);
 custom_param("vpmu", parse_vpmu_params);
 
@@ -63,6 +64,8 @@ static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
 
 static int parse_vpmu_param(char *s, unsigned int len)
 {
+    int val;
+
     if ( !*s || !len )
         return 0;
     if ( !strncmp(s, "bts", len) )
@@ -71,6 +74,8 @@ static int parse_vpmu_param(char *s, unsigned int len)
         vpmu_features |= XENPMU_FEATURE_IPC_ONLY;
     else if ( !strncmp(s, "arch", len) )
         vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
+    else if ( (val = parse_boolean("rtm-abort", s, s + len)) >= 0 )
+        opt_rtm_abort = val;
     else
         return 1;
     return 0;
@@ -97,6 +102,10 @@ static void __init parse_vpmu_params(char *s)
                 break;
             p = sep + 1;
         }
+
+        if ( !vpmu_features ) /* rtm-abort doesn't imply vpmu=1 */
+            break;
+
         /* fall through */
     case 1:
         /* Default VPMU mode */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0cbb0f5..346f1cf 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3425,6 +3425,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_PRED_CMD:
     case MSR_FLUSH_CMD:
         /* Write-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_SPEC_CTRL:
@@ -3647,6 +3649,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
 
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         goto gp_fault;
 
     case MSR_AMD64_NB_CFG:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2f9f75f..8635f70 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2640,6 +2640,8 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
     case MSR_PRED_CMD:
     case MSR_FLUSH_CMD:
         /* Write-only */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         break;
 
     case MSR_SPEC_CTRL:
@@ -2861,6 +2863,8 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* The MSR is read-only. */
+    case MSR_TSX_FORCE_ABORT:
+        /* Not offered to guests. */
         break;
 
     case MSR_SPEC_CTRL:
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 5043231..b10d8ef 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -100,6 +100,9 @@
 /* CPUID level 0x80000007.edx */
 #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
 
+/* CPUID level 0x00000007:0.edx */
+#define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
+
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_cpuid_faulting  boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 17722d2..8dec40e 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,9 @@
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
 
+#define MSR_TSX_FORCE_ABORT             0x0000010f
+#define TSX_FORCE_ABORT_RTM             (_AC(1, ULL) <<  0)
+
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_A_PERFCTR0		0x000004c1
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
index 5e778ab..1287b9f 100644
--- a/xen/include/asm-x86/vpmu.h
+++ b/xen/include/asm-x86/vpmu.h
@@ -125,6 +125,7 @@ static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 extern unsigned int vpmu_mode;
 extern unsigned int vpmu_features;
+extern bool opt_rtm_abort;
 
 /* Context switch */
 static inline void vpmu_switch_from(struct vcpu *prev)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index e1a2c4e..33b515e 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -241,6 +241,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
 XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */

[-- Attachment #24: xsa297/xsa297-4.9-2.patch --]
[-- Type: application/octet-stream, Size: 4253 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 25da6a2..9665ec5 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -206,6 +206,73 @@ static int __init parse_spec_ctrl(char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPABILITIES_RDCL_NO;
+
+    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 0;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 0;
+    }
+    else
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 1;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 1;
+    }
+}
+
+static __init int parse_xpti(char *s)
+{
+    char *ss;
+    int val, rc = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti_hwdom = opt_xpti_domu = 1;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
+        {
+        case 0:
+            opt_xpti_hwdom = opt_xpti_domu = 0;
+            break;
+
+        case 1:
+            opt_xpti_hwdom = opt_xpti_domu = 1;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti_hwdom = opt_xpti_domu = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti_hwdom = val;
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti_domu = val;
+            else if ( *s )
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
@@ -639,73 +706,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti_hwdom = -1;
-int8_t __read_mostly opt_xpti_domu = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPABILITIES_RDCL_NO;
-
-    if ( caps & ARCH_CAPABILITIES_RDCL_NO )
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 0;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 0;
-    }
-    else
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 1;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 1;
-    }
-}
-
-static __init int parse_xpti(char *s)
-{
-    char *ss;
-    int val, rc = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti_hwdom = opt_xpti_domu = 1;
-
-    do {
-        ss = strchr(s, ',');
-        if ( ss )
-            *ss = '\0';
-
-        switch ( parse_bool(s) )
-        {
-        case 0:
-            opt_xpti_hwdom = opt_xpti_domu = 0;
-            break;
-
-        case 1:
-            opt_xpti_hwdom = opt_xpti_domu = 1;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti_hwdom = opt_xpti_domu = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti_hwdom = val;
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti_domu = val;
-            else if ( *s )
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

[-- Attachment #25: xsa297/xsa297-4.9-3.patch --]
[-- Type: application/octet-stream, Size: 2207 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de facto constant and will
remain unchanged until the next system reset.

It is a read-only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.
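
A minimal decode sketch (assuming only the MSR and mask definitions added
below; rdmsrl() and MASK_EXTR() are the existing Xen helpers):

    uint64_t val;
    unsigned int nr_threads, nr_cores;

    rdmsrl(MSR_INTEL_CORE_THREAD_COUNT, val);
    nr_threads = MASK_EXTR(val, MSR_CTC_THREAD_MASK); /* bits 15:0 */
    nr_cores   = MASK_EXTR(val, MSR_CTC_CORE_MASK);   /* bits 31:16 */

    /* A later patch in this series treats threads != cores as SMT active. */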

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 346f1cf..0164ae5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3647,6 +3647,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
         wrmsrl(MSR_FLUSH_CMD, msr_content);
         break;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
     case MSR_TSX_FORCE_ABORT:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8635f70..f3c8705 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2860,6 +2860,7 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
             wrmsrl(reg, val);
         return X86EMUL_OKAY;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* The MSR is read-only. */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 8dec40e..bc72476 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -34,6 +34,10 @@
 #define EFER_KNOWN_MASK		(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
 				 EFER_SVME | EFER_LMSLE | EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

[-- Attachment #26: xsa297/xsa297-4.9-4.patch --]
[-- Type: application/octet-stream, Size: 4400 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

When booting on real hardware, make use of the largely-undocumented
MSR_INTEL_CORE_THREAD_COUNT, which in practice has existed since Nehalem.
Fall back to using the APIC IDs from the ACPI tables.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index cb2abda..c61beed 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -501,7 +501,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9665ec5..2a7267c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -400,6 +400,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            opt_pv_l1tf_domu  ? "enabled"  : "disabled");
 }
 
+static bool __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return false;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return true;
+
+    return false;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool __init retpoline_safe(uint64_t caps)
 {
@@ -709,12 +748,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false;
+    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -887,8 +928,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && hw_smt_enabled )
         warning_add(
             "Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n"
             "enabled.  Please assess your configuration and choose an\n"

[-- Attachment #27: xsa297/xsa297-4.9-5.patch --]
[-- Type: application/octet-stream, Size: 2181 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index c846354..4983071 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -61,6 +61,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -68,8 +70,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -78,13 +81,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */

[-- Attachment #28: xsa297/xsa297-4.9-6.patch --]
[-- Type: application/octet-stream, Size: 7428 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 10ed971..a74a995 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -456,7 +456,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index d0f4eeb..20d0602 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -158,6 +158,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"de",           0x00000001, NA, CPUID_REG_EDX,  2,  1},
         {"vme",          0x00000001, NA, CPUID_REG_EDX,  1,  1},
         {"fpu",          0x00000001, NA, CPUID_REG_EDX,  0,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 702c072..72c67d0 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -157,8 +157,9 @@ static const char *str_7d0[32] =
 
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
 
-    [4 ... 11] = "REZ",
+    [4 ... 9] = "REZ",
 
+    [10] = "md-clear",      [11] = "REZ",
     [12] = "REZ",           [13] = "tsx-force-abort",
 
     [14 ... 25] = "REZ",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 0f59cd6..2201f8a 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -28,7 +28,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 2a7267c..4e6558a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -330,17 +330,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPABILITIES_IBRS_ALL)      ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPABILITIES_RDCL_NO)       ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -377,19 +379,21 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Alternatives blocks for protecting against and/or virtualising
      * mitigation support for guests.
      */
-    printk("  Support for VMs: PV:%s%s%s%s, HVM:%s%s%s%s\n",
+    printk("  Support for VMs: PV:%s%s%s%s%s, HVM:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index bc72476..92d9ee7 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 33b515e..000a941 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -241,6 +241,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */

[-- Attachment #29: xsa297/xsa297-4.9-7.patch --]
[-- Type: application/octet-stream, Size: 6443 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 60bad81..0ab1612 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -137,6 +137,7 @@ void __dummy__(void)
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index d06211c..9a322ac 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -35,3 +35,6 @@ XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+21) /* RSB overwrite needed for
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+22) /* XPTI mitigation not in use */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+23) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+24) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+25) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+26) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+27) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 9a137a1..b7860a7 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -38,6 +38,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 4983071..98a0a50 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -53,6 +53,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -73,6 +80,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input(ASM_NOP8, "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -91,6 +114,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads' store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush, if necessary, will be performed on the return-to-guest path.
+     */
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 4d864eb..560306f 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -247,12 +247,18 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

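The flush only occurs for the memory-operand encoding of VERW, which is why
both the C and assembly hunks above pass the selector as a memory operand.
As a minimal sketch of the same idiom outside Xen (assuming MD_CLEAR-updated
microcode; reusing our own %ds as the selector is illustrative only):

#include <stdint.h>

/* Flush the Load Ports, Store Buffers and Fill Buffers on MD_CLEAR
 * microcode.  VERW wants a writeable data selector; reuse our own %ds. */
static inline void md_clear_flush(void)
{
    uint16_t sel;

    asm volatile ( "mov %%ds, %0" : "=m" (sel) );   /* any writeable data selector */
    asm volatile ( "verw %0" :: "m" (sel) : "cc" ); /* memory form does the flush */
}
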
[-- Attachment #30: xsa297/xsa297-4.9-8.patch --]
[-- Type: application/octet-stream, Size: 12919 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural structures: the Load Ports,
the Store Buffers and the Fill Buffers.  Each of these structures is flushed
by the new enhanced VERW functionality, but the conditions under which
flushing is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SB (Store Buffer), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading)
are used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SB only, and parts with any other combination of vulnerabilities.

 * SB only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit is not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a74a995..8522621 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1677,7 +1677,7 @@ is being interpreted as a custom timeout in milliseconds. Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1701,9 +1701,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1712,6 +1713,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4e6558a..558626c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -33,6 +33,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -57,6 +59,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -133,6 +138,8 @@ static int __init parse_spec_ctrl(char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -155,11 +162,13 @@ static int __init parse_spec_ctrl(char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -171,6 +180,12 @@ static int __init parse_spec_ctrl(char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -356,7 +371,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -366,7 +381,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -749,6 +765,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -938,6 +1055,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivalent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

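The model numbers in the switch above are taken from boot_cpu_data, which Xen
fills in from CPUID leaf 1.  As a sketch of the standard Intel signature
decoding (not Xen's actual code):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    unsigned int family, model, stepping;

    __cpuid(1, eax, ebx, ecx, edx);

    stepping = eax & 0xf;
    model    = (eax >> 4) & 0xf;
    family   = (eax >> 8) & 0xf;

    if ( family == 0x6 || family == 0xf )
        model |= ((eax >> 16) & 0xf) << 4;   /* extended model */
    if ( family == 0xf )
        family += (eax >> 20) & 0xff;        /* extended family */

    printf("family %#x model %#x stepping %#x\n", family, model, stepping);
    return 0;
}
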
[-- Attachment #31: xsa297/xsa297-4.10-1.patch --]
[-- Type: application/octet-stream, Size: 4283 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index f586289..b17281b 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -207,6 +207,73 @@ static int __init parse_spec_ctrl(const char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPS_RDCL_NO;
+
+    if ( caps & ARCH_CAPS_RDCL_NO )
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 0;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 0;
+    }
+    else
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 1;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 1;
+    }
+}
+
+static __init int parse_xpti(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti_hwdom = opt_xpti_domu = 1;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        switch ( parse_bool(s, ss) )
+        {
+        case 0:
+            opt_xpti_hwdom = opt_xpti_domu = 0;
+            break;
+
+        case 1:
+            opt_xpti_hwdom = opt_xpti_domu = 1;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti_hwdom = opt_xpti_domu = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti_hwdom = val;
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti_domu = val;
+            else if ( *s )
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
@@ -660,73 +727,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti_hwdom = -1;
-int8_t __read_mostly opt_xpti_domu = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPS_RDCL_NO;
-
-    if ( caps & ARCH_CAPS_RDCL_NO )
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 0;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 0;
-    }
-    else
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 1;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 1;
-    }
-}
-
-static __init int parse_xpti(const char *s)
-{
-    const char *ss;
-    int val, rc = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti_hwdom = opt_xpti_domu = 1;
-
-    do {
-        ss = strchr(s, ',');
-        if ( !ss )
-            ss = strchr(s, '\0');
-
-        switch ( parse_bool(s, ss) )
-        {
-        case 0:
-            opt_xpti_hwdom = opt_xpti_domu = 0;
-            break;
-
-        case 1:
-            opt_xpti_hwdom = opt_xpti_domu = 1;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti_hwdom = opt_xpti_domu = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti_hwdom = val;
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti_domu = val;
-            else if ( *s )
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( *ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

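As a usage example of the parsing above (relying on Xen's usual parse_boolean
semantics, where a bare token means true and a "no-" prefix means false):
booting with "xpti=dom0,no-domu" leaves opt_xpti_hwdom = 1 and
opt_xpti_domu = 0, while "xpti=default" resets both to -1 so that
xpti_init_default() can choose based on RDCL_NO.
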
[-- Attachment #32: xsa297/xsa297-4.10-2.patch --]
[-- Type: application/octet-stream, Size: 2077 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model-specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de facto constant and will remain
unchanged until the next system reset.

It is a read-only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index a20eec4..6853d4c 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -161,6 +161,10 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                _MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
+        /*
+         * TODO: Implement when we have better topology representation.
+    case MSR_INTEL_CORE_THREAD_COUNT:
+         */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -183,6 +187,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
     {
         uint64_t rsvd;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 3b07c2f..4c0ba60 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -34,6 +34,10 @@
 #define EFER_KNOWN_MASK		(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
 				 EFER_SVME | EFER_LMSLE | EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

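To illustrate how the new constants decode MSR_INTEL_CORE_THREAD_COUNT (a
sketch, not Xen code; a real value would come from rdmsr in ring 0, so one is
faked here):

#include <stdint.h>
#include <stdio.h>

#define MSR_CTC_THREAD_MASK 0x0000ffffu   /* threads in the package, bits 15:0 */
#define MSR_CTC_CORE_MASK   0xffff0000u   /* cores in the package, bits 31:16 */

int main(void)
{
    uint64_t val = 0x00040008;  /* hypothetical reading: 4 cores, 8 threads */
    unsigned int threads = val & MSR_CTC_THREAD_MASK;
    unsigned int cores = (val & MSR_CTC_CORE_MASK) >> 16;

    /* SMT is active when the thread count exceeds the core count. */
    printf("cores %u, threads %u, SMT %s\n", cores, threads,
           threads != cores ? "active" : "inactive");
    return 0;
}
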
[-- Attachment #33: xsa297/xsa297-4.10-3.patch --]
[-- Type: application/octet-stream, Size: 4424 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which has
in practice existed since Nehalem, when booting on real hardware.  Fall back
to using the APIC IDs from the ACPI tables.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 76078b5..894b892 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -505,7 +505,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index b17281b..d2cbb93 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -401,6 +401,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            opt_pv_l1tf_domu  ? "enabled"  : "disabled");
 }
 
+static bool __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return false;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return true;
+
+    return false;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool __init retpoline_safe(uint64_t caps)
 {
@@ -730,12 +769,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false;
+    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -911,8 +952,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim && hw_smt_enabled )
         warning_add(
             "Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n"
             "enabled.  Please assess your configuration and choose an\n"

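The ACPI-table fallback in check_smt_enabled() relies on the topology
encoding of APIC IDs: with x86_num_siblings a power of two, the low bits of
each APIC ID form the thread id.  A sketch of that test (hypothetical helper,
not Xen code):

#include <stdbool.h>
#include <stddef.h>

/* Return true if any APIC ID has a non-zero thread-id component, i.e. if
 * SMT is active.  'siblings' is assumed to be a power of two. */
static bool apic_ids_show_smt(const unsigned int *apic_ids, size_t n,
                              unsigned int siblings)
{
    for ( size_t i = 0; i < n; i++ )
        if ( apic_ids[i] & (siblings - 1) )
            return true;

    return false;
}
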
[-- Attachment #34: xsa297/xsa297-4.10-4.patch --]
[-- Type: application/octet-stream, Size: 2181 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index c846354..4983071 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -61,6 +61,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -68,8 +70,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -78,13 +81,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */

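The further barrier() is needed because alternative_input() carries no
"memory" clobber, so without it the compiler could reorder the surrounding
flag updates across the patched instruction.  A sketch of the resulting shape
(assuming barrier() is the usual empty asm with a memory clobber, and with
the WRMSR alternative elided):

#include <stdint.h>

#define barrier() asm volatile ( "" ::: "memory" )

static void enter_idle_shape(volatile uint8_t *spec_ctrl_flags)
{
    *spec_ctrl_flags |= 1;   /* SCF_use_shadow analogue */
    barrier();
    /* alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE, ...) */
    barrier();
}
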
[-- Attachment #35: xsa297/xsa297-4.10-5.patch --]
[-- Type: application/octet-stream, Size: 7364 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 23c3da6..c330078 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -496,7 +496,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 52e16c2..5a1702d 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -202,6 +202,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
 
         {"avx512-4vnniw",0x00000007,  0, CPUID_REG_EDX,  2,  1},
         {"avx512-4fmaps",0x00000007,  0, CPUID_REG_EDX,  3,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 6c91a48..570424d 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -162,8 +162,9 @@ static const char *str_7d0[32] =
 
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
 
-    [4 ... 11] = "REZ",
+    [4 ... 9] = "REZ",
 
+    [10] = "md-clear",      [11] = "REZ",
     [12] = "REZ",           [13] = "tsx-force-abort",
 
     [14 ... 25] = "REZ",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 24b9495..98b63f3 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -28,7 +28,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index d2cbb93..0d8d572 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -331,17 +331,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPS_IBRS_ALL)              ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPS_RDCL_NO)               ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -378,19 +380,21 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Alternatives blocks for protecting against and/or virtualising
      * mitigation support for guests.
      */
-    printk("  Support for VMs: PV:%s%s%s%s, HVM:%s%s%s%s\n",
+    printk("  Support for VMs: PV:%s%s%s%s%s, HVM:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 4c0ba60..e61aac2 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index aa2656d..a14d8a7 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */

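Since MD_CLEAR is enumerated at CPUID leaf 7, sub-leaf 0, EDX bit 10
(matching the libxl table above), a guest can probe for it directly.  A
sketch using GCC's cpuid.h (not Xen code):

#include <stdbool.h>
#include <cpuid.h>

static bool cpu_has_md_clear(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 7, sub-leaf 0; __get_cpuid_count fails on very old CPUs. */
    if ( !__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
        return false;

    return edx & (1u << 10);    /* MD_CLEAR */
}
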
[-- Attachment #36: xsa297/xsa297-4.10-6.patch --]
[-- Type: application/octet-stream, Size: 6443 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 10c243a..428c49c 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -137,6 +137,7 @@ void __dummy__(void)
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 8e5cc53..96a5a01 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -33,3 +33,6 @@ XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+20) /* XPTI mitigation not in use */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+23) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+24) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+25) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 9a137a1..b7860a7 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -38,6 +38,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 4983071..98a0a50 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -53,6 +53,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -73,6 +80,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input(ASM_NOP8, "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -91,6 +114,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads' store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush, if necessary, will be performed on the return-to-guest path.
+     */
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index c659f3f..f2e36a3 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -248,12 +248,18 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE __stringify(ASM_NOP24),                                 \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE __stringify(ASM_NOP8),                                  \
+        __stringify(verw CPUINFO_verw_sel(%rsp)),                       \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

[-- Attachment #37: xsa297/xsa297-4.10-7.patch --]
[-- Type: application/octet-stream, Size: 12937 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural structures: the Load Ports,
the Store Buffers and the Fill Buffers.  Each of these structures is flushed
by the new enhanced VERW functionality, but the conditions under which
flushing is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SB (Store Buffer), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading)
are used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SB only, and parts with any other combination of vulnerabilities.

 * SB only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit is not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index c330078..b79b340 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1782,7 +1782,7 @@ is being interpreted as a custom timeout in milliseconds. Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1806,9 +1806,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1817,6 +1818,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0d8d572..e25dadf 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -34,6 +34,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -58,6 +60,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -134,6 +139,8 @@ static int __init parse_spec_ctrl(const char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -156,11 +163,13 @@ static int __init parse_spec_ctrl(const char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -172,6 +181,12 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -357,7 +372,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -367,7 +382,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -770,6 +786,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Braswell */
+    case 0x4d: /* Avoton / Rangeley (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -962,6 +1079,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivelent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

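To make the new control concrete, here are a few illustrative boot line
fragments as the parse_spec_ctrl() logic above would interpret them (a sketch
only; consult the xen-command-line documentation for your exact version):

    spec-ctrl=md-clear=no    # keep other defaults, opt out of VERW flushing
    spec-ctrl=no-hvm         # also disables MSR_SC, RSB and VERW for HVM guests
    spec-ctrl=mds=no         # compatibility alias, Xen 4.12 and earlier

Each token is handled by parse_boolean(), so the `opt`, `no-opt` and
`opt=<bool>` spellings should behave equivalently.
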
[-- Attachment #38: xsa297/xsa297-4.11-1.patch --]
[-- Type: application/octet-stream, Size: 4283 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 8fa6c10..949bbda 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -223,6 +223,73 @@ static int __init parse_spec_ctrl(const char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPS_RDCL_NO;
+
+    if ( caps & ARCH_CAPS_RDCL_NO )
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 0;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 0;
+    }
+    else
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 1;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 1;
+    }
+}
+
+static __init int parse_xpti(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti_hwdom = opt_xpti_domu = 1;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        switch ( parse_bool(s, ss) )
+        {
+        case 0:
+            opt_xpti_hwdom = opt_xpti_domu = 0;
+            break;
+
+        case 1:
+            opt_xpti_hwdom = opt_xpti_domu = 1;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti_hwdom = opt_xpti_domu = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti_hwdom = val;
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti_domu = val;
+            else if ( *s )
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
@@ -676,73 +743,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti_hwdom = -1;
-int8_t __read_mostly opt_xpti_domu = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPS_RDCL_NO;
-
-    if ( caps & ARCH_CAPS_RDCL_NO )
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 0;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 0;
-    }
-    else
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 1;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 1;
-    }
-}
-
-static __init int parse_xpti(const char *s)
-{
-    const char *ss;
-    int val, rc = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti_hwdom = opt_xpti_domu = 1;
-
-    do {
-        ss = strchr(s, ',');
-        if ( !ss )
-            ss = strchr(s, '\0');
-
-        switch ( parse_bool(s, ss) )
-        {
-        case 0:
-            opt_xpti_hwdom = opt_xpti_domu = 0;
-            break;
-
-        case 1:
-            opt_xpti_hwdom = opt_xpti_domu = 1;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti_hwdom = opt_xpti_domu = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti_hwdom = val;
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti_domu = val;
-            else if ( *s )
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( *ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

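Although the move above is purely mechanical, the relocated parse_xpti() is a
useful reference for how Xen's list-style options behave.  A few illustrative
invocations (a sketch; the exact accepted forms are defined by the code above):

    xpti                  # enable for dom0 and domU alike
    xpti=false            # disable everywhere
    xpti=dom0=0,domu=1    # off for the hardware domain, on for other guests
    xpti=default          # defer to the RDCL_NO heuristics in xpti_init_default()
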
[-- Attachment #39: xsa297/xsa297-4.11-2.patch --]
[-- Type: application/octet-stream, Size: 2077 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and
will remain unchanged until the next system reset.

It is a read-only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index b49fbd8..153f36b 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -180,6 +180,10 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                _MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
+        /*
+         * TODO: Implement when we have better topology representation.
+    case MSR_INTEL_CORE_THREAD_COUNT:
+         */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -202,6 +206,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
     {
         uint64_t rsvd;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 7588fc1..7cddfca 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -34,6 +34,10 @@
 #define EFER_KNOWN_MASK		(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
 				 EFER_SVME | EFER_LMSLE | EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

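A minimal sketch of how the new constants decode, under the assumption that
rdmsr_safe() returns 0 on success as elsewhere in Xen (the helper name here is
illustrative, not part of the patch):

    /* Read MSR 0x35 and split out the package's core/thread counts. */
    static bool read_core_thread_count(unsigned int *cores,
                                       unsigned int *threads)
    {
        uint64_t val;

        if ( rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
            return false;                       /* MSR not implemented. */

        *threads = val & MSR_CTC_THREAD_MASK;        /* Bits 15:0. */
        *cores   = (val & MSR_CTC_CORE_MASK) >> 16;  /* Bits 31:16. */

        return true;
    }
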
[-- Attachment #40: xsa297/xsa297-4.11-3.patch --]
[-- Type: application/octet-stream, Size: 4424 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which has in
practice existed since Nehalem, when booting on real hardware.  Fall back to
using the APIC IDs from the ACPI tables.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 76078b5..894b892 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -505,7 +505,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 949bbda..ac1be4a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -417,6 +417,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            opt_pv_l1tf_domu  ? "enabled"  : "disabled");
 }
 
+static bool __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return false;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return true;
+
+    return false;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool __init retpoline_safe(uint64_t caps)
 {
@@ -746,12 +785,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false;
+    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -927,8 +968,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim && hw_smt_enabled )
         warning_add(
             "Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n"
             "enabled.  Please assess your configuration and choose an\n"

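As a worked example of the MSR comparison above (register values
hypothetical): a package with 8 cores and Hyperthreading active reports
MSR_INTEL_CORE_THREAD_COUNT as 0x00080010, MASK_EXTR() yields 8 cores against
16 threads, the fields differ, and check_smt_enabled() returns true.  With HT
disabled in firmware the same package reports 0x00080008 (8 cores, 8 threads),
the fields match, and the function returns false without consulting the ACPI
fallback.
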
[-- Attachment #41: xsa297/xsa297-4.11-4.patch --]
[-- Type: application/octet-stream, Size: 2181 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index c846354..4983071 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -61,6 +61,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -68,8 +70,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -78,13 +81,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */

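For reference, alternative_input() carries no "memory" clobber, which is why
the explicit barrier() is added above.  A minimal sketch of the usual
definition (matching the common Xen/Linux idiom; shown for illustration only):

    /*
     * Compiler-only barrier: emits no instructions, but prevents the
     * compiler from reordering memory accesses across this point.
     */
    #define barrier() asm volatile ( "" ::: "memory" )
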
[-- Attachment #42: xsa297/xsa297-4.11-5.patch --]
[-- Type: application/octet-stream, Size: 7314 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 8e24380..8260dfb 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -489,7 +489,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 52e16c2..5a1702d 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -202,6 +202,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
 
         {"avx512-4vnniw",0x00000007,  0, CPUID_REG_EDX,  2,  1},
         {"avx512-4fmaps",0x00000007,  0, CPUID_REG_EDX,  3,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 0ac903a..16697c4 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -142,6 +142,7 @@ static const char *str_7d0[32] =
 {
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
 
+    [10] = "md-clear",
     /* 12 */                [13] = "tsx-force-abort",
 
     [26] = "ibrsb",         [27] = "stibp",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 5cc89e2..497bd2a 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -28,7 +28,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index ac1be4a..fdd90a8 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -347,17 +347,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPS_IBRS_ALL)              ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPS_RDCL_NO)               ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -394,19 +396,21 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Alternatives blocks for protecting against and/or virtualising
      * mitigation support for guests.
      */
-    printk("  Support for VMs: PV:%s%s%s%s, HVM:%s%s%s%s\n",
+    printk("  Support for VMs: PV:%s%s%s%s%s, HVM:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 7cddfca..b8151d2 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -53,6 +53,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index aa2656d..a14d8a7 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */

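As the patch above exposes MD_CLEAR to guests as CPUID leaf 7, sub-leaf 0,
EDX bit 10, a guest kernel or diagnostic tool could detect it along these
lines (a sketch using the GCC/Clang cpuid intrinsics; the function name is
illustrative):

    #include <stdbool.h>
    #include <cpuid.h>

    /* Report whether CPUID.(EAX=7,ECX=0):EDX[10] (MD_CLEAR) is set. */
    static bool has_md_clear(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
            return false;

        return edx & (1u << 10);
    }
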
[-- Attachment #43: xsa297/xsa297-4.11-6.patch --]
[-- Type: application/octet-stream, Size: 6287 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 5957c76..97cff49 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -129,6 +129,7 @@ void __dummy__(void)
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 8e5cc53..96a5a01 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -33,3 +33,6 @@ XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+20) /* XPTI mitigation not in use */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+23) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+24) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+25) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5bd64b2..f3508c3 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -38,6 +38,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 4983071..333d180 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -53,6 +53,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -73,6 +80,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -91,6 +114,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input(ASM_NOP3, "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads' store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush, if necessary, will be performed on the return-to-guest path.
+     */
 }
 
 #endif /* !__X86_SPEC_CTRL_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index edace2a..9cc15e7 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -245,12 +245,16 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

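Note the constraint spelled out in spec_ctrl_enter_idle() above: VERW only
performs the flush when given a memory operand.  A standalone sketch of the
same idiom outside the alternatives framework (illustrative only; the in-tree
form is the patched-in alternative shown above):

    #include <stdint.h>

    /*
     * Flush microarchitectural buffers on MD_CLEAR hardware.  The selector
     * must be a memory operand; the register form of VERW does not flush.
     */
    static inline void verw_flush(uint16_t sel)
    {
        asm volatile ( "verw %[sel]" :: [sel] "m" (sel) );
    }
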
[-- Attachment #44: xsa297/xsa297-4.11-7.patch --]
[-- Type: application/octet-stream, Size: 12937 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers.  The Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures are flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SB (Store Buffer), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading)
are used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SB only, and parts with any other combination of vulnerabilities.

 * SB only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit is not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 8260dfb..8108bbf 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1800,7 +1800,7 @@ is being interpreted as a custom timeout in milliseconds. Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1824,9 +1824,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1835,6 +1836,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index fdd90a8..10fcd77 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -34,6 +34,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -58,6 +60,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -150,6 +155,8 @@ static int __init parse_spec_ctrl(const char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -172,11 +179,13 @@ static int __init parse_spec_ctrl(const char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -188,6 +197,12 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -373,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -383,7 +398,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -786,6 +802,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Braswell */
+    case 0x4d: /* Avoton / Rangeley (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -978,6 +1095,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivelent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

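Condensing the enablement logic above, the synthetic features end up forced
as follows (pseudo-notation, restating the code for reference):

    SC_VERW_PV   <-  opt_md_clear_pv
    SC_VERW_IDLE <-  opt_md_clear_pv || opt_md_clear_hvm
    SC_VERW_HVM  <-  opt_md_clear_hvm && !ARCH_CAPS_SKIP_L1DFL && !opt_l1d_flush
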
[-- Attachment #45: xsa297/xsa297-4.12-1.patch --]
[-- Type: application/octet-stream, Size: 4283 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 1171c02..99310c8 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -167,6 +167,73 @@ static int __init parse_spec_ctrl(const char *s)
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
+
+static __init void xpti_init_default(uint64_t caps)
+{
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        caps = ARCH_CAPS_RDCL_NO;
+
+    if ( caps & ARCH_CAPS_RDCL_NO )
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 0;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 0;
+    }
+    else
+    {
+        if ( opt_xpti_hwdom < 0 )
+            opt_xpti_hwdom = 1;
+        if ( opt_xpti_domu < 0 )
+            opt_xpti_domu = 1;
+    }
+}
+
+static __init int parse_xpti(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    /* Interpret 'xpti' alone in its positive boolean form. */
+    if ( *s == '\0' )
+        opt_xpti_hwdom = opt_xpti_domu = 1;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        switch ( parse_bool(s, ss) )
+        {
+        case 0:
+            opt_xpti_hwdom = opt_xpti_domu = 0;
+            break;
+
+        case 1:
+            opt_xpti_hwdom = opt_xpti_domu = 1;
+            break;
+
+        default:
+            if ( !strcmp(s, "default") )
+                opt_xpti_hwdom = opt_xpti_domu = -1;
+            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+                opt_xpti_hwdom = val;
+            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
+                opt_xpti_domu = val;
+            else if ( *s )
+                rc = -EINVAL;
+            break;
+        }
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("xpti", parse_xpti);
+
 int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
 int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
@@ -627,73 +694,6 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
-int8_t __read_mostly opt_xpti_hwdom = -1;
-int8_t __read_mostly opt_xpti_domu = -1;
-
-static __init void xpti_init_default(uint64_t caps)
-{
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        caps = ARCH_CAPS_RDCL_NO;
-
-    if ( caps & ARCH_CAPS_RDCL_NO )
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 0;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 0;
-    }
-    else
-    {
-        if ( opt_xpti_hwdom < 0 )
-            opt_xpti_hwdom = 1;
-        if ( opt_xpti_domu < 0 )
-            opt_xpti_domu = 1;
-    }
-}
-
-static __init int parse_xpti(const char *s)
-{
-    const char *ss;
-    int val, rc = 0;
-
-    /* Interpret 'xpti' alone in its positive boolean form. */
-    if ( *s == '\0' )
-        opt_xpti_hwdom = opt_xpti_domu = 1;
-
-    do {
-        ss = strchr(s, ',');
-        if ( !ss )
-            ss = strchr(s, '\0');
-
-        switch ( parse_bool(s, ss) )
-        {
-        case 0:
-            opt_xpti_hwdom = opt_xpti_domu = 0;
-            break;
-
-        case 1:
-            opt_xpti_hwdom = opt_xpti_domu = 1;
-            break;
-
-        default:
-            if ( !strcmp(s, "default") )
-                opt_xpti_hwdom = opt_xpti_domu = -1;
-            else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-                opt_xpti_hwdom = val;
-            else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-                opt_xpti_domu = val;
-            else if ( *s )
-                rc = -EINVAL;
-            break;
-        }
-
-        s = ss + 1;
-    } while ( *ss );
-
-    return rc;
-}
-custom_param("xpti", parse_xpti);
-
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;

[-- Attachment #46: xsa297/xsa297-4.12-2.patch --]
[-- Type: application/octet-stream, Size: 2078 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and
will remain unchanged until the next system reset.

It is a read-only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 4df4a59..a7f67d9 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -200,6 +200,10 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                                    ARRAY_SIZE(msrs->dr_mask))];
         break;
 
+        /*
+         * TODO: Implement when we have better topology representation.
+    case MSR_INTEL_CORE_THREAD_COUNT:
+         */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -229,6 +233,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
     {
         uint64_t rsvd;
 
+    case MSR_INTEL_CORE_THREAD_COUNT:
     case MSR_INTEL_PLATFORM_INFO:
     case MSR_ARCH_CAPABILITIES:
         /* Read-only */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 11512d4..389f95f 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -32,6 +32,10 @@
 #define EFER_KNOWN_MASK		(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
 				 EFER_SVME | EFER_FFXSE)
 
+#define MSR_INTEL_CORE_THREAD_COUNT     0x00000035
+#define MSR_CTC_THREAD_MASK             0x0000ffff
+#define MSR_CTC_CORE_MASK               0xffff0000
+
 /* Speculation Controls. */
 #define MSR_SPEC_CTRL			0x00000048
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)

[-- Attachment #47: xsa297/xsa297-4.12-3.patch --]
[-- Type: application/octet-stream, Size: 4374 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which has in
practice existed since Nehalem, when booting on real hardware.  Fall back to
using the APIC IDs from the ACPI tables.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index c790416..b1debac 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -507,7 +507,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+                c->x86_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
                 if (c->x86 < 0x17)
                         c->compute_unit_id = ebx & 0xFF;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 99310c8..e49ab3f 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -368,6 +368,45 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
 #endif
 }
 
+static bool __init check_smt_enabled(void)
+{
+    uint64_t val;
+    unsigned int cpu;
+
+    /*
+     * x86_num_siblings defaults to 1 in the absence of other information, and
+     * is adjusted based on other topology information found in CPUID leaves.
+     *
+     * On AMD hardware, it will be the current SMT configuration.  On Intel
+     * hardware, it will represent the maximum capability, rather than the
+     * current configuration.
+     */
+    if ( boot_cpu_data.x86_num_siblings < 2 )
+        return false;
+
+    /*
+     * Intel Nehalem and later hardware does have an MSR which reports the
+     * current count of cores/threads in the package.
+     *
+     * At the time of writing, it is almost completely undocumented, so isn't
+     * virtualised reliably.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && !cpu_has_hypervisor &&
+         !rdmsr_safe(MSR_INTEL_CORE_THREAD_COUNT, val) )
+        return (MASK_EXTR(val, MSR_CTC_CORE_MASK) !=
+                MASK_EXTR(val, MSR_CTC_THREAD_MASK));
+
+    /*
+     * Search over the CPUs reported in the ACPI tables.  Any whose APIC ID
+     * has a non-zero thread id component indicates that SMT is active.
+     */
+    for_each_present_cpu ( cpu )
+        if ( x86_cpu_to_apicid[cpu] & (boot_cpu_data.x86_num_siblings - 1) )
+            return true;
+
+    return false;
+}
+
 /* Calculate whether Retpoline is known-safe on this CPU. */
 static bool __init retpoline_safe(uint64_t caps)
 {
@@ -697,12 +736,14 @@ static __init void l1tf_calculations(uint64_t caps)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false;
+    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
     uint64_t caps = 0;
 
     if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
         rdmsrl(MSR_ARCH_CAPABILITIES, caps);
 
+    hw_smt_enabled = check_smt_enabled();
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -873,8 +914,7 @@ void __init init_speculation_mitigations(void)
      * However, if we are on affected hardware, with HT enabled, and the user
      * hasn't explicitly chosen whether to use HT or not, nag them to do so.
      */
-    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim &&
-         boot_cpu_data.x86_num_siblings > 1 )
+    if ( opt_smt == -1 && cpu_has_bug_l1tf && !pv_shim && hw_smt_enabled )
         warning_add(
             "Booted on L1TF-vulnerable hardware with SMT/Hyperthreading\n"
             "enabled.  Please assess your configuration and choose an\n"

[-- Attachment #48: xsa297/xsa297-4.12-4.patch --]
[-- Type: application/octet-stream, Size: 2149 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of a memory clobber with a
   further barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
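
For reference, the compiler barrier relied on here is conventionally the
empty-asm-with-"memory"-clobber idiom sketched below (the common
definition; Xen's own macro may differ in detail):

    /* Compiler-only barrier: emits no instructions, but the "memory"
     * clobber prevents the compiler reordering memory accesses across it. */
    #define barrier() __asm__ __volatile__("" ::: "memory")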

diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 779da2b..20ee112 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -68,6 +68,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
+     * Branch Target Injection:
+     *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
      */
@@ -75,8 +77,9 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     barrier();
     info->spec_ctrl_flags |= SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE("", "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -85,13 +88,16 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
+     * Branch Target Injection:
+     *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
      */
     info->spec_ctrl_flags &= ~SCF_use_shadow;
     barrier();
-    asm volatile ( ALTERNATIVE("", "wrmsr", X86_FEATURE_SC_MSR_IDLE)
-                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+    alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
+                      "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
+    barrier();
 }
 
 #endif /* __ASSEMBLY__ */

[-- Attachment #49: xsa297/xsa297-4.12-5.patch --]
[-- Type: application/octet-stream, Size: 7443 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data
 Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
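
As an illustration of the enumeration being plumbed through here, MD_CLEAR
lives in CPUID.(EAX=7,ECX=0):EDX bit 10, and can be probed from user space
with a compiler-provided cpuid.h (a standalone sketch, not part of the
patch):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 7, subleaf 0: structured extended feature flags. */
        if ( __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
             (edx & (1u << 10)) )
            printf("MD_CLEAR: VERW flushes microarchitectural buffers\n");
        else
            printf("MD_CLEAR not enumerated\n");

        return 0;
    }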

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 6db82f3..f80d8d8 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -483,7 +483,7 @@ accounting for hardware capabilities as enumerated via CPUID.
 
 Currently accepted:
 
-The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`,
+The Speculation Control hardware features `md-clear`, `ibrsb`, `stibp`, `ibpb`,
 `l1d-flush` and `ssbd` are used by default if available and applicable.  They can
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 52e16c2..5a1702d 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -202,6 +202,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
 
         {"avx512-4vnniw",0x00000007,  0, CPUID_REG_EDX,  2,  1},
         {"avx512-4fmaps",0x00000007,  0, CPUID_REG_EDX,  3,  1},
+        {"md-clear",     0x00000007,  0, CPUID_REG_EDX, 10,  1},
         {"ibrsb",        0x00000007,  0, CPUID_REG_EDX, 26,  1},
         {"stibp",        0x00000007,  0, CPUID_REG_EDX, 27,  1},
         {"l1d-flush",    0x00000007,  0, CPUID_REG_EDX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index d87a72e..f67ecd3 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -146,6 +146,7 @@ static const char *str_7d0[32] =
 {
     [ 2] = "avx512_4vnniw", [ 3] = "avx512_4fmaps",
 
+    [10] = "md-clear",
     /* 12 */                [13] = "tsx-force-abort",
 
     [26] = "ibrsb",         [27] = "stibp",
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index ab0aab6..3efad9c 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -29,7 +29,12 @@ static int __init parse_xen_cpuid(const char *s)
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+        if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_MD_CLEAR);
+        }
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
         {
             if ( !val )
                 setup_clear_cpu_cap(X86_FEATURE_IBPB);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index e49ab3f..a573b02 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -291,17 +291,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
     printk("Speculative mitigation facilities:\n");
 
     /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
            (caps & ARCH_CAPS_IBRS_ALL)              ? " IBRS_ALL"  : "",
            (caps & ARCH_CAPS_RDCL_NO)               ? " RDCL_NO"   : "",
            (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "");
+           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
+           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
@@ -339,23 +341,25 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
-           opt_eager_fpu                             ? " EAGER_FPU"     : "");
+           opt_eager_fpu                             ? " EAGER_FPU"     : "",
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 389f95f..637259b 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -51,6 +51,7 @@
 #define ARCH_CAPS_RSBA			(_AC(1, ULL) << 2)
 #define ARCH_CAPS_SKIP_L1DFL		(_AC(1, ULL) << 3)
 #define ARCH_CAPS_SSB_NO		(_AC(1, ULL) << 4)
+#define ARCH_CAPS_MDS_NO		(_AC(1, ULL) << 5)
 
 #define MSR_FLUSH_CMD			0x0000010b
 #define FLUSH_CMD_L1D			(_AC(1, ULL) << 0)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 2bcc548..55231d4 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -242,6 +242,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
+XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */

[-- Attachment #50: xsa297/xsa297-4.12-6.patch --]
[-- Type: application/octet-stream, Size: 6217 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
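
The flushing primitive itself reduces to the idiom below (a minimal sketch
with an illustrative function name; the patch instead patches VERW in via
the alternatives framework).  The memory-operand encoding is essential, as
only that form of VERW carries the flushing side effect:

    #include <stdint.h>

    /* 'sel' must point at a selector for a writeable data segment. */
    static inline void verw_flush(const uint16_t *sel)
    {
        asm volatile ( "verw %[sel]" :: [sel] "m" (*sel) );
    }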

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 052228c..33930ce 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -110,6 +110,7 @@ void __dummy__(void)
     BLANK();
 
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
+    OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 0c06274..ba55245 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -31,3 +31,6 @@ XEN_CPUFEATURE(SC_RSB_PV,       (FSCAPINTS+0)*32+18) /* RSB overwrite needed for
 XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for HVM */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(SC_VERW_PV,      (FSCAPINTS+0)*32+23) /* VERW used by Xen for PV */
+XEN_CPUFEATURE(SC_VERW_HVM,     (FSCAPINTS+0)*32+24) /* VERW used by Xen for HVM */
+XEN_CPUFEATURE(SC_VERW_IDLE,    (FSCAPINTS+0)*32+25) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5bd64b2..f3508c3 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -38,6 +38,7 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned int verw_sel;
     struct vcpu *current_vcpu;
     unsigned long per_cpu_offset;
     unsigned long cr4;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 20ee112..ba03bb4 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -60,6 +60,13 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = 0;
     info->xen_spec_ctrl = default_xen_spec_ctrl;
     info->spec_ctrl_flags = default_spec_ctrl_flags;
+
+    /*
+     * For least latency, the VERW selector should be a writeable data
+     * descriptor resident in the cache.  __HYPERVISOR_DS32 shares a cache
+     * line with __HYPERVISOR_CS, so is expected to be very cache-hot.
+     */
+    info->verw_sel = __HYPERVISOR_DS32;
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
@@ -80,6 +87,22 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When entering idle, our store buffer entries
+     * are re-partitioned to allow the other threads to use them.
+     *
+     * Flush the buffers to ensure that no sensitive data of ours can be
+     * leaked by a sibling after it gets our store buffer entries.
+     *
+     * Note: VERW must be encoded with a memory operand, as it is only that
+     * form which causes a flush.
+     */
+    alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+                      [sel] "m" (info->verw_sel));
 }
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
@@ -98,6 +121,17 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     alternative_input("", "wrmsr", X86_FEATURE_SC_MSR_IDLE,
                       "a" (val), "c" (MSR_SPEC_CTRL), "d" (0));
     barrier();
+
+    /*
+     * Microarchitectural Store Buffer Data Sampling:
+     *
+     * On vulnerable systems, store buffer entries are statically partitioned
+     * between active threads.  When exiting idle, the other threads' store
+     * buffer entries are re-partitioned to give us some.
+     *
+     * We now have store buffer entries with stale data from sibling threads.
+     * A flush, if necessary, will be performed on the return-to-guest path.
+     */
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 803f7ce..c60093b 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -241,12 +241,16 @@
 /* Use when exiting to PV guest context. */
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_PV
 
 /* Use when exiting to HVM guest context. */
 #define SPEC_CTRL_EXIT_TO_HVM                                           \
     ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
+    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
+        X86_FEATURE_SC_VERW_HVM
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.

[-- Attachment #51: xsa297/xsa297-4.12-7.patch --]
[-- Type: application/octet-stream, Size: 12914 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers: the Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures is flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For the concise overview of the issues and default logic below, the
abbreviations SB (Store Buffer), FB (Fill Buffer), LP (Load Port) and HT
(Hyperthreading) are used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SB only, and parts with any other combination of vulnerabilities.

 * SB only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR, in case the
microcode has been updated but the feature bit is not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
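
As a usage illustration (see the documentation change below for the
authoritative syntax), the new control composes with the existing
`spec-ctrl=` options on the Xen command line:

    spec-ctrl=md-clear=0    (keep other defaults, but don't use VERW flushing)
    spec-ctrl=mds=0         (pre-release alias, Xen 4.12 and earlier only)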

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index f80d8d8..85081fd 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1895,7 +1895,7 @@ not be able to control the state of the mitigation.
 By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1919,9 +1919,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1930,6 +1931,11 @@ protect itself, and Xen's ability to virtualise support for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to this note.*
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a573b02..0509ac8 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -35,6 +35,8 @@ static bool __initdata opt_msr_sc_pv = true;
 static bool __initdata opt_msr_sc_hvm = true;
 static bool __initdata opt_rsb_pv = true;
 static bool __initdata opt_rsb_hvm = true;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -59,6 +61,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly l1tf_safe_maddr;
 static bool __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
+
 static int __init parse_spec_ctrl(const char *s)
 {
     const char *ss;
@@ -94,6 +99,8 @@ static int __init parse_spec_ctrl(const char *s)
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -116,11 +123,13 @@ static int __init parse_spec_ctrl(const char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -132,6 +141,12 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -317,7 +332,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -327,7 +342,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -737,6 +753,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should not be vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = true;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = true;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = true;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = true;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -924,6 +1041,47 @@ void __init init_speculation_mitigations(void)
             "enabled.  Please assess your configuration and choose an\n"
             "explicit 'smt=<bool>' setting.  See XSA-273.\n");
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivalent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+        warning_add(
+            "Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading\n"
+            "enabled.  Mitigations will not be fully effective.  Please\n"
+            "choose an explicit smt=<bool> setting.  See XSA-297.\n");
+
     print_details(thunk, caps);
 
     /*

[-- Attachment #52: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
