* [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection
@ 2018-01-18 15:45 Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment Andrew Cooper
                   ` (10 more replies)
  0 siblings, 11 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:45 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This series is available in git form from:

  http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/sp2-mitigations-v9

A copy of Intel's spec for IBRS/IBPB can be found here:

  https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

In addition to this software series, you will need the following:

  1) A compiler which understands -mindirect-branch=thunk-extern and
     -mindirect-branch-register (see the retpoline sketch after this list for
     the shape of code these options produce).  A GCC patch series
     implementing this should be available imminently.  In the meantime, a
     development branch can be obtained from:

     https://github.com/hjl-tools/gcc/commits/hjl/indirect/gcc-7-branch/master

  2) New microcode from Intel and AMD.  These provide new MSRs for Xen to use,
     and virtualise for guest kernels to use.
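
As a rough illustration of requirement (1): a -mindirect-branch=thunk-extern
build routes every indirect call/jmp through an external
__x86_indirect_thunk_<reg> symbol, which Xen provides itself (see patch 1 of
this series).  A minimal retpoline-style thunk for %rax looks approximately
like the sketch below; the in-tree thunks are generated per register and
patched via alternatives, so treat this only as an illustration:

    __x86_indirect_thunk_rax:
            call  2f                /* push the address of 1: (stack + RAS)    */
    1:      pause
            lfence
            jmp   1b                /* speculation of the ret below spins here */
    2:      mov   %rax, (%rsp)      /* replace return address with real target */
            ret                     /* "return" to *%rax; the RAS predicts 1:  */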

There are some limitations, even with the work presented here.

  1) vCPU-to-vCPU SP2 attacks can only be mitigated at the hypervisor level
     with IBPB support, which, for internal pipeline reasons, we do not expect
     to be made available on older processors.  For now, I will leave these
     details to the hardware vendors.

  2) Hardware lacking SMEP is in a worse position than hardware with SMEP.  If
     you have SMEP (Intel IvyBridge and later, some AMD Fam16h and all Fam17h
     and later), make absolutely sure it is enabled in the BIOS and working.

  3) On hardware lacking SMEP support, it is still an open question how to
     protect against RSB-to-SMM speculation.  Native operating systems can fix
     this by prohibiting userspace from mmap()'ing addresses which alias the
     SMM range, but Xen has no feasible way of enforcing this restriction on
     PV guests, even if we could tolerate the ABI breakage.  (However, see the
     forthcoming SP3 mitigation series for alternatives for untrusted PV
     guests).

~Andrew

Changes from v8:
  * Large rework of STIBP handling following Intel publishing a new spec
  * Rebase over the XPTI patches

Andrew Cooper (11):
  x86/thunk: Fix GEN_INDIRECT_THUNK comment
  x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  x86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests
  x86/migrate: Move MSR_SPEC_CTRL on migrate
  x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}
  x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  x86/boot: Calculate the most appropriate BTI mitigation to use
  x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  x86/ctxt: Issue a speculation barrier between vcpu contexts
  x86/cpuid: Offer Indirect Branch Controls to guests
  x86/idle: Clear SPEC_CTRL while idle

 docs/misc/xen-command-line.markdown         |  13 +-
 tools/libxc/xc_cpuid_x86.c                  |   4 +-
 xen/arch/x86/acpi/cpu_idle.c                |  21 +++
 xen/arch/x86/cpu/mwait-idle.c               |   7 +
 xen/arch/x86/cpuid.c                        |  28 +++
 xen/arch/x86/domain.c                       |  31 ++++
 xen/arch/x86/domctl.c                       |  21 +++
 xen/arch/x86/hvm/hvm.c                      |   2 +
 xen/arch/x86/hvm/svm/entry.S                |   8 +-
 xen/arch/x86/hvm/svm/svm.c                  |   5 +
 xen/arch/x86/hvm/vmx/entry.S                |  11 ++
 xen/arch/x86/hvm/vmx/vmx.c                  |  17 ++
 xen/arch/x86/indirect-thunk.S               |   2 +-
 xen/arch/x86/msr.c                          |  45 +++++
 xen/arch/x86/setup.c                        |   1 +
 xen/arch/x86/smpboot.c                      |   2 +
 xen/arch/x86/spec_ctrl.c                    | 142 ++++++++++++++-
 xen/arch/x86/x86_64/asm-offsets.c           |   6 +
 xen/arch/x86/x86_64/compat/entry.S          |  14 ++
 xen/arch/x86/x86_64/entry.S                 |  34 ++++
 xen/include/asm-x86/asm_defns.h             |   3 +
 xen/include/asm-x86/cpufeatures.h           |   2 +
 xen/include/asm-x86/current.h               |   6 +
 xen/include/asm-x86/msr-index.h             |   2 +
 xen/include/asm-x86/msr.h                   |  10 ++
 xen/include/asm-x86/nops.h                  |   7 +
 xen/include/asm-x86/spec_ctrl.h             |  45 +++++
 xen/include/asm-x86/spec_ctrl_asm.h         | 267 ++++++++++++++++++++++++++++
 xen/include/public/arch-x86/cpufeatureset.h |   6 +-
 29 files changed, 751 insertions(+), 11 deletions(-)
 create mode 100644 xen/include/asm-x86/spec_ctrl_asm.h

-- 
2.1.4



* [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-18 16:06   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests Andrew Cooper
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

This is a rebasing error in c/s 858cba0d4c6b "x86: Introduce alternative
indirect thunks" hidden by other changes in the same sentence.

The name with dots rather than underscores was the prerelease GCC ABI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v9: New.
---
 xen/arch/x86/indirect-thunk.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/indirect-thunk.S b/xen/arch/x86/indirect-thunk.S
index 7d34707..e03fc14 100644
--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -31,7 +31,7 @@
 .endm
 
 /*
- * Build the __x86.indirect_thunk.* symbols.  Execution lands on an
+ * Build the __x86_indirect_thunk_* symbols.  Execution lands on an
  * alternative patch point which implements one of the above THUNK_*'s
  */
 .macro GEN_INDIRECT_THUNK reg:req
-- 
2.1.4



* [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 10:40   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} " Andrew Cooper
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Intel specifies IBRS/IBPB (combined, in a single bit) and STIBP as a separate
bit.  AMD specifies IBPB alone in a 3rd bit.

AMD's IBPB is a subset of Intel's combined IBRS/IBPB.  For performance
reasons, administrators might wish to express "IBPB only" even on Intel
hardware, so we allow the AMD bit to be used for this purpose.

The behaviour of STIBP is more complicated.

It is our current understanding that STIBP will be advertised on HT-capable
hardware irrespective of whether HT is enabled, but not advertised on
HT-incapable hardware.  However, for ease of virtualisation, STIBP's
functionality is ignored rather than reserved by microcode/hardware on
HT-incapable hardware.

For guest safety, we treat STIBP as special, always override the toolstack
choice, and always advertise STIBP if IBRS is available.  This removes the
corner case where STIBP is not advertised, but the guest is running on
HT-capable hardware where it does matter.

Finally as a bugfix, update the libxc CPUID logic to understand the e8b
feature leaf, which has the side effect of also offering CLZERO to guests on
applicable hardware.
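
For reference, the enumeration bits handled here sit as follows (an
illustrative sketch of a guest probing the features, not part of the patch;
register choices are arbitrary).  The bit positions correspond to the
IBRSB/STIBP/IBPB entries in cpufeatureset.h below:

    mov     $7, %eax
    xor     %ecx, %ecx
    cpuid                       /* leaf 7, subleaf 0                       */
    mov     %edx, %esi          /* bit 26 = IBRSB (Intel), bit 27 = STIBP  */

    mov     $0x80000008, %eax
    xor     %ecx, %ecx
    cpuid                       /* extended leaf 0x80000008                */
    mov     %ebx, %edi          /* bit 12 = IBPB alone (AMD)               */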

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v9:
 * New
 * Move libxc fix from a later patch, and associated toolstack ack.
---
 tools/libxc/xc_cpuid_x86.c                  |  4 +++-
 xen/arch/x86/cpuid.c                        | 28 ++++++++++++++++++++++++++++
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 25b922e..9fa2f7c 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -465,7 +465,9 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 
     case 0x80000008:
         regs[0] &= 0x0000ffffu;
-        regs[1] = regs[3] = 0;
+        regs[1] = info->featureset[featureword_of(X86_FEATURE_CLZERO)];
+        /* regs[2] handled in the per-vendor logic. */
+        regs[3] = 0;
         break;
 
     case 0x00000002: /* Intel cache info (dumped by AMD policy) */
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 2ef71d2..b3c9ac6 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -383,6 +383,16 @@ static void __init calculate_pv_max_policy(void)
     /* Unconditionally claim to be able to set the hypervisor bit. */
     __set_bit(X86_FEATURE_HYPERVISOR, pv_featureset);
 
+    /* On hardware with IBRS/IBPB support, there are further adjustments. */
+    if ( test_bit(X86_FEATURE_IBRSB, pv_featureset) )
+    {
+        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
+        __set_bit(X86_FEATURE_STIBP, pv_featureset);
+
+        /* AMD's IBPB is a subset of IBRS/IBPB. */
+        __set_bit(X86_FEATURE_IBPB, pv_featureset);
+    }
+
     sanitise_featureset(pv_featureset);
     cpuid_featureset_to_policy(pv_featureset, p);
     recalculate_xstate(p);
@@ -440,6 +450,16 @@ static void __init calculate_hvm_max_policy(void)
             __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
     }
 
+    /* On hardware with IBRS/IBPB support, there are further adjustments. */
+    if ( test_bit(X86_FEATURE_IBRSB, hvm_featureset) )
+    {
+        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
+        __set_bit(X86_FEATURE_STIBP, hvm_featureset);
+
+        /* AMD's IBPB is a subset of IBRS/IBPB. */
+        __set_bit(X86_FEATURE_IBPB, hvm_featureset);
+    }
+
     sanitise_featureset(hvm_featureset);
     cpuid_featureset_to_policy(hvm_featureset, p);
     recalculate_xstate(p);
@@ -581,6 +601,14 @@ void recalculate_cpuid_policy(struct domain *d)
     recalculate_xstate(p);
     recalculate_misc(p);
 
+    /*
+     * Override STIBP to match IBRS.  Guests can safely use STIBP
+     * functionality on non-HT hardware, but can't necessarily protect
+     * themselves from SP2/Spectre/Branch Target Injection if STIBP is hidden
+     * on HT-capable hardware.
+     */
+    p->feat.stibp = p->feat.ibrsb;
+
     for ( i = 0; i < ARRAY_SIZE(p->cache.raw); ++i )
     {
         if ( p->cache.subleaf[i].type >= 1 &&
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index e148755..0f21fed 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -243,7 +243,7 @@ XEN_CPUFEATURE(IBPB,          8*32+12) /*   IBPB support only (no IBRS, used by
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*   IBRS and IBPB support (used by Intel) */
-XEN_CPUFEATURE(STIBP,         9*32+27) /*   STIBP */
+XEN_CPUFEATURE(STIBP,         9*32+27) /*!  STIBP */
 
 #endif /* XEN_CPUFEATURE */
 
-- 
2.1.4



* [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 10:45   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 04/11] x86/migrate: Move MSR_SPEC_CTRL on migrate Andrew Cooper
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

As per the spec currently available here:

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

MSR_ARCH_CAPABILITIES will only come into existence on new hardware, but is
implemented as a straight #GP for now to avoid being leaky when new hardware
arrives.
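
For context, the guest-side accesses being emulated here look roughly like
the sketch below (not part of the patch; the MSR indices are the
architectural values, with MSR_PRED_CMD/PRED_CMD_IBPB visible in the
msr-index.h hunk below and MSR_SPEC_CTRL being architectural 0x48):

    mov     $0x48, %ecx         /* MSR_SPEC_CTRL                           */
    mov     $1, %eax            /* SPEC_CTRL_IBRS                          */
    xor     %edx, %edx
    wrmsr                       /* handled by guest_wrmsr() when           */
                                /*   intercepted; updates spec_ctrl.raw    */

    mov     $0x49, %ecx         /* MSR_PRED_CMD                            */
    mov     $1, %eax            /* PRED_CMD_IBPB                           */
    xor     %edx, %edx
    wrmsr                       /* may issue a real IBPB; the MSR is       */
                                /*   write-only, so rdmsr of it #GPs       */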

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v9:
 * Alter the STIBP handling to match Intel's latest spec
 * Drop spec_ctrl.guest as it is no longer needed
---
 xen/arch/x86/msr.c              | 45 +++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |  2 ++
 xen/include/asm-x86/msr.h       | 10 +++++++++
 3 files changed, 57 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 187f862..7875d9c 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -120,11 +120,22 @@ int init_vcpu_msr_policy(struct vcpu *v)
 
 int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 {
+    const struct cpuid_policy *cp = v->domain->arch.cpuid;
     const struct msr_domain_policy *dp = v->domain->arch.msr;
     const struct msr_vcpu_policy *vp = v->arch.msr;
 
     switch ( msr )
     {
+    case MSR_PRED_CMD:
+        /* Write-only */
+        goto gp_fault;
+
+    case MSR_SPEC_CTRL:
+        if ( !cp->feat.ibrsb )
+            goto gp_fault;
+        *val = vp->spec_ctrl.raw;
+        break;
+
     case MSR_INTEL_PLATFORM_INFO:
         if ( !dp->plaform_info.available )
             goto gp_fault;
@@ -132,6 +143,10 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
                _MSR_PLATFORM_INFO_CPUID_FAULTING;
         break;
 
+    case MSR_ARCH_CAPABILITIES:
+        /* Not implemented yet. */
+        goto gp_fault;
+
     case MSR_INTEL_MISC_FEATURES_ENABLES:
         if ( !vp->misc_features_enables.available )
             goto gp_fault;
@@ -153,14 +168,44 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
     const struct vcpu *curr = current;
     struct domain *d = v->domain;
+    const struct cpuid_policy *cp = d->arch.cpuid;
     struct msr_domain_policy *dp = d->arch.msr;
     struct msr_vcpu_policy *vp = v->arch.msr;
 
     switch ( msr )
     {
     case MSR_INTEL_PLATFORM_INFO:
+    case MSR_ARCH_CAPABILITIES:
+        /* Read-only */
         goto gp_fault;
 
+    case MSR_SPEC_CTRL:
+        if ( !cp->feat.ibrsb )
+            goto gp_fault; /* MSR available? */
+
+        /*
+         * Note: SPEC_CTRL_STIBP is specified as safe to use (i.e. ignored)
+         * when STIBP isn't enumerated in hardware.
+         */
+
+        if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
+            goto gp_fault; /* Rsvd bit set? */
+
+        vp->spec_ctrl.raw = val;
+        break;
+
+    case MSR_PRED_CMD:
+        if ( !cp->feat.ibrsb && !cp->extd.ibpb )
+            goto gp_fault; /* MSR available? */
+
+        /*
+         * The only defined behaviour is when writing PRED_CMD_IBPB.  In
+         * practice, real hardware accepts any value without faulting.
+         */
+        if ( v == curr && (val & PRED_CMD_IBPB) )
+            wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
+        break;
+
     case MSR_INTEL_MISC_FEATURES_ENABLES:
     {
         uint64_t rsvd = ~0ull;
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index a0aacfa..23ad743 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -39,6 +39,8 @@
 #define MSR_PRED_CMD			0x00000049
 #define PRED_CMD_IBPB			(_AC(1, ULL) << 0)
 
+#define MSR_ARCH_CAPABILITIES		0x0000010a
+
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_A_PERFCTR0		0x000004c1
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 2fbed02..928f1cc 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -223,6 +223,16 @@ struct msr_domain_policy
 /* MSR policy object for per-vCPU MSRs */
 struct msr_vcpu_policy
 {
+    /* 0x00000048 - MSR_SPEC_CTRL */
+    struct {
+        /*
+         * Only the bottom two bits are defined, so no need to waste space
+         * with uint64_t at the moment, but use uint32_t for the convenience
+         * of the assembly code.
+         */
+        uint32_t raw;
+    } spec_ctrl;
+
     /* 0x00000140  MSR_INTEL_MISC_FEATURES_ENABLES */
     struct {
         bool available; /* This MSR is non-architectural */
-- 
2.1.4



* [PATCH v9 04/11] x86/migrate: Move MSR_SPEC_CTRL on migrate
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} " Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domctl.c  | 2 ++
 xen/arch/x86/hvm/hvm.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 2585d4e..1a15a34 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1292,6 +1292,7 @@ long arch_do_domctl(
         struct xen_domctl_vcpu_msr msr = {};
         struct vcpu *v;
         static const uint32_t msrs_to_send[] = {
+            MSR_SPEC_CTRL,
             MSR_INTEL_MISC_FEATURES_ENABLES,
         };
         uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
@@ -1415,6 +1416,7 @@ long arch_do_domctl(
 
                 switch ( msr.index )
                 {
+                case MSR_SPEC_CTRL:
                 case MSR_INTEL_MISC_FEATURES_ENABLES:
                     if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
                         break;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db282b5..ed36598 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1323,6 +1323,7 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
 
 #define HVM_CPU_MSR_SIZE(cnt) offsetof(struct hvm_msr, msr[cnt])
 static const uint32_t msrs_to_send[] = {
+    MSR_SPEC_CTRL,
     MSR_INTEL_MISC_FEATURES_ENABLES,
 };
 static unsigned int __read_mostly msr_count_max = ARRAY_SIZE(msrs_to_send);
@@ -1458,6 +1459,7 @@ static int hvm_load_cpu_msrs(struct domain *d, hvm_domain_context_t *h)
         {
             int rc;
 
+        case MSR_SPEC_CTRL:
         case MSR_INTEL_MISC_FEATURES_ENABLES:
             rc = guest_wrmsr(v, ctxt->msr[i].index, ctxt->msr[i].val);
 
-- 
2.1.4



* [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 04/11] x86/migrate: Move MSR_SPEC_CTRL on migrate Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-18 18:04   ` Boris Ostrovsky
                     ` (2 more replies)
  2018-01-18 15:46 ` [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point Andrew Cooper
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, Jun Nakajima,
	Boris Ostrovsky, Suravee Suthikulpanit

For performance reasons, HVM guests should have direct access to these MSRs
when possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

v7:
 * Drop excess brackets
v9:
 * Re-implement in light of Intel's new spec.  Drop R-by's.
 * Spelling fixes
---
 xen/arch/x86/domctl.c      | 19 +++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c |  5 +++++
 xen/arch/x86/hvm/vmx/vmx.c | 17 +++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 1a15a34..a9adae3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -53,6 +53,7 @@ static int update_domain_cpuid_info(struct domain *d,
     struct cpuid_policy *p = d->arch.cpuid;
     const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
     int old_vendor = p->x86_vendor;
+    unsigned int old_7d0 = p->feat.raw[0].d, old_e8b = p->extd.raw[8].b;
     bool call_policy_changed = false; /* Avoid for_each_vcpu() unnecessarily */
 
     /*
@@ -218,6 +219,14 @@ static int update_domain_cpuid_info(struct domain *d,
 
             d->arch.pv_domain.cpuidmasks->_7ab0 = mask;
         }
+
+        /*
+         * If the IBRS/IBPB policy has changed, we need to recalculate the MSR
+         * interception bitmaps.
+         */
+        call_policy_changed = (is_hvm_domain(d) &&
+                               ((old_7d0 ^ p->feat.raw[0].d) &
+                                cpufeat_mask(X86_FEATURE_IBRSB)));
         break;
 
     case 0xa:
@@ -292,6 +301,16 @@ static int update_domain_cpuid_info(struct domain *d,
             d->arch.pv_domain.cpuidmasks->e1cd = mask;
         }
         break;
+
+    case 0x80000008:
+        /*
+         * If the IBPB policy has changed, we need to recalculate the MSR
+         * interception bitmaps.
+         */
+        call_policy_changed = (is_hvm_domain(d) &&
+                               ((old_e8b ^ p->extd.raw[8].b) &
+                                cpufeat_mask(X86_FEATURE_IBPB)));
+        break;
     }
 
     if ( call_policy_changed )
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 9d9ad77..249ede0 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -617,6 +617,7 @@ static void svm_cpuid_policy_changed(struct vcpu *v)
 {
     struct arch_svm_struct *arch_svm = &v->arch.hvm_svm;
     struct vmcb_struct *vmcb = arch_svm->vmcb;
+    const struct cpuid_policy *cp = v->domain->arch.cpuid;
     u32 bitmap = vmcb_get_exception_intercepts(vmcb);
 
     if ( opt_hvm_fep ||
@@ -626,6 +627,10 @@ static void svm_cpuid_policy_changed(struct vcpu *v)
         bitmap &= ~(1U << TRAP_invalid_op);
 
     vmcb_set_exception_intercepts(vmcb, bitmap);
+
+    /* Give access to MSR_PRED_CMD if the guest has been told about it. */
+    svm_intercept_msr(v, MSR_PRED_CMD,
+                      cp->extd.ibpb ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW);
 }
 
 static void svm_sync_vmcb(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e036303..1546c2a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -656,6 +656,8 @@ void vmx_update_exception_bitmap(struct vcpu *v)
 
 static void vmx_cpuid_policy_changed(struct vcpu *v)
 {
+    const struct cpuid_policy *cp = v->domain->arch.cpuid;
+
     if ( opt_hvm_fep ||
          (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
         v->arch.hvm_vmx.exception_bitmap |= (1U << TRAP_invalid_op);
@@ -665,6 +667,21 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
     vmx_vmcs_enter(v);
     vmx_update_exception_bitmap(v);
     vmx_vmcs_exit(v);
+
+    /*
+     * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
+     * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
+     */
+    if ( cp->feat.ibrsb )
+        vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
+    else
+        vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
+
+    /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
+    if ( cp->feat.ibrsb || cp->extd.ibpb )
+        vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
+    else
+        vmx_set_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
 }
 
 int vmx_guest_x86_mode(struct vcpu *v)
-- 
2.1.4



* [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (4 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 10:39   ` Andrew Cooper
  2018-01-19 11:43   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use Andrew Cooper
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

We need to be able to either set or clear IBRS in Xen context, as well as
restore appropriate guest values in guest context.  See the documentation in
asm-x86/spec_ctrl_asm.h for details.

Writes to %cr3 are slower when SPEC_CTRL.IBRS is set, so the
SPEC_CTRL_{ENTRY/EXIT}* positioning is important, and optimised for the
expected common case of Xen not using IBRS itself.
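
Concretely, the exit-to-PV path below ends up ordered roughly as follows (a
condensed sketch of the entry.S changes in this patch, shown here only to
illustrate the ordering):

        mov   VCPU_arch_msr(%rbx), %rdx
        mov   VCPUMSR_spec_ctrl_raw(%rdx), %r15d  /* stash guest value while   */
                                                  /*   struct vcpu is in reach */
        write_cr3 rax, rdi, rsi                   /* cr3 write done with Xen's */
                                                  /*   (usually IBRS-clear)    */
                                                  /*   SPEC_CTRL still loaded  */
        mov   %r15d, %eax
        SPEC_CTRL_EXIT_TO_GUEST                   /* only now load guest value */
        RESTORE_ALL
        iretq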

There is a semi-unrelated bugfix, where various asm_defns.h macros have a
hidden dependency on PAGE_SIZE, which results in an assembler error if used in
a .macro definition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v7:
 * Spelling fixes
 * Reposition the semicolon fix.
v9:
 * Rebase over XPTI.  Moderate changes in the exit-to-PV paths.
 * Swap cmpw for testb when looking for cpl0
---
 xen/arch/x86/hvm/svm/entry.S        |   8 +-
 xen/arch/x86/hvm/vmx/entry.S        |  11 ++
 xen/arch/x86/setup.c                |   1 +
 xen/arch/x86/smpboot.c              |   2 +
 xen/arch/x86/x86_64/asm-offsets.c   |   6 +
 xen/arch/x86/x86_64/compat/entry.S  |  14 +++
 xen/arch/x86/x86_64/entry.S         |  34 ++++++
 xen/include/asm-x86/asm_defns.h     |   3 +
 xen/include/asm-x86/current.h       |   6 +
 xen/include/asm-x86/nops.h          |   6 +
 xen/include/asm-x86/spec_ctrl.h     |   9 ++
 xen/include/asm-x86/spec_ctrl_asm.h | 227 ++++++++++++++++++++++++++++++++++++
 12 files changed, 326 insertions(+), 1 deletion(-)
 create mode 100644 xen/include/asm-x86/spec_ctrl_asm.h

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index df86da0..fb1048b 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -79,6 +79,9 @@ UNLIKELY_END(svm_trace)
         or   $X86_EFLAGS_MBS,%rax
         mov  %rax,VMCB_rflags(%rcx)
 
+        /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+        SPEC_CTRL_EXIT_TO_GUEST /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+
         pop  %r15
         pop  %r14
         pop  %r13
@@ -101,8 +104,11 @@ UNLIKELY_END(svm_trace)
         SAVE_ALL
 
         GET_CURRENT(bx)
-        mov  VCPU_svm_vmcb(%rbx),%rcx
 
+        SPEC_CTRL_ENTRY_FROM_VMEXIT /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+        mov  VCPU_svm_vmcb(%rbx),%rcx
         movb $0,VCPU_svm_vmcb_in_sync(%rbx)
         mov  VMCB_rax(%rcx),%rax
         mov  %rax,UREGS_rax(%rsp)
diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index b2f98be..21e959f 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -38,6 +38,9 @@ ENTRY(vmx_asm_vmexit_handler)
         movb $1,VCPU_vmx_launched(%rbx)
         mov  %rax,VCPU_hvm_guest_cr2(%rbx)
 
+        SPEC_CTRL_ENTRY_FROM_VMEXIT /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         mov  %rsp,%rdi
         call vmx_vmexit_handler
 
@@ -68,6 +71,10 @@ UNLIKELY_END(realmode)
         call vmx_vmenter_helper
         test %al, %al
         jz .Lvmx_vmentry_restart
+
+        /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+        SPEC_CTRL_EXIT_TO_GUEST /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+
         mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
         pop  %r15
@@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
 .Lvmx_vmentry_fail:
         sti
         SAVE_ALL
+
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         call vmx_vmentry_failure
         BUG  /* vmx_vmentry_failure() shouldn't return. */
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b6888c7..41afd2a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -678,6 +678,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     set_processor_id(0);
     set_current(INVALID_VCPU); /* debug sanity. */
     idle_vcpu[0] = current;
+    init_shadow_spec_ctrl_state();
 
     percpu_init_areas();
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2cdd431..196ee7e 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -41,6 +41,7 @@
 #include <asm/guest.h>
 #include <asm/msr.h>
 #include <asm/mtrr.h>
+#include <asm/spec_ctrl.h>
 #include <asm/time.h>
 #include <asm/tboot.h>
 #include <mach_apic.h>
@@ -309,6 +310,7 @@ void start_secondary(void *unused)
     set_current(idle_vcpu[cpu]);
     this_cpu(curr_vcpu) = idle_vcpu[cpu];
     rdmsrl(MSR_EFER, this_cpu(efer));
+    init_shadow_spec_ctrl_state();
 
     /*
      * Just as during early bootstrap, it is convenient here to disable
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index b1a4310..17f1d77 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -88,6 +88,7 @@ void __dummy__(void)
     OFFSET(VCPU_kernel_ss, struct vcpu, arch.pv_vcpu.kernel_ss);
     OFFSET(VCPU_iopl, struct vcpu, arch.pv_vcpu.iopl);
     OFFSET(VCPU_guest_context_flags, struct vcpu, arch.vgc_flags);
+    OFFSET(VCPU_arch_msr, struct vcpu, arch.msr);
     OFFSET(VCPU_nmi_pending, struct vcpu, nmi_pending);
     OFFSET(VCPU_mce_pending, struct vcpu, mce_pending);
     OFFSET(VCPU_nmi_old_mask, struct vcpu, nmi_state.old_mask);
@@ -139,6 +140,8 @@ void __dummy__(void)
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
     OFFSET(CPUINFO_pv_cr3, struct cpu_info, pv_cr3);
+    OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
+    OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
     DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
     BLANK();
 
@@ -154,6 +157,9 @@ void __dummy__(void)
     OFFSET(TRAPBOUNCE_eip, struct trap_bounce, eip);
     BLANK();
 
+    OFFSET(VCPUMSR_spec_ctrl_raw, struct msr_vcpu_policy, spec_ctrl.raw);
+    BLANK();
+
 #ifdef CONFIG_PERF_COUNTERS
     DEFINE(ASM_PERFC_exceptions, PERFC_exceptions);
     BLANK();
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index e668f00..430b5ce 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,6 +18,10 @@ ENTRY(entry_int82)
         pushq $0
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
+
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         CR4_PV32_RESTORE
 
         GET_CURRENT(bx)
@@ -142,6 +146,13 @@ ENTRY(compat_restore_all_guest)
         .popsection
         or    $X86_EFLAGS_IF,%r11
         mov   %r11d,UREGS_eflags(%rsp)
+
+        mov VCPU_arch_msr(%rbx), %rax
+        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
+
+        /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+        SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: acd */
+
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
         _ASM_PRE_EXTABLE(.Lft0, handle_exception)
@@ -200,6 +211,9 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
         neg   %rcx
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 710c061..b75b292 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -38,6 +38,10 @@ ENTRY(switch_to_kernel)
 restore_all_guest:
         ASSERT_INTERRUPTS_DISABLED
 
+        /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
+        mov VCPU_arch_msr(%rbx), %rdx
+        mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
+
         /* Copy guest mappings and switch to per-CPU root page table. */
         mov   %cr3, %r9
         GET_STACK_END(dx)
@@ -65,6 +69,12 @@ restore_all_guest:
         write_cr3 rax, rdi, rsi
 .Lrag_keep_cr3:
 
+        /* Restore stashed SPEC_CTRL value. */
+        mov   %r15d, %eax
+
+        /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+        SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: acd */
+
         RESTORE_ALL
         testw $TRAP_syscall,4(%rsp)
         jz    iret_exit_to_guest
@@ -115,6 +125,9 @@ UNLIKELY_START(g, exit_cr3)
         write_cr3 rax, rdi, rsi
 UNLIKELY_END(exit_cr3)
 
+        /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+        SPEC_CTRL_EXIT_TO_XEN /* Req: nothing, Clob: acd */
+
         RESTORE_ALL adj=8
         iretq
 
@@ -145,6 +158,9 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
         neg   %rcx
@@ -248,6 +264,9 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
         neg   %rcx
@@ -294,6 +313,9 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
         neg   %rcx
@@ -468,6 +490,9 @@ ENTRY(dom_crash_sync_extable)
 ENTRY(common_interrupt)
         SAVE_ALL CLAC
 
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(14)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
         mov   %rcx, %r15
@@ -506,6 +531,9 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
         SAVE_ALL CLAC
 
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(14)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
         mov   %rcx, %r15
@@ -700,6 +728,9 @@ ENTRY(double_fault)
         /* Set AC to reduce chance of further SMAP faults */
         SAVE_ALL STAC
 
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(bx)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rbx
         test  %rbx, %rbx
@@ -729,6 +760,9 @@ ENTRY(nmi)
 handle_ist_exception:
         SAVE_ALL CLAC
 
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
+        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
         GET_STACK_END(14)
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
         mov   %rcx, %r15
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index d2d91ca..b74ac81 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -7,6 +7,7 @@
 #include <asm/asm-offsets.h>
 #endif
 #include <asm/bug.h>
+#include <asm/page.h>
 #include <asm/processor.h>
 #include <asm/percpu.h>
 #include <xen/stringify.h>
@@ -386,4 +387,6 @@ static always_inline void stac(void)
 4:  .p2align 2                            ; \
     .popsection
 
+#include <asm/spec_ctrl_asm.h>
+
 #endif /* __X86_ASM_DEFNS_H__ */
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index b929c48..1009d05 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -53,6 +53,12 @@ struct cpu_info {
      */
     unsigned long xen_cr3;
     unsigned long pv_cr3;
+
+    /* See asm-x86/spec_ctrl_asm.h for usage. */
+    unsigned int shadow_spec_ctrl;
+    bool         use_shadow_spec_ctrl;
+
+    unsigned long __pad;
     /* get_stack_bottom() must be 16-byte aligned */
 };
 
diff --git a/xen/include/asm-x86/nops.h b/xen/include/asm-x86/nops.h
index 1a46b97..bb5b5d5 100644
--- a/xen/include/asm-x86/nops.h
+++ b/xen/include/asm-x86/nops.h
@@ -65,6 +65,12 @@
 #define ASM_NOP8 _ASM_MK_NOP(P6_NOP8)
 #define ASM_NOP9 _ASM_MK_NOP(P6_NOP9)
 
+#define ASM_NOP22 ASM_NOP8; ASM_NOP8; ASM_NOP6
+#define ASM_NOP24 ASM_NOP8; ASM_NOP8; ASM_NOP8
+#define ASM_NOP26 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP2
+#define ASM_NOP32 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
+#define ASM_NOP36 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP4
+
 #define ASM_NOP_MAX 9
 
 #endif /* __X86_ASM_NOPS_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index e088a55..b451250 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,8 +20,17 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
+#include <asm/current.h>
+
 void init_speculation_mitigations(void);
 
+static inline void init_shadow_spec_ctrl_state(void)
+{
+    struct cpu_info *info = get_cpu_info();
+
+    info->shadow_spec_ctrl = info->use_shadow_spec_ctrl = 0;
+}
+
 #endif /* !__X86_SPEC_CTRL_H__ */
 
 /*
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
new file mode 100644
index 0000000..701a5ad9
--- /dev/null
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -0,0 +1,227 @@
+/******************************************************************************
+ * include/asm-x86/spec_ctrl.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017-2018 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_SPEC_CTRL_ASM_H__
+#define __X86_SPEC_CTRL_ASM_H__
+
+#ifdef __ASSEMBLY__
+#include <asm/msr.h>
+
+/*
+ * Saving and restoring MSR_SPEC_CTRL state is a little tricky.
+ *
+ * We want the guest's choice of SPEC_CTRL while in guest context, and IBRS
+ * (set or clear, depending on the hardware) while running in Xen context.
+ * Therefore, a simplistic algorithm is:
+ *
+ *  - Set/clear IBRS on entry to Xen
+ *  - Set the guests' choice on exit to guest
+ *  - Leave SPEC_CTRL unchanged on exit to xen
+ *
+ * There are two complicating factors:
+ *  1) HVM guests can have direct access to the MSR, so it can change
+ *     behind Xen's back.
+ *  2) An NMI or MCE can interrupt at any point, including early in the entry
+ *     path, or late in the exit path after restoring the guest value.  This
+ *     will corrupt the guest value.
+ *
+ * Factor 1 is dealt with by relying on NMIs/MCEs being blocked immediately
+ * after VMEXIT.  The VMEXIT-specific code reads MSR_SPEC_CTRL and updates
+ * current before loading Xen's MSR_SPEC_CTRL setting.
+ *
+ * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and
+ * use_shadow_spec_ctrl boolean per cpu.  The synchronous use is:
+ *
+ *  1) Store guest value in shadow_spec_ctrl
+ *  2) Set use_shadow_spec_ctrl boolean
+ *  3) Load guest value into MSR_SPEC_CTRL
+ *  4) Exit to guest
+ *  5) Entry from guest
+ *  6) Clear use_shadow_spec_ctrl boolean
+ *
+ * The asynchronous use for interrupts/exceptions is:
+ *  -  Set/clear IBRS on entry to Xen
+ *  -  On exit to Xen, check use_shadow_spec_ctrl
+ *  -  If set, load shadow_spec_ctrl
+ *
+ * Therefore, an interrupt/exception which hits the synchronous path between
+ * steps 2 and 6 will restore the shadow value rather than leaving Xen's value
+ * loaded and corrupting the value used in guest context.
+ *
+ * The following ASM fragments implement this algorithm.  See their local
+ * comments for further details.
+ *  - SPEC_CTRL_ENTRY_FROM_VMEXIT
+ *  - SPEC_CTRL_ENTRY_FROM_PV
+ *  - SPEC_CTRL_ENTRY_FROM_INTR
+ *  - SPEC_CTRL_EXIT_TO_XEN
+ *  - SPEC_CTRL_EXIT_TO_GUEST
+ */
+
+.macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req
+/*
+ * Requires %rbx=current, %rsp=regs/cpuinfo
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * The common case is that a guest has direct access to MSR_SPEC_CTRL, at
+ * which point we need to save the guest value before setting IBRS for Xen.
+ * Unilaterally saving the guest value is shorter and faster than checking.
+ */
+    mov $MSR_SPEC_CTRL, %ecx
+    rdmsr
+
+    /* Stash the value from hardware. */
+    mov VCPU_arch_msr(%rbx), %rdx
+    mov %al, VCPUMSR_spec_ctrl_raw(%rdx)
+    xor %edx, %edx
+
+    /* Clear SPEC_CTRL shadowing *before* loading Xen's value. */
+    movb %dl, CPUINFO_use_shadow_spec_ctrl(%rsp)
+
+    /* Load Xen's intended value. */
+    mov $\ibrs_val, %eax
+    wrmsr
+.endm
+
+.macro DO_SPEC_CTRL_ENTRY maybexen:req ibrs_val:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * PV guests can't update MSR_SPEC_CTRL behind Xen's back, so no need to read
+ * it back.  Entries from guest context need to clear SPEC_CTRL shadowing,
+ * while entries from Xen must leave shadowing in its current state.
+ */
+    mov $MSR_SPEC_CTRL, %ecx
+
+    .if \maybexen
+        testb $3, UREGS_cs(%rsp)
+        jz .L\@_entry_from_xen
+    .endif
+
+    /*
+     * Clear SPEC_CTRL shadowing *before* loading Xen's value.  If entering
+     * from a possibly-xen context, %rsp doesn't necessarily alias the cpuinfo
+     * block so calculate the position directly.
+     */
+    .if \maybexen
+        GET_STACK_END(dx)
+        movb $0, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%rdx)
+    .else
+        movb $0, CPUINFO_use_shadow_spec_ctrl(%rsp)
+    .endif
+
+.L\@_entry_from_xen:
+    /* Load Xen's intended value. */
+    mov $\ibrs_val, %eax
+    xor %edx, %edx
+    wrmsr
+.endm
+
+.macro DO_SPEC_CTRL_EXIT_TO_XEN
+/*
+ * Requires nothing
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * When returning to Xen context, look to see whether SPEC_CTRL shadowing is
+ * in effect, and reload the shadow value.  This covers race conditions which
+ * exist with an NMI/MCE/etc hitting late in the return-to-guest path.
+ */
+    GET_STACK_END(dx)
+    cmpb $0, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%rdx)
+    je .L\@_skip
+
+    mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rdx), %eax
+    mov $MSR_SPEC_CTRL, %ecx
+    xor %edx, %edx
+    wrmsr
+
+.L\@_skip:
+.endm
+
+.macro DO_SPEC_CTRL_EXIT_TO_GUEST
+/*
+ * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * When returning to guest context, set up SPEC_CTRL shadowing and load the
+ * guest value.
+ */
+    /* Set up shadow value *before* enabling shadowing. */
+    mov %eax, CPUINFO_shadow_spec_ctrl(%rsp)
+
+    /* Set SPEC_CTRL shadowing *before* loading the guest value. */
+    movb $1, CPUINFO_use_shadow_spec_ctrl(%rsp)
+
+    mov $MSR_SPEC_CTRL, %ecx
+    xor %edx, %edx
+    wrmsr
+.endm
+
+/* Use after a VMEXIT from an HVM guest. */
+#define SPEC_CTRL_ENTRY_FROM_VMEXIT                                     \
+    ALTERNATIVE_2 __stringify(ASM_NOP32),                               \
+        __stringify(DO_SPEC_CTRL_ENTRY_FROM_VMEXIT                      \
+                    ibrs_val=SPEC_CTRL_IBRS),                           \
+        X86_FEATURE_XEN_IBRS_SET,                                       \
+        __stringify(DO_SPEC_CTRL_ENTRY_FROM_VMEXIT                      \
+                    ibrs_val=0),                                        \
+        X86_FEATURE_XEN_IBRS_CLEAR
+
+/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
+#define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE_2 __stringify(ASM_NOP22),                               \
+        __stringify(DO_SPEC_CTRL_ENTRY maybexen=0                       \
+                    ibrs_val=SPEC_CTRL_IBRS),                           \
+        X86_FEATURE_XEN_IBRS_SET,                                       \
+        __stringify(DO_SPEC_CTRL_ENTRY maybexen=0 ibrs_val=0),          \
+        X86_FEATURE_XEN_IBRS_CLEAR
+
+/* Use in interrupt/exception context.  May interrupt Xen or PV context. */
+#define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE_2 __stringify(ASM_NOP36),                               \
+        __stringify(DO_SPEC_CTRL_ENTRY maybexen=1                       \
+                    ibrs_val=SPEC_CTRL_IBRS),                           \
+        X86_FEATURE_XEN_IBRS_SET,                                       \
+        __stringify(DO_SPEC_CTRL_ENTRY maybexen=1 ibrs_val=0),          \
+        X86_FEATURE_XEN_IBRS_CLEAR
+
+/* Use when exiting to Xen context. */
+#define SPEC_CTRL_EXIT_TO_XEN                                           \
+    ALTERNATIVE_2 __stringify(ASM_NOP26),                               \
+        DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_XEN_IBRS_SET,             \
+        DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_XEN_IBRS_CLEAR
+
+/* Use when exiting to guest context. */
+#define SPEC_CTRL_EXIT_TO_GUEST                                         \
+    ALTERNATIVE_2 __stringify(ASM_NOP24),                               \
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_XEN_IBRS_SET,           \
+        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_XEN_IBRS_CLEAR
+
+#endif /* __ASSEMBLY__ */
+#endif /* !__X86_SPEC_CTRL_ASM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.1.4



* [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (5 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 12:06   ` Jan Beulich
  2018-01-19 14:01   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen Andrew Cooper
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v7:
 * static, and tweak comment
---
 docs/misc/xen-command-line.markdown |   6 ++-
 xen/arch/x86/spec_ctrl.c            | 104 ++++++++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index f73990f..b4a7ecd 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -246,7 +246,7 @@ enough. Setting this to a high value may cause boot failure, particularly if
 the NMI watchdog is also enabled.
 
 ### bti (x86)
-> `= List of [ thunk=retpoline|lfence|jmp ]`
+> `= List of [ thunk=retpoline|lfence|jmp, ibrs=<bool> ]`
 
 Branch Target Injection controls.  By default, Xen will pick the most
 appropriate BTI mitigations based on compiled in support, loaded microcode,
@@ -261,6 +261,10 @@ locations.  The default thunk is `retpoline` (generally preferred for Intel
 hardware), with the alternatives being `jmp` (a `jmp *%reg` gadget, minimal
 overhead), and `lfence` (an `lfence; jmp *%reg` gadget, preferred for AMD).
 
+On hardware supporting IBRS, the `ibrs=` option can be used to force or
+prevent Xen using the feature itself.  If Xen is not using IBRS itself,
+functionality is still set up so IBRS can be virtualised for guests.
+
 ### xenheap\_megabytes (arm32)
 > `= <size>`
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 89e7287..7b0daaf 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -20,6 +20,7 @@
 #include <xen/init.h>
 #include <xen/lib.h>
 
+#include <asm/microcode.h>
 #include <asm/processor.h>
 #include <asm/spec_ctrl.h>
 
@@ -31,11 +32,12 @@ static enum ind_thunk {
     THUNK_LFENCE,
     THUNK_JMP,
 } opt_thunk __initdata = THUNK_DEFAULT;
+static int opt_ibrs __initdata = -1;
 
 static int __init parse_bti(const char *s)
 {
     const char *ss;
-    int rc = 0;
+    int val, rc = 0;
 
     do {
         ss = strchr(s, ',');
@@ -55,6 +57,8 @@ static int __init parse_bti(const char *s)
             else
                 rc = -EINVAL;
         }
+        else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
+            opt_ibrs = val;
         else
             rc = -EINVAL;
 
@@ -91,24 +95,82 @@ static void __init print_details(enum ind_thunk thunk)
         printk(XENLOG_DEBUG "  Compiled-in support: INDIRECT_THUNK\n");
 
     printk(XENLOG_INFO
-           "BTI mitigations: Thunk %s\n",
+           "BTI mitigations: Thunk %s, Others:%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
-           thunk == THUNK_JMP       ? "JMP" : "?");
+           thunk == THUNK_JMP       ? "JMP" : "?",
+           boot_cpu_has(X86_FEATURE_XEN_IBRS_SET)    ? " IBRS+" :
+           boot_cpu_has(X86_FEATURE_XEN_IBRS_CLEAR)  ? " IBRS-"      : "");
+}
+
+/* Calculate whether Retpoline is known-safe on this CPU. */
+static bool __init retpoline_safe(void)
+{
+    unsigned int ucode_rev = this_cpu(ucode_cpu_info).cpu_sig.rev;
+
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        return true;
+
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return false;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+    case 0x17: /* Penryn */
+    case 0x1d: /* Dunnington */
+    case 0x1e: /* Nehalem */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+        return true;
+
+        /*
+         * Broadwell processors are retpoline-safe after specific microcode
+         * versions.
+         */
+    case 0x3d: /* Broadwell */
+        return ucode_rev >= 0x28;
+    case 0x47: /* Broadwell H */
+        return ucode_rev >= 0x1b;
+    case 0x4f: /* Broadwell EP/EX */
+        return ucode_rev >= 0xb000025;
+    case 0x56: /* Broadwell D */
+        return false; /* TBD. */
+
+        /*
+         * Skylake and later processors are not retpoline-safe.
+         */
+    default:
+        return false;
+    }
 }
 
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
+    bool ibrs = false;
 
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
      */
-    if ( opt_thunk != THUNK_DEFAULT )
+    if ( opt_thunk != THUNK_DEFAULT || opt_ibrs != -1 )
     {
         thunk = opt_thunk;
+        ibrs  = !!opt_ibrs;
     }
     else
     {
@@ -124,7 +186,21 @@ void __init init_speculation_mitigations(void)
              */
             if ( cpu_has_lfence_dispatch )
                 thunk = THUNK_LFENCE;
+            /*
+             * On Intel hardware, we'd like to use retpoline in preference to
+             * IBRS, but only if it is safe on this hardware.
+             */
+            else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+            {
+                if ( retpoline_safe() )
+                    thunk = THUNK_RETPOLINE;
+                else
+                    ibrs = true;
+            }
         }
+        /* Without compiler thunk support, use IBRS if available. */
+        else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+            ibrs = true;
     }
 
     /*
@@ -135,6 +211,13 @@ void __init init_speculation_mitigations(void)
         thunk = THUNK_NONE;
 
     /*
+     * If IBRS is in use and thunks are compiled in, there is no point
+     * suffering extra overhead.  Switch to the least-overhead thunk.
+     */
+    if ( ibrs && thunk == THUNK_DEFAULT )
+        thunk = THUNK_JMP;
+
+    /*
      * If there are still no thunk preferences, the compiled default is
      * actually retpoline, and it is better than nothing.
      */
@@ -147,6 +230,19 @@ void __init init_speculation_mitigations(void)
     else if ( thunk == THUNK_JMP )
         setup_force_cpu_cap(X86_FEATURE_IND_THUNK_JMP);
 
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+    {
+        /*
+         * Even if we've chosen to not have IBRS set in Xen context, we still
+         * need the IBRS entry/exit logic to virtualise IBRS support for
+         * guests.
+         */
+        if ( ibrs )
+            setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_SET);
+        else
+            setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_CLEAR);
+    }
+
     print_details(thunk);
 }
 
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (6 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 12:47   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts Andrew Cooper
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

ret instructions are speculated directly to values recorded in the RSB/RAS, as
there is no uncertainty in well-formed code.  Guests can take advantage of
this in two ways:

  1) If they can find a path in Xen which executes more ret instructions than
     call instructions.  (At least one in the waitqueue infrastructure,
     probably others.)
  2) Use the fact that the RSB/RAS in hardware is actually a circular stack
     without a concept of empty.  (When it logically empties, stale values
     will start being used.)

To mitigate, unconditionally overwrite the RSB on entry to Xen with gadgets
which will capture and contain rogue speculation.
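
For illustration only (nothing below is part of the patch; the size and
helper names are made up), the circular behaviour described in point 2 can
be modelled in a few lines of C:

    /* Toy model of a circular RSB/RAS: underflow reuses stale slots. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RSB_ENTRIES 16          /* Illustrative size only. */

    static uint64_t rsb[RSB_ENTRIES];
    static unsigned int top;

    static void model_call(uint64_t ret_addr)
    {
        rsb[top++ % RSB_ENTRIES] = ret_addr;  /* call pushes a prediction */
    }

    static uint64_t model_ret(void)
    {
        return rsb[--top % RSB_ENTRIES];      /* ret pops one; no empty check */
    }

    int main(void)
    {
        model_call(0x1000);   /* One call... */
        model_ret();          /* ...balanced by one ret. */

        /* A second ret "underflows" and gets predicted from a stale slot. */
        printf("stale prediction: %#" PRIx64 "\n", model_ret());
        return 0;
    }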

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v7:
 * Rewritten almost from scratch.  See code comments for details.
v8:
 * Use jmp rather than call to contain speculation.  It doesn't affect the
   correctness of containment, but removes 6 bytes.
---
 docs/misc/xen-command-line.markdown |  6 +++++-
 xen/arch/x86/spec_ctrl.c            | 34 +++++++++++++++++++++++++++++--
 xen/include/asm-x86/cpufeatures.h   |  2 ++
 xen/include/asm-x86/nops.h          |  1 +
 xen/include/asm-x86/spec_ctrl_asm.h | 40 +++++++++++++++++++++++++++++++++++++
 5 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index b4a7ecd..11399ce 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -246,7 +246,7 @@ enough. Setting this to a high value may cause boot failure, particularly if
 the NMI watchdog is also enabled.
 
 ### bti (x86)
-> `= List of [ thunk=retpoline|lfence|jmp, ibrs=<bool> ]`
+> `= List of [ thunk=retpoline|lfence|jmp, ibrs=<bool>, rsb_{vmexit,native}=<bool> ]`
 
 Branch Target Injection controls.  By default, Xen will pick the most
 appropriate BTI mitigations based on compiled in support, loaded microcode,
@@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be used to force or
 prevent Xen using the feature itself.  If Xen is not using IBRS itself,
 functionality is still set up so IBRS can be virtualised for guests.
 
+The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
+RSB gets overwritten.  There are individual controls for an entry from HVM
+context, and an entry from a native (PV or Xen) context.
+
 ### xenheap\_megabytes (arm32)
 > `= <size>`
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 7b0daaf..680fabe 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -33,6 +33,7 @@ static enum ind_thunk {
     THUNK_JMP,
 } opt_thunk __initdata = THUNK_DEFAULT;
 static int opt_ibrs __initdata = -1;
+static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;
 
 static int __init parse_bti(const char *s)
 {
@@ -59,6 +60,10 @@ static int __init parse_bti(const char *s)
         }
         else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
             opt_ibrs = val;
+        else if ( (val = parse_boolean("rsb_native", s, ss)) >= 0 )
+            opt_rsb_native = val;
+        else if ( (val = parse_boolean("rsb_vmexit", s, ss)) >= 0 )
+            opt_rsb_vmexit = val;
         else
             rc = -EINVAL;
 
@@ -95,13 +100,15 @@ static void __init print_details(enum ind_thunk thunk)
         printk(XENLOG_DEBUG "  Compiled-in support: INDIRECT_THUNK\n");
 
     printk(XENLOG_INFO
-           "BTI mitigations: Thunk %s, Others:%s\n",
+           "BTI mitigations: Thunk %s, Others:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
            thunk == THUNK_JMP       ? "JMP" : "?",
            boot_cpu_has(X86_FEATURE_XEN_IBRS_SET)    ? " IBRS+" :
-           boot_cpu_has(X86_FEATURE_XEN_IBRS_CLEAR)  ? " IBRS-"      : "");
+           boot_cpu_has(X86_FEATURE_XEN_IBRS_CLEAR)  ? " IBRS-"      : "",
+           boot_cpu_has(X86_FEATURE_RSB_NATIVE)      ? " RSB_NATIVE" : "",
+           boot_cpu_has(X86_FEATURE_RSB_VMEXIT)      ? " RSB_VMEXIT" : "");
 }
 
 /* Calculate whether Retpoline is known-safe on this CPU. */
@@ -243,6 +250,29 @@ void __init init_speculation_mitigations(void)
             setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_CLEAR);
     }
 
+    /*
+     * PV guests can poison the RSB to any virtual address from which
+     * they can execute a call instruction.  This is necessarily outside
+     * of the Xen supervisor mappings.
+     *
+     * With SMEP enabled, the processor won't speculate into user
+     * mappings.  Therefore, we don't need to worry about poisoned
+     * entries from 64bit PV guests.
+     *
+     * 32bit PV guest kernels run in ring 1, so use supervisor mappings.
+     * If a processor speculates to 32bit PV guest kernel mappings, it is
+     * speculating in 64bit supervisor mode, and can leak data.
+     */
+    if ( opt_rsb_native )
+        setup_force_cpu_cap(X86_FEATURE_RSB_NATIVE);
+
+    /*
+     * HVM guests can always poison the RSB to point at Xen supervisor
+     * mappings.
+     */
+    if ( opt_rsb_vmexit )
+        setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
+
     print_details(thunk);
 }
 
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index dd2388f..0ee4a1f 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -28,3 +28,5 @@ XEN_CPUFEATURE(IND_THUNK_JMP,   (FSCAPINTS+0)*32+14) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(XEN_IBPB,        (FSCAPINTS+0)*32+15) /* IBRSB || IBPB */
 XEN_CPUFEATURE(XEN_IBRS_SET,    (FSCAPINTS+0)*32+16) /* IBRSB && IBRS set in Xen */
 XEN_CPUFEATURE(XEN_IBRS_CLEAR,  (FSCAPINTS+0)*32+17) /* IBRSB && IBRS clear in Xen */
+XEN_CPUFEATURE(RSB_NATIVE,      (FSCAPINTS+0)*32+18) /* RSB overwrite needed for native */
+XEN_CPUFEATURE(RSB_VMEXIT,      (FSCAPINTS+0)*32+20) /* RSB overwrite needed for vmexit */
diff --git a/xen/include/asm-x86/nops.h b/xen/include/asm-x86/nops.h
index bb5b5d5..62de583 100644
--- a/xen/include/asm-x86/nops.h
+++ b/xen/include/asm-x86/nops.h
@@ -69,6 +69,7 @@
 #define ASM_NOP24 ASM_NOP8; ASM_NOP8; ASM_NOP8
 #define ASM_NOP26 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP2
 #define ASM_NOP32 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
+#define ASM_NOP34 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP2
 #define ASM_NOP36 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP4
 
 #define ASM_NOP_MAX 9
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 701a5ad9..f419b2e 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -73,6 +73,40 @@
  *  - SPEC_CTRL_EXIT_TO_GUEST
  */
 
+.macro DO_OVERWRITE_RSB
+/*
+ * Requires nothing
+ * Clobbers %rax, %rcx
+ *
+ * Requires 256 bytes of stack space, but %rsp has no net change. Based on
+ * Google's performance numbers, the loop is unrolled to 16 iterations and two
+ * calls per iteration.
+ *
+ * The call filling the RSB needs a nonzero displacement, but we use "1:
+ * pause, jmp 1b" to safely contains any ret-based speculation, even if the
+ * loop is speculatively executed prematurely.
+ *
+ * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
+ * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
+ * optimised with mov-elimination in modern cores.
+ */
+    mov $16, %ecx   /* 16 iterations, two calls per loop */
+    mov %rsp, %rax  /* Store the current %rsp */
+
+.L\@_fill_rsb_loop:
+
+    .rept 2         /* Unrolled twice. */
+    call 2f         /* Create an RSB entry. */
+1:  pause
+    jmp 1b          /* Capture rogue speculation. */
+2:
+    .endr
+
+    sub $1, %ecx
+    jnz .L\@_fill_rsb_loop
+    mov %rax, %rsp  /* Restore old %rsp */
+.endm
+
 .macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req
 /*
  * Requires %rbx=current, %rsp=regs/cpuinfo
@@ -175,6 +209,8 @@
 
 /* Use after a VMEXIT from an HVM guest. */
 #define SPEC_CTRL_ENTRY_FROM_VMEXIT                                     \
+    ALTERNATIVE __stringify(ASM_NOP34),                                 \
+        DO_OVERWRITE_RSB, X86_FEATURE_RSB_VMEXIT;                       \
     ALTERNATIVE_2 __stringify(ASM_NOP32),                               \
         __stringify(DO_SPEC_CTRL_ENTRY_FROM_VMEXIT                      \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
@@ -185,6 +221,8 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE __stringify(ASM_NOP34),                                 \
+        DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;                       \
     ALTERNATIVE_2 __stringify(ASM_NOP22),                               \
         __stringify(DO_SPEC_CTRL_ENTRY maybexen=0                       \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
@@ -194,6 +232,8 @@
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE __stringify(ASM_NOP34),                                 \
+        DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;                       \
     ALTERNATIVE_2 __stringify(ASM_NOP36),                               \
         __stringify(DO_SPEC_CTRL_ENTRY maybexen=1                       \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (7 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-19 13:25   ` Jan Beulich
  2018-01-18 15:46 ` [PATCH v9 10/11] x86/cpuid: Offer Indirect Branch Controls to guests Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 11/11] x86/idle: Clear SPEC_CTRL while idle Andrew Cooper
  10 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, David Woodhouse, Jan Beulich

Issuing an IBPB command flushes the Branch Target Buffer, so that any poison
left by one vcpu won't remain when beginning to execute the next.

The cost of IBPB is substantial, and skipped on transition to idle, as Xen's
idle code is robust already.  All transitions into vcpu context are fully
serialising in practice (and under consideration for being retroactively
declared architecturally serialising), so a cunning attacker cannot use SP1 to
try and skip the flush.
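
For reference, the hunk below keys the barrier off a packed
(domid << 16) | vcpu_id value, which is why MAX_VIRT_CPUS must fit in 16
bits.  A standalone sketch of just that packing (illustrative C rather than
Xen code; the domains named are invented):

    /* Standalone illustration of the packed scheduling id used below. */
    #include <stdint.h>
    #include <stdio.h>

    static unsigned int pack_id(uint16_t domid, uint16_t vcpu_id)
    {
        /* domid in the top 16 bits, vcpu id in the bottom 16. */
        return ((unsigned int)domid << 16) | vcpu_id;
    }

    int main(void)
    {
        printf("d2v3 packs to %#x\n", pack_id(2, 3));   /* 0x20003 */
        printf("d1v0 packs to %#x\n", pack_id(1, 0));   /* 0x10000 */
        return 0;
    }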

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: David Woodhouse <dwmw@amazon.co.uk>

v7:
 * Use the opt_ibpb boolean rather than using a cpufeature flag.
v9:
 * Expand the commit message.
 * Optimise the idle case, based on a suggestion from David.
---
 docs/misc/xen-command-line.markdown |  5 ++++-
 xen/arch/x86/domain.c               | 23 +++++++++++++++++++++++
 xen/arch/x86/spec_ctrl.c            | 10 +++++++++-
 xen/include/asm-x86/spec_ctrl.h     |  2 ++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 11399ce..9c10d3a 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -246,7 +246,7 @@ enough. Setting this to a high value may cause boot failure, particularly if
 the NMI watchdog is also enabled.
 
 ### bti (x86)
-> `= List of [ thunk=retpoline|lfence|jmp, ibrs=<bool>, rsb_{vmexit,native}=<bool> ]`
+> `= List of [ thunk=retpoline|lfence|jmp, ibrs=<bool>, ibpb=<bool>, rsb_{vmexit,native}=<bool> ]`
 
 Branch Target Injection controls.  By default, Xen will pick the most
 appropriate BTI mitigations based on compiled in support, loaded microcode,
@@ -265,6 +265,9 @@ On hardware supporting IBRS, the `ibrs=` option can be used to force or
 prevent Xen using the feature itself.  If Xen is not using IBRS itself,
 functionality is still set up so IBRS can be virtualised for guests.
 
+On hardware supporting IBPB, the `ibpb=` option can be used to prevent Xen
+from issuing Branch Prediction Barriers on vcpu context switches.
+
 The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
 RSB gets overwritten.  There are individual controls for an entry from HVM
 context, and an entry from a native (PV or Xen) context.
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbf4522..7f60b16 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1743,6 +1743,29 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         }
 
         ctxt_switch_levelling(next);
+
+        if ( opt_ibpb )
+        {
+            static DEFINE_PER_CPU(unsigned int, last_nonidle);
+            unsigned int *last_id = &this_cpu(last_nonidle);
+
+            /* Squash the domid and vcpu id together for efficiency. */
+            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
+                                    (uint16_t)next->vcpu_id);
+            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
+
+            /*
+             * When scheduling from a vcpu, to idle, and back to the same vcpu
+             * (which might be common in a lightly loaded system, or when
+             * using vcpu pinning), there is no need to issue IBPB, as we are
+             * returning to the same security context.
+             */
+            if ( *last_id != next_id )
+            {
+                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
+                *last_id = next_id;
+            }
+        }
     }
 
     context_saved(prev);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 680fabe..ae3e7d7 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -33,6 +33,7 @@ static enum ind_thunk {
     THUNK_JMP,
 } opt_thunk __initdata = THUNK_DEFAULT;
 static int opt_ibrs __initdata = -1;
+bool __read_mostly opt_ibpb = true;
 static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;
 
 static int __init parse_bti(const char *s)
@@ -60,6 +61,8 @@ static int __init parse_bti(const char *s)
         }
         else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
             opt_ibrs = val;
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+            opt_ibpb = val;
         else if ( (val = parse_boolean("rsb_native", s, ss)) >= 0 )
             opt_rsb_native = val;
         else if ( (val = parse_boolean("rsb_vmexit", s, ss)) >= 0 )
@@ -100,13 +103,14 @@ static void __init print_details(enum ind_thunk thunk)
         printk(XENLOG_DEBUG "  Compiled-in support: INDIRECT_THUNK\n");
 
     printk(XENLOG_INFO
-           "BTI mitigations: Thunk %s, Others:%s%s%s\n",
+           "BTI mitigations: Thunk %s, Others:%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
            thunk == THUNK_JMP       ? "JMP" : "?",
            boot_cpu_has(X86_FEATURE_XEN_IBRS_SET)    ? " IBRS+" :
            boot_cpu_has(X86_FEATURE_XEN_IBRS_CLEAR)  ? " IBRS-"      : "",
+           opt_ibpb                                  ? " IBPB"       : "",
            boot_cpu_has(X86_FEATURE_RSB_NATIVE)      ? " RSB_NATIVE" : "",
            boot_cpu_has(X86_FEATURE_RSB_VMEXIT)      ? " RSB_VMEXIT" : "");
 }
@@ -273,6 +277,10 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_vmexit )
         setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+        opt_ibpb = false;
+
     print_details(thunk);
 }
 
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index b451250..828707e 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -24,6 +24,8 @@
 
 void init_speculation_mitigations(void);
 
+extern bool opt_ibpb;
+
 static inline void init_shadow_spec_ctrl_state(void)
 {
     struct cpu_info *info = get_cpu_info();
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v9 10/11] x86/cpuid: Offer Indirect Branch Controls to guests
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (8 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  2018-01-18 15:46 ` [PATCH v9 11/11] x86/idle: Clear SPEC_CTRL while idle Andrew Cooper
  10 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

With all infrastructure in place, it is now safe to let guests see and use
these features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v9:
 * Split patch in half with the libxc hunk moving earlier, and rebasing over
   the changed nature of STIBP
---
 xen/include/public/arch-x86/cpufeatureset.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 0f21fed..fa81af1 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -237,13 +237,13 @@ XEN_CPUFEATURE(EFRO,          7*32+10) /*   APERF/MPERF Read Only interface */
 
 /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
 XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
-XEN_CPUFEATURE(IBPB,          8*32+12) /*   IBPB support only (no IBRS, used by AMD) */
+XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation Single Precision */
-XEN_CPUFEATURE(IBRSB,         9*32+26) /*   IBRS and IBPB support (used by Intel) */
-XEN_CPUFEATURE(STIBP,         9*32+27) /*!  STIBP */
+XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
+XEN_CPUFEATURE(STIBP,         9*32+27) /*A! STIBP */
 
 #endif /* XEN_CPUFEATURE */
 
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v9 11/11] x86/idle: Clear SPEC_CTRL while idle
  2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
                   ` (9 preceding siblings ...)
  2018-01-18 15:46 ` [PATCH v9 10/11] x86/cpuid: Offer Indirect Branch Controls to guests Andrew Cooper
@ 2018-01-18 15:46 ` Andrew Cooper
  10 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-18 15:46 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

On contemporary hardware, setting IBRS/STIBP has a performance impact on
adjacent hyperthreads.  It is therefore recommended to clear the setting
before becoming idle, to avoid an idle core preventing adjacent userspace
execution from running at full performance.

Care must be taken to ensure there are no ret or indirect branch instructions
between spec_ctrl_{enter,exit}_idle() invocations, which are forced always
inline.  Care must also be taken to avoid using spec_ctrl_enter_idle() between
flushing caches and becoming idle, in cases where that matters.
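
A rough sketch of what these rules mean in practice, modelled on the hunks
below (illustrative shapes only, not additional hunks):

    /* Idle path: bracket the halt, with no ret/indirect branch in between. */
    static void example_idle(void)
    {
        struct cpu_info *info = get_cpu_info();

        spec_ctrl_enter_idle(info);
        safe_halt();
        spec_ctrl_exit_idle(info);
    }

    /*
     * Dead idle path: enter "idle" before flushing, so the shadow state
     * write isn't left dirty in the cache after wbinvd().
     */
    static void example_dead_idle(void)
    {
        spec_ctrl_enter_idle(get_cpu_info());
        wbinvd();
        for ( ; ; )
            halt();
    }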

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/acpi/cpu_idle.c    | 21 +++++++++++++++++++++
 xen/arch/x86/cpu/mwait-idle.c   |  7 +++++++
 xen/arch/x86/domain.c           |  8 ++++++++
 xen/include/asm-x86/spec_ctrl.h | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index cb1c5da..3f72bda 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -55,6 +55,7 @@
 #include <asm/mwait.h>
 #include <xen/notifier.h>
 #include <xen/cpu.h>
+#include <asm/spec_ctrl.h>
 
 /*#define DEBUG_PM_CX*/
 
@@ -414,8 +415,14 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
      */
     if ( (expires > NOW() || expires == 0) && !softirq_pending(cpu) )
     {
+        struct cpu_info *info = get_cpu_info();
+
         cpumask_set_cpu(cpu, &cpuidle_mwait_flags);
+
+        spec_ctrl_enter_idle(info);
         __mwait(eax, ecx);
+        spec_ctrl_exit_idle(info);
+
         cpumask_clear_cpu(cpu, &cpuidle_mwait_flags);
     }
 
@@ -430,6 +437,8 @@ static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 
 static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
 {
+    struct cpu_info *info = get_cpu_info();
+
     switch ( cx->entry_method )
     {
     case ACPI_CSTATE_EM_FFH:
@@ -437,15 +446,19 @@ static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
         acpi_processor_ffh_cstate_enter(cx);
         return;
     case ACPI_CSTATE_EM_SYSIO:
+        spec_ctrl_enter_idle(info);
         /* IO port based C-state */
         inb(cx->address);
         /* Dummy wait op - must do something useless after P_LVL2 read
            because chipsets cannot guarantee that STPCLK# signal
            gets asserted in time to freeze execution properly. */
         inl(pmtmr_ioport);
+        spec_ctrl_exit_idle(info);
         return;
     case ACPI_CSTATE_EM_HALT:
+        spec_ctrl_enter_idle(info);
         safe_halt();
+        spec_ctrl_exit_idle(info);
         local_irq_disable();
         return;
     }
@@ -573,7 +586,13 @@ static void acpi_processor_idle(void)
         if ( pm_idle_save )
             pm_idle_save();
         else
+        {
+            struct cpu_info *info = get_cpu_info();
+
+            spec_ctrl_enter_idle(info);
             safe_halt();
+            spec_ctrl_exit_idle(info);
+        }
         return;
     }
 
@@ -752,6 +771,7 @@ void acpi_dead_idle(void)
          * Otherwise, CPU may still hold dirty data, breaking cache coherency,
          * leading to strange errors.
          */
+        spec_ctrl_enter_idle(get_cpu_info());
         wbinvd();
 
         while ( 1 )
@@ -781,6 +801,7 @@ void acpi_dead_idle(void)
         u32 address = cx->address;
         u32 pmtmr_ioport_local = pmtmr_ioport;
 
+        spec_ctrl_enter_idle(get_cpu_info());
         wbinvd();
 
         while ( 1 )
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 762dff1..e357f29 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -58,6 +58,7 @@
 #include <asm/hpet.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
+#include <asm/spec_ctrl.h>
 #include <acpi/cpufreq/cpufreq.h>
 
 #define MWAIT_IDLE_VERSION "0.4.1"
@@ -736,7 +737,13 @@ static void mwait_idle(void)
 		if (pm_idle_save)
 			pm_idle_save();
 		else
+		{
+			struct cpu_info *info = get_cpu_info();
+
+			spec_ctrl_enter_idle(info);
 			safe_halt();
+			spec_ctrl_exit_idle(info);
+		}
 		return;
 	}
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7f60b16..10a0338 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -55,6 +55,7 @@
 #include <asm/hvm/viridian.h>
 #include <asm/debugreg.h>
 #include <asm/msr.h>
+#include <asm/spec_ctrl.h>
 #include <asm/traps.h>
 #include <asm/nmi.h>
 #include <asm/mce.h>
@@ -74,9 +75,15 @@ void (*dead_idle) (void) __read_mostly = default_dead_idle;
 
 static void default_idle(void)
 {
+    struct cpu_info *info = get_cpu_info();
+
     local_irq_disable();
     if ( cpu_is_haltable(smp_processor_id()) )
+    {
+        spec_ctrl_enter_idle(info);
         safe_halt();
+        spec_ctrl_exit_idle(info);
+    }
     else
         local_irq_enable();
 }
@@ -88,6 +95,7 @@ void default_dead_idle(void)
      * held by the CPUs spinning here indefinitely, and get discarded by
      * a subsequent INIT.
      */
+    spec_ctrl_enter_idle(get_cpu_info());
     wbinvd();
     for ( ; ; )
         halt();
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 828707e..e250dc9 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,7 +20,9 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
+#include <asm/alternative.h>
 #include <asm/current.h>
+#include <asm/msr-index.h>
 
 void init_speculation_mitigations(void);
 
@@ -33,6 +35,38 @@ static inline void init_shadow_spec_ctrl_state(void)
     info->shadow_spec_ctrl = info->use_shadow_spec_ctrl = 0;
 }
 
+/* WARNING! `ret`, `call *`, `jmp *` not safe after this call. */
+static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
+{
+    uint32_t val = 0;
+
+    /*
+     * Latch the new shadow value, then enable shadowing, then update the MSR.
+     * There are no SMP issues here; only local processor ordering concerns.
+     */
+    info->shadow_spec_ctrl = val;
+    barrier();
+    info->use_shadow_spec_ctrl = true;
+    barrier();
+    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)
+                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+}
+
+/* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
+static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
+{
+    uint32_t val = SPEC_CTRL_IBRS;
+
+    /*
+     * Disable shadowing before updating the MSR.  There are no SMP issues
+     * here; only local processor ordering concerns.
+     */
+    info->use_shadow_spec_ctrl = false;
+    barrier();
+    asm volatile ( ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)
+                   :: "a" (val), "c" (MSR_SPEC_CTRL), "d" (0) : "memory" );
+}
+
 #endif /* !__X86_SPEC_CTRL_H__ */
 
 /*
-- 
2.1.4



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment
  2018-01-18 15:46 ` [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment Andrew Cooper
@ 2018-01-18 16:06   ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2018-01-18 16:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> This is a rebasing error in c/s 858cba0d4c6b "x86: Introduce alternative
> indirect thunks" hidden by other changes in the same sentence.
> 
> The name with dots rather than underscores was the prerelease GCC ABI.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}
  2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
@ 2018-01-18 18:04   ` Boris Ostrovsky
  2018-01-19 10:52   ` Jan Beulich
  2018-01-22  1:47   ` Tian, Kevin
  2 siblings, 0 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2018-01-18 18:04 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Suravee Suthikulpanit, Kevin Tian, Jun Nakajima, Jan Beulich

On 01/18/2018 10:46 AM, Andrew Cooper wrote:
> For performance reasons, HVM guests should have direct access to these MSRs
> when possible.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>
> v7:
>  * Drop excess brackets
> v9:
>  * Re-implement it light of Intels new spec.  Drop R-by's.
>  * Spelling fixes

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-18 15:46 ` [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point Andrew Cooper
@ 2018-01-19 10:39   ` Andrew Cooper
  2018-01-19 11:43   ` Jan Beulich
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 10:39 UTC (permalink / raw)
  To: Xen-devel; +Cc: Jan Beulich

On 18/01/18 15:46, Andrew Cooper wrote:
> We need to be able to either set or clear IBRS in Xen context, as well as
> restore appropriate guest values in guest context.  See the documentation in
> asm-x86/spec_ctrl_asm.h for details.
>
> Writes to %cr3 are slower when SPEC_CTRL.IBRS is set, so the
> SPEC_CTRL_{ENTRY/EXIT}* positioning is important, and optimised for the
> expected common case of Xen not using IBRS itself.
>
> There is a semi-unrelated bugfix, where various asm_defns.h macros have a
> hidden dependency on PAGE_SIZE, which results in an assembler error if used in
> a .macro definition.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

/sigh - It appears that I've missed two hunks from this patch, and as a
result, it's subtly broken for HVM guests.

I'll post an incremental version when I've unbroken my automatic tests
and refreshed the XTF unit tests for these MSRs.

It is expected to be two extra mov's in the vmx/svm exit paths, so the
rest of the patch is ok for review in general.

~Andrew


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-18 15:46 ` [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests Andrew Cooper
@ 2018-01-19 10:40   ` Jan Beulich
  2018-01-19 10:53     ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 10:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> For guest safety, we treat STIBP as special, always override the toolstack
> choice, and always advertise STIBP if IBRS is available.  This removes the
> corner case where STIBP is not advertised, but the guest is running on
> HT-capable hardware where it does matter.

I guess the answer to my question may live somewhere later in the
series, but since I haven't got there yet: Is this based on the
assumption that on HT-capable hardware they would always be
available together? Otherwise, how do you emulate STIBP for the
guest if all you've got is IBRS/IBPB?

> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -383,6 +383,16 @@ static void __init calculate_pv_max_policy(void)
>      /* Unconditionally claim to be able to set the hypervisor bit. */
>      __set_bit(X86_FEATURE_HYPERVISOR, pv_featureset);
>  
> +    /* On hardware with IBRS/IBPB support, there are further adjustments. */
> +    if ( test_bit(X86_FEATURE_IBRSB, pv_featureset) )
> +    {
> +        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
> +        __set_bit(X86_FEATURE_STIBP, pv_featureset);
> +
> +        /* AMD's IBPB is a subset of IBRS/IBPB. */
> +        __set_bit(X86_FEATURE_IBPB, pv_featureset);
> +    }
> +
>      sanitise_featureset(pv_featureset);
>      cpuid_featureset_to_policy(pv_featureset, p);
>      recalculate_xstate(p);
> @@ -440,6 +450,16 @@ static void __init calculate_hvm_max_policy(void)
>              __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
>      }
>  
> +    /* On hardware with IBRS/IBPB support, there are further adjustments. */
> +    if ( test_bit(X86_FEATURE_IBRSB, hvm_featureset) )
> +    {
> +        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
> +        __set_bit(X86_FEATURE_STIBP, hvm_featureset);
> +
> +        /* AMD's IBPB is a subset of IBRS/IBPB. */
> +        __set_bit(X86_FEATURE_IBPB, hvm_featureset);
> +    }

As long as we don't expect this logic to grow, having it duplicated
like this is probably fine. Otherwise a helper function might be
better.

Jan



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests
  2018-01-18 15:46 ` [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} " Andrew Cooper
@ 2018-01-19 10:45   ` Jan Beulich
  2018-01-19 11:05     ` Andrew Cooper
  2018-01-22 14:50     ` Andrew Cooper
  0 siblings, 2 replies; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 10:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> @@ -153,14 +168,44 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>  {
>      const struct vcpu *curr = current;
>      struct domain *d = v->domain;
> +    const struct cpuid_policy *cp = d->arch.cpuid;
>      struct msr_domain_policy *dp = d->arch.msr;
>      struct msr_vcpu_policy *vp = v->arch.msr;
>  
>      switch ( msr )
>      {
>      case MSR_INTEL_PLATFORM_INFO:
> +    case MSR_ARCH_CAPABILITIES:
> +        /* Read-only */
>          goto gp_fault;
>  
> +    case MSR_SPEC_CTRL:
> +        if ( !cp->feat.ibrsb )
> +            goto gp_fault; /* MSR available? */
> +
> +        /*
> +         * Note: SPEC_CTRL_STIBP is specified as safe to use (i.e. ignored)
> +         * when STIBP isn't enumerated in hardware.
> +         */
> +
> +        if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
> +            goto gp_fault; /* Rsvd bit set? */
> +
> +        vp->spec_ctrl.raw = val;
> +        break;

Did you check (or inquire) whether reading back the value on a
system which ignores the write to 1 actually produces the
written value? I'd sort of expect zero to come back instead.

Jan



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}
  2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
  2018-01-18 18:04   ` Boris Ostrovsky
@ 2018-01-19 10:52   ` Jan Beulich
  2018-01-19 10:54     ` Andrew Cooper
  2018-01-22  1:47   ` Tian, Kevin
  2 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 10:52 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Boris Ostrovsky, Kevin Tian, Jun Nakajima, SuraveeSuthikulpanit,
	Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> @@ -292,6 +301,16 @@ static int update_domain_cpuid_info(struct domain *d,
>              d->arch.pv_domain.cpuidmasks->e1cd = mask;
>          }
>          break;
> +
> +    case 0x80000008:
> +        /*
> +         * If the IBRB policy has changed, we need to recalculate the MSR

"IBPB" I think? Other than that
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 10:40   ` Jan Beulich
@ 2018-01-19 10:53     ` Andrew Cooper
  2018-01-19 11:46       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 10:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 10:40, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> For guest safety, we treat STIBP as special, always override the toolstack
>> choice, and always advertise STIBP if IBRS is available.  This removes the
>> corner case where STIBP is not advertised, but the guest is running on
>> HT-capable hardware where it does matter.
> I guess the answer to my question may live somewhere later in the
> series, but since I haven't got there yet: Is this based on the
> assumption that on HT-capable hardware they would always be
> available together? Otherwise, how do you emulate STIBP for the
> guest if all you've got is IBRS/IBPB?

The safety depends on the guest seeing STIBP and using it if it wants
to.  (Not that I've seen any sign of STIBP in the Linux code, or from
observing what Windows appears to do).

For topology reasons (despite the other cans of worms in this area), we
unilaterally set HT, so all guests should find themselves on HT-capable
systems.

>
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -383,6 +383,16 @@ static void __init calculate_pv_max_policy(void)
>>      /* Unconditionally claim to be able to set the hypervisor bit. */
>>      __set_bit(X86_FEATURE_HYPERVISOR, pv_featureset);
>>  
>> +    /* On hardware with IBRS/IBPB support, there are further adjustments. */
>> +    if ( test_bit(X86_FEATURE_IBRSB, pv_featureset) )
>> +    {
>> +        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
>> +        __set_bit(X86_FEATURE_STIBP, pv_featureset);
>> +
>> +        /* AMD's IBPB is a subset of IBRS/IBPB. */
>> +        __set_bit(X86_FEATURE_IBPB, pv_featureset);
>> +    }
>> +
>>      sanitise_featureset(pv_featureset);
>>      cpuid_featureset_to_policy(pv_featureset, p);
>>      recalculate_xstate(p);
>> @@ -440,6 +450,16 @@ static void __init calculate_hvm_max_policy(void)
>>              __clear_bit(X86_FEATURE_XSAVES, hvm_featureset);
>>      }
>>  
>> +    /* On hardware with IBRS/IBPB support, there are further adjustments. */
>> +    if ( test_bit(X86_FEATURE_IBRSB, hvm_featureset) )
>> +    {
>> +        /* Offer STIBP unconditionally.  It is a nop on non-HT hardware. */
>> +        __set_bit(X86_FEATURE_STIBP, hvm_featureset);
>> +
>> +        /* AMD's IBPB is a subset of IBRS/IBPB. */
>> +        __set_bit(X86_FEATURE_IBPB, hvm_featureset);
>> +    }
> As long as we don't expect this logic to grow, having it duplicated
> like this is probably fine. Otherwise a helper function might be
> better.

I'll see about this when I get back to the CPUID work.  There probably
will be others actions which are common across guest types, so a helper
is probably worthwhile, but some bits of this will change when we expose
full cpuid_policies to the toolstack.

~Andrew


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}
  2018-01-19 10:52   ` Jan Beulich
@ 2018-01-19 10:54     ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 10:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Kevin Tian, Jun Nakajima, SuraveeSuthikulpanit,
	Xen-devel

On 19/01/18 10:52, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> @@ -292,6 +301,16 @@ static int update_domain_cpuid_info(struct domain *d,
>>              d->arch.pv_domain.cpuidmasks->e1cd = mask;
>>          }
>>          break;
>> +
>> +    case 0x80000008:
>> +        /*
>> +         * If the IBRB policy has changed, we need to recalculate the MSR
> "IBPB" I think? Other than that
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Indeed.  Sorry.  (I'm going to grep across the codebase for any other
mistakes before committing, because I doubt this is the only example).

~Andrew


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests
  2018-01-19 10:45   ` Jan Beulich
@ 2018-01-19 11:05     ` Andrew Cooper
  2018-01-22 14:50     ` Andrew Cooper
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 11:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andi Kleen, Jun Nakajima, Dave Hansen, Asit K Mallick,
	tim.c.chen, Xen-devel, Peter Zijlstra, arjan.van.de.ven,
	Dan Williams, David Woodhouse, Ashok Raj

On 19/01/18 10:45, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> @@ -153,14 +168,44 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>  {
>>      const struct vcpu *curr = current;
>>      struct domain *d = v->domain;
>> +    const struct cpuid_policy *cp = d->arch.cpuid;
>>      struct msr_domain_policy *dp = d->arch.msr;
>>      struct msr_vcpu_policy *vp = v->arch.msr;
>>  
>>      switch ( msr )
>>      {
>>      case MSR_INTEL_PLATFORM_INFO:
>> +    case MSR_ARCH_CAPABILITIES:
>> +        /* Read-only */
>>          goto gp_fault;
>>  
>> +    case MSR_SPEC_CTRL:
>> +        if ( !cp->feat.ibrsb )
>> +            goto gp_fault; /* MSR available? */
>> +
>> +        /*
>> +         * Note: SPEC_CTRL_STIBP is specified as safe to use (i.e. ignored)
>> +         * when STIBP isn't enumerated in hardware.
>> +         */
>> +
>> +        if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
>> +            goto gp_fault; /* Rsvd bit set? */
>> +
>> +        vp->spec_ctrl.raw = val;
>> +        break;
> Did you check (or inquire) whether reading back the value on a
> system which ignores the write to 1 actually produces the
> written value? I'd sort of expect zero to come back instead.

Very good question.  I sadly don't have a suitable hardware/microcode
combination to experiment with at the moment.

Given that the point of ignoring the write to 1 was to make things
easier for virt/migration scenarios, I really hope the answer is "read
as written", rather than "read as zero".

CC'ing a bunch of people in the hopes that someone might have an answer.

~Andrew


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-18 15:46 ` [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point Andrew Cooper
  2018-01-19 10:39   ` Andrew Cooper
@ 2018-01-19 11:43   ` Jan Beulich
  2018-01-19 13:36     ` Andrew Cooper
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 11:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> We need to be able to either set or clear IBRS in Xen context, as well as
> restore appropriate guest values in guest context.  See the documentation in
> asm-x86/spec_ctrl_asm.h for details.
> 
> Writes to %cr3 are slower when SPEC_CTRL.IBRS is set, so the
> SPEC_CTRL_{ENTRY/EXIT}* positioning is important, and optimised for the
> expected common case of Xen not using IBRS itself.

And this expectation is because on pre-Skylake we're fine using just
retpoline, and we expect post-Skylake to not have the issue anymore?
It would help if such reasoning were spelled out here.

As to the behavior of CR3 writes - is this written down anywhere in
Intel's and/or AMD's docs? Or else, how do you know this is uniformly
the case on all past, current, and future hardware?

> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>  .Lvmx_vmentry_fail:
>          sti
>          SAVE_ALL
> +
> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */

I think the use of the PV variant here requires a comment.

> @@ -142,6 +146,13 @@ ENTRY(compat_restore_all_guest)
>          .popsection
>          or    $X86_EFLAGS_IF,%r11
>          mov   %r11d,UREGS_eflags(%rsp)
> +
> +        mov VCPU_arch_msr(%rbx), %rax
> +        mov VCPUMSR_spec_ctrl_raw(%rax), %eax

I understand that it's code like this which is missing from the SVM
and VMX exit paths.

> @@ -729,6 +760,9 @@ ENTRY(nmi)
>  handle_ist_exception:
>          SAVE_ALL CLAC
>  
> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */

Following my considerations towards alternative patching to
eliminate as much overhead as possible from the Meltdown
band-aid in case it is being disabled, I'm rather hesitant to see any
patchable code being introduced into the NMI/#MC entry paths
without the patching logic first being made safe in this regard.
Exceptions coming here aren't very frequent (except perhaps on
hardware about to die), so the path isn't performance critical.
Therefore I think we should try to avoid any patching here, and
just conditionals instead. This in fact is one of the reasons why I
didn't want to macro-ize the assembly additions done in the
Meltdown band-aid.

I do realize that this then also affects the exit-to-Xen path,
which I agree is less desirable to use conditionals on.

> --- /dev/null
> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
> @@ -0,0 +1,227 @@
> +/******************************************************************************
> + * include/asm-x86/spec_ctrl.h
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017-2018 Citrix Systems Ltd.
> + */
> +
> +#ifndef __X86_SPEC_CTRL_ASM_H__
> +#define __X86_SPEC_CTRL_ASM_H__
> +
> +#ifdef __ASSEMBLY__
> +#include <asm/msr.h>
> +
> +/*
> + * Saving and restoring MSR_SPEC_CTRL state is a little tricky.
> + *
> + * We want the guests choice of SPEC_CTRL while in guest context, and IBRS
> + * (set or clear, depending on the hardware) while running in Xen context.

To me this continues to be somewhat strange to read. One possibility
to improve it would be to move the opening parenthesis after "set or
clear", another would be to say "..., and Xen's choice (set or ...".
But you're the native speaker, so I'm open to other alternatives you
may consider even better.

> + * Therefore, a simplistic algorithm is:
> + *
> + *  - Set/clear IBRS on entry to Xen
> + *  - Set the guests' choice on exit to guest
> + *  - Leave SPEC_CTRL unchanged on exit to xen
> + *
> + * There are two complicating factors:
> + *  1) HVM guests can have direct access to the MSR, so it can change
> + *     behind Xen's back.
> + *  2) An NMI or MCE can interrupt at any point, including early in the entry
> + *     path, or late in the exit path after restoring the guest value.  This
> + *     will corrupt the guest value.
> + *
> + * Factor 1 is dealt with by relying on NMIs/MCEs being blocked immediately
> + * after VMEXIT.  The VMEXIT-specific code reads MSR_SPEC_CTRL and updates
> + * current before loading Xen's MSR_SPEC_CTRL setting.
> + *
> + * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and
> + * use_shadow_spec_ctrl boolean per cpu.  The synchronous use is:
> + *
> + *  1) Store guest value in shadow_spec_ctrl
> + *  2) Set use_shadow_spec_ctrl boolean
> + *  3) Load guest value into MSR_SPEC_CTRL
> + *  4) Exit to guest
> + *  5) Entry from guest
> + *  6) Clear use_shadow_spec_ctrl boolean

7) Load Xen value into MSR_SPEC_CTRL

(It is important to spell out that the MSR write in _both_ cases
needs to come after the write of the boolean.)

> +.macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req
> +/*
> + * Requires %rbx=current, %rsp=regs/cpuinfo
> + * Clobbers %rax, %rcx, %rdx
> + *
> + * The common case is that a guest has direct access to MSR_SPEC_CTRL, at
> + * which point we need to save the guest value before setting IBRS for Xen.
> + * Unilaterally saving the guest value is shorter and faster than checking.
> + */
> +    mov $MSR_SPEC_CTRL, %ecx
> +    rdmsr
> +
> +    /* Stash the value from hardware. */
> +    mov VCPU_arch_msr(%rbx), %rdx
> +    mov %al, VCPUMSR_spec_ctrl_raw(%rdx)

Patch 3 makes this a 32-bit field now - why still %al?

> +.macro DO_SPEC_CTRL_ENTRY maybexen:req ibrs_val:req
> +/*
> + * Requires %rsp=regs (also cpuinfo if !maybexen)
> + * Clobbers %rax, %rcx, %rdx
> + *
> + * PV guests can't update MSR_SPEC_CTRL behind Xen's back, so no need to read
> + * it back.  Entries from guest context need to clear SPEC_CTRL shadowing,
> + * while entries from Xen must leave shadowing in its current state.
> + */
> +    mov $MSR_SPEC_CTRL, %ecx
> +
> +    .if \maybexen
> +        testb $3, UREGS_cs(%rsp)
> +        jz .L\@_entry_from_xen
> +    .endif
> +
> +    /*
> +     * Clear SPEC_CTRL shadowing *before* loading Xen's value.  If entering
> +     * from a possibly-xen context, %rsp doesn't necessarily alias the cpuinfo
> +     * block so calculate the position directly.
> +     */
> +    .if \maybexen
> +        GET_STACK_END(dx)

In almost all cases (and perhaps in all ones with maybexen set) the
macro invocation sits right ahead of a GET_STACK_BASE(). If you
swapped them and allowed the register to be passed in, you could
avoid this load.

> +.macro DO_SPEC_CTRL_EXIT_TO_GUEST
> +/*
> + * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
> + * Clobbers %rax, %rcx, %rdx

%rax isn't really being clobbered anymore. The clobber annotations
on all use sites appear to be similarly stale.

Jan




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 10:53     ` Andrew Cooper
@ 2018-01-19 11:46       ` Jan Beulich
  2018-01-19 12:01         ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 11:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 10:40, Jan Beulich wrote:
>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>> For guest safety, we treat STIBP as special, always override the toolstack
>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>> corner case where STIBP is not advertised, but the guest is running on
>>> HT-capable hardware where it does matter.
>> I guess the answer to my question may live somewhere later in the
>> series, but since I haven't got there yet: Is this based on the
>> assumption that on HT-capable hardware they would always be
>> available together? Otherwise, how do you emulate STIBP for the
>> guest if all you've got is IBRS/IBPB?
> 
> The safety depends on the guest seeing STIBP and using it if it wants
> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
> observing what Windows appears to do).
> 
> For topology reasons (despite the other cans of worms in this area), we
> unilaterally set HT, so all guests should find themselves on HT-capable
> systems.

But this doesn't answer my question: What do you do if the guest
uses STIBP (because you've told it that it can), but the hardware
doesn't support it? Aren't you producing a false sense of security
to the guest this way?

Jan



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 11:46       ` Jan Beulich
@ 2018-01-19 12:01         ` Andrew Cooper
  2018-01-19 12:11           ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 12:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 11:46, Jan Beulich wrote:
>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>> corner case where STIBP is not advertised, but the guest is running on
>>>> HT-capable hardware where it does matter.
>>> I guess the answer to my question may live somewhere later in the
>>> series, but since I haven't got there yet: Is this based on the
>>> assumption that on HT-capable hardware they would always be
>>> available together? Otherwise, how do you emulate STIBP for the
>>> guest if all you've got is IBRS/IBPB?
>> The safety depends on the guest seeing STIBP and using it if it wants
>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>> observing what Windows appears to do).
>>
>> For topology reasons (despite the other cans of worms in this area), we
>> unilaterally set HT, so all guests should find themselves on HT-capable
>> systems.
> But this doesn't answer my question: What do you do if the guest
> uses STIBP (because you've told it that it can), but the hardware
> doesn't support it? Aren't you producing a false sense of security
> to the guest this way?

The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
let this work.

By advertising STIBP, we are telling the guest "There might be (but not
definitely) interference from other threads in the BTB.  If you care
about this, you should set SPEC_CTRL.STIBP".

On hardware where there is definitely no interference, this is a nop.

In any situation where a guest might migrate to a host where there is
interference, it needs to know about STIBP so (if it cares) it can
choose to set SPEC_CTRL.STIBP.
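
As a rough sketch of what that amounts to (illustrative only - the
feature bitmap name "fs" and the exact location are placeholders, not
the code in the patch):

    /* If IBRS/IBPB is visible to the guest, unconditionally offer STIBP
     * as well, so a guest which cares can opt in before it ever migrates
     * to hardware where STIBP actually matters. */
    if ( test_bit(X86_FEATURE_IBRSB, fs) )
        __set_bit(X86_FEATURE_STIBP, fs);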

~Andrew


* Re: [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use
  2018-01-18 15:46 ` [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use Andrew Cooper
@ 2018-01-19 12:06   ` Jan Beulich
  2018-01-19 13:48     ` Andrew Cooper
  2018-01-19 14:01   ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 12:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> @@ -124,7 +186,21 @@ void __init init_speculation_mitigations(void)
>               */
>              if ( cpu_has_lfence_dispatch )
>                  thunk = THUNK_LFENCE;
> +            /*
> +             * On Intel hardware, we'd like to use retpoline in preference to
> +             * IBRS, but only if it is safe on this hardware.
> +             */
> +            else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
> +            {
> +                if ( retpoline_safe() )
> +                    thunk = THUNK_RETPOLINE;
> +                else
> +                    ibrs = true;
> +            }

I think I had asked about this piece of code before, but maybe not
the same I'm noticing now: Why is using retpoline dependent upon
IBRSB? I.e. why not

            else if ( retpoline_safe() )
                thunk = THUNK_RETPOLINE;
            else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
                ibrs = true;

?

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 12:01         ` Andrew Cooper
@ 2018-01-19 12:11           ` Jan Beulich
  2018-01-19 12:36             ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 12:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 11:46, Jan Beulich wrote:
>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>> HT-capable hardware where it does matter.
>>>> I guess the answer to my question may live somewhere later in the
>>>> series, but since I haven't got there yet: Is this based on the
>>>> assumption that on HT-capable hardware they would always be
>>>> available together? Otherwise, how do you emulate STIBP for the
>>>> guest if all you've got is IBRS/IBPB?
>>> The safety depends on the guest seeing STIBP and using it if it wants
>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>> observing what Windows appears to do).
>>>
>>> For topology reasons (despite the other cans of worms in this area), we
>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>> systems.
>> But this doesn't answer my question: What do you do if the guest
>> uses STIBP (because you've told it that it can), but the hardware
>> doesn't support it? Aren't you producing a false sense of security
>> to the guest this way?
> 
> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
> let this work.
> 
> By advertising STIBP, we are telling the guest "There might be (but not
> definitely) interference from other threads in the BTB.  If you care
> about this, you should set SPEC_CTRL.STIBP".
> 
> On hardware where there is definitely no interference, this is a nop.
> 
> In any situation where a guest might migrate to a host where there is
> interference, it needs to know about STIBP so (if it cares) it can
> choose to set SPEC_CTRL.STIBP.

This is the part that is clear, but my question remains unanswered:
If HT hardware doesn't support STIBP, how can the guest guard
itself _successfully_? Afaict it will only think it is safe in such a case.
As said in my very first reply on this thread, the answer may well
be "We expect STIBP and IBRS to always come together on HT
hardware", but that's not written down anywhere afaics.

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 12:11           ` Jan Beulich
@ 2018-01-19 12:36             ` Andrew Cooper
  2018-01-19 12:52               ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 12:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 12:11, Jan Beulich wrote:
>>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 11:46, Jan Beulich wrote:
>>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>>> HT-capable hardware where it does matter.
>>>>> I guess the answer to my question may live somewhere later in the
>>>>> series, but since I haven't got there yet: Is this based on the
>>>>> assumption that on HT-capable hardware they would always be
>>>>> available together? Otherwise, how do you emulate STIBP for the
>>>>> guest if all you've got is IBRS/IBPB?
>>>> The safety depends on the guest seeing STIBP and using it if it wants
>>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>>> observing what Windows appears to do).
>>>>
>>>> For topology reasons (despite the other cans of worms in this area), we
>>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>>> systems.
>>> But this doesn't answer my question: What do you do if the guest
>>> uses STIBP (because you've told it that it can), but the hardware
>>> doesn't support it? Aren't you producing a false sense of security
>>> to the guest this way?
>> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
>> let this work.
>>
>> By advertising STIBP, we are telling the guest "There might be (but not
>> definitely) interference from other threads in the BTB.  If you care
>> about this, you should set SPEC_CTRL.STIBP".
>>
>> On hardware where there is definitely no interference, this is a nop.
>>
>> In any situation where a guest might migrate to a host where there is
>> interference, it needs to know about STIBP so (if it cares) it can
>> choose to set SPEC_CTRL.STIBP.
> This is the part that is clear, but my question remains unanswered:
> If HT hardware doesn't support STIBP, how can the guest guard
> itself _successfully_?

I'm completely lost now.  On hardware which doesn't support STIBP, there
is no action required.

> Afaict it will only think it is safe in such a case.
> As said in my very first reply on this thread, the answer may well
> be "We expect STIBP and IBRS to always come together on HT
> hardware", but that's not written down anywhere afaics.

It is safe for a guest to use STIBP on hardware where it isn't
actually required for safety.

A guest is not safe if it believes it doesn't need to use STIBP, and
migrates to a host which does require STIBP for safety.

I'm not sure how else to try and explain this.

~Andrew


* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-18 15:46 ` [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen Andrew Cooper
@ 2018-01-19 12:47   ` Jan Beulich
  2018-01-19 14:24     ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 12:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
> used to force or
>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>  functionality is still set up so IBRS can be virtualised for guests.
>  
> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
> +RSB gets overwritten.  There are individual controls for an entry from HVM
> +context, and an entry from a native (PV or Xen) context.

Would you mind adding a sentence or two to the description making
clear what use this fine grained control is? I can't really figure why I
might need to be concerned about one of the two cases, but not the
other.

> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -33,6 +33,7 @@ static enum ind_thunk {
>      THUNK_JMP,
>  } opt_thunk __initdata = THUNK_DEFAULT;
>  static int opt_ibrs __initdata = -1;
> +static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;

Would you mind splitting this into two separate declarations? The
double use of __initdata is necessary, but looks odd.

Also, here at the latest it becomes clear that, while minor, opt_ibrs
would better be int8_t.

> @@ -243,6 +250,29 @@ void __init init_speculation_mitigations(void)
>              setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_CLEAR);
>      }
>  
> +    /*
> +     * PV guests can poison the RSB to any virtual address from which
> +     * they can execute a call instruction.  This is necessarily outside
> +     * of the Xen supervisor mappings.
> +     *
> +     * With SMEP enabled, the processor won't speculate into user
> +     * mappings.  Therefore, we don't need to worry about poisoned
> +     * entries from 64bit PV guests.

"... in this case" (i.e. when SMEP is enabled).

> --- a/xen/include/asm-x86/spec_ctrl_asm.h
> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
> @@ -73,6 +73,40 @@
>   *  - SPEC_CTRL_EXIT_TO_GUEST
>   */
>  
> +.macro DO_OVERWRITE_RSB
> +/*
> + * Requires nothing
> + * Clobbers %rax, %rcx
> + *
> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
> + * Google's performance numbers, the loop is unrolled to 16 iterations and two
> + * calls per iteration.
> + *
> + * The call filling the RSB needs a nonzero displacement, but we use "1:
> + * pause, jmp 1b" to safely contain any ret-based speculation, even if the
> + * loop is speculatively executed prematurely.

I'm struggling to understand why you use "but" here. Maybe just a
lack of English skills on my part?

> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
> + * optimised with mov-elimination in modern cores.
> + */
> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
> +    mov %rsp, %rax  /* Store the current %rsp */
> +
> +.L\@_fill_rsb_loop:
> +
> +    .rept 2         /* Unrolled twice. */
> +    call 2f         /* Create an RSB entry. */
> +1:  pause
> +    jmp 1b          /* Capture rogue speculation. */
> +2:

I won't further insist on changing away from numeric labels here, but
I'd still like to point out an example of a high risk use of such labels in
mainline code: There's a "jz 1b" soon after
exception_with_ints_disabled, leading across _two_ other labels and
quite a few insns and macro invocations. May I at the very least
suggest that you don't use 1 and 2 here?

> +    .endr
> +
> +    sub $1, %ecx
> +    jnz .L\@_fill_rsb_loop

Iirc it's only CMP and TEST which can be fused with Jcc. Would
it perhaps help slightly to move the SUB ahead of the two CALLs?
And once they are farther apart, perhaps DEC wouldn't be
that bad a choice anymore?

> +    mov %rax, %rsp  /* Retore old %rsp */

Restore

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 12:36             ` Andrew Cooper
@ 2018-01-19 12:52               ` Jan Beulich
  2018-01-19 13:06                 ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 12:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 13:36, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 12:11, Jan Beulich wrote:
>>>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 11:46, Jan Beulich wrote:
>>>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>>>> HT-capable hardware where it does matter.
>>>>>> I guess the answer to my question may live somewhere later in the
>>>>>> series, but since I haven't got there yet: Is this based on the
>>>>>> assumption that on HT-capable hardware they would always be
>>>>>> available together? Otherwise, how do you emulate STIBP for the
>>>>>> guest if all you've got is IBRS/IBPB?
>>>>> The safety depends on the guest seeing STIBP and using it if it wants
>>>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>>>> observing what Windows appears to do).
>>>>>
>>>>> For topology reasons (despite the other cans of worms in this area), we
>>>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>>>> systems.
>>>> But this doesn't answer my question: What do you do if the guest
>>>> uses STIBP (because you've told it that it can), but the hardware
>>>> doesn't support it? Aren't you producing a false sense of security
>>>> to the guest this way?
>>> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
>>> let this work.
>>>
>>> By advertising STIBP, we are telling the guest "There might be (but not
>>> definitely) interference from other threads in the BTB.  If you care
>>> about this, you should set SPEC_CTRL.STIBP".
>>>
>>> On hardware where there is definitely no interference, this is a nop.
>>>
>>> In any situation where a guest might migrate to a host where there is
>>> interference, it needs to know about STIBP so (if it cares) it can
>>> choose to set SPEC_CTRL.STIBP.
>> This is the part that is clear, but my question remains unanswered:
>> If HT hardware doesn't support STIBP, how can the guest guard
>> itself _successfully_?
> 
> I'm completely lost now.  On hardware which doesn't support STIBP, there
> is no action required required.

How that? Do you perhaps mean there's nothing we can do? Yes,
and the same applies to the guest. Yet if you've got HT hardware
which supports IBRS but not STIBP, you still tell the guest that
STIBP is available. Hence the guest seeing (and using) both, it'll
assume it is safe (and perhaps report so to its users) when in
fact it's still vulnerable.

>> Afaict it will only think it is safe in such a case.
>> As said in my very first reply on this thread, the answer may well
>> be "We expect STIBP and IBRS to always come together on HT
>> hardware", but that's not written down anywhere afaics.
> 
> It is safe for a guest to use STIBP on hardware where it isn't
> actually required for safety.

Yes.

> A guest is not safe if it believes it doesn't need to use STIBP, and
> migrates to a host which does require STIBP for safety.

Yes. But:

A guest is not safe if it believes it uses STIBP, but that's just fake.

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 12:52               ` Jan Beulich
@ 2018-01-19 13:06                 ` Andrew Cooper
  2018-01-19 13:33                   ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 13:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 12:52, Jan Beulich wrote:
>>>> On 19.01.18 at 13:36, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 12:11, Jan Beulich wrote:
>>>>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 11:46, Jan Beulich wrote:
>>>>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>>>>> HT-capable hardware where it does matter.
>>>>>>> I guess the answer to my question may live somewhere later in the
>>>>>>> series, but since I haven't got there yet: Is this based on the
>>>>>>> assumption that on HT-capable hardware they would always be
>>>>>>> available together? Otherwise, how do you emulate STIBP for the
>>>>>>> guest if all you've got is IBRS/IBPB?
>>>>>> The safety depends on the guest seeing STIBP and using it if it wants
>>>>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>>>>> observing what Windows appears to do).
>>>>>>
>>>>>> For topology reasons (despite the other cans of worms in this area), we
>>>>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>>>>> systems.
>>>>> But this doesn't answer my question: What do you do if the guest
>>>>> uses STIBP (because you've told it that it can), but the hardware
>>>>> doesn't support it? Aren't you producing a false sense of security
>>>>> to the guest this way?
>>>> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
>>>> let this work.
>>>>
>>>> By advertising STIBP, we are telling the guest "There might be (but not
>>>> definitely) interference from other threads in the BTB.  If you care
>>>> about this, you should set SPEC_CTRL.STIBP".
>>>>
>>>> On hardware where there is definitely no interference, this is a nop.
>>>>
>>>> In any situation where a guest might migrate to a host where there is
>>>> interference, it needs to know about STIBP so (if it cares) it can
>>>> choose to set SPEC_CTRL.STIBP.
>>> This is the part that is clear, but my question remains unanswered:
>>> If HT hardware doesn't support STIBP, how can the guest guard
>>> itself _successfully_?
>> I'm completely lost now.  On hardware which doesn't support STIBP, there
>> is no action required.
> How that? Do you perhaps mean there's nothing we can do? Yes,
> and the same applies to the guest. Yet if you've got HT hardware
> which supports IBRS but not STIBP

If Intel's microcode fails to advertise STIBP on HT-hardware where it is
required for safety, then it's broken microcode.

> you still tell the guest that STIBP is available. Hence the guest seeing (and using) both, it'll
> assume it is safe (and perhaps report so to its users) when in
> fact it's still vulnerable.

Ok - I see your point now, but there is nothing we can do about that.

Even if Xen faithfully reported the (lack of) STIBP to the guest, and
fixed up behind the scenes just in case (as per v8 of my series), the
guest would still be vulnerable.

There are some things we are simply going to have to trust the microcode
to do right.

~Andrew


* Re: [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-18 15:46 ` [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts Andrew Cooper
@ 2018-01-19 13:25   ` Jan Beulich
  2018-01-19 14:48     ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 13:25 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: David Woodhouse, Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1743,6 +1743,29 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>          }
>  
>          ctxt_switch_levelling(next);
> +
> +        if ( opt_ibpb )
> +        {
> +            static DEFINE_PER_CPU(unsigned int, last_nonidle);
> +            unsigned int *last_id = &this_cpu(last_nonidle);

"nonidle" is not entirely correct without an is_idle_...() check around
it, as the outer condition leaves room for idle vCPU-s to make it here.
But take this as a remark, not a strict request to change the name.

> +            /* Squash the domid and vcpu id together for efficiency. */
> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
> +                                    (uint16_t)next->vcpu_id);

Is this really more efficient than just comparing struct vcpu pointers?

> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
> +
> +            /*
> +             * When scheduling from a vcpu, to idle, and back to the same vcpu
> +             * (which might be common in a lightly loaded system, or when
> +             * using vcpu pinning), there is no need to issue IBPB, as we are
> +             * returning to the same security context.
> +             */
> +            if ( *last_id != next_id )
> +            {
> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
> +                *last_id = next_id;
> +            }

I've read David's mails regarding this another time, but I still can't
conclude why this extra logic would be necessary. Transitions
from a guest vCPU through idle and back to that very vCPU do
not alter per_cpu(curr_vcpu, ...) - __context_switch() is the
only place to update it. There's certainly the potential for it to
be called through __sync_local_execstate(), but is that a
common case? I'd support introduction of the extra logic only if
so.

Furthermore, if this indeed was a sufficiently common case, doing
lazy context switches at all for HVM guests would once again
need to be put under question.

> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -33,6 +33,7 @@ static enum ind_thunk {
>      THUNK_JMP,
>  } opt_thunk __initdata = THUNK_DEFAULT;
>  static int opt_ibrs __initdata = -1;
> +bool __read_mostly opt_ibpb = true;
>  static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;

Considering the (better imo) placement of __read_mostly here,
would you mind moving the __initdata accordingly in the earlier
patches (opt_thunk having reasons to be an exception)?
Otherwise please move the __read_mostly to the end.

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 13:06                 ` Andrew Cooper
@ 2018-01-19 13:33                   ` Jan Beulich
  2018-01-19 15:00                     ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 13:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 14:06, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 12:52, Jan Beulich wrote:
>>>>> On 19.01.18 at 13:36, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 12:11, Jan Beulich wrote:
>>>>>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
>>>>> On 19/01/18 11:46, Jan Beulich wrote:
>>>>>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>>>>>> HT-capable hardware where it does matter.
>>>>>>>> I guess the answer to my question may live somewhere later in the
>>>>>>>> series, but since I haven't got there yet: Is this based on the
>>>>>>>> assumption that on HT-capable hardware they would always be
>>>>>>>> available together? Otherwise, how do you emulate STIBP for the
>>>>>>>> guest if all you've got is IBRS/IBPB?
>>>>>>> The safety depends on the guest seeing STIBP and using it if it wants
>>>>>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>>>>>> observing what Windows appears to do).
>>>>>>>
>>>>>>> For topology reasons (despite the other cans of worms in this area), we
>>>>>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>>>>>> systems.
>>>>>> But this doesn't answer my question: What do you do if the guest
>>>>>> uses STIBP (because you've told it that it can), but the hardware
>>>>>> doesn't support it? Aren't you producing a false sense of security
>>>>>> to the guest this way?
>>>>> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
>>>>> let this work.
>>>>>
>>>>> By advertising STIBP, we are telling the guest "There might be (but not
>>>>> definitely) interference from other threads in the BTB.  If you care
>>>>> about this, you should set SPEC_CTRL.STIBP".
>>>>>
>>>>> On hardware where there is definitely no interference, this is a nop.
>>>>>
>>>>> In any situation where a guest might migrate to a host where there is
>>>>> interference, it needs to know about STIBP so (if it cares) it can
>>>>> choose to set SPEC_CTRL.STIBP.
>>>> This is the part that is clear, but my question remains unanswered:
>>>> If HT hardware doesn't support STIBP, how can the guest guard
>>>> itself _successfully_?
>>> I'm completely lost now.  On hardware which doesn't support STIBP, there
>>> is no action required.
>> How that? Do you perhaps mean there's nothing we can do? Yes,
>> and the same applies to the guest. Yet if you've got HT hardware
>> which supports IBRS but not STIBP
> 
> If Intel's microcode fails to advertise STIBP on HT-hardware where it is
> required for safety, then its broken microcode.
> 
>> you still tell the guest that STIBP is available. Hence the guest seeing
>> (and using) both, it'll
>> assume it is safe (and perhaps report so to its users) when in
>> fact it's still vulnerable.
> 
> Ok - I see your point now, but there is nothing we can do about that.

There are options:

1) Disable HT (read: use only one thread on each core)
2) Don't advertise STIBP if we find ourselves on HT with IBRS but
   no STIBP.
3) Issue a bright warning (in the hope that it is noticed).

Jan



* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-19 11:43   ` Jan Beulich
@ 2018-01-19 13:36     ` Andrew Cooper
  2018-01-19 13:51       ` Jan Beulich
  2018-01-22 22:27       ` Boris Ostrovsky
  0 siblings, 2 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 13:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 11:43, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> We need to be able to either set or clear IBRS in Xen context, as well as
>> restore appropriate guest values in guest context.  See the documentation in
>> asm-x86/spec_ctrl_asm.h for details.
>>
>> Writes to %cr3 are slower when SPEC_CTRL.IBRS is set, so the
>> SPEC_CTRL_{ENTRY/EXIT}* positioning is important, and optimised for the
>> expected common case of Xen not using IBRS itself.
> And this expectation is because on pre-Skylake we're fine using just
> retpoline, and we expect post-Skylake to not have the issue anymore?
> Such reasoning would help being spelled out here.

Kaby Lake is in the same bucket as Skylake.  Coffee Lake and Cannon Lake
are unknown quantities.

Intel have stated that they expect these issues to be resolved on the
same hardware as CET, as retpoline is a deliberate ROP gadget and won't
function with Control-flow Enforcement Technology enabled.

As far as I'm aware, CET is expected in Ice Lake, but who knows what
this issue has done to Intel's release schedule.

> As to the behavior of CR3 writes - is this written down anywhere in
> Intel's and/or AMD's docs? Or else, how do you know this is uniformly
> the case on all past, current, and future hardware?

AMD don't implement IBRS, so this isn't applicable to them.

I'm not sure if this is stated in any public documentation.  I'll see
about poking people...

Irrespective, it can be demonstrated trivially with some timing analysis.
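
Something along these lines is enough to see it (a hypothetical
debug-only sketch, not part of the series; it assumes the usual Xen
helpers - rdtsc(), read_cr3(), write_cr3(), wrmsrl(), printk() - and
hardware with MSR_SPEC_CTRL):

    static void __init time_cr3_reload(void)
    {
        unsigned long cr3 = read_cr3();
        uint64_t t0, t1, t2, t3;

        /* Time a CR3 reload with SPEC_CTRL.IBRS clear... */
        wrmsrl(MSR_SPEC_CTRL, 0);
        t0 = rdtsc();
        write_cr3(cr3);
        t1 = rdtsc();

        /* ... and again with SPEC_CTRL.IBRS set. */
        wrmsrl(MSR_SPEC_CTRL, SPEC_CTRL_IBRS);
        t2 = rdtsc();
        write_cr3(cr3);
        t3 = rdtsc();

        printk("CR3 reload: %"PRIu64" cycles (IBRS=0), %"PRIu64" (IBRS=1)\n",
               t1 - t0, t3 - t2);
    }

A real measurement would obviously want to loop and average, but that's
the shape of it.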

>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>  .Lvmx_vmentry_fail:
>>          sti
>>          SAVE_ALL
>> +
>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> I think the use of the PV variant here requires a comment.

Oh.  It used to have one...  I'll try to find it.

>
>> @@ -142,6 +146,13 @@ ENTRY(compat_restore_all_guest)
>>          .popsection
>>          or    $X86_EFLAGS_IF,%r11
>>          mov   %r11d,UREGS_eflags(%rsp)
>> +
>> +        mov VCPU_arch_msr(%rbx), %rax
>> +        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
> I understand that it's code like this which is missing from the SVM
> and VMX exit paths.

Correct.

>
>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>  handle_ist_exception:
>>          SAVE_ALL CLAC
>>  
>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
> Following my considerations towards alternative patching to
> eliminate as much overhead as possible from the Meltdown
> band-aid in case it is being disabled, I'm rather hesitant to see any
> patchable code being introduced into the NMI/#MC entry paths
> without the patching logic first being made safe in this regard.
> Exceptions coming here aren't very frequent (except perhaps on
> hardware about to die), so the path isn't performance critical.
> Therefore I think we should try to avoid any patching here, and
> just conditionals instead. This in fact is one of the reasons why I
> didn't want to macro-ize the assembly additions done in the
> Meltdown band-aid.
>
> I do realize that this then also affects the exit-to-Xen path,
> which I agree is less desirable to use conditionals on.

While I agree that our lack of IST-safe patching is a problem, these
alternative points are already present on the NMI and MCE paths, and
need to be.  As a result, the DF handler is in no worse of a position. 
As a perfect example, observe the CLAC in context.

I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
which doesn't use alternatives (but IMO this is pointless in the
presence of CLAC), but still don't think it is reasonable to treat DF
differently to NMI/MCE.

>
>> --- /dev/null
>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>> @@ -0,0 +1,227 @@
>> +/******************************************************************************
>> + * include/asm-x86/spec_ctrl.h
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Copyright (c) 2017-2018 Citrix Systems Ltd.
>> + */
>> +
>> +#ifndef __X86_SPEC_CTRL_ASM_H__
>> +#define __X86_SPEC_CTRL_ASM_H__
>> +
>> +#ifdef __ASSEMBLY__
>> +#include <asm/msr.h>
>> +
>> +/*
>> + * Saving and restoring MSR_SPEC_CTRL state is a little tricky.
>> + *
>> + * We want the guests choice of SPEC_CTRL while in guest context, and IBRS
>> + * (set or clear, depending on the hardware) while running in Xen context.
> To me this continues to be somewhat strange to read. One possibility
> to improve it would be to move the opening parenthesis after "set or
> clear", another would be to say "..., and Xen's choice (set or ...".
> But you're the native speaker, so I'm open to other alternatives you
> may consider even better.

Your suggestion is clearer.  I'll borrow that.

>> + * Therefore, a simplistic algorithm is:
>> + *
>> + *  - Set/clear IBRS on entry to Xen
>> + *  - Set the guests' choice on exit to guest
>> + *  - Leave SPEC_CTRL unchanged on exit to xen
>> + *
>> + * There are two complicating factors:
>> + *  1) HVM guests can have direct access to the MSR, so it can change
>> + *     behind Xen's back.
>> + *  2) An NMI or MCE can interrupt at any point, including early in the entry
>> + *     path, or late in the exit path after restoring the guest value.  This
>> + *     will corrupt the guest value.
>> + *
>> + * Factor 1 is dealt with by relying on NMIs/MCEs being blocked immediately
>> + * after VMEXIT.  The VMEXIT-specific code reads MSR_SPEC_CTRL and updates
>> + * current before loading Xen's MSR_SPEC_CTRL setting.
>> + *
>> + * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and
>> + * use_shadow_spec_ctrl boolean per cpu.  The synchronous use is:
>> + *
>> + *  1) Store guest value in shadow_spec_ctrl
>> + *  2) Set use_shadow_spec_ctrl boolean
>> + *  3) Load guest value into MSR_SPEC_CTRL
>> + *  4) Exit to guest
>> + *  5) Entry from guest
>> + *  6) Clear use_shadow_spec_ctrl boolean
> 7) Load Xen value into MSR_SPEC_CTRL
>
> (It is important to spell out that the MSR write in _both_ cases
> needs to come after the write of the boolean.)

Ok.
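
To spell the ordering out (in C purely for readability - the real code
is assembly, and the names here are only indicative):

    /* Exit to guest: set the boolean before the MSR write, so an NMI/#MC
     * hitting after the write restores the guest value on its way out. */
    info->shadow_spec_ctrl = guest_val;
    info->use_shadow_spec_ctrl = true;
    wrmsrl(MSR_SPEC_CTRL, guest_val);

    /* Entry from guest: clear the boolean before the MSR write, so an
     * NMI/#MC hitting in between doesn't re-instate the stale guest
     * value on its way out. */
    info->use_shadow_spec_ctrl = false;
    wrmsrl(MSR_SPEC_CTRL, xen_val);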

>
>> +.macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req
>> +/*
>> + * Requires %rbx=current, %rsp=regs/cpuinfo
>> + * Clobbers %rax, %rcx, %rdx
>> + *
>> + * The common case is that a guest has direct access to MSR_SPEC_CTRL, at
>> + * which point we need to save the guest value before setting IBRS for Xen.
>> + * Unilaterally saving the guest value is shorter and faster than checking.
>> + */
>> +    mov $MSR_SPEC_CTRL, %ecx
>> +    rdmsr
>> +
>> +    /* Stash the value from hardware. */
>> +    mov VCPU_arch_msr(%rbx), %rdx
>> +    mov %al, VCPUMSR_spec_ctrl_raw(%rdx)
> Patch 3 makes this a 32-bit field now - why still %al?

Oversight.

>
>> +.macro DO_SPEC_CTRL_ENTRY maybexen:req ibrs_val:req
>> +/*
>> + * Requires %rsp=regs (also cpuinfo if !maybexen)
>> + * Clobbers %rax, %rcx, %rdx
>> + *
>> + * PV guests can't update MSR_SPEC_CTRL behind Xen's back, so no need to read
>> + * it back.  Entries from guest context need to clear SPEC_CTRL shadowing,
>> + * while entries from Xen must leave shadowing in its current state.
>> + */
>> +    mov $MSR_SPEC_CTRL, %ecx
>> +
>> +    .if \maybexen
>> +        testb $3, UREGS_cs(%rsp)
>> +        jz .L\@_entry_from_xen
>> +    .endif
>> +
>> +    /*
>> +     * Clear SPEC_CTRL shadowing *before* loading Xen's value.  If entering
>> +     * from a possibly-xen context, %rsp doesn't necessarily alias the cpuinfo
>> +     * block so calculate the position directly.
>> +     */
>> +    .if \maybexen
>> +        GET_STACK_END(dx)
> In almost all cases (and perhaps in all ones with maybexen set) the
> macro invocation sits right ahead of a GET_STACK_BASE(). If you
> swapped them and allowed the register to be passed in, you could
> avoid this load.

I seem to recall this being problematic.  I'll recheck.

>
>> +.macro DO_SPEC_CTRL_EXIT_TO_GUEST
>> +/*
>> + * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
>> + * Clobbers %rax, %rcx, %rdx
> %rax isn't really being clobbered anymore. The clobber annotations
> on all use sites appear to be similarly stale.

True.

~Andrew


* Re: [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use
  2018-01-19 12:06   ` Jan Beulich
@ 2018-01-19 13:48     ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 13:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 12:06, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> @@ -124,7 +186,21 @@ void __init init_speculation_mitigations(void)
>>               */
>>              if ( cpu_has_lfence_dispatch )
>>                  thunk = THUNK_LFENCE;
>> +            /*
>> +             * On Intel hardware, we'd like to use retpoline in preference to
>> +             * IBRS, but only if it is safe on this hardware.
>> +             */
>> +            else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
>> +            {
>> +                if ( retpoline_safe() )
>> +                    thunk = THUNK_RETPOLINE;
>> +                else
>> +                    ibrs = true;
>> +            }
> I think I had asked about this piece of code before, but maybe not
> the same I'm noticing now: Why is using retpoline dependent upon
> IBRSB? I.e. why not
>
>             else if ( retpoline_safe() )
>                 thunk = THUNK_RETPOLINE;
>             else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
>                 ibrs = true;
>
> ?

Probably because that was my train of thought when putting this
together.  It made more of a difference in previous iterations when the
IBRS_SET/CLEAR decision was taken here.

~Andrew


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-19 13:36     ` Andrew Cooper
@ 2018-01-19 13:51       ` Jan Beulich
  2018-01-22 11:42         ` Andrew Cooper
  2018-01-22 22:27       ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 13:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 14:36, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 11:43, Jan Beulich wrote:
>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>>  handle_ist_exception:
>>>          SAVE_ALL CLAC
>>>  
>>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>> Following my considerations towards alternative patching to
>> eliminate as much overhead as possible from the Meltdown
>> band-aid in case it is being disabled, I'm rather hesitant to see any
>> patchable code being introduced into the NMI/#MC entry paths
>> without the patching logic first being made safe in this regard.
>> Exceptions coming here aren't very frequent (except perhaps on
>> hardware about to die), so the path isn't performance critical.
>> Therefore I think we should try to avoid any patching here, and
>> just conditionals instead. This in fact is one of the reasons why I
>> didn't want to macro-ize the assembly additions done in the
>> Meltdown band-aid.
>>
>> I do realize that this then also affects the exit-to-Xen path,
>> which I agree is less desirable to use conditionals on.
> 
> While I agree that our lack of IST-safe patching is a problem, these
> alternative points are already present on the NMI and MCE paths, and
> need to be.  As a result, the DF handler is in no worse of a position. 
> As a perfect example, observe the CLAC in context.

Oh, indeed. We should change that.

> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
> which doesn't use alternatives (but IMO this is pointless in the
> presence of CLAC), but still don't think it is reasonable to treat DF
> differently to NMI/MCE.

#DF is debatable: On one hand I can see that if things go wrong,
it can equally be raised at any time. Otoh #MC and even more so
NMI can be raised _without_ things going (fatally) wrong, i.e. the
patching may break a boot which would otherwise have succeeded
(whereas the #DF would make the boot fail anyway).

Jan



* Re: [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use
  2018-01-18 15:46 ` [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use Andrew Cooper
  2018-01-19 12:06   ` Jan Beulich
@ 2018-01-19 14:01   ` Jan Beulich
  2018-01-22 15:11     ` Andrew Cooper
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 14:01 UTC (permalink / raw)
  To: Andrew Cooper, Asit K Mallick, Jun Nakajima; +Cc: Xen-devel

>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
> +/* Calculate whether Retpoline is known-safe on this CPU. */
> +static bool __init retpoline_safe(void)
> +{
> +    unsigned int ucode_rev = this_cpu(ucode_cpu_info).cpu_sig.rev;
> +
> +    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
> +        return true;
> +
> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +         boot_cpu_data.x86 != 6 )
> +        return false;
> +
> +    switch ( boot_cpu_data.x86_model )
> +    {
> +    case 0x17: /* Penryn */
> +    case 0x1d: /* Dunnington */
> +    case 0x1e: /* Nehalem */
> +    case 0x1f: /* Auburndale / Havendale */
> +    case 0x1a: /* Nehalem EP */
> +    case 0x2e: /* Nehalem EX */
> +    case 0x25: /* Westmere */
> +    case 0x2c: /* Westmere EP */
> +    case 0x2f: /* Westmere EX */
> +    case 0x2a: /* SandyBridge */
> +    case 0x2d: /* SandyBridge EP/EX */
> +    case 0x3a: /* IvyBridge */
> +    case 0x3e: /* IvyBridge EP/EX */
> +    case 0x3c: /* Haswell */
> +    case 0x3f: /* Haswell EX/EP */
> +    case 0x45: /* Haswell D */
> +    case 0x46: /* Haswell H */
> +        return true;

Especially regarding this part of the function (leaving the other half
in context for reference) - can we perhaps shorten this? E.g.
fold together everything below the lowest Broadwell model value?
In particular there don't appear to be any Atoms in the list above,
yet either they're affected by the underlying issue as well (and
hence would benefit from whatever applicable mitigation), or we
should suppress as much mitigation overhead as possible for them
(which I don't think is the case).

In any event, as said before - this function will need an ack from
an Intel person, and it would be nice for ...

> +        /*
> +         * Broadwell processors are retpoline-safe after specific microcode
> +         * versions.
> +         */
> +    case 0x3d: /* Broadwell */
> +        return ucode_rev >= 0x28;
> +    case 0x47: /* Broadwell H */
> +        return ucode_rev >= 0x1b;
> +    case 0x4f: /* Broadwell EP/EX */
> +        return ucode_rev >= 0xb000025;
> +    case 0x56: /* Broadwell D */
> +        return false; /* TBD. */

... this to be filled in.

Jan

> +        /*
> +         * Skylake and later processors are not retpoline-safe.
> +         */
> +    default:
> +        return false;
> +    }
>  }




* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 12:47   ` Jan Beulich
@ 2018-01-19 14:24     ` Andrew Cooper
  2018-01-19 15:02       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 14:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 12:47, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>> used to force or
>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>  functionality is still set up so IBRS can be virtualised for guests.
>>  
>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>> +context, and an entry from a native (PV or Xen) context.
> Would you mind adding a sentence or two to the description making
> clear what use this fine grained control is? I can't really figure why I
> might need to be concerned about one of the two cases, but not the
> other.

I thought I'd covered that in the commit message, but I'm not sure this
is a suitable place to discuss the details.  PV and HVM guests have
different reasoning for why we need to overwrite the RSB.

In the past, there used to be a default interaction of rsb_native and
SMEP, but that proved to be insufficient and rsb_native is now
unconditionally enabled.  In principle however, it should fall within
CONFIG_PV.

>
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -33,6 +33,7 @@ static enum ind_thunk {
>>      THUNK_JMP,
>>  } opt_thunk __initdata = THUNK_DEFAULT;
>>  static int opt_ibrs __initdata = -1;
>> +static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;
> Would you mind splitting this into two separate declarations? The
> double use of __initdata is necessary, but looks odd.
>
> Also, here at the latest it becomes clear that, while minor, opt_ibrs
> would better be int8_t.

Ok.
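
Presumably along these lines (just to confirm I've understood):

    static int8_t opt_ibrs __initdata = -1;
    static bool opt_rsb_native __initdata = true;
    static bool opt_rsb_vmexit __initdata = true;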

>
>> @@ -243,6 +250,29 @@ void __init init_speculation_mitigations(void)
>>              setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_CLEAR);
>>      }
>>  
>> +    /*
>> +     * PV guests can poison the RSB to any virtual address from which
>> +     * they can execute a call instruction.  This is necessarily outside
>> +     * of the Xen supervisor mappings.
>> +     *
>> +     * With SMEP enabled, the processor won't speculate into user
>> +     * mappings.  Therefore, we don't need to worry about poisoned
>> +     * entries from 64bit PV guests.
> "... in this case" (i.e. when SMEP is enabled).

Ok.

>
>> --- a/xen/include/asm-x86/spec_ctrl_asm.h
>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>> @@ -73,6 +73,40 @@
>>   *  - SPEC_CTRL_EXIT_TO_GUEST
>>   */
>>  
>> +.macro DO_OVERWRITE_RSB
>> +/*
>> + * Requires nothing
>> + * Clobbers %rax, %rcx
>> + *
>> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
>> + * Google's performance numbers, the loop is unrolled to 16 iterations and two
>> + * calls per iteration.
>> + *
>> + * The call filling the RSB needs a nonzero displacement, but we use "1:
>> + * pause, jmp 1b" to safely contain any ret-based speculation, even if the
>> + * loop is speculatively executed prematurely.
> I'm struggling to understand why you use "but" here. Maybe just a
> lack of English skills on my part?

"displacement.  A nop would do, but" ?

It is a justification for why we are putting more than a single byte in
the middle.

>
>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>> + * optimised with mov-elimination in modern cores.
>> + */
>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>> +    mov %rsp, %rax  /* Store the current %rsp */
>> +
>> +.L\@_fill_rsb_loop:
>> +
>> +    .rept 2         /* Unrolled twice. */
>> +    call 2f         /* Create an RSB entry. */
>> +1:  pause
>> +    jmp 1b          /* Capture rogue speculation. */
>> +2:
> I won't further insist on changing away from numeric labels here, but
> I'd still like to point out an example of a high risk use of such labels in
> mainline code: There's a "jz 1b" soon after
> exception_with_ints_disabled, leading across _two_ other labels and
> quite a few insns and macro invocations. May I at the very least
> suggest that you don't use 1 and 2 here?

I spent ages trying to get .L labels working here, but they don't
function inside a rept, as you end up with duplicate local symbols.

Even using irp to inject a unique number into the loop doesn't appear to
work, because the \ escape gets interpreted as a token separator. 
AFAICT, \@ is special by virtue of the fact that it doesn't count as a
token separator.

If you've got a better suggestion then I'm all ears.

Alternatively, I could manually unroll the loop, or pick some arbitrary
other numbers to use.

>
>> +    .endr
>> +
>> +    sub $1, %ecx
>> +    jnz .L\@_fill_rsb_loop
> Iirc it's only CMP and TEST which can be fused with Jcc. Would
> it perhaps help slightly to move the SUB ahead of the two CALLs?
> And once they are farther apart, perhaps DEC wouldn't be
> that bad a choice anymore?

Nehalem, as well as Bulldozer and later, can only fuse CMP and TEST.
SandyBridge and later can fuse ADD, SUB, AND, INC and DEC.

Fusibility is also dependent on the condition code, but JNZ can fuse
with any of the options.

It is still unclear to what extent the INC/DEC flags-merge stall
matters, but if they were better than their ADD/SUB alternatives, I'd
expect compilers to actually emit them.

>
>> +    mov %rax, %rsp  /* Retore old %rsp */
> Restore

Oops.

~Andrew


* Re: [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-19 13:25   ` Jan Beulich
@ 2018-01-19 14:48     ` Andrew Cooper
  2018-01-19 15:07       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 14:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: David Woodhouse, Xen-devel

On 19/01/18 13:25, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1743,6 +1743,29 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>          }
>>  
>>          ctxt_switch_levelling(next);
>> +
>> +        if ( opt_ibpb )
>> +        {
>> +            static DEFINE_PER_CPU(unsigned int, last_nonidle);
>> +            unsigned int *last_id = &this_cpu(last_nonidle);
> "nonidle" is not entirely correct without an is_idle_...() check around
> it, as the outer condition leaves room for idle vCPU-s to make it here.
> But take this as a remark, not a strict request to change the name.

If you can suggest a better name, I'll use it.

>
>> +            /* Squash the domid and vcpu id together for efficiency. */
>> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
>> +                                    (uint16_t)next->vcpu_id);
> Is this really more efficient than just comparing struct vcpu pointers?

vcpu pointers are rather more susceptible to false aliasing in the case
that the 4k memory allocation behind struct vcpu gets reused.

The risks are admittedly minute, but this is a much safer option.

>
>> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
>> +
>> +            /*
>> +             * When scheduling from a vcpu, to idle, and back to the same vcpu
>> +             * (which might be common in a lightly loaded system, or when
>> +             * using vcpu pinning), there is no need to issue IBPB, as we are
>> +             * returning to the same security context.
>> +             */
>> +            if ( *last_id != next_id )
>> +            {
>> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>> +                *last_id = next_id;
>> +            }
> I've read David's mails regarding this another time, but I still can't
> conclude why this extra logic would be necessary. Transitions
> from a guest vCPU through idle and back to that very vCPU do
> not alter per_cpu(curr_vcpu, ...) - __context_switch() is the
> only place to update it. There's certainly the potential for it to
> be called through __sync_local_execstate(), but is that a
> common case? I'd support introduction of the extra logic only if
> so.
>
> Furthermore, if this indeed was a sufficiently common case, doing
> lazy context switches at all for HVM guests would once again
> need to be put under question.

David found that transitions to idle and back to the same vcpu were
reliably taking an unnecessary IBPB.

I'm deeply suspicious that lazy context switching is actually a win,
given that a TLB shootdown IPI interrupting half-idle will switch to
full idle (as half-idle still remains set in a domain's dirty mask).
Andy Luto dropped lazy FPU from Linux a while ago, proving that it was
worse in most cases even for regular processes.

I certainly don't think that trapping #NM for HVM vcpu lazy fpu is
clever on any hardware.

Just one of the many things to investigate in copious free time.

>
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -33,6 +33,7 @@ static enum ind_thunk {
>>      THUNK_JMP,
>>  } opt_thunk __initdata = THUNK_DEFAULT;
>>  static int opt_ibrs __initdata = -1;
>> +bool __read_mostly opt_ibpb = true;
>>  static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;
> Considering the (better imo) placement of __read_mostly here,
> would you mind moving the __initdata accordingly in the earlier
> patches (opt_thunk having reasons to be an exception)?
> Otherwise please move the __read_mostly to the end.

Ok.

~Andrew


* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 13:33                   ` Jan Beulich
@ 2018-01-19 15:00                     ` Andrew Cooper
  2018-01-19 15:09                       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 15:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 13:33, Jan Beulich wrote:
>>>> On 19.01.18 at 14:06, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 12:52, Jan Beulich wrote:
>>>>>> On 19.01.18 at 13:36, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 12:11, Jan Beulich wrote:
>>>>>>>> On 19.01.18 at 13:01, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 19/01/18 11:46, Jan Beulich wrote:
>>>>>>>>>> On 19.01.18 at 11:53, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 19/01/18 10:40, Jan Beulich wrote:
>>>>>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> For guest safety, we treat STIBP as special, always override the toolstack
>>>>>>>>>> choice, and always advertise STIBP if IBRS is available.  This removes the
>>>>>>>>>> corner case where STIBP is not advertised, but the guest is running on
>>>>>>>>>> HT-capable hardware where it does matter.
>>>>>>>>> I guess the answer to my question may live somewhere later in the
>>>>>>>>> series, but since I haven't got there yet: Is this based on the
>>>>>>>>> assumption that on HT-capable hardware they would always be
>>>>>>>>> available together? Otherwise, how do you emulate STIBP for the
>>>>>>>>> guest if all you've got is IBRS/IBPB?
>>>>>>>> The safety depends on the guest seeing STIBP and using it if it wants
>>>>>>>> to.  (Not that I've seen any sign of STIBP in the Linux code, or from
>>>>>>>> observing what Windows appears to do).
>>>>>>>>
>>>>>>>> For topology reasons (despite the other cans of worms in this area), we
>>>>>>>> unilaterally set HT, so all guests should find themselves on HT-capable
>>>>>>>> systems.
>>>>>>> But this doesn't answer my question: What do you do if the guest
>>>>>>> uses STIBP (because you've told it that it can), but the hardware
>>>>>>> doesn't support it? Aren't you producing a false sense of security
>>>>>>> to the guest this way?
>>>>>> The entire point of SPEC_CTRL_STIBP being ignored on some hardware is to
>>>>>> let this work.
>>>>>>
>>>>>> By advertising STIBP, we are telling the guest "There might be (but not
>>>>>> definitely) interference from other threads in the BTB.  If you care
>>>>>> about this, you should set SPEC_CTRL.STIBP".
>>>>>>
>>>>>> On hardware where there is definitely no interference, this is a nop.
>>>>>>
>>>>>> In any situation where a guest might migrate to a host where there is
>>>>>> interference, it needs to know about STIBP so (if it cares) it can
>>>>>> choose to set SPEC_CTRL.STIBP.
>>>>> This is the part that is clear, but my question remains unanswered:
>>>>> If HT hardware doesn't support STIBP, how can the guest guard
>>>>> itself _successfully_?
>>>> I'm completely lost now.  On hardware which doesn't support STIBP, there
>>>> is no action required.
>>> How that? Do you perhaps mean there's nothing we can do? Yes,
>>> and the same applies to the guest. Yet if you've got HT hardware
>>> which supports IBRS but not STIBP
>> If Intel's microcode fails to advertise STIBP on HT-hardware where it is
>> required for safety, then it's broken microcode.
>>
>>> you still tell the guest that STIBP is available. Hence the guest seeing
>>> (and using) both, it'll assume it is safe (and perhaps report so to its
>>> users) when in fact it's still vulnerable.
>> Ok - I see your point now, but there is nothing we can do about that.
> There are options:
>
> 1) Disable HT (read: use only one thread on each core)

The spec explicitly calls out the case of HT-hardware which doesn't
share BTBs across cores, which means there is at least one production or
planned system where this is true.

> 2) Don't advertise STIBP if we find ourselves on HT with IBRS but
>    no STIBP.

No.  The purpose of advertising STIBP even in cases where it isn't
strictly needed is to prevent us from falling into the position of
needing to hide STIBP in heterogeneous cases, which would increase the
likelihood of the guest being unsafe.

(For a guest which cares to use STIBP), unilaterally advertising it is
safe, while hiding the flag because of heterogeneous circumstances is not.

> 3) Issue a bright warning (in the hope that it is noticed).

And what would the conditions of this warning be?  The spec allows for
STIBP not to be advertised on hardware which doesn't actually need it,
which might include HT-capable hardware (but probably won't, for
microcode simplicity reasons).

I'm afraid that despite all of this, I don't see a valid argument
against the logic as implemented in the patch, and I don't see any
viable option for working around the edge case you are concerned about,
which is very definitely a microcode bug.
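
To put that in concrete terms, the policy adjustment being argued for
amounts to the sketch below.  The cp->feat.ibrsb spelling matches what
is visible elsewhere in the series; the stibp field name and the
function wrapper are illustrative only, not quoted from the patch:

    /* Sketch: always advertise STIBP to guests whenever IBRS/IBPB is. */
    static void sanitise_spec_ctrl_policy(struct cpuid_policy *p)
    {
        if ( p->feat.ibrsb )
            p->feat.stibp = true;
    }

A guest which cares can then always elect to set SPEC_CTRL.STIBP, and
doing so is simply a nop on parts with no cross-thread interference.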

~Andrew


* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 14:24     ` Andrew Cooper
@ 2018-01-19 15:02       ` Jan Beulich
  2018-01-19 16:10         ` Andrew Cooper
  2018-01-22 15:51         ` Andrew Cooper
  0 siblings, 2 replies; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 15:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 12:47, Jan Beulich wrote:
>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>>> used to force or
>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>  
>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>> +context, and an entry from a native (PV or Xen) context.
>> Would you mind adding a sentence or two to the description making
>> clear what use this fine grained control is? I can't really figure why I
>> might need to be concerned about one of the two cases, but not the
>> other.
> 
> I thought I'd covered that in the commit message, but I'm not sure this
> is a suitable place to discuss the details.  PV and HVM guests have
> different reasoning for why we need to overwrite the RSB.
> 
> In the past, there used to be a default interaction of rsb_native and
> SMEP, but that proved to be insufficient and rsb_native is now
> unconditionally enabled.  In principle however, it should fall within
> CONFIG_PV.

Thanks for the explanation, but I'm afraid I'm none the wiser as
to why the two separate options are needed (or even just wanted).

>>> --- a/xen/include/asm-x86/spec_ctrl_asm.h
>>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>>> @@ -73,6 +73,40 @@
>>>   *  - SPEC_CTRL_EXIT_TO_GUEST
>>>   */
>>>  
>>> +.macro DO_OVERWRITE_RSB
>>> +/*
>>> + * Requires nothing
>>> + * Clobbers %rax, %rcx
>>> + *
>>> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
>>> + * Google's performance numbers, the loop is unrolled to 16 iterations and two
>>> + * calls per iteration.
>>> + *
>>> + * The call filling the RSB needs a nonzero displacement, but we use "1:
>>> + * pause, jmp 1b" to safely contains any ret-based speculation, even if the
>>> + * loop is speculatively executed prematurely.
>> I'm struggling to understand why you use "but" here. Maybe just a
>> lack of English skills on my part?
> 
> "displacement.  A nop would do, but" ?
> 
> It is a justification for why we are putting more than a single byte in
> the middle.

Oh, I see, but only with the addition you suggest.

>>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>>> + * optimised with mov-elimination in modern cores.
>>> + */
>>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>>> +    mov %rsp, %rax  /* Store the current %rsp */
>>> +
>>> +.L\@_fill_rsb_loop:
>>> +
>>> +    .rept 2         /* Unrolled twice. */
>>> +    call 2f         /* Create an RSB entry. */
>>> +1:  pause
>>> +    jmp 1b          /* Capture rogue speculation. */
>>> +2:
>> I won't further insist on changing away from numeric labels here, but
>> I'd still like to point out an example of a high risk use of such labels in
>> mainline code: There's a "jz 1b" soon after
>> exception_with_ints_disabled, leading across _two_ other labels and
>> quite a few insns and macro invocations. May I at the very least
>> suggest that you don't use 1 and 2 here?
> 
> I spent ages trying to get .L labels working here, but they don't
> function inside a rept, as you end up with duplicate local symbols.
> 
> Even using irp to inject a unique number into the loop doesn't appear to
> work, because the \ escape gets interpreted as a token separator. 
> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
> token separator.
> 
> If you've got a better suggestion then I'm all ears.
> 
> Alternatively, I could manually unroll the loop, or pick some arbitrary
> other numbers to use.

Since the unroll number is just 2, this is what I would have
suggested primarily. .rept of course won't work, as it's not a
macro invocation, and hence doesn't increment the internal
counter. With .irp I can get things to work:

	.macro m
	.irp n, 1, 2
.Lxyz_\@_\n:	mov	$\@, %eax
	.endr
	.endm

Jan


* Re: [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-19 14:48     ` Andrew Cooper
@ 2018-01-19 15:07       ` Jan Beulich
  2018-01-20 11:10         ` David Woodhouse
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 15:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: David Woodhouse, Xen-devel

>>> On 19.01.18 at 15:48, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 13:25, Jan Beulich wrote:
>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1743,6 +1743,29 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>          }
>>>  
>>>          ctxt_switch_levelling(next);
>>> +
>>> +        if ( opt_ibpb )
>>> +        {
>>> +            static DEFINE_PER_CPU(unsigned int, last_nonidle);
>>> +            unsigned int *last_id = &this_cpu(last_nonidle);
>> "nonidle" is not entirely correct without an is_idle_...() check around
>> it, as the outer condition leaves room for idle vCPU-s to make it here.
>> But take this as a remark, not a strict request to change the name.
> 
> If you can suggest a better name, I'll use it.

Well, the best I can come up with is just "last". Considering the
narrow scope of the variable, this may actually be fine.

>>> +            /* Squash the domid and vcpu id together for efficiency. */
>>> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
>>> +                                    (uint16_t)next->vcpu_id);
>> Is this really more efficient than just comparing struct vcpu pointers?
> 
> vcpu pointers are rather more susceptible to false aliasing in the case
> that the 4k memory allocation behind struct vcpu gets reused.
> 
> The risks are admittedly minute, but this is a much safer option.

Oh, right, I didn't consider the case of the vCPU (and domain)
having gone away in the meantime. Mind extending the comment
to clarify this?
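
Something along these lines would do (wording is just a suggestion):

    /*
     * Squash the domid and vcpu id together for efficiency.  Comparing
     * raw struct vcpu pointers instead would be vulnerable to false
     * aliasing if the memory behind a since-destroyed vCPU's struct
     * were reused for a new one.
     */
    unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
                            (uint16_t)next->vcpu_id);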

>>> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
>>> +
>>> +            /*
>>> +             * When scheduling from a vcpu, to idle, and back to the same vcpu
>>> +             * (which might be common in a lightly loaded system, or when
>>> +             * using vcpu pinning), there is no need to issue IBPB, as we are
>>> +             * returning to the same security context.
>>> +             */
>>> +            if ( *last_id != next_id )
>>> +            {
>>> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>>> +                *last_id = next_id;
>>> +            }
>> I've read David's mails regarding this another time, but I still can't
>> conclude why this extra logic would be necessary. Transitions
>> from a guest vCPU through idle and back to that very vCPU do
>> not alter per_cpu(curr_vcpu, ...) - __context_switch() is the
>> only place to update it. There's certainly the potential for it to
>> be called through __sync_local_execstate(), but is that a
>> common case? I'd support introduction of the extra logic only if
>> so.
>>
>> Furthermore, if this indeed was a sufficiently common case, doing
>> lazy context switches at all for HVM guests would once again
>> need to be put under question.
> 
> David found that transitions to idle and back to the same vcpu were
> reliably taking an unnecessary IBPB.

I understand that, but there was no explanation whatsoever as
to why that is.

Jan



* Re: [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
  2018-01-19 15:00                     ` Andrew Cooper
@ 2018-01-19 15:09                       ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 15:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 16:00, <andrew.cooper3@citrix.com> wrote:
> I'm afraid that despite all of this, I don't see a valid argument
> against the logic as implemented in the patch, and I don't see any
> viable option for working around the edge case you are concerned about,
> which is very definitely a microcode bug.

Well, okay, then let's leave it at that and hope for the best:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 15:02       ` Jan Beulich
@ 2018-01-19 16:10         ` Andrew Cooper
  2018-01-19 16:19           ` Jan Beulich
  2018-01-22 15:51         ` Andrew Cooper
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 16:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 15:02, Jan Beulich wrote:
>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>>>> used to force or
>>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>>  
>>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>>> +context, and an entry from a native (PV or Xen) context.
>>> Would you mind adding a sentence or two to the description making
>>> clear what use this fine grained control is? I can't really figure why I
>>> might need to be concerned about one of the two cases, but not the
>>> other.
>> I thought I'd covered that in the commit message, but I'm not sure this
>> is a suitable place to discuss the details.  PV and HVM guests have
>> different reasoning for why we need to overwrite the RSB.
>>
>> In the past, there used to be a default interaction of rsb_native and
>> SMEP, but that proved to be insufficient and rsb_native is now
>> unconditionally enabled.  In principle however, it should fall within
>> CONFIG_PV.
> Thanks for the explanation, but I'm afraid I'm none the wiser as
> to why the two separate options are needed (or even just wanted).

If you never run 32bit PV guests, and don't use Introspection on HVM VMs
larger than 7 vcpus, then you are believed safe to turn rsb_native off.

Also, based on feedback we're seeing from the field, an "I trust my PV
guests" mode looks like it will go a long way.  When dom0 is the only PV
guest (which is very common, and increasingly so these days), then we
can drop rsb_native and IBRS_CLEAR, giving us zero overhead for
exception/interrupt handling.

>
>>>> --- a/xen/include/asm-x86/spec_ctrl_asm.h
>>>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>>>> @@ -73,6 +73,40 @@
>>>>   *  - SPEC_CTRL_EXIT_TO_GUEST
>>>>   */
>>>>  
>>>> +.macro DO_OVERWRITE_RSB
>>>> +/*
>>>> + * Requires nothing
>>>> + * Clobbers %rax, %rcx
>>>> + *
>>>> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
>>>> + * Google's performance numbers, the loop is unrolled to 16 iterations and two
>>>> + * calls per iteration.
>>>> + *
>>>> + * The call filling the RSB needs a nonzero displacement, but we use "1:
>>>> + * pause, jmp 1b" to safely contains any ret-based speculation, even if the
>>>> + * loop is speculatively executed prematurely.
>>> I'm struggling to understand why you use "but" here. Maybe just a
>>> lack of English skills on my part?
>> "displacement.  A nop would do, but" ?
>>
>> It is a justification for why we are putting more than a single byte in
>> the middle.
> Oh, I see, but only with the addition you suggest.
>
>>>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>>>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>>>> + * optimised with mov-elimination in modern cores.
>>>> + */
>>>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>>>> +    mov %rsp, %rax  /* Store the current %rsp */
>>>> +
>>>> +.L\@_fill_rsb_loop:
>>>> +
>>>> +    .rept 2         /* Unrolled twice. */
>>>> +    call 2f         /* Create an RSB entry. */
>>>> +1:  pause
>>>> +    jmp 1b          /* Capture rogue speculation. */
>>>> +2:
>>> I won't further insist on changing away from numeric labels here, but
>>> I'd still like to point out an example of a high risk use of such labels in
>>> mainline code: There's a "jz 1b" soon after
>>> exception_with_ints_disabled, leading across _two_ other labels and
>>> quite a few insns and macro invocations. May I at the very least
>>> suggest that you don't use 1 and 2 here?
>> I spent ages trying to get .L labels working here, but they don't
>> function inside a rept, as you end up with duplicate local symbols.
>>
>> Even using irp to inject a unique number into the loop doesn't appear to
>> work, because the \ escape gets interpreted as a token separator. 
>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>> token separator.
>>
>> If you've got a better suggestion then I'm all ears.
>>
>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>> other numbers to use.
> Since the unroll number is just 2, this is what I would have
> suggested primarily. .rept of course won't work, as it's not a
> macro invocation, and hence doesn't increment the internal
> counter. With .irp I can get things to work:
>
> 	.macro m
> 	.irp n, 1, 2
> .Lxyz_\@_\n:	mov	$\@, %eax
> 	.endr
> 	.endm

Well.  That explains a previous curiosity I'd found of .exitm only
breaking to the end of the .irp, not the .macro.

~Andrew


* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 16:10         ` Andrew Cooper
@ 2018-01-19 16:19           ` Jan Beulich
  2018-01-19 16:43             ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-19 16:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 19.01.18 at 17:10, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 15:02, Jan Beulich wrote:
>>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>>>>> used to force or
>>>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>>>  
>>>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>>>> +context, and an entry from a native (PV or Xen) context.
>>>> Would you mind adding a sentence or two to the description making
>>>> clear what use this fine grained control is? I can't really figure why I
>>>> might need to be concerned about one of the two cases, but not the
>>>> other.
>>> I thought I'd covered that in the commit message, but I'm not sure this
>>> is a suitable place to discuss the details.  PV and HVM guests have
>>> different reasoning for why we need to overwrite the RSB.
>>>
>>> In the past, there used to be a default interaction of rsb_native and
>>> SMEP, but that proved to be insufficient and rsb_native is now
>>> unconditionally enabled.  In principle however, it should fall within
>>> CONFIG_PV.
>> Thanks for the explanation, but I'm afraid I'm none the wiser as
>> to why the two separate options are needed (or even just wanted).
> 
> If you never run 32bit PV guests, and don't use Introspection on HVM VMs
> larger than 7 vcpus, then you are believed safe to turn rsb_native off.

Where does that funny 7 come from?

> Also, based on feedback we're seeing from the field, an "I trust my PV
> guests" mode looks like it will go a long way.  When dom0 is the only PV
> guest (which is very common, and increasingly so these days), then we
> can drop rsb_native and IBRS_CLEAR, giving us zero overhead for
> exception/interrupt handling.

Okay, makes sense, thanks. Would be nice if you could add some of
this to the description.

Jan



* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 16:19           ` Jan Beulich
@ 2018-01-19 16:43             ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-19 16:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 16:19, Jan Beulich wrote:
>>>> On 19.01.18 at 17:10, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 15:02, Jan Beulich wrote:
>>>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>>>>>> used to force or
>>>>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>>>>  
>>>>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>>>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>>>>> +context, and an entry from a native (PV or Xen) context.
>>>>> Would you mind adding a sentence or two to the description making
>>>>> clear what use this fine grained control is? I can't really figure why I
>>>>> might need to be concerned about one of the two cases, but not the
>>>>> other.
>>>> I thought I'd covered that in the commit message, but I'm not sure this
>>>> is a suitable place to discuss the details.  PV and HVM guests have
>>>> different reasoning for why we need to overwrite the RSB.
>>>>
>>>> In the past, there used to be a default interaction of rsb_native and
>>>> SMEP, but that proved to be insufficient and rsb_native is now
>>>> unconditionally enabled.  In principle however, it should fall within
>>>> CONFIG_PV.
>>> Thanks for the explanation, but I'm afraid I'm none the wiser as
>>> to why the two separate options are needed (or even just wanted).
>> If you never run 32bit PV guests, and don't use Introspection on HVM VMs
>> larger than 7 vcpus, then you are believed safe to turn rsb_native off.
> Where does that funny 7 come from.

It was 7 last time I looked.

It's complicated, but it is basically sizeof(vm_event ring) /
sizeof(vm_event request), accounting for there being space for at least
one synchronous request remaining if async requests are actually used.
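
As a purely illustrative back-of-the-envelope sketch (the byte counts
below are assumptions picked only to show the shape of the calculation
and to land on 7; the real constants live in the vm_event headers):

    #include <stdio.h>

    int main(void)
    {
        unsigned int ring_bytes = 4096; /* assumed: one shared ring page */
        unsigned int req_bytes  = 512;  /* assumed: one request slot     */
        unsigned int slots      = ring_bytes / req_bytes;  /* 8 slots    */
        unsigned int safe_vcpus = slots - 1;  /* keep one sync slot free */

        printf("believed safe up to %u vcpus\n", safe_vcpus);  /* 7 */
        return 0;
    }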

The plan for resolving this is:
1) Paul's new mapping API
2) Splitting the vm_event ring in two
2a) sync requests become a straight array using vcpu_id as an index
2b) async requests can become an (arbitrarily large, within reason)
multipage ring, and be respecified to be lossy if not drained quickly enough.
3) I purge the waitqueue infrastructure and pretend that it never existed.

This means that livepatching is finally safe in combination with
introspection, and the rings can't be fiddled with by a cunning guest.

>
>> Also, based on feedback we're seeing from the field, an "I trust my PV
>> guests" mode looks like it will go a long way.  When dom0 is the only PV
>> guest (which is very common, and increasingly so these days), then we
>> can drop rsb_native and IBRS_CLEAR, giving us zero overhead for
>> exception/interrupt handling.
> Okay, makes sense, thanks. Would be nice if you could add some of
> this to the description.

I'll see what I can do.

~Andrew


* Re: [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-19 15:07       ` Jan Beulich
@ 2018-01-20 11:10         ` David Woodhouse
  2018-01-22 10:15           ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: David Woodhouse @ 2018-01-20 11:10 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Xen-devel


On Fri, 2018-01-19 at 08:07 -0700, Jan Beulich wrote:
> > > > On 19.01.18 at 15:48, <andrew.cooper3@citrix.com> wrote:
> > vcpu pointers are rather more susceptible to false aliasing in the case
> > that the 4k memory allocation behind struct vcpu gets reused.
> > 
> > The risks are admittedly minute, but this is a much safer option.
>
> Oh, right, I didn't consider the case of the vCPU (and domain)
> having gone away in the meantime. Mind extending the comment
> to clarify this?

I looked at this and figured I didn't care about aliasing. Sure, we might
not do the IBPB if the stale pointer is aliased and we switch from a
now-dead vCPU to a new vCPU. But good luck exploiting the new vCPU from
an old domain that's now dead!

I actually thought Andrew changed from pointers just because of the
'ick' factor of keeping a known-stale pointer around and using it for
*comparison* but having to carefully avoid ever *dereferencing* it (as my
debugging patch in context_switch() did... oops!)

> > David found that transitions to idle and back to the same vcpu were
> > reliably taking an unnecessary IBPB.

I'll repeat my caveat there — I observed this on our ancient Xen
4.mumble, not current HEAD. But the code here looks remarkably similar;
it hasn't changed for a while.

> I understand that, but there was no explanation whatsoever as
> to why that is.

Looks like we set per_cpu(curr_vcpu)=next every time we switch, even if
we are switching to the idle domain. Which is why I added per_cpu(last)
specifically to be updated only when it *isn't* an idle vCPU.


* Re: [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}
  2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
  2018-01-18 18:04   ` Boris Ostrovsky
  2018-01-19 10:52   ` Jan Beulich
@ 2018-01-22  1:47   ` Tian, Kevin
  2 siblings, 0 replies; 60+ messages in thread
From: Tian, Kevin @ 2018-01-22  1:47 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Suravee Suthikulpanit, Boris Ostrovsky, Nakajima, Jun, Jan Beulich

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Thursday, January 18, 2018 11:46 PM
> 
> For performance reasons, HVM guests should have direct access to these
> MSRs
> when possible.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  2018-01-20 11:10         ` David Woodhouse
@ 2018-01-22 10:15           ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2018-01-22 10:15 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Cooper, Xen-devel

>>> On 20.01.18 at 12:10, <dwmw2@infradead.org> wrote:
> On Fri, 2018-01-19 at 08:07 -0700, Jan Beulich wrote:
>> I understand that, but there was no explanation whatsoever as
>> to why that is.
> 
> Looks like we set per_cpu(curr_vcpu)=next every time we switch, even if
> we are switching to the idle domain.

I can't see that - the only place it is being updated is at the end of
__context_switch(), which isn't being called in the normal case
when context_switch() finds is_idle_domain(nextd).

Jan



* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-19 13:51       ` Jan Beulich
@ 2018-01-22 11:42         ` Andrew Cooper
  2018-01-22 12:06           ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 11:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 13:51, Jan Beulich wrote:
>>>> On 19.01.18 at 14:36, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 11:43, Jan Beulich wrote:
>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>>>  handle_ist_exception:
>>>>          SAVE_ALL CLAC
>>>>  
>>>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>>>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>> Following my considerations towards alternative patching to
>>> eliminate as much overhead as possible from the Meltdown
>>> band-aid in case it is being disabled, I'm rather hesitant to see any
>>> patchable code being introduced into the NMI/#MC entry paths
>>> without the patching logic first being made safe in this regard.
>>> Exceptions coming here aren't very frequent (except perhaps on
>>> hardware about to die), so the path isn't performance critical.
>>> Therefore I think we should try to avoid any patching here, and
>>> just conditionals instead. This in fact is one of the reasons why I
>>> didn't want to macro-ize the assembly additions done in the
>>> Meltdown band-aid.
>>>
>>> I do realize that this then also affects the exit-to-Xen path,
>>> which I agree is less desirable to use conditionals on.
>> While I agree that our lack of IST-safe patching is a problem, these
>> alternative points are already present on the NMI and MCE paths, and
>> need to be.  As a result, the DF handler is in no worse of a position. 
>> As a perfect example, observe the CLAC in context.
> Oh, indeed. We should change that.
>
>> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
>> which doesn't use alternatives (but IMO this is pointless in the
>> presence of CLAC), but still don't think it is reasonable to treat DF
>> differently to NMI/MCE.
> #DF is debatable: On one hand I can see that if things go wrong,
> it can equally be raised at any time. Otoh #MC and even more so
> NMI can be raised _without_ things going (fatally) wrong, i.e. the
> patching may break a boot which would otherwise have succeeded
> (whereas the #DF would make the boot fail anyway).

I don't see a conclusion here, or a reason for treating #DF differently
to NMI or #MC.

There is currently a very very slim race on boot where an NMI or #MC
hitting the main application of alternatives may cause Xen to explode. 
This has been the case since alternatives were introduced, and this
patch doesn't make the problem meaningfully worse.

~Andrew


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-22 11:42         ` Andrew Cooper
@ 2018-01-22 12:06           ` Jan Beulich
  2018-01-22 13:52             ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-22 12:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 22.01.18 at 12:42, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 13:51, Jan Beulich wrote:
>>>>> On 19.01.18 at 14:36, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 11:43, Jan Beulich wrote:
>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>>>>  handle_ist_exception:
>>>>>          SAVE_ALL CLAC
>>>>>  
>>>>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>>>>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>>> Following my considerations towards alternative patching to
>>>> eliminate as much overhead as possible from the Meltdown
>>>> band-aid in case it is being disabled, I'm rather hesitant to see any
>>>> patchable code being introduced into the NMI/#MC entry paths
>>>> without the patching logic first being made safe in this regard.
>>>> Exceptions coming here aren't very frequent (except perhaps on
>>>> hardware about to die), so the path isn't performance critical.
>>>> Therefore I think we should try to avoid any patching here, and
>>>> just conditionals instead. This in fact is one of the reasons why I
>>>> didn't want to macro-ize the assembly additions done in the
>>>> Meltdown band-aid.
>>>>
>>>> I do realize that this then also affects the exit-to-Xen path,
>>>> which I agree is less desirable to use conditionals on.
>>> While I agree that our lack of IST-safe patching is a problem, these
>>> alternative points are already present on the NMI and MCE paths, and
>>> need to be.  As a result, the DF handler is in no worse of a position. 
>>> As a perfect example, observe the CLAC in context.
>> Oh, indeed. We should change that.
>>
>>> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
>>> which doesn't use alternatives (but IMO this is pointless in the
>>> presence of CLAC), but still don't think it is reasonable to treat DF
>>> differently to NMI/MCE.
>> #DF is debatable: On one hand I can see that if things go wrong,
>> it can equally be raised at any time. Otoh #MC and even more so
>> NMI can be raised _without_ things going (fatally) wrong, i.e. the
>> patching may break a boot which would otherwise have succeeded
>> (whereas the #DF would make the boot fail anyway).
> 
> I don't see a conclusion here, or a reason for treating #DF differently
> to NMI or #MC.

Odd - I thought my reply was pretty clear in this regard. I have
no good idea how to word it differently. Furthermore the goal
of the reply was not to settle on how to treat #DF, but to try
to convince you to avoid adding more patch points to the NMI /
#MC path (if you want #DF treated similarly, I wouldn't object
to patching being avoided there too).

> There is currently a very very slim race on boot where an NMI or #MC
> hitting the main application of alternatives may cause Xen to explode. 
> This has been the case since alternatives were introduced, and this
> patch doesn't make the problem meaningfully worse.

SMAP patching affects 3 bytes (and I'm intending to put together a
patch removing that patching from the NMI / #MC path), while you
add patching of quite a few more bytes, increasing the risk
accordingly.

If you really don't want to switch away from the patching approach,
I won't refuse to ack the patch. But it'll mean subsequent changes
will be more intrusive, to get this converted to conditionals instead
(unless someone has _immediate_ plans to deal with the issues in
the patching logic itself).

Jan



* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-22 12:06           ` Jan Beulich
@ 2018-01-22 13:52             ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 13:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 22/01/18 12:06, Jan Beulich wrote:
>>>> On 22.01.18 at 12:42, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 13:51, Jan Beulich wrote:
>>>>>> On 19.01.18 at 14:36, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 11:43, Jan Beulich wrote:
>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>>>>>  handle_ist_exception:
>>>>>>          SAVE_ALL CLAC
>>>>>>  
>>>>>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>>>>>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>>>> Following my considerations towards alternative patching to
>>>>> eliminate as much overhead as possible from the Meltdown
>>>>> band-aid in case it is being disabled, I'm rather hesitant to see any
>>>>> patchable code being introduced into the NMI/#MC entry paths
>>>>> without the patching logic first being made safe in this regard.
>>>>> Exceptions coming here aren't very frequent (except perhaps on
>>>>> hardware about to die), so the path isn't performance critical.
>>>>> Therefore I think we should try to avoid any patching here, and
>>>>> just conditionals instead. This in fact is one of the reasons why I
>>>>> didn't want to macro-ize the assembly additions done in the
>>>>> Meltdown band-aid.
>>>>>
>>>>> I do realize that this then also affects the exit-to-Xen path,
>>>>> which I agree is less desirable to use conditionals on.
>>>> While I agree that our lack of IST-safe patching is a problem, these
>>>> alternative points are already present on the NMI and MCE paths, and
>>>> need to be.  As a result, the DF handler is in no worse of a position. 
>>>> As a perfect example, observe the CLAC in context.
>>> Oh, indeed. We should change that.
>>>
>>>> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
>>>> which doesn't use alternatives (but IMO this is pointless in the
>>>> presence of CLAC), but still don't think it is reasonable to treat DF
>>>> differently to NMI/MCE.
>>> #DF is debatable: On one hand I can see that if things go wrong,
>>> it can equally be raised at any time. Otoh #MC and even more so
>>> NMI can be raised _without_ things going (fatally) wrong, i.e. the
>>> patching may break a boot which would otherwise have succeeded
>>> (whereas the #DF would make the boot fail anyway).
>> I don't see a conclusion here, or a reason for treating #DF differently
>> to NMI or #MC.
> Odd - I thought my reply was pretty clear in this regard.

The clear way of replying would be "Yes - please put in conditionals for
now".

~Andrew


* Re: [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests
  2018-01-19 10:45   ` Jan Beulich
  2018-01-19 11:05     ` Andrew Cooper
@ 2018-01-22 14:50     ` Andrew Cooper
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 10:45, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> @@ -153,14 +168,44 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>  {
>>      const struct vcpu *curr = current;
>>      struct domain *d = v->domain;
>> +    const struct cpuid_policy *cp = d->arch.cpuid;
>>      struct msr_domain_policy *dp = d->arch.msr;
>>      struct msr_vcpu_policy *vp = v->arch.msr;
>>  
>>      switch ( msr )
>>      {
>>      case MSR_INTEL_PLATFORM_INFO:
>> +    case MSR_ARCH_CAPABILITIES:
>> +        /* Read-only */
>>          goto gp_fault;
>>  
>> +    case MSR_SPEC_CTRL:
>> +        if ( !cp->feat.ibrsb )
>> +            goto gp_fault; /* MSR available? */
>> +
>> +        /*
>> +         * Note: SPEC_CTRL_STIBP is specified as safe to use (i.e. ignored)
>> +         * when STIBP isn't enumerated in hardware.
>> +         */
>> +
>> +        if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
>> +            goto gp_fault; /* Rsvd bit set? */
>> +
>> +        vp->spec_ctrl.raw = val;
>> +        break;
> Did you check (or inquire) whether reading back the value on a
> system which ignores the write to 1 actually produces the
> written value? I'd sort of expect zero to come back instead.

Tom Lendacky has confirmed on the LKML that AMD will implement these
bits as "read as written" rather than "read as zero".

https://lkml.org/lkml/2018/1/22/499

Still no comment from Intel.
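
Either way, the emulation can simply hand back whatever the quoted
write path stored; a minimal sketch of that case (the surrounding
guest_rdmsr() structure is assumed to mirror guest_wrmsr()):

    case MSR_SPEC_CTRL:
        if ( !cp->feat.ibrsb )
            goto gp_fault;           /* MSR not advertised to this guest. */
        *val = vp->spec_ctrl.raw;    /* "Read as written", STIBP included. */
        break;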

~Andrew


* Re: [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use
  2018-01-19 14:01   ` Jan Beulich
@ 2018-01-22 15:11     ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 15:11 UTC (permalink / raw)
  To: Jan Beulich, Asit K Mallick, Jun Nakajima; +Cc: Xen-devel

On 19/01/18 14:01, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>> +/* Calculate whether Retpoline is known-safe on this CPU. */
>> +static bool __init retpoline_safe(void)
>> +{
>> +    unsigned int ucode_rev = this_cpu(ucode_cpu_info).cpu_sig.rev;
>> +
>> +    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
>> +        return true;
>> +
>> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
>> +         boot_cpu_data.x86 != 6 )
>> +        return false;
>> +
>> +    switch ( boot_cpu_data.x86_model )
>> +    {
>> +    case 0x17: /* Penryn */
>> +    case 0x1d: /* Dunnington */
>> +    case 0x1e: /* Nehalem */
>> +    case 0x1f: /* Auburndale / Havendale */
>> +    case 0x1a: /* Nehalem EP */
>> +    case 0x2e: /* Nehalem EX */
>> +    case 0x25: /* Westmere */
>> +    case 0x2c: /* Westmere EP */
>> +    case 0x2f: /* Westmere EX */
>> +    case 0x2a: /* SandyBridge */
>> +    case 0x2d: /* SandyBridge EP/EX */
>> +    case 0x3a: /* IvyBridge */
>> +    case 0x3e: /* IvyBridge EP/EX */
>> +    case 0x3c: /* Haswell */
>> +    case 0x3f: /* Haswell EX/EP */
>> +    case 0x45: /* Haswell D */
>> +    case 0x46: /* Haswell H */
>> +        return true;
> Especially regarding this part of the function (leave the other half
> in context for reference) - can we perhaps shorten this? E.g.
> fold together everything below the lowest Broadwell model value?

No.  The numbers aren't linear and we don't have confirmation that
retpoline is safe on those CPUs.  What will happen in practice on the
older CPUs is that we see a lack of IBRS (no microcode) and a lack of
any other option, and fall back to retpoline (i.e. no change from the
compiled binary) as it is better than nothing, if not completely safe.
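
Expressed as a sketch, that fallback looks like this.  It is the shape
of the argument, not the series' actual selection logic; THUNK_RETPOLINE
and the X86_FEATURE_IBRSB spelling are my assumptions here:

    if ( opt_thunk == THUNK_DEFAULT )
    {
        if ( retpoline_safe() )
            thunk = THUNK_RETPOLINE;  /* Known safe on this CPU. */
        else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
            ibrs = true;              /* Retpoline unproven, but IBRS available. */
        else
            thunk = THUNK_RETPOLINE;  /* No other option; better than nothing. */
    }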

> There in particular don't appear to be an Atoms in the list above,
> yet either they're affected by the underlying issue as well (and
> hence would benefit from whatever applicable mitigation), or we
> should suppress as much mitigation overhead as possible for them
> (which I don't think is the case).

Some Atoms are definitely affected, but I've not had any confirmation
as to which models are affected.

>
> In any event, as said before - this function will need an ack from
> an Intel person, and it would be nice for ...
>
>> +        /*
>> +         * Broadwell processors are retpoline-safe after specific microcode
>> +         * versions.
>> +         */
>> +    case 0x3d: /* Broadwell */
>> +        return ucode_rev >= 0x28;
>> +    case 0x47: /* Broadwell H */
>> +        return ucode_rev >= 0x1b;
>> +    case 0x4f: /* Broadwell EP/EX */
>> +        return ucode_rev >= 0xb000025;
>> +    case 0x56: /* Broadwell D */
>> +        return false; /* TBD. */
> ... this to be filled in.

Still waiting.  I don't think this should block the patch going in.

~Andrew


* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-19 15:02       ` Jan Beulich
  2018-01-19 16:10         ` Andrew Cooper
@ 2018-01-22 15:51         ` Andrew Cooper
  2018-01-22 16:49           ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 19/01/18 15:02, Jan Beulich wrote:
>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>>>> used to force or
>>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>>  
>>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>>> +context, and an entry from a native (PV or Xen) context.
>>> Would you mind adding a sentence or two to the description making
>>> clear what use this fine grained control is? I can't really figure why I
>>> might need to be concerned about one of the two cases, but not the
>>> other.
>> I thought I'd covered that in the commit message, but I'm not sure this
>> is a suitable place to discuss the details.  PV and HVM guests have
>> different reasoning for why we need to overwrite the RSB.
>>
>> In the past, there used to be a default interaction of rsb_native and
>> SMEP, but that proved to be insufficient and rsb_native is now
>> unconditionally enabled.  In principle however, it should fall within
>> CONFIG_PV.
> Thanks for the explanation, but I'm afraid I'm none the wiser as
> to why the two separate options are needed (or even just wanted).
>
>>>> --- a/xen/include/asm-x86/spec_ctrl_asm.h
>>>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>>>> @@ -73,6 +73,40 @@
>>>>   *  - SPEC_CTRL_EXIT_TO_GUEST
>>>>   */
>>>>  
>>>> +.macro DO_OVERWRITE_RSB
>>>> +/*
>>>> + * Requires nothing
>>>> + * Clobbers %rax, %rcx
>>>> + *
>>>> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
>>>> + * Google's performance numbers, the loop is unrolled to 16 iterations and two
>>>> + * calls per iteration.
>>>> + *
>>>> + * The call filling the RSB needs a nonzero displacement, but we use "1:
>>>> + * pause, jmp 1b" to safely contains any ret-based speculation, even if the
>>>> + * loop is speculatively executed prematurely.
>>> I'm struggling to understand why you use "but" here. Maybe just a
>>> lack of English skills on my part?
>> "displacement.  A nop would do, but" ?
>>
>> It is a justification for why we are putting more than a single byte in
>> the middle.
> Oh, I see, but only with the addition you suggest.
>
>>>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>>>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>>>> + * optimised with mov-elimination in modern cores.
>>>> + */
>>>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>>>> +    mov %rsp, %rax  /* Store the current %rsp */
>>>> +
>>>> +.L\@_fill_rsb_loop:
>>>> +
>>>> +    .rept 2         /* Unrolled twice. */
>>>> +    call 2f         /* Create an RSB entry. */
>>>> +1:  pause
>>>> +    jmp 1b          /* Capture rogue speculation. */
>>>> +2:
>>> I won't further insist on changing away from numeric labels here, but
>>> I'd still like to point out an example of a high risk use of such labels in
>>> mainline code: There's a "jz 1b" soon after
>>> exception_with_ints_disabled, leading across _two_ other labels and
>>> quite a few insns and macro invocations. May I at the very least
>>> suggest that you don't use 1 and 2 here?
>> I spent ages trying to get .L labels working here, but they don't
>> function inside a rept, as you end up with duplicate local symbols.
>>
>> Even using irp to inject a unique number into the loop doesn't appear to
>> work, because the \ escape gets interpreted as a token separator. 
>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>> token separator.
>>
>> If you've got a better suggestion then I'm all ears.
>>
>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>> other numbers to use.
> Since the unroll number is just 2, this is what I would have
> suggested primarily. .rept of course won't work, as it's not a
> macro invocation, and hence doesn't increment the internal
> counter. With .irp I can get things to work:
>
> 	.macro m
> 	.irp n, 1, 2
> .Lxyz_\@_\n:	mov	$\@, %eax
> 	.endr
> 	.endm

This appears to only work when \n is at the end of the label.  None of:

.Lxyz_\@_\n_:    mov    $\@, %eax
.Lxyz_\@_\n\()_:    mov    $\@, %eax
.Lxyz_\n\@:    mov    $\@, %eax

work.

Given this appears to be a corner case to begin with, how likely do you
think it is to work with older assemblers?

~Andrew


* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-22 15:51         ` Andrew Cooper
@ 2018-01-22 16:49           ` Jan Beulich
  2018-01-22 17:04             ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2018-01-22 16:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 22.01.18 at 16:51, <andrew.cooper3@citrix.com> wrote:
> On 19/01/18 15:02, Jan Beulich wrote:
>>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>>>>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>>>>> + * optimised with mov-elimination in modern cores.
>>>>> + */
>>>>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>>>>> +    mov %rsp, %rax  /* Store the current %rsp */
>>>>> +
>>>>> +.L\@_fill_rsb_loop:
>>>>> +
>>>>> +    .rept 2         /* Unrolled twice. */
>>>>> +    call 2f         /* Create an RSB entry. */
>>>>> +1:  pause
>>>>> +    jmp 1b          /* Capture rogue speculation. */
>>>>> +2:
>>>> I won't further insist on changing away from numeric labels here, but
>>>> I'd still like to point out an example of a high risk use of such labels in
>>>> mainline code: There's a "jz 1b" soon after
>>>> exception_with_ints_disabled, leading across _two_ other labels and
>>>> quite a few insns and macro invocations. May I at the very least
>>>> suggest that you don't use 1 and 2 here?
>>> I spent ages trying to get .L labels working here, but they don't
>>> function inside a rept, as you end up with duplicate local symbols.
>>>
>>> Even using irp to inject a unique number into the loop doesn't appear to
>>> work, because the \ escape gets interpreted as a token separator. 
>>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>>> token separator.
>>>
>>> If you've got a better suggestion then I'm all ears.
>>>
>>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>>> other numbers to use.
>> Since the unroll number is just 2, this is what I would have
>> suggested primarily. .rept of course won't work, as it's not a
>> macro invocation, and hence doesn't increment the internal
>> counter. With .irp I can get things to work:
>>
>> 	.macro m
>> 	.irp n, 1, 2
>> .Lxyz_\@_\n:	mov	$\@, %eax
>> 	.endr
>> 	.endm
> 
> This appears to only work when \n is at the end of the label.  None of:
> 
> .Lxyz_\@_\n_:    mov    $\@, %eax
> .Lxyz_\@_\n\()_:    mov    $\@, %eax
> .Lxyz_\n\@:    mov    $\@, %eax
> 
> work.

.Lxyz_\n\(\)()_\(\)@: mov	$\@, %ecx

(There are two rounds of expansion, so \-s you want expanded
in the second round need escaping for the first one.)

> Given this appears to be a corner case to begin with, how likely do you
> think it is to work with older assemblers?

The really old ones where macro handling was a mess will be a
problem irrespective of the above, I'm afraid. From the point where
macros were made to work sensibly onwards, I think not much has
changed. But yes, there's a risk.

Jan



* Re: [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen
  2018-01-22 16:49           ` Jan Beulich
@ 2018-01-22 17:04             ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-22 17:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 22/01/18 16:49, Jan Beulich wrote:
>>>> On 22.01.18 at 16:51, <andrew.cooper3@citrix.com> wrote:
>> On 19/01/18 15:02, Jan Beulich wrote:
>>>>>> On 19.01.18 at 15:24, <andrew.cooper3@citrix.com> wrote:
>>>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@citrix.com> wrote:
>>>>>> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
>>>>>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
>>>>>> + * optimised with mov-elimination in modern cores.
>>>>>> + */
>>>>>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>>>>>> +    mov %rsp, %rax  /* Store the current %rsp */
>>>>>> +
>>>>>> +.L\@_fill_rsb_loop:
>>>>>> +
>>>>>> +    .rept 2         /* Unrolled twice. */
>>>>>> +    call 2f         /* Create an RSB entry. */
>>>>>> +1:  pause
>>>>>> +    jmp 1b          /* Capture rogue speculation. */
>>>>>> +2:
>>>>> I won't further insist on changing away from numeric labels here, but
>>>>> I'd still like to point out an example of a high risk use of such labels in
>>>>> mainline code: There's a "jz 1b" soon after
>>>>> exception_with_ints_disabled, leading across _two_ other labels and
>>>>> quite a few insns and macro invocations. May I at the very least
>>>>> suggest that you don't use 1 and 2 here?
>>>> I spent ages trying to get .L labels working here, but they don't
>>>> function inside a rept, as you end up with duplicate local symbols.
>>>>
>>>> Even using irp to inject a unique number into the loop doesn't appear to
>>>> work, because the \ escape gets interpreted as a token separator. 
>>>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>>>> token separator.
>>>>
>>>> If you've got a better suggestion then I'm all ears.
>>>>
>>>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>>>> other numbers to use.
>>> Since the unroll number is just 2, this is what I would have
>>> suggested primarily. .rept of course won't work, as it's not a
>>> macro invocation, and hence doesn't increment the internal
>>> counter. With .irp I can get things to work:
>>>
>>> 	.macro m
>>> 	.irp n, 1, 2
>>> .Lxyz_\@_\n:	mov	$\@, %eax
>>> 	.endr
>>> 	.endm
>> This appears to only work when \n is at the end of the label.  None of:
>>
>> .Lxyz_\@_\n_:    mov    $\@, %eax
>> .Lxyz_\@_\n\()_:    mov    $\@, %eax
>> .Lxyz_\n\@:    mov    $\@, %eax
>>
>> work.
> .Lxyz_\n\(\)()_\(\)@: mov	$\@, %ecx
>
> (There are two rounds of expansion, so \-s you want expanded
> in the second round need escaping for the first one.)

Eww.  That's too much like Perl IMO.

I'll see if I can arrange things to work neatly with \n at the end of
the label.  If not, I'll manually unroll the loop; it is only two
iterations, after all.
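
For reference, a rough and untested sketch of the \n-at-the-end
arrangement I have in mind for the inner, unrolled part (label names
purely illustrative, and assuming this still lives inside the containing
.macro so that \@ is available):

    .irp n, 1, 2                         /* Unrolled twice. */
    call .L\@_insert_entry_\n            /* Create an RSB entry. */
.L\@_capture_spec_\n:
    pause
    jmp .L\@_capture_spec_\n             /* Capture rogue speculation. */
.L\@_insert_entry_\n:
    .endr

Manually unrolling would just mean writing that call/capture pair out
twice with distinct \@-based labels.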

~Andrew


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-19 13:36     ` Andrew Cooper
  2018-01-19 13:51       ` Jan Beulich
@ 2018-01-22 22:27       ` Boris Ostrovsky
  2018-01-23  0:17         ` Andrew Cooper
  1 sibling, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2018-01-22 22:27 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: Xen-devel

On 01/19/2018 08:36 AM, Andrew Cooper wrote:
> On 19/01/18 11:43, Jan Beulich wrote:
>
>>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>>  .Lvmx_vmentry_fail:
>>>          sti
>>>          SAVE_ALL
>>> +
>>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>> I think the use of the PV variant here requires a comment.
> Oh.  It used to have one...  I'll try to find it.

I have, in fact, been meaning to ask about this for a long time and
always forgot. Perhaps your comment will say more than just why a PV
variant is used here, but in case it won't --- why do we have *any*
mitigation here? We are never returning to the guest, are we?

-boris


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-22 22:27       ` Boris Ostrovsky
@ 2018-01-23  0:17         ` Andrew Cooper
  2018-01-23  2:19           ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2018-01-23  0:17 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich; +Cc: Xen-devel

On 22/01/2018 22:27, Boris Ostrovsky wrote:
> On 01/19/2018 08:36 AM, Andrew Cooper wrote:
>> On 19/01/18 11:43, Jan Beulich wrote:
>>
>>>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>>>  .Lvmx_vmentry_fail:
>>>>          sti
>>>>          SAVE_ALL
>>>> +
>>>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>>> I think the use of the PV variant here requires a comment.
>> Oh.  It used to have one...  I'll try to find it.
> I have, in fact, been meaning to ask about this for a long time and
> always forgot. Perhaps your comment will say more than just why a PV
> variant is used here, but in case it won't --- why do we have *any*
> mitigation here? We are never returning to the guest, are we?

We never return to *this* guest, but we are still open to abuse from a
separate hyperthread, so we still need to set SPEC_CTRL.IBRS if we are
using IBRS for safety.  (If we are using lfence+jmp or retpoline then we
don't need this change, but it's not a hotpath, so it doesn't warrant yet
another variant of SPEC_CTRL_ENTRY_FROM_*.)

~Andrew


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-23  0:17         ` Andrew Cooper
@ 2018-01-23  2:19           ` Boris Ostrovsky
  2018-01-23 20:00             ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2018-01-23  2:19 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: Xen-devel



On 01/22/2018 07:17 PM, Andrew Cooper wrote:
> On 22/01/2018 22:27, Boris Ostrovsky wrote:
>> On 01/19/2018 08:36 AM, Andrew Cooper wrote:
>>> On 19/01/18 11:43, Jan Beulich wrote:
>>>
>>>>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>>>>   .Lvmx_vmentry_fail:
>>>>>           sti
>>>>>           SAVE_ALL
>>>>> +
>>>>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>>>> I think the use of the PV variant here requires a comment.
>>> Oh.  It used to have one...  I'll try to find it.
>> I have, in fact, been meaning to ask about this for a long time and
>> always forgot. Perhaps your comment will say more than just why a PV
>> variant is used here, but in case it won't --- why do we have *any*
>> mitigation here? We are never returning to the guest, are we?
> 
> We never return to *this* guest, but we are still open to abuse from a
> separate hyperthread, so we still need to set SPEC_CTRL.IBRS if we are
> using IBRS for safety.  (If we are using lfence+jmp or retpoline then we
> don't need this change, but it's not a hotpath, so it doesn't warrant yet
> another variant of SPEC_CTRL_ENTRY_FROM_*.)


We wrote IBRS during VMEXIT. I thought this serves as a barrier for all
preceding predictions (both threads) from the lower protection mode.

Is the concern here that SPEC_CTRL_EXIT_TO_GUEST (before VMENTER) may
set IBRS to 0, and *that* will open the hypervisor to the other thread's
mischief?

-boris


* Re: [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
  2018-01-23  2:19           ` Boris Ostrovsky
@ 2018-01-23 20:00             ` Andrew Cooper
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Cooper @ 2018-01-23 20:00 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich; +Cc: Xen-devel

On 23/01/18 02:19, Boris Ostrovsky wrote:
>
>
> On 01/22/2018 07:17 PM, Andrew Cooper wrote:
>> On 22/01/2018 22:27, Boris Ostrovsky wrote:
>>> On 01/19/2018 08:36 AM, Andrew Cooper wrote:
>>>> On 19/01/18 11:43, Jan Beulich wrote:
>>>>
>>>>>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>>>>>   .Lvmx_vmentry_fail:
>>>>>>           sti
>>>>>>           SAVE_ALL
>>>>>> +
>>>>>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob:
>>>>>> acd */
>>>>> I think the use of the PV variant here requires a comment.
>>>> Oh.  It used to have one...  I'll try to find it.
>>> I have, in fact, been meaning to ask about this for a long time and
>>> always forgot. Perhaps your comment will say more than just why a PV
>>> variant is used here, but in case it won't --- why do we have *any*
>>> mitigation here? We are never returning to the guest, are we?
>>
>> We never return to *this* guest, but we are still open to abuse from a
>> separate hyperthread, so we still need to set SPEC_CTRL.IBRS if we are
>> using IBRS for safety.  (If we are using lfence+jmp or retpoline then we
>> don't need this change, but it's not a hotpath, so it doesn't warrant yet
>> another variant of SPEC_CTRL_ENTRY_FROM_*.)
>
>
> We wrote IBRS during VMEXIT. I thought this serves as a barrier for all
> preceding predictions (both threads) from the lower protection mode.

There is no guarantee that a sequence which sets IBRS, then clears it,
serves to flush older predictions.

In practice, some implementations of IBRS are a flush internally, and
some are a disable internally.  This is why the semantics require you to
write the value 1 to it again on entry to a higher mode (even if it was
previously set), and to keep it set for the duration of the protection
you want.
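
As a minimal sketch of what that rule means for an entry path (not the
actual SPEC_CTRL_ENTRY_* implementation; it assumes the MSR_SPEC_CTRL and
SPEC_CTRL_IBRS constants introduced earlier in the series):

    /* Unconditionally (re)write IBRS = 1 on entry to the higher mode.
     * On "flush" implementations the write itself is what discards the
     * less privileged predictions; on "disable" implementations the
     * value being 1 is what stops them being used.  Either way, it is
     * not safe to skip the write just because the bit already reads
     * as 1. */
    mov     $MSR_SPEC_CTRL, %ecx
    mov     $SPEC_CTRL_IBRS, %eax
    xor     %edx, %edx
    wrmsr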

> Is the concern here that SPEC_CTRL_EXIT_TO_GUEST (before VMENTER) may
> set IBRS to 0, and *that* will open the hypervisor to the other thread's
> mischief?

Correct.

~Andrew


Thread overview: 60+ messages
2018-01-18 15:45 [PATCH v9 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 01/11] x86/thunk: Fix GEN_INDIRECT_THUNK comment Andrew Cooper
2018-01-18 16:06   ` Jan Beulich
2018-01-18 15:46 ` [PATCH v9 02/11] x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests Andrew Cooper
2018-01-19 10:40   ` Jan Beulich
2018-01-19 10:53     ` Andrew Cooper
2018-01-19 11:46       ` Jan Beulich
2018-01-19 12:01         ` Andrew Cooper
2018-01-19 12:11           ` Jan Beulich
2018-01-19 12:36             ` Andrew Cooper
2018-01-19 12:52               ` Jan Beulich
2018-01-19 13:06                 ` Andrew Cooper
2018-01-19 13:33                   ` Jan Beulich
2018-01-19 15:00                     ` Andrew Cooper
2018-01-19 15:09                       ` Jan Beulich
2018-01-18 15:46 ` [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} " Andrew Cooper
2018-01-19 10:45   ` Jan Beulich
2018-01-19 11:05     ` Andrew Cooper
2018-01-22 14:50     ` Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 04/11] x86/migrate: Move MSR_SPEC_CTRL on migrate Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 05/11] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD} Andrew Cooper
2018-01-18 18:04   ` Boris Ostrovsky
2018-01-19 10:52   ` Jan Beulich
2018-01-19 10:54     ` Andrew Cooper
2018-01-22  1:47   ` Tian, Kevin
2018-01-18 15:46 ` [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point Andrew Cooper
2018-01-19 10:39   ` Andrew Cooper
2018-01-19 11:43   ` Jan Beulich
2018-01-19 13:36     ` Andrew Cooper
2018-01-19 13:51       ` Jan Beulich
2018-01-22 11:42         ` Andrew Cooper
2018-01-22 12:06           ` Jan Beulich
2018-01-22 13:52             ` Andrew Cooper
2018-01-22 22:27       ` Boris Ostrovsky
2018-01-23  0:17         ` Andrew Cooper
2018-01-23  2:19           ` Boris Ostrovsky
2018-01-23 20:00             ` Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 07/11] x86/boot: Calculate the most appropriate BTI mitigation to use Andrew Cooper
2018-01-19 12:06   ` Jan Beulich
2018-01-19 13:48     ` Andrew Cooper
2018-01-19 14:01   ` Jan Beulich
2018-01-22 15:11     ` Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen Andrew Cooper
2018-01-19 12:47   ` Jan Beulich
2018-01-19 14:24     ` Andrew Cooper
2018-01-19 15:02       ` Jan Beulich
2018-01-19 16:10         ` Andrew Cooper
2018-01-19 16:19           ` Jan Beulich
2018-01-19 16:43             ` Andrew Cooper
2018-01-22 15:51         ` Andrew Cooper
2018-01-22 16:49           ` Jan Beulich
2018-01-22 17:04             ` Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts Andrew Cooper
2018-01-19 13:25   ` Jan Beulich
2018-01-19 14:48     ` Andrew Cooper
2018-01-19 15:07       ` Jan Beulich
2018-01-20 11:10         ` David Woodhouse
2018-01-22 10:15           ` Jan Beulich
2018-01-18 15:46 ` [PATCH v9 10/11] x86/cpuid: Offer Indirect Branch Controls to guests Andrew Cooper
2018-01-18 15:46 ` [PATCH v9 11/11] x86/idle: Clear SPEC_CTRL while idle Andrew Cooper
