* [PATCH 0/3] x86/spec-ctrl: Fix NMI race condition
@ 2022-01-13 16:38 Andrew Cooper
  2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-13 16:38 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Andrew Cooper (3):
  x86/msr: Split MSR_SPEC_CTRL handling
  x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling

 xen/arch/x86/hvm/svm/entry.S             |  5 ++--
 xen/arch/x86/hvm/vmx/entry.S             | 23 ++++++++++-----
 xen/arch/x86/hvm/vmx/vmx.c               | 38 ++++++++++++++++++++++++-
 xen/arch/x86/include/asm/msr.h           | 21 ++++++++++----
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 49 ++++----------------------------
 xen/arch/x86/msr.c                       |  6 ++--
 xen/arch/x86/pv/emul-priv-op.c           | 10 +++++++
 7 files changed, 90 insertions(+), 62 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling
  2022-01-13 16:38 [PATCH 0/3] x86/spec-ctrl: Fix NMI race condition Andrew Cooper
@ 2022-01-13 16:38 ` Andrew Cooper
  2022-01-14 11:03   ` Roger Pau Monné
  2022-01-17 11:07   ` Jan Beulich
  2022-01-13 16:38 ` [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM Andrew Cooper
  2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
  2 siblings, 2 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-13 16:38 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
to be three different access methods, depending on where the guest's value
lives.
However, it would be better not to duplicate the #GP checking logic.

guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
can repurpose X86EMUL_UNHANDLEABLE slightly for this.  This is going to be a
common pattern for other MSRs too in the future.
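
As a rough sketch, the dispatch pattern looks like this (illustrative
only; handle_rdmsr() and vendor_rdmsr() are stand-in names rather than
real Xen functions):

    static int handle_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
    {
        /* Common #GP checks (and fully common MSRs) live in guest_rdmsr(). */
        int ret = guest_rdmsr(v, msr, val);

        if ( ret != X86EMUL_UNHANDLEABLE )
            return ret; /* Fully handled, or #GP for the caller to inject. */

        /*
         * Delegated: the per-VM-type handler (vmx_msr_read_intercept(),
         * PV's read_msr(), etc.) supplies the value from wherever it lives.
         */
        return vendor_rdmsr(v, msr, val);
    }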

Duplicate the msrs->spec_ctrl.raw accesses in the PV and VT-x paths for now.
The SVM path is currently unreachable because of the CPUID policy.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c     | 10 ++++++++++
 xen/arch/x86/include/asm/msr.h | 11 +++++++----
 xen/arch/x86/msr.c             |  6 ++----
 xen/arch/x86/pv/emul-priv-op.c | 10 ++++++++++
 4 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a7a0d662342a..28ee6393f11e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3128,12 +3128,17 @@ static int is_last_branch_msr(u32 ecx)
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
     struct vcpu *curr = current;
+    const struct vcpu_msrs *msrs = curr->arch.msrs;
     uint64_t tmp;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
     switch ( msr )
     {
+    case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
+        *msr_content = msrs->spec_ctrl.raw;
+        break;
+
     case MSR_IA32_SYSENTER_CS:
         __vmread(GUEST_SYSENTER_CS, msr_content);
         break;
@@ -3331,6 +3336,7 @@ void vmx_vlapic_msr_changed(struct vcpu *v)
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 {
     struct vcpu *v = current;
+    struct vcpu_msrs *msrs = v->arch.msrs;
     const struct cpuid_policy *cp = v->domain->arch.cpuid;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x, msr_value=%#"PRIx64, msr, msr_content);
@@ -3339,6 +3345,10 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     {
         uint64_t rsvd, tmp;
 
+    case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
+        msrs->spec_ctrl.raw = msr_content;
+        return X86EMUL_OKAY;
+
     case MSR_IA32_SYSENTER_CS:
         __vmwrite(GUEST_SYSENTER_CS, msr_content);
         break;
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 1d3eca9063a2..0b2176a9bc53 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -367,10 +367,13 @@ int init_domain_msr_policy(struct domain *d);
 int init_vcpu_msr_policy(struct vcpu *v);
 
 /*
- * Below functions can return X86EMUL_UNHANDLEABLE which means that MSR is
- * not (yet) handled by it and must be processed by legacy handlers. Such
- * behaviour is needed for transition period until all rd/wrmsr are handled
- * by the new MSR infrastructure.
+ * The below functions return X86EMUL_*.  Callers are responsible for
+ * converting X86EMUL_EXCEPTION into #GP[0].
+ *
+ * X86EMUL_UNHANDLEABLE means "not everything complete".  It could be that:
+ *   1) Common #GP checks have been done, but val access needs delegating to the
+ *      per-VM-type handlers.
+ *   2) The MSR is not handled at all by common logic.
  *
  * These functions are also used by the migration logic, so need to cope with
  * being used outside of v's context.
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index b834456c7b02..3549630d6699 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -265,8 +265,7 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
     case MSR_SPEC_CTRL:
         if ( !cp->feat.ibrsb )
             goto gp_fault;
-        *val = msrs->spec_ctrl.raw;
-        break;
+        return X86EMUL_UNHANDLEABLE; /* Delegate value to per-VM-type logic. */
 
     case MSR_INTEL_PLATFORM_INFO:
         *val = mp->platform_info.raw;
@@ -514,8 +513,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         if ( val & rsvd )
             goto gp_fault; /* Rsvd bit set? */
 
-        msrs->spec_ctrl.raw = val;
-        break;
+        return X86EMUL_UNHANDLEABLE; /* Delegate value to per-VM-type logic. */
 
     case MSR_PRED_CMD:
         if ( !cp->feat.ibrsb && !cp->extd.ibpb )
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index c78be6d92b21..6644e739209c 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -875,6 +875,7 @@ static int read_msr(unsigned int reg, uint64_t *val,
                     struct x86_emulate_ctxt *ctxt)
 {
     struct vcpu *curr = current;
+    const struct vcpu_msrs *msrs = curr->arch.msrs;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
     bool vpmu_msr = false, warn = false;
@@ -898,6 +899,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
             *val |= APIC_BASE_BSP;
         return X86EMUL_OKAY;
 
+    case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
+        *val = msrs->spec_ctrl.raw;
+        return X86EMUL_OKAY;
+
     case MSR_FS_BASE:
         if ( !cp->extd.lm )
             break;
@@ -1024,6 +1029,7 @@ static int write_msr(unsigned int reg, uint64_t val,
                      struct x86_emulate_ctxt *ctxt)
 {
     struct vcpu *curr = current;
+    struct vcpu_msrs *msrs = curr->arch.msrs;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
     bool vpmu_msr = false;
@@ -1041,6 +1047,10 @@ static int write_msr(unsigned int reg, uint64_t val,
     {
         uint64_t temp;
 
+    case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
+        msrs->spec_ctrl.raw = val;
+        return X86EMUL_OKAY;
+
     case MSR_FS_BASE:
     case MSR_GS_BASE:
     case MSR_SHADOW_GS_BASE:
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-13 16:38 [PATCH 0/3] x86/spec-ctrl: Fix NMI race condition Andrew Cooper
  2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
@ 2022-01-13 16:38 ` Andrew Cooper
  2022-01-14 11:42   ` Roger Pau Monné
  2022-01-17 11:22   ` Jan Beulich
  2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
  2 siblings, 2 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-13 16:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Jun Nakajima, Kevin Tian

These were written before Spectre/Meltdown went public, and there was large
uncertainty in how the protections would evolve.  As it turns out, they're
very specific to Intel hardware, and not very suitable for AMD.

Expand and drop the macros.  No change at all for VT-x.

For AMD, the only relevant piece of functionality is DO_OVERWRITE_RSB,
although we will soon be adding (different) logic to handle MSR_SPEC_CTRL.

This has a marginal improvement of removing an unconditional pile of long-nops
from the vmentry/exit path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/entry.S             |  5 +++--
 xen/arch/x86/hvm/vmx/entry.S             |  8 ++++++--
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 17 ++---------------
 3 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index e208a4b32ae7..276215d36aff 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -59,7 +59,7 @@ __UNLIKELY_END(nsvm_hap)
         mov VCPUMSR_spec_ctrl_raw(%rax), %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
 
         pop  %r15
         pop  %r14
@@ -86,7 +86,8 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
+        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         stgi
diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 27c8c5ca4943..30139ae58e9d 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -33,7 +33,9 @@ ENTRY(vmx_asm_vmexit_handler)
         movb $1,VCPU_vmx_launched(%rbx)
         mov  %rax,VCPU_hvm_guest_cr2(%rbx)
 
-        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
+        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
@@ -80,7 +82,9 @@ UNLIKELY_END(realmode)
         mov VCPUMSR_spec_ctrl_raw(%rax), %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
+        ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
 
         mov  VCPU_hvm_guest_cr2(%rbx),%rax
 
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index cb34299a865b..18ecfcd70375 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -68,14 +68,14 @@
  *
  * The following ASM fragments implement this algorithm.  See their local
  * comments for further details.
- *  - SPEC_CTRL_ENTRY_FROM_HVM
+ *  - SPEC_CTRL_ENTRY_FROM_{SVM,VMX} (See appropriate entry.S files)
  *  - SPEC_CTRL_ENTRY_FROM_PV
  *  - SPEC_CTRL_ENTRY_FROM_INTR
  *  - SPEC_CTRL_ENTRY_FROM_INTR_IST
  *  - SPEC_CTRL_EXIT_TO_XEN_IST
  *  - SPEC_CTRL_EXIT_TO_XEN
  *  - SPEC_CTRL_EXIT_TO_PV
- *  - SPEC_CTRL_EXIT_TO_HVM
+ *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
 .macro DO_OVERWRITE_RSB tmp=rax
@@ -225,12 +225,6 @@
     wrmsr
 .endm
 
-/* Use after a VMEXIT from an HVM guest. */
-#define SPEC_CTRL_ENTRY_FROM_HVM                                        \
-    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM;           \
-    ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM,                        \
-        X86_FEATURE_SC_MSR_HVM
-
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
@@ -255,13 +249,6 @@
     ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
         X86_FEATURE_SC_VERW_PV
 
-/* Use when exiting to HVM guest context. */
-#define SPEC_CTRL_EXIT_TO_HVM                                           \
-    ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
-    ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)),           \
-        X86_FEATURE_SC_VERW_HVM
-
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
  * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-13 16:38 [PATCH 0/3] x86/spec-ctrl: Fix NMI race condition Andrew Cooper
  2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
  2022-01-13 16:38 ` [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM Andrew Cooper
@ 2022-01-13 16:38 ` Andrew Cooper
  2022-01-14 12:50   ` Roger Pau Monné
                     ` (2 more replies)
  2 siblings, 3 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-13 16:38 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Jun Nakajima, Kevin Tian

The logic was based on a mistaken understanding of how NMI blocking on vmexit
works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
and the guest's value will be clobbered before it is saved.

Switch to using MSR load/save lists.  This causes the guest value to be saved
atomically with respect to NMIs/MCEs/etc.
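
For illustration, each list entry is a 128-bit record which the CPU
itself fills in and consumes as part of VMEXIT/VMENTER, which is where
the atomicity comes from.  Roughly the shape of vmcs.h's
struct vmx_msr_entry:

    struct vmx_msr_entry {
        uint32_t index;   /* MSR index, e.g. MSR_SPEC_CTRL        */
        uint32_t mbz;     /* reserved, must be zero               */
        uint64_t data;    /* value saved/loaded by the hardware   */
    };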

First, update vmx_cpuid_policy_changed() to configure the load/save lists at
the same time as configuring the intercepts.  This function is always used in
remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
function, rather than having multiple remote acquisitions of the same VMCS.

vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
-ESRCH but we don't care, and this path will be taken during domain create
on a system lacking IBRSB.

Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
rather than vcpu_msrs, and update the comment to describe the new state
location.

Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
entirely, and use a shorter code sequence to simply reload Xen's setting from
the top-of-stack block.

Because the guest values are loaded/saved atomically, we do not need to use
the shadowing logic to cope with late NMIs/etc, so we can omit
DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
no need to switch back to Xen's context on an early failure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>

Needs backporting as far as people can tolerate.

If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
but this is awkward to arrange in asm.
---
 xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
 xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
 xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
 4 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 30139ae58e9d..297ed8685126 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
 
         /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
-        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
+
+        .macro restore_spec_ctrl
+            mov    $MSR_SPEC_CTRL, %ecx
+            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
+            xor    %edx, %edx
+            wrmsr
+        .endm
+        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
@@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
         /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
-        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
         ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
 
         mov  VCPU_hvm_guest_cr2(%rbx),%rax
@@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
         SAVE_ALL
 
         /*
-         * PV variant needed here as no guest code has executed (so
-         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
-         * to hit (in which case the HVM variant might corrupt things).
+         * SPEC_CTRL_ENTRY notes
+         *
+         * If we end up here, no guest code has executed.  We still have Xen's
+         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
          */
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
-        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         call vmx_vmentry_failure
         jmp  .Lvmx_process_softirqs
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 28ee6393f11e..d7feb5f9c455 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -592,6 +592,7 @@ void vmx_update_exception_bitmap(struct vcpu *v)
 static void vmx_cpuid_policy_changed(struct vcpu *v)
 {
     const struct cpuid_policy *cp = v->domain->arch.cpuid;
+    int rc = 0;
 
     if ( opt_hvm_fep ||
          (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
@@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
 
     vmx_vmcs_enter(v);
     vmx_update_exception_bitmap(v);
-    vmx_vmcs_exit(v);
 
     /*
      * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
      * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
      */
     if ( cp->feat.ibrsb )
+    {
         vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
+
+        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
+        if ( rc )
+            goto err;
+    }
     else
+    {
         vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
 
+        /* Can only fail with -ESRCH, and we don't care. */
+        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
+    }
+
     /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
     if ( cp->feat.ibrsb || cp->extd.ibpb )
         vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
@@ -623,6 +634,15 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
         vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
     else
         vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
+
+ err:
+    vmx_vmcs_exit(v);
+
+    if ( rc )
+    {
+        printk(XENLOG_G_ERR "MSR load/save list error: %d", rc);
+        domain_crash(v->domain);
+    }
 }
 
 int vmx_guest_x86_mode(struct vcpu *v)
@@ -3128,7 +3148,6 @@ static int is_last_branch_msr(u32 ecx)
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
     struct vcpu *curr = current;
-    const struct vcpu_msrs *msrs = curr->arch.msrs;
     uint64_t tmp;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
@@ -3136,7 +3155,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     switch ( msr )
     {
     case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
-        *msr_content = msrs->spec_ctrl.raw;
+        if ( vmx_read_guest_msr(curr, msr, msr_content) )
+        {
+            ASSERT_UNREACHABLE();
+            goto gp_fault;
+        }
         break;
 
     case MSR_IA32_SYSENTER_CS:
@@ -3336,7 +3359,6 @@ void vmx_vlapic_msr_changed(struct vcpu *v)
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 {
     struct vcpu *v = current;
-    struct vcpu_msrs *msrs = v->arch.msrs;
     const struct cpuid_policy *cp = v->domain->arch.cpuid;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x, msr_value=%#"PRIx64, msr, msr_content);
@@ -3346,7 +3368,11 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         uint64_t rsvd, tmp;
 
     case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
-        msrs->spec_ctrl.raw = msr_content;
+        if ( vmx_write_guest_msr(v, msr, msr_content) )
+        {
+            ASSERT_UNREACHABLE();
+            goto gp_fault;
+        }
         return X86EMUL_OKAY;
 
     case MSR_IA32_SYSENTER_CS:
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 0b2176a9bc53..5f851958992b 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -287,7 +287,15 @@ extern struct msr_policy     raw_msr_policy,
 /* Container object for per-vCPU MSRs */
 struct vcpu_msrs
 {
-    /* 0x00000048 - MSR_SPEC_CTRL */
+    /*
+     * 0x00000048 - MSR_SPEC_CTRL
+     *
+     * For PV guests, this holds the guest kernel value.  It is accessed on
+     * every entry/exit path.
+     *
+     * For VT-x guests, the guest value is held in the MSR guest load/save
+     * list.
+     */
     struct {
         uint32_t raw;
     } spec_ctrl;
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 18ecfcd70375..ddce8b33fd17 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -42,9 +42,10 @@
  *     path, or late in the exit path after restoring the guest value.  This
  *     will corrupt the guest value.
  *
- * Factor 1 is dealt with by relying on NMIs/MCEs being blocked immediately
- * after VMEXIT.  The VMEXIT-specific code reads MSR_SPEC_CTRL and updates
- * current before loading Xen's MSR_SPEC_CTRL setting.
+ * Factor 1 is dealt with:
+ *   - On VMX by using MSR load/save lists to have vmentry/exit atomically
+ *     load/save the guest value.  Xen's value is loaded in regular code, and
+ *     there is no need to use the shadow logic (below).
  *
  * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and a use_shadow
  * boolean in the per cpu spec_ctrl_flags.  The synchronous use is:
@@ -126,31 +127,6 @@
 #endif
 .endm
 
-.macro DO_SPEC_CTRL_ENTRY_FROM_HVM
-/*
- * Requires %rbx=current, %rsp=regs/cpuinfo
- * Clobbers %rax, %rcx, %rdx
- *
- * The common case is that a guest has direct access to MSR_SPEC_CTRL, at
- * which point we need to save the guest value before setting IBRS for Xen.
- * Unilaterally saving the guest value is shorter and faster than checking.
- */
-    mov $MSR_SPEC_CTRL, %ecx
-    rdmsr
-
-    /* Stash the value from hardware. */
-    mov VCPU_arch_msrs(%rbx), %rdx
-    mov %eax, VCPUMSR_spec_ctrl_raw(%rdx)
-    xor %edx, %edx
-
-    /* Clear SPEC_CTRL shadowing *before* loading Xen's value. */
-    andb $~SCF_use_shadow, CPUINFO_spec_ctrl_flags(%rsp)
-
-    /* Load Xen's intended value. */
-    movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
-    wrmsr
-.endm
-
 .macro DO_SPEC_CTRL_ENTRY maybexen:req
 /*
  * Requires %rsp=regs (also cpuinfo if !maybexen)
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling
  2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
@ 2022-01-14 11:03   ` Roger Pau Monné
  2022-01-17 11:07   ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2022-01-14 11:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu

On Thu, Jan 13, 2022 at 04:38:31PM +0000, Andrew Cooper wrote:
> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
> to be three different access methods, depending on where the guest's value
> lives.
> However, it would be better not to duplicate the #GP checking logic.
> 
> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
> can repurpose X86EMUL_UNHANDLEABLE slightly for this.  This is going to be a
> common pattern for other MSRs too in the future.
> 
> Duplicate the msrs->spec_ctrl.raw accesses in the PV and VT-x paths for now.
> The SVM path is currently unreachable because of the CPUID policy.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

It feels a bit weird to have the checks separated from the actual
access to the MSR data, but that's likely better than replicating the
checks in both the PV and HVM vendor paths.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-13 16:38 ` [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM Andrew Cooper
@ 2022-01-14 11:42   ` Roger Pau Monné
  2022-01-14 11:49     ` Andrew Cooper
  2022-01-17 11:22   ` Jan Beulich
  1 sibling, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-01-14 11:42 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On Thu, Jan 13, 2022 at 04:38:32PM +0000, Andrew Cooper wrote:
> These were written before Spectre/Meltdown went public, and there was large
> uncertainty in how the protections would evolve.  As it turns out, they're
> very specific to Intel hardware, and not very suitable for AMD.
> 
> Expand and drop the macros.  No change at all for VT-x.
> 
> For AMD, the only relevant piece of functionality is DO_OVERWRITE_RSB,
> although we will soon be adding (different) logic to handle MSR_SPEC_CTRL.
> 
> This has a marginal improvement of removing an unconditional pile of long-nops
> from the vmentry/exit path.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I wonder however if it would be clearer to define
SPEC_CTRL_ENTRY_FROM_{SVM,VMX} and EXIT macros in spec_ctrl_asm.h
(even if just used in a single place) so that all the related SPEC
macros are in a single file.
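
E.g. something along the lines of (sketch only, simply wrapping the
expansion used in this patch):

    /* Use after a VMEXIT from a VT-x guest. */
    #define SPEC_CTRL_ENTRY_FROM_VMX                                        \
        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM;           \
        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM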

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-14 11:42   ` Roger Pau Monné
@ 2022-01-14 11:49     ` Andrew Cooper
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-14 11:49 UTC (permalink / raw)
  To: Roger Pau Monné, Andrew Cooper
  Cc: Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On 14/01/2022 11:42, Roger Pau Monné wrote:
> On Thu, Jan 13, 2022 at 04:38:32PM +0000, Andrew Cooper wrote:
>> These were written before Spectre/Meltdown went public, and there was large
>> uncertainty in how the protections would evolve.  As it turns out, they're
>> very specific to Intel hardware, and not very suitable for AMD.
>>
>> Expand and drop the macros.  No change at all for VT-x.
>>
>> For AMD, the only relevant piece of functionality is DO_OVERWRITE_RSB,
>> although we will soon be adding (different) logic to handle MSR_SPEC_CTRL.
>>
>> This has a marginal improvement of removing an unconditional pile of long-nops
>> from the vmentry/exit path.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> I wonder however if it would be clearer to define
> SPEC_CTRL_ENTRY_FROM_{SVM,VMX} and EXIT macros in spec_ctrl_asm.h
> (even if just used in a single place) so that all the related SPEC
> macros are in a single file.

For AMD MSR_SPEC_CTRL support, I'm going to need to shift the STGI/CLGI,
then call up into C, and I do not think it is appropriate to have this
separated out into spec_ctrl_asm.h.

I left the comments intact deliberately so `grep SPEC_CTRL_ENTRY` still
lets you find everything.

The main difference between VT-x/SVM and PV is that for HVM, we have
this dance exactly once.  For PV, it is repeated multiple times in
subtly different ways, which is why having all the spec ctrl shadowing
logic together makes sense.

It's pretty ugly whichever way you look at it.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
@ 2022-01-14 12:50   ` Roger Pau Monné
  2022-01-14 13:08     ` Andrew Cooper
  2022-01-14 14:14   ` Andrew Cooper
  2022-01-17 11:51   ` Jan Beulich
  2 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-01-14 12:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
> The logic was based on a mistaken understanding of how NMI blocking on vmexit
> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
> and the guest's value will be clobbered before it is saved.
> 
> Switch to using MSR load/save lists.  This causes the guest value to be saved
> atomically with respect to NMIs/MCEs/etc.
> 
> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
> the same time as configuring the intercepts.  This function is always used in
> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
> function, rather than having multiple remote acquisitions of the same VMCS.
> 
> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
> -ESRCH but we don't care, and this path will be taken during domain create
> on a system lacking IBRSB.
> 
> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> rather than vcpu_msrs, and update the comment to describe the new state
> location.
> 
> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
> entirely, and use a shorter code sequence to simply reload Xen's setting from
> the top-of-stack block.
> 
> Because the guest values are loaded/saved atomically, we do not need to use
> the shadowing logic to cope with late NMIs/etc, so we can omit
> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
> no need to switch back to Xen's context on an early failure.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> 
> Needs backporting as far as people can tolerate.
> 
> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
> but this is awkward to arrange in asm.
> ---
>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
>  4 files changed, 56 insertions(+), 41 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> index 30139ae58e9d..297ed8685126 100644
> --- a/xen/arch/x86/hvm/vmx/entry.S
> +++ b/xen/arch/x86/hvm/vmx/entry.S
> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>  
>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
> +
> +        .macro restore_spec_ctrl
> +            mov    $MSR_SPEC_CTRL, %ecx
> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
> +            xor    %edx, %edx
> +            wrmsr
> +        .endm
> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM

I assume loading the host value into SPEC_CTRL strictly needs to
happen after the RSB overwrite, or else you could use a VM exit host
load MSR entry to handle MSR_SPEC_CTRL.

Also I don't understand why SPEC_CTRL shadowing is not cleared before
loading Xen's value now.

>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
> @@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>          /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
> -        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
>          ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
>  
>          mov  VCPU_hvm_guest_cr2(%rbx),%rax
> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>          SAVE_ALL
>  
>          /*
> -         * PV variant needed here as no guest code has executed (so
> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
> -         * to hit (in which case the HVM variant might corrupt things).
> +         * SPEC_CTRL_ENTRY notes
> +         *
> +         * If we end up here, no guest code has executed.  We still have Xen's
> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>           */
> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          call vmx_vmentry_failure
>          jmp  .Lvmx_process_softirqs
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 28ee6393f11e..d7feb5f9c455 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -592,6 +592,7 @@ void vmx_update_exception_bitmap(struct vcpu *v)
>  static void vmx_cpuid_policy_changed(struct vcpu *v)
>  {
>      const struct cpuid_policy *cp = v->domain->arch.cpuid;
> +    int rc = 0;
>  
>      if ( opt_hvm_fep ||
>           (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
> @@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>  
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> -    vmx_vmcs_exit(v);
>  
>      /*
>       * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
>       * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
>       */
>      if ( cp->feat.ibrsb )
> +    {
>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> +
> +        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> +        if ( rc )
> +            goto err;
> +    }
>      else
> +    {
>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>  
> +        /* Can only fail with -ESRCH, and we don't care. */
> +        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> +    }
> +
>      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
>      if ( cp->feat.ibrsb || cp->extd.ibpb )
>          vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
> @@ -623,6 +634,15 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>          vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
>      else
>          vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
> +
> + err:

Nit: I would name this out, since it's used by both the error and the
normal exit paths, but that's just my taste.

> +    vmx_vmcs_exit(v);
> +
> +    if ( rc )
> +    {
> +        printk(XENLOG_G_ERR "MSR load/save list error: %d", rc);
> +        domain_crash(v->domain);
> +    }
>  }
>  
>  int vmx_guest_x86_mode(struct vcpu *v)
> @@ -3128,7 +3148,6 @@ static int is_last_branch_msr(u32 ecx)
>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>  {
>      struct vcpu *curr = current;
> -    const struct vcpu_msrs *msrs = curr->arch.msrs;
>      uint64_t tmp;
>  
>      HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
> @@ -3136,7 +3155,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>      switch ( msr )
>      {
>      case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
> -        *msr_content = msrs->spec_ctrl.raw;
> +        if ( vmx_read_guest_msr(curr, msr, msr_content) )
> +        {
> +            ASSERT_UNREACHABLE();
> +            goto gp_fault;
> +        }

AFAICT this is required just for Xen internal callers, as a guest
attempt to access MSR_SPEC_CTRL will never be trapped, or if trapped it
would be because !cp->feat.ibrsb and thus won't get here anyway.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-14 12:50   ` Roger Pau Monné
@ 2022-01-14 13:08     ` Andrew Cooper
  2022-01-14 13:43       ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-14 13:08 UTC (permalink / raw)
  To: Roger Pau Monné, Andrew Cooper
  Cc: Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On 14/01/2022 12:50, Roger Pau Monné wrote:
> On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
>> The logic was based on a mistaken understanding of how NMI blocking on vmexit
>> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
>> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
>> and the guest's value will be clobbered before it is saved.
>>
>> Switch to using MSR load/save lists.  This causes the guest value to be saved
>> atomically with respect to NMIs/MCEs/etc.
>>
>> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
>> the same time as configuring the intercepts.  This function is always used in
>> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
>> function, rather than having multiple remote acquisitions of the same VMCS.
>>
>> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
>> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
>> -ESRCH but we don't care, and this path will be taken during domain create
>> on a system lacking IBRSB.
>>
>> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
>> rather than vcpu_msrs, and update the comment to describe the new state
>> location.
>>
>> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
>> entirely, and use a shorter code sequence to simply reload Xen's setting from
>> the top-of-stack block.
>>
>> Because the guest values are loaded/saved atomically, we do not need to use
>> the shadowing logic to cope with late NMIs/etc, so we can omit
>> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
>> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
>> no need to switch back to Xen's context on an early failure.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>>
>> Needs backporting as far as people can tolerate.
>>
>> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
>> but this is awkward to arrange in asm.
>> ---
>>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
>>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
>>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
>>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
>>  4 files changed, 56 insertions(+), 41 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
>> index 30139ae58e9d..297ed8685126 100644
>> --- a/xen/arch/x86/hvm/vmx/entry.S
>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>>  
>>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
>> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
>> +
>> +        .macro restore_spec_ctrl
>> +            mov    $MSR_SPEC_CTRL, %ecx
>> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
>> +            xor    %edx, %edx
>> +            wrmsr
>> +        .endm
>> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
> I assume loading the host value into SPEC_CTRL strictly needs to
> happen after the RSB overwrite, or else you could use a VM exit host
> load MSR entry to handle MSR_SPEC_CTRL.

We could use the host load list, but Intel warned that the lists aren't
as efficient as writing it like this, hence why I didn't use them
originally when we thought the NMI behaviour was different.  Obviously,
we're using them now for correctness reasons, which is more important
than avoiding them for (unquantified) perf reasons.

I've got some plans for changes to how Xen handles MSR_SPEC_CTRL for its
own protection, which I hope will bring a substantial efficiency
improvements, and those would require us not to use a host load list here.
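
That alternative would be roughly the following in
vmx_cpuid_policy_changed() (sketch only, assuming the
vmx_add_host_load_msr() wrapper and default_xen_spec_ctrl):

    rc = vmx_add_host_load_msr(v, MSR_SPEC_CTRL, default_xen_spec_ctrl);
    if ( rc )
        domain_crash(v->domain);

i.e. asking the VMEXIT microcode to restore Xen's value, rather than
the wrmsr in the asm path.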

> Also I don't understand why SPEC_CTRL shadowing is not cleared before
> loading Xen's value now.

Because we no longer set shadowing in the first place, because of ...

>
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>  
>>          /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
>> @@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
>>  
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>>          /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
>> -        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
>>          ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM

... this hunk.

Also, see the note about ASSERT() in the commit message.

I'm happy to try and make this clearer in the commit message if you have
any suggestions.

>>  
>>          mov  VCPU_hvm_guest_cr2(%rbx),%rax
>> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>>          SAVE_ALL
>>  
>>          /*
>> -         * PV variant needed here as no guest code has executed (so
>> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
>> -         * to hit (in which case the HVM variant might corrupt things).
>> +         * SPEC_CTRL_ENTRY notes
>> +         *
>> +         * If we end up here, no guest code has executed.  We still have Xen's
>> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>>           */
>> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>  
>>          call vmx_vmentry_failure
>>          jmp  .Lvmx_process_softirqs
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 28ee6393f11e..d7feb5f9c455 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -592,6 +592,7 @@ void vmx_update_exception_bitmap(struct vcpu *v)
>>  static void vmx_cpuid_policy_changed(struct vcpu *v)
>>  {
>>      const struct cpuid_policy *cp = v->domain->arch.cpuid;
>> +    int rc = 0;
>>  
>>      if ( opt_hvm_fep ||
>>           (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
>> @@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>>  
>>      vmx_vmcs_enter(v);
>>      vmx_update_exception_bitmap(v);
>> -    vmx_vmcs_exit(v);
>>  
>>      /*
>>       * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
>>       * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
>>       */
>>      if ( cp->feat.ibrsb )
>> +    {
>>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>> +
>> +        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
>> +        if ( rc )
>> +            goto err;
>> +    }
>>      else
>> +    {
>>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>>  
>> +        /* Can only fail with -ESRCH, and we don't care. */
>> +        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
>> +    }
>> +
>>      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
>>      if ( cp->feat.ibrsb || cp->extd.ibpb )
>>          vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
>> @@ -623,6 +634,15 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>>          vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
>>      else
>>          vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
>> +
>> + err:
> Nit: I would name this out, since it's used by both the error and the
> normal exit paths, but that's just my taste.

Ok.

>
>> +    vmx_vmcs_exit(v);
>> +
>> +    if ( rc )
>> +    {
>> +        printk(XENLOG_G_ERR "MSR load/save list error: %d", rc);
>> +        domain_crash(v->domain);
>> +    }
>>  }
>>  
>>  int vmx_guest_x86_mode(struct vcpu *v)
>> @@ -3128,7 +3148,6 @@ static int is_last_branch_msr(u32 ecx)
>>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>  {
>>      struct vcpu *curr = current;
>> -    const struct vcpu_msrs *msrs = curr->arch.msrs;
>>      uint64_t tmp;
>>  
>>      HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
>> @@ -3136,7 +3155,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>      switch ( msr )
>>      {
>>      case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
>> -        *msr_content = msrs->spec_ctrl.raw;
>> +        if ( vmx_read_guest_msr(curr, msr, msr_content) )
>> +        {
>> +            ASSERT_UNREACHABLE();
>> +            goto gp_fault;
>> +        }
> AFAICT this is required just for Xen internal callers, as a guest
> attempt to access MSR_SPEC_CTRL will never be trapped, or if trapped it
> would be because !cp->feat.ibrsb and thus won't get here anyway.

We can end up here through FEP, or introspection caring about the MSR,
but most importantly, for moving MSR_SPEC_CTRL on migrate.

What the ASSERT_UNREACHABLE() is saying is that "if the common code
didn't declare #GP, then MSR_SPEC_CTRL *definitely* exists in the MSR
list".

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-14 13:08     ` Andrew Cooper
@ 2022-01-14 13:43       ` Roger Pau Monné
  2022-01-14 13:49         ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-01-14 13:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On Fri, Jan 14, 2022 at 01:08:43PM +0000, Andrew Cooper wrote:
> On 14/01/2022 12:50, Roger Pau Monné wrote:
> > On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
> >> The logic was based on a mistaken understanding of how NMI blocking on vmexit
> >> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
> >> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
> >> and the guest's value will be clobbered before it is saved.
> >>
> >> Switch to using MSR load/save lists.  This causes the guest value to be saved
> >> atomically with respect to NMIs/MCEs/etc.
> >>
> >> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
> >> the same time as configuring the intercepts.  This function is always used in
> >> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
> >> function, rather than having multiple remote acquisitions of the same VMCS.
> >>
> >> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
> >> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
> >> -ESRCH but we don't care, and this path will be taken during domain create
> >> on a system lacking IBRSB.
> >>
> >> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> >> rather than vcpu_msrs, and update the comment to describe the new state
> >> location.
> >>
> >> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
> >> entirely, and use a shorter code sequence to simply reload Xen's setting from
> >> the top-of-stack block.
> >>
> >> Because the guest values are loaded/saved atomically, we do not need to use
> >> the shadowing logic to cope with late NMIs/etc, so we can omit
> >> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
> >> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
> >> no need to switch back to Xen's context on an early failure.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> CC: Wei Liu <wl@xen.org>
> >> CC: Jun Nakajima <jun.nakajima@intel.com>
> >> CC: Kevin Tian <kevin.tian@intel.com>
> >>
> >> Needs backporting as far as people can tolerate.
> >>
> >> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
> >> but this is awkward to arrange in asm.
> >> ---
> >>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
> >>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
> >>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
> >>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
> >>  4 files changed, 56 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> >> index 30139ae58e9d..297ed8685126 100644
> >> --- a/xen/arch/x86/hvm/vmx/entry.S
> >> +++ b/xen/arch/x86/hvm/vmx/entry.S
> >> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
> >>  
> >>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
> >>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> >> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
> >> +
> >> +        .macro restore_spec_ctrl
> >> +            mov    $MSR_SPEC_CTRL, %ecx
> >> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
> >> +            xor    %edx, %edx
> >> +            wrmsr
> >> +        .endm
> >> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
> > I assume loading the host value into SPEC_CTRL strictly needs to
> > happen after the RSB overwrite, or else you could use a VM exit host
> > load MSR entry to handle MSR_SPEC_CTRL.
> 
> We could use the host load list, but Intel warned that the lists aren't
> as efficient as writing it like this, hence why I didn't use them
> originally when we thought the NMI behaviour was different.  Obviously,
> we're using them now for correctness reasons, which is more important
> than avoiding them for (unquantified) perf reasons.
> 
> I've got some plans for changes to how Xen handles MSR_SPEC_CTRL for its
> own protection, which I hope will bring a substantial efficiency
> improvements, and those would require us not to use a host load list here.

Might be worth mentioning, either as a comment here or in the commit
message, that further work will prevent us from using a host load list
for SPEC_CTRL.

Using host load lists would also allow us to get rid of
X86_FEATURE_SC_MSR_HVM.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-14 13:43       ` Roger Pau Monné
@ 2022-01-14 13:49         ` Andrew Cooper
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-14 13:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu, Jun Nakajima, Kevin Tian

On 14/01/2022 13:43, Roger Pau Monné wrote:
> On Fri, Jan 14, 2022 at 01:08:43PM +0000, Andrew Cooper wrote:
>> On 14/01/2022 12:50, Roger Pau Monné wrote:
>>> On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
>>>> The logic was based on a mistaken understanding of how NMI blocking on vmexit
>>>> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
>>>> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
>>>> and the guest's value will be clobbered before it is saved.
>>>>
>>>> Switch to using MSR load/save lists.  This causes the guest value to be saved
>>>> atomically with respect to NMIs/MCEs/etc.
>>>>
>>>> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
>>>> the same time as configuring the intercepts.  This function is always used in
>>>> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
>>>> function, rather than having multiple remote acquisitions of the same VMCS.
>>>>
>>>> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
>>>> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
>>>> -ESRCH but we don't care, and this path will be taken during domain create
>>>> on a system lacking IBRSB.
>>>>
>>>> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
>>>> rather than vcpu_msrs, and update the comment to describe the new state
>>>> location.
>>>>
>>>> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
>>>> entirely, and use a shorter code sequence to simply reload Xen's setting from
>>>> the top-of-stack block.
>>>>
>>>> Because the guest values are loaded/saved atomically, we do not need to use
>>>> the shadowing logic to cope with late NMIs/etc, so we can omit
>>>> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
>>>> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
>>>> no need to switch back to Xen's context on an early failure.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>> CC: Wei Liu <wl@xen.org>
>>>> CC: Jun Nakajima <jun.nakajima@intel.com>
>>>> CC: Kevin Tian <kevin.tian@intel.com>
>>>>
>>>> Needs backporting as far as people can tolerate.
>>>>
>>>> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
>>>> but this is awkward to arrange in asm.
>>>> ---
>>>>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
>>>>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
>>>>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
>>>>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
>>>>  4 files changed, 56 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
>>>> index 30139ae58e9d..297ed8685126 100644
>>>> --- a/xen/arch/x86/hvm/vmx/entry.S
>>>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>>>> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>>>>  
>>>>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>>>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
>>>> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
>>>> +
>>>> +        .macro restore_spec_ctrl
>>>> +            mov    $MSR_SPEC_CTRL, %ecx
>>>> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
>>>> +            xor    %edx, %edx
>>>> +            wrmsr
>>>> +        .endm
>>>> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>>> I assume loading the host value into SPEC_CTRL strictly needs to
>>> happen after the RSB overwrite, or else you could use a VM exit host
>>> load MSR entry to handle MSR_SPEC_CTRL.
>> We could use the host load list, but Intel warned that the lists aren't
>> as efficient as writing it like this, hence why I didn't use them
>> originally when we thought the NMI behaviour was different.  Obviously,
>> we're using them now for correctness reasons, which is more important
>> than avoiding them for (unquantified) perf reasons.
>>
>> I've got some plans for changes to how Xen handles MSR_SPEC_CTRL for its
>> own protection, which I hope will bring a substantial efficiency
>> improvements, and those would require us not to use a host load list here.
> Might be worth mentioning that further work will prevent us from using
> the host load list for SPEC_CTRL, as a comment here or in the commit
> message.
>
> Using host load lists would also allow us to get rid of
> X86_FEATURE_SC_MSR_HVM.

No it wouldn't.  I need to reuse this for the AMD side, and it also
encodes "admin turned it off via spec-ctrl=".

> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
  2022-01-14 12:50   ` Roger Pau Monné
@ 2022-01-14 14:14   ` Andrew Cooper
  2022-01-14 14:41     ` Andrew Cooper
  2022-01-17 11:51   ` Jan Beulich
  2 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-14 14:14 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian

On 13/01/2022 16:38, Andrew Cooper wrote:
> The logic was based on a mistaken understanding of how NMI blocking on vmexit
> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
> and the guest's value will be clobbered before it is saved.
>
> Switch to using MSR load/save lists.  This causes the guest value to be saved
> atomically with respect to NMIs/MCEs/etc.
>
> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
> the same time as configuring the intercepts.  This function is always used in
> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
> function, rather than having multiple remote acquisitions of the same VMCS.
>
> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
> -ESRCH but we don't care, and this path will be taken during domain create
> on a system lacking IBRSB.
>
> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> rather than vcpu_msrs, and update the comment to describe the new state
> location.
>
> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
> entirely, and use a shorter code sequence to simply reload Xen's setting from
> the top-of-stack block.
>
> Because the guest values are loaded/saved atomically, we do not need to use
> the shadowing logic to cope with late NMIs/etc, so we can omit
> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
> no need to switch back to Xen's context on an early failure.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
>
> Needs backporting as far as people can tolerate.
>
> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
> but this is awkward to arrange in asm.

Actually, it's just occurred to me that an ASSERT is actually quite easy
here.  I'm proposing this additional delta (totally untested).

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 297ed8685126..f569c3259b32 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -41,6 +41,13 @@ ENTRY(vmx_asm_vmexit_handler)
             movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
             xor    %edx, %edx
             wrmsr
+
+#ifdef CONFIG_DEBUG
+            testb $SCF_use_shadow, CPUINFO_spec_ctrl_flags(%rsp)
+            jz 1f
+            ASSERT_FAILED("MSR_SPEC_CTRL shadowing active")
+1:
+#endif
         .endm
         ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */

~Andrew


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-14 14:14   ` Andrew Cooper
@ 2022-01-14 14:41     ` Andrew Cooper
  2022-01-17  9:21       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-14 14:41 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian

On 14/01/2022 14:14, Andrew Cooper wrote:
> On 13/01/2022 16:38, Andrew Cooper wrote:
>> The logic was based on a mistaken understanding of how NMI blocking on vmexit
>> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
>> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
>> and the guest's value will be clobbered before it is saved.
>>
>> Switch to using MSR load/save lists.  This causes the guest value to be saved
>> atomically with respect to NMIs/MCEs/etc.
>>
>> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
>> the same time as configuring the intercepts.  This function is always used in
>> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
>> function, rather than having multiple remote acquisitions of the same VMCS.
>>
>> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
>> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
>> -ESRCH but we don't care, and this path will be taken during domain create
>> on a system lacking IBRSB.
>>
>> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
>> rather than vcpu_msrs, and update the comment to describe the new state
>> location.
>>
>> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
>> entirely, and use a shorter code sequence to simply reload Xen's setting from
>> the top-of-stack block.
>>
>> Because the guest values are loaded/saved atomically, we do not need to use
>> the shadowing logic to cope with late NMIs/etc, so we can omit
>> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
>> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
>> no need to switch back to Xen's context on an early failure.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>>
>> Needs backporting as far as people can tolerate.
>>
>> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
>> but this is awkward to arrange in asm.
> Actually, it's just occurred to me that an ASSERT is actually quite easy
> here.  I'm proposing this additional delta (totally untested).
>
> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> index 297ed8685126..f569c3259b32 100644
> --- a/xen/arch/x86/hvm/vmx/entry.S
> +++ b/xen/arch/x86/hvm/vmx/entry.S
> @@ -41,6 +41,13 @@ ENTRY(vmx_asm_vmexit_handler)
>              movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
>              xor    %edx, %edx
>              wrmsr
> +
> +#ifdef CONFIG_DEBUG
> +            testb $SCF_use_shadow, CPUINFO_spec_ctrl_flags(%rsp)
> +            jz 1f
> +            ASSERT_FAILED("MSR_SPEC_CTRL shadowing active")
> +1:
> +#endif
>          .endm
>          ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */

Irritatingly this doesn't work, because the metadata associated with the
ud2 instruction is tied to the compiled position in
.altinstr_replacement, not the runtime position after alternatives have run.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-14 14:41     ` Andrew Cooper
@ 2022-01-17  9:21       ` Jan Beulich
  2022-01-17 11:43         ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-01-17  9:21 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 14.01.2022 15:41, Andrew Cooper wrote:
> On 14/01/2022 14:14, Andrew Cooper wrote:
>> On 13/01/2022 16:38, Andrew Cooper wrote:
>>> The logic was based on a mistaken understanding of how NMI blocking on vmexit
>>> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
>>> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
>>> and the guest's value will be clobbered before it is saved.
>>>
>>> Switch to using MSR load/save lists.  This causes the guest value to be saved
>>> atomically with respect to NMIs/MCEs/etc.
>>>
>>> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
>>> the same time as configuring the intercepts.  This function is always used in
>>> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
>>> function, rather than having multiple remote acquisitions of the same VMCS.
>>>
>>> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
>>> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
>>> -ESRCH but we don't care, and this path will be taken during domain create
>>> on a system lacking IBRSB.
>>>
>>> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
>>> rather than vcpu_msrs, and update the comment to describe the new state
>>> location.
>>>
>>> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
>>> entirely, and use a shorter code sequence to simply reload Xen's setting from
>>> the top-of-stack block.
>>>
>>> Because the guest values are loaded/saved atomically, we do not need to use
>>> the shadowing logic to cope with late NMIs/etc, so we can omit
>>> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
>>> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
>>> no need to switch back to Xen's context on an early failure.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>> CC: Wei Liu <wl@xen.org>
>>> CC: Jun Nakajima <jun.nakajima@intel.com>
>>> CC: Kevin Tian <kevin.tian@intel.com>
>>>
>>> Needs backporting as far as people can tolerate.
>>>
>>> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
>>> but this is awkward to arrange in asm.
>> Actually, it's just occurred to me that an ASSERT is actually quite easy
>> here.  I'm proposing this additional delta (totally untested).
>>
>> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
>> index 297ed8685126..f569c3259b32 100644
>> --- a/xen/arch/x86/hvm/vmx/entry.S
>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>> @@ -41,6 +41,13 @@ ENTRY(vmx_asm_vmexit_handler)
>>              movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
>>              xor    %edx, %edx
>>              wrmsr
>> +
>> +#ifdef CONFIG_DEBUG
>> +            testb $SCF_use_shadow, CPUINFO_spec_ctrl_flags(%rsp)
>> +            jz 1f
>> +            ASSERT_FAILED("MSR_SPEC_CTRL shadowing active")
>> +1:
>> +#endif
>>          .endm
>>          ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
> 
> Irritatingly this doesn't work, because the metadata associated with the
> ud2 instruction is tied to the compiled position in
> .altinstr_replacement, not the runtime position after alternatives have run.

Could we have the macro "return" ZF, leaving it to the invocation
site of the macro to act on it? ALTERNATIVE's first argument could
easily be "xor %reg, %reg" to set ZF without much other overhead.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling
  2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
  2022-01-14 11:03   ` Roger Pau Monné
@ 2022-01-17 11:07   ` Jan Beulich
  2022-01-17 11:24     ` Andrew Cooper
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-01-17 11:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 13.01.2022 17:38, Andrew Cooper wrote:
> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
> to be three different access methods for where the guest's value lives.
> However, it would be better not to duplicate the #GP checking logic.
> 
> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
> can repurpose X86EMUL_UNHANDLEABLE slightly for this.  This is going to be a
> common pattern for other MSRs too in the future.

I consider this repurposing risky. Did you consider using e.g.
X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
presently used by the MSR machinery afaics.

What's worse, aren't you making it possible to hit the
ASSERT_UNREACHABLE() in hvm_save_cpu_msrs() as well as in
XEN_DOMCTL_get_vcpu_msrs handling? And have hvm_load_cpu_msrs()
produce -ENXIO and XEN_DOMCTL_set_vcpu_msrs return -EINVAL?

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-13 16:38 ` [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM Andrew Cooper
  2022-01-14 11:42   ` Roger Pau Monné
@ 2022-01-17 11:22   ` Jan Beulich
  2022-01-17 11:41     ` Andrew Cooper
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-01-17 11:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 13.01.2022 17:38, Andrew Cooper wrote:
> --- a/xen/arch/x86/hvm/svm/entry.S
> +++ b/xen/arch/x86/hvm/svm/entry.S
> @@ -59,7 +59,7 @@ __UNLIKELY_END(nsvm_hap)
>          mov VCPUMSR_spec_ctrl_raw(%rax), %eax
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
> -        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
> +        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
>  
>          pop  %r15
>          pop  %r14
> @@ -86,7 +86,8 @@ __UNLIKELY_END(nsvm_hap)
>  
>          GET_CURRENT(bx)
>  
> -        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
> +        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
> +        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM

Just for my own understanding: The comments you add aren't commented
out macro invocations (as I did read it first), but comments naming
would-be-macros which are then expanded "manually"? That is partly
because initially I read the description saying "Expand and drop the
macros" as meaning that the macros grow in what they contain, which
looked contradictory to them getting dropped at the same time.
Perhaps me not sufficiently understanding the difference between all
possible meanings of "expand" vs "extend" ...

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling
  2022-01-17 11:07   ` Jan Beulich
@ 2022-01-17 11:24     ` Andrew Cooper
  2022-01-17 11:54       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-17 11:24 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 17/01/2022 11:07, Jan Beulich wrote:
> On 13.01.2022 17:38, Andrew Cooper wrote:
>> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
>> to be three different access methods for where the guest's value lives.
>> However, it would be better not to duplicate the #GP checking logic.
>>
>> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
>> can repurpose X86EMUL_UNHANDLEABLE slightly for this.  This is going to be a
>> common pattern for other MSRs too in the future.
> I consider this repurposing risky. Did you consider using e.g.
> X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
> presently used by the MSR machinery afaics.

RETRY is used for the MSRs which can cause a p2m allocation and hit the
paging path.  DONE is not remotely appropriate for this purpose.

I also don't want to introduce a new PARTIAL because that is a recipe
for backport bugs.

> What's worse, aren't you making it possible to hit the
> ASSERT_UNREACHABLE() in hvm_save_cpu_msrs() as well as in
> XEN_DOMCTL_get_vcpu_msrs handling? And have hvm_load_cpu_msrs()
> produce -ENXIO and XEN_DOMCTL_set_vcpu_msrs return -EINVAL?

I found this bug after backporting the patches and getting them some
wider testing.  I'm doing a pre-req patch to rework the migration
machinery to call into the pv/hvm paths rather than into
guest_{rd,wr}msr() directly.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-17 11:22   ` Jan Beulich
@ 2022-01-17 11:41     ` Andrew Cooper
  2022-01-17 11:57       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-17 11:41 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 17/01/2022 11:22, Jan Beulich wrote:
> On 13.01.2022 17:38, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/svm/entry.S
>> +++ b/xen/arch/x86/hvm/svm/entry.S
>> @@ -59,7 +59,7 @@ __UNLIKELY_END(nsvm_hap)
>>          mov VCPUMSR_spec_ctrl_raw(%rax), %eax
>>  
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>> -        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
>> +        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
>>  
>>          pop  %r15
>>          pop  %r14
>> @@ -86,7 +86,8 @@ __UNLIKELY_END(nsvm_hap)
>>  
>>          GET_CURRENT(bx)
>>  
>> -        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>> +        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
>> +        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> Just for my own understanding: The comments you add aren't commented
> out macro invocations (as I did read it first), but comments naming
> would-be-macros which are then expanded "manually"? That is partly
> because initially I read the description saying "Expand and drop the
> macros" as meaning that the macros grow in what they contain, which
> looked contradictory to them getting dropped at the same time.
> Perhaps me not sufficiently understanding the difference between all
> possible meanings of "expand" vs "extend" ...

They're grep fodder to be able to easily locate where we're doing
entry/exit speculation logic.  These paths will continue to diverge over
time, and cannot have a common implementation.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-17  9:21       ` Jan Beulich
@ 2022-01-17 11:43         ` Andrew Cooper
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-01-17 11:43 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 17/01/2022 09:21, Jan Beulich wrote:
> On 14.01.2022 15:41, Andrew Cooper wrote:
>> On 14/01/2022 14:14, Andrew Cooper wrote:
>>> On 13/01/2022 16:38, Andrew Cooper wrote:
>>>> The logic was based on a mistaken understanding of how NMI blocking on vmexit
>>>> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
>>>> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
>>>> and the guest's value will be clobbered before it is saved.
>>>>
>>>> Switch to using MSR load/save lists.  This causes the guest value to be saved
>>>> atomically with respect to NMIs/MCEs/etc.
>>>>
>>>> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
>>>> the same time as configuring the intercepts.  This function is always used in
>>>> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
>>>> function, rather than having multiple remote acquisitions of the same VMCS.
>>>>
>>>> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
>>>> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
>>>> -ESRCH but we don't care, and this path will be taken during domain create
>>>> on a system lacking IBRSB.
>>>>
>>>> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
>>>> rather than vcpu_msrs, and update the comment to describe the new state
>>>> location.
>>>>
>>>> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
>>>> entirely, and use a shorter code sequence to simply reload Xen's setting from
>>>> the top-of-stack block.
>>>>
>>>> Because the guest values are loaded/saved atomically, we do not need to use
>>>> the shadowing logic to cope with late NMIs/etc, so we can omit
>>>> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
>>>> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
>>>> no need to switch back to Xen's context on an early failure.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>> CC: Wei Liu <wl@xen.org>
>>>> CC: Jun Nakajima <jun.nakajima@intel.com>
>>>> CC: Kevin Tian <kevin.tian@intel.com>
>>>>
>>>> Needs backporting as far as people can tolerate.
>>>>
>>>> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
>>>> but this is awkward to arrange in asm.
>>> Actually, it's just occurred to me that an ASSERT is actually quite easy
>>> here.  I'm proposing this additional delta (totally untested).
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
>>> index 297ed8685126..f569c3259b32 100644
>>> --- a/xen/arch/x86/hvm/vmx/entry.S
>>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>>> @@ -41,6 +41,13 @@ ENTRY(vmx_asm_vmexit_handler)
>>>              movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
>>>              xor    %edx, %edx
>>>              wrmsr
>>> +
>>> +#ifdef CONFIG_DEBUG
>>> +            testb $SCF_use_shadow, CPUINFO_spec_ctrl_flags(%rsp)
>>> +            jz 1f
>>> +            ASSERT_FAILED("MSR_SPEC_CTRL shadowing active")
>>> +1:
>>> +#endif
>>>          .endm
>>>          ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>> Irritatingly this doesn't work, because the metadata associated with the
>> ud2 instruction is tied to the compiled position in
>> .altinstr_replacement, not the runtime position after alternatives have run.
> Could we have the macro "return" ZF, leaving it to the invocation
> site of the macro to act on it? ALTERNATIVE's first argument could
> easily be "xor %reg, %reg" to set ZF without much other overhead.

That's very subtle, and a substantial layering violation.  I really
don't think the complexity is worth it.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
  2022-01-14 12:50   ` Roger Pau Monné
  2022-01-14 14:14   ` Andrew Cooper
@ 2022-01-17 11:51   ` Jan Beulich
  2022-01-17 17:23     ` Andrew Cooper
  2 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-01-17 11:51 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 13.01.2022 17:38, Andrew Cooper wrote:
> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> rather than vcpu_msrs, and update the comment to describe the new state
> location.

Nit: Assuming "the comment" refers to something in the named function,
I'm afraid I can't spot any comment being updated there. Perhaps you
mean the comment in the declaration of struct vcpu_msrs?

> --- a/xen/arch/x86/hvm/vmx/entry.S
> +++ b/xen/arch/x86/hvm/vmx/entry.S
> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>  
>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
> +
> +        .macro restore_spec_ctrl
> +            mov    $MSR_SPEC_CTRL, %ecx
> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
> +            xor    %edx, %edx
> +            wrmsr
> +        .endm
> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
> @@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>          /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
> -        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
>          ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM

Shouldn't you update the "Clob:" remarks here as well?

> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>          SAVE_ALL
>  
>          /*
> -         * PV variant needed here as no guest code has executed (so
> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
> -         * to hit (in which case the HVM variant might corrupt things).
> +         * SPEC_CTRL_ENTRY notes
> +         *
> +         * If we end up here, no guest code has executed.  We still have Xen's
> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>           */
> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */

Is "no guest code has executed" actually enough here? If VM entry failed
due to a later MSR-load-list entry, SPEC_CTRL could have changed value
already, couldn't it?

> @@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>  
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> -    vmx_vmcs_exit(v);
>  
>      /*
>       * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
>       * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
>       */
>      if ( cp->feat.ibrsb )
> +    {
>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> +
> +        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> +        if ( rc )
> +            goto err;
> +    }
>      else
> +    {
>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>  
> +        /* Can only fail with -ESRCH, and we don't care. */
> +        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);

To be forward-compatible with possible future changes to the
function (or load list handling as a whole), how about having an
assertion proving what the comment says:

        rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
        if ( rc )
        {
            ASSERT(rc == -ESRCH);
            rc = 0;
        }

?

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling
  2022-01-17 11:24     ` Andrew Cooper
@ 2022-01-17 11:54       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-01-17 11:54 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 17.01.2022 12:24, Andrew Cooper wrote:
> On 17/01/2022 11:07, Jan Beulich wrote:
>> On 13.01.2022 17:38, Andrew Cooper wrote:
>>> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
>>> to be three different access methods for where the guest's value lives.
>>> However, it would be better not to duplicate the #GP checking logic.
>>>
>>> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
>>> can repurpose X86EMUL_UNHANDLEABLE slightly for this.  This is going to be a
>>> common pattern for other MSRs too in the future.
>> I consider this repurposing risky. Did you consider using e.g.
>> X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
>> presently used by the MSR machinery afaics.
> 
> RETRY is used for the MSRs which can cause a p2m allocation and hit the
> paging path.  DONE is not remotely appropriate for this purpose.

Well, okay then. I would have said DONE is as (in)appropriate as
UNHANDLEABLE here.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
  2022-01-17 11:41     ` Andrew Cooper
@ 2022-01-17 11:57       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-01-17 11:57 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 17.01.2022 12:41, Andrew Cooper wrote:
> On 17/01/2022 11:22, Jan Beulich wrote:
>> On 13.01.2022 17:38, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/hvm/svm/entry.S
>>> +++ b/xen/arch/x86/hvm/svm/entry.S
>>> @@ -59,7 +59,7 @@ __UNLIKELY_END(nsvm_hap)
>>>          mov VCPUMSR_spec_ctrl_raw(%rax), %eax
>>>  
>>>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>>> -        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
>>> +        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
>>>  
>>>          pop  %r15
>>>          pop  %r14
>>> @@ -86,7 +86,8 @@ __UNLIKELY_END(nsvm_hap)
>>>  
>>>          GET_CURRENT(bx)
>>>  
>>> -        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>> +        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
>>> +        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
>> Just for my own understanding: The comments you add aren't commented
>> out macro invocations (as I did read it first), but comments naming
>> would-be-macros which are then expanded "manually"? That is partly
>> because initially I read the description saying "Expand and drop the
>> macros" as meaning that the macros grow in what they contain, which
>> looked contradictory to them getting dropped at the same time.
>> Perhaps me not sufficiently understanding the difference between all
>> possible meanings of "expand" vs "extend" ...
> 
> They're grep fodder to be able to easily locate where we're doing
> entry/exit speculation logic.

And I don't suppose I could talk you into replacing the use of "expand"
in the description, even if it was just me who was misled? Perhaps by
"Open-code and drop the macros"?

>  These paths will continue to diverge over
> time, and cannot have a common implementation.

That's understood.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-17 11:51   ` Jan Beulich
@ 2022-01-17 17:23     ` Andrew Cooper
  2022-01-18  7:57       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2022-01-17 17:23 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 17/01/2022 11:51, Jan Beulich wrote:
> On 13.01.2022 17:38, Andrew Cooper wrote:
>> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>>          SAVE_ALL
>>  
>>          /*
>> -         * PV variant needed here as no guest code has executed (so
>> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
>> -         * to hit (in which case the HVM variant might corrupt things).
>> +         * SPEC_CTRL_ENTRY notes
>> +         *
>> +         * If we end up here, no guest code has executed.  We still have Xen's
>> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>>           */
>> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
> Is "no guest code has executed" actually enough here? If VM entry failed
> due to a later MSR-load-list entry, SPEC_CTRL could have changed value
> already, couldn't it?

No, it can't.

See the very start of SDM Chapter 25 "VMEntries" for the distinction
between early and late entry failures.  (i.e. those which cause
execution to continue beyond VMLAUNCH/VMRESUME, and those which cause
execution to continue from the vmexit handler.)

Early failures are conditions such as `pop %ss; vmlaunch`, and bad host
state in the VMCS.

Everything pertaining to guest state is late failure, so by the time we
get to processing the MSR load/save list, we're definitely not taking
this path.

~Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
  2022-01-17 17:23     ` Andrew Cooper
@ 2022-01-18  7:57       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-01-18  7:57 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 17.01.2022 18:23, Andrew Cooper wrote:
> On 17/01/2022 11:51, Jan Beulich wrote:
>> On 13.01.2022 17:38, Andrew Cooper wrote:
>>> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>>>          SAVE_ALL
>>>  
>>>          /*
>>> -         * PV variant needed here as no guest code has executed (so
>>> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
>>> -         * to hit (in which case the HVM variant might corrupt things).
>>> +         * SPEC_CTRL_ENTRY notes
>>> +         *
>>> +         * If we end up here, no guest code has executed.  We still have Xen's
>>> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>>>           */
>>> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>>> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>> Is "no guest code has executed" actually enough here? If VM entry failed
>> due to a later MSR-load-list entry, SPEC_CTRL could have changed value
>> already, couldn't it?
> 
> No, it can't.
> 
> See the very start of SDM Chapter 25 "VMEntries" for the distinction
> between early and late entry failures.  (i.e. those which cause
> execution to continue beyond VMLAUNCH/VMRESUME, and those which cause
> execution to continue from the vmexit handler.)
> 
> Early failures are conditions such as `pop %ss; vmlaunch`, and bad host
> state in the VMCS.
> 
> Everything pertaining to guest state is late failure, so by the time we
> get to processing the MSR load/save list, we're definitely not taking
> this path.

I see. This still means that the answer to my 1st question is "yes". In
which case I'd like to ask to extend the comment to include "no MSRs
have been loaded from the load list" or something equivalent, despite
realizing that such an amendment would have helped already before your
change.

Jan



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread

Thread overview: 24+ messages
2022-01-13 16:38 [PATCH 0/3] x86/spec-ctrl: Fix NMI race condition Andrew Cooper
2022-01-13 16:38 ` [PATCH 1/3] x86/msr: Split MSR_SPEC_CTRL handling Andrew Cooper
2022-01-14 11:03   ` Roger Pau Monné
2022-01-17 11:07   ` Jan Beulich
2022-01-17 11:24     ` Andrew Cooper
2022-01-17 11:54       ` Jan Beulich
2022-01-13 16:38 ` [PATCH 2/3] x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM Andrew Cooper
2022-01-14 11:42   ` Roger Pau Monné
2022-01-14 11:49     ` Andrew Cooper
2022-01-17 11:22   ` Jan Beulich
2022-01-17 11:41     ` Andrew Cooper
2022-01-17 11:57       ` Jan Beulich
2022-01-13 16:38 ` [PATCH 3/3] x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling Andrew Cooper
2022-01-14 12:50   ` Roger Pau Monné
2022-01-14 13:08     ` Andrew Cooper
2022-01-14 13:43       ` Roger Pau Monné
2022-01-14 13:49         ` Andrew Cooper
2022-01-14 14:14   ` Andrew Cooper
2022-01-14 14:41     ` Andrew Cooper
2022-01-17  9:21       ` Jan Beulich
2022-01-17 11:43         ` Andrew Cooper
2022-01-17 11:51   ` Jan Beulich
2022-01-17 17:23     ` Andrew Cooper
2022-01-18  7:57       ` Jan Beulich
