xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] x86: Initial pieces for guest CET support
@ 2021-04-26 17:54 Andrew Cooper
  2021-04-26 17:54 ` [PATCH 1/3] x86/hvm: Introduce experimental " Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Andrew Cooper @ 2021-04-26 17:54 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Jun Nakajima, Kevin Tian

Some initial pieces for guest support.  Everything will currently malfunction
for VMs which explicitly opt in to CET_SS/IBT.

Still TODO as a minimum:
 * Teach the pagewalk logic about shadow stack accesses and errors.
 * Emulator support for the new instructions.  WRUSS is an irritating corner
   case, requiring a change to how we express pagewalk inputs, as
   user/supervisor is no longer dependent on CPL.
 * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
   the S_CET has broken sematics - it ends up as a mix of host and guest
   state, and isn't safe to XRSTOR without editing what the CPU wrote out.

The above ought to suffice for getting some XTF testing in place.  For general
guest support:
 * In-guest XSAVES support.  Windows is the only OS to support CET at the time
   of writing, and it cross-checks for XSAVES.  Linux expected to perform the
   same cross-check in due course.

Stretch features (not for initial support):
  * Adding EPT/NPT Supervisor Shadow Stack protections into mem_access, so
    introspection can block aliasing attacks.

Andrew Cooper (3):
  x86/hvm: Introduce experimental guest CET support
  x86/svm: Enumeration for CET
  x86/VT-x: Enumeration for CET

 xen/arch/x86/hvm/hvm.c                      | 18 ++++++++++++++++--
 xen/arch/x86/hvm/svm/svm.c                  |  1 +
 xen/arch/x86/hvm/svm/svmdebug.c             |  2 ++
 xen/arch/x86/hvm/vmx/vmcs.c                 |  6 ++++++
 xen/include/asm-x86/hvm/svm/svm.h           |  2 ++
 xen/include/asm-x86/hvm/svm/vmcb.h          | 10 ++++++++--
 xen/include/asm-x86/hvm/vmx/vmcs.h          | 11 ++++++++++-
 xen/include/public/arch-x86/cpufeatureset.h |  4 ++--
 8 files changed, 47 insertions(+), 7 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-26 17:54 [PATCH 0/3] x86: Initial pieces for guest CET support Andrew Cooper
@ 2021-04-26 17:54 ` Andrew Cooper
  2021-04-27 15:47   ` Jan Beulich
  2021-04-26 17:54 ` [PATCH 2/3] x86/svm: Enumeration for CET Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-26 17:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

For now, let VMs opt into using CET by setting cet_ss/ibt in the CPUID
policy.  Also extend cr4 handling to permit CR4.CET being set, along with
logic to interlock CR4.CET and CR0.WP.

Everything else will malfunction for now, but this will help adding support
incrementally - there is a lot to do before CET will work properly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/hvm.c                      | 18 ++++++++++++++++--
 xen/include/public/arch-x86/cpufeatureset.h |  4 ++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ae37bc434a..28beacc45b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -976,11 +976,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
 unsigned long hvm_cr4_guest_valid_bits(const struct domain *d)
 {
     const struct cpuid_policy *p = d->arch.cpuid;
-    bool mce, vmxe;
+    bool mce, vmxe, cet;
 
     /* Logic broken out simply to aid readability below. */
     mce  = p->basic.mce || p->basic.mca;
     vmxe = p->basic.vmx && nestedhvm_enabled(d);
+    cet  = p->feat.cet_ss || p->feat.cet_ibt;
 
     return ((p->basic.vme     ? X86_CR4_VME | X86_CR4_PVI : 0) |
             (p->basic.tsc     ? X86_CR4_TSD               : 0) |
@@ -999,7 +1000,8 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d)
             (p->basic.xsave   ? X86_CR4_OSXSAVE           : 0) |
             (p->feat.smep     ? X86_CR4_SMEP              : 0) |
             (p->feat.smap     ? X86_CR4_SMAP              : 0) |
-            (p->feat.pku      ? X86_CR4_PKE               : 0));
+            (p->feat.pku      ? X86_CR4_PKE               : 0) |
+            (cet              ? X86_CR4_CET               : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
@@ -2289,6 +2291,12 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
         }
     }
 
+    if ( !(value & X86_CR0_WP) && (v->arch.hvm.guest_cr[4] & X86_CR4_CET) )
+    {
+        gprintk(XENLOG_DEBUG, "Trying to clear WP with CET set\n");
+        return X86EMUL_EXCEPTION;
+    }
+
     if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )
     {
         if ( v->arch.hvm.guest_efer & EFER_LME )
@@ -2444,6 +2452,12 @@ int hvm_set_cr4(unsigned long value, bool may_defer)
         }
     }
 
+    if ( (value & X86_CR4_CET) && !(v->arch.hvm.guest_cr[0] & X86_CR0_WP) )
+    {
+        gprintk(XENLOG_DEBUG, "Trying to set CET without WP\n");
+        return X86EMUL_EXCEPTION;
+    }
+
     old_cr = v->arch.hvm.guest_cr[4];
 
     if ( (value & X86_CR4_PCIDE) && !(old_cr & X86_CR4_PCIDE) &&
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index c42f56bdd4..6f94a73408 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -232,7 +232,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /*S  User Mode Instruction Prevention */
 XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
 XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
 XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
-XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*   CET - Shadow Stacks */
+XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*h  CET - Shadow Stacks */
 XEN_CPUFEATURE(GFNI,          6*32+ 8) /*A  Galois Field Instrs */
 XEN_CPUFEATURE(VAES,          6*32+ 9) /*A  Vector AES Instrs */
 XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
@@ -267,7 +267,7 @@ XEN_CPUFEATURE(SRBDS_CTRL,    9*32+ 9) /*   MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS.
 XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(SERIALIZE,     9*32+14) /*a  SERIALIZE insn */
-XEN_CPUFEATURE(CET_IBT,       9*32+20) /*   CET - Indirect Branch Tracking */
+XEN_CPUFEATURE(CET_IBT,       9*32+20) /*h  CET - Indirect Branch Tracking */
 XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
 XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] x86/svm: Enumeration for CET
  2021-04-26 17:54 [PATCH 0/3] x86: Initial pieces for guest CET support Andrew Cooper
  2021-04-26 17:54 ` [PATCH 1/3] x86/hvm: Introduce experimental " Andrew Cooper
@ 2021-04-26 17:54 ` Andrew Cooper
  2021-04-27 15:53   ` Jan Beulich
  2021-04-26 17:54 ` [PATCH 3/3] x86/VT-x: " Andrew Cooper
  2021-04-27  6:46 ` [PATCH 0/3] x86: Initial pieces for guest CET support Jan Beulich
  3 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-26 17:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

On CET-capable hardware, VMRUN/EXIT unconditionally swaps S_SET, SSP and
ISST (subject to cleanbits) without further settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/svm/svm.c         |  1 +
 xen/arch/x86/hvm/svm/svmdebug.c    |  2 ++
 xen/include/asm-x86/hvm/svm/svm.h  |  2 ++
 xen/include/asm-x86/hvm/svm/vmcb.h | 10 ++++++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 4585efe1f8..642a64b747 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1658,6 +1658,7 @@ const struct hvm_function_table * __init start_svm(void)
     P(cpu_has_pause_filter, "Pause-Intercept Filter");
     P(cpu_has_pause_thresh, "Pause-Intercept Filter Threshold");
     P(cpu_has_tsc_ratio, "TSC Rate MSR");
+    P(cpu_has_svm_sss, "NPT Supervisor Shadow Stack");
 #undef P
 
     if ( !printed )
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index f450391df4..bce86f0ef7 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -82,6 +82,8 @@ void svm_vmcb_dump(const char *from, const struct vmcb_struct *vmcb)
            vmcb->cstar, vmcb->sfmask);
     printk("KernGSBase = 0x%016"PRIx64" PAT = 0x%016"PRIx64"\n",
            vmcb->kerngsbase, vmcb_get_g_pat(vmcb));
+    printk("SSP = 0x%016"PRIx64" S_CET = 0x%016"PRIx64" ISST = 0x%016"PRIx64"\n",
+           vmcb->_ssp, vmcb->_msr_s_cet, vmcb->_msr_isst);
     printk("H_CR3 = 0x%016"PRIx64" CleanBits = %#x\n",
            vmcb_get_h_cr3(vmcb), vmcb->cleanbits.raw);
 
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index faeca40174..bee939156f 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -75,6 +75,7 @@ extern u32 svm_feature_flags;
 #define SVM_FEATURE_PAUSETHRESH   12 /* Pause intercept filter support */
 #define SVM_FEATURE_VLOADSAVE     15 /* virtual vmload/vmsave */
 #define SVM_FEATURE_VGIF          16 /* Virtual GIF */
+#define SVM_FEATURE_SSS           19 /* NPT Supervisor Shadow Stacks */
 
 #define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f)))
 #define cpu_has_svm_npt       cpu_has_svm_feature(SVM_FEATURE_NPT)
@@ -89,6 +90,7 @@ extern u32 svm_feature_flags;
 #define cpu_has_pause_thresh  cpu_has_svm_feature(SVM_FEATURE_PAUSETHRESH)
 #define cpu_has_tsc_ratio     cpu_has_svm_feature(SVM_FEATURE_TSCRATEMSR)
 #define cpu_has_svm_vloadsave cpu_has_svm_feature(SVM_FEATURE_VLOADSAVE)
+#define cpu_has_svm_sss       cpu_has_svm_feature(SVM_FEATURE_SSS)
 
 #define SVM_PAUSEFILTER_INIT    4000
 #define SVM_PAUSETHRESH_INIT    1000
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index 0b03a8f076..fbedea209e 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -248,6 +248,8 @@ enum VMEXIT_EXITCODE
     VMEXIT_EXCEPTION_AC  =  81, /* 0x51, alignment-check */
     VMEXIT_EXCEPTION_MC  =  82, /* 0x52, machine-check */
     VMEXIT_EXCEPTION_XF  =  83, /* 0x53, simd floating-point */
+/*  VMEXIT_EXCEPTION_20  =  84,    0x54, #VE (Intel specific) */
+    VMEXIT_EXCEPTION_CP  =  85, /* 0x55, controlflow protection */
 
     /* exceptions 20-31 (exitcodes 84-95) are reserved */
 
@@ -397,6 +399,8 @@ typedef union
         bool seg:1;        /* 8:  cs, ds, es, ss, cpl */
         bool cr2:1;        /* 9:  cr2 */
         bool lbr:1;        /* 10: debugctlmsr, last{branch,int}{to,from}ip */
+        bool :1;
+        bool cet:1;        /* 12: msr_s_set, ssp, msr_isst */
     };
     uint32_t raw;
 } vmcbcleanbits_t;
@@ -451,7 +455,7 @@ struct vmcb_struct {
             bool _sev_enable    :1;
             bool _sev_es_enable :1;
             bool _gmet          :1;
-            bool                :1;
+            bool _np_sss        :1;
             bool _vte           :1;
         };
         uint64_t _np_ctrl;
@@ -497,7 +501,9 @@ struct vmcb_struct {
     u64 rip;
     u64 res14[11];
     u64 rsp;
-    u64 res15[3];
+    u64 _msr_s_cet;             /* offset 0x400 + 0x1E0 - cleanbit 12 */
+    u64 _ssp;                   /* offset 0x400 + 0x1E8   | */
+    u64 _msr_isst;              /* offset 0x400 + 0x1F0   v */
     u64 rax;
     u64 star;
     u64 lstar;
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] x86/VT-x: Enumeration for CET
  2021-04-26 17:54 [PATCH 0/3] x86: Initial pieces for guest CET support Andrew Cooper
  2021-04-26 17:54 ` [PATCH 1/3] x86/hvm: Introduce experimental " Andrew Cooper
  2021-04-26 17:54 ` [PATCH 2/3] x86/svm: Enumeration for CET Andrew Cooper
@ 2021-04-26 17:54 ` Andrew Cooper
  2021-04-27 15:56   ` Jan Beulich
  2021-04-27  6:46 ` [PATCH 0/3] x86: Initial pieces for guest CET support Jan Beulich
  3 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-26 17:54 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Jun Nakajima, Kevin Tian

VT-x has separate entry/exit control for loading guest/host state.  Saving
guest state on vmexit is performed unconditionally.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |  6 ++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h | 11 ++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f9f9bc18cd..5849817630 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -2014,6 +2014,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
     printk("RFLAGS=0x%08lx (0x%08lx)  DR7 = 0x%016lx\n",
            vmr(GUEST_RFLAGS), regs->rflags,
            vmr(GUEST_DR7));
+    if ( vmentry_ctl & VM_ENTRY_LOAD_GUEST_CET )
+        printk("SSP = 0x%016lx S_CET = 0x%016lx ISST = 0x%016lx\n",
+               vmr(GUEST_SSP), vmr(GUEST_S_CET), vmr(GUEST_ISST));
     printk("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n",
            vmr(GUEST_SYSENTER_ESP),
            vmr32(GUEST_SYSENTER_CS), vmr(GUEST_SYSENTER_EIP));
@@ -2057,6 +2060,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
            vmr(HOST_GDTR_BASE), vmr(HOST_IDTR_BASE));
     printk("CR0=%016lx CR3=%016lx CR4=%016lx\n",
            vmr(HOST_CR0), vmr(HOST_CR3), vmr(HOST_CR4));
+    if ( vmexit_ctl & VM_EXIT_LOAD_HOST_CET )
+        printk("SSP = 0x%016lx S_CET = 0x%016lx ISST = 0x%016lx\n",
+               vmr(HOST_SSP), vmr(HOST_S_CET), vmr(HOST_ISST));
     printk("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n",
            vmr(HOST_SYSENTER_ESP),
            vmr32(HOST_SYSENTER_CS), vmr(HOST_SYSENTER_EIP));
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 8073af323b..4c4246f190 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -46,7 +46,8 @@ struct ept_data {
             uint64_t mt:3,   /* Memory Type. */
                      wl:3,   /* Walk length -1. */
                      ad:1,   /* Enable EPT A/D bits. */
-                     :5,     /* rsvd. */
+                     sss:1,  /* Supervisor Shadow Stack. */
+                     :4,     /* rsvd. */
                      mfn:52;
         };
         u64 eptp;
@@ -238,6 +239,7 @@ extern u32 vmx_pin_based_exec_control;
 #define VM_EXIT_LOAD_HOST_EFER          0x00200000
 #define VM_EXIT_SAVE_PREEMPT_TIMER      0x00400000
 #define VM_EXIT_CLEAR_BNDCFGS           0x00800000
+#define VM_EXIT_LOAD_HOST_CET           0x10000000
 extern u32 vmx_vmexit_control;
 
 #define VM_ENTRY_IA32E_MODE             0x00000200
@@ -247,6 +249,7 @@ extern u32 vmx_vmexit_control;
 #define VM_ENTRY_LOAD_GUEST_PAT         0x00004000
 #define VM_ENTRY_LOAD_GUEST_EFER        0x00008000
 #define VM_ENTRY_LOAD_BNDCFGS           0x00010000
+#define VM_ENTRY_LOAD_GUEST_CET         0x00100000
 extern u32 vmx_vmentry_control;
 
 #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
@@ -516,6 +519,9 @@ enum vmcs_field {
     GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
     GUEST_SYSENTER_ESP              = 0x00006824,
     GUEST_SYSENTER_EIP              = 0x00006826,
+    GUEST_S_CET                     = 0x00006828,
+    GUEST_SSP                       = 0x0000682a,
+    GUEST_ISST                      = 0x0000682c,
     HOST_CR0                        = 0x00006c00,
     HOST_CR3                        = 0x00006c02,
     HOST_CR4                        = 0x00006c04,
@@ -528,6 +534,9 @@ enum vmcs_field {
     HOST_SYSENTER_EIP               = 0x00006c12,
     HOST_RSP                        = 0x00006c14,
     HOST_RIP                        = 0x00006c16,
+    HOST_S_CET                      = 0x00006c18,
+    HOST_SSP                        = 0x00006c1a,
+    HOST_ISST                       = 0x00006c1c,
 };
 
 #define VMCS_VPID_WIDTH 16
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] x86: Initial pieces for guest CET support
  2021-04-26 17:54 [PATCH 0/3] x86: Initial pieces for guest CET support Andrew Cooper
                   ` (2 preceding siblings ...)
  2021-04-26 17:54 ` [PATCH 3/3] x86/VT-x: " Andrew Cooper
@ 2021-04-27  6:46 ` Jan Beulich
  2021-04-27 10:13   ` Andrew Cooper
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-27  6:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 26.04.2021 19:54, Andrew Cooper wrote:
> Some initial pieces for guest support.  Everything will currently malfunction
> for VMs which explicitly opt in to CET_SS/IBT.
> 
> Still TODO as a minimum:
>  * Teach the pagewalk logic about shadow stack accesses and errors.
>  * Emulator support for the new instructions.  WRUSS is an irritating corner
>    case, requiring a change to how we express pagewalk inputs, as
>    user/supervisor is no longer dependent on CPL.

I can put this on my todo list, considering that I'm the one to play
with the emulator the most. Just let me know if you would prefer to
do it yourself. (Otherwise my next item there after AMX is now
complete would have been KeyLocker insns.)

>  * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
>    the S_CET has broken sematics - it ends up as a mix of host and guest
>    state, and isn't safe to XRSTOR without editing what the CPU wrote out.

Hmm, I wasn't aware of quirks here - would you mind going into more
detail?

> The above ought to suffice for getting some XTF testing in place.  For general
> guest support:
>  * In-guest XSAVES support.  Windows is the only OS to support CET at the time
>    of writing, and it cross-checks for XSAVES.  Linux expected to perform the
>    same cross-check in due course.

What specifically do you mean here? The XSAVES CPU feature is marked
'S', so ought to be visible to Windows. Hence I guess you mean support
for the respective XSS bits?

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] x86: Initial pieces for guest CET support
  2021-04-27  6:46 ` [PATCH 0/3] x86: Initial pieces for guest CET support Jan Beulich
@ 2021-04-27 10:13   ` Andrew Cooper
  2021-04-28 12:25     ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-27 10:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 27/04/2021 07:46, Jan Beulich wrote:
> On 26.04.2021 19:54, Andrew Cooper wrote:
>> Some initial pieces for guest support.  Everything will currently malfunction
>> for VMs which explicitly opt in to CET_SS/IBT.
>>
>> Still TODO as a minimum:
>>  * Teach the pagewalk logic about shadow stack accesses and errors.
>>  * Emulator support for the new instructions.  WRUSS is an irritating corner
>>    case, requiring a change to how we express pagewalk inputs, as
>>    user/supervisor is no longer dependent on CPL.
> I can put this on my todo list, considering that I'm the one to play
> with the emulator the most. Just let me know if you would prefer to
> do it yourself. (Otherwise my next item there after AMX is now
> complete would have been KeyLocker insns.)

My plan was actually to start with WRSS and do the pagewalk side of
things, including refreshing my comprehensive pagewalk XTF test.  This
will block work on all the other shadow stack instructions.

There are 3 emulator complexities for shadow stack instructions.  SSP
itself as a register, WRUSS no longer being CPL-based for
user/supervisor, and the fact that RSTORSSP in particular uses an atomic
block which microcode can express, but can't be encoded at an ISA
level.  I've got no idea what to do about this last problem, because we
can't map the two guest frames and re-issue the instruction - the
aliasing check on the tokens forces us to map the two frames in their
correct linear addresses.


IBT is quite different.  There are only two new instructions,
ENDBR{32,64} but tweaks to lots of other instructions (all indirect
control transfers), as well as 0x3e for the notrack prefix, and the
legacy code page bitmap.  The tracker bits themselves are somewhat
irritating to access in the {U,S}_CET MSRs.

Furthermore, I don't have access to hardware with CET-IBT (the SDP seems
to have let out some blue smoke), so any support here would be speculative.


Other bits I'd forgotten from the first set of bullet points.
* {RD,WR}MSR support for all the MSRs, including finally breaking into
the non-trivial work of context-dependent state.
* Changes to Task Switch.

>>  * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
>>    the S_CET has broken sematics - it ends up as a mix of host and guest
>>    state, and isn't safe to XRSTOR without editing what the CPU wrote out.
> Hmm, I wasn't aware of quirks here - would you mind going into more
> detail?

I'll double check my notes.

>
>> The above ought to suffice for getting some XTF testing in place.  For general
>> guest support:
>>  * In-guest XSAVES support.  Windows is the only OS to support CET at the time
>>    of writing, and it cross-checks for XSAVES.  Linux expected to perform the
>>    same cross-check in due course.
> What specifically do you mean here? The XSAVES CPU feature is marked
> 'S', so ought to be visible to Windows. Hence I guess you mean support
> for the respective XSS bits?

We really shouldn't have exposed XSAVES without XSS bits, but yes - what
matters is that the {U,S}_CET XSTATE components are usable in the guest.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-26 17:54 ` [PATCH 1/3] x86/hvm: Introduce experimental " Andrew Cooper
@ 2021-04-27 15:47   ` Jan Beulich
  2021-04-27 17:39     ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-27 15:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 26.04.2021 19:54, Andrew Cooper wrote:
> For now, let VMs opt into using CET by setting cet_ss/ibt in the CPUID
> policy.  Also extend cr4 handling to permit CR4.CET being set, along with
> logic to interlock CR4.CET and CR0.WP.
> 
> Everything else will malfunction for now, but this will help adding support
> incrementally - there is a lot to do before CET will work properly.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Just one consideration:

> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -232,7 +232,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /*S  User Mode Instruction Prevention */
>  XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>  XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>  XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
> -XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*   CET - Shadow Stacks */
> +XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*h  CET - Shadow Stacks */
>  XEN_CPUFEATURE(GFNI,          6*32+ 8) /*A  Galois Field Instrs */
>  XEN_CPUFEATURE(VAES,          6*32+ 9) /*A  Vector AES Instrs */
>  XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
> @@ -267,7 +267,7 @@ XEN_CPUFEATURE(SRBDS_CTRL,    9*32+ 9) /*   MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS.
>  XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
>  XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
>  XEN_CPUFEATURE(SERIALIZE,     9*32+14) /*a  SERIALIZE insn */
> -XEN_CPUFEATURE(CET_IBT,       9*32+20) /*   CET - Indirect Branch Tracking */
> +XEN_CPUFEATURE(CET_IBT,       9*32+20) /*h  CET - Indirect Branch Tracking */
>  XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
>  XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
>  XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */

If by the time 4.16 finishes up the various todo items haven't been
taken care of, should we take note to undo these markings? I would
have suggested allowing them for debug builds only, but that's kind
of ugly to achieve in a public header.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] x86/svm: Enumeration for CET
  2021-04-26 17:54 ` [PATCH 2/3] x86/svm: Enumeration for CET Andrew Cooper
@ 2021-04-27 15:53   ` Jan Beulich
  2021-04-27 17:47     ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-27 15:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 26.04.2021 19:54, Andrew Cooper wrote:
> On CET-capable hardware, VMRUN/EXIT unconditionally swaps S_SET, SSP and

Nit: S_CET?

> ISST (subject to cleanbits) without further settings.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
with one question:

> @@ -497,7 +501,9 @@ struct vmcb_struct {
>      u64 rip;
>      u64 res14[11];
>      u64 rsp;
> -    u64 res15[3];
> +    u64 _msr_s_cet;             /* offset 0x400 + 0x1E0 - cleanbit 12 */
> +    u64 _ssp;                   /* offset 0x400 + 0x1E8   | */
> +    u64 _msr_isst;              /* offset 0x400 + 0x1F0   v */
>      u64 rax;
>      u64 star;
>      u64 lstar;

Any reason for the leading underscores, when none of the neighboring
fields have such? Did you perhaps mean to add VMCB_ACCESSORS()
instances for them? (Ack remains in effect if you decide to do so.)

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] x86/VT-x: Enumeration for CET
  2021-04-26 17:54 ` [PATCH 3/3] x86/VT-x: " Andrew Cooper
@ 2021-04-27 15:56   ` Jan Beulich
  2021-04-27 16:27     ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-27 15:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 26.04.2021 19:54, Andrew Cooper wrote:
> VT-x has separate entry/exit control for loading guest/host state.  Saving
> guest state on vmexit is performed unconditionally.

With the latter I find ...

> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -2014,6 +2014,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
>      printk("RFLAGS=0x%08lx (0x%08lx)  DR7 = 0x%016lx\n",
>             vmr(GUEST_RFLAGS), regs->rflags,
>             vmr(GUEST_DR7));
> +    if ( vmentry_ctl & VM_ENTRY_LOAD_GUEST_CET )
> +        printk("SSP = 0x%016lx S_CET = 0x%016lx ISST = 0x%016lx\n",
> +               vmr(GUEST_SSP), vmr(GUEST_S_CET), vmr(GUEST_ISST));

... the conditional here a little odd, but I expect the plan is
to have the various bits all set consistently once actually
enabling the functionality.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] x86/VT-x: Enumeration for CET
  2021-04-27 15:56   ` Jan Beulich
@ 2021-04-27 16:27     ` Andrew Cooper
  2021-04-28  9:18       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-27 16:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 27/04/2021 16:56, Jan Beulich wrote:
> On 26.04.2021 19:54, Andrew Cooper wrote:
>> VT-x has separate entry/exit control for loading guest/host state.  Saving
>> guest state on vmexit is performed unconditionally.
> With the latter I find ...
>
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -2014,6 +2014,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
>>      printk("RFLAGS=0x%08lx (0x%08lx)  DR7 = 0x%016lx\n",
>>             vmr(GUEST_RFLAGS), regs->rflags,
>>             vmr(GUEST_DR7));
>> +    if ( vmentry_ctl & VM_ENTRY_LOAD_GUEST_CET )
>> +        printk("SSP = 0x%016lx S_CET = 0x%016lx ISST = 0x%016lx\n",
>> +               vmr(GUEST_SSP), vmr(GUEST_S_CET), vmr(GUEST_ISST));
> ... the conditional here a little odd, but I expect the plan is
> to have the various bits all set consistently once actually
> enabling the functionality.

TBH, the general behaviour here is poor.

What happens now, as Xen does use CET itself, is that Xen's values
propagate into guest context, and are written back into the VMCS on
VMExit.  There is no way to turn this behaviour off AFAICT.

Therefore, we must not print the guest values when the vCPU isn't
configured for CET, because otherwise we'd be rendering what is actually
Xen state, in the guest state area.

Once a VM is using CET, we'll have both VM_ENTRY_LOAD_GUEST_CET and
VM_EXIT_LOAD_HOST_CET set.


There is theoretically an optimisations to be had for a hypervisor not
using CET, to only use the VM_ENTRY_LOAD_GUEST_CET control and leave
VM_EXIT_LOAD_HOST_CET clear, but getting this optimisation wrong will
leave the VMM running with guest controlled values.

Personally, I think it was be a far safer interface for there only to be
a single bit to control "switch CET state" or not.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-27 15:47   ` Jan Beulich
@ 2021-04-27 17:39     ` Andrew Cooper
  2021-04-28  9:11       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-27 17:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27/04/2021 16:47, Jan Beulich wrote:
> On 26.04.2021 19:54, Andrew Cooper wrote:
>> For now, let VMs opt into using CET by setting cet_ss/ibt in the CPUID
>> policy.  Also extend cr4 handling to permit CR4.CET being set, along with
>> logic to interlock CR4.CET and CR0.WP.
>>
>> Everything else will malfunction for now, but this will help adding support
>> incrementally - there is a lot to do before CET will work properly.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> Just one consideration:
>
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -232,7 +232,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /*S  User Mode Instruction Prevention */
>>  XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>>  XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>>  XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
>> -XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*   CET - Shadow Stacks */
>> +XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*h  CET - Shadow Stacks */
>>  XEN_CPUFEATURE(GFNI,          6*32+ 8) /*A  Galois Field Instrs */
>>  XEN_CPUFEATURE(VAES,          6*32+ 9) /*A  Vector AES Instrs */
>>  XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
>> @@ -267,7 +267,7 @@ XEN_CPUFEATURE(SRBDS_CTRL,    9*32+ 9) /*   MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS.
>>  XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
>>  XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
>>  XEN_CPUFEATURE(SERIALIZE,     9*32+14) /*a  SERIALIZE insn */
>> -XEN_CPUFEATURE(CET_IBT,       9*32+20) /*   CET - Indirect Branch Tracking */
>> +XEN_CPUFEATURE(CET_IBT,       9*32+20) /*h  CET - Indirect Branch Tracking */
>>  XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
>>  XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
>>  XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */
> If by the time 4.16 finishes up the various todo items haven't been
> taken care of, should we take note to undo these markings? I would
> have suggested allowing them for debug builds only, but that's kind
> of ugly to achieve in a public header.

TBH, I still don't think this should be a public header.  If I would
have forseen how much of a PITA is it, I would have argued harder
against it.

It is, at best, tools-only (via SYSCTL_get_featureset), and I don't
intend that interface to survive the tools ABI stabilisation  work, as
it has been fully superseded by the cpu_policy interfaces and libx86.


As for the max markings themselves, I'm not sure they ought to be
debug-only.  Its important aspect of making guest support "tech preview"
to ensure the logic works irrespective of CONFIG_DEBUG, and I think it
is entirely fine for an experimental feature to be of status "your VM
will explode if you enable this right now", even if that spills over
into 4.17.

In reality, once we've got {U,S}_CET context switching working at the
Xen level, and {RD,WR}MSR plumbing done, it ought to be safe to people
to turn on an experiment with.  At this point, we're in the position of
"expected to work correctly in a subset of usecases".  I'd ideally like
to get to this point before 4.16 releases, but that will be very subject
to other work.


All the emulator work is for cases that a VM won't encounter by default
(Task Switch too, as Minix in the Intel Management Engine is the only
known 32-bit CET-SS implementation).

Obviously, we want to get the checklist complete before enabling by
default, but give the complexities in the emulator, I suspect we'll have
a long gap between "believed can be used safely in a subset of cases",
and "believed safe to use in general", and a long list of people happy
to use it in a "doesn't work under introspection yet" state.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] x86/svm: Enumeration for CET
  2021-04-27 15:53   ` Jan Beulich
@ 2021-04-27 17:47     ` Andrew Cooper
  2021-04-28  9:14       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-27 17:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27/04/2021 16:53, Jan Beulich wrote:
> On 26.04.2021 19:54, Andrew Cooper wrote:
>> On CET-capable hardware, VMRUN/EXIT unconditionally swaps S_SET, SSP and
> Nit: S_CET?

Ah yes.

>
>> ISST (subject to cleanbits) without further settings.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> with one question:
>
>> @@ -497,7 +501,9 @@ struct vmcb_struct {
>>      u64 rip;
>>      u64 res14[11];
>>      u64 rsp;
>> -    u64 res15[3];
>> +    u64 _msr_s_cet;             /* offset 0x400 + 0x1E0 - cleanbit 12 */
>> +    u64 _ssp;                   /* offset 0x400 + 0x1E8   | */
>> +    u64 _msr_isst;              /* offset 0x400 + 0x1F0   v */
>>      u64 rax;
>>      u64 star;
>>      u64 lstar;
> Any reason for the leading underscores, when none of the neighboring
> fields have such?

Yes - they're covered by a cleanbit, and for better or worse, this is
our style.

> Did you perhaps mean to add VMCB_ACCESSORS()
> instances for them?

TBH, I opencoded the cleanbit handling because I thoroughly hate that
entire infrastructure.

I ought to add lines, but I'm still very tempted to rip it all up and
implement something which is less obfuscating.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-27 17:39     ` Andrew Cooper
@ 2021-04-28  9:11       ` Jan Beulich
  2021-04-28 17:54         ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-28  9:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.04.2021 19:39, Andrew Cooper wrote:
> On 27/04/2021 16:47, Jan Beulich wrote:
>> On 26.04.2021 19:54, Andrew Cooper wrote:
>>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>>> @@ -232,7 +232,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /*S  User Mode Instruction Prevention */
>>>  XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>>>  XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>>>  XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
>>> -XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*   CET - Shadow Stacks */
>>> +XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*h  CET - Shadow Stacks */
>>>  XEN_CPUFEATURE(GFNI,          6*32+ 8) /*A  Galois Field Instrs */
>>>  XEN_CPUFEATURE(VAES,          6*32+ 9) /*A  Vector AES Instrs */
>>>  XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
>>> @@ -267,7 +267,7 @@ XEN_CPUFEATURE(SRBDS_CTRL,    9*32+ 9) /*   MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS.
>>>  XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
>>>  XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
>>>  XEN_CPUFEATURE(SERIALIZE,     9*32+14) /*a  SERIALIZE insn */
>>> -XEN_CPUFEATURE(CET_IBT,       9*32+20) /*   CET - Indirect Branch Tracking */
>>> +XEN_CPUFEATURE(CET_IBT,       9*32+20) /*h  CET - Indirect Branch Tracking */
>>>  XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
>>>  XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
>>>  XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */
>> If by the time 4.16 finishes up the various todo items haven't been
>> taken care of, should we take note to undo these markings? I would
>> have suggested allowing them for debug builds only, but that's kind
>> of ugly to achieve in a public header.
> 
> TBH, I still don't think this should be a public header.  If I would
> have forseen how much of a PITA is it, I would have argued harder
> against it.
> 
> It is, at best, tools-only (via SYSCTL_get_featureset), and I don't
> intend that interface to survive the tools ABI stabilisation  work, as
> it has been fully superseded by the cpu_policy interfaces and libx86.

Well - what features we expose is part of the (or at least something
like an) ABI. The actual numbering of course isn't (or shouldn't be).
I'm in no way opposed to moving the header out of public/ (and until
now I was of the impression that it was you who put it there), but if
we do I think we will want an alternative way of expressing which
extended features we support. I say this in particular because I think
SUPPORT.md doesn't lend itself to documenting support status at this
level of granularity. I'd much rather see that file refer to this
header, even if this may mean some difficulty to non-programmers.

> As for the max markings themselves, I'm not sure they ought to be
> debug-only.  Its important aspect of making guest support "tech preview"
> to ensure the logic works irrespective of CONFIG_DEBUG, and I think it
> is entirely fine for an experimental feature to be of status "your VM
> will explode if you enable this right now", even if that spills over
> into 4.17.

For a release I consider this undesirable. If a feature is in such a
state at the point of entering the RC phase, I think such markings
ought to be undone.

> In reality, once we've got {U,S}_CET context switching working at the
> Xen level, and {RD,WR}MSR plumbing done, it ought to be safe to people
> to turn on an experiment with.  At this point, we're in the position of
> "expected to work correctly in a subset of usecases".  I'd ideally like
> to get to this point before 4.16 releases, but that will be very subject
> to other work.

At _that_ point I could see the markings getting introduced outside
of debug builds, but still lower-case.

> All the emulator work is for cases that a VM won't encounter by default
> (Task Switch too, as Minix in the Intel Management Engine is the only
> known 32-bit CET-SS implementation).

Right, this is what is a prereq for the markings to become upper-case
ones (if - not specifically for the features here, but as a general
remark - we want them to become so ever in the first place) and the
features fully supported.

Jan

> Obviously, we want to get the checklist complete before enabling by
> default, but give the complexities in the emulator, I suspect we'll have
> a long gap between "believed can be used safely in a subset of cases",
> and "believed safe to use in general", and a long list of people happy
> to use it in a "doesn't work under introspection yet" state.
> 
> ~Andrew
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] x86/svm: Enumeration for CET
  2021-04-27 17:47     ` Andrew Cooper
@ 2021-04-28  9:14       ` Jan Beulich
  2021-04-28 14:17         ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-28  9:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.04.2021 19:47, Andrew Cooper wrote:
> On 27/04/2021 16:53, Jan Beulich wrote:
>> On 26.04.2021 19:54, Andrew Cooper wrote:
>>> @@ -497,7 +501,9 @@ struct vmcb_struct {
>>>      u64 rip;
>>>      u64 res14[11];
>>>      u64 rsp;
>>> -    u64 res15[3];
>>> +    u64 _msr_s_cet;             /* offset 0x400 + 0x1E0 - cleanbit 12 */
>>> +    u64 _ssp;                   /* offset 0x400 + 0x1E8   | */
>>> +    u64 _msr_isst;              /* offset 0x400 + 0x1F0   v */
>>>      u64 rax;
>>>      u64 star;
>>>      u64 lstar;
>> Any reason for the leading underscores, when none of the neighboring
>> fields have such?
> 
> Yes - they're covered by a cleanbit, and for better or worse, this is
> our style.

The underscore prefixes are, to my understanding, there only to
emphasize that the fields shouldn't be accessed directly, but ...

>> Did you perhaps mean to add VMCB_ACCESSORS()
>> instances for them?
> 
> TBH, I opencoded the cleanbit handling because I thoroughly hate that
> entire infrastructure.

... via this (or something with similar abstracting effect). So
for any fields you mean to access directly they imo shouldn't be
there. I particularly don't view them as indicators of being
covered by cleanbits (if the respective accessors aren't used).

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] x86/VT-x: Enumeration for CET
  2021-04-27 16:27     ` Andrew Cooper
@ 2021-04-28  9:18       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2021-04-28  9:18 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 27.04.2021 18:27, Andrew Cooper wrote:
> On 27/04/2021 16:56, Jan Beulich wrote:
>> On 26.04.2021 19:54, Andrew Cooper wrote:
>>> VT-x has separate entry/exit control for loading guest/host state.  Saving
>>> guest state on vmexit is performed unconditionally.
>> With the latter I find ...
>>
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -2014,6 +2014,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
>>>      printk("RFLAGS=0x%08lx (0x%08lx)  DR7 = 0x%016lx\n",
>>>             vmr(GUEST_RFLAGS), regs->rflags,
>>>             vmr(GUEST_DR7));
>>> +    if ( vmentry_ctl & VM_ENTRY_LOAD_GUEST_CET )
>>> +        printk("SSP = 0x%016lx S_CET = 0x%016lx ISST = 0x%016lx\n",
>>> +               vmr(GUEST_SSP), vmr(GUEST_S_CET), vmr(GUEST_ISST));
>> ... the conditional here a little odd, but I expect the plan is
>> to have the various bits all set consistently once actually
>> enabling the functionality.
> 
> TBH, the general behaviour here is poor.
> 
> What happens now, as Xen does use CET itself, is that Xen's values
> propagate into guest context, and are written back into the VMCS on
> VMExit.  There is no way to turn this behaviour off AFAICT.
> 
> Therefore, we must not print the guest values when the vCPU isn't
> configured for CET, because otherwise we'd be rendering what is actually
> Xen state, in the guest state area.
> 
> Once a VM is using CET, we'll have both VM_ENTRY_LOAD_GUEST_CET and
> VM_EXIT_LOAD_HOST_CET set.

As I did assume then, so fair enough.

> There is theoretically an optimisations to be had for a hypervisor not
> using CET, to only use the VM_ENTRY_LOAD_GUEST_CET control and leave
> VM_EXIT_LOAD_HOST_CET clear, but getting this optimisation wrong will
> leave the VMM running with guest controlled values.
> 
> Personally, I think it was be a far safer interface for there only to be
> a single bit to control "switch CET state" or not.

I agree, but this then goes for other state having multiple controls
as well, I guess. I've been wondering whether this separation somehow
helps them with the implementation of the guest-save, host-load, and
guest-load steps.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] x86: Initial pieces for guest CET support
  2021-04-27 10:13   ` Andrew Cooper
@ 2021-04-28 12:25     ` Andrew Cooper
  2021-04-28 13:03       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-28 12:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 27/04/2021 11:13, Andrew Cooper wrote:
> There are 3 emulator complexities for shadow stack instructions.  SSP
> itself as a register, WRUSS no longer being CPL-based for
> user/supervisor, and the fact that RSTORSSP in particular uses an atomic
> block which microcode can express, but can't be encoded at an ISA
> level.  I've got no idea what to do about this last problem, because we
> can't map the two guest frames and re-issue the instruction - the
> aliasing check on the tokens forces us to map the two frames in their
> correct linear addresses.

Actually, RSTORSSP isn't too difficult.  I'd mis-read the pseudocode.

The atomic block is a check&edit of the token on the remote stack (not
both stacks, as I'd mistakenly thought).  The purpose is to prevent two
concurrent RSTORSSP's moving two threads onto the same shadow stack.

Without microcode superpowers, the best we can do this with a read,
check, cmpxchg() loop.

The common case will be no conflict, as stack switching will be well
formed (outside of debugging).  Any conflict here from real code is
going to yield #GP/#CP on one of the threads participating, so in the
case of a conflict in the emulator, a likely consequence of the 2nd
iteration is going to be a hard failure.

That said, malicious cases within the guest, or from foreign mappings,
can cause the cmpxchg() loop to take an unbounded time, so after 3
retries or so, we need to escalate to vcpu_pause_all_except_self(), and
or the ARM stop_machine() big hammer.

I'm tempted to just throw #GP back after 3 retries.  Its potentially
non-architectural behaviour, but won't occur in non-malicious
circumstances, and all fallback mechanisms have system-wide implications
that we oughtn't to be bowing to in a malicious circumstance.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] x86: Initial pieces for guest CET support
  2021-04-28 12:25     ` Andrew Cooper
@ 2021-04-28 13:03       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2021-04-28 13:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné, Wei Liu, Jun Nakajima, Kevin Tian, Xen-devel

On 28.04.2021 14:25, Andrew Cooper wrote:
> On 27/04/2021 11:13, Andrew Cooper wrote:
>> There are 3 emulator complexities for shadow stack instructions.  SSP
>> itself as a register, WRUSS no longer being CPL-based for
>> user/supervisor, and the fact that RSTORSSP in particular uses an atomic
>> block which microcode can express, but can't be encoded at an ISA
>> level.  I've got no idea what to do about this last problem, because we
>> can't map the two guest frames and re-issue the instruction - the
>> aliasing check on the tokens forces us to map the two frames in their
>> correct linear addresses.
> 
> Actually, RSTORSSP isn't too difficult.  I'd mis-read the pseudocode.
> 
> The atomic block is a check&edit of the token on the remote stack (not
> both stacks, as I'd mistakenly thought).  The purpose is to prevent two
> concurrent RSTORSSP's moving two threads onto the same shadow stack.
> 
> Without microcode superpowers, the best we can do this with a read,
> check, cmpxchg() loop.
> 
> The common case will be no conflict, as stack switching will be well
> formed (outside of debugging).  Any conflict here from real code is
> going to yield #GP/#CP on one of the threads participating, so in the
> case of a conflict in the emulator, a likely consequence of the 2nd
> iteration is going to be a hard failure.
> 
> That said, malicious cases within the guest, or from foreign mappings,
> can cause the cmpxchg() loop to take an unbounded time, so after 3
> retries or so, we need to escalate to vcpu_pause_all_except_self(), and
> or the ARM stop_machine() big hammer.
> 
> I'm tempted to just throw #GP back after 3 retries.  Its potentially
> non-architectural behaviour, but won't occur in non-malicious
> circumstances, and all fallback mechanisms have system-wide implications
> that we oughtn't to be bowing to in a malicious circumstance.

I agree.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] x86/svm: Enumeration for CET
  2021-04-28  9:14       ` Jan Beulich
@ 2021-04-28 14:17         ` Andrew Cooper
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2021-04-28 14:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 28/04/2021 10:14, Jan Beulich wrote:
> On 27.04.2021 19:47, Andrew Cooper wrote:
>> On 27/04/2021 16:53, Jan Beulich wrote:
>>> On 26.04.2021 19:54, Andrew Cooper wrote:
>>>> @@ -497,7 +501,9 @@ struct vmcb_struct {
>>>>      u64 rip;
>>>>      u64 res14[11];
>>>>      u64 rsp;
>>>> -    u64 res15[3];
>>>> +    u64 _msr_s_cet;             /* offset 0x400 + 0x1E0 - cleanbit 12 */
>>>> +    u64 _ssp;                   /* offset 0x400 + 0x1E8   | */
>>>> +    u64 _msr_isst;              /* offset 0x400 + 0x1F0   v */
>>>>      u64 rax;
>>>>      u64 star;
>>>>      u64 lstar;
>>> Any reason for the leading underscores, when none of the neighboring
>>> fields have such?
>> Yes - they're covered by a cleanbit, and for better or worse, this is
>> our style.
> The underscore prefixes are, to my understanding, there only to
> emphasize that the fields shouldn't be accessed directly, but ...
>
>>> Did you perhaps mean to add VMCB_ACCESSORS()
>>> instances for them?
>> TBH, I opencoded the cleanbit handling because I thoroughly hate that
>> entire infrastructure.
> ... via this (or something with similar abstracting effect). So
> for any fields you mean to access directly they imo shouldn't be
> there. I particularly don't view them as indicators of being
> covered by cleanbits (if the respective accessors aren't used).

The leading underscores are enforced by 'vmcb->_ ## name' in
VMCB_ACCESSORS().

The cleanbits are the only reason we can't treat this as a simple struct.

~Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-28  9:11       ` Jan Beulich
@ 2021-04-28 17:54         ` Andrew Cooper
  2021-04-29  9:07           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2021-04-28 17:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 28/04/2021 10:11, Jan Beulich wrote:
> On 27.04.2021 19:39, Andrew Cooper wrote:
>> On 27/04/2021 16:47, Jan Beulich wrote:
>>> On 26.04.2021 19:54, Andrew Cooper wrote:
>>>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>>>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>>>> @@ -232,7 +232,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /*S  User Mode Instruction Prevention */
>>>>  XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>>>>  XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>>>>  XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
>>>> -XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*   CET - Shadow Stacks */
>>>> +XEN_CPUFEATURE(CET_SS,        6*32+ 7) /*h  CET - Shadow Stacks */
>>>>  XEN_CPUFEATURE(GFNI,          6*32+ 8) /*A  Galois Field Instrs */
>>>>  XEN_CPUFEATURE(VAES,          6*32+ 9) /*A  Vector AES Instrs */
>>>>  XEN_CPUFEATURE(VPCLMULQDQ,    6*32+10) /*A  Vector Carry-less Multiplication Instrs */
>>>> @@ -267,7 +267,7 @@ XEN_CPUFEATURE(SRBDS_CTRL,    9*32+ 9) /*   MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS.
>>>>  XEN_CPUFEATURE(MD_CLEAR,      9*32+10) /*A  VERW clears microarchitectural buffers */
>>>>  XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
>>>>  XEN_CPUFEATURE(SERIALIZE,     9*32+14) /*a  SERIALIZE insn */
>>>> -XEN_CPUFEATURE(CET_IBT,       9*32+20) /*   CET - Indirect Branch Tracking */
>>>> +XEN_CPUFEATURE(CET_IBT,       9*32+20) /*h  CET - Indirect Branch Tracking */
>>>>  XEN_CPUFEATURE(IBRSB,         9*32+26) /*A  IBRS and IBPB support (used by Intel) */
>>>>  XEN_CPUFEATURE(STIBP,         9*32+27) /*A  STIBP */
>>>>  XEN_CPUFEATURE(L1D_FLUSH,     9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */
>>> If by the time 4.16 finishes up the various todo items haven't been
>>> taken care of, should we take note to undo these markings? I would
>>> have suggested allowing them for debug builds only, but that's kind
>>> of ugly to achieve in a public header.
>> TBH, I still don't think this should be a public header.  If I would
>> have forseen how much of a PITA is it, I would have argued harder
>> against it.
>>
>> It is, at best, tools-only (via SYSCTL_get_featureset), and I don't
>> intend that interface to survive the tools ABI stabilisation  work, as
>> it has been fully superseded by the cpu_policy interfaces and libx86.
> Well - what features we expose is part of the (or at least something
> like an) ABI.

Yes, but that ABI is called CPUID and MSRs, per Intel/AMD's spec.

VM's see it via the real instructions.  Control software sees it via a
pair of arrays ({leaf, subleaf, a, b, c, d} and {idx, val}) expressed in
terms of the vendors ABI.

>  The actual numbering of course isn't (or shouldn't be).
> I'm in no way opposed to moving the header out of public/ (and until
> now I was of the impression that it was you who put it there), but if
> we do I think we will want an alternative way of expressing which
> extended features we support. I say this in particular because I think
> SUPPORT.md doesn't lend itself to documenting support status at this
> level of granularity. I'd much rather see that file refer to this
> header, even if this may mean some difficulty to non-programmers.

We don't need to say or do anything for experimental features.

Hitting Tech Preview and supported ought to be in release notes.  I do
agree that SUPPORT.md isn't great.

>> As for the max markings themselves, I'm not sure they ought to be
>> debug-only.  Its important aspect of making guest support "tech preview"
>> to ensure the logic works irrespective of CONFIG_DEBUG, and I think it
>> is entirely fine for an experimental feature to be of status "your VM
>> will explode if you enable this right now", even if that spills over
>> into 4.17.
> For a release I consider this undesirable. If a feature is in such a
> state at the point of entering the RC phase, I think such markings
> ought to be undone.

Well no.  They either don't go in in the first place, or they go in and
stay.  Putting them in with an expectation to take them out later is a
recipe for forgetting to do so.

I know we're making up our "how to do complicated experimental features"
process as we go here, but nothing *in Xen* will malfunction if a user
opts into CET_SS/IBT.  Therefore I think they're fine to go in and stay.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-28 17:54         ` Andrew Cooper
@ 2021-04-29  9:07           ` Jan Beulich
  2021-04-30 15:08             ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2021-04-29  9:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 28.04.2021 19:54, Andrew Cooper wrote:
> I know we're making up our "how to do complicated experimental features"
> process as we go here, but nothing *in Xen* will malfunction if a user
> opts into CET_SS/IBT.  Therefore I think they're fine to go in and stay.

Well, okay then. I hope possibls future additions of mine then will
get similar treatment.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] x86/hvm: Introduce experimental guest CET support
  2021-04-29  9:07           ` Jan Beulich
@ 2021-04-30 15:08             ` Andrew Cooper
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2021-04-30 15:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 29/04/2021 10:07, Jan Beulich wrote:
> On 28.04.2021 19:54, Andrew Cooper wrote:
>> I know we're making up our "how to do complicated experimental features"
>> process as we go here, but nothing *in Xen* will malfunction if a user
>> opts into CET_SS/IBT.  Therefore I think they're fine to go in and stay.
> Well, okay then. I hope possibls future additions of mine then will
> get similar treatment.

So having thought on this for a while...

The annotations in the header file don't help me.  My dev workflow
already starts with turning them to default, skips messing around with
toolstack level things.

At this point in feature development, there is no use to anyone who
isn't also editing the hypervisor too.

So unless it is going to help you (and I suspect it wont), its probably
not helpful to the wider world.

I'll drop the markings from this patch, and put them back in at the
point that we believe "basic VMs" work.

~Andrew



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-04-30 15:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-26 17:54 [PATCH 0/3] x86: Initial pieces for guest CET support Andrew Cooper
2021-04-26 17:54 ` [PATCH 1/3] x86/hvm: Introduce experimental " Andrew Cooper
2021-04-27 15:47   ` Jan Beulich
2021-04-27 17:39     ` Andrew Cooper
2021-04-28  9:11       ` Jan Beulich
2021-04-28 17:54         ` Andrew Cooper
2021-04-29  9:07           ` Jan Beulich
2021-04-30 15:08             ` Andrew Cooper
2021-04-26 17:54 ` [PATCH 2/3] x86/svm: Enumeration for CET Andrew Cooper
2021-04-27 15:53   ` Jan Beulich
2021-04-27 17:47     ` Andrew Cooper
2021-04-28  9:14       ` Jan Beulich
2021-04-28 14:17         ` Andrew Cooper
2021-04-26 17:54 ` [PATCH 3/3] x86/VT-x: " Andrew Cooper
2021-04-27 15:56   ` Jan Beulich
2021-04-27 16:27     ` Andrew Cooper
2021-04-28  9:18       ` Jan Beulich
2021-04-27  6:46 ` [PATCH 0/3] x86: Initial pieces for guest CET support Jan Beulich
2021-04-27 10:13   ` Andrew Cooper
2021-04-28 12:25     ` Andrew Cooper
2021-04-28 13:03       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).