All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks
@ 2020-05-27 19:18 Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap() Andrew Cooper
                   ` (14 more replies)
  0 siblings, 15 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

This series implements Shadow Stack support for Xen to use.

You'll need a CET-capable toolchain (Binutils >= 2.29 or LLVM >= 7), but no
specific compiler support required.

CET-SS makes PV32 unusable, so using shadow stacks prevents the use of 32bit
PV guests.  Compatibilty can be obtained using PV Shim.

See patches for individual changes.

Andrew Cooper (14):
  x86/traps: Clean up printing in {do_reserved,fatal}_trap()
  x86/traps: Factor out extable_fixup() and make printing consistent
  x86/shstk: Introduce Supervisor Shadow Stack support
  x86/traps: Implement #CP handler and extend #PF for shadow stacks
  x86/shstk: Re-layout the stack block for shadow stacks
  x86/shstk: Create shadow stacks
  x86/cpu: Adjust enable_nmis() to be shadow stack compatible
  x86/cpu: Adjust reset_stack_and_jump() to be shadow stack compatible
  x86/spec-ctrl: Adjust DO_OVERWRITE_RSB to be shadow stack compatible
  x86/extable: Adjust extable handling to be shadow stack compatible
  x86/alt: Adjust _alternative_instructions() to not create shadow stacks
  x86/entry: Adjust guest paths to be shadow stack compatible
  x86/S3: Save and restore Shadow Stack configuration
  x86/shstk: Activate Supervisor Shadow Stacks

 docs/misc/xen-command-line.pandoc   |  25 ++++
 xen/arch/x86/Kconfig                |  18 +++
 xen/arch/x86/acpi/wakeup_prot.S     |  58 +++++++++
 xen/arch/x86/alternative.c          |  14 +++
 xen/arch/x86/boot/x86_64.S          |  35 +++++-
 xen/arch/x86/cpu/common.c           |  39 +++++-
 xen/arch/x86/crash.c                |   7 ++
 xen/arch/x86/mm.c                   |  46 ++++---
 xen/arch/x86/setup.c                |  56 +++++++++
 xen/arch/x86/smpboot.c              |   3 +-
 xen/arch/x86/spec_ctrl.c            |   8 ++
 xen/arch/x86/traps.c                | 239 ++++++++++++++++++++++++++----------
 xen/arch/x86/x86_64/compat/entry.S  |   1 +
 xen/arch/x86/x86_64/entry.S         |  50 +++++++-
 xen/include/asm-x86/asm_defns.h     |   8 +-
 xen/include/asm-x86/config.h        |   5 +
 xen/include/asm-x86/cpufeature.h    |   1 +
 xen/include/asm-x86/cpufeatures.h   |   1 +
 xen/include/asm-x86/current.h       |  60 +++++++--
 xen/include/asm-x86/mm.h            |   1 -
 xen/include/asm-x86/msr-index.h     |   3 +
 xen/include/asm-x86/page.h          |   1 +
 xen/include/asm-x86/processor.h     |  60 ++++++---
 xen/include/asm-x86/spec_ctrl_asm.h |  16 ++-
 xen/include/asm-x86/x86-defns.h     |  35 ++++++
 xen/include/asm-x86/x86_64/page.h   |   1 +
 xen/scripts/Kconfig.include         |   4 +
 27 files changed, 664 insertions(+), 131 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap()
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28  9:45   ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved,fatal}_trap() Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent Andrew Cooper
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

For one, they render the vector in a different base.

Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
mnemonic, which starts bringing the code/diagnostics in line with the Intel
and AMD manuals.

Provide constants for every archtiecturally defined exception, even those not
implemented by Xen yet, as do_reserved_trap() is a catch-all handler.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Move "#" into vec_name() to skip it for the 3-character vectors
---
 xen/arch/x86/traps.c            | 26 +++++++++++++++++++++-----
 xen/include/asm-x86/processor.h |  6 +-----
 xen/include/asm-x86/x86-defns.h | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a8300c214d..427178e649 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -682,6 +682,22 @@ const char *trapstr(unsigned int trapnr)
     return trapnr < ARRAY_SIZE(strings) ? strings[trapnr] : "???";
 }
 
+static const char *vec_name(unsigned int vec)
+{
+    static const char names[][4] = {
+#define P(x) [X86_EXC_ ## x] = "#" #x
+#define N(x) [X86_EXC_ ## x] = #x
+        P(DE),  P(DB),  N(NMI), P(BP),  P(OF),  P(BR),  P(UD),  P(NM),
+        P(DF),  N(CSO), P(TS),  P(NP),  P(SS),  P(GP),  P(PF),  N(SPV),
+        P(MF),  P(AC),  P(MC),  P(XM),  P(VE),  P(CP),
+                                        P(HV),  P(VC),  P(SX),
+#undef N
+#undef P
+    };
+
+    return (vec < ARRAY_SIZE(names) && names[vec][0]) ? names[vec] : "???";
+}
+
 /*
  * This is called for faults at very unexpected times (e.g., when interrupts
  * are disabled). In such situations we can't do much that is safe. We try to
@@ -739,10 +755,9 @@ void fatal_trap(const struct cpu_user_regs *regs, bool show_remote)
         }
     }
 
-    panic("FATAL TRAP: vector = %d (%s)\n"
-          "[error_code=%04x] %s\n",
-          trapnr, trapstr(trapnr), regs->error_code,
-          (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
+    panic("FATAL TRAP: vec %u, %s[%04x]%s\n",
+          trapnr, vec_name(trapnr), regs->error_code,
+          (regs->eflags & X86_EFLAGS_IF) ? "" : " IN INTERRUPT CONTEXT");
 }
 
 static void do_reserved_trap(struct cpu_user_regs *regs)
@@ -753,7 +768,8 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
         return;
 
     show_execution_state(regs);
-    panic("FATAL RESERVED TRAP %#x: %s\n", trapnr, trapstr(trapnr));
+    panic("FATAL RESERVED TRAP: vec %u, %s[%04x]\n",
+          trapnr, vec_name(trapnr), regs->error_code);
 }
 
 static void do_trap(struct cpu_user_regs *regs)
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 070691882b..96deac73ed 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -43,11 +43,7 @@
 #define TRAP_virtualisation   20
 #define TRAP_nr               32
 
-#define TRAP_HAVE_EC                                                    \
-    ((1u << TRAP_double_fault) | (1u << TRAP_invalid_tss) |             \
-     (1u << TRAP_no_segment) | (1u << TRAP_stack_error) |               \
-     (1u << TRAP_gp_fault) | (1u << TRAP_page_fault) |                  \
-     (1u << TRAP_alignment_check))
+#define TRAP_HAVE_EC X86_EXC_HAVE_EC
 
 /* Set for entry via SYSCALL. Informs return code to use SYSRETQ not IRETQ. */
 /* NB. Same as VGCF_in_syscall. No bits in common with any other TRAP_ defn. */
diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h
index 8bf503220a..5366e2d018 100644
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -118,4 +118,38 @@
 
 #define X86_NR_VECTORS 256
 
+/* Exception Vectors */
+#define X86_EXC_DE             0 /* Divide Error. */
+#define X86_EXC_DB             1 /* Debug Exception. */
+#define X86_EXC_NMI            2 /* NMI. */
+#define X86_EXC_BP             3 /* Breakpoint. */
+#define X86_EXC_OF             4 /* Overflow. */
+#define X86_EXC_BR             5 /* BOUND Range. */
+#define X86_EXC_UD             6 /* Invalid Opcode. */
+#define X86_EXC_NM             7 /* Device Not Available. */
+#define X86_EXC_DF             8 /* Double Fault. */
+#define X86_EXC_CSO            9 /* Coprocessor Segment Overrun. */
+#define X86_EXC_TS            10 /* Invalid TSS. */
+#define X86_EXC_NP            11 /* Segment Not Present. */
+#define X86_EXC_SS            12 /* Stack-Segment Fault. */
+#define X86_EXC_GP            13 /* General Porection Fault. */
+#define X86_EXC_PF            14 /* Page Fault. */
+#define X86_EXC_SPV           15 /* PIC Spurious Interrupt Vector. */
+#define X86_EXC_MF            16 /* Maths fault (x87 FPU). */
+#define X86_EXC_AC            17 /* Alignment Check. */
+#define X86_EXC_MC            18 /* Machine Check. */
+#define X86_EXC_XM            19 /* SIMD Exception. */
+#define X86_EXC_VE            20 /* Virtualisation Exception. */
+#define X86_EXC_CP            21 /* Control-flow Protection. */
+#define X86_EXC_HV            28 /* Hypervisor Injection. */
+#define X86_EXC_VC            29 /* VMM Communication. */
+#define X86_EXC_SX            30 /* Security Exception. */
+
+/* Bitmap of exceptions which have error codes. */
+#define X86_EXC_HAVE_EC                                             \
+    ((1u << X86_EXC_DF) | (1u << X86_EXC_TS) | (1u << X86_EXC_NP) | \
+     (1u << X86_EXC_SS) | (1u << X86_EXC_GP) | (1u << X86_EXC_PF) | \
+     (1u << X86_EXC_AC) | (1u << X86_EXC_CP) |                      \
+     (1u << X86_EXC_VC) | (1u << X86_EXC_SX))
+
 #endif	/* __XEN_X86_DEFNS_H__ */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap() Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28  9:50   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support Andrew Cooper
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

UD faults never had any diagnostics printed, and the others were inconsistent.

Don't use dprintk() because identifying traps.c is actively unhelpful in the
message, as it is the location of the fixup, not the fault.  Use the new
vec_name() infrastructure, rather than leaving raw numbers for the log.

  (XEN) Running stub recovery selftests...
  (XEN) Fixup #UD[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
  (XEN) Fixup #GP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
  (XEN) Fixup #SS[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
  (XEN) Fixup #BP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Rebase
 * Rename to extable_fixup() to better distinguish from fixup_page_fault()
---
 xen/arch/x86/traps.c | 77 ++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 427178e649..eeb3e146ef 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -772,10 +772,31 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
           trapnr, vec_name(trapnr), regs->error_code);
 }
 
+static bool extable_fixup(struct cpu_user_regs *regs, bool print)
+{
+    unsigned long fixup = search_exception_table(regs);
+
+    if ( unlikely(fixup == 0) )
+        return false;
+
+    /*
+     * Don't use dprintk() because the __FILE__ reference is unhelpful.
+     * Can currently be triggered by guests.  Make sure we ratelimit.
+     */
+    if ( IS_ENABLED(CONFIG_DEBUG) && print )
+        printk(XENLOG_GUEST XENLOG_WARNING "Fixup %s[%04x]: %p [%ps] -> %p\n",
+               vec_name(regs->entry_vector), regs->error_code,
+               _p(regs->rip), _p(regs->rip), _p(fixup));
+
+    regs->rip = fixup;
+    this_cpu(last_extable_addr) = regs->rip;
+
+    return true;
+}
+
 static void do_trap(struct cpu_user_regs *regs)
 {
     unsigned int trapnr = regs->entry_vector;
-    unsigned long fixup;
 
     if ( regs->error_code & X86_XEC_EXT )
         goto hardware_trap;
@@ -793,14 +814,8 @@ static void do_trap(struct cpu_user_regs *regs)
         return;
     }
 
-    if ( likely((fixup = search_exception_table(regs)) != 0) )
-    {
-        dprintk(XENLOG_ERR, "Trap %u: %p [%ps] -> %p\n",
-                trapnr, _p(regs->rip), _p(regs->rip), _p(fixup));
-        this_cpu(last_extable_addr) = regs->rip;
-        regs->rip = fixup;
+    if ( likely(extable_fixup(regs, true)) )
         return;
-    }
 
  hardware_trap:
     if ( debugger_trap_fatal(trapnr, regs) )
@@ -1108,12 +1123,8 @@ void do_invalid_op(struct cpu_user_regs *regs)
     }
 
  die:
-    if ( (fixup = search_exception_table(regs)) != 0 )
-    {
-        this_cpu(last_extable_addr) = regs->rip;
-        regs->rip = fixup;
+    if ( likely(extable_fixup(regs, true)) )
         return;
-    }
 
     if ( debugger_trap_fatal(TRAP_invalid_op, regs) )
         return;
@@ -1129,16 +1140,8 @@ void do_int3(struct cpu_user_regs *regs)
 
     if ( !guest_mode(regs) )
     {
-        unsigned long fixup;
-
-        if ( (fixup = search_exception_table(regs)) != 0 )
-        {
-            this_cpu(last_extable_addr) = regs->rip;
-            dprintk(XENLOG_DEBUG, "Trap %u: %p [%ps] -> %p\n",
-                    TRAP_int3, _p(regs->rip), _p(regs->rip), _p(fixup));
-            regs->rip = fixup;
+        if ( likely(extable_fixup(regs, true)) )
             return;
-        }
 
         if ( !debugger_trap_fatal(TRAP_int3, regs) )
             printk(XENLOG_DEBUG "Hit embedded breakpoint at %p [%ps]\n",
@@ -1415,7 +1418,7 @@ static int fixup_page_fault(unsigned long addr, struct cpu_user_regs *regs)
 
 void do_page_fault(struct cpu_user_regs *regs)
 {
-    unsigned long addr, fixup;
+    unsigned long addr;
     unsigned int error_code;
 
     addr = read_cr2();
@@ -1461,11 +1464,9 @@ void do_page_fault(struct cpu_user_regs *regs)
         if ( pf_type != real_fault )
             return;
 
-        if ( likely((fixup = search_exception_table(regs)) != 0) )
+        if ( likely(extable_fixup(regs, false)) )
         {
             perfc_incr(copy_user_faults);
-            this_cpu(last_extable_addr) = regs->rip;
-            regs->rip = fixup;
             return;
         }
 
@@ -1521,7 +1522,6 @@ void do_general_protection(struct cpu_user_regs *regs)
 #ifdef CONFIG_PV
     struct vcpu *v = current;
 #endif
-    unsigned long fixup;
 
     if ( debugger_trap_entry(TRAP_gp_fault, regs) )
         return;
@@ -1588,14 +1588,8 @@ void do_general_protection(struct cpu_user_regs *regs)
 
  gp_in_kernel:
 
-    if ( likely((fixup = search_exception_table(regs)) != 0) )
-    {
-        dprintk(XENLOG_INFO, "GPF (%04x): %p [%ps] -> %p\n",
-                regs->error_code, _p(regs->rip), _p(regs->rip), _p(fixup));
-        this_cpu(last_extable_addr) = regs->rip;
-        regs->rip = fixup;
+    if ( likely(extable_fixup(regs, true)) )
         return;
-    }
 
  hardware_gp:
     if ( debugger_trap_fatal(TRAP_gp_fault, regs) )
@@ -1754,18 +1748,17 @@ void do_device_not_available(struct cpu_user_regs *regs)
 
     if ( !guest_mode(regs) )
     {
-        unsigned long fixup = search_exception_table(regs);
-
-        gprintk(XENLOG_ERR, "#NM: %p [%ps] -> %p\n",
-                _p(regs->rip), _p(regs->rip), _p(fixup));
         /*
          * We shouldn't be able to reach here, but for release builds have
          * the recovery logic in place nevertheless.
          */
-        ASSERT_UNREACHABLE();
-        BUG_ON(!fixup);
-        regs->rip = fixup;
-        return;
+        if ( extable_fixup(regs, true) )
+        {
+            ASSERT_UNREACHABLE();
+            return;
+        }
+
+        fatal_trap(regs, false);
     }
 
 #ifdef CONFIG_PV
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap() Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 10:25   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks Andrew Cooper
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Introduce CONFIG_HAS_AS_CET to determine whether CET instructions are
supported in the assembler, and CONFIG_XEN_SHSTK as the main build option.

Introduce cet={no-,}shstk to for a user to select whether or not to use shadow
stacks at runtime, and X86_FEATURE_XEN_SHSTK to determine Xen's overall
enablement of shadow stacks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

LLVM 6 supports CET-SS instructions while only LLVM 7 supports CET-IBT
instructions.  We'd need to split HAS_AS_CET into two if we want to support
supervisor shadow stacks with LLVM 6.  (This demonstrates exactly why picking
a handful of instructions to test is the right approach.)

v2:
 * Leave a comment identifying minimum toolchain support, to make it easier to
   remove ifdefary in the future when bumping minima.
 * Reindent CONFIG_XEN_SHSTK help text.
 * Rename xen= to cet=.  Add documentation, __init.
---
 docs/misc/xen-command-line.pandoc | 17 +++++++++++++++++
 xen/arch/x86/Kconfig              | 18 ++++++++++++++++++
 xen/arch/x86/setup.c              | 30 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h  |  1 +
 xen/include/asm-x86/cpufeatures.h |  1 +
 xen/scripts/Kconfig.include       |  4 ++++
 6 files changed, 71 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index e16bb90184..d4934eabb7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -270,6 +270,23 @@ and not running softirqs. Reduce this if softirqs are not being run frequently
 enough. Setting this to a high value may cause boot failure, particularly if
 the NMI watchdog is also enabled.
 
+### cet
+    = List of [ shstk=<bool> ]
+
+    Applicability: x86
+
+Controls for the use of Control-flow Enforcement Technology.  CET is group of
+hardware features designed to combat Return-oriented Programming (ROP, also
+call/jmp COP/JOP) attacks.
+
+*   The `shstk=` boolean controls whether Xen uses Shadow Stacks for its own
+    protection.
+
+    The option is available when `CONFIG_XEN_SHSTK` is compiled in, and
+    defaults to `true` on hardware supporting CET-SS.  Specifying
+    `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
+    is available in hardware.
+
 ### clocksource (x86)
 > `= pit | hpet | acpi | tsc`
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index b565f6831d..304a42ffb2 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
 config INDIRECT_THUNK
 	def_bool $(cc-option,-mindirect-branch-register)
 
+config HAS_AS_CET
+	# binutils >= 2.29 and LLVM >= 7
+	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)
+
 menu "Architecture Features"
 
 source "arch/Kconfig"
@@ -97,6 +101,20 @@ config HVM
 
 	  If unsure, say Y.
 
+config XEN_SHSTK
+	bool "Supervisor Shadow Stacks"
+	depends on HAS_AS_CET && EXPERT = "y"
+	default y
+	---help---
+	  Control-flow Enforcement Technology (CET) is a set of features in
+	  hardware designed to combat Return-oriented Programming (ROP, also
+	  call/jump COP/JOP) attacks.  Shadow Stacks are one CET feature
+	  designed to provide return address protection.
+
+	  This option arranges for Xen to use CET-SS for its own protection.
+	  When CET-SS is active, 32bit PV guests cannot be used.  Backwards
+	  compatiblity can be provided vai the PV Shim mechanism.
+
 config SHADOW_PAGING
         bool "Shadow Paging"
         default y
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2dec7a3fc6..584589baff 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -95,6 +95,36 @@ unsigned long __initdata highmem_start;
 size_param("highmem-start", highmem_start);
 #endif
 
+static bool __initdata opt_xen_shstk = true;
+
+static int __init parse_cet(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        if ( (val = parse_boolean("shstk", s, ss)) >= 0 )
+        {
+#ifdef CONFIG_XEN_SHSTK
+            opt_xen_shstk = val;
+#else
+            no_config_param("XEN_SHSTK", "cet", s, ss);
+#endif
+        }
+        else
+            rc = -EINVAL;
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("cet", parse_cet);
+
 cpumask_t __read_mostly cpu_present_map;
 
 unsigned long __read_mostly xen_phys_start;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index cadef4e824..b831448eba 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -137,6 +137,7 @@
 #define cpu_has_aperfmperf      boot_cpu_has(X86_FEATURE_APERFMPERF)
 #define cpu_has_lfence_dispatch boot_cpu_has(X86_FEATURE_LFENCE_DISPATCH)
 #define cpu_has_xen_lbr         boot_cpu_has(X86_FEATURE_XEN_LBR)
+#define cpu_has_xen_shstk       boot_cpu_has(X86_FEATURE_XEN_SHSTK)
 
 #define cpu_has_msr_tsc_aux     (cpu_has_rdtscp || cpu_has_rdpid)
 
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index b9d3cac975..d7e42d9bb6 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -38,6 +38,7 @@ XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 XEN_CPUFEATURE(SC_VERW_PV,        X86_SYNTH(23)) /* VERW used by Xen for PV */
 XEN_CPUFEATURE(SC_VERW_HVM,       X86_SYNTH(24)) /* VERW used by Xen for HVM */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
+XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/scripts/Kconfig.include b/xen/scripts/Kconfig.include
index 8221095ca3..e1f13e1720 100644
--- a/xen/scripts/Kconfig.include
+++ b/xen/scripts/Kconfig.include
@@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
 # Return y if the linker supports <flag>, n otherwise
 ld-option = $(success,$(LD) -v $(1))
 
+# $(as-instr,<instr>)
+# Return y if the assembler supports <instr>, n otherwise
+as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
+
 # check if $(CC) and $(LD) exist
 $(error-if,$(failure,command -v $(CC)),compiler '$(CC)' not found)
 $(error-if,$(failure,command -v $(LD)),linker '$(LD)' not found)
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (2 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 12:03   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 05/14] x86/shstk: Re-layout the stack block " Andrew Cooper
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

For now, any #CP exception or shadow stack #PF indicate a bug in Xen, but
attempt to recover from #CP if taken in guest context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Rebase over #PF[Rsvd] rework.
 * Alignment for PFEC_shstk.
 * Use more X86_EXC_* names.
---
 xen/arch/x86/traps.c            | 46 +++++++++++++++++++++++++++++++++++++++--
 xen/arch/x86/x86_64/entry.S     |  7 ++++++-
 xen/include/asm-x86/processor.h |  2 ++
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index eeb3e146ef..90da787ee2 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -156,7 +156,9 @@ void (* const exception_table[TRAP_nr])(struct cpu_user_regs *regs) = {
     [TRAP_alignment_check]              = do_trap,
     [TRAP_machine_check]                = (void *)do_machine_check,
     [TRAP_simd_error]                   = do_trap,
-    [TRAP_virtualisation ...
+    [TRAP_virtualisation]               = do_reserved_trap,
+    [X86_EXC_CP]                        = do_entry_CP,
+    [X86_EXC_CP + 1 ...
      (ARRAY_SIZE(exception_table) - 1)] = do_reserved_trap,
 };
 
@@ -1445,8 +1447,10 @@ void do_page_fault(struct cpu_user_regs *regs)
      *
      * Anything remaining is an error, constituting corruption of the
      * pagetables and probably an L1TF vulnerable gadget.
+     *
+     * Any shadow stack access fault is a bug in Xen.
      */
-    if ( error_code & PFEC_reserved_bit )
+    if ( error_code & (PFEC_reserved_bit | PFEC_shstk) )
         goto fatal;
 
     if ( unlikely(!guest_mode(regs)) )
@@ -1898,6 +1902,43 @@ void do_debug(struct cpu_user_regs *regs)
     pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 }
 
+void do_entry_CP(struct cpu_user_regs *regs)
+{
+    static const char errors[][10] = {
+        [1] = "near ret",
+        [2] = "far/iret",
+        [3] = "endbranch",
+        [4] = "rstorssp",
+        [5] = "setssbsy",
+    };
+    const char *err = "??";
+    unsigned int ec = regs->error_code;
+
+    if ( debugger_trap_entry(TRAP_debug, regs) )
+        return;
+
+    /* Decode ec if possible */
+    if ( ec < ARRAY_SIZE(errors) && errors[ec][0] )
+        err = errors[ec];
+
+    /*
+     * For now, only supervisors shadow stacks should be active.  A #CP from
+     * guest context is probably a Xen bug, but kill the guest in an attempt
+     * to recover.
+     */
+    if ( guest_mode(regs) )
+    {
+        gprintk(XENLOG_ERR, "Hit #CP[%04x] in guest context %04x:%p\n",
+                ec, regs->cs, _p(regs->rip));
+        ASSERT_UNREACHABLE();
+        domain_crash(current->domain);
+        return;
+    }
+
+    show_execution_state(regs);
+    panic("CONTROL-FLOW PROTECTION FAULT: #CP[%04x] %s\n", ec, err);
+}
+
 static void __init noinline __set_intr_gate(unsigned int n,
                                             uint32_t dpl, void *addr)
 {
@@ -1987,6 +2028,7 @@ void __init init_idt_traps(void)
     set_intr_gate(TRAP_alignment_check,&alignment_check);
     set_intr_gate(TRAP_machine_check,&machine_check);
     set_intr_gate(TRAP_simd_error,&simd_coprocessor_error);
+    set_intr_gate(X86_EXC_CP, entry_CP);
 
     /* Specify dedicated interrupt stacks for NMI, #DF, and #MC. */
     enable_each_ist(idt_table);
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index d55453f3f3..f7ee3dce91 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -795,6 +795,10 @@ ENTRY(alignment_check)
         movl  $TRAP_alignment_check,4(%rsp)
         jmp   handle_exception
 
+ENTRY(entry_CP)
+        movl  $X86_EXC_CP, 4(%rsp)
+        jmp   handle_exception
+
 ENTRY(double_fault)
         movl  $TRAP_double_fault,4(%rsp)
         /* Set AC to reduce chance of further SMAP faults */
@@ -940,7 +944,8 @@ autogen_stubs: /* Automatically generated stubs. */
         entrypoint 1b
 
         /* Reserved exceptions, heading towards do_reserved_trap(). */
-        .elseif vec == TRAP_copro_seg || vec == TRAP_spurious_int || (vec > TRAP_simd_error && vec < TRAP_nr)
+        .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
+                vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)
 
 1:      test  $8,%spl        /* 64bit exception frames are 16 byte aligned, but the word */
         jz    2f             /* size is 8 bytes.  Check whether the processor gave us an */
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 96deac73ed..c2b9dc1ac0 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -68,6 +68,7 @@
 #define PFEC_reserved_bit   (_AC(1,U) << 3)
 #define PFEC_insn_fetch     (_AC(1,U) << 4)
 #define PFEC_prot_key       (_AC(1,U) << 5)
+#define PFEC_shstk          (_AC(1,U) << 6)
 #define PFEC_arch_mask      (_AC(0xffff,U)) /* Architectural PFEC values. */
 /* Internally used only flags. */
 #define PFEC_page_paged     (1U<<16)
@@ -530,6 +531,7 @@ DECLARE_TRAP_HANDLER(coprocessor_error);
 DECLARE_TRAP_HANDLER(simd_coprocessor_error);
 DECLARE_TRAP_HANDLER_CONST(machine_check);
 DECLARE_TRAP_HANDLER(alignment_check);
+DECLARE_TRAP_HANDLER(entry_CP);
 
 DECLARE_TRAP_HANDLER(entry_int82);
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 05/14] x86/shstk: Re-layout the stack block for shadow stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (3 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 12:33   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 06/14] x86/shstk: Create " Andrew Cooper
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

We have two free pages in the current stack.  A useful property of shadow
stacks and regular stacks is that they act as each others guard pages as far
as OoB writes go.

Move the regular IST stacks up by one page, to allow their shadow stack page
to be in slot 0.  The primary shadow stack uses slot 5.

As the shadow IST stacks are only 1k large, shuffle the order of IST vectors
to have #DF numerically highest (so there is no chance of a shadow stack
overflow clobbering the supervisor token).

The XPTI code already breaks the MEMORY_GUARD abstraction for stacks by
forcing it to be present.  To avoid having too many configurations, do away
with the concept entirely, and unconditionally unmap the pages in all cases.

A later change will turn these properly into shadow stacks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Adjust text
 * Introduce PRIMARY_SHSTK_SLOT (Name subject to improvement).
---
 xen/arch/x86/cpu/common.c       | 10 +++++-----
 xen/arch/x86/mm.c               | 19 ++++++-------------
 xen/arch/x86/smpboot.c          |  3 +--
 xen/arch/x86/traps.c            | 23 ++++++-----------------
 xen/include/asm-x86/config.h    |  3 +++
 xen/include/asm-x86/current.h   | 12 ++++++------
 xen/include/asm-x86/mm.h        |  1 -
 xen/include/asm-x86/processor.h |  6 +++---
 8 files changed, 30 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 09b911b3ba..690fd8baa8 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -753,14 +753,14 @@ void load_system_tables(void)
 	 * valid on every instruction boundary.  (Note: these are all
 	 * semantically ACCESS_ONCE() due to tss's volatile qualifier.)
 	 *
-	 * rsp0 refers to the primary stack.  #MC, #DF, NMI and #DB handlers
+	 * rsp0 refers to the primary stack.  #MC, NMI, #DB and #DF handlers
 	 * each get their own stacks.  No IO Bitmap.
 	 */
 	tss->rsp0 = stack_bottom;
-	tss->ist[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE;
-	tss->ist[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE;
-	tss->ist[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE;
-	tss->ist[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE;
+	tss->ist[IST_MCE - 1] = stack_top + (1 + IST_MCE) * PAGE_SIZE;
+	tss->ist[IST_NMI - 1] = stack_top + (1 + IST_NMI) * PAGE_SIZE;
+	tss->ist[IST_DB  - 1] = stack_top + (1 + IST_DB)  * PAGE_SIZE;
+	tss->ist[IST_DF  - 1] = stack_top + (1 + IST_DF)  * PAGE_SIZE;
 	tss->bitmap = IOBMP_INVALID_OFFSET;
 
 	/* All other stack pointers poisioned. */
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e42044eb74..2f1e716b6d 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5996,25 +5996,18 @@ void memguard_unguard_range(void *p, unsigned long l)
 
 void memguard_guard_stack(void *p)
 {
-    /* IST_MAX IST pages + at least 1 guard page + primary stack. */
-    BUILD_BUG_ON((IST_MAX + 1) * PAGE_SIZE + PRIMARY_STACK_SIZE > STACK_SIZE);
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
 
-    memguard_guard_range(p + IST_MAX * PAGE_SIZE,
-                         STACK_SIZE - PRIMARY_STACK_SIZE - IST_MAX * PAGE_SIZE);
+    p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
 }
 
 void memguard_unguard_stack(void *p)
 {
-    memguard_unguard_range(p + IST_MAX * PAGE_SIZE,
-                           STACK_SIZE - PRIMARY_STACK_SIZE - IST_MAX * PAGE_SIZE);
-}
-
-bool memguard_is_stack_guard_page(unsigned long addr)
-{
-    addr &= STACK_SIZE - 1;
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_RW);
 
-    return addr >= IST_MAX * PAGE_SIZE &&
-           addr < STACK_SIZE - PRIMARY_STACK_SIZE;
+    p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_RW);
 }
 
 void arch_dump_shared_mem_info(void)
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 170ab24e66..13b3dade9c 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -823,8 +823,7 @@ static int setup_cpu_root_pgt(unsigned int cpu)
 
     /* Install direct map page table entries for stack, IDT, and TSS. */
     for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
-        if ( !memguard_is_stack_guard_page(off) )
-            rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
+        rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
 
     if ( !rc )
         rc = clone_mapping(idt_tables[cpu], rpt);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 90da787ee2..235a72cf4a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -365,20 +365,15 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0 - 3:
+ * Stack pages 1 - 4:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 4 and 5:
- *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
- *   explicitly not present, so attempting to dump or trace it is
- *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
- *   to use the entire primary stack and wander into page 5.  In this case,
- *   consider these pages an extension of the primary stack to aid debugging
- *   hopefully rare situations where the primary stack has effective been
- *   overflown.
+ * Stack pages 0 and 5:
+ *   Shadow stacks.  These are mapped read-only, and used by CET-SS capable
+ *   processors.  They will never contain regular stack data.
  *
  * Stack pages 6 and 7:
  *   These form the primary stack, and have a cpu_info at the top.  For a
@@ -392,13 +387,10 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 3:
+    case 1 ... 4:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
-#ifndef MEMORY_GUARD
-    case 4 ... 5:
-#endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
             sizeof(struct cpu_info) - sizeof(unsigned long);
@@ -412,12 +404,9 @@ unsigned long get_stack_dump_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 3:
+    case 1 ... 4:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
-#ifndef MEMORY_GUARD
-    case 4 ... 5:
-#endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
 
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 266d281718..f3cf5df462 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -75,6 +75,9 @@
 /* Primary stack is restricted to 8kB by guard pages. */
 #define PRIMARY_STACK_SIZE 8192
 
+/* Primary shadow stack is slot 5 of 8, immediately under the primary stack. */
+#define PRIMARY_SHSTK_SLOT 5
+
 /* Total size of syscall and emulation stubs. */
 #define STUB_BUF_SHIFT (L1_CACHE_SHIFT > 7 ? L1_CACHE_SHIFT : 7)
 #define STUB_BUF_SIZE  (1 << STUB_BUF_SHIFT)
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5b8f4dbc79..99b66a0087 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -16,12 +16,12 @@
  *
  * 7 - Primary stack (with a struct cpu_info at the top)
  * 6 - Primary stack
- * 5 - Optionally not present (MEMORY_GUARD)
- * 4 - Unused; optionally not present (MEMORY_GUARD)
- * 3 - Unused; optionally not present (MEMORY_GUARD)
- * 2 - MCE IST stack
- * 1 - NMI IST stack
- * 0 - Double Fault IST stack
+ * 5 - Primay Shadow Stack (read-only)
+ * 4 - #DF IST stack
+ * 3 - #DB IST stack
+ * 2 - NMI IST stack
+ * 1 - #MC IST stack
+ * 0 - IST Shadow Stacks (4x 1k, read-only)
  */
 
 /*
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3d3f9d49ac..7e74996053 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -536,7 +536,6 @@ void memguard_unguard_range(void *p, unsigned long l);
 
 void memguard_guard_stack(void *p);
 void memguard_unguard_stack(void *p);
-bool __attribute_const__ memguard_is_stack_guard_page(unsigned long addr);
 
 struct mmio_ro_emulate_ctxt {
         unsigned long cr2;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index c2b9dc1ac0..8ab09cf7ed 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -440,10 +440,10 @@ struct tss_page {
 DECLARE_PER_CPU(struct tss_page, tss_page);
 
 #define IST_NONE 0UL
-#define IST_DF   1UL
+#define IST_MCE  1UL
 #define IST_NMI  2UL
-#define IST_MCE  3UL
-#define IST_DB   4UL
+#define IST_DB   3UL
+#define IST_DF   4UL
 #define IST_MAX  4UL
 
 /* Set the Interrupt Stack Table used by a particular IDT entry. */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (4 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 05/14] x86/shstk: Re-layout the stack block " Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 12:50   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 07/14] x86/cpu: Adjust enable_nmis() to be shadow stack compatible Andrew Cooper
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Introduce HYPERVISOR_SHSTK pagetable constants, which are Read-Only + Dirty.
Use these in place of _PAGE_NONE for memguard_guard_stack().

Supervisor shadow stacks need a token written at the top, which is most easily
done before making the frame read only.

Allocate the shadow IST stack block in struct tss_page.  It doesn't strictly
need to live here, but it is a convenient location (and XPTI-safe, for testing
purposes), and placing it ahead of the TSS doesn't risk colliding with a bad
IO Bitmap offset and turning into some IO port permissions.

Have load_system_tables() set up the shadow IST stack table when setting up
the regular IST in the TSS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Introduce IST_SHSTK_SIZE (Name subject to improvement).
 * Skip writing the shadow stack token for !XEN_SHSTK builds.
 * Tweak clobbering to be correct and safe.
---
 xen/arch/x86/cpu/common.c         | 24 ++++++++++++++++++++++++
 xen/arch/x86/mm.c                 | 25 +++++++++++++++++++++++--
 xen/include/asm-x86/config.h      |  2 ++
 xen/include/asm-x86/page.h        |  1 +
 xen/include/asm-x86/processor.h   |  3 ++-
 xen/include/asm-x86/x86_64/page.h |  1 +
 6 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 690fd8baa8..dcc9ee08de 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -769,6 +769,30 @@ void load_system_tables(void)
 	tss->rsp1 = 0x8600111111111111ul;
 	tss->rsp2 = 0x8600111111111111ul;
 
+	/* Set up the shadow stack IST. */
+	if (cpu_has_xen_shstk) {
+		volatile uint64_t *ist_ssp = this_cpu(tss_page).ist_ssp;
+
+		/*
+		 * Used entries must point at the supervisor stack token.
+		 * Unused entries are poisoned.
+		 *
+		 * This IST Table may be live, and the NMI/#MC entries must
+		 * remain valid on every instruction boundary, hence the
+		 * volatile qualifier.
+		 */
+		ist_ssp[0] = 0x8600111111111111ul;
+		ist_ssp[IST_MCE] = stack_top + (IST_MCE * IST_SHSTK_SIZE) - 8;
+		ist_ssp[IST_NMI] = stack_top + (IST_NMI * IST_SHSTK_SIZE) - 8;
+		ist_ssp[IST_DB]	 = stack_top + (IST_DB	* IST_SHSTK_SIZE) - 8;
+		ist_ssp[IST_DF]	 = stack_top + (IST_DF	* IST_SHSTK_SIZE) - 8;
+		for ( i = IST_DF + 1;
+		      i < ARRAY_SIZE(this_cpu(tss_page).ist_ssp); ++i )
+			ist_ssp[i] = 0x8600111111111111ul;
+
+		wrmsrl(MSR_INTERRUPT_SSP_TABLE, (unsigned long)ist_ssp);
+	}
+
 	BUILD_BUG_ON(sizeof(*tss) <= 0x67); /* Mandated by the architecture. */
 
 	_set_tssldt_desc(gdt + TSS_ENTRY, (unsigned long)tss,
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2f1e716b6d..4d6d22cc41 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5994,12 +5994,33 @@ void memguard_unguard_range(void *p, unsigned long l)
 
 #endif
 
+static void write_sss_token(unsigned long *ptr)
+{
+    /*
+     * A supervisor shadow stack token is its own linear address, with the
+     * busy bit (0) clear.
+     */
+    *ptr = (unsigned long)ptr;
+}
+
 void memguard_guard_stack(void *p)
 {
-    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
+    /* IST Shadow stacks.  4x 1k in stack page 0. */
+    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
+    {
+        write_sss_token(p + (IST_MCE * IST_SHSTK_SIZE) - 8);
+        write_sss_token(p + (IST_NMI * IST_SHSTK_SIZE) - 8);
+        write_sss_token(p + (IST_DB  * IST_SHSTK_SIZE) - 8);
+        write_sss_token(p + (IST_DF  * IST_SHSTK_SIZE) - 8);
+    }
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
 
+    /* Primary Shadow Stack.  1x 4k in stack page 5. */
     p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
-    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
+    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
+        write_sss_token(p + PAGE_SIZE - 8);
+
+    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
 }
 
 void memguard_unguard_stack(void *p)
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index f3cf5df462..2ba234383d 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -66,6 +66,8 @@
 #define STACK_ORDER 3
 #define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)
 
+#define IST_SHSTK_SIZE 1024
+
 #define TRAMPOLINE_STACK_SPACE  PAGE_SIZE
 #define TRAMPOLINE_SPACE        (KB(64) - TRAMPOLINE_STACK_SPACE)
 #define WAKEUP_STACK_MIN        3072
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 5acf3d3d5a..f632affaef 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -364,6 +364,7 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
                                    _PAGE_DIRTY | _PAGE_RW)
 #define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_PCD)
 #define __PAGE_HYPERVISOR_UC      (__PAGE_HYPERVISOR | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_HYPERVISOR_SHSTK   (__PAGE_HYPERVISOR_RO | _PAGE_DIRTY)
 
 #define MAP_SMALL_PAGES _PAGE_AVAIL0 /* don't use superpages mappings */
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 8ab09cf7ed..859bd9e2ec 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -435,7 +435,8 @@ struct __packed tss64 {
     uint16_t :16, bitmap;
 };
 struct tss_page {
-    struct tss64 __aligned(PAGE_SIZE) tss;
+    uint64_t __aligned(PAGE_SIZE) ist_ssp[8];
+    struct tss64 tss;
 };
 DECLARE_PER_CPU(struct tss_page, tss_page);
 
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 9876634881..26621f9519 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -171,6 +171,7 @@ static inline intpte_t put_pte_flags(unsigned int x)
 #define PAGE_HYPERVISOR_RW      (__PAGE_HYPERVISOR_RW      | _PAGE_GLOBAL)
 #define PAGE_HYPERVISOR_RX      (__PAGE_HYPERVISOR_RX      | _PAGE_GLOBAL)
 #define PAGE_HYPERVISOR_RWX     (__PAGE_HYPERVISOR         | _PAGE_GLOBAL)
+#define PAGE_HYPERVISOR_SHSTK   (__PAGE_HYPERVISOR_SHSTK   | _PAGE_GLOBAL)
 
 #define PAGE_HYPERVISOR         PAGE_HYPERVISOR_RW
 #define PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR_UCMINUS | \
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 07/14] x86/cpu: Adjust enable_nmis() to be shadow stack compatible
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (5 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 06/14] x86/shstk: Create " Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() " Andrew Cooper
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

When executing an IRET-to-self, the shadow stack must agree with the regular
stack.  We can't manipulate SSP directly, so have to fake a shadow IRET frame
by executing 3 CALLs, then editing the result to look correct.

This is not a fastpath, is called on the BSP long before CET can be set up,
and may be called on the crash path after CET is disabled.  Use the fact that
INCSSP is allocated from the hint nop space to construct a test for CET being
active which is safe on all processors.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/include/asm-x86/processor.h | 43 +++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 859bd9e2ec..badd7e60e5 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -545,17 +545,40 @@ static inline void enable_nmis(void)
 {
     unsigned long tmp;
 
-    asm volatile ( "mov %%rsp, %[tmp]     \n\t"
-                   "push %[ss]            \n\t"
-                   "push %[tmp]           \n\t"
-                   "pushf                 \n\t"
-                   "push %[cs]            \n\t"
-                   "lea 1f(%%rip), %[tmp] \n\t"
-                   "push %[tmp]           \n\t"
-                   "iretq; 1:             \n\t"
-                   : [tmp] "=&r" (tmp)
+    asm volatile ( "mov     %%rsp, %[rsp]        \n\t"
+                   "lea    .Ldone(%%rip), %[rip] \n\t"
+#ifdef CONFIG_XEN_SHSTK
+                   /* Check for CET-SS being active. */
+                   "mov    $1, %k[ssp]           \n\t"
+                   "rdsspq %[ssp]                \n\t"
+                   "cmp    $1, %k[ssp]           \n\t"
+                   "je     .Lshstk_done          \n\t"
+
+                   /* Push 3 words on the shadow stack */
+                   ".rept 3                      \n\t"
+                   "call 1f; nop; 1:             \n\t"
+                   ".endr                        \n\t"
+
+                   /* Fixup to be an IRET shadow stack frame */
+                   "wrssq  %q[cs], -1*8(%[ssp])  \n\t"
+                   "wrssq  %[rip], -2*8(%[ssp])  \n\t"
+                   "wrssq  %[ssp], -3*8(%[ssp])  \n\t"
+
+                   ".Lshstk_done:"
+#endif
+                   /* Write an IRET regular frame */
+                   "push   %[ss]                 \n\t"
+                   "push   %[rsp]                \n\t"
+                   "pushf                        \n\t"
+                   "push   %q[cs]                \n\t"
+                   "push   %[rip]                \n\t"
+                   "iretq                        \n\t"
+                   ".Ldone:                      \n\t"
+                   : [rip] "=&r" (tmp),
+                     [rsp] "=&r" (tmp),
+                     [ssp] "=&r" (tmp)
                    : [ss] "i" (__HYPERVISOR_DS),
-                     [cs] "i" (__HYPERVISOR_CS) );
+                     [cs] "r" (__HYPERVISOR_CS) );
 }
 
 void sysenter_entry(void);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() to be shadow stack compatible
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (6 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 07/14] x86/cpu: Adjust enable_nmis() to be shadow stack compatible Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 14:41   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 09/14] x86/spec-ctrl: Adjust DO_OVERWRITE_RSB " Andrew Cooper
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

We need to unwind up to the supervisor token.  See the comment for details.

The use of UNLIKELY_END_SECTION in this case highlights that it isn't safe
when it isn't the final statement of an asm().  Adjust all declarations with a
newline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Drop 'cmc' which was stray debugging.
 * Replace raw numbers with defines.
 * Use a real BUG frame in .fixup, to get static branch preduction working the
   right way around.
---
 xen/include/asm-x86/asm_defns.h |  8 +++----
 xen/include/asm-x86/current.h   | 48 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index b42a19b654..035708adac 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -177,13 +177,13 @@ register unsigned long current_stack_pointer asm("rsp");
 
 #ifdef __clang__ /* clang's builtin assember can't do .subsection */
 
-#define UNLIKELY_START_SECTION ".pushsection .text.unlikely,\"ax\""
-#define UNLIKELY_END_SECTION   ".popsection"
+#define UNLIKELY_START_SECTION ".pushsection .text.unlikely,\"ax\"\n\t"
+#define UNLIKELY_END_SECTION   ".popsection\n\t"
 
 #else
 
-#define UNLIKELY_START_SECTION ".subsection 1"
-#define UNLIKELY_END_SECTION   ".subsection 0"
+#define UNLIKELY_START_SECTION ".subsection 1\n\t"
+#define UNLIKELY_END_SECTION   ".subsection 0\n\t"
 
 #endif
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 99b66a0087..086326b81a 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -124,13 +124,55 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
+#ifdef CONFIG_XEN_SHSTK
+/*
+ * We need to unwind the primary shadow stack to its supervisor token, located
+ * at 0x5ff8 from the base of the stack blocks.
+ *
+ * Read the shadow stack pointer, subtract it from 0x5ff8, divide by 8 to get
+ * the number of slots needing popping.
+ *
+ * INCSSPQ can't pop more than 255 entries.  We shouldn't ever need to pop
+ * that many entries, and getting this wrong will cause us to #DF later.  Turn
+ * it into a BUG() now for fractionally easier debugging.
+ */
+# define SHADOW_STACK_WORK                                      \
+    "mov $1, %[ssp];"                                           \
+    "rdsspd %[ssp];"                                            \
+    "cmp $1, %[ssp];"                                           \
+    "je .L_shstk_done.%=;" /* CET not active?  Skip. */         \
+    "mov $%c[skstk_base], %[val];"                              \
+    "and $%c[stack_mask], %[ssp];"                              \
+    "sub %[ssp], %[val];"                                       \
+    "shr $3, %[val];"                                           \
+    "cmp $255, %[val];" /* More than 255 entries?  Crash. */    \
+    UNLIKELY_START(a, shstk_adjust)                             \
+    _ASM_BUGFRAME_TEXT(0)                                       \
+    UNLIKELY_END_SECTION                                        \
+    "incsspq %q[val];"                                          \
+    ".L_shstk_done.%=:"
+#else
+# define SHADOW_STACK_WORK ""
+#endif
+
 #define switch_stack_and_jump(fn, instr)                                \
     ({                                                                  \
+        unsigned int tmp;                                               \
         __asm__ __volatile__ (                                          \
-            "mov %0,%%"__OP"sp;"                                        \
+            SHADOW_STACK_WORK                                           \
+            "mov %[stk], %%rsp;"                                        \
             instr                                                       \
-             "jmp %c1"                                                  \
-            : : "r" (guest_cpu_user_regs()), "i" (fn) : "memory" );     \
+            "jmp %c[fun];"                                              \
+            : [val] "=&r" (tmp),                                        \
+              [ssp] "=&r" (tmp)                                         \
+            : [stk] "r" (guest_cpu_user_regs()),                        \
+              [fun] "i" (fn),                                           \
+              [skstk_base] "i"                                          \
+              ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8),               \
+              [stack_mask] "i" (STACK_SIZE - 1),                        \
+              _ASM_BUGFRAME_INFO(BUGFRAME_bug, __LINE__,                \
+                                 __FILE__, NULL)                        \
+            : "memory" );                                               \
         unreachable();                                                  \
     })
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 09/14] x86/spec-ctrl: Adjust DO_OVERWRITE_RSB to be shadow stack compatible
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (7 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() " Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-27 19:18 ` [PATCH v2 10/14] x86/extable: Adjust extable handling " Andrew Cooper
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The 32 calls need dropping from the shadow stack as well as the regular stack.
To shorten the code, we can use the 32bit forms of RDSSP/INCSSP, but need to
double up the input to INCSSP to counter the operand size based multiplier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/include/asm-x86/spec_ctrl_asm.h | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index c60093b090..cb34299a86 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -83,9 +83,9 @@
  * Requires nothing
  * Clobbers \tmp (%rax by default), %rcx
  *
- * Requires 256 bytes of stack space, but %rsp has no net change. Based on
- * Google's performance numbers, the loop is unrolled to 16 iterations and two
- * calls per iteration.
+ * Requires 256 bytes of {,shadow}stack space, but %rsp/SSP has no net
+ * change. Based on Google's performance numbers, the loop is unrolled to 16
+ * iterations and two calls per iteration.
  *
  * The call filling the RSB needs a nonzero displacement.  A nop would do, but
  * we use "1: pause; lfence; jmp 1b" to safely contains any ret-based
@@ -114,6 +114,16 @@
     sub $1, %ecx
     jnz .L\@_fill_rsb_loop
     mov %\tmp, %rsp                 /* Restore old %rsp */
+
+#ifdef CONFIG_XEN_SHSTK
+    mov $1, %ecx
+    rdsspd %ecx
+    cmp $1, %ecx
+    je .L\@_shstk_done
+    mov $64, %ecx                   /* 64 * 4 bytes, given incsspd */
+    incsspd %ecx                    /* Restore old SSP */
+.L\@_shstk_done:
+#endif
 .endm
 
 .macro DO_SPEC_CTRL_ENTRY_FROM_HVM
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (8 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 09/14] x86/spec-ctrl: Adjust DO_OVERWRITE_RSB " Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-28 16:15   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks Andrew Cooper
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

When adjusting an IRET frame to recover from a fault, and equivalent
adjustment needs making in the shadow IRET frame.

The adjustment in exception_with_ints_disabled() could in principle be an
alternative block rather than an ifdef, as the only two current users of
_PRE_EXTABLE() are IRET-to-guest instructions.  However, this is not a
fastpath, and this form is more robust to future changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Break extable_shstk_fixup() out into a separate function.
 * Guard from shstk underflows, and unrealistic call traces.
---
 xen/arch/x86/traps.c        | 67 ++++++++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/x86_64/entry.S | 11 +++++++-
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 235a72cf4a..ce910294ea 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -363,7 +363,7 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 }
 
 /*
- * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
+ * Notes for get_{stack,shstk}*_bottom() helpers
  *
  * Stack pages 1 - 4:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
@@ -400,6 +400,18 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
     }
 }
 
+static unsigned long get_shstk_bottom(unsigned long sp)
+{
+    switch ( get_stack_page(sp) )
+    {
+#ifdef CONFIG_XEN_SHSTK
+    case 0:  return ROUNDUP(sp, IST_SHSTK_SIZE) - sizeof(unsigned long);
+    case 5:  return ROUNDUP(sp, PAGE_SIZE)      - sizeof(unsigned long);
+#endif
+    default: return sp - sizeof(unsigned long);
+    }
+}
+
 unsigned long get_stack_dump_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
@@ -763,6 +775,56 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
           trapnr, vec_name(trapnr), regs->error_code);
 }
 
+static void extable_shstk_fixup(struct cpu_user_regs *regs, unsigned long fixup)
+{
+    unsigned long ssp, *ptr, *base;
+
+    asm ( "rdsspq %0" : "=r" (ssp) : "0" (1) );
+    if ( ssp == 1 )
+        return;
+
+    ptr = _p(ssp);
+    base = _p(get_shstk_bottom(ssp));
+
+    for ( ; ptr < base; ++ptr )
+    {
+        /*
+         * Search for %rip.  The shstk currently looks like this:
+         *
+         *   ...  [Likely pointed to by SSP]
+         *   %cs  [== regs->cs]
+         *   %rip [== regs->rip]
+         *   SSP  [Likely points to 3 slots higher, above %cs]
+         *   ...  [call tree to this function, likely 2/3 slots]
+         *
+         * and we want to overwrite %rip with fixup.  There are two
+         * complications:
+         *   1) We cant depend on SSP values, because they won't differ by 3
+         *      slots if the exception is taken on an IST stack.
+         *   2) There are synthetic (unrealistic but not impossible) scenarios
+         *      where %rip can end up in the call tree to this function, so we
+         *      can't check against regs->rip alone.
+         *
+         * Check for both reg->rip and regs->cs matching.
+         */
+
+        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
+        {
+            asm ( "wrssq %[fix], %[stk]"
+                  : [stk] "=m" (*ptr)
+                  : [fix] "r" (fixup) );
+            return;
+        }
+    }
+
+    /*
+     * We failed to locate and fix up the shadow IRET frame.  This could be
+     * due to shadow stack corruption, or bad logic above.  We cannot continue
+     * executing the interrupted context.
+     */
+    BUG();
+}
+
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
     unsigned long fixup = search_exception_table(regs);
@@ -779,6 +841,9 @@ static bool extable_fixup(struct cpu_user_regs *regs, bool print)
                vec_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
+    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
+        extable_shstk_fixup(regs, fixup);
+
     regs->rip = fixup;
     this_cpu(last_extable_addr) = regs->rip;
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index f7ee3dce91..78ac0df49f 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -708,7 +708,16 @@ exception_with_ints_disabled:
         call  search_pre_exception_table
         testq %rax,%rax                 # no fixup code for faulting EIP?
         jz    1b
-        movq  %rax,UREGS_rip(%rsp)
+        movq  %rax,UREGS_rip(%rsp)      # fixup regular stack
+
+#ifdef CONFIG_XEN_SHSTK
+        mov    $1, %edi
+        rdsspq %rdi
+        cmp    $1, %edi
+        je     .L_exn_shstk_done
+        wrssq  %rax, (%rdi)             # fixup shadow stack
+.L_exn_shstk_done:
+#endif
         subq  $8,UREGS_rsp(%rsp)        # add ec/ev to previous stack frame
         testb $15,UREGS_rsp(%rsp)       # return %rsp is now aligned?
         jz    1f                        # then there is a pad quadword already
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (9 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 10/14] x86/extable: Adjust extable handling " Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-29 12:23   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible Andrew Cooper
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The current alternatives algorithm clears CR0.WP and writes into .text.  This
has a side effect of the mappings becoming shadow stacks once CET is active.

Adjust _alternative_instructions() to clean up after itself.  This involves
extending the set of bits modify_xen_mappings() to include Dirty (and Accessed
for good measure).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/alternative.c | 14 ++++++++++++++
 xen/arch/x86/mm.c          |  6 +++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index ce2b4302e6..004e9ede25 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -21,6 +21,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <xen/init.h>
+#include <asm/setup.h>
 #include <asm/system.h>
 #include <asm/traps.h>
 #include <asm/nmi.h>
@@ -398,6 +399,19 @@ static void __init _alternative_instructions(bool force)
         panic("Timed out waiting for alternatives self-NMI to hit\n");
 
     set_nmi_callback(saved_nmi_callback);
+
+    /*
+     * When Xen is using shadow stacks, the alternatives clearing CR0.WP and
+     * writing into the mappings set dirty bits, turning the mappings into
+     * shadow stack mappings.
+     *
+     * While we can execute from them, this would also permit them to be the
+     * target of WRSS instructions, so reset the dirty after patching.
+     */
+    if ( cpu_has_xen_shstk )
+        modify_xen_mappings(XEN_VIRT_START + MB(2),
+                            (unsigned long)&__2M_text_end,
+                            PAGE_HYPERVISOR_RX);
 }
 
 void __init alternative_instructions(void)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4d6d22cc41..711b12828f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5442,8 +5442,8 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
  * mappings, but will shatter superpages if necessary, and will destroy
  * mappings if not passed _PAGE_PRESENT.
  *
- * The only flags considered are NX, RW and PRESENT.  All other input flags
- * are ignored.
+ * The only flags considered are NX, D, A, RW and PRESENT.  All other input
+ * flags are ignored.
  *
  * It is an error to call with present flags over an unpopulated range.
  */
@@ -5456,7 +5456,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     unsigned long v = s;
 
     /* Set of valid PTE bits which may be altered. */
-#define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT)
+#define FLAGS_MASK (_PAGE_NX|_PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_RW|_PAGE_PRESENT)
     nf &= FLAGS_MASK;
 
     ASSERT(IS_ALIGNED(s, PAGE_SIZE));
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (10 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-29 12:40   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration Andrew Cooper
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The SYSCALL/SYSENTER/SYSRET paths need to use {SET,CLR}SSBSY.  The IRET to
guest paths must not.  In the SYSRET path, re-position the mov which loads rip
into %rcx so we can use %rcx for CLRSSBSY, rather than spilling another
register to the stack.

While we can in principle detect shadow stack corruption and a failure to
clear the supervisor token busy bit in the SYSRET path (by inspecting the
carry flag following CLRSSBSY), we cannot detect similar problems for the IRET
path (IRET is specified not to fault in this case).

We will double fault at some point later, when next trying to enter Xen, due
to an already-set supervisor shadow stack busy bit.  As SYSRET is a uncommon
path anyway, avoid the added complexity for no appreciable gain.

The IST switch onto the primary stack is not great as we have an instruction
boundary with no shadow stack.  This is the least bad option available.

These paths are not used before shadow stacks are properly established, so can
use alternatives to avoid extra runtime CET detection logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Break comment deletion out to an earlier patch
 * SETSSBSY on the SYSENTER path as well
 * Don't spill %rax to the stack in the SYSRET path
---
 xen/arch/x86/x86_64/compat/entry.S |  1 +
 xen/arch/x86/x86_64/entry.S        | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 3cd375bd48..2ca81341a4 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -198,6 +198,7 @@ ENTRY(cr4_pv32_restore)
 
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
+        ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
         /* sti could live here when we don't switch page tables below. */
         CR4_PV32_RESTORE
         movq  8(%rsp),%rax /* Restore %rax. */
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 78ac0df49f..449ee468e4 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -191,9 +191,16 @@ restore_all_guest:
         sarq  $47,%rcx
         incl  %ecx
         cmpl  $1,%ecx
-        movq  8(%rsp),%rcx            # RIP
         ja    iret_exit_to_guest
 
+        /* Clear the supervisor shadow stack token busy bit. */
+.macro rag_clrssbsy
+        rdsspq %rcx
+        clrssbsy (%rcx)
+.endm
+        ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK
+
+        movq  8(%rsp), %rcx           # RIP
         cmpw  $FLAT_USER_CS32,16(%rsp)# CS
         movq  32(%rsp),%rsp           # RSP
         je    1f
@@ -226,6 +233,7 @@ iret_exit_to_guest:
  * %ss must be saved into the space left by the trampoline.
  */
 ENTRY(lstar_enter)
+        ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
         /* sti could live here when we don't switch page tables below. */
         movq  8(%rsp),%rax /* Restore %rax. */
         movq  $FLAT_KERNEL_SS,8(%rsp)
@@ -259,6 +267,7 @@ ENTRY(lstar_enter)
         jmp   test_all_events
 
 ENTRY(sysenter_entry)
+        ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
         /* sti could live here when we don't switch page tables below. */
         pushq $FLAT_USER_SS
         pushq $0
@@ -877,6 +886,27 @@ handle_ist_exception:
         movl  $UREGS_kernel_sizeof/8,%ecx
         movq  %rdi,%rsp
         rep   movsq
+
+        /* Switch Shadow Stacks */
+.macro ist_switch_shstk
+        rdsspq %rdi
+        clrssbsy (%rdi)
+        /*
+         * Switching supervisor shadow stacks is specially hard, as supervisor
+         * and restore tokens are incompatible.
+         *
+         * For now, we only need to switch on to an unused primary shadow
+         * stack, so use SETSSBSY for the purpose, exactly like the
+         * SYSCALL/SYSENTER entry.
+         *
+         * Ideally, we'd want to CLRSSBSY after switching stacks, but that
+         * will leave SSP zeroed so it not an option.  Instead, we transiently
+         * have a zero SSP on this instruction boundary, and depend on IST for
+         * NMI/#MC protection.
+         */
+        setssbsy
+.endm
+        ALTERNATIVE "", ist_switch_shstk, X86_FEATURE_XEN_SHSTK
 1:
 #else
         ASSERT_CONTEXT_IS_XEN
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (11 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-29 12:52   ` Jan Beulich
  2020-05-27 19:18 ` [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks Andrew Cooper
  2020-05-29 22:28 ` [PATCH v2 00/14] x86: Support for CET " Andrew Cooper
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

See code for details

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

Semi-RFC - I can't actually test this path.  Currently attempting to arrange
for someone else to.

v2:
 * New, split out of "x86/shstk: Activate Supervisor Shadow Stacks"
 * Drop asm/config.h include
 * Fix order of operations to avoid multiple crashes.
---
 xen/arch/x86/acpi/wakeup_prot.S | 58 +++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |  3 +++
 xen/include/asm-x86/x86-defns.h |  1 +
 3 files changed, 62 insertions(+)

diff --git a/xen/arch/x86/acpi/wakeup_prot.S b/xen/arch/x86/acpi/wakeup_prot.S
index 4dba6020a7..dcc7e2327d 100644
--- a/xen/arch/x86/acpi/wakeup_prot.S
+++ b/xen/arch/x86/acpi/wakeup_prot.S
@@ -1,3 +1,7 @@
+#include <asm/msr-index.h>
+#include <asm/page.h>
+#include <asm/processor.h>
+
         .file __FILE__
         .text
         .code64
@@ -15,6 +19,12 @@ ENTRY(do_suspend_lowlevel)
         mov     %cr0, %rax
         mov     %rax, saved_cr0(%rip)
 
+#ifdef CONFIG_XEN_SHSTK
+        mov     $1, %eax
+        rdsspq  %rax
+        mov     %rax, saved_ssp(%rip)
+#endif
+
         /* enter sleep state physically */
         mov     $3, %edi
         call    acpi_enter_sleep_state
@@ -48,6 +58,51 @@ ENTRY(s3_resume)
         pushq   %rax
         lretq
 1:
+#ifdef CONFIG_XEN_SHSTK
+        /*
+         * Restoring SSP is a little complicated, because we are intercepting
+         * an in-use shadow stack.  Write a temporary token under the stack,
+         * so SETSSBSY will successfully load a value useful for us, then
+         * reset MSR_PL0_SSP to its usual value and pop the temporary token.
+         */
+        mov     saved_rsp(%rip), %rdi
+        cmpq    $1, %rdi
+        je      .L_shstk_done
+
+        /* Set up MSR_S_CET. */
+        mov     $MSR_S_CET, %ecx
+        xor     %edx, %edx
+        mov     $CET_SHSTK_EN | CET_WRSS_EN, %eax
+        wrmsr
+
+        /* Construct the temporary supervisor token under SSP. */
+        sub     $8, %rdi
+
+        /* Load it into MSR_PL0_SSP. */
+        mov     $MSR_PL0_SSP, %ecx
+        mov     %rdi, %rdx
+        shr     $32, %rdx
+        mov     %edi, %eax
+        wrmsr
+
+        /* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
+        mov     $XEN_MINIMAL_CR4 | X86_CR4_CET, %ebx
+        mov     %rbx, %cr4
+
+        /* Write the temporary token onto the shadow stack, and activate it. */
+        wrssq   %rdi, (%rdi)
+        setssbsy
+
+        /* Reset MSR_PL0_SSP back to its normal value. */
+        and     $~(STACK_SIZE - 1), %eax
+        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %eax
+        wrmsr
+
+        /* Pop the temporary token off the stack. */
+        mov     $2, %eax
+        incsspd %eax
+.L_shstk_done:
+#endif
 
         call    load_system_tables
 
@@ -65,6 +120,9 @@ ENTRY(s3_resume)
 
 saved_rsp:      .quad   0
 saved_cr0:      .quad   0
+#ifdef CONFIG_XEN_SHSTK
+saved_ssp:      .quad   0
+#endif
 
 GLOBAL(saved_magic)
         .long   0x9abcdef0
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 85c5f20b76..cdfb7b047b 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -68,6 +68,9 @@
 
 #define MSR_U_CET                           0x000006a0
 #define MSR_S_CET                           0x000006a2
+#define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
+#define  CET_WRSS_EN                        (_AC(1, ULL) <<  1)
+
 #define MSR_PL0_SSP                         0x000006a4
 #define MSR_PL1_SSP                         0x000006a5
 #define MSR_PL2_SSP                         0x000006a6
diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h
index 5366e2d018..072c87042c 100644
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -73,6 +73,7 @@
 #define X86_CR4_SMEP       0x00100000 /* enable SMEP */
 #define X86_CR4_SMAP       0x00200000 /* enable SMAP */
 #define X86_CR4_PKE        0x00400000 /* enable PKE */
+#define X86_CR4_CET        0x00800000 /* Control-flow Enforcement Technology */
 
 /*
  * XSTATE component flags in XCR0
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (12 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration Andrew Cooper
@ 2020-05-27 19:18 ` Andrew Cooper
  2020-05-29 13:09   ` Jan Beulich
  2020-05-29 22:28 ` [PATCH v2 00/14] x86: Support for CET " Andrew Cooper
  14 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-27 19:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

With all other plumbing in place, activate shadow stacks when possible.  Note
that this requires prohibiting the use of PV32.  Compatibility can be
maintained if necessary via PV-Shim.

The BSP needs to wait until alternatives have run (to avoid interaction with
CR0.WP), and after the first reset_stack_and_jump() to avoid having a pristine
shadow stack interact in problematic ways with an in-use regular stack.
Activate shadow stack in reinit_bsp_stack().

APs have all infrastructure set up by the booting CPU, so enable shadow stacks
before entering C.  Adjust the logic to call start_secondary rather than jump
to it, so stack traces make more sense.

The crash path needs to turn CET off to avoid interfering with the crash
kernel's environment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Split S3 out into earlier patch.
 * Discuss CET-SS disabling PV32, and start_secondary to be a call.
---
 docs/misc/xen-command-line.pandoc |  8 ++++++++
 xen/arch/x86/boot/x86_64.S        | 35 +++++++++++++++++++++++++++++++++--
 xen/arch/x86/cpu/common.c         |  5 +++++
 xen/arch/x86/crash.c              |  7 +++++++
 xen/arch/x86/setup.c              | 26 ++++++++++++++++++++++++++
 xen/arch/x86/spec_ctrl.c          |  8 ++++++++
 6 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d4934eabb7..9892c57ee9 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -287,6 +287,10 @@ call/jmp COP/JOP) attacks.
     `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
     is available in hardware.
 
+    Shadow Stacks are incompatible with 32bit PV guests.  This option will
+    override the `pv=32` boolean to false.  Backwards compatibility can be
+    maintained with the `pv-shim` mechanism.
+
 ### clocksource (x86)
 > `= pit | hpet | acpi | tsc`
 
@@ -1726,6 +1730,10 @@ Controls for aspects of PV guest support.
 *   The `32` boolean controls whether 32bit PV guests can be created.  It
     defaults to `true`, and is ignored when `CONFIG_PV32` is compiled out.
 
+    32bit PV guests are incompatible with CET Shadow Stacks.  If Xen is using
+    shadow stacks, this option will be overridden to `false`.  Backwards
+    compatibility can be maintained with the `pv-shim` mechanism.
+
 ### pv-linear-pt (x86)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index 314a32a19f..551acd9e94 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -28,8 +28,39 @@ ENTRY(__high_start)
         lretq
 1:
         test    %ebx,%ebx
-        jnz     start_secondary
-
+        jz      .L_bsp
+
+        /* APs.  Set up shadow stacks before entering C. */
+
+        testl   $cpufeat_mask(X86_FEATURE_XEN_SHSTK), \
+                CPUINFO_FEATURE_OFFSET(X86_FEATURE_XEN_SHSTK) + boot_cpu_data(%rip)
+        je      .L_ap_shstk_done
+
+        /* Set up MSR_S_CET. */
+        mov     $MSR_S_CET, %ecx
+        xor     %edx, %edx
+        mov     $CET_SHSTK_EN | CET_WRSS_EN, %eax
+        wrmsr
+
+        /* Derive MSR_PL0_SSP from %rsp (token written when stack is allocated). */
+        mov     $MSR_PL0_SSP, %ecx
+        mov     %rsp, %rdx
+        shr     $32, %rdx
+        mov     %esp, %eax
+        and     $~(STACK_SIZE - 1), %eax
+        or      $(PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8, %eax
+        wrmsr
+
+        /* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
+        mov     $XEN_MINIMAL_CR4 | X86_CR4_CET, %ecx
+        mov     %rcx, %cr4
+        setssbsy
+
+.L_ap_shstk_done:
+        call    start_secondary
+        BUG     /* start_secondary() shouldn't return. */
+
+.L_bsp:
         /* Pass off the Multiboot info structure to C land (if applicable). */
         mov     multiboot_ptr(%rip),%edi
         call    __start_xen
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index dcc9ee08de..b4416f941c 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -329,6 +329,11 @@ void __init early_cpu_init(void)
 	       x86_cpuid_vendor_to_str(c->x86_vendor), c->x86, c->x86,
 	       c->x86_model, c->x86_model, c->x86_mask, eax);
 
+	if (c->cpuid_level >= 7) {
+		cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[cpufeat_word(X86_FEATURE_CET_SS)] = ecx;
+	}
+
 	eax = cpuid_eax(0x80000000);
 	if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
 		eax = cpuid_eax(0x80000008);
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 450eecd46b..0611b4fb9b 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -200,6 +200,13 @@ void machine_crash_shutdown(void)
     /* Reset CPUID masking and faulting to the host's default. */
     ctxt_switch_levelling(NULL);
 
+    /* Disable shadow stacks. */
+    if ( cpu_has_xen_shstk )
+    {
+        wrmsrl(MSR_S_CET, 0);
+        write_cr4(read_cr4() & ~X86_CR4_CET);
+    }
+
     info = kexec_crash_save_info();
     info->xen_phys_start = xen_phys_start;
     info->dom0_pfn_to_mfn_frame_list_list =
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 584589baff..0c4fd8c6a8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -664,6 +664,13 @@ static void __init noreturn reinit_bsp_stack(void)
     stack_base[0] = stack;
     memguard_guard_stack(stack);
 
+    if ( cpu_has_xen_shstk )
+    {
+        wrmsrl(MSR_PL0_SSP, (unsigned long)stack + 0x5ff8);
+        wrmsrl(MSR_S_CET, CET_SHSTK_EN | CET_WRSS_EN);
+        asm volatile ("setssbsy" ::: "memory");
+    }
+
     reset_stack_and_jump_nolp(init_done);
 }
 
@@ -1065,6 +1072,21 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     /* This must come before e820 code because it sets paddr_bits. */
     early_cpu_init();
 
+    /* Choose shadow stack early, to set infrastructure up appropriately. */
+    if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) )
+    {
+        printk("Enabling Supervisor Shadow Stacks\n");
+
+        setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
+#ifdef CONFIG_PV32
+        if ( opt_pv32 )
+        {
+            opt_pv32 = 0;
+            printk("  - Disabling PV32 due to Shadow Stacks\n");
+        }
+#endif
+    }
+
     /* Sanitise the raw E820 map to produce a final clean version. */
     max_page = raw_max_page = init_e820(memmap_type, &e820_raw);
 
@@ -1801,6 +1823,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     alternative_branches();
 
+    /* Defer CR4.CET until alternatives have finished playing with CR4.WP */
+    if ( cpu_has_xen_shstk )
+        set_in_cr4(X86_CR4_CET);
+
     /*
      * NB: when running as a PV shim VCPUOP_up/down is wired to the shim
      * physical cpu_add/remove functions, so launch the guest with only
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index c5d8e587a8..a94be2d594 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -882,6 +882,14 @@ void __init init_speculation_mitigations(void)
     hw_smt_enabled = check_smt_enabled();
 
     /*
+     * First, disable the use of retpolines if Xen is using shadow stacks, as
+     * they are incompatible.
+     */
+    if ( cpu_has_xen_shstk &&
+         (opt_thunk == THUNK_DEFAULT || opt_thunk == THUNK_RETPOLINE) )
+        thunk = THUNK_JMP;
+
+    /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
      */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved,fatal}_trap()
  2020-05-27 19:18 ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap() Andrew Cooper
@ 2020-05-28  9:45   ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-05-28  9:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> For one, they render the vector in a different base.
> 
> Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
> mnemonic, which starts bringing the code/diagnostics in line with the Intel
> and AMD manuals.
> 
> Provide constants for every archtiecturally defined exception, even those not
> implemented by Xen yet, as do_reserved_trap() is a catch-all handler.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

As before somewhat hesitantly
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent
  2020-05-27 19:18 ` [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent Andrew Cooper
@ 2020-05-28  9:50   ` Jan Beulich
  2020-05-28 17:26     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28  9:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> UD faults never had any diagnostics printed, and the others were inconsistent.
> 
> Don't use dprintk() because identifying traps.c is actively unhelpful in the
> message, as it is the location of the fixup, not the fault.  Use the new
> vec_name() infrastructure, rather than leaving raw numbers for the log.
> 
>   (XEN) Running stub recovery selftests...
>   (XEN) Fixup #UD[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
>   (XEN) Fixup #GP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
>   (XEN) Fixup #SS[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
>   (XEN) Fixup #BP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

As before
Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit I realize I have one more suggestion:

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -772,10 +772,31 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
>            trapnr, vec_name(trapnr), regs->error_code);
>  }
>  
> +static bool extable_fixup(struct cpu_user_regs *regs, bool print)
> +{
> +    unsigned long fixup = search_exception_table(regs);
> +
> +    if ( unlikely(fixup == 0) )
> +        return false;
> +
> +    /*
> +     * Don't use dprintk() because the __FILE__ reference is unhelpful.
> +     * Can currently be triggered by guests.  Make sure we ratelimit.
> +     */
> +    if ( IS_ENABLED(CONFIG_DEBUG) && print )

How about pulling the IS_ENABLED(CONFIG_DEBUG) into the call sites
currently passing "true"?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-27 19:18 ` [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support Andrew Cooper
@ 2020-05-28 10:25   ` Jan Beulich
  2020-05-28 18:10     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 10:25 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> Introduce CONFIG_HAS_AS_CET to determine whether CET instructions are
> supported in the assembler, and CONFIG_XEN_SHSTK as the main build option.
> 
> Introduce cet={no-,}shstk to for a user to select whether or not to use shadow
> stacks at runtime, and X86_FEATURE_XEN_SHSTK to determine Xen's overall
> enablement of shadow stacks.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wl@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> LLVM 6 supports CET-SS instructions while only LLVM 7 supports CET-IBT
> instructions.  We'd need to split HAS_AS_CET into two if we want to support
> supervisor shadow stacks with LLVM 6.  (This demonstrates exactly why picking
> a handful of instructions to test is the right approach.)

In this case I agree with splitting; I wasn't aware clang implemented
the respective insns piecemeal.

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -270,6 +270,23 @@ and not running softirqs. Reduce this if softirqs are not being run frequently
>  enough. Setting this to a high value may cause boot failure, particularly if
>  the NMI watchdog is also enabled.
>  
> +### cet
> +    = List of [ shstk=<bool> ]
> +
> +    Applicability: x86
> +
> +Controls for the use of Control-flow Enforcement Technology.  CET is group of

Nit: "... is a group of ..."

> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
>  config INDIRECT_THUNK
>  	def_bool $(cc-option,-mindirect-branch-register)
>  
> +config HAS_AS_CET
> +	# binutils >= 2.29 and LLVM >= 7
> +	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)

So you put me in a really awkward position: I'd really like to see
this series go in for 4.14, yet I've previously indicated I want the
underlying concept to first be agreed upon, before any uses get
introduced.

A fundamental problem with this, at least as long as (a) more of
Anthony's series hasn't been committed and (b) we re-build Xen upon
installing (as root), even if it was fully built (as non-root) and
is ready without any re-building. Each of these aspects means that
what you've configured and built may not be what gets installed, by
virtue of the tool chains differing. (a) in addition may lead to
the install-time rebuild to actually go wrong, due to there being
dependency tracking issues on at least {xen,efi}.lds (which I've
noticed in a different context yesterday).

> --- a/xen/scripts/Kconfig.include
> +++ b/xen/scripts/Kconfig.include
> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
>  # Return y if the linker supports <flag>, n otherwise
>  ld-option = $(success,$(LD) -v $(1))
>  
> +# $(as-instr,<instr>)
> +# Return y if the assembler supports <instr>, n otherwise
> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)

Is this actually checking the right thing in the clang case? I.e.
doesn't "-x assembler" make clang invoke the system assembler rather
than use its integrated one, whereas you need the insns to be
recognized by the integrated assembler unless we find a need to pass
-no-integrated-as?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks
  2020-05-27 19:18 ` [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks Andrew Cooper
@ 2020-05-28 12:03   ` Jan Beulich
  2020-05-28 13:22     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 12:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> For now, any #CP exception or shadow stack #PF indicate a bug in Xen, but
> attempt to recover from #CP if taken in guest context.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one more question and a suggestion:

> @@ -1445,8 +1447,10 @@ void do_page_fault(struct cpu_user_regs *regs)
>       *
>       * Anything remaining is an error, constituting corruption of the
>       * pagetables and probably an L1TF vulnerable gadget.
> +     *
> +     * Any shadow stack access fault is a bug in Xen.
>       */
> -    if ( error_code & PFEC_reserved_bit )
> +    if ( error_code & (PFEC_reserved_bit | PFEC_shstk) )
>          goto fatal;

Wouldn't you saying "any" imply putting this new check higher up, in
particular ahead of the call to fixup_page_fault()?

> @@ -940,7 +944,8 @@ autogen_stubs: /* Automatically generated stubs. */
>          entrypoint 1b
>  
>          /* Reserved exceptions, heading towards do_reserved_trap(). */
> -        .elseif vec == TRAP_copro_seg || vec == TRAP_spurious_int || (vec > TRAP_simd_error && vec < TRAP_nr)
> +        .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
> +                vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)

Adding yet another || here adds to the fragility of the entire
construct. Wouldn't it be better to implement do_entry_VE at
this occasion, even its handling continues to end up in
do_reserved_trap()? This would have the benefit of avoiding the
pointless checking of %spl first thing in its handling. Feel
free to keep the R-b if you decide to go this route.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 05/14] x86/shstk: Re-layout the stack block for shadow stacks
  2020-05-27 19:18 ` [PATCH v2 05/14] x86/shstk: Re-layout the stack block " Andrew Cooper
@ 2020-05-28 12:33   ` Jan Beulich
  2020-05-29 19:21     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 12:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -365,20 +365,15 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
>  /*
>   * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
>   *
> - * Stack pages 0 - 3:
> + * Stack pages 1 - 4:
>   *   These are all 1-page IST stacks.  Each of these stacks have an exception
>   *   frame and saved register state at the top.  The interesting bound for a
>   *   trace is the word adjacent to this, while the bound for a dump is the
>   *   very top, including the exception frame.
>   *
> - * Stack pages 4 and 5:
> - *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
> - *   explicitly not present, so attempting to dump or trace it is
> - *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
> - *   to use the entire primary stack and wander into page 5.  In this case,
> - *   consider these pages an extension of the primary stack to aid debugging
> - *   hopefully rare situations where the primary stack has effective been
> - *   overflown.
> + * Stack pages 0 and 5:
> + *   Shadow stacks.  These are mapped read-only, and used by CET-SS capable
> + *   processors.  They will never contain regular stack data.

I don't mind the comment getting put in place already here, but will it
reflect reality even when CET-SS is not in use, in that the pages then
still are mapped r/o rather than being left unmapped to act as guard
pages not only for stack pushes but also for stack pops? At which point
the "dump or trace it is counterproductive" remark would still apply in
this case, and hence may better be retained.

> @@ -392,13 +387,10 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
>  {
>      switch ( get_stack_page(sp) )
>      {
> -    case 0 ... 3:
> +    case 1 ... 4:
>          return ROUNDUP(sp, PAGE_SIZE) -
>              offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
>  
> -#ifndef MEMORY_GUARD
> -    case 4 ... 5:
> -#endif
>      case 6 ... 7:
>          return ROUNDUP(sp, STACK_SIZE) -
>              sizeof(struct cpu_info) - sizeof(unsigned long);
> @@ -412,12 +404,9 @@ unsigned long get_stack_dump_bottom(unsigned long sp)
>  {
>      switch ( get_stack_page(sp) )
>      {
> -    case 0 ... 3:
> +    case 1 ... 4:
>          return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
>  
> -#ifndef MEMORY_GUARD
> -    case 4 ... 5:
> -#endif
>      case 6 ... 7:
>          return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);

The need to adjust these literal numbers demonstrates how fragile
this is. I admit I can't see a good way to get rid of the literal
numbers altogether, but could I talk you into switching to (for
the latter, as example)

    switch ( get_stack_page(sp) )
    {
    case 0: case PRIMARY_SHSTK_SLOT:
        return 0;

    case 1 ... 4:
        return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);

    case 6 ... 7:
        return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);

    default:
        return sp - sizeof(unsigned long);
    }

? Of course this will need the callers to be aware they may get
back zero, but there are only very few (which made me notice the
functions would better be static). And the returning of zero may
then want changing (conditionally upon us using CET-SS) in a
later patch, where iirc you use the shadow stack for call trace
generation.

As a positive side effect this will yield a compile error if
PRIMARY_SHSTK_SLOT gets changed without adjusting these
functions.

> --- a/xen/include/asm-x86/config.h
> +++ b/xen/include/asm-x86/config.h
> @@ -75,6 +75,9 @@
>  /* Primary stack is restricted to 8kB by guard pages. */
>  #define PRIMARY_STACK_SIZE 8192
>  
> +/* Primary shadow stack is slot 5 of 8, immediately under the primary stack. */
> +#define PRIMARY_SHSTK_SLOT 5

Any reason to put it here rather than ...

> --- a/xen/include/asm-x86/current.h
> +++ b/xen/include/asm-x86/current.h
> @@ -16,12 +16,12 @@
>   *
>   * 7 - Primary stack (with a struct cpu_info at the top)
>   * 6 - Primary stack
> - * 5 - Optionally not present (MEMORY_GUARD)
> - * 4 - Unused; optionally not present (MEMORY_GUARD)
> - * 3 - Unused; optionally not present (MEMORY_GUARD)
> - * 2 - MCE IST stack
> - * 1 - NMI IST stack
> - * 0 - Double Fault IST stack
> + * 5 - Primay Shadow Stack (read-only)
> + * 4 - #DF IST stack
> + * 3 - #DB IST stack
> + * 2 - NMI IST stack
> + * 1 - #MC IST stack
> + * 0 - IST Shadow Stacks (4x 1k, read-only)
>   */

... right below this comment?

Same question as above regarding the "read-only" here.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-27 19:18 ` [PATCH v2 06/14] x86/shstk: Create " Andrew Cooper
@ 2020-05-28 12:50   ` Jan Beulich
  2020-05-29 19:35     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 12:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -769,6 +769,30 @@ void load_system_tables(void)
>  	tss->rsp1 = 0x8600111111111111ul;
>  	tss->rsp2 = 0x8600111111111111ul;
>  
> +	/* Set up the shadow stack IST. */
> +	if (cpu_has_xen_shstk) {
> +		volatile uint64_t *ist_ssp = this_cpu(tss_page).ist_ssp;
> +
> +		/*
> +		 * Used entries must point at the supervisor stack token.
> +		 * Unused entries are poisoned.
> +		 *
> +		 * This IST Table may be live, and the NMI/#MC entries must
> +		 * remain valid on every instruction boundary, hence the
> +		 * volatile qualifier.
> +		 */

Move this comment ahead of what it comments on, as we usually have it?

> +		ist_ssp[0] = 0x8600111111111111ul;
> +		ist_ssp[IST_MCE] = stack_top + (IST_MCE * IST_SHSTK_SIZE) - 8;
> +		ist_ssp[IST_NMI] = stack_top + (IST_NMI * IST_SHSTK_SIZE) - 8;
> +		ist_ssp[IST_DB]	 = stack_top + (IST_DB	* IST_SHSTK_SIZE) - 8;
> +		ist_ssp[IST_DF]	 = stack_top + (IST_DF	* IST_SHSTK_SIZE) - 8;

Strictly speaking you want to introduce

#define IST_SHSTK_SLOT 0

next to PRIMARY_SHSTK_SLOT and use

		ist_ssp[IST_MCE] = stack_top + (IST_SHSTK_SLOT * PAGE_SIZE) +
                                               (IST_MCE * IST_SHSTK_SIZE) - 8;

etc here. It's getting longish, so I'm not going to insist. But if you
go this route, then please also below / elsewhere.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5994,12 +5994,33 @@ void memguard_unguard_range(void *p, unsigned long l)
>  
>  #endif
>  
> +static void write_sss_token(unsigned long *ptr)
> +{
> +    /*
> +     * A supervisor shadow stack token is its own linear address, with the
> +     * busy bit (0) clear.
> +     */
> +    *ptr = (unsigned long)ptr;
> +}
> +
>  void memguard_guard_stack(void *p)
>  {
> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
> +    /* IST Shadow stacks.  4x 1k in stack page 0. */
> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
> +    {
> +        write_sss_token(p + (IST_MCE * IST_SHSTK_SIZE) - 8);
> +        write_sss_token(p + (IST_NMI * IST_SHSTK_SIZE) - 8);
> +        write_sss_token(p + (IST_DB  * IST_SHSTK_SIZE) - 8);
> +        write_sss_token(p + (IST_DF  * IST_SHSTK_SIZE) - 8);

Up to now two successive memguard_guard_stack() were working fine. This
will be no longer the case, just as an observation.

> +    }
> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);

As already hinted at in reply to the previous patch, I think this wants
to remain _PAGE_NONE when we don't use CET-SS.

> +    /* Primary Shadow Stack.  1x 4k in stack page 5. */
>      p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
> +        write_sss_token(p + PAGE_SIZE - 8);
> +
> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>  }
>  
>  void memguard_unguard_stack(void *p)

Would this function perhaps better zap the tokens?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks
  2020-05-28 12:03   ` Jan Beulich
@ 2020-05-28 13:22     ` Andrew Cooper
  2020-05-28 13:31       ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-28 13:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 13:03, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> For now, any #CP exception or shadow stack #PF indicate a bug in Xen, but
>> attempt to recover from #CP if taken in guest context.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one more question and a suggestion:
>
>> @@ -1445,8 +1447,10 @@ void do_page_fault(struct cpu_user_regs *regs)
>>       *
>>       * Anything remaining is an error, constituting corruption of the
>>       * pagetables and probably an L1TF vulnerable gadget.
>> +     *
>> +     * Any shadow stack access fault is a bug in Xen.
>>       */
>> -    if ( error_code & PFEC_reserved_bit )
>> +    if ( error_code & (PFEC_reserved_bit | PFEC_shstk) )
>>          goto fatal;
> Wouldn't you saying "any" imply putting this new check higher up, in
> particular ahead of the call to fixup_page_fault()?

Can do.

>
>> @@ -940,7 +944,8 @@ autogen_stubs: /* Automatically generated stubs. */
>>          entrypoint 1b
>>  
>>          /* Reserved exceptions, heading towards do_reserved_trap(). */
>> -        .elseif vec == TRAP_copro_seg || vec == TRAP_spurious_int || (vec > TRAP_simd_error && vec < TRAP_nr)
>> +        .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
>> +                vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)
> Adding yet another || here adds to the fragility of the entire
> construct. Wouldn't it be better to implement do_entry_VE at
> this occasion, even its handling continues to end up in
> do_reserved_trap()? This would have the benefit of avoiding the
> pointless checking of %spl first thing in its handling. Feel
> free to keep the R-b if you decide to go this route.

I actually have a different plan, which deletes this entire clause, and
simplifies our autogen sanity checking somewhat.

For vectors which Xen has no implementation of (for whatever reason),
use DPL0, non-present descriptors, and redirect #NP[IDT] into
do_reserved_trap().

No need for any entry logic for the trivially fatal paths, or safety
against being uncertain about error codes.

However, its a little too close to 4.14 to clean this up now.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks
  2020-05-28 13:22     ` Andrew Cooper
@ 2020-05-28 13:31       ` Jan Beulich
  2020-05-29 18:50         ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 13:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28.05.2020 15:22, Andrew Cooper wrote:
> On 28/05/2020 13:03, Jan Beulich wrote:
>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>> @@ -940,7 +944,8 @@ autogen_stubs: /* Automatically generated stubs. */
>>>          entrypoint 1b
>>>  
>>>          /* Reserved exceptions, heading towards do_reserved_trap(). */
>>> -        .elseif vec == TRAP_copro_seg || vec == TRAP_spurious_int || (vec > TRAP_simd_error && vec < TRAP_nr)
>>> +        .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
>>> +                vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)
>> Adding yet another || here adds to the fragility of the entire
>> construct. Wouldn't it be better to implement do_entry_VE at
>> this occasion, even its handling continues to end up in
>> do_reserved_trap()? This would have the benefit of avoiding the
>> pointless checking of %spl first thing in its handling. Feel
>> free to keep the R-b if you decide to go this route.
> 
> I actually have a different plan, which deletes this entire clause, and
> simplifies our autogen sanity checking somewhat.
> 
> For vectors which Xen has no implementation of (for whatever reason),
> use DPL0, non-present descriptors, and redirect #NP[IDT] into
> do_reserved_trap().

Except that #NP itself being a contributory exception, if the such
covered exception is also contributory (e.g. #CP) or of page fault
class (e.g. #VE), we'd get #DF instead of #NP afaict.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() to be shadow stack compatible
  2020-05-27 19:18 ` [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() " Andrew Cooper
@ 2020-05-28 14:41   ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 14:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> We need to unwind up to the supervisor token.  See the comment for details.
> 
> The use of UNLIKELY_END_SECTION in this case highlights that it isn't safe
> when it isn't the final statement of an asm().  Adjust all declarations with a
> newline.

That's only one perspective to take. I'd appreciate if you undid this.
The intention has always been ...

> --- a/xen/include/asm-x86/current.h
> +++ b/xen/include/asm-x86/current.h
> @@ -124,13 +124,55 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
>  # define CHECK_FOR_LIVEPATCH_WORK ""
>  #endif
>  
> +#ifdef CONFIG_XEN_SHSTK
> +/*
> + * We need to unwind the primary shadow stack to its supervisor token, located
> + * at 0x5ff8 from the base of the stack blocks.
> + *
> + * Read the shadow stack pointer, subtract it from 0x5ff8, divide by 8 to get
> + * the number of slots needing popping.
> + *
> + * INCSSPQ can't pop more than 255 entries.  We shouldn't ever need to pop
> + * that many entries, and getting this wrong will cause us to #DF later.  Turn
> + * it into a BUG() now for fractionally easier debugging.
> + */
> +# define SHADOW_STACK_WORK                                      \
> +    "mov $1, %[ssp];"                                           \
> +    "rdsspd %[ssp];"                                            \
> +    "cmp $1, %[ssp];"                                           \
> +    "je .L_shstk_done.%=;" /* CET not active?  Skip. */         \
> +    "mov $%c[skstk_base], %[val];"                              \
> +    "and $%c[stack_mask], %[ssp];"                              \
> +    "sub %[ssp], %[val];"                                       \
> +    "shr $3, %[val];"                                           \
> +    "cmp $255, %[val];" /* More than 255 entries?  Crash. */    \
> +    UNLIKELY_START(a, shstk_adjust)                             \

... to put suitable separators at the use sites (which, seeing all
the other semicolons here, would be a semicolon then, not a
newline/tab combination. If a tab was to be put anywhere, then like
this:

#define UNLIKELY_START_SECTION "\t.pushsection .text.unlikely,\"ax\""
#define UNLIKELY_END_SECTION   "\t.popsection"

as directives (from a cross arch perspective) are supposed to be
indented; it's just that on most arch-es it doesn't matter (and
hence is irrelevant here).

Preferably with this aspect undone
Reviewed-by: Jan Beulich <jbeulich@suse.com>

As to the comment, could I talk you into replacing the two 0x5ff8
instances by something like "last word of the shadow stack"?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-27 19:18 ` [PATCH v2 10/14] x86/extable: Adjust extable handling " Andrew Cooper
@ 2020-05-28 16:15   ` Jan Beulich
  2020-05-29 19:43     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-28 16:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> @@ -400,6 +400,18 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
>      }
>  }
>  
> +static unsigned long get_shstk_bottom(unsigned long sp)
> +{
> +    switch ( get_stack_page(sp) )
> +    {
> +#ifdef CONFIG_XEN_SHSTK
> +    case 0:  return ROUNDUP(sp, IST_SHSTK_SIZE) - sizeof(unsigned long);
> +    case 5:  return ROUNDUP(sp, PAGE_SIZE)      - sizeof(unsigned long);

PRIMARY_SHSTK_SLOT please and, if introduced as suggested for the
earlier patch, IST_SHSTK_SLOT in the earlier line.

> @@ -763,6 +775,56 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
>            trapnr, vec_name(trapnr), regs->error_code);
>  }
>  
> +static void extable_shstk_fixup(struct cpu_user_regs *regs, unsigned long fixup)
> +{
> +    unsigned long ssp, *ptr, *base;
> +
> +    asm ( "rdsspq %0" : "=r" (ssp) : "0" (1) );
> +    if ( ssp == 1 )
> +        return;
> +
> +    ptr = _p(ssp);
> +    base = _p(get_shstk_bottom(ssp));
> +
> +    for ( ; ptr < base; ++ptr )
> +    {
> +        /*
> +         * Search for %rip.  The shstk currently looks like this:
> +         *
> +         *   ...  [Likely pointed to by SSP]
> +         *   %cs  [== regs->cs]
> +         *   %rip [== regs->rip]
> +         *   SSP  [Likely points to 3 slots higher, above %cs]
> +         *   ...  [call tree to this function, likely 2/3 slots]
> +         *
> +         * and we want to overwrite %rip with fixup.  There are two
> +         * complications:
> +         *   1) We cant depend on SSP values, because they won't differ by 3
> +         *      slots if the exception is taken on an IST stack.
> +         *   2) There are synthetic (unrealistic but not impossible) scenarios
> +         *      where %rip can end up in the call tree to this function, so we
> +         *      can't check against regs->rip alone.
> +         *
> +         * Check for both reg->rip and regs->cs matching.

Nit: regs->rip

> +         */
> +
> +        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
> +        {
> +            asm ( "wrssq %[fix], %[stk]"
> +                  : [stk] "=m" (*ptr)

Could this be ptr[0], to match the if()?

Considering how important it is that we don't fix up the wrong stack
location here, I continue to wonder if we wouldn't better also
include the SSP value on the stack in the checking, at the very
least by way of an ASSERT() or BUG_ON(). Since, with how the code is
currently written, this would require a somewhat odd looking ptr[-1]
I also wonder whether "while ( ++ptr < base )" as the loop header
wouldn't be better. The first entry on the stack can't be the RIP
we look for anyway, can it?

> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -708,7 +708,16 @@ exception_with_ints_disabled:
>          call  search_pre_exception_table
>          testq %rax,%rax                 # no fixup code for faulting EIP?
>          jz    1b
> -        movq  %rax,UREGS_rip(%rsp)
> +        movq  %rax,UREGS_rip(%rsp)      # fixup regular stack
> +
> +#ifdef CONFIG_XEN_SHSTK
> +        mov    $1, %edi
> +        rdsspq %rdi
> +        cmp    $1, %edi
> +        je     .L_exn_shstk_done
> +        wrssq  %rax, (%rdi)             # fixup shadow stack

According to the comment in extable_shstk_fixup(), isn't the value
to be replaced at 8(%rdi)?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent
  2020-05-28  9:50   ` Jan Beulich
@ 2020-05-28 17:26     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-28 17:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 10:50, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> UD faults never had any diagnostics printed, and the others were inconsistent.
>>
>> Don't use dprintk() because identifying traps.c is actively unhelpful in the
>> message, as it is the location of the fixup, not the fault.  Use the new
>> vec_name() infrastructure, rather than leaving raw numbers for the log.
>>
>>   (XEN) Running stub recovery selftests...
>>   (XEN) Fixup #UD[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
>>   (XEN) Fixup #GP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
>>   (XEN) Fixup #SS[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
>>   (XEN) Fixup #BP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> As before
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> albeit I realize I have one more suggestion:
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -772,10 +772,31 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
>>            trapnr, vec_name(trapnr), regs->error_code);
>>  }
>>  
>> +static bool extable_fixup(struct cpu_user_regs *regs, bool print)
>> +{
>> +    unsigned long fixup = search_exception_table(regs);
>> +
>> +    if ( unlikely(fixup == 0) )
>> +        return false;
>> +
>> +    /*
>> +     * Don't use dprintk() because the __FILE__ reference is unhelpful.
>> +     * Can currently be triggered by guests.  Make sure we ratelimit.
>> +     */
>> +    if ( IS_ENABLED(CONFIG_DEBUG) && print )
> How about pulling the IS_ENABLED(CONFIG_DEBUG) into the call sites
> currently passing "true"?

That is an obfuscation, not an improvement, in code legibility.

It is however a transformation that the compiler does (as part of
dropping the print parameter entirely).

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-28 10:25   ` Jan Beulich
@ 2020-05-28 18:10     ` Andrew Cooper
  2020-05-29 11:59       ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-28 18:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 11:25, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> Introduce CONFIG_HAS_AS_CET to determine whether CET instructions are
>> supported in the assembler, and CONFIG_XEN_SHSTK as the main build option.
>>
>> Introduce cet={no-,}shstk to for a user to select whether or not to use shadow
>> stacks at runtime, and X86_FEATURE_XEN_SHSTK to determine Xen's overall
>> enablement of shadow stacks.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> LLVM 6 supports CET-SS instructions while only LLVM 7 supports CET-IBT
>> instructions.  We'd need to split HAS_AS_CET into two if we want to support
>> supervisor shadow stacks with LLVM 6.  (This demonstrates exactly why picking
>> a handful of instructions to test is the right approach.)
> In this case I agree with splitting; I wasn't aware clang implemented
> the respective insns piecemeal.

I only became aware when trying Clang for v2.  I'll turn it into
HAS_AS_CET_SS, because the more I think about IBT, the more I think that
will need to be tied to the compiler, rather than the assembler.

>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -270,6 +270,23 @@ and not running softirqs. Reduce this if softirqs are not being run frequently
>>  enough. Setting this to a high value may cause boot failure, particularly if
>>  the NMI watchdog is also enabled.
>>  
>> +### cet
>> +    = List of [ shstk=<bool> ]
>> +
>> +    Applicability: x86
>> +
>> +Controls for the use of Control-flow Enforcement Technology.  CET is group of
> Nit: "... is a group of ..."

Oops.

>
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
>>  config INDIRECT_THUNK
>>  	def_bool $(cc-option,-mindirect-branch-register)
>>  
>> +config HAS_AS_CET
>> +	# binutils >= 2.29 and LLVM >= 7
>> +	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)
> So you put me in a really awkward position: I'd really like to see
> this series go in for 4.14, yet I've previously indicated I want the
> underlying concept to first be agreed upon, before any uses get
> introduced.

There are already users.  One of them is even in context.

I don't see that there is anything open for dispute in the first place. 
Being able to do exactly this was a one key driving factor to a newer
Kconfig, because it is superior mechanism to the ad-hoc mess we had
previously (not to mention, a vast detriment to build time).

> A fundamental problem with this, at least as long as (a) more of
> Anthony's series hasn't been committed and (b) we re-build Xen upon
> installing (as root), even if it was fully built (as non-root) and
> is ready without any re-building.

How is any any of this relevant to Anthony's recent changes?

You've always been asking for trouble if your `make` before `make
install` has as different toolchain, even in regular autotools/userspace
software.  The newer Kconfig logic might make this trouble far more
obvious, but doesn't introduce the problem.

> Each of these aspects means that
> what you've configured and built may not be what gets installed, by
> virtue of the tool chains differing. (a) in addition may lead to
> the install-time rebuild to actually go wrong, due to there being
> dependency tracking issues on at least {xen,efi}.lds (which I've
> noticed in a different context yesterday).

Again - how is any of this related to the recent changes?

Fundamentally, it is either no regression from 4.13, or already needs
bugfixing due to the existing users in 4.14.

Neither of these options affects this patch.

>> --- a/xen/scripts/Kconfig.include
>> +++ b/xen/scripts/Kconfig.include
>> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
>>  # Return y if the linker supports <flag>, n otherwise
>>  ld-option = $(success,$(LD) -v $(1))
>>  
>> +# $(as-instr,<instr>)
>> +# Return y if the assembler supports <instr>, n otherwise
>> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
> Is this actually checking the right thing in the clang case?

Appears to work correctly for me.  (Again, its either fine, or need
bugfixing anyway for 4.14, and raising with Linux as this is unmodified
upstream code like the rest of Kconfig.include).

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-28 18:10     ` Andrew Cooper
@ 2020-05-29 11:59       ` Jan Beulich
  2020-05-29 15:51         ` Anthony PERARD
  2020-05-29 18:36         ` Andrew Cooper
  0 siblings, 2 replies; 56+ messages in thread
From: Jan Beulich @ 2020-05-29 11:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Anthony Perard, Xen-devel, Wei Liu, Roger Pau Monné

On 28.05.2020 20:10, Andrew Cooper wrote:
> On 28/05/2020 11:25, Jan Beulich wrote:
>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/Kconfig
>>> +++ b/xen/arch/x86/Kconfig
>>> @@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
>>>  config INDIRECT_THUNK
>>>  	def_bool $(cc-option,-mindirect-branch-register)
>>>  
>>> +config HAS_AS_CET
>>> +	# binutils >= 2.29 and LLVM >= 7
>>> +	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)
>> So you put me in a really awkward position: I'd really like to see
>> this series go in for 4.14, yet I've previously indicated I want the
>> underlying concept to first be agreed upon, before any uses get
>> introduced.
> 
> There are already users.  One of them is even in context.

Hmm, indeed. I clearly didn't notice this aspect when reviewing
Anthony's series.

> I don't see that there is anything open for dispute in the first place. 
> Being able to do exactly this was a one key driving factor to a newer
> Kconfig, because it is superior mechanism to the ad-hoc mess we had
> previously (not to mention, a vast detriment to build time).

This "key driving factor" was presumably from your perspective.
Could you point me to a discussion (and resulting decision) that
this is an explicit goal of that work? I don't recall any, and
hence I also don't recall having been given a chance in influence
the direction, decision, and overall outcome.

In the interest of getting this series in for 4.14, and on the
assumption that you're willing to have a discussion on the
direction wrt storing tool chain capabilities in .config before
any further uses get added (and with the potential need to undo
the ones we have / gain here)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

The comment at the very end may point at an issue that wants
sorting, the others below are merely replies to some of the
points you've made.

>> A fundamental problem with this, at least as long as (a) more of
>> Anthony's series hasn't been committed and (b) we re-build Xen upon
>> installing (as root), even if it was fully built (as non-root) and
>> is ready without any re-building.
> 
> How is any any of this relevant to Anthony's recent changes?

Command line options not getting written to .*.cmd and read back
and compare upon rebuild is the issue here. Albeit thinking
about it again for anything that's stored in .config that's
for now compensated for by a change to .config triggering a
global rebuild.

The specific issue I had run into was with XEN_BUILD_EFI changing
without xen.lds getting re-generated.

> You've always been asking for trouble if your `make` before `make
> install` has as different toolchain, even in regular autotools/userspace
> software.  The newer Kconfig logic might make this trouble far more
> obvious, but doesn't introduce the problem.

I disagree. There should be no dependency on the tool chain at
all at install time, with a fully built tree. It ought to be
fine to not even have a tool chain accessible to root, even
less so the precise one that was used for building the tree.

>>> --- a/xen/scripts/Kconfig.include
>>> +++ b/xen/scripts/Kconfig.include
>>> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
>>>  # Return y if the linker supports <flag>, n otherwise
>>>  ld-option = $(success,$(LD) -v $(1))
>>>  
>>> +# $(as-instr,<instr>)
>>> +# Return y if the assembler supports <instr>, n otherwise
>>> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
>> Is this actually checking the right thing in the clang case?
> 
> Appears to work correctly for me.  (Again, its either fine, or need
> bugfixing anyway for 4.14, and raising with Linux as this is unmodified
> upstream code like the rest of Kconfig.include).

This answer made me try it out: On a system with clang 5 and
binutils 2.32 "incsspd	%eax" translates fine with
-no-integrated-as but doesn't without. The previously mentioned
odd use of CLANG_FLAGS, perhaps together with the disconnect
from where we establish whether to use -no-integrated-as in the
first place (arch.mk; I have no idea whether the CFLAGS
determined would be usable by the kconfig invocation, nor how
to solve the chicken-and-egg problem there if this is meant to
work that way), may be the reason for this. Cc-ing Anthony once
again ...

As an aside - this being taken from Linux doesn't mean it's
suitable for our use. For example, Linux'es way to use (or not)
-no-integrated-as is entirely different from ours (via LLVM_IAS
setting, used in the top level Makefile). I don't think I can
see whether the check above, for the case at hand, would work
correctly there, both with and without that variable set.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks
  2020-05-27 19:18 ` [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks Andrew Cooper
@ 2020-05-29 12:23   ` Jan Beulich
  2020-05-29 19:46     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-29 12:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> @@ -398,6 +399,19 @@ static void __init _alternative_instructions(bool force)
>          panic("Timed out waiting for alternatives self-NMI to hit\n");
>  
>      set_nmi_callback(saved_nmi_callback);
> +
> +    /*
> +     * When Xen is using shadow stacks, the alternatives clearing CR0.WP and
> +     * writing into the mappings set dirty bits, turning the mappings into
> +     * shadow stack mappings.
> +     *
> +     * While we can execute from them, this would also permit them to be the
> +     * target of WRSS instructions, so reset the dirty after patching.
> +     */
> +    if ( cpu_has_xen_shstk )
> +        modify_xen_mappings(XEN_VIRT_START + MB(2),
> +                            (unsigned long)&__2M_text_end,
> +                            PAGE_HYPERVISOR_RX);

Am I misremembering, or did you post a patch before that should
be part of this series, as being a prereq to this change,
making modify_xen_mappings() also respect _PAGE_DIRTY as
requested by the caller?

Additionally I notice this

        if ( desc->Attribute & (efi_bs_revision < EFI_REVISION(2, 5)
                                ? EFI_MEMORY_WP : EFI_MEMORY_RO) )
            prot &= ~_PAGE_RW;

in efi_init_memory(). Afaict we will need to clear _PAGE_DIRTY
there as well, with prot starting out as PAGE_HYPERVISOR_RWX.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible
  2020-05-27 19:18 ` [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible Andrew Cooper
@ 2020-05-29 12:40   ` Jan Beulich
  2020-05-29 19:58     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-29 12:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> The SYSCALL/SYSENTER/SYSRET paths need to use {SET,CLR}SSBSY.  The IRET to
> guest paths must not.  In the SYSRET path, re-position the mov which loads rip
> into %rcx so we can use %rcx for CLRSSBSY, rather than spilling another
> register to the stack.
> 
> While we can in principle detect shadow stack corruption and a failure to
> clear the supervisor token busy bit in the SYSRET path (by inspecting the
> carry flag following CLRSSBSY), we cannot detect similar problems for the IRET
> path (IRET is specified not to fault in this case).
> 
> We will double fault at some point later, when next trying to enter Xen, due
> to an already-set supervisor shadow stack busy bit.  As SYSRET is a uncommon
> path anyway, avoid the added complexity for no appreciable gain.

I'm okay with the avoidance of complexity, but why is the SYSRET path
uncommon? Almost all hypercall returns ought to take that path?

> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -191,9 +191,16 @@ restore_all_guest:
>          sarq  $47,%rcx
>          incl  %ecx
>          cmpl  $1,%ecx
> -        movq  8(%rsp),%rcx            # RIP
>          ja    iret_exit_to_guest

This removal from the shared part of the exit path needs to be
reflected on both of the sides of the JA, i.e. ...

>  
> +        /* Clear the supervisor shadow stack token busy bit. */
> +.macro rag_clrssbsy
> +        rdsspq %rcx
> +        clrssbsy (%rcx)
> +.endm
> +        ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK
> +
> +        movq  8(%rsp), %rcx           # RIP

... not just here, but also like this (with the JA above changed
to target the new label):

         ALIGN
 /* No special register assumptions. */
+.Liret_exit_to_guest:
+        movq  8(%rsp),%rcx            # RIP
 iret_exit_to_guest:
         andl  $~(X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_VM),24(%rsp)
         orl   $X86_EFLAGS_IF,24(%rsp)

Granted it's mostly cosmetic, as the IRETQ ought to fault, but
it's still a use of IRET in place of SYSRET, and hence we better
get guest register state right. With this or a functionally
identical adjustment (or a clarification on what makes you
convinced this adjustment isn't needed)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration
  2020-05-27 19:18 ` [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration Andrew Cooper
@ 2020-05-29 12:52   ` Jan Beulich
  2020-05-29 20:00     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-29 12:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> See code for details
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wl@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Semi-RFC - I can't actually test this path.  Currently attempting to arrange
> for someone else to.

Nevertheless
Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one question, just for my understanding:

> @@ -48,6 +58,51 @@ ENTRY(s3_resume)
>          pushq   %rax
>          lretq
>  1:
> +#ifdef CONFIG_XEN_SHSTK
> +        /*
> +         * Restoring SSP is a little complicated, because we are intercepting
> +         * an in-use shadow stack.  Write a temporary token under the stack,
> +         * so SETSSBSY will successfully load a value useful for us, then
> +         * reset MSR_PL0_SSP to its usual value and pop the temporary token.
> +         */
> +        mov     saved_rsp(%rip), %rdi
> +        cmpq    $1, %rdi
> +        je      .L_shstk_done
> +
> +        /* Set up MSR_S_CET. */
> +        mov     $MSR_S_CET, %ecx
> +        xor     %edx, %edx
> +        mov     $CET_SHSTK_EN | CET_WRSS_EN, %eax
> +        wrmsr
> +
> +        /* Construct the temporary supervisor token under SSP. */
> +        sub     $8, %rdi
> +
> +        /* Load it into MSR_PL0_SSP. */
> +        mov     $MSR_PL0_SSP, %ecx
> +        mov     %rdi, %rdx
> +        shr     $32, %rdx
> +        mov     %edi, %eax
> +        wrmsr
> +
> +        /* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
> +        mov     $XEN_MINIMAL_CR4 | X86_CR4_CET, %ebx
> +        mov     %rbx, %cr4

Does this imply NMI or #MC are fatal between here and there?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks
  2020-05-27 19:18 ` [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks Andrew Cooper
@ 2020-05-29 13:09   ` Jan Beulich
  2020-05-29 20:28     ` Andrew Cooper
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-05-29 13:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 27.05.2020 21:18, Andrew Cooper wrote:
> With all other plumbing in place, activate shadow stacks when possible.  Note
> that this requires prohibiting the use of PV32.  Compatibility can be
> maintained if necessary via PV-Shim.

In the revision log you say "Discuss CET-SS disabling PV32", and I
agree both here and in the command line doc you mention the "that"
aspect. But what about the "why"? Aiui "is incompatible" or
"requires" are too strong statements: It could be made work (by
disabling / enabling CET on the way out of / back into Xen), but
besides losing some of the intended protection that way, it would
be quite a bit of overhead. So it's more like a design decision,
and it would be nice to express it like this at least in the
commit message.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -664,6 +664,13 @@ static void __init noreturn reinit_bsp_stack(void)
>      stack_base[0] = stack;
>      memguard_guard_stack(stack);
>  
> +    if ( cpu_has_xen_shstk )
> +    {
> +        wrmsrl(MSR_PL0_SSP, (unsigned long)stack + 0x5ff8);

Please replace this remaining literal number accordingly.

> @@ -1801,6 +1823,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      alternative_branches();
>  
> +    /* Defer CR4.CET until alternatives have finished playing with CR4.WP */
> +    if ( cpu_has_xen_shstk )
> +        set_in_cr4(X86_CR4_CET);

Nit: The comment still wants changing to CR0.WP.

With these taken care of in some form
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-29 11:59       ` Jan Beulich
@ 2020-05-29 15:51         ` Anthony PERARD
  2020-05-29 18:39           ` Andrew Cooper
  2020-05-29 18:36         ` Andrew Cooper
  1 sibling, 1 reply; 56+ messages in thread
From: Anthony PERARD @ 2020-05-29 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Xen-devel

On Fri, May 29, 2020 at 01:59:55PM +0200, Jan Beulich wrote:
> On 28.05.2020 20:10, Andrew Cooper wrote:
> > On 28/05/2020 11:25, Jan Beulich wrote:
> >> On 27.05.2020 21:18, Andrew Cooper wrote:
> >>> --- a/xen/scripts/Kconfig.include
> >>> +++ b/xen/scripts/Kconfig.include
> >>> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
> >>>  # Return y if the linker supports <flag>, n otherwise
> >>>  ld-option = $(success,$(LD) -v $(1))
> >>>  
> >>> +# $(as-instr,<instr>)
> >>> +# Return y if the assembler supports <instr>, n otherwise
> >>> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
> >> Is this actually checking the right thing in the clang case?
> > 
> > Appears to work correctly for me.  (Again, its either fine, or need
> > bugfixing anyway for 4.14, and raising with Linux as this is unmodified
> > upstream code like the rest of Kconfig.include).
> 
> This answer made me try it out: On a system with clang 5 and
> binutils 2.32 "incsspd	%eax" translates fine with
> -no-integrated-as but doesn't without. The previously mentioned
> odd use of CLANG_FLAGS, perhaps together with the disconnect
> from where we establish whether to use -no-integrated-as in the
> first place (arch.mk; I have no idea whether the CFLAGS
> determined would be usable by the kconfig invocation, nor how
> to solve the chicken-and-egg problem there if this is meant to
> work that way), may be the reason for this. Cc-ing Anthony once
> again ...

I've just prepare/sent a patch which should fix this CLANG_FLAGS issue
and allows to tests the assembler in Kconfig.

See: [XEN PATCH] xen/build: introduce CLANG_FLAGS for testing other CFLAGS

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-29 11:59       ` Jan Beulich
  2020-05-29 15:51         ` Anthony PERARD
@ 2020-05-29 18:36         ` Andrew Cooper
  2020-06-02 12:06           ` Jan Beulich
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 18:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Anthony Perard, Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 12:59, Jan Beulich wrote:
> On 28.05.2020 20:10, Andrew Cooper wrote:
>> On 28/05/2020 11:25, Jan Beulich wrote:
>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/Kconfig
>>>> +++ b/xen/arch/x86/Kconfig
>>>> @@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
>>>>  config INDIRECT_THUNK
>>>>  	def_bool $(cc-option,-mindirect-branch-register)
>>>>  
>>>> +config HAS_AS_CET
>>>> +	# binutils >= 2.29 and LLVM >= 7
>>>> +	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)
>>> So you put me in a really awkward position: I'd really like to see
>>> this series go in for 4.14, yet I've previously indicated I want the
>>> underlying concept to first be agreed upon, before any uses get
>>> introduced.
>> There are already users.  One of them is even in context.
> Hmm, indeed. I clearly didn't notice this aspect when reviewing
> Anthony's series.
>
>> I don't see that there is anything open for dispute in the first place. 
>> Being able to do exactly this was a one key driving factor to a newer
>> Kconfig, because it is superior mechanism to the ad-hoc mess we had
>> previously (not to mention, a vast detriment to build time).
> This "key driving factor" was presumably from your perspective.
> Could you point me to a discussion (and resulting decision) that
> this is an explicit goal of that work? I don't recall any, and
> hence I also don't recall having been given a chance in influence
> the direction, decision, and overall outcome.

It took up a large chunk of the build system design session in Chicago.

>
> In the interest of getting this series in for 4.14, and on the
> assumption that you're willing to have a discussion on the
> direction wrt storing tool chain capabilities in .config before
> any further uses get added (and with the potential need to undo
> the ones we have / gain here)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-29 15:51         ` Anthony PERARD
@ 2020-05-29 18:39           ` Andrew Cooper
  2020-06-02 12:09             ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 18:39 UTC (permalink / raw)
  To: Anthony PERARD, Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 16:51, Anthony PERARD wrote:
> On Fri, May 29, 2020 at 01:59:55PM +0200, Jan Beulich wrote:
>> On 28.05.2020 20:10, Andrew Cooper wrote:
>>> On 28/05/2020 11:25, Jan Beulich wrote:
>>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>>> --- a/xen/scripts/Kconfig.include
>>>>> +++ b/xen/scripts/Kconfig.include
>>>>> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
>>>>>  # Return y if the linker supports <flag>, n otherwise
>>>>>  ld-option = $(success,$(LD) -v $(1))
>>>>>  
>>>>> +# $(as-instr,<instr>)
>>>>> +# Return y if the assembler supports <instr>, n otherwise
>>>>> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
>>>> Is this actually checking the right thing in the clang case?
>>> Appears to work correctly for me.  (Again, its either fine, or need
>>> bugfixing anyway for 4.14, and raising with Linux as this is unmodified
>>> upstream code like the rest of Kconfig.include).
>> This answer made me try it out: On a system with clang 5 and
>> binutils 2.32 "incsspd	%eax" translates fine with
>> -no-integrated-as but doesn't without. The previously mentioned
>> odd use of CLANG_FLAGS, perhaps together with the disconnect
>> from where we establish whether to use -no-integrated-as in the
>> first place (arch.mk; I have no idea whether the CFLAGS
>> determined would be usable by the kconfig invocation, nor how
>> to solve the chicken-and-egg problem there if this is meant to
>> work that way), may be the reason for this. Cc-ing Anthony once
>> again ...
> I've just prepare/sent a patch which should fix this CLANG_FLAGS issue
> and allows to tests the assembler in Kconfig.
>
> See: [XEN PATCH] xen/build: introduce CLANG_FLAGS for testing other CFLAGS

Thanks.  I've checked carefully, and this is an improvement from before.

Therefore I have acked and included the patch.

However, I think there is a further problem.  When -no-integrated-as
does get passed down, I don't see Clang falling back to using the Gnu
assember, so I suspect we have a further plumbing problem.  (It only
affects Clang 5.0 and earlier, so obsolete toolchains these days).

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks
  2020-05-28 13:31       ` Jan Beulich
@ 2020-05-29 18:50         ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 18:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 14:31, Jan Beulich wrote:
> On 28.05.2020 15:22, Andrew Cooper wrote:
>> On 28/05/2020 13:03, Jan Beulich wrote:
>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>> @@ -940,7 +944,8 @@ autogen_stubs: /* Automatically generated stubs. */
>>>>          entrypoint 1b
>>>>  
>>>>          /* Reserved exceptions, heading towards do_reserved_trap(). */
>>>> -        .elseif vec == TRAP_copro_seg || vec == TRAP_spurious_int || (vec > TRAP_simd_error && vec < TRAP_nr)
>>>> +        .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
>>>> +                vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)
>>> Adding yet another || here adds to the fragility of the entire
>>> construct. Wouldn't it be better to implement do_entry_VE at
>>> this occasion, even its handling continues to end up in
>>> do_reserved_trap()? This would have the benefit of avoiding the
>>> pointless checking of %spl first thing in its handling. Feel
>>> free to keep the R-b if you decide to go this route.
>> I actually have a different plan, which deletes this entire clause, and
>> simplifies our autogen sanity checking somewhat.
>>
>> For vectors which Xen has no implementation of (for whatever reason),
>> use DPL0, non-present descriptors, and redirect #NP[IDT] into
>> do_reserved_trap().
> Except that #NP itself being a contributory exception, if the such
> covered exception is also contributory (e.g. #CP) or of page fault
> class (e.g. #VE), we'd get #DF instead of #NP afaict.

Hmm.  Good point.

I also had some other cleanup plans.  (In due course,) I'll see what I
can do to make the status quo better.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 05/14] x86/shstk: Re-layout the stack block for shadow stacks
  2020-05-28 12:33   ` Jan Beulich
@ 2020-05-29 19:21     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 19:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 13:33, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -365,20 +365,15 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
>>  /*
>>   * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
>>   *
>> - * Stack pages 0 - 3:
>> + * Stack pages 1 - 4:
>>   *   These are all 1-page IST stacks.  Each of these stacks have an exception
>>   *   frame and saved register state at the top.  The interesting bound for a
>>   *   trace is the word adjacent to this, while the bound for a dump is the
>>   *   very top, including the exception frame.
>>   *
>> - * Stack pages 4 and 5:
>> - *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
>> - *   explicitly not present, so attempting to dump or trace it is
>> - *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
>> - *   to use the entire primary stack and wander into page 5.  In this case,
>> - *   consider these pages an extension of the primary stack to aid debugging
>> - *   hopefully rare situations where the primary stack has effective been
>> - *   overflown.
>> + * Stack pages 0 and 5:
>> + *   Shadow stacks.  These are mapped read-only, and used by CET-SS capable
>> + *   processors.  They will never contain regular stack data.
> I don't mind the comment getting put in place already here, but will it
> reflect reality even when CET-SS is not in use, in that the pages then
> still are mapped r/o rather than being left unmapped to act as guard
> pages not only for stack pushes but also for stack pops?

I can't parse this question.

However, I think it is answered by the following patch which does move
things to unilaterally being r/o even in the non-CET-SS case.

> At which point
> the "dump or trace it is counterproductive" remark would still apply in
> this case, and hence may better be retained.

Well - I'm thinking forwards to cleanup where we'd want to integrate the
shadow stack into stack trace reporting, at which point we would
consider these frames interesting to dump/trace.

>
>> @@ -392,13 +387,10 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
>>  {
>>      switch ( get_stack_page(sp) )
>>      {
>> -    case 0 ... 3:
>> +    case 1 ... 4:
>>          return ROUNDUP(sp, PAGE_SIZE) -
>>              offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
>>  
>> -#ifndef MEMORY_GUARD
>> -    case 4 ... 5:
>> -#endif
>>      case 6 ... 7:
>>          return ROUNDUP(sp, STACK_SIZE) -
>>              sizeof(struct cpu_info) - sizeof(unsigned long);
>> @@ -412,12 +404,9 @@ unsigned long get_stack_dump_bottom(unsigned long sp)
>>  {
>>      switch ( get_stack_page(sp) )
>>      {
>> -    case 0 ... 3:
>> +    case 1 ... 4:
>>          return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
>>  
>> -#ifndef MEMORY_GUARD
>> -    case 4 ... 5:
>> -#endif
>>      case 6 ... 7:
>>          return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
> The need to adjust these literal numbers demonstrates how fragile
> this is. I admit I can't see a good way to get rid of the literal
> numbers altogether,

Frankly, this is why there is a massive comment, and I really didn't
want to introduce PRIMARY_SHSTK_SLOT to begin with, because the whole
thing is fragile and there is no obvious naming/labelling scheme which
is liable to survive tweaking.

>  but could I talk you into switching to (for
> the latter, as example)
>
>     switch ( get_stack_page(sp) )
>     {
>     case 0: case PRIMARY_SHSTK_SLOT:
>         return 0;
>
>     case 1 ... 4:
>         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
>
>     case 6 ... 7:
>         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
>
>     default:
>         return sp - sizeof(unsigned long);
>     }
>
> ? Of course this will need the callers to be aware they may get
> back zero, but there are only very few (which made me notice the
> functions would better be static).

It was definitely needed externally at some point in the past.

>  And the returning of zero may
> then want changing (conditionally upon us using CET-SS) in a
> later patch, where iirc you use the shadow stack for call trace
> generation.
>
> As a positive side effect this will yield a compile error if
> PRIMARY_SHSTK_SLOT gets changed without adjusting these
> functions.

Overall to your question, potentially as future clean-up to how we
express stacks, but not right now for 4.14.

>
>> --- a/xen/include/asm-x86/config.h
>> +++ b/xen/include/asm-x86/config.h
>> @@ -75,6 +75,9 @@
>>  /* Primary stack is restricted to 8kB by guard pages. */
>>  #define PRIMARY_STACK_SIZE 8192
>>  
>> +/* Primary shadow stack is slot 5 of 8, immediately under the primary stack. */
>> +#define PRIMARY_SHSTK_SLOT 5
> Any reason to put it here rather than ...
>
>> --- a/xen/include/asm-x86/current.h
>> +++ b/xen/include/asm-x86/current.h
>> @@ -16,12 +16,12 @@
>>   *
>>   * 7 - Primary stack (with a struct cpu_info at the top)
>>   * 6 - Primary stack
>> - * 5 - Optionally not present (MEMORY_GUARD)
>> - * 4 - Unused; optionally not present (MEMORY_GUARD)
>> - * 3 - Unused; optionally not present (MEMORY_GUARD)
>> - * 2 - MCE IST stack
>> - * 1 - NMI IST stack
>> - * 0 - Double Fault IST stack
>> + * 5 - Primay Shadow Stack (read-only)
>> + * 4 - #DF IST stack
>> + * 3 - #DB IST stack
>> + * 2 - NMI IST stack
>> + * 1 - #MC IST stack
>> + * 0 - IST Shadow Stacks (4x 1k, read-only)
>>   */
> ... right below this comment?

Yes - grouping the related constants.

> Same question as above regarding the "read-only" here.

I'll adjust the commit message to make it clearer that some of the text
here is made true in the next patch.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-28 12:50   ` Jan Beulich
@ 2020-05-29 19:35     ` Andrew Cooper
  2020-05-29 21:45       ` Andrew Cooper
  2020-06-02 12:35       ` Jan Beulich
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 19:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 13:50, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -769,6 +769,30 @@ void load_system_tables(void)
>>  	tss->rsp1 = 0x8600111111111111ul;
>>  	tss->rsp2 = 0x8600111111111111ul;
>>  
>> +	/* Set up the shadow stack IST. */
>> +	if (cpu_has_xen_shstk) {
>> +		volatile uint64_t *ist_ssp = this_cpu(tss_page).ist_ssp;
>> +
>> +		/*
>> +		 * Used entries must point at the supervisor stack token.
>> +		 * Unused entries are poisoned.
>> +		 *
>> +		 * This IST Table may be live, and the NMI/#MC entries must
>> +		 * remain valid on every instruction boundary, hence the
>> +		 * volatile qualifier.
>> +		 */
> Move this comment ahead of what it comments on, as we usually have it?
>
>> +		ist_ssp[0] = 0x8600111111111111ul;
>> +		ist_ssp[IST_MCE] = stack_top + (IST_MCE * IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_NMI] = stack_top + (IST_NMI * IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_DB]	 = stack_top + (IST_DB	* IST_SHSTK_SIZE) - 8;
>> +		ist_ssp[IST_DF]	 = stack_top + (IST_DF	* IST_SHSTK_SIZE) - 8;
> Strictly speaking you want to introduce
>
> #define IST_SHSTK_SLOT 0
>
> next to PRIMARY_SHSTK_SLOT and use
>
> 		ist_ssp[IST_MCE] = stack_top + (IST_SHSTK_SLOT * PAGE_SIZE) +
>                                                (IST_MCE * IST_SHSTK_SIZE) - 8;
>
> etc here. It's getting longish, so I'm not going to insist. But if you
> go this route, then please also below / elsewhere.

Actually no.  I've got a much better idea, based on how Linux does the
same, but it's definitely 4.15 material at this point.

>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5994,12 +5994,33 @@ void memguard_unguard_range(void *p, unsigned long l)
>>  
>>  #endif
>>  
>> +static void write_sss_token(unsigned long *ptr)
>> +{
>> +    /*
>> +     * A supervisor shadow stack token is its own linear address, with the
>> +     * busy bit (0) clear.
>> +     */
>> +    *ptr = (unsigned long)ptr;
>> +}
>> +
>>  void memguard_guard_stack(void *p)
>>  {
>> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
>> +    /* IST Shadow stacks.  4x 1k in stack page 0. */
>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>> +    {
>> +        write_sss_token(p + (IST_MCE * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_NMI * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_DB  * IST_SHSTK_SIZE) - 8);
>> +        write_sss_token(p + (IST_DF  * IST_SHSTK_SIZE) - 8);
> Up to now two successive memguard_guard_stack() were working fine. This
> will be no longer the case, just as an observation.

I don't think that matters.

>
>> +    }
>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
> As already hinted at in reply to the previous patch, I think this wants
> to remain _PAGE_NONE when we don't use CET-SS.

The commit message discussed why that is not an option (currently), and
why I don't consider it a good idea to make possible.

>> +    /* Primary Shadow Stack.  1x 4k in stack page 5. */
>>      p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
>> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>> +        write_sss_token(p + PAGE_SIZE - 8);
>> +
>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>>  }
>>  
>>  void memguard_unguard_stack(void *p)
> Would this function perhaps better zap the tokens?

Why?  We don't zap any other stack contents, and let the regular page
scrubbing clean it.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-28 16:15   ` Jan Beulich
@ 2020-05-29 19:43     ` Andrew Cooper
  2020-05-29 21:17       ` Andrew Cooper
  2020-06-02 12:57       ` Jan Beulich
  0 siblings, 2 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 19:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/05/2020 17:15, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> @@ -400,6 +400,18 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
>>      }
>>  }
>>  
>> +static unsigned long get_shstk_bottom(unsigned long sp)
>> +{
>> +    switch ( get_stack_page(sp) )
>> +    {
>> +#ifdef CONFIG_XEN_SHSTK
>> +    case 0:  return ROUNDUP(sp, IST_SHSTK_SIZE) - sizeof(unsigned long);
>> +    case 5:  return ROUNDUP(sp, PAGE_SIZE)      - sizeof(unsigned long);
> PRIMARY_SHSTK_SLOT please and, if introduced as suggested for the
> earlier patch, IST_SHSTK_SLOT in the earlier line.
>
>> @@ -763,6 +775,56 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
>>            trapnr, vec_name(trapnr), regs->error_code);
>>  }
>>  
>> +static void extable_shstk_fixup(struct cpu_user_regs *regs, unsigned long fixup)
>> +{
>> +    unsigned long ssp, *ptr, *base;
>> +
>> +    asm ( "rdsspq %0" : "=r" (ssp) : "0" (1) );
>> +    if ( ssp == 1 )
>> +        return;
>> +
>> +    ptr = _p(ssp);
>> +    base = _p(get_shstk_bottom(ssp));
>> +
>> +    for ( ; ptr < base; ++ptr )
>> +    {
>> +        /*
>> +         * Search for %rip.  The shstk currently looks like this:
>> +         *
>> +         *   ...  [Likely pointed to by SSP]
>> +         *   %cs  [== regs->cs]
>> +         *   %rip [== regs->rip]
>> +         *   SSP  [Likely points to 3 slots higher, above %cs]
>> +         *   ...  [call tree to this function, likely 2/3 slots]
>> +         *
>> +         * and we want to overwrite %rip with fixup.  There are two
>> +         * complications:
>> +         *   1) We cant depend on SSP values, because they won't differ by 3
>> +         *      slots if the exception is taken on an IST stack.
>> +         *   2) There are synthetic (unrealistic but not impossible) scenarios
>> +         *      where %rip can end up in the call tree to this function, so we
>> +         *      can't check against regs->rip alone.
>> +         *
>> +         * Check for both reg->rip and regs->cs matching.
> Nit: regs->rip
>
>> +         */
>> +
>> +        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
>> +        {
>> +            asm ( "wrssq %[fix], %[stk]"
>> +                  : [stk] "=m" (*ptr)
> Could this be ptr[0], to match the if()?
>
> Considering how important it is that we don't fix up the wrong stack
> location here, I continue to wonder if we wouldn't better also
> include the SSP value on the stack in the checking, at the very
> least by way of an ASSERT() or BUG_ON().

Well no, for the reason discussed in point 1.

Its not technically an issue right now, but there is no possible way to
BUILD_BUG_ON() someone turning an exception into IST, or stopping the
use of the extable infrastructure on a #DB.

Such a check would lie in wait and either provide an unexpected tangent
to someone debugging a complicated issue (I do use #DB for a fair bit),
or become a security vulnerability.

> Since, with how the code is
> currently written, this would require a somewhat odd looking ptr[-1]
> I also wonder whether "while ( ++ptr < base )" as the loop header
> wouldn't be better. The first entry on the stack can't be the RIP
> we look for anyway, can it?

Yes it can.

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -708,7 +708,16 @@ exception_with_ints_disabled:
>>          call  search_pre_exception_table
>>          testq %rax,%rax                 # no fixup code for faulting EIP?
>>          jz    1b
>> -        movq  %rax,UREGS_rip(%rsp)
>> +        movq  %rax,UREGS_rip(%rsp)      # fixup regular stack
>> +
>> +#ifdef CONFIG_XEN_SHSTK
>> +        mov    $1, %edi
>> +        rdsspq %rdi
>> +        cmp    $1, %edi
>> +        je     .L_exn_shstk_done
>> +        wrssq  %rax, (%rdi)             # fixup shadow stack
> According to the comment in extable_shstk_fixup(), isn't the value
> to be replaced at 8(%rdi)?

Hmm - I think you're right.  I thought I had this covered by unit tests.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks
  2020-05-29 12:23   ` Jan Beulich
@ 2020-05-29 19:46     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 19:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 13:23, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> @@ -398,6 +399,19 @@ static void __init _alternative_instructions(bool force)
>>          panic("Timed out waiting for alternatives self-NMI to hit\n");
>>  
>>      set_nmi_callback(saved_nmi_callback);
>> +
>> +    /*
>> +     * When Xen is using shadow stacks, the alternatives clearing CR0.WP and
>> +     * writing into the mappings set dirty bits, turning the mappings into
>> +     * shadow stack mappings.
>> +     *
>> +     * While we can execute from them, this would also permit them to be the
>> +     * target of WRSS instructions, so reset the dirty after patching.
>> +     */
>> +    if ( cpu_has_xen_shstk )
>> +        modify_xen_mappings(XEN_VIRT_START + MB(2),
>> +                            (unsigned long)&__2M_text_end,
>> +                            PAGE_HYPERVISOR_RX);
> Am I misremembering, or did you post a patch before that should
> be part of this series, as being a prereq to this change,
> making modify_xen_mappings() also respect _PAGE_DIRTY as
> requested by the caller?

No.  Its the hunk you deleted from lower in this patch.

> Additionally I notice this
>
>         if ( desc->Attribute & (efi_bs_revision < EFI_REVISION(2, 5)
>                                 ? EFI_MEMORY_WP : EFI_MEMORY_RO) )
>             prot &= ~_PAGE_RW;
>
> in efi_init_memory(). Afaict we will need to clear _PAGE_DIRTY
> there as well, with prot starting out as PAGE_HYPERVISOR_RWX.

Ok.  I'll pull that out into a separate patch.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible
  2020-05-29 12:40   ` Jan Beulich
@ 2020-05-29 19:58     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 19:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 13:40, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> The SYSCALL/SYSENTER/SYSRET paths need to use {SET,CLR}SSBSY.  The IRET to
>> guest paths must not.  In the SYSRET path, re-position the mov which loads rip
>> into %rcx so we can use %rcx for CLRSSBSY, rather than spilling another
>> register to the stack.
>>
>> While we can in principle detect shadow stack corruption and a failure to
>> clear the supervisor token busy bit in the SYSRET path (by inspecting the
>> carry flag following CLRSSBSY), we cannot detect similar problems for the IRET
>> path (IRET is specified not to fault in this case).
>>
>> We will double fault at some point later, when next trying to enter Xen, due
>> to an already-set supervisor shadow stack busy bit.  As SYSRET is a uncommon
>> path anyway, avoid the added complexity for no appreciable gain.
> I'm okay with the avoidance of complexity, but why is the SYSRET path
> uncommon? Almost all hypercall returns ought to take that path?

But hypercalls returns aren't the majority of returns to guest context.

IRET from Xen IPIs, or from event channel injections hitting guest
userspace, are the most common in a non-idle system.

>
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -191,9 +191,16 @@ restore_all_guest:
>>          sarq  $47,%rcx
>>          incl  %ecx
>>          cmpl  $1,%ecx
>> -        movq  8(%rsp),%rcx            # RIP
>>          ja    iret_exit_to_guest
> This removal from the shared part of the exit path needs to be
> reflected on both of the sides of the JA, i.e. ...
>
>>  
>> +        /* Clear the supervisor shadow stack token busy bit. */
>> +.macro rag_clrssbsy
>> +        rdsspq %rcx
>> +        clrssbsy (%rcx)
>> +.endm
>> +        ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK
>> +
>> +        movq  8(%rsp), %rcx           # RIP
> ... not just here, but also like this (with the JA above changed
> to target the new label):
>
>          ALIGN
>  /* No special register assumptions. */
> +.Liret_exit_to_guest:
> +        movq  8(%rsp),%rcx            # RIP
>  iret_exit_to_guest:
>          andl  $~(X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_VM),24(%rsp)
>          orl   $X86_EFLAGS_IF,24(%rsp)
>
> Granted it's mostly cosmetic, as the IRETQ ought to fault, but
> it's still a use of IRET in place of SYSRET, and hence we better
> get guest register state right. With this or a functionally
> identical adjustment (or a clarification on what makes you
> convinced this adjustment isn't needed)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Ah yes.  I really ought to retroactively create an XSA-7 PoC for this.

Will fix.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration
  2020-05-29 12:52   ` Jan Beulich
@ 2020-05-29 20:00     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 20:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 13:52, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> See code for details
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Semi-RFC - I can't actually test this path.  Currently attempting to arrange
>> for someone else to.
> Nevertheless
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one question, just for my understanding:
>
>> @@ -48,6 +58,51 @@ ENTRY(s3_resume)
>>          pushq   %rax
>>          lretq
>>  1:
>> +#ifdef CONFIG_XEN_SHSTK
>> +        /*
>> +         * Restoring SSP is a little complicated, because we are intercepting
>> +         * an in-use shadow stack.  Write a temporary token under the stack,
>> +         * so SETSSBSY will successfully load a value useful for us, then
>> +         * reset MSR_PL0_SSP to its usual value and pop the temporary token.
>> +         */
>> +        mov     saved_rsp(%rip), %rdi
>> +        cmpq    $1, %rdi
>> +        je      .L_shstk_done
>> +
>> +        /* Set up MSR_S_CET. */
>> +        mov     $MSR_S_CET, %ecx
>> +        xor     %edx, %edx
>> +        mov     $CET_SHSTK_EN | CET_WRSS_EN, %eax
>> +        wrmsr
>> +
>> +        /* Construct the temporary supervisor token under SSP. */
>> +        sub     $8, %rdi
>> +
>> +        /* Load it into MSR_PL0_SSP. */
>> +        mov     $MSR_PL0_SSP, %ecx
>> +        mov     %rdi, %rdx
>> +        shr     $32, %rdx
>> +        mov     %edi, %eax
>> +        wrmsr
>> +
>> +        /* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
>> +        mov     $XEN_MINIMAL_CR4 | X86_CR4_CET, %ebx
>> +        mov     %rbx, %cr4
> Does this imply NMI or #MC are fatal between here and there?

Yes, but that is always the case during CPU bringup.

Only a few instructions ago, we didn't have an IDT, and we don't have
yet have an established %tr, so can't get the regular IST pointer either.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks
  2020-05-29 13:09   ` Jan Beulich
@ 2020-05-29 20:28     ` Andrew Cooper
  0 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 20:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 14:09, Jan Beulich wrote:
> On 27.05.2020 21:18, Andrew Cooper wrote:
>> With all other plumbing in place, activate shadow stacks when possible.  Note
>> that this requires prohibiting the use of PV32.  Compatibility can be
>> maintained if necessary via PV-Shim.
> In the revision log you say "Discuss CET-SS disabling PV32", and I
> agree both here and in the command line doc you mention the "that"
> aspect. But what about the "why"? Aiui "is incompatible" or
> "requires" are too strong statements: It could be made work (by
> disabling / enabling CET on the way out of / back into Xen), but
> besides losing some of the intended protection that way, it would
> be quite a bit of overhead. So it's more like a design decision,
> and it would be nice to express it like this at least in the
> commit message.

For starters, the guest kernel and Xen share the single
MSR_S_CET.SHSKT_EN bit, as they are both supervisor in the eyes of the
processor.

We can't use the PV32_CR4 trick to turn CET.SS off on return to guest
kernel context, because (unlike SMAP/SMEP), the race condition with a
late NMI would manifest as #CP against the IRET, not a spurious page fault.

Furthermore, an IRET to Ring 3 and an IRET to Ring 1 now differ by three
words on the shadow stack.  An IRET to Ring 1 is a supervisor return, so
performs consistency checks on %cs/%lip/SSP on the shadow stack.  We
could in theory shuffle the shadow stack by 3 words on context switch.

It might theoretically be possible to make something which functioned
correctly with a PV guest kernel which doesn't understand a
paravirtualised version of supervisor shadow stacks, but I quickly
concluded that it isn't even worth the effort to figure for certain.

>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -664,6 +664,13 @@ static void __init noreturn reinit_bsp_stack(void)
>>      stack_base[0] = stack;
>>      memguard_guard_stack(stack);
>>  
>> +    if ( cpu_has_xen_shstk )
>> +    {
>> +        wrmsrl(MSR_PL0_SSP, (unsigned long)stack + 0x5ff8);
> Please replace this remaining literal number accordingly.
>
>> @@ -1801,6 +1823,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>  
>>      alternative_branches();
>>  
>> +    /* Defer CR4.CET until alternatives have finished playing with CR4.WP */
>> +    if ( cpu_has_xen_shstk )
>> +        set_in_cr4(X86_CR4_CET);
> Nit: The comment still wants changing to CR0.WP.
>
> With these taken care of in some form
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-29 19:43     ` Andrew Cooper
@ 2020-05-29 21:17       ` Andrew Cooper
  2020-06-02 13:11         ` Jan Beulich
  2020-06-02 12:57       ` Jan Beulich
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 21:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 20:43, Andrew Cooper wrote:
> On 28/05/2020 17:15, Jan Beulich wrote:
>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>> +
>>> +        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
>>> +        {
>>> +            asm ( "wrssq %[fix], %[stk]"
>>> +                  : [stk] "=m" (*ptr)
>> Could this be ptr[0], to match the if()?
>>
>> Considering how important it is that we don't fix up the wrong stack
>> location here, I continue to wonder if we wouldn't better also
>> include the SSP value on the stack in the checking, at the very
>> least by way of an ASSERT() or BUG_ON().
> Well no, for the reason discussed in point 1.
>
> Its not technically an issue right now, but there is no possible way to
> BUILD_BUG_ON() someone turning an exception into IST, or stopping the
> use of the extable infrastructure on a #DB.
>
> Such a check would lie in wait and either provide an unexpected tangent
> to someone debugging a complicated issue (I do use #DB for a fair bit),
> or become a security vulnerability.

Also (which I forgot first time around),

The ptr[1] == regs->cs check is 64 bits wide, and the way to get that on
the shadow stack would be to execute a call instruction ending at at
0xe008 linear, or from a bad WRSSQ edit, both of which are serious
errors and worthy of hitting the BUG().

>>> --- a/xen/arch/x86/x86_64/entry.S
>>> +++ b/xen/arch/x86/x86_64/entry.S
>>> @@ -708,7 +708,16 @@ exception_with_ints_disabled:
>>>          call  search_pre_exception_table
>>>          testq %rax,%rax                 # no fixup code for faulting EIP?
>>>          jz    1b
>>> -        movq  %rax,UREGS_rip(%rsp)
>>> +        movq  %rax,UREGS_rip(%rsp)      # fixup regular stack
>>> +
>>> +#ifdef CONFIG_XEN_SHSTK
>>> +        mov    $1, %edi
>>> +        rdsspq %rdi
>>> +        cmp    $1, %edi
>>> +        je     .L_exn_shstk_done
>>> +        wrssq  %rax, (%rdi)             # fixup shadow stack
>> According to the comment in extable_shstk_fixup(), isn't the value
>> to be replaced at 8(%rdi)?
> Hmm - I think you're right.  I thought I had this covered by unit tests.

The unit test has been fixed, and so has this code.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-29 19:35     ` Andrew Cooper
@ 2020-05-29 21:45       ` Andrew Cooper
  2020-06-02 12:32         ` Jan Beulich
  2020-06-02 12:35       ` Jan Beulich
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 21:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29/05/2020 20:35, Andrew Cooper wrote:
>>> +    }
>>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>> As already hinted at in reply to the previous patch, I think this wants
>> to remain _PAGE_NONE when we don't use CET-SS.
> The commit message discussed why that is not an option (currently), and
> why I don't consider it a good idea to make possible.

Apologies.  I thought I'd written it in the commit message, but it was
half in the previous patch, and not terribly clear.  I've reworked both.

The current practical reason is to do with clone_mappings() in the XPTI
case.

A wild off-stack read is far far less likely than hitting the guard page
with a push first, which means that a R/O guard page is about the same
usefulness to us, but results in a much more simple stack setup, as it
doesn't vary attributes with enabled features.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks
  2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
                   ` (13 preceding siblings ...)
  2020-05-27 19:18 ` [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks Andrew Cooper
@ 2020-05-29 22:28 ` Andrew Cooper
  14 siblings, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2020-05-29 22:28 UTC (permalink / raw)
  To: Xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 27/05/2020 20:18, Andrew Cooper wrote:
> This series implements Shadow Stack support for Xen to use.

Given that we almost got to agreement, and considering the value of this
feature, I've fixed up most of the remaining comments and committed the
series.

The main area of concern was the fragility of stack expressions.  I've
got a plan for 4.15 to far more robust (by borrowing a trick from
Linux), and have left the existing logic at least self-consistent.

If there are still major concerns with the result, we can fix that up
early next week.

~Andrew


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-29 18:36         ` Andrew Cooper
@ 2020-06-02 12:06           ` Jan Beulich
  2020-06-02 12:26             ` Anthony PERARD
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Anthony Perard, Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 20:36, Andrew Cooper wrote:
> On 29/05/2020 12:59, Jan Beulich wrote:
>> On 28.05.2020 20:10, Andrew Cooper wrote:
>>> On 28/05/2020 11:25, Jan Beulich wrote:
>>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/Kconfig
>>>>> +++ b/xen/arch/x86/Kconfig
>>>>> @@ -34,6 +34,10 @@ config ARCH_DEFCONFIG
>>>>>  config INDIRECT_THUNK
>>>>>  	def_bool $(cc-option,-mindirect-branch-register)
>>>>>  
>>>>> +config HAS_AS_CET
>>>>> +	# binutils >= 2.29 and LLVM >= 7
>>>>> +	def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy;endbr64)
>>>> So you put me in a really awkward position: I'd really like to see
>>>> this series go in for 4.14, yet I've previously indicated I want the
>>>> underlying concept to first be agreed upon, before any uses get
>>>> introduced.
>>> There are already users.  One of them is even in context.
>> Hmm, indeed. I clearly didn't notice this aspect when reviewing
>> Anthony's series.
>>
>>> I don't see that there is anything open for dispute in the first place. 
>>> Being able to do exactly this was a one key driving factor to a newer
>>> Kconfig, because it is superior mechanism to the ad-hoc mess we had
>>> previously (not to mention, a vast detriment to build time).
>> This "key driving factor" was presumably from your perspective.
>> Could you point me to a discussion (and resulting decision) that
>> this is an explicit goal of that work? I don't recall any, and
>> hence I also don't recall having been given a chance in influence
>> the direction, decision, and overall outcome.
> 
> It took up a large chunk of the build system design session in Chicago.

I don't recall; perhaps I was in another parallel session? If it's
the one with notes at
https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00786.html
then a remark close to the top suggests I was there, but there's no
sign of this aspect having got discussed. There is, among the issues
listed, "Xen build re-evaluates compiler support for every translation
unit", but that's only remotely related.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-05-29 18:39           ` Andrew Cooper
@ 2020-06-02 12:09             ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Anthony PERARD, Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 20:39, Andrew Cooper wrote:
> On 29/05/2020 16:51, Anthony PERARD wrote:
>> On Fri, May 29, 2020 at 01:59:55PM +0200, Jan Beulich wrote:
>>> On 28.05.2020 20:10, Andrew Cooper wrote:
>>>> On 28/05/2020 11:25, Jan Beulich wrote:
>>>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>>>> --- a/xen/scripts/Kconfig.include
>>>>>> +++ b/xen/scripts/Kconfig.include
>>>>>> @@ -31,6 +31,10 @@ cc-option = $(success,$(CC) -Werror $(CLANG_FLAGS) $(1) -E -x c /dev/null -o /de
>>>>>>  # Return y if the linker supports <flag>, n otherwise
>>>>>>  ld-option = $(success,$(LD) -v $(1))
>>>>>>  
>>>>>> +# $(as-instr,<instr>)
>>>>>> +# Return y if the assembler supports <instr>, n otherwise
>>>>>> +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler -o /dev/null -)
>>>>> Is this actually checking the right thing in the clang case?
>>>> Appears to work correctly for me.  (Again, its either fine, or need
>>>> bugfixing anyway for 4.14, and raising with Linux as this is unmodified
>>>> upstream code like the rest of Kconfig.include).
>>> This answer made me try it out: On a system with clang 5 and
>>> binutils 2.32 "incsspd	%eax" translates fine with
>>> -no-integrated-as but doesn't without. The previously mentioned
>>> odd use of CLANG_FLAGS, perhaps together with the disconnect
>>> from where we establish whether to use -no-integrated-as in the
>>> first place (arch.mk; I have no idea whether the CFLAGS
>>> determined would be usable by the kconfig invocation, nor how
>>> to solve the chicken-and-egg problem there if this is meant to
>>> work that way), may be the reason for this. Cc-ing Anthony once
>>> again ...
>> I've just prepare/sent a patch which should fix this CLANG_FLAGS issue
>> and allows to tests the assembler in Kconfig.
>>
>> See: [XEN PATCH] xen/build: introduce CLANG_FLAGS for testing other CFLAGS
> 
> Thanks.  I've checked carefully, and this is an improvement from before.
> 
> Therefore I have acked and included the patch.
> 
> However, I think there is a further problem.  When -no-integrated-as
> does get passed down, I don't see Clang falling back to using the Gnu
> assember, so I suspect we have a further plumbing problem.  (It only
> affects Clang 5.0 and earlier, so obsolete toolchains these days).

Hmm, what exactly do you mean saying "I don't see Clang falling back
..."? In the playing I did, I specifically passed -v to see what gets
or does not get invoked, and it was /usr/bin/as that did get used
(clang 5.0.1). Obviously it'll be whatever /usr/bin/as is then.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-06-02 12:06           ` Jan Beulich
@ 2020-06-02 12:26             ` Anthony PERARD
  2020-06-02 12:41               ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Anthony PERARD @ 2020-06-02 12:26 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On Tue, Jun 02, 2020 at 02:06:11PM +0200, Jan Beulich wrote:
> I don't recall; perhaps I was in another parallel session? If it's
> the one with notes at
> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00786.html
> then a remark close to the top suggests I was there, but there's no
> sign of this aspect having got discussed. There is, among the issues
> listed, "Xen build re-evaluates compiler support for every translation
> unit", but that's only remotely related.

What is a "translation unit"? What would that have to do with Kconfig?

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-29 21:45       ` Andrew Cooper
@ 2020-06-02 12:32         ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 23:45, Andrew Cooper wrote:
> On 29/05/2020 20:35, Andrew Cooper wrote:
>>>> +    }
>>>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>>> As already hinted at in reply to the previous patch, I think this wants
>>> to remain _PAGE_NONE when we don't use CET-SS.
>> The commit message discussed why that is not an option (currently), and
>> why I don't consider it a good idea to make possible.
> 
> Apologies.  I thought I'd written it in the commit message, but it was
> half in the previous patch, and not terribly clear.  I've reworked both.

Thanks, I've looked at them, but it's still not really clear to me:

> The current practical reason is to do with clone_mappings() in the XPTI
> case.

What exactly is the problem here? clone_mapping(), afaict, deals
fine with non-present PTEs. The original memguard_is_stack_guard_page()
check was more as a safe guard, to avoid establishing a mapping of
a guard page as much as possible.

> A wild off-stack read is far far less likely than hitting the guard page
> with a push first, which means that a R/O guard page is about the same
> usefulness to us, but results in a much more simple stack setup, as it
> doesn't vary attributes with enabled features.

While OoB stack reads may indeed be less likely, such aren't necessarily
"wild" (assuming my understanding of the term is what you mean): A
function epilogue can certainly pop (by the respective insn or by
incrementing %rsp by too much) too many slots, which would be detected
earlier with non-present mappings than with r/o ones. So I'd prefer to
stick to non-present guard pages when CET-SS is not in use, unless
there's indeed a technical reason not to do so. The two uses of
PAGE_HYPERVISOR_SHSTK can't be that bad a "variation" to alternatively
make _PAGE_NONE. In fact PAGE_HYPERVISOR_SHSTK could itself resolve
to _PAGE_NONE when !cpu_has_xen_shstk ...

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/14] x86/shstk: Create shadow stacks
  2020-05-29 19:35     ` Andrew Cooper
  2020-05-29 21:45       ` Andrew Cooper
@ 2020-06-02 12:35       ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 21:35, Andrew Cooper wrote:
> On 28/05/2020 13:50, Jan Beulich wrote:
>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>> +    /* Primary Shadow Stack.  1x 4k in stack page 5. */
>>>      p += PRIMARY_SHSTK_SLOT * PAGE_SIZE;
>>> -    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, _PAGE_NONE);
>>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>>> +        write_sss_token(p + PAGE_SIZE - 8);
>>> +
>>> +    map_pages_to_xen((unsigned long)p, virt_to_mfn(p), 1, PAGE_HYPERVISOR_SHSTK);
>>>  }
>>>  
>>>  void memguard_unguard_stack(void *p)
>> Would this function perhaps better zap the tokens?
> 
> Why?  We don't zap any other stack contents, and let the regular page
> scrubbing clean it.

Except that Xen used pages, if re-used by Xen itself, may not go
through a round of scrubbing. As long as we use 1:1 mappings,
re-using the same page for a shadow stack will end up having the
necessary token already in place. Looks like a defense-in-depth
measure to zap them off as soon as a page goes out of (shadow
stack) use.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-06-02 12:26             ` Anthony PERARD
@ 2020-06-02 12:41               ` Jan Beulich
  2020-06-02 13:50                 ` Anthony PERARD
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:41 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Xen-devel

On 02.06.2020 14:26, Anthony PERARD wrote:
> On Tue, Jun 02, 2020 at 02:06:11PM +0200, Jan Beulich wrote:
>> I don't recall; perhaps I was in another parallel session? If it's
>> the one with notes at
>> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00786.html
>> then a remark close to the top suggests I was there, but there's no
>> sign of this aspect having got discussed. There is, among the issues
>> listed, "Xen build re-evaluates compiler support for every translation
>> unit", but that's only remotely related.
> 
> What is a "translation unit"? What would that have to do with Kconfig?

A translation unit is the collective input from all source and
header files seen by the compiler during the processing of one
top level input file's worth of compilation. The connection to
Kconfig here is that by switching to it, the compiler flags
don't get re-constructed once per CU. Of course doing it via
Kconfig is not the only possible solution to the problem (as
can be seen from the patch that I had intermediately submitted
to get at least the HAVE_AS_* out of that set), but for us the
change in behavior is one intended (side-)effect of the switch
to newer Kconfig.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-29 19:43     ` Andrew Cooper
  2020-05-29 21:17       ` Andrew Cooper
@ 2020-06-02 12:57       ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 12:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 21:43, Andrew Cooper wrote:
> On 28/05/2020 17:15, Jan Beulich wrote:
>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>> @@ -763,6 +775,56 @@ static void do_reserved_trap(struct cpu_user_regs *regs)
>>>            trapnr, vec_name(trapnr), regs->error_code);
>>>  }
>>>  
>>> +static void extable_shstk_fixup(struct cpu_user_regs *regs, unsigned long fixup)
>>> +{
>>> +    unsigned long ssp, *ptr, *base;
>>> +
>>> +    asm ( "rdsspq %0" : "=r" (ssp) : "0" (1) );
>>> +    if ( ssp == 1 )
>>> +        return;
>>> +
>>> +    ptr = _p(ssp);
>>> +    base = _p(get_shstk_bottom(ssp));
>>> +
>>> +    for ( ; ptr < base; ++ptr )
>>> +    {
>>> +        /*
>>> +         * Search for %rip.  The shstk currently looks like this:
>>> +         *
>>> +         *   ...  [Likely pointed to by SSP]
>>> +         *   %cs  [== regs->cs]
>>> +         *   %rip [== regs->rip]
>>> +         *   SSP  [Likely points to 3 slots higher, above %cs]
>>> +         *   ...  [call tree to this function, likely 2/3 slots]
>>> +         *
>>> +         * and we want to overwrite %rip with fixup.  There are two
>>> +         * complications:
>>> +         *   1) We cant depend on SSP values, because they won't differ by 3
>>> +         *      slots if the exception is taken on an IST stack.
>>> +         *   2) There are synthetic (unrealistic but not impossible) scenarios
>>> +         *      where %rip can end up in the call tree to this function, so we
>>> +         *      can't check against regs->rip alone.
>>> +         *
>>> +         * Check for both reg->rip and regs->cs matching.
>>> +         */
>>> +
>>> +        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
>>> +        {
>>> +            asm ( "wrssq %[fix], %[stk]"
>>> +                  : [stk] "=m" (*ptr)
>> Could this be ptr[0], to match the if()?
>>
>> Considering how important it is that we don't fix up the wrong stack
>> location here, I continue to wonder if we wouldn't better also
>> include the SSP value on the stack in the checking, at the very
>> least by way of an ASSERT() or BUG_ON().
> 
> Well no, for the reason discussed in point 1.

I don't see my suggestion in conflict with that point. I didn't
suggest an check using == ; instead what I'm thinking about here
is something as weak as "Does this point somewhere into the
stack range for this CPU?" After all there are only a limited
set of classes of entries that can be on a shadow stack:
- LIP (Xen .text, livepatching area)
- CS  (<= 0xffff)
- SSP (within stack range for the CPU)
- supervisor token (a single precise address)
- padding (zero)
The number ranges covered by these classes are entirely disjoint,
so qualifying all three slots accordingly can be done without any
risk of getting an entry wrong.

> Its not technically an issue right now, but there is no possible way to
> BUILD_BUG_ON() someone turning an exception into IST, or stopping the
> use of the extable infrastructure on a #DB.
> 
> Such a check would lie in wait and either provide an unexpected tangent
> to someone debugging a complicated issue (I do use #DB for a fair bit),
> or become a security vulnerability.
> 
>> Since, with how the code is
>> currently written, this would require a somewhat odd looking ptr[-1]
>> I also wonder whether "while ( ++ptr < base )" as the loop header
>> wouldn't be better. The first entry on the stack can't be the RIP
>> we look for anyway, can it?
> 
> Yes it can.

How, when there's the return address of this function plus
an SSP value preceding it?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/14] x86/extable: Adjust extable handling to be shadow stack compatible
  2020-05-29 21:17       ` Andrew Cooper
@ 2020-06-02 13:11         ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 13:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 29.05.2020 23:17, Andrew Cooper wrote:
> On 29/05/2020 20:43, Andrew Cooper wrote:
>> On 28/05/2020 17:15, Jan Beulich wrote:
>>> On 27.05.2020 21:18, Andrew Cooper wrote:
>>>> +
>>>> +        if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
>>>> +        {
>>>> +            asm ( "wrssq %[fix], %[stk]"
>>>> +                  : [stk] "=m" (*ptr)
>>> Could this be ptr[0], to match the if()?
>>>
>>> Considering how important it is that we don't fix up the wrong stack
>>> location here, I continue to wonder if we wouldn't better also
>>> include the SSP value on the stack in the checking, at the very
>>> least by way of an ASSERT() or BUG_ON().
>> Well no, for the reason discussed in point 1.
>>
>> Its not technically an issue right now, but there is no possible way to
>> BUILD_BUG_ON() someone turning an exception into IST, or stopping the
>> use of the extable infrastructure on a #DB.
>>
>> Such a check would lie in wait and either provide an unexpected tangent
>> to someone debugging a complicated issue (I do use #DB for a fair bit),
>> or become a security vulnerability.
> 
> Also (which I forgot first time around),
> 
> The ptr[1] == regs->cs check is 64 bits wide, and the way to get that on
> the shadow stack would be to execute a call instruction ending at at
> 0xe008 linear, or from a bad WRSSQ edit, both of which are serious
> errors and worthy of hitting the BUG().

Maybe you misunderstood me then: By suggesting more strict checking
I'm actually asking for it to become more likely to hit that BUG(),
not less likely. (This is despite agreeing that the very limited
range of CS values on the stack already makes it practically
impossible to mistake the frame found.)

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-06-02 12:41               ` Jan Beulich
@ 2020-06-02 13:50                 ` Anthony PERARD
  2020-06-02 14:13                   ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Anthony PERARD @ 2020-06-02 13:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Xen-devel

On Tue, Jun 02, 2020 at 02:41:30PM +0200, Jan Beulich wrote:
> On 02.06.2020 14:26, Anthony PERARD wrote:
> > On Tue, Jun 02, 2020 at 02:06:11PM +0200, Jan Beulich wrote:
> >> I don't recall; perhaps I was in another parallel session? If it's
> >> the one with notes at
> >> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00786.html
> >> then a remark close to the top suggests I was there, but there's no
> >> sign of this aspect having got discussed. There is, among the issues
> >> listed, "Xen build re-evaluates compiler support for every translation
> >> unit", but that's only remotely related.
> > 
> > What is a "translation unit"? What would that have to do with Kconfig?
> 
> A translation unit is the collective input from all source and
> header files seen by the compiler during the processing of one
> top level input file's worth of compilation.

Thanks. Now it feels like the issue listed in the design session note
isn't exactly correct, compiler didn't used to be evaluated once per CU,
but only once per Makefile loaded/executed.

> The connection to
> Kconfig here is that by switching to it, the compiler flags
> don't get re-constructed once per CU. Of course doing it via
> Kconfig is not the only possible solution to the problem (as
> can be seen from the patch that I had intermediately submitted
> to get at least the HAVE_AS_* out of that set), but for us the
> change in behavior is one intended (side-)effect of the switch
> to newer Kconfig.

Well, with a recent patch of mine, tool chain support should already be
only evaluated only once in the root Makefile. So I do also think that
we don't need to move all of them to Kconfig.

So advantage of evaluating some compiler flags in Kconfig is that we
can have other CONFIG_* depends on those flags; the inconvenient is to
figure out when do we need to re-evaluate the .config.

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support
  2020-06-02 13:50                 ` Anthony PERARD
@ 2020-06-02 14:13                   ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2020-06-02 14:13 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Xen-devel

On 02.06.2020 15:50, Anthony PERARD wrote:
> On Tue, Jun 02, 2020 at 02:41:30PM +0200, Jan Beulich wrote:
>> On 02.06.2020 14:26, Anthony PERARD wrote:
>>> On Tue, Jun 02, 2020 at 02:06:11PM +0200, Jan Beulich wrote:
>>>> I don't recall; perhaps I was in another parallel session? If it's
>>>> the one with notes at
>>>> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00786.html
>>>> then a remark close to the top suggests I was there, but there's no
>>>> sign of this aspect having got discussed. There is, among the issues
>>>> listed, "Xen build re-evaluates compiler support for every translation
>>>> unit", but that's only remotely related.
>>>
>>> What is a "translation unit"? What would that have to do with Kconfig?
>>
>> A translation unit is the collective input from all source and
>> header files seen by the compiler during the processing of one
>> top level input file's worth of compilation.
> 
> Thanks. Now it feels like the issue listed in the design session note
> isn't exactly correct, compiler didn't used to be evaluated once per CU,
> but only once per Makefile loaded/executed.
> 
>> The connection to
>> Kconfig here is that by switching to it, the compiler flags
>> don't get re-constructed once per CU. Of course doing it via
>> Kconfig is not the only possible solution to the problem (as
>> can be seen from the patch that I had intermediately submitted
>> to get at least the HAVE_AS_* out of that set), but for us the
>> change in behavior is one intended (side-)effect of the switch
>> to newer Kconfig.
> 
> Well, with a recent patch of mine, tool chain support should already be
> only evaluated only once in the root Makefile. So I do also think that
> we don't need to move all of them to Kconfig.
> 
> So advantage of evaluating some compiler flags in Kconfig is that we
> can have other CONFIG_* depends on those flags;

I'm not even certain this is an advantage: CET-SS is going a
different direction than the majority of cases we have had in
the tree so far (INDIRECT_THUNK is another such case). In the
majority of cases, though, the tool chain capabilities don't
affect functionality, but just code quality. And indeed I
think that's how Kconfig should be used: When configuring Xen,
it ought to be specified what functionality is wanted,
irrespective of compilers used/available. In particular a
syncconfig run would better not require a host compiler to be
available at all, such that configurations for targets for
which there's no cross compiler can also be produced. (Not
sure whether that's still the case, but our internal kernel
processing used to work that way, without requiring everyone
to have compilers for all target architectures our kernels
support.)

It would be the responsibility of the person doing the config
to ensure only options get enabled that can actually be built
in the environment that the full build is actually to be run
in. (That is, for the case it's inconvenient [CET-SS] or
impossible [INDIRECT_THUNK] to build without suitable tool
chain support, I'm not questioning the desire to have certain
features outright unavailable. It's just that I don't see why
this unavailability needs to be expressed at the Kconfig
level.)

Jan

> the inconvenient is to
> figure out when do we need to re-evaluate the .config.
> 



^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2020-06-02 14:14 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 19:18 [PATCH v2 00/14] x86: Support for CET Supervisor Shadow Stacks Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved, fatal}_trap() Andrew Cooper
2020-05-28  9:45   ` [PATCH v2 01/14] x86/traps: Clean up printing in {do_reserved,fatal}_trap() Jan Beulich
2020-05-27 19:18 ` [PATCH v2 02/14] x86/traps: Factor out extable_fixup() and make printing consistent Andrew Cooper
2020-05-28  9:50   ` Jan Beulich
2020-05-28 17:26     ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 03/14] x86/shstk: Introduce Supervisor Shadow Stack support Andrew Cooper
2020-05-28 10:25   ` Jan Beulich
2020-05-28 18:10     ` Andrew Cooper
2020-05-29 11:59       ` Jan Beulich
2020-05-29 15:51         ` Anthony PERARD
2020-05-29 18:39           ` Andrew Cooper
2020-06-02 12:09             ` Jan Beulich
2020-05-29 18:36         ` Andrew Cooper
2020-06-02 12:06           ` Jan Beulich
2020-06-02 12:26             ` Anthony PERARD
2020-06-02 12:41               ` Jan Beulich
2020-06-02 13:50                 ` Anthony PERARD
2020-06-02 14:13                   ` Jan Beulich
2020-05-27 19:18 ` [PATCH v2 04/14] x86/traps: Implement #CP handler and extend #PF for shadow stacks Andrew Cooper
2020-05-28 12:03   ` Jan Beulich
2020-05-28 13:22     ` Andrew Cooper
2020-05-28 13:31       ` Jan Beulich
2020-05-29 18:50         ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 05/14] x86/shstk: Re-layout the stack block " Andrew Cooper
2020-05-28 12:33   ` Jan Beulich
2020-05-29 19:21     ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 06/14] x86/shstk: Create " Andrew Cooper
2020-05-28 12:50   ` Jan Beulich
2020-05-29 19:35     ` Andrew Cooper
2020-05-29 21:45       ` Andrew Cooper
2020-06-02 12:32         ` Jan Beulich
2020-06-02 12:35       ` Jan Beulich
2020-05-27 19:18 ` [PATCH v2 07/14] x86/cpu: Adjust enable_nmis() to be shadow stack compatible Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 08/14] x86/cpu: Adjust reset_stack_and_jump() " Andrew Cooper
2020-05-28 14:41   ` Jan Beulich
2020-05-27 19:18 ` [PATCH v2 09/14] x86/spec-ctrl: Adjust DO_OVERWRITE_RSB " Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 10/14] x86/extable: Adjust extable handling " Andrew Cooper
2020-05-28 16:15   ` Jan Beulich
2020-05-29 19:43     ` Andrew Cooper
2020-05-29 21:17       ` Andrew Cooper
2020-06-02 13:11         ` Jan Beulich
2020-06-02 12:57       ` Jan Beulich
2020-05-27 19:18 ` [PATCH v2 11/14] x86/alt: Adjust _alternative_instructions() to not create shadow stacks Andrew Cooper
2020-05-29 12:23   ` Jan Beulich
2020-05-29 19:46     ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 12/14] x86/entry: Adjust guest paths to be shadow stack compatible Andrew Cooper
2020-05-29 12:40   ` Jan Beulich
2020-05-29 19:58     ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 13/14] x86/S3: Save and restore Shadow Stack configuration Andrew Cooper
2020-05-29 12:52   ` Jan Beulich
2020-05-29 20:00     ` Andrew Cooper
2020-05-27 19:18 ` [PATCH v2 14/14] x86/shstk: Activate Supervisor Shadow Stacks Andrew Cooper
2020-05-29 13:09   ` Jan Beulich
2020-05-29 20:28     ` Andrew Cooper
2020-05-29 22:28 ` [PATCH v2 00/14] x86: Support for CET " Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.