* [PATCH 0/4] x86: CET-SS related adjustments
@ 2024-02-28 13:50 Jan Beulich
  2024-02-28 13:51 ` [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack() Jan Beulich
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 13:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

One might think of this as a follow-on to XSA-451, but that's not quite
the right order of events.

There are a few open aspects; see individual patches.

1: remove redundant XEN_SHSTK check from reinit_bsp_stack()
2: record SSP at non-guest entry points
3: traps: use entry_ssp in fixup_exception_return()
4: prefer shadow stack for producing call traces

Jan



* [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack()
  2024-02-28 13:50 [PATCH 0/4] x86: CET-SS related adjustments Jan Beulich
@ 2024-02-28 13:51 ` Jan Beulich
  2024-02-28 13:54   ` Andrew Cooper
  2024-02-28 13:52 ` [PATCH 2/4] x86: record SSP at non-guest entry points Jan Beulich
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 13:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

As of 72d51813d631 ("x86: amend cpu_has_xen_{ibt,shstk}") this has been
integrated into cpu_has_xen_shstk.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -810,7 +810,7 @@ static void __init noreturn reinit_bsp_s
     if ( rc )
         panic("Error %d setting up PV root page table\n", rc);
 
-    if ( IS_ENABLED(CONFIG_XEN_SHSTK) && cpu_has_xen_shstk )
+    if ( cpu_has_xen_shstk )
     {
         wrmsrl(MSR_PL0_SSP,
                (unsigned long)stack + (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8);




* [PATCH 2/4] x86: record SSP at non-guest entry points
  2024-02-28 13:50 [PATCH 0/4] x86: CET-SS related adjustments Jan Beulich
  2024-02-28 13:51 ` [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack() Jan Beulich
@ 2024-02-28 13:52 ` Jan Beulich
  2024-02-28 15:16   ` Andrew Cooper
  2024-02-28 13:52 ` [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return() Jan Beulich
  2024-02-28 13:53 ` [PATCH 4/4] x86: prefer shadow stack for producing call traces Jan Beulich
  3 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 13:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

We will want to use that value for call trace generation, and likely
also to eliminate the somewhat fragile shadow stack searching done in
fixup_exception_return(). For those purposes, guest-only entry points do
not need to record that value.

To keep the saving code simple, record our own SSP that corresponds to
an exception frame, pointing to the top of the shadow stack counterpart
of what the CPU has saved on the regular stack. Consuming code can then
work its way from there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
To record the full 64-bit value, some of the unused bits in the %cs slot
could be used. Sadly that slot has saved_upcall_mask in an unhelpful
location, otherwise simply storing low and high 32 bits in those two
separate half-slots would be a pretty obvious choice. As long as
"entry_ssp" is used in non-guest-entry frames only, we could of course
put half of it into a union with saved_upcall_mask ...

Else may want to put a BUILD_BUG_ON(VADDR_BITS > 48) somewhere, but I'm
afraid I can't really identify a good place for such to live.

Leveraging that the CPU stores zero in the upper bits of the selector
register slots, the save sequence could also be

        shl   $16, %rcx
        or    %rcx, UREGS_entry_ssp-2(%rsp)

That's shorter and avoids a 16-bit operation, but may be less desirable,
for being a read-modify-write access.

--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1354,6 +1354,7 @@ void arch_get_info_guest(struct vcpu *v,
     if ( !compat )
     {
         memcpy(&c.nat->user_regs, &v->arch.user_regs, sizeof(c.nat->user_regs));
+        c.nat->user_regs.entry_ssp = 0;
         if ( is_pv_domain(d) )
             memcpy(c.nat->trap_ctxt, v->arch.pv.trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -94,7 +94,7 @@ __UNLIKELY_END(nsvm_hap)
         sti
         vmrun
 
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         GET_CURRENT(bx)
 
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -25,7 +25,7 @@
 #define VMLAUNCH     .byte 0x0f,0x01,0xc2
 
 ENTRY(vmx_asm_vmexit_handler)
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         mov  %cr2,%rax
         GET_CURRENT(bx)
@@ -119,7 +119,7 @@ UNLIKELY_END(realmode)
 
 .Lvmx_vmentry_fail:
         sti
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         /*
          * SPEC_CTRL_ENTRY notes
--- a/xen/arch/x86/include/asm/asm_defns.h
+++ b/xen/arch/x86/include/asm/asm_defns.h
@@ -221,7 +221,7 @@ static always_inline void stac(void)
 #endif
 
 #ifdef __ASSEMBLY__
-.macro SAVE_ALL compat=0
+.macro SAVE_ALL compat=0 ssp=IS_ENABLED(CONFIG_XEN_SHSTK)
         addq  $-(UREGS_error_code-UREGS_r15), %rsp
         cld
         movq  %rdi,UREGS_rdi(%rsp)
@@ -235,6 +235,9 @@ static always_inline void stac(void)
         movq  %rax,UREGS_rax(%rsp)
         xor   %eax, %eax
 .if !\compat
+.if \ssp
+        rdsspq %rcx
+.endif
         movq  %r8,UREGS_r8(%rsp)
         movq  %r9,UREGS_r9(%rsp)
         movq  %r10,UREGS_r10(%rsp)
@@ -264,6 +267,11 @@ static always_inline void stac(void)
         xor   %r13d, %r13d
         xor   %r14d, %r14d
         xor   %r15d, %r15d
+.if \ssp && !\compat
+        mov   %cx, UREGS_entry_ssp(%rsp)
+        shr   $16, %rcx
+        mov   %ecx, UREGS_entry_ssp+2(%rsp)
+.endif
 .endm
 
 #define LOAD_ONE_REG(reg, compat) \
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -48,6 +48,7 @@ void __dummy__(void)
     OFFSET(UREGS_eflags, struct cpu_user_regs, rflags);
     OFFSET(UREGS_rsp, struct cpu_user_regs, rsp);
     OFFSET(UREGS_ss, struct cpu_user_regs, ss);
+    OFFSET(UREGS_entry_ssp, struct cpu_user_regs, entry_ssp);
     OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
     BLANK();
 
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -257,7 +257,7 @@ FUNC(lstar_enter)
         pushq $0
         BUILD_BUG_ON(TRAP_syscall & 0xff)
         movb  $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp)
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
@@ -296,7 +296,7 @@ FUNC(cstar_enter)
         pushq $0
         BUILD_BUG_ON(TRAP_syscall & 0xff)
         movb  $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp)
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
@@ -339,7 +339,7 @@ LABEL(sysenter_eflags_saved, 0)
         pushq $0
         BUILD_BUG_ON(TRAP_syscall & 0xff)
         movb  $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp)
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
@@ -394,7 +394,7 @@ FUNC(entry_int80)
         ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
         pushq $0
         movb  $0x80, EFRAME_entry_vector(%rsp)
-        SAVE_ALL
+        SAVE_ALL ssp=0
 
         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
--- a/xen/include/public/arch-x86/xen-x86_64.h
+++ b/xen/include/public/arch-x86/xen-x86_64.h
@@ -183,7 +183,19 @@ struct cpu_user_regs {
     uint8_t  _pad1[3];
     __DECL_REG_LO16(flags); /* rflags.IF == !saved_upcall_mask */
     __DECL_REG_LO8(sp);
-    uint16_t ss, _pad2[3];
+    uint16_t ss;
+#if !defined(__XEN__)
+    uint16_t _pad2[3];
+#elif defined(COMPILE_OFFSETS)
+    uint16_t entry_ssp[3];
+#else
+    /*
+     * This points _at_ the corresponding shadow stack frame; it is _not_ the
+     * outer context's SSP.  That, if the outer context has CET-SS enabled,
+     * is stored in the top slot of the pointed to shadow stack frame.
+     */
+    signed long entry_ssp:48;
+#endif
     uint16_t es, _pad3[3];
     uint16_t ds, _pad4[3];
     uint16_t fs, _pad5[3];




* [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return()
  2024-02-28 13:50 [PATCH 0/4] x86: CET-SS related adjustments Jan Beulich
  2024-02-28 13:51 ` [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack() Jan Beulich
  2024-02-28 13:52 ` [PATCH 2/4] x86: record SSP at non-guest entry points Jan Beulich
@ 2024-02-28 13:52 ` Jan Beulich
  2024-02-28 15:21   ` Andrew Cooper
  2024-02-28 13:53 ` [PATCH 4/4] x86: prefer shadow stack for producing call traces Jan Beulich
  3 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 13:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

With the value recorded on entry there's no need anymore to go hunt for
the respective exception frame on the shadow stack. By deriving "ptr"
from that field (without any offset), it then ends up pointing one slot
lower than before. Therefore all array indexes need incrementing, nicely
doing away with all the negative ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Indentation of the prior inner (but not innermost) if()'s body is
deliberately left untouched, to aid review. It'll be adjusted in a
separate follow-on patch.

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -434,18 +434,6 @@ unsigned long get_stack_trace_bottom(uns
     }
 }
 
-static unsigned long get_shstk_bottom(unsigned long sp)
-{
-    switch ( get_stack_page(sp) )
-    {
-#ifdef CONFIG_XEN_SHSTK
-    case 0:  return ROUNDUP(sp, IST_SHSTK_SIZE) - sizeof(unsigned long);
-    case 5:  return ROUNDUP(sp, PAGE_SIZE)      - sizeof(unsigned long);
-#endif
-    default: return sp - sizeof(unsigned long);
-    }
-}
-
 unsigned long get_stack_dump_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
@@ -837,24 +825,26 @@ static void fixup_exception_return(struc
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
-        unsigned long ssp, *ptr, *base;
+        unsigned long ssp = rdssp();
 
-        if ( (ssp = rdssp()) == SSP_NO_SHSTK )
-            goto shstk_done;
+        if ( ssp != SSP_NO_SHSTK )
+        {
+            unsigned long *ptr = _p(regs->entry_ssp);
+            unsigned long primary_shstk =
+                (ssp & ~(STACK_SIZE - 1)) +
+                (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
 
-        ptr = _p(ssp);
-        base = _p(get_shstk_bottom(ssp));
+            BUG_ON((regs->entry_ssp ^ primary_shstk) >> PAGE_SHIFT);
 
-        for ( ; ptr < base; ++ptr )
-        {
             /*
-             * Search for %rip.  The shstk currently looks like this:
+             * The shstk currently looks like this:
              *
              *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
              *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
-             *   SSP  [Likely points to 3 slots higher, above %cs]
+             *   SSP  [Pointed to by entry_ssp; Likely points to 3 slots
+             *         higher, above %cs]
              *   ...  [call tree to this function, likely 2/3 slots]
              *
              * and we want to overwrite %rip with fixup.  There are two
@@ -867,13 +857,10 @@ static void fixup_exception_return(struc
              *
              * Check for both regs->rip and regs->cs matching.
              */
-            if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
-            {
-                unsigned long primary_shstk =
-                    (ssp & ~(STACK_SIZE - 1)) +
-                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+            BUG_ON(ptr[1] != regs->rip || ptr[2] != regs->cs);
 
-                wrss(fixup, ptr);
+            {
+                wrss(fixup, &ptr[1]);
 
                 if ( !stub_ra )
                     goto shstk_done;
@@ -890,7 +877,7 @@ static void fixup_exception_return(struc
                  * - if we're on an IST stack, we need to increment the
                  *   original SSP.
                  */
-                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+                BUG_ON((ptr[0] ^ primary_shstk) >> PAGE_SHIFT);
 
                 if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
                 {
@@ -899,37 +886,27 @@ static void fixup_exception_return(struc
                      * addresses actually match.  Then increment the interrupted
                      * context's SSP.
                      */
-                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
-                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[0]);
+                    wrss(ptr[0] + 8, &ptr[0]);
                     goto shstk_done;
                 }
 
                 /* Make sure the two return addresses actually match. */
-                BUG_ON(stub_ra != ptr[2]);
+                BUG_ON(stub_ra != ptr[3]);
 
                 /* Move exception frame, updating SSP there. */
-                wrss(ptr[1], &ptr[2]); /* %cs */
-                wrss(ptr[0], &ptr[1]); /* %rip */
-                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+                wrss(ptr[2], &ptr[3]); /* %cs */
+                wrss(ptr[1], &ptr[2]); /* %rip */
+                wrss(ptr[0] + 8, &ptr[1]); /* SSP */
 
                 /* Move all newer entries. */
-                while ( --ptr != _p(ssp) )
-                    wrss(ptr[-1], &ptr[0]);
+                while ( ptr-- != _p(ssp) )
+                    wrss(ptr[0], &ptr[1]);
 
                 /* Finally account for our own stack having shifted up. */
                 asm volatile ( "incsspd %0" :: "r" (2) );
-
-                goto shstk_done;
             }
         }
-
-        /*
-         * We failed to locate and fix up the shadow IRET frame.  This could
-         * be due to shadow stack corruption, or bad logic above.  We cannot
-         * continue executing the interrupted context.
-         */
-        BUG();
-
     }
  shstk_done:
 




* [PATCH 4/4] x86: prefer shadow stack for producing call traces
  2024-02-28 13:50 [PATCH 0/4] x86: CET-SS related adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2024-02-28 13:52 ` [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return() Jan Beulich
@ 2024-02-28 13:53 ` Jan Beulich
  2024-02-28 16:15   ` Andrew Cooper
  3 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 13:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Shadow stacks contain little more than return addresses, and in
particular they allow precise call traces even without FRAME_POINTER.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While the 'E' for exception frames is probably okay, I'm not overly
happy with the 'C' (for CET). I would have preferred 'S' (for shadow),
but we use that character already.

As an alternative to suppressing output for the top level exception
frame, adding the new code ahead of the 'R' output line (and then also
ahead of the stack top read) could be considered.

Perhaps having a printk() for the PV entry case is meaningless, for
- no frame being pushed when entered from CPL=3 (64-bit PV),
- no entry possible from CPL<3 (32-bit PV disabled when CET is active)?
In which case the comment probably should just be "Bogus." and the code
merely be "break;".

Quite likely a number of other uses of is_active_kernel_text() also want
amending with in_stub().

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -449,6 +449,11 @@ unsigned long get_stack_dump_bottom(unsi
     }
 }
 
+static bool in_stub(unsigned long addr)
+{
+    return !((this_cpu(stubs.addr) ^ addr) >> STUB_BUF_SHIFT);
+}
+
 #if !defined(CONFIG_FRAME_POINTER)
 
 /*
@@ -539,6 +544,50 @@ static void show_trace(const struct cpu_
          !is_active_kernel_text(tos) )
         printk("   [<%p>] R %pS\n", _p(regs->rip), _p(regs->rip));
 
+    if ( IS_ENABLED(CONFIG_XEN_SHSTK) && rdssp() != SSP_NO_SHSTK )
+    {
+        const unsigned long *ptr = _p(regs->entry_ssp);
+        unsigned int n;
+
+        for ( n = 0; (unsigned long)ptr & (PAGE_SIZE - sizeof(*ptr)); ++n )
+        {
+            unsigned long val = *ptr;
+
+            if ( is_active_kernel_text(val) || in_stub(val) )
+            {
+                /* Normal return address entry.  */
+                printk("   [<%p>] C %pS\n", _p(val), _p(val));
+                ++ptr;
+            }
+            else if ( !((val ^ *ptr) >> (PAGE_SHIFT + STACK_ORDER)) )
+            {
+                if ( val & (sizeof(val) - 1) )
+                {
+                    /* Most likely a supervisor token. */
+                    break;
+                }
+
+                /*
+                 * Ought to be a hypervisor interruption frame.  But don't
+                 * (re)log the current frame's %rip.
+                 */
+                if ( n || ptr[1] != regs->rip )
+                    printk("   [<%p>] E %pS\n", _p(ptr[1]), _p(ptr[1]));
+                ptr = _p(val);
+            }
+            else
+            {
+                /* Ought to be a PV guest hypercall/interruption frame.  */
+                printk("   %04lx:[<%p>] E\n", ptr[2], _p(ptr[1]));
+                ptr = 0;
+            }
+        }
+
+        /* Fall back to legacy stack trace if nothing was logged at all. */
+        if ( n )
+            return;
+    }
+
     if ( fault )
     {
         printk("   [Fault on access]\n");




* Re: [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack()
  2024-02-28 13:51 ` [PATCH 1/4] x86: remove redundant XEN_SHSTK check from reinit_bsp_stack() Jan Beulich
@ 2024-02-28 13:54   ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2024-02-28 13:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 28/02/2024 1:51 pm, Jan Beulich wrote:
> As of 72d51813d631 ("x86: amend cpu_has_xen_{ibt,shstk}") this has been
> integrated into cpu_has_xen_shstk.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I vaguely recall noticing that after your original patch, but never
got around to sending a fix.



* Re: [PATCH 2/4] x86: record SSP at non-guest entry points
  2024-02-28 13:52 ` [PATCH 2/4] x86: record SSP at non-guest entry points Jan Beulich
@ 2024-02-28 15:16   ` Andrew Cooper
  2024-02-29  9:39     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-02-28 15:16 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 28/02/2024 1:52 pm, Jan Beulich wrote:
> We will want to use that value for call trace generation, and likely
> also to eliminate the somewhat fragile shadow stack searching done in
> fixup_exception_return(). For those purposes, guest-only entry points do
> not need to record that value.
>
> To keep the saving code simple, record our own SSP that corresponds to
> an exception frame, pointing to the top of the shadow stack counterpart
> of what the CPU has saved on the regular stack. Consuming code can then
> work its way from there.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> To record the full 64-bit value, some of the unused bits in the %cs slot
> could be used. Sadly that slot has saved_upcall_mask in an unhelpful
> location, otherwise simply storing low and high 32 bits in those two
> separate half-slots would be a pretty obvious choice. As long as
> "entry_ssp" is used in non-guest-entry frames only, we could of course
> put half of it into a union with saved_upcall_mask ...
>
> Else may want to put a BUILD_BUG_ON(VADDR_BITS > 48) somewhere, but I'm
> afraid I can't really identify a good place for such to live.

Perhaps in reinit_bsp_stack() when we enable SHSTK on the BSP?

Having it anywhere vaguely relevant is better than not having it.

>
> Leveraging that the CPU stores zero in the upper bits of the selector
> register slots, the save sequence could also be
>
>         shl   $16, %rcx
>         or    %rcx, UREGS_entry_ssp-2(%rsp)
>
> That's shorter and avoids a 16-bit operation, but may be less desirable,
> for being a read-modify-write access.

I doubt you'll be able to measure a difference between the two options
(it's into the active stack, after all), but the two stores are probably
marginally better.  When shstks are active, we're taking a large hit
from the busy token handling.

I was concerned by the misaligned access, but it's not misaligned, is
it?  It's the start of entry_ssp which is misaligned, and the -2 brings
it back to being properly aligned.

> --- a/xen/arch/x86/include/asm/asm_defns.h
> +++ b/xen/arch/x86/include/asm/asm_defns.h
> @@ -221,7 +221,7 @@ static always_inline void stac(void)
>  #endif
>  
>  #ifdef __ASSEMBLY__
> -.macro SAVE_ALL compat=0
> +.macro SAVE_ALL compat=0 ssp=IS_ENABLED(CONFIG_XEN_SHSTK)

I'm not sure this is what you want to do.  Because it's only the
default, we'll still....

>          addq  $-(UREGS_error_code-UREGS_r15), %rsp
>          cld
>          movq  %rdi,UREGS_rdi(%rsp)
> @@ -235,6 +235,9 @@ static always_inline void stac(void)
>          movq  %rax,UREGS_rax(%rsp)
>          xor   %eax, %eax
>  .if !\compat
> +.if \ssp
> +        rdsspq %rcx

... pick this up even in !CONFIG_XEN_SHSTK builds, and ...

> +.endif
>          movq  %r8,UREGS_r8(%rsp)
>          movq  %r9,UREGS_r9(%rsp)
>          movq  %r10,UREGS_r10(%rsp)
> @@ -264,6 +267,11 @@ static always_inline void stac(void)
>          xor   %r13d, %r13d
>          xor   %r14d, %r14d
>          xor   %r15d, %r15d
> +.if \ssp && !\compat
> +        mov   %cx, UREGS_entry_ssp(%rsp)
> +        shr   $16, %rcx
> +        mov   %ecx, UREGS_entry_ssp+2(%rsp)

... store it here.

I think you need to use ssp=1 by default, and

#ifdef CONFIG_XEN_SHSTK
.if
    ...

for these two blocks, so they disappear properly in !SHSTK builds.

But for the rest of the behaviour, there are two overlapping things,
because you end up getting entry_ssp=0 in SHSTK builds running on
hardware without shstk active.

And with that in mind, to confirm, the RDSSP block depends on the xor
%ecx,%ecx between the two hunks in order to function as intended?

> --- a/xen/include/public/arch-x86/xen-x86_64.h
> +++ b/xen/include/public/arch-x86/xen-x86_64.h
> @@ -183,7 +183,19 @@ struct cpu_user_regs {
>      uint8_t  _pad1[3];
>      __DECL_REG_LO16(flags); /* rflags.IF == !saved_upcall_mask */
>      __DECL_REG_LO8(sp);
> -    uint16_t ss, _pad2[3];
> +    uint16_t ss;
> +#if !defined(__XEN__)
> +    uint16_t _pad2[3];
> +#elif defined(COMPILE_OFFSETS)
> +    uint16_t entry_ssp[3];
> +#else
> +    /*
> +     * This points _at_ the corresponding shadow stack frame; it is _not_ the
> +     * outer context's SSP.  That, if the outer context has CET-SS enabled,
> +     * is stored in the top slot of the pointed to shadow stack frame.
> +     */
> +    signed long entry_ssp:48;
> +#endif

I have to admit that I dislike this.  (And only some of that is because
it's work I'm going to have to revert in order to make FRED support work...)

We desperately need to break our use of this structure to start with,
and with that done, we don't need to play games about hiding SSP in a
spare 6 bytes in an ABI dubiously made public nearly two decades ago.

How hard would it be just:

#define cpu_user_regs abi_cpu_user_regs
#include <public/xen.h>
#undef cpu_user_regs

and make a copy of cpu_user_regs that we can really put an SSP field into?

It would make this patch simpler, and we could finally get the vm86
block off the stack (which also fixes OoB accesses in the #DF handler -
can't remember if I finished the bodge around that or not.)

Irrespective of anything else, why do we have COMPILE_OFFSETS getting in
here?

~Andrew



* Re: [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return()
  2024-02-28 13:52 ` [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return() Jan Beulich
@ 2024-02-28 15:21   ` Andrew Cooper
  2024-02-28 15:26     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-02-28 15:21 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 28/02/2024 1:52 pm, Jan Beulich wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -837,24 +825,26 @@ static void fixup_exception_return(struc
>  {
>      if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>      {
> -        unsigned long ssp, *ptr, *base;
> +        unsigned long ssp = rdssp();
>  
> -        if ( (ssp = rdssp()) == SSP_NO_SHSTK )
> -            goto shstk_done;
> +        if ( ssp != SSP_NO_SHSTK )
> +        {
> +            unsigned long *ptr = _p(regs->entry_ssp);

To double check, this works by the magic of:

    signed long entry_ssp:48;

getting sign extended back into a canonical address?

~Andrew



* Re: [PATCH 3/4] x86/traps: use entry_ssp in fixup_exception_return()
  2024-02-28 15:21   ` Andrew Cooper
@ 2024-02-28 15:26     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-02-28 15:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 28.02.2024 16:21, Andrew Cooper wrote:
> On 28/02/2024 1:52 pm, Jan Beulich wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -837,24 +825,26 @@ static void fixup_exception_return(struc
>>  {
>>      if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
>>      {
>> -        unsigned long ssp, *ptr, *base;
>> +        unsigned long ssp = rdssp();
>>  
>> -        if ( (ssp = rdssp()) == SSP_NO_SHSTK )
>> -            goto shstk_done;
>> +        if ( ssp != SSP_NO_SHSTK )
>> +        {
>> +            unsigned long *ptr = _p(regs->entry_ssp);
> 
> To double check, this works by the magic of:
> 
>     signed long entry_ssp:48;
> 
> getting sign extended back into a canonical address?

That's the very reason for using an explicitly signed bitfield there,
yes.
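
As a standalone illustration of that mechanism (demo program only, not
Xen code; the address is arbitrary, and storing an over-wide value into
the bit-field relies on the usual GCC/Clang truncation semantics):

    #include <stdio.h>

    struct demo {
        signed long entry_ssp:48;   /* mirrors the new field */
    };

    int main(void)
    {
        struct demo d;

        /* The store gets truncated to 48 bits ... */
        d.entry_ssp = 0xffff82d0deadbee8UL;

        /* ... and the read sign-extends, i.e. is canonical again. */
        printf("%#lx\n", (unsigned long)d.entry_ssp);

        return 0;
    }

which prints 0xffff82d0deadbee8.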

Jan



* Re: [PATCH 4/4] x86: prefer shadow stack for producing call traces
  2024-02-28 13:53 ` [PATCH 4/4] x86: prefer shadow stack for producing call traces Jan Beulich
@ 2024-02-28 16:15   ` Andrew Cooper
  2024-02-29  8:28     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-02-28 16:15 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 28/02/2024 1:53 pm, Jan Beulich wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -539,6 +544,50 @@ static void show_trace(const struct cpu_
>           !is_active_kernel_text(tos) )
>          printk("   [<%p>] R %pS\n", _p(regs->rip), _p(regs->rip));
>  
> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) && rdssp() != SSP_NO_SHSTK )
> +    {
> +        const unsigned long *ptr = _p(regs->entry_ssp);
> +        unsigned int n;
> +
> +        for ( n = 0; (unsigned long)ptr & (PAGE_SIZE - sizeof(*ptr)); ++n )
> +        {
> +            unsigned long val = *ptr;
> +
> +            if ( is_active_kernel_text(val) || in_stub(val) )
> +            {
> +                /* Normal return address entry.  */
> +                printk("   [<%p>] C %pS\n", _p(val), _p(val));
> +                ++ptr;
> +            }
> +            else if ( !((val ^ *ptr) >> (PAGE_SHIFT + STACK_ORDER)) )
> +            {
> +                if ( val & (sizeof(val) - 1) )
> +                {
> +                    /* Most likely a supervisor token. */
> +                    break;
> +                }

Tokens are their own linear address, with metadata in the bottom two
bits.  I think it would be better to check that explicitly, rather than
assuming anything nonzero in the upper bits is a token.
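
Just to illustrate what I mean (sketch only, untested; the mask is my
reading of the token format described above, not anything this patch
defines, and bool is as per the usual Xen includes):

    /*
     * A supervisor token holds its own linear address, with metadata
     * confined to the low two bits.
     */
    static bool is_supervisor_token(const unsigned long *slot)
    {
        return (*slot & ~(unsigned long)3) == (unsigned long)slot;
    }

The token case in the loop could then test e.g. is_supervisor_token(ptr)
rather than just the alignment of val.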

> +
> +                /*
> +                 * Ought to be a hypervisor interruption frame.  But don't
> +                 * (re)log the current frame's %rip.
> +                 */
> +                if ( n || ptr[1] != regs->rip )
> +                    printk("   [<%p>] E %pS\n", _p(ptr[1]), _p(ptr[1]));
> +                ptr = _p(val);
> +            }
> +            else
> +            {
> +                /* Ought to be a PV guest hypercall/interruption frame.  */
> +                printk("   %04lx:[<%p>] E\n", ptr[2], _p(ptr[1]));
> +                ptr = 0;

On a CPL3 -> CPL0 transition, the guest's SSP is written back into
MSR_PL3_SSP.  The supervisor token on MSR_PL0_SSP is marked busy (either
automatically, or by SETSSBSY), but nothing pertaining to CPL3 is pushed
onto the supervisor shadow stack.

This is why we can move off an IST stack onto the primary stack when
interrupting CPL3 with only a CLEARSSBSY/SETSSBSY pair, and no memmove()
loop of WRSS's.

In other words, I'm pretty sure this is a dead codepath.  (Or worse, if
it happens not to be dead, then the comment is misleading.)

A CPL1 -> CPL0 transition does push an shstk interrupt frame, and not
wanting to memmove() the shstk by 3 slots on a context switch is part of
why I just disallowed PV32 guests when CET was active.

~Andrew



* Re: [PATCH 4/4] x86: prefer shadow stack for producing call traces
  2024-02-28 16:15   ` Andrew Cooper
@ 2024-02-29  8:28     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-02-29  8:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 28.02.2024 17:15, Andrew Cooper wrote:
> On 28/02/2024 1:53 pm, Jan Beulich wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -539,6 +544,50 @@ static void show_trace(const struct cpu_
>>           !is_active_kernel_text(tos) )
>>          printk("   [<%p>] R %pS\n", _p(regs->rip), _p(regs->rip));
>>  
>> +    if ( IS_ENABLED(CONFIG_XEN_SHSTK) && rdssp() != SSP_NO_SHSTK )
>> +    {
>> +        const unsigned long *ptr = _p(regs->entry_ssp);
>> +        unsigned int n;
>> +
>> +        for ( n = 0; (unsigned long)ptr & (PAGE_SIZE - sizeof(*ptr)); ++n )
>> +        {
>> +            unsigned long val = *ptr;
>> +
>> +            if ( is_active_kernel_text(val) || in_stub(val) )
>> +            {
>> +                /* Normal return address entry.  */
>> +                printk("   [<%p>] C %pS\n", _p(val), _p(val));
>> +                ++ptr;
>> +            }
>> +            else if ( !((val ^ *ptr) >> (PAGE_SHIFT + STACK_ORDER)) )
>> +            {
>> +                if ( val & (sizeof(val) - 1) )
>> +                {
>> +                    /* Most likely a supervisor token. */
>> +                    break;
>> +                }
> 
> Tokens are their own linear address, with metadata in the bottom two
> bits.  I think it would be better to check that explicitly, rather than
> assuming anything nonzero in the upper bits is a token.

Well, yes. What you don't say though is what to do in such an event
(other than simply breaking out of the loop). The lack of a clear route
here is why I've done it the "wider" way, and hence why the comment says
"likely".

>> +
>> +                /*
>> +                 * Ought to be a hypervisor interruption frame.  But don't
>> +                 * (re)log the current frame's %rip.
>> +                 */
>> +                if ( n || ptr[1] != regs->rip )
>> +                    printk("   [<%p>] E %pS\n", _p(ptr[1]), _p(ptr[1]));
>> +                ptr = _p(val);
>> +            }
>> +            else
>> +            {
>> +                /* Ought to be a PV guest hypercall/interruption frame.  */
>> +                printk("   %04lx:[<%p>] E\n", ptr[2], _p(ptr[1]));
>> +                ptr = 0;
> 
> On a CPL3 -> CPL0 transition, the guest's SSP is written back into
> MSR_PL3_SSP.  The supervisor token on MSR_PL0_SSP is marked busy (either
> automatically, or by SETSSBSY), but nothing pertaining to CPL3 is pushed
> onto the supervisor shadow stack.
> 
> This is why we can move off an IST stack onto the primary stack when
> interrupting CPL3 with only a CLEARSSBSY/SETSSBSY pair, and no memmove()
> loop of WRSS's.
> 
> In other words, I'm pretty sure this is a dead codepath.  (Or worse, if
> it happens not to be dead, then the comment is misleading.)

IOW you're confirming the respective post-commit-message remark. Then, like
above, the question here is: What's the most reasonable thing to
do in the final "else"?

> A CPL1 -> CPL0 transition does push an shstk interrupt frame, and not
> wanting to memmove() the shstk by 3 slots on a context switch is part of
> why I just disallowed PV32 guests when CET was active.

Hmm, yes, such would be needed when switching to PV32. When switching from
PV32, such moving wouldn't be necessary, would it? Upon exiting to ring3
the ring0 shadow stack isn't consulted (beyond an alignment check on SSP),
and upon next entry (from ring3) the proper base ring0 would be used
again. For VMX it would apparently be a matter of using the "load CET"
VM-exit control to get SSP similarly reset to its base value. For SVM I'm
afraid I can't find any explicit information in the PM as to what happens
during #VMEXIT (nor what, if anything, is saved by VMRUN).

Jan



* Re: [PATCH 2/4] x86: record SSP at non-guest entry points
  2024-02-28 15:16   ` Andrew Cooper
@ 2024-02-29  9:39     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-02-29  9:39 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 28.02.2024 16:16, Andrew Cooper wrote:
> On 28/02/2024 1:52 pm, Jan Beulich wrote:
>> We will want to use that value for call trace generation, and likely
>> also to eliminate the somewhat fragile shadow stack searching done in
>> fixup_exception_return(). For those purposes, guest-only entry points do
>> not need to record that value.
>>
>> To keep the saving code simple, record our own SSP that corresponds to
>> an exception frame, pointing to the top of the shadow stack counterpart
>> of what the CPU has saved on the regular stack. Consuming code can then
>> work its way from there.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> To record the full 64-bit value, some of the unused bits in the %cs slot
>> could be used. Sadly that slot has saved_upcall_mask in an unhelpful
>> location, otherwise simply storing low and high 32 bits in those two
>> separate half-slots would be a pretty obvious choice. As long as
>> "entry_ssp" is used in non-guest-entry frames only, we could of course
>> put half of it into a union with saved_upcall_mask ...
>>
>> Else may want to put a BUILD_BUG_ON(VADDR_BITS > 48) somewhere, but I'm
>> afraid I can't really identify a good place for such to live.
> 
> Perhaps in reinit_bsp_stack() when we enable SHSTK on the BSP?
> 
> Having it anywhere vaguely relevant is better than not having it.

Okay.

>> Leveraging that the CPU stores zero in the upper bits of the selector
>> register slots, the save sequence could also be
>>
>>         shl   $16, %rcx
>>         or    %rcx, UREGS_entry_ssp-2(%rsp)
>>
>> That's shorter and avoids a 16-bit operation, but may be less desirable,
>> for being a read-modify-write access.
> 
> I doubt you'll be able to measure a difference between the two options
> (it's into the active stack, after all), but the two stores are probably
> marginally better.  When shstks are active, we're taking a large hit
> from the busy token handling.
> 
> I was concerned by the misaligned access, but it's not misaligned, is
> it?  It's the start of entry_ssp which is misaligned, and the -2 brings
> it back to being properly aligned.

Correct.

>> --- a/xen/arch/x86/include/asm/asm_defns.h
>> +++ b/xen/arch/x86/include/asm/asm_defns.h
>> @@ -221,7 +221,7 @@ static always_inline void stac(void)
>>  #endif
>>  
>>  #ifdef __ASSEMBLY__
>> -.macro SAVE_ALL compat=0
>> +.macro SAVE_ALL compat=0 ssp=IS_ENABLED(CONFIG_XEN_SHSTK)
> 
> I'm not sure this is what you want to do.  Because it's only the
> default, we'll still....
> 
>>          addq  $-(UREGS_error_code-UREGS_r15), %rsp
>>          cld
>>          movq  %rdi,UREGS_rdi(%rsp)
>> @@ -235,6 +235,9 @@ static always_inline void stac(void)
>>          movq  %rax,UREGS_rax(%rsp)
>>          xor   %eax, %eax
>>  .if !\compat
>> +.if \ssp
>> +        rdsspq %rcx
> 
> ... pick this up even in !CONFIG_XEN_SHSTK builds, and ...
> 
>> +.endif
>>          movq  %r8,UREGS_r8(%rsp)
>>          movq  %r9,UREGS_r9(%rsp)
>>          movq  %r10,UREGS_r10(%rsp)
>> @@ -264,6 +267,11 @@ static always_inline void stac(void)
>>          xor   %r13d, %r13d
>>          xor   %r14d, %r14d
>>          xor   %r15d, %r15d
>> +.if \ssp && !\compat
>> +        mov   %cx, UREGS_entry_ssp(%rsp)
>> +        shr   $16, %rcx
>> +        mov   %ecx, UREGS_entry_ssp+2(%rsp)
> 
> ... store it here.
> 
> I think you need to use ssp=1 by default, and
> 
> #ifdef CONFIG_XEN_SHSTK
> .if
>     ...
> 
> for these two blocks, so they disappear properly in !SHSTK builds.

I'm afraid I don't follow: The macro parameter exists for use sites
to pass 0 even in SHSTK builds, for the entry-from-guest paths where
its recording is of no interest. Non-zero should never be passed
explicitly. Perhaps I ought to add a comment to this effect:

/* Use sites may override ssp to 0. It should never be overridden to 1. */

If that doesn't address your concern, then I'm afraid I'm not fully
understanding your comments, or I'm overlooking something crucial.

> But for the rest of the behaviour, there are two overlapping things,
> because you end up getting entry_ssp=0 in SHSTK builds running on
> hardware without shstk active.

Yes. I guess I don't understand what you're trying to indicate to
me.

> And with that in mind, to confirm, the RDSSP block depends on the xor
> %ecx,%ecx between the two hunks in order to function as intended?

Yes. In fact initially I had used %edx, with "interesting" effects.
It's not said anywhere in the macro that %edx needs to be zero by
the point the macro completes; comments to this effect exist only
past several of the use sites of the macro.

>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>> @@ -183,7 +183,19 @@ struct cpu_user_regs {
>>      uint8_t  _pad1[3];
>>      __DECL_REG_LO16(flags); /* rflags.IF == !saved_upcall_mask */
>>      __DECL_REG_LO8(sp);
>> -    uint16_t ss, _pad2[3];
>> +    uint16_t ss;
>> +#if !defined(__XEN__)
>> +    uint16_t _pad2[3];
>> +#elif defined(COMPILE_OFFSETS)
>> +    uint16_t entry_ssp[3];
>> +#else
>> +    /*
>> +     * This points _at_ the corresponding shadow stack frame; it is _not_ the
>> +     * outer context's SSP.  That, if the outer context has CET-SS enabled,
>> +     * is stored in the top slot of the pointed to shadow stack frame.
>> +     */
>> +    signed long entry_ssp:48;
>> +#endif
> 
> I have to admit that I dislike this.  (And only some of that is because
> it's work I'm going to have to revert in order to make FRED support work...)

Right, some of the bits in the various slots which have space available
are used there. But the frame layout there is different anyway, so by
that point we won't get away without re-working our exception frame
layout. Taking care of the new extra field then is, I expect, not going
to make much of a difference, whereas without doing the transformation now
we can have some immediate gain.

> We desperately need to break our use of this structure to start with,
> and with that done, we don't need to play games about hiding SSP in a
> spare 6 bytes in an ABI dubiously made public nearly two decades ago.
> 
> How hard would it be just:
> 
> #define cpu_user_regs abi_cpu_user_regs
> #include <public/xen.h>
> #undef cpu_user_regs
> 
> and make a copy of cpu_user_regs that we can really put an SSP field into?

Well, that's easy to write here, but this pattern would then need
repeating in a few dozen places. Even abstracting it by way of a
helper header would seem problematic to me: We can't very well
forbid common code from including public/xen.h directly. We also have
a couple of direct inclusions of public/arch-x86/xen.h, which would
similarly need taking care of.

A better approach might be an #ifdef __XEN__ inside the header
itself. I could try that, but I'd stop as soon as I consider knock-on
effects too heavy for the simple purpose here.

I know you've been pointing out that we want to change how the stack
is organized. That this hasn't happened yet is, for me, largely because
it quite likely isn't straightforward to achieve.

> It would make this patch simpler, and we could finally get the vm86
> block off the stack (which also fixes OoB accesses in the #DF handler -
> can't remember if I finished the bodge around that or not.)
> 
> Irrespective of anything else, why do we have COMPILE_OFFSETS getting in
> here?

Because offsetof() implies determining the address of a field, which
is not allowed for bitfields.
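
A tiny standalone illustration (demo only; the field names merely mirror
the patch):

    #include <stddef.h>

    struct demo {
        unsigned short ss;
        signed long entry_ssp:48;
    };

    /*
     * offsetof(struct demo, entry_ssp) is rejected by the compiler, as
     * a bit-field has no address to take.  Hence the uint16_t[3]
     * stand-in that asm-offsets.c sees under COMPILE_OFFSETS.
     */
    size_t ss_off = offsetof(struct demo, ss);   /* fine */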

Jan


