* [PATCH 0/4] x86: Further harden function pointers
@ 2021-11-26 21:22 Andrew Cooper
  2021-11-26 21:22 ` [PATCH 1/4] x86/altcall: Check and optimise altcall targets Andrew Cooper
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-11-26 21:22 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Daniel De Graaf, Daniel Smith,
	Marek Marczykowski-Górecki

Slightly RFC, because patch 2 has some minor structure (ab)use, but the result
works alarmingly well.  So far, this demonstrates converting two subsystems.

hvm_funcs is the other area of especially low-hanging fruit, but IOMMU and vPMU
also look like good candidates.  Anything which is already partially altcall'd
would benefit from being fully altcall'd.

Should we consider introducing __ro_after_init right now (as an alias to
__read_mostly), as this conversion is touching a lot of areas where true
post-init immutability ought to be enforced?
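
As a concrete starting point, here is a minimal sketch of what is being asked
about (naming and final implementation are not settled; a later step would
presumably give it a dedicated section which gets remapped read-only once init
completes):

    /* Hypothetical first step: purely an alias, no new section yet. */
    #define __ro_after_init __read_mostly

    /* Example use, e.g. the ucode_ops structure introduced in patch 4: */
    static struct microcode_ops __ro_after_init ucode_ops;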

Andrew Cooper (4):
  x86/altcall: Check and optimise altcall targets
  x86/altcall: Optimise away endbr64 instruction where possible
  xen/xsm: Use __init_data_cf_clobber for xsm_ops
  x86/ucode: Use altcall, and __initdata_cf_clobber

 xen/arch/x86/alternative.c           | 60 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/cpu/microcode/amd.c     |  2 +-
 xen/arch/x86/cpu/microcode/core.c    | 38 ++++++++++++-----------
 xen/arch/x86/cpu/microcode/intel.c   |  2 +-
 xen/arch/x86/cpu/microcode/private.h |  2 +-
 xen/arch/x86/xen.lds.S               |  5 +++
 xen/include/xen/init.h               |  2 ++
 xen/xsm/dummy.c                      |  2 +-
 xen/xsm/flask/hooks.c                |  2 +-
 xen/xsm/silo.c                       |  2 +-
 10 files changed, 93 insertions(+), 24 deletions(-)

-- 
2.11.0




* [PATCH 1/4] x86/altcall: Check and optimise altcall targets
  2021-11-26 21:22 [PATCH 0/4] x86: Further harden function pointers Andrew Cooper
@ 2021-11-26 21:22 ` Andrew Cooper
  2021-12-01  8:10   ` Jan Beulich
  2021-11-26 21:22 ` [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible Andrew Cooper
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2021-11-26 21:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

When converting indirect to direct calls, there is no need to execute endbr64
instructions.  Detect and optimise this case, leaving a warning when no
endbr64 is found, as that likely indicates a build error.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/alternative.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index ec24692e9595..5ae4c80d5119 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -18,6 +18,7 @@
 #include <xen/delay.h>
 #include <xen/types.h>
 #include <asm/apic.h>
+#include <asm/endbr.h>
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <xen/init.h>
@@ -279,6 +280,27 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 
                 if ( dest )
                 {
+                    /*
+                     * When building for CET-IBT, all function pointer targets
+                     * should have an endbr64 instruction.
+                     *
+                     * If this is not the case, leave a warning because
+                     * something is wrong with the build.
+                     *
+                     * Otherwise, skip the endbr64 instruction.  This is a
+                     * marginal perf improvement which saves on instruction
+                     * decode bandwidth.
+                     */
+                    if ( IS_ENABLED(CONFIG_HAS_CC_CET_IBT) )
+                    {
+                        if ( is_endbr64(dest) )
+                            dest += 4;
+                        else
+                            printk(XENLOG_WARNING
+                                   "altcall %ps dest %ps has no endbr64\n",
+                                   orig, dest);
+                    }
+
                     disp = dest - (orig + 5);
                     ASSERT(disp == (int32_t)disp);
                     *(int32_t *)(buf + 1) = disp;
-- 
2.11.0




* [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
  2021-11-26 21:22 [PATCH 0/4] x86: Further harden function pointers Andrew Cooper
  2021-11-26 21:22 ` [PATCH 1/4] x86/altcall: Check and optimise altcall targets Andrew Cooper
@ 2021-11-26 21:22 ` Andrew Cooper
  2021-12-01  8:20   ` Jan Beulich
  2021-11-26 21:22 ` [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops Andrew Cooper
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2021-11-26 21:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

With altcall, we convert indirect branches into direct ones.  With that
complete, none of the potential targets need an endbr64 instruction.

Furthermore, removing the endbr64 instructions is a security defence-in-depth
improvement, because it limits the options available to an attacker who has
managed to hijack a function pointer.

Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
walk over the entire section, looking for any pointers into .text, and clobber
an endbr64 instruction if found.  This is some minor structure (ab)use but it
works alarmingly well.
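
For illustration only (hypothetical example_ops, not part of this series), a
subsystem opts in simply by placing its ops structure into the new section:

    struct example_ops {
        void (*do_thing)(void);
    };

    static void cf_check do_thing_impl(void) { }

    /*
     * Lives in .init.data.cf_clobber.  Once all altcall sites are patched,
     * _apply_alternatives() reads the embedded pointer, sees that it points
     * at an endbr64 inside .text, and NOPs the endbr64 out.
     */
    static struct example_ops __initdata_cf_clobber example_ops = {
        .do_thing = do_thing_impl,
    };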

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

It would be nice for the printk() to say "optimised away %u of %u", but the
latter number can only feasibly come from post-processing of xen-syms during
the build.
---
 xen/arch/x86/alternative.c | 38 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/xen.lds.S     |  5 +++++
 xen/include/xen/init.h     |  2 ++
 3 files changed, 45 insertions(+)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 5ae4c80d5119..65fc8534b97f 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
     return memcpy(addr, opcode, len);
 }
 
+extern unsigned long __initdata_cf_clobber_start[];
+extern unsigned long __initdata_cf_clobber_end[];
+
 /*
  * Replace instructions with better alternatives for this CPU type.
  * This runs before SMP is initialized to avoid SMP problems with
@@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
     }
+
+    /*
+     * Clobber endbr64 instructions now that altcall has finished optimising
+     * all indirect branches to direct ones.
+     */
+    if ( force && cpu_has_xen_ibt )
+    {
+        unsigned long *val;
+        unsigned int clobbered = 0;
+
+        /*
+         * This is some minor structure (ab)use.  We walk the entire contents
+         * of .init.data.cf_clobber as if it were an array of pointers.
+         *
+         * If the pointer points into .text, and has an endbr64 instruction,
+         * nop out the endbr64.  This causes the pointer to no longer be a
+         * legal indirect branch target under CET-IBT.  This is a
+         * defence-in-depth measure, to reduce the options available to an
+         * adversary who has managed to hijack a function pointer.
+         */
+        for ( val = __initdata_cf_clobber_start;
+              val < __initdata_cf_clobber_end;
+              val++ )
+        {
+            void *ptr = (void *)*val;
+
+            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
+                continue;
+
+            add_nops(ptr, 4);
+            clobbered++;
+        }
+
+        printk("altcall: Optimised away %u endbr64 instructions\n", clobbered);
+    }
 }
 
 void init_or_livepatch apply_alternatives(struct alt_instr *start,
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 87e344d4dd97..5b16a98e4df1 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -214,6 +214,11 @@ SECTIONS
        *(.initcall1.init)
        __initcall_end = .;
 
+       . = ALIGN(POINTER_ALIGN);
+        __initdata_cf_clobber_start = .;
+	*(.init.data.cf_clobber)
+        __initdata_cf_clobber_end = .;
+
        *(.init.data)
        *(.init.data.rel)
        *(.init.data.rel.*)
diff --git a/xen/include/xen/init.h b/xen/include/xen/init.h
index bfe789e93f6b..66b324892a52 100644
--- a/xen/include/xen/init.h
+++ b/xen/include/xen/init.h
@@ -18,6 +18,8 @@
 #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
 #define __exit_call       __used_section(".exitcall.exit")
 
+#define __initdata_cf_clobber __section(".init.data.cf_clobber")
+
 /* These macros are used to mark some functions or 
  * initialized data (doesn't apply to uninitialized data)
  * as `initialization' functions. The kernel can take this
-- 
2.11.0




* [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops
  2021-11-26 21:22 [PATCH 0/4] x86: Further harden function pointers Andrew Cooper
  2021-11-26 21:22 ` [PATCH 1/4] x86/altcall: Check and optimise altcall targets Andrew Cooper
  2021-11-26 21:22 ` [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible Andrew Cooper
@ 2021-11-26 21:22 ` Andrew Cooper
  2021-12-01  8:21   ` Jan Beulich
  2021-12-03 10:32   ` Daniel P. Smith
  2021-11-26 21:22 ` [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber Andrew Cooper
  2021-11-29  8:51 ` [PATCH 0/4] x86: Further harden function pointers Jan Beulich
  4 siblings, 2 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-11-26 21:22 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Daniel De Graaf, Daniel Smith, Jan Beulich,
	Roger Pau Monné,
	Wei Liu

All calls through xsm_ops are fully altcall'd.  Harden all fnptr targets.

This yields:

  (XEN) altcall: Optimised away 197 endbr64 instructions

of 1655 on an everything-enabled build of Xen, which is ~12%.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>
CC: Daniel Smith <dpsmith@apertussolutions.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/xsm/dummy.c       | 2 +-
 xen/xsm/flask/hooks.c | 2 +-
 xen/xsm/silo.c        | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 4d29a9aa5b9f..4f1d352d5507 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -13,7 +13,7 @@
 #define XSM_NO_WRAPPERS
 #include <xsm/dummy.h>
 
-static const struct xsm_ops __initconstrel dummy_ops = {
+static struct xsm_ops __initdata_cf_clobber dummy_ops = {
     .security_domaininfo           = xsm_security_domaininfo,
     .domain_create                 = xsm_domain_create,
     .getdomaininfo                 = xsm_getdomaininfo,
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 63484e323c09..b1c917113ec3 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1765,7 +1765,7 @@ static int cf_check flask_argo_send(
 
 #endif
 
-static const struct xsm_ops __initconstrel flask_ops = {
+static struct xsm_ops __initdata_cf_clobber flask_ops = {
     .security_domaininfo = flask_security_domaininfo,
     .domain_create = flask_domain_create,
     .getdomaininfo = flask_getdomaininfo,
diff --git a/xen/xsm/silo.c b/xen/xsm/silo.c
index 4d5fc98e7e54..7a17595888bb 100644
--- a/xen/xsm/silo.c
+++ b/xen/xsm/silo.c
@@ -102,7 +102,7 @@ static int cf_check silo_argo_send(
 
 #endif
 
-static const struct xsm_ops __initconstrel silo_xsm_ops = {
+static struct xsm_ops __initdata_cf_clobber silo_xsm_ops = {
     .evtchn_unbound = silo_evtchn_unbound,
     .evtchn_interdomain = silo_evtchn_interdomain,
     .grant_mapref = silo_grant_mapref,
-- 
2.11.0




* [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber
  2021-11-26 21:22 [PATCH 0/4] x86: Further harden function pointers Andrew Cooper
                   ` (2 preceding siblings ...)
  2021-11-26 21:22 ` [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops Andrew Cooper
@ 2021-11-26 21:22 ` Andrew Cooper
  2021-12-01  8:23   ` Jan Beulich
  2021-11-29  8:51 ` [PATCH 0/4] x86: Further harden function pointers Jan Beulich
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2021-11-26 21:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

Microcode loading is not a fastpath, but there are control flow security
benefits from using altcall()'s hardening side effect.

Convert the existing microcode_ops pointer into a __read_mostly structure, and
move {amd,intel}_ucode_ops into __initdata_cf_clobber.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpu/microcode/amd.c     |  2 +-
 xen/arch/x86/cpu/microcode/core.c    | 38 +++++++++++++++++++-----------------
 xen/arch/x86/cpu/microcode/intel.c   |  2 +-
 xen/arch/x86/cpu/microcode/private.h |  2 +-
 4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 0afa2192bf1d..27c8644ab8ba 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -422,7 +422,7 @@ static struct microcode_patch *cf_check cpu_request_microcode(
     return patch;
 }
 
-const struct microcode_ops amd_ucode_ops = {
+struct microcode_ops __initdata_cf_clobber amd_ucode_ops = {
     .cpu_request_microcode            = cpu_request_microcode,
     .collect_cpu_info                 = collect_cpu_info,
     .apply_microcode                  = apply_microcode,
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index f84dafa82693..755f2dc9a1e5 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -21,6 +21,7 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <xen/alternative-call.h>
 #include <xen/cpu.h>
 #include <xen/earlycpio.h>
 #include <xen/err.h>
@@ -214,7 +215,7 @@ void __init microcode_grab_module(
         microcode_scan_module(module_map, mbi);
 }
 
-static const struct microcode_ops __read_mostly *microcode_ops;
+static struct microcode_ops __read_mostly ucode_ops;
 
 static DEFINE_SPINLOCK(microcode_mutex);
 
@@ -241,9 +242,9 @@ static const struct microcode_patch *nmi_patch = ZERO_BLOCK_PTR;
  */
 static struct microcode_patch *parse_blob(const char *buf, size_t len)
 {
-    microcode_ops->collect_cpu_info();
+    alternative_vcall(ucode_ops.collect_cpu_info);
 
-    return microcode_ops->cpu_request_microcode(buf, len);
+    return alternative_call(ucode_ops.cpu_request_microcode, buf, len);
 }
 
 static void microcode_free_patch(struct microcode_patch *patch)
@@ -258,8 +259,8 @@ static bool microcode_update_cache(struct microcode_patch *patch)
 
     if ( !microcode_cache )
         microcode_cache = patch;
-    else if ( microcode_ops->compare_patch(patch,
-                                           microcode_cache) == NEW_UCODE )
+    else if ( alternative_call(ucode_ops.compare_patch,
+                               patch, microcode_cache) == NEW_UCODE )
     {
         microcode_free_patch(microcode_cache);
         microcode_cache = patch;
@@ -311,14 +312,14 @@ static int microcode_update_cpu(const struct microcode_patch *patch)
 {
     int err;
 
-    microcode_ops->collect_cpu_info();
+    alternative_vcall(ucode_ops.collect_cpu_info);
 
     spin_lock(&microcode_mutex);
     if ( patch )
-        err = microcode_ops->apply_microcode(patch);
+        err = alternative_call(ucode_ops.apply_microcode, patch);
     else if ( microcode_cache )
     {
-        err = microcode_ops->apply_microcode(microcode_cache);
+        err = alternative_call(ucode_ops.apply_microcode, microcode_cache);
         if ( err == -EIO )
         {
             microcode_free_patch(microcode_cache);
@@ -368,7 +369,7 @@ static int primary_thread_work(const struct microcode_patch *patch)
     if ( !wait_for_state(LOADING_ENTER) )
         return -EBUSY;
 
-    ret = microcode_ops->apply_microcode(patch);
+    ret = alternative_call(ucode_ops.apply_microcode, patch);
     if ( !ret )
         atomic_inc(&cpu_updated);
     atomic_inc(&cpu_out);
@@ -481,7 +482,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
     }
 
     /* Control thread loads ucode first while others are in NMI handler. */
-    ret = microcode_ops->apply_microcode(patch);
+    ret = alternative_call(ucode_ops.apply_microcode, patch);
     if ( !ret )
         atomic_inc(&cpu_updated);
     atomic_inc(&cpu_out);
@@ -610,7 +611,8 @@ static long cf_check microcode_update_helper(void *data)
      */
     spin_lock(&microcode_mutex);
     if ( microcode_cache &&
-         microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
+         alternative_call(ucode_ops.compare_patch,
+                          patch, microcode_cache) != NEW_UCODE )
     {
         spin_unlock(&microcode_mutex);
         printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
@@ -678,7 +680,7 @@ int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len)
     if ( len != (uint32_t)len )
         return -E2BIG;
 
-    if ( microcode_ops == NULL )
+    if ( !ucode_ops.apply_microcode )
         return -EINVAL;
 
     buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
@@ -722,10 +724,10 @@ __initcall(microcode_init);
 /* Load a cached update to current cpu */
 int microcode_update_one(void)
 {
-    if ( !microcode_ops )
+    if ( !ucode_ops.apply_microcode )
         return -EOPNOTSUPP;
 
-    microcode_ops->collect_cpu_info();
+    alternative_vcall(ucode_ops.collect_cpu_info);
 
     return microcode_update_cpu(NULL);
 }
@@ -780,22 +782,22 @@ int __init early_microcode_init(void)
     {
     case X86_VENDOR_AMD:
         if ( c->x86 >= 0x10 )
-            microcode_ops = &amd_ucode_ops;
+            ucode_ops = amd_ucode_ops;
         break;
 
     case X86_VENDOR_INTEL:
         if ( c->x86 >= 6 )
-            microcode_ops = &intel_ucode_ops;
+            ucode_ops = intel_ucode_ops;
         break;
     }
 
-    if ( !microcode_ops )
+    if ( !ucode_ops.apply_microcode )
     {
         printk(XENLOG_WARNING "Microcode loading not available\n");
         return -ENODEV;
     }
 
-    microcode_ops->collect_cpu_info();
+    alternative_vcall(ucode_ops.collect_cpu_info);
 
     if ( ucode_mod.mod_end || ucode_blob.size )
         rc = early_microcode_update_cpu();
diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
index d3864b5ab03e..89e91f7fd06b 100644
--- a/xen/arch/x86/cpu/microcode/intel.c
+++ b/xen/arch/x86/cpu/microcode/intel.c
@@ -376,7 +376,7 @@ static struct microcode_patch *cf_check cpu_request_microcode(
     return patch;
 }
 
-const struct microcode_ops intel_ucode_ops = {
+struct microcode_ops __initdata_cf_clobber intel_ucode_ops = {
     .cpu_request_microcode            = cpu_request_microcode,
     .collect_cpu_info                 = collect_cpu_info,
     .apply_microcode                  = apply_microcode,
diff --git a/xen/arch/x86/cpu/microcode/private.h b/xen/arch/x86/cpu/microcode/private.h
index c085a1026847..4ee92a8fbaad 100644
--- a/xen/arch/x86/cpu/microcode/private.h
+++ b/xen/arch/x86/cpu/microcode/private.h
@@ -53,6 +53,6 @@ struct microcode_ops {
         const struct microcode_patch *new, const struct microcode_patch *old);
 };
 
-extern const struct microcode_ops amd_ucode_ops, intel_ucode_ops;
+extern struct microcode_ops amd_ucode_ops, intel_ucode_ops;
 
 #endif /* ASM_X86_MICROCODE_PRIVATE_H */
-- 
2.11.0




* Re: [PATCH 0/4] x86: Further harden function pointers
  2021-11-26 21:22 [PATCH 0/4] x86: Further harden function pointers Andrew Cooper
                   ` (3 preceding siblings ...)
  2021-11-26 21:22 ` [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber Andrew Cooper
@ 2021-11-29  8:51 ` Jan Beulich
  4 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2021-11-29  8:51 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Daniel De Graaf, Daniel Smith,
	Marek Marczykowski-Górecki, Xen-devel

On 26.11.2021 22:22, Andrew Cooper wrote:
> Slightly RFC, because patch 2 has some minor structure (ab)use, but the result
> works alarmingly well.  So far, this demonstrates converting two subsystems.
> 
> hvm_funcs is the other area of especially low-hanging fruit, but IOMMU and vPMU
> also look like good candidates.  Anything which is already partially altcall'd
> would benefit from being fully altcall'd.

I'll post patches for hvm_funcs and vPMU hopefully later today. I intend
to look into the remaining unconverted IOMMU instances (so far I've
spotted one, but proper auditing may turn up more). For hvm_funcs what I
have leaves a few still unconverted; I guess we can discuss whether
to go beyond what I have in the context of that patch.

> Should we consider introducing __ro_after_init right now (as an alias to
> __read_mostly), as this conversion is touching a lot of areas where true
> post-init immutability ought to be enforced?

Well, it's largely orthogonal, but this might indeed be a good opportunity
to at least make a first step. I'd go slightly beyond what you say and at
least also introduce a dedicated new section, rather than aliasing
__read_mostly.

Jan

> Andrew Cooper (4):
>   x86/altcall: Check and optimise altcall targets
>   x86/altcall: Optimise away endbr64 instruction where possible
>   xen/xsm: Use __init_data_cf_clobber for xsm_ops
>   x86/ucode: Use altcall, and __initdata_cf_clobber
> 
>  xen/arch/x86/alternative.c           | 60 ++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/cpu/microcode/amd.c     |  2 +-
>  xen/arch/x86/cpu/microcode/core.c    | 38 ++++++++++++-----------
>  xen/arch/x86/cpu/microcode/intel.c   |  2 +-
>  xen/arch/x86/cpu/microcode/private.h |  2 +-
>  xen/arch/x86/xen.lds.S               |  5 +++
>  xen/include/xen/init.h               |  2 ++
>  xen/xsm/dummy.c                      |  2 +-
>  xen/xsm/flask/hooks.c                |  2 +-
>  xen/xsm/silo.c                       |  2 +-
>  10 files changed, 93 insertions(+), 24 deletions(-)
> 




* Re: [PATCH 1/4] x86/altcall: Check and optimise altcall targets
  2021-11-26 21:22 ` [PATCH 1/4] x86/altcall: Check and optimise altcall targets Andrew Cooper
@ 2021-12-01  8:10   ` Jan Beulich
  2021-12-01 10:20     ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-12-01  8:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 26.11.2021 22:22, Andrew Cooper wrote:
> @@ -279,6 +280,27 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>  
>                  if ( dest )
>                  {
> +                    /*
> +                     * When building for CET-IBT, all function pointer targets
> +                     * should have an endbr64 instruction.
> +                     *
> +                     * If this is not the case, leave a warning because
> +                     * something is wrong with the build.
> +                     *
> +                     * Otherwise, skip the endbr64 instruction.  This is a
> +                     * marginal perf improvement which saves on instruction
> +                     * decode bandwidth.
> +                     */
> +                    if ( IS_ENABLED(CONFIG_HAS_CC_CET_IBT) )
> +                    {
> +                        if ( is_endbr64(dest) )

I would have given my R-b, but I don't see where is_endbr64() is coming
from, and you don't list any prereqs here or in the cover letter. I'm
afraid I don't fancy going hunting for it in the many other pending patches.
Hence only on the assumption that the helper has got introduced before:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan




* Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
  2021-11-26 21:22 ` [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible Andrew Cooper
@ 2021-12-01  8:20   ` Jan Beulich
  2021-12-01 19:07     ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-12-01  8:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 26.11.2021 22:22, Andrew Cooper wrote:
> With altcall, we convert indirect branches into direct ones.  With that
> complete, none of the potential targets need an endbr64 instruction.

Assuming that no other hooks remain which re-use the same function. I
think this constraint wants at least mentioning explicitly.

> Furthermore, removing the endbr64 instructions is a security defence-in-depth
> improvement, because it limits the options available to an attacker who has
> managed to hijack a function pointer.
> 
> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
> walk over the entire section, looking for any pointers into .text, and clobber
> an endbr64 instruction if found.  This is some minor structure (ab)use but it
> works alarmingly well.

Iirc you've said more than once that non-function-pointer data in
those structures is fine; I'm not convinced. What if a sequence of
sub-pointer-size fields has a value looking like a pointer into
.text? This may not be very likely, but would result in corruption
that may be hard to associate with anything. Of course, with the
is_endbr64() check and with a build time check of there not being
any stray ENDBR64 patterns in .text, that issue would disappear.
But we aren't quite there yet.
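
Purely to illustrate the concern, with a made-up structure and layout:

    struct example {            /* hypothetical, no such struct exists */
        uint32_t cookie;
        uint32_t flags;
    };

If an instance of this lands on an 8-byte boundary inside a structure placed
in the new section, the clobbering loop reads cookie and flags together as a
single unsigned long.  Should that value happen to pass is_kernel_text(), and
should the bytes at that address happen to match the endbr64 pattern, four
bytes of unrelated .text would get NOPed out.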

> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>      return memcpy(addr, opcode, len);
>  }
>  
> +extern unsigned long __initdata_cf_clobber_start[];
> +extern unsigned long __initdata_cf_clobber_end[];

const please. I also would find it quite a bit better if these
were suitably typed such that ...

> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>          text_poke(orig, buf, total_len);
>      }
> +
> +    /*
> +     * Clobber endbr64 instructions now that altcall has finished optimised
> +     * all indirect branches to direct ones.
> +     */
> +    if ( force && cpu_has_xen_ibt )
> +    {
> +        unsigned long *val;
> +        unsigned int clobbered = 0;
> +
> +        /*
> +         * This is some minor structure (ab)use.  We walk the entire contents
> +         * of .init.data.cf_clobber as if it were an array of pointers.
> +         *
> +         * If the pointer points into .text, and has an endbr64 instruction,
> +         * nop out the endbr64.  This causes the pointer to no longer be a
> +         * legal indirect branch target under CET-IBT.  This is a
> +         * defence-in-depth measure, to reduce the options available to an
> +         * adversary who has managed to hijack a function pointer.
> +         */
> +        for ( val = __initdata_cf_clobber_start;
> +              val < __initdata_cf_clobber_end;
> +              val++ )
> +        {
> +            void *ptr = (void *)*val;

... no cast was needed here.

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -214,6 +214,11 @@ SECTIONS
>         *(.initcall1.init)
>         __initcall_end = .;
>  
> +       . = ALIGN(POINTER_ALIGN);
> +        __initdata_cf_clobber_start = .;
> +	*(.init.data.cf_clobber)

Nit: hard tab slipped in here.

> --- a/xen/include/xen/init.h
> +++ b/xen/include/xen/init.h
> @@ -18,6 +18,8 @@
>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>  #define __exit_call       __used_section(".exitcall.exit")
>  
> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")

Just to repeat what I've said elsewhere: I think we want a const
version of this as well.

Jan




* Re: [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops
  2021-11-26 21:22 ` [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops Andrew Cooper
@ 2021-12-01  8:21   ` Jan Beulich
  2021-12-03 10:32   ` Daniel P. Smith
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2021-12-01  8:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Daniel De Graaf, Daniel Smith, Roger Pau Monné, Wei Liu, Xen-devel

On 26.11.2021 22:22, Andrew Cooper wrote:
> All calls through xsm_ops are fully altcall'd.  Harden all fnptr targets.
> 
> This yields:
> 
>   (XEN) altcall: Optimised away 197 endbr64 instructions
> 
> of 1655 on an everything-enabled build of Xen, which is ~12%.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Subject to the resolution of the const aspect
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan




* Re: [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber
  2021-11-26 21:22 ` [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber Andrew Cooper
@ 2021-12-01  8:23   ` Jan Beulich
  2021-12-01 19:12     ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-12-01  8:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 26.11.2021 22:22, Andrew Cooper wrote:
> Microcode loading is not a fastpath, but there are control flow security
> benefits from using altcall()'s hardening side effect.
> 
> Convert the existing microcode_ops pointer into a __read_mostly structure, and
> move {amd,intel}_ucode_ops into __initdata_cf_clobber.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Again subject to the resolution of the const aspect and perhaps
with __read_mostly converted to __ro_after_init (assuming its
introduction goes in first)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan




* Re: [PATCH 1/4] x86/altcall: Check and optimise altcall targets
  2021-12-01  8:10   ` Jan Beulich
@ 2021-12-01 10:20     ` Andrew Cooper
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-12-01 10:20 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 01/12/2021 08:10, Jan Beulich wrote:
> On 26.11.2021 22:22, Andrew Cooper wrote:
>> @@ -279,6 +280,27 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>  
>>                  if ( dest )
>>                  {
>> +                    /*
>> +                     * When building for CET-IBT, all function pointer targets
>> +                     * should have an endbr64 instruction.
>> +                     *
>> +                     * If this is not the case, leave a warning because
>> +                     * something is wrong with the build.
>> +                     *
>> +                     * Otherwise, skip the endbr64 instruction.  This is a
>> +                     * marginal perf improvement which saves on instruction
>> +                     * decode bandwidth.
>> +                     */
>> +                    if ( IS_ENABLED(CONFIG_HAS_CC_CET_IBT) )
>> +                    {
>> +                        if ( is_endbr64(dest) )
> I would have given my R-b, but I don't see where is_endbr64() is coming
> from, and you don't list any prereqs here or in the cover letter. I'm
> afraid I don't fancy going hunt for it in the many other pending patches.
> Hence only on the assumption that the helper has got introduced before:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Oh sorry - this series is based on the CET-IBT series, which adds
CONFIG_HAS_CC_CET_IBT and is_endbr64().

~Andrew



* Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
  2021-12-01  8:20   ` Jan Beulich
@ 2021-12-01 19:07     ` Andrew Cooper
  2021-12-02  8:01       ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2021-12-01 19:07 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 01/12/2021 08:20, Jan Beulich wrote:
> On 26.11.2021 22:22, Andrew Cooper wrote:
>> With altcall, we convert indirect branches into direct ones.  With that
>> complete, none of the potential targets need an endbr64 instruction.
> Assuming that no other hooks remain which re-use the same function. I
> think this constraint wants at least mentioning explicitly.

Fair point, but I think it is entirely reasonable to expect logic not to
mix and match altcall on the same hook.

>
>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>> improvement, because it limits the options available to an attacker who has
>> managed to hijack a function pointer.
>>
>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>> walk over the entire section, looking for any pointers into .text, and clobber
>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>> works alarmingly well.
> Iirc you've said more than once that non-function-pointer data in
> those structures is fine; I'm not convinced. What if a sequence of
> sub-pointer-size fields has a value looking like a pointer into
> .text? This may not be very likely, but would result in corruption
> that may be hard to associate with anything. Of course, with the
> is_endbr64() check and with a build time check of there not being
> any stray ENDBR64 patterns in .text, that issue would disappear.
> But we aren't quite there yet.

I disagree with "not very likely" and put it firmly in the "not
plausible" category.

To cause a problem, you need an aligned something which isn't actually a
function pointer with a bit pattern forming [0xffff82d040200000,
0xffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
function.

These structures are almost exclusively compile time generated.

So yes - it's not impossible, but it's also not going to happen
accidentally.


>
>> --- a/xen/arch/x86/alternative.c
>> +++ b/xen/arch/x86/alternative.c
>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>      return memcpy(addr, opcode, len);
>>  }
>>  
>> +extern unsigned long __initdata_cf_clobber_start[];
>> +extern unsigned long __initdata_cf_clobber_end[];
> const please. I also would find it quite a bit better if these
> were suitably typed such that ...
>
>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>          text_poke(orig, buf, total_len);
>>      }
>> +
>> +    /*
>> +     * Clobber endbr64 instructions now that altcall has finished optimising
>> +     * all indirect branches to direct ones.
>> +     */
>> +    if ( force && cpu_has_xen_ibt )
>> +    {
>> +        unsigned long *val;
>> +        unsigned int clobbered = 0;
>> +
>> +        /*
>> +         * This is some minor structure (ab)use.  We walk the entire contents
>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>> +         *
>> +         * If the pointer points into .text, and has an endbr64 instruction,
>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>> +         * legal indirect branch target under CET-IBT.  This is a
>> +         * defence-in-depth measure, to reduce the options available to an
>> +         * adversary who has managed to hijack a function pointer.
>> +         */
>> +        for ( val = __initdata_cf_clobber_start;
>> +              val < __initdata_cf_clobber_end;
>> +              val++ )
>> +        {
>> +            void *ptr = (void *)*val;
> ... no cast was needed here.

Unless you know what this type is, I already tried and am stuck. 
Everything else requires more horrible casts on val.

>
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -214,6 +214,11 @@ SECTIONS
>>         *(.initcall1.init)
>>         __initcall_end = .;
>>  
>> +       . = ALIGN(POINTER_ALIGN);
>> +        __initdata_cf_clobber_start = .;
>> +	*(.init.data.cf_clobber)
> Nit: hard tab slipped in here.

Will fix.

>
>> --- a/xen/include/xen/init.h
>> +++ b/xen/include/xen/init.h
>> @@ -18,6 +18,8 @@
>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>  #define __exit_call       __used_section(".exitcall.exit")
>>  
>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
> Just to repeat what I've said elsewhere: I think we want a const
> version of this as well.

I can, but does it really matter?  initconst is merged into initdata and
not actually read-only to begin with.

~Andrew



* Re: [PATCH 4/4] x86/ucode: Use altcall, and __initdata_cf_clobber
  2021-12-01  8:23   ` Jan Beulich
@ 2021-12-01 19:12     ` Andrew Cooper
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-12-01 19:12 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 01/12/2021 08:23, Jan Beulich wrote:
> On 26.11.2021 22:22, Andrew Cooper wrote:
>> Microcode loading is not a fastpath, but there are control flow security
>> benefits from using altcall()'s hardening side effect.
>>
>> Convert the existing microcode_ops pointer into a __read_mostly structure, and
>> move {amd,intel}_ucode_ops into __initdata_cf_clobber.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Again subject to the resolution of the const aspect and perhaps
> with __read_mostly converted to __ro_after_init (assuming its
> introduction goes in first)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

Given how simple __ro_after_init was, it would be preferable to get that
in first, than to go around patching these a second time.

~Andrew



* Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
  2021-12-01 19:07     ` Andrew Cooper
@ 2021-12-02  8:01       ` Jan Beulich
  2021-12-03 18:41         ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-12-02  8:01 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 01.12.2021 20:07, Andrew Cooper wrote:
> On 01/12/2021 08:20, Jan Beulich wrote:
>> On 26.11.2021 22:22, Andrew Cooper wrote:
>>> With altcall, we convert indirect branches into direct ones.  With that
>>> complete, none of the potential targets need an endbr64 instruction.
>> Assuming that no other hooks remain which re-use the same function. I
>> think this constraint wants at least mentioning explicitly.
> 
> Fair point, but I think it is entirely reasonable to expect logic not to
> mix and match altcall on the same hook.

Take XSM's silo_xsm_ops and dummy_ops as an example. With what
xsm_fixup_ops() does it should be entirely benign if silo_xsm_ops
set any or all of the hooks which are currently unset to what
dummy_ops has.

>>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>>> improvement, because it limits the options available to an attacker who has
>>> managed to hijack a function pointer.
>>>
>>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>>> walk over the entire section, looking for any pointers into .text, and clobber
>>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>>> works alarmingly well.
>> Iirc you've said more than once that non-function-pointer data in
>> those structures is fine; I'm not convinced. What if a sequence of
>> sub-pointer-size fields has a value looking like a pointer into
>> .text? This may not be very likely, but would result in corruption
>> that may be hard to associate with anything. Of course, with the
>> is_endbr64() check and with a build time check of there not being
>> any stray ENDBR64 patterns in .text, that issue would disappear.
>> But we aren't quite there yet.
> 
> I disagree with "not very likely" and put it firmly in the "not
> plausible" category.
> 
> To cause a problem, you need an aligned something which isn't actually a
> function pointer with a bit pattern forming [0xffff82d040200000,
> ffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
> ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
> function.

Why "aligned" in "aligned something"? And I also don't see what you're
trying to tell me with the last sentence. It's still .text corruption
that would result if such a pattern (crossing an insn boundary)
happened to be pointed at.

> These structures are almost exclusively compile time generated.
> 
> So yes - it's not impossible, but it's also not going to happen
> accidentally.

I wonder how you mean to exclude such accidents. It occurs to me that
checking the linked binary for the pattern isn't going to be enough.
Such a pattern could also form with alternatives patching. (It's all
quite unlikely, yes, but imo we need to fully exclude the possibility.)

>>> --- a/xen/arch/x86/alternative.c
>>> +++ b/xen/arch/x86/alternative.c
>>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>>      return memcpy(addr, opcode, len);
>>>  }
>>>  
>>> +extern unsigned long __initdata_cf_clobber_start[];
>>> +extern unsigned long __initdata_cf_clobber_end[];
>> const please. I also would find it quite a bit better if these
>> were suitably typed such that ...
>>
>>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>>          text_poke(orig, buf, total_len);
>>>      }
>>> +
>>> +    /*
>>> +     * Clobber endbr64 instructions now that altcall has finished optimising
>>> +     * all indirect branches to direct ones.
>>> +     */
>>> +    if ( force && cpu_has_xen_ibt )
>>> +    {
>>> +        unsigned long *val;
>>> +        unsigned int clobbered = 0;
>>> +
>>> +        /*
>>> +         * This is some minor structure (ab)use.  We walk the entire contents
>>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>>> +         *
>>> +         * If the pointer points into .text, and has an endbr64 instruction,
>>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>>> +         * legal indirect branch target under CET-IBT.  This is a
>>> +         * defence-in-depth measure, to reduce the options available to an
>>> +         * adversary who has managed to hijack a function pointer.
>>> +         */
>>> +        for ( val = __initdata_cf_clobber_start;
>>> +              val < __initdata_cf_clobber_end;
>>> +              val++ )
>>> +        {
>>> +            void *ptr = (void *)*val;
>> ... no cast was needed here.
> 
> Unless you know what this type is, I already tried and am stuck. 
> Everything else requires more horrible casts on val.

It's as simple as I thought it would be; the proposed patch is
at the end of the mail (the two //temp-marked #define-s were needed so
I could build-test this without needing to pull in further patches of
yours). No new casts at all, and the one gone that I wanted to see
eliminated.

>>> --- a/xen/include/xen/init.h
>>> +++ b/xen/include/xen/init.h
>>> @@ -18,6 +18,8 @@
>>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>>  #define __exit_call       __used_section(".exitcall.exit")
>>>  
>>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
>> Just to repeat what I've said elsewhere: I think we want a const
>> version of this as well.
> 
> I can, but does it really matter?  initconst is merged into initdata and
> not actually read-only to begin with.

My remark wasn't about the actual mapping properties at all. What I'm
after is the compiler being able to spot modifications. If I see a
struct instance marked "const" and if I know the thing builds okay, I
know I don't need to go hunt for possible writes to this struct
instance. When it's non-const, to be sure there's no possible conflict
with the patching (yours or just the altcall part), I'd need to find
and verify all instances where the object gets written to.

Jan

**********************************************************************

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -28,6 +28,9 @@
 #include <asm/nops.h>
 #include <xen/livepatch.h>
 
+#define cpu_has_xen_ibt true//temp
+#define is_endbr64(p) false//temp
+
 #define MAX_PATCH_LEN (255-1)
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
@@ -174,6 +177,9 @@ text_poke(void *addr, const void *opcode
     cpuid_eax(0);
 }
 
+extern void *const __initdata_cf_clobber_start[];
+extern void *const __initdata_cf_clobber_end[];
+
 /*
  * Replace instructions with better alternatives for this CPU type.
  * This runs before SMP is initialized to avoid SMP problems with
@@ -309,6 +315,41 @@ static void init_or_livepatch _apply_alt
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
     }
+
+    /*
+     * Clobber endbr64 instructions now that altcall has finished optimising
+     * all indirect branches to direct ones.
+     */
+    if ( force && cpu_has_xen_ibt )
+    {
+        void *const *val;
+        unsigned int clobbered = 0;
+
+        /*
+         * This is some minor structure (ab)use.  We walk the entire contents
+         * of .init.data.cf_clobber as if it were an array of pointers.
+         *
+         * If the pointer points into .text, and has an endbr64 instruction,
+         * nop out the endbr64.  This causes the pointer to no longer be a
+         * legal indirect branch target under CET-IBT.  This is a
+         * defence-in-depth measure, to reduce the options available to an
+         * adversary who has managed to hijack a function pointer.
+         */
+        for ( val = __initdata_cf_clobber_start;
+              val < __initdata_cf_clobber_end;
+              val++ )
+        {
+            void *ptr = *val;
+
+            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
+                continue;
+
+            add_nops(ptr, 4);
+            clobbered++;
+        }
+
+        printk("altcall: Optimised away %u endbr64 instructions\n", clobbered);
+    }
 }
 
 void init_or_livepatch apply_alternatives(struct alt_instr *start,
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -217,6 +217,11 @@ SECTIONS
        *(.initcall1.init)
        __initcall_end = .;
 
+       . = ALIGN(POINTER_ALIGN);
+        __initdata_cf_clobber_start = .;
+       *(.init.data.cf_clobber)
+        __initdata_cf_clobber_end = .;
+
        *(.init.data)
        *(.init.data.rel)
        *(.init.data.rel.*)
--- a/xen/include/xen/init.h
+++ b/xen/include/xen/init.h
@@ -18,6 +18,8 @@
 #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
 #define __exit_call       __used_section(".exitcall.exit")
 
+#define __initdata_cf_clobber __section(".init.data.cf_clobber")
+
 /* These macros are used to mark some functions or 
  * initialized data (doesn't apply to uninitialized data)
  * as `initialization' functions. The kernel can take this




* Re: [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops
  2021-11-26 21:22 ` [PATCH 3/4] xen/xsm: Use __init_data_cf_clobber for xsm_ops Andrew Cooper
  2021-12-01  8:21   ` Jan Beulich
@ 2021-12-03 10:32   ` Daniel P. Smith
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel P. Smith @ 2021-12-03 10:32 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Daniel De Graaf, Jan Beulich, Roger Pau Monné, Wei Liu

On 11/26/21 4:22 PM, Andrew Cooper wrote:
> All calls through xsm_ops are fully altcall'd.  Harden all fnptr targets.
> 
> This yields:
> 
>    (XEN) altcall: Optimised away 197 endbr64 instructions
> 
> of 1655 on an everything-enabled build of Xen, which is ~12%.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>



* Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
  2021-12-02  8:01       ` Jan Beulich
@ 2021-12-03 18:41         ` Andrew Cooper
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-12-03 18:41 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 02/12/2021 08:01, Jan Beulich wrote:
> On 01.12.2021 20:07, Andrew Cooper wrote:
>> On 01/12/2021 08:20, Jan Beulich wrote:
>>> On 26.11.2021 22:22, Andrew Cooper wrote:
>>>> With altcall, we convert indirect branches into direct ones.  With that
>>>> complete, none of the potential targets need an endbr64 instruction.
>>> Assuming that no other hooks remain which re-use the same function. I
>>> think this constraint wants at least mentioning explicitly.
>> Fair point, but I think it is entirely reasonable to expect logic not to
>> mix and match altcall on the same hook.
> Take XSM's silo_xsm_ops and dummy_ops as an example. With what
> xsm_fixup_ops() does it should be entirely benign if silo_xsm_ops
> set any or all of the hooks which are currently unset to what
> dummy_ops has.

We're talking very specifically about ops and ops-like structures, and
we don't just have random code calling dummy_ops->foo() when you've also
got xsm_foo() { altcall(ops->foo); }

In this case specifically, each of {flask,silo,dummy}_ops are static,
and xsm_fixup_ops() gets called exactly once on the root xsm_ops object,
so even code inside silo.c can't call silo->foo() and hit the dummy foo().
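
As a sketch of the pattern in question (hook name and signature purely
illustrative), every caller goes through a wrapper which altcall's the single
xsm_ops instance:

    static inline int xsm_example_hook(struct domain *d)
    {
        return alternative_call(xsm_ops.example_hook, d);
    }

so once patching has run, the only reachable call site is the patched direct
call, and none of the per-module ops structures is dereferenced at runtime.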

>>>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>>>> improvement, because it limits the options available to an attacker who has
>>>> managed to hijack a function pointer.
>>>>
>>>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>>>> walk over the entire section, looking for any pointers into .text, and clobber
>>>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>>>> works alarmingly well.
>>> Iirc you've said more than once that non-function-pointer data in
>>> those structures is fine; I'm not convinced. What if a sequence of
>>> sub-pointer-size fields has a value looking like a pointer into
>>> .text? This may not be very likely, but would result in corruption
>>> that may be hard to associate with anything. Of course, with the
>>> is_endbr64() check and with a build time check of there not being
>>> any stray ENDBR64 patterns in .text, that issue would disappear.
>>> But we aren't quite there yet.
>> I disagree with "not very likely" and put it firmly in the "not
>> plausible" category.
>>
>> To cause a problem, you need an aligned something which isn't actually a
>> function pointer with a bit pattern forming [0xffff82d040200000,
>> ffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
>> ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
>> function.
> Why "aligned" in "aligned something"?

The non-function-pointer thing inside the ops struct needs to be 8-byte
aligned to trigger this bad behaviour to begin with, because we
interpret the struct as an array of unsigned longs.

Any 8-byte block containing a bool for example can't cause problems, nor
can a pair of adjacent uint32_t's if they're not on an 8 byte boundary.

>  And I also don't see what you're
> trying to tell me with the last sentence. It's still .text corruption
> that would result if such a pattern (crossing an insn boundary)
> happened to be pointed at.

We (will) have tooling to detect and reject ENDBR64 bit patterns which
aren't real ENDBR64 instructions.

But this "integer bit pattern that looks like a function pointer"
problem can target one of ~1600 (fewer in most builds) real ENDBR64
instructions of an unrelated function.

>> These structures are almost exclusively compile time generated.
>>
>> So yes - it's not impossible, but it's also not going to happen
>> accidentally.
> I wonder how you mean to exclude such accidents. It occurs to me that
> checking the linked binary for the pattern isn't going to be enough.
> Such a pattern could also form with alternatives patching. (It's all
> quite unlikely, yes, but imo we need to fully exclude the possibility.)

Again, we're talking specifically about ops structures, not arbitrary structures.

hvm_funcs is the only thing so far that has non-function-pointer
members, and it's got a string pointer (fine - not .text), and a couple
of integer fields, none of which will plausibly alias a function pointer.

I will fully admit that there is a risk of things going wrong.  I'm
happy to add copious health warnings wherever necessary, but I don't see
anything going wrong in practice without a deliberate attempt to tickle
this corner case.

>>>> --- a/xen/arch/x86/alternative.c
>>>> +++ b/xen/arch/x86/alternative.c
>>>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>>>      return memcpy(addr, opcode, len);
>>>>  }
>>>>  
>>>> +extern unsigned long __initdata_cf_clobber_start[];
>>>> +extern unsigned long __initdata_cf_clobber_end[];
>>> const please. I also would find it quite a bit better if these
>>> were suitably typed such that ...
>>>
>>>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>>>          text_poke(orig, buf, total_len);
>>>>      }
>>>> +
>>>> +    /*
>>>> +     * Clobber endbr64 instructions now that altcall has finished optimising
>>>> +     * all indirect branches to direct ones.
>>>> +     */
>>>> +    if ( force && cpu_has_xen_ibt )
>>>> +    {
>>>> +        unsigned long *val;
>>>> +        unsigned int clobbered = 0;
>>>> +
>>>> +        /*
>>>> +         * This is some minor structure (ab)use.  We walk the entire contents
>>>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>>>> +         *
>>>> +         * If the pointer points into .text, and has an endbr64 instruction,
>>>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>>>> +         * legal indirect branch target under CET-IBT.  This is a
>>>> +         * defence-in-depth measure, to reduce the options available to an
>>>> +         * adversary who has managed to hijack a function pointer.
>>>> +         */
>>>> +        for ( val = __initdata_cf_clobber_start;
>>>> +              val < __initdata_cf_clobber_end;
>>>> +              val++ )
>>>> +        {
>>>> +            void *ptr = (void *)*val;
>>> ... no cast was needed here.
>> Unless you know what this type is, I already tried and am stuck. 
>> Everything else requires more horrible casts on val.
> It's as simple as I thought it would be; the proposed patch is
> at the end of the mail (the two //temp-marked #define-s were needed so
> I could build-test this without needing to pull in further patches of
> yours). No new casts at all, and the one gone that I wanted to see
> eliminated.

I can't have been very caffeinated while having those problems, clearly...

I have no idea how I didn't manage to come up with that as a working
solution.

>>>> --- a/xen/include/xen/init.h
>>>> +++ b/xen/include/xen/init.h
>>>> @@ -18,6 +18,8 @@
>>>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>>>  #define __exit_call       __used_section(".exitcall.exit")
>>>>  
>>>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
>>> Just to repeat what I've said elsewhere: I think we want a const
>>> version of this as well.
>> I can, but does it really matter?  initconst is merged into initdata and
>> not actually read-only to begin with.
> My remark wasn't about the actual mapping properties at all. What I'm
> after is the compiler being able to spot modifications. If I see a
> struct instance marked "const" and if I know the thing builds okay, I
> know I don't need to go hunt for possible writes to this struct
> instance. When it's non-const, to be sure there's no possible conflict
> with the patching (yours or just the altcall part), I'd need to find
> and verify all instances where the object gets written to.

I've added __initconst_cf_clobber too.

~Andrew


