All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] arm64: dynamic shadow call stack support
@ 2022-06-13 13:40 Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules Ard Biesheuvel
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 13:40 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, maz, Ard Biesheuvel,
	Kees Cook, Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

Generic kernel images such as Android's GKI usually enable all available
security features, which are typically implemented in such a way that
they only take effect if the underlying hardware can support it, but
don't interfere with correct and efficient operation otherwise.

For shadow call stack support, which is always supported by the
hardware, it means it will be enabled even if pointer authentication is
also supported, and enabled for signing return addresses stored on the
stack. The additional security provided by shadow call stack is only
marginal in this case, whereas the performance overhead is not.

Given that return address signing is based on PACIASP/AUTIASP
instructions that implicitly operate on the return address register
(X30) and are not idempotent (i.e., each needs to be emitted exactly
once before the return address is stored on the ordinary stack and after
it has been retrieved from it), we can convert these instruction 1:1
into shadow call stack pushes and pops involving the register X30.
As this is something that can be done at runtime rather than build time,
we can do this conditionally based on whether or not return address
signing is supported on the underlying hardware.

In order to be able to unwind call stacks that involve return address
signing, whether or not the return address is currently signed is
tracked by DWARF CFI directives in the unwinding metadata. This means we
can use this information to locate all PACIASP/AUTIASP instructions in
the binary, instead of having to use brute force and go over all
instructions in the entire program.

This series implements this approach for Clang, which has recently been
fixed to emit all these CFI directives correctly. This series is based
on an older PoC sent out last year [0] that targeted GCC only (due to
this issue). This v3 targets Clang only, as GCC has its own issues with
CFI accuracy.

Changes since v2 [1]:
- don't enable unwind table generation for nVHE code - it cannot be
  patched anyway so it has no use for it;
- drop checks for ID reg overrides
- fix some remaining TODOs regarding augmentation data and the code
  alignment factor
- disable PAC for leaf functions when dynamic SCS is configured, so that
  we don't end up with SCS pushes and pops in all leaf functions too;
- add I-cache maintenance after code patching
- add Rb's from Nick and Kees.

Changes since RFC v1:
- implement boot time check for PAC/BTI support, and only enable dynamic
  SCS if neither are supported;
- implement module patching as well;
- switch to Clang, and drop workaround for GCC bug;

[0] https://lore.kernel.org/linux-arm-kernel/20211013152243.2216899-1-ardb@kernel.org/
[1] https://lore.kernel.org/linux-arm-kernel/20220505161011.1801596-1-ardb@kernel.org/

Cc: Kees Cook <keescook@google.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dan Li <ashimida@linux.alibaba.com>

Ard Biesheuvel (3):
  arm64: unwind: add asynchronous unwind tables to kernel and modules
  scs: add support for dynamic shadow call stacks
  arm64: implement dynamic shadow call stack for Clang

 Makefile                              |   2 +
 arch/Kconfig                          |   7 +
 arch/arm64/Kconfig                    |  12 +
 arch/arm64/Makefile                   |  15 +-
 arch/arm64/include/asm/module.lds.h   |   8 +
 arch/arm64/include/asm/scs.h          |  45 ++++
 arch/arm64/kernel/Makefile            |   2 +
 arch/arm64/kernel/head.S              |   3 +
 arch/arm64/kernel/irq.c               |   2 +-
 arch/arm64/kernel/module.c            |   8 +
 arch/arm64/kernel/patch-scs.c         | 257 ++++++++++++++++++++
 arch/arm64/kernel/sdei.c              |   2 +-
 arch/arm64/kernel/setup.c             |   4 +
 arch/arm64/kernel/vmlinux.lds.S       |  13 +
 arch/arm64/kvm/hyp/nvhe/Makefile      |   1 +
 drivers/firmware/efi/libstub/Makefile |   1 +
 include/linux/scs.h                   |  16 ++
 kernel/scs.c                          |  14 +-
 18 files changed, 406 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kernel/patch-scs.c

-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-13 13:40 [PATCH v3 0/3] arm64: dynamic shadow call stack support Ard Biesheuvel
@ 2022-06-13 13:40 ` Ard Biesheuvel
  2022-06-15 16:50   ` Sami Tolvanen
  2022-06-13 13:40 ` [PATCH v3 2/3] scs: add support for dynamic shadow call stacks Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang Ard Biesheuvel
  2 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 13:40 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, maz, Ard Biesheuvel,
	Kees Cook, Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.

This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/arm64/Kconfig                    |  3 +++
 arch/arm64/Makefile                   |  5 +++++
 arch/arm64/include/asm/module.lds.h   |  8 ++++++++
 arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
 drivers/firmware/efi/libstub/Makefile |  1 +
 6 files changed, 31 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1652a9800ebe..5f92344edff5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
 	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
 	default 0xffffffffffffffff
 
+config UNWIND_TABLES
+	bool
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 6d9d4a58b898..4fbca56fa602 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -45,8 +45,13 @@ KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
 # Avoid generating .eh_frame* sections.
+ifneq ($(CONFIG_UNWIND_TABLES),y)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
 KBUILD_AFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
+else
+KBUILD_CFLAGS	+= -fasynchronous-unwind-tables
+KBUILD_AFLAGS	+= -fasynchronous-unwind-tables
+endif
 
 ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
 prepare: stack_protector_prepare
diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index 094701ec5500..dbba4b7559aa 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -17,4 +17,12 @@ SECTIONS {
 	 */
 	.text.hot : { *(.text.hot) }
 #endif
+
+#ifdef CONFIG_UNWIND_TABLES
+	/*
+	 * Currently, we only use unwind info at module load time, so we can
+	 * put it into the .init allocation.
+	 */
+	.init.eh_frame : { *(.eh_frame) }
+#endif
 }
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 2d4a8f995175..7bf4809f523d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -120,6 +120,17 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#ifdef CONFIG_UNWIND_TABLES
+#define UNWIND_DATA_SECTIONS				\
+	.eh_frame : {					\
+		__eh_frame_start = .;			\
+		*(.eh_frame)				\
+		__eh_frame_end = .;			\
+	}
+#else
+#define UNWIND_DATA_SECTIONS
+#endif
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -231,6 +242,8 @@ SECTIONS
 		__alt_instructions_end = .;
 	}
 
+	UNWIND_DATA_SECTIONS
+
 	. = ALIGN(SEGMENT_ALIGN);
 	__inittext_end = .;
 	__initdata_begin = .;
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index f9fe4dc21b1f..23de41479495 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -84,6 +84,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
 # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
 # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
 
 # KVM nVHE code is run at a different exception code with a different map, so
 # compiler instrumentation that inserts callbacks or checks into the code may
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e..78c46638707a 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,6 +20,7 @@ cflags-$(CONFIG_X86)		+= -m$(BITS) -D__KERNEL__ \
 # disable the stackleak plugin
 cflags-$(CONFIG_ARM64)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fpie $(DISABLE_STACKLEAK_PLUGIN) \
+				   -fno-unwind-tables -fno-asynchronous-unwind-tables \
 				   $(call cc-option,-mbranch-protection=none)
 cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fno-builtin -fpic \
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/3] scs: add support for dynamic shadow call stacks
  2022-06-13 13:40 [PATCH v3 0/3] arm64: dynamic shadow call stack support Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules Ard Biesheuvel
@ 2022-06-13 13:40 ` Ard Biesheuvel
  2022-06-14  6:20   ` Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang Ard Biesheuvel
  2 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 13:40 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, maz, Ard Biesheuvel,
	Kees Cook, Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li,
	Kees Cook

In order to allow arches to use code patching to conditionally emit the
shadow stack pushes and pops, rather than always taking the performance
hit even on CPUs that implement alternatives such as stack pointer
authentication on arm64, add a Kconfig symbol that can be set by the
arch to omit the SCS codegen itself, without otherwise affecting how
support code for SCS and compiler options (for register reservation, for
instance) are emitted.

Also, add a static key and some plumbing to omit the allocation of
shadow call stack for dynamic SCS configurations if SCS is disabled at
runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 Makefile            |  2 ++
 arch/Kconfig        |  7 +++++++
 include/linux/scs.h | 16 ++++++++++++++++
 kernel/scs.c        | 14 ++++++++++++--
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index c43d825a3c4c..806b1dea1218 100644
--- a/Makefile
+++ b/Makefile
@@ -883,8 +883,10 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
+ifndef CONFIG_DYNAMIC_SCS
 CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
 KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
+endif
 export CC_FLAGS_SCS
 endif
 
diff --git a/arch/Kconfig b/arch/Kconfig
index fcf9a41a4ef5..a6048d78f05d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -636,6 +636,13 @@ config SHADOW_CALL_STACK
 	  reading and writing arbitrary memory may be able to locate them
 	  and hijack control flow by modifying the stacks.
 
+config DYNAMIC_SCS
+	bool
+	help
+	  Set by the arch code if it relies on code patching to insert the
+	  shadow call stack push and pop instructions rather than on the
+	  compiler.
+
 config LTO
 	bool
 	help
diff --git a/include/linux/scs.h b/include/linux/scs.h
index 18122d9e17ff..c62134d89c7b 100644
--- a/include/linux/scs.h
+++ b/include/linux/scs.h
@@ -53,6 +53,20 @@ static inline bool task_scs_end_corrupted(struct task_struct *tsk)
 	return sz >= SCS_SIZE - 1 || READ_ONCE_NOCHECK(*magic) != SCS_END_MAGIC;
 }
 
+DECLARE_STATIC_KEY_TRUE(dynamic_scs_enabled);
+
+static inline bool scs_is_dynamic(void)
+{
+	return static_branch_likely(&dynamic_scs_enabled);
+}
+
+static inline bool scs_is_enabled(void)
+{
+	if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
+		return true;
+	return scs_is_dynamic();
+}
+
 #else /* CONFIG_SHADOW_CALL_STACK */
 
 static inline void *scs_alloc(int node) { return NULL; }
@@ -62,6 +76,8 @@ static inline void scs_task_reset(struct task_struct *tsk) {}
 static inline int scs_prepare(struct task_struct *tsk, int node) { return 0; }
 static inline void scs_release(struct task_struct *tsk) {}
 static inline bool task_scs_end_corrupted(struct task_struct *tsk) { return false; }
+static inline bool scs_is_enabled(void) { return false; }
+static inline bool scs_is_dynamic(void) { return false; }
 
 #endif /* CONFIG_SHADOW_CALL_STACK */
 
diff --git a/kernel/scs.c b/kernel/scs.c
index b7e1b096d906..8826794d2645 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -12,6 +12,10 @@
 #include <linux/vmalloc.h>
 #include <linux/vmstat.h>
 
+#ifdef CONFIG_DYNAMIC_SCS
+DEFINE_STATIC_KEY_TRUE(dynamic_scs_enabled);
+#endif
+
 static void __scs_account(void *s, int account)
 {
 	struct page *scs_page = vmalloc_to_page(s);
@@ -101,14 +105,20 @@ static int scs_cleanup(unsigned int cpu)
 
 void __init scs_init(void)
 {
+	if (!scs_is_enabled())
+		return;
 	cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
 			  scs_cleanup);
 }
 
 int scs_prepare(struct task_struct *tsk, int node)
 {
-	void *s = scs_alloc(node);
+	void *s;
 
+	if (!scs_is_enabled())
+		return 0;
+
+	s = scs_alloc(node);
 	if (!s)
 		return -ENOMEM;
 
@@ -148,7 +158,7 @@ void scs_release(struct task_struct *tsk)
 {
 	void *s = task_scs(tsk);
 
-	if (!s)
+	if (!scs_is_enabled() || !s)
 		return;
 
 	WARN(task_scs_end_corrupted(tsk),
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang
  2022-06-13 13:40 [PATCH v3 0/3] arm64: dynamic shadow call stack support Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules Ard Biesheuvel
  2022-06-13 13:40 ` [PATCH v3 2/3] scs: add support for dynamic shadow call stacks Ard Biesheuvel
@ 2022-06-13 13:40 ` Ard Biesheuvel
  2022-06-13 16:30   ` Kees Cook
  2022-06-15 21:32   ` Sami Tolvanen
  2 siblings, 2 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 13:40 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, maz, Ard Biesheuvel,
	Kees Cook, Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.

This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig            |   9 +
 arch/arm64/Makefile           |  10 +-
 arch/arm64/include/asm/scs.h  |  45 ++++
 arch/arm64/kernel/Makefile    |   2 +
 arch/arm64/kernel/head.S      |   3 +
 arch/arm64/kernel/irq.c       |   2 +-
 arch/arm64/kernel/module.c    |   8 +
 arch/arm64/kernel/patch-scs.c | 257 ++++++++++++++++++++
 arch/arm64/kernel/sdei.c      |   2 +-
 arch/arm64/kernel/setup.c     |   4 +
 10 files changed, 338 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5f92344edff5..9ff72e582522 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -369,6 +369,15 @@ config KASAN_SHADOW_OFFSET
 config UNWIND_TABLES
 	bool
 
+config UNWIND_PATCH_PAC_INTO_SCS
+	bool "Enable shadow call stack dynamically using code patching"
+	# needs Clang with https://reviews.llvm.org/D111780 incorporated
+	depends on CC_IS_CLANG && CLANG_VERSION >= 150000
+	depends on ARM64_PTR_AUTH_KERNEL && CC_HAS_BRANCH_PROT_PAC_RET
+	depends on SHADOW_CALL_STACK
+	select UNWIND_TABLES
+	select DYNAMIC_SCS
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 4fbca56fa602..e439ebbd167d 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -77,10 +77,16 @@ branch-prot-flags-$(CONFIG_CC_HAS_SIGN_RETURN_ADDRESS) := -msign-return-address=
 # We enable additional protection for leaf functions as there is some
 # narrow potential for ROP protection benefits and no substantial
 # performance impact has been observed.
+PACRET-y := pac-ret+leaf
+
+# Using a shadow call stack in leaf functions is too costly, so avoid PAC there
+# as well when we may be patching PAC into SCS
+PACRET-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS) := pac-ret
+
 ifeq ($(CONFIG_ARM64_BTI_KERNEL),y)
-branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET_BTI) := -mbranch-protection=pac-ret+leaf+bti
+branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET_BTI) := -mbranch-protection=$(PACRET-y)+bti
 else
-branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET) := -mbranch-protection=pac-ret+leaf
+branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET) := -mbranch-protection=$(PACRET-y)
 endif
 # -march=armv8.3-a enables the non-nops instructions for PAC, to avoid the
 # compiler to generate them and consequently to break the single image contract
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index 8297bccf0784..51fcfc96ba71 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -24,6 +24,51 @@
 	.endm
 #endif /* CONFIG_SHADOW_CALL_STACK */
 
+
+#else
+
+#include <linux/scs.h>
+#include <asm/cpufeature.h>
+
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+static inline bool should_patch_pac_into_scs(void)
+{
+	/*
+	 * We only enable the shadow call stack dynamically if we are running
+	 * on a system that does not implement PAC or BTI. PAC and SCS provide
+	 * roughly the same level of protection, and BTI relies on the PACIASP
+	 * instructions serving as landing pads, preventing us from patching
+	 * those instructions into something else.
+	 */
+	u64 reg = read_sysreg_s(SYS_ID_AA64ISAR1_EL1);
+
+	if (reg & ((0xf << ID_AA64ISAR1_APA_SHIFT) |
+		   (0xf << ID_AA64ISAR1_API_SHIFT)))
+		return false;
+
+	reg = read_sysreg_s(SYS_ID_AA64ISAR2_EL1);
+	if (reg & (0xf << ID_AA64ISAR2_APA3_SHIFT))
+		return false;
+
+	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) {
+		reg = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
+		if (reg & (0xf << ID_AA64PFR1_BT_SHIFT))
+			return false;
+	}
+	return true;
+}
+
+static inline void dynamic_scs_init(void)
+{
+	if (should_patch_pac_into_scs())
+		static_branch_enable(&dynamic_scs_enabled);
+}
+#else
+static inline void dynamic_scs_init(void) {}
+#endif
+
+int scs_patch(const u8 eh_frame[], int size);
+
 #endif /* __ASSEMBLY __ */
 
 #endif /* _ASM_SCS_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index fa7981d0d917..bd5ab51f86fb 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -74,6 +74,8 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
+obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
+CFLAGS_patch-scs.o			+= -mbranch-protection=none
 
 # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
 $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 6a98f1a38c29..e9601c8a1bcd 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -453,6 +453,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x21				// pass FDT address in x0
 	bl	early_fdt_map			// Try mapping the FDT early
 	bl	init_feature_override		// Parse cpu feature overrides
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+	bl	scs_patch_vmlinux
+#endif
 #ifdef CONFIG_RANDOMIZE_BASE
 	tst	x23, ~(MIN_KIMG_ALIGN - 1)	// already running randomized?
 	b.ne	0f
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index bda49430c9ea..c284ec35c27c 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -39,7 +39,7 @@ static void init_irq_scs(void)
 {
 	int cpu;
 
-	if (!IS_ENABLED(CONFIG_SHADOW_CALL_STACK))
+	if (!scs_is_enabled())
 		return;
 
 	for_each_possible_cpu(cpu)
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index f2d4bb14bfab..111dc6414e6d 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -15,9 +15,11 @@
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/moduleloader.h>
+#include <linux/scs.h>
 #include <linux/vmalloc.h>
 #include <asm/alternative.h>
 #include <asm/insn.h>
+#include <asm/scs.h>
 #include <asm/sections.h>
 
 void *module_alloc(unsigned long size)
@@ -529,5 +531,11 @@ int module_finalize(const Elf_Ehdr *hdr,
 	if (s)
 		apply_alternatives_module((void *)s->sh_addr, s->sh_size);
 
+	if (scs_is_dynamic()) {
+		s = find_section(hdr, sechdrs, ".init.eh_frame");
+		if (s)
+			scs_patch((void *)s->sh_addr, s->sh_size);
+	}
+
 	return module_init_ftrace_plt(hdr, sechdrs, me);
 }
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
new file mode 100644
index 000000000000..1b3da02d5b74
--- /dev/null
+++ b/arch/arm64/kernel/patch-scs.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <linux/bug.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/printk.h>
+#include <linux/types.h>
+
+#include <asm/cacheflush.h>
+#include <asm/scs.h>
+
+//
+// This minimal DWARF CFI parser is partially based on the code in
+// arch/arc/kernel/unwind.c, and on the document below:
+// https://refspecs.linuxbase.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
+//
+
+#define DW_CFA_nop                          0x00
+#define DW_CFA_set_loc                      0x01
+#define DW_CFA_advance_loc1                 0x02
+#define DW_CFA_advance_loc2                 0x03
+#define DW_CFA_advance_loc4                 0x04
+#define DW_CFA_offset_extended              0x05
+#define DW_CFA_restore_extended             0x06
+#define DW_CFA_undefined                    0x07
+#define DW_CFA_same_value                   0x08
+#define DW_CFA_register                     0x09
+#define DW_CFA_remember_state               0x0a
+#define DW_CFA_restore_state                0x0b
+#define DW_CFA_def_cfa                      0x0c
+#define DW_CFA_def_cfa_register             0x0d
+#define DW_CFA_def_cfa_offset               0x0e
+#define DW_CFA_def_cfa_expression           0x0f
+#define DW_CFA_expression                   0x10
+#define DW_CFA_offset_extended_sf           0x11
+#define DW_CFA_def_cfa_sf                   0x12
+#define DW_CFA_def_cfa_offset_sf            0x13
+#define DW_CFA_val_offset                   0x14
+#define DW_CFA_val_offset_sf                0x15
+#define DW_CFA_val_expression               0x16
+#define DW_CFA_lo_user                      0x1c
+#define DW_CFA_negate_ra_state              0x2d
+#define DW_CFA_GNU_args_size                0x2e
+#define DW_CFA_GNU_negative_offset_extended 0x2f
+#define DW_CFA_hi_user                      0x3f
+
+extern const u8 __eh_frame_start[], __eh_frame_end[];
+
+enum {
+	PACIASP		= 0xd503233f,
+	AUTIASP		= 0xd50323bf,
+	SCS_PUSH	= 0xf800865e,
+	SCS_POP		= 0xf85f8e5e,
+};
+
+static void __always_inline scs_patch_loc(u64 loc)
+{
+	u32 insn = le32_to_cpup((void *)loc);
+
+	switch (insn) {
+	case PACIASP:
+		*(u32 *)loc = cpu_to_le32(SCS_PUSH);
+		break;
+	case AUTIASP:
+		*(u32 *)loc = cpu_to_le32(SCS_POP);
+		break;
+	default:
+		/*
+		 * While the DW_CFA_negate_ra_state directive is guaranteed to
+		 * appear right after a PACIASP/AUTIASP instruction, it may
+		 * also appear after a DW_CFA_restore_state directive that
+		 * restores a state that is only partially accurate, and is
+		 * followed by DW_CFA_negate_ra_state directive to toggle the
+		 * PAC bit again. So we permit other instructions here, and ignore
+		 * them.
+		 */
+		return;
+	}
+	dcache_clean_pou(loc, loc + sizeof(u32));
+}
+
+/*
+ * Skip one uleb128/sleb128 encoded quantity from the opcode stream. All bytes
+ * except the last one have bit #7 set.
+ */
+static int __always_inline skip_xleb128(const u8 **opcode, int size)
+{
+	u8 c;
+
+	do {
+		c = *(*opcode)++;
+		size--;
+	} while (c & BIT(7));
+
+	return size;
+}
+
+struct eh_frame {
+	/*
+	 * The size of this frame if 0 < size < U32_MAX, 0 terminates the list.
+	 */
+	u32	size;
+
+	/*
+	 * The first frame is a Common Information Entry (CIE) frame, followed
+	 * by one or more Frame Description Entry (FDE) frames. In the former
+	 * case, this field is 0, otherwise it is the negated offset relative
+	 * to the associated CIE frame.
+	 */
+	u32	cie_id_or_pointer;
+
+	union {
+		struct { // CIE
+			u8	version;
+			u8	augmentation_string[];
+		};
+
+		struct { // FDE
+			s32	initial_loc;
+			s32	range;
+			u8	opcodes[];
+		};
+	};
+};
+
+static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
+					bool fde_has_augmentation_data,
+					int code_alignment_factor)
+{
+	int size = frame->size - offsetof(struct eh_frame, opcodes) + 4;
+	u64 loc = (u64)offset_to_ptr(&frame->initial_loc);
+	const u8 *opcode = frame->opcodes;
+
+	if (fde_has_augmentation_data) {
+		int l;
+
+		// assume single byte uleb128_t
+		if (WARN_ON(*opcode & BIT(7)))
+			return -ENOEXEC;
+
+		l = *opcode++;
+		opcode += l;
+		size -= l + 1;
+	}
+
+	/*
+	 * Starting from 'loc', apply the CFA opcodes that advance the location
+	 * pointer, and identify the locations of the PAC instructions.
+	 */
+	while (size-- > 0) {
+		switch (*opcode++) {
+		case DW_CFA_nop:
+		case DW_CFA_remember_state:
+		case DW_CFA_restore_state:
+			break;
+
+		case DW_CFA_advance_loc1:
+			loc += *opcode++ * code_alignment_factor;
+			size--;
+			break;
+
+		case DW_CFA_advance_loc2:
+			loc += *opcode++ * code_alignment_factor;
+			loc += (*opcode++ << 8) * code_alignment_factor;
+			size -= 2;
+			break;
+
+		case DW_CFA_def_cfa:
+		case DW_CFA_offset_extended:
+			size = skip_xleb128(&opcode, size);
+			fallthrough;
+		case DW_CFA_def_cfa_offset:
+		case DW_CFA_def_cfa_offset_sf:
+		case DW_CFA_def_cfa_register:
+		case DW_CFA_same_value:
+		case DW_CFA_restore_extended:
+		case 0x80 ... 0xbf:
+			size = skip_xleb128(&opcode, size);
+			break;
+
+		case DW_CFA_negate_ra_state:
+			scs_patch_loc(loc - 4);
+			break;
+
+		case 0x40 ... 0x7f:
+			// advance loc
+			loc += (opcode[-1] & 0x3f) * code_alignment_factor;
+			break;
+
+		case 0xc0 ... 0xff:
+			break;
+
+		default:
+			pr_err("unhandled opcode: %02x in FDE frame %lx\n", opcode[-1], (uintptr_t)frame);
+			return -ENOEXEC;
+		}
+	}
+	return 0;
+}
+
+int noinstr scs_patch(const u8 eh_frame[], int size)
+{
+	const u8 *p = eh_frame;
+
+	while (size > 4) {
+		const struct eh_frame *frame = (const void *)p;
+		bool fde_has_augmentation_data = true;
+		int code_alignment_factor = 1;
+		int ret;
+
+		if (frame->size == 0 ||
+		    frame->size == U32_MAX ||
+		    frame->size > size)
+			break;
+
+		if (frame->cie_id_or_pointer == 0) {
+			const u8 *p = frame->augmentation_string;
+
+			/* a 'z' in the augmentation string must come first */
+			fde_has_augmentation_data = *p == 'z';
+
+			/*
+			 * The code alignment factor is a uleb128 encoded field
+			 * but given that the only sensible values are 1 or 4,
+			 * there is no point in decoding the whole thing.
+			 */
+			p += strlen(p) + 1;
+			if (!WARN_ON(*p & BIT(7)))
+				code_alignment_factor = *p;
+		} else {
+			ret = scs_handle_fde_frame(frame,
+						   fde_has_augmentation_data,
+						   code_alignment_factor);
+			if (ret)
+				return ret;
+		}
+
+		p += sizeof(frame->size) + frame->size;
+		size -= sizeof(frame->size) + frame->size;
+	}
+	return 0;
+}
+
+asmlinkage void __init scs_patch_vmlinux(void)
+{
+	if (!should_patch_pac_into_scs())
+		return;
+
+	WARN_ON(scs_patch(__eh_frame_start, __eh_frame_end - __eh_frame_start));
+	icache_inval_all_pou();
+	isb();
+}
diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
index d20620a1c51a..30f3c7563694 100644
--- a/arch/arm64/kernel/sdei.c
+++ b/arch/arm64/kernel/sdei.c
@@ -144,7 +144,7 @@ static int init_sdei_scs(void)
 	int cpu;
 	int err = 0;
 
-	if (!IS_ENABLED(CONFIG_SHADOW_CALL_STACK))
+	if (!scs_is_enabled())
 		return 0;
 
 	for_each_possible_cpu(cpu) {
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index cf3a759f10d4..1b4f84563006 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -30,6 +30,7 @@
 #include <linux/efi.h>
 #include <linux/psci.h>
 #include <linux/sched/task.h>
+#include <linux/scs.h>
 #include <linux/mm.h>
 
 #include <asm/acpi.h>
@@ -42,6 +43,7 @@
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
 #include <asm/numa.h>
+#include <asm/scs.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -313,6 +315,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	parse_early_param();
 
+	dynamic_scs_init();
+
 	/*
 	 * Unmask asynchronous aborts and fiq after bringing up possible
 	 * earlycon. (Report possible System Errors once we can report this
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang
  2022-06-13 13:40 ` [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang Ard Biesheuvel
@ 2022-06-13 16:30   ` Kees Cook
  2022-06-13 16:50     ` Ard Biesheuvel
  2022-06-15 21:32   ` Sami Tolvanen
  1 sibling, 1 reply; 17+ messages in thread
From: Kees Cook @ 2022-06-13 16:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

On Mon, Jun 13, 2022 at 03:40:08PM +0200, Ard Biesheuvel wrote:
> Implement dynamic shadow call stack support on Clang, by parsing the
> unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
> instructions, and replacing them with the shadow call stack push and pop
> instructions, respectively.
> 
> This is useful because the overhead of the shadow call stack is
> difficult to justify on hardware that implements pointer authentication
> (PAC), and given that the PAC instructions are executed as NOPs on
> hardware that doesn't, we can just replace them without breaking
> anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
> manipulations of the return address, replacing them 1:1 with shadow call
> stack pushes and pops is guaranteed to result in the desired behavior.

Specifically, the "PAC available" benefit is the per-thread memory
savings (no shadow stack needs to be allocated). Thanks for getting this
working!

> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Sami, can you test this for the cases you've needed this for?

In the meantime, Will, can you land this for -next so we can get maximal
test time?

-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang
  2022-06-13 16:30   ` Kees Cook
@ 2022-06-13 16:50     ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 16:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

On Mon, 13 Jun 2022 at 18:30, Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jun 13, 2022 at 03:40:08PM +0200, Ard Biesheuvel wrote:
> > Implement dynamic shadow call stack support on Clang, by parsing the
> > unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
> > instructions, and replacing them with the shadow call stack push and pop
> > instructions, respectively.
> >
> > This is useful because the overhead of the shadow call stack is
> > difficult to justify on hardware that implements pointer authentication
> > (PAC), and given that the PAC instructions are executed as NOPs on
> > hardware that doesn't, we can just replace them without breaking
> > anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
> > manipulations of the return address, replacing them 1:1 with shadow call
> > stack pushes and pops is guaranteed to result in the desired behavior.
>
> Specifically, the "PAC available" benefit is the per-thread memory
> savings (no shadow stack needs to be allocated). Thanks for getting this
> working!
>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Sami, can you test this for the cases you've needed this for?
>
> In the meantime, Will, can you land this for -next so we can get maximal
> test time?
>

I should note that this relies on Clang 15 which has not been released yet.

I have been using the clang-15 and lld-15 packages from

deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye main

and setting LLVM=-15 on the make command line.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] scs: add support for dynamic shadow call stacks
  2022-06-13 13:40 ` [PATCH v3 2/3] scs: add support for dynamic shadow call stacks Ard Biesheuvel
@ 2022-06-14  6:20   ` Ard Biesheuvel
  2022-06-15 17:12     ` Sami Tolvanen
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-14  6:20 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, will, mark.rutland, maz, Kees Cook,
	Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li, Kees Cook

On Mon, 13 Jun 2022 at 15:40, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> In order to allow arches to use code patching to conditionally emit the
> shadow stack pushes and pops, rather than always taking the performance
> hit even on CPUs that implement alternatives such as stack pointer
> authentication on arm64, add a Kconfig symbol that can be set by the
> arch to omit the SCS codegen itself, without otherwise affecting how
> support code for SCS and compiler options (for register reservation, for
> instance) are emitted.
>
> Also, add a static key and some plumbing to omit the allocation of
> shadow call stack for dynamic SCS configurations if SCS is disabled at
> runtime.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

This patch needs the following hunk applied on top to fix a build
error reported by the bots:

--- a/include/linux/scs.h
+++ b/include/linux/scs.h
@@ -57,6 +57,8 @@ DECLARE_STATIC_KEY_TRUE(dynamic_scs_enabled);

 static inline bool scs_is_dynamic(void)
 {
+       if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
+               return false;
        return static_branch_likely(&dynamic_scs_enabled);
 }


> ---
>  Makefile            |  2 ++
>  arch/Kconfig        |  7 +++++++
>  include/linux/scs.h | 16 ++++++++++++++++
>  kernel/scs.c        | 14 ++++++++++++--
>  4 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index c43d825a3c4c..806b1dea1218 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -883,8 +883,10 @@ LDFLAGS_vmlinux += --gc-sections
>  endif
>
>  ifdef CONFIG_SHADOW_CALL_STACK
> +ifndef CONFIG_DYNAMIC_SCS
>  CC_FLAGS_SCS   := -fsanitize=shadow-call-stack
>  KBUILD_CFLAGS  += $(CC_FLAGS_SCS)
> +endif
>  export CC_FLAGS_SCS
>  endif
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index fcf9a41a4ef5..a6048d78f05d 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -636,6 +636,13 @@ config SHADOW_CALL_STACK
>           reading and writing arbitrary memory may be able to locate them
>           and hijack control flow by modifying the stacks.
>
> +config DYNAMIC_SCS
> +       bool
> +       help
> +         Set by the arch code if it relies on code patching to insert the
> +         shadow call stack push and pop instructions rather than on the
> +         compiler.
> +
>  config LTO
>         bool
>         help
> diff --git a/include/linux/scs.h b/include/linux/scs.h
> index 18122d9e17ff..c62134d89c7b 100644
> --- a/include/linux/scs.h
> +++ b/include/linux/scs.h
> @@ -53,6 +53,20 @@ static inline bool task_scs_end_corrupted(struct task_struct *tsk)
>         return sz >= SCS_SIZE - 1 || READ_ONCE_NOCHECK(*magic) != SCS_END_MAGIC;
>  }
>
> +DECLARE_STATIC_KEY_TRUE(dynamic_scs_enabled);
> +
> +static inline bool scs_is_dynamic(void)
> +{
> +       return static_branch_likely(&dynamic_scs_enabled);
> +}
> +
> +static inline bool scs_is_enabled(void)
> +{
> +       if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
> +               return true;
> +       return scs_is_dynamic();
> +}
> +
>  #else /* CONFIG_SHADOW_CALL_STACK */
>
>  static inline void *scs_alloc(int node) { return NULL; }
> @@ -62,6 +76,8 @@ static inline void scs_task_reset(struct task_struct *tsk) {}
>  static inline int scs_prepare(struct task_struct *tsk, int node) { return 0; }
>  static inline void scs_release(struct task_struct *tsk) {}
>  static inline bool task_scs_end_corrupted(struct task_struct *tsk) { return false; }
> +static inline bool scs_is_enabled(void) { return false; }
> +static inline bool scs_is_dynamic(void) { return false; }
>
>  #endif /* CONFIG_SHADOW_CALL_STACK */
>
> diff --git a/kernel/scs.c b/kernel/scs.c
> index b7e1b096d906..8826794d2645 100644
> --- a/kernel/scs.c
> +++ b/kernel/scs.c
> @@ -12,6 +12,10 @@
>  #include <linux/vmalloc.h>
>  #include <linux/vmstat.h>
>
> +#ifdef CONFIG_DYNAMIC_SCS
> +DEFINE_STATIC_KEY_TRUE(dynamic_scs_enabled);
> +#endif
> +
>  static void __scs_account(void *s, int account)
>  {
>         struct page *scs_page = vmalloc_to_page(s);
> @@ -101,14 +105,20 @@ static int scs_cleanup(unsigned int cpu)
>
>  void __init scs_init(void)
>  {
> +       if (!scs_is_enabled())
> +               return;
>         cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
>                           scs_cleanup);
>  }
>
>  int scs_prepare(struct task_struct *tsk, int node)
>  {
> -       void *s = scs_alloc(node);
> +       void *s;
>
> +       if (!scs_is_enabled())
> +               return 0;
> +
> +       s = scs_alloc(node);
>         if (!s)
>                 return -ENOMEM;
>
> @@ -148,7 +158,7 @@ void scs_release(struct task_struct *tsk)
>  {
>         void *s = task_scs(tsk);
>
> -       if (!s)
> +       if (!scs_is_enabled() || !s)
>                 return;
>
>         WARN(task_scs_end_corrupted(tsk),
> --
> 2.30.2
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-13 13:40 ` [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules Ard Biesheuvel
@ 2022-06-15 16:50   ` Sami Tolvanen
  2022-06-15 16:53     ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Sami Tolvanen @ 2022-06-15 16:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li

On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
> Enable asynchronous unwind table generation for both the core kernel as
> well as modules, and emit the resulting .eh_frame sections as init code
> so we can use the unwind directives for code patching at boot or module
> load time.
> 
> This will be used by dynamic shadow call stack support, which will rely
> on code patching rather than compiler codegen to emit the shadow call
> stack push and pop instructions.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
>  arch/arm64/Kconfig                    |  3 +++
>  arch/arm64/Makefile                   |  5 +++++
>  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
>  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
>  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
>  drivers/firmware/efi/libstub/Makefile |  1 +
>  6 files changed, 31 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1652a9800ebe..5f92344edff5 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
>  	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
>  	default 0xffffffffffffffff
>  
> +config UNWIND_TABLES
> +	bool
> +
>  source "arch/arm64/Kconfig.platforms"
>  
>  menu "Kernel Features"
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 6d9d4a58b898..4fbca56fa602 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -45,8 +45,13 @@ KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
>  KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
>  
>  # Avoid generating .eh_frame* sections.
> +ifneq ($(CONFIG_UNWIND_TABLES),y)
>  KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
>  KBUILD_AFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
> +else
> +KBUILD_CFLAGS	+= -fasynchronous-unwind-tables
> +KBUILD_AFLAGS	+= -fasynchronous-unwind-tables
> +endif
>  
>  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
>  prepare: stack_protector_prepare
> diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
> index 094701ec5500..dbba4b7559aa 100644
> --- a/arch/arm64/include/asm/module.lds.h
> +++ b/arch/arm64/include/asm/module.lds.h
> @@ -17,4 +17,12 @@ SECTIONS {
>  	 */
>  	.text.hot : { *(.text.hot) }
>  #endif
> +
> +#ifdef CONFIG_UNWIND_TABLES
> +	/*
> +	 * Currently, we only use unwind info at module load time, so we can
> +	 * put it into the .init allocation.
> +	 */
> +	.init.eh_frame : { *(.eh_frame) }
> +#endif
>  }
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 2d4a8f995175..7bf4809f523d 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -120,6 +120,17 @@ jiffies = jiffies_64;
>  #define TRAMP_TEXT
>  #endif
>  
> +#ifdef CONFIG_UNWIND_TABLES
> +#define UNWIND_DATA_SECTIONS				\
> +	.eh_frame : {					\
> +		__eh_frame_start = .;			\
> +		*(.eh_frame)				\
> +		__eh_frame_end = .;			\
> +	}
> +#else
> +#define UNWIND_DATA_SECTIONS
> +#endif

How does this work with SANITIZER_DISCARDS dropping .eh_frame in
include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
definitely want to enable this together with CONFIG_CFI_CLANG, so it
seems like we'd have to drop the discard rules as well.

Sami

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-15 16:50   ` Sami Tolvanen
@ 2022-06-15 16:53     ` Ard Biesheuvel
  2022-06-15 21:29       ` Kees Cook
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-15 16:53 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li

On Wed, 15 Jun 2022 at 18:50, Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
> > Enable asynchronous unwind table generation for both the core kernel as
> > well as modules, and emit the resulting .eh_frame sections as init code
> > so we can use the unwind directives for code patching at boot or module
> > load time.
> >
> > This will be used by dynamic shadow call stack support, which will rely
> > on code patching rather than compiler codegen to emit the shadow call
> > stack push and pop instructions.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > ---
> >  arch/arm64/Kconfig                    |  3 +++
> >  arch/arm64/Makefile                   |  5 +++++
> >  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
> >  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
> >  drivers/firmware/efi/libstub/Makefile |  1 +
> >  6 files changed, 31 insertions(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1652a9800ebe..5f92344edff5 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
> >       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
> >       default 0xffffffffffffffff
> >
> > +config UNWIND_TABLES
> > +     bool
> > +
> >  source "arch/arm64/Kconfig.platforms"
> >
> >  menu "Kernel Features"
> > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> > index 6d9d4a58b898..4fbca56fa602 100644
> > --- a/arch/arm64/Makefile
> > +++ b/arch/arm64/Makefile
> > @@ -45,8 +45,13 @@ KBUILD_CFLAGS      += $(call cc-option,-mabi=lp64)
> >  KBUILD_AFLAGS        += $(call cc-option,-mabi=lp64)
> >
> >  # Avoid generating .eh_frame* sections.
> > +ifneq ($(CONFIG_UNWIND_TABLES),y)
> >  KBUILD_CFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> >  KBUILD_AFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > +else
> > +KBUILD_CFLAGS        += -fasynchronous-unwind-tables
> > +KBUILD_AFLAGS        += -fasynchronous-unwind-tables
> > +endif
> >
> >  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
> >  prepare: stack_protector_prepare
> > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
> > index 094701ec5500..dbba4b7559aa 100644
> > --- a/arch/arm64/include/asm/module.lds.h
> > +++ b/arch/arm64/include/asm/module.lds.h
> > @@ -17,4 +17,12 @@ SECTIONS {
> >        */
> >       .text.hot : { *(.text.hot) }
> >  #endif
> > +
> > +#ifdef CONFIG_UNWIND_TABLES
> > +     /*
> > +      * Currently, we only use unwind info at module load time, so we can
> > +      * put it into the .init allocation.
> > +      */
> > +     .init.eh_frame : { *(.eh_frame) }
> > +#endif
> >  }
> > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > index 2d4a8f995175..7bf4809f523d 100644
> > --- a/arch/arm64/kernel/vmlinux.lds.S
> > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > @@ -120,6 +120,17 @@ jiffies = jiffies_64;
> >  #define TRAMP_TEXT
> >  #endif
> >
> > +#ifdef CONFIG_UNWIND_TABLES
> > +#define UNWIND_DATA_SECTIONS                         \
> > +     .eh_frame : {                                   \
> > +             __eh_frame_start = .;                   \
> > +             *(.eh_frame)                            \
> > +             __eh_frame_end = .;                     \
> > +     }
> > +#else
> > +#define UNWIND_DATA_SECTIONS
> > +#endif
>
> How does this work with SANITIZER_DISCARDS dropping .eh_frame in
> include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
> definitely want to enable this together with CONFIG_CFI_CLANG, so it
> seems like we'd have to drop the discard rules as well.
>

Good point, I had no idea that that existed.

Clang 13 should have the fix for the original issue, so we could make
this workaround specific to 12 and earlier.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] scs: add support for dynamic shadow call stacks
  2022-06-14  6:20   ` Ard Biesheuvel
@ 2022-06-15 17:12     ` Sami Tolvanen
  2022-06-16  7:14       ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Sami Tolvanen @ 2022-06-15 17:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li, Kees Cook

On Tue, Jun 14, 2022 at 08:20:13AM +0200, Ard Biesheuvel wrote:
> On Mon, 13 Jun 2022 at 15:40, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > In order to allow arches to use code patching to conditionally emit the
> > shadow stack pushes and pops, rather than always taking the performance
> > hit even on CPUs that implement alternatives such as stack pointer
> > authentication on arm64, add a Kconfig symbol that can be set by the
> > arch to omit the SCS codegen itself, without otherwise affecting how
> > support code for SCS and compiler options (for register reservation, for
> > instance) are emitted.
> >
> > Also, add a static key and some plumbing to omit the allocation of
> > shadow call stack for dynamic SCS configurations if SCS is disabled at
> > runtime.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> 
> This patch needs the following hunk applied on top to fix a build
> error reported by the bots:
> 
> --- a/include/linux/scs.h
> +++ b/include/linux/scs.h
> @@ -57,6 +57,8 @@ DECLARE_STATIC_KEY_TRUE(dynamic_scs_enabled);
> 
>  static inline bool scs_is_dynamic(void)
>  {
> +       if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
> +               return false;
>         return static_branch_likely(&dynamic_scs_enabled);
>  }

With this:

Reviewed-by: Sami Tolvanen <samitolvanen@google.com>

Sami

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-15 16:53     ` Ard Biesheuvel
@ 2022-06-15 21:29       ` Kees Cook
  2022-06-15 21:52         ` Fangrui Song
  0 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2022-06-15 21:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Sami Tolvanen, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Fangrui Song, Nick Desaulniers,
	Dan Li

On Wed, Jun 15, 2022 at 9:54 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 15 Jun 2022 at 18:50, Sami Tolvanen <samitolvanen@google.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
> > > Enable asynchronous unwind table generation for both the core kernel as
> > > well as modules, and emit the resulting .eh_frame sections as init code
> > > so we can use the unwind directives for code patching at boot or module
> > > load time.
> > >
> > > This will be used by dynamic shadow call stack support, which will rely
> > > on code patching rather than compiler codegen to emit the shadow call
> > > stack push and pop instructions.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > > ---
> > >  arch/arm64/Kconfig                    |  3 +++
> > >  arch/arm64/Makefile                   |  5 +++++
> > >  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
> > >  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
> > >  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
> > >  drivers/firmware/efi/libstub/Makefile |  1 +
> > >  6 files changed, 31 insertions(+)
> > >
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index 1652a9800ebe..5f92344edff5 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
> > >       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
> > >       default 0xffffffffffffffff
> > >
> > > +config UNWIND_TABLES
> > > +     bool
> > > +
> > >  source "arch/arm64/Kconfig.platforms"
> > >
> > >  menu "Kernel Features"
> > > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> > > index 6d9d4a58b898..4fbca56fa602 100644
> > > --- a/arch/arm64/Makefile
> > > +++ b/arch/arm64/Makefile
> > > @@ -45,8 +45,13 @@ KBUILD_CFLAGS      += $(call cc-option,-mabi=lp64)
> > >  KBUILD_AFLAGS        += $(call cc-option,-mabi=lp64)
> > >
> > >  # Avoid generating .eh_frame* sections.
> > > +ifneq ($(CONFIG_UNWIND_TABLES),y)
> > >  KBUILD_CFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > >  KBUILD_AFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > > +else
> > > +KBUILD_CFLAGS        += -fasynchronous-unwind-tables
> > > +KBUILD_AFLAGS        += -fasynchronous-unwind-tables
> > > +endif
> > >
> > >  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
> > >  prepare: stack_protector_prepare
> > > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
> > > index 094701ec5500..dbba4b7559aa 100644
> > > --- a/arch/arm64/include/asm/module.lds.h
> > > +++ b/arch/arm64/include/asm/module.lds.h
> > > @@ -17,4 +17,12 @@ SECTIONS {
> > >        */
> > >       .text.hot : { *(.text.hot) }
> > >  #endif
> > > +
> > > +#ifdef CONFIG_UNWIND_TABLES
> > > +     /*
> > > +      * Currently, we only use unwind info at module load time, so we can
> > > +      * put it into the .init allocation.
> > > +      */
> > > +     .init.eh_frame : { *(.eh_frame) }
> > > +#endif
> > >  }
> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > index 2d4a8f995175..7bf4809f523d 100644
> > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > @@ -120,6 +120,17 @@ jiffies = jiffies_64;
> > >  #define TRAMP_TEXT
> > >  #endif
> > >
> > > +#ifdef CONFIG_UNWIND_TABLES
> > > +#define UNWIND_DATA_SECTIONS                         \
> > > +     .eh_frame : {                                   \
> > > +             __eh_frame_start = .;                   \
> > > +             *(.eh_frame)                            \
> > > +             __eh_frame_end = .;                     \
> > > +     }
> > > +#else
> > > +#define UNWIND_DATA_SECTIONS
> > > +#endif
> >
> > How does this work with SANITIZER_DISCARDS dropping .eh_frame in
> > include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
> > definitely want to enable this together with CONFIG_CFI_CLANG, so it
> > seems like we'd have to drop the discard rules as well.
> >
>
> Good point, I had no idea that that existed.
>
> Clang 13 should have the fix for the original issue, so we could make
> this workaround specific to 12 and earlier.

Yeah, I like this -- I'd prefer to know when we get "surprise" sections again.


-- 
Kees Cook

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang
  2022-06-13 13:40 ` [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang Ard Biesheuvel
  2022-06-13 16:30   ` Kees Cook
@ 2022-06-15 21:32   ` Sami Tolvanen
  2022-06-16 10:51     ` Ard Biesheuvel
  1 sibling, 1 reply; 17+ messages in thread
From: Sami Tolvanen @ 2022-06-15 21:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, catalin.marinas, will, mark.rutland, maz,
	Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li

On Mon, Jun 13, 2022 at 03:40:08PM +0200, Ard Biesheuvel wrote:
> Implement dynamic shadow call stack support on Clang, by parsing the
> unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
> instructions, and replacing them with the shadow call stack push and pop
> instructions, respectively.
> 
> This is useful because the overhead of the shadow call stack is
> difficult to justify on hardware that implements pointer authentication
> (PAC), and given that the PAC instructions are executed as NOPs on
> hardware that doesn't, we can just replace them without breaking
> anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
> manipulations of the return address, replacing them 1:1 with shadow call
> stack pushes and pops is guaranteed to result in the desired behavior.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/Kconfig            |   9 +
>  arch/arm64/Makefile           |  10 +-
>  arch/arm64/include/asm/scs.h  |  45 ++++
>  arch/arm64/kernel/Makefile    |   2 +
>  arch/arm64/kernel/head.S      |   3 +
>  arch/arm64/kernel/irq.c       |   2 +-
>  arch/arm64/kernel/module.c    |   8 +
>  arch/arm64/kernel/patch-scs.c | 257 ++++++++++++++++++++
>  arch/arm64/kernel/sdei.c      |   2 +-
>  arch/arm64/kernel/setup.c     |   4 +
>  10 files changed, 338 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 5f92344edff5..9ff72e582522 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -369,6 +369,15 @@ config KASAN_SHADOW_OFFSET
>  config UNWIND_TABLES
>  	bool
>  
> +config UNWIND_PATCH_PAC_INTO_SCS
> +	bool "Enable shadow call stack dynamically using code patching"
> +	# needs Clang with https://reviews.llvm.org/D111780 incorporated
> +	depends on CC_IS_CLANG && CLANG_VERSION >= 150000
> +	depends on ARM64_PTR_AUTH_KERNEL && CC_HAS_BRANCH_PROT_PAC_RET
> +	depends on SHADOW_CALL_STACK
> +	select UNWIND_TABLES
> +	select DYNAMIC_SCS
> +
>  source "arch/arm64/Kconfig.platforms"
>  
>  menu "Kernel Features"
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 4fbca56fa602..e439ebbd167d 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -77,10 +77,16 @@ branch-prot-flags-$(CONFIG_CC_HAS_SIGN_RETURN_ADDRESS) := -msign-return-address=
>  # We enable additional protection for leaf functions as there is some
>  # narrow potential for ROP protection benefits and no substantial
>  # performance impact has been observed.
> +PACRET-y := pac-ret+leaf
> +
> +# Using a shadow call stack in leaf functions is too costly, so avoid PAC there
> +# as well when we may be patching PAC into SCS
> +PACRET-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS) := pac-ret
> +
>  ifeq ($(CONFIG_ARM64_BTI_KERNEL),y)
> -branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET_BTI) := -mbranch-protection=pac-ret+leaf+bti
> +branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET_BTI) := -mbranch-protection=$(PACRET-y)+bti
>  else
> -branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET) := -mbranch-protection=pac-ret+leaf
> +branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET) := -mbranch-protection=$(PACRET-y)
>  endif
>  # -march=armv8.3-a enables the non-nops instructions for PAC, to avoid the
>  # compiler to generate them and consequently to break the single image contract
> diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
> index 8297bccf0784..51fcfc96ba71 100644
> --- a/arch/arm64/include/asm/scs.h
> +++ b/arch/arm64/include/asm/scs.h
> @@ -24,6 +24,51 @@
>  	.endm
>  #endif /* CONFIG_SHADOW_CALL_STACK */
>  
> +
> +#else
> +
> +#include <linux/scs.h>
> +#include <asm/cpufeature.h>
> +
> +#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
> +static inline bool should_patch_pac_into_scs(void)
> +{
> +	/*
> +	 * We only enable the shadow call stack dynamically if we are running
> +	 * on a system that does not implement PAC or BTI. PAC and SCS provide
> +	 * roughly the same level of protection, and BTI relies on the PACIASP
> +	 * instructions serving as landing pads, preventing us from patching
> +	 * those instructions into something else.
> +	 */
> +	u64 reg = read_sysreg_s(SYS_ID_AA64ISAR1_EL1);
> +
> +	if (reg & ((0xf << ID_AA64ISAR1_APA_SHIFT) |
> +		   (0xf << ID_AA64ISAR1_API_SHIFT)))
> +		return false;
> +
> +	reg = read_sysreg_s(SYS_ID_AA64ISAR2_EL1);
> +	if (reg & (0xf << ID_AA64ISAR2_APA3_SHIFT))
> +		return false;
> +
> +	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) {
> +		reg = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
> +		if (reg & (0xf << ID_AA64PFR1_BT_SHIFT))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static inline void dynamic_scs_init(void)
> +{
> +	if (should_patch_pac_into_scs())
> +		static_branch_enable(&dynamic_scs_enabled);
> +}

Should we print out a message to indicate we are actually enabling SCS
at runtime? Otherwise I think the only way to know would be to look at
/proc/meminfo, for example.

> +#else
> +static inline void dynamic_scs_init(void) {}
> +#endif
> +
> +int scs_patch(const u8 eh_frame[], int size);
> +
>  #endif /* __ASSEMBLY __ */
>  
>  #endif /* _ASM_SCS_H */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index fa7981d0d917..bd5ab51f86fb 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -74,6 +74,8 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
>  obj-$(CONFIG_ARM64_MTE)			+= mte.o
>  obj-y					+= vdso-wrap.o
>  obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> +obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
> +CFLAGS_patch-scs.o			+= -mbranch-protection=none
>  
>  # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
>  $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 6a98f1a38c29..e9601c8a1bcd 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -453,6 +453,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  	mov	x0, x21				// pass FDT address in x0
>  	bl	early_fdt_map			// Try mapping the FDT early
>  	bl	init_feature_override		// Parse cpu feature overrides
> +#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
> +	bl	scs_patch_vmlinux
> +#endif
>  #ifdef CONFIG_RANDOMIZE_BASE
>  	tst	x23, ~(MIN_KIMG_ALIGN - 1)	// already running randomized?
>  	b.ne	0f
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index bda49430c9ea..c284ec35c27c 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -39,7 +39,7 @@ static void init_irq_scs(void)
>  {
>  	int cpu;
>  
> -	if (!IS_ENABLED(CONFIG_SHADOW_CALL_STACK))
> +	if (!scs_is_enabled())
>  		return;
>  
>  	for_each_possible_cpu(cpu)

I applied the series on top of -rc1 and tested it in qemu. With -cpu
cortex-a57 everything seems to work and PAC instructions are correctly
patched into SCS push/pop:

Thread 1 hit Breakpoint 1, 0xffff8000098b7920 in scs_patch_vmlinux ()
(gdb) x/1i scs_alloc
   0xffff8000081daeb8 <scs_alloc>:      paciasp
(gdb) finish
Run till exit from #0  0xffff8000098b7920 in scs_patch_vmlinux ()
0xffff8000098b03cc in __primary_switched ()
(gdb) x/1i scs_alloc
   0xffff8000081daeb8 <scs_alloc>:      str     x30, [x18], #8

However, with -cpu max I'm still seeing calls to scs_alloc despite the
following:

# dmesg | grep "Address authentication"
[    0.000000] CPU features: detected: Address authentication (architected QARMA5 algorithm)
# zcat /proc/config.gz | grep -E '(SHADOW_|UNWIND|DYNAMIC_SCS)'
CONFIG_UNWIND_TABLES=y
CONFIG_UNWIND_PATCH_PAC_INTO_SCS=y
CONFIG_CC_HAVE_SHADOW_CALL_STACK=y
CONFIG_ARCH_SUPPORTS_SHADOW_CALL_STACK=y
CONFIG_SHADOW_CALL_STACK=y
CONFIG_DYNAMIC_SCS=y

It looks like we're correctly leaving the PAC instructions unpatched,
but scs_is_enabled must be returning true:

Thread 1 hit Breakpoint 1, 0xffff8000098b7920 in scs_patch_vmlinux ()
(gdb) x/1i scs_alloc
   0xffff8000081daeb8 <scs_alloc>:      paciasp
(gdb) finish
Run till exit from #0  0xffff8000098b7920 in scs_patch_vmlinux ()
0xffff8000098b03cc in __primary_switched ()
(gdb) x/1i scs_alloc
   0xffff8000081daeb8 <scs_alloc>:      paciasp
(gdb) c
Continuing.

Thread 1 hit Breakpoint 2, 0xffff8000081daeb8 in scs_alloc ()
(gdb) bt
#0  0xffff8000081daeb8 in scs_alloc ()
#1  0xffff800008015b60 in init_irq_scs ()
#2  0xffff8000098b3c5c in init_IRQ ()
#3  0xffff8000098b0954 in start_kernel ()
#4  0xffff8000098b03f4 in __primary_switched ()

I'm guessing this is also why I'm getting the following panic after
attempting to load a module:

[   25.549517] Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000003 -- BTI
[   25.549893] CPU: 0 PID: 156 Comm: modprobe Not tainted 5.19.0-rc1-00246-gef5e30ca27c0-dirty #1
[   25.550139] Hardware name: linux,dummy-virt (DT)
[   25.550426] pstate: 80400c09 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=j-)
[   25.550591] pc : scs_handle_fde_frame+0x80/0x17c
[   25.551341] lr : scs_handle_fde_frame+0x54/0x17c
[   25.551424] sp : ffff80000a67bb90
[   25.551476] x29: ffff80000a67bb90 x28: 0000000000000000 x27: ffff800000ff6025
[   25.552440] x26: 0000000000000003 x25: 0000000000000002 x24: ffff800000ff6026
[   25.552584] x23: ffff800008ffd000 x22: 00000000d50323bf x21: 00000000d503233f
[   25.552725] x20: ffff800000ff0048 x19: ffff800000ff6014 x18: ffff80000a5d1000
[   25.552867] x17: 0000000000000000 x16: ffff00001fe9cd88 x15: 24003032336f7470
[   25.553020] x14: 180f0a0700000000 x13: 0000000000000040 x12: 0000000000000005
[   25.553161] x11: 6c6300656c75646f x10: 0000000000000010 x9 : ffffffffffffffe4
[   25.553354] x8 : ffff800008fd1870 x7 : 7f7f7f7f7f7f7f7f x6 : 16fefeff0eff1e0b
[   25.553511] x5 : 0080808000800000 x4 : 0000000000000010 x3 : 1800000010001f0c
[   25.553645] x2 : 1b011e7c0100527a x1 : 0000000000000000 x0 : ffff800000ff0048
[   25.553980] Kernel panic - not syncing: Unhandled exception
[   25.554134] CPU: 0 PID: 156 Comm: modprobe Not tainted 5.19.0-rc1-00246-gef5e30ca27c0-dirty #1
[   25.554254] Hardware name: linux,dummy-virt (DT)
[   25.554394] Call trace:
[   25.554512]  dump_backtrace+0xe4/0x104
[   25.554773]  show_stack+0x18/0x24
[   25.554893]  dump_stack_lvl+0x64/0x7c
[   25.554983]  dump_stack+0x18/0x38
[   25.555067]  panic+0x14c/0x370
[   25.555149]  el1t_64_irq_handler+0x0/0x1c
[   25.555231]  el1_abort+0x0/0x5c
[   25.555314]  el1h_64_sync+0x64/0x68
[   25.555395]  scs_handle_fde_frame+0x80/0x17c
[   25.555482]  scs_patch+0x7c/0xa0
[   25.555565]  module_finalize+0xe0/0x100
[   25.555647]  post_relocation+0xd4/0xf0
[   25.555730]  load_module+0x924/0x11fc
[   25.555811]  __arm64_sys_finit_module+0xbc/0x108
[   25.555895]  invoke_syscall+0x40/0x118
[   25.555984]  el0_svc_common+0xb4/0xf0
[   25.556067]  do_el0_svc+0x2c/0xb4
[   25.556150]  el0_svc+0x2c/0x7c
[   25.556233]  el0t_64_sync_handler+0x84/0xf0
[   25.556315]  el0t_64_sync+0x190/0x194

This is with defconfig + SHADOW_CALL_STACK + UNWIND_PATCH_PAC_INTO_SCS.
Any thoughts if I'm doing something wrong here?

Sami

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-15 21:29       ` Kees Cook
@ 2022-06-15 21:52         ` Fangrui Song
  2022-06-16  7:14           ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Fangrui Song @ 2022-06-15 21:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Sami Tolvanen, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Nick Desaulniers,
	Dan Li

On 2022-06-15, Kees Cook wrote:
>On Wed, Jun 15, 2022 at 9:54 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Wed, 15 Jun 2022 at 18:50, Sami Tolvanen <samitolvanen@google.com> wrote:
>> >
>> > On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
>> > > Enable asynchronous unwind table generation for both the core kernel as
>> > > well as modules, and emit the resulting .eh_frame sections as init code
>> > > so we can use the unwind directives for code patching at boot or module
>> > > load time.
>> > >
>> > > This will be used by dynamic shadow call stack support, which will rely
>> > > on code patching rather than compiler codegen to emit the shadow call
>> > > stack push and pop instructions.
>> > >
>> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
>> > > ---
>> > >  arch/arm64/Kconfig                    |  3 +++
>> > >  arch/arm64/Makefile                   |  5 +++++
>> > >  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
>> > >  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
>> > >  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
>> > >  drivers/firmware/efi/libstub/Makefile |  1 +
>> > >  6 files changed, 31 insertions(+)
>> > >
>> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> > > index 1652a9800ebe..5f92344edff5 100644
>> > > --- a/arch/arm64/Kconfig
>> > > +++ b/arch/arm64/Kconfig
>> > > @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
>> > >       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
>> > >       default 0xffffffffffffffff
>> > >
>> > > +config UNWIND_TABLES
>> > > +     bool
>> > > +
>> > >  source "arch/arm64/Kconfig.platforms"
>> > >
>> > >  menu "Kernel Features"
>> > > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
>> > > index 6d9d4a58b898..4fbca56fa602 100644
>> > > --- a/arch/arm64/Makefile
>> > > +++ b/arch/arm64/Makefile
>> > > @@ -45,8 +45,13 @@ KBUILD_CFLAGS      += $(call cc-option,-mabi=lp64)
>> > >  KBUILD_AFLAGS        += $(call cc-option,-mabi=lp64)
>> > >
>> > >  # Avoid generating .eh_frame* sections.
>> > > +ifneq ($(CONFIG_UNWIND_TABLES),y)
>> > >  KBUILD_CFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
>> > >  KBUILD_AFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
>> > > +else
>> > > +KBUILD_CFLAGS        += -fasynchronous-unwind-tables
>> > > +KBUILD_AFLAGS        += -fasynchronous-unwind-tables
>> > > +endif
>> > >
>> > >  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
>> > >  prepare: stack_protector_prepare
>> > > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
>> > > index 094701ec5500..dbba4b7559aa 100644
>> > > --- a/arch/arm64/include/asm/module.lds.h
>> > > +++ b/arch/arm64/include/asm/module.lds.h
>> > > @@ -17,4 +17,12 @@ SECTIONS {
>> > >        */
>> > >       .text.hot : { *(.text.hot) }
>> > >  #endif
>> > > +
>> > > +#ifdef CONFIG_UNWIND_TABLES
>> > > +     /*
>> > > +      * Currently, we only use unwind info at module load time, so we can
>> > > +      * put it into the .init allocation.
>> > > +      */
>> > > +     .init.eh_frame : { *(.eh_frame) }
>> > > +#endif
>> > >  }
>> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> > > index 2d4a8f995175..7bf4809f523d 100644
>> > > --- a/arch/arm64/kernel/vmlinux.lds.S
>> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
>> > > @@ -120,6 +120,17 @@ jiffies = jiffies_64;
>> > >  #define TRAMP_TEXT
>> > >  #endif
>> > >
>> > > +#ifdef CONFIG_UNWIND_TABLES
>> > > +#define UNWIND_DATA_SECTIONS                         \
>> > > +     .eh_frame : {                                   \
>> > > +             __eh_frame_start = .;                   \
>> > > +             *(.eh_frame)                            \
>> > > +             __eh_frame_end = .;                     \
>> > > +     }
>> > > +#else
>> > > +#define UNWIND_DATA_SECTIONS
>> > > +#endif

How do you intend to use the encapsulation symbols __eh_frame_start
and __eh_frame_end ?

>> > How does this work with SANITIZER_DISCARDS dropping .eh_frame in
>> > include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
>> > definitely want to enable this together with CONFIG_CFI_CLANG, so it
>> > seems like we'd have to drop the discard rules as well.
>> >
>>
>> Good point, I had no idea that that existed.
>>
>> Clang 13 should have the fix for the original issue, so we could make
>> this workaround specific to 12 and earlier.
>
>Yeah, I like this -- I'd prefer to know when we get "surprise" sections again.

Does this need asynchronous unwind tables or just synchronous unwind
tables?

Note: with Clang < 15, the AArch64 codegen was in a quite bad state. It
has significantly improved since the https://reviews.llvm.org/D114545 patch series
but I am not confident to state that production use may not into an issue :-)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-15 21:52         ` Fangrui Song
@ 2022-06-16  7:14           ` Ard Biesheuvel
  2022-06-16  7:24             ` Fangrui Song
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-16  7:14 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Kees Cook, Sami Tolvanen, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Nick Desaulniers,
	Dan Li

On Wed, 15 Jun 2022 at 23:52, Fangrui Song <maskray@google.com> wrote:
>
> On 2022-06-15, Kees Cook wrote:
> >On Wed, Jun 15, 2022 at 9:54 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>
> >> On Wed, 15 Jun 2022 at 18:50, Sami Tolvanen <samitolvanen@google.com> wrote:
> >> >
> >> > On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
> >> > > Enable asynchronous unwind table generation for both the core kernel as
> >> > > well as modules, and emit the resulting .eh_frame sections as init code
> >> > > so we can use the unwind directives for code patching at boot or module
> >> > > load time.
> >> > >
> >> > > This will be used by dynamic shadow call stack support, which will rely
> >> > > on code patching rather than compiler codegen to emit the shadow call
> >> > > stack push and pop instructions.
> >> > >
> >> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> >> > > ---
> >> > >  arch/arm64/Kconfig                    |  3 +++
> >> > >  arch/arm64/Makefile                   |  5 +++++
> >> > >  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
> >> > >  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
> >> > >  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
> >> > >  drivers/firmware/efi/libstub/Makefile |  1 +
> >> > >  6 files changed, 31 insertions(+)
> >> > >
> >> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> > > index 1652a9800ebe..5f92344edff5 100644
> >> > > --- a/arch/arm64/Kconfig
> >> > > +++ b/arch/arm64/Kconfig
> >> > > @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
> >> > >       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
> >> > >       default 0xffffffffffffffff
> >> > >
> >> > > +config UNWIND_TABLES
> >> > > +     bool
> >> > > +
> >> > >  source "arch/arm64/Kconfig.platforms"
> >> > >
> >> > >  menu "Kernel Features"
> >> > > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> >> > > index 6d9d4a58b898..4fbca56fa602 100644
> >> > > --- a/arch/arm64/Makefile
> >> > > +++ b/arch/arm64/Makefile
> >> > > @@ -45,8 +45,13 @@ KBUILD_CFLAGS      += $(call cc-option,-mabi=lp64)
> >> > >  KBUILD_AFLAGS        += $(call cc-option,-mabi=lp64)
> >> > >
> >> > >  # Avoid generating .eh_frame* sections.
> >> > > +ifneq ($(CONFIG_UNWIND_TABLES),y)
> >> > >  KBUILD_CFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> >> > >  KBUILD_AFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> >> > > +else
> >> > > +KBUILD_CFLAGS        += -fasynchronous-unwind-tables
> >> > > +KBUILD_AFLAGS        += -fasynchronous-unwind-tables
> >> > > +endif
> >> > >
> >> > >  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
> >> > >  prepare: stack_protector_prepare
> >> > > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
> >> > > index 094701ec5500..dbba4b7559aa 100644
> >> > > --- a/arch/arm64/include/asm/module.lds.h
> >> > > +++ b/arch/arm64/include/asm/module.lds.h
> >> > > @@ -17,4 +17,12 @@ SECTIONS {
> >> > >        */
> >> > >       .text.hot : { *(.text.hot) }
> >> > >  #endif
> >> > > +
> >> > > +#ifdef CONFIG_UNWIND_TABLES
> >> > > +     /*
> >> > > +      * Currently, we only use unwind info at module load time, so we can
> >> > > +      * put it into the .init allocation.
> >> > > +      */
> >> > > +     .init.eh_frame : { *(.eh_frame) }
> >> > > +#endif
> >> > >  }
> >> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> >> > > index 2d4a8f995175..7bf4809f523d 100644
> >> > > --- a/arch/arm64/kernel/vmlinux.lds.S
> >> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> >> > > @@ -120,6 +120,17 @@ jiffies = jiffies_64;
> >> > >  #define TRAMP_TEXT
> >> > >  #endif
> >> > >
> >> > > +#ifdef CONFIG_UNWIND_TABLES
> >> > > +#define UNWIND_DATA_SECTIONS                         \
> >> > > +     .eh_frame : {                                   \
> >> > > +             __eh_frame_start = .;                   \
> >> > > +             *(.eh_frame)                            \
> >> > > +             __eh_frame_end = .;                     \
> >> > > +     }
> >> > > +#else
> >> > > +#define UNWIND_DATA_SECTIONS
> >> > > +#endif
>
> How do you intend to use the encapsulation symbols __eh_frame_start
> and __eh_frame_end ?
>

They are used in patch #3 to access the contents of the section from
the program itself.

> >> > How does this work with SANITIZER_DISCARDS dropping .eh_frame in
> >> > include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
> >> > definitely want to enable this together with CONFIG_CFI_CLANG, so it
> >> > seems like we'd have to drop the discard rules as well.
> >> >
> >>
> >> Good point, I had no idea that that existed.
> >>
> >> Clang 13 should have the fix for the original issue, so we could make
> >> this workaround specific to 12 and earlier.
> >
> >Yeah, I like this -- I'd prefer to know when we get "surprise" sections again.
>
> Does this need asynchronous unwind tables or just synchronous unwind
> tables?
>

I would assume that unwind tables that are only intended for
synchronous unwinding may not be instruction accurate when it comes to
the placement of the DW_CFA_negate_ra_state opcodes.

> Note: with Clang < 15, the AArch64 codegen was in a quite bad state. It
> has significantly improved since the https://reviews.llvm.org/D114545 patch series
> but I am not confident to state that production use may not into an issue :-)

This is why the feature is only enabled for Clang 15 or later.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/3] scs: add support for dynamic shadow call stacks
  2022-06-15 17:12     ` Sami Tolvanen
@ 2022-06-16  7:14       ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-16  7:14 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li,
	Kees Cook

On Wed, 15 Jun 2022 at 19:12, Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Tue, Jun 14, 2022 at 08:20:13AM +0200, Ard Biesheuvel wrote:
> > On Mon, 13 Jun 2022 at 15:40, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > In order to allow arches to use code patching to conditionally emit the
> > > shadow stack pushes and pops, rather than always taking the performance
> > > hit even on CPUs that implement alternatives such as stack pointer
> > > authentication on arm64, add a Kconfig symbol that can be set by the
> > > arch to omit the SCS codegen itself, without otherwise affecting how
> > > support code for SCS and compiler options (for register reservation, for
> > > instance) are emitted.
> > >
> > > Also, add a static key and some plumbing to omit the allocation of
> > > shadow call stack for dynamic SCS configurations if SCS is disabled at
> > > runtime.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > > Reviewed-by: Kees Cook <keescook@chromium.org>
> >
> > This patch needs the following hunk applied on top to fix a build
> > error reported by the bots:
> >
> > --- a/include/linux/scs.h
> > +++ b/include/linux/scs.h
> > @@ -57,6 +57,8 @@ DECLARE_STATIC_KEY_TRUE(dynamic_scs_enabled);
> >
> >  static inline bool scs_is_dynamic(void)
> >  {
> > +       if (!IS_ENABLED(CONFIG_DYNAMIC_SCS))
> > +               return false;
> >         return static_branch_likely(&dynamic_scs_enabled);
> >  }
>
> With this:
>
> Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
>

Thanks. I had spotted that too, and came up with the exact same fix.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules
  2022-06-16  7:14           ` Ard Biesheuvel
@ 2022-06-16  7:24             ` Fangrui Song
  0 siblings, 0 replies; 17+ messages in thread
From: Fangrui Song @ 2022-06-16  7:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Sami Tolvanen, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Nick Desaulniers,
	Dan Li

On Thu, Jun 16, 2022 at 12:14 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 15 Jun 2022 at 23:52, Fangrui Song <maskray@google.com> wrote:
> >
> > On 2022-06-15, Kees Cook wrote:
> > >On Wed, Jun 15, 2022 at 9:54 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >>
> > >> On Wed, 15 Jun 2022 at 18:50, Sami Tolvanen <samitolvanen@google.com> wrote:
> > >> >
> > >> > On Mon, Jun 13, 2022 at 03:40:06PM +0200, Ard Biesheuvel wrote:
> > >> > > Enable asynchronous unwind table generation for both the core kernel as
> > >> > > well as modules, and emit the resulting .eh_frame sections as init code
> > >> > > so we can use the unwind directives for code patching at boot or module
> > >> > > load time.
> > >> > >
> > >> > > This will be used by dynamic shadow call stack support, which will rely
> > >> > > on code patching rather than compiler codegen to emit the shadow call
> > >> > > stack push and pop instructions.
> > >> > >
> > >> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > >> > > Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> > >> > > ---
> > >> > >  arch/arm64/Kconfig                    |  3 +++
> > >> > >  arch/arm64/Makefile                   |  5 +++++
> > >> > >  arch/arm64/include/asm/module.lds.h   |  8 ++++++++
> > >> > >  arch/arm64/kernel/vmlinux.lds.S       | 13 +++++++++++++
> > >> > >  arch/arm64/kvm/hyp/nvhe/Makefile      |  1 +
> > >> > >  drivers/firmware/efi/libstub/Makefile |  1 +
> > >> > >  6 files changed, 31 insertions(+)
> > >> > >
> > >> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > >> > > index 1652a9800ebe..5f92344edff5 100644
> > >> > > --- a/arch/arm64/Kconfig
> > >> > > +++ b/arch/arm64/Kconfig
> > >> > > @@ -366,6 +366,9 @@ config KASAN_SHADOW_OFFSET
> > >> > >       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
> > >> > >       default 0xffffffffffffffff
> > >> > >
> > >> > > +config UNWIND_TABLES
> > >> > > +     bool
> > >> > > +
> > >> > >  source "arch/arm64/Kconfig.platforms"
> > >> > >
> > >> > >  menu "Kernel Features"
> > >> > > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> > >> > > index 6d9d4a58b898..4fbca56fa602 100644
> > >> > > --- a/arch/arm64/Makefile
> > >> > > +++ b/arch/arm64/Makefile
> > >> > > @@ -45,8 +45,13 @@ KBUILD_CFLAGS      += $(call cc-option,-mabi=lp64)
> > >> > >  KBUILD_AFLAGS        += $(call cc-option,-mabi=lp64)
> > >> > >
> > >> > >  # Avoid generating .eh_frame* sections.
> > >> > > +ifneq ($(CONFIG_UNWIND_TABLES),y)
> > >> > >  KBUILD_CFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > >> > >  KBUILD_AFLAGS        += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > >> > > +else
> > >> > > +KBUILD_CFLAGS        += -fasynchronous-unwind-tables
> > >> > > +KBUILD_AFLAGS        += -fasynchronous-unwind-tables
> > >> > > +endif
> > >> > >
> > >> > >  ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
> > >> > >  prepare: stack_protector_prepare
> > >> > > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
> > >> > > index 094701ec5500..dbba4b7559aa 100644
> > >> > > --- a/arch/arm64/include/asm/module.lds.h
> > >> > > +++ b/arch/arm64/include/asm/module.lds.h
> > >> > > @@ -17,4 +17,12 @@ SECTIONS {
> > >> > >        */
> > >> > >       .text.hot : { *(.text.hot) }
> > >> > >  #endif
> > >> > > +
> > >> > > +#ifdef CONFIG_UNWIND_TABLES
> > >> > > +     /*
> > >> > > +      * Currently, we only use unwind info at module load time, so we can
> > >> > > +      * put it into the .init allocation.
> > >> > > +      */
> > >> > > +     .init.eh_frame : { *(.eh_frame) }
> > >> > > +#endif
> > >> > >  }
> > >> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > >> > > index 2d4a8f995175..7bf4809f523d 100644
> > >> > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > >> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > >> > > @@ -120,6 +120,17 @@ jiffies = jiffies_64;
> > >> > >  #define TRAMP_TEXT
> > >> > >  #endif
> > >> > >
> > >> > > +#ifdef CONFIG_UNWIND_TABLES
> > >> > > +#define UNWIND_DATA_SECTIONS                         \
> > >> > > +     .eh_frame : {                                   \
> > >> > > +             __eh_frame_start = .;                   \
> > >> > > +             *(.eh_frame)                            \
> > >> > > +             __eh_frame_end = .;                     \
> > >> > > +     }
> > >> > > +#else
> > >> > > +#define UNWIND_DATA_SECTIONS
> > >> > > +#endif
> >
> > How do you intend to use the encapsulation symbols __eh_frame_start
> > and __eh_frame_end ?
> >
>
> They are used in patch #3 to access the contents of the section from
> the program itself.
>
> > >> > How does this work with SANITIZER_DISCARDS dropping .eh_frame in
> > >> > include/asm-generic/vmlinux.lds.h and scripts/module.lds.S? We would
> > >> > definitely want to enable this together with CONFIG_CFI_CLANG, so it
> > >> > seems like we'd have to drop the discard rules as well.
> > >> >
> > >>
> > >> Good point, I had no idea that that existed.
> > >>
> > >> Clang 13 should have the fix for the original issue, so we could make
> > >> this workaround specific to 12 and earlier.
> > >
> > >Yeah, I like this -- I'd prefer to know when we get "surprise" sections again.
> >
> > Does this need asynchronous unwind tables or just synchronous unwind
> > tables?
> >
>
> I would assume that unwind tables that are only intended for
> synchronous unwinding may not be instruction accurate when it comes to
> the placement of the DW_CFA_negate_ra_state opcodes.
>
> > Note: with Clang < 15, the AArch64 codegen was in a quite bad state. It
> > has significantly improved since the https://reviews.llvm.org/D114545 patch series
> > but I am not confident to state that production use may not into an issue :-)
>
> This is why the feature is only enabled for Clang 15 or later.

Ok, thanks for the clarification.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang
  2022-06-15 21:32   ` Sami Tolvanen
@ 2022-06-16 10:51     ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2022-06-16 10:51 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Mark Rutland,
	Marc Zyngier, Kees Cook, Fangrui Song, Nick Desaulniers, Dan Li

On Wed, 15 Jun 2022 at 23:33, Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Mon, Jun 13, 2022 at 03:40:08PM +0200, Ard Biesheuvel wrote:
> > Implement dynamic shadow call stack support on Clang, by parsing the
> > unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
> > instructions, and replacing them with the shadow call stack push and pop
> > instructions, respectively.
> >
> > This is useful because the overhead of the shadow call stack is
> > difficult to justify on hardware that implements pointer authentication
> > (PAC), and given that the PAC instructions are executed as NOPs on
> > hardware that doesn't, we can just replace them without breaking
> > anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
> > manipulations of the return address, replacing them 1:1 with shadow call
> > stack pushes and pops is guaranteed to result in the desired behavior.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/Kconfig            |   9 +
> >  arch/arm64/Makefile           |  10 +-
> >  arch/arm64/include/asm/scs.h  |  45 ++++
> >  arch/arm64/kernel/Makefile    |   2 +
> >  arch/arm64/kernel/head.S      |   3 +
> >  arch/arm64/kernel/irq.c       |   2 +-
> >  arch/arm64/kernel/module.c    |   8 +
> >  arch/arm64/kernel/patch-scs.c | 257 ++++++++++++++++++++
> >  arch/arm64/kernel/sdei.c      |   2 +-
> >  arch/arm64/kernel/setup.c     |   4 +
> >  10 files changed, 338 insertions(+), 4 deletions(-)
> >
...
> > --- a/arch/arm64/include/asm/scs.h
> > +++ b/arch/arm64/include/asm/scs.h
...
> > +
> > +static inline void dynamic_scs_init(void)
> > +{
> > +     if (should_patch_pac_into_scs())
> > +             static_branch_enable(&dynamic_scs_enabled);
> > +}
>
> Should we print out a message to indicate we are actually enabling SCS
> at runtime? Otherwise I think the only way to know would be to look at
> /proc/meminfo, for example.
>

Yes, good point. I will add something here.

...
>
> I applied the series on top of -rc1 and tested it in qemu. With -cpu
> cortex-a57 everything seems to work and PAC instructions are correctly
> patched into SCS push/pop:
>
...
>
> However, with -cpu max I'm still seeing calls to scs_alloc despite the
> following:
>
> # dmesg | grep "Address authentication"
> [    0.000000] CPU features: detected: Address authentication (architected QARMA5 algorithm)
...

> I'm guessing this is also why I'm getting the following panic after
> attempting to load a module:
>
> [   25.549517] Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000003 -- BTI
...
>
> This is with defconfig + SHADOW_CALL_STACK + UNWIND_PATCH_PAC_INTO_SCS.
> Any thoughts if I'm doing something wrong here?
>

No, I made a mistake with the initial value of the dynamic_scs_enabled
static key, so it's currently always true.

I'll fix this in the next revision.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-06-16 10:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-13 13:40 [PATCH v3 0/3] arm64: dynamic shadow call stack support Ard Biesheuvel
2022-06-13 13:40 ` [PATCH v3 1/3] arm64: unwind: add asynchronous unwind tables to kernel and modules Ard Biesheuvel
2022-06-15 16:50   ` Sami Tolvanen
2022-06-15 16:53     ` Ard Biesheuvel
2022-06-15 21:29       ` Kees Cook
2022-06-15 21:52         ` Fangrui Song
2022-06-16  7:14           ` Ard Biesheuvel
2022-06-16  7:24             ` Fangrui Song
2022-06-13 13:40 ` [PATCH v3 2/3] scs: add support for dynamic shadow call stacks Ard Biesheuvel
2022-06-14  6:20   ` Ard Biesheuvel
2022-06-15 17:12     ` Sami Tolvanen
2022-06-16  7:14       ` Ard Biesheuvel
2022-06-13 13:40 ` [PATCH v3 3/3] arm64: implement dynamic shadow call stack for Clang Ard Biesheuvel
2022-06-13 16:30   ` Kees Cook
2022-06-13 16:50     ` Ard Biesheuvel
2022-06-15 21:32   ` Sami Tolvanen
2022-06-16 10:51     ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.